Bug 37698 - Line not wrapped at certain punctuations
Summary: Line not wrapped at certain punctuations
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: All All
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-04-15 23:17 PDT by Xianzhu Wang
Modified: 2019-04-30 08:35 PDT (History)
7 users (show)

See Also:


Attachments
test case (1.42 KB, text/html)
2010-04-15 23:17 PDT, Xianzhu Wang
no flags Details
The test case to generate the line breaking matrix (1.54 KB, text/html)
2010-04-19 23:42 PDT, Xianzhu Wang
no flags Details
The test case to generate the line breaking matrix (1.54 KB, text/html)
2010-04-20 02:23 PDT, Xianzhu Wang
no flags Details
patch (20.40 KB, patch)
2010-04-20 22:16 PDT, Xianzhu Wang
no flags Details | Formatted Diff | Diff
(For your preview) another patch using a table to do line breaking for ASCII chars. (10.09 KB, patch)
2010-05-24 23:42 PDT, Xianzhu Wang
no flags Details | Formatted Diff | Diff
Updated patch (14.56 KB, patch)
2010-07-23 02:53 PDT, Xianzhu Wang
mitz: review-
Details | Formatted Diff | Diff
new patch 7/26 (14.94 KB, patch)
2010-07-26 02:23 PDT, Xianzhu Wang
no flags Details | Formatted Diff | Diff
performance test (6.03 KB, application/zip)
2010-07-26 21:32 PDT, Xianzhu Wang
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Xianzhu Wang 2010-04-15 23:17:36 PDT
Created attachment 53515 [details]
test case

Other browsers tested:
Add OK or FAIL after other browsers where you have tested this issue:
     Safari 4: FAIL
  Firefox 3.x: OK
         IE 7: OK
         IE 8: OK
        Opera: OK

What steps will reproduce the problem?
1. Open the attached test file

What is the expected result?
The lines should be wrapped.

What happens instead?
The lines are not wrapped.

Please provide any additional information below. Attach a screenshot if
possible.
Opening puntuations '(', '{', '{' and closing punctuations ')', ']', '}'
should introduce line breaking opportunities in certain contexts, and a
too-long line should wrap at some line breaking opportunities.

Chrome bug: http://crbug.com/41736
Comment 1 Xianzhu Wang 2010-04-16 00:32:47 PDT
This bug will disappear if I change needsLineBreakIterator() in WebCore/rendering/break_lines.cpp to always return true. That is, abandon all internetExplorerLineBreaksAfterQuestionMarkTable and shouldBreakAfter things and totally use ICU.  I'm going to create a patch in this way.
Comment 2 Xianzhu Wang 2010-04-19 23:42:50 PDT
Created attachment 53775 [details]
The test case to generate the line breaking matrix

The test case can generate the line breaking matrix for all pair of printable ASCII characters (where 0 stands for all digits, and a stands for all letters).
The first column of the matrix contains the first characters of the pairs, and the first row contains the second characters of the pairs. A "X" in the matrix means the browser may break the line between the pair.

Chrome/Safari:
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_]_^___`_{_|_}_~_a_
!___________________________________________________________________
"___________________________________________________________________
#___________________________________________________________________
$___________________________________________________________________
%___________________________________________________________________
&___________________________________________________________________
'___________________________________________________________________
(___________________________________________________________________
)___________________________________________________________________
*___________________________________________________________________
+___________________________________________________________________
,___________________________________________________________________
-_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_
.___________________________________________________________________
/___________________________________________________________________
0___________________________________________________________________
:___________________________________________________________________
;___________________________________________________________________
<___________________________________________________________________
=___________________________________________________________________
>___________________________________________________________________
?_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
@___________________________________________________________________
[___________________________________________________________________
]___________________________________________________________________
^___________________________________________________________________
____________________________________________________________________
`___________________________________________________________________
{___________________________________________________________________
|___________________________________________________________________
}___________________________________________________________________
~___________________________________________________________________
a___________________________________________________________________


IE8 Compatibility View:
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_]_^___`_{_|_}_~_a_
!_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
"___________________________________________________________________
#_______________X_________X_____________________X_________X_________
$_________X_______________X_________________________________________
%_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
&_______________X_________X_____________________X_________X_________
'___________________________________________________________________
(___________________________________________________________________
)_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
*_______________X_________X_____________________X_________X_________
+_______________X_________X_____________________X_________X_________
,_______________X_________X_____________________X_________X_________
-_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_X_
._______________X_________X_____________________X_________X_________
/_______X_______X_________X_____________________X_________X_________
0_______X_______X_________X_____________________X_________X_________
:_______________X_________X_____________________X_________X_________
;_______________X_________X_____________________X_________X_________
<_______________X_________X_____________________X_________X_________
=_______________X_________X_____________________X_________X_________
>_______________X_________X_____________________X_________X_________
?_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
@_______________X_________X_____________________X_________X_________
[___________________________________________________________________
]_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
^_______________X_________X_____________________X_________X_________
________________X_________X_____________________X_________X_________
`_______________X_________X_____________________X_________X_________
{___________________________________________________________________
|_______________X_________X_____________________X_________X_________
}_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
~_______________X_________X_____________________X_________X_________
a_______________X_________X_____________________X_________X_________

IE8 Standard mode:
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_]_^___`_{_|_}_~_a_
!_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
"___________________________________________________________________
#_______________X_________X_____________________X_________X_________
$_________X_______________X_________________________________________
%_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
&_______________X_________X_____________________X_________X_________
'___________________________________________________________________
(___________________________________________________________________
)_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
*_______________X_________X_____________________X_________X_________
+_______________X_________X_____________________X_________X_________
,_______________X_________X_____________________X_________X_________
-___X_X_X___X_X_X___X_X_X___X_X_X_X_X_X_X_X___X_X___X_X_X_X_X___X_X_
._______________X_________X_____________________X_________X_________
/_______X_______X_________X_____________________X_________X_________
0_______X_______X_________X_____________________X_________X_________
:_______________X_________X_____________________X_________X_________
;_______________X_________X_____________________X_________X_________
<_______________X_________X_____________________X_________X_________
=_______________X_________X_____________________X_________X_________
>_______________X_________X_____________________X_________X_________
?_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
@_______________X_________X_____________________X_________X_________
[___________________________________________________________________
]_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
^_______________X_________X_____________________X_________X_________
________________X_________X_____________________X_________X_________
`_______________X_________X_____________________X_________X_________
{___________________________________________________________________
|_______________X_________X_____________________X_________X_________
}_____X_X___X___X___X_X___X_____X_____X_X_X___X_X___X_X_X_X_X___X_X_
~_______________X_________X_____________________X_________X_________
a_______________X_________X_____________________X_________X_________


FireFox 3.6:
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_]_^___`_{_|_}_~_a_
!_______________X_____________________X_________X_________X_________
"_______________X_____________________X_________X_________X_________
#_______________X_____________________X_________X_________X_________
$___________________________________________________________________
%_______________X_____________________X_________X_________X_________
&_______________X_____________________X_________X_________X_________
'___________________________________________________________________
(___________________________________________________________________
)_______________X_____________________X_________X_________X_________
*_______________X_____________________X_________X_________X_________
+_______________X_____________________X_________X_________X_________
,_______________X_____________________X_________X_________X_________
-_____________X_________________X_____________X_____X_X_X_________X_
._______________X_____________________X_________X_________X_________
/___________________________________________________________________
0___________________________________________________________________
:_______________X_____________________X_________X_________X_________
;_______________X_____________________X_________X_________X_________
<___________________________________________________________________
=_______________X_____________________X_________X_________X_________
>_______________X_____________________X_________X_________X_________
?_______________X_____________________X_________X_________X_________
@___________________________________________________________________
[___________________________________________________________________
]_______________X_____________________X_________X_________X_________
^___________________________________________________________________
____________________________________________________________________
`___________________________________________________________________
{___________________________________________________________________
|_______________X_____________________X_________X_________X_________
}_______________X_____________________X_________X_________X_________
~_______________X_____________________X_________X_________X_________
a___________________________________________________________________

icu (tested in webkit with modified break_line.cpp that disables shouldBreakAfter() and needsLineBreakIterator()):
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_]_^___`_{_|_}_~_a_
!_____X_X_X_X___X___X_X_________X_____X_X_X___X_X___X_X_X_X_____X_X_
"___________________________________________________________________
#_______X_X___________X_____________________________________________
$_______X_X_____X_____X_________________________X_________X_________
%_______X_X_____X_____X_________________________X_________X_________
&_______X_X___________X_____________________________________________
'___________________________________________________________________
(___________________________________________________________________
)_______X_X_____X_____X_________________________X_________X_________
*_______X_X___________X_____________________________________________
+_______X_X_____X_____X_________________________X_________X_________
,_______X_X_____X_____X_________X_______________X_________X_________
-_____X_X_X_X___X___X_X_______________X_X_X___X_X___X_X_X_X_____X_X_
._______X_X_____X_____X_________X_______________X_________X_________
/_____X_X_X_X___X___X_X_________X_____X_X_X___X_X___X_X_X_X_____X_X_
0___________________________________________________________________
:_______X_X_____X_____X_________X_______________X_________X_________
;_______X_X_____X_____X_________X_______________X_________X_________
<_______X_X___________X_____________________________________________
=_______X_X___________X_____________________________________________
>_______X_X___________X_____________________________________________
?_____X_X_X_X___X___X_X_________X_____X_X_X___X_X___X_X_X_X_____X_X_
@_______X_X___________X_____________________________________________
[___________________________________________________________________
]_______X_X_____X_____X_________________________X_________X_________
^_______X_X___________X_____________________________________________
________X_X___________X_____________________________________________
`_______X_X___________X_____________________________________________
{___________________________________________________________________
|_____X_X_X_X___X___X_X_________X_____X_X_X___X_X___X_X_X_X_____X_X_
}_______X_X_____X_____X_________________________X_________X_________
~_______X_X___________X_____________________________________________
a_______X_X___________X_____________________________________________
Comment 3 Xianzhu Wang 2010-04-20 02:23:33 PDT
Created attachment 53789 [details]
 The test case to generate the line breaking matrix

Fixed backslash issue of the last test case.
Comment 4 Xianzhu Wang 2010-04-20 22:16:54 PDT
Created attachment 53916 [details]
patch

This patch improves the line breaking behavior of WebKit by letting the underlying line breaking library (e.g. ICU) handle the following characters: '-', '(', ')', '[', ']', '{', '}' and soft hyphen.

The line breaking matrix with the patch is:
__!_"_#_$_%_&_'_(_)_*_+_,_-_._/_0_:_;_<_=_>_?_@_[_\_]_^___`_{_|_}_~_a_
!_______________X_______________________________X___________X_________
"_____________________________________________________________________
#_____________________________________________________________________
$_______________X_______________________________X___________X_________
%_______________X_______________________________X___________X_________
&_____________________________________________________________________
'_____________________________________________________________________
(_____________________________________________________________________
)_______X_X_____X_____X_________________________X_X_________X_________
*_____________________________________________________________________
+_______________X_______________________________X___________X_________
,_______________X_______________________________X___________X_________
-_____X_X_X_X___X___X_X_______________X_X_X___X_X_X___X_X_X_X_____X_X_
._______________X_______________________________X___________X_________
/_______________X_______________________________X___________X_________
0_____________________________________________________________________
:_______________X_______________________________X___________X_________
;_______________X_______________________________X___________X_________
<_____________________________________________________________________
=_____________________________________________________________________
>_____________________________________________________________________
?_____X_X_X_X___X___X_X___X_____X_____X_X_X___X_X_X___X_X_X_X_X___X_X_
@_____________________________________________________________________
[_____________________________________________________________________
\_______________X_______________________________X___________X_________
]_______X_X_____X_____X_________________________X_X_________X_________
^_____________________________________________________________________
______________________________________________________________________
`_____________________________________________________________________
{_____________________________________________________________________
|_______________X_______________________________X___________X_________
}_______X_X_____X_____X_________________________X_X_________X_________
~_____________________________________________________________________
a_____________________________________________________________________
Comment 5 Dimitri Glazkov (Google) 2010-05-04 10:00:44 PDT
Ditto here, please mark the patch for review.
Comment 6 WebKit Review Bot 2010-05-11 02:28:11 PDT
Attachment 53916 [details] did not pass style-queue:

Failed to run "['WebKitTools/Scripts/check-webkit-style', '--no-squash']" exit_code: 1
WebCore/rendering/break_lines.cpp:88:  A case label should not be indented, but line up with its switch statement.  [whitespace/indent] [4]
Total errors found: 1 in 17 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 7 Darin Adler 2010-05-11 09:53:49 PDT
Comment on attachment 53916 [details]
patch

Making this change will slow page rendering down quite a bit; we need to measure, but I am pretty sure this is true. Maybe there's another way to achieve this effect without such a significant speed cost.
Comment 8 Xianzhu Wang 2010-05-11 22:37:14 PDT
(In reply to comment #7)
> (From update of attachment 53916 [details])
> Making this change will slow page rendering down quite a bit; we need to measure, but I am pretty sure this is true. Maybe there's another way to achieve this effect without such a significant speed cost.

The patch is not using the method described in comment #1, but only letting the underlying line breaking library (e.g. ICU) handle the following characters: '-', '(', ')', '[', ']', '{', '}' and soft hyphen, so I guess the performance change will be trivial.  Anyway I'll measure the performance change.
Comment 9 Xianzhu Wang 2010-05-24 23:42:26 PDT
Created attachment 56979 [details]
(For your preview) another patch using a table to do line breaking for ASCII chars.
Comment 10 Xianzhu Wang 2010-06-20 19:39:38 PDT
Comment on attachment 53916 [details]
patch

I'll create a new patch after I get a Snow Leopard mac book.
Comment 11 Xianzhu Wang 2010-07-23 02:53:33 PDT
Created attachment 62400 [details]
Updated patch

This patch uses a line breaking table for all printable ASCII characters. It should have comparable performance as the original code. It keeps as much compatibility with the original code.
Comment 12 mitz 2010-07-25 19:00:04 PDT
Comment on attachment 62400 [details]
Updated patch

> +static const int asciiLineBreakTableFirstChar = '!';
> +static const int asciiLineBreakTableLastChar = 127;

Perhaps UChar is a more appropriate type for these. Can we define the latter in terms of the size of asciiLineBreakTable, or compile-time assert that the size matches the constants?

> +// Pack 8 bits into one byte
> +#define B(a, b, c, d, e, f, g, h) \
> +    ((a) | ((b) << 1) | ((c) << 2) | ((d) << 3) | ((e) << 4) | ((f) << 5) | ((g) << 6) | ((h) << 7))

I wonder if the memory savings are worth the extra work at runtime for shifting bits. I guess if there’s no measurable performance regression then it doesn’t matter.

> +// Line breaking table row for each digit (0-9)
> +#define DI { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
> +
> +// Line breaking table row for ascii letters (a-z A-Z)
> +#define AL { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
> +
> +#define F 0xFF
> +
> +// Line breaking table for printable ASCII characters. Most part of the table is compatibile with Firefox 3.6,
> +// except that the rows '-' and '?' still remain the behaviors before this change.

People reading this comment in the future will not know what “this change” is. I think it’s useful, when describing the behavior, to refer to the Unicode standard and to other browser engines, and call out the differences.

>                                                                                      This is a trade-off between
> +// compatibility with other browsers and backward-compatibility. Additional line break opportunities are added
> +// at least as possible.

I am not sure what the last sentence means. The least possible additional line breaking opportunities would be zero.

> +static const unsigned char asciiLineBreakTable[][12] = {
> +    //  !  "  #  $  %  &  '  (     )  *  +  ,  -  .  /  0  1-8   9  :  ;  <  =  >  ?  @     A-X      Y  Z  [  \  ]  ^  _  `     a-x      y  z  {  |  }  ~  DEL
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // !
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // "
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // #
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // $
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // %
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // &
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // '
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // (
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // )
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // *
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // +
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ,
> +    { B(1, 1, 1, 1, 1, 1, 1, 1), B(1, 1, 1, 1, 1, 1, 1, 1), F, B(1, 1, 1, 1, 1, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 1, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 1, 1, 1, 1) }, // -
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // .
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // /
> +    DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI, // 0-9
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // :
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ;
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // <
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // =
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // >
> +    { B(0, 0, 1, 1, 1, 1, 0, 1), B(0, 1, 1, 0, 1, 0, 0, 1), F, B(1, 0, 0, 1, 1, 1, 0, 1), F, F, F, B(1, 1, 1, 1, 0, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 0, 1, 1, 0) }, // ?
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // @
> +    AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL, // A-Z
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // [
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // '\'
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ]
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // ^
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // _
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // `
> +    AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL, // a-z
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // {
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // |
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // }
> +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ~
> +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // DEL
> +};

Please #undef the macros (B, AL, DI, F) here.

> +    default:
> +        // If both ch and nextCh are ASCII characters, use a lookup table for enhanced speed and for compatibility
> +        // with Internet Explorer 8 compatible mode.

Is there such a thing as “Internet Explorer 8 compatible mode”? What is it compatible with? Can we just say that this is done for compatibility with that thing? It’s slightly confusing to read about IE8 here and about Firefox above. I suggest putting all the specifics of the table in one place. Here it’s enough to say that we use the lookup table because it’s faster and it has different rules from the Unicode specification.

> +        if (ch >= asciiLineBreakTableFirstChar && ch <= asciiLineBreakTableLastChar
> +                && nextCh >= asciiLineBreakTableFirstChar && nextCh <= asciiLineBreakTableLastChar) {
> +            const unsigned char* tableRow = asciiLineBreakTable[ch - asciiLineBreakTableFirstChar];
> +            int nextChIndex = nextCh - asciiLineBreakTableFirstChar;
> +            unsigned char b = tableRow[nextChIndex / 8];
> +            return (b >> (nextChIndex % 8)) & 1;

To me, the expression
               b & (1 << (nextChIndex % 8))
(with a corresponding change to the B macro) is clearer than the one you wrote, but I guess that’s just a matter of personal preference.

Have you tested this patch for performance impact?
Comment 13 Xianzhu Wang 2010-07-26 02:23:34 PDT
Created attachment 62550 [details]
new patch 7/26
Comment 14 Xianzhu Wang 2010-07-26 02:31:36 PDT
(In reply to comment #12)
Thanks very much for review!

> (From update of attachment 62400 [details])
> > +static const int asciiLineBreakTableFirstChar = '!';
> > +static const int asciiLineBreakTableLastChar = 127;
> 
> Perhaps UChar is a more appropriate type for these. Can we define the latter in terms of the size of asciiLineBreakTable, or compile-time assert that the size matches the constants?

Done.

> 
> > +// Pack 8 bits into one byte
> > +#define B(a, b, c, d, e, f, g, h) \
> > +    ((a) | ((b) << 1) | ((c) << 2) | ((d) << 3) | ((e) << 4) | ((f) << 5) | ((g) << 6) | ((h) << 7))
> 
> I wonder if the memory savings are worth the extra work at runtime for shifting bits. I guess if there’s no measurable performance regression then it doesn’t matter.

I created a test to compare the performance of the following configurations:
1) original code
2) new code with packing
3) new code without packing

All configurations are run 1000 times to calculate line breaking opportunities between all adjacent chars in break_lines.cpp itself. The differences among the timings are very small (<5%).

> 
> > +// Line breaking table row for each digit (0-9)
> > +#define DI { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
> > +
> > +// Line breaking table row for ascii letters (a-z A-Z)
> > +#define AL { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
> > +
> > +#define F 0xFF
> > +
> > +// Line breaking table for printable ASCII characters. Most part of the table is compatibile with Firefox 3.6,
> > +// except that the rows '-' and '?' still remain the behaviors before this change.
> 
> People reading this comment in the future will not know what “this change” is. I think it’s useful, when describing the behavior, to refer to the Unicode standard and to other browser engines, and call out the differences.
>
> 

Comments re-written.

>                                                                                   This is a trade-off between
> > +// compatibility with other browsers and backward-compatibility. Additional line break opportunities are added
> > +// at least as possible.
> 
> I am not sure what the last sentence means. The least possible additional line breaking opportunities would be zero.

Comments re-written.

> 
> > +static const unsigned char asciiLineBreakTable[][12] = {
> > +    //  !  "  #  $  %  &  '  (     )  *  +  ,  -  .  /  0  1-8   9  :  ;  <  =  >  ?  @     A-X      Y  Z  [  \  ]  ^  _  `     a-x      y  z  {  |  }  ~  DEL
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // !
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // "
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // #
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // $
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // %
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // &
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // '
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // (
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // )
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // *
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // +
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ,
> > +    { B(1, 1, 1, 1, 1, 1, 1, 1), B(1, 1, 1, 1, 1, 1, 1, 1), F, B(1, 1, 1, 1, 1, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 1, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 1, 1, 1, 1) }, // -
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // .
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // /
> > +    DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI,  DI, // 0-9
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // :
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ;
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // <
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // =
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // >
> > +    { B(0, 0, 1, 1, 1, 1, 0, 1), B(0, 1, 1, 0, 1, 0, 0, 1), F, B(1, 0, 0, 1, 1, 1, 0, 1), F, F, F, B(1, 1, 1, 1, 0, 1, 1, 1), F, F, F, B(1, 1, 1, 1, 0, 1, 1, 0) }, // ?
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // @
> > +    AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL, // A-Z
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // [
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // '\'
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ]
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // ^
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // _
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // `
> > +    AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL,  AL, // a-z
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // {
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // |
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // }
> > +    { B(0, 0, 0, 0, 0, 0, 0, 1), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 1, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 1, 0, 0, 0, 0, 0) }, // ~
> > +    { B(0, 0, 0, 0, 0, 0, 0, 0), B(0, 0, 0, 0, 0, 0, 0, 0), 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0), 0, 0, 0, B(0, 0, 0, 0, 0, 0, 0, 0) }, // DEL
> > +};
> 
> Please #undef the macros (B, AL, DI, F) here.
>
Done.
 
> > +    default:
> > +        // If both ch and nextCh are ASCII characters, use a lookup table for enhanced speed and for compatibility
> > +        // with Internet Explorer 8 compatible mode.
> 
> Is there such a thing as “Internet Explorer 8 compatible mode”? What is it compatible with? Can we just say that this is done for compatibility with that thing? It’s slightly confusing to read about IE8 here and about Firefox above. I suggest putting all the specifics of the table in one place. Here it’s enough to say that we use the lookup table because it’s faster and it has different rules from the Unicode specification.
> 

Sorry, this was a mistake due to my carelessness. Comments re-written.

> > +        if (ch >= asciiLineBreakTableFirstChar && ch <= asciiLineBreakTableLastChar
> > +                && nextCh >= asciiLineBreakTableFirstChar && nextCh <= asciiLineBreakTableLastChar) {
> > +            const unsigned char* tableRow = asciiLineBreakTable[ch - asciiLineBreakTableFirstChar];
> > +            int nextChIndex = nextCh - asciiLineBreakTableFirstChar;
> > +            unsigned char b = tableRow[nextChIndex / 8];
> > +            return (b >> (nextChIndex % 8)) & 1;
> 
> To me, the expression
>                b & (1 << (nextChIndex % 8))
> (with a corresponding change to the B macro) is clearer than the one you wrote, but I guess that’s just a matter of personal preference.
>

Done. B macro doesn't need change.

> Have you tested this patch for performance impact?

Done. See above about byte packing and shifting.
Comment 15 mitz 2010-07-26 11:40:50 PDT
(In reply to comment #14)
> I created a test to compare the performance of the following configurations:
> 1) original code
> 2) new code with packing
> 3) new code without packing
> 
> All configurations are run 1000 times to calculate line breaking opportunities between all adjacent chars in break_lines.cpp itself. The differences among the timings are very small (<5%).

Thank you for testing! The results are a little vague. Did this register as slower than the original code in your test? If so, did you test the impact on any page-loading test?

> > To me, the expression
> >                b & (1 << (nextChIndex % 8))
> > (with a corresponding change to the B macro) is clearer than the one you wrote, but I guess that’s just a matter of personal preference.
> >
> 
> Done. B macro doesn't need change.

Thanks. I confused myself into believing it needed to change.

> > Have you tested this patch for performance impact?
> 
> Done. See above about byte packing and shifting.

I want to r+ the patch but I still have some concerns about performance. Please answer my question above. Thanks again!
Comment 16 Xianzhu Wang 2010-07-26 21:32:03 PDT
Created attachment 62641 [details]
performance test

(In reply to comment #15)
> Thank you for testing! The results are a little vague. Did this register as slower than the original code in your test? If so, did you test the impact on any page-loading test?
> I want to r+ the patch but I still have some concerns about performance. Please answer my question above. Thanks again!

Sorry, after investigation I found that yesterday's test results were incorrect because I just called shouldBreakAfter() but didn't let the return values depend on some long-lived variable, so the compiler optimized the whole function out.

I modified the test code and run again (CPU 2.4G i5). The test runs 15x5000 loops of each shouldBreakAfter() implementation. break_lines.cpp is used as the input to test line breaking opportunity between each adjacent characters.

The results are (first column: calling shouldBreakAfter() only; second column: calling 'isBreakableSpace() || shouldBreakAfter()'):

original:     1.972549s 4.934085s
new unpacked: 4.302849s 4.680052s
new packed:   4.751770s 4.520014s

I'm curious about that the second column shows the new implementation is even faster than the original. I can't explain this from the source code. I also briefly read the optimized assembly code, and didn't find the reason.

I'm also trying to run page load tests. Hopefully I'll give results tomorrow.
Comment 17 Xianzhu Wang 2010-07-27 19:35:45 PDT
I just ran page load test to compare the performance before and after the patch. To compare the performance of line breaking, I selected some URLs from w3.org that contain very long text. The URLs are as below:

http://www.w3.org/TR/1999/REC-html401-19991224/struct/tables.html
http://www.w3.org/TR/1999/REC-html401-19991224/struct/objects.html
http://www.w3.org/TR/1999/REC-html401-19991224/interact/forms.html
http://www.w3.org/TR/1999/REC-html401-19991224/sgml/loosedtd.html
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/changes.html
http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/syndata.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/visuren.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/generate.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/fonts.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/tables.html
http://www.w3.org/TR/2008/REC-CSS2-20080411/propidx.html

I put a file named w3.pltsuite under directory /Applications/Safari.app/Contents/Resources. The configurations of Page Load Test are:
Suite: w3
Web-Kit-only window
Don't clear cache
Run test 20 times
Hide this window during test

I repeated 10 times (each repeated 20 times by Page Load Test) of Page Load Test before and after applying the patch. The results are:
Before patch   	After patch
Average	RMS  	Average	RMS
40.1	38.6	37.6	36.1
37.7	36.3	37.2	35.7
36.4	34.9	36.4	35
39.7	38.1	38.9	37.5
37.4	35.9	36.6	35.1
36.3	34.9	36.4	34.9
36.5	35.0	37.3	35.9
40.5	38.9	36.4	34.9
37.1	35.6	36.4	34.9
36.3	34.8	38.9	37.4
Average:
37.8	36.3	37.21	35.74

I think I can draw the conclusion that the patch has no visible impact on page load time of text-intensive web pages.
Comment 18 Xianzhu Wang 2010-07-27 19:54:05 PDT
Comment on attachment 62550 [details]
new patch 7/26

Thanks mitz for review! Could you please help commit the patch?
Comment 19 WebKit Commit Bot 2010-07-28 09:01:06 PDT
Comment on attachment 62550 [details]
new patch 7/26

Clearing flags on attachment: 62550

Committed r64207: <http://trac.webkit.org/changeset/64207>
Comment 20 WebKit Commit Bot 2010-07-28 09:01:12 PDT
All reviewed patches have been landed.  Closing bug.