<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>3963</bug_id>
          
          <creation_ts>2005-07-12 07:39:03 -0700</creation_ts>
          <short_desc>Trailing space included in line if next line begins with non-Latin-1 character</short_desc>
          <delta_ts>2005-08-27 12:48:51 -0700</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Layout and Rendering</component>
          <version>412</version>
          <rep_platform>Mac</rep_platform>
          <op_sys>OS X 10.4</op_sys>
          <bug_status>VERIFIED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>Normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter>mitz</reporter>
          <assigned_to name="Dave Hyatt">hyatt</assigned_to>
          <cc>ap</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>14438</commentid>
    <comment_count>0</comment_count>
    <who name="">mitz</who>
    <bug_when>2005-07-12 07:39:03 -0700</bug_when>
    <thetext>Summary: when a line break occurs at a space, that space should not be rendered on either line. 
However, when the character following the space is not in Latin-1, WebKit renders the space at the end 
of the first line.

To reproduce: open the testcase in Safari.

Expected: &quot;Lorem&quot; to be flush against the right edge in each of the three blue boxes, like in the green 
box.

Actual: In the first blue box, there is an underlined space between &quot;Lorem&quot; and the right edge of the 
box, like in the red box. The other two blue boxes render as expected.

Analysis: The problem is in khtml::isBreakable(). &quot;A B&quot; can be broken before the space as well as before 
the B. Contrary to what the comment in the code says, isBreakable considers it breakable only before 
the space. Which is OK. The problem is that with characters outside Latin-1, isBreakable invokes 
UCFindTextBreak, which actually says that the break can happen only before the B.

I&apos;m not sure how to fix it. Changing break_lines.cpp:108 to return true if end == pos + 1 actually fixes 
the testcase, but I&apos;m not sure it&apos;s a complete solution.

Anyway, the comment should be changed to reflect exactly what isBreakable() does.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>14439</commentid>
    <comment_count>1</comment_count>
      <attachid>2925</attachid>
    <who name="">mitz</who>
    <bug_when>2005-07-12 07:39:41 -0700</bug_when>
    <thetext>Created attachment 2925
Testcase</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>14632</commentid>
    <comment_count>2</comment_count>
    <who name="Oliver Hunt">oliver</who>
    <bug_when>2005-07-13 23:16:00 -0700</bug_when>
    <thetext>Reproducable on ToT -- good testcase</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>15992</commentid>
    <comment_count>3</comment_count>
    <who name="Justin Garcia">justin.garcia</who>
    <bug_when>2005-08-05 13:58:21 -0700</bug_when>
    <thetext>Looking into this.  Assigning this to myself.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16728</commentid>
    <comment_count>4</comment_count>
    <who name="Justin Garcia">justin.garcia</who>
    <bug_when>2005-08-15 11:32:26 -0700</bug_when>
    <thetext>Haven&apos;t had a chance to look at this yet.  Unassigning.  Mitz?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>16730</commentid>
    <comment_count>5</comment_count>
    <who name="">mitz</who>
    <bug_when>2005-08-15 12:01:49 -0700</bug_when>
    <thetext>(In reply to comment #4)
&gt; Haven&apos;t had a chance to look at this yet.  Unassigning.  Mitz?

The first thing I want to do is to not call isBreakable() from findNextLineBreak(), and instead call 
UCFindTextBreak() only as necessary (i.e. call it once and not call it again before we reach the position it 
gave us). This will eliminate the O(n^2) bit, and by moving all the relevant logic into findNextLineBreak() 
will hopefully make it easier to move on with fixing the actual bug.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17041</commentid>
    <comment_count>6</comment_count>
    <who name="Dave Hyatt">hyatt</who>
    <bug_when>2005-08-21 01:37:12 -0700</bug_when>
    <thetext>I think a simpler option would be to just fix isBreakable.  I think there are ways you can query for 
breakability without having to &quot;search&quot; for the position of the next text break.
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17045</commentid>
    <comment_count>7</comment_count>
    <who name="">mitz</who>
    <bug_when>2005-08-21 04:02:14 -0700</bug_when>
    <thetext>(In reply to comment #6)
&gt; I think a simpler option would be to just fix isBreakable.  I think there are ways you can query for 
&gt; breakability without having to &quot;search&quot; for the position of the next text break.

Sure. The only way I know of is calling UCFindTextBreak with textLength equal to startOffset+1. I don&apos;t 
think ICU has anything better to offer.

Anyway, I&apos;m not sure what the expected behavior of isBreakable is. What it currently does in the Latin-1 
case works, but doesn&apos;t fit any definition of line-breakability -- in particular not UCFindTextBreak&apos;s -- 
other than &quot;what findNextLineBreak expects&quot;. Can that definition be extended to cover the general case?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17046</commentid>
    <comment_count>8</comment_count>
      <attachid>3491</attachid>
    <who name="">mitz</who>
    <bug_when>2005-08-21 07:59:21 -0700</bug_when>
    <thetext>Created attachment 3491
Extend Latin-1 behavior to the general case</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17047</commentid>
    <comment_count>9</comment_count>
      <attachid>3491</attachid>
    <who name="">mitz</who>
    <bug_when>2005-08-21 08:06:07 -0700</bug_when>
    <thetext>Comment on attachment 3491
Extend Latin-1 behavior to the general case

This patch fixes this bug and doesn&apos;t affect any of the layout tests. If it is
landed, then I&apos;ll open a separate bug about the O(n^2) complexity, and yet
another bug about many breaking opportunities we miss in Latin-1, e.g. breaking
before a left parenthesis.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17048</commentid>
    <comment_count>10</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-08-21 09:09:49 -0700</bug_when>
    <thetext>Side comment (not about the patch). I think we&apos;d like to switch to ICU for finding the breaks rather than 
using Carbon, unless there&apos;s a reason not to. Currently the only reason not to that I know of is a concern 
that Thai line breaking is not implemented yet in ICU.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17342</commentid>
    <comment_count>11</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2005-08-24 11:21:59 -0700</bug_when>
    <thetext>(In reply to comment #10)
&gt; Side comment (not about the patch). I think we&apos;d like to switch to ICU for finding the breaks rather 
than using Carbon, unless there&apos;s a reason not to.

  A reason that you may or may not find important... We (Mac developers outside Apple) are strongly 
advised not to use ICU in our projects for various reasons (which probably don&apos;t apply to WebKit). Still, I 
find &quot;eating your own dog food&quot; a good thing to do in this case - after all, WebKit has no special text 
processing requirements, and should be served by CFString well. If something is missing, chances are 
that we all would benefit from it being added to CFString.

  Moving from Unicode Utilities to an open source alternative, be it ICU or CFString, sounds great to me.
</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17349</commentid>
    <comment_count>12</comment_count>
    <who name="Darin Adler">darin</who>
    <bug_when>2005-08-24 13:27:30 -0700</bug_when>
    <thetext>The CFString APIs require that your text be in a CFString. That&apos;s not the case in WebKit, so I don&apos;t think 
those APIs are going to be suitable for this work.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17355</commentid>
    <comment_count>13</comment_count>
    <who name="Alexey Proskuryakov">ap</who>
    <bug_when>2005-08-24 21:52:35 -0700</bug_when>
    <thetext>(In reply to comment #12)

Yes, ICU works on buffers of UChars - but that&apos;s pretty easy to emulate with CFString:
CFStringRef str = CFStringCreateWithCharactersNoCopy(0, unichars, numChars, kCFAllocatorNull);
// do whatever you need; there&apos;s also CFStringCreateMutableWithExternalCharactersNoCopy()
CFRelease(str);

Given the complexity of algorithms involved, these additional function calls shouldn&apos;t be much of an 
overhead (especially since WebKit&apos;s native format isn&apos;t always buffers of UChars).

A nice thing about CFString is that it works with multiple encodings natively, without necessarily 
converting them into Unicode:
CFStringRef str = CFStringCreateWithCStringNoCopy(..., kSomeEncoding).
This transparently offers performance benefits in cases when the encoding is a simple one - try to 
match that with ICU! ;)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17361</commentid>
    <comment_count>14</comment_count>
    <who name="">mitz</who>
    <bug_when>2005-08-24 22:50:25 -0700</bug_when>
    <thetext>Created bug 4628 - Use ICU instead of Carbon in khtml::isBreakable() </thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>17715</commentid>
    <comment_count>15</comment_count>
      <attachid>3491</attachid>
    <who name="Dave Hyatt">hyatt</who>
    <bug_when>2005-08-26 17:04:45 -0700</bug_when>
    <thetext>Comment on attachment 3491
Extend Latin-1 behavior to the general case

r=me</thetext>
  </long_desc>
      
          <attachment
              isobsolete="0"
              ispatch="0"
              isprivate="0"
          >
            <attachid>2925</attachid>
            <date>2005-07-12 07:39:41 -0700</date>
            <delta_ts>2005-07-12 07:39:41 -0700</delta_ts>
            <desc>Testcase</desc>
            <filename>lineBreaks.html</filename>
            <type>text/html</type>
            <size>2394</size>
            <attacher>mitz</attacher>
            
              <data encoding="base64">/v8APAAhAEQATwBDAFQAWQBQAEUAIABIAFQATQBMACAAUABVAEIATABJAEMAIAAiAC0ALwAvAFcA
MwBDAC8ALwBEAFQARAAgAEgAVABNAEwAIAA0AC4AMAAxACAAVAByAGEAbgBzAGkAdABpAG8AbgBh
AGwALwAvAEUATgAiACAACgAiAGgAdAB0AHAAOgAvAC8AdwB3AHcALgB3ADMALgBvAHIAZwAvAFQA
UgAvAGgAdABtAGwANAAvAGwAbwBvAHMAZQAuAGQAdABkACIAPgAKADwAaAB0AG0AbAA+AA0ACgAg
ACAAIAAgADwAaABlAGEAZAA+AAoAIAAgACAAIAAgACAAIAAgADwAdABpAHQAbABlAD4ATABpAG4A
ZQAgAGIAcgBlAGEAawBzADwALwB0AGkAdABsAGUAPgAKACAAIAAgACAAPAAvAGgAZQBhAGQAPgAK
ACAAIAAgACAAPABiAG8AZAB5AD4ACgAgACAAIAAgACAAIAAgACAAVABoAGkAcwAgAGkAcwAgAGcA
bwBvAGQAOgAKACAAIAAgACAAIAAgACAAIAA8AGQAaQB2ACAAcwB0AHkAbABlAD0AIgBmAG8AbgB0
AC0AZgBhAG0AaQBsAHkAOgAnAEwAdQBjAGkAZABhACAARwByAGEAbgBkAGUAJwA7ACAAZgBvAG4A
dAAtAHMAaQB6AGUAOgAxADYAcAB0ADsAIAB0AGUAeAB0AC0AZABlAGMAbwByAGEAdABpAG8AbgA6
AHUAbgBkAGUAcgBsAGkAbgBlADsAIAB3AGkAZAB0AGgAOgAxADAAMABwAHgAOwAgAHQAZQB4AHQA
LQBhAGwAaQBnAG4AOgByAGkAZwBoAHQAOwAiAD4ACgAgACAAIAAgACAAIAAgACAAIAAgACAAIAA8
AHAAIABzAHQAeQBsAGUAPQAiAGIAbwByAGQAZQByADoAcwBvAGwAaQBkACAAZwByAGUAZQBuACAA
MQBwAHgAOwAiAD4ACgAgACAAIAAgACAAIAAgACAAIAAgACAAIABMAG8AcgBlAG0AIABpAHAAcwB1
AG0ACgAgACAAIAAgACAAIAAgACAAIAAgACAAIAA8AC8AcAA+AAoAIAAgACAAIAAgACAAIAAgADwA
LwBkAGkAdgA+AAoAIAAgACAAIAAgACAAIAAgAFQAaABlACAAZgBvAGwAbABvAHcAaQBuAGcAIAB0
AGgAcgBlAGUAIABzAGgAbwB1AGwAZAAgAGwAbwBvAGsAIABsAGkAawBlACAAJgBsAGQAcQB1AG8A
OwBnAG8AbwBkACYAcgBkAHEAdQBvADsAOgAKACAAIAAgACAAIAAgACAAIAA8AGQAaQB2ACAAcwB0
AHkAbABlAD0AIgBmAG8AbgB0AC0AZgBhAG0AaQBsAHkAOgAnAEwAdQBjAGkAZABhACAARwByAGEA
bgBkAGUAJwA7ACAAZgBvAG4AdAAtAHMAaQB6AGUAOgAxADYAcAB0ADsAIAB0AGUAeAB0AC0AZABl
AGMAbwByAGEAdABpAG8AbgA6AHUAbgBkAGUAcgBsAGkAbgBlADsAIAB3AGkAZAB0AGgAOgAxADAA
MABwAHgAOwAgAHQAZQB4AHQALQBhAGwAaQBnAG4AOgByAGkAZwBoAHQAOwAiAD4ACgAgACAAIAAg
ACAAIAAgACAAIAAgACAAIAA8AHAAIABzAHQAeQBsAGUAPQAiAGIAbwByAGQAZQByADoAcwBvAGwA
aQBkACAAYgBsAHUAZQAgADEAcAB4ADsAIgA+AAoAIAAgACAAIAAgACAAIAAgACAAIAAgACAATABv
AHIAZQBtACAAJgAjAHgAMAAxADMAMQA7AHAAcwB1AG0ACgAgACAAIAAgACAAIAAgACAAIAAgACAA
IAA8AC8AcAA+AAoAIAAgACAAIAAgACAAIAAgACAAIAAgACAAPABwACAAcwB0AHkAbABlAD0AIgBi
AG8AcgBkAGUAcgA6AHMAbwBsAGkAZAAgAGIAbAB1AGUAIAAxAHAAeAA7ACIAPgAKACAAIAAgACAA
IAAgACAAIAAgACAAIAAgAEwAbwByAGUAbQAgACAAJgAjAHgAMAAxADMAMQA7AHAAcwB1AG0ACgAg
ACAAIAAgACAAIAAgACAAIAAgACAAIAA8AC8AcAA+AAoAIAAgACAAIAAgACAAIAAgACAAIAAgACAA
PABwACAAcwB0AHkAbABlAD0AIgBiAG8AcgBkAGUAcgA6AHMAbwBsAGkAZAAgAGIAbAB1AGUAIAAx
AHAAeAA7ACIAPgAKACAAIAAgACAAIAAgACAAIAAgACAAIAAgAEwAbwByAGUAJgAjAHgAMQBlADMA
ZgA7ACAAaQBwAHMAdQBtAAoAIAAgACAAIAAgACAAIAAgACAAIAAgACAAPAAvAHAAPgAKACAAIAAg
ACAAIAAgACAAIAA8AC8AZABpAHYAPgAKACAAIAAgACAAIAAgACAAIABUAGgAaQBzACAAaQBzACAA
YgBhAGQAOgAKACAAIAAgACAAIAAgACAAIAA8AGQAaQB2ACAAcwB0AHkAbABlAD0AIgBmAG8AbgB0
AC0AZgBhAG0AaQBsAHkAOgAnAEwAdQBjAGkAZABhACAARwByAGEAbgBkAGUAJwA7ACAAZgBvAG4A
dAAtAHMAaQB6AGUAOgAxADYAcAB0ADsAIAB0AGUAeAB0AC0AZABlAGMAbwByAGEAdABpAG8AbgA6
AHUAbgBkAGUAcgBsAGkAbgBlADsAIAB3AGkAZAB0AGgAOgAxADAAMABwAHgAOwAgAHQAZQB4AHQA
LQBhAGwAaQBnAG4AOgByAGkAZwBoAHQAOwAiAD4ACgAgACAAIAAgACAAIAAgACAAIAAgACAAIAA8
AHAAIABzAHQAeQBsAGUAPQAiAGIAbwByAGQAZQByADoAcwBvAGwAaQBkACAAcgBlAGQAIAAxAHAA
eAA7ACIAPgAKACAAIAAgACAAIAAgACAAIAAgACAAIAAgAEwAbwByAGUAbQAmAG4AYgBzAHAAOwAg
AGkAcABzAHUAbQAKACAAIAAgACAAIAAgACAAIAAgACAAIAAgADwALwBwAD4ACgAgACAAIAAgACAA
IAAgACAAPAAvAGQAaQB2AD4ACgAgACAAIAAgADwALwBiAG8AZAB5AD4ACgA8AC8AaAB0AG0AbAA+
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>3491</attachid>
            <date>2005-08-21 07:59:21 -0700</date>
            <delta_ts>2005-08-26 17:04:45 -0700</delta_ts>
            <desc>Extend Latin-1 behavior to the general case</desc>
            <filename>3963_patch_r1.txt</filename>
            <type>text/plain</type>
            <size>778</size>
            <attacher>mitz</attacher>
            
              <data encoding="base64">SW5kZXg6IGJyZWFrX2xpbmVzLmNwcAo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09ClJDUyBmaWxlOiAvY3ZzL3Jvb3QvV2Vi
Q29yZS9raHRtbC9yZW5kZXJpbmcvYnJlYWtfbGluZXMuY3BwLHYKcmV0cmlldmluZyByZXZpc2lv
biAxLjIxCmRpZmYgLXAgLXUgLXIxLjIxIGJyZWFrX2xpbmVzLmNwcAotLS0gYnJlYWtfbGluZXMu
Y3BwCTI5IEp1bCAyMDA1IDIzOjQyOjQ4IC0wMDAwCTEuMjEKKysrIGJyZWFrX2xpbmVzLmNwcAky
MSBBdWcgMjAwNSAxNDo1MjoxNCAtMDAwMApAQCAtMTA1LDcgKzEwNSw3IEBAIGJvb2wgaXNCcmVh
a2FibGUoY29uc3QgUUNoYXIgKnMsIGludCBwb3MKIAogICAgICAgICAvLyBJZiBjYXJib24gZmFp
bHMsIGZhaWwgYmFjayBvbiBzaW1wbGUgd2hpdGUgc3BhY2UgZGV0ZWN0aW9uLgogICAgICAgICBp
ZiAoZmluZFN0YXR1cyA9PSAwKQotICAgICAgICAgICAgcmV0dXJuICgoaW50KWVuZCA9PSBwb3Mp
ID8gdHJ1ZSA6IGZhbHNlOworICAgICAgICAgICAgcmV0dXJuICgoaW50KWVuZCA9PSBwb3MgJiYg
IShsYXN0Q2ggPT0gJyAnIHx8IGxhc3RDaCA9PSAnXG4nIHx8IGxhc3RDaCA9PSAnXHQnIHx8IChi
cmVha05CU1AgJiYgbGFzdENoID09IDB4YTApKSk7CiAgICAgfQogICAgIAogICAgIC8vIE1hdGNo
IFdpbklFJ3MgYnJlYWtpbmcgc3RyYXRlZ3ksIHdoaWNoIGlzIHRvIGFsd2F5cyBhbGxvdyBicmVh
a3MgYWZ0ZXIgaHlwaGVucyBhbmQgcXVlc3Rpb24gbWFya3MuCg==
</data>
<flag name="review"
          id="411"
          type_id="1"
          status="+"
          setter="hyatt"
    />
          </attachment>
      

    </bug>

</bugzilla>