WebKit Bugzilla
RESOLVED FIXED
4171
text-transform:capitalize exhibits incorrect behavior in many edge cases
https://bugs.webkit.org/show_bug.cgi?id=4171
Beth Dakin
Reported
2005-07-27 17:20:29 PDT
This is the continuation of http://bugzilla.opendarwin.org/show_bug.cgi?id=3406. Bugzilla 3406 began to address some of the problems with text-transform:capitalize, but to fix all of them (see the attached test case, submitted by Nicholas Shanks for bug 3406), we fundamentally need a better word-break algorithm. Darin suggests we implement the algorithms provided in <unicode/ubrk.h>. This bug is also being tracked with Radar 4195862.
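Why a word-break algorithm matters can be seen from a deliberately naive version. The C++ below is a hypothetical simplification, not WebKit's patch and not ICU's API. It treats any letter that follows a non-letter as the start of a word, so U+00A0 NO-BREAK SPACE, hyphens, and periods all act as separators, and it only uppercases ASCII letters. The names `capitalizeWords`, `isAsciiLetter`, and `toAsciiUpper` are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for a word-break iterator: a position
// starts a word when the character there is a letter and the character
// before it is not. Only ASCII letters are recognized here; a real
// implementation would ask ICU (<unicode/ubrk.h>) for the boundaries.
static bool isAsciiLetter(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

static char32_t toAsciiUpper(char32_t c) {
    return (c >= U'a' && c <= U'z') ? static_cast<char32_t>(c - U'a' + U'A') : c;
}

// Uppercases the first letter of every word in `text`.
// U+00A0 NO-BREAK SPACE is not a letter, so it separates words just like
// an ordinary space does.
std::u32string capitalizeWords(const std::u32string& text) {
    std::u32string result = text;
    char32_t previous = U' ';  // the start of the text counts as a word start
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (isAsciiLetter(text[i]) && !isAsciiLetter(previous))
            result[i] = toAsciiUpper(text[i]);
        previous = text[i];
    }
    return result;
}
```

With this rule, "newcastle-upon-tyne" becomes "Newcastle-Upon-Tyne" and "e.g." becomes "E.G.", but "don't" becomes "Don'T". Unicode word segmentation (UAX #29, on which ICU's word break iterator is based) does not break at an apostrophe between two letters, which is the kind of edge case a naive letter/non-letter rule gets wrong.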
Attachments
Test case created by Nicholas Shanks for 3406 (8.07 KB, text/html), 2005-07-27 17:21 PDT, Beth Dakin, no flags
First attempt at implementing UBreakIterator (4.56 KB, patch), 2006-03-14 20:19 PST, Beth Dakin, mjs: review+
screenshot of observed effect (14.66 KB, image/png), 2006-03-15 10:35 PST, Nicholas Shanks, no flags
Beth Dakin
Comment 1
2005-07-27 17:21:48 PDT
Created attachment 3125: Test case created by Nicholas Shanks for 3406
Alexey Proskuryakov
Comment 2
2005-07-29 01:39:23 PDT
Just wondering, has CFString been considered for WebCore? It's open source, cross-platform, and has a lot of string manipulation routines, including CFStringCapitalize(). Under OS X, that would give results consistent with other applications, and reduce memory footprint. Under other OSes, that would give a better tested CF-Lite, which is a good thing IMO.
Beth Dakin
Comment 3
2006-03-14 20:19:06 PST
Created attachment 7069: First attempt at implementing UBreakIterator
Here is a first pass at implementing text-transform:capitalize with a UBreakIterator. This fixes a TOT bug I noticed where we were no longer capitalizing after non-breaking spaces. It also changes some of our current behavior in ways that I think are good. For example, instead of "Newcastle-upon-tyne," this patch produces "Newcastle-Upon-Tyne." Also, instead of "E.g.," this new patch writes out "E.G." These new behaviors are not necessarily correct, but they at least seem consistent and more correct than the old behavior for these edge cases. The patch came out very interleaved, so it is kind of hard to read.
Maciej Stachowiak
Comment 4
2006-03-14 21:44:56 PST
Comment on attachment 7069: First attempt at implementing UBreakIterator
Looks great! Reusing the word break iterator like this is good. One thing I'm not 100% sure about:
+ QChar previous = 0;
Does the ICU word break iterator handle the case of an initial null character properly? For example, if you load a document that contains only <span style="text-transform: capitalize">word</span>, does it properly capitalize the word? If so, it is fine to land as is; otherwise I think a space might be a better default choice. r=me assuming this case works.
Beth Dakin
Comment 5
2006-03-14 22:40:42 PST
Setting previous to 0 does work, but I think that a space makes more sense, so I changed it and committed the patch. Our behavior still isn't perfect, so we may want to create another bug for edge cases, but I am going to consider this the bug representing our switch to the UBreakIterator and mark it fixed.
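The question about the initial value of `previous` is about context. A hedged sketch of why it matters: when styled text is split into runs (for example `earth<span>worm</span>`), each run has to be capitalized using the last character of the run before it, and a space is a reasonable default when nothing precedes the first run. The functions `capitalizeRun` and `capitalizeRuns` below are hypothetical, handle only ASCII letters, and are not WebKit's code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

static bool isAsciiLetter(char32_t c) {
    return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z');
}

static char32_t toAsciiUpper(char32_t c) {
    return (c >= U'a' && c <= U'z') ? static_cast<char32_t>(c - U'a' + U'A') : c;
}

// Hypothetical helper: capitalizes one run of text. `previous` is the last
// character of whatever text precedes the run; a space (the default) means
// the run begins a new word.
std::u32string capitalizeRun(const std::u32string& run, char32_t previous = U' ') {
    std::u32string result = run;
    for (std::size_t i = 0; i < run.size(); ++i) {
        if (isAsciiLetter(run[i]) && !isAsciiLetter(previous))
            result[i] = toAsciiUpper(run[i]);
        previous = run[i];
    }
    return result;
}

// Capitalizes adjacent runs (e.g. the text of neighboring inline elements),
// carrying the last character of each non-empty run into the next one.
std::u32string capitalizeRuns(const std::vector<std::u32string>& runs) {
    std::u32string out;
    char32_t previous = U' ';
    for (const std::u32string& run : runs) {
        out += capitalizeRun(run, previous);
        if (!run.empty())
            previous = run.back();
    }
    return out;
}
```

Carrying the context across runs yields "Earthworm". Capitalizing each run on its own with the default space yields "EarthWorm", the same string that Comments 10 and 15 report seeing on the pasteboard.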
Nicholas Shanks
Comment 6
2006-03-15 07:40:24 PST
In the test case, this turns "earth<span>worm</span>" into "EarthOrm" ! What is happening to the 'w' ?
Beth Dakin
Comment 7
2006-03-15 09:33:12 PST
I am not seeing that behavior. I just see "Earthworm." Are you sure you have updated your tree? That is very strange.
Nicholas Shanks
Comment 8
2006-03-15 10:18:47 PST
I updated to ToT (r. 13304) a few hours ago, compiled and ran it. Incredibly, turning on "Use ATSU for text" causes the rendering to appear as "Earth?Orm", despite no hyphen occurring in the HTML, and a copy/paste causes "EarthCorm" to be put on the clipboard under 'TEXT' and "Earth-Orm" as NSStringPboardType!
Nicholas Shanks
Comment 9
2006-03-15 10:25:05 PST
Oops. That question mark was supposed to be a HYPHEN MINUS.
Alexey Proskuryakov
Comment 10
2006-03-15 10:32:07 PST
I can confirm that different text goes into different clipboard flavors (like Earthworm vs. EarthWorm), but I don't see any dependence on the "Use ATSU for text" setting, nor any seriously broken text. FWIW.
Nicholas Shanks
Comment 11
2006-03-15 10:35:31 PST
Created attachment 7090: screenshot of observed effect
Rendering of the test case with ToT (r. 13304) and "Use ATSU For All Text" turned on. Includes the result of a copy/paste into TextEdit's plain text (right) and rich text (left) windows. The hyphen on the clipboard is U+002D HYPHEN-MINUS. Turning the ATSU option off causes a SegFault on page reload.
Alexey Proskuryakov
Comment 12
2006-03-15 10:44:01 PST
Oh, I see something that can kind of explain the problem (which would likely be separate from different pasteboard flavors getting different results):
Safari(2737,0xa000ed68) malloc: *** error for object 0x12fe29e0: incorrect checksum for freed object - object was probably modified after being freed, break at szone_error to debug
Safari(2737,0xa000ed68) malloc: *** set a breakpoint in szone_error to debug
Beth Dakin
Comment 13
2006-03-15 10:52:46 PST
I still need to investigate the ATSU problem, but I just checked in a patch for a leak and some occasional crashes that were happening in DumpRenderTree. I don't expect that to fix whatever this ATSU problem is, but I figure it is worth noting.
Nicholas Shanks
Comment 14
2006-03-15 11:02:36 PST
Revision 13306 now renders as Earth"Orm.
Clipboard 'TEXT' = Earth"Orm
NSStringPboardType = EarthGorm
(ATSU off)
I suspect this is a memory clobbering issue, but that the recent check-in is not the one causing it.
Beth Dakin
Comment 15
2006-03-15 11:17:38 PST
Okay, so I also see "EarthWorm" in the pasteboard, but I haven't been able to reproduce any of the other strange behaviors you have described, with or without ATSU enabled. Is there anything else different about your settings that might make a difference? I am going to add Darin to the cc list because he has a lot of expertise in this area. (Hope you don't mind, Darin!)
Nicholas Shanks
Comment 16
2006-03-15 13:12:29 PST
I originally thought it might have been caused by some of the stuff I have installed, like Inquisitor and SafariStand, but I ruled that out after disabling them and still seeing the problems. 417.9.2 does not exhibit the SegFault on switching ATSU on/off, nor the letter-munging behaviour (which also causes the first letter of the line in Klingon to disappear), either with or without the extensions. The difference in clipboard contents is not testable in the released version of Safari.
Beth Dakin
Comment 17
2006-03-15 13:44:22 PST
Could you file a new bug for the problems you are seeing? Still can't repro. Thanks!