Bug 11160 - Double Click to Select Words does not work on Windows
Summary: Double Click to Select Words does not work on Windows
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: New Bugs (show other bugs)
Version: 420+
Hardware: PC Windows XP
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-10-04 17:17 PDT by Hunter L. Williams
Modified: 2007-07-04 22:52 PDT (History)
3 users (show)

See Also:


Attachments
straw man patch (8.37 KB, patch)
2006-10-04 17:20 PDT, Hunter L. Williams
darin: review-
Details | Formatted Diff | Diff
more complete but still not done patch (18.15 KB, patch)
2006-10-05 11:34 PDT, Hunter L. Williams
no flags Details | Formatted Diff | Diff
patch for review (26.35 KB, patch)
2006-10-05 18:34 PDT, Hunter L. Williams
darin: review-
Details | Formatted Diff | Diff
patch, comments addressed (34.81 KB, patch)
2006-10-10 11:36 PDT, Hunter L. Williams
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Hunter L. Williams 2006-10-04 17:17:51 PDT
On Windows, if you double click to select a word, it does not work. This appears to be because finding word boundaries is implemented in a Mac specific way and there is no multiplatform version. 

A solution might be to use ICU on all platforms to do word breaking, rather than using NSString.
Comment 1 Hunter L. Williams 2006-10-04 17:20:05 PDT
Created attachment 10919 [details]
straw man patch

I am interested in WebKit developers' thoughts on this approach. This is not a complete patch (no Changelog or equivalent Mac project file work) but could be completed if desired.
Comment 2 Darin Adler 2006-10-05 07:48:31 PDT
Comment on attachment 10919 [details]
straw man patch

Looks fine.

But we don't want to add a new copy of the code from TextUtilities.mm. Instead, we need to delete currentTextBreakLocaleID, and the two sentence functions from TextUtilities.mm, and we need to update the project file on Macintosh as well as on Windows to include TextUtilities.cpp, with proper ifdefs. At some point we might even share the word boundary code, but for now we should probbably leave the Objective-C version on Macintosh in case it does something better than the ICU version.

review- because we need a patch that doesn't create two copies of the code.
Comment 3 Darin Adler 2006-10-05 08:30:56 PDT
Another thing I'm thinking is that the locale-determining code is probably in the wrong place. I actually don't know where the Locale().getLanguage() code is coming from, but wherever that is, that's where we should put currentTextBreakLocaleID or the equivalent.

Anyway, looks good.
Comment 4 Darin Adler 2006-10-05 10:55:41 PDT
Another thought:

If this code is at all hot, we don't want to keep creating and destroying UBreakIterator objects.

StringImpl.h has the getWordBreakIterator function, which returns a single global shared word break iterator. I suggest we create a BreakIterator.h/cpp set of files in platform/icu and use that in this code as well as in StringImpl.cpp (and remove getWordBreakIterator from StringImpl.h).
Comment 5 Hunter L. Williams 2006-10-05 11:05:31 PDT
Locale is a class provided by ICU. I used it to get the system language, since it provided it in a format that ubrk_open() could handle (ISO-639)

OK I'll look into that thanks Darin. 
Comment 6 Hunter L. Williams 2006-10-05 11:09:03 PDT
In StringImpl.cpp:

> ubrk_open(UBRK_WORD, "en_us", 0, 0, &status);

I'm guessing this should be ubrk_open(UBRK_WORD, Locale().getLanguage(), 0, 0, &status); ? ;-)

Question though - shouldn't word breaking on a page be defined by the language of the page, rather than the host OS? (Although that would assume the page had provided its language in the response header, and we had access to a response header abstraction at this point in the code). 
Comment 7 Hunter L. Williams 2006-10-05 11:28:53 PDT
Darin - there is no platform/icu directory, do you want me to create one with just BreakIterator.h/cpp in it or put those files in the platform directory?
Comment 8 Hunter L. Williams 2006-10-05 11:34:32 PDT
Created attachment 10933 [details]
more complete but still not done patch

Does the suggested code reshuffling. Not yet built on Mac, going to do that next (attaching so I can transfer it from machine to machine!)
Comment 9 Hunter L. Williams 2006-10-05 18:34:32 PDT
Created attachment 10937 [details]
patch for review

Here's a patch that implements what was suggested. This works on Windows, doesn't seem to regress anything per the regression tests on 10.4.7.
Comment 10 Hunter L. Williams 2006-10-05 18:38:42 PDT
Note: I changed the locale determination for non Mac platforms to use uloc_getDefault() instead of Locale().getLanguage(), since it seems that not all systems have the C++ header locid.h, and if icu ever becomes the canonical location on OS X for this information (instead of the plist it's read out of here) then it would probably be best to get the info from the same function call on all platforms. 
Comment 11 Darin Adler 2006-10-08 07:08:46 PDT
Comment on attachment 10937 [details]
patch for review

Looks great. A few nits:

We don't use relative paths in includes. So this:

+#include "icu/BreakIterator.h"

should just be:

#include "BreakIterator.h"

And we deal with the paths in the project setup.

The copyright in BreakIterator.h/cpp only needs to mention Apple and 2006. Use of ICU for word breaking came in 2006, so there's no need to copy the other copyrights from StringImpl.h since they didn't cover the break iterator. No way you could have known that, but should fix it.

+namespace WebCore {
+
+typedef void UBreakIterator;

UBreakIterator should be declared outside the WebCore namespace to match how it's defined in ICU. This declaration instead adds a new UBreakIterator inside the WebCore namespace.
Comment 12 Hunter L. Williams 2006-10-10 11:36:34 PDT
Created attachment 11020 [details]
patch, comments addressed
Comment 13 Darin Adler 2006-10-11 11:47:38 PDT
Comment on attachment 11020 [details]
patch, comments addressed

r=me
Comment 14 Maciej Stachowiak 2006-10-12 19:02:03 PDT
First off, this patch is a great change!

But I'm a little confused about what code goes where with this patch. It puts ICU specific code in the platform/ directory directly, while some other ICU-specific code is in platform/icu. Meanwhile there's also mac-specific code in platform/icu. Maybe that could go in platform/mac and the non-mac-related bits could be in #if !PLATFORM(MAC), although I guess that's a matter of taste. We do similar things in other cases though.

Probably all of the implementation should go in platform/icu unless there's some truly generic portion.
Comment 15 Hunter L. Williams 2006-10-13 13:37:56 PDT
Maciej, 

I think Darin's suggestion was that the platform/icu part wasn't all code that _used_ icu, but rather ICU utilities for code that does use it - in other words code to create the shared break iterator. 

The Mac specific code that's in there is solely to get the system breaking language. I could move that one function to a new .mm/.h file in platform/mac, but platform/icu/BreakIterator.cpp would still need to #include that new header #ifdef PLATFORM(MAC) ... (other platforms don't need to use that specific function since they just rely on ICU to tell what the breaking language is - if you think Mac should do that too, then this is simplified a lot and the Mac code goes away). 

The Mac specific word boundary code remains unchanged in platform/mac/TextBoundaries.mm

Comment 16 Mark Rowe (bdash) 2006-10-24 04:57:21 PDT
Does this patch need updated to address Maciej's comments, or is it fine to land as-is?
Comment 17 Hunter L. Williams 2006-12-04 18:52:42 PST
Sounds like there's some confusion here. Are there any more changes needed to at least get the Mac/cross platform parts landed? 
Comment 18 David Kilzer (:ddkilzer) 2006-12-05 09:45:53 PST
Darin & Maciej, please comment on Comment #15, Comment #16 and Comment #17.
Comment 19 Maciej Stachowiak 2007-01-26 03:48:12 PST
I thik this patch is fine but I am not sure it still applies.
Comment 20 David Kilzer (:ddkilzer) 2007-01-27 21:24:57 PST
Comment on attachment 11020 [details]
patch, comments addressed

Clearing darin's r+.

Sorry, this patch has bit-rotten.  Please resubmit it against the current tree.