WebKit Bugzilla
New
Browse
Log In
×
Sign in with GitHub
or
Remember my login
Create Account
·
Forgot Password
Forgotten password account recovery
RESOLVED FIXED
25321
RLE/LRE characters are removed from any text entered into a text box (either using a keyboard or by pasting)
https://bugs.webkit.org/show_bug.cgi?id=25321
Summary
RLE/LRE characters are removed from any text entered into a text box (either ...
Xiaomei Ji
Reported
2009-04-21 18:22:36 PDT
Steps to reproduce: 1.open the attached test file. 2.In the textarea box: a.Enter U-202B character (e.g., using Unicode Input available at
http://www.fileformat.info/tool/unicodeinput/index.htm
) b.Type some text in RTL language, then type some text in English What is the expected result? Text properly aligned (compare with e.g., Firefox), for example: DAVID שלום What happens instead? RLE not respected (no effect), for example: שלום DAVID ============================================================================= How to use the Unicode input: 1. Download Unicode input by click "here" link on
http://www.fileformat.info/tool/unicodeinput/index.htm
2. Run unicodeinput.exe 3. Focus the cursor in an input box 4. Hit Alt + GreyPlus 5. There should be an "Unicode Input" dialog pop up now 6. Type "202B" 7. Click "Send" button Or you can choose other methods in
http://www.fileformat.info/tip/microsoft/enter_unicode.htm
Attachments
test file
(99 bytes, text/html)
2009-04-21 18:23 PDT
,
Xiaomei Ji
no flags
Details
patch w/ layout test
(27.51 KB, patch)
2010-08-26 14:36 PDT
,
Xiaomei Ji
no flags
Details
Formatted Diff
Diff
patch w/ layout test
(27.46 KB, patch)
2010-08-26 15:09 PDT
,
Xiaomei Ji
darin
: review-
Details
Formatted Diff
Diff
patch w/ layout test
(75.36 KB, patch)
2010-09-09 09:52 PDT
,
Xiaomei Ji
no flags
Details
Formatted Diff
Diff
patch w/ layout test
(38.21 KB, patch)
2010-10-14 17:34 PDT
,
Xiaomei Ji
no flags
Details
Formatted Diff
Diff
patch w/ layout
(38.21 KB, patch)
2010-10-15 09:12 PDT
,
Xiaomei Ji
no flags
Details
Formatted Diff
Diff
patch w/ layout test
(70.65 KB, patch)
2010-10-21 18:21 PDT
,
Xiaomei Ji
no flags
Details
Formatted Diff
Diff
patch w/ layout test
(73.98 KB, patch)
2010-10-26 15:54 PDT
,
Xiaomei Ji
mitz: review+
Details
Formatted Diff
Diff
Show Obsolete
(6)
View All
Add attachment
proposed patch, testcase, etc.
Xiaomei Ji
Comment 1
2009-04-21 18:23:01 PDT
Created
attachment 29673
[details]
test file
Xiaomei Ji
Comment 2
2009-04-21 18:24:00 PDT
related chrome bug:
http://code.google.com/p/chromium/issues/detail?id=5034
mitz
Comment 3
2009-04-21 18:32:01 PDT
I think the bug is incorrectly--and too generally--titled. The issue appears to be only when the RLE is typed as the first character in an empty element. In that case, the next character that is typed replaces the RLE. In other cases (RLE typed in a non-empty element), the RLE persists. At any rate, RLE, LRE and PDF are respected when they are present. The issue here is that the typing operation removes them.
Xiaomei Ji
Comment 4
2009-05-29 14:38:19 PDT
(In reply to
comment #3
)
> I think the bug is incorrectly--and too generally--titled. The issue appears to > be only when the RLE is typed as the first character in an empty element. In > that case, the next character that is typed replaces the RLE. In other cases > (RLE typed in a non-empty element), the RLE persists. At any rate, RLE, LRE and > PDF are respected when they are present. The issue here is that the typing > operation removes them. >
Xiaolu and I both tried again, and looks like the issue happens when typing RLE into element, no matter whether the element is empty or not, no matter whether RLE is the first character typed in the element or not, the typed in RLE will not take affect. Although rendering an element with RLE in it is correct.
Uri Bernstein
Comment 5
2010-07-28 11:37:34 PDT
I can confirm that with
r64156
, RLE characters are simply stripped from any text entered into a text box (either using a keyboard or by pasting).
Xiaomei Ji
Comment 6
2010-07-30 17:21:42 PDT
When typing, looks like the RLE character is treated as insignificant character and removed. Inside InsertTextCommand::input(), there is the following line: deleteInsignificantText(startPosition.upstream(), startPosition.downstream()); When RLE is the previously entered character, startPosition.upstream() will return the position before RLE while downstream() returns the position after RLE, and what is in between (the RLE character) is removed inside the function, no matter whether RLE is the first character in the element or not.
Xiaomei Ji
Comment 7
2010-08-02 16:29:58 PDT
As to pasting lost RLE characters: the RLE character was lost during canonicalPosition() phase. When RLE is the first character on the copied string, canonicalizing on the position of RLE character will return its downstream position (the one after the RLE character). Stack is at the end of the post. I do not have idea on how to fix the problem. Since downstream and upstream are low-level and basic operations, I do not know whether that is where we should fix. Or the fix should be in some other layer, maybe rendering layer? Would appreciate if mitz could shed some lights. When pasting a string containing RLE in the middle, looks like the RLE character was not removed (verified by copying the pasted string to other editor, such as editbox in FireFox and windows notepad), but the string rendered as if the control character was removed, which is not correct either. For example, copy and past string "xU202By!" rendered "xy!" in above test file. chrome.dll!WebCore::VisiblePosition::canonicalPosition(const WebCore::Position & position={...}) Line 461 C++ chrome.dll!WebCore::VisiblePosition::init(const WebCore::Position & position={...}, WebCore::EAffinity affinity=DOWNSTREAM) Line 61 + 0x10 bytes C++ chrome.dll!WebCore::VisiblePosition::VisiblePosition(const WebCore::Position & pos={...}, WebCore::EAffinity affinity=DOWNSTREAM) Line 49 C++ chrome.dll!WebCore::VisibleSelection::setBaseAndExtentToDeepEquivalents() Line 244 + 0x17 bytes C++ chrome.dll!WebCore::VisibleSelection::validate(WebCore::TextGranularity granularity=CharacterGranularity) Line 407 C++ chrome.dll!WebCore::VisibleSelection::VisibleSelection(const WebCore::Position & base={...}, const WebCore::Position & extent={...}, WebCore::EAffinity affinity=DOWNSTREAM) Line 65 C++ chrome.dll!WebCore::VisibleSelection::selectionFromContentsOfNode(WebCore::Node * node=0x082c49b0) Line 93 + 0x2c bytes C++ chrome.dll!WebCore::ReplacementFragment::ReplacementFragment(WebCore::Document * document=0x07bdd000, WebCore::DocumentFragment * fragment=0x07b784c0, bool matchStyle=true, const WebCore::VisibleSelection & selection={...}) Line 155 + 0x1c bytes C++ chrome.dll!WebCore::ReplaceSelectionCommand::doApply() Line 793 C++ chrome.dll!WebCore::EditCommand::apply() Line 91 + 0xf bytes C++ chrome.dll!WebCore::applyCommand(WTF::PassRefPtr<WebCore::EditCommand> command={...}) Line 213 C++ chrome.dll!WebCore::Editor::replaceSelectionWithFragment(WTF::PassRefPtr<WebCore::DocumentFragment> fragment={...}, bool selectReplacement=false, bool smartReplace=false, bool matchStyle=true) Line 323 + 0x45 bytes C++ chrome.dll!WebCore::Editor::replaceSelectionWithText(const WebCore::String & text={...}, bool selectReplacement=false, bool smartReplace=false) Line 329 + 0x38 bytes C++ chrome.dll!WebCore::Editor::pasteAsPlainTextWithPasteboard(WebCore::Pasteboard * pasteboard=0x07c07a98) Line 290 C++ chrome.dll!WebCore::Editor::paste() Line 1082 C++ .................
Xiaomei Ji
Comment 8
2010-08-26 14:36:03 PDT
Created
attachment 65619
[details]
patch w/ layout test I am not sure whether this is the right way to fix it. Use it as a starter.
WebKit Review Bot
Comment 9
2010-08-26 14:38:48 PDT
Attachment 65619
[details]
did not pass style-queue: Failed to run "['WebKitTools/Scripts/check-webkit-style']" exit_code: 1 WebCore/dom/Position.cpp:485: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] WebCore/dom/Position.cpp:578: Boolean expressions that span multiple lines should have their operators on the left side of the line instead of the right side. [whitespace/operators] [4] Total errors found: 2 in 7 files If any of these errors are false positives, please file a bug against check-webkit-style.
Xiaomei Ji
Comment 10
2010-08-26 15:09:37 PDT
Created
attachment 65626
[details]
patch w/ layout test fix style error.
Darin Adler
Comment 11
2010-08-29 12:43:11 PDT
Comment on
attachment 65626
[details]
patch w/ layout test
> +static bool unicodeBiDiControlCharacter(UChar ch)
This function name needs the word "is" in it, because it does not return a "Unicode bi di control character". It also makes no sense to capitalize the "D" in "bidi", which is short for "bidirectional".
> + if (textOffset == box->start() + box->len() + 1 && unicodeBiDiControlCharacter((static_cast<Text*>(currentNode))->data()[textOffset - 1])) > + return currentPos;
This code needs a why comment. Why doe this one particular function need a special case for Unicode bidirectional control characters. Aren't the other characters that need this sort of special handling? What makes these bidirectional control characters so important.
> // Insert the character at the leftmost candidate. > + startPosition = startPosition.downstream(); > startPosition = startPosition.upstream();
This code needs a why comment. Otherwise someone would just remove the call to downstream. These issues are enough for a review-, but I’d also like Dan Bernstein to comment on the appropriate way of fixing this.
Xiaomei Ji
Comment 12
2010-08-30 14:53:25 PDT
Thanks for the review! It makes me realized that this is not the right fix because upstream/downstream are not symmetric when handling unicode bidi control characters. I need to think it more and would appreciate if Dan could comment on the appropriate way of fixing this. (In reply to
comment #11
)
> (From update of
attachment 65626
[details]
) > > +static bool unicodeBiDiControlCharacter(UChar ch) > > This function name needs the word "is" in it, because it does not return a "Unicode bi di control character". It also makes no sense to capitalize the "D" in "bidi", which is short for "bidirectional". > > > + if (textOffset == box->start() + box->len() + 1 && unicodeBiDiControlCharacter((static_cast<Text*>(currentNode))->data()[textOffset - 1])) > > + return currentPos; > > This code needs a why comment. Why doe this one particular function need a special case for Unicode bidirectional control characters. Aren't the other characters that need this sort of special handling? What makes these bidirectional control characters so important. > > > // Insert the character at the leftmost candidate. > > + startPosition = startPosition.downstream(); > > startPosition = startPosition.upstream(); > > This code needs a why comment. Otherwise someone would just remove the call to downstream. > > These issues are enough for a review-, but I’d also like Dan Bernstein to comment on the appropriate way of fixing this.
Xiaomei Ji
Comment 13
2010-09-09 09:52:34 PDT
Created
attachment 67049
[details]
patch w/ layout test 1. I've tried to change Position::upstream(), Position::downstream(), and Position::isCandidate() universally, but it breaks editing/selection/home-end (test case 16) and editing/selection/extend-selection (test case 17) tests, where the Unicode bidi control characters are the first characters in the element. In those situation, we should allow the control characters to be ignored. So, I am adding a flag to differentiate the insert/cut/copy/paste operations from others. 2. Currently, the patch only fixes problem in insertion and paste. Since cut/copy does not work correctly, the result of the paste in the test case is actually wrong (although paste itself works correctly). If the approach is the right one, I will fix the problem in cut/copy as well.
Ryosuke Niwa
Comment 14
2010-10-12 16:18:11 PDT
Comment on
attachment 67049
[details]
patch w/ layout test Thanks for tackling this bug! Here are some preliminary feedbacks.
> WebCore/dom/Position.cpp:556 > + if (renderer->isText() && mode == PreserveUnicodeBidiControlCharacter) {
Can we do: if (!renderer->isText()) continue; instead of checking renderer->isText() here
> WebCore/dom/Position.cpp:557 > + unsigned textOffset = currentPos.offsetInLeafNode();
There's a code below that does the exact same thing. Only difference being checking whether or not there's rendered text. Can we somehow combine these two code?
> WebCore/dom/Position.cpp:567 > if (renderer->isText() && toRenderText(renderer)->firstTextBox()) {
and here? You can file a separate bug to do this cleanup first to make the patch smaller.
> WebCore/dom/Position.cpp:693 > + if (renderer->isText() && mode == PreserveUnicodeBidiControlCharacter) {
Same comment goes here.
> WebCore/dom/Position.cpp:792 > + if (!nodeIsUserSelectNone(node())) { > + bool rendered = inRenderedText(); > + // In the case of PreserveUnicodeBidiControlCharacter, consider a position begins with unicode bidi control > + // character as a candidate as well. > + return mode == PreserveUnicodeBidiControlCharacter ? rendered || startWithUnicodeBidiControlCharacter() : rendered; > + } > + return false;
(I think) you can simplify this as: return !nodeIsUserSelectNone(node()) && (inRenderedText() || (mode == PreserveUnicodeBidiControlCharacter ? startWithUnicodeBidiControlCharacter() : true))
> WebCore/dom/Position.cpp:813 > +bool Position::startWithUnicodeBidiControlCharacter() const
This function name does not seem to match what the function does.
> WebCore/dom/Position.cpp:830 > + for (int i = m_offset; i < static_cast<int>(static_cast<Text*>(node())->length()); ++i) { > + UChar ch = static_cast<Text*>(node())->data()[i]; > + if (!isBidiControlCharacter(ch)) > + return false; > + } > + return true;
e.g. this code seems to return true only if everything in the text box was bidi control character, which doesn't match what the function name implies. Could you clarify what you're trying to do in this function?
> WebCore/dom/Position.cpp:858 > + // Check against unicode bidi control characters after text box end. > + if (m_offset > static_cast<int>(box->start() + box->len())) { > + int i = static_cast<int>(box->start() + box->len()); > + for (; i < m_offset; i++) { > + UChar ch = static_cast<Text*>(node())->data()[i]; > + if (!isBidiControlCharacter(ch)) > + break; > + } > + if (i == m_offset) > + return true; > + }
I don't understand why we need to check the control characters after the text box. Is that because control characters after the text box influence whatever proceeds it? Could you clarify?
> WebCore/dom/Position.h:70 > + enum EditingMode {
I think EditingMode is very general name, and can easily be confused with editing behavior and other things. Please rename it to something more specific like UnicodeBidiControlCharacterPreservationMode or BidiControlCharacterMode if you think the formar is too long.
> WebCore/dom/Position.h:164 > + Position upstream(EditingBoundaryCrossingRule = CannotCrossEditingBoundary, EditingMode = NotPreserveUnicodeBidiControlCharacter) const; > + Position downstream(EditingBoundaryCrossingRule = CannotCrossEditingBoundary, EditingMode = NotPreserveUnicodeBidiControlCharacter) const;
Is there a case where we don't want to preserve the bidi control character? i.e. are there cases where we intentionally remove them?
> WebCore/editing/VisiblePosition.cpp:1 > +
You probably didn't intend to insert a blank line here, did you?
> WebCore/editing/VisiblePosition.cpp:48 > +VisiblePosition::VisiblePosition(const Position &pos, EAffinity affinity, Position::EditingMode editMode) > + : m_editMode(editMode)
Again, editMode sounds too general.
> WebCore/editing/VisibleSelection.cpp:171 > + s = m_start.downstream(Position::CannotCrossEditingBoundary, m_editMode); > + e = m_end.upstream(Position::CannotCrossEditingBoundary, m_editMode);
You may consider changing the order of EditingBoundaryCrossingRule and EditingMode (or whatever you rename to) so that you don't have to keep repeating CannotCrossEditingBoundary here and there. As far as I searched, only RenderObject::createVisiblePosition calls upstream / downstream with CanCrossEditingBoundary.
> LayoutTests/editing/inserting/insert-paste-bidi-control.html:89 > +runEditingTest();
Do we need editing delegates dump & rendering results for this test? If we're only concerned about bidi control characters being preserved, then we can use dump-as-text test. i.e. you call runDumpAsTextEditingTest() here instead of the regular runEditingTest. If that's not sufficient, then you can add a little script above to check the existence of bidi control characters and print PASS/FAIL.
Xiaomei Ji
Comment 15
2010-10-14 17:31:58 PDT
Thanks for the review! Please see my replies inlined. (In reply to
comment #14
)
> (From update of
attachment 67049
[details]
) > Thanks for tackling this bug! Here are some preliminary feedbacks. > > > WebCore/dom/Position.cpp:556 > > + if (renderer->isText() && mode == PreserveUnicodeBidiControlCharacter) { > > Can we do: > if (!renderer->isText()) > continue; > instead of checking renderer->isText() here
Done.
> > > WebCore/dom/Position.cpp:557 > > + unsigned textOffset = currentPos.offsetInLeafNode(); > > There's a code below that does the exact same thing. Only difference being checking whether or not there's rendered text. Can we somehow combine these two code?
Combined.
> > > WebCore/dom/Position.cpp:567 > > if (renderer->isText() && toRenderText(renderer)->firstTextBox()) { > > and here? You can file a separate bug to do this cleanup first to make the patch smaller.
Done.
> > > WebCore/dom/Position.cpp:693 > > + if (renderer->isText() && mode == PreserveUnicodeBidiControlCharacter) { > > Same comment goes here.
Done.
> > > WebCore/dom/Position.cpp:792 > > + if (!nodeIsUserSelectNone(node())) { > > + bool rendered = inRenderedText(); > > + // In the case of PreserveUnicodeBidiControlCharacter, consider a position begins with unicode bidi control > > + // character as a candidate as well. > > + return mode == PreserveUnicodeBidiControlCharacter ? rendered || startWithUnicodeBidiControlCharacter() : rendered; > > + } > > + return false; > > (I think) you can simplify this as: > return !nodeIsUserSelectNone(node()) && (inRenderedText() > || (mode == PreserveUnicodeBidiControlCharacter ? startWithUnicodeBidiControlCharacter() : true))
"true" should be changed to "false". Changed to: return !nodeIsUserSelectNone(node()) && (inRenderedText() || mode == PreserveUnicodeBidiControlCharacter && startWithUnicodeBidiControlCharacter())
> > > WebCore/dom/Position.cpp:813 > > +bool Position::startWithUnicodeBidiControlCharacter() const > > This function name does not seem to match what the function does.
It checks whether the current position starts with continuous unicode bidi control characters. It has 3 scenarios that I will explain in detail below.
> > > WebCore/dom/Position.cpp:830 > > + for (int i = m_offset; i < static_cast<int>(static_cast<Text*>(node())->length()); ++i) { > > + UChar ch = static_cast<Text*>(node())->data()[i]; > > + if (!isBidiControlCharacter(ch)) > > + return false; > > + } > > + return true; > > e.g. this code seems to return true only if everything in the text box was bidi control character, which doesn't match what the function name implies. Could you clarify what you're trying to do in this function?
This is the first scenario: Unicode control characters are the only characters entered in a content editable area. At this time, there is no text node yet (since unicode control characters are not included in text node). This happens when you enter a unicode control character as the first character in a editing area (and when you continuing entering unicode control characters only). The 2nd scenario is unicode control characters entered between 2 text boxes. For example, abc + Control_Characters + def. When the position is at Control_Characters, it is before text box "def", if the characters between position and "d" are all unicode control characters, the position is considered as a valid candidate position.
> > > WebCore/dom/Position.cpp:858 > > + // Check against unicode bidi control characters after text box end. > > + if (m_offset > static_cast<int>(box->start() + box->len())) { > > + int i = static_cast<int>(box->start() + box->len()); > > + for (; i < m_offset; i++) { > > + UChar ch = static_cast<Text*>(node())->data()[i]; > > + if (!isBidiControlCharacter(ch)) > > + break; > > + } > > + if (i == m_offset) > > + return true; > > + } > > I don't understand why we need to check the control characters after the text box. Is that because control characters after the text box influence whatever proceeds it? Could you clarify?
This is the 3rd scenario, it covers when Unicode control characters are the ending characters in editing area (I just realized that it might have overlap with the 2nd scenario. It might be enough to run this only after the end of the last text box). For example, abc + Control_Characters, if position is at control_characters, if the characters between "c" and position are all unicode control characters, that position should be considered as valid too. I am open to the suggestion of a good function name.
> > > WebCore/dom/Position.h:70 > > + enum EditingMode { > > I think EditingMode is very general name, and can easily be confused with editing behavior and other things. Please rename it to something more specific like UnicodeBidiControlCharacterPreservationMode or BidiControlCharacterMode if you think the formar is too long.
changed to UnicodeBidiControlCharacterPreservationMode
> > > WebCore/dom/Position.h:164 > > + Position upstream(EditingBoundaryCrossingRule = CannotCrossEditingBoundary, EditingMode = NotPreserveUnicodeBidiControlCharacter) const; > > + Position downstream(EditingBoundaryCrossingRule = CannotCrossEditingBoundary, EditingMode = NotPreserveUnicodeBidiControlCharacter) const; > > Is there a case where we don't want to preserve the bidi control character? i.e. are there cases where we intentionally remove them?
upstream() and downstream() (or their callers) are used in many places. In other non-editing mode, I think they should still be ignored. One case to ignore bidi control character is for caret movement. If the caret is at the beginning of a block element containing unicode bidi control character as the first character, move caret to right should ignore unicode control character and move caret after the first visible character. Please refer to editing/selection/home-end (test case 16) and editing/selection/extend-selection (test case 17) for detail.
> > > WebCore/editing/VisiblePosition.cpp:1 > > + > > You probably didn't intend to insert a blank line here, did you?
You are right. removed.
> > > WebCore/editing/VisiblePosition.cpp:48 > > +VisiblePosition::VisiblePosition(const Position &pos, EAffinity affinity, Position::EditingMode editMode) > > + : m_editMode(editMode) > > Again, editMode sounds too general.
Changed to bidiPreservationMode.
> > > WebCore/editing/VisibleSelection.cpp:171 > > + s = m_start.downstream(Position::CannotCrossEditingBoundary, m_editMode); > > + e = m_end.upstream(Position::CannotCrossEditingBoundary, m_editMode); > > You may consider changing the order of EditingBoundaryCrossingRule and EditingMode (or whatever you rename to) so that you don't have to keep repeating CannotCrossEditingBoundary here and there. As far as I searched, only RenderObject::createVisiblePosition calls upstream / downstream with CanCrossEditingBoundary.
Good suggestion. Done.
> > > LayoutTests/editing/inserting/insert-paste-bidi-control.html:89 > > +runEditingTest(); > > Do we need editing delegates dump & rendering results for this test? If we're only concerned about bidi control characters being preserved, then we can use dump-as-text test. i.e. you call runDumpAsTextEditingTest() here instead of the regular runEditingTest. If that's not sufficient, then you can add a little script above to check the existence of bidi control characters and print PASS/FAIL.
Changed to dumpAsText. But removed the test for <textarea> since <textarea> is not part of the DOM tree (user can not traverse to it from root). Typing into a text area does not alter the DOM, so the result is not showed up in dumpAsText() (You can not get the text entered in <textarea> using innerText of innerHTML).
Xiaomei Ji
Comment 16
2010-10-14 17:34:33 PDT
Created
attachment 70808
[details]
patch w/ layout test updated per Ryosuke's feedback.
Early Warning System Bot
Comment 17
2010-10-14 17:58:31 PDT
Attachment 70808
[details]
did not build on qt: Build output:
http://queues.webkit.org/results/4379041
Csaba Osztrogonác
Comment 18
2010-10-14 23:17:41 PDT
(In reply to
comment #17
)
>
Attachment 70808
[details]
did not build on qt: > Build output:
http://queues.webkit.org/results/4379041
../../../WebCore/dom/Position.cpp: In member function 'bool WebCore::Position::isCandidate(WebCore::Position::UnicodeBidiControlCharacterPreservationMode) const': ../../../WebCore/dom/Position.cpp:794: error: suggest parentheses around && within || Could you fix this build error to make our bot happy?
Xiaomei Ji
Comment 19
2010-10-15 09:12:30 PDT
Created
attachment 70872
[details]
patch w/ layout fix QT compilation error.
mitz
Comment 20
2010-10-15 13:22:15 PDT
<
http://trac.webkit.org/browser/trunk/WebCore/platform/text/BidiResolver.h?rev=61921#L909
> suggests that the reason embedding characters are omitted from the BidiRuns, and consequently from the InlineTextBoxes, is their effect on rendering complex text with ATSUI. Perhaps this bug can be resolved with two localized changes and without touching so much code, by (1) removing the aforementioned BidiResolver code that omits over Unicode control characters from BidiRuns and (2) patching the ATSUI-based implementation of the ComplexTextRun constructor to check for a leading embedding character and remove it if necessary (analogously to how it adds embedding characters for implicit embedding).
mitz
Comment 21
2010-10-15 13:23:19 PDT
I think you can easily try doing just (1) and seeing if it resolves the bug. Doing (2) would be necessary just in order to prevent a regression on builds that use ATSUI.
Xiaomei Ji
Comment 22
2010-10-19 11:11:59 PDT
Hi Mitz, Thanks for the valuable suggestion! I tried the change per your suggestion, which solves the bug. But it has side-effect on caret-movement: unicode control characters might be considered as a character in caret movement. For example: <div contenteditable>start ‫car דהו אבג.‬ end</div> The visual display looks like "...... car end". You need to press right-arrow 4 times to move the caret from before 'r' to after 'e'. And it takes 4 left-arrow operations to move caret from after 'e' to before 'r'. Firefox has similar issue: it takes 4 right-arrow operations to move caret from before 'r' to after 'e. And it takes 3 right-arrow operations to move caret from before ' car' to after 'c'. But left-arrow movement is ok (control characters are ignored). Index: BidiResolver.h =================================================================== --- BidiResolver.h (revision 69971) +++ BidiResolver.h (working copy) @@ -875,6 +875,7 @@ last = current; +/* if (emptyRun && !(dirCurrent == RightToLeftEmbedding || dirCurrent == LeftToRightEmbedding || dirCurrent == RightToLeftOverride @@ -883,7 +884,14 @@ sor = current; emptyRun = false; } +*/ + + if (emptyRun) { + sor = current; + emptyRun = false; + } + increment(); if (!m_currentExplicitEmbeddingSequence.isEmpty()) { commitExplicitEmbedding(); @@ -901,6 +909,7 @@ } } +/* if (emptyRun && (dirCurrent == RightToLeftEmbedding || dirCurrent == LeftToRightEmbedding || dirCurrent == RightToLeftOverride @@ -911,6 +920,7 @@ last = current; sor = current; } +*/ if (!pastEnd && (current == end || current.atEnd())) { if (emptyRun)
mitz
Comment 23
2010-10-19 11:20:18 PDT
(In reply to
comment #22
)
> Hi Mitz, > > Thanks for the valuable suggestion! > > I tried the change per your suggestion, which solves the bug. > But it has side-effect on caret-movement: unicode control characters might be considered as a character in caret movement. > > For example: > <div contenteditable>start ‫car דהו אבג.‬ end</div> > > The visual display looks like "...... car end". > You need to press right-arrow 4 times to move the caret from before 'r' to after 'e'. And it takes 4 left-arrow operations to move caret from after 'e' to before 'r'.
This is a progression. Being able to use the arrow keys (and other mechanisms) to place the caret on either side of the control characters is useful for editing around them, and is also consistent with what TextEdit does on Mac OS X. I imagine it is also consistent with Microsoft Word, but I haven’t tested this.
Xiaomei Ji
Comment 24
2010-10-19 11:54:39 PDT
(In reply to
comment #23
)
> (In reply to
comment #22
) > > Hi Mitz, > > > > Thanks for the valuable suggestion! > > > > I tried the change per your suggestion, which solves the bug. > > But it has side-effect on caret-movement: unicode control characters might be considered as a character in caret movement. > > > > For example: > > <div contenteditable>start ‫car דהו אבג.‬ end</div> > > > > The visual display looks like "...... car end". > > You need to press right-arrow 4 times to move the caret from before 'r' to after 'e'. And it takes 4 left-arrow operations to move caret from after 'e' to before 'r'. > > This is a progression. Being able to use the arrow keys (and other mechanisms) to place the caret on either side of the control characters is useful for editing around them, and is also consistent with what TextEdit does on Mac OS X. I imagine it is also consistent with Microsoft Word, but I haven’t tested this.
Yes, you are right. it is the same in Window's notepad. Thanks for the quick reply!
Xiaomei Ji
Comment 25
2010-10-21 18:21:12 PDT
Created
attachment 71517
[details]
patch w/ layout test 2 questions on ATSUI: 1. inside ComplexTextController::ComplexTextRun::ComplexTextRun() in ComplexTextControllerATSUI.cpp, there is the following line: status = ATSUGetGlyphBounds(atsuTextLayout, 0, 0, 0, m_stringLength, kATSUseFractionalOrigins, 0, 0, &boundsCount); Since there is ATSUTextInserted() and newly added ATSUTextDeleted(), should it be: status = ATSUGetGlyphBounds(atsuTextLayout, 0, 0, 0, kATSUToTextEnd, kATSUseFractionalOrigins, 0, 0, &boundsCount); ? 2. I am running on snow leopard, how do I test ATSUI?
Jeremy Moskovich
Comment 26
2010-10-22 03:25:18 PDT
xji: Chromium on Mac compiles with both ATSUI & Core Text support enabled and we toggle them at runtime. We do this since unlike Safari we ship a single binary for both 10.5 & 10.6. To test ATSUI on 10.6 you just need to find the place where we perform the runtime check and change it to always use ATSUI. I think the runtime check is in platform/cocoa's SimpleFontData or platform/chromium's SimpleFontDataChromiumMac , I'll take a closer look on Sunday when I'm back at work and get back to you about the exact placement if you don't find it.
Xiaomei Ji
Comment 27
2010-10-26 15:54:02 PDT
Created
attachment 71955
[details]
patch w/ layout test I tested ATSUI part in os x 10.5. But I did not find any difference in the rendered result (and in caret movement) with and without the changes in ATSUI code.
mitz
Comment 28
2010-11-05 14:18:25 PDT
Comment on
attachment 71955
[details]
patch w/ layout test View in context:
https://bugs.webkit.org/attachment.cgi?id=71955&action=review
> WebCore/platform/graphics/mac/ComplexTextControllerATSUI.cpp:33 > // ATSUTextInserted() is SPI in 64-bit.
I guess you should amend this comment now.
Xiaomei Ji
Comment 29
2010-11-08 13:36:32 PST
Committed
r71566
: <
http://trac.webkit.org/changeset/71566
>
WebKit Review Bot
Comment 30
2010-11-08 16:28:02 PST
http://trac.webkit.org/changeset/71566
might have broken GTK Linux 64-bit Debug
Xiaomei Ji
Comment 31
2010-11-08 17:14:02 PST
Committed
r71588
: <
http://trac.webkit.org/changeset/71588
>
Xiaomei Ji
Comment 32
2010-11-08 17:17:59 PST
r71588
is a rebaseline for GTK. Since I added the bug number, it auto-generated Dan as reviewer although he did not review it. There will be rebaseline for other platforms as well, such as chromium.
Xiaomei Ji
Comment 33
2010-11-09 16:35:39 PST
The change breaks bidi-control-chars-treated-as-ZWS.html in leopard, and I am investigating.
Note
You need to
log in
before you can comment on or make changes to this bug.
Top of Page
Format For Printing
XML
Clone This Bug