Bug 25822 - DOM normalize does not remove empty text node between element nodes
Summary: DOM normalize does not remove empty text node between element nodes
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM (show other bugs)
Version: 528+ (Nightly build)
Hardware: Macintosh OS X 10.5
: P2 Normal
Assignee: Darin Adler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-05-15 07:52 PDT by Kai Brüning
Modified: 2009-05-18 14:16 PDT (History)
0 users

See Also:


Attachments
Test case for this bug (884 bytes, text/html)
2009-05-15 07:53 PDT, Kai Brüning
no flags Details
Patch to fix this bug (5.59 KB, patch)
2009-05-15 09:23 PDT, Kai Brüning
darin: review+
Details | Formatted Diff | Diff
Path with suggested code changes (6.07 KB, patch)
2009-05-15 13:51 PDT, Kai Brüning
darin: review+
Details | Formatted Diff | Diff
Patch with more thorough set of test cases (11.17 KB, patch)
2009-05-17 06:54 PDT, Kai Brüning
darin: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kai Brüning 2009-05-15 07:52:25 PDT
The DOM node normalize() function does not remove empty text nodes which are not adjacent to other text nodes nor the only child node.
That is, empty text nodes between element nodes or at the start or end of the child list are not removed.
This is clearly in contradiction to the DOM level 2 core specification.
Note also that it works in FireFox (tested with 3.0.10).
A test case is attached, a patch forthcoming.
Comment 1 Kai Brüning 2009-05-15 07:53:19 PDT
Created attachment 30384 [details]
Test case for this bug
Comment 2 Kai Brüning 2009-05-15 09:23:19 PDT
Created attachment 30390 [details]
Patch to fix this bug

I do not have a Tiger machine to run layout tests, but I made sure that the set of failed layout tests under Leopard did not change by the patch.
Comment 3 Darin Adler 2009-05-15 10:26:11 PDT
Comment on attachment 30390 [details]
Patch to fix this bug

> -    for (; node; node = node->traverseNextNodePostOrder()) {
> +    for (; node; ) {

Should be changed a while, then.

> +        } else {
> +            node = node->traverseNextNodePostOrder();

WebKit style doesn't use braces around single lines. Typically we'd use "early continue" in a case like this:

    if (type != TEXT_NODE) {
        node = node->traverseNextNodePostOrder();
        continue;
    }

I'd really like to see a more thorough set of test cases. Adding a test for only the single case that shows the current bug seems too minimal.

r=me as is -- none of my requested changes are mandatory
Comment 4 Kai Brüning 2009-05-15 13:51:35 PDT
Created attachment 30400 [details]
Path with suggested code changes

I integrated the code-level suggestions of Darin.
Adding more test will take a little longer; guess I expected tests for the other cases to already exist.
Should this be one file per tested case or is it ok to combine several cases into one test file?
Comment 5 Kai Brüning 2009-05-15 13:56:36 PDT
One additional remark: I am a little puzzled by the fact that the original code carefully and specifically handled the "single empty text node child" case, but ignored the mixed empty text node/elements case. I just would feel better knowing the rationale behind that code.
Comment 6 Darin Adler 2009-05-15 14:15:23 PDT
(In reply to comment #4)
> Should this be one file per tested case or is it ok to combine several cases
> into one test file?

A single test file is better. The key to making a good test is making sure you can understand what its testing and see what's failing and what's working.
Comment 7 Darin Adler 2009-05-15 14:16:11 PDT
(In reply to comment #5)
> One additional remark: I am a little puzzled by the fact that the original code
> carefully and specifically handled the "single empty text node child" case, but
> ignored the mixed empty text node/elements case. I just would feel better
> knowing the rationale behind that code.

You can look at the ChangeLog to see if there are any comments that make it clear. I doubt there was any rationale -- it was probably a simple mistake. I trust test cases more than I trust code.
Comment 8 Kai Brüning 2009-05-17 06:54:19 PDT
Created attachment 30421 [details]
Patch with more thorough set of test cases

I extended the test case to (hopefully) test the full range of the normalize() functionality. To express this, I renamed it to "normalize.html". The test lives inside "dom", therefore I consider the term "normalize" well defined. Please advice if a longer name is necessary.

Having these tests and after learning that document->textNodesMerged() does adjust DOMRanges to the changed text nodes and nothing more, I changed the code of Node::normalize() a little more to make it more concise and less historic.

Concerning rationale of the old code, I checked the ChangeLogs and found the entry about the addition of the "single empty text node child" case. It is indeed rather inexpressively, just stating the addition of this case.
Comment 9 Darin Adler 2009-05-17 08:51:13 PDT
Comment on attachment 30421 [details]
Patch with more thorough set of test cases

> +        // Merge non-empty text nodes.
> +        while (1) {
> +            Node* nextSibling = node->nextSibling();
> +            if (!nextSibling || !nextSibling->isTextNode()) {
> +                node = node->traverseNextNodePostOrder();
> +                break;
> +            }

I think it would be clearer to have the node = node->traverseNextNodePostOrder() line outside the if statement. I also would prefer to have a while condition instead of just while (1).

Here's how I'd write it:

    while (Node* nextSibling = node->nextSibling()) {
        if (!nextSiblng->isTextNode())
            break;
        ...
    }

    node = node->traverseNextNodePostOrder();

Patch looks good, r=me even without that change.
Comment 10 Darin Adler 2009-05-17 09:02:47 PDT
I'll take care of landing this. I think I'll tweak the test a bit.
Comment 11 Kai Brüning 2009-05-17 11:11:22 PDT
> I'll take care of landing this. I think I'll tweak the test a bit.

Great. Please tweak the code, too, as you suggested. I agree that both changes make it clearer.
Comment 12 Darin Adler 2009-05-18 14:16:27 PDT
http://trac.webkit.org/changeset/43809