Now that the HTML5 tokenizer appears to be sticking, it's time to start working on the tree building algorithm. This is a master bug that will accumulate dependencies on the individual patches. We're going to try to re-use as much code from the LegacyHTMLTreeBuilder as possible.
Created attachment 60603 [details]
current html5lib test failure diff

We're close to "parity" in the sense that we have roughly the same number of test progressions as test failures. The + lines are new failures. The - lines are test progressions. (Of course, this metric doesn't account for the importance of each test.)
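For anyone who wants to reproduce the "parity" count from the attached diff, here is a rough sketch. It assumes the attachment is a plain unified diff; the function name `count_diff_changes` is made up for illustration and isn't part of any WebKit tooling. It counts changed lines, not tests, so treat the numbers as approximate.

```python
def count_diff_changes(diff_text):
    """Count '+' lines (new failures) and '-' lines (progressions) in a unified diff."""
    new_failures = 0
    progressions = 0
    for line in diff_text.splitlines():
        # Skip the '---' / '+++' file header lines of a unified diff.
        # Caveat: a changed line whose own content starts with '--' or '++'
        # would be skipped too; good enough for a rough count.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            new_failures += 1
        elif line.startswith("-"):
            progressions += 1
    return new_failures, progressions


if __name__ == "__main__":
    sample = (
        "--- a/runner-expected.txt\n"
        "+++ b/runner-expected-html5.txt\n"
        "@@ -1,3 +1,3 @@\n"
        " unchanged\n"
        "-old failure\n"
        "+new failure\n"
    )
    failures, fixed = count_diff_changes(sample)
    print(f"new failures: {failures}, progressions: {fixed}")
```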
It's also possible to create the attached diff in trac directly: http://trac.webkit.org/changeset?old_path=/trunk/LayoutTests/html5lib/runner-expected.txt&old=62584&new_path=/trunk/LayoutTests/html5lib/runner-expected-html5.txt&new=62584 (You have to update the revisions every time though.)
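Since the revisions have to be updated by hand each time, a small helper can build the URL. This is only a sketch: the function name `trac_diff_url` is hypothetical, and it assumes the same revision is used for both sides of the comparison, as in the link above.

```python
def trac_diff_url(revision):
    """Build the trac URL that diffs the two html5lib expectation files at one revision."""
    base = "http://trac.webkit.org/changeset"
    old_path = "/trunk/LayoutTests/html5lib/runner-expected.txt"
    new_path = "/trunk/LayoutTests/html5lib/runner-expected-html5.txt"
    return (
        f"{base}?old_path={old_path}&old={revision}"
        f"&new_path={new_path}&new={revision}"
    )


if __name__ == "__main__":
    print(trac_diff_url(62584))
```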
Created attachment 60993 [details]
current html5lib test failure diff

Once we land all the patches that are pending-review and pending-commit, we'll be down to one (or possibly two) regressions on the html5lib test suite (and a massive number of progressions). The main things left to implement are the following:

1) <textarea> isn't properly ignoring the next token if it's a newline.
2) There's some bug in our reconstruction of formatting elements after tables.
3) We don't handle foreign content (i.e., SVG-in-HTML and MathML-in-HTML).

Currently, we appear to be failing almost all the LayoutTests, so there's a bunch of work still left to do once we finish the html5lib test suite. :)
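To make the <textarea> item concrete: the HTML5 tree construction rules say that if the token right after a <textarea> start tag is a line feed character, that token is dropped. Below is a toy sketch of that one rule. It is not the WebKit implementation; the token shape (`(kind, value)` tuples) and the function name `collect_text` are assumptions made up for the example.

```python
def collect_text(tokens):
    """Collect character tokens, dropping a single newline right after <textarea>.

    tokens is a list of (kind, value) tuples, where kind is one of
    "start", "end", or "char" (a hypothetical, simplified token stream).
    """
    output = []
    skip_next_newline = False
    for kind, value in tokens:
        if skip_next_newline and kind == "char" and value == "\n":
            # The token immediately after the <textarea> start tag is a
            # newline, so ignore it and move on.
            skip_next_newline = False
            continue
        # The flag only applies to the very next token.
        skip_next_newline = False
        if kind == "start" and value == "textarea":
            skip_next_newline = True
        elif kind == "char":
            output.append(value)
    return "".join(output)


if __name__ == "__main__":
    # Token stream for: <textarea>\nfoo</textarea>
    stream = [("start", "textarea"), ("char", "\n"), ("char", "f"),
              ("char", "o"), ("char", "o"), ("end", "textarea")]
    print(repr(collect_text(stream)))
```

Only the first newline is dropped; a second leading newline is kept as content.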
Oh, I forgot. We still need to do fragment parsing.
We've triaged all the remaining LayoutTest failures: https://spreadsheets.google.com/ccc?key=0AlC4tS7Ao1fIdEo0SFdLaVpiclBHMVNQcHlTenV5TEE&hl=en There are about 150 failures, but only a handful of bugs. Some of the most common reasons for different test results are the insertion of fake colgroup elements and the insertion of whitespace nodes between </head> and <body>. There are certainly some real bugs in there that need to be fixed, however.
I think we're ready to try turning this on. The bugs we've been fixing recently aren't really blocking issues anymore.
I measured the PLT and the new tree builder was a 1% win. Tested with r64159.
HTML5 tree builder enabled in http://trac.webkit.org/changeset/64712