Bug 41123 - MASTER: WebKit needs an HTML5 tree builder
Summary: MASTER: WebKit needs an HTML5 tree builder
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebCore Misc.
Version: 528+ (Nightly build)
Hardware: All (OS: All)
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on: 41124 41126 41131 41133 41183 41184 41189 41191 41225 41232 41239 41257 41262 41263 41264 41265 41271 41272 41273 41276 41277 41293 41306 41314 41316 41317 41319 41324 41335 41337 41344 41399 41402 41405 41433 41436 41439 41440 41448 41453 41500 41501 41502 41503 41505 41555 41556 41557 41558 41559 41560 41561 41582 41587 41588 41590 41591 41623 41627 41646 41647 41650 41652 41654 41656 41659 41660 41663 41671 41684 41688 41716 41720 41728 41729 41731 41733 41734 41735 41736 41739 41740 41741 41743 41744 41751 41752 41754 41756 41778 41807 41812 41838 41907 41916 41921 41922 41936 41939 41940 41942 41943 41946 41947 41949 41991 41998 42000 42002 42022 42023 42050 42059 42096 42106 42133 42138 42143 42187 42199 42222 42233 42235 42238 42285 42294 42312 42314 42346 42347 42348 42349 42351 42404 42431 42590 42594 42599 42604 42644 42654 42668 42708 42713 42725 42727 42728 42731 42773 42776 42791 42792 42794 42804 42870 42875 42877 42879 42880 42881 42948 42950 42951 42952 43072 43073 43075
Blocks: 32934 html5test 18415
Reported: 2010-06-23 19:04 PDT by Adam Barth
Modified: 2010-08-05 00:16 PDT

Attachments
current html5lib test failure diff (1.81 KB, patch)
2010-07-06 01:25 PDT, Adam Barth
current html5lib test failure diff (1.86 KB, patch)
2010-07-08 18:40 PDT, Adam Barth
Description Adam Barth 2010-06-23 19:04:15 PDT
Now that the HTML5 tokenizer appears to be sticking, it's time to start working on the tree building algorithm.  This is a master bug that will accumulate dependencies on the individual patches.  We're going to try to re-use as much code from the LegacyHTMLTreeBuilder as possible.
Comment 1 Adam Barth 2010-07-06 01:25:09 PDT
Created attachment 60603
current html5lib test failure diff

We're close to "parity" in the sense that we have roughly the same number of test progressions as test failures.  The + lines are new failures.  The - lines are test progressions.  (Of course, this metric doesn't account for the importance of each test.)
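
The parity metric described above can be reproduced by counting the + and - lines in the attached diff. The following is a minimal sketch, not the script used for this bug: the function name and the sample diff are made up for illustration, and it assumes the attachment is a plain unified diff. One known limitation is that a removed content line that itself begins with "--" would be mistaken for a file header and skipped.

```python
def count_diff_changes(diff_text):
    """Count added and removed lines in a unified diff, skipping file headers."""
    added = removed = 0
    for line in diff_text.splitlines():
        if line.startswith("+++") or line.startswith("---"):
            continue  # file header lines, not content changes
        if line.startswith("+"):
            added += 1  # a "+" line is a new failure
        elif line.startswith("-"):
            removed += 1  # a "-" line is a test progression
    return added, removed


# Hypothetical sample; the real attachment content is not reproduced here.
sample = """--- runner-expected.txt
+++ runner-expected-html5.txt
@@ -1,3 +1,2 @@
 unchanged line
-old failure A
-old failure B
+new failure C
"""

new_failures, progressions = count_diff_changes(sample)
print(new_failures, progressions)
```

As the comment notes, the two counts say nothing about how important each test is.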
Comment 2 Eric Seidel (no email) 2010-07-06 13:17:03 PDT
It's also possible to create the attached diff in trac directly:
http://trac.webkit.org/changeset?old_path=/trunk/LayoutTests/html5lib/runner-expected.txt&old=62584&new_path=/trunk/LayoutTests/html5lib/runner-expected-html5.txt&new=62584
(You have to update the revisions every time though.)
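
Since the revision has to be updated by hand each time, the URL can also be generated. This is a small sketch that rebuilds the link above for a given revision; the function name is hypothetical, and it assumes both sides of the comparison use the same revision, as in the link above.

```python
from urllib.parse import urlencode

TRAC_CHANGESET = "http://trac.webkit.org/changeset"


def html5lib_diff_url(revision):
    """Build the trac URL comparing the two html5lib expectation files."""
    query = [
        ("old_path", "/trunk/LayoutTests/html5lib/runner-expected.txt"),
        ("old", str(revision)),
        ("new_path", "/trunk/LayoutTests/html5lib/runner-expected-html5.txt"),
        ("new", str(revision)),
    ]
    # safe="/" keeps the path separators readable instead of encoding them as %2F.
    return TRAC_CHANGESET + "?" + urlencode(query, safe="/")


print(html5lib_diff_url(62584))
```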
Comment 3 Adam Barth 2010-07-08 18:40:40 PDT
Created attachment 60993
current html5lib test failure diff

Once we land all the patches that are pending-review and pending-commit, we'll be down to one (or possibly two) regressions on the html5lib test suite (and a massive number of progressions).  The main things left to implement are the following:

1) <textarea> isn't properly ignoring the next token if it's a newline.
2) There's some bug in our reconstruction of formatting elements after tables.
3) We don't handle foreign content (i.e., SVG-in-HTML and MathML-in-HTML).
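
For item 1, the HTML5 parsing rule is that a line feed character token immediately following a <textarea> start tag is dropped. The sketch below illustrates that rule only; it is not WebKit code. The (kind, value) tuple representation of tokens and the function name are assumptions made for the example.

```python
def drop_newline_after_textarea(tokens):
    """Yield tokens, skipping a single LF character token right after <textarea>."""
    skip_next_newline = False
    for kind, value in tokens:
        if skip_next_newline:
            # The flag applies to exactly one token, whatever that token is.
            skip_next_newline = False
            if kind == "char" and value == "\n":
                continue
        if kind == "start" and value == "textarea":
            skip_next_newline = True
        yield kind, value


# Hypothetical token stream for: <textarea>\na\n</textarea>
tokens = [
    ("start", "textarea"),
    ("char", "\n"),
    ("char", "a"),
    ("char", "\n"),
    ("end", "textarea"),
]
result = list(drop_newline_after_textarea(tokens))
print(result)
```

Only the first newline is dropped; later newlines inside the element are kept.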

Currently, we appear to be failing almost all the LayoutTests, so there's a bunch of work still left to do once we finish the html5lib test suite.  :)
Comment 4 Adam Barth 2010-07-08 18:41:55 PDT
Oh, I forgot.  We still need to do fragment parsing.
Comment 5 Adam Barth 2010-07-15 02:57:24 PDT
We've triaged all the remaining LayoutTest failures:

https://spreadsheets.google.com/ccc?key=0AlC4tS7Ao1fIdEo0SFdLaVpiclBHMVNQcHlTenV5TEE&hl=en

There are about 150 failures, but only a handful of bugs.  The most common causes of differing test results are the insertion of fake colgroup elements and the insertion of whitespace nodes between </head> and <body>.  There are certainly some real bugs in there that need to be fixed, however.
Comment 6 Adam Barth 2010-07-21 18:12:55 PDT
I think we're ready to try turning this on.  The bugs we've been fixing recently aren't really blocking issues anymore.
Comment 7 Stephanie Lewis 2010-07-27 21:07:40 PDT
I measured the PLT and the new tree builder was a 1% win.  Testing r64159.
Comment 8 Adam Barth 2010-08-05 00:16:21 PDT
HTML5 tree builder enabled in http://trac.webkit.org/changeset/64712