Bug 41123

Summary: MASTER: WebKit needs an HTML5 tree builder
Product: WebKit Reporter: Adam Barth <abarth>
Component: WebCore Misc. Assignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: bfulgham, eric, ggaren, mike, mjs, Ms2ger, proppy, slewis, tonyg, webmaster
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 41124, 41126, 41131, 41133, 41183, 41184, 41189, 41191, 41225, 41232, 41239, 41257, 41262, 41263, 41264, 41265, 41271, 41272, 41273, 41276, 41277, 41293, 41306, 41314, 41316, 41317, 41319, 41324, 41335, 41337, 41344, 41399, 41402, 41405, 41433, 41436, 41439, 41440, 41448, 41453, 41500, 41501, 41502, 41503, 41505, 41555, 41556, 41557, 41558, 41559, 41560, 41561, 41582, 41587, 41588, 41590, 41591, 41623, 41627, 41646, 41647, 41650, 41652, 41654, 41656, 41659, 41660, 41663, 41671, 41684, 41688, 41716, 41720, 41728, 41729, 41731, 41733, 41734, 41735, 41736, 41739, 41740, 41741, 41743, 41744, 41751, 41752, 41754, 41756, 41778, 41807, 41812, 41838, 41907, 41916, 41921, 41922, 41936, 41939, 41940, 41942, 41943, 41946, 41947, 41949, 41991, 41998, 42000, 42002, 42022, 42023, 42050, 42059, 42096, 42106, 42133, 42138, 42143, 42187, 42199, 42222, 42233, 42235, 42238, 42285, 42294, 42312, 42314, 42346, 42347, 42348, 42349, 42351, 42404, 42431, 42590, 42594, 42599, 42604, 42644, 42654, 42668, 42708, 42713, 42725, 42727, 42728, 42731, 42773, 42776, 42791, 42792, 42794, 42804, 42870, 42875, 42877, 42879, 42880, 42881, 42948, 42950, 42951, 42952, 43072, 43073, 43075    
Bug Blocks: 32934, 40829, 18415    
Attachments:
Description Flags
current html5lib test failure diff none
current html5lib test failure diff none

Description Adam Barth 2010-06-23 19:04:15 PDT
Now that the HTML5 tokenizer appears to be sticking, it's time to start working on the tree building algorithm.  This is a master bug that will accumulate dependencies on the individual patches.  We're going to try to re-use as much code from the LegacyHTMLTreeBuilder as possible.
Comment 1 Adam Barth 2010-07-06 01:25:09 PDT
Created attachment 60603 [details]
current html5lib test failure diff

We're close to "parity" in the sense that we have roughly the same number of test progressions as test failures.  The + lines are new failures.  The - lines are test progressions.  (Of course, this metric doesn't account for the importance of each test.)
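The parity metric described above can be tallied mechanically. The following is a minimal sketch, assuming the attachment is an ordinary unified diff; the function name and the sample text are hypothetical and are not taken from the attachment itself.

```python
def count_diff_lines(diff_text):
    """Return (new_failures, progressions) for a unified diff.

    Assumption: "+" lines are new failures and "-" lines are
    progressions, as described in the comment above.
    """
    new_failures = 0
    progressions = 0
    for line in diff_text.splitlines():
        # Skip the "+++" / "---" file header lines of a unified diff.
        if line.startswith("+++") or line.startswith("---"):
            continue
        if line.startswith("+"):
            new_failures += 1
        elif line.startswith("-"):
            progressions += 1
    return new_failures, progressions


# Hypothetical sample, not real test output.
sample = """\
--- runner-expected.txt
+++ runner-expected-html5.txt
@@ -1,3 +1,2 @@
 unchanged line
-old failure A
-old failure B
+new failure C
"""

new_failures, progressions = count_diff_lines(sample)
```

One limitation of this sketch: a changed line whose own content begins with "--" or "++" would be mistaken for a file header and skipped.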
Comment 2 Eric Seidel (no email) 2010-07-06 13:17:03 PDT
It's also possible to create the attached diff in trac directly:
http://trac.webkit.org/changeset?old_path=/trunk/LayoutTests/html5lib/runner-expected.txt&old=62584&new_path=/trunk/LayoutTests/html5lib/runner-expected-html5.txt&new=62584
(You have to update the revisions every time though.)
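Since the revision has to be updated by hand each time, a small helper can rebuild the link. This is only a sketch: the paths and query parameters are copied from the URL above, while the function and constant names are made up for illustration.

```python
from urllib.parse import urlencode

# Values copied from the trac URL quoted above.
TRAC_CHANGESET = "http://trac.webkit.org/changeset"
OLD_PATH = "/trunk/LayoutTests/html5lib/runner-expected.txt"
NEW_PATH = "/trunk/LayoutTests/html5lib/runner-expected-html5.txt"


def html5lib_diff_url(revision):
    """Build the trac diff URL comparing the two expectation files at one revision."""
    params = {
        "old_path": OLD_PATH,
        "old": revision,
        "new_path": NEW_PATH,
        "new": revision,
    }
    # safe="/" keeps the slashes in the paths unescaped, matching the original URL.
    return TRAC_CHANGESET + "?" + urlencode(params, safe="/")


url = html5lib_diff_url(62584)
```

Calling `html5lib_diff_url` with a newer revision number yields the same comparison at that revision.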
Comment 3 Adam Barth 2010-07-08 18:40:40 PDT
Created attachment 60993 [details]
current html5lib test failure diff

Once we land all the patches that are pending-review and pending-commit, we'll be down to one (or possibly two) regressions on the html5lib test suite (and a massive number of progressions).  The main things left to fix or implement are the following:

1) <textarea> isn't properly ignoring the next token if it's a newline.
2) There's some bug in our reconstruction of formatting elements after tables.
3) We don't handle foreign content (i.e., SVG-in-HTML and MathML-in-HTML).
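
To illustrate item 1: in the HTML5 parsing algorithm, a line feed that immediately follows a <textarea> start tag is dropped. The sketch below is not WebKit's code; it assumes a hypothetical token stream of (kind, data) tuples in which character data arrives as strings rather than one character per token.

```python
def drop_newline_after_textarea(tokens):
    """Drop one leading "\\n" from the character token right after <textarea>.

    tokens: list of (kind, data) tuples, where kind is "start", "end",
    or "chars". This token shape is a hypothetical simplification.
    """
    result = []
    skip_newline = False
    for kind, data in tokens:
        if skip_newline:
            # The flag applies only to the very next token.
            skip_newline = False
            if kind == "chars" and data.startswith("\n"):
                data = data[1:]  # ignore exactly one line feed
                if not data:
                    continue  # the token was only the newline
        result.append((kind, data))
        if kind == "start" and data == "textarea":
            skip_newline = True
    return result


cleaned = drop_newline_after_textarea(
    [("start", "textarea"), ("chars", "\nhello\n"), ("end", "textarea")]
)
```

Only the first line feed is dropped; any later newlines in the textarea content are kept.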

Currently, we appear to be failing almost all the LayoutTests, so there's a bunch of work still left to do once we finish the html5lib test suite.  :)
Comment 4 Adam Barth 2010-07-08 18:41:55 PDT
Oh, I forgot.  We still need to do fragment parsing.
Comment 5 Adam Barth 2010-07-15 02:57:24 PDT
We've triaged all the remaining LayoutTest failures:

https://spreadsheets.google.com/ccc?key=0AlC4tS7Ao1fIdEo0SFdLaVpiclBHMVNQcHlTenV5TEE&hl=en

There are about 150 failures, but only a handful of bugs.  Some of the most common causes of differing test results are the insertion of fake colgroup elements and the insertion of whitespace nodes between </head> and <body>.  There are certainly some real bugs in there that need to be fixed, however.
Comment 6 Adam Barth 2010-07-21 18:12:55 PDT
I think we're ready to try turning this on.  The bugs we've been fixing recently aren't really blocking issues anymore.
Comment 7 Stephanie Lewis 2010-07-27 21:07:40 PDT
I measured the PLT (page load test), and the new tree builder was a 1% win.  Tested at r64159.
Comment 8 Adam Barth 2010-08-05 00:16:21 PDT
HTML5 tree builder enabled in http://trac.webkit.org/changeset/64712