Bug 30827 - Off-by-one hard-to-trigger memory corruption in CSSParser (seen only with GCC 4.4)
Summary: Off-by-one hard-to-trigger memory corruption in CSSParser (seen only with GCC...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: CSS (show other bugs)
Version: 528+ (Nightly build)
Hardware: PC All
: P2 Normal
Assignee: Evan Martin
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-10-27 10:31 PDT by Evan Martin
Modified: 2009-11-03 10:37 PST (History)
7 users (show)

See Also:


Attachments
lapack css (2.17 KB, text/css)
2009-10-27 10:51 PDT, Evan Martin
no flags Details
repro of the crasher (264 bytes, text/html)
2009-10-30 16:09 PDT, Evan Martin
no flags Details
patch and layout test (3.83 KB, patch)
2009-10-30 16:33 PDT, Evan Martin
darin: review-
Details | Formatted Diff | Diff
patch and layout test (4.57 KB, patch)
2009-11-02 12:23 PST, Evan Martin
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Evan Martin 2009-10-27 10:31:12 PDT
See http://code.google.com/p/chromium/issues/detail?id=23362 for the gory details.  Here's a summary of multiple people's hard work.

Build WebKit with GCC 4.4 and use either:
 - glibc's MALLOC_CHECK support
 - valgrind
and find that some pages cause, respectively, either a crash or an invalid read/write.

Example URLs:
  http://artofapogee.blogspot.com/2009/05/ubuntu-my-2009-t-shirt-design.html
  http://msnbc.com
  http://www.cs.colorado.edu/~jessup/lapack/

The crash is due to a double free in WebCore::~CSSParser, but that could be due to memory corruption.
A partial backtrace at the invalid write found by valgrind follows:

#0  0x0000000001a811a0 in WebCore::CSSParser::lex (this=0x7feffd780) at ../css/tokenizer.flex:47
#1  0x0000000001a7ef4e in WebCore::CSSParser::lex (this=0x7feffd780, yylvalWithoutType=0x7feffd1e0)
    at third_party/WebKit/WebCore/css/CSSParser.cpp:4607
#2  0x0000000001c7e8aa in cssyylex (cssyylval=0x7feffd1e0, parser=0x7feffd780)
    at /home/shenki/src/chromium/src/third_party/WebKit/WebCore/css/CSSGrammar.y:95
#3  0x0000000001c7eb2e in cssyyparse (parser=0x7feffd780) at out/Debug/obj.target/geni/CSSGrammar.cpp:2060
#4  0x0000000001a6ebc5 in WebCore::CSSParser::parseSheet (this=0x7feffd780, sheet=0x101b32c0, string=...)
    at third_party/WebKit/WebCore/css/CSSParser.cpp:225
#5  0x0000000001ad617d in WebCore::CSSStyleSheet::parseString (this=0x101b32c0, string=..., strict=false)
    at third_party/WebKit/WebCore/css/CSSStyleSheet.cpp:167
#6  0x00000000014972ec in WebCore::HTMLLinkElement::setCSSStyleSheet (this=0x1013e050, url=..., charset=..., sheet=0x1013eef0)
    at third_party/WebKit/WebCore/html/HTMLLinkElement.cpp:269
#7  0x00000000014f22d7 in WebCore::CachedCSSStyleSheet::checkNotify (this=0x1013eef0)
    at third_party/WebKit/WebCore/loader/CachedCSSStyleSheet.cpp:115
#8  0x00000000014f21c3 in WebCore::CachedCSSStyleSheet::data (this=0x1013eef0, data=..., allDataReceived=true)
    at third_party/WebKit/WebCore/loader/CachedCSSStyleSheet.cpp:103

The CSS for the lapack URL does not cause a crash if the size of the file is adjusted by any number of bytes(!!), smaller or larger.  It also does not cause a crash if the error in the css is fixed (line 98, add a closing brace).

If you add 1 to the data malloc'd in WebCore/css/CSSParser.cpp line 199, it works around the issue.
Comment 1 Evan Martin 2009-10-27 10:51:10 PDT
Created attachment 41966 [details]
lapack css

This is the style.css that is involved in one instance of the crash, with the missing semicolon on line 98 mentioned in the bug.
Comment 2 Darin Adler 2009-10-27 11:06:36 PDT
The fact that allocating one more byte make the bug goes away makes me think this is a buffer overrun of some sort.

The flex documentation mentions that you need to supply a buffer where the last two bytes are 0 when explaining how to use the yy_scan_buffer, which I think is analogous to how we use flex. So it makes sense that CSSParser sets the last two characters to 0, and that should guarantee we never run off the end of the buffer.

What I can't tell is exactly how flex guarantees it won't read or write more than two characters past the end of the buffer. That guarantee somehow comes from the state table that flex generates.

Trying to guess how the incorrect CSS in the lapack example affects this I think it's possible the problem might stem from calling flex again after flex returns an end of file indication.
Comment 3 Evan Martin 2009-10-27 11:30:48 PDT
One additional fact I left out: it appears that the discriminating difference to trigger this is GCC 4.4 rather than GCC 4.3.  It doesn't happen to Google-based Chrome developers who are using GCC 4.3 (and likely also OS X WebKit devs).  It does happen to people building on both Fedora and Ubuntu with GCC 4.4.
Comment 4 Darin Adler 2009-10-27 11:39:03 PDT
(In reply to comment #3)
> One additional fact I left out: it appears that the discriminating difference
> to trigger this is GCC 4.4 rather than GCC 4.3.  It doesn't happen to
> Google-based Chrome developers who are using GCC 4.3 (and likely also OS X
> WebKit devs).  It does happen to people building on both Fedora and Ubuntu with
> GCC 4.4.

That points to either a compiler bug or a side effect of more aggressive optimization, then.
Comment 5 Craig Schlenter 2009-10-27 11:51:31 PDT
A couple of quick comments:

The valgrind trace shows 2 bytes being overwritten not just 1.

Valgrind also shows a 2 byte read error. So while adding 1 byte might fix it, I
think 2 might be better if we just wanted to paper over the error.

It would be nice to understand why the parsing is overrunning though ... my
reading of the valgrind trace and the code suggests that it should be here:
(from tokenizer.cpp which is autogenerated)

#line 47 "../css/tokenizer.flex"
{yyTok = STRING; return yyTok;}
        YY_BREAK
case 15:
/* rule 15 can match eol */
YY_RULE_SETUP

but I'm not seeing anything writing to memory there which is confusing...
Comment 6 Darin Adler 2009-10-27 11:54:57 PDT
To get a better understanding of what is going on you should look at the generated code, tokenizer.cpp, not the source code, tokenizer.flex. And tokenizer.cpp is included in CSSParser.cpp, which has other code you'll need to see to understand it.

As I said, the guarantee it won't run off the end of the buffer comes from the flex-generated state tables, and is not obvious from just reading the code.
Comment 7 Chris Evans 2009-10-27 12:43:32 PDT
It may not be a GCC4.4 optimization issue, but an issue with the lex code itself.

e.g. here's a crash report on Windows in ::lex()

http://crash/reportdetail?reportid=3b8c4b9de1f9b57a

It crashes hitting a page boundary - which is one possible symptom of an off-by-one. It's not a strong signal, though.
Comment 8 Evan Martin 2009-10-27 13:18:45 PDT
Paste of the backtrace from that Chrome bug for the benefit of the WebKit guys:
0x01dcdb73	 [chrome.dll	 - tokenizer.flex:21]	 WebCore::CSSParser::lex()
0x01dccbe5	 [chrome.dll	 - cssparser.cpp:4331]	 WebCore::CSSParser::lex(void *)
0x01e74aa3	 [chrome.dll	 - cssgrammar.cpp:2119]	 cssyyparse(void *)
0x01d86eca	 [chrome.dll	 - cssstylesheet.cpp:164]	 WebCore::CSSStyleSheet::parseString(WebCore::String const &,bool)
0x01cbfd02	 [chrome.dll	 - htmllinkelement.cpp:255]	 WebCore::HTMLLinkElement::setCSSStyleSheet(WebCore::String const &,WebCore::String const &,WebCore::CachedCSSStyleSheet const *)
Comment 9 Darin Adler 2009-10-27 13:24:36 PDT
An editorial comment: This lexer code dates back to very early in the project, and was done by Lars Knoll.

There is not necessarily a “WebKit guy” who’s already expert on this. So don’t wait for an expert to show up! Make yourself into one.
Comment 10 Evan Martin 2009-10-27 13:26:43 PDT
Yeah, I didn't mean to say I was pawning it off on you -- I just hate it when people paste bug tracker URLs to bug trackers that I can't see.  :)
Comment 11 Darin Adler 2009-10-27 13:32:18 PDT
(In reply to comment #4)
> (In reply to comment #3)
> > One additional fact I left out: it appears that the discriminating difference
> > to trigger this is GCC 4.4 rather than GCC 4.3.
> 
> That points to either a compiler bug or a side effect of more aggressive
> optimization, then.

(In reply to comment #7)
> It may not be a GCC4.4 optimization issue, but an issue with the lex code
> itself.

Agreed.

What I meant by “side effect of more aggressive optimization” was to raise the possibility GCC 4.4 might do a correct optimization that GCC 4.3 does not, and this means that incorrect code produced by lex has a symptom that was latent before.

If it showed up with GCC 4.3 as well, I would not suspect that, but since it does not, I do suspect it.

Has anyone reproduced this bug with an unoptimized debug build?
Comment 12 Craig Schlenter 2009-10-28 08:58:55 PDT
Joel's backtrace appears to have come from a debug build:

From valgrind-lapack.txt from http://code.google.com/p/chromium/issues/detail?id=23362#c41

tools/valgrind/valgrind.sh out/Debug/chrome --user-data-dir=/tmp/chrome http://www.cs.colorado.edu/~jessup/lapack/

Darin: would you consider a patch that cranks up the buffer size to work around this temporarily until we can make more sense of this?

I unfortunately can't reproduce this on my machine so debugging it is problematic :(
Comment 13 Darin Adler 2009-10-28 09:17:46 PDT
(In reply to comment #12)
> Darin: would you consider a patch that cranks up the buffer size to work around
> this temporarily until we can make more sense of this?

I’d like to hear your take on the pros and cons of taking that action.

If our primary goal is to diagnose and fully fix the problem, it’s possible that applying a patch that makes its symptoms go away for reasons we don’t fully understand may make our task more difficult rather than easier. Perhaps you have another goal?
Comment 14 Evan Martin 2009-10-28 09:22:37 PDT
Since this only affects people who build with GCC 4.4, at least in the Chrome world that means Linux distros, who already carry a variety of downstream patches.  I guess this ought to affect the other Linux WebKits as well; I don't know whether they also apply out-of-tree patches before releasing.
Comment 15 Craig Schlenter 2009-10-28 11:27:38 PDT
(In reply to comment #13)
> If our primary goal is to diagnose and fully fix the problem, it’s possible
> that applying a patch that makes its symptoms go away for reasons we don’t
> fully understand may make our task more difficult rather than easier. Perhaps
> you have another goal?

My goal was to stop possible crashes in the field without requiring package builders using gcc 4.4 to carry extra patches until the problem is properly resolved. Part of the rationale for suggesting this was that it looks like the problem might take a while to solve and I was anticipating that when debugging the problem it would be easy to revert the patch locally if desired.

But if you feel that approach is undesirable, it's easy enough for the packagers to include a local patch for now...
Comment 16 Darin Adler 2009-10-28 12:34:57 PDT
(In reply to comment #15)
> My goal was to stop possible crashes in the field without requiring package
> builders using gcc 4.4 to carry extra patches until the problem is properly
> resolved.

Seems OK to me, although a bit sloppy.

> Part of the rationale for suggesting this was that it looks like the
> problem might take a while to solve and I was anticipating that when debugging
> the problem it would be easy to revert the patch locally if desired.

I agree this won’t cause much difficulty debugging the problem.

Typically we hold off on making a change that just makes a bug go away until we understand why it does. Maybe this is a special case.
Comment 17 Evan Martin 2009-10-30 15:54:35 PDT
After some quality time with Valgrind, I have a fix; will post a patch and an explanation in a bit.  Going to play around a bit to see if I can make a reduced test case.
Comment 18 Evan Martin 2009-10-30 16:09:45 PDT
Created attachment 42241 [details]
repro of the crasher

The attached test reliably crashes on my 64-bit machine with GCC 4.4.

If anyone's on a 32-bit machine that has reproduced the crasher, can you verify that this also crashes for you?  If not, can you try adding more letters (one at a time) to the "tenbytes" seen in the style tag and tell me if there's any other number of bytes that better triggers the crash for you?
Comment 19 Evan Martin 2009-10-30 16:31:34 PDT
Ok, so here's what's going on.  Darin's guess seems right on.

We run flex to generate a lexer, then the perl script "maketokenizer" to munge the flex output into something WebKit can use -- for example, the character buffer contains UChars, not chars.

maketokenizer is pretty magical; part of my patch is to add a few more comments to it.  One of the things it does is remove the flex end-of-buffer handling code, which is 20+ lines of code that normally tries to go on to the next "input file" (which doesn't make sense within WebKit).

flex hits the end of buffer state when the input hits two consecutive NULs.  The first point where Valgrind complained was that we were reading one character (two bytes -- that's why the valgrind complaints we were seeing involved 2 bytes instead of one, because characters are two bytes) past the end of the buffer.  Some careful reading of the flex end of buffer code reveals that it backs up by one byte in the EOB case -- I imagine this is because we still want to point at the trailing two NULs if we attempt to resume lexing on the same buffer.

I'll attach a patch that fixes my layout test, but I'm not perfectly confident about it.  flex has a lot of magical pointer variables and they're all not clear.  I'll paste the normal flex end of buffer code below -- the key observation to make is that yy_amount_of_matched_text gets computed with an extra -1, but then yy_c_buf_p gets updated to be yytext + yy_amount_of_matched_text without adding in the -1.

I don't understand flex at all, but yy_cp is a variable that starts at yy_c_buf_p and moves forward during the loop, the latter is the one that persists.

My patch does fix the crash, though I'm not certain it's correct or that it'll work on other systems.  Needs more experimentation.

	case YY_END_OF_BUFFER:
		{
		/* Amount of text matched not including the EOB char. */
		int yy_amount_of_matched_text = (int) (yy_cp - (yytext_ptr)) - 1;

		/* Undo the effects of YY_DO_BEFORE_ACTION. */
		*yy_cp = (yy_hold_char);
		YY_RESTORE_YY_MORE_OFFSET

		if ( YY_CURRENT_BUFFER_LVALUE->yy_buffer_status == YY_BUFFER_NEW )
			{
			/* We're scanning a new file or input source.  It's
			 * possible that this happened because the user
			 * just pointed yyin at a new source and called
			 * yylex().  If so, then we have to assure
			 * consistency between YY_CURRENT_BUFFER and our
			 * globals.  Here is the right place to do so, because
			 * this is the first action (other than possibly a
			 * back-up) that will match for the new input source.
			 */
			(yy_n_chars) = YY_CURRENT_BUFFER_LVALUE->yy_n_chars;
			YY_CURRENT_BUFFER_LVALUE->yy_input_file = yyin;
			YY_CURRENT_BUFFER_LVALUE->yy_buffer_status = YY_BUFFER_NORMAL;
			}

		/* Note that here we test for yy_c_buf_p "<=" to the position
		 * of the first EOB in the buffer, since yy_c_buf_p will
		 * already have been incremented past the NUL character
		 * (since all states make transitions on EOB to the
		 * end-of-buffer state).  Contrast this with the test
		 * in input().
		 */
		if ( (yy_c_buf_p) <= &YY_CURRENT_BUFFER_LVALUE->yy_ch_buf[(yy_n_chars)] )
			{ /* This was really a NUL. */
			yy_state_type yy_next_state;

			(yy_c_buf_p) = (yytext_ptr) + yy_amount_of_matched_text;
Comment 20 Evan Martin 2009-10-30 16:33:04 PDT
Created attachment 42244 [details]
patch and layout test
Comment 21 Darin Adler 2009-10-30 16:40:16 PDT
Comment on attachment 42244 [details]
patch and layout test

Next time, please set review? on a patch to indicate you'd like review. Unless you intentionally left that out here.

> +<style>tenbytes {</style>
> +
> +This test tickles a subtle off-by-one bug in how WebKit's CSS lexer
> +handles end of buffer conditions.<p>
> +
> +The contents of the style tag satisfy (length mod 8 = 2) and contain an
> +unclosed curly brace.  We pass if we don't crash.<p>
> +
> +PASS

The use of <p> here is incorrect. It's a container, not a separator. <p> goes at the start of a paragraph and then </p> at the end of it.

This test needs to have some code to make it dump text-only results so we have a platform-independent test. Like this:

    <script>
    if (window.layoutTestController)
        layoutTestController.dumpAsText();
    </script>

The patch needs to include expected results for the test, generated by running the run-webkit-tests script.

Patch otherwise looks very good. Great fix!

review- because the test needs results
Comment 22 Evan Martin 2009-10-30 19:47:42 PDT
I didn't set review? because I knew I needed to do more work before it was good to go, like test baselines and such, and I was running out the door.  Not quite sure what the proper protocol for "this isn't good enough yet, but what do you think?" is.

I'll make a proper patch on Monday.  If anyone reading this can try the test attached to this bug on a 32-bit system where they encountered the crash, I'd be interested to hear positive/negative results.
Comment 23 Darin Adler 2009-10-31 15:55:46 PDT
(In reply to comment #22)
> Not quite
> sure what the proper protocol for "this isn't good enough yet, but what do you
> think?" is.

Comment in the bug or email.
Comment 24 Evan Martin 2009-11-02 12:23:59 PST
Created attachment 42334 [details]
patch and layout test
Comment 25 Evan Martin 2009-11-02 12:25:22 PST
Comment on attachment 42334 [details]
patch and layout test

I'm not perfectly happy with this test (it only seems to repro on GCC 4.4 + 64-bit, not even 32-bit) but I have confirmed that I get a crash with the patch and no crash without.
Comment 26 Darin Adler 2009-11-02 13:22:19 PST
(In reply to comment #25)
> I'm not perfectly happy with this test (it only seems to repro on GCC 4.4 +
> 64-bit, not even 32-bit) but I have confirmed that I get a crash with the patch
> and no crash without.

One think that might make the test more valuable is to find a way to introduce bounds checking assertions that check the value of the buffer pointer at key points. With that, a debug build and this test case could demonstrate the buffer overrun even in cases where it's not causing a crash.
Comment 27 WebKit Commit Bot 2009-11-03 10:37:47 PST
Comment on attachment 42334 [details]
patch and layout test

Clearing flags on attachment: 42334

Committed r50467: <http://trac.webkit.org/changeset/50467>
Comment 28 WebKit Commit Bot 2009-11-03 10:37:54 PST
All reviewed patches have been landed.  Closing bug.