Bug 106919

Summary: Make AtomicMarkupTokenBase use a bare UChar* for external characters
Product: WebKit Reporter: Tony Gentilcore <tonyg>
Component: New BugsAssignee: Tony Gentilcore <tonyg>
Status: RESOLVED FIXED    
Severity: Normal CC: abarth, benjamin, msaboff, ojan.autocc, webkit.review.bot
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 106127    
Attachments:
Description Flags
Patch none

Description Tony Gentilcore 2013-01-15 10:06:07 PST
Make AtomicMarkupTokenBase use a bare UChar* for external characters
Comment 1 Tony Gentilcore 2013-01-15 10:07:26 PST
Created attachment 182800 [details]
Patch
Comment 2 Eric Seidel (no email) 2013-01-15 10:35:36 PST
Comment on attachment 182800 [details]
Patch

ok.
Comment 3 WebKit Review Bot 2013-01-15 11:07:51 PST
Comment on attachment 182800 [details]
Patch

Clearing flags on attachment: 182800

Committed r139760: <http://trac.webkit.org/changeset/139760>
Comment 4 WebKit Review Bot 2013-01-15 11:07:54 PST
All reviewed patches have been landed.  Closing bug.
Comment 5 Benjamin Poulain 2013-01-15 23:33:12 PST
Comment on attachment 182800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=182800&action=review

> Source/WebCore/xml/parser/MarkupTokenBase.h:529
> +    size_t charactersLength() const
> +    {
> +        ASSERT(m_type == Token::Type::Character);
> +        return m_externalCharactersLength;
>      }

While you are at it: this would be better as unsigned instead of size_t.
Character buffers are usually unsigned (WTFString included).

> Source/WebCore/xml/parser/MarkupTokenBase.h:588
> +    size_t m_externalCharactersLength;

ditto.
Comment 6 Tony Gentilcore 2013-01-16 10:37:44 PST
Comment on attachment 182800 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=182800&action=review

>> Source/WebCore/xml/parser/MarkupTokenBase.h:529
>>      }
> 
> While you are at it: this would be better as unsigned instead of size_t.
> Character buffers are usually unsigned (WTFString included).

abarth tells me that on a 64bit system, unsigned will still be 32 bits but size_t will be 64. Since Vectors are arbitrary length, using size_t is appropriate here.
Comment 7 Adam Barth 2013-01-16 11:02:15 PST
I wonder if we should crash if we get a character token that's larger than 32 bits.  I bet we do later anyway due to overflow checks in StringImpl.
Comment 8 Benjamin Poulain 2013-01-16 13:44:21 PST
> abarth tells me that on a 64bit system, unsigned will still be 32 bits but size_t will be 64. 

That is correct.

> Since Vectors are arbitrary length, using size_t is appropriate here.

Well, that's all nice when creating the Token, but the method you call will get you un trouble. E.g.:
    String characters = String(token->characters(), token->charactersLength());

Your patch changed from using Vector types to Character types, which is why I suggest we move to string conventions.


> I wonder if we should crash if we get a character token that's larger than 32 bits.  I bet we do later anyway due to overflow checks in StringImpl.

Yep, I think you are right.
Seeing what is here, I think that's the way to make this correct.
Comment 9 Adam Barth 2013-01-16 14:22:24 PST
Yeah.  Would you be willing to file a bug about this issue?  The issue pre-dates this patch since the old code used Vector::size as well.