Bug 52036

Summary: Feed libxml2 with raw data, relying on it to do character set decoding
Product: WebKit
Reporter: Patrick R. Gansterer <paroga>
Component: XML
Assignee: Patrick R. Gansterer <paroga>
Status: NEW
Severity: Normal
CC: ap, darin
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: All   
OS: All   
Bug Depends on: 52547, 52085, 53398    
Bug Blocks: 43085    
Attachments:
Description       Flags
Work in progress  none
Work in progress  none

Description Patrick R. Gansterer 2011-01-06 17:20:20 PST
Created attachment 78193 [details]
Work in progress

I've created a patch from the work I've done so far. Maybe you can give me some early feedback.
At the moment only about 5 tests fail, because of missing encoding support. (In its current state the patch doesn't teach libxml2 about all known TextEncodings.)

XMLDocumentParser is a subclass of ScriptableDocumentParser, which is a subclass of DecodedDataDocumentParser.
Normally DecodedDataDocumentParser handles the appendBytes method; I've overridden it in XMLDocumentParser to get at the raw data.
IMHO this is a kind of "layer violation". Can you give me a tip on how to implement this correctly? Do I need to change the inheritance of all the "parser classes"?
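
To make the concern concrete, here is a minimal sketch of the inheritance chain described above. The class names match the ones mentioned, but the bodies are simplified stand-ins and not the real WebKit declarations: the decode() helper, the rawChunks and decodedCalls members, and the DocumentParser base as written here are assumptions for illustration only. It shows how overriding appendBytes in the most derived class skips the decoding step that the intermediate layer exists to provide.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in: the entry point that receives raw bytes.
class DocumentParser {
public:
    virtual ~DocumentParser() = default;
    virtual void appendBytes(const char* data, std::size_t length) = 0;
};

// This layer normally owns appendBytes: it decodes the bytes and
// forwards decoded text to the subclass via append().
class DecodedDataDocumentParser : public DocumentParser {
public:
    void appendBytes(const char* data, std::size_t length) override
    {
        append(decode(data, length));
    }
    virtual void append(const std::string& decodedText) = 0;

protected:
    // Hypothetical placeholder for character set decoding (a plain copy here).
    static std::string decode(const char* data, std::size_t length)
    {
        return std::string(data, length);
    }
};

class ScriptableDocumentParser : public DecodedDataDocumentParser { };

class XMLDocumentParser : public ScriptableDocumentParser {
public:
    std::vector<std::string> rawChunks; // illustration only
    int decodedCalls = 0;               // illustration only

    // The override in question: raw bytes are taken here, so the decoding
    // done by DecodedDataDocumentParser never runs. In the patch these bytes
    // would go to libxml2 (for example through xmlParseChunk).
    void appendBytes(const char* data, std::size_t length) override
    {
        assert(data != nullptr);
        rawChunks.emplace_back(data, length);
    }

    // Still required by the base class, but no longer reached via appendBytes.
    void append(const std::string&) override { ++decodedCalls; }
};
```

Calling appendBytes through a DocumentParser reference lands in the XMLDocumentParser override, so append() is never invoked. That is why the class ends up inheriting from a "decoded data" parser without using its decoding.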
Comment 1 Patrick R. Gansterer 2011-01-16 17:01:15 PST
Created attachment 79118 [details]
Work in progress

I did some small performance tests (see bug 52547) with this new patch:

                  avg      median  stdev  min   max
HTML              6517.25  6770.5  505    5242  7286
XML (original)    6254.5   6366    573    5462  7118
XML (with patch)  5735.45  5385.5  704    4159  6853
Change            -8.3%    -15.4%
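
For reference, the percentages in the last row are the relative change of the "XML (with patch)" row against the "XML (original)" row. A small sketch of that arithmetic follows; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `patched` versus `baseline`, in percent.
double percentChange(double baseline, double patched)
{
    assert(baseline != 0.0);
    return (patched - baseline) / baseline * 100.0;
}

// Same value rounded to one decimal place, as shown in the table.
double percentChangeRounded(double baseline, double patched)
{
    return std::round(percentChange(baseline, patched) * 10.0) / 10.0;
}
```

With the numbers above, percentChangeRounded(6254.5, 5735.45) gives -8.3 for the average and percentChangeRounded(6366, 5385.5) gives -15.4 for the median.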