<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://bugs.webkit.org/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4.1"
          urlbase="https://bugs.webkit.org/"
          
          maintainer="admin@webkit.org"
>

    <bug>
          <bug_id>34489</bug_id>
          
          <creation_ts>2010-02-02 10:28:09 -0800</creation_ts>
          <short_desc>[Qt] Text codec lookup is slow</short_desc>
          <delta_ts>2010-02-15 00:33:49 -0800</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebKit</product>
          <component>Text</component>
          <version>528+ (Nightly build)</version>
          <rep_platform>S60 Hardware</rep_platform>
          <op_sys>S60 3rd edition</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords>Performance, Qt</keywords>
          <priority>P2</priority>
          <bug_severity>Major</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>0</everconfirmed>
          <reporter name="David Leong">david.leong</reporter>
          <assigned_to name="Nobody">webkit-unassigned</assigned_to>
          <cc>benjamin</cc>
    
    <cc>christian.webkit</cc>
    
    <cc>hausmann</cc>
    
    <cc>laszlo.gombos</cc>
    
    <cc>zecke</cc>
          

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>187035</commentid>
    <comment_count>0</comment_count>
      <attachid>47948</attachid>
    <who name="David Leong">david.leong</who>
    <bug_when>2010-02-02 10:28:09 -0800</bug_when>
    <thetext>Created attachment 47948
Proposed text codec patch

Qt&apos;s text codec implementation is slow when converting many strings of the same codec type. It creates a new QTextCodec for each string that it needs to convert, and internally does name comparisons against a list of codecs and aliases. 

We have a use case where we are passing data from a NPPlugin to Javascript. Many of the NPAPIs internally converts strings and uses the TextCodec. We have found that the TextCodec construction is taking over 50% of the execution time.

We propose to add a fast hashmap lookup table to the TextCodec implementation. If the same codec is used repeatedly it can be quickly retrieved.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187039</commentid>
    <comment_count>1</comment_count>
    <who name="Laszlo Gombos">laszlo.gombos</who>
    <bug_when>2010-02-02 10:34:06 -0800</bug_when>
    <thetext>Needs Changelog, idea looks good to me.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187079</commentid>
    <comment_count>2</comment_count>
      <attachid>47948</attachid>
    <who name="Ariya Hidayat">ariya.hidayat</who>
    <bug_when>2010-02-02 11:50:12 -0800</bug_when>
    <thetext>Comment on attachment 47948
Proposed text codec patch

Hmm, seems the proper fix is to optimize QTextCodec.

Anyway, some comments:

&gt; -    m_codec = QTextCodec::codecForName(m_encoding.name());
&gt; +    // Very likely to reuse the same text codecs in frequent order,
&gt; +    // QTextCodec is very slow with matching encoding name to codec, so we make a cache here
&gt; +    // 
&gt; +    const char* name = m_encoding.name();
&gt; +    QString qName = QString::fromAscii(name);

This creates the unnecessary deep copy of name in the heap. However, I don&apos;t think using QLatin1String would work either since you potentially insert the encoding name later on to the cache.

&gt; +    TextCodecHash::iterator it = staticTextCodecCache.find(qName);
&gt; +    if( it != staticTextCodecCache.end() ) {
&gt; +        m_codec = it.value();
&gt; +        
&gt; +    }
&gt; +    else {
&gt; +        m_codec = QTextCodec::codecForName(m_encoding.name());
&gt; +        staticTextCodecCache.insert(qName, m_codec);
&gt; +    }

A more readable version would be to use QHash::value(). Maybe something like:

m_codec = staticTextCodecCache.value(qName);
if (!m_coded) {
        m_codec = QTextCodec::codecForName(m_encoding.name());
       staticTextCodecCache.insert(qName, m_codec);
}

Also, any particular reason why the codec cache can&apos;t be a static member in TextCodecQt class instead of a global one like that?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187105</commentid>
    <comment_count>3</comment_count>
    <who name="David Leong">david.leong</who>
    <bug_when>2010-02-02 13:00:42 -0800</bug_when>
    <thetext>Thanks for the comments, I will rework this as suggested by Laszlo and Ariya.

QTextCodec internally already maintains a cache of text codecs. The performance problem is caused by the fact that QTextCodec has to check for the name and aliases associated with each codec in order to return from their cache. The proposed webkit local cache is for usage locality when we request the same codec over and over again. (I&apos;ll also log something to Qt if appropriate)

(In reply to comment #2)
&gt; (From update of attachment 47948 [details])
&gt; Hmm, seems the proper fix is to optimize QTextCodec.
&gt; 
&gt; Anyway, some comments:
&gt; 
&gt; &gt; -    m_codec = QTextCodec::codecForName(m_encoding.name());
&gt; &gt; +    // Very likely to reuse the same text codecs in frequent order,
&gt; &gt; +    // QTextCodec is very slow with matching encoding name to codec, so we make a cache here
&gt; &gt; +    // 
&gt; &gt; +    const char* name = m_encoding.name();
&gt; &gt; +    QString qName = QString::fromAscii(name);
&gt; 
&gt; This creates the unnecessary deep copy of name in the heap. However, I don&apos;t
&gt; think using QLatin1String would work either since you potentially insert the
&gt; encoding name later on to the cache.

David: I originally wanted to use QString( char* ) to not deep copy the data but the API has been deprecate in Qt 4.6

&gt; 
&gt; &gt; +    TextCodecHash::iterator it = staticTextCodecCache.find(qName);
&gt; &gt; +    if( it != staticTextCodecCache.end() ) {
&gt; &gt; +        m_codec = it.value();
&gt; &gt; +        
&gt; &gt; +    }
&gt; &gt; +    else {
&gt; &gt; +        m_codec = QTextCodec::codecForName(m_encoding.name());
&gt; &gt; +        staticTextCodecCache.insert(qName, m_codec);
&gt; &gt; +    }
&gt; 
&gt; A more readable version would be to use QHash::value(). Maybe something like:
&gt; 
&gt; m_codec = staticTextCodecCache.value(qName);
&gt; if (!m_coded) {
&gt;         m_codec = QTextCodec::codecForName(m_encoding.name());
&gt;        staticTextCodecCache.insert(qName, m_codec);
&gt; }
&gt; 
&gt; Also, any particular reason why the codec cache can&apos;t be a static member in
&gt; TextCodecQt class instead of a global one like that?

David: OK, i will make the cache a static member in TextCodecQt and rework the other code.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187175</commentid>
    <comment_count>4</comment_count>
      <attachid>47978</attachid>
    <who name="David Leong">david.leong</who>
    <bug_when>2010-02-02 15:18:50 -0800</bug_when>
    <thetext>Created attachment 47978
Updated patch.

Updated the patch as per Laszlo and Ariya&apos;s suggestion. 

-Added change log
-Ran style check script
-Updated use of value() instead of iterator as per Ariya

I could not move the static to the class, it generated a missing symbol with RVCT 2.2.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187207</commentid>
    <comment_count>5</comment_count>
      <attachid>47978</attachid>
    <who name="Ariya Hidayat">ariya.hidayat</who>
    <bug_when>2010-02-02 16:59:00 -0800</bug_when>
    <thetext>Comment on attachment 47978
Updated patch.


&gt; +        QtTextCodec creation is very slow
&gt; +        https://bugs.webkit.org/show_bug.cgi?id=34489

Better to describe the solution rather than only the problem.
Maybe something like &quot;[Qt] Text codec caching to speed up codec look-up&quot;.

&gt; +typedef QHash&lt;QString, QTextCodec*&gt; TextCodecHash;
&gt; +static TextCodecHash textCodecCache;

I guess static problem with RVCT 2.2 has been solved in the past. Maybe Laszlo knows this?

&gt; +    // qDebug() &lt;&lt; &quot;TextCodecQt::TextCodecQt() - &quot; &lt;&lt; name;

Do we want this?

BTW, try to ask Simon about his opinion, he&apos;s the text expert :)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187209</commentid>
    <comment_count>6</comment_count>
    <who name="David Leong">david.leong</who>
    <bug_when>2010-02-02 17:12:41 -0800</bug_when>
    <thetext>Ok, Laszlo/I will ping Simon and then update the patch as needed.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187297</commentid>
    <comment_count>7</comment_count>
    <who name="Simon Hausmann">hausmann</who>
    <bug_when>2010-02-03 02:06:30 -0800</bug_when>
    <thetext>(In reply to comment #3)
&gt; Thanks for the comments, I will rework this as suggested by Laszlo and Ariya.
&gt; 
&gt; QTextCodec internally already maintains a cache of text codecs. The performance
&gt; problem is caused by the fact that QTextCodec has to check for the name and
&gt; aliases associated with each codec in order to return from their cache. The
&gt; proposed webkit local cache is for usage locality when we request the same
&gt; codec over and over again. (I&apos;ll also log something to Qt if appropriate)

I have to agree with Ariya&apos;s earlier comment: This should be fixed in Qt, not worked around with increased memory usage in WebKit.

Depending on your use-case there may be different ways of fixing this. For example if in your case it&apos;s always the same codec that is requested, perhaps it would make most sense to cache the last returned codec inside of QTextCodec::codecByName and before traversing through the entire list, check if the requested codec is like the last one and return it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187459</commentid>
    <comment_count>8</comment_count>
    <who name="David Leong">david.leong</who>
    <bug_when>2010-02-03 11:48:25 -0800</bug_when>
    <thetext>Logged to QT, http://bugreports.qt.nokia.com/browse/QTBUG-7888

Will follow up to see if Qt will accept the caching idea at the Qt level.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187462</commentid>
    <comment_count>9</comment_count>
    <who name="Holger Freyther">zecke</who>
    <bug_when>2010-02-03 12:00:32 -0800</bug_when>
    <thetext>Please make your test case available. Without having a publicly available test case there is no point in optimizing. Once a benchmark for this issue exists we can start to look into solutions for the problem.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>187464</commentid>
    <comment_count>10</comment_count>
    <who name="Benjamin Poulain">benjamin</who>
    <bug_when>2010-02-03 12:11:58 -0800</bug_when>
    <thetext>I close the task, no point of having it open in the list.

(In reply to comment #8)
&gt; Logged to QT, http://bugreports.qt.nokia.com/browse/QTBUG-7888
&gt; Will follow up to see if Qt will accept the caching idea at the Qt level.

Note that you can fix that in Qt without waiting for this task: http://qt.gitorious.org/</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>190392</commentid>
    <comment_count>11</comment_count>
    <who name="Simon Hausmann">hausmann</who>
    <bug_when>2010-02-15 00:33:49 -0800</bug_when>
    <thetext>This is fixed in Qt 4.6 now, Olivier added caching to QTextCodec directly (yay!)</thetext>
  </long_desc>
      
          <attachment
              isobsolete="1"
              ispatch="1"
              isprivate="0"
          >
            <attachid>47948</attachid>
            <date>2010-02-02 10:28:09 -0800</date>
            <delta_ts>2010-02-02 15:18:50 -0800</delta_ts>
            <desc>Proposed text codec patch</desc>
            <filename>textcodec.patch</filename>
            <type>text/plain</type>
            <size>1225</size>
            <attacher name="David Leong">david.leong</attacher>
            
              <data encoding="base64">SW5kZXg6IHBsYXRmb3JtL3RleHQvcXQvVGV4dENvZGVjUXQuY3BwDQo9PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09DQotLS0g
cGxhdGZvcm0vdGV4dC9xdC9UZXh0Q29kZWNRdC5jcHAJKHJldmlzaW9uIDU0MjQxKQ0KKysrIHBs
YXRmb3JtL3RleHQvcXQvVGV4dENvZGVjUXQuY3BwCSh3b3JraW5nIGNvcHkpDQpAQCAtMzQsNiAr
MzQsOCBAQA0KIG5hbWVzcGFjZSBXZWJDb3JlIHsKIAogc3RhdGljIFFTZXQ8UUJ5dGVBcnJheT4g
KnVuaXF1ZV9uYW1lcyA9IDA7Cit0eXBlZGVmIFFIYXNoPFFTdHJpbmcsIFFUZXh0Q29kZWMqPiBU
ZXh0Q29kZWNIYXNoOworc3RhdGljIFRleHRDb2RlY0hhc2ggc3RhdGljVGV4dENvZGVjQ2FjaGU7
CiAKIHN0YXRpYyBjb25zdCBjaGFyICpnZXRBdG9taWNOYW1lKGNvbnN0IFFCeXRlQXJyYXkgJm5h
bWUpCiB7CkBAIC04NCw3ICs4NiwyMCBAQA0KIFRleHRDb2RlY1F0OjpUZXh0Q29kZWNRdChjb25z
dCBUZXh0RW5jb2RpbmcmIGVuY29kaW5nKQogICAgIDogbV9lbmNvZGluZyhlbmNvZGluZykKIHsK
LSAgICBtX2NvZGVjID0gUVRleHRDb2RlYzo6Y29kZWNGb3JOYW1lKG1fZW5jb2RpbmcubmFtZSgp
KTsKKyAgICAvLyBWZXJ5IGxpa2VseSB0byByZXVzZSB0aGUgc2FtZSB0ZXh0IGNvZGVjcyBpbiBm
cmVxdWVudCBvcmRlciwKKyAgICAvLyBRVGV4dENvZGVjIGlzIHZlcnkgc2xvdyB3aXRoIG1hdGNo
aW5nIGVuY29kaW5nIG5hbWUgdG8gY29kZWMsIHNvIHdlIG1ha2UgYSBjYWNoZSBoZXJlCisgICAg
Ly8gCisgICAgY29uc3QgY2hhciogbmFtZSA9IG1fZW5jb2RpbmcubmFtZSgpOworICAgIFFTdHJp
bmcgcU5hbWUgPSBRU3RyaW5nOjpmcm9tQXNjaWkobmFtZSk7CisgICAgVGV4dENvZGVjSGFzaDo6
aXRlcmF0b3IgaXQgPSBzdGF0aWNUZXh0Q29kZWNDYWNoZS5maW5kKHFOYW1lKTsKKyAgICBpZigg
aXQgIT0gc3RhdGljVGV4dENvZGVjQ2FjaGUuZW5kKCkgKSB7CisgICAgICAgIG1fY29kZWMgPSBp
dC52YWx1ZSgpOworICAgICAgICAKKyAgICB9CisgICAgZWxzZSB7CisgICAgICAgIG1fY29kZWMg
PSBRVGV4dENvZGVjOjpjb2RlY0Zvck5hbWUobV9lbmNvZGluZy5uYW1lKCkpOworICAgICAgICBz
dGF0aWNUZXh0Q29kZWNDYWNoZS5pbnNlcnQocU5hbWUsIG1fY29kZWMpOworICAgIH0KIH0KIAog
VGV4dENvZGVjUXQ6On5UZXh0Q29kZWNRdCgpCg==
</data>

          </attachment>
          <attachment
              isobsolete="0"
              ispatch="1"
              isprivate="0"
          >
            <attachid>47978</attachid>
            <date>2010-02-02 15:18:50 -0800</date>
            <delta_ts>2010-02-02 16:58:59 -0800</delta_ts>
            <desc>Updated patch.</desc>
            <filename>textCodecPatch.patch</filename>
            <type>text/plain</type>
            <size>1914</size>
            <attacher name="David Leong">david.leong</attacher>
            
              <data encoding="base64">SW5kZXg6IFdlYkNvcmUvQ2hhbmdlTG9nCj09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT0KLS0tIFdlYkNvcmUvQ2hhbmdlTG9n
CShyZXZpc2lvbiA1NDI0OSkKKysrIFdlYkNvcmUvQ2hhbmdlTG9nCSh3b3JraW5nIGNvcHkpCkBA
IC0xLDMgKzEsMTMgQEAKKzIwMTAtMDItMDIgIERhdmlkIExlb25nICA8ZGF2aWQubGVvbmdAbm9r
aWEuY29tPgorCisgICAgICAgIFJldmlld2VkIGJ5IE5PQk9EWSAoT09QUyEpLgorCisgICAgICAg
IFF0VGV4dENvZGVjIGNyZWF0aW9uIGlzIHZlcnkgc2xvdworICAgICAgICBodHRwczovL2J1Z3Mu
d2Via2l0Lm9yZy9zaG93X2J1Zy5jZ2k/aWQ9MzQ0ODkKKworICAgICAgICAqIHBsYXRmb3JtL3Rl
eHQvcXQvVGV4dENvZGVjUXQuY3BwOgorICAgICAgICAoV2ViQ29yZTo6VGV4dENvZGVjUXQ6OlRl
eHRDb2RlY1F0KTogQWRkZWQgYSBsb2NhbCBjYWNoZSB0byBzdG9yZSBRVGV4dENvZGVjIHJlc3Vs
dHMKKwogMjAxMC0wMi0wMiAgQWRhbSBSb2JlbiAgPGFyb2JlbkBhcHBsZS5jb20+CiAKICAgICAg
ICAgQ29weSBXZWJDb3JlJ3MgYmluZGluZ3MgZ2VuZXJhdGlvbiBzY3JpcHRzIHRvIGEgbW9yZSBz
ZW5zaWJsZSBsb2NhdGlvbgpJbmRleDogV2ViQ29yZS9wbGF0Zm9ybS90ZXh0L3F0L1RleHRDb2Rl
Y1F0LmNwcAo9PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09PT09
PT09PT09PT09PT09PT09PT09Ci0tLSBXZWJDb3JlL3BsYXRmb3JtL3RleHQvcXQvVGV4dENvZGVj
UXQuY3BwCShyZXZpc2lvbiA1NDI0OSkKKysrIFdlYkNvcmUvcGxhdGZvcm0vdGV4dC9xdC9UZXh0
Q29kZWNRdC5jcHAJKHdvcmtpbmcgY29weSkKQEAgLTM0LDYgKzM0LDggQEAKIG5hbWVzcGFjZSBX
ZWJDb3JlIHsKIAogc3RhdGljIFFTZXQ8UUJ5dGVBcnJheT4gKnVuaXF1ZV9uYW1lcyA9IDA7Cit0
eXBlZGVmIFFIYXNoPFFTdHJpbmcsIFFUZXh0Q29kZWMqPiBUZXh0Q29kZWNIYXNoOworc3RhdGlj
IFRleHRDb2RlY0hhc2ggdGV4dENvZGVjQ2FjaGU7CiAKIHN0YXRpYyBjb25zdCBjaGFyICpnZXRB
dG9taWNOYW1lKGNvbnN0IFFCeXRlQXJyYXkgJm5hbWUpCiB7CkBAIC04Miw5ICs4NCwxOSBAQCB2
b2lkIFRleHRDb2RlY1F0OjpyZWdpc3RlckNvZGVjcyhUZXh0Q29kCiB9CiAKIFRleHRDb2RlY1F0
OjpUZXh0Q29kZWNRdChjb25zdCBUZXh0RW5jb2RpbmcmIGVuY29kaW5nKQotICAgIDogbV9lbmNv
ZGluZyhlbmNvZGluZykKKyAgICA6IG1fZW5jb2RpbmcoZW5jb2RpbmcpLCBtX2NvZGVjKDApCiB7
Ci0gICAgbV9jb2RlYyA9IFFUZXh0Q29kZWM6OmNvZGVjRm9yTmFtZShtX2VuY29kaW5nLm5hbWUo
KSk7CisgICAgLy8gVmVyeSBsaWtlbHkgdG8gcmV1c2UgdGhlIHNhbWUgdGV4dCBjb2RlY3MgaW4g
ZnJlcXVlbnQgb3JkZXIsCisgICAgLy8gUVRleHRDb2RlYyBpcyB2ZXJ5IHNsb3cgd2l0aCBtYXRj
aGluZyBlbmNvZGluZyBuYW1lIHRvIGNvZGVjLCBzbyB3ZSBtYWtlIGEgY2FjaGUgaGVyZQorICAg
IC8vCisgICAgY29uc3QgY2hhciogbmFtZSA9IG1fZW5jb2RpbmcubmFtZSgpOworICAgIFFTdHJp
bmcgcU5hbWUgPSBRU3RyaW5nOjpmcm9tQXNjaWkobmFtZSk7CisgICAgLy8gcURlYnVnKCkgPDwg
IlRleHRDb2RlY1F0OjpUZXh0Q29kZWNRdCgpIC0gIiA8PCBuYW1lOworICAgIG1fY29kZWMgPSB0
ZXh0Q29kZWNDYWNoZS52YWx1ZShxTmFtZSk7CisgICAgaWYgKCAhbV9jb2RlYyApIHsKKyAgICAg
ICAgbV9jb2RlYyA9IFFUZXh0Q29kZWM6OmNvZGVjRm9yTmFtZShtX2VuY29kaW5nLm5hbWUoKSk7
CisgICAgICAgIHRleHRDb2RlY0NhY2hlLmluc2VydChxTmFtZSwgbV9jb2RlYyk7CisgICAgfQog
fQogCiBUZXh0Q29kZWNRdDo6flRleHRDb2RlY1F0KCkK
</data>
<flag name="review"
          id="30567"
          type_id="1"
          status="-"
          setter="ariya.hidayat"
    />
          </attachment>
      

    </bug>

</bugzilla>