Bug 34489 - [Qt] Text codec lookup is slow
Summary: [Qt] Text codec lookup is slow
Status: RESOLVED INVALID
Alias: None
Product: WebKit
Classification: Unclassified
Component: Text (show other bugs)
Version: 528+ (Nightly build)
Hardware: S60 Hardware S60 3rd edition
: P2 Major
Assignee: Nobody
URL:
Keywords: Performance, Qt
Depends on:
Blocks:
 
Reported: 2010-02-02 10:28 PST by David Leong
Modified: 2010-02-15 00:33 PST (History)
5 users (show)

See Also:


Attachments
Proposed text codec patch (1.20 KB, patch)
2010-02-02 10:28 PST, David Leong
no flags Details | Formatted Diff | Diff
Updated patch. (1.87 KB, patch)
2010-02-02 15:18 PST, David Leong
ariya.hidayat: review-
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description David Leong 2010-02-02 10:28:09 PST
Created attachment 47948 [details]
Proposed text codec patch

Qt's text codec implementation is slow when converting many strings of the same codec type. It creates a new QTextCodec for each string that it needs to convert, and internally does name comparisons against a list of codecs and aliases. 

We have a use case where we are passing data from a NPPlugin to Javascript. Many of the NPAPIs internally converts strings and uses the TextCodec. We have found that the TextCodec construction is taking over 50% of the execution time.

We propose to add a fast hashmap lookup table to the TextCodec implementation. If the same codec is used repeatedly it can be quickly retrieved.
Comment 1 Laszlo Gombos 2010-02-02 10:34:06 PST
Needs Changelog, idea looks good to me.
Comment 2 Ariya Hidayat 2010-02-02 11:50:12 PST
Comment on attachment 47948 [details]
Proposed text codec patch

Hmm, seems the proper fix is to optimize QTextCodec.

Anyway, some comments:

> -    m_codec = QTextCodec::codecForName(m_encoding.name());
> +    // Very likely to reuse the same text codecs in frequent order,
> +    // QTextCodec is very slow with matching encoding name to codec, so we make a cache here
> +    // 
> +    const char* name = m_encoding.name();
> +    QString qName = QString::fromAscii(name);

This creates the unnecessary deep copy of name in the heap. However, I don't think using QLatin1String would work either since you potentially insert the encoding name later on to the cache.

> +    TextCodecHash::iterator it = staticTextCodecCache.find(qName);
> +    if( it != staticTextCodecCache.end() ) {
> +        m_codec = it.value();
> +        
> +    }
> +    else {
> +        m_codec = QTextCodec::codecForName(m_encoding.name());
> +        staticTextCodecCache.insert(qName, m_codec);
> +    }

A more readable version would be to use QHash::value(). Maybe something like:

m_codec = staticTextCodecCache.value(qName);
if (!m_coded) {
        m_codec = QTextCodec::codecForName(m_encoding.name());
       staticTextCodecCache.insert(qName, m_codec);
}

Also, any particular reason why the codec cache can't be a static member in TextCodecQt class instead of a global one like that?
Comment 3 David Leong 2010-02-02 13:00:42 PST
Thanks for the comments, I will rework this as suggested by Laszlo and Ariya.

QTextCodec internally already maintains a cache of text codecs. The performance problem is caused by the fact that QTextCodec has to check for the name and aliases associated with each codec in order to return from their cache. The proposed webkit local cache is for usage locality when we request the same codec over and over again. (I'll also log something to Qt if appropriate)

(In reply to comment #2)
> (From update of attachment 47948 [details])
> Hmm, seems the proper fix is to optimize QTextCodec.
> 
> Anyway, some comments:
> 
> > -    m_codec = QTextCodec::codecForName(m_encoding.name());
> > +    // Very likely to reuse the same text codecs in frequent order,
> > +    // QTextCodec is very slow with matching encoding name to codec, so we make a cache here
> > +    // 
> > +    const char* name = m_encoding.name();
> > +    QString qName = QString::fromAscii(name);
> 
> This creates the unnecessary deep copy of name in the heap. However, I don't
> think using QLatin1String would work either since you potentially insert the
> encoding name later on to the cache.

David: I originally wanted to use QString( char* ) to not deep copy the data but the API has been deprecate in Qt 4.6

> 
> > +    TextCodecHash::iterator it = staticTextCodecCache.find(qName);
> > +    if( it != staticTextCodecCache.end() ) {
> > +        m_codec = it.value();
> > +        
> > +    }
> > +    else {
> > +        m_codec = QTextCodec::codecForName(m_encoding.name());
> > +        staticTextCodecCache.insert(qName, m_codec);
> > +    }
> 
> A more readable version would be to use QHash::value(). Maybe something like:
> 
> m_codec = staticTextCodecCache.value(qName);
> if (!m_coded) {
>         m_codec = QTextCodec::codecForName(m_encoding.name());
>        staticTextCodecCache.insert(qName, m_codec);
> }
> 
> Also, any particular reason why the codec cache can't be a static member in
> TextCodecQt class instead of a global one like that?

David: OK, i will make the cache a static member in TextCodecQt and rework the other code.
Comment 4 David Leong 2010-02-02 15:18:50 PST
Created attachment 47978 [details]
Updated patch.

Updated the patch as per Laszlo and Ariya's suggestion. 

-Added change log
-Ran style check script
-Updated use of value() instead of iterator as per Ariya

I could not move the static to the class, it generated a missing symbol with RVCT 2.2.
Comment 5 Ariya Hidayat 2010-02-02 16:59:00 PST
Comment on attachment 47978 [details]
Updated patch.


> +        QtTextCodec creation is very slow
> +        https://bugs.webkit.org/show_bug.cgi?id=34489

Better to describe the solution rather than only the problem.
Maybe something like "[Qt] Text codec caching to speed up codec look-up".

> +typedef QHash<QString, QTextCodec*> TextCodecHash;
> +static TextCodecHash textCodecCache;

I guess static problem with RVCT 2.2 has been solved in the past. Maybe Laszlo knows this?

> +    // qDebug() << "TextCodecQt::TextCodecQt() - " << name;

Do we want this?

BTW, try to ask Simon about his opinion, he's the text expert :)
Comment 6 David Leong 2010-02-02 17:12:41 PST
Ok, Laszlo/I will ping Simon and then update the patch as needed.
Comment 7 Simon Hausmann 2010-02-03 02:06:30 PST
(In reply to comment #3)
> Thanks for the comments, I will rework this as suggested by Laszlo and Ariya.
> 
> QTextCodec internally already maintains a cache of text codecs. The performance
> problem is caused by the fact that QTextCodec has to check for the name and
> aliases associated with each codec in order to return from their cache. The
> proposed webkit local cache is for usage locality when we request the same
> codec over and over again. (I'll also log something to Qt if appropriate)

I have to agree with Ariya's earlier comment: This should be fixed in Qt, not worked around with increased memory usage in WebKit.

Depending on your use-case there may be different ways of fixing this. For example if in your case it's always the same codec that is requested, perhaps it would make most sense to cache the last returned codec inside of QTextCodec::codecByName and before traversing through the entire list, check if the requested codec is like the last one and return it.
Comment 8 David Leong 2010-02-03 11:48:25 PST
Logged to QT, http://bugreports.qt.nokia.com/browse/QTBUG-7888

Will follow up to see if Qt will accept the caching idea at the Qt level.
Comment 9 Holger Freyther 2010-02-03 12:00:32 PST
Please make your test case available. Without having a publicly available test case there is no point in optimizing. Once a benchmark for this issue exists we can start to look into solutions for the problem.
Comment 10 Benjamin Poulain 2010-02-03 12:11:58 PST
I close the task, no point of having it open in the list.

(In reply to comment #8)
> Logged to QT, http://bugreports.qt.nokia.com/browse/QTBUG-7888
> Will follow up to see if Qt will accept the caching idea at the Qt level.

Note that you can fix that in Qt without waiting for this task: http://qt.gitorious.org/
Comment 11 Simon Hausmann 2010-02-15 00:33:49 PST
This is fixed in Qt 4.6 now, Olivier added caching to QTextCodec directly (yay!)