Consider the following HTML document:
<p>John said <q>Mary said <q>blah</q> to me.</q></p>
Both levels of quotes will be displayed as ASCII quotation marks, which is not right.
I think that we should use the system language as the last fallback for the document language, as this would be correct more often than not. Of course, the lang attribute or HTTP headers would still take precedence.
A related Chromium bug is https://code.google.com/p/chromium/issues/detail?id=179331
This should be pretty easy to do; we already track the system language in WebCore (on Mac).
(And many other WebKit-based browsers make the language user-configurable.)
Bug 18085 is also related. The precedence order would be:
1. explicit lang/xml:lang (and Content-Language)
2. charset to lang mapping
3. OS/System language (or UI language if the UI language is different from the OS/System language)
BTW, when we do step #2, we also have to take the system/UI language into account to resolve the ambiguity for charsets shared by several languages (ISO-8859-1, windows-1252: what to map to? 'en' is a good fallback, but if the system/UI language is French, 'fr' would be better).
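
To make the proposed order concrete, here is a minimal Python sketch (not WebKit code). The names resolve_language, primary_subtag, and CHARSET_LANGUAGES are hypothetical, and the table contents are made up for illustration; the real charset-to-language mapping would have to be worked out separately.

```python
from typing import Optional

# Hypothetical charset -> candidate languages table (illustrative only).
# The first entry of each list is the default when nothing breaks the tie.
CHARSET_LANGUAGES = {
    "shift_jis": ["ja"],
    "euc-kr": ["ko"],
    "iso-8859-1": ["en", "fr", "de", "es"],
    "windows-1252": ["en", "fr", "de", "es"],
}


def primary_subtag(tag: str) -> str:
    """Return the primary language subtag, e.g. 'fr' for 'fr-FR' or 'fr_FR'."""
    return tag.replace("_", "-").split("-")[0].lower()


def resolve_language(explicit_lang: Optional[str],
                     charset: Optional[str],
                     system_lang: str) -> str:
    # Step 1: explicit lang/xml:lang (or Content-Language) always wins.
    if explicit_lang:
        return explicit_lang

    # Step 2: charset-to-language mapping.
    candidates = CHARSET_LANGUAGES.get((charset or "").lower(), [])
    if len(candidates) == 1:
        return candidates[0]
    if candidates:
        # Ambiguous charset: prefer the system/UI language if it is a candidate,
        # otherwise fall back to the first (default) candidate.
        system_primary = primary_subtag(system_lang)
        if system_primary in candidates:
            return system_primary
        return candidates[0]

    # Step 3: nothing else is known, so use the system/UI language.
    return system_lang
```

With this sketch, a windows-1252 page with no lang attribute resolves to 'fr' on a French system and to 'en' on a Japanese one, while a UTF-8 page with no lang attribute simply gets the system language.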