Bug 20027 - Keyup/keydown keyIdentifier assignments are not W3C compliant
Summary: Keyup/keydown keyIdentifier assignments are not W3C compliant
Status: NEW
Alias: None
Product: WebKit
Classification: Unclassified
Component: DOM
Version: 528+ (Nightly build)
Hardware: PC Windows XP
Importance: P2 Normal
Assignee: Nobody
URL:
Keywords: HasReduction, InRadar
Depends on:
Blocks:
 
Reported: 2008-07-13 07:12 PDT by Allan Jacobs
Modified: 2016-09-30 16:13 PDT
CC List: 6 users

See Also:


Attachments
Keycode to keyIdentifier mapping for Windows keyboard layouts (419.86 KB, text/plain)
2008-07-13 07:15 PDT, Allan Jacobs
Testcase demonstrating keyIdentifier assignments (5.13 KB, text/html)
2008-07-13 09:28 PDT, Allan Jacobs
Keycode/charIdentifier to w3cIdentifier assignment on Ubuntu Linux 9.10 / Chrome 4.0.249.4. (7.76 MB, text/plain)
2009-11-22 12:11 PST, Allan Jacobs
SQL to assign w3c identifiers to keyboard events. (235.77 KB, text/x-sql)
2009-11-22 12:19 PST, Allan Jacobs

Description Allan Jacobs 2008-07-13 07:12:53 PDT
Safari on Windows XP attaches an attribute named keyIdentifier to keyboard Event objects.  This is clearly an attempt to implement part of the behavior of the textInput event type as described in the W3C DOM Level 3 Events specification.  The intent was good, but the implementation does not comply with the specification.

On all layouts, Safari simply prepends "U+" to the keyCode written in hex.  This happens to work for digits and unaccented Latin letters, whose keyCodes equal the ASCII codes of 0 through 9 and A through Z, but it is not correct for thousands of other keys.  The consequences for web development are severe for the keys on the far right of the keyboard (the so-called OEM keys), which are often associated with punctuation characters.  It also makes coding much less natural for languages other than English.
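A minimal TypeScript sketch of the behavior described above, for illustration only: the function name legacyKeyIdentifier is hypothetical, and the code models the reported output, not Safari's actual source.  It shows why the result looks right for letters and digits but not for an OEM key such as keyCode 186 (the ';' key on a US layout), which comes out as U+00BA, a character that key never produces.

```typescript
// Hypothetical model of the reported behavior: "U+" followed by the
// keyCode written as (at least) four upper-case hex digits.
function legacyKeyIdentifier(keyCode: number): string {
  const hex = keyCode.toString(16).toUpperCase();
  return "U+" + "0000".slice(hex.length) + hex;
}

// Digits and unaccented Latin letters happen to come out right, because their
// keyCodes equal the ASCII code points of "0"-"9" and "A"-"Z".
console.log(legacyKeyIdentifier(65));  // U+0041, even on Arabic (101), where the key types U+0634
console.log(legacyKeyIdentifier(186)); // U+00BA, although the US ';' key types ';' or ':'
```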

This bug report addresses the hard part of the problem:  what to do with keys that are used for letters and digits.  Control keys (such as the arrow keys or Ctrl) are not addressed.

The keyIdentifier should be a function of both layout and keycode.  The keyIdentifier should not simply be the keycode converted into hex.  The W3C specification says that the assignment should reflect the Unicode values of the characters produced by the key.

The algorithm used here:


1. If the key produces a digit in the normal, Shift, or AltGr state, use the Unicode value of that digit.

2. If 1 does not apply and the key produces a lower-case letter in one state and the corresponding upper-case letter in another, use the Unicode value of the upper-case letter.

3. If neither 1 nor 2 applies, use the Unicode value of the character the key produces in the normal state.

4. If 1, 2, and 3 do not apply, use the Unicode value of the character the key produces in the Shift state.

5. If 1, 2, 3, and 4 do not apply, use the Unicode value of the character the key produces in the AltGr state.

Shift state is obtained by depressing the Shift key and then the key in question.  The less familiar AltGr state is obtained by depressing the right Alt key (AltGr) and then the key in question.  The Shift+AltGr state obtains when the Shift and right Alt keys are held down while the key in question is pressed.  Normal state obtains when a key is depressed while neither the Shift nor the Alt keys are down.  There is one more state, used by the Hebrew layout, that is not relevant to this bug.
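The five steps can be sketched in TypeScript as follows.  This is a hedged illustration, not the SQL actually used to build the mapping: the KeyAssignment record and the function names are hypothetical, step 1 is assumed to mean the ASCII digits 0 through 9, and step 2 is assumed to mean that the normal and Shift states form a lower/upper case pair.

```typescript
// Hypothetical record of what one physical key produces in one layout.
// A missing field means the key produces no character in that state.
interface KeyAssignment {
  normal?: string;
  shift?: string;
  altGr?: string;
}

// "U+" plus the first code point in upper-case hex, at least four digits.
function toW3cIdentifier(ch: string): string {
  const hex = (ch.codePointAt(0) ?? 0).toString(16).toUpperCase();
  return "U+" + "0000".slice(hex.length) + hex;
}

// Assumption: "numeric value" means one of the ASCII digits 0-9.
function isDigit(s: string | undefined): s is string {
  return s !== undefined && /^[0-9]$/.test(s);
}

function assignW3cIdentifier(key: KeyAssignment): string | undefined {
  const { normal, shift, altGr } = key;

  // Step 1: a digit in the normal, Shift, or AltGr state (checked in that order).
  const digit = [normal, shift, altGr].find(isDigit);
  if (digit !== undefined) return toW3cIdentifier(digit);

  // Step 2 (assumed reading): normal and Shift form a case pair; use upper case.
  if (normal && shift && normal !== shift && normal.toUpperCase() === shift) {
    return toW3cIdentifier(shift);
  }

  // Steps 3-5: the first state that produces anything: normal, Shift, AltGr.
  const fallback = normal ?? shift ?? altGr;
  return fallback === undefined ? undefined : toW3cIdentifier(fallback);
}

console.log(assignW3cIdentifier({ normal: "ë", shift: "Ë" })); // U+00CB
console.log(assignW3cIdentifier({ normal: "&", shift: "1" })); // U+0031
```

A key that types ë and Ë yields U+00CB, consistent with the Albanian keyCode 186 entry in the table below, and a key whose digit appears only in the Shift state (as on AZERTY-style layouts) still maps to the digit.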

The algorithm sketched above can be used to construct a new mapping for each of the Windows keyboard layouts.  The table does not fit in a Bugzilla description field.  The mapping starts out as follows:

mysql> select distinct layout,keycode,upper(w3cIdentifier) from keymap where os='Win' and browser='IE' order by layout,keycode;
+---------------------------------------+---------+----------------------+
| layout                                | keycode | upper(w3cIdentifier) |
+---------------------------------------+---------+----------------------+
| Albanian                              |      48 | U+0030               | 
| Albanian                              |      49 | U+0031               | 
| Albanian                              |      50 | U+0032               | 
| Albanian                              |      51 | U+0033               | 
| Albanian                              |      52 | U+0034               | 
| Albanian                              |      53 | U+0035               | 
| Albanian                              |      54 | U+0036               | 
| Albanian                              |      55 | U+0037               | 
| Albanian                              |      56 | U+0038               | 
| Albanian                              |      57 | U+0039               | 
| Albanian                              |      65 | U+0041               | 
| Albanian                              |      66 | U+0042               | 
| Albanian                              |      67 | U+0043               | 
| Albanian                              |      68 | U+0044               | 
| Albanian                              |      69 | U+0045               | 
| Albanian                              |      70 | U+0046               | 
| Albanian                              |      71 | U+0047               | 
| Albanian                              |      72 | U+0048               | 
| Albanian                              |      73 | U+0049               | 
| Albanian                              |      74 | U+004A               | 
| Albanian                              |      75 | U+004B               | 
| Albanian                              |      76 | U+004C               | 
| Albanian                              |      77 | U+004D               | 
| Albanian                              |      78 | U+004E               | 
| Albanian                              |      79 | U+004F               | 
| Albanian                              |      80 | U+0050               | 
| Albanian                              |      81 | U+0051               | 
| Albanian                              |      82 | U+0052               | 
| Albanian                              |      83 | U+0053               | 
| Albanian                              |      84 | U+0054               | 
| Albanian                              |      85 | U+0055               | 
| Albanian                              |      86 | U+0056               | 
| Albanian                              |      87 | U+0057               | 
| Albanian                              |      88 | U+0058               | 
| Albanian                              |      89 | U+0059               | 
| Albanian                              |      90 | U+005A               | 
| Albanian                              |     186 | U+00CB               | 
| Albanian                              |     187 | U+003D               | 
| Albanian                              |     188 | U+002C               | 
| Albanian                              |     189 | U+002D               | 
| Albanian                              |     190 | U+002E               | 
| Albanian                              |     191 | U+002F               | 
| Albanian                              |     192 | U+005C               | 
| Albanian                              |     219 | U+00C7               | 
| Albanian                              |     220 | U+005D               | 
| Albanian                              |     221 | U+0040               | 
| Albanian                              |     222 | U+005B               | 
| Albanian                              |     226 | U+003C               | 
| Arabic (101)                          |      48 | U+0030               | 
| Arabic (101)                          |      49 | U+0031               | 
| Arabic (101)                          |      50 | U+0032               | 
| Arabic (101)                          |      51 | U+0033               | 
| Arabic (101)                          |      52 | U+0034               | 
| Arabic (101)                          |      53 | U+0035               | 
| Arabic (101)                          |      54 | U+0036               | 
| Arabic (101)                          |      55 | U+0037               | 
| Arabic (101)                          |      56 | U+0038               | 
| Arabic (101)                          |      57 | U+0039               | 
| Arabic (101)                          |      65 | U+0634               | 
| Arabic (101)                          |      66 | U+0644               | 
| Arabic (101)                          |      67 | U+0624               | 
| Arabic (101)                          |      68 | U+064A               | 
| Arabic (101)                          |      69 | U+062B               | 
| Arabic (101)                          |      70 | U+0628               | 
| Arabic (101)                          |      71 | U+0644               | 
| Arabic (101)                          |      72 | U+0627               | 
| Arabic (101)                          |      73 | U+0647               | 
| Arabic (101)                          |      74 | U+062A               | 
| Arabic (101)                          |      75 | U+0646               | 
| Arabic (101)                          |      76 | U+0645               | 
| Arabic (101)                          |      77 | U+0649               | 
| Arabic (101)                          |      78 | U+0627               | 
| Arabic (101)                          |      79 | U+062E               | 
| Arabic (101)                          |      80 | U+062D               | 
| Arabic (101)                          |      81 | U+0636               | 
| Arabic (101)                          |      82 | U+0642               | 
| Arabic (101)                          |      83 | U+0633               | 
| Arabic (101)                          |      84 | U+0641               | 
| Arabic (101)                          |      85 | U+0639               | 
| Arabic (101)                          |      86 | U+0631               | 
| Arabic (101)                          |      87 | U+0635               | 
| Arabic (101)                          |      88 | U+0621               | 
| Arabic (101)                          |      89 | U+063A               | 
| Arabic (101)                          |      90 | U+0626               | 
| Arabic (101)                          |     186 | U+0643               | 
| Arabic (101)                          |     187 | U+003D               | 
| Arabic (101)                          |     188 | U+0629               | 
| Arabic (101)                          |     189 | U+002D               | 
| Arabic (101)                          |     190 | U+0648               | 
| Arabic (101)                          |     191 | U+0632               | 
| Arabic (101)                          |     192 | U+0630               | 
| Arabic (101)                          |     219 | U+062C               | 
| Arabic (101)                          |     220 | U+005C               | 
| Arabic (101)                          |     221 | U+062F               | 
| Arabic (101)                          |     222 | U+0637               |
Comment 1 Allan Jacobs 2008-07-13 07:15:10 PDT
Created attachment 22260
Keycode to keyIdentifier mapping for Windows keyboard layouts
Comment 2 Allan Jacobs 2008-07-13 09:28:02 PDT
Created attachment 22261
Testcase demonstrating keyIdentifier assignments

Testcase.

Add the Arabic (101) layout through the Windows Control Panel:  choose Regional and Language Options, open the Languages pane of the Regional and Language Options dialog, click Details, and add Arabic/Arabic (101).

Once the layout is added, use the operating system to switch the layout of the Safari browser window to Arabic (101), then type some characters into the text field.  The testcase tabulates the keyCode (first column), the Unicode value of the character the key produces in this layout (second column, obtained by combining keydown and keypress information), the character itself (third column), and the keyIdentifier (fourth column).

For instance, typing the 'a', 's', 'd', and 'f' keys in sequence gives:

65	U+0634	ش		U+0041
83	U+0633	س		U+0053
68	U+064A	ي		U+0044
70	U+0628	ب		U+0046
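The testcase source is not reproduced in this report.  A minimal TypeScript sketch of the keydown/keypress pairing it describes, assuming the produced character is read from the keypress charCode; the class and field names are hypothetical.

```typescript
// Minimal stand-ins for the fields the testcase reads. In a browser these would
// come from the real keydown and keypress events.
interface KeyDownInfo {
  keyCode: number;
  keyIdentifier: string; // WebKit-only attribute
}
interface KeyPressInfo {
  charCode: number;
}

interface Row {
  keyCode: number;       // column 1
  charUnicode: string;   // column 2: what the key typed in this layout
  char: string;          // column 3: the character itself
  keyIdentifier: string; // column 4: what Safari reported
}

// Hypothetical helper: remembers the last keydown and completes a row when the
// following keypress arrives. Keys that produce no keypress yield no row.
class KeyTabulator {
  private pending: KeyDownInfo | null = null;
  readonly rows: Row[] = [];

  onKeyDown(e: KeyDownInfo): void {
    this.pending = e;
  }

  onKeyPress(e: KeyPressInfo): void {
    if (this.pending === null) return;
    const hex = e.charCode.toString(16).toUpperCase();
    this.rows.push({
      keyCode: this.pending.keyCode,
      charUnicode: "U+" + "0000".slice(hex.length) + hex,
      char: String.fromCharCode(e.charCode),
      keyIdentifier: this.pending.keyIdentifier,
    });
    this.pending = null;
  }
}

// Replaying the first line of the 'asdf' example above.
const tab = new KeyTabulator();
tab.onKeyDown({ keyCode: 65, keyIdentifier: "U+0041" });
tab.onKeyPress({ charCode: 0x0634 });
console.log(tab.rows[0]);
```

In a page, onKeyDown and onKeyPress would be called from keydown and keypress listeners on the text field.  keyIdentifier is not part of the standard TypeScript DOM typings, so it would have to be read through a cast.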
Comment 3 Mark Rowe (bdash) 2008-07-14 12:49:15 PDT
<rdar://problem/6073947>
Comment 4 Alexey Proskuryakov 2008-07-15 14:08:35 PDT
Please note that the specification is still a draft and thus subject to change. Without a stated rationale, it is hard to predict which direction the specification will take; for example, some use cases call for a key identifier that is NOT dependent on the layout.
Comment 5 Allan Jacobs 2008-07-23 20:49:01 PDT
A few keycodes were misassigned, which affects the keyIdentifier
assignments.  These errors were discovered by comparing the
keyIdentifier assignments made independently for Firefox and Opera.
With these corrections, I believe the probability of an error is now
roughly 1 in 10000 for Firefox and Internet Explorer.  The error rate
for Opera is higher:  the database detected conflicts in Opera's
keyCode assignments that cannot be patched.

The assignment  
Czech (QWERTY)  32      U+00b4
should not have been made at all.

The line reading
Czech (QWERTY)  187     U+02c7
should read
Czech (QWERTY)  187     U+00b4

The line reading
Czech (QWERTY)  220     U+0027
should read
Czech (QWERTY)  220     U+00a8

The line reading
Latin American  187     U+002a
should read
Latin American  187     U+002b

The line reading
Latin American  191     U+002b
should read
Latin American  191     U+007d
Comment 6 O. Andersen 2009-10-21 15:22:12 PDT
(In reply to comment #4)
> some use cases call for a key identifier
> that is NOT dependent on the layout.

The Dashboard Widget ‘Tastiera’ <http://coq.no/widget/tastiera/en> is an example of such a use case. If the standardisation effort ends up with only layout-dependent identifiers, adding a physical identifier (ADB codes or a more logical key numbering) as well might be a good idea.

An additional problem for non-US keyboard layouts is that dead keys cannot be detected: no keypress event is generated, and the keydown and keyup events both report keyIdentifier = "Unidentified", event.which = 0 and keyCode = 0. This was reported to Apple as bug No. 6600446.
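A hedged TypeScript sketch of how a page might at least flag the symptom described above.  The function name is hypothetical and the check is a heuristic, not a reliable dead-key detector: any other key reported with no usable identifier matches too.  Where the later UI Events key attribute is available, a dead key reports key = "Dead".

```typescript
// Fields a handler might see on keydown/keyup. keyIdentifier is WebKit-only;
// key is the later UI Events attribute and may be absent in older engines.
interface KeyEventInfo {
  keyCode: number;
  which: number;
  keyIdentifier?: string;
  key?: string;
}

// Hypothetical heuristic: treat the event as a possible dead key when it
// matches the symptom above (no identifier, keyCode and which both 0).
function mayBeDeadKey(e: KeyEventInfo): boolean {
  if (e.key !== undefined) return e.key === "Dead"; // UI Events value for dead keys
  return e.keyIdentifier === "Unidentified" && e.keyCode === 0 && e.which === 0;
}

console.log(mayBeDeadKey({ keyCode: 0, which: 0, keyIdentifier: "Unidentified" })); // true
```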

Has there been any progress on this topic lately?  Should I open a new bug for the problem with dead keys?
Comment 7 Alexey Proskuryakov 2009-10-21 16:13:45 PDT
> Should I open a new bug for the problem with dead keys?

Since the existing Radar bug is only visible to Apple employees, I'd say yes.
Comment 8 O. Andersen 2009-10-21 16:59:15 PDT
> > Should I open a new bug for the problem with dead keys?
> Since the existing Radar bug is only visible to Apple employees, I'd say yes.

Filed as bug No. 30652.
Comment 9 Allan Jacobs 2009-11-22 12:11:14 PST
Created attachment 43686
Keycode/charIdentifier to w3cIdentifier assignment on Ubuntu Linux 9.10 / Chrome 4.0.249.4.

Keycode assignments on Linux are buggy, so both keycode and charIdentifier are included as columns in this file.  The SQL used to make the assignments will be attached shortly.  The AltGr and Shift+AltGr states of the 105th key on my 105-key keyboard (the lower leftmost character key) were often not properly active; these were filtered out with the 'w3cIdentifier is not null' clause (top of the attachment).
Comment 10 Allan Jacobs 2009-11-22 12:19:18 PST
Created attachment 43687
SQL to assign w3c identifiers to keyboard events.

mysql> describe layout;
+--------------+-------------+------+-----+---------+-------+
| Field        | Type        | Null | Key | Default | Extra |
+--------------+-------------+------+-----+---------+-------+
| id           | int(5)      | NO   | PRI | NULL    |       | 
| name         | varchar(66) | YES  |     | NULL    |       | 
| display_name | varchar(66) | YES  |     | NULL    |       | 
+--------------+-------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

mysql> describe keymap;
+----------------+-------------+------+-----+---------+-------+
| Field          | Type        | Null | Key | Default | Extra |
+----------------+-------------+------+-----+---------+-------+
| id             | int(5)      | NO   | PRI | NULL    |       | 
| os             | varchar(10) | YES  |     | NULL    |       | 
| browser        | varchar(10) | YES  |     | NULL    |       | 
| layout_id      | int(5)      | YES  | MUL | NULL    |       | 
| state          | varchar(20) | YES  |     | NULL    |       | 
| keycode        | int(6)      | YES  |     | NULL    |       | 
| charIdentifier | varchar(15) | YES  |     | NULL    |       | 
| w3cIdentifier  | varchar(15) | YES  |     | NULL    |       | 
| deadKey        | int(1)      | YES  |     | NULL    |       | 
| layout         | varchar(66) | YES  |     | NULL    |       | 
| comment        | varchar(66) | YES  |     | NULL    |       | 
+----------------+-------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
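The SQL in the attachment is not reproduced here.  As a rough illustration of the data model, this TypeScript sketch folds keymap rows (one per layout, keycode, and state) into one record per key, which is the shape the assignment algorithm needs.  The state strings, the "U+XXXX" format of charIdentifier, and the function name are assumptions, not taken from the actual database.

```typescript
// Subset of the keymap columns shown above. The state strings are
// assumptions: the values actually stored in the state column are not listed.
type State = "normal" | "shift" | "altgr" | "shift+altgr";

interface KeymapRow {
  os: string;
  browser: string;
  layout: string;
  state: State;
  keycode: number;
  charIdentifier: string | null;
}

type KeyStates = Partial<Record<State, string>>;

// Hypothetical helper: one record per (os, browser, layout, keycode), holding
// the charIdentifier observed in each state.
function groupByKey(rows: KeymapRow[]): Map<string, KeyStates> {
  const grouped = new Map<string, KeyStates>();
  for (const row of rows) {
    if (row.charIdentifier === null) continue; // skip states that produced nothing
    const id = `${row.os}|${row.browser}|${row.layout}|${row.keycode}`;
    const states: KeyStates = grouped.get(id) ?? {};
    states[row.state] = row.charIdentifier;
    grouped.set(id, states);
  }
  return grouped;
}
```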

Is there any interest in getting a copy of the database?
Comment 11 Allan Jacobs 2010-09-09 16:45:10 PDT
The W3C specification has changed.  Identifying the key independent of state
(that is, of a modifier) is no longer one of its ambitions.  This makes this bug irrelevant.

Bug 20027 should be closed.

"keyidentifier" refers to content at 
http://www.w3.org/TR/2007/WD-DOM-Level-3-Events-20071221/events.html#Events-KeyboardEvent
and at
http://www.w3.org/TR/2007/WD-DOM-Level-3-Events-20071221/keyset.html .

Refer to http://www.w3.org/TR/DOM-Level-3-Events/#keys-Guide in the
current version of the W3C specification (dated Sept 7, 2010, two days ago).  In
the new draft, keyup, keydown, and keypress implement the KeyboardEvent interface,
which mandates the presence of the attributes 'char', 'key', and 'keyCode'.

keyCode is legacy.  User code will see wild variation in its values across browsers,
operating systems, keyboard locales, and even shift states.

For most visible characters, 'char' and 'key' will be assigned the same
Unicode character value.  The value depends on modifier state.

'char' and 'key' differ for some keys.  For instance, pressing
the spacebar causes an event with 'char' set to a space (\u0020) and 'key' set
to the string 'Spacebar'.

Keys with no character representation have 'char' set to null and 'key'
assigned a meaningful value.  For instance, the Up Arrow key has 'char'=null and
'key'='Up'.
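A hedged TypeScript sketch of how a handler written against the Sept 2010 draft, as summarized above, might branch on 'char' and 'key'.  The interface and function names are hypothetical.  Note that engines which later shipped 'key' do not expose 'char', and the final standard renamed some values: the spacebar reports key " " and Up Arrow reports "ArrowUp".

```typescript
// Event shape as described for the Sept 2010 draft: 'char' is the produced
// text (or null), 'key' is a character or a key name.
interface DraftKeyboardEvent {
  char: string | null;
  key: string;
}

// Hypothetical dispatch: text input when 'char' is set, otherwise a named key.
function describeKey(e: DraftKeyboardEvent): string {
  if (e.char !== null) {
    return `text ${JSON.stringify(e.char)} (key=${e.key})`;
  }
  return `named key ${e.key}`;
}

console.log(describeKey({ char: " ", key: "Spacebar" })); // text " " (key=Spacebar)
console.log(describeKey({ char: null, key: "Up" }));      // named key Up
```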
Comment 12 Eric Seidel (no email) 2012-10-27 01:30:07 PDT
Is this still an issue?
Comment 13 mikolaj.konarski 2015-10-03 09:52:34 PDT
> Is this still an issue?

Unfortunately, yes. Handling normal and control keys in the same code is a nightmare, even without taking browser quirks into account. This

http://webkitgtk.org/reference/webkitdomgtk/stable/WebKitDOMKeyboardEvent.html

is years behind that

https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEvent

In particular, we are using the non-standard and deprecated 

https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEvent/keyIdentifier

which absolutely doesn't agree with

https://developer.mozilla.org/en-US/docs/Web/API/KeyboardEvent/key

and I can't see a way of getting the functionality of the latter (apart from coding it from scratch using the functions that we have and hacking around browser quirks, if that is even possible).
Comment 14 mikolaj.konarski 2016-09-30 16:13:42 PDT
Additionally, Chrome will soon drop keyIdentifier
https://www.chromestatus.com/features/5316065118650368
so JS that uses keyIdentifier (written that way because of WebKit) will no longer
work in Chrome. It would be incredibly useful if WebKit
implemented the current standard
https://w3c.github.io/uievents/#events-keyboardevents
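Until both attributes are reliably available, script that must run in both kinds of engine tends to need a fallback along these lines.  A hedged TypeScript sketch: keyValue and the name table are hypothetical, and the table deliberately covers only the arrow keys.  The fallback is lossy, since, as this bug describes, "U+XXXX" keyIdentifier values derived from keyCode ignore the layout and the Shift state, and control-character values decode to the raw character rather than to a key name.

```typescript
// Fields available on a keydown event in the engines discussed here:
// 'key' per the current standard, 'keyIdentifier' in older WebKit/Chrome.
interface KeyFields {
  key?: string;
  keyIdentifier?: string;
}

// A few legacy named identifiers and their UI Events 'key' equivalents.
// Deliberately incomplete: other names are passed through unchanged.
const LEGACY_NAMES = new Map<string, string>([
  ["Up", "ArrowUp"],
  ["Down", "ArrowDown"],
  ["Left", "ArrowLeft"],
  ["Right", "ArrowRight"],
]);

// Prefer 'key'; otherwise approximate it from keyIdentifier.
function keyValue(e: KeyFields): string | undefined {
  if (e.key !== undefined) return e.key;
  const id = e.keyIdentifier;
  if (id === undefined) return undefined;
  const m = /^U\+([0-9A-Fa-f]{4,6})$/.exec(id);
  if (m) {
    const cp = parseInt(m[1], 16);
    if (cp <= 0x10ffff) return String.fromCodePoint(cp);
  }
  return LEGACY_NAMES.get(id) ?? id;
}

console.log(keyValue({ keyIdentifier: "Up" }));     // ArrowUp
console.log(keyValue({ keyIdentifier: "U+0041" })); // A (upper case even if Shift was not held)
```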