Bug 23499

Summary: Add WML specific logic for WML <input>
Product: WebKit Reporter: yichao.yin <yichao.yin>
Component: XMLAssignee: Nobody <webkit-unassigned>
Status: RESOLVED FIXED    
Severity: Normal CC: staikos, yichao.yin, zimmermann
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: PC   
OS: Linux   
Bug Depends on:    
Bug Blocks: 20393    
Attachments:
Description Flags
initial patch
zimmermann: review-
updated patch zimmermann: review+

Description yichao.yin 2009-01-23 04:02:22 PST
The upcoming patch implements the WML specific attributes of WMLInputElement, such as format, emptyok.
Comment 1 yichao.yin 2009-01-23 04:04:17 PST
Created attachment 26967 [details]
initial patch
Comment 2 Nikolas Zimmermann 2009-01-23 08:33:01 PST
Comment on attachment 26967 [details]
initial patch

Hey Yichao, nice work - especially the new testcase :-)
Though I think it will need to handle a lot more 'corner cases' especially in the input mask validation. I'll show my issues using the code below:

I don't have time for a full review, only talking about validateInputMask() now:

> +String WMLInputElement::validateInputMask(const String& inputMask)
> +{
> +    bool isValid = true;
> +    bool hasWildcard = false;
> +    unsigned escapeCharCnts = 0;
> +    unsigned maskLen = inputMask.length();
> +    UChar formatCode;
Naming, why not use "maskLength" and "escapeCharCount" :-)

> + 
> +    for (unsigned i = 0; i < maskLen; ++i) {
> +        formatCode = inputMask[i];
> +        if (s_formatCodes.find(formatCode) == -1) {
> +            if (formatCode == '*' || (WTF::isASCIIDigit(formatCode) && formatCode != '0')) {
> +                // validate codes which ends with '*f' or 'nf'
> +                formatCode = inputMask[++i];
What happens if we're at i == maskLen - 1, then inputMask[maskLen] would crash, no?

> +                if ((i + 1 != maskLen) || s_formatCodes.find(formatCode) == -1) {
> +                    isValid = false;
> +                    break;
> +                }

Okay this takes care of only allowing the "*f' or "nf" to appear at the end of the string, fine.
The WML spec says these format codes are only allowed to appear _once_ though.
So maybe you could add the following, before the last 'if' above:
if (hasWildcard) { isValid = false; break; }

> +                hasWildcard = true;
> +            } else if (formatCode == '\\') {
> +                //skip over the next mask character
> +                i++;
> +                escapeCharCnts++;
> +            } else {
> +                isValid = false;
> +                break;
> +            }
> +        }
> +    }
> + 
> +    // calculate the number of characters allowed to be entered by input mask
> +    if (isValid) {

Use early exit here: if (!isValid) return String();

> +        m_numOfCharsAllowedByMask = maskLen;
> + 
> +        if (escapeCharCnts)
> +            m_numOfCharsAllowedByMask -= escapeCharCnts;
> + 
> +        if (hasWildcard) {
> +            formatCode = inputMask[maskLen - 2];
> +            if (formatCode == '*')
> +                m_numOfCharsAllowedByMask = m_data.maxLength();
> +            else {
> +                unsigned leftLen = String(&formatCode).toInt();
> +                m_numOfCharsAllowedByMask = leftLen + m_numOfCharsAllowedByMask - 2;
> +            }
> +        }
> +
> +        return inputMask;
> +    }
> + 
> +    return String();
> +}

Okay, this method is mostly correct, but all of the corner cases need to be tested in your wml/input-formal.html test:
- 'nf' / '*f' statements in the beginning/middle of the input mask (should fail)
- 'nf' statements, with n>9, should fail
- invalid mask characters 'YyKkQq', should be tested in all variations (combined with valid mask chars, as well)
- 'escape characters' should be tested

I hope that's it for now :-)
I'll continue the review tonight.
Comment 3 Nikolas Zimmermann 2009-01-23 09:02:38 PST
Comment on attachment 26967 [details]
initial patch

Review, part 2:

> @@ -215,6 +235,9 @@ void WMLInputElement::attach()

> +    // Here we initialize the WML <input> element
> +    init();
I'm not sure if it's desired to call init() from attach().
How about calling it from 'finishedParsingChildren()' ?
The drawback using attach() would be that I could force the renderer to be attached, detach and re-attached.
This way we'd store the variable state twice, that's not what we want I fear.

> +        else if (renderer() && !isConformedToInputMask(textEvent->data()[0], static_cast<RenderTextControl*>(renderer())->text().length() + 1))
> +            // If the inputed char doesn't conform to the input mask, stop handling 
> +            return;
> +    }
Why length() + 1 here?
  
> +void WMLInputElement::init()
> +{
> +    String nameVar = name();
> +    String varValue;
> +    WMLPageState* pageSate = wmlPageStateForDocument(document()); 
> +    ASSERT(pageSate);
> +    if (!nameVar.isEmpty())
> +        varValue = pageSate->getVariable(nameVar);
> +
> +    if (varValue.isEmpty() || !isConformedToInputMask(varValue)) {
> +        String val = value();
> +        if (isConformedToInputMask(val))
> +            varValue = val;
> +        else
> +            varValue = "";
> + 
> +        pageSate->storeVariable(nameVar, varValue);
> +    }
> +    setValue(varValue);
> + 
> +    if (getAttribute(WMLNames::emptyokAttr).isEmpty()) {
You can use if (!hasAttribute(WMLNames::emptyokAttr)) here.

> +        if (m_formatMask.isEmpty() || 
> +            // check if the format codes is a "*f" code
> +           (m_formatMask.length() == 2 && m_formatMask[0] == '*' && s_formatCodes.find(m_formatMask[1]) != -1))
> +            m_isEmptyOk = true;
> +    }
This only takes care of the 'f' format code, what about the others?

> +bool WMLInputElement::isConformedToInputMask(const String& inputChars)
> +{
> +    bool ok = true;
> +    for (unsigned i = 0; i < inputChars.length(); ++i) {
> +        ok = isConformedToInputMask(inputChars[i], i + 1, false);
> +        if (!ok)
> +            break;
> +    }
You can rewrite as follows, to save a line:
if (ok = isConformedToInputMask(inputChars[i], i + 1, false))
    continue;

> +bool WMLInputElement::isConformedToInputMask(UChar inChar, unsigned inputCharCounts, bool isInputedByUsr)
s/isInputedByUsr/isUserInput/
s/inputCharCounts/inputCharCount/

> +{
> +    bool ok = true;
> +    if (m_formatMask.isEmpty())
> +        return ok;
Just 'return true' here, and save the 'ok' variable declaration. Move it below 'UChar mask = ..' where it's atually needed.

> +    unsigned idxOfMask = 0;
s/idxOfMask/maskIndex/

> +    if (isInputedByUsr) {
> +        int cursorPos = 0;
> +        if (renderer())
> +            cursorPos = static_cast<RenderTextControl*>(renderer())->selectionStart(); 
> +        else
> +            cursorPos = m_data.cachedSelectionStart();
> +
> +        idxOfMask = cursorPosToIdxOfMask(cursorPos);
> +    } else
> +        idxOfMask = cursorPosToIdxOfMask(inputCharCounts - 1);
s/cursorPosToIdxOfMask/cursorPositionToMaskIndex/

> +
> +    UChar mask = m_formatMask[idxOfMask];
You should be able to use switch (mask) here - should read more cleanly:

> +    // match the inputed character with input mask
> +    if (mask == 'A')
> +        ok = !WTF::isASCIIDigit(inChar) && !WTF::isASCIILower(inChar) && WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'a')
> +        ok = !WTF::isASCIIDigit(inChar) && !WTF::isASCIIUpper(inChar) && WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'N')
> +        ok = WTF::isASCIIDigit(inChar);
> +    else if (mask == 'n')
> +        ok = !WTF::isASCIIAlpha(inChar) && WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'X')
> +        ok = !WTF::isASCIILower(inChar) && WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'x')
> +        ok = !WTF::isASCIIUpper(inChar) && WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'M')
> +        ok = WTF::isASCIIPrintable(inChar);
> +    else if (mask == 'm')
> +        ok = WTF::isASCIIPrintable(inChar);
> +    else
> +        ok = (mask == inChar);
> +
> +    return ok;
> +}
> +


> +unsigned WMLInputElement::cursorPosToIdxOfMask(unsigned cursorPos)
> +{
> +    UChar mask;
> +    int idx = -1;
> +    do {
> +        mask = m_formatMask[++idx];
> +        if (mask == '\\')
> +            idx++;
> +        else if (mask == '*' || (WTF::isASCIIDigit(mask) && mask != '0')) {
> +            idx = m_formatMask.length() - 1;
> +            break;
> +        }
> +    } while (cursorPos--);
> + 
> +    return idx;
> +}
s/idx++/++idx/
s/idx/index/

> +
>  }
>  
>  #endif
> Index: WebCore/wml/WMLInputElement.h
> ===================================================================
> --- WebCore/wml/WMLInputElement.h	(revision 40084)
> +++ WebCore/wml/WMLInputElement.h	(working copy)
> @@ -89,10 +89,21 @@ public:
>      virtual void willMoveToNewOwnerDocument();
>      virtual void didMoveToNewOwnerDocument();
>  
> +    bool isConformedToInputMask(const String&);
> +    bool isConformedToInputMask(UChar, unsigned, bool isInputedByUsr = true);
> +
> +    static const String s_formatCodes;
I'm not sure wheter it's wise to declare a static const String.
What you could do is add a private function: "static const AtomicString& formatCodes();"

const AtomicString& WMLInputElement::formatCodes()
{
    DEFINE_STATIC_LOCAL(AtomicString, codes, ("AaNnXxMm"));
    return codes;
}

Okay, really need to leave now :-)
Have fun!
Comment 4 Nikolas Zimmermann 2009-01-23 15:31:35 PST
Fix bug dependencies, WML master bug 20393 should depend on this, not the other way round.
Comment 5 yichao.yin 2009-01-23 18:54:54 PST
(In reply to comment #2)

> I don't have time for a full review, only talking about validateInputMask()
> now:
> > +String WMLInputElement::validateInputMask(const String& inputMask)
> > +{
> > +    bool isValid = true;
> > +    bool hasWildcard = false;
> > +    unsigned escapeCharCnts = 0;
> > +    unsigned maskLen = inputMask.length();
> > +    UChar formatCode;
> Naming, why not use "maskLength" and "escapeCharCount" :-)

Just for saving several bytes :). Okay, I will name it more readable

> > + 
> > +    for (unsigned i = 0; i < maskLen; ++i) {
> > +        formatCode = inputMask[i];
> > +        if (s_formatCodes.find(formatCode) == -1) {
> > +            if (formatCode == '*' || (WTF::isASCIIDigit(formatCode) && formatCode != '0')) {
> > +                // validate codes which ends with '*f' or 'nf'
> > +                formatCode = inputMask[++i];
> > +                if ((i + 1 != maskLen) || s_formatCodes.find(formatCode) == -1) {
> > +                    isValid = false;
> > +                    break;
> > +                }
> Okay this takes care of only allowing the "*f' or "nf" to appear at the end of
> the string, fine.
> The WML spec says these format codes are only allowed to appear _once_ though.
> So maybe you could add the following, before the last 'if' above:
> if (hasWildcard) { isValid = false; break; }

No need, "if (i + 1 != MaskLen)" is used to check this since we search the "*" from left to right. If there are more 
than one "*f" or "nf", the if statement will be true and break the for loop with isValid = false.


> > +                hasWildcard = true;
> > +            } else if (formatCode == '\\') {
> > +                //skip over the next mask character
> > +                i++;
> > +                escapeCharCnts++;
> > +            } else {
> > +                isValid = false;
> > +                break;
> > +            }
> > +        }
> > +    }
> > + 
> > +    // calculate the number of characters allowed to be entered by input mask
> > +    if (isValid) {
> Use early exit here: if (!isValid) return String();

Yes, that's better, will fix it.

> > +        m_numOfCharsAllowedByMask = maskLen;
> > + 
> > +        if (escapeCharCnts)
> > +            m_numOfCharsAllowedByMask -= escapeCharCnts;
> > + 
> > +        if (hasWildcard) {
> > +            formatCode = inputMask[maskLen - 2];
> > +            if (formatCode == '*')
> > +                m_numOfCharsAllowedByMask = m_data.maxLength();
> > +            else {
> > +                unsigned leftLen = String(&formatCode).toInt();
> > +                m_numOfCharsAllowedByMask = leftLen + m_numOfCharsAllowedByMask - 2;
> > +            }
> > +        }
> > +
> > +        return inputMask;
> > +    }
> > + 
> > +    return String();
> > +}
> Okay, this method is mostly correct, but all of the corner cases need to be
> tested in your wml/input-formal.html test:
> - 'nf' / '*f' statements in the beginning/middle of the input mask (should
> fail)
> - 'nf' statements, with n>9, should fail
> - invalid mask characters 'YyKkQq', should be tested in all variations
> (combined with valid mask chars, as well)
> - 'escape characters' should be tested

You are right, I should add more test cases for it.

Thanks a lot for your good comments. I will update the patch following them ASAP.

Comment 6 yichao.yin 2009-01-23 19:34:59 PST
(In reply to comment #3)
> (From update of attachment 26967 [details] [review])
> Review, part 2:
> > @@ -215,6 +235,9 @@ void WMLInputElement::attach()
> > +    // Here we initialize the WML <input> element
> > +    init();
> I'm not sure if it's desired to call init() from attach().
> How about calling it from 'finishedParsingChildren()' ?
> The drawback using attach() would be that I could force the renderer to be
> attached, detach and re-attached.
> This way we'd store the variable state twice, that's not what we want I fear.

You are right, finishParsingChildren() is a better place to do this. 


> > +        else if (renderer() && !isConformedToInputMask(textEvent->data()[0], static_cast<RenderTextControl*>(renderer())->text().length() + 1))
> > +            // If the inputed char doesn't conform to the input mask, stop handling 
> > +            return;
> > +    }
> Why length() + 1 here?

the second parameter of isConformedToInputMask(UChar inChar, unsigned inputCharCount, bool isInputedByUsr) is used to specify the character number user has inputed, including current one, which has not been stored in RenderTextControl::text().


> > + 
> > +    if (getAttribute(WMLNames::emptyokAttr).isEmpty()) {
> You can use if (!hasAttribute(WMLNames::emptyokAttr)) here.

Good, better solution. will fix.


> > +        if (m_formatMask.isEmpty() || 
> > +            // check if the format codes is a "*f" code
> > +           (m_formatMask.length() == 2 && m_formatMask[0] == '*' && s_formatCodes.find(m_formatMask[1]) != -1))
> > +            m_isEmptyOk = true;
> > +    }
> This only takes care of the 'f' format code, what about the others?

That piece of code is to check the input mask with value of "*f" only.  so we need to check the length and whole string.

> > +bool WMLInputElement::isConformedToInputMask(const String& inputChars)
> > +{
> > +    bool ok = true;
> > +    for (unsigned i = 0; i < inputChars.length(); ++i) {
> > +        ok = isConformedToInputMask(inputChars[i], i + 1, false);
> > +        if (!ok)
> > +            break;
> > +    }
> You can rewrite as follows, to save a line:
> if (ok = isConformedToInputMask(inputChars[i], i + 1, false))
>     continue;

Exactly, Thanks. :)

> > +bool WMLInputElement::isConformedToInputMask(UChar inChar, unsigned inputCharCounts, bool isInputedByUsr)
> s/isInputedByUsr/isUserInput/
> s/inputCharCounts/inputCharCount/
> > +{
> > +    bool ok = true;
> > +    if (m_formatMask.isEmpty())
> > +        return ok;
> Just 'return true' here, and save the 'ok' variable declaration. Move it below
> 'UChar mask = ..' where it's atually needed.

Good, will fix.

> > +    unsigned idxOfMask = 0;
> s/idxOfMask/maskIndex/
> > +    if (isInputedByUsr) {
> > +        int cursorPos = 0;
> > +        if (renderer())
> > +            cursorPos = static_cast<RenderTextControl*>(renderer())->selectionStart(); 
> > +        else
> > +            cursorPos = m_data.cachedSelectionStart();
> > +
> > +        idxOfMask = cursorPosToIdxOfMask(cursorPos);
> > +    } else
> > +        idxOfMask = cursorPosToIdxOfMask(inputCharCounts - 1);
> s/cursorPosToIdxOfMask/cursorPositionToMaskIndex/
> > +
> > +    UChar mask = m_formatMask[idxOfMask];
> You should be able to use switch (mask) here - should read more cleanly:

Sounds better. Agreed.


> > +    // match the inputed character with input mask
> > +    if (mask == 'A')
> > +        ok = !WTF::isASCIIDigit(inChar) && !WTF::isASCIILower(inChar) && WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'a')
> > +        ok = !WTF::isASCIIDigit(inChar) && !WTF::isASCIIUpper(inChar) && WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'N')
> > +        ok = WTF::isASCIIDigit(inChar);
> > +    else if (mask == 'n')
> > +        ok = !WTF::isASCIIAlpha(inChar) && WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'X')
> > +        ok = !WTF::isASCIILower(inChar) && WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'x')
> > +        ok = !WTF::isASCIIUpper(inChar) && WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'M')
> > +        ok = WTF::isASCIIPrintable(inChar);
> > +    else if (mask == 'm')
> > +        ok = WTF::isASCIIPrintable(inChar);
> > +    else
> > +        ok = (mask == inChar);
> > +
> > +    return ok;
> > +}
> > +
> > +unsigned WMLInputElement::cursorPosToIdxOfMask(unsigned cursorPos)
> > +{
> > +    UChar mask;
> > +    int idx = -1;
> > +    do {
> > +        mask = m_formatMask[++idx];
> > +        if (mask == '\\')
> > +            idx++;
> > +        else if (mask == '*' || (WTF::isASCIIDigit(mask) && mask != '0')) {
> > +            idx = m_formatMask.length() - 1;
> > +            break;
> > +        }
> > +    } while (cursorPos--);
> > + 
> > +    return idx;
> > +}
> s/idx++/++idx/
> s/idx/index/

Will fix.

> >  }
> >  
> >  #endif
> > Index: WebCore/wml/WMLInputElement.h
> > ===================================================================
> > --- WebCore/wml/WMLInputElement.h	(revision 40084)
> > +++ WebCore/wml/WMLInputElement.h	(working copy)
> > @@ -89,10 +89,21 @@ public:
> >      virtual void willMoveToNewOwnerDocument();
> >      virtual void didMoveToNewOwnerDocument();
> >  
> > +    bool isConformedToInputMask(const String&);
> > +    bool isConformedToInputMask(UChar, unsigned, bool isInputedByUsr = true);
> > +
> > +    static const String s_formatCodes;
> I'm not sure wheter it's wise to declare a static const String.
> What you could do is add a private function: "static const AtomicString&
> formatCodes();"
> const AtomicString& WMLInputElement::formatCodes()
> {
>     DEFINE_STATIC_LOCAL(AtomicString, codes, ("AaNnXxMm"));
>     return codes;
> }


Good idea! It should be more like Webkit-style code.

Thanks a lot to let me learn more.
Comment 7 Nikolas Zimmermann 2009-01-23 20:11:45 PST
(In reply to comment #6)
> > > +        if (m_formatMask.isEmpty() || 
> > > +            // check if the format codes is a "*f" code
> > > +           (m_formatMask.length() == 2 && m_formatMask[0] == '*' && s_formatCodes.find(m_formatMask[1]) != -1))
> > > +            m_isEmptyOk = true;
> > > +    }
> > This only takes care of the 'f' format code, what about the others?
> 
> That piece of code is to check the input mask with value of "*f" only.  so we
> need to check the length and whole string.
Yeah, it's just fine - but emptyok has to be handled as well for *A, *A, etc..
So I think we need to iterate over all possible combinations here, no?
 
> Good idea! It should be more like Webkit-style code.
> 
> Thanks a lot to let me learn more.

You're welcome! 
Comment 8 yichao.yin 2009-01-24 01:17:49 PST
Created attachment 26994 [details]
updated patch

updated patch which fixes some issues Niko points out.


Addtional explain for Niko's comments

> > +    static const String s_formatCodes;
> I'm not sure wheter it's wise to declare a static const String.
> What you could do is add a private function: "static const AtomicString&
> formatCodes();"
> const AtomicString& WMLInputElement::formatCodes()
> {
>     DEFINE_STATIC_LOCAL(AtomicString, codes, ("AaNnXxMm"));
>     return codes;
> }

Since formatCodes() is private and independent, I think it's better add the function "static const AtomicString& formatCodes() const" in the source file instead of defining it as a member of HTMLInputElement.



> > @@ -215,6 +235,9 @@ void WMLInputElement::attach()
> > +    // Here we initialize the WML <input> element
> > +    init();
> I'm not sure if it's desired to call init() from attach().
> How about calling it from 'finishedParsingChildren()' ?
> The drawback using attach() would be that I could force the renderer to be
> attached, detach and re-attached.
> This way we'd store the variable state twice, that's not what we want I fear.

From functionality, placing it in finishedParsingChildren() is ok under most situations. but the run-webkit-tests script can't get the expected rendering result similar with that QtLauncher gets. As a result some cases will fail. But if I place it in attach(), both of them works well. It seems the layout test frameworks for WML needs calling init() multiple times to make sure the rendering tree is right. exactly, ensure the vaiable value is correct. Anyway I still keep it in attach() to make everthing is ok for now.
Comment 9 yichao.yin 2009-01-24 01:28:47 PST
(In reply to comment #7)
> (In reply to comment #6)
> > > > +        if (m_formatMask.isEmpty() || 
> > > > +            // check if the format codes is a "*f" code
> > > > +           (m_formatMask.length() == 2 && m_formatMask[0] == '*' && s_formatCodes.find(m_formatMask[1]) != -1))
> > > > +            m_isEmptyOk = true;
> > > > +    }
> > > This only takes care of the 'f' format code, what about the others?
> > 
> > That piece of code is to check the input mask with value of "*f" only.  so we
> > need to check the length and whole string.
> Yeah, it's just fine - but emptyok has to be handled as well for *A, *A, etc..
> So I think we need to iterate over all possible combinations here, no?

Seems my comment in the code is not exactly, Here 'f' stands for any one code of the format codes "AaNnXxMm". that is, it already contains all possible comination that you meant. :)
Comment 10 Nikolas Zimmermann 2009-02-02 08:06:51 PST
(In reply to comment #9)
> > Yeah, it's just fine - but emptyok has to be handled as well for *A, *A, etc..
> > So I think we need to iterate over all possible combinations here, no?
> 
> Seems my comment in the code is not exactly, Here 'f' stands for any one code
> of the format codes "AaNnXxMm". that is, it already contains all possible
> comination that you meant. :)
Yes, I misread, it's fine :-)

Comment 11 Nikolas Zimmermann 2009-02-02 08:07:46 PST
Comment on attachment 26994 [details]
updated patch

Thanks for CC'in me George, I forgot to r+ last week already!
Comment 12 George Staikos 2009-02-02 11:29:43 PST
Committed in r40482.