Bug 127127

Summary: JSC: Adding UnlinkedCodeBlock::opDebugBytecodeOffsetForLineAndColumn().
Product: WebKit Reporter: Mark Lam <mark.lam>
Component: JavaScriptCoreAssignee: Mark Lam <mark.lam>
Status: RESOLVED FIXED    
Severity: Normal CC: eflews.bot, fpizlo, ggaren, gtk-ews, gyuyoung.kim, ltilve+ews, mhahnenberg, msaboff, oliver, xan.lopez
Priority: P2    
Version: 528+ (Nightly build)   
Hardware: Unspecified   
OS: Unspecified   
Bug Depends on:    
Bug Blocks: 122836    
Attachments:
Description Flags
patch for a white box test.
none
the patch.
ggaren: review+, eflews.bot: commit-queue-
patch for landing.
ggaren: review+
patch 3 for landing using std::lower_bound() instead none

Description Mark Lam 2014-01-16 11:53:17 PST
In order to implement bytecode level breakpoints, we need a mechanism for computing the best fit op_debug bytecode offset for any valid given line and column value in the source.  The “best fit” in this case is the nearest preceding op_debug bytecode relative to the position specified by the line and column.
Comment 1 Mark Lam 2014-01-16 12:07:18 PST
Created attachment 221397 [details]
patch for a white box test.

It's difficult to test UnlinkedCodeBlock::opDebugBytecodeOffsetForLineAndColumn() in a generic way without a full blown bytecode level breakpoint system available.  However, we don't want to wait till we have the full system before validating that this line and column to op_debug bytecode mapping does work properly.  So, this patch hacks up the existing Debugger.cpp's pauseIfNeeded() to always call UnlinkedCodeBlock::opDebugBytecodeOffsetForLineAndColumn() with its line and column args, and perform sanity check assertions on the returned bytecodeOffset, actualLine, and actualColumn.  Since the best fit op_debug is the nearest preceding one, the following assertions should be true:

1. The returned bytecodeOffset should point to an op_debug bytecode.
2. The returned bytecodeOffset should be less or equal to the current executing bytecode offset (corresponding to the current line and column).
3. The returned line should be less than the current line, or the returned line should equal the current line and the returned column should be less ot equal to the current column.

The test in this patch asserts these 3 conditions.  The test also bypasses the optimization that allows debug hook callbacks to not be called when no breakpoints are set.  With this bypass, we'll be calling the debug hook callbacks and exercising the test on every op_debug bytecode we encounter.

To exercise the test, I ran the layout tests' inspector-protocol tests which exercises the debugger in various ways and on various types of JS source.  I also stepped through the code in UnlinkedCodeBlock::opDebugBytecodeOffsetForLineAndColumn() a few times to get a sense of how the code actually behaves and confirm that it matches my intuition and expectation of how it should behave.

While this test is not conclusive in demonstrating correctness, it does provide a lot of coverage, and did uncover several bugs in the implementation that was fixed.

I'm uploading the test here for future reference.
Comment 2 Mark Lam 2014-01-16 12:31:55 PST
Created attachment 221402 [details]
the patch.
Comment 3 EFL EWS Bot 2014-01-16 13:20:03 PST
Comment on attachment 221402 [details]
the patch.

Attachment 221402 [details] did not pass efl-ews (efl):
Output: http://webkit-queues.appspot.com/results/6388873441574912
Comment 4 EFL EWS Bot 2014-01-16 14:10:25 PST
Comment on attachment 221402 [details]
the patch.

Attachment 221402 [details] did not pass efl-wk2-ews (efl-wk2):
Output: http://webkit-queues.appspot.com/results/6648600851382272
Comment 5 Geoffrey Garen 2014-01-16 16:55:26 PST
Comment on attachment 221402 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=221402&action=review

r=me

Please fix the build, and a few issues below.

> Source/JavaScriptCore/bytecode/CodeBlock.cpp:2578
> +    if (bytecodeOffset != UINT_MAX) {

Please use WTF::notFound.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:260
> +inline void UnlinkedCodeBlock::decodeExpressionRangeLineAndColumn(ExpressionRangeInfo& info,

Let's call this "getLineAndColumn".

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:283
> +// dumpExpressionRangeInfo() and dumpLineColumnInfoList() are utility functions used for
> +// debugging only. They are not normally called, but we include them in the debug build
> +// by default to avoid bit-rot.

I don't think this comment adds anything.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:371
> +        return UINT_MAX;

Please use WTF::notFound.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:-307
> +    // Find the best fit op_debug opcode for the requested line and column:
> +    // The best fit op_debug will be the one that is preceding but is nearest to
> +    // the specified line and column.
> +    LineColumnInfo::LineColumnPair target(line, column);
> +    LineColumnInfoList& infoList = opDebugLineColumnInfoList();
> +    int low = 0;
> +    int high = infoList.size();
> +    while (high - low > 1) {
> +        int current = low + (high - low) / 2;
> +        LineColumnInfo& currentInfo = infoList[current];
> +        if (currentInfo < target)
> +            low = current;
> +        else
> +            high = current;
>      }
> -    } // switch

This looks like a binary chop algorithm. Did you consider WTF::binarySearch or std::binary_search?

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:396
> +    // Find the earliest op_debug opcode that has this same best fit line and column:

It would be better to comment here *why* you're looking for the earliest item. You explained that in your ChageLog, but not in your code.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:431
> +    m_rareData->m_opDebugLineColumnInfoList = std::unique_ptr<LineColumnInfoList>(new LineColumnInfoList());

Please use make_unique.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.h:559
>          Vector<ExpressionRangeInfo::FatPosition> m_expressionInfoFatPositions;
> +        std::unique_ptr<LineColumnInfoList> m_opDebugLineColumnInfoList;

Why is m_opDebugLineColumnInfoList a pointer to vector, while m_expressionInfoFatPositions is an inline vector? Should they both be pointers to vectors?
Comment 6 kov's GTK+ EWS bot 2014-01-16 17:45:05 PST
Comment on attachment 221402 [details]
the patch.

Attachment 221402 [details] did not pass gtk-ews (gtk):
Output: http://webkit-queues.appspot.com/results/5521117175349248
Comment 7 Mark Lam 2014-01-17 12:06:26 PST
Thanks for the review.  My answers below.

(In reply to comment #5)
> > Source/JavaScriptCore/bytecode/CodeBlock.cpp:2578
> > +    if (bytecodeOffset != UINT_MAX) {
> 
> Please use WTF::notFound.

WTF::noFound is of type size_t whereas the returned value is of type unsigned.  A comparison between the 2 would always yield false (not what we want).  So, I used static_cast<unsigned>(WTF::notFound) instead.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:260
> > +inline void UnlinkedCodeBlock::decodeExpressionRangeLineAndColumn(ExpressionRangeInfo& info,
> 
> Let's call this "getLineAndColumn".

Fixed.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:283
> > +// dumpExpressionRangeInfo() and dumpLineColumnInfoList() are utility functions used for
> > +// debugging only. They are not normally called, but we include them in the debug build
> > +// by default to avoid bit-rot.
> 
> I don't think this comment adds anything.

Removed.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:371
> > +        return UINT_MAX;
> 
> Please use WTF::notFound.

Fixed as above.
 
> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:-307
> > +    // Find the best fit op_debug opcode for the requested line and column:
> > +    // The best fit op_debug will be the one that is preceding but is nearest to
> > +    // the specified line and column.
> > +    LineColumnInfo::LineColumnPair target(line, column);
> > +    LineColumnInfoList& infoList = opDebugLineColumnInfoList();
> > +    int low = 0;
> > +    int high = infoList.size();
> > +    while (high - low > 1) {
> > +        int current = low + (high - low) / 2;
> > +        LineColumnInfo& currentInfo = infoList[current];
> > +        if (currentInfo < target)
> > +            low = current;
> > +        else
> > +            high = current;
> >      }
> > -    } // switch
> 
> This looks like a binary chop algorithm. Did you consider WTF::binarySearch or std::binary_search?

It is a binary search algorithm which I inherited from UnlinkedCodeBlock::expressionRangeForBytecodeOffset(), and modified for my needs.  In our case here, we need an algorithm that will find the nearest preceding op_debug bytecode.

Looking at WTF::binarySearch(), the closest that comes to what we need is with the ReturnAdjacentElementIfKeyIsNotPresent search mode.  However, that can return us either the preceeding or the subsequent op_debug bytecode.  And additional check will be needed after the search to ensure that we’ll referring to the preceding op_debug. 

According to https://stdcxx.apache.org/doc/stdlibug/14-4.html, std::binary_search() returns true or false depending on whether an exact match is found.  This is not what we need.  There is a std::lower_bound() that insert position for the provided value.  We can probably use std::lower_bound() and then iterate backwards by 1 from the returned position to get to the preceding entry for what we need.  However, the documentation also says that std::lower_bound() will "match only when the element is not currently found in the sequence”.  So, there’s more finessing that may be needed.

So, I prefer to stick with what we have that we know will serve the exact purpose (of finding the nearest preceding result which can be a direct match as well).

One other detail: while looking thru WTF::binarySearch(), I realized that my search implementation did not provide a quick exit when an exact match is found.  While this is not strictly needed for correctness, it’s a simple to add this optimization.  Basically, I added the following before the test of (currentInfo < target):

    if (currentInfo == target) {
        low = current;
        break;
    }

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:396
> > +    // Find the earliest op_debug opcode that has this same best fit line and column:
> 
> It would be better to comment here *why* you're looking for the earliest item. You explained that in your ChageLog, but not in your code.

I imported and adapted the comment from the ChangeLog and replaced this comment.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:431
> > +    m_rareData->m_opDebugLineColumnInfoList = std::unique_ptr<LineColumnInfoList>(new LineColumnInfoList());
> 
> Please use make_unique.

Done.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.h:559
> >          Vector<ExpressionRangeInfo::FatPosition> m_expressionInfoFatPositions;
> > +        std::unique_ptr<LineColumnInfoList> m_opDebugLineColumnInfoList;
> 
> Why is m_opDebugLineColumnInfoList a pointer to vector, while m_expressionInfoFatPositions is an inline vector? Should they both be pointers to vectors?

m_opDebugLineColumnInfoList is a pointer because the pointer being null can serve as a marker that the list has not been initialized.  This is needed because this list is only created on demand / first use.

As for m_expressionInfoFatPositions, if the fat positions exists (due to the nature of the JS source), then this Vector will always be populated when the ClassBlock is created.  It could have been implemented as a pointer to a vector as well, but the savings would only be the size of 1 pointer at best.  I didn’t think it was worth it back when I implemented m_expressionInfoFatPositions.

Shortly, I’ll upload a new patch with the fixes above and attempt to resolve the EFL and GTK build failures.
Comment 8 Mark Lam 2014-01-17 12:30:39 PST
(In reply to comment #7)
> One other detail: while looking thru WTF::binarySearch(), I realized that my search implementation did not provide a quick exit when an exact match is found.  While this is not strictly needed for correctness, it’s a simple to add this optimization.  Basically, I added the following before the test of (currentInfo < target):
> 
>     if (currentInfo == target) {
>         low = current;
>         break;
>     }

This uncovered a bug in the section of UnlinkedCodeBlock::opDebugBytecodeOffsetForLineAndColumn() that searches backwards to find the first op_debug with the same line and column value as the one found by the binary search.  That backward search was iterating as follows:

    for (int i = low - 1; i > 0; i--)

Instead if should be iterating as follows:

    for (int i = low - 1; i >= 0; i--)

The bug was that the entry at index 0 was being excluded as a candidate for this backward search, and it should not be excluded.  This will be fixed in my upcoming patch.
Comment 9 Mark Lam 2014-01-17 13:36:28 PST
Created attachment 221490 [details]
patch for landing.
Comment 10 Geoffrey Garen 2014-01-17 16:05:27 PST
Comment on attachment 221490 [details]
patch for landing.

View in context: https://bugs.webkit.org/attachment.cgi?id=221490&action=review

r=me, with two changes below:

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:387
> +    // Find the best fit op_debug opcode for the requested line and column:
> +    // The best fit op_debug will be the one that is preceding but is nearest to
> +    // the specified line and column.
> +    LineColumnInfo::LineColumnPair target(line, column);
> +    LineColumnInfoList& infoList = opDebugLineColumnInfoList();
> +    int low = 0;
> +    int high = infoList.size();
> +    while (high - low > 1) {
> +        int current = low + (high - low) / 2;
> +        LineColumnInfo& currentInfo = infoList[current];
> +        if (currentInfo == target) {
> +            low = current;
> +            break;
> +        }
> +        if (currentInfo < target)
> +            low = current;
> +        else
> +            high = current;
>      }

Let's use std::lower_bound. Your understanding that it will "match only when the element is not currently found in the sequence” seems to be out of date.

Specifically:
it = lower_bound()
if (target < *it)
    --it;
ASSERT(it >= begin && it < end && *it <= target)

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:408
> +    // Search backwards from the found op_debug to see if there are any
> +    // preceding op_debug bytecodes with the exact same line and column as the
> +    // one we found. We want the earliest one because when we have multiple
> +    // potential breakpoint targets that map to a given line and column, a
> +    // debugger user would expect to break on the first one first and step
> +    // through the rest thereafter if needed.
> +    LineColumnInfo::LineColumnPair match(line, column);
> +    for (int i = low - 1; i >= 0; i--) {
> +        LineColumnInfo& currentInfo = infoList[i];
> +        if (currentInfo != match)
> +            break;
> +        low = i;
> +    }

Let's not do this. The Web Inspector doesn't set multiple breakpoints at the exact same line/column position.

Instead, let's ASSERT:

    ASSERT(it == begin || *(it - 1) < *it);
Comment 11 Mark Lam 2014-01-17 16:09:04 PST
(In reply to comment #10)
> Let's use std::lower_bound. Your understanding that it will "match only when the element is not currently found in the sequence” seems to be out of date.
> 
> Specifically:
> it = lower_bound()
> if (target < *it)
>     --it;
> ASSERT(it >= begin && it < end && *it <= target)

OK, will fix.

> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:408
> > +    // Search backwards from the found op_debug to see if there are any
> > +    // preceding op_debug bytecodes with the exact same line and column as the
> > +    // one we found. We want the earliest one because when we have multiple
> > +    // potential breakpoint targets that map to a given line and column, a
> > +    // debugger user would expect to break on the first one first and step
> > +    // through the rest thereafter if needed.
> > +    LineColumnInfo::LineColumnPair match(line, column);
> > +    for (int i = low - 1; i >= 0; i--) {
> > +        LineColumnInfo& currentInfo = infoList[i];
> > +        if (currentInfo != match)
> > +            break;
> > +        low = i;
> > +    }
> 
> Let's not do this. The Web Inspector doesn't set multiple breakpoints at the exact same line/column position.
> 
> Instead, let's ASSERT:
> 
>     ASSERT(it == begin || *(it - 1) < *it);

The issue isn’t that the WebInspector will set multiple breakpoints.  It is that there can be multiple op_debug bytecodes for the same given line and column values.  The WebInspector will request 1 breakpoint at that line and column value.  We need to set it on the earliest op_debug bytecode that matches that line and column.

Hence, we still need to do this backward search.
Comment 12 Mark Lam 2014-01-17 16:34:14 PST
(In reply to comment #11)
> > Let's not do this. The Web Inspector doesn't set multiple breakpoints at the exact same line/column position.
> > 
> > Instead, let's ASSERT:
> > 
> >     ASSERT(it == begin || *(it - 1) < *it);
> 
> The issue isn’t that the WebInspector will set multiple breakpoints.  It is that there can be multiple op_debug bytecodes for the same given line and column values.  The WebInspector will request 1 breakpoint at that line and column value.  We need to set it on the earliest op_debug bytecode that matches that line and column.
> 
> Hence, we still need to do this backward search.

Nevermind, I think I get it.  With the use of lower_bound(), we should be at the earliest position already.  I’ll look into making this so.
Comment 13 Mark Lam 2014-01-18 02:40:22 PST
Created attachment 221542 [details]
patch 3 for landing using std::lower_bound() instead

According to the spec for std::lower_bound() at <http://www.cplusplus.com/reference/algorithm/lower_bound/>, “if all the element in the range compare less than val, the function returns last.”  Hence, after the lower_bound search, we need to check for it == end as well as follows:

    if (it == list.end() || *it > target)
        --it;

This patch has been tested with the white box test I attached earlier, and passes without issues.  I’ll wait for the EWS bots to go green before landing.
Comment 14 Mark Lam 2014-01-18 09:51:56 PST
Landed in r162256: <http://trac.webkit.org/r162256>.