Bug 124245 - Add tracking of endColumn for Executables
Summary: Add tracking of endColumn for Executables
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Mark Lam
URL:
Keywords:
Depends on:
Blocks: 122836
  Show dependency treegraph
 
Reported: 2013-11-12 16:19 PST by Mark Lam
Modified: 2013-11-19 13:53 PST (History)
11 users (show)

See Also:


Attachments
the patch. (75.88 KB, patch)
2013-11-13 20:16 PST, Mark Lam
mark.lam: review-
Details | Formatted Diff | Diff
patch 2. (75.42 KB, patch)
2013-11-13 20:23 PST, Mark Lam
eflews.bot: commit-queue-
Details | Formatted Diff | Diff
patch 3. (75.43 KB, patch)
2013-11-13 20:31 PST, Mark Lam
gtk-ews: commit-queue-
Details | Formatted Diff | Diff
patch 4: work in progress. (72.50 KB, patch)
2013-11-14 09:45 PST, Mark Lam
buildbot: commit-queue-
Details | Formatted Diff | Diff
patch 5: fixes export bug that the Window build revealed, but still a work in progress. (74.28 KB, patch)
2013-11-14 11:27 PST, Mark Lam
buildbot: commit-queue-
Details | Formatted Diff | Diff
patch 6: addressed all feedback + many bug fixes. (121.93 KB, patch)
2013-11-18 18:48 PST, Mark Lam
ggaren: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Mark Lam 2013-11-12 16:19:59 PST
To implement bytecode level breakpoint efficiently, the Debugger needs to be able to tell if a given source location (e.g. for a breakpoint) falls within the bounds of a block of source code.  From the Executable, we currently have the start line, end line, and start column which gets us most of the info we need.  The missing piece is the end column.  Will be adding support for tracking this end column value.
Comment 1 Mark Lam 2013-11-13 20:16:51 PST
Created attachment 216892 [details]
the patch.
Comment 2 Mark Lam 2013-11-13 20:19:53 PST
Comment on attachment 216892 [details]
the patch.

View in context: https://bugs.webkit.org/attachment.cgi?id=216892&action=review

> Source/WebCore/testing/Internals.h:-37
> -#include <wtf/PassRefPtr.h>
> -#include <wtf/RefCounted.h>

Oops.  Will revert this change.
Comment 3 Mark Lam 2013-11-13 20:23:22 PST
Created attachment 216893 [details]
patch 2.
Comment 4 Mark Lam 2013-11-13 20:26:51 PST
Comment on attachment 216893 [details]
patch 2.

View in context: https://bugs.webkit.org/attachment.cgi?id=216893&action=review

> Source/WebCore/testing/Internals.idl:141
> +    // Calling scriptBoundsForFunction() with null gets the bounds for the script of the current scope.

Stale comment.  "scriptBoundsForFunction" ==> "scriptBoundsFor".  Will fix before landing.
Comment 5 EFL EWS Bot 2013-11-13 20:28:41 PST
Comment on attachment 216893 [details]
patch 2.

Attachment 216893 [details] did not pass efl-ews (efl):
Output: http://webkit-queues.appspot.com/results/22899211
Comment 6 Mark Lam 2013-11-13 20:31:10 PST
Created attachment 216894 [details]
patch 3.
Comment 7 kov's GTK+ EWS bot 2013-11-13 21:50:57 PST
Comment on attachment 216894 [details]
patch 3.

Attachment 216894 [details] did not pass gtk-ews (gtk):
Output: http://webkit-queues.appspot.com/results/22640232
Comment 8 Geoffrey Garen 2013-11-13 22:02:21 PST
Comment on attachment 216894 [details]
patch 3.

View in context: https://bugs.webkit.org/attachment.cgi?id=216894&action=review

> Source/JavaScriptCore/ChangeLog:3
> +        Add tracking of endColumn for Executables.

Your ChangeLog should include a sentence about what you're going to use this for.

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
> +    if (!m_lineCount)
> +        m_functionEndColumn -= m_functionStartColumn;

Why?

> Source/JavaScriptCore/runtime/VM.cpp:782
> +JSValue VM::scriptBoundsFor(ExecState* exec, JSValue script)

This function doesn't really belong in VM, since it doesn't pertain to maintaining the internal invariants of the VM. I would put this and GetCallerCodeBlockFunctor inside JSInternals::scriptBoundsFor.
Comment 9 Geoffrey Garen 2013-11-13 22:24:13 PST
Comment on attachment 216894 [details]
patch 3.

View in context: https://bugs.webkit.org/attachment.cgi?id=216894&action=review

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
> +    , m_endColumn(UINT_MAX)

Why do we track an end column but not a start column?

> Source/JavaScriptCore/parser/Parser.cpp:1240
> +        unsigned bodyEndColumn = endLocation.startOffset - endLocation.lineStartOffset;
> +        if (endLocation.line == bodyStartLine)
> +            bodyEndColumn = endLocation.startOffset - m_token.m_data.lineStartOffset;

Why is endLocation.lineStartOffset not equal to m_token.m_data.lineStartOffset in this case?

> Source/JavaScriptCore/runtime/Executable.h:375
> +        , m_endColumn(UINT_MAX)

Why is this a property of the executable and not of the SourceCode (m_source)? SourceCode already tracks firstLine, startColumn, startOffset, endOffset, and length.

> Source/JavaScriptCore/runtime/Executable.h:416
> +        ASSERT(endColumn != UINT_MAX);

Why does endColumn get a special signaling value, but startColumn doesn't? Let's not add arbitrary differences to the code just because two different engineers worked on the two column variables.

> Source/JavaScriptCore/runtime/VM.h:463
> +        // This is only used for testing purposes:

If you move this function into window.internals, you won't need this comment. The comment is a good indication that the function is in the wrong place.

> Source/WebCore/testing/Internals.idl:141
> +    // Calling scriptBoundsFor() with null gets the bounds for the script of the current scope.

A more JavaScript-y API way to do things is to make passing no arguments, or passing undefined, return the metadata for the current scope.

> Source/WebCore/testing/Internals.idl:142
> +    [Custom] DOMString scriptBoundsFor(any funcOrNull);

Let's call this "parserMetadata". "script" is not a great word to put in a JS function name because everything has to do with script. "bounds" is not a great word because it isn't a term of art in the parser, and because this function returns more than just boundary information.

> Source/WebCore/testing/JSInternalsCustom.cpp:34
> +JSC::JSValue JSInternals::scriptBoundsFor(JSC::ExecState* exec)

Why can't this go into Internals.cpp, like other functions? That's a good thing to explain in your ChangeLog.

> LayoutTests/js/dom/script-tests/script-start-end-locations.js:1
> +var value = 10; var sum = 1; debug(window.internals.scriptBoundsFor(null)); function f0() { bounds(f0); function f0a() { bounds(f0a); function f0b() { bounds(f0b); eval('sum += value; debug(window.internals.scriptBoundsFor(null));'); } f0b(); } f0a(); } f0();  var x0 = function(key) { bounds(x0);var x0a = function() { bounds(x0a); var x0b = function() { bounds(x0b); sum += value; eval('sum += value; debug(window.internals.scriptBoundsFor(null));'); }; x0b(); }; x0a(); }; x0();

Can you move the explicit calls to bounds, which pass in function names, to a stand-alone part of the file toward the end? Having all the calls in one place will make the test easier to reduce if it starts failing. Mixing the output code like this into the thing that's being debugged makes this pretty hard to work with.
Comment 10 Oliver Hunt 2013-11-14 09:32:39 PST
Comment on attachment 216894 [details]
patch 3.

View in context: https://bugs.webkit.org/attachment.cgi?id=216894&action=review

>> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
>> +        m_functionEndColumn -= m_functionStartColumn;
> 
> Why?

Presumably this is because in unlinked bytecode all locations are relative to the start of the function

>> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
>> +    , m_endColumn(UINT_MAX)
> 
> Why do we track an end column but not a start column?

In unlinked bytecode the start column is always 0 IIRC

We may actually need to change how we track function bodies now, as deconstruction can result in exceptions occurring ahead of the function body

>> Source/JavaScriptCore/runtime/Executable.h:375
>> +        , m_endColumn(UINT_MAX)
> 
> Why is this a property of the executable and not of the SourceCode (m_source)? SourceCode already tracks firstLine, startColumn, startOffset, endOffset, and length.

The endOffset is the offset into the sourceProvider, to get the end column you would have to walk back an arbitrary disantce until you found a newline to get the column

>> Source/JavaScriptCore/runtime/Executable.h:416
>> +        ASSERT(endColumn != UINT_MAX);
> 
> Why does endColumn get a special signaling value, but startColumn doesn't? Let's not add arbitrary differences to the code just because two different engineers worked on the two column variables.

Yeah i agree with this sentiment -- i also don't understand why we would ever not have the start and end column
Comment 11 Mark Lam 2013-11-14 09:45:52 PST
Created attachment 216952 [details]
patch 4: work in progress.

This patch addresses most of Gepff's feedback except for the parts about why the endColumn is not handled as a pair with startColumn (to be investigated and potentially address in another patch update).  Will address specifics of feedback in a separate comment to follow.
Comment 12 Geoffrey Garen 2013-11-14 10:15:32 PST
> >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
> >> +        m_functionEndColumn -= m_functionStartColumn;
> > 
> > Why?
> 
> Presumably this is because in unlinked bytecode all locations are relative to the start of the function

So "!m_lineCount" means "this is unlinked bytecode"? Can we find a better way to say that?

> >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
> >> +    , m_endColumn(UINT_MAX)
> > 
> > Why do we track an end column but not a start column?
> 
> In unlinked bytecode the start column is always 0 IIRC

In that case, it doesn't make sense to me that we would track an end column in unlinked bytecode at all. If the start column is bogus, so is the end column.

> >> Source/JavaScriptCore/runtime/Executable.h:375
> >> +        , m_endColumn(UINT_MAX)
> > 
> > Why is this a property of the executable and not of the SourceCode (m_source)? SourceCode already tracks firstLine, startColumn, startOffset, endOffset, and length.
> 
> The endOffset is the offset into the sourceProvider, to get the end column you would have to walk back an arbitrary disantce until you found a newline to get the column

I think you're saying that we have a SourceCode before we parse the function body, but we don't know what the end column will be until after we've parsed the function body.

Is it really true that the parser doesn't scan for line and column information when it parses a FunctionBodyNode? I don't think that's true.

If it is true, then endColumn should be a property of CodeBlock, not Executable, since CodeBlock is the object that represents our state after we parse the internals of our function body.
Comment 13 Build Bot 2013-11-14 10:33:01 PST
Comment on attachment 216952 [details]
patch 4: work in progress.

Attachment 216952 [details] did not pass win-ews (win):
Output: http://webkit-queues.appspot.com/results/23608012
Comment 14 Oliver Hunt 2013-11-14 10:39:57 PST
(In reply to comment #12)
> > >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
> > >> +        m_functionEndColumn -= m_functionStartColumn;
> > > 
> > > Why?
> > 
> > Presumably this is because in unlinked bytecode all locations are relative to the start of the function
> 
> So "!m_lineCount" means "this is unlinked bytecode"? Can we find a better way to say that?

No, the fact that you're in an UnlinkedCodeBlock is what tells you that :D

I'm not sure what the !m_lineCount is about sorry

> 
> > >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
> > >> +    , m_endColumn(UINT_MAX)
> > > 
> > > Why do we track an end column but not a start column?
> > 
> > In unlinked bytecode the start column is always 0 IIRC
> 
> In that case, it doesn't make sense to me that we would track an end column in unlinked bytecode at all. If the start column is bogus, so is the end column.

We need that end column, so that we can compute the actual end column in the linked version of the code.

On the other hand when i think about this our string match means that the end co,umm should not be a relative offset.  Hmmm - MarkL thoughts?


> > The endOffset is the offset into the sourceProvider, to get the end column you would have to walk back an arbitrary disantce until you found a newline to get the column
> 
> I think you're saying that we have a SourceCode before we parse the function body, but we don't know what the end column will be until after we've parsed the function body.
> 
> Is it really true that the parser doesn't scan for line and column information when it parses a FunctionBodyNode? I don't think that's true.

It does accumulate this info, but it hasn't been recording it (FunctionBodyNode doesn't hang around post codegen)

> 
> If it is true, then endColumn should be a property of CodeBlock, not Executable, since CodeBlock is the object that represents our state after we parse the internals of our function body.

I agree with this whole heartedly, that where all the other source location properties live (the code block accessor functions do m_<property> + m_unlinkedCodeBlock-><property>Offset()
Comment 15 Mark Lam 2013-11-14 10:49:39 PST
(In reply to comment #8)
> (From update of attachment 216894 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=216894&action=review
> 
> > Source/JavaScriptCore/ChangeLog:3
> > +        Add tracking of endColumn for Executables.
> 
> Your ChangeLog should include a sentence about what you're going to use this for.

Fixed in patch 4.
 
> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
> > +    if (!m_lineCount)
> > +        m_functionEndColumn -= m_functionStartColumn;
> 
> Why?

As Oliver pointed out, this is because UnlinkedCodeBlock needs to adjust its positions relative to the start of the function.  However, for columns, there's the additional trickiness that the column only need to be adjusted if the position is in the first line of text.  If the end position is also on the same line as the start position (the entire function is on 1 line), then m_lineCount will be 0.  Yes, this is funky because one would expect m_lineCount to be 1 in that case.  Regardless, this is a pre-existing condition, and I didn't want to do the extra work to change how that works as it is orthogonal to the endColumn tracking implementation.
 
> > Source/JavaScriptCore/runtime/VM.cpp:782
> > +JSValue VM::scriptBoundsFor(ExecState* exec, JSValue script)
> 
> This function doesn't really belong in VM, since it doesn't pertain to maintaining the internal invariants of the VM. I would put this and GetCallerCodeBlockFunctor inside JSInternals::scriptBoundsFor.

Fixed in patch 4.  I did it this way because I originally didn't want to export a lot more of JSC in order to get it to work.  That said, I like that putting the functionality there in JSInternals::scriptBoundsFor does have the added benefit of not having that code linked in (it's in a separate link object) when we're not running tests.  The side effect of this change is that I had to add a ForwardingHeader for CodeBlock.h, export a s_info struct, and moved a bunch of seemingly unrelated header files in the xcode project in order for JSInternals::scriptBoundsFor() (now Internals::parserMetaData()) to build. 


(In reply to comment #9)
> (From update of attachment 216894 [details])
> View in context: https://bugs.webkit.org/attachment.cgi?id=216894&action=review
> 
> > Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
> > +    , m_endColumn(UINT_MAX)
> 
> Why do we track an end column but not a start column?

I did start out with the assumption and goal that startColumn and endColumn should be computed as a set along with startLine and endLine.  However, upon working through the code, I discovered that, contrary to intuition, startColumn and endColumn values come from different sources:
1. startColumn comes from the SourceProvider.
2. endColumn comes from the Lexer / Parser.

This resulted in the 2 values being computed at different phases in the parsing process, which in turn precipitate in the asymmetric implementation.  That said, I find myself wondering how SourceProvider computed startColumn ahead of time.  I wrote that code but have since purged it from memory.  I will go back and take a second look to see if there's a way to have some greater symmetry between startColumn and endColumn.  Note: patch 4 does not include this work.

> > Source/JavaScriptCore/parser/Parser.cpp:1240
> > +        unsigned bodyEndColumn = endLocation.startOffset - endLocation.lineStartOffset;
> > +        if (endLocation.line == bodyStartLine)
> > +            bodyEndColumn = endLocation.startOffset - m_token.m_data.lineStartOffset;
> 
> Why is endLocation.lineStartOffset not equal to m_token.m_data.lineStartOffset in this case?

endlocation.lineStartOffset came from cachedInfo which may have been computed based on a different SourceProvider.  This is my educated guess, and has not been proven yet.  I do know empirically that it can be different.  I will take another look at this to see if I can confirm the cause of the difference.

> > Source/JavaScriptCore/runtime/Executable.h:375
> > +        , m_endColumn(UINT_MAX)
> 
> Why is this a property of the executable and not of the SourceCode (m_source)? SourceCode already tracks firstLine, startColumn, startOffset, endOffset, and length.

Because, contrary to intuition, startColumn and endColumn are not computed in the same places.  I will revisit to see if I can improve their symmetry.

> > Source/JavaScriptCore/runtime/Executable.h:416
> > +        ASSERT(endColumn != UINT_MAX);
> 
> Why does endColumn get a special signaling value, but startColumn doesn't? Let's not add arbitrary differences to the code just because two different engineers worked on the two column variables.
> 
> > Source/JavaScriptCore/runtime/VM.h:463
> > +        // This is only used for testing purposes:
> 
> If you move this function into window.internals, you won't need this comment. The comment is a good indication that the function is in the wrong place.

Fixed in patch 4.

> > Source/WebCore/testing/Internals.idl:141
> > +    // Calling scriptBoundsFor() with null gets the bounds for the script of the current scope.
> 
> A more JavaScript-y API way to do things is to make passing no arguments, or passing undefined, return the metadata for the current scope.

Fixed in patch 4.  I had consider this, but the scriptBoundsFor() name did not lend itself to this.  The much better parserMetaData() name you suggested makes it work nicely.  

> > Source/WebCore/testing/Internals.idl:142
> > +    [Custom] DOMString scriptBoundsFor(any funcOrNull);
> 
> Let's call this "parserMetadata". "script" is not a great word to put in a JS function name because everything has to do with script. "bounds" is not a great word because it isn't a term of art in the parser, and because this function returns more than just boundary information.

Fixed in patch 4.

> > Source/WebCore/testing/JSInternalsCustom.cpp:34
> > +JSC::JSValue JSInternals::scriptBoundsFor(JSC::ExecState* exec)
> 
> Why can't this go into Internals.cpp, like other functions? That's a good thing to explain in your ChangeLog.

Fixed in patch 4.  I thought I needed the more raw interface of a custom implementation.  Hence, I implemented it as a member JSInternals and place it in JSInternalsCustom.cpp.  I've now made it work as Internals::parseMetaData().

> > LayoutTests/js/dom/script-tests/script-start-end-locations.js:1
> > +var value = 10; var sum = 1; debug(window.internals.scriptBoundsFor(null)); function f0() { bounds(f0); function f0a() { bounds(f0a); function f0b() { bounds(f0b); eval('sum += value; debug(window.internals.scriptBoundsFor(null));'); } f0b(); } f0a(); } f0();  var x0 = function(key) { bounds(x0);var x0a = function() { bounds(x0a); var x0b = function() { bounds(x0b); sum += value; eval('sum += value; debug(window.internals.scriptBoundsFor(null));'); }; x0b(); }; x0a(); }; x0();
> 
> Can you move the explicit calls to bounds, which pass in function names, to a stand-alone part of the file toward the end? Having all the calls in one place will make the test easier to reduce if it starts failing. Mixing the output code like this into the thing that's being debugged makes this pretty hard to work with.

1. Can't do this because f0a() and f0b() (and many many other functions like them in this test) are inner functions that are not addressable by name once we get to the end of the file.
2. It's not that hard to work with since:
    a. The test is designed them to be evaluated in order.
    b. You can always find them by proximity to adjacent test cases that are correct.  The correct ones will have start and end positions that will point out the approximate position in the source file where the failed test case is.
    c. The failure will result in a text diff in the test results.  The text diff of the expected result will tell you the exact start and end locations of the function which is failing.

    Any of these 3 can be used to find the failing test case.

Incidentally, I renamed bounds() to dump() in patch 4.


(In reply to comment #12)
> > >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:103
> > >> +        m_functionEndColumn -= m_functionStartColumn;
> > > 
> > > Why?
> > 
> > Presumably this is because in unlinked bytecode all locations are relative to the start of the function
> 
> So "!m_lineCount" means "this is unlinked bytecode"? Can we find a better way to say that?

No.  This code is in UnlinkedFunctionExecutable which means it needs the special handling of unlinked code.  The "!m_lineCount" means that the entire function source is on 1 line.  Hence, the functionEndColumn needs to be adjusted relative to some start offset when the executable gets linked later just like functionStartColumn.  Hence, it needs to be encoded as relative to the start of the function.

Sounds like I should put a comment here for the benefit of folks who are not familiar with this code?

> > >> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:211
> > >> +    , m_endColumn(UINT_MAX)
> > > 
> > > Why do we track an end column but not a start column?
> > 
> > In unlinked bytecode the start column is always 0 IIRC
> 
> In that case, it doesn't make sense to me that we would track an end column in unlinked bytecode at all. If the start column is bogus, so is the end column.

The start column can be computed by adding the start of the SourceProvider to some offset.  The end column on the other hand, if we don't track it, requires a reverse walk of the characters in the source string to find the start of the last line.  This can turn out to be expensive if the source is on a single very long line (as past experience has shown).  Again, the issue here is that contrary to intuition, startColumn and endColumn computation is not symmetrical ... at least not yet.  My next step is to look into whether we can make this symmetrical.

> > >> Source/JavaScriptCore/runtime/Executable.h:375
> > >> +        , m_endColumn(UINT_MAX)
> > > 
> > > Why is this a property of the executable and not of the SourceCode (m_source)? SourceCode already tracks firstLine, startColumn, startOffset, endOffset, and length.
> > 
> > The endOffset is the offset into the sourceProvider, to get the end column you would have to walk back an arbitrary disantce until you found a newline to get the column
> 
> I think you're saying that we have a SourceCode before we parse the function body, but we don't know what the end column will be until after we've parsed the function body.

Yes.  This is true.

> Is it really true that the parser doesn't scan for line and column information when it parses a FunctionBodyNode? I don't think that's true.

Oliver already answered this better than I can.

> If it is true, then endColumn should be a property of CodeBlock, not Executable, since CodeBlock is the object that represents our state after we parse the internals of our function body.

Don't forget that there is also a linked Executable (see ScriptExecutable).  That's where the start line (m_firstLine), end line (m_lastLine), start column (m_startColumn) actually lives.  I'm just continuing with the same organization.  When the CodeBlock wants to compute its position, it used these values from its ownerExecutable.  Hence, there is nothing wrong with putting these values in the Executable whose role is to track the source code (and these values pertain to the source code).  I'd argue that it is the proper place to put them.
Comment 16 Mark Lam 2013-11-14 11:27:37 PST
Created attachment 216963 [details]
patch 5: fixes export bug that the Window build revealed, but still a work in progress.
Comment 17 Geoffrey Garen 2013-11-14 12:03:21 PST
Comment on attachment 216952 [details]
patch 4: work in progress.

View in context: https://bugs.webkit.org/attachment.cgi?id=216952&action=review

> Source/JavaScriptCore/bytecode/UnlinkedCodeBlock.cpp:104
>      , m_functionStartColumn(node->startColumn())
> +    , m_functionEndColumn(node->endColumn())
>      , m_startOffset(node->source().startOffset() - source.startOffset())
>      , m_sourceLength(node->source().length())
>      , m_features(node->features())
>      , m_functionNameIsInScopeToggle(node->functionNameIsInScopeToggle())
>  {
> +    if (!m_lineCount)
> +        m_functionEndColumn -= m_functionStartColumn;
>  }

Bearing in mind the fact that UnlinkedFunctionExecutable wants context-free relocatable source code location information, I have a few comments:

(1) I think this would be more clearly expressed as: m_functionEndColumn(m_lineCount ? node->endColumn() : node->endColumn() - node->startColumn()). It's FunctionBodyNode that holds the unrelocatable data, so it's clearer to express fixup in terms of it.

(2) I would rename "function*" to "unlinked*". So, "m_unlinkedEndColumn". That helps to express why we're doing this transformation, and why we do inverse transformation in CodeBlock.

(3) Why don't we need to fix up m_functionStartColumn? In a context-free relocatable format, I would expect the start column to be 0, or perhaps just the offset between "f" and "{" in "function f() { ...". In other words, what happens in this case?:

     <script>function f() { }</script>
<script>function f() { }</script>

The first function starts on column 14, and the second on column 9. The cache will hold the UnlinkedFunctionExecutable from the first function. Will the second function report that it starts on column 14?

> Source/JavaScriptCore/runtime/Executable.h:366
> +        , m_endColumn(UINT_MAX)

I still think it's weird to initialize and test m_endColumn with sentinel value, but not m_startColumn.
Comment 18 Build Bot 2013-11-14 12:08:07 PST
Comment on attachment 216963 [details]
patch 5: fixes export bug that the Window build revealed, but still a work in progress.

Attachment 216963 [details] did not pass win-ews (win):
Output: http://webkit-queues.appspot.com/results/22298042
Comment 19 Mark Lam 2013-11-18 18:48:42 PST
Created attachment 217262 [details]
patch 6: addressed all feedback + many bug fixes.

This column computation work is turning out to be more complex than the bytecode level breakpoint patch that depends on it.  Anyway, the patch looks much better now and we have more tests to prove that it works.

(In reply to comment #17)
> (1) I think this would be more clearly expressed as: m_functionEndColumn(m_lineCount ? node->endColumn() : node->endColumn() - node->startColumn()). It's FunctionBodyNode that holds the unrelocatable data, so it's clearer to express fixup in terms of it.

I agree, but I ended up renaming m_functionStart/EndColumn to m_unlinkedBodyStart/EndColumn to distinguish it from m_unlinkedFunctionNameStart which points to the start of the function name instead.

> (2) I would rename "function*" to "unlinked*". So, "m_unlinkedEndColumn". That helps to express why we're doing this transformation, and why we do inverse transformation in CodeBlock.

Done.

> (3) Why don't we need to fix up m_functionStartColumn? In a context-free relocatable format, I would expect the start column to be 0, or perhaps just the offset between "f" and "{" in "function f() { ...". In other words, what happens in this case?:
> 
>      <script>function f() { }</script>
> <script>function f() { }</script>
> 
> The first function starts on column 14, and the second on column 9. The cache will hold the UnlinkedFunctionExecutable from the first function. Will the second function report that it starts on column 14?

Previously, both <script> tag cases are reported as starting at column 1 where column 1 is the char immediately following the <script> tag.  This is similar to how eval code is handled.  However, it is unsatisfying and we can do better (as you expected in your comment).  This is now fixed.  The handling of this case is in UnlinkedFunctionExecutable::link() which now adjusts startColumn accordingly.  I've also added test cases to the new test that exercises this code.

> > Source/JavaScriptCore/runtime/Executable.h:366
> > +        , m_endColumn(UINT_MAX)
> 
> I still think it's weird to initialize and test m_endColumn with sentinel value, but not m_startColumn.

Agreed.  m_startColumn is now initialized to UINT_MAX as well, and asserted accordingly at all the appropriate places.

Previously, I also said that I would look into increasing the symmetry of the way we compute startColumn and endColumn.  After revisiting all the cases, I decided that:
1. It doesn't make sense to store the endColumn in SourceCode because the endColumn is computed by the lexer / parser after SourceCode is instantiated.
2. It also doesn't make sense to go back and update SourceCode with an appropriate endColumn value post-parsing.  This is because the SourceCode instance used for a subsequent linkage of the UnlinkedExecutable may be different anyway.  The best place to store the endColumn is still in the UnlinkedExecutable along with the start line, last line, and startColumn that were already there.

That said, I did re-factored the code to be more symmetrical in the handling of startColumn and endColumn.  Now, when a class / struct has an endColumn, it also has a startColumn.  However, in the case of UnlinkedCodeBlock and EvalNode, m_startColumn is always a constant 0.
Comment 20 Geoffrey Garen 2013-11-19 12:06:45 PST
Comment on attachment 217262 [details]
patch 6: addressed all feedback + many bug fixes.

View in context: https://bugs.webkit.org/attachment.cgi?id=217262&action=review

r=me

Much improved. I think this is ready to land, with some fixups below.

> Source/JavaScriptCore/ChangeLog:18
> +           FunctionExecutables. This has been fixed so that they all use base-0
> +           columns. When the executable gets linked, the column is adjusted into
> +           a base-1 value.

Why should unlinked and linked executables disagree on the base used for column counting?

> Source/JavaScriptCore/ChangeLog:42
> +           The second case is needed so that we can add an appropriate to the end

Missing a word after "appropriate".

> Source/JavaScriptCore/ChangeLog:52
> +           This interpretation is janky, but was present before this patch. We can
> +           clean that up later in a nother patch.

"another"

> Source/JavaScriptCore/parser/Parser.h:417
> +    JSTextPosition positionBeforeLastLineFeed() const { return m_lexer->positionBeforeLastLineFeed(); }

We idiomatically call these "newlines" -- as in "positionBeforeLastNewline" -- and not line feeds. "\n" is the newline character.

> Source/JavaScriptCore/runtime/Executable.h:569
> +    static FunctionExecutable* create(VM& vm, const SourceCode& source, UnlinkedFunctionExecutable* unlinkedExecutable, unsigned firstLine, unsigned lastLine, unsigned startColumn, unsigned endColumn, bool bodyExcludesBraces = false)

"excludes = false", in English, would be stated as "does not not include". Double negatives are notoriously hard for humans to read, even though computers can handle them just fine. Better to name this variable "bodyIncludesBraces", and invert the code.

> Source/WebCore/testing/Internals.cpp:1301
> +

Stray newline.

> Source/WebCore/testing/Internals.cpp:1305
> +

Stray newline.

> Source/WebCore/testing/Internals.cpp:1329
> +    } else {
> +        if (executable->isEvalExecutable())
> +            result.append("eval");
> +        else {
> +            ASSERT(executable->isProgramExecutable());
> +            result.append("program");
> +        }
> +    }

Using "else if" here will reduce the indentation.

> Source/WebCore/testing/Internals.idl:141
> +    // Calling scriptBoundsFor() with null gets the bounds for the script of the current scope.

This comment should mention that you can call with no arguments. That's the preferred way to get the metadata for the current scope. This comment should also say "metadata", to match the function name.
Comment 21 Mark Lam 2013-11-19 13:45:54 PST
Thanks for the review.

(In reply to comment #20)
> > Source/JavaScriptCore/ChangeLog:18
> > +           FunctionExecutables. This has been fixed so that they all use base-0
> > +           columns. When the executable gets linked, the column is adjusted into
> > +           a base-1 value.
> 
> Why should unlinked and linked executables disagree on the base used for column counting?

For column (and lines based on a quick glance), the linked executables were always using base-1 values.  The unlinked executables were previously using a mix of base-0 and base-1.  The underlying parser Nodes were using base-0.  Long term, we would like to go towards base-0 for everything.

This change only fixes the inconsistency in unlinked executables, and I chose base-0 because we hope to make everything base-0 eventually.  As for the linked executables, I left them as base-1 for now because changing them to be base-0 is not strictly needed for this task, and also because that change can have other repercussions down-stream (client code that currently expects it to be base-1 will need to be converted as well).  It'll be a task for another day.

Will fix the rest of the issues.
Comment 22 Mark Lam 2013-11-19 13:53:39 PST
Landed in r159520: <http://trac.webkit.org/r159520>.