Bug 16217 - PCRE offset vector handling need to be re-written
Summary: PCRE offset vector handling need to be re-written
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-30 18:32 PST by Eric Seidel (no email)
Modified: 2011-07-21 19:08 PDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Eric Seidel (no email) 2007-11-30 18:32:33 PST
PCRE offset vector handling need to be re-written

PCRE has this *awful* concept of an "offset_vector" which is set on the match_data and passed in to jsRegExpExec.  We've already partially disabled the code in jsRegExpExec to second-guess the caller who might pass in a too-small offsets vector.  Now we need to change how the offsets vector is constructed and used.

First, how it's used:

offsets_vector is an int array of length 3n where n is the expected number of sub-string matches.

The first 2n ints are used to store start/stop offsets into the string for any matched sub-strings.  The last 1n ints are "private" and I'm not entirely sure what they're always used for, I believe they mostly store offset lengths during exec.

This should be replaced by an array of SubstringMatch { int startOffset; int endOffset; } structs an some auxiliary data store for the extra temporary offset data.  I assume that the current solution came into being to try and have the best data locality, and to try and reduce the number of mallocs, but having 1 fewer mallocs per call is totally not worth this compexity.  If we decide it is, we can hide this hack behind some sort of OffsetsStorage structure.

i suggest that we continue to pass in the SubstringMatch array by pointer (SubstringMatch* matches) even if the callers use a vector.  The reason being that that allows the callers to use inline capacity vectors w/o needing to copy to a generic Vector<T> for the call,  or have the SPI have some fixed size vector in it's argument lists, ick.  The offset support information can be allocated internally.  Alternaitvely all this allocation could be abstracted into a single class which the caller is required to create and pass in (or delete when it's passed out).  Example: OffsetsStorage* offsets = OffsetsStorage(number); would create the right allocation, and then internally we can access things using methods like setSubstringStart(number, offset).

I'd be curious if other PCRE hackers have other suggestions.  I think one of the above mentioned options should work and would be a huge cleanup from what we currently have.
Comment 1 Gavin Barraclough 2011-07-21 19:08:40 PDT
PCRE has been removed.