RESOLVED FIXED 25165
First step towards replacing PCRE and completing regular expression JIT
https://bugs.webkit.org/show_bug.cgi?id=25165
Summary First step towards replacing PCRE and completing regular expression JIT
Gavin Barraclough
Reported 2009-04-13 15:40:23 PDT
Land a new regex interpreter, YARR (yet another regex runtime). This interpreter is a more correct implementation of the ECMA regex language (and closer to Firefox behaviour) than our port of PCRE, and is designed to support a JIT. Currently the interpreter has not be optimized, and as such will likely perform less well than PCRE, as such we will land it disabled. Also land the first cut of NARC (not another regex compiler) – YARR's JIT. This presently supports around the same subset of regular expressions as WREC, however NARC is not designed to fall back to a JIT and as such presently enabling NARC will result in breaking functionality.
Attachments
Yarr! (192.76 KB, patch)
2009-04-13 15:41 PDT, Gavin Barraclough
ggaren: review+
(with review comment fixes, just missing a changelog...) (191.33 KB, patch)
2009-04-13 22:22 PDT, Gavin Barraclough
no flags
Gavin Barraclough
Comment 1 2009-04-13 15:41:57 PDT
Geoffrey Garen
Comment 2 2009-04-13 17:30:23 PDT
Comment on attachment 29442 [details] Yarr! I'm ignoring commented-out code, because Gavin told me that he had a local change to remove it. Let's rename the NARC #define to YARR_JIT. +#if !ENABLE(YARR) ~RegExp(); +#endif You can reduce the #ifdef'ing if you leave in ~RegExp in all builds, and just leave out the implementation in YARR builds. +#if ENABLE(NARC) + jitCompileRegex(globalData, m_regExpJITCode, m_pattern, m_numSubpatterns, m_constructionError, ignoreCase(), multiline()); +#else + UNUSED_PARAM(globalData); + m_regExpBytecode.set(Yarr::byteCompileRegex(m_pattern, m_numSubpatterns, m_constructionError, ignoreCase(), multiline())); +#endif Kind of strange that the JIT version of the code does not use the Yarr namespace explicitly, but the bytecode version of the code does. +CharacterClass* newlineOnce() +{ + static CharacterClass* once = newlineCreate(); + return once; +} DANGER WILL ROBINSON. DANGER. Static initialization like this is not thread-safe. If we want to dynamically allocate these character classes, we need to store them in the parser, or the JSGlobalData. I think function names like this would be easier to read as "newlineCharacterClass()", etc., instead of "newlineOnce()", etc. + // If the pattern contains illegal backreferences reset & reparse. + if (pattern.containsIllegalBackReference()) { Let's be more explicit and mention that we're doing this to match a quirk in Firefox's behavior. + DisjunctionContext* allocDisjunctionContext(ByteDisjunction* disjunction) + { + return new(malloc(sizeof(DisjunctionContext) + (disjunction->m_frameSize - 1) * sizeof(uintptr_t))) DisjunctionContext(); + } + + void freeDisjunctionContext(DisjunctionContext* context) + { + free(context); + } I think this code, and its client, would be cleaner if you made DisjunctionContext refcounted, and used RefPtr to manage its lifetime automatically. + BytecodePattern* compile() + { + regexBegin(m_pattern.m_body->m_callFrameSize); + emitDisjunction(m_pattern.m_body); + regexEnd(); + + BytecodePattern* bytecode = new BytecodePattern(bodyDisjunction, m_pattern.m_ignoreCase, m_pattern.m_multiline, m_pattern.m_numSubpatterns, m_allParenthesesInfo, m_pattern.m_userCharacterClasses); + bodyDisjunction = 0; + m_allParenthesesInfo.shrink(0); + m_pattern.m_userCharacterClasses.clear(); + return bytecode; + } RefPtr here would be good, too. + ~BytecodePattern() + { + delete m_body; + for (unsigned i = 0; i < m_allParenthesesInfo.size(); ++i) + delete m_allParenthesesInfo[i]; + for (unsigned i = 0; i < m_userCharacterClasses.size(); ++i) + delete m_userCharacterClasses[i]; + } And here. +struct RegexJITObject { + void* m_jitCode; + RefPtr<ExecutablePool> m_executablePool; + + RegexJITObject() + : m_jitCode(0) + { + } +}; Should we call this something more specific, like Yarr::CodeBlock? "Object" is a pretty generic name for an object. +// Yarr::parse(): +// Please use "/**/" syntax for multiline comments. + static const unsigned s_maxPatternSize = (1 << 13); I think you want to standardize with the use of MAX_PATTERN_SIZE elsewhere, to avoid differences between the engines. r=me
Gavin Barraclough
Comment 3 2009-04-13 22:22:57 PDT
Created attachment 29460 [details] (with review comment fixes, just missing a changelog...)
Gavin Barraclough
Comment 4 2009-04-14 00:07:40 PDT
Sending JavaScriptCore/ChangeLog Sending JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj Sending JavaScriptCore/assembler/MacroAssemblerX86Common.h Sending JavaScriptCore/assembler/X86Assembler.h Sending JavaScriptCore/runtime/RegExp.cpp Sending JavaScriptCore/runtime/RegExp.h Sending JavaScriptCore/wtf/Platform.h Adding JavaScriptCore/yarr Adding JavaScriptCore/yarr/RegexCompiler.cpp Adding JavaScriptCore/yarr/RegexCompiler.h Adding JavaScriptCore/yarr/RegexInterpreter.cpp Adding JavaScriptCore/yarr/RegexInterpreter.h Adding JavaScriptCore/yarr/RegexJIT.cpp Adding JavaScriptCore/yarr/RegexJIT.h Adding JavaScriptCore/yarr/RegexParser.h Adding JavaScriptCore/yarr/RegexPattern.h Transmitting file data ............... Committed revision 42481.
Note You need to log in before you can comment on or make changes to this bug.