Bug 34350 - [Symbian] More efficient aligned memory allocation for JSC Collector
Summary: [Symbian] More efficient aligned memory allocation for JSC Collector
Status: CLOSED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 528+ (Nightly build)
Hardware: S60 Hardware S60 3rd edition
: P2 Major
Assignee: Siddharth Mathur
URL:
Keywords: Performance, Qt
Depends on:
Blocks: 34208 35784
  Show dependency treegraph
 
Reported: 2010-01-29 15:09 PST by Siddharth Mathur
Modified: 2010-05-07 00:51 PDT (History)
10 users (show)

See Also:


Attachments
First stab (16.56 KB, patch)
2010-03-11 12:10 PST, Siddharth Mathur
no flags Details | Formatted Diff | Diff
Log from device (135.47 KB, text/plain)
2010-03-12 14:58 PST, Siddharth Mathur
no flags Details
Round 2 (16.56 KB, patch)
2010-03-17 15:27 PDT, Siddharth Mathur
laszlo.gombos: review-
Details | Formatted Diff | Diff
Round 3 (16.89 KB, patch)
2010-03-18 12:56 PDT, Siddharth Mathur
no flags Details | Formatted Diff | Diff
Round 4 is a charm :) (16.99 KB, patch)
2010-03-18 17:51 PDT, Siddharth Mathur
no flags Details | Formatted Diff | Diff
Same as v4 with a correctly generated patch file (16.43 KB, patch)
2010-03-18 23:15 PDT, Laszlo Gombos
no flags Details | Formatted Diff | Diff
Final patch (I hope) (16.86 KB, patch)
2010-03-19 06:08 PDT, Siddharth Mathur
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Siddharth Mathur 2010-01-29 15:09:22 PST
The 64k-aligned block requests being made using UserHeap::ChunkHeap() in Collector.cpp are inefficient. 

This is causing the error condition ...
 if (!userChunk)
        CRASH();

to show up frequently even when device has lots of free RAM left. 

We likely need to use RChunk.CreateDisconnectedLocal() and associated commit() and decommit() to..
a) efficiently request alignment with exotic 64k requirement
b) return committed memory back to system pool when Heap::freeBlock() is run.
Comment 1 Siddharth Mathur 2010-01-29 15:16:20 PST
Can someone please assign this bug to me, I am working on a fix.
Comment 2 Siddharth Mathur 2010-03-11 12:10:44 PST
Created attachment 50527 [details]
First stab 

First stab at more efficient memory management on Symbian for JSC's Collector blocks. I am adding this patch as an "early early preview" , it hasn't been tested at all (yet) after my recent refactoring. 

This work aims to make the Javascript engine return RAM to the system more quickly when JSC GC fires, so that app stability+endurance improves. Also, the aim is to reduce fragmentation of UserHeap/RHeap when 64k aligned blocks are requested. 

A walk through the patch: 
a) Collector.cpp now has less Symbian-only code since most of logic is inside new Symbian-only class AlignedBlockAllocator 
b) Collector.cpp now calls AlignedBlockAllocator's alloc(), free(), and destroy() methods to obtain and release BLOCK_SIZE-sized blocks which are also BLOCK_SIZE-aligned (64kB on Symbian) 
c) Collector.h adds a AlignedBlockAllocator* private data member which gets initialized in JSC::Heap's constructor
d) Collector.h adds COLLECTOR_VIRTUALMEM_RESERVATION to hold maximum virtual memory range to be reserved by Symbian RChunk API (currently 32MB for HW, should be 2-4 MB for Emulator) 
e) New files pagedallocator.cpp and pagedallocator.h added to newly created wtf/symbian/
f) new AlignedBlockAllocator class creates a Symbian "chunk" which is a way to reserve a range of virtual addresses, and uses a Map (of bits) to divvy the range into smaller BLOCK_SIZE increments
g) whenever alloc() gets called, a BLOCK_SIZE block is committed , and its addressed returned. 
h) whenever free() gets called, the block is de-committed (unpaged) and the system free RAM goes up immediately. 

TODO: 
i) Thread safety via RFastLock is needed in AlignedBlockAllocator. Currrently, Symbian port is not using multiple JSC threads (is it?), so this probably not super critical
ii) COLLECTOR_VIRTUALMEM_RESERVATION should be automatically set to a small value (e.g 4MB) for WINSCW (emulator) builds. Can we do this via Qmake? 
iii) Sanity check on HW. 
iv) Measure free system RAM using current implementation vs. new implementation when Anomaly/QtLauncher is running V8 benchmarks on Symbian. 
v) cleanup, before patch is worthy of r?
Comment 3 Janne Koskinen 2010-03-12 06:46:09 PST
Having a separate heap for collector cpp is excellent idea and this should be done, your approach is quite hardcore :)

What was the the technical limitation preventing from using RHeap? You say alignment issues, but RHeap cells are aligned and if you only allocate in fixed size blocks (multiple of page size) like here it would work as expected?
Emulators Lazy MMU is not a reason even if free() would take some time.

We need to add some documentation for application developers that their epocheapsize will not be the maximum memory consumed by webkit based app, but a significant portion of system RAM will be consumed by a separate heap. ofc. this is already true with font&bitmap server, but I think it is worth a mention considering the propotions i.e. on 3.1 32MB is more than devices have free in the system at boot i.e. you allow single javascript to eat up everything.

I look forward to see some profiling data.

I'll re-read the code on monday. It needs a lot of prettying like removing all references to qDebug, but I'm sure you knew this and just wanted a early opinions of the idea.
Comment 4 Siddharth Mathur 2010-03-12 14:58:20 PST
Created attachment 50637 [details]
Log from device

Log dump from ARMV5 UREL build showing difference in UserHeap vs self-managed RChunk
Comment 5 Janne Koskinen 2010-03-15 02:03:35 PDT
Allocs/deallocs look good from the log. Did you try using RHeap (not the user heap but a separate heap for collector) ? I'm sceptical of having own heap implementation when there is a heap implementation existing in the system that is known to work. Did you run performance profiling ? i.e. does the instant deallocation cause performance hit compared to the lazy deallocation done by RHeap ?

Code looks good, needs bit tidying.
Comment 6 Janne Koskinen 2010-03-15 02:28:22 PDT
Comment on attachment 50527 [details]
First stab 

> Index: JavaScriptCore/runtime/Collector.cpp
> -    userChunk->Free(reinterpret_cast<TAny*>(block));
> +    m_blockallocator->free(reinterpret_cast<void*>(block));

Why not use the TAny paradigm ?

> +    return (void*)info.iBase;

reintepret

> +    const size_t BLOCK_SIZE = 64 * 1024; // 64k

I want to profile various block sizes on device and their impact.

> +    // Enough virtual address range is available on HW, not much on Emulator.  
> +    const size_t COLLECTOR_VIRTUALMEM_RESERVATION = 32*1024*1024;

Way too much for 3.x devices. Do you know how much is available for the new C5 after boot ?

> +++ JavaScriptCore/wtf/symbian/pagedallocator.cpp	(revision 0)
> +     // Get system's page size value.
> +     SYMBIAN_PAGESIZE(m_pageSize); 

called once no need to macro this.

> +     __ASSERT_ALWAYS(SYMBIAN_ROUNDUPTOMULTIPLE(m_blockSize, m_pageSize), User::Panic(_L("AlignedBlockAllocator1"), KErrArgument));

Use webkit's assertions ?

> +     qDebug("AlignedBlockAllocator: chunk handle = %d,  m_chunk.Base() = %d, m_offset = %d ", m_chunk.Handle(), m_chunk.Base(), m_offset);

Cannot be used in webkit dll. Even if built for Qt (there is ABI issue in Symbian).

> +void* AlignedBlockAllocator::alloc()
> +        SYMBIAN_FREERAM(freeRam);
> +        qDebug("AlignedBlockAllocator::Alloc pseudo-OOM %d kB", freeRam / 1024);

Largest available cell would be the interesting value, not how much RAM in total we have available.

> +    TInt ret = m_chunk.Decommit(m_offset + m_blockSize * idx, m_blockSize);
> +    if ( ret != KErrNone )
> +       qDebug("AlignedBlockAllocator::Decommit %d", ret);

what could be done ? webkit code tends to ignore errors, so I guess this is fine.

> +void AlignedBlockAllocator::destroy() 
> +{
> +    for (TInt i = 0 ; i < m_map.numBits; i++) { 
> +        if (m_map.get(i)) { 
> +            TInt ret = m_chunk.Decommit(m_offset + i* m_blockSize, m_blockSize);
> +            if (ret == KErrNone) 
> +                m_map.clear(i);

Can't you just close the chunk and be done with it ? or will this leave data allocated?

> +    m_map.bits = 0;

d'tor 0ing is always useless assuming non-linear allocator.

> +#ifndef pagedallocator_h

> +        ~AlignedBlockAllocator();

virtual ?

> +        void destroy();
> +            TInt findFree()

can be set to const
Comment 7 Siddharth Mathur 2010-03-15 18:55:23 PDT
(In reply to comment #5)
> Allocs/deallocs look good from the log. Did you try using RHeap (not the user
> heap but a separate heap for collector) ? I'm sceptical of having own heap
> implementation when there is a heap implementation existing in the system that
> is known to work. Did you run performance profiling ? i.e. does the instant
> deallocation cause performance hit compared to the lazy deallocation done by
> RHeap ?
> 

Blocks used by collector don't have to be on the application's heap at all, for any port. This is because JSC::Heap hides the implementation detail of packing lots of Cell objects back to back on a large contiguous block of aligned memory. The class name JSC::Heap is a little misleading IMHO ;) 

The new memory allocator be in introduced in Symbian^4 (RHeap+UserHeap replacement) will not support alignment requests of 64kB, according to Andrew Thoelke, the author of much of that code. So decoupling JSC Collector's memory management needs from RHeap/UserHeap is the most portable option for existing 3.x-5.0 devices, upcoming Symbian^3 and Symbian^4 products based on my discussions with him.

More importantly, the "Webkit CRASH" panic dialog from CRASH() macro currently shows up in even very simple JS use cases, with lots of free RAM left on the device, so UserHeap is not scaling well at all (due to what the logs show).
Comment 8 Siddharth Mathur 2010-03-17 08:57:26 PDT
For anyone trying out the patch, I must point out that AlignedBlockAllocator::free(), the divisor in the first line of code where "idx" is calculated should be m_blockSize (and not m_offset). Sorry about the error. 

I will upload an updated patch later today with Janne's feedback incorporated.
Comment 9 Siddharth Mathur 2010-03-17 15:27:18 PDT
Created attachment 50966 [details]
Round 2 

v2 of patch with 1 problem fixed in free() implementation. 
Except for removing qDebugs, this is ready to be opened up for r? from my side.
Comment 10 WebKit Review Bot 2010-03-18 08:23:23 PDT
Attachment 50966 [details] did not pass style-queue:

Failed to run "WebKitTools/Scripts/check-webkit-style" exit_code: 1
JavaScriptCore/runtime/Collector.h:195:  BLOCK_SIZE is incorrectly named. Don't use underscores in your identifier names.  [readability/naming] [4]
JavaScriptCore/wtf/symbian/pagedallocator.cpp:29:  Found header this file implements before WebCore config.h. Should be: config.h, primary header, blank line, and then alphabetically sorted.  [build/include_order] [4]
Total errors found: 2 in 5 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 11 Laszlo Gombos 2010-03-18 09:29:36 PDT
Comment on attachment 50966 [details]
Round 2 

Great patch Siddharth !

The direction is good to optimize the Collector allocator to Symbian and move some Symbian logic out from Collector.cpp and isolate it into its own class.

Maybe BlockAllocatorSymbian (with capitalization) is a better name than pagedallocator.h. It is good to carry the Platform name in the file name as well, so avoid name collisions with same file names in different directories.  

> Index: JavaScriptCore/runtime/Collector.cpp
> ===================================================================
> -#if OS(SYMBIAN)
> -const size_t MAX_NUM_BLOCKS = 256; // Max size of collector heap set to 16 MB
> -static RHeap* userChunk = 0;
> -#endif

> -#endif // OS(SYMBIAN)
> +    // A convenience class to allocate aligned blocks efficiently
> +    m_blockallocator = new WTF::AlignedBlockAllocator(RESERVATION, BLOCK_SIZE);
> +#endif

Moving the chunk from static to a member seems like a good idea.

>  #elif OS(SYMBIAN)
> -    static void* stackBase = 0;
> -    if (stackBase == 0) {
> -        TThreadStackInfo info;
> -        RThread thread;
> -        thread.StackInfo(info);
> -        stackBase = (void*)info.iBase;
> -    }
> -    return (void*)stackBase;
> +    TThreadStackInfo info;
> +    RThread thread;
> +    thread.StackInfo(info);
> +    return reinterpret_cast<void*>(info.iBase);

This change does not seems to be strictly related to the rest of the patch, caching the stackBase might be a good idea. I would propose to take this change out from this patch.

> Index: JavaScriptCore/runtime/Collector.h
> ===================================================================
> -#if OS(WINCE) || OS(SYMBIAN)
> +#if OS(WINCE)
>      const size_t BLOCK_SIZE = 64 * 1024; // 64k
> +#elif OS(SYMBIAN)
> +    const size_t BLOCK_SIZE = 64 * 1024; // 64k
> +    // Virtual address range to reserve   
> +    const size_t RESERVATION = JSCCOLLECTOR_VIRTUALMEM_RESERVATION;
>  #else
>      const size_t BLOCK_SIZE = 64 * 4096; // 256k
>  #endif

Can we move RESERVATION = JSCCOLLECTOR_VIRTUALMEM_RESERVATION part into Collector.cpp ? 

> Index: JavaScriptCore/wtf/symbian/pagedallocator.cpp
> ===================================================================

I think it would be desired to remove Qt dependency from the Allocator. We should either use the WebKit logging mechanism or remove tracing.

> --- JavaScriptCore/wtf/symbian/pagedallocator.cpp	(revision 0)
> +++ JavaScriptCore/wtf/symbian/pagedallocator.cpp	(revision 0)
> @@ -0,0 +1,138 @@
> +/*
> + * Copyright (C) 2010 Nokia Corporation and/or its subsidiary(-ies)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1.  Redistributions of source code must retain the above copyright
> + *     notice, this list of conditions and the following disclaimer. 
> + * 2.  Redistributions in binary form must reproduce the above copyright
> + *     notice, this list of conditions and the following disclaimer in the
> + *     documentation and/or other materials provided with the distribution. 
> + * 3.  Neither the name of Apple Computer, Inc. ("Apple") nor the names of
> + *     its contributors may be used to endorse or promote products derived
> + *     from this software without specific prior written permission. 
> + *
> + * THIS SOFTWARE IS PROVIDED BY APPLE AND ITS CONTRIBUTORS "AS IS" AND ANY
> + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> + * DISCLAIMED. IN NO EVENT SHALL APPLE OR ITS CONTRIBUTORS BE LIABLE FOR ANY
> + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#include "pagedallocator.h"
> +#include <QDebug>
> +
> +namespace WTF {
> +
> +/** Efficiently allocates blocks of size blockSize with blockSize alignment. 
> + * Primarly designed for JSC Collector's needs. 
> + * Not thread-safe.    
> + */
> +AlignedBlockAllocator::AlignedBlockAllocator(TUint32 reservationSize, TUint32 blockSize )
> +    : m_reservation(reservationSize), 
> +      m_blockSize(blockSize)
> +{
> +
> +     // Get system's page size value.
> +     SYMBIAN_PAGESIZE(m_pageSize); 
> +     // We only accept multiples of system page size for both initial reservation and the alignment/block size
> +     m_reservation = SYMBIAN_ROUNDUPTOMULTIPLE(m_reservation, m_pageSize);
> +     __ASSERT_ALWAYS(SYMBIAN_ROUNDUPTOMULTIPLE(m_blockSize, m_pageSize), User::Panic(_L("AlignedBlockAllocator1"), KErrArgument));
> +     
> +     // Calculate max. bit flags we need to carve a reservationSize range into blockSize-sized blocks
> +     m_map.numBits = m_reservation / m_blockSize;   
> +     const TUint32 bitsPerWord = 8*sizeof(TUint32); 
> +     const TUint32 numWords = (m_map.numBits + bitsPerWord -1) / bitsPerWord; 
> +     qDebug("AlignedBlockAllocator: %d bit flags (%d integers) to represent %d MB reservation with %d block size", 
> +             m_map.numBits, numWords, m_reservation / (1024*1024), m_blockSize);
> +
> +     m_map.bits = new TUint32[numWords];
> +     __ASSERT_ALWAYS(m_map.bits, User::Panic(_L("AlignedBlockAllocator2"), KErrNoMemory));
> +     m_map.clearAll();
> +     
> +     // Open a Symbian RChunk, and reserve requested virtual address range   
> +     // Any thread in this process can operate this rchunk due to EOwnerProcess access rights. 
> +     TInt ret = m_chunk.CreateDisconnectedLocal(0 , 0, (TInt)m_reservation , EOwnerProcess);  
> +     if (ret != KErrNone) 
> +         User::Panic(_L("AlignedBlockAllocator3"), ret);
> +       
> +     // This is the offset to m_chunk.Base() required to make it m_blockSize-aligned
> +     m_offset = SYMBIAN_ROUNDUPTOMULTIPLE(TUint32(m_chunk.Base()), m_blockSize) - TUint(m_chunk.Base()); 
> +     qDebug("AlignedBlockAllocator: chunk handle = %d,  m_chunk.Base() = %x, m_offset = %d ", m_chunk.Handle(), m_chunk.Base(), m_offset);
> +
> +}
> +
> +void* AlignedBlockAllocator::alloc()
> +{
> +
> +    TInt  freeRam = 0; 
> +    void* address = 0;
> +    
> +    // Look up first free slot in bit map
> +    const TInt freeIdx = m_map.findFree();
> +        
> +    if (freeIdx < 0) {
> +        // All bits are already set ("Committed"), so no more blocks 
> +        // can be carved out of the virtual address range.
> +        SYMBIAN_FREERAM(freeRam);
> +        qDebug("AlignedBlockAllocator::Alloc pseudo-OOM %d kB", freeRam / 1024);
> +        return address;

For readability it might be better to explicitly return 0 here.

> +    }
> +        
> +    qDebug("AlignedBlockAllocator: freeIdx %d, Committing at offset = %d", freeIdx, m_offset + (m_blockSize * freeIdx));
> +    TInt ret = m_chunk.Commit(m_offset + (m_blockSize * freeIdx), m_blockSize);
> +    if (ret != KErrNone) { 
> +        qDebug("AlignedBlockAllocator Commit: %d", ret);
> +        return address;

Same here.

> +    }
> +        
> +    // Updated bit to mark region as in use. 
> +    m_map.set(freeIdx); 
> +    
> +    // Calculate address of committed region (block)
> +    address = (void*)( (m_chunk.Base() + m_offset) + (TUint)(m_blockSize * freeIdx) );
> +    qDebug("AlignedBlockAllocator alloc:  address = %x, m_chunk.Base() = %d, m_offset = %d", address, m_chunk.Base(), m_offset);
> +    
> +    return address;
> +}
> +
> +void AlignedBlockAllocator::free(void* block)
> +{
> +    // Calculate index of block to be freed
> +    TInt idx = TUint(static_cast<TUint8*>(block) - m_chunk.Base() - m_offset) / m_blockSize;
> +    qDebug("AlignedBlockAllocator free: calculated idx %d for block %x using m_chunk.Base() = %x, m_offset = %d", idx, static_cast<TUint8 *>(block), m_chunk.Base(), m_offset);
> +    
> +    __ASSERT_DEBUG(idx >= 0 && idx < m_map.numBits, User::Panic(_L("AlignedBlockAllocator4"), KErrCorrupt)); // valid index check
> +    __ASSERT_DEBUG(m_map.get(idx), User::Panic(_L("AlignedBlockAllocator5"), KErrCorrupt)); // in-use flag check    
> +    
> +    // Return committed region to system RAM pool (the physical RAM becomes usable by others)
> +    TInt ret = m_chunk.Decommit(m_offset + m_blockSize * idx, m_blockSize);
> +    if ( ret != KErrNone )
> +       qDebug("AlignedBlockAllocator::Decommit %d", ret);
> +        
> +    // mark this available again
> +    m_map.clear(idx); 
> +}
> +
> +void AlignedBlockAllocator::destroy() 
> +{
> +    // release everything!
> +    m_chunk.Decommit(0, m_chunk.MaxSize());

Maybe clear the m_map bitflag here as well.

> +}
> +
> +AlignedBlockAllocator::~AlignedBlockAllocator()
> +{
> +    destroy();
> +    m_chunk.Close();
> +    delete [] m_map.bits;
> +}
> +
> +} // end of namespace
> +
> Index: JavaScriptCore/wtf/symbian/pagedallocator.h
> ===================================================================
> --- JavaScriptCore/wtf/symbian/pagedallocator.h	(revision 0)
> +++ JavaScriptCore/wtf/symbian/pagedallocator.h	(revision 0)
> @@ -0,0 +1,120 @@
> +/*
> + * Copyright (C) 2010 Nokia Corporation and/or its subsidiary(-ies)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1.  Redistributions of source code must retain the above copyright
> + *     notice, this list of conditions and the following disclaimer. 
> + * 2.  Redistributions in binary form must reproduce the above copyright
> + *     notice, this list of conditions and the following disclaimer in the
> + *     documentation and/or other materials provided with the distribution. 
> + * 3.  Neither the name of Apple Computer, Inc. ("Apple") nor the names of
> + *     its contributors may be used to endorse or promote products derived
> + *     from this software without specific prior written permission. 
> + *
> + * THIS SOFTWARE IS PROVIDED BY APPLE AND ITS CONTRIBUTORS "AS IS" AND ANY
> + * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
> + * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
> + * DISCLAIMED. IN NO EVENT SHALL APPLE OR ITS CONTRIBUTORS BE LIABLE FOR ANY
> + * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> + * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
> + * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
> + * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
> + * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + */
> +
> +#ifndef pagedallocator_h
> +#define pagedallocator_h
> +
> +#include <e32cmn.h>
> +#include <e32std.h>
> +#include <hal.h>
> +
> +
> +#define SYMBIAN_PAGESIZE(x) (HAL::Get(HALData::EMemoryPageSize, x));
> +#define SYMBIAN_FREERAM(x)  (HAL::Get(HALData::EMemoryRAMFree, x));
> +#define SYMBIAN_ROUNDUPTOMULTIPLE(x, multipleof)    ( (x + multipleof - 1) & ~(multipleof - 1) )
> +
> +// Set sane defaults if -D<flagname=value> wasn't provided via compiler args
> +#ifndef JSCCOLLECTOR_VIRTUALMEM_RESERVATION
> +#if defined(__WINS__) 
> +    // Emulator has limited virtual addresses space
> +    #define JSCCOLLECTOR_VIRTUALMEM_RESERVATION (4*1024*1024)
> +#else
> +    // HW has plenty of virtual addresses
> +    #define JSCCOLLECTOR_VIRTUALMEM_RESERVATION (64*1024*1024)
> +#endif
> +#endif
> +
> +namespace WTF {
> +
> +/** 
> + *  Allocates contiguous region of size blockSize with blockSize-aligned address. 
> + *  blockSize must be a multiple of system page size (typically 4K on Symbian/ARM)
> + *
> + *  @param reservationSize Virtual address range to be reserved upon creation of chunk (bytes).
> + *  @param blockSize Size of a single allocation. Returned address will also be blockSize-aligned.
> + */
> +class AlignedBlockAllocator {
> +    public:
> +        AlignedBlockAllocator(TUint32 reservationSize, TUint32 blockSize);
> +        ~AlignedBlockAllocator();
> +        void destroy();
> +        void* alloc();
> +        void free(void* data);
> +    
> +    private: 
> +        RChunk   m_chunk; // Symbian chunk that lets us reserve/commit/decommit
> +        TUint    m_offset; // offset of first committed region from base 
> +        TInt     m_pageSize; // cached value of system page size, typically 4K on Symbian
> +        TUint32  m_reservation;
> +        TUint32  m_blockSize;  
> +
> +        // Tracks comitted/decommitted state of a blockSize region
> +        struct {
> +            
> +            TUint32 *bits; // array of bit flags 
> +            TUint32  numBits; // number of regions to keep track of
> +            
> +            bool get(TUint32 n) const
> +            {
> +                return !!(bits[n >> 5] & (1 << (n & 0x1F)));
> +            }
> +            
> +            void set(TUint32 n)
> +            {
> +                bits[n >> 5] |= (1 << (n & 0x1F));
> +            }
> +            
> +            void clear(TUint32 n)
> +            {
> +                bits[n >> 5] &= ~(1 << (n & 0x1F));
> +            }
> +            
> +            void clearAll()
> +            {
> +               for (TUint32 i = 0; i < numBits; i++)
> +                    clear(i);
> +            }
> +            
> +            TInt findFree() const
> +            {
> +                for (TUint32 i = 0; i < numBits; i++) {
> +                    if (!get(i)) 
> +                        return i;
> +                }
> +                return -1;
> +            }

I would prefer to have some of these functions implemented in a cpp file not in a .h file, but I noticed Collector.h has a similar model. Up to you if you want to change this or not.

> +           
> +        } m_map;  
> +
> +};
> + 
> +}
> +
> +#endif
> +
> +


The style bot also picked up a few minor issues.

I'm OK with this change if you successfully run jsc module test with this change. Can you confirm ? 

Also, do you have performance numbers (before and after) with this change ? 

r- from me, just to see if some of the comments can be addressed, but the approach and the algorithm looks very good.
Comment 12 Laszlo Gombos 2010-03-18 10:01:33 PDT
One for comment, the patch of course needs a ChangeLog with a proper description.
Comment 13 Siddharth Mathur 2010-03-18 12:56:22 PDT
Created attachment 51077 [details]
Round 3

Round 3 with feedback incorporated. Thanks!
Comment 14 Simon Hausmann 2010-03-18 13:54:17 PDT
> +    # block allocator for JSC Collector's needs 
> +    SOURCES += wtf/symbian/BlockAllocatorSymbian.cpp
> +    INCLUDEPATH += wtf/symbian/BlockAllocatorSymbian.h

It should be HEADERS instead of INCLUDEPATH.


> +#include <wtf/symbian/pagedallocator.h>

This should be a studily caps inclusion.

> +const size_t RESERVATION_SIZE = JSCCOLLECTOR_VIRTUALMEM_RESERVATION;

I think coding style says camel case here.

> +    // A convenience class to allocate aligned blocks efficiently
> +    m_blockallocator = new WTF::AlignedBlockAllocator(RESERVATION_SIZE, BLOCK_SIZE);
> +#endif

Out of curiousity: Any reason to allocate the allocator itself on the heap? Why
not store it as plain member? :)


> +#if OS(SYMBIAN)
> +#include <wtf/symbian/pagedallocator.h>
> +#endif

If you store the allocator as pointer, then a forward declaration is enough
here.
Otherwise this inclusion is indeed needed, but then using studily caps :)
Comment 15 Siddharth Mathur 2010-03-18 17:51:31 PDT
Created attachment 51116 [details]
Round 4 is a charm :)

- Addressed Simon's feedback. 
- Fixed bad header inclusion 
- Changed the way JavascriptCore.pri includes the source file
- Added guard OS(SYMBIAN) flag around all of BlockAllocatorSymbian.cpp
Comment 16 Laszlo Gombos 2010-03-18 23:13:42 PDT
Comment on attachment 51116 [details]
Round 4 is a charm :)

Changes are great, but it looks like the patch file is generated incorrectly.

I will cancel the review and and fix up the patch file.
Comment 17 Laszlo Gombos 2010-03-18 23:15:24 PDT
Created attachment 51123 [details]
Same as v4 with a correctly generated patch file
Comment 18 Siddharth Mathur 2010-03-19 06:00:14 PDT
(In reply to comment #17)
> Created an attachment (id=51123) [details]
> Same as v4 with a correctly generated patch file

Bugzilla denies me rights to make it r?. Would need an Admin or Laszlo to make it so...
Comment 19 Siddharth Mathur 2010-03-19 06:08:52 PDT
Created attachment 51148 [details]
Final patch (I hope) 

Leading Webkit/ removed from paths in the patch file, per Laszlo's suggestion.
Comment 20 Simon Hausmann 2010-03-19 06:42:38 PDT
Comment on attachment 51148 [details]
Final patch (I hope) 

A few minor coding style nitpicks. I don't think they should block the review, but they should be fixed before landing.

> +AlignedBlockAllocator::AlignedBlockAllocator(TUint32 reservationSize, TUint32 blockSize )

Space before ).

> +     const TUint32 bitsPerWord = 8*sizeof(TUint32); 

Space before and after *

> +    // Calculate address of committed region (block)
> +    address = (void*)( (m_chunk.Base() + m_offset) + (TUint)(m_blockSize * freeIdx) );

Space between ) ) and ( (.

> +            TUint32 *bits; // array of bit flags 

* placement.
Comment 21 Laszlo Gombos 2010-03-19 07:39:35 PDT
I think we should have some sunspider numbers before we consider landing this patch. Otherwise it looks good to me.
Comment 22 Siddharth Mathur 2010-03-19 16:30:50 PDT
(In reply to comment #21)
> I think we should have some sunspider numbers before we consider landing this
> patch. Otherwise it looks good to me.

Since this patch is about more efficient memory management, I wasn't expecting a speed test like Sunspider to be a useful yardstick. But I ran it anyway because it is good to know that things aren't slower. 

Strangely enough, after using the proposed patch, Sunspider is actually faster in my testing on N97 with Anomaly browser: 
QtWebkit 4.6.2 vanilla: total time = 86937 ms
QtWebkit 4.6.2+patch : total time  = 59041 ms

Specifically, the bitops, controlflow and string tests are much faster. Since I am finding such speed improvement hard to believe, I will ask some colleague to run it independently. :)

As far as memory usage goes with Sunspider, the peak RAM of Anomaly usage goes down just a bit with the patched version from 10.9MB to 10.5MB. Average RAM usage goes down from 7MB to 6.6MB. This small decrease is expected from Sunspider because it uses only 1 .js file, which causes empty block to occur less frequently after garbage collection. 

V8 benchmark shows much better results for memory usage since it has 1 JS script for each test, and hence you see many blocks getting released back to system pool whenever the next test starts starts. The log dump I attached previously to this bug was from a V8 test run, you can see JSC::Heap::freeBlock() being called many times to release blocks to system pool.
Comment 23 Siddharth Mathur 2010-03-22 08:00:09 PDT
(In reply to comment #22)
 
> Strangely enough, after using the proposed patch, Sunspider is actually faster
> in my testing on N97 with Anomaly browser: 
> QtWebkit 4.6.2 vanilla: total time = 86937 ms
> QtWebkit 4.6.2+patch : total time  = 59041 ms

I rested the Sunspider numbers with Qt 4.6.0, and the numbers with the patch are still lower. (total 91191ms with 4.6.0 vanilla, 58419ms with the 4.6.2+patch binary) .
Comment 24 Laszlo Gombos 2010-03-22 08:19:12 PDT
Comment on attachment 51148 [details]
Final patch (I hope) 

lgtm, great patch, thanks !
Comment 25 WebKit Commit Bot 2010-03-22 08:59:19 PDT
Comment on attachment 51148 [details]
Final patch (I hope) 

Rejecting patch 51148 from commit-queue.

Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari', '--exit-after-n-failures=1', '--quiet']" exit_code: 1
Running build-dumprendertree
Compiling Java tests
make: Nothing to be done for `default'.
Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
Testing 12518 test cases.
fast/loader/cancel-load-during-port-block-timer.html -> failed

Exiting early after 1 failures. 7708 tests run.
130.34s total testing time

7707 test cases (99%) succeeded
1 test case (<1%) had incorrect layout
3 test cases (<1%) had stderr output

Full output: http://webkit-commit-queue.appspot.com/results/1005107
Comment 26 Siddharth Mathur 2010-03-22 09:22:20 PDT
(In reply to comment #25)
> (From update of attachment 51148 [details])
> Rejecting patch 51148 from commit-queue.
> 
> Failed to run "['WebKitTools/Scripts/run-webkit-tests', '--no-launch-safari',
> '--exit-after-n-failures=1', '--quiet']" exit_code: 1
> Running build-dumprendertree
> Compiling Java tests
> make: Nothing to be done for `default'.
> Running tests from /Users/eseidel/Projects/CommitQueue/LayoutTests
> Testing 12518 test cases.
> fast/loader/cancel-load-during-port-block-timer.html -> failed
> 
> Exiting early after 1 failures. 7708 tests run.
> 130.34s total testing time
> 
> 7707 test cases (99%) succeeded
> 1 test case (<1%) had incorrect layout
> 3 test cases (<1%) had stderr output
> 
> Full output: http://webkit-commit-queue.appspot.com/results/1005107

Umm, after taking a quick look at cancel-load-during-port-block-timer.html, I am not sure if this is related to the Symbian-only nature of the patch...?
Comment 27 Eric Seidel (no email) 2010-03-22 10:17:46 PDT
Comment on attachment 51148 [details]
Final patch (I hope) 

Looks like a flaky test: bug 36425.
Comment 28 WebKit Commit Bot 2010-03-22 17:28:35 PDT
Comment on attachment 51148 [details]
Final patch (I hope) 

Clearing flags on attachment: 51148

Committed r56370: <http://trac.webkit.org/changeset/56370>
Comment 29 WebKit Commit Bot 2010-03-22 17:28:42 PDT
All reviewed patches have been landed.  Closing bug.
Comment 30 Balazs Kelemen 2010-03-31 03:45:16 PDT
/** Efficiently allocates blocks of size blockSize with blockSize alignment. 
 * Primarly designed for JSC Collector's needs. 
 * Not thread-safe.    
 */

Is it ok here to be not thread safe? As I know JavaScripCore is thread safe, what is needed by (for example) workers.
Comment 31 Siddharth Mathur 2010-03-31 15:40:17 PDT
(In reply to comment #30)
> /** Efficiently allocates blocks of size blockSize with blockSize alignment. 
>  * Primarly designed for JSC Collector's needs. 
>  * Not thread-safe.    
>  */
> 
> Is it ok here to be not thread safe? As I know JavaScripCore is thread safe,
> what is needed by (for example) workers.

Hi Balazs, 
The rationale was that since JSC_MULTIPLE_THREADS is OFF on Symbian in Collector.cpp and friends, the addition of thread safety can wait a bit. I will check if Web Workers is affected, because seems to be enabled in Qt 4.6.2 for Symbian (the latest Qt release) . 
Bottom line is that if the same JSC::Heap object is used by many JSC threads concurrently, then RFastLock based protection needs to be added in the alloc() code when the bit map is searched for next available slot.
Comment 32 Siddharth Mathur 2010-04-01 12:48:44 PDT
(In reply to comment #31)
> (In reply to comment #30)
> > /** Efficiently allocates blocks of size blockSize with blockSize alignment. 
> >  * Primarly designed for JSC Collector's needs. 
> >  * Not thread-safe.    
> >  */
> > 
> > Is it ok here to be not thread safe? As I know JavaScripCore is thread safe,
> > what is needed by (for example) workers.
> 
> Hi Balazs, 
> The rationale was that since JSC_MULTIPLE_THREADS is OFF on Symbian in
> Collector.cpp and friends, the addition of thread safety can wait a bit. I will
> check if Web Workers is affected, because seems to be enabled in Qt 4.6.2 for
> Symbian (the latest Qt release) . 
> Bottom line is that if the same JSC::Heap object is used by many JSC threads
> concurrently, then RFastLock based protection needs to be added in the alloc()
> code when the bit map is searched for next available slot.

After quick check of Web Workers using the URL below, with Qt 4.6.2 on Symbian/S60 emulator, I see that each Worker thread gets its on own JSC::Heap object. So thread-safety isn't likely required in the short term. 

[1]http://www.whatwg.org/demos/workers/stocks/page.html

Code flow (for each new Worker thread) : WorkerScriptController c'tor -> JSCGlobalData::create(isShared=false) -> JSC::Heap c'tor
Comment 33 Balazs Kelemen 2010-04-01 15:54:05 PDT
Good to know. Thank you for sharing your founds with me :)
Comment 34 Simon Hausmann 2010-05-07 00:51:34 PDT
for reference: this "feature" cherry-picked into qtwebkit-4.6 with commit ecfa4583e573ce4dff1f0df12f6bdba3022376e5