Bug 15585 - speed up sparse arrays by using a custom map
Summary: speed up sparse arrays by using a custom map
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: JavaScriptCore (show other bugs)
Version: 523.x (Safari 3)
Hardware: Mac OS X 10.4
: P2 Enhancement
Assignee: Darin Adler
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-20 19:14 PDT by Darin Adler
Modified: 2007-10-21 13:26 PDT (History)
1 user (show)

See Also:


Attachments
patch with change log (15.90 KB, patch)
2007-10-20 19:29 PDT, Darin Adler
mjs: review+
Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2007-10-20 19:14:18 PDT
We can speed up SunSpider 10% by making our sparse array support use an integer-based map instead of converting everything to and from strings.
Comment 1 Darin Adler 2007-10-20 19:29:34 PDT
Created attachment 16754 [details]
patch with change log
Comment 2 Maciej Stachowiak 2007-10-21 03:10:28 PDT
Comment on attachment 16754 [details]
patch with change log

The code looks good and the speed gain looks good so I'm saying r=me.

However, I don't think any of the current SunSpider test cases create true sparse arrays. What's happening is that they are making dense arrays which are bigger than our arbitrary sparse array cutoff. I think we should pick the storage strategy based on some measure of the density ratio of the array. In particular, if the array has even a quarter of its slots filled, then the direct storage is more space-efficient than the overflow map. Indeed it is probably good to use the dense storage at even as low as 1/8 of slots filled. We should consider extending storage length so long as this density ratio is maintained, and perhaps even be able to switch from the overflow map back to direct array access (and maybe vice versa) at some thresholds of density change. Would you be interested in that as a follow-up? I bet it would help even more on the two sunspider tests that are definitely hitting the sparse array threshold, 3d-morph and access-nsieve.

(Eventually we might have test cases that really do call for a sparse array - at that point it might be worth doing a more space-efficient implementation like the sparsetable class from Google's sparse_hash implementation: http://google-sparsehash.googlecode.com/svn/trunk/doc/sparsetable.html
Comment 3 Darin Adler 2007-10-21 08:21:57 PDT
(In reply to comment #2)
> if the array has even a quarter of its slots filled, then the
> direct storage is more space-efficient than the overflow map. Indeed it is
> probably good to use the dense storage at even as low as 1/8 of slots filled.
> We should consider extending storage length so long as this density ratio is
> maintained, and perhaps even be able to switch from the overflow map back to
> direct array access (and maybe vice versa) at some thresholds of density
> change. Would you be interested in that as a follow-up?

Yes, I'll do that now.
Comment 4 Darin Adler 2007-10-21 08:25:32 PDT
Committed revision 26847.
Comment 5 Darin Adler 2007-10-21 10:09:02 PDT
(In reply to comment #2)
> perhaps even be able to switch from the overflow map back to
> direct array access (and maybe vice versa) at some thresholds of density
> change

I started implementing, and I found that this part is quite difficult. Keeping enough information about the sparse attributes so that we can consider their density is extremely challenging.
Comment 6 Maciej Stachowiak 2007-10-21 12:39:07 PDT
(In reply to comment #5)
> (In reply to comment #2)
> > perhaps even be able to switch from the overflow map back to
> > direct array access (and maybe vice versa) at some thresholds of density
> > change
> 
> I started implementing, and I found that this part is quite difficult. Keeping
> enough information about the sparse attributes so that we can consider their
> density is extremely challenging.
> 

If you do the relatively simple thing and only track density from 0, and not some subrange from current lowest element, the information you need to determine the density is the array length, and a count of attributes that are not currently undefined, which means increase the count each time an array slot is set when it wasn't already, and decrease the count when an existing array property is deleted or set to undefined. I think.
Comment 7 Darin Adler 2007-10-21 12:56:04 PDT
(In reply to comment #6)
> If you do the relatively simple thing and only track density from 0, and not
> some subrange from current lowest element, the information you need to
> determine the density is the array length, and a count of attributes that are
> not currently undefined, which means increase the count each time an array slot
> is set when it wasn't already, and decrease the count when an existing array
> property is deleted or set to undefined. I think.

Yes, I figured out that part.

(Although my reading of the spec indicates that having a value of "undefined" is distinct from not having a value at all.)

You can take a look at my attempt at this and critique the details soon.
Comment 8 Maciej Stachowiak 2007-10-21 13:26:51 PDT
(In reply to comment #7)
> (In reply to comment #6)
> > If you do the relatively simple thing and only track density from 0, and not
> > some subrange from current lowest element, the information you need to
> > determine the density is the array length, and a count of attributes that are
> > not currently undefined, which means increase the count each time an array slot
> > is set when it wasn't already, and decrease the count when an existing array
> > property is deleted or set to undefined. I think.
> 
> Yes, I figured out that part.
> 
> (Although my reading of the spec indicates that having a value of "undefined"
> is distinct from not having a value at all.)
> 
> You can take a look at my attempt at this and critique the details soon.
> 

In most cases that is true, but in the case of arrays I believe that every value less than the current array length is always considered to exist and is just undefined if deleted or never set. (I did not double check the spec though).