Page Comparison

...

This previous benchmarks allow the following conclusions:

By executing the aggregation When aggregating all objects stored in a MasterServer (low selectivity), an performance increase of 25x can be seen when aggregating via hash table lookups on the server-side and an increase of 75x can be seen when aggregating via hash table forEach iteration on the server-side. When neglecting the hash table structure and directly going over the Log, an increase of 340x can be seen.
When aggregating over a 10% subset of all objects stored in a performance improvement up to a factor 100x can be seen if one is using the hash table for the iteration. If one is directly traversing the Log, a performance improvement of up to a factor 450x can be seen (although it is questionable if a Log traversal would be appropriate for executing server-side data operations)MasterServer (high selectivity), an performance increase of 20x can be seen when aggregating via hash table lookups on the server-side and an increase of 6x can be seen when aggregating via hash table forEach iteration on the server-side. When neglecting the hash table structure and directly going over the Log, an increase of 33x can be seen when going over a total number of 10.000.000 objects.
Hash table lookups seem to be preferable over a forEach iteration when focusing on server-side aggregation via the hash table and having a high selectivity.
When traversing a set of distinct objects, retrieving a single object takes about 7-8?s (or a RAMCloud client can request about 130.000 objects/sec from a single RAMCloud server).
When invoking the hashTable forEach method the whole allocated memory for the hashtable has to be traversed. This is fine if the hashtable is densely packed with objects. In case of a sparse population with objects this introduces a penalty.

Versions Compared

Old Version 70

New Version 71

Key