Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 5.3

...

#number of objects

client-side aggregation

server-side aggregation
  via hash table lookup

server-side aggregation
 via hash table forEach

server-side aggregation
  via Log traversal

10.000

63 47 ms

1 ms

74 ms

6 ms

100.000

648 480 ms

12 ms

84 ms

9 ms

1.000.000

6485 4790 ms

127 ms

168 ms

21 ms

10.000.000

64258 48127 ms

1378 ms

781 ms

142 ms

100.000.000

652201 485091 ms

19854 ms

6245 ms

1422 ms

Benchmarks over a subset (10%) of stored objects (high selectivity)

In this set of benchmarks, only a subset of 10% of the objects which are stored in a MasterServer are included in the aggregation operation. Consequently, this means that of if 1.000.000 objects are aggregated, there are 10.000.000 objects stored in total within a MasterServer.

#number of objects

client-side aggregation

server-side aggregation
  via hash table lookup

server-side aggregation
 via hash table forEach

server-side aggregation
  via Log traversal

10.000

48 ms

1 ms

82 ms

6 ms

100.000

486 ms

15 ms

174 ms

22 ms

1.000.000

4865 ms

169 ms

618 ms

153 ms

10.000.000

49223 ms

2481 ms

7565 ms

1465 ms

Benchmarks over a subset (0.5%) of stored objects (very high selectivity)

In this set of benchmarks, only a subset of 0.5% of the objects which are stored in a MasterServer are included in the aggregation operation. Consequently, this means that if 1.000.000 objects are aggregated, there are 200.000.000 objects stored in total within a MasterServer.

#number of objects

client-side aggregation

server-side aggregation
  via hash table lookup

server-side aggregation
 via hash table forEach

server-side aggregation
  via Log traversal

10.000

49 ms

1 ms

147 ms

34 ms

100.000

489 ms

27 ms

1330 ms

296 ms

1.000.000

4913 ms

274 ms

15698 ms

2901 ms

Image Added

Scalability of aggregation operations with increasing selectivity

selectivity

client-side aggregation

server-side aggregation
  via hash table lookup

server-side aggregation
 via hash table forEach

server-side aggregation
  via Log traversal

100%

206 objects/ms

5036 objects/ms

16012 objects/ms

70323 objects/ms

10%

203 objects/ms

4030 objects/ms

1322 objects/ms

6825 objects/ms

0.5%

203 objects/ms

3650 objects/ms

63 objects/ms

345 objects/ms

Image Added

Conclusions

This previous benchmarks allow the following conclusions:

  • By executing the aggregation When aggregating all objects stored in a MasterServer (low selectivity), an performance increase of 25x can be seen when aggregating via hash table lookups on the server-side and an increase of 75x can be seen when aggregating via hash table forEach iteration on the server-side. When neglecting the hash table structure and directly going over the Log, an increase of 340x can be seen.
  • When aggregating over a 10% subset of all objects stored in a performance improvement up to a factor 100x can be seen if one is using the hash table for the iteration. If one is directly traversing the Log, a performance improvement of up to a factor 450x can be seen (although it is questionable if a Log traversal would be appropriate for executing server-side data operations)MasterServer (high selectivity), an performance increase of 20x can be seen when aggregating via hash table lookups on the server-side and an increase of 6x can be seen when aggregating via hash table forEach iteration on the server-side. When neglecting the hash table structure and directly going over the Log, an increase of 33x can be seen when going over a total number of 10.000.000 objects. 
  • Hash table lookups seem to be preferable over a forEach iteration when focusing on server-side aggregation via the hash table and having a high selectivity.
  • When traversing a set of distinct objects, retrieving a single object takes about 7-8?s (or a RAMCloud client can request about 130.000 objects/sec from a single RAMCloud server).
  • When invoking the hashTable forEach method the whole allocated memory for the hashtable has to be traversed. This is fine if the hashtable is densely packed with objects. In case of a sparse population with objects this introduces a penalty.

Disaggregation Operation

#number of objects

server-side aggregation
  via hash table lookup

server-side Disaggregation via
hash table lookup with 0 backups

10.000

1 ms

4 ms

100.000

11 ms

50 ms

1.000.000

124 ms

515 ms

10.000.000

1413 ms

5411 ms