Benchmarking Ideas
New idea: a TOCS-style timeline (as done by Henry) for indexed operations
End-to-end latency measurements
End-to-end write latency with x indexes, each index entry of size y.
Vary x from 0 to n: fill the index with a bunch of entries, then measure the time to write one object; delete it; measure the write time again; repeat to get mean/variance/min/max. Keep y fixed (say 30 B). This gives write performance as a function of the number of indexes to be written.
For a given x: latency as a function of index size (as more and more objects are written and the indexes fill up, lookup time increases); keep y fixed (say 30 B)
Vary y.
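The write/delete measurement loop above can be sketched as follows. This is a minimal harness, assuming a hypothetical `client` object with RAMCloud-style `write()` and `remove()` calls; the names are placeholders, not the real API.

```python
import statistics
import time

def measure_write_latency(client, table, entry_size=30, trials=1000):
    """Measure end-to-end latency of one indexed write, repeatedly.

    Assumes the table has already been created with the desired number
    of indexes (x) and pre-filled to the desired index size; `client`
    is a hypothetical handle exposing write()/remove().
    """
    key = b"probe-key"
    value = b"x" * entry_size  # keep entry size y fixed (say 30 B)
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        client.write(table, key, value)   # the timed indexed write
        samples.append(time.perf_counter() - start)
        client.remove(table, key)         # reset state for next trial
    return {
        "mean": statistics.mean(samples),
        "stdev": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }
```

Sweeping x (number of indexes) and index fill level is done outside this loop, one table configuration per run.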
End-to-end remove latency: Details same as write.
End-to-end lookup+indexedRead latency
Details same as write.
Vary the lookup range so that different numbers of objects match.
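The range-sweep variant could look like the sketch below, assuming a hypothetical `client.lookup_range(table, index_id, first, last)` that returns the matching objects (name and signature are assumptions for illustration).

```python
import time

def sweep_lookup_ranges(client, table, index_id, max_key,
                        fractions=(0.001, 0.01, 0.1, 1.0)):
    """Time lookup+indexedRead for ranges matching more and more objects.

    Returns (num_matched, elapsed_seconds) pairs, one per range width.
    """
    results = []
    for frac in fractions:
        last = int(max_key * frac)  # widen the range each iteration
        start = time.perf_counter()
        objects = client.lookup_range(table, index_id, 0, last)
        elapsed = time.perf_counter() - start
        results.append((len(objects), elapsed))
    return results
```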
Throughput / bandwidth measurements: vary the number of clients c (1 to n).
Writes, reads, removes, or a mix
Same index server vs different index servers
One indexlet vs multiple different indexlets
One table vs multiple tables
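A minimal multi-client throughput harness, using one thread per client, might look like this. `make_client` and the `write()`/`read()` calls are hypothetical stand-ins; pointing different clients at the same vs. different index servers, indexlets, or tables is a matter of how each client is constructed.

```python
import threading
import time

def measure_throughput(make_client, num_clients, duration=2.0, op="write"):
    """Aggregate ops/sec with `num_clients` concurrent clients.

    make_client() is assumed to return an independent client handle;
    each thread issues operations in a closed loop until time expires.
    """
    counts = [0] * num_clients
    stop_at = time.time() + duration

    def worker(i):
        client = make_client()
        key = b"key-%d" % i
        while time.time() < stop_at:
            if op == "write":
                client.write("table", key, b"value")
            else:
                client.read("table", key)
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration  # aggregate operations per second
```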
Scalability
How performance varies as the index grows and is spread across multiple nodes. We can use static partitioning at creation time to check this.
Nanobenchmarks: (later)
Time to insert an entry for:
varying the current number of nodes in the indexlet tree (mostly a sanity check; the slope should match the end-to-end numbers, with a different intercept).
varying entry size
repeated index entry vs different index entry
Time to lookup an entry: Details same as insert.
vary range
Time to remove an entry: Details same as insert.
Break down lookup+indexedRead from end-to-end into the two components
Relation between tree fanout and read/write perf
Memory footprint
We don't need to measure deletion / cleaning; we can reuse numbers from the LSM paper. We want to see the space overhead of an index: create a large index, measure the space used, divide by the number of entries, and compute the per-entry overhead.
Can compare the malloc version and the RAMCloud object-allocator version.
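The overhead arithmetic is just this (a trivial helper; the measured-bytes inputs would come from the allocator being compared, malloc or the RAMCloud object allocator):

```python
def per_entry_overhead(bytes_before, bytes_after, num_entries, payload_size):
    """Space overhead per index entry, beyond the raw payload.

    bytes_before/bytes_after: memory in use before and after loading
    the index; payload_size: the raw index-entry size (e.g. 30 B).
    """
    total_per_entry = (bytes_after - bytes_before) / num_entries
    return total_per_entry - payload_size
```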
Recovery times
Verify that it's the same as recovering a table of that size to one recovery master.
Compare with other systems
Systems:
MySQL
HBase?
Cassandra?
Come up with more
Benchmarks:
See if there are standard benchmarks (or workloads that one of the above systems uses) that measure indexing performance.
Create table, write n million indexed objects with x indexes.
Do the above, then measure random indexed reads.
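A cross-system driver for the two benchmarks above could share one shape, with each system (MySQL, HBase, Cassandra, our own) hidden behind a thin adapter. The `write()`/`indexed_read()` adapter methods here are assumptions, not any system's real API.

```python
import random
import time

def load_then_random_read(client, table, num_objects, reads=10000):
    """Bulk-load indexed objects, then time random indexed reads.

    `client` is a hypothetical adapter; the table is assumed to have
    been created with the desired x indexes. Returns reads/sec.
    """
    # Phase 1: load n objects (n million in the real run).
    for i in range(num_objects):
        client.write(table, b"key%08d" % i, b"value")
    # Phase 2: random indexed reads against the loaded data.
    start = time.perf_counter()
    for _ in range(reads):
        i = random.randrange(num_objects)
        client.indexed_read(table, b"key%08d" % i)
    elapsed = time.perf_counter() - start
    return reads / elapsed
```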