Benchmarking Ideas
New idea: a TOCS-style timeline (as done by Henry) for indexed operations
- End-to-end latency measurements
- End-to-end write latency with x number of indexes, each index entry of size y.
- Vary x from 0 to n: fill the index with a bunch of entries, then measure the time to write one object, delete that object, and measure the write again. Repeat to get mean/variance/min/max; keep y fixed (say 30 B). Basically, write perf as a function of the number of indexes to be written (see the sketch after this list).
- For a given x: latency as a function of index size (as more and more objects are written and the indexes fill up, lookups take longer); keep y fixed (say 30 B).
- Vary y.
- End-to-end remove latency: Details same as write.
- End-to-end lookup+indexedRead latency
- Details same as write.
- Varying the lookup range such that different numbers of objects match.
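A minimal sketch of the write-latency loop described above. The writeOp/removeOp callables are placeholders, not real client API names; the point is just the timing and statistics methodology, and the same harness works for the remove and lookup+indexedRead variants.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

struct LatencyStats {
    double mean, variance, min, max;            // all in microseconds
};

// Time `iters` repetitions of writeOp(); removeOp() deletes the object after
// each write so the index size stays constant across samples.
LatencyStats
measureWriteLatency(const std::function<void()>& writeOp,
                    const std::function<void()>& removeOp,
                    int iters)
{
    std::vector<double> samples;
    samples.reserve(iters);
    for (int i = 0; i < iters; i++) {
        auto start = std::chrono::steady_clock::now();
        writeOp();                              // write one object + its x index entries
        auto stop = std::chrono::steady_clock::now();
        removeOp();                             // delete it again (not timed)
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    double sum = 0.0, sumSq = 0.0;
    for (double s : samples) {
        sum += s;
        sumSq += s * s;
    }
    double mean = sum / samples.size();
    return {mean,
            sumSq / samples.size() - mean * mean,
            *std::min_element(samples.begin(), samples.end()),
            *std::max_element(samples.begin(), samples.end())};
}
```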
- Throughput / bandwidth measurements: varying number of clients c (0 to n); see the harness sketch after this list.
- Writes or reads or removes or mix
- Same index server vs different index servers
- One indexlet vs multiple different indexlets
- One table vs multiple tables
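A possible harness for the client-scaling runs (a sketch, not tied to the real client API): each client thread repeatedly issues whatever operation is under test, and the aggregate count over a fixed window gives throughput. The variations above then only change what the callable does, not the harness.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Run `numClients` threads for `seconds`, each repeatedly issuing `op`
// (an indexed write, read, remove, or a mix), and return aggregate ops/sec.
double
measureThroughput(const std::function<void(int clientId)>& op,
                  int numClients, double seconds)
{
    std::atomic<bool> stop(false);
    std::atomic<long> completed(0);
    std::vector<std::thread> clients;
    for (int id = 0; id < numClients; id++) {
        clients.emplace_back([&, id]() {
            long local = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                op(id);
                local++;
            }
            completed += local;
        });
    }
    std::this_thread::sleep_for(std::chrono::duration<double>(seconds));
    stop = true;
    for (auto& client : clients) {
        client.join();
    }
    return completed / seconds;
}
```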
- Scalability
- How performance varies as the index gets larger and is spread across multiple nodes. We can use static partitioning at creation time to check this.
- Nanobenchmarks: (later)
- Time to insert an entry for:
- varying the current number of nodes in the indexlet tree (mostly a sanity check; the slope should match end-to-end, with a different intercept); see the sketch after this list.
- varying entry size
- repeated index entry vs different index entry
- Time to look up an entry: Details same as insert.
- vary range
- Time to remove an entry: Details same as insert.
- Break down lookup+indexedRead from end-to-end into the two components
- Relation between tree fanout and read/write perf
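For the first nanobenchmark above, a rough sketch of measuring insert time as a function of the current tree size; std::map stands in here for the real indexlet tree, and a real run would average many inserts per point instead of taking a single sample.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

int
main()
{
    std::map<std::string, uint64_t> tree;       // stand-in for the indexlet tree
    const size_t checkpoints[] = {1000, 10000, 100000, 1000000};
    size_t next = 0;
    for (size_t n = 0; next < 4; n++) {
        std::string key = "key" + std::to_string(n);
        key.resize(30, 'x');                    // keep entries at ~30 B, as above
        auto start = std::chrono::steady_clock::now();
        tree.emplace(std::move(key), n);        // the timed insert
        auto stop = std::chrono::steady_clock::now();
        if (n + 1 == checkpoints[next]) {
            std::printf("entries=%zu  insert=%.0f ns\n", n + 1,
                std::chrono::duration<double, std::nano>(stop - start).count());
            next++;
        }
    }
    return 0;
}
```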
- Memory footprint
- We don't want to measure deletion / cleaning – those numbers can come from the LSM paper. We want to see the overhead for an index: create a large index, measure the space used, divide by the number of entries, and compute the overhead (see the sketch below).
- Can compare the malloc version with the RAMCloud object-allocator version.
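The overhead calculation itself is just arithmetic; a tiny example with made-up numbers:

```cpp
#include <cstdio>

int
main()
{
    // Made-up numbers: an index holding 100M entries of ~30 B each.
    double totalIndexBytes = 4.2e9;             // measured memory used by the index
    double numEntries = 100e6;                  // number of index entries
    double entrySize = 30.0;                    // raw size of each entry (y)

    double perEntry = totalIndexBytes / numEntries;
    std::printf("per-entry cost = %.1f B, overhead = %.1f B\n",
                perEntry, perEntry - entrySize);
    return 0;
}
```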
- Recovery times
- Verify that it's the same as recovering a table of that size to one recovery master.
- Compare with other systems
- Systems:
- MySQL
- HBase?
- Cassandra?
- Come up with more
- Benchmarks:
- See if there are standard benchmarks (or something that one of the systems above uses) that measure indexing perf.
- Create table, write n million indexed objects with x indexes.
- Do above, then measure random indexed reads.