Benchmarking Ideas

New idea: a TOCS-style timeline (like the one Henry made) for indexed operations

  • End-to-end latency measurements
    • End-to-end write latency with x indexes, each index entry of size y.
      • Vary x from 0 to n: fill the index with a large number of entries, then measure the time to write one object, delete it, and measure the write time again; repeat to get mean/variance/min/max. Keep y fixed (say 30 B). This gives write performance as a function of the number of indexes to be written (see the sketch below).
      • For a given x: latency as a function of index size (as more objects are written and the indexes fill up, lookups take longer); keep y fixed (say 30 B).
      • Vary y.
    • End-to-end remove latency: Details same as write.
    • End-to-end lookup+indexedRead latency
      • Details same as write.
      • Vary the lookup range so that different numbers of objects match.
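
A minimal sketch of the write-latency loop above. The `Client` struct here is a hypothetical stand-in for the real RAMCloud client API (the method names and stub bodies are placeholders, not the actual interface); only the measurement logic is the point. The remove and lookup+indexedRead variants follow the same shape.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for the real RAMCloud client API; the empty stubs let
// the sketch compile, the real benchmark would issue RPCs here.
struct Client {
    void write(uint64_t tableId, const std::string& key, const std::string& value) {}
    void remove(uint64_t tableId, const std::string& key) {}
};

// Measure end-to-end write latency on a table that already has x indexes and
// is pre-filled with entries: write one object, delete it, repeat.
void measureWriteLatency(Client& client, uint64_t tableId, int iterations) {
    std::vector<double> samples;
    std::string key = "probe-key";
    std::string value(100, 'v');            // object value; index entries stay ~30 B
    for (int i = 0; i < iterations; i++) {
        auto start = std::chrono::steady_clock::now();
        client.write(tableId, key, value);  // timed: the write updates all x indexes
        auto stop = std::chrono::steady_clock::now();
        client.remove(tableId, key);        // untimed: restore the original state
        samples.push_back(
            std::chrono::duration<double, std::micro>(stop - start).count());
    }
    double sum = 0, sumSq = 0;
    for (double s : samples) { sum += s; sumSq += s * s; }
    double mean = sum / samples.size();
    double variance = sumSq / samples.size() - mean * mean;
    printf("mean %.2f us, var %.2f, min %.2f us, max %.2f us\n", mean, variance,
           *std::min_element(samples.begin(), samples.end()),
           *std::max_element(samples.begin(), samples.end()));
}
```
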
  • Throughput / bandwidth measurements: vary the number of clients c from 1 to n (see the sketch below).
    • Writes, reads, removes, or a mix.
    • Same index server vs different index servers
    • One indexlet vs multiple different indexlets
    • One table vs multiple tables
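
A sketch of the throughput harness, again with a stub client: c threads issue writes for a fixed duration and the aggregate rate is reported. The same skeleton covers reads/removes/mixes, same vs. different index servers, and one vs. many indexlets or tables by changing what each thread does and which table it targets.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <string>
#include <thread>
#include <vector>

// Stub client; stand-in for a real per-thread RAMCloud client handle.
struct Client {
    void write(uint64_t tableId, const std::string& key, const std::string& value) {}
};

// Run numClients threads against one table for a fixed duration and report
// aggregate throughput. Vary numClients, the op mix, and the table/indexlet
// layout externally.
double measureThroughput(int numClients, uint64_t tableId, double seconds) {
    std::atomic<bool> stop{false};
    std::atomic<uint64_t> totalOps{0};
    std::vector<std::thread> threads;
    for (int c = 0; c < numClients; c++) {
        threads.emplace_back([&, c] {
            Client client;                   // one client handle per thread
            std::string value(100, 'v');
            uint64_t ops = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                client.write(tableId,
                             "key-" + std::to_string(c) + "-" + std::to_string(ops),
                             value);
                ops++;
            }
            totalOps.fetch_add(ops);
        });
    }
    std::this_thread::sleep_for(std::chrono::duration<double>(seconds));
    stop.store(true);
    for (auto& t : threads) t.join();
    return totalOps.load() / seconds;        // aggregate ops/sec
}

int main() {
    for (int c = 1; c <= 16; c *= 2)
        printf("%d clients: %.0f ops/sec\n", c, measureThroughput(c, 1, 5.0));
}
```
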
  • Scalability
    • How performance varies as the index grows larger and is spread across multiple nodes. We can use static partitioning at index creation to check this.
  • Nanobenchmarks (later):
    • Time to insert an entry for:
      • varying the current number of nodes in the indexlet tree (mostly a sanity check; the slope should match the end-to-end measurement, with a different intercept); see the sketch below.
      • varying entry size
      • repeated index entries vs. distinct index entries
    • Time to look up an entry: details same as insert.
      • Vary the lookup range.
    • Time to remove an entry: Details same as insert.
    • Break down lookup+indexedRead from end-to-end into the two components
    • Relation between tree fanout and read/write perf
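
A nanobenchmark sketch for insert time as a function of tree size. std::map is only a stand-in for the indexlet's tree here; a real run would target the actual tree implementation and repeat each measurement many times to smooth out timer noise.

```cpp
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Time a single insert into the index tree as a function of how many entries
// the tree already holds (doubling sizes up to ~1M).
int main() {
    std::map<std::string, uint64_t> tree;    // stand-in for the indexlet tree
    for (size_t size = 1; size <= (1u << 20); size *= 2) {
        while (tree.size() < size)           // grow the tree to the target size
            tree.emplace("entry-" + std::to_string(tree.size()), tree.size());
        std::string probe = "probe-" + std::to_string(size);
        auto start = std::chrono::steady_clock::now();
        tree.emplace(probe, 0);              // timed: one insert at this size
        auto stop = std::chrono::steady_clock::now();
        tree.erase(probe);                   // keep the tree size unchanged
        printf("%zu entries: %ld ns for one insert\n", size,
               (long)std::chrono::duration_cast<std::chrono::nanoseconds>(
                   stop - start).count());
    }
}
```
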
  • Memory footprint
    • We don't want to measure deletion/cleaning – those numbers can be taken from the LSM paper. We want to see the overhead for an index: create a large index, measure the space used, divide by the number of entries, and compute the per-entry overhead (see the sketch below).
    • Can compare the malloc version with the RAMCloud object-allocator version.
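
The overhead arithmetic, spelled out. memBefore/memAfter would come from malloc stats or the RAMCloud allocator; the numbers below are made-up placeholders just to show the calculation.

```cpp
#include <cstdint>
#include <cstdio>

// Per-entry index overhead: build a large index, measure total space before
// and after, then amortize over the entries.
int main() {
    uint64_t memBefore  = 0;              // bytes in use before building the index
    uint64_t memAfter   = 4200000000;     // bytes in use after the bulk load (placeholder)
    uint64_t numEntries = 100000000;      // 100 M entries (placeholder)
    uint64_t entrySize  = 30;             // raw bytes per index entry (y)
    double bytesPerEntry = double(memAfter - memBefore) / numEntries;
    double overhead = bytesPerEntry - entrySize;   // metadata + fragmentation
    printf("%.1f bytes/entry total, %.1f bytes/entry overhead (%.0f%%)\n",
           bytesPerEntry, overhead, 100.0 * overhead / entrySize);
}
```
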
  • Recovery times
    • Verify that it is the same as recovering a table of that size to one recovery master.
  • Compare with other systems
    • Systems:
      • MySQL
      • HBase?
      • Cassandra?
      • Come up with more
    • Benchmarks:
      • See if there are standard benchmarks (or something that one of the above systems uses) that measure indexing performance.
      • Create a table, write n million indexed objects with x indexes.
      • Do the above, then measure random indexed reads (see the sketch below).
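
For the MySQL comparison, a sketch using the MySQL C API (the connection parameters and schema are made up for illustration): create an indexed table, bulk-load n rows, then time random indexed SELECTs. The same shape would be mirrored in each system's native client; a real run would use batched or prepared inserts for the load phase.

```cpp
#include <mysql/mysql.h>   // MySQL C API; link with -lmysqlclient
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>

int main() {
    MYSQL* conn = mysql_init(nullptr);
    // Hypothetical connection parameters.
    if (!mysql_real_connect(conn, "localhost", "bench", "bench", "benchdb",
                            0, nullptr, 0)) {
        fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
        return 1;
    }
    // One secondary index here (x = 1); add more INDEX clauses to vary x.
    mysql_query(conn, "CREATE TABLE objs (pk BIGINT PRIMARY KEY, k1 VARCHAR(30), "
                      "val VARBINARY(100), INDEX idx1 (k1))");
    const long n = 1000000;            // scale up to n million for the real run
    for (long i = 0; i < n; i++) {     // load phase: one row per statement (slow)
        std::string q = "INSERT INTO objs VALUES (" + std::to_string(i) +
                        ", 'key-" + std::to_string(i) + "', 'v')";
        mysql_query(conn, q.c_str());
    }
    // Measurement phase: random reads through the secondary index.
    const int reads = 10000;
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < reads; i++) {
        std::string q = "SELECT val FROM objs WHERE k1 = 'key-" +
                        std::to_string(rand() % n) + "'";
        mysql_query(conn, q.c_str());
        mysql_free_result(mysql_store_result(conn));
    }
    auto stop = std::chrono::steady_clock::now();
    printf("%.1f us per indexed read\n",
           std::chrono::duration<double, std::micro>(stop - start).count() / reads);
    mysql_close(conn);
}
```
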