Benchmarking Ideas

New idea: a TOCS-style timeline (as done by Henry) for indexed operations

  • End-to-end latency measurements

    • End-to-end write latency with x number of indexes, each index entry of size y.

      • Vary x from 0 to n: fill the index with a bunch of entries, then measure the time to write one object; delete that object, measure the write again, and repeat to get mean/variance/min/max. Keep y fixed (say 30 B). This gives write performance as a function of the number of indexes to be written.

      • For a given x: latency as a function of index size (as more and more objects are written and the indexes fill up, lookup time increases); keep y fixed (say 30 B).

      • Vary y.
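The write/delete loop above can be sketched as a small harness. `write_one` and `remove_one` are hypothetical callables standing in for the client's write and remove operations; any real run would wrap the actual client library.

```python
import statistics
import time

def measure_write_latency(write_one, remove_one, trials=1000):
    """Time repeated write/remove pairs and report summary statistics.

    write_one / remove_one wrap the client's write and remove calls for a
    single pre-chosen object (hypothetical API; the index stays at a fixed
    size because every write is immediately undone).
    """
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        write_one()                      # write one object (hits all x indexes)
        samples.append(time.perf_counter() - start)
        remove_one()                     # delete it so index size stays constant
    return {
        "mean": statistics.mean(samples),
        "variance": statistics.variance(samples),
        "min": min(samples),
        "max": max(samples),
    }

# Example with no-op stand-ins for the client calls:
stats = measure_write_latency(lambda: None, lambda: None, trials=100)
```

The same harness covers the remove- and lookup-latency experiments by swapping which operation is timed inside the loop.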

    • End-to-end remove latency: Details same as write.

    • End-to-end lookup+indexedRead latency

      • Details same as write.

      • Vary the lookup range so that different numbers of objects match.

  • Throughput / bandwidth measurements: vary the number of clients c (0 to n).

    • Writes, reads, removes, or a mix

    • Same index server vs different index servers

    • One indexlet vs multiple indexlets

    • One table vs multiple tables
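A minimal sketch of the varying-clients throughput measurement, with `op` as a hypothetical stand-in for a client write/read/remove call. (Python threads only approximate independent clients because of the GIL; a real run would use separate client processes or machines.)

```python
import threading
import time

def measure_throughput(op, num_clients, duration=1.0):
    """Run `op` in a closed loop on num_clients threads; return total ops/sec.

    `op` stands in for one client operation (write, read, remove, or a mix).
    """
    counts = [0] * num_clients           # per-client op counters
    stop = threading.Event()

    def worker(i):
        while not stop.is_set():
            op()
            counts[i] += 1

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_clients)]
    for t in threads:
        t.start()
    time.sleep(duration)                 # let clients run for a fixed window
    stop.set()
    for t in threads:
        t.join()
    return sum(counts) / duration
```

Sweeping `num_clients` from 1 to n and plotting the result gives the saturation curve for each server/indexlet/table configuration above.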

  • Scalability

    • How performance varies as the index grows larger and is spread across multiple nodes. We can use static partitioning at creation time to check this.

  • Nanobenchmarks: (later)

    • Time to insert an entry for:

      • varying the current number of nodes in the indexlet tree (mostly a sanity check; the slope should match the end-to-end numbers, with a different intercept).

      • varying entry size

      • repeated index entry vs different index entry

    • Time to lookup an entry: Details same as insert.

      • vary range

    • Time to remove an entry: Details same as insert.

    • Break the end-to-end lookup+indexedRead latency down into its two components

    • Relation between tree fanout and read/write perf
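A sketch of the insert-time nanobenchmark, using a sorted list with `bisect` as a stand-in for the indexlet's tree (hypothetical structure; the absolute numbers are meaningless, only the shape of cost vs. current index size matters):

```python
import bisect
import random
import time

def time_insert_at_size(n, trials=200):
    """Average time to insert one entry into a sorted structure of n keys."""
    keys = sorted(random.sample(range(10 * n + trials * 10), n))
    total = 0.0
    for _ in range(trials):
        k = random.randrange(10 * n + 1)
        start = time.perf_counter()
        bisect.insort(keys, k)           # the timed insert
        total += time.perf_counter() - start
        keys.remove(k)                   # undo so the size stays at n
    return total / trials

# Sweep sizes to get the curve, e.g.:
# {n: time_insert_at_size(n) for n in (1000, 10000, 100000)}
```

The lookup and remove nanobenchmarks follow the same pattern with the timed operation swapped.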

  • Memory footprint

    • We don't want to measure deletion / cleaning – we can reuse those results from the LSM paper. We want to see the overhead of an index: create a large index, measure its space usage, divide by the number of entries, and calculate the per-entry overhead.

    • Can compare the malloc version and the RAMCloud object-allocator version.
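The overhead calculation can be sketched with `tracemalloc`, using a Python dict as a stand-in index (the absolute numbers are Python-specific; only the create/measure/divide method carries over):

```python
import tracemalloc

def per_entry_overhead(num_entries, value_size=30):
    """Approximate per-entry memory overhead of an index structure.

    Builds a dict of fixed-size string keys/values as a stand-in index,
    measures the bytes allocated, and subtracts the raw payload bytes to
    get the structural overhead per entry.
    """
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    index = {("key%020d" % i): ("v" * value_size)
             for i in range(num_entries)}
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    total = after - before               # bytes the whole index cost
    payload = num_entries * value_size   # raw entry bytes
    return (total - payload) / num_entries
```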

  • Recovery times

    • Verify that it's the same as recovering a table of that size to one recovery master.

  • Compare with other systems

    • Systems:

      • MySQL

      • HBase?

      • Cassandra?

      • Come up with more

    • Benchmarks:

      • See if there are standard benchmarks (or benchmarks that one of the above systems uses) that measure indexing performance.

      • Create a table, then write n million indexed objects with x indexes.

      • Do the above, then measure random indexed reads.