
What should primary keys be?

The Proposed Server API page asks "What do we want keys to be?".
This page collects and evaluates the options.

User-allocated variable-length or long fixed-length keys

  • These longer keys must be passed around in requests.
  • The master needs a hash table.
  • There is some feature overlap with indexes. Now you need an extremely fast "primary index" on the master to look up objects.
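For concreteness, the master-side hash table amounts to a map from the full variable-length key to the object (a minimal sketch; `Master` and its method names are illustrative, with a plain Python dict standing in for the fast "primary index"):

```python
class Master:
    """Sketch of a master whose hash table serves as the 'primary index'
    over user-allocated variable-length keys."""

    def __init__(self):
        self.index = {}  # bytes key -> object; dict stands in for the hash table

    def write(self, key: bytes, obj):
        # The full key must travel in every request, however long it is.
        self.index[key] = obj

    def read(self, key: bytes):
        return self.index.get(key)

m = Master()
m.write(b"users/alice/profile", {"name": "Alice"})
assert m.read(b"users/alice/profile") == {"name": "Alice"}
assert m.read(b"users/bob/profile") is None
```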

User-allocated 64-bit keys

  • I think there's a consensus that this is not a large enough space to be used as a reasonable hash value, so the only real use case is ICQ.

System-allocated 64-bit keys with reuse

  • This is by far the easiest solution to implement. Berthas (multi-level indexes analogous to page tables) seem to be the data structure of choice.
  • Dangling pointers in the application are extremely dangerous. If the application gets back an object of the same type as the one originally identified by that key, it will not crash but is likely to leak information instead.
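A Bertha lookup can be sketched as a page-table-style walk over the key bits (a hypothetical two-level version; `FANOUT_BITS`, the level count, and the class name are all illustrative, so this toy covers only 16-bit keys):

```python
FANOUT_BITS = 8           # illustrative; a real Bertha would tune this
FANOUT = 1 << FANOUT_BITS

class Bertha:
    """Two-level index mapping integer keys to objects, analogous to a
    two-level page table: top key bits select a leaf, low bits a slot."""

    def __init__(self):
        self.root = [None] * FANOUT  # each slot points to a leaf array

    def insert(self, key, obj):
        top, bottom = key >> FANOUT_BITS, key & (FANOUT - 1)
        if self.root[top] is None:
            self.root[top] = [None] * FANOUT  # allocate leaf on demand
        self.root[top][bottom] = obj

    def lookup(self, key):
        top, bottom = key >> FANOUT_BITS, key & (FANOUT - 1)
        leaf = self.root[top]
        return None if leaf is None else leaf[bottom]

b = Bertha()
b.insert(0x1234, "object A")
assert b.lookup(0x1234) == "object A"
assert b.lookup(0x1235) is None
```

With reuse, a deleted key's slot is simply handed out again, which is what makes the dangling-pointer hazard above so quiet: the walk succeeds and returns the new occupant.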

System-allocated 64-bit keys without reuse

  • This has "sweet" semantics: assigned keys strictly increase with time.
  • Even without reuse, we don't have to worry about exhausting the 64-bit key space and wrapping around.
  • Berthas would be the data structure of choice for tables that never have deletes. With deletes, the options are less straightforward:
    • Berthas with leaves collapsed to hash tables when they become sparse
      • Older objects that have many nearby deletes now incur a performance penalty.
      • Inner nodes of the Bertha still grow over time.
    • Berthas with leaves reused as they become sparse
      • Now a leaf can have multiple pointers coming into it.
      • Some fraction of keys will have to be skipped over and not used.
      • Splitting a key range to another server now requires unmerging the leaf.
      • Inner nodes of the Bertha still grow over time.
    • Hash tables
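The "sweet" semantics amount to nothing more than a counter that is never rewound (a minimal sketch; `KeyAllocator` is an illustrative name, and deletes would free the object but never return its key):

```python
class KeyAllocator:
    """Allocates 64-bit keys that strictly increase over time and are
    never reused, even after the object they named is deleted."""

    def __init__(self):
        self.next_key = 0

    def allocate(self):
        key = self.next_key
        self.next_key += 1  # never decremented, never recycled
        return key

alloc = KeyAllocator()
a, b = alloc.allocate(), alloc.allocate()
assert a < b  # a later allocation always compares greater
# Deleting the object behind 'a' would not make key 'a' available again.
```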

System-allocated (64-bit key, 64-bit generation number) tuples

  • We can use Berthas again, storing the generation number in the inode.
  • Generation numbers aren't "sweet".
  • It now takes 128 bits to identify an object.
  • It is possible to have a global generation number, so if my object's generation number is larger than that of yours, mine was created more recently.
  • It is not possible, however, to enumerate objects in chronological order efficiently.
    • Unless we maintain some sort of linked list...
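The tuple scheme can be sketched as a table that stamps each slot with a generation drawn from a single global counter, so stale references are rejected and any two objects' creation order is comparable (illustrative names; a dict stands in for the Bertha, with the generation stored alongside the object as it would be in the inode):

```python
class GenerationalTable:
    """Objects are identified by a (key, generation) tuple. A lookup with
    a stale generation fails instead of silently returning a new object."""

    def __init__(self):
        self.slots = {}      # key -> (generation, object)
        self.free_keys = []  # keys available for reuse
        self.next_key = 0
        self.next_gen = 0    # single global generation counter

    def create(self, obj):
        key = self.free_keys.pop() if self.free_keys else self.next_key
        if key == self.next_key:
            self.next_key += 1
        gen = self.next_gen
        self.next_gen += 1   # global: larger generation => created later
        self.slots[key] = (gen, obj)
        return (key, gen)    # the full 128-bit identifier

    def delete(self, key):
        del self.slots[key]
        self.free_keys.append(key)  # key itself may be reused

    def lookup(self, key, gen):
        entry = self.slots.get(key)
        if entry is not None and entry[0] == gen:
            return entry[1]
        return None          # stale generation: dangling reference caught

t = GenerationalTable()
old_id = t.create("first")
t.delete(old_id[0])
new_id = t.create("second")               # reuses the key, new generation
assert new_id[0] == old_id[0] and new_id[1] > old_id[1]
assert t.lookup(*old_id) is None          # stale reference rejected
assert t.lookup(*new_id) == "second"
```

Note that comparing generation numbers orders creation times, but nothing here lets you *enumerate* live objects in that order without scanning, which is the inefficiency the last bullet points at.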

System-allocated (48-bit key, 16-bit generation number) tuples

  • 48-bit keys may not be large enough.
  • 16-bit generation numbers could wrap around fairly quickly. Do we burn the keys after that or do we wrap around and risk following a dangling pointer?
  • Same implementation as system-allocated (64-bit key, 64-bit generation number) tuples, just fewer bits.
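Packing the tuple into a single 64-bit word is straightforward; the 16-bit generation field is where the wraparound worry shows up (a sketch with the field widths from this option; the function names are illustrative):

```python
KEY_BITS, GEN_BITS = 48, 16
GEN_MASK = (1 << GEN_BITS) - 1

def pack(key, gen):
    """Pack a (48-bit key, 16-bit generation) tuple into one 64-bit word."""
    assert key < (1 << KEY_BITS)
    return (key << GEN_BITS) | (gen & GEN_MASK)

def unpack(word):
    return word >> GEN_BITS, word & GEN_MASK

word = pack(0x123456789ABC, 0xFFFF)
assert unpack(word) == (0x123456789ABC, 0xFFFF)

# After 2**16 reuses the generation wraps back to 0, so a stale
# (key, 0) reference would wrongly validate again -- the choice is to
# retire the key at that point or accept the dangling-pointer risk.
assert (0xFFFF + 1) & GEN_MASK == 0
```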