Indexing Scribe Notes

Claimed: there is no benefit to putting indexes in the server, because we can maintain them just as efficiently at the client level. For hash-based indexes that might be true, but for tree-based indexes the client would have to make one extra RPC for each level of the tree. Conversely, if server-side indexes don't save any round trips, we probably shouldn't be putting them in the server.

It also depends on how much metadata we need to keep on the client. If the client caches the highest level of the tree itself, we might get away with client-side indexing, since a lookup is then only one RPC and the cached level is relatively small and inexpensive to maintain on the client. Both points are sketched below.
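
A minimal sketch of the trade-off, assuming a hypothetical `rpc_get(table, node_id)` call and a dict-based node layout; none of these names come from the real system.

```python
def index_lookup_client_side(rpc_get, index_table, search_key, cached_root=None):
    """Walk a tree index from the client: one RPC per tree level.

    If the client caches the root (the tree's highest level), the first
    RPC is skipped, at the cost of keeping that small piece of metadata
    fresh on every client.
    """
    node = cached_root or rpc_get(index_table, "root")   # RPC 1 (or cached)
    while not node["is_leaf"]:
        child_id = pick_child(node, search_key)          # purely local work
        node = rpc_get(index_table, child_id)            # one RPC per level
    return [pk for (k, pk) in node["entries"] if k == search_key]

def pick_child(node, key):
    """Pick the child whose key range covers `key`; bound None = +infinity."""
    for upper_bound, child_id in node["children"]:
        if upper_bound is None or key < upper_bound:
            return child_id
```

A server-side index would do the same walk locally and answer in a single round trip.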

If two puts arrive at the same time, the server gets to control how to serialize them, and it can isolate the puts from each other.
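
A sketch of what server-side serialization buys, using one in-process lock per object; in the real system the server would pick the order however it likes, but each put (object write plus index update) applies as one isolated unit. All structures here are illustrative.

```python
import threading

class ObjectServer:
    """Toy server: concurrent puts on one key are serialized by the server."""

    def __init__(self):
        self.store = {}               # primary key -> {"value", "sk"}
        self.index = {}               # secondary key -> set of primary keys
        self.locks = {}               # primary key -> lock
        self.meta = threading.Lock()

    def _lock_for(self, key):
        with self.meta:
            return self.locks.setdefault(key, threading.Lock())

    def put(self, key, value, secondary_key):
        # The server, not the clients, decides which put goes first, and
        # the object write and its index update are isolated as one unit.
        with self._lock_for(key):
            old = self.store.get(key)
            if old is not None:
                self.index.get(old["sk"], set()).discard(key)
            self.store[key] = {"value": value, "sk": secondary_key}
            self.index.setdefault(secondary_key, set()).add(key)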

We don't have a consistent snapshot; the guarantee is even weaker than serializability.

Three issues:
- One capability needed: index intersection (see the sketch after this list).
- Any interesting race conditions across inserts.
- Visibility semantics: people should not see things appearing and disappearing.
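
Index intersection reduces to a merge over sorted streams of primary keys. A hedged sketch, assuming each index scan yields primary keys in ascending order:

```python
def intersect(scan_a, scan_b):
    """Yield primary keys present in both ascending key streams."""
    it_a, it_b = iter(scan_a), iter(scan_b)
    a, b = next(it_a, None), next(it_b, None)
    while a is not None and b is not None:
        if a == b:
            yield a                           # key matched in both indexes
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            a = next(it_a, None)              # advance the lagging stream
        else:
            b = next(it_b, None)

# e.g. objects matching city == "SF" AND age == 30:
# hits = list(intersect(scan_index("city", "SF"), scan_index("age", 30)))
```

Doing the intersection at the server avoids shipping both full key lists to the client.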

Q. Is the complexity of dealing with garbage too much?
A. We can use the logs to keep track of it, and let log cleaning collect it naturally (sketched below).
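
A minimal sketch of that answer, assuming a log-structured store: the cleaner drops an index entry when its target object is dead or no longer carries that secondary key, so no separate garbage tracker is needed. The `store` calls are assumptions, not a real API.

```python
def clean_segment(segment, store):
    """Relocate live log entries; stale index entries fall out for free."""
    survivors = []
    for entry in segment:
        if entry["type"] == "object":
            # Standard log cleaning: keep only current object versions.
            if store.is_live(entry["key"], entry["version"]):
                survivors.append(entry)
        elif entry["type"] == "index":
            obj = store.lookup(entry["primary_key"])
            # Stale if the object is gone or its secondary key changed.
            if obj is not None and obj["secondary_key"] == entry["secondary_key"]:
                survivors.append(entry)
    return survivors
```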

Q. Missed why you're not just treating the index as data.
A. So that we don't have to rebuild the indexes on a failure.

You've been assuming first normal form. A person has a list of friends, so the index might need to be a multi-index (one entry per list element). There also need to be uniqueness constraints. A sketch follows.
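
A sketch of the multi-index point, with an optional uniqueness check layered on insert; the dict-of-sets stands in for the real index structure.

```python
def index_person(index, person_id, friends, unique=False):
    """Add one index entry per friend; optionally enforce uniqueness."""
    for friend in friends:
        holders = index.setdefault(friend, set())
        if unique and holders and person_id not in holders:
            raise ValueError(f"uniqueness violated for key {friend!r}")
        holders.add(person_id)

index = {}
index_person(index, "alice", ["bob", "carol"])
index_person(index, "dave", ["bob"])
assert index["bob"] == {"alice", "dave"}   # one secondary key, many owners
```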

String indexes compress really nicely; that is another reason to put them in the server, where we can compress them. Index entries are likely to be pretty small anyway.
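
One reason they compress so well: adjacent keys in a sorted string index share long prefixes, so plain front coding (store the shared-prefix length plus the suffix) shrinks them substantially, and the server can only apply it if it owns the index. A sketch:

```python
def front_code(sorted_keys):
    """Encode sorted strings as (common-prefix length, suffix) pairs."""
    out, prev = [], ""
    for key in sorted_keys:
        n = 0
        while n < min(len(prev), len(key)) and prev[n] == key[n]:
            n += 1                    # length of prefix shared with previous key
        out.append((n, key[n:]))
        prev = key
    return out

print(front_code(["smith,alice", "smith,bob", "smith,carol"]))
# [(0, 'smith,alice'), (6, 'bob'), (6, 'carol')]
```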

Algorithms for updating keys can be made lock-free with conditional updates, but conditional updates may stop working if we extend this system in the future.
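
The lock-free pattern in question is a compare-and-swap retry loop. A sketch, assuming a hypothetical `read(key) -> (value, version)` call and a `cond_write(key, value, expected_version)` RPC that fails when the version has moved; both names are assumptions.

```python
class VersionMismatch(Exception):
    """Raised by cond_write when the stored version no longer matches."""

def add_entry_lock_free(read, cond_write, node_key, new_entry, max_retries=16):
    """Retry read-modify-write until no concurrent writer intervenes."""
    for _ in range(max_retries):
        node, version = read(node_key)        # snapshot value + version
        updated = sorted(node + [new_entry])  # purely local modification
        try:
            cond_write(node_key, updated, expected_version=version)
            return
        except VersionMismatch:
            continue                          # a concurrent writer won; retry
    raise RuntimeError("too much contention on " + node_key)
```

If the semantics are later extended (e.g., to multi-object operations), a single-object conditional update may no longer be enough to keep an index consistent.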

A single master per object is a bad assumption: Facebook's hottest objects need to be replicated. But a single master is good for transactions.

Multi-get helps a lot: we could do table scans in software in the future. A sketch follows.
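
A sketch of that, assuming a hypothetical batched `multi_get(keys)` RPC: the client takes primary keys from the index and fetches the objects many at a time, instead of paying one round trip per object.

```python
def scan(index_keys, multi_get, batch_size=100):
    """Fetch all objects named by index_keys, batch_size keys per RPC."""
    for i in range(0, len(index_keys), batch_size):
        batch = index_keys[i:i + batch_size]
        for obj in multi_get(batch):          # one RPC returns many objects
            yield obj
```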