Index tree issues (Nov 2014)
Current Btree Issues:
- High performance degradation as the size of the tree becomes larger for inserts and lookups. (See the slope in our write/read graphs.)
- Max size of the key is predefined in code. (RAM-626)
- Each tree entry is “key”-“value”. For elegance, we would like to have “key” correspond to the secondary key being indexed (which is char[], length) (say secKey), and “value” to be the primary key hash (say pKHash). We have to store our custom structure IndexletManager::KeyAndHash as the “key”. We do this primarily due to 3a but also in part due to 3b.
- We want the tree sorted based on the secKey and then by pKHash. However, the tree doesn’t seem to allow sort based “key” and “value”, but rather only on “key”. Why do we want this sort? Let's say the secKey being looked up right now has more pKHashes than will fit in one response – in this case we can send the partial list and a nextKeyHash to indicate where the next request should start looking up. Now a concurrent remove or update operation could remove the pKHash that was returned as the nextKeyHash. If the pKHashes are not sorted, on receiving the next request we would not know where to start sending the response from.
- If the secondary key space is sparse, then it is possible that there are too many pKHashes for each secKey. This would make each node size very large, making its updates very expensive. There is probably a sweet spot in terms of how many pKHashes per node is ideal, but currently our options were the two extremes.
Unclear if it is an issue with the Btree:
- The tree data structure should allow us to not store duplicates. Or, we need to up with a different algorithm to solve RAM-603.
Issues related to the tree (the way we use it) but not the tree itself:
1. The indexlet recovery doesn’t recover m_stats corresponding to the tree properly. This results in the problem documented in IndexletManager::lookupIndexKeys in the “if (indexlet->bt->empty())” section. (RAM-679)
2. Non-trivial to add new key types. (RAM-609)
3. If index entries cannot be appended to the log, then this needs to be propagated back to the caller. (RAM-675)