== Bootstrapping C Discovery ==
1) DNS
- Slow
+ Preconfigured on hosts
+ Delegation
+ Provides a way to deal with Coordinator failure/unavailability
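A minimal sketch of the host side of DNS-based discovery; the name rc-coordinator.example.com and port 12246 are placeholders, not from these notes. Multiple A records for one name give a crude form of the delegation/failover mentioned above: the host simply tries each returned address in turn.

    // Resolve a well-known DNS name to find candidate Coordinator addresses.
    #include <cstdio>
    #include <cstring>
    #include <arpa/inet.h>
    #include <netdb.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    int main() {
        struct addrinfo hints;
        std::memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;            // IPv4 for simplicity
        hints.ai_socktype = SOCK_STREAM;

        struct addrinfo* results = nullptr;
        int rc = getaddrinfo("rc-coordinator.example.com", "12246", &hints, &results);
        if (rc != 0) {
            std::fprintf(stderr, "lookup failed: %s\n", gai_strerror(rc));
            return 1;                         // fall back to a preconfigured address, retry, ...
        }

        // Several A records can point at primary/standby Coordinators; try each in turn.
        for (struct addrinfo* ai = results; ai != nullptr; ai = ai->ai_next) {
            char addr[INET_ADDRSTRLEN];
            auto* sin = reinterpret_cast<struct sockaddr_in*>(ai->ai_addr);
            inet_ntop(AF_INET, &sin->sin_addr, addr, sizeof(addr));
            std::printf("candidate Coordinator: %s\n", addr);
            // ...attempt to connect; on failure, move on to the next record...
        }
        freeaddrinfo(results);
        return 0;
    }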
== Summary of Actions ==
(Numbers refer to the numbered entries in the Summary of State below.)
-- Auth --------------------------------
App authentication            6, 7
App authorization             7
Machine authentication        8, 9
Machine authorization         9
-- Addressing --------------------------
Find M for Object Id          1
Lookup LMA to NMA             4
-- Backup/Recovery ---------------------
List backup candidates        3
Confirm crashes               4
Start M recovery              1, 4, 5
Notify masters of B crash     4
Create a fresh B instance     5
-- Performance -------------------------
Load balancing                1, 10
Stats, metrics, accounting    10
-- Debugging/Auditing ------------------
Logging                       11
== Summary of State ==
#   Name                   Format                                Size                 Churn                       Refs
------------------------------------------------------------------------------------------------------------------------------------------------------
1   tabletMap              (workspace, table, start, end, LMA)   ~40b/tablet/master   Only on load balance        Per workspace, table, id miss
3   hostPlacementMap       (NHA, rack)                           ~16b/backup          2 entries/15m/10K machines  After ~n/k segs written/master
4   logicalHostMap         (LMA, NHA)                            ~16b/host            2 entries/15m/10K machines  Per host addressing miss (on Ms, As, or Bs)
5   hostFreeList           (NHA)                                 ~8b/spare host       ~0                          Every 15m
------------------------------------------------------------------------------------------------------------------------------------------------------
6   appAuthentication      (appId, secret, workspace)            ~24b/principal       ~0                          Once per A session
7*  appAuthorization       (token, workspace)                    ~16b/active session                              Once per new A access to each M
8   machineAuthentication  Perhaps "certificate"?                                                                 Every 15m (except during bootstrap)
9*  machineAuthorization   (token, role)                         ~16b/active host     2 entries/15m/10K machines  ~Twice per first-time pairwise host interaction
------------------------------------------------------------------------------------------------------------------------------------------------------
10  metrics                ? (LHA, [int]) # dropped reqs/s etc.  ?                    ?
------------------------------------------------------------------------------------------------------------------------------------------------------
11  logs                   ?                                     Large                High                        ~0
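To make the formats and sizes above concrete, here is a rough C++ sketch of how this state might be held in memory on C; the type names, field names, and container choices are illustrative assumptions, not from these notes.

    // Illustrative in-memory layout for the Coordinator (C) state listed above.
    #include <cstdint>
    #include <list>
    #include <map>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <utility>

    using LMA = uint64_t;       // logical machine address
    using NHA = std::string;    // network host address, e.g. "ip:port"

    struct TabletSpan    { uint64_t start, end; LMA master; };   // 1: one tabletMap range
    struct HostPlacement { NHA addr; uint32_t rack; };           // 3: one hostPlacementMap entry

    struct CoordinatorState {
        // 1: (workspace, table) -> ranges keyed by start key, each owned by one M.
        //    (The notes call for a hashtable keyed by (workspace, table) with ordered
        //    lists as elements; std::map stands in here to avoid a custom pair hash.)
        std::map<std::pair<std::string, std::string>,
                 std::map<uint64_t, TabletSpan>> tabletMap;
        // 3: backups with their racks, rotated to spread segment placement.
        std::list<HostPlacement> hostPlacementMap;
        // 4: logical -> network address.
        std::unordered_map<LMA, NHA> logicalHostMap;
        // 5: spare hosts not yet assigned a role (hashset per the notes).
        std::unordered_set<NHA> hostFreeList;
        // 6/7/8/9/10/11: auth, metrics, and log state omitted from this sketch.
    };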
Data structures:
1 Find M for Object Id
      (workspace, table) -> [(start, LMA)]                  Hashtable with ordered lists as elements
  Start M recovery
      LMA -> (workspace, table, start, end)                 Hashtable
3 List backup candidates
      Return n hosts, rotate them to the back of the list   List
4 Lookup LMA to NMA
  Confirm crashes
      LMA -> NHA                                            Hashtable
  Start M recovery
      Remove one entry from the Hashtable, insert a new one
  Notify masters of B crash
      Needs the NMAs of all masters; just walk the buckets
5 Start M recovery                                          Hashset (insert/remove)
      Remove M from hostFreeList**
  Create a fresh B
      Remove B from hostFreeList**
  ** Accidentally conflated these free lists
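A small C++ sketch of the two most-used operations above, using the same illustrative types as the state sketch (redeclared so this stands alone): "Find M for Object Id" becomes an ordered-map search keyed by start key, and "List backup candidates" rotates the placement list so load spreads across backups. All names here are mine, not from the notes.

    // Sketch of "Find M for Object Id" (1) and "List backup candidates" (3).
    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <list>
    #include <map>
    #include <optional>
    #include <string>
    #include <utility>
    #include <vector>

    using LMA = uint64_t;
    using NHA = std::string;

    struct TabletSpan    { uint64_t start, end; LMA master; };
    struct HostPlacement { NHA addr; uint32_t rack; };

    using TabletMap = std::map<std::pair<std::string, std::string>,
                               std::map<uint64_t, TabletSpan>>;   // ranges ordered by start key

    // 1: Find the master (LMA) responsible for an object id.
    std::optional<LMA> findMaster(const TabletMap& tablets,
                                  const std::string& workspace,
                                  const std::string& table,
                                  uint64_t objectId) {
        auto t = tablets.find({workspace, table});
        if (t == tablets.end())
            return std::nullopt;                       // unknown (workspace, table)
        const auto& spans = t->second;
        auto it = spans.upper_bound(objectId);         // first range starting after objectId
        if (it == spans.begin())
            return std::nullopt;
        --it;                                          // range whose start <= objectId
        if (objectId > it->second.end)
            return std::nullopt;                       // falls in a gap between ranges
        return it->second.master;
    }

    // 3: Return n backup candidates and rotate them to the back of the list,
    // so successive segments are spread across backups.
    std::vector<NHA> listBackupCandidates(std::list<HostPlacement>& placements, size_t n) {
        std::vector<NHA> out;
        n = std::min(n, placements.size());
        for (size_t i = 0; i < n; i++) {
            out.push_back(placements.front().addr);
            // Move the chosen backup to the back of the list (constant time).
            placements.splice(placements.end(), placements, placements.begin());
        }
        return out;
    }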
== Persistence/Recovery Ideas for C ==
1) Replication
2) Disks, WAL/LFS
3) RAMCloud Tables
- Serious circular dependencies and bootstrapping issues
? Recovery latency
+ It works, it's durable
+ Code reuse
+ Super-cool
4) Decentralized
5) Punt/High-availability/special hardware/VM replication
=== Replication ===
- Performance & Complexity (2PC?)
=== RAMCloud-based C recovery ===
Choices:
1) C uses Bs
2) C uses (local) M
3) C is an A
NOTE: this approach presupposes that a key-value store is a good fit for this data; that might not always be true (e.g. indexes)
PROBLEM: how do we authenticate hosts during recovery without any persisted state?
         We can use small-key RSA.
2) C uses (local) M
Same as option 3 below, except that where option 3 has its PROBLEM, here we know that C's
(local) M crashed when the previous C did. Now we just need to bootstrap the backup system,
which already relies on broadcast.
start C
    create empty tempLogicalHostMap
    create tempHostFreeList and tempHostPlacementList from the NHA range passed on the command line
    create empty tempTabletMap
    broadcast to tempHostFreeList
    // BOOTSTRAP and/or RECOVER
    // Bring up k backups, one per rack where possible (see the C++ sketch below).
    backupsOnline = 0
    while backupsOnline < k {
        pop B from tempHostFreeList
        check tempHostPlacementList
        if new rack {
            start B
            add B to tempLogicalHostMap
            backupsOnline++
        } else {
            pushBack B onto tempHostFreeList
        }
    }
    // Bring up a local M, point the bootstrap tables at it, and persist C's maps there.
    start M on localhost
    add (0, self network address) to tempLogicalHostMap
    add (0, logicalHostMap, 0, inf, M) to tempTabletMap
    notify M of the change and tell it to recover
    insert(0, logicalHostMap, (0, logicalHostMap, 0, inf, M))
    map insert(0, logicalHostMap) tempLogicalHostMap
    map insert(0, freeList) tempHostFreeList
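The rack-aware backup bootstrap in the while loop above is the fiddly part, so here is a C++ sketch of just that step. The container choices, the startBackupOn callback, and the second "relaxed" pass (so the loop still terminates when there are fewer than k racks) are my assumptions, not part of the notes.

    // Pull spare hosts off the free list and start backups until k are online,
    // preferring one backup per rack.
    #include <cstddef>
    #include <deque>
    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>
    #include <vector>

    using NHA = std::string;

    std::vector<NHA> bootstrapBackups(
            std::deque<NHA>& hostFreeList,                         // temp free list (pop front / push back)
            const std::unordered_map<NHA, int>& hostPlacement,     // NHA -> rack
            size_t k,
            const std::function<bool(const NHA&)>& startBackupOn)  // assumed RPC: ask host to run a B
    {
        std::vector<NHA> backups;
        std::unordered_set<int> usedRacks;

        // Pass 0 insists on a new rack per backup; pass 1 relaxes that if needed.
        for (int pass = 0; pass < 2 && backups.size() < k; pass++) {
            size_t remaining = hostFreeList.size();
            while (remaining-- > 0 && backups.size() < k) {
                NHA host = hostFreeList.front();
                hostFreeList.pop_front();

                int rack = hostPlacement.at(host);                 // assumes every spare's rack is known
                bool newRack = usedRacks.count(rack) == 0;
                if ((newRack || pass == 1) && startBackupOn(host)) {
                    usedRacks.insert(rack);
                    backups.push_back(host);                       // host is now a B, not a spare
                } else {
                    hostFreeList.push_back(host);                  // keep as a spare for later
                }
            }
        }
        return backups;
    }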
3) C is an A
PROBLEMS: We can't count on C to start recovery; otherwise, if the master holding C's data fails, we're done
          C needs to deal with NHAs sometimes, which is different from a normal app
start C
    create empty tempLogicalHostMap
    create tempHostFreeList and tempHostPlacementList from the NHA range passed on the command line
    create empty tempTabletMap
    broadcast to tempHostFreeList   // (probe sketch below)
    if we receive a response from at least one host {
        // RECOVER
        // PROBLEM: what if the master holding C's data went down in the meantime? Then neither can recover.
    } else {
        // BOOTSTRAP
        // Same rack-aware backup bootstrap as in option 2 above.
        backupsOnline = 0
        while backupsOnline < k {
            pop B from tempHostFreeList
            check tempHostPlacementList
            if new rack {
                start B
                add B to tempLogicalHostMap
                backupsOnline++
            } else {
                pushBack B onto tempHostFreeList
            }
        }
        pop M from tempHostFreeList; start M; add M to tempLogicalHostMap
        add (0, logicalHostMap, 0, inf, M) to tempTabletMap; notify M of the change
        insert(0, logicalHostMap, (0, logicalHostMap, 0, inf, M))
        map insert(0, logicalHostMap) tempLogicalHostMap
        map insert(0, freeList) tempHostFreeList
    }
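Both variants open with "broadcast to tempHostFreeList" and then branch on whether anything answers. A rough sketch of that decision, probing each candidate NHA point-to-point over UDP rather than using a true broadcast; the port, probe message, and one-second timeout are invented for illustration.

    // Probe the candidate hosts; if any of them answers, some previous cluster
    // state survives and C should RECOVER, otherwise it should BOOTSTRAP.
    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <vector>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    bool anyHostResponds(const std::vector<std::string>& candidateAddrs, uint16_t port) {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0)
            return false;

        struct timeval tv = {1, 0};                    // wait at most ~1s for a reply
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));

        const char probe[] = "C-PROBE";                // assumed probe message
        for (const std::string& addr : candidateAddrs) {
            struct sockaddr_in dst;
            std::memset(&dst, 0, sizeof(dst));
            dst.sin_family = AF_INET;
            dst.sin_port = htons(port);
            if (inet_pton(AF_INET, addr.c_str(), &dst.sin_addr) != 1)
                continue;                              // skip malformed addresses
            sendto(fd, probe, sizeof(probe), 0,
                   reinterpret_cast<struct sockaddr*>(&dst), sizeof(dst));
        }

        char reply[64];
        bool gotReply = recvfrom(fd, reply, sizeof(reply), 0, nullptr, nullptr) > 0;
        close(fd);
        return gotReply;                               // true -> RECOVER, false -> BOOTSTRAP
    }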