
Locating the Coordinator

DNS?

Authentication

Users

See Security for the current proposal. Briefly, clients/users provide a secret to the Coordinator, which verifies it and issues a token. Users must present this token on RAMCloud requests; the Master confirms the token with the Coordinator and caches the result.
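A minimal sketch of the token flow described above. All class and method names (Coordinator, Master, authenticate, verify_token, check_request) are hypothetical, and token formats, expiry, and the actual RPC between Master and Coordinator are left out. The point is only that the Master consults the Coordinator once per token and serves later requests from its cache.

```python
import hmac
import secrets


class Coordinator:
    """Verifies user secrets and issues tokens (hypothetical sketch)."""

    def __init__(self, user_secrets):
        self.user_secrets = dict(user_secrets)  # persistent: user -> shared secret (bytes)
        self.tokens = {}                        # soft state: token -> user

    def authenticate(self, user, secret):
        """Return a fresh token if the secret matches, else None."""
        expected = self.user_secrets.get(user)
        # compare_digest avoids leaking how much of the secret matched via timing.
        if expected is None or not hmac.compare_digest(expected, secret):
            return None
        token = secrets.token_hex(16)
        self.tokens[token] = user
        return token

    def verify_token(self, token):
        """Return the user a token was issued to, or None if unknown."""
        return self.tokens.get(token)


class Master:
    """Confirms tokens with the Coordinator once, then serves them from a cache."""

    def __init__(self, coordinator):
        self.coordinator = coordinator
        self.token_cache = {}        # token -> user
        self.coordinator_calls = 0   # how often the Coordinator was consulted

    def check_request(self, token):
        if token in self.token_cache:
            return self.token_cache[token]
        self.coordinator_calls += 1
        user = self.coordinator.verify_token(token)
        if user is not None:  # only cache positive answers
            self.token_cache[token] = user
        return user
```

Caching only positive answers keeps a bogus token from being remembered, at the cost of a Coordinator round trip for every rejected request.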

The Coordinator stores (persistent) a secret shared with each user. It also holds the issued tokens (soft state); we may want to persist these too, so that a new Coordinator is not flooded with authentication requests after a Coordinator recovery.

ACLs/Workspaces

For the moment, access to a Workspace is all or nothing (it may even be conflated with the user). Hence the Coordinator stores (persistent) the list of workspaces each user owns (or, in the conflated case, the secret associated with each Workspace).
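A sketch of the all-or-nothing model, assuming workspaces are identified by name and ownership is the only right a user can hold. WorkspaceAcl and its methods are hypothetical.

```python
class WorkspaceAcl:
    """All-or-nothing workspace access (hypothetical sketch)."""

    def __init__(self):
        self.owned = {}  # persistent: user -> set of workspace names

    def grant(self, user, workspace):
        self.owned.setdefault(user, set()).add(workspace)

    def may_access(self, user, workspace):
        # No per-table or per-operation rights: ownership is the only check.
        return workspace in self.owned.get(user, set())
```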

Servers

Master -> Coordinator

This step also helps with naming later. Because Masters must authenticate to the Coordinator, and the Coordinator assigns Master roles, it can slot each Master into its naming tables (soft state) at that point.
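A sketch of how registration and naming can share one step, assuming each machine is provisioned with a per-address secret and that logical master addresses (LMAs) are small integers handed out in order. Registry, register_master, lookup, and the address strings are hypothetical.

```python
import hmac
import itertools


class Registry:
    """Coordinator-side Master registration plus naming (hypothetical sketch)."""

    def __init__(self, machine_secrets):
        self.machine_secrets = dict(machine_secrets)  # persistent: network address -> secret
        self.logical_host_map = {}                    # soft state: LMA -> network address
        self._next_lma = itertools.count(1)           # LMAs handed out as 1, 2, 3, ...

    def register_master(self, network_addr, secret):
        expected = self.machine_secrets.get(network_addr)
        if expected is None or not hmac.compare_digest(expected, secret):
            raise PermissionError(f"authentication failed for {network_addr}")
        # The Coordinator assigns the Master role, so it can name the Master here too.
        lma = next(self._next_lma)
        self.logical_host_map[lma] = network_addr
        return lma

    def lookup(self, lma):
        """Translate a logical master address into its current network address."""
        return self.logical_host_map[lma]
```

Since the map is rebuilt as Masters re-register, it can stay soft state.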

Backup -> Coordinator

Master -> Backup

Problem: A Master may disclose data to a non-RAMCloud machine if a machine address is reallocated to a non-RAMCloud machine. Possible solutions: ignore it, or encrypt data.

Naming

A (soft state) lookup table from logical hosts to RPC addresses.

Aside: I don't really believe the Master -> RPC Addr mapping will need to be replicated, nor the Backup -> RPC Addr one.

Host Addressing

Tables/Tablets and Indexes

Placement

Location/Discovery

Reconfiguration

Recovery

Choosing Replacements

Crash Confirmation

Broadcast Notifications

Partition Detection

Statistics

Logging

Metrics

Configuration Information

Rack Placement

Machine Parameters

Summary of Coordinator State

Workspace list
Possibly users
User or workspace secrets
(soft) Issued security tokens
(soft) Logical host naming

...

Important Problems from Design Meeting

Exactly how does a master select backups?
How do we make sure a master is dead (to prevent divergent views of tables)?
How does a new master know when it has received all segments (e.g. knows it has the head of its log)?
How does host-to-host authentication work?
- In particular, how does the application authenticate the Coordinator?

Further Things to Walk Through in Design Meetings

How does one install a new machine?
How does one update RAMCloud software?



== Bootstrapping C Discovery ==

1) DNS
   - Slow
   + Preconfigured on hosts
   + Delegation
   + Provides a way to deal with Coordinator failure/unavailability
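A sketch of DNS-based Coordinator discovery, assuming the Coordinator is published under a DNS name that may resolve to several addresses. socket.getaddrinfo is the standard resolver call; coordinator_addresses, first_reachable, and the injectable resolver and try_connect parameters are hypothetical, added so failover can be exercised without a real DNS server.

```python
import socket


def coordinator_addresses(name, port, resolver=socket.getaddrinfo):
    """Resolve a Coordinator DNS name to distinct (host, port) candidates, in DNS order."""
    candidates = []
    for _family, _type, _proto, _canonname, sockaddr in resolver(
        name, port, type=socket.SOCK_STREAM
    ):
        addr = (sockaddr[0], sockaddr[1])  # IPv6 sockaddrs carry extra fields; drop them
        if addr not in candidates:
            candidates.append(addr)
    return candidates


def first_reachable(candidates, try_connect):
    """Return the first candidate that try_connect accepts, or None if none do.

    Trying every DNS record is what lets clients ride out an unavailable Coordinator.
    """
    for addr in candidates:
        if try_connect(addr):
            return addr
    return None
```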

== Summary of Actions ==

-- Auth -------------------------------
App authentication              6, 7
App authorization               7
Machine authentication          8, 9
Machine authorization           9

-- Addressing -------------------------
Find M for Object Id            1
Lookup LMA to NMA               4

-- Backup/Recovery --------------------
List backup candidates          3
Confirm crashes                 4
Start M recovery                1, 4, 5
Notify masters of B crash       4
Create a fresh B instance       5

-- Performance ------------------------
Load Balancing                  1, 10
Stats, Metrics, Accounting      10

-- Debugging/Auditing  ----------------
Logging                         11

== Summary of State ==

        Name                    Format                                  Size                    Churn                         Refs
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1       tabletMap               (workspace, table, start, end, LMA)     ~40b/tablet/master      Only on load balance          Per workspace, table, id miss
3       hostPlacementMap        (NHA, rack)                             ~16b/backup             2 entries/15m/10K machines    After ~n/k segs written/master
4       logicalHostMap          (LMA, NHA)                              ~16b/host               2 entries/15m/10K machines    Per host addressing miss (on Ms, As, or Bs)
5       hostFreeList            (NHA)                                   ~8b/spare host          ~0                            Every 15m
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
6       appAuthentication       (appId, secret, workspace)              ~24b/principal          ~0                            Once per A session
7*      appAuthorization        (token, workspace)                      ~16b/active session                                   Once per new A access to each M
8       machineAuthentication   Perhaps "certificate"?                                                                        Every 15m (except during bootstrap)
9*      machineAuthorization    (token, role)                           ~16b/active host        2 entries/15m/10K machines    ~Twice per first time pairwise host interaction
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10      metrics                 ? (LHA, [int]) # dropped reqs/s etc.    ?                       ?
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
11      logs                    ?                                       Large                   High                          ~0

Data structures:
1    Find M for Object Id
          (workspace, table) -> [(start, LMA)]                  Hashtable with ordered lists as elements
     Start M recovery
          LMA -> (workspace, table, start, end)                 Hashtable
3    List Backup Candidates
          Return n and rotate them to the back of the list      List
4    Lookup LMA to NMA
     Confirm crashes
          LMA -> NHA                                            Hashtable
     Start M Recovery
          Remove entry in Hashtable, insert one
     Notify masters of B crash
          Needs the NMAs of all masters; just walk the buckets
5    Start M recovery                                           Hashset (insert/remove)
          Remove M from hostFreeList**
     Create a fresh B
          Remove B from hostFreeList**
     ** Accidentally conflated these free lists
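A sketch of two of the structures above, assuming the tablets of one table are contiguous, so each tablet ends where the next one starts (only (start, LMA) is stored, as in the "Find M for Object Id" row). TabletMap, BackupCandidates, and their methods are hypothetical names.

```python
import bisect
from collections import deque


class TabletMap:
    """(workspace, table) -> sorted [(start, LMA)]; each tablet ends where the next starts."""

    def __init__(self):
        self.tables = {}

    def add_tablet(self, workspace, table, start, lma):
        bisect.insort(self.tables.setdefault((workspace, table), []), (start, lma))

    def find_master(self, workspace, table, object_id):
        """Return the LMA owning object_id, or None if no tablet covers it."""
        tablets = self.tables.get((workspace, table), [])
        starts = [start for start, _ in tablets]  # rebuilt per call for clarity only
        i = bisect.bisect_right(starts, object_id) - 1  # last tablet with start <= id
        return tablets[i][1] if i >= 0 else None


class BackupCandidates:
    """Return n backups and rotate them to the back of the list."""

    def __init__(self, backups):
        self.queue = deque(backups)

    def take(self, n):
        n = min(n, len(self.queue))
        chosen = [self.queue[i] for i in range(n)]
        self.queue.rotate(-n)  # the n entries just returned move to the back
        return chosen
```

Rotating the returned candidates to the back spreads successive requests across all backups instead of always handing out the same ones.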

== Persistence/Recovery Ideas for C ==

1) Replication
2) Disks, WAL/LFS
3) RAMCloud Tables
   - Serious circular dependencies and bootstrapping issues
   ? Recovery latency
   + It works, it's durable
   + Code reuse
   + Super-cool
4) Decentralized
5) Punt/High-availability/special hardware/VM replication

=== Replication ===

- Performance & Complexity (2PC?)

=== RAMCloud-based C recovery ===

Choices:
1) C uses Bs


2) C uses (local) M

NOTE: this approach presupposes that a key-value store is a good bet for
      this data; that might not always be true (e.g. indexes)

PROBLEM: Authenticate hosts during recovery without state?
         Can use small key RSA.

As below, except for the PROBLEM case: we know that our CM crashed when the
previous C did, so we just need to bootstrap the backup system, which
already relies on broadcast.


start C
create empty tempLogicalHostMap
create tempHostFreeList and tempHostPlacementList from NHA range passed on the command line
create empty tempTabletMap
broadcast to tempHostFreeList
// BOOTSTRAP and/or RECOVER
backupsOnline = 0
// NOTE: loops forever if the free list spans fewer than k racks
while backupsOnline < k {
  h = pop tempHostFreeList
  look up rack of h in tempHostPlacementList
  if rack of h is new {
    start B on h
    add B to tempLogicalHostMap
    backupsOnline++
  } else {
    pushBack h tempHostFreeList
  }
}
start M on localhost
add (0, self network address) to tempLogicalHostMap
add (0, logicalHostMap, 0, inf, M) to tempTabletMap
notify M of change and tell it to recover
insert(0, logicalHostMap, (0, logicalHostMap, 0, inf, M))
map insert(0, logicalHostMap) tempLogicalHostMap
map insert(0, freeList) tempHostFreeList
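The rack-diverse backup selection in the loop above can be sketched as follows. pick_backups and its arguments are hypothetical; rack_of stands in for tempHostPlacementList. Unlike the pseudocode, this version gives up after one full pass over the free list without finding a new rack, so it terminates even when fewer than k racks are available; the caller then gets fewer than k backups.

```python
from collections import deque


def pick_backups(free_hosts, rack_of, k):
    """Pick up to k hosts on distinct racks; return (chosen, remaining free list)."""
    free = deque(free_hosts)
    chosen, racks_used = [], set()
    misses = 0  # consecutive hosts rejected since the last pick
    while len(chosen) < k and free and misses < len(free):
        host = free.popleft()
        rack = rack_of[host]
        if rack not in racks_used:
            racks_used.add(rack)
            chosen.append(host)
            misses = 0
        else:
            free.append(host)  # pushBack: same rack as an existing backup
            misses += 1
    return chosen, list(free)
```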

3) C is an A

PROBLEMS: C can't be counted on to start recovery; otherwise, if the master holding C's data fails, we're done.

Needs to deal with NHAs sometimes, which is different from a normal app.

start C
create empty tempLogicalHostMap
create tempHostFreeList and tempHostPlacementList from NHA range passed on the command line
create empty tempTabletMap
broadcast to tempHostFreeList
if a response arrives from any host {
  // RECOVER
  // PROBLEM: if the master holding C's data went down in the meantime, neither can recover
} else {
  // BOOTSTRAP
  backupsOnline = 0
  // NOTE: loops forever if the free list spans fewer than k racks
  while backupsOnline < k {
    h = pop tempHostFreeList
    look up rack of h in tempHostPlacementList
    if rack of h is new {
      start B on h
      add B to tempLogicalHostMap
      backupsOnline++
    } else {
      pushBack h tempHostFreeList
    }
  }
  h = pop tempHostFreeList; start M on h; add M to tempLogicalHostMap
  add (0, logicalHostMap, 0, inf, M) to tempTabletMap; notify M of change
  insert(0, logicalHostMap, (0, logicalHostMap, 0, inf, M))
  map insert(0, logicalHostMap) tempLogicalHostMap
  map insert(0, freeList) tempHostFreeList
}
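The RECOVER/BOOTSTRAP branch in option 3 can be sketched as below, assuming that any host answering the broadcast signals that a previous cluster exists. choose_mode and probe are hypothetical; a real probe would be an RPC with a timeout. The sketch does not address the PROBLEM noted in the pseudocode.

```python
def choose_mode(free_hosts, probe):
    """Return ("RECOVER", responders) or ("BOOTSTRAP", []) after a broadcast.

    probe(host) is a hypothetical stand-in for the broadcast RPC: it returns
    True when the host answers, which (by assumption) means an earlier
    cluster left state behind that C must recover instead of bootstrapping.
    """
    responders = [host for host in free_hosts if probe(host)]
    if responders:
        return "RECOVER", responders
    return "BOOTSTRAP", []
```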
