Locating the Coordinator

DNS?

Happily all properly configured hosts should know how to query DNS. It also supports delegation which allows the RAMCloud DNS entries to be managed separately. It also can help deal with Coordinator failure by providing a list of possible future Coordinators.

Authentication

Users

See Security for the current proposal. Briefly, clients/users will provide a secret to the Coordinator which the Coordinator will verify and issue a token. Applications must provide this token on RAMCloud requests which the Master will confirm with the Coordinator (and cache).

The Coordinator stores (persistently) a shared secret with the users. It also houses the tokens (ephemerally), we may want some persistence on this to keep from flooding a new Coordinator with authentication requests after a Coordinator recovery.

ACLs/Workspaces

For the moment access to a Workspace is all or nothing (or perhaps even conflated with the user) hence the Coordinator stores (persistently) a list of workspaces the user owns (or in the conflated case, the secret associated with each Workspace).

Servers

Master -> Coordinator

This step helps us with naming later as well. Since Masters must authenticate to the Coordinator and it assigns the Master roles it can then slot the Master into its naming tables (ephemerally).

Backup -> Coordinator

Master -> Backup

Problem: A Master may disclose data to a non-RAMCloud machine if a machine a machine address is reallocated for use as a non-RAMCloud machine. Possible solutions: ignore it or encrypt data.

Naming

Host Addressing

A lookup table of logical hosts to (ephemerally) RPC addresses.

Aside: I don't really believe the Master -> RPC Addr mapping will need to be replicated, nor the Backup -> RPC Addr one. This is problematic - it probably makes the above state (persistent).

Tables/Tablets and Indexes

(workspace, table, start id, end id, logical host address) relation
(persistent)

Placement

(network host address, rack)

(persistent)

Location/Discovery

Perhaps we've got a story for this with DNS?

Reconfiguration

Recovery

Choosing Replacements

For crashed M choose a new network host address (under no constraints?). Issue a shootdown of old machine address?

Crash Confirmation

When do we notice crashes? Who heartbeats?

If Coordinator notices failure or has one reported to it heartbeat M, if failed contact other hosts (backups or masters) one inside the same rack and one outside. If failure is agreed then we must broadcast new mapping. How can we guarantee no client will see old master? We could disallow backups, then if backups are on on the old master it would kill itself once its backup failed. If it is a non-durable table this would be problematic except that there is no way to restore it anyway.

Broadcast Notifications

Perhaps delegate some of the work.

Partition Detection

Statistics

Logging

Metrics

Configuration Information

Rack Placement

Machine Parameters

Summary of Coordinator State

Workspace list
Possibly users
User or workspace secrets
(soft) Issued security tokens
(soft) Logical host naming

Coordinator