You have ten nodes. They all need to agree on a shared starting state before any of them can do anything useful. One node creates it. The rest need to wait, fetch it, and configure themselves. Welcome to every distributed system's least favorite problem.
## The Bootstrap Problem
Distributed systems don't just start. They bootstrap. And bootstrapping has ordering constraints that most orchestration tools ignore.
Consider any system with a leader-follower initialization pattern. A database cluster where the primary must initialize the schema before replicas connect. A message broker where the controller must register the cluster before brokers join. A blockchain network where one validator must create the genesis file before others can participate.
The requirements are always the same:
- One node must go first and produce shared state
- Other nodes must wait until that state is available
- Each node must configure itself using the shared state
- The main process can only start after all initialization is complete
Kubernetes init containers solve this natively. They run sequentially: each completes before the next starts, and all of them finish before the main container launches. No custom orchestration needed.
The ordering guarantee is the feature. Init containers don't run concurrently. They can't. That constraint is exactly what you want for distributed initialization.
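The guarantee is easy to see in a minimal pod spec. This is a hypothetical demonstration (none of these names are Starship's), but it shows the contract: `step-two` can rely on `step-one`'s output, and `main` can rely on both.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sequential-init-demo   # illustrative example, not a Starship manifest
spec:
  initContainers:
    # Runs first; must exit 0 before the next init container starts.
    - name: step-one
      image: busybox
      command: ["sh", "-c", "echo one > /shared/one.txt"]
      volumeMounts: [{ name: shared, mountPath: /shared }]
    # Runs second; can safely read what step-one wrote.
    - name: step-two
      image: busybox
      command: ["sh", "-c", "test -f /shared/one.txt && echo two > /shared/two.txt"]
      volumeMounts: [{ name: shared, mountPath: /shared }]
  containers:
    # Starts only after every init container has completed successfully.
    - name: main
      image: busybox
      command: ["sh", "-c", "cat /shared/one.txt /shared/two.txt && sleep 3600"]
      volumeMounts: [{ name: shared, mountPath: /shared }]
  volumes:
    - name: shared
      emptyDir: {}
```

If `step-one` fails, Kubernetes restarts it per the pod's restart policy; `step-two` and `main` never run against half-written state.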
## The Genesis Sequence
In Starship, the bootstrap validator (the first node in the network) runs a sequence of init containers before the main process starts. Two containers are always present. The others are injected based on what the chain's config enables.
```mermaid
flowchart TD
    A["1. init-build-images (optional)<br/>Build chain binary from source"] --> B["2. init-genesis<br/>Create genesis.json, generate node keys"]
    B --> C["3. init-config<br/>Update config.toml, apply patches"]
    C --> D["4. init-faucet (optional)<br/>Copy faucet binary if enabled"]
    D --> F["Main containers start<br/>(validator + exposer sidecar)"]
```

Each step depends on the output of the previous one:
| Init Container | What It Does | When It Runs |
|---|---|---|
| init-build-images | Builds the chain binary from source, sets up cosmovisor with upgrade paths | Only when the chain config specifies a build step (custom binary, cosmovisor upgrades) |
| init-genesis | Creates genesis.json, generates node ID and consensus keys, adds accounts and balances | Always |
| init-config | Updates config.toml and app.toml, applies genesis patches via jq merge | Always |
| init-faucet | Copies the faucet binary into position | Only when faucet.enabled: true in the chain config |
The two required containers (init-genesis and init-config) do the heavy lifting. init-build-images only appears when a chain needs a binary built from source rather than pulled from a pre-built image. init-faucet only appears when the chain config enables a faucet. The system composes the init container list dynamically based on what's actually needed.
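Composed, the genesis pod's init section might render roughly like this for a chain with a build step and a faucet enabled. This is a sketch: the image names and script paths are illustrative, not Starship's actual manifest.

```yaml
initContainers:
  # Optional: only rendered when the chain config specifies a source build.
  - name: init-build-images
    image: ghcr.io/example/chain-builder      # illustrative image
    command: ["sh", "-c", "/scripts/build-chain.sh"]
  # Always present: creates genesis.json and node keys.
  - name: init-genesis
    image: ghcr.io/example/osmosis            # illustrative image
    command: ["sh", "-c", "/scripts/create-genesis.sh"]
  # Always present: updates config.toml/app.toml, applies genesis patches.
  - name: init-config
    image: ghcr.io/example/osmosis
    command: ["sh", "-c", "/scripts/update-config.sh"]
  # Optional: only rendered when faucet.enabled is true.
  - name: init-faucet
    image: ghcr.io/example/faucet
    command: ["sh", "-c", "cp /usr/local/bin/faucet /faucet/faucet"]
```

The optional entries are simply omitted from the rendered list when the config doesn't need them; the ordering of whatever remains is still guaranteed.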
All of these containers share a single writable volume. Each one reads from and writes to it, building up the node's state incrementally.
## The Shared Volume
The mechanism behind all this file sharing is a Kubernetes emptyDir volume. It's an ephemeral, pod-scoped directory that exists for the lifetime of the pod. Every container in the pod (init containers, main container, sidecars) can mount it and see the same files.
The pod defines three volume types: one writable emptyDir for state, and ConfigMap volumes for read-only inputs.
```yaml
volumes:
  - name: node
    emptyDir: {}                       # writable state, shared across all containers
  - name: addresses
    configMap:
      name: keys                       # mnemonic phrases and account keys
  - name: scripts
    configMap:
      name: setup-scripts-osmosis-1    # init shell scripts
  # - name: faucet                     # second emptyDir, only when faucet.enabled: true
  #   emptyDir: {}
```

Each init container mounts the same volumes. The emptyDir goes to the chain's home directory; the ConfigMaps go to fixed paths.
```yaml
# init-genesis container
volumeMounts:
  - mountPath: /root/.osmosisd   # emptyDir: writable state
    name: node
  - mountPath: /configs          # ConfigMap: key material
    name: addresses
  - mountPath: /scripts          # ConfigMap: setup scripts
    name: scripts
```

The init containers read scripts from `/scripts` and key material from `/configs`, then write their output to the emptyDir volume. Each container picks up where the previous one left off.
```mermaid
flowchart TD
    subgraph emptyDir["/root/.osmosisd (emptyDir volume)"]
        direction TB
        G["config/genesis.json"]
        NK["config/node_key.json"]
        CT["config/config.toml"]
        AT["config/app.toml"]
        NID["config/node_id.json"]
    end
    IG["init-genesis"] -->|writes| G
    IG -->|writes| NK
    IC["init-config"] -->|reads genesis, writes| CT
    IC -->|writes| AT
    IC -->|writes| NID
    EXP["exposer sidecar"] -->|reads| G
    EXP -->|reads| NK
    EXP -->|reads| NID
```

| Container | Reads from emptyDir | Writes to emptyDir |
|---|---|---|
| init-genesis | (nothing yet) | config/genesis.json, config/node_key.json |
| init-config | config/genesis.json | config/config.toml, config/app.toml, config/node_id.json |
| exposer sidecar | config/genesis.json, config/node_id.json, config/node_key.json | (nothing, read-only) |
| validator (main) | everything | data directory |
emptyDir is ephemeral and pod-scoped. No PVCs, no external storage, no cross-pod lock contention. If a container in the pod crashes and restarts, the volume survives; if the pod is deleted or rescheduled, it's gone. For initialization state that gets regenerated on startup, that's exactly right.
## Validators Joining
The bootstrap node creates the network. Additional validators need to join it. Their init sequence is different:
```mermaid
flowchart TD
    A["1. init-build-images (optional)<br/>Build chain binary"] --> B["2. wait-for-chains<br/>Poll genesis exposer until ready"]
    B --> C["3. init-validator<br/>Recover keys, fetch genesis.json"]
    C --> D["4. init-config<br/>Set genesis node as persistent peer"]
    D --> F["Main container starts<br/>(validator daemon)"]
    F --> G["postStart hook<br/>Request tokens, submit create-validator tx"]
```

The critical difference is step 2: wait-for-chains. This init container polls the genesis node's exposer sidecar in a loop, waiting until the node ID is available. It's a distributed readiness gate implemented as a blocking init container.
The validator joining process works like this:
- wait-for-chains polls the genesis pod's exposer service until it responds with a valid node ID
- init-validator recovers the validator's private key from a ConfigMap (using the pod's hostname as a key index) and downloads `genesis.json` from the genesis node's exposer
- init-config fetches the genesis node's ID and configures it as a `persistent_peer` in `config.toml`
- After the main container starts, a `postStart` lifecycle hook requests tokens from the faucet and submits a `create-validator` transaction
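The readiness gate itself can be sketched as an init container that simply refuses to exit until the exposer answers. The service name, port, and response check here are assumptions based on the exposer description in this article, not Starship's exact implementation:

```yaml
initContainers:
  - name: wait-for-chains
    image: curlimages/curl
    command:
      - sh
      - -c
      # Block until the genesis exposer responds successfully to /node_id.
      # "genesis-osmosis-1" is an assumed service name for the genesis pod.
      - |
        until curl -sf http://genesis-osmosis-1:8081/node_id > /dev/null; do
          echo "genesis exposer not ready, retrying..."
          sleep 2
        done
```

Because this is an init container, everything after it (key recovery, genesis download, the main daemon) is automatically gated behind the genesis node being ready.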
The pod hostname determines the key index. Pod `validator-1` gets key 1, `validator-2` gets key 2. No external coordination service needed.
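The hostname-to-key mapping needs no lookup table, because StatefulSet pods are named `<name>-<ordinal>`. A sketch of how init-validator might derive its key from the hostname; the ConfigMap layout, `jq` path, and key name are illustrative assumptions:

```yaml
  - name: init-validator
    image: ghcr.io/example/osmosis        # illustrative image
    command:
      - sh
      - -c
      - |
        # StatefulSet pods are named <name>-<ordinal>; strip everything
        # up to the last "-" to get the ordinal, and use it as the index
        # into the shared key ConfigMap mounted at /configs.
        INDEX=${HOSTNAME##*-}
        MNEMONIC=$(jq -r ".validators[$INDEX].mnemonic" /configs/keys.json)
        # Recover the deterministic key for this validator.
        echo "$MNEMONIC" | osmosisd keys add validator \
          --recover --keyring-backend test
```

Every replica runs the identical script but recovers a different key, which is exactly the property that makes scaling the StatefulSet up and down safe.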
## The Exposer Sidecar
Here's the piece that ties it all together. Init containers handle sequencing within a single pod, but they can't share state across pods. That's where the exposer sidecar comes in.
The genesis StatefulSet doesn't just run the blockchain node. It also runs an exposer sidecar container on port 8081. This is a lightweight HTTP service that serves the genesis file and node metadata to any pod that needs it.
```mermaid
flowchart LR
    subgraph genesis-pod["Genesis Pod"]
        direction TB
        INIT["Init Containers<br/>(sequential)"]
        VAL["Validator<br/>(main container)"]
        EXP["Exposer<br/>(sidecar, port 8081)"]
    end
    subgraph validator-pod["Validator Pod"]
        direction TB
        WAIT["wait-for-chains<br/>(polls exposer)"]
        VINIT["init-validator<br/>(fetches genesis)"]
    end
    EXP -->|"GET /node_id"| WAIT
    EXP -->|"GET /genesis"| VINIT
```

The exposer mounts the same emptyDir volume as the init containers. Environment variables tell it where each file lives:
| Environment Variable | Path on Shared Volume |
|---|---|
| `EXPOSER_GENESIS_FILE` | `{home}/config/genesis.json` |
| `EXPOSER_NODE_ID_FILE` | `{home}/config/node_id.json` |
| `EXPOSER_NODE_KEY_FILE` | `{home}/config/node_key.json` |
| `EXPOSER_PRIV_VAL_FILE` | `{home}/config/priv_validator_key.json` |
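Wired into the pod spec, the sidecar might look like this. The image name is illustrative; the port and environment variables follow the table above:

```yaml
containers:
  - name: exposer
    image: ghcr.io/example/exposer       # illustrative image
    ports:
      - containerPort: 8081
    env:
      - name: EXPOSER_GENESIS_FILE
        value: /root/.osmosisd/config/genesis.json
      - name: EXPOSER_NODE_ID_FILE
        value: /root/.osmosisd/config/node_id.json
      - name: EXPOSER_NODE_KEY_FILE
        value: /root/.osmosisd/config/node_key.json
      - name: EXPOSER_PRIV_VAL_FILE
        value: /root/.osmosisd/config/priv_validator_key.json
    volumeMounts:
      - name: node
        mountPath: /root/.osmosisd       # same emptyDir the init containers wrote
```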
Once the init containers finish and the sidecar starts, all that state becomes available over HTTP. The API surface is deliberately small:
| Endpoint | Returns | Used By |
|---|---|---|
| `GET /node_id` | The genesis node's peer ID | wait-for-chains (readiness gate) |
| `GET /genesis` | The full genesis.json file | init-validator (genesis download) |
| `GET /keys` | Validator public keys | Other validators joining the network |
This is the coordination layer between pods, and it works without any external infrastructure. No shared persistent volumes across pods, no S3 buckets, no distributed file system. Just an HTTP endpoint backed by a local volume.
The exposer turns a single pod's local state into a service that other pods can depend on. Init containers sequence the writes. The sidecar serves the reads.
The pattern avoids a common anti-pattern: trying to share files between pods via persistent volumes or external storage. Persistent volumes create lock contention and ordering headaches. External storage adds latency and a dependency on infrastructure that might not be available during bootstrap. HTTP is simpler. The genesis pod serves the data, validators fetch it. If the exposer isn't ready yet, the wait-for-chains init container just keeps polling.
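From the validator side, the fetch is nothing more exotic than an HTTP GET against the endpoints in the table above. A sketch, assuming the same hypothetical service name as before:

```yaml
  - name: fetch-genesis
    image: curlimages/curl
    command:
      - sh
      - -c
      - |
        # Download the genesis file the exposer serves from its local
        # emptyDir, and drop it where the validator daemon expects it.
        curl -sf http://genesis-osmosis-1:8081/genesis \
          -o /root/.osmosisd/config/genesis.json
    volumeMounts:
      - name: node
        mountPath: /root/.osmosisd
```

Because this runs after wait-for-chains in the init sequence, the file is guaranteed to exist by the time the request is made.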
## Ordering Guarantees
Three mechanisms enforce correct ordering across the system:
Sequential init containers. Kubernetes guarantees that init containers run one at a time, in order. If init-genesis fails, init-config never starts. If init-config fails, the main containers never start. This is not eventual consistency. It's strict sequential execution.
**Idempotency checks.** The init-genesis container checks whether `genesis.json` already exists on the volume. If it does, the container exits immediately. This makes pod restarts safe: a pod that crashes and restarts won't regenerate a different genesis file; it will reuse the existing one.
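The idempotency check amounts to a guard at the top of the genesis step. A sketch, with the path matching the volume layout above and the `osmosisd` invocation kept generic:

```yaml
  - name: init-genesis
    image: ghcr.io/example/osmosis        # illustrative image
    command:
      - sh
      - -c
      - |
        GENESIS="/root/.osmosisd/config/genesis.json"
        # Restart-safe: if a previous run already produced the genesis
        # file on the shared volume, exit 0 instead of regenerating it.
        if [ -f "$GENESIS" ]; then
          echo "genesis.json already exists, skipping"
          exit 0
        fi
        osmosisd init genesis-node --chain-id osmosis-1
```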
Polling readiness gates. The wait-for-chains init container in validator pods polls the genesis node's exposer endpoint in a loop. It doesn't proceed until it gets a valid response. No timeout assumptions, no sleep-and-hope. Just poll until the dependency is actually ready.
These three together give you something that's surprisingly hard to get in distributed systems: deterministic initialization ordering across multiple pods without a central coordinator.
## Beyond Blockchain
The blockchain genesis use case is specific, but the pattern is general. Any system with leader-follower initialization can use this approach:
- Database clusters: Primary initializes the schema and replication slots. Replicas poll a readiness endpoint, fetch the base backup, configure themselves as standbys.
- Message brokers: The controller node registers the cluster. Brokers poll until the controller is ready, then join with the correct cluster ID.
- ML training clusters: The coordinator node sets up the training job. Workers poll until the job parameters are available, then join the distributed training run.
- Service mesh bootstraps: The control plane must be healthy before data plane proxies can fetch their configuration.
The ingredients are the same every time: ordered init containers for single-pod sequencing, an exposer sidecar for cross-pod state sharing, and a polling init container as a distributed readiness gate.
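As a hypothetical translation to the database case, a Postgres replica pod could apply the identical shape: a polling gate, then a fetch-and-configure step. Service names and paths here are assumptions:

```yaml
# Hypothetical: the same pattern applied to a Postgres standby.
initContainers:
  - name: wait-for-primary
    image: postgres:16
    command:
      - sh
      - -c
      # Readiness gate: block until the primary accepts connections.
      - until pg_isready -h pg-primary -p 5432; do sleep 2; done
  - name: fetch-base-backup
    image: postgres:16
    command:
      - sh
      - -c
      # Pull the base backup from the primary; -R writes the standby
      # configuration so the main container starts as a replica.
      - pg_basebackup -h pg-primary -D /var/lib/postgresql/data -R
    volumeMounts:
      - name: data
        mountPath: /var/lib/postgresql/data
```

Swap `pg_isready` for a broker health check or a coordinator status endpoint and the structure is unchanged.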
## Resources
- Starship: universal interchain development and testing environment
- Kubernetes Init Containers: official documentation on init container sequencing and guarantees
- StatefulSets: stable network identities and ordered deployment
- Sidecar Containers: running auxiliary containers alongside main workloads