Building Fault Tolerance with Supervisors

In the previous lessons, you normalized failures using try/rescue, chained validations with with, and handled concurrent errors with Task. Now, you will take the next step toward resilience: letting the system recover automatically when a process crashes. In this lesson, you will create supervised GenServer workers that restart on failure and are reachable by name via a Registry.

Refresher: Registry and `via` Tuples

Elixir’s Registry provides a way to associate names (like integers, strings, or tuples) with process PIDs. This allows you to look up a process by a stable name, even if its PID changes after a crash and restart.

  • Looking up a process:
    You can use Registry.lookup/2 to find the PID(s) registered under a given name.
  • Why use via tuples?
    The {:via, Registry, {RegistryName, key}} tuple lets you register and refer to a process by a logical name (like an id), not by its PID. This is important because when a process crashes and is restarted by a supervisor, it gets a new PID—but its name in the registry stays the same. This makes your system robust to restarts, and because registry keys can be any term, you avoid dynamically creating an atom for every worker (as atom-based process names would require).
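As a minimal sketch of both patterns (the registry name MyRegistry and the keys are illustrative, not from the lesson):

```elixir
# Start a unique registry (in a real app, put this in your supervision tree).
{:ok, _} = Registry.start_link(keys: :unique, name: MyRegistry)

# Register the current process under a logical key, then look it up.
{:ok, _} = Registry.register(MyRegistry, {:worker, 1}, nil)
[{pid, _value}] = Registry.lookup(MyRegistry, {:worker, 1})
pid == self()  # true

# A via tuple lets OTP processes register themselves at startup.
name = {:via, Registry, {MyRegistry, {:worker, 2}}}
{:ok, _agent} = Agent.start_link(fn -> 0 end, name: name)
Agent.get(name, & &1)  # 0
```

Note that Registry.lookup/2 returns a list of {pid, value} tuples—empty if nothing is registered under that key, and at most one element when the registry uses keys: :unique.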

A Restartable Worker Named via Registry
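A minimal sketch of such a worker, assuming a registry named WorkerRegistry has already been started:

```elixir
defmodule Worker do
  use GenServer

  # Build the via tuple once so start_link and the public API agree on naming.
  defp via(id), do: {:via, Registry, {WorkerRegistry, id}}

  def start_link(id) do
    GenServer.start_link(__MODULE__, id, name: via(id))
  end

  @impl true
  def init(id) do
    IO.puts("Worker #{id} started")
    {:ok, id}
  end

  # Public API: fire-and-forget message to the worker registered under `id`.
  def crash(id), do: GenServer.cast(via(id), :crash)

  @impl true
  def handle_cast(:crash, id) do
    # Raising in a callback crashes the process; a supervisor will restart it.
    raise "Worker #{id} crashing on purpose"
  end
end
```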

What it does:

  • Defines a GenServer worker whose state is its id.
  • start_link/1 registers the process name using a Registry via tuple, so you can look it up by id (not by PID).
  • init/1 prints when a worker starts.
  • crash/1 is a public API that sends an asynchronous message to the named worker.
  • handle_cast/2 raises an exception to simulate a failure; this will crash the process and let a supervisor restart it.

Notes:

  • Naming via Registry scales to dynamic workers and avoids global atoms.
  • Casting is fire-and-forget; here, it’s fine because we only trigger a crash.
  • Exceptions in GenServer callbacks terminate the process, which is exactly what a supervisor expects to handle.

Supervise Workers with a One-for-One Strategy
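A sketch of such a supervisor, assuming the Worker module described in the previous section:

```elixir
defmodule WorkerSupervisor do
  use Supervisor

  def start_link(_opts) do
    Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    # One child spec per worker id; each spec needs a unique :id key,
    # otherwise the supervisor would reject the duplicate children.
    children =
      for id <- 1..3 do
        Supervisor.child_spec({Worker, id}, id: :"worker_#{id}")
      end

    # :one_for_one restarts only the child that crashed.
    Supervisor.init(children, strategy: :one_for_one)
  end
end
```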

What it does:

  • Declares a Supervisor that starts three Worker processes with ids 1, 2, and 3.
  • Each child_spec uses a unique child id (:worker_1, :worker_2, :worker_3) to avoid id collisions.
  • strategy: :one_for_one means if one worker crashes, only that worker is restarted.

Notes:

  • Supervisors enforce a restart policy and intensity. By default, if more than 3 restarts (max_restarts) occur within 5 seconds (max_seconds), the supervisor itself terminates. You can tune these via Supervisor options when needed.
  • The default child restart setting for GenServers is :permanent, which means they are always restarted if they terminate.

Boot, Inspect, Crash, and Observe Restart
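The boot sequence described below might look like this sketch, assuming the Worker and WorkerSupervisor modules from the earlier sections:

```elixir
# Start the registry first so workers can register their names in it.
{:ok, _} = Registry.start_link(keys: :unique, name: WorkerRegistry)

# Start the supervisor; it boots workers 1, 2, and 3.
{:ok, sup} = WorkerSupervisor.start_link([])

# Each entry is {child_id, pid, type, modules}.
IO.inspect(Supervisor.which_children(sup), label: "children")

# Crash worker 2, then give the supervisor a moment to restart it.
Worker.crash(2)
Process.sleep(100)
# "Worker 2 started" is printed again as the restarted process runs init/1.
```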

What it does:

  • Starts a Registry with keys: :unique, named WorkerRegistry, so each worker can be addressed by its id.
  • Starts the WorkerSupervisor and queries its children to see what’s running.
  • IO.inspect prints the initial children list, including child ids and PIDs.
  • Worker.crash(2) routes to the process registered with id 2, causing a crash.
  • Process.sleep/1 gives time to see “Worker 2 started” printed again as the supervisor restarts it.

Summary and What’s Next

You built a fault-tolerant setup: workers registered by id through Registry, supervised with a one_for_one strategy, and automatically restarted on failure. This complements the earlier lessons on error normalization and concurrent safety by adding system-level resilience.

Ready to solidify this skill? Head to the practice section to apply supervisors, process naming, and restarts in realistic scenarios.
