Routing and Orchestration

Introduction

Welcome back to Building an Async CLI Tool for ETL Pipelines in Python! You've reached lesson 3, marking solid progress in this course. In the previous two lessons, we built a self-validating domain model with frozen dataclasses and descriptors, then constructed streaming parsers that normalize raw CSV and JSON Lines data into validated Transaction objects. Our pipeline can now ingest files, report errors gracefully, and produce clean, strongly typed transactions.

Today's focus is Routing & Orchestration: the decision layer that determines how validated transactions flow through different processing paths. We'll use structural pattern matching, a powerful feature introduced in Python 3.10, to build a declarative router that dispatches transactions based on operation type, amount thresholds, and validation rules. The key insight is that match/case statements let us express routing logic as a series of patterns rather than nested if/elif chains, making the code more readable and maintainable.

By the end of this lesson, we'll have implemented a complete routing system that handles additions and refunds differently, flags high-value transactions for special attention, and returns consistent result structures for both success and error cases. Let's begin by understanding why routing is essential in ETL systems.

Why Routing Matters in ETL

In real-world data pipelines, not all records follow the same path. Some transactions require immediate processing, others need approval workflows, and certain types trigger notifications or audits. The routing layer acts as the pipeline's traffic controller, examining each validated transaction and directing it to the appropriate business handler.

Without routing, we'd process every transaction identically or scatter conditional logic throughout the codebase. With a centralized router, we gain several advantages: business rules are explicit and visible in one place, adding new transaction types requires adding new cases rather than modifying existing code, and testing becomes easier because each handler can be validated independently. The router becomes the single source of truth for "what happens when we see this kind of transaction."

Enter Structural Pattern Matching

Python's match/case syntax provides a declarative way to route objects based on their structure and content. Instead of writing if isinstance(obj, Transaction) and obj.op == "add" and obj.amount.amount >= Decimal("10.00"), we can write a pattern that expresses the same condition more concisely:

This pattern matches Transaction objects where the op attribute is "add," captures the amount from the nested Money object into the variable amt, and applies a guard condition to check if that amount meets our threshold. The syntax mirrors how we think about the data: "When I see an addition transaction with an amount of at least 10, do this."

Pattern matching excels at routing because patterns are checked in order from top to bottom, stopping at the first match. This lets us structure cases from most specific to most general, ensuring that special cases are handled before falling back to default behavior.

Class Patterns and Attribute Matching

The core pattern syntax for dataclasses is remarkably clean. When we write Transaction(op="add"), we're creating a pattern that matches any Transaction instance where the op attribute equals "add." We can match multiple attributes simultaneously and capture values for use in the handler:

The pattern checks that the object is a Transaction and that op is "refund," while capturing the account attribute into the variable acc for use in the case body. We can nest patterns to match deeper into object structures; Transaction(amount=Money(currency="USD")) would match transactions with a Money object whose currency is "USD."

When working with dataclasses, the pattern matching protocol uses __match_args__ to map positional sub-patterns to attributes. Because our dataclasses use kw_only=True, the generated __match_args__ is empty and positional patterns are not available, so we always use keyword patterns. This makes the code more explicit and less fragile when class definitions evolve.

Building the Addition Handler

Before implementing the router, let's examine the business handlers that process specific transaction types. The addition handler constructs a success result with all relevant fields:

The helper function _money_to_dict converts Money objects to dictionaries with string amounts, ensuring consistent JSON serialization. The handler itself creates an entry dictionary with the transaction's core fields, a sign of +1 indicating this adds to the ledger, and both unit price and total amount. The keyword-only flag parameter allows callers to mark transactions as "high_value" or apply other labels without modifying the core entry structure. The final result wraps the entry in a {"ok": True, "entry": ...} structure, establishing a consistent success format that consumers can rely on.

Building the Refund Handler

The refund handler follows a similar pattern but with refund-specific semantics:

The handler includes defensive validation: refunds must have positive amounts in the source data, even though they'll be recorded with a negative sign. While our Transaction class already enforces this constraint during construction, the handler provides an additional safety net. Valid refunds are recorded with sign: -1 to indicate they subtract from the ledger. This sign field enables downstream systems to quickly determine whether a transaction increases or decreases account balances without parsing the operation type.

The Router: Basic Structure

Now we can implement the router that dispatches transactions to these handlers. Let's start with the simplest case:

The function accepts a Transaction and returns a result dictionary. The match statement scrutinizes the transaction object, and the first case checks for additions. If tx.op equals "add," the pattern matches, and we call handle_addition with the transaction. This case handles all additions, regardless of amount or account. The pattern is intentionally broad; we'll add more specific cases above it to handle special scenarios.

Guards and Variable Capture

To flag high-value transactions, we need to inspect the amount and apply a threshold. This requires both variable capture and a guard condition:

The new first case demonstrates nested pattern matching and guards. We match Transaction objects where op is "add" and extract the amount and currency from the nested Money object into variables amt and cur. The guard if amt >= Decimal("10.00") then checks whether the captured amount meets our threshold. If both the pattern and the guard succeed, we call handle_addition with the flag="high_value" keyword argument, marking this transaction for special handling.

Case ordering is critical here. The high-value case appears first because it's more specific than the general addition case. If we reversed the order, the general case would match all additions first, and the high-value case would never execute. Pattern matching processes cases sequentially, stopping at the first match, so we structure from specific to general.

Handling Refunds

Refund processing adds a single case to our router. Since the Transaction class already validates that amounts must be positive during construction, we don't need a guard condition or a separate case for invalid amounts in the router:

This case matches any Transaction with op equal to "refund" and delegates to handle_refund, which constructs the success result with a negative sign. By keeping refund validation in the domain model rather than duplicating it in the router, we maintain a single source of truth for business rules. The router's job is simply to direct validated transactions to the appropriate handler.

The Default Case

In many routing systems, a catch-all wildcard pattern _ handles unexpected inputs. You might expect our router to include:

However, this case is unnecessary in our design. Recall that our Transaction class already validates that the op field must be either "add" or "refund" during construction. Any record with an unsupported operation type—such as "transfer," "adjust," or a typo like "aad"—is rejected during normalization and never reaches the router. This is a direct benefit of strong upstream validation: by enforcing constraints at the domain model level, we simplify downstream logic. The router only needs to handle the two operation types that can possibly exist, making the code cleaner and the intent clearer.

The Complete Router

Here's the full routing function with all cases in order:

The structure reads like a decision tree: check for high-value additions first, then standard additions, then refunds. Each case is independent and self-contained, making it easy to add new transaction types or modify thresholds without affecting other cases. Notice that we don't need a catch-all wildcard because our Transaction class guarantees that only "add" and "refund" operations reach the router. The pattern matching syntax keeps the routing logic declarative; we describe what we're looking for rather than writing imperative checks.

Orchestrating from Main

The main script integrates the router into the parsing pipeline. After inferring the file format and parsing records, we route each validated transaction:
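The loop can be sketched as follows. It assumes the parser yields (ok, payload) pairs, where payload is a Transaction on success and an error message otherwise; that shape, the emit_lines name, and the placeholder router are assumptions made so the example runs on its own.

```python
import json
from typing import Any, Iterable, Iterator


def route_transaction(tx: Any) -> dict[str, Any]:
    # Placeholder router; the real one dispatches with match/case.
    return {"ok": True, "entry": {"op": tx["op"]}}


def emit_lines(records: Iterable[tuple[bool, Any]]) -> Iterator[str]:
    for ok, payload in records:
        if ok:
            result = route_transaction(payload)
        else:
            # Same envelope as handler errors, so every line shares a schema.
            result = {"ok": False, "error": str(payload)}
        # Compact separators: no spaces after commas or colons.
        yield json.dumps(result, separators=(",", ":"))


records = [(True, {"op": "add"}), (False, "line 2: bad amount")]
for line in emit_lines(records):
    print(line)
# {"ok":true,"entry":{"op":"add"}}
# {"ok":false,"error":"line 2: bad amount"}
```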

For each record, if normalization succeeded (ok is True), we call route_transaction with the validated Transaction object. The router returns a result dictionary that we serialize to JSON with compact formatting. If normalization failed, we format the error string into the same {"ok": False, "error": ...} structure, ensuring every line of output follows the same schema, whether it represents a successful transaction or a normalization error.

Output Analysis

Let's examine the output to see how routing affects transaction processing:

Transaction 1 shows the high-value flag in action: with a unit price of 10.50 USD and quantity 3, the total is 31.50 USD, and the router correctly applied flag: "high_value" because the unit amount met the 10.00 threshold. Transaction 3, by contrast, has a unit price of only 3.00 USD, so despite having quantity 2, it processes as a standard addition without the flag because the threshold check examines the unit price, not the total.

Transaction 4 has a unit price of 15.00 USD, well above the threshold, earning the high-value flag. Transaction 6 has a single item at 25.99 USD, triggering the flag, while transaction 8's unit price of 100.00 USD with quantity 5 also earns the flag. These examples show that the router correctly identifies high-value transactions regardless of quantity.

Transaction 13 demonstrates the boundary case: a unit price of exactly 10.00 USD gets the high-value flag because our guard uses >= rather than >. With a strict comparison, this transaction would be treated as a standard addition. It also shows the high-value flag applied to a transaction with a non-standard currency code ("XYZ"), proving that our routing logic is independent of currency validation.

Conclusion and Next Steps

Outstanding progress! You've mastered structural pattern matching for routing by implementing a declarative transaction dispatcher. The router examines operation types, captures nested values, applies guards to enforce thresholds, and returns consistent results for success cases. This pattern-based approach keeps routing logic declarative and maintainable, made even simpler by the strong upstream validation our domain model provides. Adding new transaction types or modifying business rules requires adding new cases rather than touching existing code.

The architectural decisions we made demonstrate best practices: ordering cases from specific to general ensures correct precedence, using guards for threshold checks keeps business rules visible, returning consistent result structures simplifies error handling downstream, and separating handlers from routing logic makes both easier to test and extend. These patterns scale well; as your ETL system grows, you can add new operation types, implement more sophisticated routing logic, or introduce additional flags without restructuring the router.

In the next lesson, we'll extend this foundation by introducing asynchronous I/O. You'll learn to process multiple files concurrently using asyncio, implement backpressure with bounded queues, and coordinate async tasks with task groups. This will transform your pipeline from a sequential processor into a highly concurrent system that maximizes throughput while maintaining control over resource usage.

The upcoming practice section will challenge you to build this routing system incrementally, starting with basic operation dispatch and progressively adding high-value flagging, refund handling, and robust error handling. You'll implement each pattern match case step by step, observing how case ordering affects behavior and how guards refine patterns. These hands-on exercises will cement your understanding of pattern matching and prepare you for building complex routing logic in production systems!
