Parse, don't validate - Python edition

This is inspired by “Parse, don’t validate” (worth reading even if you’re not familiar with Haskell) and Domain Modeling Made Functional (an excellent book by Scott Wlaschin).

Consider this Python class representing an order:

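from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID

@dataclass
class Order:
    id: UUID
    customer: Customer  # Customer is assumed to be defined elsewhere
    created_at: datetime
    accepted_at: Optional[datetime]
    paid_at: Optional[datetime]
    shipped_at: Optional[datetime]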

From these fields we can imagine a state machine for the orders in this system:

Order that is received -> Order that is accepted
Order that is accepted -> Order that is paid
Order that is paid -> Order that is shipped

However, with the class as it is above, what’s stopping us from making this erroneous state transition in our code?

Order that is received -> Order that is shipped

Let’s imagine what the code that handles shipping looks like:

def ship_order(order: Order) -> Order:
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass

Do you see how this pseudocode allows the erroneous state transition to happen? You may suggest that all we need is some validation to make sure that the input is in the proper state, so let’s add some validation:

def ship_order(order: Order) -> Order:
    # 0. Validate that the order we get as input has the expected combination of fields,
    #    i.e. that both `accepted_at` and `paid_at` are something other than `None`
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass

That should work. Our code is now safeguarded against us sending the ship_order function an order that’s not ready to be shipped.
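As a minimal sketch, step 0 could be a plain runtime check like the one below. The `customer` field is left out to keep the example self-contained, the default values and the use of `dataclasses.replace` are choices made for this illustration, and the database work from steps 1 and 2 is omitted.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Order:
    # `customer` is omitted here to keep the sketch self-contained.
    id: UUID
    created_at: datetime
    accepted_at: Optional[datetime] = None
    paid_at: Optional[datetime] = None
    shipped_at: Optional[datetime] = None


def ship_order(order: Order) -> Order:
    # 0. Runtime validation: reject orders that are not both accepted and paid.
    if order.accepted_at is None or order.paid_at is None:
        raise ValueError("order must be accepted and paid before shipping")
    # 1-2. Registering information and updating the database are omitted.
    # 3. Return an updated copy of the order.
    return replace(order, shipped_at=datetime.now())
```

Note that the check only fires when the function actually runs. The signature is unchanged, so nothing about it tells a caller which orders are acceptable.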

Let’s consider another possible case though:

Order(
  id=some_id,
  customer=some_customer,
  created_at=some_date,
  accepted_at=None,
  paid_at=None,
  shipped_at=some_other_date,
)

Do you see the problem with this state? In the instance above we have an order that has not yet been accepted, nor has it been paid, but it has been shipped! How can we avoid this?

Above we considered the function ship_order, but we can easily imagine many other functions in our order handling system that also take an Order as input.

How can I, as a new developer in the order management project, know when it’s safe to call any of these with my own Order instance? The signature of our example function, def ship_order(order: Order) -> Order, doesn’t tell us much about what’s acceptable. There may (or may not) be validation code in there that makes sure nothing too bad happens at runtime, but I’m over here at type-checking time, before anything runs.

One mitigation that makes a lot of sense is to improve variable names. We could rename the input variable in our ship_order function like this: def ship_order(paid_order: Order) -> Order, and that would give me a hint as to what goes there. But there’s nothing mypy, flake8 or any other linter can do about me writing the code ship_order(my_received_order).
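To make the limitation concrete, here is a small sketch of the renamed version. The trimmed-down `Order` (no `customer`, defaulted fields) and the function body are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Order:
    # Trimmed-down version of the article's Order; `customer` is omitted.
    id: UUID
    created_at: datetime
    accepted_at: Optional[datetime] = None
    paid_at: Optional[datetime] = None
    shipped_at: Optional[datetime] = None


def ship_order(paid_order: Order) -> Order:
    # The parameter name is only a hint; the type is still plain Order.
    paid_order.shipped_at = datetime.now()
    return paid_order


# A freshly received order: neither accepted nor paid.
my_received_order = Order(id=uuid4(), created_at=datetime.now())

# Nothing objects to this call, because the argument is an Order.
shipped = ship_order(my_received_order)
```

The call type-checks and runs, and the result is exactly the unaccepted, unpaid, yet shipped order we wanted to rule out.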

A better way to solve this is to have more specialized classes. Consider these:
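The definitions could look something like the sketch below. Only PaidOrder and ShippedOrder are named in this post; ReceivedOrder and AcceptedOrder, the exact field lists, and the stand-in `Customer` class are assumptions derived from the state machine above.

```python
from dataclasses import dataclass
from datetime import datetime
from uuid import UUID


@dataclass(frozen=True)
class Customer:
    # Stand-in for the real Customer class, which is not shown in this post.
    name: str


@dataclass(frozen=True)
class ReceivedOrder:
    id: UUID
    customer: Customer
    created_at: datetime


@dataclass(frozen=True)
class AcceptedOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime


@dataclass(frozen=True)
class PaidOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime


@dataclass(frozen=True)
class ShippedOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime
    shipped_at: datetime
```

Each class carries only the timestamps that make sense for its state, and none of them is Optional. A PaidOrder has no `shipped_at` field at all, so "shipped but not paid" cannot be expressed in these types.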

And the only way to construct these is to have smarter constructors than that of our big Order class. Let’s reconsider our previous ship_order function:

def ship_order(order: Order) -> Order:
    # 0. Validate that the order we get as input has the expected combination of fields,
    #    i.e. that both `accepted_at` and `paid_at` are something other than `None`
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass

Suppose we instead used something like this:

def ship_order(order: PaidOrder) -> ShippedOrder:
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the ShippedOrder
    pass

No validation code is needed here! Since the PaidOrder class is stricter than our old Order class, we are guaranteed that the input has a legal combination of fields set. A type checker will not let us call this new ship_order function with an order whose accepted_at or paid_at field is None, because the type of those fields in our PaidOrder class is datetime, not Optional[datetime].
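Putting the pieces together, one possible shape for a smarter constructor plus the new ship_order is sketched below. The name `parse_paid_order`, the choice of ValueError, and the trimmed-down classes (no `customer`) are assumptions for illustration, and the database steps are omitted.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Order:
    # The original catch-all class; `customer` is omitted for brevity.
    id: UUID
    created_at: datetime
    accepted_at: Optional[datetime] = None
    paid_at: Optional[datetime] = None
    shipped_at: Optional[datetime] = None


@dataclass(frozen=True)
class PaidOrder:
    id: UUID
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime


@dataclass(frozen=True)
class ShippedOrder:
    id: UUID
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime
    shipped_at: datetime


def parse_paid_order(order: Order) -> PaidOrder:
    """Hypothetical smart constructor: the one place where the check lives."""
    if order.accepted_at is None or order.paid_at is None:
        raise ValueError("order has not been accepted and paid")
    if order.shipped_at is not None:
        raise ValueError("order has already been shipped")
    return PaidOrder(
        id=order.id,
        created_at=order.created_at,
        accepted_at=order.accepted_at,
        paid_at=order.paid_at,
    )


def ship_order(order: PaidOrder) -> ShippedOrder:
    # No validation here: the PaidOrder type already carries that guarantee.
    return ShippedOrder(
        id=order.id,
        created_at=order.created_at,
        accepted_at=order.accepted_at,
        paid_at=order.paid_at,
        shipped_at=datetime.now(),
    )
```

Loosely structured data is parsed once at the boundary, and everything downstream works with the stricter type. A static checker such as mypy should then flag a call like ship_order(some_order) where some_order is a plain Order, since Order and PaidOrder are unrelated classes. Python itself does not enforce annotations at runtime, so the guarantee holds only as far as the code is actually type-checked.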

Benefits of this approach:

The point of “Parse, don’t validate” is to parse your data into a type whose invariants are guaranteed by the type system, rather than scattering validation code here and there to make sure everything is in order. Scott Wlaschin drives this point home in Domain Modeling Made Functional by keeping the types small and specialized.