Parse, don't validate - Python edition
This is inspired by “Parse, don’t validate” (worth reading even if you’re not familiar with Haskell) and Domain modeling made functional (an excellent book by Scott Wlaschin).
Consider this Python class representing an order:
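```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional
from uuid import UUID

class Customer: ...  # placeholder; the Customer type isn't shown in this post

@dataclass
class Order:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: Optional[datetime]
    paid_at: Optional[datetime]
    shipped_at: Optional[datetime]
```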
From these fields we can imagine a state machine for the orders in this system:
Order that is received -> Order that is accepted
Order that is accepted -> Order that is paid
Order that is paid -> Order that is shipped
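The state machine above can be written down as data. The sketch below is only an illustration: the `OrderState` enum, the `ALLOWED_TRANSITIONS` table and `is_allowed` are hypothetical names, not part of the design discussed here.

```python
from enum import Enum


class OrderState(Enum):
    RECEIVED = "received"
    ACCEPTED = "accepted"
    PAID = "paid"
    SHIPPED = "shipped"


# Hypothetical table: each state maps to the states it may move to next.
ALLOWED_TRANSITIONS: dict[OrderState, set[OrderState]] = {
    OrderState.RECEIVED: {OrderState.ACCEPTED},
    OrderState.ACCEPTED: {OrderState.PAID},
    OrderState.PAID: {OrderState.SHIPPED},
    OrderState.SHIPPED: set(),  # terminal state in this sketch
}


def is_allowed(current: OrderState, target: OrderState) -> bool:
    """Return True if moving from `current` to `target` follows the state machine."""
    return target in ALLOWED_TRANSITIONS[current]


print(is_allowed(OrderState.PAID, OrderState.SHIPPED))      # True
print(is_allowed(OrderState.RECEIVED, OrderState.SHIPPED))  # False
```

Note that the Order class itself knows nothing about a table like this; all of its timestamp fields are independently Optional.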
However, with the class as it is above, what’s stopping us from making this erroneous state transition in our code?
Order that is received -> Order that is shipped
Let’s imagine what the code that handles shipping looks like:
```python
def ship_order(order: Order) -> Order:
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass
```
Do you see how this pseudocode allows the erroneous state transition to happen? You may suggest that all we need is some validation to make sure that the input is in the proper state, so let’s add some validation:
```python
def ship_order(order: Order) -> Order:
    # 0. Validate that the input has the expected combination of fields,
    #    i.e. that both `accepted_at` and `paid_at` are something other than `None`
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass
```
That should work. Our code is now safeguarded against us sending the ship_order function an order that’s not ready to be shipped.
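To make step 0 concrete, here is a minimal sketch of the validating version. It assumes a simplified `Order` whose `customer` is a plain string, and it skips steps 1 and 2 (registering details and saving to a database); it is an illustration, not a reference implementation.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class Order:
    id: UUID
    customer: str  # simplified stand-in for the Customer type
    created_at: datetime
    accepted_at: Optional[datetime]
    paid_at: Optional[datetime]
    shipped_at: Optional[datetime]


def ship_order(order: Order) -> Order:
    # 0. Validate: only accepted *and* paid orders may be shipped.
    if order.accepted_at is None or order.paid_at is None:
        raise ValueError(f"order {order.id} is not ready to be shipped")
    # 1-2. Registering shipping details and saving to a database are left out.
    # 3. Return an updated copy with the shipping timestamp filled in.
    return replace(order, shipped_at=datetime.now(timezone.utc))


now = datetime.now(timezone.utc)
received = Order(uuid4(), "alice", now, None, None, None)
try:
    ship_order(received)
except ValueError as err:
    print(err)  # the check fires at runtime, not before
```

The catch is that this check only runs when the function is called, and every other function that takes an Order needs its own version of it.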
Let’s consider another possible case though:
```python
Order(
    id=some_id,
    customer=some_customer,
    created_at=some_date,
    accepted_at=None,
    paid_at=None,
    shipped_at=some_other_date,
)
```
Do you see the problem with this state? In the instance above we have an order that has not yet been accepted, nor has it been paid, but it has been shipped! How can we avoid this?
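A dataclass runs no checks of its own, so an instance like the one above is constructed without complaint. Catching it would take a check like the hypothetical `is_consistent` below, and every function that accepts an `Order` would have to remember to call it. This is a trimmed-down sketch: the `id` and `customer` fields are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Order:
    # Trimmed to the timestamps; id and customer are left out of this sketch.
    created_at: datetime
    accepted_at: Optional[datetime]
    paid_at: Optional[datetime]
    shipped_at: Optional[datetime]


def is_consistent(order: Order) -> bool:
    """Hypothetical check: a later step may only be set if every earlier step is."""
    steps = [order.accepted_at, order.paid_at, order.shipped_at]
    gap_seen = False
    for timestamp in steps:
        if timestamp is None:
            gap_seen = True
        elif gap_seen:
            # A step is set even though an earlier one is missing.
            return False
    return True


now = datetime.now(timezone.utc)
# The dataclass accepts this without complaint: shipped, but never accepted or paid.
shipped_but_unpaid = Order(created_at=now, accepted_at=None, paid_at=None, shipped_at=now)
print(is_consistent(shipped_but_unpaid))  # False
```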
Above we considered the function ship_order, but we can easily imagine many different functions in our order handling system that take an Order as input:
- accept_order
- return_order
- update_order
- etc.
How can I, as a new developer on the order management project, know when it’s safe to call any of these with my own Order instance? The signature of our example function, def ship_order(order: Order) -> Order, isn’t telling us very much about what’s acceptable and what isn’t. There may (or may not) be validation code in there that’ll make sure nothing too bad happens at runtime, but I’m over here at compile-time (in Python terms, type-checking time), and that’s when I’d like to find out.
One mitigation that makes a lot of sense is to improve variable names. We could rename the input parameter of our ship_order function like this: def ship_order(paid_order: Order) -> Order, and that would give me a hint as to what goes there. But there’s nothing mypy, flake8 or any other linter can do about me writing the code ship_order(my_received_order), because my_received_order is still just an Order.
A better way to solve this is to have more specialized classes. Consider these:
- ReceivedOrder
- AcceptedOrder
- PaidOrder
- ShippedOrder
The only way to construct these should be through smarter constructors than the one our big Order class has. Let’s reconsider our previous ship_order function:
```python
def ship_order(order: Order) -> Order:
    # 0. Validate that the input has the expected combination of fields,
    #    i.e. that both `accepted_at` and `paid_at` are something other than `None`
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the updated version of the order
    pass
```
If we instead used something like this:
```python
# PaidOrder and ShippedOrder are the specialized classes listed above.
def ship_order(order: PaidOrder) -> ShippedOrder:
    # 1. Register some necessary information
    # 2. Update the order in some database
    # 3. Return the ShippedOrder
    pass
```
No validation code is needed here! Since the PaidOrder class is more specific than our old Order class, the input is guaranteed to have a legal combination of fields set. A type checker such as mypy will not let us call this new ship_order function with an order whose accepted_at or paid_at fields are None, because the type of those fields in our PaidOrder class is datetime, not Optional[datetime]. (Python itself doesn’t enforce annotations at runtime; the guarantee comes from the type checker.)
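Here is one way the specialized classes could look. This is a minimal sketch, not the only possible design: `Customer` is a placeholder, the transition functions `accept_order` and `pay_order` are hypothetical, and each transition takes an explicit `at` timestamp (an assumption made to keep the example deterministic). The classes deliberately do not inherit from each other, because if ShippedOrder were a subclass of PaidOrder, the type checker would let you ship an already shipped order.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Customer:
    name: str  # placeholder; the real Customer type isn't shown


@dataclass(frozen=True)
class ReceivedOrder:
    id: UUID
    customer: Customer
    created_at: datetime


@dataclass(frozen=True)
class AcceptedOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime


@dataclass(frozen=True)
class PaidOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime


@dataclass(frozen=True)
class ShippedOrder:
    id: UUID
    customer: Customer
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime
    shipped_at: datetime


# Each transition takes exactly one state and returns the next one.
def accept_order(order: ReceivedOrder, at: datetime) -> AcceptedOrder:
    return AcceptedOrder(order.id, order.customer, order.created_at, accepted_at=at)


def pay_order(order: AcceptedOrder, at: datetime) -> PaidOrder:
    return PaidOrder(order.id, order.customer, order.created_at, order.accepted_at, paid_at=at)


def ship_order(order: PaidOrder, at: datetime) -> ShippedOrder:
    # No None checks: accepted_at and paid_at are plain datetimes on PaidOrder.
    return ShippedOrder(
        order.id, order.customer, order.created_at,
        order.accepted_at, order.paid_at, shipped_at=at,
    )


now = datetime.now(timezone.utc)
received = ReceivedOrder(uuid4(), Customer("alice"), now)
shipped = ship_order(pay_order(accept_order(received, now), now), now)
# ship_order(received, now) would be flagged by mypy: a ReceivedOrder is not a PaidOrder.
```

Python won’t stop anyone from calling PaidOrder(...) directly, so treating the transition functions as the only constructors is a convention in this sketch; what the type checker does enforce is that none of the required timestamps can be None.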
Benefits of this approach:
- It safeguards against accidental errors, such as ship_order(my_received_order).
- New readers of the code will pick up on the rules of the domain quicker than if they had to study a (huge) Order class that could be in various hidden states.
- It saves you from writing tons of validation code. You never have to write a function that validates that an order in the “shipped” state has the accepted_at, paid_at and shipped_at fields set. In the ShippedOrder class, none of those fields would be Optional.
The point of “Parse, don’t validate” is to parse your data into a type whose invariants are guaranteed by the type system, rather than having validation code here and there to make sure everything is in order. Scott Wlaschin drives this point home in Domain modeling made functional by keeping the types small and specialized.
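As an illustration of the parsing step itself, here is a minimal sketch that assumes orders arrive as untyped records, such as decoded JSON with ISO 8601 timestamp strings. `parse_paid_order`, `_required_datetime` and the record layout are hypothetical, and `customer_name` is a simplified stand-in for a real Customer. The checks happen once, at the boundary, and the result is a PaidOrder that downstream code can trust.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping
from uuid import UUID


@dataclass(frozen=True)
class PaidOrder:
    id: UUID
    customer_name: str  # simplified stand-in for a Customer type
    created_at: datetime
    accepted_at: datetime
    paid_at: datetime


def _required_datetime(row: Mapping[str, Any], key: str) -> datetime:
    # Reject a missing timestamp instead of carrying a None forward.
    value = row.get(key)
    if value is None:
        raise ValueError(f"{key} is missing, so this is not a paid order")
    return datetime.fromisoformat(value)


def parse_paid_order(row: Mapping[str, Any]) -> PaidOrder:
    """Parse an untyped record into a PaidOrder, or raise ValueError."""
    return PaidOrder(
        id=UUID(row["id"]),
        customer_name=str(row["customer_name"]),
        created_at=_required_datetime(row, "created_at"),
        accepted_at=_required_datetime(row, "accepted_at"),
        paid_at=_required_datetime(row, "paid_at"),
    )


row = {
    "id": "12345678-1234-5678-1234-567812345678",
    "customer_name": "alice",
    "created_at": "2024-01-01T10:00:00",
    "accepted_at": "2024-01-01T11:00:00",
    "paid_at": "2024-01-02T09:30:00",
}
order = parse_paid_order(row)
print(order.paid_at)  # 2024-01-02 09:30:00
```

A function with the signature ship_order(order: PaidOrder) can then accept the result without re-checking anything.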