Event Sourcing vs CRUD

Event sourcing is a storage pattern where data is stored as an append-only log of events. An event is a fact: something that happened. Events are a rich source of data because they form a list of timestamped actions, and they can be replayed later to unlock additional views on the data.

We always name events in the past tense, for example:

  • team_created_v1
  • basket_product_added_v1

Events are snapshots of things that happened. Here are example events for a team: the first creates the team, the second updates the team ownership after a management change, and the third updates the team name:

{
  "type": "team_created_v1",
  "data": {
    "name": "Team One",
    "owner": "Tim",
    "created_by": "Brian"
  }
}
{
  "type": "team_owner_updated_v1",
  "data": {
    "new_owner": "Emma",
    "custom_reason": "Alex left company",
    "updated_by": "Fred"
  }
}
{
  "type": "team_name_changed_v1",
  "data": {
    "new_name": "Super Team One",
    "updated_by": "Emma"
  }
}
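
The three events above could be modelled in TypeScript (the language of the Node examples later on) as a discriminated union - a sketch, with the type and field names taken from the JSON above:

```typescript
// Sketch: the three team events as a TypeScript discriminated union.
type TeamEvent =
  | { type: "team_created_v1"; data: { name: string; owner: string; created_by: string } }
  | { type: "team_owner_updated_v1"; data: { new_owner: string; custom_reason: string; updated_by: string } }
  | { type: "team_name_changed_v1"; data: { new_name: string; updated_by: string } };

// The ordered stream for the team, matching the JSON above.
const events: TeamEvent[] = [
  { type: "team_created_v1", data: { name: "Team One", owner: "Tim", created_by: "Brian" } },
  { type: "team_owner_updated_v1", data: { new_owner: "Emma", custom_reason: "Alex left company", updated_by: "Fred" } },
  { type: "team_name_changed_v1", data: { new_name: "Super Team One", updated_by: "Emma" } },
];
```

The _v1 suffix in the type name leaves room for later versions of the same event to coexist in the log.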

Rather than capturing this in a CRUD database with a table of teams, we now have an ordered collection of events. These events can be fetched and replayed on top of each other to give the final state of the Team.

The first event team_created_v1 creates the Team:

{
  "name": "Team One",
  "owner": "Tim",
  "created_by": "Brian",
  "last_updated_by": "Brian"
}

Then the next event team_owner_updated_v1 is applied and the Team is updated:

{
  "name": "Team One",
  "owner": "Emma",
  "created_by": "Brian",
  "last_updated_by": "Fred"
}

Then the final event team_name_changed_v1 is applied and the name of the team is updated:

{
  "name": "Super Team One",
  "owner": "Emma",
  "created_by": "Brian",
  "last_updated_by": "Fred"
}
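
The walkthrough above is just a fold over the stream: start from nothing and apply each event in order. A minimal sketch in TypeScript - the event and Team shapes follow the JSON examples, while the apply() name is an assumption:

```typescript
// The event shapes from the examples above.
type TeamEvent =
  | { type: "team_created_v1"; data: { name: string; owner: string; created_by: string } }
  | { type: "team_owner_updated_v1"; data: { new_owner: string; custom_reason: string; updated_by: string } }
  | { type: "team_name_changed_v1"; data: { new_name: string; updated_by: string } };

// The projection we replay the events into.
interface Team {
  name: string;
  owner: string;
  created_by: string;
  last_updated_by: string;
}

// Apply a single event on top of the current state.
function apply(state: Team | null, event: TeamEvent): Team {
  switch (event.type) {
    case "team_created_v1":
      return {
        name: event.data.name,
        owner: event.data.owner,
        created_by: event.data.created_by,
        last_updated_by: event.data.created_by,
      };
    case "team_owner_updated_v1":
      return { ...(state as Team), owner: event.data.new_owner, last_updated_by: event.data.updated_by };
    case "team_name_changed_v1":
      return { ...(state as Team), name: event.data.new_name, last_updated_by: event.data.updated_by };
  }
}

const stream: TeamEvent[] = [
  { type: "team_created_v1", data: { name: "Team One", owner: "Tim", created_by: "Brian" } },
  { type: "team_owner_updated_v1", data: { new_owner: "Emma", custom_reason: "Alex left company", updated_by: "Fred" } },
  { type: "team_name_changed_v1", data: { new_name: "Super Team One", updated_by: "Emma" } },
];

// Replaying the whole stream yields the final state of the Team.
const team = stream.reduce<Team | null>(apply, null) as Team;
```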

With this model we still end up with the final version of the state (Super Team One).

We also get the rich history of the events that happened, in order, to put us into this state. We wouldn't be able to see those changes with a traditional CRUD database - as that stores only a point-in-time snapshot.

We have the advantage of an audit trail of what happened, as well as being able to take the events and replay them into a test system to see what results they produce - we cannot do that with a CRUD snapshot.

CQRS

Event Sourcing plays nicely with CQRS (Command Query Responsibility Segregation), where the Command to save data operates separately from the Query to read data. Often writes and reads scale differently - and in CQRS there is a read-only structure ready to provide the data, allowing greater horizontal scaling.

When you append to your event stream, a process should update the read model. This can be done immediately in a transaction, or via a stream and a processor such as SQS and Lambda - but this is where things can get overly complicated.

As a rule of thumb, only split the update out into a stream if your database is under stress and cannot cope with updating a small snapshot database. I have only seen this once, where the read model lived outside the database in DynamoDB, and it created an incredible headache (but it was at JustEat with millions of orders).

In the systems I have worked on, where we did not have internal streams, we updated the projections at the same time as inserting the event, so our projections were immediately available.
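
That approach can be sketched with an in-memory store - a plain class stands in for the real database, and all names here are assumptions; in production the two writes below would share a database transaction:

```typescript
// A generic event and projection shape for the sketch.
type Event = { type: string; data: Record<string, unknown> };
type Projection = Record<string, unknown>;

class InMemoryEventStore {
  private log: Event[] = [];
  private snapshot: Projection = {};

  // The projection function decides how each event changes the read model.
  constructor(private project: (state: Projection, event: Event) => Projection) {}

  // Append the event and update the read model in the same step,
  // so queries see the new state immediately.
  append(event: Event): void {
    this.log.push(event);
    this.snapshot = this.project(this.snapshot, event);
  }

  query(): Projection {
    return this.snapshot;
  }
}

// Example projection: merge each event's data into the snapshot.
const store = new InMemoryEventStore((state, event) => ({ ...state, ...event.data }));
store.append({ type: "team_created_v1", data: { name: "Team One", owner: "Tim" } });
store.append({ type: "team_name_changed_v1", data: { name: "Super Team One" } });
```

Because append() updates the snapshot synchronously, a query() straight after the second append already returns the renamed team - there is no window where the read model lags the log.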

You can also enforce referential integrity at the end of the projection update, but the better approach is to be eventually consistent so you don't need a transaction - the projections will catch up eventually.

GitHub Example Repository

I've put together some examples of Event Sourcing in action, including an EventSourcingBase class which wraps up a Node example, here:

Useful links