Monday, March 8, 2010

Persistence ignorance

This post is very much related to my post about relational databases and OO design, especially the domain model part. A problem with the way data access was presented in the last post was that it was drawn as a layer between the domain model and the database. Of course, when writing applications we need a layer between the database and whatever is using it. The thing is that the domain model needs more than simple CRUD. Multiple CRUD operations have to be batched and handled in transactions. This is where the concept of persistence ignorance makes its grand entrance. We need a way to cleanly handle storing/persisting our entities within transactions so that we can ensure that processes either execute successfully or roll back all changes.

I talk about entities here, and of course you can use DataSets and other types of data carriers. Since I mainly work in C#, which is an object-oriented language, my preference is to work with POCOs.

First off we need a way to persist those entities somewhere. The reason I use the word persist instead of storing to the database is that the location or type of storage is not relevant to the domain model. The only thing the domain model needs to know is that its entities are persisted somewhere so that it can get to them later.

How do we create this magical piece of code that will handle all persistence and transactions for us? Well, we don't! Unless you feel the need to reinvent the wheel. Most ORM frameworks implement some kind of persistence ignorance: some container that can keep track of changes made to your entities and either commit them to storage or roll them back. There are some great frameworks out there that you can use, my personal favorite being NHibernate.

That being said, you can make a mess with ORMs too. Some people talk about creating an application with Entity Framework or NHibernate. This is usually a sign that the source code is full of ORM queries and connection/transaction handling. Again, these are issues the domain model shouldn't have to deal with. It should focus on cleanly implementing its specified functionality, not on these kinds of technical details.

Let's take a minute to look at transactions. Transactions live within what we call a transaction scope. A transaction scope starts when you start the transaction and ends when you commit or roll back. So what should be included in a transaction scope? Let's say we're writing some code that updates some information on a customer and on the customer's address, which is stored in a separate table. Would we want both those updates within a transaction scope? Indeed! Then what about that other function that does various updates and then calls the customer and address update function? Shouldn't we have a transaction scope wrapping all of that too? Well of course, so let's add some transaction handling to that function too and make sure we support nested transactions for the customer and address function. And with that the whole thing started giving off an unpleasant smell. We have just started cluttering our code with transactions left, right and center. Now what?
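
To make the smell concrete, here is a rough illustration of where that road leads, using System.Transactions.TransactionScope; the service class, the method names and the Customer/Address/Order types are made up for the example:

   using System.Transactions;

   // Placeholder entity types for the example.
   public class Customer { }
   public class Address { }
   public class Order { }

   public class CustomerService
   {
      public void UpdateCustomerAndAddress(Customer customer, Address address)
      {
         // Joins the ambient transaction if one exists, otherwise starts a new one.
         using (var scope = new TransactionScope())
         {
            UpdateCustomer(customer);
            UpdateAddress(address);
            scope.Complete();
         }
      }

      public void DoVariousUpdates(Customer customer, Address address, Order order)
      {
         // Outer scope wrapping everything, including the nested scope above.
         using (var scope = new TransactionScope())
         {
            UpdateOrder(order);
            UpdateCustomerAndAddress(customer, address);
            scope.Complete();
         }
      }

      private void UpdateCustomer(Customer customer) { /* ... */ }
      private void UpdateAddress(Address address) { /* ... */ }
      private void UpdateOrder(Order order) { /* ... */ }
   }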

Let's take a look at the model again. We can visualize the domain model as a bounded context. It has its core and outer boundaries. Through its boundaries it talks to other bounded contexts (UI, other services, database...). Take the UI. The UI would call some method on the domain model's facade and set off the domain model to do something clever. My point being that the domain model never goes off doing something all of a sudden; something outside its boundaries always requests or triggers it to do something. These requests and triggers are perfect transaction scopes. They are units of work. These units of work know exactly what needs to exist within the transactional scope.

Unit of Work is an implementation pattern for persistence ignorance. We can use this pattern to handle persistence and transactions. Let's say that every process or event triggered at the domain model's boundary is a unit of work. This unit of work can be represented by an IUnitOfWork interface. To obtain an IUnitOfWork instance we use an IWorkFactory. By doing this we end up with a transaction handler which we have access to from the second our domain code is invoked until the call completes. How would a class like this look? Well, we need some way to notify it about entities we want it to handle. Let's call that method Attach and give it an entity parameter. Now we can pass every entity object we want to persist to the Attach method of the IUnitOfWork. We also need a way to remove entities from storage; we'll create a Delete method for that. If the current unit of work succeeds we need a way to let it know that all is good and go ahead and complete the transaction. Let's call this method Commit. This gives us a simple interface for handling persistence.

   public interface IUnitOfWork : IDisposable  // IDisposable so it can be used in a using block
   {
      T Attach<T>(T entity);
      void Delete<T>(T entity);
      void Commit();
   }

The code using it would look something like this.

   using (IUnitOfWork work = _workFactory.Start())
   {
      MyEntity entity = new MyEntity();
      work.Attach<MyEntity>(entity);  // hand the entity over to the unit of work
      work.Commit();                  // complete the transaction; disposing without Commit rolls it back
   }
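
For completeness, the IWorkFactory behind the _workFactory field only needs to hand out units of work. A minimal sketch, implied by the usage above rather than spelled out anywhere, could look like this:

   public interface IWorkFactory
   {
      // Starts a new unit of work, including whatever transaction it needs.
      IUnitOfWork Start();
   }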

Since we are using something like NHibernate in the background, we would have to retrieve entities from storage through NHibernate and then attach them to the IUnitOfWork, which of course uses NHibernate in the background for all persistence. Because of the nature of ORMs like NHibernate it would make more sense to include entity retrieval through the IUnitOfWork too, since every entity retrieved is automatically change tracked by the NHibernate session. That would also let us abstract NHibernate better from our domain model. Let's add a few functions to the IUnitOfWork interface to accomplish this. We would need a GetList function to return a list of entities and a GetSingle function to return a single entity. GetSingle would have to be able to retrieve by identity to take advantage of caching within the ORM framework, and also be able to accept queries, where if using NHibernate we could pass a DetachedCriteria. If you want complete abstraction you can make your own query builder which converts to NHibernate queries internally. Now the IUnitOfWork interface would look something like this:

   public interface IUnitOfWork : IDisposable
   {
      T GetSingle<T>(object id);
      T GetSingle<T>(DetachedCriteria criteria);
      IList<T> GetList<T>(DetachedCriteria criteria);
      T Attach<T>(T entity);
      void Delete(object entity);
      void Commit();
   }
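
To show that this is not magic, here is a rough sketch of what an NHibernate-backed implementation of that interface might look like; the class name, constructor and the rollback-on-dispose behavior are my assumptions for the example, not code from this post:

   using System;
   using System.Collections.Generic;
   using NHibernate;
   using NHibernate.Criterion;

   public class NHibernateUnitOfWork : IUnitOfWork
   {
      private readonly ISession _session;
      private readonly ITransaction _transaction;
      private bool _committed;

      public NHibernateUnitOfWork(ISessionFactory sessionFactory)
      {
         _session = sessionFactory.OpenSession();
         _transaction = _session.BeginTransaction();
      }

      // Retrieve by identity so the session cache can be used.
      public T GetSingle<T>(object id)
      {
         return _session.Get<T>(id);
      }

      // Retrieve a single entity matching a detached query.
      public T GetSingle<T>(DetachedCriteria criteria)
      {
         return criteria.GetExecutableCriteria(_session).UniqueResult<T>();
      }

      public IList<T> GetList<T>(DetachedCriteria criteria)
      {
         return criteria.GetExecutableCriteria(_session).List<T>();
      }

      // Hand an entity over to the session so changes are tracked and persisted.
      public T Attach<T>(T entity)
      {
         _session.SaveOrUpdate(entity);
         return entity;
      }

      public void Delete(object entity)
      {
         _session.Delete(entity);
      }

      public void Commit()
      {
         _transaction.Commit();
         _committed = true;
      }

      // Disposing without committing rolls everything back.
      public void Dispose()
      {
         if (!_committed)
         {
            _transaction.Rollback();
         }
         _transaction.Dispose();
         _session.Dispose();
      }
   }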

To obtain the active instance of IUnitOfWork from anywhere in the code we can create a WorkRepository class. We'll just have the IWorkFactory register the unit of work with the WorkRepository using the thread id as key. Doing that enables us to issue the following call in whatever class wants to use the unit of work:

  public void SomeFunction()
  {
    ...
    var unitOfWork = WorkRepository.GetCurrent();
    var customer = unitOfWork.GetSingle<Customer>(customerID);
    ...
  }
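
The post doesn't spell the WorkRepository out, but a minimal sketch keyed on the current thread id could look something like this (the member names are my own):

   using System.Collections.Generic;
   using System.Threading;

   public static class WorkRepository
   {
      private static readonly Dictionary<int, IUnitOfWork> _active =
         new Dictionary<int, IUnitOfWork>();
      private static readonly object _lock = new object();

      // Called by the IWorkFactory when a unit of work is started.
      public static void Register(IUnitOfWork unitOfWork)
      {
         lock (_lock)
         {
            _active[Thread.CurrentThread.ManagedThreadId] = unitOfWork;
         }
      }

      // Returns the unit of work started on the current thread.
      public static IUnitOfWork GetCurrent()
      {
         lock (_lock)
         {
            return _active[Thread.CurrentThread.ManagedThreadId];
         }
      }

      // Called when the unit of work is disposed.
      public static void Unregister()
      {
         lock (_lock)
         {
            _active.Remove(Thread.CurrentThread.ManagedThreadId);
         }
      }
   }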

How is that for a database layer? This is most likely all you need. Smack the repository pattern on top of that and your code will be pure, cleanly written domain functionality. Now go ahead and solve real world problems, ignoring all the complexity that comes with persistence, transactions and retrieving entities.
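
As a closing illustration, a repository sitting on top of the unit of work can be as thin as this; the class and its members are just one way to do it, reusing the Customer entity from the example above:

   public class CustomerRepository
   {
      public Customer GetById(object customerId)
      {
         return WorkRepository.GetCurrent().GetSingle<Customer>(customerId);
      }

      public Customer Add(Customer customer)
      {
         return WorkRepository.GetCurrent().Attach(customer);
      }

      public void Remove(Customer customer)
      {
         WorkRepository.GetCurrent().Delete(customer);
      }
   }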