Friday, September 17, 2010

Parallel takes you half the way

Parallel is the future! You can't go anywhere without hearing those words these days. And it's the truth. We all have multi-core machines now. Languages prepare for this by implementing features that make it easier to write code that runs in parallel. Microsoft, for instance, is pushing its Task Parallel Library with .NET Framework 4, including parallel for, parallel LINQ, Parallel.Invoke and more. We even have debugging tools in Visual Studio helping out when debugging our parallel code. We have it all at our fingertips. Writing parallel code has never been easier! So... why does it feel so complex and awkward? It's all there, the tools, the language features. We have all that we could ask for, and still there is one crucial part that has been left out. It cannot be fixed by tools or frameworks because it is: our minds!

Bear with me for a second. We create software to solve problems. A problem is often described by something like "Given A, when something about A equals x, then our application should produce Z". We have a starting point A that, when satisfying some condition, should produce Z. We produce executable code that does what is necessary to get from A to Z.

Running the code, the execution path would look something like this: from a starting point it will execute the code, step by step, function by function, going deeper into the call stack until the result Z is produced, and from there back to where it started.
To speed things up we can decide to make it parallel. What we would do is fork the line going deeper into the call stack into several parallel sequences. Before returning back to where it started out, those parallel sequences are joined together into one sequence. We are following the exact same concept as before; we have just implemented a different concept within it: the concept of executing multiple sequences of steps in parallel. For some scenarios this is a perfectly viable solution. It's viable for the times we want synchronous behavior.
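To make the fork/join shape concrete, here is a minimal sketch using the Task Parallel Library mentioned above (the work itself is just placeholder writes):

using System;
using System.Threading.Tasks;

class ForkJoin
{
    static void Main()
    {
        Console.WriteLine("Single sequence, going deeper...");

        // Fork: the three pieces of work run in parallel.
        Parallel.Invoke(
            () => Console.WriteLine("Working on part 1"),
            () => Console.WriteLine("Working on part 2"),
            () => Console.WriteLine("Working on part 3"));

        // Join: Parallel.Invoke blocks until all three are done,
        // so to the caller the whole thing is still synchronous.
        Console.WriteLine("...back to the single sequence.");
    }
}

The parts run in parallel, but the caller still waits at the join, which is exactly the synchronous shape described above.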

Most of the time we will end up writing code like this. The problem, though, is that parallel or not, this implementation is synchronous. Parallel execution is NOT synchronous. We make it synchronous by making sure we join the parallel sequences back into a single sequence as they finish.

If the future is parallel, the future is also asynchronous. When entering a parallel future we simply cannot keep writing our software in a synchronous manner. We need to reeducate our minds so that asynchronous is the norm and synchronous is the exception. Frankly, it shouldn't be all that difficult. Our whole life is based around asynchronous behavior. After sliding a pizza into the oven we don't sit and stare at it until it's done. We set a timer, or make sure we check the time now and then, and go do other things. We do it all the time. It's natural. Forcing parallel behavior into a synchronous setting, on the other hand, is not, and it is going to be painful.
Some people have gone down this route already. Greg Young is one of them with his architectural pattern CQRS. If you have not seen his talk on the subject I strongly recommend you look into it. He recorded one of his online sessions for everyone to enjoy. You can download it here. It is a full day session but it's worth every minute.
Like the title states, parallel takes us only half the way. The rest is up to us. We need to change the way we think so that we design our systems and write our code in a way that suits this parallel future. When solving a problem through software we automatically give the problem a synchronous setting. That is what we have been taught. This is the way it's been for a long time. We have become technically challenged. We have become so good at solving asynchronous problems in a synchronous fashion that it even feels natural. If given the same problem in a real life scenario we would probably have solved it asynchronously.
Let's imagine a CEO wanting to keep track of her contacts. How would we solve this problem? Oh, wait. You just had the following thought, didn't you: "A screen where you can type contact information, then press save, and it'll save the contact in a database, and if something fails throw an error back to the user". Our technically challenged minds just made the scenario synchronous. It became type, save, confirm. If we ask the CEO what she usually does she'd say: "Well, I usually tell people to leave their contact details with my secretary". And that, my friends, is a concept called forking! She just passed the task off to someone else so that she could go do other things. She made the task completely asynchronous. How can this be? What if the secretary throws an exception (forgets), what if there's already a contact with the same name, what if, what if..
So why does this work in the real world? Because of trust, and the fact that context is taken into consideration. The CEO trusts the secretary to store the contact details. She also knows that if something goes wrong she'll just tell the secretary to look up the information a second time. Even though it fails now and then, the overall goal is met: the CEO spends less time keeping track of contact information. Through trust and context the matter is handled efficiently, with sufficient error handling.
Most scenarios aren't type, save, confirm. We just make them type, save, confirm because that is how we usually solve everything.

When writing software in a "wait for confirmation" manner we effectively state that our software will fail to do what it is supposed to do more than 50% of the time. For waiting on the response to be viable, the response would have to be something unexpected most of the time. Either that, or every other task we could possibly perform would have to depend on this task's response. When we accept that the system failing is the exception, not the norm, we can start thinking about error handling in a different way. If the failure rate is very low but a single failure is extremely costly in terms of business value, we could even set up manual handling of failures and use fire and forget. If that is the right solution for the business then that is the right solution technically.
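To sketch what that could look like in code (my illustration, not from the original post; the names and the manual-handling helper are made up), a fire-and-forget dispatch with the Task Parallel Library might be as simple as this:

using System;
using System.Threading.Tasks;

class ContactDispatcher
{
    // Fire and forget: hand the work off and return to the caller immediately.
    public void StoreContactAsync(string name, string phone)
    {
        Task.Factory.StartNew(() => StoreContact(name, phone))
            .ContinueWith(
                t => QueueForManualHandling(t.Exception),
                TaskContinuationOptions.OnlyOnFaulted);
    }

    private void StoreContact(string name, string phone)
    {
        // Persist the contact details (details omitted).
    }

    private void QueueForManualHandling(AggregateException error)
    {
        // Hypothetical: the rare failure ends up somewhere a human will look at it.
        Console.WriteLine("Storing contact failed: {0}", error.InnerException.Message);
    }
}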

When working on AutoTest.NET I discovered that even immediate feedback isn't really immediate. AutoTest.NET is a continuous testing tool for .NET. The workflow when using it would be:

  1. Write code
  2. Save all changes
  3. AutoTest.NET builds and runs tests
  4. AutoTest.NET outputs red or green

Its purpose is to provide immediate feedback on the state of the code. My initial workflow included waiting from the moment I had pressed "save all" until I could see the output I expected. But as stated earlier, for waiting to be viable I would have to produce flawed code on more than 50% of my saves. And I don't. The response is usually what I expect, so it's more efficient for me to keep on working while AutoTest.NET is doing its job. As I save approximately every 30 seconds, a 20 second delay can be considered immediate feedback. It's immediate because of context. I only need to know by my next save point, which is 30 seconds later. When it yields unexpected results the solution is only a couple of Ctrl+Z's away.
Given the context even immediate feedback can be dealt with asynchronously.

To go all the way we need to embrace asynchronous behavior along with parallel programming. Make asynchronous the default behavior. That's what we do every day in real life. Terms like eventual consistency and asynchronous behavior aren't even something we consider when we go about our business. Because in real life, that is the way everything works.

Sunday, August 22, 2010

Testing through interfaces

At Vagif Abilov's BBQ, while we were talking about testing, Greg Young said something like "I wonder what tests would look like if we wrote tests against interfaces instead of their implementation". And really, what would they look like?

The scenario was that you have an interface that has multiple implementations and you would want to write tests against the interface testing all implementations. This would seriously reduce the number of tests you would have to write. So let's give it a try.

The first thing we need to keep in mind is that we're writing the tests against something abstract. This means that we don't really know what to expect. When passing x into the various implementations it's not certain that all of them will answer y. Hopefully only one of them would answer y, where x is the act and y is the assert. If not, there would be multiple implementations doing the exact same thing, and that would kind of defeat the purpose. Off the top of my head that leaves us with the following scenarios:

X stays the same while Y varies
This would be something like an ICalculator. The ICalculator would have implementations like DecimalCalculator and OctalCalculator. When running tests here we would end up with results like this:
  • DecimalCalculator: 7*7 = 49
  • OctalCalculator: 7*7 = 61
Which means that when writing these types of tests we need to be able to assert on specific values per implementation.

X varies while Y stays the same
Let's imagine that we have some type of parser taking XML in and returning a list of objects of a certain type. This would typically mean one implementation per XML schema while the output could be the same. So writing these types of tests we'll have varying code for passing parameters while the assert stays the same.

Ok, that wasn't too bad. We could probably make this look clean. Now over to some other aspects that we'll have to deal with.

Dependencies
With the right (wrong) implementation, faking dependencies could turn into a hellish thing with this solution. I guess that's a good thing, as it forces us not to make a mess of it. But we still need some way of setting up dependencies for the implementations.

Resolving implementations
We need a way to retrieve all implementations for an interface. Of course this is something we do all the time with DI containers so any DI container would provide us with what we need here. We could probably do something smart here to inject the faked dependencies we'll need for each implementation.

With this in mind, let's set up a test for the calculator scenario. The first thing I did was create a class for handling the plumbing. Right now this class takes care of resolving all implementations of the chosen interface, running the test on each implementation and performing the specified assertions. My test ended up looking like this:

        [Test]
        public void Should_multiply()
        {
            var tester = new InterfaceTester<ICalculator>();
            tester.Test(c => c.Multiply(7, 7))
                .AssertThat<DecimalCalculator>().Returned(49)
                .AssertThat<OctalCalculator>().Returned(61);
        } 

I'm quite happy with that. This test is both extendable and readable. Now let's do the same for the scenario with the string parser. I'll just extend the plumbing class used in the previous example to handle varying input parameters. The implementation ended up looking like this:


        [Test]
        public void Should_parse_number()
        {
            var tester = new InterfaceTester<INumberParser>();
            tester
                .Test<XmlParser>(x => x.Parse("<number>14</number>"))
                .Test<StringParser>(x => x.Parse("14"))
                .Returned(14);
        }


I can't say I'm as happy with this one, as the complete delegate is copied for both implementations and not just the part that differs. But it's still a huge simplification compared to writing a full test suite per implementation.

I guess I'll leave it at that for now. What this does not cover is setting up dependencies, which will likely complicate the implementation a bit. After doing this implementation I can really see the value of writing my tests like this. It would save me time and energy and would leave me with a cleaner, simpler test suite. The implementation ended up being fairly simple. Initial conclusion: writing tests against interfaces is a good idea!
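For those curious about the plumbing, below is a minimal sketch of how such an InterfaceTester could be put together. This is my reconstruction, not the actual source: it assumes the implementations live in the test assembly and have parameterless constructors, and it leaves out dependency setup and the varying-input Test<TImplementation> overload from the second example.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using NUnit.Framework;

public class InterfaceTester<TInterface>
{
    private readonly Dictionary<Type, object> _results = new Dictionary<Type, object>();

    // Runs the test delegate against every implementation found in the test
    // assembly and remembers what each of them returned.
    public InterfaceTester<TInterface> Test(Func<TInterface, object> act)
    {
        var implementations = Assembly.GetExecutingAssembly().GetTypes()
            .Where(t => typeof(TInterface).IsAssignableFrom(t) && !t.IsAbstract && !t.IsInterface);
        foreach (var type in implementations)
        {
            var instance = (TInterface)Activator.CreateInstance(type);
            _results[type] = act(instance);
        }
        return this;
    }

    // Picks out the result produced by one specific implementation.
    public ImplementationAssertion AssertThat<TImplementation>() where TImplementation : TInterface
    {
        return new ImplementationAssertion(this, _results[typeof(TImplementation)]);
    }

    public class ImplementationAssertion
    {
        private readonly InterfaceTester<TInterface> _parent;
        private readonly object _actual;

        public ImplementationAssertion(InterfaceTester<TInterface> parent, object actual)
        {
            _parent = parent;
            _actual = actual;
        }

        // Asserts on the value returned by that implementation and allows chaining.
        public InterfaceTester<TInterface> Returned(object expected)
        {
            Assert.AreEqual(expected, _actual);
            return _parent;
        }
    }
}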

I'd love to hear your thoughts on this! And if you're interested in the full source code let me know and I'll upload it to github or something.


Saturday, August 21, 2010

New challenges, starting my own business

Times are changing and in a bit more than a week I'll be starting up a company with four others. I was asked to join them and after some thinking I said yes. Now I'm throwing myself off the cliff having a firm belief that wings will grow before I hit the ground :)
So what is the company about? Our goal is providing skilled, experienced people specialized in their field. All five of us have our own specialties that go well together, from developer to analyst. From a business point of view we have deep knowledge of enterprise software, especially oil/energy trading and business applications. The company is named Contango Consulting AS.

I am very excited about realizing a dream of being responsible for my own future. Trying to do my own thing. It's going to be tough and I'll probably learn more in a year than I have done up to now. I'm also certain that this blog will be affected by this. Hopefully it can result in my learnings ending up here for others to enjoy. And of course if any of you are in need of a skilled .NET developer / architect / trainer don't hesitate to let me know :) You can view more detailed information about me here.

-Svein Arne

Tuesday, August 10, 2010

Nu on Linux (Debian-based systems)

There's a lot of activity these days on the Nu project. In short, the Nu project is to .NET what gems are to Ruby. In fact, it uses gems. In my last post I talked about tooling and where I wish tooling would go in the future. Package management is definitely one of the tools that will help us in the future. If you want to read up on Nu, Rob Reynolds has some good posts explaining what it is and how to use it. What I'm going to go through here is just what is needed to get it working on Linux. There are just a few small tweaks that need to be done for it to work.

If you don't have Ruby on your system already you'll need it.
sudo apt-get install ruby-full build-essential
Next you'll need to get gems.
sudo apt-get install rubygems
Ok, then we have all the dependencies required to get going. Now let's get Nu. Nu is installed through gems like this (NB! make sure you install nu with root privileges or it won't work).
sudo gem install nu
Good, now we have all we need to start using Nu. Just one thing: when running Nu I got an error saying "no such file to load -- FileUtils". The reason for this is that Linux has a case-sensitive file system and the file is called fileutils.rb, not FileUtils.rb. If you also end up having this issue, go to the folder containing fileutils.rb (something like /usr/lib/ruby/1.8) and create a symlink by running the following command.
sudo ln fileutils.rb FileUtils.rb
Now to some real Nu action. Let's say we have a project and we want to start using NHibernate. What you have to do is to go to the root folder of your project and type this command.
nu install nhibernate
You'll get a couple of warnings but it's going to do what it's supposed to. When it's done you can go into the newly created lib directory and see NHibernate and its dependencies in there. Neat, huh!?

Monday, August 9, 2010

Tooling visions

Lately I have felt more and more uncomfortable about the tooling I'm currently working with. I feel a lot of the tools are not helping me reach my goal. Frankly, they're in my way. The whole thing started when I started using ReSharper and saw what an IDE is about. ReSharper truly helps you accomplish things. It keeps you focused on your real goal: producing quality code. The only bad thing about ReSharper is that it's tied to Visual Studio. Visual Studio has become a horrid beast of an application. It's packed with features that, to me, have nothing to do with the application I use to write code. It doesn't help me so much as get in my way.
My second wakeup call was when I started using Git. Especially after watching Linus Torvalds talk about his ideas behind it and why he made it the way he did. One of the reasons, he says, was to create a source control system that does its job well and doesn't get in your way. And he succeeded! When working with Git you have to deal with it when you pull, commit or push. And those are the times you're supposed to deal with source control. You shouldn't have to deal with source control because you want to edit a file, or have the source control system insert padlocks left, right and center. You shouldn't have to think about whether your computer is online or offline when you work with your code. Going back to working with TFS made me realize how much time I waste using a tool. Time that I could have spent solving real problems.

Ok, that was the venting part :) Now to something a little more constructive. I have been thinking about how my ideal set of tools would look and what would be important. There are some points I really want to focus on. The first point applies to code as well: SEPARATION OF CONCERNS! Each tool should help you solve a single problem and it should do it well, extremely well. Second, as mentioned, it should stay out of your way. It should know what it's trying to help you solve and act like that twin that completes your sentences. Third, it should not compromise. When you work with a mess of a project where everything depends on everything, it should be painful. The tool should keep its focus on helping you produce quality code and not make compromises to help you deal with a festering pile, as Uncle Bob puts it.

First off, let's deal with what we call the IDE. To me an IDE is notepad+ReSharper+navigation, and that's what I think it should be. It should be there to help us produce quality code as efficiently as possible, providing IntelliSense, auto-completion, refactoring and everything that has to do with writing code. And that, to me, has nothing to do with building binaries, running tests, debugging and deploying. Though I understand why IDEs have ended up where they are, it's time to move on. We're no longer hacking and hoping. We don't set breakpoints and step through half the application as part of our work pattern. We write code and watch tests fail and pass. To me the IDE is about efficiently writing code.

Of course we need to compile and run tests, and that should be its own tool. We already have continuous testing tools like JUnitMax, Ruby's autotest and AutoTest.NET, which I'm currently working on (add cheesy commercial part here). This tool should basically stay out of your way. The only time we would want to interact with it is when we have broken something. It should build and run only what it needs to and only grab our attention when something has gone wrong. This is the tool that would bind the editor and the debugger together. When something has gone wrong we should get the right information, and enough of it. When builds or tests fail we should be able to easily move to the right file and position in the editor to fix whatever is wrong.

Now to the debugger. The way I see it, debuggers as they work today are optimized for full system debugging, not the simple "now what the heck did I just do to fail this test". And that's what I'm looking for 95% of the time. For these types of tasks I don't think that debugging through the IDE is helping. I don't think displaying the code file by file, class by class, function by function, like it's written, is the best way. And certainly not stepping through it. Something I do think would be more efficient is analyzing a series of snapshots showing where the execution had its turning point, what threw an exception and things like that. I have tons of ideas that I'm hoping to realize through the ClrSequencer project. I think I'm going to dive a little bit deeper into this in another blog post.

I guess that's enough rambling for tonight. It's probably not the last thing you'll hear from me on the subject. Tooling is very important; it should help you, not fight you, and I have been feeling a lot of the latter lately.

Friday, August 6, 2010

Continuous testing with AutoTest.Net

It's about time for me to do some writing about AutoTest.Net. This is a project I have been working on for the last 2-3 months. It's a continuous testing tool for .Net originally based on Ruby's autotest. After playing with Ruby for a couple of evenings I really enjoyed the way you could work with it. My usual work cycle of "write code, build, wait, run tests, wait" was replaced with "write code, save". Luckily the code I write tends to work more often than it breaks, so waiting for builds and tests to run is a waste of time. Especially when I build and run tests about every 30 seconds or so.
So after the joy of working like that in Ruby I was determined to find a tool like that for .Net. After a bit of searching I found AutoTest.Net. The project was hosted at code.google.com and was initiated by James Avery, but because he didn't have enough time on his hands the project was put on hold. I really wanted a tool like that, so I went and got his permission to continue the project. It's now hosted on github.com/acken/AutoTest.Net. Today it supports both .NET and Mono and it's cross-platform. NUnit, MSTest and XUnit are the testing frameworks supported today, and MbUnit will be added soon. It supports running tests from multiple testing frameworks in the same assembly.

Now, how does it all work, you say? The whole thing consists of a console application and a WinForms application. By now I use the WinForms app about 98% of the time, so that's what I'm going to show here.
The first thing you do, of course, is to go to this link, download the latest binaries and unzip them to the folder of your choice. Locate the file named AutoTest.config and open it in your favorite xml editor. Now let's edit a few settings:

  1. DirectoryToWatch: Set this property to the folder containing the source code you want to work with. AutoTest.Net will detect changes to this folder and its subfolders.
  2. BuildExecutable: This is the path to msbuild or, in Mono's case, xbuild. You have the possibility to specify a version of msbuild per framework or Visual Studio version. For now let's just specify the default <BuildExecutable> property. Something like C:\Windows\Microsoft.NET\Framework\v3.5\MSBuild.exe.
  3. Now let's specify a testing framework. You can pick anywhere from zero to all of them, though zero wouldn't be any fun. I'll go with NUnit in this example.
  4. The last thing we want to do is to specify a code editor (<CodeEditor>). Let's pick Visual Studio. We can pass Visual Studio the file to open and the line to go to. Sadly there's a bug in Visual Studio preventing it from going to the right line :( So for now we'll rely on Ctrl+G. Anyway, the config has Visual Studio set up correctly by default. Just make sure the path to devenv.exe is the same as on your machine.
Now we're ready to start the AutoTest.WinForms.exe application and do some real work. The first thing you'll see is a form looking like this.


The only interesting thing right after startup is the button in the top right corner. As you can see (gonna do something about the colors) the button is yellow. Behind this button you'll find the status of the AutoTest.Net application. It's yellow now because the configuration has generated a warning. If the button is red, an error has occurred within AutoTest.Net. Right now the window will look like this.


So let's go ahead and write some gibberish in one of the files inside the folder we're watching and save the file. AT.Net should start working right after you save the file.


And of course gibberish means errors, which will result in this.


When selecting one of the lines in the list you'll get the build error/test details underneath, and you can click on the links to open the file in Visual Studio. Now let's fix the error we just made, save the file and see what happens.


And as expected it goes green with 5 succeeded builds and 221 passed tests. That's basically it. From here on it's lather, rinse, repeat.

Right now it's in alpha and it will of course have some bugs here and there. I hope this post will tempt you to try it out. Even at an early stage like this it's a really effective way of working!

Saturday, May 29, 2010

(Red-Green)N-Refactor

Red/Green/Refactor is the TDD mantra. And fair enough, it's a good description of the concept of TDD, which I guess is what it's meant to describe. Heck, it even describes the order you should perform these steps in when writing a test. Make sure the test is able to fail, make sure it passes for the right reasons, and when done, clean up. This is all good, but I see people being misled by this mantra. To explain this a bit better, let's have a look at Uncle Bob's three rules of TDD:
1. You are not allowed to write any production code unless it is to make a failing unit test pass.
2. You are not allowed to write any more of a unit test than is sufficient to fail; and compilation failures are failures.
3. You are not allowed to write any more production code than is sufficient to pass the one failing unit test.
Following these rules (and you should!) you'll see that when writing code your mantra should in fact be red/green/red/green/red/green/............./refactor. Frankly, though, that wouldn't make a very good mantra. At the same time red/green/refactor doesn't work as an implementation pattern. The same way agile methods will have you delivering working software in short iterations, these three rules will deliver a suite of passing tests in roughly 30-second iterations. And more importantly, every red/green cycle will have you focusing on a single small task, which is an efficient way of working. Summing up the mantra through Uncle Bob's three rules we get red/green repeated until the test is completed. This gives us (Red-Green)N.
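To make those micro-iterations concrete, here is a small made-up example (the ShoppingCart is mine, not from the original post); the comments mark where each red/green cycle happened:

using NUnit.Framework;

// Production code: each member below was added only when a failing
// test (or a compile error, which counts as red) demanded it.
public class ShoppingCart
{
    private int _itemCount;

    public int ItemCount
    {
        get { return _itemCount; }
    }

    public void Add(string item)
    {
        _itemCount++;
    }
}

[TestFixture]
public class ShoppingCartTests
{
    [Test]
    public void Should_count_added_items()
    {
        // Cycle 1 (red): ShoppingCart didn't exist yet, so this didn't even compile.
        // Cycle 1 (green): add an empty ShoppingCart with an ItemCount of 0.
        var cart = new ShoppingCart();
        Assert.AreEqual(0, cart.ItemCount);

        // Cycle 2 (red): Add didn't exist and the assert below failed.
        // Cycle 2 (green): implement just enough of Add to make it pass.
        cart.Add("coffee");
        Assert.AreEqual(1, cart.ItemCount);

        // ...and only when the test is complete: refactor.
    }
}
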
So, why isn't refactoring part of the three rules of TDD? TDD or not, you should ALWAYS refactor. Make sure your code is readable and maintainable. TDD just happens to make your code a lot easier to refactor safely. Given that the concept of TDD tells you to refactor for every completed test (and you should!), the mantra you should follow when implementing is (Red-Green)N-Refactor.

Wednesday, May 19, 2010

Focus on functionality, use technology - Revisited

Lars-Erik Kindblad posted some great questions about the implementation in my previous post. I thought I'd post the reply here, as the questions pin down a lot of the thinking behind the previous post. The code up for discussion is:

public void BlockCustomer(int customerID)
{
    var customer = getCustomer(customerID);
    customer.Block();
}

His questions were:
1) How would you go about unit testing the second example? Since the customer is not injected into the constructor, the unit test would have a dependency on the customer.Block() method?
2) Would customer.Block() (and customer.Save(), Remove(), Notify() etc.) break the Single Responsibility Principle?
3) I agree customer.Block() creates more readable code, but it can create large and messy entity classes when the code base is growing. I've seen such classes with 1000+ lines of code. In that case I rather prefer using manager/service classes in order to partition the functionality across multiple classes.
So, first things first. In my code I use repositories extensively. Most likely the getCustomer method would use some repository to fetch the customer. The class holding the BlockCustomer method would therefore have the ICustomerRepository injected. When dealing with dependency injection I distinguish between stateless classes and stateful classes. Usually I automatically inject stateless classes through a dependency injection container. The repository class is a stateless class and would be injected into the class holding the BlockCustomer method automatically. The repository would be responsible for constructing the stateful class Customer. To make sure that we are able to write maintainable tests for the BlockCustomer method, we'll have the repository return an ICustomer interface.
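As an illustration (my sketch; the class and member names around BlockCustomer are assumptions, since the post only shows the method itself), the test can then use hand-rolled fakes and never touch the real Block() implementation:

using NUnit.Framework;

public interface ICustomer
{
    void Block();
}

public interface ICustomerRepository
{
    ICustomer GetCustomer(int customerID);
}

// Hypothetical name for the class holding BlockCustomer.
public class CustomerBlocker
{
    private readonly ICustomerRepository _customerRepository;

    public CustomerBlocker(ICustomerRepository customerRepository)
    {
        _customerRepository = customerRepository;
    }

    public void BlockCustomer(int customerID)
    {
        var customer = _customerRepository.GetCustomer(customerID);
        customer.Block();
    }
}

[TestFixture]
public class CustomerBlockerTests
{
    private class FakeCustomer : ICustomer
    {
        public bool WasBlocked;
        public void Block() { WasBlocked = true; }
    }

    private class FakeCustomerRepository : ICustomerRepository
    {
        public FakeCustomer Customer = new FakeCustomer();
        public ICustomer GetCustomer(int customerID) { return Customer; }
    }

    [Test]
    public void Should_block_customer()
    {
        var repository = new FakeCustomerRepository();
        var blocker = new CustomerBlocker(repository);

        blocker.BlockCustomer(42);

        Assert.IsTrue(repository.Customer.WasBlocked);
    }
}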

On to the second question: what about SRP? He is absolutely right, having the save and remove methods on the customer would break the Single Responsibility Principle. When using a domain model I'm not a big fan of that type of active record implementation. Add and Remove are things I would have implemented in the ICustomerRepository. Within the repository I would use some kind of data mapper framework (NHibernate, EF..). Preferably one that supports automatic entity change tracking so that I would not have to implement Save. But still, he has a point. From the first example in the previous post we could see that customer.Block() would have to do more than modify its own state. However, I don't see anything wrong with it delegating the rest of its work to a dependency. The dependency would be injected into the Customer class by the customer repository before handing it off.

class Customer
{
    ...
    private CustomerState _state;
    private IDependencyNeededToBlock _dependencyNeededToBlock;
    
    public IDependencyNeededToBlock DependencySetter
    {
        set
        {
            if (_dependencyNeededToBlock != null)
                throw new Exception("Dependency already set");
            _dependencyNeededToBlock = value;
        }
    }

    public void Block()
    {
        _state = CustomerState.Blocked;
        _dependencyNeededToBlock.DoWhatIsNeeded();
    }
}

However, what is important to note is that the implementation should show the intention of the functionality. Since the intent of this functionality is blocking a customer, it's natural that the functionality is initiated by the customer. If, on the other hand, the functionality was about something else and part of it happened to be blocking the customer, then of course that other functionality would not be orchestrated by the customer.

If the Customer class now is limited to modifying internal state and calling dependencies we should have a nice and maintainable class. On the other hand if it still ends up as a 1000+ lines class we probably need to redefine our perception of the word customer :)
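For completeness, here is a rough sketch of how the repository might do that wiring (my assumption; it reuses the DependencySetter from the class above and the ICustomer/ICustomerRepository shapes sketched earlier, assumes Customer implements ICustomer, and leaves the actual loading out):

public class CustomerRepository : ICustomerRepository
{
    private readonly IDependencyNeededToBlock _dependencyNeededToBlock;

    // The repository is stateless, so its own dependencies come from the DI container.
    public CustomerRepository(IDependencyNeededToBlock dependencyNeededToBlock)
    {
        _dependencyNeededToBlock = dependencyNeededToBlock;
    }

    public ICustomer GetCustomer(int customerID)
    {
        // Load the stateful Customer through the data mapper (NHibernate, EF, ...).
        Customer customer = LoadCustomer(customerID);

        // Hand it the dependency it needs before giving it away.
        customer.DependencySetter = _dependencyNeededToBlock;
        return customer;
    }

    private Customer LoadCustomer(int customerID)
    {
        // Details omitted; this is where the data mapper framework would be used.
        return new Customer();
    }
}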

Thursday, May 13, 2010

Focus on functionality, use technology

As time goes by I find myself more and more intrigued by Domain Driven Design. To me it is like a shift where we go from forcefully molding functionality to fit the technology, to elegantly producing functionality with the help of technology. Instead of looking at a piece of code and seeing NHibernate sessions / DataSets and terms like update, delete and insert, we'll see the true intention of the functionality we're looking at. The following example is code the way I would have written it some time ago.
    public void UpdateCustomerState(int customerID, CustomerState newState)
    {
        DataSet ds = getCustomerDataset(customerID);
        ds.Tables["Customer"]
            .Rows[0]["CustomerState"] = stateToInt(newState);
        
        switch (newState)
        {
            case CustomerState.Blocked:
                // Do various stuff done to blocked customers
                break;
                // Handle more state variants...
        }
        updateCustomerDataset(ds);
    }
Now, looking at this code it looks pretty much like code we would expect to find anywhere in any source base, right? It has split some functionality out into separate methods to clean up the code. It reads pretty well, so we can understand what it does: it sets the state of the customer. So what's the problem?
The problem is that it's all done from the tooling perspective and not from the actual intent of the functionality. What happened here was that the developer was told to create the functionality for blocking a customer. The developer did as we usually do: went right into techno mode. "Ok, we have this state field in the customer table. If I just set that field to state blocked and make the necessary changes to the linked rows in tables x and y, that should do the trick." And when done, the code looked like the code above. This is very much like what happens in The Hitchhiker's Guide to the Galaxy when Deep Thought reveals the answer to the Ultimate Question of Life, the Universe, and Everything to be 42. As we know, the answer wasn't the problem. The problem was the question. We can say the same thing about this piece of code. What you see is the answer, but there is nothing mentioned about the intention behind it. When browsing through the classes you'll only find a method called UpdateCustomerState and nothing about blocking customers.

So what do we do about it? We write the code as stated by the intent behind the functionality. The developer was told that a customer needed to be able to be blocked. From that we can determine that a customer is some entity and that it needs a behavior: Block. The implementation would look something like this:
public void BlockCustomer(int customerID)
{
    var customer = getCustomer(customerID);
    customer.Block();
}
The first example is the typical approach: classes with mostly properties, plus a set of manager/handler/service classes manipulating those properties to achieve the desired behavior. The second example keeps the customer's state hidden within the customer and only exposes its behaviors.

Write functionality with the help of technology!

Thursday, April 29, 2010

Code readability

I had an interesting discussion with a co-worker the other day about code documentation and formatting. Lately I have moved away from my previous, traditional views on the matter. Earlier my main focus would have been consistency and similarity. Things like having a header on each class, property and function explaining what it is and what it does. In addition, things like using regions within files and how to order functions, properties, private functions, events and so on within a class.

All this is done in the name of readability, right? At least we think so. We have become so good at managing these things that we completely forget to ask ourselves why the class has grown so obese that we need a set of rules to navigate within it. Maybe we create these rules to enable us to write crap code and still feel good about it? I am not really convinced that setting up a huge ruleset in tools like StyleCop makes your code more readable. Of course it would make it readable in the sense that if someone gave me a handwritten letter and then handed me the same letter written on a computer, both in some language I don't understand, I would probably be able to read the words letter by letter from the computer version more easily. Does it really matter? It's not like I would understand any of it anyway. It's the same when looking at code. Crap code doesn't get any better just because it's all formatted the same way.

The same goes for code comments. Does the function GetCustomerByName need a comment? If it needs a comment, does that mean that the function does more than retrieving a customer by its name? Maybe the name really is GetCustomerByNameOrInSomeCasesCreateOrder. If so, this code doesn't need comments, it needs some hard refactoring. Difficulty choosing a good name for a function is usually a code smell.

My point here is that your code should do the talking. It should express its true intention. Let's look at two pieces of code, the first being a brute-force implementation and the other being a bit more refined.

Implementation 1


Implementation 2

So what's the difference between the two? Basically, the second example has split the functionality up into well-named classes, functions and properties. In my opinion this is not a problem unless you're writing performance-critical low-level stuff. Especially not when using disk or network resources.


What I am trying to say is: focus on making the code readable before you focus on getting it well formatted. Most likely, when you get the code readable, the need for standard formatting will be much lower.

Monday, March 8, 2010

Persistence ignorance

This post is very much related to my post about relational databases and OO design, especially the domain model part. A problem with the way data access was presented in that post is that it was drawn as a layer between the domain model and the database. Of course, when writing applications we need a layer between the database and whatever is using it. The thing is that the domain model needs more than simple CRUD. Multiple CRUD operations have to be batched and handled in transactions. This is where the concept of persistence ignorance makes its grand entrance. We need a way to cleanly handle storing/persisting our entities within transactions so that we can ensure that processes execute successfully or roll back all changes.

I talk about entities here, and of course you can use DataSets and other types of data carriers. Since I mainly work in C#, which is an object-oriented language, my preference is to work with POCOs.

First off we need a way to persist those entities somewhere. The reason I use the word persist instead of storing to the database is that the location or type of storage is not relevant to the domain model. The only thing the domain model needs to know is that its entities are persisted somewhere so that it can get to them later.

How do we create this magical piece of code that will handle all persistence and transactions for us? Well, we don't! Unless you feel the need to reinvent the wheel. Most ORM frameworks implement some kind of persistence ignorance, some container that can keep track of changes made to your entities and commit them to storage or roll back. There are some great frameworks out there that you can use, my personal favorite being NHibernate.

That being said, you can make a mess with ORMs too. Some people talk about creating an application with Entity Framework or NHibernate. This is usually a sign that the source code is full of ORM queries and connection/transaction handling. Again, these are issues the domain model shouldn't have to deal with. It should focus on cleanly implementing its specified functionality, not deal with these kinds of technical details.

Let's take a minute to look at transactions. Transactions live within what we call a transaction scope. A transaction scope starts when you start the transaction and ends when you commit or roll back. So what would be included in a transaction scope? Let's say we're writing some code that updates some information on a customer and on the customer's address, which is stored in a separate table. Would we want both those updates within a transaction scope? Indeed! Then what about that other function that does various updates and then calls the contact and address update function? Shouldn't we have a transaction scope wrapping all of that too? Well, of course, so let's add some transaction handling to this function too and make sure we support nested transactions for the customer and address function. And with that, the whole thing started giving off an unpleasant smell.. We have just started cluttering our code with transactions left, right and center. Now what?

Let's take a look at the model again. We can visualize the domain model as a bounded context. It has its core and its outer boundaries. Through its boundaries it talks to other bounded contexts (UI, other services, database...). Take the UI. The UI would call some method on the domain model's facade and set off the domain model to do something clever. My point being that the domain model never goes off doing something all of a sudden. Something outside its boundaries always requests or triggers it to do something. These requests and triggers are perfect transaction scopes. They are units of work. These units of work know exactly what needs to exist within the transactional scope.

Unit Of Work is an implementation pattern for persistence ignorance. We can use this pattern to handle persistence and transactions. Let's say that every process or event triggered at the domain model's boundary is a unit of work. This unit of work can be represented by an IUnitOfWork interface. To obtain an IUnitOfWork instance we use an IWorkFactory. By doing this we end up with a transaction handler which we have access to from the second our domain code is invoked until the call completes. How would a class like this look? Well, we need some way to notify it about entities we want it to handle. Let's call the method Attach and give it an entity parameter. Now we can pass every entity object we want to persist to the Attach method of the IUnitOfWork. We also need a way to remove entities from storage. We'll create a Delete method for that. If the current unit of work succeeds we need a way to let it know that all is good and go ahead and complete the transaction. Let's call this method Commit. This gives us a simple interface for handling persistence.

   public interface IUnitOfWork
   {
      T Attach<T>(T entity);
      void Delete<T>(T entity);
      void Commit();
   }

The code using it would look something like this.

   using (IUnitOfWork work = _workFactory.Start())
   {
      MyEntity entity = new MyEntity();
      work.Attach<MyEntity>(entity);
      work.Commit();
   }

Since we are using something like NHibernate in the background, we would have to retrieve entities from storage through NHibernate and then attach them to the IUnitOfWork. The IUnitOfWork of course uses NHibernate in the background for all persistence. Because of the nature of ORMs like NHibernate it would make more sense to include entity retrieval through the IUnitOfWork too, since every entity retrieved is automatically change tracked by the NHibernate session. That would also let us abstract NHibernate better from our domain model. Let's add a few functions to the IUnitOfWork interface to accomplish this. We would need a GetList function to return a list of entities and a GetSingle function to return a single entity. GetSingle would have to be able to retrieve by identity to take advantage of caching within the ORM framework, and also be able to take queries, which when using NHibernate could be expressed through IDetachedCriteria. If you want complete abstraction you can make your own query builder which converts to NHibernate queries internally. Now the IUnitOfWork interface would look something like this:

   public interface IUnitOfWork
   {
      T GetSingle<T>(object id);
      T GetSingle<T>(IDetachedCriteria criteria);
      IList<T> GetList<T>(IDetachedCriteria criteria);
      T Attach<T>(T entity);
      void Delete(object entity);
      void Commit();
   }
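
As a sketch of how this could sit on top of NHibernate (my illustration, not part of the original post; it uses NHibernate's concrete DetachedCriteria directly instead of wrapping it behind the IDetachedCriteria abstraction, and leaves out disposal, rollback and error handling):

using System.Collections.Generic;
using NHibernate;
using NHibernate.Criterion;

public class NHibernateUnitOfWork
{
    private readonly ISession _session;
    private readonly ITransaction _transaction;

    public NHibernateUnitOfWork(ISession session)
    {
        _session = session;
        _transaction = _session.BeginTransaction();
    }

    public T GetSingle<T>(object id)
    {
        // Retrieval by identity goes through the session cache.
        return _session.Get<T>(id);
    }

    public T GetSingle<T>(DetachedCriteria criteria)
    {
        return criteria.GetExecutableCriteria(_session).UniqueResult<T>();
    }

    public IList<T> GetList<T>(DetachedCriteria criteria)
    {
        return criteria.GetExecutableCriteria(_session).List<T>();
    }

    public T Attach<T>(T entity)
    {
        _session.SaveOrUpdate(entity);
        return entity;
    }

    public void Delete(object entity)
    {
        _session.Delete(entity);
    }

    public void Commit()
    {
        _transaction.Commit();
    }
}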

To obtain the active instance of IUnitOfWork from anywhere in the code we can create a WorkRepository class. We'll just have the IWorkFactory register the unit of work with the WorkRepository using the thread id as the key. Doing that enables us to issue the following call in whatever class wants to use the unit of work:

  public void SomeFunction()
  {
    ...
    var unitOfWork = WorkRepository.GetCurrent();
    var customer = unitOfWork.GetSingle<Customer>(customerID);
    ...
  }
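
A possible sketch of that registration mechanism (again my guess at the mechanics; how the unit of work itself gets created is left out):

using System;
using System.Collections.Generic;
using System.Threading;

// Keeps track of the active unit of work per thread so it can be
// looked up from anywhere in the domain code.
public static class WorkRepository
{
    private static readonly Dictionary<int, IUnitOfWork> _active = new Dictionary<int, IUnitOfWork>();
    private static readonly object _padlock = new object();

    public static void Register(IUnitOfWork unitOfWork)
    {
        lock (_padlock)
            _active[Thread.CurrentThread.ManagedThreadId] = unitOfWork;
    }

    public static IUnitOfWork GetCurrent()
    {
        lock (_padlock)
            return _active[Thread.CurrentThread.ManagedThreadId];
    }

    public static void Unregister()
    {
        lock (_padlock)
            _active.Remove(Thread.CurrentThread.ManagedThreadId);
    }
}

public interface IWorkFactory
{
    IUnitOfWork Start();
}

public class WorkFactory : IWorkFactory
{
    public IUnitOfWork Start()
    {
        var unitOfWork = CreateUnitOfWork();
        WorkRepository.Register(unitOfWork);
        return unitOfWork;
    }

    private IUnitOfWork CreateUnitOfWork()
    {
        // Details omitted; this is where the ORM-backed implementation
        // (like the NHibernate sketch above) would be constructed.
        throw new NotImplementedException();
    }
}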

How is that for a database layer? This is most likely all you need. Smack the repository pattern on top of that and your code will be pure, cleanly written domain functionality. Now go ahead and solve real world problems, ignoring all the complexity that comes with persistence, transactions and retrieving entities.

Saturday, February 27, 2010

DRY and reuse pitfalls

Don't get me wrong here. I'm all about keeping my code reusable and DRY (Don't repeat yourself). What I want to pinpoint in this post are common pitfalls when reusing code. More the thought behind the decisions than the principle itself.

First, let's talk about our overall mindset when writing code. When developing applications we spend time researching and planning the functionality before we start implementing. The solution is constantly evolving in our heads and being discussed among the project's team members. This thought process continues throughout planning and implementation. Because of human nature we'll always look ahead to upcoming needs, like "Maybe we have to support X in the future? I'd better prepare for it now." or "The function I just wrote could support scenario Y if I just make these changes. We'll probably need it in the future so I'd better do it now". We're making compromises in our existing code based on assumptions. In my mind this is not reuse, it's code pollution. Reuse is something that happens when you have two implementations doing the exact same thing. DRY is not planning for the future. DRY is reusing functionality in your existing codebase.
For this scenario a good solution would be writing your code based on the SOLID principles. That way you'd know that your code would be able to evolve with the uncertainties of the future.

Another thing I come across quite often is SRP (Single Responsibility Principle) violations as a result of code reuse. Let's take the example where our application has a LogWriter handling writing to the error log. The class looks like this:

class LogWriter
{
    private const string LOG_ENTRY_START = "*************************";
    private string _filename;
    
    public LogWriter(string filename)
    {
        _filename = filename;
    }
    
    public void WriteLogEntry(string message, string stackTrace)
    {
        using (var writer = new StreamWriter(_filename, true))
        {
            writer.WriteLine(LOG_ENTRY_START);
            writer.WriteLine(message);
            writer.WriteLine(stackTrace);
        }
    }
}

Time goes by, and for some reason a need arises to also support writing log entries to a database. Someone gets the clever idea to create an overload of the WriteLogEntry method that takes an extra boolean writeToDatabase parameter. Cramming two separate behaviors into a single class or function is not reusing code. It might feel like code reuse since you can use the same class for writing to both logs. The painful reality is that this is code rot, not code reuse.
Again, this is something that is better solved by following the SOLID principles. If everything depended on an abstraction of the LogWriter, such as an ILogWriter interface, we could easily extend our solution with a new DatabaseLogWriter implementing ILogWriter.
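A rough sketch of that direction (my illustration; the database writer body is just a placeholder):

public interface ILogWriter
{
    void WriteLogEntry(string message, string stackTrace);
}

// The existing LogWriter keeps its file-writing code and simply declares
// that it implements the abstraction:
//     class LogWriter : ILogWriter { ... }

// The new requirement gets its own class instead of a boolean flag.
public class DatabaseLogWriter : ILogWriter
{
    private readonly string _connectionString;

    public DatabaseLogWriter(string connectionString)
    {
        _connectionString = connectionString;
    }

    public void WriteLogEntry(string message, string stackTrace)
    {
        // Insert a row into the log table using _connectionString (details omitted).
    }
}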

The last subject I want to mention here is cross-boundary reuse. This is a topic I have touched on in an earlier post about OO design and relational databases. The .NET community is jumping straight into using ORMs these days, which I think is fantastic! Whether it's NHibernate, LLBLGen or Entity Framework, we're using entities now, not datasets. I will use entities as an example of cross-boundary reuse pitfalls. This leads me back to my previous post where I argue that the Domain Model/Business Logic and the UI serve two very different needs. Let's say we decide to create a Customer entity in the Domain Model that we also pass across the Domain Model's boundaries up to the UI. We probably end up having to clutter our entity with loads of information needed exclusively by the UI. In the UI there are needs like showing addresses, customer activities and various readable information. This is a lot like the LogWriter example, only on a higher level. This time we violate SRP to be able to reuse an entity across boundaries. Again, this does not lead to greater code reuse but to greater code rot.
In this case I would strongly recommend using DTOs for transferring information across boundaries. These DTOs can be shaped to perfectly fit the needs of whatever is meant to consume them.
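For example (the names and fields here are made up), the domain keeps its behavior-focused entity while the UI gets a flat, read-friendly DTO shaped for the screen:

public enum CustomerState { Active, Blocked }

// Domain side: behavior and just enough state, nothing UI-specific.
public class Customer
{
    private CustomerState _state;

    public void Block()
    {
        _state = CustomerState.Blocked;
    }
}

// UI side: a flat shape with exactly what the screen needs, and nothing else.
public class CustomerSummaryDto
{
    public int CustomerId { get; set; }
    public string DisplayName { get; set; }
    public string FormattedAddress { get; set; }
    public string LatestActivity { get; set; }
}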

Friday, January 22, 2010

Solution structuring and TDD in Visual Studio

I'm currently working on solution structuring for a large system using TDD. To be able to work efficiently within the project we have defined a set of solution types with different purposes. This is the setup we ended up with:

Workbench
TDD requires that the solutions you spend most of your time in build as fast as possible. To achieve this you have to keep the number of projects within the solution to a minimum. If you cannot get around having multiple dependencies for every project, binary references would be the way to go. We have chosen a more decoupled approach where every project depends upon abstractions, and interfaces are wired to implementations through a DI container. Contracts and interfaces are separated out into contract projects. This keeps project references down to just the interface/contract projects. The build output from this solution would not be able to run, since the contract implementations are not referenced here. The solution contains all environment-independent unit and integration tests for this workbench's projects. Because we practice TDD it's not important that the solution is able to run, but that the tests are able to pass. A workbench exists for every bounded context and standalone code library.

Test rig
Even though most of our time is spent writing tests and code in the workbench solutions, we sometimes need to debug a running system. These solutions contain all the code needed to run parts of the system, like hosting services, or even the complete running system. The test rig solutions are usually quite large and take time to build, but they're there for us to occasionally test the system locally.

Continuous Integration solution
This solution contains all projects and all environment-independent unit and integration tests for the complete system. This solution is part of the continuous integration build performed on check-in. Naturally, CI runs all tests on every build.

System test solution
We need a solution containing the environment-dependent tests requiring things like database access or a running system. The tests within this solution should run on every deployment to the test environment to make sure the system wiring is intact.

As for deployment, both the test rig and continuous integration solutions contain enough of the system to be able to perform deployment.

It would really be interesting to hear how other people are working on similar projects.

Saturday, January 16, 2010

Object Oriented design and Relational Databases

For as long as I have worked with object-oriented languages, working with relational databases has always been a bit awkward. I have grown up in the Microsoft world with Visual FoxPro/VB/.NET/Access/MS SQL Server and have used ADO, ADO.NET and now ORMs. Seeing Udi Dahan's talk on "Command Query Responsibility Segregation" and reading Eric Evans' book "Domain Driven Design" made me connect some dots, leading me to write this post.

So why do we use relational databases? If the only purpose of the database were to persist the domain model's entities we would have used an object-oriented database, right? No transformation between tables and objects would be needed. Ok, given this scenario we're sitting here with our clean entity objects formed in a way that perfectly satisfies the needs of the rule validation and process execution performed by the domain model. Brilliant, just the way we like it! Enter the UI. Now this is where it gets ugly. The user requires information to be presented in a way that is humanly readable. The domain model is perfectly happy knowing that the customer entity with id 1432 links to the address entity with id 65423. To the person using the application that would be a useless piece of information. The structure of the information needed by the user is often very different from the entities needed by the domain model. Especially when the user needs some kind of grouping of information or statistics. These types of complex queries want to gather information spanning multiple entities, joining them in ways that are unnatural to the domain model. This is where the relational database comes in and performs its magic. With a relational database we can easily perform complex queries joining multiple tables and tweaking information to fit our needs.


Above is the traditional way of looking at layered architecture. I find this way of viewing layered architecture a bit deceiving. So what about the issue described above, with the UI getting into the mix? How do we often solve this problem? Well, sadly the domain model often gets to pay the price. Our clean entities are stretched and pulled and lumps of information are attached to them so that we can pass them on to the UI. These sins are committed in the name of Layered Architecture, though Layered Architecture is not to blame. It's just easy to interpret the picture above that way. Whether they are datasets or object entities, relations and bulks of information are added, complicating both the UI and the domain. We end up having to make compromises constantly because the entity no longer fits either the domain model or the UI in a good way.
There must be a better way! Well, there is. We have already pinpointed two separate needs here. The domain model needs a database to persist its entities to, and the user needs to use the persisted information in a way that makes sense to him/her. Let's create two rules.
  1. Neither the domain model nor the UI should ever be aware of the complex structure of the database.
  2. The domain model should never be aware of the complexity of its clients (the UI in this example).
Ok, that solves two problems. The domain model's entities will no longer be compromised by its consumers, since their complexity can no longer affect it. They will also be unaware of any complexity the database takes on from having to serve multiple needs. Our data abstraction layer, using an ORM or any other data access provider, will make sure of that.
Great, now we have a clean, readable and maintainable domain model again. So where does the UI retrieve its information from, then? From the database, of course. And to hide the database complexity we can use a view or a stored procedure that returns the information the UI needs, formatted exactly as it needs it to be. How cool is that? We just took advantage of the power of the relational database, which now hides its complexity from its users.

The UI now bypasses the domain model completely when retrieving its information. This means that the way the UI makes changes to the database has to change. Earlier, the UI was provided with the domain model's entities, which it modified and sent back to the domain model for persisting. That is no longer possible since the domain model doesn't share its entities. What we want to do now is build an abstraction between the domain model and the UI. Call it an abstraction layer or a service layer; naming is not important right now. The UI now needs to be able to persist information and execute processes through this abstraction. We need some defined messages that the UI can send to the abstraction. For example, the SaveAddress operation in the abstraction needs to be able to take an AddressMessage containing address information. The abstraction then needs to use the message to persist its information using the domain and its entities. We then end up with a design where information flows like in the sketch below.


When creating services and abstractions it's important to think about responsibility. For instance, the consumer of the service should be responsible for its interfaces and messages, while the service host should be responsible for the implementation of those interfaces. Let's look at the service layer between the UI and the model. The UI would define how the service methods and messages should look, and the domain model would implement the service interface. Likewise, for the database access framework, the UI and the domain model would be responsible for defining the interface while the data abstraction component/layer would implement it.
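To tie it together, here is a minimal sketch of such a consumer-defined contract (SaveAddress and AddressMessage come from the example above; the remaining names and fields are my own):

// Defined on the consumer side (the UI), implemented by the domain model.
public class AddressMessage
{
    public int CustomerId { get; set; }
    public string Street { get; set; }
    public string PostalCode { get; set; }
    public string City { get; set; }
}

public interface ICustomerService
{
    void SaveAddress(AddressMessage message);
}

// Domain-side implementation: translate the message into work on the
// domain model's own entities.
public class CustomerService : ICustomerService
{
    public void SaveAddress(AddressMessage message)
    {
        // Look up the customer entity, update its address and persist it
        // (details omitted).
    }
}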

To conclude this post, the main takeaway is that a design like this should be viewed as three separate parts: UI, Domain Model and Data store. The implementations should respect this and make sure that each part focuses on the problem it's trying to solve.
  • Domain Model - Handles the logic, rules and processes the application is supposed to handle according to its specification. This is the heart of the application.
  • UI - Makes sure that the user is able to work with the application's functionality in a way suited to the human mind.
  • Relational Database - Handles persisting the Domain Model's entities and provides the UI with human readable information.

Friday, January 15, 2010

Please stop the madness

Looking at Microsoft's approach to frameworks and libraries lately gives me the creeps. In frameworks like Entity Framework, Workflow Foundation and such, Microsoft relies heavily on graphical tools and generated code. My three main objections to this way of developing are: 1. It complicates the way of working. 2. It complicates maintenance. 3. It gives off the wrong signals to developers.

The whys:

1. It complicates the way of working
As developers, what is our main skill? Writing code, right? And of course, with experience we have learned how to read code, and through reading we learn how to write cleaner, more readable code. Now suddenly we have to relate to the code we write, the designer UI and the code generated by the designer. On top of that, the code generated by the designer is often a messy blob of complex code. By using these tools we have complicated what should have been clean, readable code.
Another thing is writing tests for code that uses generated code. This usually ends up being a nightmare.

2. It complicates maintenance
What happens when requirements change? Well, you have to have the designer regenerate the code, don't you? Something you could have done with refactoring tools you now have to do through the provided UI. You also risk ending up in a scenario where, when the framework comes in a new and fresh version, upgrade issues corrupt the generated code. Ok, that one was a bit unfair, but I'll still consider it an issue.

3. It gives off the wrong signals to developers
This is probably my biggest issue with the concept. The way I see it, these designers solve one of two problems: hiding a complex framework, or making a framework usable for non-developers. First off, hiding a complex framework is treating the symptoms of bad design. I would much rather see them put their effort into writing a high quality, usable API for it. If the reason is that writing code against it is too much to ask of the developer, that's just sad. As stated earlier, one of the greatest skills a developer has is writing code. As for 'non-developers', we're talking about development frameworks, not applications like the Office suite, which rightfully contains UI designers and code generator tools.

I just needed to get this out of my system :p I guess my plea to Microsoft is: please stop the madness and get back to writing good, clean framework APIs that developers can write high quality applications with. The .NET core proves that you know how to.