A lot of interesting lessons have been learned during the development of Dogen and I'm rather afraid many more are still in store. As is typical with agile, I'm constantly reviewing processes in search of improvements. One such idea was that putting pen to paper could help improve the retrospective process itself. The result is this rather long blog post, which hopefully is of use to developers in similar circumstances. Unlike the typical bullet-point based retrospective, this post is a rambling narrative, as it aims to provide context to the reader. Subsequent retrospectives will be a lot smaller and more to the point.
Talking about context: I haven't spoken very much about Dogen in this blog, so a small introduction is in order. Dogen is an attempt to create a domain model generator. The manual goes into quite a bit more detail, but for the purposes of this exercise, it suffices to think of it as a C++ code generator. Dogen has been developed continuously since 2012 - with a few dry spells - and reached its fiftieth sprint recently. Having said that, our road to a finished product is still a long one.
The remainder of this article looks at what has worked and what has not worked so well thus far into Dogen's development history.
Understanding Time
Dogen was conceived when we were trying to do our first start up. Once that ended - around the back end of 2012 - I kept working on the tool in my spare time, and this was a setup that has continued ever since. There are no other contributors; development just keeps chugging along, slowly but steadily, with no pressures other than to enjoy the sights.
Working on my own and in my spare time meant that I had two conflicting requirements: very little development resource and very ambitious ideas that required lots of work. With family commitments and a full-time job, I quickly found out that there weren't a lot of spare cycles left. In fact, after some analysis, I realised I was in a conundrum. Whilst there was a lot of "dead time" in the average week, it was mostly low-grade time: lots of discontinuous segments of varying and unpredictable lengths. Summed together in a naive way it seemed like a lot, but - as every programmer knows - six blocks of ten minutes do not one solid hour make.
Nevertheless, one has to play with the cards one was dealt. I soon realised that the correct question to ask was: "what kind of development style makes one productive under these conditions?". The answer turned out to be opportunistic coding. This is rooted in having a better understanding of the different "qualities" of time and how best to exploit them. For example, when you have, say, five to fifteen minutes available, it makes sense to make small updates to the manual or fix trivial problems - a typo in the documentation, renaming variables in a function, mopping up the backlog and other activities of that ilk. A solid block of forty minutes to an hour affords you more: for instance, implementing part or the whole of a story for which the analysis has been completed, or doing some analysis for existing stories. On those rare occasions when half a day or longer is available, one must make the most of it and take on a complex piece of work that requires sustained concentration. These sessions proved to be most valuable when the output was a set of well-defined stories, ready for implementation.
One needs very good processes to be able to manage time in this fashion. Luckily, agile provides them.
Slow Motion Agile
Looking back on ~2.4k commits, one of the major wins in terms of development process was to think incrementally. Of course, agile already gives you a mental framework for that, and we had a functioning scrum process during our start up days: daily stand-ups, bi-weekly sprints, pre-sprint planning, post-sprint reviews, demos and all of that good stuff. It worked really well, and kept us honest and clean. We used a very simple org-mode file to keep track of all the open stories, and at one point we even built a simple burn-down chart generator to allow us to measure velocity.
Granted, when you are working alone in your spare time, a chunk of agile may not make sense; for instance, providing status updates to yourself may not be the most productive use of scarce time. Surprisingly, though, I found quite a bit of the process to be vital. I've kept the bi-weekly sprint cycle, the sprint logs, the product backlog and the time-tracking we had originally set up, and found them extremely useful - quite possibly the thing that has kept me going for such an extended period of time, to be brutally honest. When you are working on an open source project it is very easy to get lost in its open-ended-ness and find yourself giving up, particularly if you are not getting (or expecting) any user feedback. Even Linus himself has said many times he would have given up the kernel if it wasn't for other people bringing him problems to keep him interested.
Lacking Linus' ability to attract crowds of interested developers, I went for the next best thing: I made them up. Well, at least in a metaphorical way, as this is what user stories are when you have no external users to drive them. As I use the product in anger, I find it very easy to put myself in the head of a user and come up with requirements that push development forward. These stories really help, because they transform the cloud of possibilities into concrete, simple, measurable deliverables that one can choose to deliver or not. Once you have a set of stories, you have no excuse to be lazy, because you can visualise in your head just how much effort it would take to implement a story - and hey, since nerds are terrible at estimating, it's never that much effort at all. As everyone knows, it's not quite that easy in the end; but once you've started, you feel you have to at least finish the task at hand, and so on, one story at a time, one sprint at a time, until a body of work starts building up. It's slow, excruciatingly slow, but it's steady like water working in geological time; when you look back five sprints, you cannot help but be amazed at how much can be achieved in such an incremental way - and how much is still left.
And then you get hooked on measurements. I now love measuring everything: how long it takes me to complete a story, where time goes in a sprint, how many commits I make a day - well, everything that can easily be measured without adding overhead. There is no incentive to game the system - hell, you could create a script that commits twenty times a day, if the commit count were all you cared about. But it's not, so why bother? Because of this, statistics actually start to tell you valuable information about the world and to impel you forward. For instance, GitHub streaks mean that I always try to make at least one commit per day. So even on days when I'm tired, I always force myself to do something, and sometimes that quick commit morphs into an hour or two of work that wouldn't have happened otherwise.
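As a concrete illustration, this kind of zero-overhead measurement can be pulled straight out of git. The one-liner below is a sketch of how one might count commits per day (it assumes `git` is installed and that it is run inside a repository):

```shell
# Count commits per day for the current repository, busiest days first.
# --date=short with %ad prints just the author date as YYYY-MM-DD.
git log --date=short --pretty=format:'%ad' | sort | uniq -c | sort -rn | head
```

The same pipeline pattern (extract a field, `sort`, `uniq -c`) works for commits per author, per month, and so on.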
As I mentioned before, it was revealing to find out that there are different types of time. In order to take advantage of this heterogeneity, one must make scrupulous use of the product backlog. This has proven invaluable, as you can attest by its current size. Whether we are part way through a story or just idly daydreaming, each and every idea must be added to the product backlog, with sufficient detail to allow one to reconstruct one's train of thought at that point in time. Once in the backlog, items can be continuously refined until eventually we find a suitable sprint to tackle them or they get deprecated altogether. Without a healthy backlog it is not possible to make the most of these elusive time slots. Conversely, it is important to make each story as small and as focused as possible, and to minimise spikes unless they really are on the critical path of the story. This is mainly for psychological reasons: one needs to mark stories as complete, to feel like work has been done. Never-ending stories are just bad for morale.
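For the curious, the backlog lives in a plain org-mode file, as mentioned earlier. A sketch of what an entry might look like follows; the story titles, tags and estimates are invented for illustration:

```org
* STARTED Sprint backlog
** COMPLETED Fail fast on duplicate type names                :story:
   Estimate: 2 hours. Spent: 3 hours.

   Duplicate type names are only caught at code generation time;
   catch them at merge time instead, with a clear error message.
** STARTED Analysis: support std::tuple in generated types    :spike:
   Notes captured mid-thought, with enough detail to reconstruct
   the train of thought when a suitable time slot appears.
```

Keeping the file under version control means the backlog travels with the code and can be refined in any of those five-minute slots.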
In general, this extreme incrementalism has served us well. Not all is positive, though. The worst problem has been great difficulty in tackling complex problems - those that require several hours just to load them into your head. These are unavoidable in any sufficiently large code base. Having lots of discontinuous segments of unpredictable duration has reduced efficiency considerably. In particular, I notice I have spent a lot more time going round in conceptual circles, and I've taken a lot longer to explore alternatives than when working full time.
DVCS to the Core
We had already started to use git during the start-up days, and it had proved to be a major win at the time. After all, one never quite knows where one will be coding from, and whether internet access is available or not, so it's important to have a self-contained environment. In the end we found out it brought many, many more advantages such as great collaborative flows, good managed web interfaces/hosting providers (GitHub and, to some extent, BitBucket), amazing raw speed even on low-powered machines, and a number of other wins - all covered by lots and lots of posts around the web, so I won't bore you with that.
On the surface it may seem that DVCS is most useful to a multi-developer team. This is not the case. The more discontinuous your time is, the more you appreciate its distributed nature. This is because each "kind" of time has a more suitable device - perhaps a netbook for the train, a desktop at someone's house or even a phone while waiting somewhere. With DVCS you can easily switch devices and continue exactly where you left off. With GitHub you can even author using the web interface, so a mobile phone suddenly becomes useful for reading and writing.
Another decision that turned out to be a major win is still not the done thing. Ever the trailblazers, we decided to put everything related to the project in version control. And by "everything" I do mean everything: documentation, bug reports, agile process, blog posts, the whole lot. It did seem a bit silly not to use GitHub's Wiki and Issues at the time but, in hindsight, having everything in one version-controlled place proved to be a major win:
- searching is never further than a couple of greps away, and it's not sensitive to connectivity;
- all you need is a tiny sliver of connectivity to push or pull, and work can be batched to wait for that moment;
- updates by other people come in as commits and can be easily reviewed as part of the normal push/pull process - not that we got any of late, to be fair;
- changes can easily be diffed;
- history can be checked using the familiar version control interface, which is available wherever you go.
When you have little time, these advantages are life-savers.
The last but very important lesson learned was to commit early and commit often. It's rather obvious in hindsight, really. After all, if you only have very small blocks of time in which to work, you want to make sure you don't break anything; the last thing you need is to spend a week debugging a tricky problem, with no idea of where you're going or how far you still have to travel. So it's important to make your commits very small and very focused, such that a bisection would almost immediately reveal a problem - or at least provide you with an obvious rollback strategy. This has proved itself invaluable far too many times to count. The gist of this approach is to split changes in an almost OCD sort of way, to the point where anyone can look at the commit message and the commit diff and make a judgement as to whether the change is correct or not. To be fair, it's not always quite that straightforward, but that has been the overall aim.
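The bisection workflow mentioned above can be sketched end-to-end. The script below is a self-contained demo: it builds a throwaway repository in which one commit breaks a trivial "test suite" (a `grep` standing in for a real build-and-test command such as `ctest`) and lets `git bisect run` find the culprit automatically:

```shell
# Demo of git bisect on a throwaway repository. One commit breaks a
# trivial "test" (a grep standing in for a real build-and-test command).
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name demo

echo "works" > status.txt
git add status.txt
git commit -qm "good baseline"
git tag baseline

echo "note" > note1.txt
git add note1.txt
git commit -qm "harmless change"

echo "broken" > status.txt              # this commit introduces the bug
git commit -qam "the offending change"

echo "more" > note2.txt
git add note2.txt
git commit -qm "another harmless change"

# Bisect between baseline (good) and HEAD (bad); git runs the test at
# each step and narrows down the first bad commit automatically.
git bisect start HEAD baseline
git bisect run grep -q works status.txt | tee bisect.log
git bisect reset
```

The small, focused commits are what makes this effective: when every commit does one thing, the commit bisect points at is the explanation, not just a starting point.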
Struggling to stay Continuously Integrated
After the commit comes the build, and the proof is in the pudding, as they say. When it comes to code, that largely means CI; granted, it may not be a very reliable proof, but nevertheless it is the best proof we've got. One of the major wins from the start up days was to setup CI, and to give it as wide a coverage as we could muster. We setup multiple build agents across compilers and platforms, added dynamic analysis, code coverage, packaging and basic sanity tests on those packages.
All of these have proven to be major steps in keeping the show on the road and, once set up, they were normally fairly trivial to maintain. We did have a couple of minor issues with CDash whilst we were running our own server. Eventually we moved over to the hosted CDash server, but it has limitations on the number of builds, which meant I had to switch some build agents off. The other main stumbling block is finding the time to do large infrastructural updates to the build agents, such as setting up new versions of Boost, new compilers and so on. These are horrendously time-consuming across platforms, because you never know what issues you are going to hit, and each platform has its own way of doing things.
The biggest lesson we learned here is that CI is vital, but projects with hardly any development time should not spend it managing their own CI. There are just not enough hours in the day. I have been looking into Travis to make this process easier in the future. Also, whilst being cross-platform is a very worthy objective, one has to weigh the costs against the benefits. If you have a tiny user base, it may make sense to stick to one platform and continue to write portable code without "proof"; once users start asking for multiple platforms, it is then worth considering doing the work required to support them.
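To give a flavour of what outsourcing CI looks like, here is a minimal sketch of a Travis configuration for a CMake-based C++ project; the compiler list, Boost package and build commands are illustrative and would need adapting to the project at hand:

```yaml
# .travis.yml - illustrative sketch only; names will differ per project.
language: cpp
os: linux
compiler:
  - gcc
  - clang
addons:
  apt:
    packages:
      - libboost-all-dev
script:
  - mkdir build && cd build
  - cmake ..
  - make -j2
  - ctest --output-on-failure
```

A file like this in the repository root replaces a self-managed build agent for the common Linux configurations, leaving only the exotic platforms to maintain by hand.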
The packaging story was also a very good one to start off with - after all, most users will probably rely on packages - but it turned out to be much harder than first thought. We spent quite a bit of time integrating with the GitHub API: uploading packages into their downloads section, downloading them from there, testing them, and then renaming them for user consumption. Whilst it lasted, this setup was very useful. Unfortunately it didn't last very long, as GitHub decided to decommission their downloads section. Since most of the upload and download code was GitHub-specific, we could not readily move over to a different location. The lesson here was that this sort of functionality is extremely useful, and it is worth dedicating time to it, but one should always have a plan B and even a plan C. To make a long story short, the end result is that we don't have any downloads available at all - not even stale ones - nor do we have any sanity checks on the packages we produce; they basically go to /dev/null.
In summary, all of our pains led us to conclude that one should externalise early, externalise often and externalise everything. If there is a free (or cheap) provider in the cloud that can take some or all of your infrastructure work off your hands, you should always consider using it before hosting your own infrastructure. And remember: your time is worth money, and it is better spent coding. Of course, it is important to ensure that the provider is reliable, has been around for a while and is used by a critical mass. There is nothing worse than spending a lot of effort migrating to a platform, only to find out that it is about to dramatically change its APIs, prices or terms and conditions - or, even worse, to be shut down altogether.
Loosely Coupled
Another very useful lesson I learned was to keep off-distro dependencies to a minimum. This is rather related to the previous points on CI and cross-platform-ness, really. During the start up days we started off by requiring a C++ compiler with good C++ 11 support and a recent Boost, including a few libraries that were still off-tree - mainly Boost.Log. This meant we had to have our own little "chroot" with all of these, and we had to build them by hand, sprinkled with plenty of helper scripts. In those dark days almost nothing was supplied by the distro, and life was painful. It was just about workable when we had time on our hands, but this is really not the sort of thing you want to spend time maintaining when working on a project in your spare time.
To be fair, I had always intended to move to distro-supplied packages as soon as they caught up, and when that happened the transition was smooth enough. As things stand, we have a very small off-distro footprint - mainly ODB and EOS. The additional advantage of not having off-distro dependencies is that you can start to consider yourself for inclusion on a distro. Even in these days of Docker, being shipped by a distro is still a good milestone for any open source project, so it's important to aim for it. Once more, it's the old psychological factors.
All in all, it seems to me we took the right decisions, as both C++ 11 and Boost.Log have proven quite useful; but in the future I will certainly think very carefully before adding dependencies on off-distro libraries.
Conclusions
In general, the first fifty iterations of Dogen have been very positive. It has been a rather interesting journey, and dealing with pure uncertainty is not always easy - after all, one always wants to reach a destination. At the same time, much has been learned in the process, and a setup has been created that is sustainable given the available resources. In the near future I intend to improve the visibility of the project as I believe that, for all its faults, it is still useful in its current form.