Tuesday, May 26, 2015

Nerd Food: A Prelude of Things to Come

Nerd Food: A Prelude of Things to Come

This sprint I found myself making one of those historical transitions: moving my entire Emacs infrastructure from a old, creaking at the seams approach, to the new all-singing-all-dancing way of doing things. This post documents the start of this transition.

The Road to Cunene

I have been using Emacs since around 1998. One of the biggest reasons to use the great old editor is its infinite configurability. Really, to call Emacs "configurable" is rather like saying that Euler wasn't bad with numbers. In truth - and it takes you a while to really grok this - Emacs is just a lisp platform with a giant editing library built on top; a library that keeps on getting extended on a daily basis by a large number of Emacs users. And, of course, you configure Emacs using lisp, so that the lines between "configuration" and "development" are, at best, blurry.

But lets go back to the beginning. Like every other Emacs newbie in those days, I too started with a plain (i.e. non-configured) Emacs and soon evolved to a very simple .emacs - this file being one of the possible places in which to store its configuration. The reason why almost all Emacs users start configuring Emacs very early on is because its defaults are astonishingly atrocious. It still amazes me to the day that some people are able to use plain Emacs and come out at the other end as Emacs users. In some ways, I guess it is a test of fire: do you really want to use Emacs? There are two responses to this test: most give up, but a few persist and soon start changing the editor to behave in a slightly saner manner.

The .emacs starts small, especially if you are not familiar with lisp. Sooner or later it occurs to you that, surely, someone must have already done one of these before, and then you find the amazing world of .emacs "development". This opens up entire new vistas of the Emacs landscape, because with each .emacs you find, you discover untold numbers of configuration knobs and - much more importantly - many new modes to install. In Emacs lingo, a mode is kind of like a "plug-in" for Eclipse or Visual Studio users. But this is just an approximation; as with everything "Emacs", there is actually no real equivalent way of describing Emacs terminology with analogies outside of Emacs. The problem with IDEs and most other editors is that they can only be extended in ways that their designers thought useful. In Emacs, everything is extensible. And I do mean everything. I remember the day I realised that a key press was really just the invocation of the self-insert-command function and, like any other function, it too could be changed in a myriad of ways.

But I digress. As with most users, my .emacs evolved over the years as I found more and more modes. I soon found that it was very painful to keep all my machines with the same setup; invariably I would change something at work but forget to change it at home or Uni or vice-versa. To make matters worse, some machines were on Windows. And in those days, there was no Emacs package management support, so you ended up copying lots of modes around. Life was painful and brute in my first decade of Emacs.

Around six years ago, things got a lot better: I started to use git in anger, refactored my .emacs into something slightly saner and called it Cunene - after the river in Southern Angola. Eventually I put it on GitHub. I believe - but don't recall exactly - that most of the refactoring ideas were stolen from Phil Hagelberg's Starter Kit and Alex Ott's .emacs.

Whatever the source of ideas, the improvements were undeniable. Cunene offered a all-in-one place to go to for my .emacs, and it combined all the experience I had in seeing other people's .emacs. At over twenty megs it wasn't exactly svelte, but my objective was to have a "zero-conf" setup; given a new machine, all I wanted to do was to git clone cunene, start Emacs and have exactly the same environment as everywhere else.

Further, I could update Cunene from any machine and push it back to GitHub. Cunene contained all the modes I needed, all byte-compiled, and all at trusted versions and with some (very minor) patches. I could easily upgrade one or more modes from one machine and then just git pull from all other machines. It also handled any Windows-specific workarounds, ensuring things worked well out of the box there too.

To be fair, for the last 6 years, this setup has served me well, but time also revealed its limitations:

  • package management support was limited. I tried using Elpa but, at the time, not many packages were available. Package management has evolved in leaps and bounds - Melpa, Marmelade, etc - but Cunene was still stuck with the old Elpa support.
  • the accumulation of modes over such a long period meant that starting Emacs took quite a long time. And to make matters worse, only a small percentage of the modes were truly useful.
  • most of the modes were at stale versions. Since things worked for me, I had no incentive to keep up with latest and greatest - and for all the easiness, it was still not exactly trivial to upgrade modes. This meant that I ended up having to put up with bugs that had long been fixed in HEAD, and worse, whenever I upgraded to latest, I saw massive changes in behaviour.
  • I was stuck on Emacs 23. For whatever reason, some parts of Cunene did not work with Emacs 24 properly and I was never able to get to the bottom of this. Being on an old version of Emacs has been a problem because I make use of C++-11 but Emacs 23 doesn't really indent it properly. And of course, Emacs 24 is just improved all around.
  • Cunene had a lot of boilerplate code. Since I never really learnt how to code in Emacs lisp, I was most likely writing a lot of non-idiomatic code. Also, the Emacs API has moved on considerably in fifteen years, so certain things were not being done in the best way possible.
  • Cedet and Org-mode are now part of Emacs but we were still carrying our own copies. I never managed to get Cedet to work properly either.
  • many new modes have appeared of late that provide much better solutions to some of the problems I had, but Cunene insulated me from these developments. In addition, adding new modes would only add to the complexity so I had no incentive to do so.

There had to be a better way of doing things; something that combined the advantages of Cunene but fixed its shortcomings.

Then I heard of Prelude.

The Road to Prelude

According to the official documentation:

Prelude is an Emacs distribution that aims to enhance the default Emacs experience. Prelude alters a lot of the default settings, bundles a plethora of additional packages and adds its own core library to the mix. The final product offers an easy to use Emacs configuration for Emacs newcomers and lots of additional power for Emacs power users.

I am still finding my way around - so don't quote me - but from what I have seen, it seems to me that Prelude is like the Cunene "framework" but done by people that know what they are doing. It covers all of the advantages described above, but shares none of its disadvantages. In particular:

  • it provides a sensible set of baseline defaults that "we all can agree on". I found it quite surprising that a plain Prelude looked almost like Cunene. Of course, no two Emacs users agree on anything, really, so there is still a lot to be tweaked. Having said that, the great thing is you can start by seeing what Prelude says and giving it a good go using it; if the baseline default does not work for you, you can always override it. Just because you have been doing something in a certain way for a long time does not mean its the best way, and the move to Prelude provides an opportunity to reevaluate a lot of "beliefs".
  • all the framework code is now shared by a large number of Emacs users. This means it is well designed and maintained, and all you have to worry about is your small extensibility points. With over 1k forks in GitHub, you can rest assured that Prelude will be around for a long time. In addition, if you find yourself changing something that is useful to the Prelude community, you can always submit a pull request and have that code shared with the community. You no longer have to worry about staleness or non-idiomatic code.
  • Prelude integrates nicely with several package managers and handles updates for you.
  • There are lots of examples of Prelude users - you just need to follow the GitHub forks. It would be nice to have a list of "good examples" though, because at 1K forks its not easy to locate those.
  • If you fork Prelude the right way, you should be able to update from upstream frequently without having too many conflicts. I am still getting my head around this, but the model seems sound at first blush.

But to know if it worked required using it in anger, and that's what will cover in the next few sections.

From Cunene to Prelude

Emacs users are creatures of habit and changing your entire workflow is not something to take lightly. Having said that, I always find that the best way to do it is to just go for it. After all, you can always go back to how you did things before. In addition, I did not want to do a wholesale port of Cunene for two reasons:

  • I didn't want to bring across any bad habits when Prelude was already solving a problem properly.
  • I wanted to get rid of all of the accumulated cruft that was no longer useful.

What follows are my notes on the porting work. This is a snapshot of the work, a few days into it. If there is a reason, I may do further write-ups to cover any new developments.

Initial Setup

Prelude recommends you to create a fork and then add to it your personal configuration. I decided to create a branch in which to store the personal configuration rather than pollute master. This has two advantages:

  • pulling from upstream will always be conflictless;
  • if I do decide to submit a pull request in the future, I can have a clean feature branch off of master that doesn't have any of the personal cruft in it.

As it happens, I later found out that other Prelude users also use this approach such as Daniel Wu, as you can see here. I ended up using Daniel's approach in quite a few cases.

I created my prelude fork in GitHub using the web interface. Once the fork was ready, I moved Cunene out of the way by renaming the existing .emacs.d directory and performed the following setup:

$ curl -L https://github.com/bbatsov/prelude/raw/master/utils/installer.sh -o installer.sh
$ chmod +x installer.sh
$ ./installer.sh -s git@github.com:mcraveiro/prelude.git

This created a Prelude-based ~/.emacs.d, cloned off of my fork. I then setup upstream:

$ cd ~/.emacs.d
$ git remote add upstream git@github.com:bbatsov/prelude.git

This means I can now get latest from upstream by simply doing:

$ git checkout master
$ git pull upstream master
$ git push origin master

I then setup the personal branch:

 $ git branch --track personal origin/personal
 $ git branch
   master
 * personal

For good measure, I also setup personal to be the default branch in GitHub. This hopefully means there is one less configuration step when setting up new machines. Once all of that was done, I got ready to start Emacs 24. The version in Debian Testing at present is 24.4.1 - not quite the latest (24.5 is out) but recent enough for those of us stuck in 23.

The start-up was a bit slow; Prelude downloaded a number of packages, taking perhaps a couple of minutes and eventually was ready. For good measure I closed Emacs and started it again; the restart took a few seconds, which was quite pleasing. I was ready to start exploring Prelude.

The "Editor" Configuration

My first step in configuration was to create a init.el file under .emacs.d/personal and add prelude-personal-editor.el. I decided to follow this naming convention by looking at the Prelude core directory; seems vaguely in keeping. This file will be used for a number of minor tweaks that are not directly related to an obvious major mode (at least from a layman's perspective).

  • Fonts, Colours and Related Cosmetics

    The first thing I found myself tweaking was the default colour theme. Whilst I actually quite like Zenburn, I find I need a black background and my font of choice. After consulting a number of articles such as Emacs Prelude: Background Color and the Emacs Wiki, I decided to go with this approach:

    ;; set the current frame background and font.
    (set-background-color "black")
    (set-frame-font "Inconsolata Bold 16" nil t)
    
    ;; set the font and background for all other frames.
    (add-to-list 'default-frame-alist
                 '(background-color . "black")
                 '(font .  "Inconsolata Bold 16"))
    

    The font works like a charm, but for some reason the colour gets reset during start-up. On the plus side, new frames are setup correctly. I have raised an issue with Prelude: What is the correct way to update the background colour in personal configuration? For now there is nothing for it but to update the colour manually. Since I don't restart Emacs very often this is not an urgent problem.

    One nice touch was that font-lock is already global so there is no need for additional configuration there.

  • Widgets and Related Cosmetics

    Pleasantly, Prelude already excludes a lot of annoying screen artefacts and it comes with mouse wheel support out of the box - which is nice. All and all, a large number of options where already setup the way I like it:

    • no splash screen;
    • no menu-bars or tool-bars;
    • good frame title format with the buffer name;
    • no annoying visible bell;
    • displaying of column and line numbers, as well as size of buffers out of the box;
    • not only search had highlight, but the all shiny Anzu mode is even niftier!
    • no need for hacks like fontify-frame.

    However, Preludes includes scroll-bars and tool-tips - things I do not use since I like to stick to the keyboard. It also didn't have date and time in the mode line; and for good measure, I disabled clever window splitting as I found it a pain in the past. Having said that, I am still not 100% happy with time and date since it consumes a lot of screen real estate. This will be revisited at some point in the context of diminish and other mode line helpers.

    ;; disable scroll bar
    (scroll-bar-mode -1)
    
    ;; disable tool tips
    (when window-system
      (tooltip-mode -1))
    
    ;; time and date
    (setq display-time-24hr-format t)
    (setq display-time-day-and-date t)
    (display-time)
    

    One note on line highlighting. Whilst I quite like this feature in select places such as grep and dired, I am not a fan of using it globally like Prelude does. However, I decided to give it a try and disable it later if it becomes too annoying.

  • Tabs, Spaces, Newlines and Indentation

    In the realm of "spacing", Prelude scores well:

    • no silly adding of new lines when scrolling down, or asking when adding a new line at save;
    • pasting performs indentation automatically (yank indent etc)- default handling of tabs and spaces is fairly sensible - except for the eight spaces for a tab! A few minor things are missing such as untabify-buffer. These may warrant a pull request at some point in the near future.
    • a nice whitespace mode which is not quite the same as I had it in Cunene but seems to be equally as capable so I'll stick to it.
  • To Prompt or Not to Prompt

    There are a few cases where me and Prelude are at odds when it comes to prompts. First, I seem to try to exit Emacs by mistake and I do that a lot. As any heavy Emacs user will tell you, there is nothing more annoying than exiting Emacs by mistake (in fact, when else do you exit Emacs?). I normally have more than 50 buffers open and not only does it take forever to bring up Emacs with that much state, but it never quite comes back up exactly the way I left it. Anyway, suffices to say that I strongly believe in the "are you sure you want to exit Emacs" prompt, so I had that copied over from Cunene. And, of course, one does not like typing "yes" when "y" suffices:

    ;; Make all "yes or no" prompts show "y or n" instead
    (fset 'yes-or-no-p 'y-or-n-p)
    
    ;; confirm exit
    (global-set-key
     (kbd "C-x C-c")
     '(lambda ()
        (interactive)
        (if (y-or-n-p-with-timeout "Do you really want to exit Emacs ?" 4 nil)
            (save-buffers-kill-emacs))))
    

    There is a nice touch in Prelude enabling a few disabled modes such as upper/down casing of regions - or perhaps the powers that be changed that for Emacs 24. Whoever is responsible, its certainly nice not to have to worry about it.

  • Keybindings

    One of the biggest cultural shocks, inevitably, happened with keybindings. I am giving Prelude the benefit of the doubt - even though my muscle memory is not happy at all. The following has proved annoying:

    • Apparently arrow keys are discouraged. Or so I keep hearing in my minibuffer every time I press one. As it happens, the warnings are making me press them less.
    • C-b was my ido key. However, since I should really not be using the arrow keys, I had to get used to using the slightly more standard C-x b.
    • Eassist include/implementation toggling was mapped to M-o and M-i was my quick way of opening includes in semantic (more on that later). However, these bindings don't seem to work any more.
    • pc-select is a bit screwed in some modes such as C++ and Emacs lisp. But that's alright since you shouldn't be using the arrow keys right? What is annoying is that it works ok'ish in Org-mode so I find that I behave differently depending on the mode I'm on.
    • in addition, win-move is using the default shift-arrow keys and its not setup to handle multiple frames. This is a problem as I always have a few frames. These will have to be changed, if nothing else just to preserve my sanity.
    • talking about pc-select, I still find myself pasting with C-v. I just can't help it, its buried too deeply into the muscle memory. But it must be said, it's rather disconcerting to see your screen move up when you press C-v; it makes you think your paste has totally screwed up the buffer, when in reality its just the good old muscle memory biting again.
    • C-x u now doesn't just undo like it used to. On the plus side, undo-tree just rocks! We'll cover it below.
    • C-backspace doesn't just delete the last word, it seems to kill a whole line. Will take some getting used to.

    All and all, after a few days, the muscle memory seems to have adapted well enough. I'm hoping I'll soon be able to use C-b and C-f without thinking, like a real Emacs user.

Modes From Cunene

Unfortunately, package management was not quite as complete as I had hoped and so, yet again, I ended up with a number of modes that had to be copied into git. Fortunately these are a lot less in number. I decided to place them under personal/vendor as I wasn't sure what the main vendor folder was for.

  • Cedet

    After almost losing my mind trying to configure Cedet from Emacs 24, I decided to bite the bullet and upgrade to the latest development version. In the past this was a safe bet; I'm afraid to report it still is the best way to get Cedet up and running. In fact, I got it working within minutes after updating to develop versus a whole day of fighting against the built-in version. Pleasantly, it is now available in git:

    git clone http://git.code.sf.net/p/cedet/git cedet
    

    Building it was a simple matter of calling make, both at the top-level and in contrib:

    $ cd cedet
    $ make EMACS=emacs24
    $ cd contrib
    $ make EMACS=emacs24
    

    The setup was directly copied from their INSTALL document, so I recommend reading that.

    Still in terms of Cedet, a very large win was the move to EDE Compilation Database. I really cannot even begin to do justice to the joys of this mode - it is truly wonderful. I did the tiniest of changes to my build process by defining an extra macro:

    cmake ../../../dogen -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE
    

    With just that - and a couple of lisp incantations (see the cedet init file) - and suddenly I stopped having to worry about supplying flags to flymake (well, flycheck - but that's another story), semantic, the whole shebang. I haven't quite worked out all of the details just yet, but with very little configuration the compilation database seems to just get everything working magically.

    Because of this, I am now finding myself using Cedet a lot more; the intelisense seems to just work on the majority of cases. The only snag is the annoyance of old: having Emacs block on occasion whilst it builds some semantic database or other. It doesn't happen often but its still a pain when it does. Which gave me the idea of replacing it with a Clang based "semantic database generator". Lets see what the Cedet mailinglist says about it.

    All and all, Cedet is much improved from the olden days; so much so I feel it warrants a proper review after a few months of using it in anger. In fact, I feel so brave I may even setup emacs-refactor or semantic-refactor. It is also high-time to revisit C/C++ Development Environment for Emacs and pick up some new tips.

  • Git-emacs

    Git-emacs makes me a bit sad. In truth, I am a perfectly content magit user (more on that later) except for one feature - the file status "dot". This is something I got used from the svn days and still find it quite useful. Its silly really, especially in these days of git-gutter, but I still like to know if there have been any changes to a file or not, and I haven't found a good way of doing this outside of git-emacs. It provides a nice little red or green dot in the modeline, like so:

    git-emacs.png

    Figure 1: Git-emacs state modeline

    However, there are no packaged versions of git-emacs and since everyone uses magit these days, I can't see it making to Elpa. Also, it is rather annoying having to load the whole of git-emacs for a dot, but there you go.

  • Doxymacs

    Very much in the same vein as git-emacs, doxymacs is also one of those more historical modes that seem a bit unmaintained. And very much like git-emacs, I only use it for the tiniest of reasons: it syntax-highlights my doxygen comments. I know, I know. On the plus side, it seems to do a whole load of other stuff - I just never quite seem to need any other feature besides the nice syntax highlighting of comments.

Modes From Prelude or Emacs 24

In this section we cover modes that are either new/updated for Emacs 24 or available from Prelude via Elpa.

  • Dired

    Dired is configured in a fairly sensible manner out of the box. For example, one no longer has the annoying prompts when deleting/copying directories with files - it never occurred to me you could configure that away for some reason.

    On the down side, it is not configured with dired-single, so the usual proliferation of dired buffers still occurs. I have decided not to setup dired-single for a few days and see how bad it gets.

    The other, much more annoying problem was that hidden files are displayed by default. I first tried solving this problem with dired-omit as per this page:

    (setq-default dired-omit-mode t)
    (setq-default dired-omit-files "^\\.?#\\|^\\.$\\|^\\.\\.$\\|^\\.")
    

    However, I found that omit with regexes is not that performant. So I ended up going back to the old setup of ls flags:

    (setq dired-listing-switches "-l")
    
  • Undo-tree and browse-kill-ring

    As mentioned before, C-x u is not just undo, it's undo-tree! Somehow I had missed this mode altogether up til now. Its pretty nifty, as it allows you to navigate the undo-tree - including forks. It is quite cool.

    I also found that the latest version of browse-kill-ring is very nice; so much so that I find myself using it a lot more now. The management of the clipboard will never be the same.

  • Org-mode

    One rather annoying thing was that with the latest Org-mode, the clock-table is a bit broken. I quickly found out I wasn't the only one to notice: Is it possible to remove ' ' from clock report but preserve indentation?

    This link implies the problem is fixed in Emacs 24.4, but I am running it and sadly it doesn't seem to be the case. I also found out that the automatic resizing of clock tables is no longer… well, automatic. Instead, we now have to supply the size. My final setup for the clock-table is as follows:

    #+begin: clocktable :maxlevel 3 :scope subtree :indent nil :emphasize nil :scope file :narrow 75
    

    This seems to generate a table that is largely like the ones we had prior to upgrading.

    Other than that, Org-mode has behaved - but then again, I'm not exactly a poweruser.

  • Bongo

    I use the amazing Bongo media player to play the few internet radio stations I listen to - mainly SomaFM, to be honest. Its good to see it in Melpa. It's still not quite as straightforward as you'd like to save a playlist - I always find that loading the buffer itself does not trigger bongo mode for some reason - but other than that, it works fine.

    On the downside, I use the venerable mpg123 to play random albums and that hasn't made it to Melpa yet. I've decided to try to use Bongo for this use case too, but if that doesn't work out then I'll have to add it to vendor…

  • Shell

    Prelude comes with eshell configured by default. I must confess I have always been a bash user - simple and easy. I'll persevere with eshell for a couple of days, but I can already see that this may be a bridge too far.

  • Flycheck

    One of the main reasons that made me consider moving to prelude was Flymake. I added it to cunene fairly early on, some 6 years ago, and I was amazed at how I had managed to use Emacs for over a decade without using Flymake. However, after a good 6 years of intensive usage, I can attest that Flymake is showing its age. The main problem is how it locks up Emacs whilst updating. If you combine that with the insane errors one gets in C++, all you need is an angle-bracket out of place and your coding flow is disrupted for potentially several minutes. To be fair, this happens very infrequently, but its still a major nuisance. So I was keen to explore Flycheck.

    All I can say is: wow! The same feeling of amazement I felt for Flymake when I first used has been repeated with Flycheck. Not only its blazingly fast, it supports multiple checkers and the errors buffer is a dream to work with. And with the Compilation Database integration it means there is no configuration required. I can't believe I survived this long without Flycheck!

  • Magit

    One of my favourite modes in Emacs - at least of the new generation of modes - is Magit. So much so that I find that I rarely use git from anywhere else, it's just so easy to do it from Magit. Which makes me extremely sensitive to any changes to Magit's interface.

    The version in Prelude - presumably from Melpa - is a tad different from the legacy one I was using in Cunene. On the plus side, most of the changes are improvements such as having a "running history" in the git process buffer, with font-lock support. The main Magit buffer also looks very nice, with lots of little usability touches. A tiny few changes did result in slow-downs of my workflow, such as a sub-menu on commit. Its not ideal but presumably one will get used to it.

    The only negative change seems to be that Magit is not quite as responsive as it used to be. Hard to put a finger yet, but I was used to having pretty much zero wait time on all operations in Magit, and yet now it seems that a few things are no longer instantaneous. It will require some more analysis to properly point the finger, but its a general feel.

Conclusions

It's still early days, but the move to Emacs 24 and Prelude is already paying off. The transition has not been entirely straightforward, and it certainly has slowed things down for the moment - if not for anything else, just due to the keybinding changes! But one can already see that this is the future for most Emacs users, particularly those that are not power-users like myself but just like the editor.

The future is certainly bright for Emacs. And we haven't yet started covering the latest and greatest modes such as smart-mode-line. But that's a story for another blog post.

Created: 2015-05-27 Wed 00:18

Emacs 24.4.1 (Org mode 8.2.10)

Validate

Thursday, September 25, 2014

Nerd Food: Start-ups at the Gate: Trends in the Technology Industry

Nerd Food: Start-ups at the Gate: Trends in the Technology Industry

It is very difficult to convey the vast scale at which the largest Internet companies operate. To make matters worse, we are fast becoming immune to statistics such as one billion users and five trillion searches per day, surrounded as we are by a sea of large numbers on a daily basis. Having said that, any Information Technology (IT) professional worth his or her salt cannot help but feel in awe at what has been achieved. It is not just that these platforms are big; they work at a scale that is qualitatively different from anything that has come before. The sort of things that are possible at this scale are mind-boggling, and we have only begun to scratch the surface1.

Perhaps even more revolutionary is the fact that these companies have made it possible for anyone to start thinking about data in the same way as they do, and to start handling it using the very same tools they use. There is now a never-ending archive of the very best large-scalability tools, all available for free, with code that anyone can inspect, modify and optimise to meet their specific requirements. The tools come with a wealth of practical documentation on how to put solutions together - either freely available or at low-cost - and with a number of passionate user communities that provide expert advice and are eager to accept modifications.

The ecosystem they have created is truly staggering. As an example, Facebook has open sourced almost 10M lines of code to date. Twitter, Google and LinkedIn are not far behind2. It is also important to notice that non-Internet companies are making extremely large contributions too, such as Microsoft and IBM. All told, the overall pool of open source code is growing exponentially, as demonstrated by a 2008 study. In most cases, these are full-fledged products, tested in the most challenging production conditions imaginable. Of course, one must also not forget the contributions made to projects that are not under company control such as the Linux Kernel, the Apache web-server and the GNU Compiler GCC.

In order to understand why modern start-ups provide such a compelling financial case, one must first understand how we got to the amazing technology landscape we have today. To do so, we shall divide recent technology history into eras, and explain each era's contribution. We will then focus on modern start-ups, and explain how this model can be deployed to a large gamut of industries and in particular to the financial sector.

First Era: Dot-com Bubble

Silicon Valley was and still is the world's start-up factory so, unsurprisingly, it was ground zero for the start-up revolution that took place at the end of the nineties. It would eventually be known as The Dot-com Bubble. Most people remember those days as a heady time, where each and every idea was packaged as a website and sold for millions or in some cases billions of dollars. Of course, we all had a steep price to pay when the bubble burst - an extinction event that decimated the young Internet sector and IT companies in general.

There is however another way to look at this bubble: it was a gigantic experiment to determine whether there were successful business models to be found in the large scale of the Internet. Whilst much mal-investment occurred, the bubble still produced or pushed forward several of the giants of today such as Google, Amazon and Yahoo.

Most of these companies share a similar technology story. Originally faced with a dearth of investment but with bright young engineers, they found themselves relying on Free and Open Source Software (FOSS) and cheap, off-the shelf hardware. Once they became big enough, it just didn't make sense to replace all of that infrastructure with software and hardware supplied by commercial vendors.

This turn of events was crucial. If these companies had had larger budgets and less skilled engineers, they would have relied on the cutting edge technology of the time. The short-term gain would reveal itself as long term pain, for their ability to scale would be inevitably restricted. In addition, many of the business models wouldn't have worked due to this cost structure3. As it was, since they couldn't even afford the relatively cheap licences of commercial software, they had to make do with what was available for free.

The engineers in these companies - and many others that didn't make it through the dot-com filter - spent countless hours improving FOSS tools and gave back much of these improvements to communities such as Linux, MySQL, Apache, GCC and so on. However, they kept private the plumbing work done to manage the large cluster of cheap machines, as well as the domain related technology - in industry-speak, the Secret Sauce.

By the time the dot-com bubble had run its course and the dust settled, the landscape looked as follows:

  • A model had been created whereby a small number of engineers could bootstrap an Internet-based company at very low cost, serving a small number of users initially.
  • The model had been stretched to very large numbers of users and had been found to scale extremely well; as the business proved itself and investment came in, it was possible increase the size of the computing infrastructure to cope with demand.
  • Because of the open nature of the technologies involved, the ideas became widespread over the internet.

The basic high-scalability FOSS stack - ready for start-ups - was born; the Data Centre, where large amounts of computing are available at low cost, soon followed. It would eventually morph into the Cloud.

Second Era: Social Media

The bursting of the dot-com bubble did not dampen the entrepreneurial spirits, but it did dry up all the easily available capital and thus pushed the aspiring start-ups to be ever more frugal. In addition, VCs started to look for better ways to evaluate prospects. The problem they faced was no different from what they had faced during the dot-com days: how to figure out the potential of a company with no defined business model and nothing else to compare it against.

Google had proved comprehensively that the traditional valuation methods did not make sense in the world of start-ups. After all, here was a company which it's founders couldn't sell for 1M USD and yet a few years later was generating billions of dollars in revenues. Very few saw this coming. VCs were keen not to make the same mistake with the next Google4.

So it was that a system to determine potential by proxy emerged over the years, using indicators such as the size of the user base, time spent by users on the platform and so on - effectively, any attribute that was deemed to have given a competitive advantage to Google and other successful dot-com companies.

In this environment, social media start-ups took took centre stage. Following on from the examples of their predecessors, these companies took for granted that they were to operate on very large data sets. They inherited a very good set of scalable tools, but found that much still had to be built on top. Unlike their predecessors, many chose to do some or all of the infrastructure work out in the open, joining or creating new communities around the tools. This was in no small part due to the scarcity of funds, which encouraged collaboration.

The social media start-ups soon found themselves locked in an arms race for size, where the biggest would be the winner and all others would be doomed to irrelevance5. The size of the user base of the successful companies exploded6, and the tooling required to manage such incredibly large volumes of data had to improve at the same pace or faster. Interestingly, these start-ups continued to view in-house code largely as a cost, not an asset, even after they started to bring in large revenue. The size of the secret sauce was to be kept at a minimum and the pace of open sourcing accelerated over time7.

A final factor was the rise of the next iteration of the data centre, popularised by Amazon with AWS and EC2. It allowed any company to scale out without ever having to concern themselves with physical hardware. This was revolutionary because it allowed razor-thin costs for scalability:

  • Pay only for what you use: the elastic nature of EC2 meant that one could grow or shrink one's cluster based on real time traffic demands and availability of capital.
  • Zero-cost software: FOSS was available in Amazon from the very beginning and was extremely popular with start-ups.
  • Fully automated environments via APIs: resource constrained start-ups could now start to automate all aspects of the product life-cycle. This meant they could release faster, which in turn allowed them to fight more effectively for their user base. This would in time become the DevOps movement.

By the end of the decade, the scalability tooling was largely complete. It was now possible for a small start-up to create a small website and to see it scale from hundreds to millions, restricted only by their ability to bring in capital.

Third Era: Mobile

Mobile phones have been growing close to an exponential rate for over two decades. However, the rise of the smart phones was a game changer, and the line in the sand was drawn with the release of the iPhone. What makes mobile so important to our story is it's penetration. Until smart phones became ubiquitous, there was a large segment of the population that was either totally inaccessible or accessible for limited periods of time. With increasingly large numbers of people carrying smart phones as they go about their day, many use cases that were never before thought possible came to the table. So whilst we call this "the Mobile era", the true heroes are smart phones and, to a smaller extent, the tablets.

The mobile era started with simple apps. Smart phones were still new and applications for each platform were novelty. There was a need to reinvent all that existed before in the world of PCs and adapt it to the new form factor. It was during this phase that the economies of scale of mobile phones became obvious. Whereas consumer PC software had prices on the range of tens to hundreds of dollars, mobile phones bootstrapped a completely different pricing model, with many apps selling for less than one dollar. Volume made up for the loss in revenue per unit. The model was so incredibly successful that a vibrant environment of apps sprung up around each of the successful platforms, carefully nurtured by the companies running the show via their app stores.

Soon enough the more complex apps came about. Companies like Four Square and WhatsApp were trailblazers in the mobile space, merging it with ideas from social media. Many others like Spotify took their wares from the stagnant PC environment and moved to the ever growing mobile space. Complex apps differed from the simple apps in that they required large backends to manage operations. Since these companies were cash strapped - a perennial condition of all start-ups - they found themselves reusing all of the technology developed by the social media companies and became part of the exact same landscape. Of course, the social media companies were eventually forced to jump on the mobile bandwagon - lest they got crushed by it.

So it was that the circle was closed between the three eras.

Evolutionary Pressures and Auto-Catalytic Processes

The changes just described are so revolutionary that one cannot help but look for models to approximate some kind of explanation for what took place. Two stand out. The first is to imagine the population of start-up companies as a small segment of the overall company population that was submitted to an unbelievably harsh fitness function: to grow the data volumes exponentially while growing costs less than linearly. This filter generated new kinds of companies, new kinds of technologies and new kinds of ways of managing technology.

Secondly, there is the auto-catalytic nature of the processes that shaped the current technology landscape. Exponential growth tends to have at its root this kind of self-reinforcing cycle, whereby improvements in an area A trigger improvements in another area B, which in turn forces A to improve. The process keeps on repeating itself whilst it manages to retain stability.

It is this relationship we currently have between start-ups and FOSS: the better the software gets, the cheaper it is to create new start-ups and the faster these can grow with the same amount of capital. By the same token, the more start-ups rely on FOSS, the more they find themselves contributing back or else risk falling behind - both technologically and cost-wise. This feedback loop is an emerging property of the entire system and it has become extremely pronounced over time.

Finance and the Age of Disruption

The concept of disruption was developed in the nineties by Clayton Christensen in Innovator's Dilemma. This book has seen a resurgence in popularity as well as in criticism8. For good or bad, the ideas in this book became the intellectual underpinnings of a new generation of start-ups.

They seek to combine all of the advances of the previous start-ups to create solutions to problems far outside the traditional IT realm. Examples are the hotel industry (AirBnB), the taxi industry (Uber, Lyft) and even the banking industry (Simple). Whilst it's still early days, and whilst there have been many teething problems such as issues with regulation, the destination of travel is already clear: there will be more and more start-ups following the disruptive route.

What makes these companies a compelling proposition to VCs is that they are willing to take on established concerns, with cost structures that are orders of magnitude larger than that of these start-ups. Their thinking is two-fold: the established companies are leaving a lot of money on the table, consumed by their inefficiency; and they are not exploiting the opportunities to their full potential because they do not understand how to operate at a vast scale.

It is in this context that finance scene comes into the picture - as part of the expansionary movement of the disruption movement. VCs have longed eyed enviously the financial industry because they believed that the problems being solved in trading are not that dissimilar to those faced by many large scale start-ups. And yet the rewards are disproportional large in Finance, when compared with say social media.

Fintech soon emerged. As applied to start-ups, Fintech is the umbrella name given to the ecosystem of start-ups and VCs that focus specifically on financial technology. This ecosystem has grown from 930M USD in 2008 to around 3Bn in 2013 according to Accenture. Centred mainly in London, but with smaller offshoots in other financial centres, the Fintech scene is starting to attract established players in the world of Finance. For instance, Barclays has joined the fray by creating an incubator. They farmed off the work to a third-party (Tech Stars) but allowed all the start-ups in the programme to have unprecedented access to their Mobile APIs. Their target is to own the next generation of financial applications on Mobile devices.

Whist Barclays is disrupting from the outside, it is obvious that the investment banking legacy platforms are a fertile ground for start-ups. This is where the scalability stack has a near-perfect fit. A typical example is OpenGamma. The start-up designed an open source Risk platform, initially focused on back office use. They have received over 20M USD in funding as of 2014 and have already been the recipient of several of the industry's awards. There are now several open source trading platforms to choose from including TradeLink and OpenGamma, as well as the popular quantitative analytics library QuantLib.

As we have seen in the previous sections, there is an auto-catalytic process at play here. Once source code becomes widely available, the cost of creating the next Financial startup goes down dramatically because they can reuse the tools. This in turn means many more start-ups will emerge, thus improving the general quality of the publicly available source code.

Conclusions

The objective of this article was to provide a quick survey of the impact of start-up companies in the technology landscape, and how these relate to finance. We now turn our attention to the logical conclusions of these developments.

  • Finance will increasingly be the target of VCs and start-ups: The Fintech expansion is to continue over the coming years and it will affect everyone involved in the industry, particularly the established participants. More companies will take the route of Barclays, trying to be part of the revolution rather than dethroned by it.
  • Banks and other established companies will begin to acquire start-ups: Related to the previous item in some ways; but also with a twist. As part of the Deloitte TMT predictions event, Greg Rogers - the manager of Barclays Accelerator - stated that the acquisition of non-financial start-ups by banks was on the cards. He was speaking about Facebook's acquisition of WhatsApp for 18Bn USD, one of the largest of the year. As Google and Facebook begin integrating payments into their social platforms, banking firms will find their traditional business models under attack and will have no option but to retaliate.
  • Finance will turn increasingly to FOSS: The cost structure that finance firms had up to 2008 is not suitable to the post 2008 world. At present, the volume of regulatory work is allowing these cost structures to persist (and in cases increase). However, eventually banks will have to face reality and dramatically reduce their costs, in line with the new kind of revenues they are expected to make in a highly-regulated financial world. There will be a dramatic shift away from proprietary technologies of traditional vendors, unless these become much more competitive against their fierce FOSS rivals.
  • A FOSS financial stack will emerge over the next five years: Directly related to the previous point, but taking it further. Just as it was with social media companies, so it seems likely that financial firms will eventually realise that they cannot afford to maintain all the infrastructure code. Once an investment bank takes the leap and starts relying on FOSS for trading or back-office, the change will ripple through the industry. The state of the FOSS code is production ready, and a number of hedge funds are already using it in anger. All that is required is for the cost structure to be squeezed even further in the investment banking sector.

Footnotes:

1 As one of many examples, see Google Flu Trends. It is a predictor of outbreaks of the flu virus, with a prediction rate of about 97%. For a more comprehensive - if somewhat popular - take on the possibilities of large data sets, see Big Data: A Revolution That Will Transform How We Live, Work and Think. For a very different take - highliting the dangers of Big Data - see Taleb's views on the ever decreasing noise to signal ratio: The Noise Bottleneck or How Noise Explodes Faster than Data.

2 In fact, by some measures, Google has contributed several times that amount. For one such take, see Lauren Orsini's article.

3 As an example, it was common practice for vendors to charge according to the number of processors, users and so on. Many of the better funded start-ups made use of technology from Cisco, Sun, Oracle and other large commercial vendors, but companies that did so are not very well represented in the population that survived the dot-com bubble, and they are not represented at all in the 2014 Fortune 500 list. Google, Amazon and E-Bay are the only Fortune 500 companies from that crop and they all relied to a very large extent on in-house technology. Note though that we are making an empirical argument here rather than a statistical one, both due to the lack of data available, as well as concern for Survivorship Bias.

4 For one of many takes on the attempt to sell Google, see When Google Wanted To Sell To Excite For Under 1 Million~— And They Passed. To get a flavour of how poorly understood Google's future was as late as 2000, see Google Senses That It's Time to Grow Up. Finally, the success story is best told by the growth of revenues between 2001 and 2003 - see Google's 2003 Financial Tables.

5 Twitter, Facebook, YouTube, LinkedIn and the like were the victors, but for every victor, a worthwhile foe was defeated; MySpace, Hi5, Orkut and many others were all very popular at one time but lost the war and faded into obscurity.

6 As an example, the number of Facebook users grew at an exponential rate between 2004 and 2013 - see Facebook: 10 years of social networking, in numbers.

7 A possible explanation for this decision is the need for continuous scalability. Even companies as large as Facebook or Google cannot dedicate the resources required to adequately maintain every single tool they own; their code bases are just too large. At the same time, they cannot afford for code to become stale because it must continually withstand brutal scalability challenges. The solution to this conundrum was to open source aggressively and to create vibrant communities around tooling. Converting themselves to stewards of the tools, they could now place quasi-skeleton crews to give direction to development, and then rely on the swarms of new start-ups to contribute patches. Once there are enough improvements, the latest version of these tools can be incorporated into the internal infrastructure. This proved to be a very cost-effective strategy, even for large companies, and allowed continued investment across the technology stack.

8 There are quite a few to choose from but Lepore's is one of the best because it robustly attacks both the ideology and the quality of the data.

Date: 2014-09-25 21:37:47 BST

Org version 7.8.02 with Emacs version 23

Validate XHTML 1.0

Sunday, September 07, 2014

Nerd Food: Dogen: Old Demo

Nerd Food: Dogen: Old Demo

As part of my attempt to make the work in Dogen a bit more visible, I thought I'd repost an old demo here. The interface has changed very little since those days so it's still a useful introduction.

Date: 2014-09-07 22:23:58 BST

Org version 7.8.02 with Emacs version 24

Validate XHTML 1.0

Nerd Food: Dogen: Lessons in Incremental Coding

Nerd Food: Dogen: Lessons in Incremental Coding

A lot of interesting lessons have been learned during the development of Dogen and I'm rather afraid many more are still in store. As it is typical with agile, I'm constantly reviewing processes in search of improvements. One such idea was that putting pen to paper could help improving the retrospective process itself. The result is this rather long blog post, which hopefully is of use to developers in similar circumstances. Unlike the typical bullet-point based retrospective, this post it is a rambling narrative as it aims to provide context to the reader. Subsequent retrospectives will be a lot smaller and more to the point.

Talking about context: I haven't spoken very much about Dogen in this blog, so a small introduction is in order. Dogen is an attempt to create a domain model generator. The manual goes into quite a bit more detail, but for the purposes of this exercise, it suffices to think of it as a C++ code generator. Dogen has been developed continuously since 2012 - with a few dry spells - and reached its fiftieth sprint recently. Having said that, our road to a finished product is still a long one.

The remainder of this article looks at what what has worked and what has not worked so well thus far into Dogen's development history.

Understanding Time

Dogen was conceived when we were trying to do our first start up. Once that ended - around the back end of 2012 - I kept working on the tool in my spare time, and this was a setup that has continued ever since. There are no other contributors; development just keeps chugging along, slowly but steadily, with no pressures other than to enjoy the sights.

Working on my own and in my spare time meant that I had two conflicting requirements: very little development resources and very ambitious ideas that required lots of work. With family commitments and a full time job, I quickly found out that there weren't a lot of spare cycles left. In fact, after some analysis, I realised I was in a conundrum. Whilst there is was a lot of "dead-time" in the average week, it was mostly "low-quality grade time": lots of discontinued segments of varying and unpredictable lengths. Summed together in a naive way it seemed like a lot, but - as every programmer knows - six blocks of ten minutes do not one solid hour make.

Nevertheless, one has to play the game with the cards that were dealt. I soon realised that the correct question to ask was: "what kind of development style makes one productive under these conditions?". The answer turned out to be opportunistic coding. This is rooted in having a better understanding of the different "qualities" of time and how best to exploit them. For example, when you have say five to fifteen minutes available, it makes sense to do small updates to the manual or fix trivial problems - a typo in the documentation, renaming variables in a function, mopping up the backlog and other activities of that ilk. A solid block of forty minutes to an hour affords you more: for instance, implementing part or the whole of stories for which the analysis has been completed, or doing some analysis for existing stories. On those rare cases where half-a-day or longer is available, one must make the most of it and take on a complex piece of work that requires sustained concentration. This sessions proved to be most valuable when the output is a set of well defined stories that are ready for implementation.

One needs very good processes in order to be able to manage the usage of time in this fashion. Luckily, agile provides it.

Slow Motion Agile

Looking back on ~2.4k commits, one of the major wins in terms of development process was to think incrementally. Of course, agile already gives you a mental framework for that, and we had a functioning scrum process during our start up days: daily stand-ups, bi-weekly sprints, pre-sprint planning, post-sprint reviews, demos and all of that good stuff. It worked really well, and keep us honest and clean. We used a very simple org-mode file to keep track of all the open stories, and at one point we even built a simple burn-down chart generator to allow us to measure velocity.

Granted, when you are working alone in your spare time, a chunk of agile may not make sense; for instance, providing status updates to yourself may not be the most productive use of scarce time. Surprisingly, I found quite a bit of process to be vital. I've kept the bi-weekly sprint cycle, the sprint logs, the product backlog and the time-tracking we had originally setup and found them extremely useful - quite possibly the thing that has kept me going for such an extended period of time, to be brutally honest. When you are working on an open source project it is very easy to get lost in its open-ended-ness and find yourself giving up, particularly if you are not getting (or expecting) any user feedback. Even Linus himself has said many times he would have given up the kernel if it wasn't for other people bringing him problems to keep him interested.

Lacking Linus' ability to attract crowds of interested developers, I went for the next best thing: I made them up. Well, at least in metaphorical way, I guess, as this is what user stories are when you have no external users to drive them. As I am using the product in anger, I find it very easy to put myself in the head of a user and come up with requirements that push development forward. These stories really help, because they transform the cloud of possibilities into concrete, simple, measurable deliverables that one can choose to deliver or not. Once you have a set of stories, you have no excuse to be lazy because you can visualise in your head just how much effort it would require you to implement a story - and hey, since nerds are terrible at estimating, it's never that much effort at all. As everyone knows, it's not quite that easy in the end; but once you've started, you get the feeling you have to at least finish the task at hand, and so on, one story at a time, one sprint at a time, until a body of work starts building up. It's slow, excruciatingly slow, but it's steady like water working in geological time; when you look back 5 sprints, you cannot help but be amazed on how much can be achieved in such a incremental way - and how much is still left.

And then you get hooked into measurements. I now love measuring everything, from how long it takes me to complete a story, to where time goes in an sprint, to how many commits I do a day, to, well, everything that can easily be measured without adding any overhead. There is no incentive for you to game the system - hell, you could create a script that commits 20 times a day, if the commit count is all you care about. But it's not, so why bother. Due to this, statistics start to actually tell you valuable information about the world and to impel you forward. For instance, GitHub streaks mean that I always try to at least make one commit per day. Because of this, even on days when I'm tired, I always force my self to do something and sometimes that quick commit morphs into an hour or two of work that wouldn't have happened otherwise.

As I mentioned before, it was revealing to find out that there are different types of time. In order to to take advantage of this heterogeneity, one must make scrupulous use of the product backlog. This has proven invaluable, as you can attest by its current size. Whether we are part way through a story or just idly daydreaming, each and every idea must be added to the product backlog, with sufficient detail to allow one to reconstruct one's train of thought at that point in time. Once in the backlog, items can be continuously refined until eventually we find a suitable sprint to tackle them or they get deprecated altogether. But without an healthy backlog it is not possible to make the most these illusive time slots. Conversely, it is important to try to make each story as small and as focused as possible, and to minimise spikes unless they really are on the critical path of the story. This is mainly for psychological reasons: one needs to mark stories as complete, to feel like work has been done. Never-ending stories are just bad for morale.

In general, this extreme incrementalism has served us well. Not all is positive though. The worst problem has been a great difficulty in tackling complex problems - those that require several hours just to load them into your head. These are unavoidable in any sufficiently large code base. Having lots of discontinued segments of unpredictable duration have reduced efficiency considerably. In particular, I notice I have spent a lot more time lost in conceptual circles, and I've taken a lot longer to explore alternatives when compared to working full time.

DVCS to the Core

We had already started to use git during the start-up days, and it had proved to be a major win at the time. After all, one never quite knows where one will be coding from, and whether internet access is available or not, so it's important to have a self-contained environment. In the end we found out it brought many, many more advantages such as great collaborative flows, good managed web interfaces/hosting providers (GitHub and, to some extent, BitBucket), amazing raw speed even on low-powered machines, and a number of other wins - all covered by lots and lots of posts around the web, so I won't bore you with that.

On the surface it may seem that DVCS is most useful on a multi-developer team. This is not the case. The more discontinued your time is, the more you start appreciating its distributed nature. This is because each "kind" of time has a more suitable device - perhaps a netbook for the train, a desktop at someone's house or even a phone while waiting somewhere. With DVCS you can easily to switch devices and continue exactly where you left off. With GitHub you can even author using the web interface, so a mobile phone suddenly becomes useful for reading and writing.

Another decision that turned out to be a major win is still not the done thing. Ever the trailblazers, we decided to put everything related to the project in version control. And by "everything" I do mean everything: documentation, bug reports, agile process, blog posts, the whole lot. It did seem a bit silly not to use GitHub's Wiki and Issues at the time, but, on hindsight, having everything in one versioned controlled place proved to be a major win:

  • searching is never further than a couple of greps away, and it's not sensitive to connectivity;
  • all you need is a tiny sliver of connectivity to push or pull, and work can be batched to wait for that moment;
  • updates by other people come in as commits and can be easily reviewed as part of the normal push/pull process - not that we got any of late, to be fair;
  • changes can easily be diffed;
  • history can be checked using the familiar version control interface, which is available wherever you go.

When you have little time, these advantages are life-savers.

The last but very important lesson learned was to commit early and commit often. It's rather obvious in hindsight, really. After all, if you have very small blocks of time to do work, you want to make sure you don't break anything; last thing you need is to spend a week debugging a tricky problem, with no idea of where you're going or how far you still have to travel. So it's important to make your commits very small and very focused such that a bisection would almost immediately reveal a problem - or at least provide you with an obvious rollback strategy. This has proved itself to be invaluable far too many times to count. The gist of this approach it is to split changes in an almost OCD sort of way, to the point that anyone can look at the commit comment and the commit diff and make a judgement as to whether the change was correct or not. To be fair, it's not quite always that straightforward, but that has been the overall aim.

Struggling to stay Continuously Integrated

After the commit comes the build, and the proof is in the pudding, as they say. When it comes to code, that largely means CI; granted, it may not be a very reliable proof, but nevertheless it is the best proof we've got. One of the major wins from the start up days was to setup CI, and to give it as wide a coverage as we could muster. We setup multiple build agents across compilers and platforms, added dynamic analysis, code coverage, packaging and basic sanity tests on those packages.

All of these have proven to be major steps in keeping the show on the road, and once setup, they were normally fairly trivial to maintain. We did have a couple of minor issues with CDash whilst we were running our own server. Eventually we moved over to the hosted CDash server but it has limitations on the number of builds, which meant I had to switch some build agents off. In addition to this, the main other stumbling block is finding the time to do large infrastructural updates to the build agents such as setting up new versions of Boost, new compilers and so on. These are horrendously time consuming across platforms because you never know what issues you are going to hit, and each platform has their own way of doing things.

The biggest lesson we learned here is that CI is vital but software products with no time at all should not waste time managing their own CI. There are just not enough hours in the day. I have been looking into travis to make this process easier in the future. Also, whilst being cross-platform is a very worthy objective, one has to weigh the costs with the benefits. If you have a tiny user base, it may make sense to stick to one platform and continue to do portable coding without "proof"; once users start asking for multiple platforms, it is then worth considering doing the work required to support them.

The packaging story was also a very good one to start off with - after all, most users will probably rely on those - but it turned out to be much harder than first thought. We spent quite a bit of time integrating with the GitHub API, uploading packages into their downloads section, downloading them from there, testing, and then renaming them for user consumption. Whilst it lasted, this setup was very useful. Unfortunately it didn't last very long as GitHub decided to decommission their downloads section. Since most of the upload and download code was GitHub specific, we could not readily move over to a different location. The lesson here was that this sort of functionality is extremely useful, and it is worth dedicating time to it, but one should always have a plan B and even a plan C. To make a long story short, the end result is that we don't have any downloads available at all - not even a stale ones - nor do we have any sanity checks on packages we produce; they basically go to /dev/null.

In summary, all of our pains led us to conclude that one should externalise early, externalise often and externalise everything. If there is a free (or cheap) provider in the cloud that can take on some or all of your infrastructure work away, you should always consider using them first rather than host your own infrastructure. And remember: your time is worth some money, and it is better spent coding. Of course, it is important to ensure that the provider is reliable, has been around for a while and is used by a critical mass. There is nothing worse than spending a lot of effort migrating to a platform, only to find out that it is about to dramatically change its APIs, prices, terms and conditions - or even worse, to be shutdown altogether.

Loosely Coupled

Another very useful lesson I learned was to keep the off-distro dependencies to a minimum. This is rather related to the previous points on CI and cross-platform-ness, really. During the start up days we started off by requiring a C++ compiler with good C++ 11 support, and a Boost library with a few off-tree libraries - mainly Boost.Log. This meant we had to have our own little "chroot" with all of these, and we had to build them by hand, sprinkled with plenty of helper scripts. In those dark days, almost nothing was supplied by the distro and life was painful. It was just about workable when we had time on our hands, but this is really not the sort of thing you want to spend time maintaining if you are working on a project in your spare time.

To be fair, I had always intended to move to distro-supplied packages as soon as they caught up, and when that happened the transition was smooth enough. As things stand, we have a very small off-distro footprint - mainly ODB and EOS. The additional advantage of not having off-distro dependencies is that you can start to consider yourself for inclusion on a distro. Even in these days of Docker, being shipped by a distro is still a good milestone for any open source project, so it's important to aim for it. Once more, it's the old psychological factors.

All and all, it seems to me we took the right decisions as both C++ 11 and Boost.Log have proven quite useful; but in the future I certainly will think very carefully about adding dependencies to off-distro libraries.

Conclusions

In general, the first fifty iterations of Dogen have been very positive. It has been a rather interesting journey, and dealing with pure uncertainty is not always easy - after all, one always wants to reach a destination. At the same time, much has been learned in the process, and a setup has been created that is sustainable given the available resources. In the near future I intend to improve the visibility of the project as I believe that, for all it's faults, it is still useful in its current form.

Date: 2014-09-07 22:02:42 BST

Org version 7.8.02 with Emacs version 24

Validate XHTML 1.0

Friday, August 08, 2014

Nerd Food: Using Mono In Anger - Part IV

Nerd Food: Using Mono In Anger - Part IV

In which we discuss the advances in MonoDevelop 5

This is the fourth and final part of a series of posts on my experiences using Mono for a fairly demanding project. For more context please read part 1, part 2 and part 3.

In this instalment we shall have a look at latest incarnation of MonoDevelop.

Getting Latest and Greatest

As I was part-way through these series of blog posts, Xamarin announced Xamarin Studio 5 - the commercial product based off of MonoDevelop. Clearly I had to get my hands on it. However, in this particular instance Debian unstable was proven to be rather… stable. The latest versions of Mono and MonoDevelop are rather quaint, and the packaging mailing list is not the most active, as my request for news on packaging revealed.

Building is not an entirely trivial experience, as Brendan's comment on a previous post demonstrated, so I was keen on going for binary packages. Surprisingly, there are not many private repos that publish up-to-date debian packages for mono. After much searching, I found an Ubuntu PPA that did:

add-apt-repository 'deb  http://ppa.launchpad.net/ermshiperete/monodevelop/ubuntu quantal main'
apt-get install monodevelop-current

Running it was as easy as using the launcher script:

/opt/monodevelop/bin/monodevelop-launcher.sh

And just as I was about to moan from the sidelines and beg Xamarin to try and help out Debian and Linux packagers in general, Miguel sent the following tweet:

Miguel de Icaza‏@migueldeicaza 4h Mono snapshots: @directhex just published our daily Linux packages http://mono-project.com/DistroPackages/Jenkins

It's like Xamarin just reads my mind!

Haven't had the chance to play with these packages yet, and I didn't see any references to MonoDevelop in Jenkins (admittedly, it wasn't the deepest search I've done), but seems like a great step forward.

Playing with Latest and Greatest

So what has changed? The UI may look identical to the previous version, but lord has the polish level gone up. Basically, almost all the problems I had bumped into have gone away.

NuGet support

Update: See this post by Matt Ward for more details on NuGet support.

As I mentioned before, whilst the NuGet plugin was great for basic usage, it did have a lot of corner cases including the certificates issues, full restore not working properly and so on. This has all been sorted out in MonoDevelop 5. It sports an internal implementation as explained in the release notes, and it has been flawless up till now.

I did bump into an annoying problem, but I think its more Visual Studio's fault than anything else. Basically, Microsoft decided to add some NuGet.targets to the solution by copying them to .nuget. Now, to their credit, they appear to have thought about mono:

        <!-- We need to launch nuget.exe with the mono command if we're not on windows -->
       <NuGetToolsPath>$(SolutionDir).nuget</NuGetToolsPath>

However, this fails miserably. The DownloadNuGet target does not appear to exist in mono, and copying NuGet.exe manually into .nuget also failed - apparently its not just a binary these days. The lazy man solution was to find the NuGet binaries in MonoDevelop and copy them across to the .nuget directory (had them at monodevelop-5.0/external/nuget-binary). Once this was done, building worked just fine.

Note also that I didn't have time to test the .nuget directory properly, by overriding the default directory with something slightly more sensible. However, I don't particularly like having my packages in the middle of the source tree so I'll be trying that very soon.

Overall, the NuGet experience is great, and package restoring Just Works (TM).

Intellisense and Friends

I was already quite pleased with Intellisense in MonoDevelop 4, but I did find it was easy to confuse it when files got in to a bit of a state - say when pasting random chunks of code into a file. All of these problems are now gone with MonoDevelop 5. In more challenging situations, I have noticed the syntax highlighting disappearing for a little bit but as soon as the code is vaguely sensible, it returns straight away.

It is also a pleasure to use Ctrl-Shift-T to go to definitions, in some ways it seems even more powerful than ReSharper. It is certainly more responsive, even on my lowly NetBook with 1GB of RAM.

One slight snag is that extract interface seems to have gone missing - I was pretty sure I had used it on MonoDevelop 4, but for the life of me can't find it on 5.

NUnit

I was a very happy user of the NUnit add-on for weeks on end and it performed flawlessly. However, today it got stuck loading tests and I ended up having to restart MonoDevelop to fix it. Bearing in mind I normally leave it running for weeks at a time, this annoyed me slightly. Of course, to be fair, I do restart Visual Studio every couple of days or so, so the odd MonoDevelop restart is not exactly the end of the world.

But in general, one complaint I have against both Visual Studio and MonoDevelop is with the opaqueness of unit testing. For me, it all started with shadow copying in NUnit UI and went downhill from there, really. If only one could see what exactly what it is that the IDE is trying to do, it would be fairly trivial to debug it; as it is, all I know is that my tests are "loading" but fail to load a few minutes later.

Anyway, that's just me ranting. Other than that, unit testing has worked really well, and I even started making use of the "Results Pad" and all - shiny charts!

Git FTW, UI Quirks and Resources

I had mentioned before that there were some minor UI quirks. For instance I recall seeing a search box that was not properly drawn, and having problems with the default layout of the screen. I'm happy to report that all of my UI quirks have gone away with 5. It is quite polished in that regard.

I've also started making use of the version control support - to some extent, of course, as I still think that Magit is above sliced bread. Having said that, its very useful to see a diff against what was committed or going up and down the history of a file without having to go to emacs. Version control is extremely quick. Even though Visual Studio now has git support integrated, it is a lot slower that MonoDevelop. I basically never wait at all for git on MonoDevelop.

Finally, a word on resources. I can still use MonoDevelop on my NetBook with its 1GB of RAM, much of it taken by Gnome 3 and Chrome. However, I did see it using over 250 MB of RAM on my desktop PC. I wonder if MonoDevelop is more aggressive on its usage of memory when it sees there is a lot available.

Conclusions

Whilst I'll still be using MonoDevelop for a few weeks longer, I think we have done enough on this four part series. My main objective was really to pit Mono and MonoDevelop against Visual Studio 2013 on a fairly serious project, requiring all the usual suspects: .Net, Castle, Log4Net, MongoDB and so on. To my surprise, I found I had very few interoperability problems - on the whole, the exact same source code, configuration, etc just worked for both Windows and Linux. It says a lot on how far Mono has progressed.

Regrettably, I didn't get as far as playing around with vNext - the coding is taking a lot longer than expected - but if I do get as far as that I shall post an update.

It's great news that Xamarin is improving their Linux support; I can imagine that there must be a number of companies out there considering Docker for their .Net environments. Xamarin is going to be in a great position to win over these tight-Windows-shops with the great products they have.

Date: 2014-08-09 00:51:03 BST

Org version 7.8.02 with Emacs version 23

Validate XHTML 1.0