Friday, September 04, 2015

Nerd Food: Neurons for Computer Geeks - Part III: Coding Interlude

If you are anything like me, the first two parts of this series have already bored you silly with theory (Part I, Part II) and you are now hankering for some code - any code - to take away the pain. So part III is here to do exactly that. However, let me preface that grandiose statement by saying this is not the best code you will ever see. Rather, it's just a quick hack to introduce a few of the technologies we will make use of for the remainder of this series, namely:

  • CMake and Ninja: this is how we will build our code.
  • Wt: provides a quick way to knock-up a web frontend for C++ code.
  • Boost: in particular Boost Units and later on Boost OdeInt. Provides us with the foundations for our numeric work.

What I mean by a "quick hack" is: there is no validation, no unit tests, no "sound architecture" and none of the things you'd expect from production code. But it should serve as an introduction to modeling in C++.

All the code is available on GitHub under neurite. Let's have a quick look at the project structure.

CMake

We took a slimmed-down version of the Dogen build system to build this code. We could have gotten away with a much simpler CMake setup, but I intend to use it for the remainder of this series, which is why it's a bit more complex than you'd expect. It is made up of the following files:

  • Top-level CMakeLists.txt: ensures all of the dependencies can be found and configured for building, sets up the version number and debug/release builds.
  • build/cmake: any Find* scripts that are not supplied with the CMake distribution. We googled for these and copied them here.
  • projects/CMakeLists.txt: sets up all of the compiler and linker flags we need to build the project. Uses pretty aggressive flags such as -Wall and -Werror.
  • projects/ohms_law/src/CMakeLists.txt: our actual project, the bit that matters for this article.

ohms_law Project

The project is made up of two classes, in files calculator.[hc]pp and view.[hc]pp. The names are fairly arbitrary but they try to separate View from Model: the user interface is in view and the "number crunching" is in calculator.

The View

Let's have a quick look at view. In the header file we simply define a Wt application with a few widgets:

class view : public Wt::WApplication {
public:
  view(const Wt::WEnvironment& env);

private:
  Wt::WLineEdit* current_;
  Wt::WLineEdit* resistance_;
  Wt::WText* result_;
};

It is implemented in an equally trivial manner. We just set up the widgets and hook them together. Finally, we create a trivial event handler that performs the "computations" when the button is clicked.

view::view(const Wt::WEnvironment& env) : Wt::WApplication(env) {
  setTitle("Ohm's Law Calculator");

  root()->addWidget(new Wt::WText("Current: "));
  current_ = new Wt::WLineEdit(root());
  current_->setValidator(new Wt::WDoubleValidator());
  current_->setFocus();

  root()->addWidget(new Wt::WText("Resistance: "));
  resistance_ = new Wt::WLineEdit(root());
  resistance_->setValidator(new Wt::WDoubleValidator());

  Wt::WPushButton* button = new Wt::WPushButton("Calculate!", root());
  button->setMargin(5, Wt::Left);
  root()->addWidget(new Wt::WBreak());
  result_ = new Wt::WText(root());

  button->clicked().connect([this](const Wt::WMouseEvent&) {
      const auto current(boost::lexical_cast<double>(current_->text()));
      const auto resistance(boost::lexical_cast<double>(resistance_->text()));

      calculator c;
      const auto voltage(c.voltage(resistance, current));
      const auto s(boost::lexical_cast<std::string>(voltage));
      result_->setText("Voltage: " + s);
    });
}

The Model

The model is just as simple as the view. It is made up of a single class, calculator, whose job is to compute the voltage using Ohm's Law. It does this by making use of Boost Units. This is obviously not necessary, but we wanted to take the opportunity to explore this library as part of this series of articles.

double calculator::
voltage(const double resistance, const double current) const {
  boost::units::quantity<boost::units::si::resistance>
    R(resistance * boost::units::si::ohms);
  boost::units::quantity<boost::units::si::current>
    I(current * boost::units::si::amperes);
  auto V(R * I);
  return V.value();
}

Compiling and Running

If you are on a Debian-based distribution, you can follow these steps to get the code up and running. First, install the dependencies:

$ sudo apt-get install libboost-all-dev witty-dev ninja-build cmake clang-3.5

Then obtain the source code from GitHub:

$ git clone https://github.com/mcraveiro/neurite.git

Now you can build it:

$ cd neurite
$ mkdir output
$ cd output
$ cmake ../ -G Ninja
$ ninja -j5

If all went according to plan, you should be able to run it:

$ stage/bin/neurite_ohms_law --docroot . --http-address 0.0.0.0 --http-port 8080

Now, using a web browser such as Chrome, connect to http://127.0.0.1:8080 and you should see a "shiny" Ohm's Law calculator! Sorry, it just had to be done to take away the boredom a little bit. Let's proceed with the more serious matters at hand, with the promise that the real code will come later on.

Created: 2015-09-04 Fri 17:16

Emacs 24.5.1 (Org mode 8.2.10)


Monday, August 31, 2015

Nerd Food: Neurons for Computer Geeks - Part II: The Shocking Complexity of Electricity

In part I we started to describe the basic morphology of the neuron. In order to continue, we now need to take a detour into the world of electricity. If you are an electricity nerd, I apologise in advance; this is what happens when a computer scientist escapes into your realm, I'm afraid.

"Honor the charge they made!"

First and foremost, we need to understand the concept of charge. It is almost a tautology that atoms are made up of "sub-atomic" particles. These are the proton, the neutron and the electron. The neutron is not particularly interesting right now; however the electron and the proton are, and all because they have a magical property called charge. For our purposes, it suffices to know that "charge" means that certain sub-atomic particles attract or repel each other, according to a well-defined set of rules.

You can think of a charge as a property attached to the sub-atomic particle, very much like a person has a weight or height, but with a side-effect; it is as if this property makes people push or hug each other when they are in close proximity, and they do so with the same strength when at the same distance. This "strength" is the electric force. How they decide whether to hug or push the next guy is based on the "sign" of the charge - that is, positive or negative - with respect to their own charge "sign". Positives push positives away but hug negatives and vice-versa.

For whatever historical reasons, very clever people decided that an electron has one negative unit of charge and a proton has one positive unit of charge. The sign is, of course, rather arbitrary. We could have just as well said that protons are red and electrons are blue or some other suitably binary-like convention to represent these permutations. Just because protons and electrons carry the same amount of charge, it does not follow that they are similar in other respects. In fact, they are very different creatures. For example, the electron is very "small" when compared to the proton - almost 2000 times "smaller". The relevance of this "size" difference will become apparent later on. Physicists call this "size" mass, by the by.

As it happens, all of these sub-atomic crazy critters are rather minute entities. So small in fact that it would be really cumbersome if we had to talk about charges in terms of the charge of an electron; the numbers would just be too big and unwieldy. So, the very clever people came up with a sensible way to bundle up the charges of the sub-atomic particles in bigger numbers, much like we don't talk about millimetres when measuring the distance to the Moon. However, unlike the nice and logical metric system, with its neat use of the decimal system, physicists came up instead with the Coulomb, or C, one definition of which is:

  • 1 Coulomb (1C) = the charge of 6.241 × 10^18 protons
  • -1 Coulomb (-1C) = the charge of 6.241 × 10^18 electrons

This may sound like a very odd choice - hey, why not just 1 × 10^20 or some other "round" number? - but just like a kilobyte is 1024 bytes rather than 1000, this wasn't done by accident either. The number simply falls out of the definition: one Coulomb is the charge transported by a current of one Ampere in one second, and all related SI units were carefully designed to work together and make calculations as easy as possible.

Anyway, whenever you see q or Q in formulas it normally refers to a charge in Coulombs.

Units, Dimensions, Measures, Oh My!

Since we are on the subject of SI, this is probably a good point to talk about units, dimensions, measurements, magnitudes, conversions and other such exciting topics. Unfortunately, these are important to understand how it all hangs together.

A number such as 1A makes use of the SI unit of measure "Ampere" and it exists in a dimension: the dimension of all units which can talk about electric current. This is very much in the same way we can talk about time in seconds or minutes - we are describing points in the time dimension, but using different units of measure - or just units, because we're lazy. A measurement is the recording of a quantity with a unit in a dimension. Of course, it would be too simple to call it a "quantity", so instead physicists, mathematicians and the like call it magnitude. But for the lay person, it's not too bad an approximation to replace "magnitude" with "quantity".

Finally, it is entirely possible to have compound dimensional units; that is, one can have a unit of measure that refers to more than one dimension, such as say "10 kilometres per second".

I won't discuss conversions just now, but you can easily imagine that formulas that contain multiple units may provide ways to convert from one unit to another. This will become relevant later on.

Go With the Flow

Now that we have a way of talking about charge, and we know these things can move - since they attract and repel each other - the next logical step is to start to imagine current. The name sounds magical, but in reality it is akin to a current in a river: you are just trying to figure out how much water is coming past you every second (or in some other suitable unit in the time dimension). The exact same exercise could be repeated for the number of cars going past on a motorway or the number of runners across some imaginary point on a track. For our electric purposes, current tells you how many charges have zipped past over a period of time.

In terms of SI units, current is measured in Amperes, which have the symbol A; one Ampere tells us that one Coulomb has flowed past per second. Whenever you see I in formulas it normally refers to current.

Now let's see how these two things - Coulombs and Amperes - could work together. Let's imagine an arbitrary "pipe" between two imaginary locations, one side of which has a pile of positive charges and, on the other side, a pile of negative charges - both measured in Coulombs, naturally. In this extraordinarily simplified and non-existent world, the negative charges would "flow" down the pipe, attracted by the positive charges. Because the positive charges are so huge they won't budge, but the negative charges - the lighter electrons - would zip across to meet them. The number of charges you see going past in a time tick is the current.

Resist!

Going back to our example of current in a river, one can imagine that some surfaces are better at allowing water to flow than others; for example, a river out in the open is a lot less "efficient" at flowing than say a plastic pipe designed for that purpose. One reason is that the river has to deal with twists and turns as it finds a path over the landscape whereas the pipe could be laid out as straight as possible; but it is also that the rocks and other elements of the landscape slow down water, whereas a nice flat pipe would have no such impediments. If one were to take these two extremes - a plastic pipe designed for maximum water flow versus a landscape - one could see that they affect flow differently; and one could be tempted to name the property of "slowing down the flow" resistance, because it describes how much "resistance" these things are offering to the water. If you put up a barrier to avoid flooding, you probably would want it to "resist" water quite a lot rather than allow it to flow; and you can easily imagine that sand and sandbags "resist" water in very different ways.

Resistance is a fundamental concept in the electrical world. The gist of it is similar to the contrived examples above, in that not all materials behave the same way with regards to allowing charges to flow. Some allow them to flow freely nearly at maximum speed whereas others do not allow them to flow at all.

Since we are dealing with physics, it is of course possible to measure resistance. We do so in SI units of Ohms, denoted by the upper-case Greek letter omega (Ω).

As we shall see, not all materials are nicely behaved when it comes to resistance.

You've Got Potential Baby!

Let's return to our non-existent "pipe that allows charges to flow" scenario, and take it one step further. Imagine that, for whatever reason, our pipe becomes clogged up with a blockage somewhere in the middle. Nothing can actually flow due to this blockage, so our current drops to zero.

According to the highly simplified rules that we have learned thus far, we do know that - were there to be no blockage - there would be movement (current). That is, the setup of the two bundles in space is such that, given the right conditions, we would start to see things flowing. But, alas, we do not have the right conditions because the pipe is blocked; hence no flow. You could say this setup has "the potential" to get some flow going, if only we could fix the blockage.

In the world of electricity, this idea is captured by a few related concepts. If we highly simplify them, they amount to this:

  • electric potential: the idea that depending on where you place a charge in space, it may have different "potential" to generate energy. We'll define energy a bit better later on, but for now a layman's idea of it suffices. By way of an example: if you place a positive charge next to a lump of positive charges and let it go, it will move a certain distance away from the lump. Before you let the charge go, you know the charge has the potential to move away. You can also see that the charge will move by different amounts depending on how close you place it to the lump; the closer you place it, the more it will move. When we are thinking of electric potential, we think of just one charge.
  • electric potential energy: clearly it would be possible to move two or three charges too, as we did for the one; and clearly they should produce more energy than a single charge. So one simple way of understanding electric potential energy is to think of it as the case of electric potential that deals with the total number of charges we're interested in, rather than just one.

Another way of imagining these two concepts is to think that electric potential is a good way to measure things when you don't particularly care about the number of charges involved; it is as if you scaled everything to just one unit of charge. Electric potential energy is more for when you are thinking of a system with an actual number of charges. But both concepts deal with the notion that placing a charge at different points in space may have an impact on the energy you can get out of it.

Having said all of that, we can now start to think about electric potential difference. It uses the same approach as electric potential, in that everything is scaled to just one unit of charge, but, as the name implies, it provides a measurement of the difference between the electric potential of two points. Electric potential difference is more commonly known as voltage. Interestingly, it is also known as electric pressure, and this may be the most meaningful of its names: when there is an electric potential difference, it applies "pressure" on charges, forcing them to move.

The SI unit Volt is used to measure both electric potential and electric potential difference, amongst other things (electric potential energy, being an energy, is measured in Joules). Using one unit for several concepts may sound a bit weird at first, but that is just unfamiliarity. Take time, for example: we use minutes as a unit of measure for all sorts of things (duration of a football game, time it takes for the moon to go around the earth, etc.). We did not invent a new unit for each phenomenon because we recognised - at some point - that we were dealing with points in the same dimension.

Quick Conceptual Mop-Up

Before we move over to the formulas, it may be best to tie up a few loose ends. These are not strictly necessary, but they make the picture a bit more complete and move us to a more realistic model - if still a very simplistic one.

First, we should start with atoms; we mentioned charges but skipped the atoms themselves. Atoms are (mostly) a stable arrangement of charges, placed in such a way that the atoms themselves are neutral - i.e. contain exactly the same amount of negative and positive charges. We mentioned before that like charges don't really get along, and neutrons are kind of just there, hanging around. In truth, neutrons and protons do get along, via the aptly named nuclear force; this is what binds them together in the nucleus of the atom. Electrons are attracted to protons and live their existences in a "cloud" around the nucleus. Note that the nucleus is more than 99% of the mass of the atom, which gives you an idea of just how small electrons are.

The materials we will deal with in our examples are made of atoms, as are, well, quite a few things in the universe. These materials are themselves stable arrangements of atoms, just like atoms are stable arrangements of protons, neutrons and electrons. As you can see in the picture, these look like lattices of some kind.

carbon-atoms.jpg

Figure 1: Microscopic View of Carbon Atoms. Source: Quantum Physics: The Brink of Knowing Something Wonderful

In practice, copper wires are made up of a great many things rather than just atoms of copper. One such "kind of thing" is the unbound electrons - or free-moving electrons; basically, electrons that are not trapped in an atom. As we mentioned before, electrons are the ones doing most of the moving. Left to their own devices, electrons in a conducting material will just move around, bumping into atoms in a fairly random way. However, let's say you take one end of a copper wire and plug it into the + side of a regular AA battery, and then take the other end and plug it into the - side of the battery. According to all we've just learned, it's easy to imagine what will happen: the electrons stored in the - side will zip across the copper to meet their proton friends at the other end. This elemental construction, with its circular path, is called a circuit. What you've done is to upset the neutral balance of the copper wire and get all the electrons to move in a coordinated way (rather than randomly) from the - side to the + side.

It is at this juncture that we must introduce the concept of ions. An ion is basically an atom that is no longer neutral - either because it has more protons than electrons (called a cation) or more electrons than protons (called an anion). In either case, this comes about because the atom has gained or lost some electrons. Ions will become of great interest when we return to the neuron.

One final word on resistance and its sister concept of conductance:

  • Resistance is in effect a byproduct of the way the electrons are arranged in the electron cloud and is related to the ionisation mentioned above; certain arrangements just don't allow electrons to flow across.
  • Conductance is the inverse of resistance. When you talk about resistance you are focusing on the material's ability to impair movement of charges; when you talk about conductance you are focusing on the material's ability to let charge flow through.

The reason we choose copper or other metals for our examples is because they are good at conducting these pesky electrons.

Ohm's Law

We have now introduced all the main actors required for one of the main parts in the play: Ohm's Law. It can be stated very easily:

V = R x I


The best way to understand this law is to create a simple circuit.

Ohm's_Law_with_Voltage_source_TeX.svg

Figure 3: Simple electrical circuit. Source: Wikipedia, Electrical network

On the left we have a voltage source, which could be our 1.5V AA battery. On the right of the diagram we have a resistor - an electric component designed specifically to "control" the flow of the electric current. Without the resistor, we would be limited only by how much current the battery can pump out and by how much "natural" resistance the copper wire has - which is not a lot, since copper is very good at conducting. The resistor gives us a way to limit current flow below these theoretical maximums.

Even if you are not particularly mathematically oriented, you can easily see that Ohm's Law gives us a nice way to find any of these three variables, given the other two. That is to say:

R = V / I
I = V / R

These tell us many interesting things such as: for the same resistance, current increases as the voltage increases. For good measure, we can also find out the conductance too:

G = I / V = 1 / R

It is important to notice that not everything obeys Ohm's Law - i.e. behaves in a straight line, with current proportional to voltage. The conductors that obey this law are called ohmic conductors. Those that do not are called non-ohmic conductors. There are also things that obey Ohm's Law for the most part; these are called quasi-ohmic.

What next?

We have already run out of time for this instalment but there are still some more fundamental electrical concepts we need to discuss. The next part will finish these and start to link them back to the neuron.

Created: 2015-08-31 Mon 19:27


Nerd Food: Neurons for Computer Geeks - Part I: A Neuron From Up On High

As any computer geek would tell you, computer science is great in and of itself and many of us could live long and contented lives inside that box. But things certainly tend to become interesting when there is a whole problem domain to model, and doubly so when that domain is outside of our comfort zone. As it happens, I have managed to step outside said zone - rather, quite far outside - so it seemed like a good idea to chronicle these adventures here.

The journey we are about to embark on starts with a deceptively simple mission: to understand how one can use computers to model neurons. The intended audience of these posts is anyone who loves coding but has no idea about electricity, circuits, cells and so on - basically, someone very much like me. We shall try to explain, at least to a degree, all of the required core concepts in order to start coding. As it turns out, there are quite a few.

But hey, as they say, "If you can't explain something to a six-year-old, you really don't understand it yourself". So let's see if I got it or not.

I'm a Cell, Get Me Out Of Here!

A neuron is a cell, so it makes sense to start with cells. Cells are a basic building block in biology and can be considered as the smallest unit of a living organism - at least for our purposes, if nothing else. The key idea behind a cell is as obvious as you'd like: there is the inside, the outside, and the thing that separates both.

Of course, this being biology, we need to give it complicated names. Accordingly, the inside of the cell is the cytoplasm and the thing that separates the cell from the outside world is the membrane. You can think of it as a tiny roundy-box-like thing, with some gooey stuff inside. The material of the box is the membrane. The gooey stuff is the cytoplasm. When we start describing the different cellular structures - as we are doing here - we are talking about the cell's morphology.

Living beings are made up of many, many cells - according to some estimates, a human body would have several trillion - and cells themselves come in many, many kinds. Fortunately, we are interested in just one kind: the neuron.

The Neuron Cell

The neuron is a nerve cell. Of course, there are many, many kinds of neurons - nature just seems to love complexity - but they all share things in common, and those things define their neuron-ness.

Unlike the "typical" cell we described above (i.e. "roundy-box-like thing"), the neuron is more like a roundy-box-like thing with some branches coming out of it. The box-like thing is the cell body and is called soma. There are two types of branches: axons and dendrites. A dendrite tends to be short, and it branches like a tree with a very small trunk. The axon tends to be long and it also branches off like a tree, but with a very long trunk. As we said, there are many kinds of neurons, but a fair generalisation is that they tend to have few axons (one or maybe a couple) and many dendrites (in the thousands).

8808542_f520.jpg

Figure 1: Source: What is a Neuron?

This very basic morphology is already sufficient to allow us to start thinking of a neuron as a "computing device" - a strange kind of device where the dendrites provide inputs and the axon outputs. The neuron receives all these inputs, performs some kind of computation over them, and produces an output.

The next logical question for a computer scientist is, then: "where do the inputs come from and where do the outputs go?". Imagining an idealised neuron, the dendrites would be "connecting" to other dendrites or to axons. At this juncture (pun not intended), we need to expand on what exactly these "connections" are. In truth, it's not that the axon binds directly to the dendrite; there is always a gap between them. But this gap is a special kind of gap: first because it is a very small gap, and second because it is one over which things can travel, from the axon into the dendrite. This kind of connectivity between neurons is called a synapse.

From this it is an easy leap to imagine that these sets of neurons connected to other neurons begin to form "networks" of connectivity, and these networks will also have computational-device-like properties, just like a neuron. These are called neural networks. Our brain happens to be one of these "neural networks", and a pretty large one at that: it can have as many as 80-100 billion neurons, connected over some 1 quadrillion synapses. In these days of financial billions and trillions, it is easy to be fooled into thinking 100 billion is not a very large number, so to get a sense of perspective let's compare it to another large network. The biggest and fastest-growing human-made network is the Internet, estimated to have some 5 billion connected devices but fewer than 600k connections in its core - and yet we are already creaking at the seams.

The Need To Go Lower

Alas, we must dig deeper before we start to understand how these things behave in groups. Our skimpy first pass at the neuron morphology left a lot of details out, which are required to understand how they behave. As we explained, neurons have axons and dendrites, and these are responsible for hooking them together. However, what is interesting is what they talk about once they are hooked.

A neuron can be thought of as an electrical device, and much of its power (sorry!) stems from this. In general, as computer scientists, we don't like to get too close to the physical messiness of the world of hardware; we deem it sufficient to understand some high-level properties, but rarely do we want to concern ourselves with transistors or even - regrettably - registers or pipelines in the CPU. With neurons, we can't get away with it. We need to understand the hardware - or better, the wetware - and for that we have to go very low-level.

We started off by saying cells have a membrane that separates the outside world from the cytoplasm. That was a tad of an oversimplification; after all, if the membrane did not allow anything in, how would the cell continue to exist - or even come about in the first place? In practice these membranes are permeable - or to be precise, semi-permeable. This just means that the membrane allows some stuff in and some stuff out, under controlled circumstances. This is how a cell gets energy in to do its thing and how it expels its unwanted content out. Once things start to move in and out selectively, something very interesting can happen: the build-up of "electric potential". However, rather unfortunately, in order to understand what we mean by this, we need to cover the fundamentals of electricity.

Onward and downwards we march. Stay tuned for Part II.

Created: 2015-08-31 Mon 17:25


Tuesday, May 26, 2015

Nerd Food: A Prelude of Things to Come

This sprint I found myself making one of those historical transitions: moving my entire Emacs infrastructure from an old approach that was creaking at the seams to the new all-singing-all-dancing way of doing things. This post documents the start of this transition.

The Road to Cunene

I have been using Emacs since around 1998. One of the biggest reasons to use the great old editor is its infinite configurability. Really, to call Emacs "configurable" is rather like saying that Euler wasn't bad with numbers. In truth - and it takes you a while to really grok this - Emacs is just a lisp platform with a giant editing library built on top; a library that keeps on getting extended on a daily basis by a large number of Emacs users. And, of course, you configure Emacs using lisp, so that the lines between "configuration" and "development" are, at best, blurry.

But let's go back to the beginning. Like every other Emacs newbie in those days, I too started with a plain (i.e. non-configured) Emacs and soon evolved to a very simple .emacs - this file being one of the possible places in which to store one's configuration. The reason why almost all Emacs users start configuring Emacs very early on is because its defaults are astonishingly atrocious. It still amazes me to this day that some people are able to use plain Emacs and come out at the other end as Emacs users. In some ways, I guess it is a trial by fire: do you really want to use Emacs? There are two responses to this test: most give up, but a few persist and soon start changing the editor to behave in a slightly saner manner.

The .emacs starts small, especially if you are not familiar with lisp. Sooner or later it occurs to you that, surely, someone must have already done one of these before, and then you find the amazing world of .emacs "development". This opens up entire new vistas of the Emacs landscape, because with each .emacs you find, you discover untold numbers of configuration knobs and - much more importantly - many new modes to install. In Emacs lingo, a mode is kind of like a "plug-in" for Eclipse or Visual Studio users. But this is just an approximation; as with everything "Emacs", there is actually no real equivalent way of describing Emacs terminology with analogies outside of Emacs. The problem with IDEs and most other editors is that they can only be extended in ways that their designers thought useful. In Emacs, everything is extensible. And I do mean everything. I remember the day I realised that a key press was really just the invocation of the self-insert-command function and, like any other function, it too could be changed in a myriad of ways.

But I digress. As with most users, my .emacs evolved over the years as I found more and more modes. I soon found that it was very painful to keep all my machines on the same setup; invariably I would change something at work but forget to change it at home or Uni, or vice-versa. To make matters worse, some machines were on Windows. And in those days there was no Emacs package management support, so you ended up copying lots of modes around. Life was nasty and brutish in my first decade of Emacs.

Around six years ago, things got a lot better: I started to use git in anger, refactored my .emacs into something slightly saner and called it Cunene - after the river in Southern Angola. Eventually I put it on GitHub. I believe - but don't recall exactly - that most of the refactoring ideas were stolen from Phil Hagelberg's Starter Kit and Alex Ott's .emacs.

Whatever the source of ideas, the improvements were undeniable. Cunene offered an all-in-one place to go for my .emacs, and it combined all the experience I had gained from reading other people's .emacs files. At over twenty megs it wasn't exactly svelte, but my objective was to have a "zero-conf" setup; given a new machine, all I wanted to do was git clone cunene, start Emacs and have exactly the same environment as everywhere else.

Further, I could update Cunene from any machine and push it back to GitHub. Cunene contained all the modes I needed, all byte-compiled, and all at trusted versions and with some (very minor) patches. I could easily upgrade one or more modes from one machine and then just git pull from all other machines. It also handled any Windows-specific workarounds, ensuring things worked well out of the box there too.

To be fair, for the last 6 years, this setup has served me well, but time also revealed its limitations:

  • package management support was limited. I tried using Elpa but, at the time, not many packages were available. Package management has since evolved in leaps and bounds - Melpa, Marmalade, etc. - but Cunene was still stuck with the old Elpa support.
  • the accumulation of modes over such a long period meant that starting Emacs took quite a long time. And to make matters worse, only a small percentage of the modes were truly useful.
  • most of the modes were at stale versions. Since things worked for me, I had no incentive to keep up with the latest and greatest - and for all its convenience, it was still not exactly trivial to upgrade modes. This meant that I ended up having to put up with bugs that had long been fixed in HEAD and, worse, whenever I did upgrade to the latest version I saw massive changes in behaviour.
  • I was stuck on Emacs 23. For whatever reason, some parts of Cunene did not work properly with Emacs 24 and I was never able to get to the bottom of it. Being on an old version of Emacs has been a problem because I make use of C++11, which Emacs 23 doesn't really indent properly. And of course, Emacs 24 is simply improved all around.
  • Cunene had a lot of boilerplate code. Since I never really learnt how to code in Emacs lisp, I was most likely writing a lot of non-idiomatic code. Also, the Emacs API has moved on considerably in fifteen years, so certain things were not being done in the best way possible.
  • Cedet and Org-mode are now part of Emacs, but Cunene was still carrying its own copies. I never managed to get Cedet to work properly either.
  • many new modes have appeared of late that provide much better solutions to some of the problems I had, but Cunene insulated me from these developments. In addition, adding new modes would only add to the complexity so I had no incentive to do so.

There had to be a better way of doing things; something that combined the advantages of Cunene but fixed its shortcomings.

Then I heard of Prelude.

The Road to Prelude

According to the official documentation:

Prelude is an Emacs distribution that aims to enhance the default Emacs experience. Prelude alters a lot of the default settings, bundles a plethora of additional packages and adds its own core library to the mix. The final product offers an easy to use Emacs configuration for Emacs newcomers and lots of additional power for Emacs power users.

I am still finding my way around - so don't quote me - but from what I have seen, it seems to me that Prelude is like the Cunene "framework" but done by people that know what they are doing. It covers all of the advantages described above, but shares none of its disadvantages. In particular:

  • it provides a sensible set of baseline defaults that "we all can agree on". I found it quite surprising that a plain Prelude looked almost like Cunene. Of course, no two Emacs users agree on anything, really, so there is still a lot to be tweaked. Having said that, the great thing is you can start by seeing what Prelude says and giving it a good go; if a baseline default does not work for you, you can always override it. Just because you have been doing something in a certain way for a long time does not mean it's the best way, and the move to Prelude provides an opportunity to reevaluate a lot of "beliefs".
  • all the framework code is now shared by a large number of Emacs users. This means it is well designed and maintained, and all you have to worry about is your small extensibility points. With over 1K forks on GitHub, you can rest assured that Prelude will be around for a long time. In addition, if you find yourself changing something that is useful to the Prelude community, you can always submit a pull request and have that code shared with the community. You no longer have to worry about staleness or non-idiomatic code.
  • Prelude integrates nicely with several package managers and handles updates for you.
  • There are lots of examples of Prelude users - you just need to follow the GitHub forks. It would be nice to have a list of "good examples" though, because at 1K forks it's not easy to locate them.
  • If you fork Prelude the right way, you should be able to update from upstream frequently without having too many conflicts. I am still getting my head around this, but the model seems sound at first blush.

But to know whether it worked required using it in anger, and that's what we will cover in the next few sections.

From Cunene to Prelude

Emacs users are creatures of habit and changing your entire workflow is not something to take lightly. Having said that, I always find that the best way to do it is to just go for it. After all, you can always go back to how you did things before. In addition, I did not want to do a wholesale port of Cunene for two reasons:

  • I didn't want to bring across any bad habits when Prelude was already solving a problem properly.
  • I wanted to get rid of all of the accumulated cruft that was no longer useful.

What follows are my notes on the porting work - a snapshot taken a few days in. If there is reason to, I may do further write-ups to cover any new developments.

Initial Setup

Prelude recommends creating a fork and then adding your personal configuration to it. I decided to create a branch in which to store the personal configuration rather than pollute master. This has two advantages:

  • pulling from upstream will always be conflictless;
  • if I do decide to submit a pull request in the future, I can have a clean feature branch off of master that doesn't have any of the personal cruft in it.

As it happens, I later found out that other Prelude users, such as Daniel Wu, also take this approach, as you can see here. I ended up following Daniel's approach in quite a few cases.

I created my prelude fork in GitHub using the web interface. Once the fork was ready, I moved Cunene out of the way by renaming the existing .emacs.d directory and performed the following setup:

$ curl -L https://github.com/bbatsov/prelude/raw/master/utils/installer.sh -o installer.sh
$ chmod +x installer.sh
$ ./installer.sh -s git@github.com:mcraveiro/prelude.git

This created a Prelude-based ~/.emacs.d, cloned off of my fork. I then setup upstream:

$ cd ~/.emacs.d
$ git remote add upstream git@github.com:bbatsov/prelude.git

This means I can now get latest from upstream by simply doing:

$ git checkout master
$ git pull upstream master
$ git push origin master

I then setup the personal branch:

 $ git branch --track personal origin/personal
 $ git branch
   master
 * personal

For good measure, I also set personal to be the default branch in GitHub. This hopefully means there is one less configuration step when setting up new machines. Once all of that was done, I got ready to start Emacs 24. The version in Debian Testing at present is 24.4.1 - not quite the latest (24.5 is out) but recent enough for those of us who were stuck on 23.

The start-up was a bit slow; Prelude downloaded a number of packages, taking perhaps a couple of minutes before it was eventually ready. For good measure I closed Emacs and started it again; the restart took a few seconds, which was quite pleasing. I was ready to start exploring Prelude.

The "Editor" Configuration

My first step in configuration was to create an init.el file under .emacs.d/personal and add prelude-personal-editor.el. I decided on this naming convention after looking at the Prelude core directory; it seems vaguely in keeping. This file will be used for a number of minor tweaks that are not directly related to an obvious major mode (at least from a layman's perspective).

  • Fonts, Colours and Related Cosmetics

    The first thing I found myself tweaking was the default colour theme. Whilst I actually quite like Zenburn, I find I need a black background and my font of choice. After consulting a number of articles such as Emacs Prelude: Background Color and the Emacs Wiki, I decided to go with this approach:

    ;; set the current frame background and font.
    (set-background-color "black")
    (set-frame-font "Inconsolata Bold 16" nil t)
    
    ;; set the font and background for all other frames. Note that
    ;; add-to-list takes a single element, so each setting needs its own call.
    (add-to-list 'default-frame-alist '(background-color . "black"))
    (add-to-list 'default-frame-alist '(font . "Inconsolata Bold 16"))
    

    The font works like a charm, but for some reason the colour gets reset during start-up. On the plus side, new frames are setup correctly. I have raised an issue with Prelude: What is the correct way to update the background colour in personal configuration? For now there is nothing for it but to update the colour manually. Since I don't restart Emacs very often this is not an urgent problem.
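
    In the meantime, a possible workaround - only a sketch, as I have not dug into how Prelude loads its theme, and a later theme load may well override it - is to reapply the colour once start-up has finished:

    ;; reapply the background after init, in case theme loading resets it.
    (add-hook 'after-init-hook
              (lambda () (set-background-color "black")))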

    One nice touch was that font-lock is already global so there is no need for additional configuration there.

  • Widgets and Related Cosmetics

    Pleasantly, Prelude already excludes a lot of annoying screen artefacts and comes with mouse wheel support out of the box - which is nice. All in all, a large number of options were already set up the way I like them:

    • no splash screen;
    • no menu-bars or tool-bars;
    • good frame title format with the buffer name;
    • no annoying visible bell;
    • display of column and line numbers, as well as buffer sizes, out of the box;
    • not only does search have highlighting, the shiny Anzu mode is even niftier!
    • no need for hacks like fontify-frame.

    However, Prelude includes scroll-bars and tool-tips - things I do not use since I like to stick to the keyboard. It also didn't have date and time in the mode line; and for good measure, I disabled clever window splitting, as I found it a pain in the past. Having said that, I am still not 100% happy with time and date, since they consume a lot of screen real estate. This will be revisited at some point in the context of diminish and other mode line helpers.

    ;; disable scroll bar
    (scroll-bar-mode -1)
    
    ;; disable tool tips
    (when window-system
      (tooltip-mode -1))
    
    ;; time and date
    (setq display-time-24hr-format t)
    (setq display-time-day-and-date t)
    (display-time)
    

    One note on line highlighting. Whilst I quite like this feature in select places such as grep and dired, I am not a fan of using it globally like Prelude does. However, I decided to give it a try and disable it later if it becomes too annoying.
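
    Should the global highlighting prove too annoying, my fallback plan is along these lines (untested against Prelude, which may re-enable the global mode elsewhere): turn it off globally and enable it only where it earns its keep.

    ;; disable global line highlighting, keep it in the modes where it helps.
    (global-hl-line-mode -1)
    (add-hook 'dired-mode-hook #'hl-line-mode)
    (add-hook 'grep-mode-hook #'hl-line-mode)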

  • Tabs, Spaces, Newlines and Indentation

    In the realm of "spacing", Prelude scores well:

    • no silly adding of new lines when scrolling down, or asking when adding a new line at save;
    • pasting performs indentation automatically (yank indent etc.);
    • default handling of tabs and spaces is fairly sensible - except for the eight spaces for a tab! A few minor things are missing, such as untabify-buffer; these may warrant a pull request at some point in the near future.
    • a nice whitespace mode which is not quite the same as I had it in Cunene but seems to be equally as capable, so I'll stick with it.
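
    For reference, the untabify-buffer I have in mind is tiny - a minimal sketch over the built-in untabify:

    (defun untabify-buffer ()
      "Convert all tabs in the current buffer to spaces."
      (interactive)
      (untabify (point-min) (point-max)))
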
  • To Prompt or Not to Prompt

    There are a few cases where Prelude and I are at odds when it comes to prompts. First, I seem to try to exit Emacs by mistake - and I do that a lot. As any heavy Emacs user will tell you, there is nothing more annoying than exiting Emacs by mistake (in fact, when else do you exit Emacs?). I normally have more than 50 buffers open, and not only does it take forever to bring up Emacs with that much state, but it never quite comes back up exactly the way I left it. Anyway, suffice it to say that I strongly believe in the "are you sure you want to exit Emacs" prompt, so I had that copied over from Cunene. And, of course, one does not like typing "yes" when "y" suffices:

    ;; Make all "yes or no" prompts show "y or n" instead
    (fset 'yes-or-no-p 'y-or-n-p)
    
    ;; confirm exit
    (global-set-key
     (kbd "C-x C-c")
     '(lambda ()
        (interactive)
        (if (y-or-n-p-with-timeout "Do you really want to exit Emacs ?" 4 nil)
            (save-buffers-kill-emacs))))
    

    There is a nice touch in Prelude: it enables a few previously disabled commands, such as upper/lower-casing of regions - or perhaps the powers that be changed that for Emacs 24. Whoever is responsible, it's certainly nice not to have to worry about it.

  • Keybindings

    One of the biggest cultural shocks, inevitably, happened with keybindings. I am giving Prelude the benefit of the doubt - even though my muscle memory is not happy at all. The following has proved annoying:

    • Apparently arrow keys are discouraged. Or so I keep hearing in my minibuffer every time I press one. As it happens, the warnings are making me press them less.
    • C-b was my ido key. However, since I should really not be using the arrow keys, I had to get used to using the slightly more standard C-x b.
    • Eassist include/implementation toggling was mapped to M-o, and M-i was my quick way of opening includes in semantic (more on that later). However, these bindings no longer seem to work.
    • pc-select is a bit screwed in some modes such as C++ and Emacs lisp. But that's alright, since you shouldn't be using the arrow keys, right? What is annoying is that it works ok'ish in Org-mode, so I find that I behave differently depending on the mode I'm in.
    • in addition, win-move is using the default shift-arrow keys and it's not set up to handle multiple frames. This is a problem, as I always have a few frames. These will have to be changed, if nothing else just to preserve my sanity.
    • talking about pc-select, I still find myself pasting with C-v. I just can't help it; it's buried too deeply in the muscle memory. But it must be said, it's rather disconcerting to see your screen move up when you press C-v; it makes you think your paste has totally screwed up the buffer, when in reality it's just the good old muscle memory biting again.
    • C-x u now doesn't just undo like it used to. On the plus side, undo-tree just rocks! We'll cover it below.
    • C-backspace doesn't just delete the last word, it seems to kill a whole line. Will take some getting used to.

    All in all, after a few days, the muscle memory seems to have adapted well enough. I'm hoping I'll soon be able to use C-b and C-f without thinking, like a real Emacs user.
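
    As a first stab at the win-move problem, I am considering simply moving it off the shift-arrows - the binding choice below is mine, not Prelude's, and multiple frames would still need extra help (perhaps the framemove package):

    ;; use meta-arrows for window navigation instead of shift-arrows.
    (windmove-default-keybindings 'meta)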

Modes From Cunene

Unfortunately, package management was not quite as complete as I had hoped and so, yet again, I ended up with a number of modes that had to be copied into git. Fortunately, these are far fewer in number. I decided to place them under personal/vendor, as I wasn't sure what the main vendor folder was for.

  • Cedet

    After almost losing my mind trying to configure the Cedet bundled with Emacs 24, I decided to bite the bullet and upgrade to the latest development version. In the past this was a safe bet; I'm afraid to report it still is the best way to get Cedet up and running. In fact, I got it working within minutes after updating to the development version, versus a whole day of fighting against the built-in one. Pleasantly, it is now available in git:

    git clone http://git.code.sf.net/p/cedet/git cedet
    

    Building it was a simple matter of calling make, both at the top-level and in contrib:

    $ cd cedet
    $ make EMACS=emacs24
    $ cd contrib
    $ make EMACS=emacs24
    

    The setup was directly copied from their INSTALL document, so I recommend reading that.

    Still in terms of Cedet, a very large win was the move to EDE Compilation Database. I really cannot even begin to do justice to the joys of this mode - it is truly wonderful. I made the tiniest of changes to my build process, defining one extra CMake variable:

    cmake ../../../dogen -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=TRUE
    

    With just that - and a couple of lisp incantations (see the cedet init file) - I suddenly stopped having to worry about supplying flags to flymake (well, flycheck - but that's another story), semantic, the whole shebang. I haven't quite worked out all of the details just yet, but with very little configuration the compilation database seems to just get everything working magically.
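
    For the record, the incantations amount to roughly the following - a sketch from memory of the ede-compdb documentation, so names and arguments may differ in your version, and the project name and path are placeholders of mine:

    (require 'ede-compdb)
    ;; point EDE at the compile_commands.json produced by the CMake flag above.
    (ede-add-project-to-global-list
     (ede-compdb-project "dogen"
                         :compdb-file "compile_commands.json"
                         :file (expand-file-name "/path/to/dogen/CMakeLists.txt")))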

    Because of this, I am now finding myself using Cedet a lot more; the intellisense seems to just work in the majority of cases. The only snag is the annoyance of old: having Emacs block on occasion whilst it builds some semantic database or other. It doesn't happen often, but it's still a pain when it does. Which gave me the idea of replacing it with a Clang-based "semantic database generator". Let's see what the Cedet mailing list says about it.

    All in all, Cedet is much improved from the olden days; so much so that I feel it warrants a proper review after a few months of using it in anger. In fact, I feel so brave I may even set up emacs-refactor or semantic-refactor. It is also high time to revisit C/C++ Development Environment for Emacs and pick up some new tips.

  • Git-emacs

    Git-emacs makes me a bit sad. In truth, I am a perfectly content magit user (more on that later) except for one feature - the file status "dot". This is something I got used to from the svn days and still find quite useful. It's silly really, especially in these days of git-gutter, but I still like to know whether there have been any changes to a file, and I haven't found a good way of doing this outside of git-emacs. It provides a nice little red or green dot in the modeline, like so:

    git-emacs.png

    Figure 1: Git-emacs state modeline

    However, there are no packaged versions of git-emacs and, since everyone uses magit these days, I can't see it making it to Elpa. Also, it is rather annoying having to load the whole of git-emacs for a dot, but there you go.

  • Doxymacs

    Very much in the same vein as git-emacs, doxymacs is another of those historical modes that seems a bit unmaintained. And very much like git-emacs, I only use it for the tiniest of reasons: it syntax-highlights my doxygen comments. I know, I know. On the plus side, it seems to do a whole load of other stuff - I just never quite seem to need any feature besides the nice syntax highlighting of comments.

Modes From Prelude or Emacs 24

In this section we cover modes that are either new/updated for Emacs 24 or available from Prelude via Elpa.

  • Dired

    Dired is configured in a fairly sensible manner out of the box. For example, one no longer gets the annoying prompts when deleting/copying directories with files - for some reason it never occurred to me that you could configure that away.

    On the down side, it is not configured with dired-single, so the usual proliferation of dired buffers still occurs. I have decided not to setup dired-single for a few days and see how bad it gets.
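
    Incidentally, a lighter-weight alternative I may try first: the built-in dired-find-alternate-file (bound to "a" in dired, but disabled by default) reuses the current buffer instead of spawning a new one.

    ;; allow "a" in dired to visit a directory in the same buffer.
    (put 'dired-find-alternate-file 'disabled nil)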

    The other, much more annoying problem was that hidden files are displayed by default. I first tried solving this problem with dired-omit as per this page:

    (setq-default dired-omit-mode t)
    (setq-default dired-omit-files "^\\.?#\\|^\\.$\\|^\\.\\.$\\|^\\.")
    

    However, I found that omitting via regexes is not that performant, so I ended up going back to the old setup of ls flags:

    (setq dired-listing-switches "-l")
    
  • Undo-tree and browse-kill-ring

    As mentioned before, C-x u is no longer just undo, it's undo-tree! Somehow I had missed this mode altogether until now. It's pretty nifty, as it allows you to navigate the undo tree - including forks.

    I also found that the latest version of browse-kill-ring is very nice; so much so that I find myself using it a lot more now. The management of the clipboard will never be the same.

  • Org-mode

    One rather annoying thing was that with the latest Org-mode, the clock-table is a bit broken. I quickly found out I wasn't the only one to notice: Is it possible to remove ' ' from clock report but preserve indentation?

    This link implies the problem is fixed in Emacs 24.4, but I am running it and sadly it doesn't seem to be the case. I also found out that the automatic resizing of clock tables is no longer… well, automatic. Instead, we now have to supply the size. My final setup for the clock-table is as follows:

    #+begin: clocktable :maxlevel 3 :scope subtree :indent nil :emphasize nil :scope file :narrow 75
    

    This seems to generate a table that is largely like the ones we had prior to upgrading.

    Other than that, Org-mode has behaved - but then again, I'm not exactly a poweruser.

  • Bongo

    I use the amazing Bongo media player to play the few internet radio stations I listen to - mainly SomaFM, to be honest. It's good to see it in Melpa. It's still not quite as straightforward as you'd like to save a playlist - I always find that loading the buffer itself does not trigger bongo mode for some reason - but other than that, it works fine.

    On the downside, I use the venerable mpg123 to play random albums and that hasn't made it to Melpa yet. I've decided to try to use Bongo for this use case too, but if that doesn't work out then I'll have to add it to vendor…

  • Shell

    Prelude comes with eshell configured by default. I must confess I have always been a bash user - simple and easy. I'll persevere with eshell for a couple of days, but I can already see that this may be a bridge too far.

  • Flycheck

    One of the main reasons that made me consider moving to Prelude was Flymake. I added it to Cunene fairly early on, some six years ago, and was amazed at how I had managed to use Emacs for over a decade without it. However, after a good six years of intensive usage, I can attest that Flymake is showing its age. The main problem is how it locks up Emacs whilst updating. Combine that with the insane errors one gets in C++, and all it takes is an angle-bracket out of place for your coding flow to be disrupted for potentially several minutes. To be fair, this happens very infrequently, but it's still a major nuisance. So I was keen to explore Flycheck.

    All I can say is: wow! The same amazement I felt for Flymake when I first used it has been repeated with Flycheck. Not only is it blazingly fast, it supports multiple checkers, and the errors buffer is a dream to work with. And with the Compilation Database integration, there is no configuration required. I can't believe I survived this long without Flycheck!
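
    The one tweak I expect to need is telling the checkers about C++11 - something along these lines, using Flycheck's per-checker language-standard variables (variable names as I remember them from the Flycheck documentation):

    ;; make the gcc/clang checkers use the C++11 standard.
    (add-hook 'c++-mode-hook
              (lambda ()
                (setq flycheck-gcc-language-standard "c++11"
                      flycheck-clang-language-standard "c++11")))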

  • Magit

    One of my favourite modes in Emacs - at least of the new generation of modes - is Magit. So much so that I find that I rarely use git from anywhere else, it's just so easy to do it from Magit. Which makes me extremely sensitive to any changes to Magit's interface.

    The version in Prelude - presumably from Melpa - is a tad different from the legacy one I was using in Cunene. On the plus side, most of the changes are improvements, such as having a "running history" in the git process buffer, with font-lock support. The main Magit buffer also looks very nice, with lots of little usability touches. A tiny few changes did result in slow-downs to my workflow, such as a sub-menu on commit. It's not ideal, but presumably one gets used to it.

    The only negative change seems to be that Magit is not quite as responsive as it used to be. It is hard to put a finger on it yet, but I was used to having pretty much zero wait time on all operations in Magit, and now it seems that a few things are no longer instantaneous. It will require some more analysis to properly point the finger, but it's a general feeling.

Conclusions

It's still early days, but the move to Emacs 24 and Prelude is already paying off. The transition has not been entirely straightforward, and it has certainly slowed things down for the moment - if nothing else, due to the keybinding changes! But one can already see that this is the future for most Emacs users, particularly those who, like myself, are not power-users but just like the editor.

The future is certainly bright for Emacs. And we haven't yet started covering the latest and greatest modes such as smart-mode-line. But that's a story for another blog post.

Created: 2015-05-27 Wed 00:18

Emacs 24.4.1 (Org mode 8.2.10)


Thursday, September 25, 2014

Nerd Food: Start-ups at the Gate: Trends in the Technology Industry


It is very difficult to convey the vast scale at which the largest Internet companies operate. To make matters worse, we are fast becoming immune to statistics such as one billion users and five trillion searches per day, surrounded as we are by a sea of large numbers on a daily basis. Having said that, any Information Technology (IT) professional worth his or her salt cannot help but feel in awe at what has been achieved. It is not just that these platforms are big; they work at a scale that is qualitatively different from anything that has come before. The sort of things that are possible at this scale are mind-boggling, and we have only begun to scratch the surface[1].

Perhaps even more revolutionary is the fact that these companies have made it possible for anyone to start thinking about data in the same way as they do, and to start handling it using the very same tools they use. There is now a never-ending archive of the very best large-scalability tools, all available for free, with code that anyone can inspect, modify and optimise to meet their specific requirements. The tools come with a wealth of practical documentation on how to put solutions together - either freely available or at low-cost - and with a number of passionate user communities that provide expert advice and are eager to accept modifications.

The ecosystem they have created is truly staggering. As an example, Facebook has open sourced almost 10M lines of code to date. Twitter, Google and LinkedIn are not far behind[2]. It is also important to note that non-Internet companies such as Microsoft and IBM are making extremely large contributions too. All told, the overall pool of open source code is growing exponentially, as demonstrated by a 2008 study. In most cases, these are full-fledged products, tested in the most challenging production conditions imaginable. Of course, one must also not forget the contributions made to projects that are not under company control, such as the Linux kernel, the Apache web server and the GNU compiler, GCC.

In order to understand why modern start-ups provide such a compelling financial case, one must first understand how we got to the amazing technology landscape we have today. To do so, we shall divide recent technology history into eras, and explain each era's contribution. We will then focus on modern start-ups, and explain how this model can be deployed to a large gamut of industries and in particular to the financial sector.

First Era: Dot-com Bubble

Silicon Valley was and still is the world's start-up factory so, unsurprisingly, it was ground zero for the start-up revolution that took place at the end of the nineties. It would eventually be known as The Dot-com Bubble. Most people remember those days as a heady time, where each and every idea was packaged as a website and sold for millions or in some cases billions of dollars. Of course, we all had a steep price to pay when the bubble burst - an extinction event that decimated the young Internet sector and IT companies in general.

There is however another way to look at this bubble: it was a gigantic experiment to determine whether there were successful business models to be found in the large scale of the Internet. Whilst much mal-investment occurred, the bubble still produced or pushed forward several of the giants of today such as Google, Amazon and Yahoo.

Most of these companies share a similar technology story. Originally faced with a dearth of investment but with bright young engineers, they found themselves relying on Free and Open Source Software (FOSS) and cheap, off-the-shelf hardware. Once they became big enough, it just didn't make sense to replace all of that infrastructure with software and hardware supplied by commercial vendors.

This turn of events was crucial. If these companies had had larger budgets and less skilled engineers, they would have relied on the cutting-edge technology of the time. The short-term gain would have revealed itself as long-term pain, for their ability to scale would have been inevitably restricted. In addition, many of the business models wouldn't have worked under this cost structure[3]. As it was, since they couldn't even afford the relatively cheap licences of commercial software, they had to make do with what was available for free.

The engineers in these companies - and many others that didn't make it through the dot-com filter - spent countless hours improving FOSS tools and gave many of these improvements back to communities such as Linux, MySQL, Apache, GCC and so on. However, they kept private the plumbing work done to manage the large clusters of cheap machines, as well as the domain-related technology - in industry-speak, the Secret Sauce.

By the time the dot-com bubble had run its course and the dust settled, the landscape looked as follows:

  • A model had been created whereby a small number of engineers could bootstrap an Internet-based company at very low cost, serving a small number of users initially.
  • The model had been stretched to very large numbers of users and had been found to scale extremely well; as the business proved itself and investment came in, it was possible to increase the size of the computing infrastructure to cope with demand.
  • Because of the open nature of the technologies involved, the ideas became widespread over the internet.

The basic high-scalability FOSS stack - ready for start-ups - was born; the Data Centre, where large amounts of computing are available at low cost, soon followed. It would eventually morph into the Cloud.

Second Era: Social Media

The bursting of the dot-com bubble did not dampen the entrepreneurial spirits, but it did dry up all the easily available capital and thus pushed the aspiring start-ups to be ever more frugal. In addition, VCs started to look for better ways to evaluate prospects. The problem they faced was no different from what they had faced during the dot-com days: how to figure out the potential of a company with no defined business model and nothing else to compare it against.

Google had proved comprehensively that the traditional valuation methods did not make sense in the world of start-ups. After all, here was a company whose founders couldn't sell it for 1M USD and yet a few years later it was generating billions of dollars in revenue. Very few saw this coming. VCs were keen not to make the same mistake with the next Google4.

So it was that a system to determine potential by proxy emerged over the years, using indicators such as the size of the user base, time spent by users on the platform and so on - effectively, any attribute that was deemed to have given a competitive advantage to Google and other successful dot-com companies.

In this environment, social media start-ups took centre stage. Following on from the examples of their predecessors, these companies took for granted that they were to operate on very large data sets. They inherited a very good set of scalable tools, but found that much still had to be built on top. Unlike their predecessors, many chose to do some or all of the infrastructure work out in the open, joining or creating new communities around the tools. This was in no small part due to the scarcity of funds, which encouraged collaboration.

The social media start-ups soon found themselves locked in an arms race for size, where the biggest would be the winner and all others would be doomed to irrelevance5. The size of the user base of the successful companies exploded6, and the tooling required to manage such incredibly large volumes of data had to improve at the same pace or faster. Interestingly, these start-ups continued to view in-house code largely as a cost, not an asset, even after they started to bring in large revenue. The size of the Secret Sauce was to be kept to a minimum and the pace of open sourcing accelerated over time7.

A final factor was the rise of the next iteration of the data centre, popularised by Amazon with AWS and EC2. It allowed any company to scale out without ever having to concern itself with physical hardware. This was revolutionary because it allowed razor-thin costs for scalability:

  • Pay only for what you use: the elastic nature of EC2 meant that one could grow or shrink one's cluster based on real-time traffic demands and availability of capital.
  • Zero-cost software: FOSS was available in Amazon from the very beginning and was extremely popular with start-ups.
  • Fully automated environments via APIs: resource constrained start-ups could now start to automate all aspects of the product life-cycle. This meant they could release faster, which in turn allowed them to fight more effectively for their user base. This would in time become the DevOps movement.
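The pay-per-use economics above can be made concrete with a toy calculation (all figures hypothetical, purely for illustration): a fixed cluster must be provisioned for peak demand around the clock, whereas an elastic cluster only pays for the servers actually used each hour.

```python
# Toy cost comparison between fixed and elastic provisioning.
# All figures are hypothetical and for illustration only.
hourly_demand = [2, 2, 3, 5, 20, 4, 3, 2]  # servers needed in each hour
price_per_server_hour = 0.10               # hypothetical USD rate

# Fixed provisioning: pay for peak capacity for every hour of the period.
fixed_cost = max(hourly_demand) * len(hourly_demand) * price_per_server_hour

# Elastic provisioning: pay only for the servers actually used each hour.
elastic_cost = sum(hourly_demand) * price_per_server_hour

print(f"fixed: {fixed_cost:.2f} USD, elastic: {elastic_cost:.2f} USD")
```

Even in this tiny example the elastic cluster costs a fraction of the fixed one, and the gap widens as traffic becomes spikier - precisely the profile of a start-up fighting for its user base.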

By the end of the decade, the scalability tooling was largely complete. It was now possible for a small start-up to create a small website and to see it scale from hundreds to millions, restricted only by their ability to bring in capital.

Third Era: Mobile

Mobile phones have been growing at close to an exponential rate for over two decades. However, the rise of the smart phone was a game changer, and the line in the sand was drawn with the release of the iPhone. What makes mobile so important to our story is its penetration. Until smart phones became ubiquitous, there was a large segment of the population that was either totally inaccessible or accessible only for limited periods of time. With increasingly large numbers of people carrying smart phones as they go about their day, many use cases that were never before thought possible came to the table. So whilst we call this "the Mobile era", the true heroes are smart phones and, to a smaller extent, tablets.

The mobile era started with simple apps. Smart phones were still new and applications for each platform were a novelty. There was a need to reinvent all that existed before in the world of PCs and adapt it to the new form factor. It was during this phase that the economies of scale of mobile phones became obvious. Whereas consumer PC software had prices in the range of tens to hundreds of dollars, mobile phones bootstrapped a completely different pricing model, with many apps selling for less than one dollar. Volume made up for the loss in revenue per unit. The model was so incredibly successful that a vibrant environment of apps sprung up around each of the successful platforms, carefully nurtured by the companies running the show via their app stores.

Soon enough the more complex apps came about. Companies like Foursquare and WhatsApp were trailblazers in the mobile space, merging it with ideas from social media. Many others like Spotify took their wares from the stagnant PC environment and moved to the ever growing mobile space. Complex apps differed from the simple apps in that they required large backends to manage operations. Since these companies were cash strapped - a perennial condition of all start-ups - they found themselves reusing all of the technology developed by the social media companies and became part of the exact same landscape. Of course, the social media companies were eventually forced to jump on the mobile bandwagon - lest they be crushed by it.

So it was that the circle was closed between the three eras.

Evolutionary Pressures and Auto-Catalytic Processes

The changes just described are so revolutionary that one cannot help but look for models to approximate some kind of explanation for what took place. Two stand out. The first is to imagine the population of start-up companies as a small segment of the overall company population that was subjected to an unbelievably harsh fitness function: to grow data volumes exponentially while growing costs less than linearly. This filter generated new kinds of companies, new kinds of technologies and new kinds of ways of managing technology.

Secondly, there is the auto-catalytic nature of the processes that shaped the current technology landscape. Exponential growth tends to have at its root this kind of self-reinforcing cycle, whereby improvements in an area A trigger improvements in another area B, which in turn forces A to improve. The process keeps on repeating itself whilst it manages to retain stability.
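The A-and-B cycle just described can be sketched as a toy linear model (the coupling constants and starting values are arbitrary, purely illustrative - this is not something from the article):

```python
# Toy auto-catalytic loop: improvements in A feed B and vice versa.
# With any positive coupling, both quantities grow geometrically.
def simulate(steps=50, k_ab=0.1, k_ba=0.1):
    a, b = 1.0, 1.0
    history = [(a, b)]
    for _ in range(steps):
        # Both updates use the previous step's values (simultaneous update,
        # since the right-hand side is evaluated before assignment).
        a, b = a + k_ab * b, b + k_ba * a
        history.append((a, b))
    return history

history = simulate()
```

With the symmetric starting state used here, each step multiplies both quantities by a constant factor (1.1), so after 50 steps A and B have grown by roughly two orders of magnitude - the signature of the self-reinforcing exponential growth described above.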

It is this relationship we currently have between start-ups and FOSS: the better the software gets, the cheaper it is to create new start-ups and the faster these can grow with the same amount of capital. By the same token, the more start-ups rely on FOSS, the more they find themselves contributing back or else risk falling behind - both technologically and cost-wise. This feedback loop is an emergent property of the entire system and it has become extremely pronounced over time.

Finance and the Age of Disruption

The concept of disruption was developed in the nineties by Clayton Christensen in The Innovator's Dilemma. This book has seen a resurgence in popularity as well as in criticism8. For good or bad, the ideas in this book became the intellectual underpinnings of a new generation of start-ups.

They seek to combine all of the advances of the previous start-ups to create solutions to problems far outside the traditional IT realm. Examples are the hotel industry (AirBnB), the taxi industry (Uber, Lyft) and even the banking industry (Simple). Whilst it's still early days, and whilst there have been many teething problems such as issues with regulation, the direction of travel is already clear: there will be more and more start-ups following the disruptive route.

What makes these companies a compelling proposition to VCs is that they are willing to take on established concerns, with cost structures that are orders of magnitude larger than those of these start-ups. Their thinking is two-fold: the established companies are leaving a lot of money on the table, consumed by their inefficiency; and they are not exploiting the opportunities to their full potential because they do not understand how to operate at a vast scale.

It is in this context that the finance scene comes into the picture - as part of the expansionary push of the disruption movement. VCs have long eyed the financial industry enviously because they believe that the problems being solved in trading are not that dissimilar to those faced by many large-scale start-ups. And yet the rewards in Finance are disproportionately large when compared with, say, social media.

Fintech soon emerged. As applied to start-ups, Fintech is the umbrella name given to the ecosystem of start-ups and VCs that focus specifically on financial technology. This ecosystem has grown from 930M USD in 2008 to around 3Bn in 2013 according to Accenture. Centred mainly in London, but with smaller offshoots in other financial centres, the Fintech scene is starting to attract established players in the world of Finance. For instance, Barclays has joined the fray by creating an incubator. They farmed the work out to a third party (Tech Stars) but allowed all the start-ups in the programme to have unprecedented access to their Mobile APIs. Their target is to own the next generation of financial applications on Mobile devices.

Whilst Barclays is disrupting from the outside, it is obvious that the investment banking legacy platforms are a fertile ground for start-ups. This is where the scalability stack has a near-perfect fit. A typical example is OpenGamma. The start-up designed an open source risk platform, initially focused on back office use. They have received over 20M USD in funding as of 2014 and have already been the recipient of several of the industry's awards. There are now several open source trading platforms to choose from including TradeLink and OpenGamma, as well as the popular quantitative analytics library QuantLib.

As we have seen in the previous sections, there is an auto-catalytic process at play here. Once source code becomes widely available, the cost of creating the next financial start-up goes down dramatically because they can reuse the tools. This in turn means many more start-ups will emerge, thus improving the general quality of the publicly available source code.

Conclusions

The objective of this article was to provide a quick survey of the impact of start-up companies in the technology landscape, and how these relate to finance. We now turn our attention to the logical conclusions of these developments.

  • Finance will increasingly be the target of VCs and start-ups: The Fintech expansion is to continue over the coming years and it will affect everyone involved in the industry, particularly the established participants. More companies will take the route of Barclays, trying to be part of the revolution rather than dethroned by it.
  • Banks and other established companies will begin to acquire start-ups: Related to the previous item in some ways; but also with a twist. As part of the Deloitte TMT predictions event, Greg Rogers - the manager of Barclays Accelerator - stated that the acquisition of non-financial start-ups by banks was on the cards. He was speaking about Facebook's acquisition of WhatsApp for 19Bn USD, one of the largest of the year. As Google and Facebook begin integrating payments into their social platforms, banking firms will find their traditional business models under attack and will have no option but to retaliate.
  • Finance will turn increasingly to FOSS: The cost structure that finance firms had up to 2008 is not suitable for the post-2008 world. At present, the volume of regulatory work is allowing these cost structures to persist (and in some cases increase). However, eventually banks will have to face reality and dramatically reduce their costs, in line with the new kind of revenues they are expected to make in a highly-regulated financial world. There will be a dramatic shift away from proprietary technologies of traditional vendors, unless these become much more competitive against their fierce FOSS rivals.
  • A FOSS financial stack will emerge over the next five years: Directly related to the previous point, but taking it further. Just as it was with social media companies, so it seems likely that financial firms will eventually realise that they cannot afford to maintain all the infrastructure code. Once an investment bank takes the leap and starts relying on FOSS for trading or back-office, the change will ripple through the industry. The state of the FOSS code is production ready, and a number of hedge funds are already using it in anger. All that is required is for the cost structure to be squeezed even further in the investment banking sector.

Footnotes:

1 As one of many examples, see Google Flu Trends. It is a predictor of outbreaks of the flu virus, with a prediction rate of about 97%. For a more comprehensive - if somewhat popular - take on the possibilities of large data sets, see Big Data: A Revolution That Will Transform How We Live, Work and Think. For a very different take - highlighting the dangers of Big Data - see Taleb's views on the ever-decreasing signal-to-noise ratio: The Noise Bottleneck or How Noise Explodes Faster than Data.

2 In fact, by some measures, Google has contributed several times that amount. For one such take, see Lauren Orsini's article.

3 As an example, it was common practice for vendors to charge according to the number of processors, users and so on. Many of the better funded start-ups made use of technology from Cisco, Sun, Oracle and other large commercial vendors, but companies that did so are not very well represented in the population that survived the dot-com bubble, and they are not represented at all in the 2014 Fortune 500 list. Google, Amazon and eBay are the only Fortune 500 companies from that crop and they all relied to a very large extent on in-house technology. Note though that we are making an empirical argument here rather than a statistical one, both due to the lack of data available, as well as concern for Survivorship Bias.

4 For one of many takes on the attempt to sell Google, see When Google Wanted To Sell To Excite For Under $1 Million — And They Passed. To get a flavour of how poorly understood Google's future was as late as 2000, see Google Senses That It's Time to Grow Up. Finally, the success story is best told by the growth of revenues between 2001 and 2003 - see Google's 2003 Financial Tables.

5 Twitter, Facebook, YouTube, LinkedIn and the like were the victors, but for every victor, a worthwhile foe was defeated; MySpace, Hi5, Orkut and many others were all very popular at one time but lost the war and faded into obscurity.

6 As an example, the number of Facebook users grew at an exponential rate between 2004 and 2013 - see Facebook: 10 years of social networking, in numbers.

7 A possible explanation for this decision is the need for continuous scalability. Even companies as large as Facebook or Google cannot dedicate the resources required to adequately maintain every single tool they own; their code bases are just too large. At the same time, they cannot afford for code to become stale because it must continually withstand brutal scalability challenges. The solution to this conundrum was to open source aggressively and to create vibrant communities around tooling. Converting themselves to stewards of the tools, they could now place quasi-skeleton crews to give direction to development, and then rely on the swarms of new start-ups to contribute patches. Once there are enough improvements, the latest version of these tools can be incorporated into the internal infrastructure. This proved to be a very cost-effective strategy, even for large companies, and allowed continued investment across the technology stack.

8 There are quite a few to choose from but Lepore's is one of the best because it robustly attacks both the ideology and the quality of the data.

Date: 2014-09-25 21:37:47 BST
