Friday, June 17, 2016

Nerd Food: Interesting...

Nerd Food: Interesting…

Time to flush all those tabs again. Some interesting stuff I bumped into recently-ish.

Finance, Economics, Politics

  • Understanding Growth, part 1: looks very promising although I've only started parsing it. Also pointed me to - Tomas Sedlacek and the Economics of Good and Evil. Bought the book, but still reading it. Seems very thoughtful.
  • Here’s How Electric Cars Will Cause the Next Oil Crisis: Extremely interesting take on the relationship between electric cars and the oil price. Its along the lines of articles posted in the past, to be fair, but still. Basically, it won't take a huge number of sales of electric cars to start knocking down the oil price. And with Model 3 coming out, this all seems quite ominous to the oil producing countries. Here we go again, Angola.
  • Red Hat becomes first $2b open-source company: I may not use their wares any more but RedHat will always be one of my favourite companies. Really happy to see they are growing nicely and hopefully continuing all of their incredible investment on Linux.
  • The Amazon Tax: Really, really good article about Amazon and their strategy. If you read only one, read this. Amazon is amazing - and its dominance is very worrying because they are so good at executing! See also Bezos letter.
  • It’s a Tesla: Great article about Tesla. Some of the usual Fanboyism we all know and love, of course, but still a lot of very good points. The core of the article is a interesting comparison between Tesla and Apple. By the by, not at all convinced about that dashboard and the launch ceremony itself was a bit sparse too! But, Model 3 looks great. I'm officially a Stratechery fanboy now.
  • Google’s Alphabet Transition Has Been Tougher Than A-B-C: Great article on the pains of moving to a single monolithic structure to something more distributed. In truth, what would one expect with such a seismic change? And, also, how come it took Google so long to make this shift? After all, programmers are supposedly taught how important separation of concerns is. The other very interesting point is the CED difficulties. These guys were able founders (at least able enough to get bought out by Google) but seem to fail badly at the CEO'ing malarky.

Startups et al.

General Coding

  • Water treatment plant hacked, chemical mix changed for tap supplies: this is a tad worrying. Can you imagine the amount of systems out there with vulnerabilities, etc - many of which are connected to the internet.
  • On the Impending Crypto Monoculture: Talking about security, very worrying news from the crypto front. It seems our foundations are much less solid than expected - and after all the OpenSSL bugs, this is a surprising statement indeed. Very interesting email on the subject. The LWN article is a must read too.
  • Neural Networks Demystified - Part 1: Data and Architecture: just started browsing this in my spare time, but it looks very promising. For the layperson.
  • Microsoft deletes 'teen girl' AI after it became a Hitler-loving sex robot within 24 hours: friggin' hilarious in a funny-not-funny sort of way. This tweet said it best: "Tay" went from "humans are super cool" to full nazi in <24 hrs and I'm not at all concerned about the future of AI. – Gerry
  • Abandoning Gitflow and GitHub in favour of Gerrit: I've always wanted to know more about Gerrit but never seem to find the time. The article explains it to my required extent, contrasting it with the model I'm more familiar with - GitHub, forks and pull requests. I must say, still not convinced about Gerrit, but having said that, it seems there is definitely scope for some kind of hybrid between the two. A lot of the issues they mention in the article are definitely pain points for GitHub users.
  • Introducing DGit: OK this one is a puzzling post, from our friends at GitHub engineering. I'm not sure I get it at all, but seems amazing. Basically, they talk about all the hard work they've made to make git distributed. Fine, I'm jesting - but not totally. The part that leaves no doubts is that GitHub as a whole is a lot more reliable after this work and can handle a lot more traffic - without increasing its hardware requirements. Amazing stuff.

Databases

C++

  • Compiler Bugs Found When Porting Chromium to VC++ 2015: great tales form the frontline. Also good to hear that MS is really responsive to bug reports. Can't wait to be able to build my C++ 14 code on Windows…
  • EasyLambda: C++ 14 library for data processing. Based on MPI though. Still, seems like an interesting find.

Layperson Science

Other

Created: 2016-06-17 Fri 10:56

Emacs 24.5.1 (Org mode 8.2.10)

Validate

Thursday, June 16, 2016

Nerd Food: The Strange Case of the Undefined References

Nerd Food: The Strange Case of the Undefined References

As a kid, I loved reading Sherlock Holmes and Poirot novels. Each book got me completely spellbound, totally immersed and pretty much unable to do anything else until I finally found out whodunnit. Somehow, the culprits were never the characters I suspected of. Debugging and troubleshooting difficult software engineering problems is a lot like the plot of a crime novel: in both cases you are trying to form a mental picture of something that happened, with very incomplete information - the clues; in both cases, experience and attention to detail is crucial, with many a wrong path taken before the final eureka moment; and, in both cases too, there is this overwhelming sense of urgency in figuring out whodunnit. Of course, unlike a crime novel, we'd all prefer not having to deal with these kinds of "interesting" issues, but you don't choose the problems - they choose you.

I recently had to deal with one such problem, which annoyed me to no end until I finally fixed it. It was so annoying I decided it was worth blogging about - if nothing else, it may save other people from the same level of pain and misery.

A bit of context for those that are new here. Dogen is a pet project that I've been maintaining for a few years now. Like many other C++ projects, it relies on the foundational Boost libraries. To be fair, we rely on other stuff as well - libraries such as LibXML2 and so on - but Boost is our core C++ dependency and the only one where latest is greatest, so it tends to cause us the most problems. I've covered my past woes in terms of dependency management and how happy I was to find Conan. And so it was that life was bliss for a number of builds, until one day…

It All Started With a Warning

It was a rainy day and I must have been bored because I noticed a rather innocuous-looking warning on my Travis build, related to Conan:

CMake Warning (dev) in build/output/conanbuildinfo.cmake:
  Syntax Warning in cmake code at
    /home/travis/build/DomainDrivenConsulting/dogen/build/output/conanbuildinfo.cmake:142:88
  Argument not separated from preceding token by whitespace.
Call Stack (most recent call first):
  CMakeLists.txt:30 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

Little did I know that this simple discovery would lead to a sequence of troublesome events and to many a broken build. I decided to report the problem to the Conan developers who, with their usual promptness, rolled up their sleeves, quickly bounced ideas back and forth and then did a sterling job in spinning fixes until we got to the bottom of the issue. Some of the fixes were to Conan itself, whereas some others were related to rebuilding Boost. In the heat of the investigation, I bumped into some very troubling - and apparently unrelated - linking errors:

/home/travis/.conan/data/Boost/1.60.0/lasote/stable/package/ebdc9c0c0164b54c29125127c75297f6607946c5/lib/libboost_log.so: undefined reference to `std::invalid_argument::invalid_argument(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)@GLIBCXX_3.4.21'
/home/travis/.conan/data/Boost/1.60.0/lasote/stable/package/ebdc9c0c0164b54c29125127c75297f6607946c5/lib/libboost_log.so: undefined reference to `std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::find(char const*, unsigned long, unsigned long) const@GLIBCXX_3.4.21'

The build was littered with errors such as these. But the most puzzling thing was that I had changed nothing of consequence on my side and the Conan guys changed very little at their end too! What on earth was going on?

After quite a lot of thinking, Conan's memsharded came up a startling conclusion: we've been hit by one of those rare-but-dreadful ABI-transitions! His comment is worth reading in full, but the crux of his findings is as follows (copied verbatim):

  • Boost packages, generated with travis use docker to manage different versions of gcc, as gcc 5.2 or gcc 5.3
  • Those docker images are using modern linux distros, e.g. > Ubuntu 15.10
  • By default, new modern linux distros have switched to the gcc > 5.1 new C++11 ABI, that is libstdc++ is built with gcc > 5.1, usually named libcxx11, as well as the rest of the system. The libcxx11 ABI is incompatible with the old gcc < 5.1 libcxx98 ABI.
  • Building in such environment links with the new libcxx11 by default.
  • Now, we move to our user, package consumer environment, which could be an Ubuntu 14.04, or a travis VM (12.04). Those distros use a libcxx98 libstdc++, as a lot of programs of those distros depends on the old libcxx98 ABI. It is not simple to replace it for the new one, requiring to rebuild or reinstall large part of the system and applications. Maybe it could be installed for dev only, and specified in the build, but I have not been able yet.

Reading the above may have given you that sad, sinking feeling: "what on earth is he on about, I just want to compile my code!", "Why oh why is C++ so damn complicated!" and so forth. So, for the benefit of those not in the know, let me try to provide the required background to fully grok memsharded's comment.

What's this ABI Malarkey Again?

This topic may sound oddly familiar to the faithful reader of Nerd Food and with good reason: we did cover ABIs in the distant past, at a slightly lower level. The post in question was On MinGW, Cygwin and Wine and it does provide some useful context to this discussion, but, if you want a TL;DR, it basically dealt with kernel space and user space and with things such as the C library. This time round we will turn our attention to the C++ Standard Library.

In addition to specifying the C++ language, the C++ Standard also defines the API of the C++ Standard Library - the classes and their methods, the functions and so on. The C++ Standard Library is responsible for providing a set of services for applications compiled with a C++ compiler. So far, so similar to the C Standard Library. Where things begin to differ is in the crucial matter of the ABI. But first, lets get a working definition for ABI, just so we are all on the same page. For this, we can do worse than using Linux System Programming:

Whereas an API defines a source interface, an ABI defines the low-level binary interface between two or more pieces of software on a particular architecture. It defines how an application interacts with itself, how an application interacts with the kernel, and how an application interacts with libraries. An ABI ensures binary compatibility, guaranteeing that a piece of object code will function on any system with the same ABI, without requiring recompilation.

ABIs are concerned with issues such as calling conventions, byte ordering, register use, system call invocation, linking, library behavior, and the binary object format. The calling convention, for example, defines how functions are invoked, how arguments are passed to functions, which registers are preserved and which are mangled, and how the caller retrieves the return value.

The second paragraph is especially crucial. You see, although both the C and the C++ Standards are somewhat silent on the matter of specifying an ABI, C tends to have a de facto standard for a given OS on a given architecture. This may not sound like much and you may be saying: "what, wait: the same OS on a different architecture has a different ABI?" Yep, that is indeed the case. If you think about it, it makes perfect sense; after all, C was carefully designed to be equivalent to "portable assembler"; in order to achieve maximum performance, one must not create artificial layers of indirection on top of the hardware but instead expose it as is. So, by the same token, two different C compilers working on the same architecture and OS will tend to agree on the ABI. The reason why is because the OS will also follow the hardware where it must, for performance reasons; and where the OS can make choices, it more or less makes the choice for everybody else. For example, until recently, if you were on Windows, it did you no good to compile code into an ELF binary because the law of the land was PE. Things have now changed dramatically, but the general point remains: the OS and the hardware rule.

C++ inherits much of C's approach to efficiency, so at first blush you may be fooled into thinking it too would have a de facto ABI standard ("for a given OS, " etc. etc.). However, there are a few crucial differences that have grave consequences. Let me point out a few:

  • C++'s support for genericity - such as function overloading, templates, etc - is implemented by using name mangling; however, each compiler tends to have their own mangling scheme.
  • implementation details such as the memory layout of objects in the C++ Standard Library - in particular, as we shall see, std::string - are important.

In the past, compiler vendors tended exacerbate differences such as these; as it was with the UNIX wars, so too during the "C++ wars" did it make sense to be as incompatible as possible in the never ending hunt for monetisation. Thus, ABI specifications were kept internal and were closely guarded secrets. But since then the world has changed. To a large extent, C++ lost the huge amounts of funding it once had during the nineties and part of the naughties, and many vendors either went under or greatly reduced their efforts in this space. Two compilers emerged as victors: MSVC on the Windows platform and - once the dust of the EGCS fork finally settled - GCC everywhere else. The excellent quality of GCC across a vast array of platforms and its strict standards adherence - coupled with a quick response to the standardisation efforts - resulted in total domination outside of Windows. So much so that only recently did it meet a true challenger in Clang. The brave new world in which we now find ourselves in is one where C++ ABI standardisation is a real possibility - see Defining a Portable C++ ABI.

But pray forgive the old hand, I digress again. The main point is that, for a given OS on a given architecture, you normally had to compile all your code with a single compiler; if you did that, you were good to go. Granted, GCC never made any official promises to keep its releases ABI-compatible, but in practice we came to rely on the fact that new and old releases interoperated just fine since the days of 3.x. And so did Clang, respecting GCC's ABI so carefully it made us think of them as one happy family. Then, C++-11 arrived.

Mixing and Matching

As described in GCC5 and the C++11 ABI, this pleasant state of affairs was too idyllic to last forever:

[…] [S]ome new complexity requirements in the C++11 standard require ABI changes to several standard library classes to satisfy, most notably to std::basic_string and std::list. And since std::basic_string is used widely, much of the standard library is affected.

On hindsight, the improvements in the std::string implementation are great; as a grasshopper, I recall spending hours on end debugging my code in the long forgotten days of EGGS 2.91, only to find out there was a weird bug in the COW implementation for my architecture. That was the first time - and as it happens, the last time too - I found a library bug, and it made a strong impression on me, at that young age. These people were not infallible.

These days I sit much higher up in the C++ stack. Like many, I didn't read that carefully the GCC 5 release notes when it came out, relying as usual on my distro to do the right thing. And, as usual, the distros largely did, even though, unbeknown to many, a stir was happening in their world 1. But hey, who reads distro blogs, right? Hidden comfortably under my Debian Testing lean-to, I was blissfully unaware of this transition since my code continued to compile just fine. Also, where things start to get hairy is when you need to mix and match compiler versions and build settings - and who on their right mind does that, right?

As it happens, this is a situation in which modern C++ users of Travis may easily find themselves in, stuck as they are on either on Ubuntu 12.04 (2012) or Ubuntu 14.04 (2014). Nick Sarten's blog post rams the point home in inimitable fashion:

Hold on, did I say GCC 4.6? Clang 3.4? WHAT YEAR IS IT?

Yes, what year is it indeed. So it is that most of us rely on PPA's to bring the C++ environment on Travis up to date, such as the Ubuntu Toolchain:

sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test

This always seemed like an innocent thing to do but after my linking errors and memsharded discoveries, one suddenly started to question everything: what settings did the PPA use to build? What settings were used to build the Boost Conan packages? With what compiler? In what distro? The nightmare was endless. It was clear this was going to lead to tears before bedtime.

The Long Road to a Solution

Whilst memsharded honed into the problem pretty quickly - less than a couple of weeks - a complete solution to my woes was a lot more elusive. In truth, this is the kind of situation where you need long spells of concentrated effort, so working in your copious spare time does not help at all. I first tried the easiest approach: to pray that it would all go away by itself, given enough time. And, lo and behold, things did work again, for a little while! And then started to fail again; the Boost package in Conan got rebuilt and the build broke. And that way it stayed.

Once waiting was no longer an option, I had to take it seriously and started investigating in earnest. Trouble is, when you lose trust in the compilation settings you then need to methodically validate absolutely everything, until you bottom out the problem. And that takes time. Many things were tried, including:

  • rebuilding Boost locally, attempting to reproduce the issue - to no avail.
  • rebuilding the Conan Boost packages with the old ABI; a fail (#12).
  • reading up a variety of articles on the subject, most of them linked in this post.
  • building the Boost packages locally and exporting them into Travis using DropBox's public folders. Another fail, but DropBox was a win.
  • obtaining the exact same Ubuntu 14.04 image as Travis is using, use the compiler from the PPA and export Boost to Travis using DropBox and replicating the problem locally in a VM. This worked.

Predictably, the final step is the one I should have tried first, but one is always lazy. Still, all of this got me wondering why had things been so complicated. Normally one would be able to ldd or nm -C the binary and figure out the dependencies, but in this case I seemed to always be pointing to libstdc++.so.6 regardless. Most puzzling. And then I found the Debian wiki page on GCC5, which states:

The good news is, that GCC 5 now provides a stable libcxx11 ABI, and stable support for C++11 (GCC version before 5 called this supported experimental). This required some changes in the libstdc++ ABI, and now libstdc++6 provides a dual ABI, the classic libcxx98 ABI, and the new libcxx11 (GCC 5 (<< 5.1.1-20) only provides the classic libcxx98 ABI). The bad news is that the (experimental) C++11 support in the classic libcxx98 ABI and the new stable libcxx11 ABIs are not compatible, and upstream doesn't provide an upgrade path except for rebuilding. Note that even in the past there were incompatibilities between g++ versions, but not as fundamental ones as found in the g++-5 update to stable C++11 support.

Using different libstdc++ ABIs in the same object or in the same library is allowed, as long as you don't try to pass std::list to something expecting std::__cxx11::list or vice versa. We should rebuild everything with g++-5 (once it is the default). Using g++-4.9 as a fallback won't be possible in many cases.

libstdc++ (>= 5.1.1-20) doesn't change the soname, provides a dual ABI. Existing C++98 binary packages will continue to work. Building these packages using g++-5 is expected to work after build failures are fixed.

The crux is, of course, all the stuff about a dual ABI. I had never bumped into the dual ABI beast before, and now that I did I'm not sure I am entirely pleased. It's probably great when it just works, but it's tricky to troubleshoot when it doesn't: are you linking against a libstdc++ with dual ABI disabled/unsupported? Or is it some other error you've introduced? Personally, having a completely different SO name like memsharded had suggested seems like a less surprising approach - e.g. call it libcxx11 instead of libstdc++. But, as always, one has to play with the cards that were dealt so there is no point in complaining.

Conclusion

The Ubuntu 14.04 build of Boost did get us a green build again, but for all the joyous celebrations, there is still a grey cloud hovering above since the mop-up exercise is not completed. I now need to figure out how to build Boost with Conan on 14.04 and upload this version into the package manager's repo. However, for now carpe diem. After so much unproductive time, there is a real need for a few weeks (months!) of proper coding - the reason why I have a spare time project in the first place. But some lessons were learned.

Firstly, one cannot but feel truly annoyed at ${COSMIC_DEITY} for having to deal with issues such as this. After all, one of the reasons I prefer C++ to the languages I use at work (C# and Java) is that it is usually very transparent; normally I can very quickly reproduce, diagnose and fix a problem in my code. Of course, lord knows this statement is not true of all C++ code, but at least it tends to be valid for most Modern C++ - and over the last five years that's all the C++ I dealt with in anger. It was indeed rather irritating to find out that the pain has not yet been removed from the language, and on occasion, even experienced developers get bitten. Hard.

A second point worth of note is that in C++ - more so than in any other language - one cannot just blindly trust the package manager. There are just so many configuration knobs and buttons for that to be possible, and one can easily get bitten by assumptions. The sad truth is that even when using Conan, one should probably upload one's own packages built with a well understood configuration. True, this may cost time - but on the other hand, it will avoid wild goose chases such as this one.

Finally, its also important to note that this whole episode illustrates the sterling job that package maintainers do in distributions. Paradoxically, their work is often so good that we tend to be blissfully unaware of its importance. Articles such as Maintainers Matter take a heightened sense of urgency after an experience like this.

The road was narrow, long and troublesome. But, as with all Poirot novels, there is always that satisfying feeling of finally finding out whodunnit in the end.

Post Script

There is one final twist to this story, which adds insult to injury and further illustrates ${COSMIC_DEITY}'s sense of humour. When I finally attempted to restore our clang builds, I found out that LLVM has disabled their APT repo for an unspecified length of time:

> TL;DR: APT repo switched off due to excessive load / traffic

There are no alternatives at present to build with a recent clang. Sometimes one has the feeling that the universe does not want to play ball. Stiff upper lip and all that; mustn't grumble.

Footnotes:

1

For example, see The Case of GCC-5.1 and the Two C++ ABIs to understand Arch's pains.

Created: 2016-06-16 Thu 14:12

Emacs 24.5.1 (Org mode 8.2.10)

Validate