Marco Craveiro: 02/01/2016

Monday, February 08, 2016

Nerd Food: Interesting...

Nerd Food: Northwind, or Using Dogen with ODB - Part II

On Part I of this series, we got our Oracle Express database up and running against Debian Testing. It involved quite a bit of fiddling but we seemed to get there in the end. In Part II we shall now finish the configuration of the Oracle database and set up the application dependencies. On Part III we will finally get to the Dogen model, and start to make use of ODB.

What's in a Schema?

The first thing we need to do to our database is add the "application users". This is a common approach to most server side apps, where we tend to have "service users" that login to the database and act upon user requests on their behalf. We can then use audit tables to stamp the user actions so we can monitor them. We can also have application level permissions that stop users from doing silly things. This is of course a step up from the applications in the nineties, where one would have one database account for each user - allowing all sorts of weird and wonderful things such as users connecting directly to databases via ODBC and Excel or Access. I guess nowadays developers don't even know someone thought this to be a good idea at one point.

When I say "database user", most developers exposed to RDBMs immediately associate this to a user account. This is of course how most databases work, but obviously not so with Oracle. In Oracle, "users" and "schemas" are conflated, so much so it's hard to tell if there is any difference between them. For the purist RDBM user, a schema is a schema - a collection of tables and other database objects, effectively a namespace - and a user is a user - a person (real or otherwise) that owns database objects. In Oracle these two more or less map to the same concept. So when you create a user, you have created a schema and you can start adding tables to it; and when you refer to database objects, you prefix them by the user name just as you would if they belonged to a schema. And, of course, you can have users that have no database objects for themselves, but which were granted permission to access database objects from other users.

So our first task is to create two schemas; these are required by the Dogen model which we will use as our "application". They are:

basic
northwind

As I mentioned before, I had created some fairly basic tests for ODB support in Dogen. Those entities were placed in the aptly named schema basic. I then decided to extend the schema with something a bit more meaty, which is where northwind comes in.

For the oldest readers, especially those with a Microsoft background, Northwind is bound to conjure memories. Many of us learned Microsoft Access at some point in the nineties, and in those days the samples were pure gold. I was lucky enough to learn about relational databases in my high-school days, using Clipper and dBASE IV, so the transition to Microsoft Access was more of an exercise in mapping than learning proper. And that's where Northwind came in. It was a "large" database, with forms and queries and tables and all sorts of weird and wonderful things; every time you needed something done to your database you'd check first to see how Northwind had done it.

Now that we are much older, of course, we can see the flaws of Northwind and even call for its abolition. But you must remember that in the nineties there was no Internet for most of us - even dial-up was pretty rare where I was - and up-to-date IT books were almost as scarce, so samples were like gold dust. So for all of these historic reasons and as an homage to my olden days, I decided to implement the Northwind schema in Dogen and ODB; it may not cover all corner cases, but it is certainly a step up on my previous basic tests.

Enough about history and motivations. Returning to our SQLPlus from Part I, where we were logged in as SYSTEM, we start first by creating a table space and then the users which will make use of that table space:

SQL> create tablespace tbs_01 datafile 'tbs_f01.dbf' size 200M online;

Tablespace created.

SQL> create user basic identified by "PASSWORD" default tablespace tbs_01 quota 100M on tbs_01;
User created.

SQL> create user northwind identified by "PASSWORD" default tablespace tbs_01 quota 100M on tbs_01;

User created.

Remember to replace PASSWORD with your own passwords. This is of course a very simple setup; in the real world you would have to take great care setting the users and table spaces up, including thinking about temporary table spaces and so forth. But for our simplistic purposes this suffices. Now we need to grant these users a couple of useful privileges - again, for a real setup, you'd need quite a bit more:

SQL> GRANT create session TO basic;
GRANT create session TO basic;

Grant succeeded.

SQL> GRANT create table TO basic;
GRANT create table TO basic;

Grant succeeded.

SQL> GRANT create session TO northwind;
GRANT create session TO northwind;

Grant succeeded.

SQL> GRANT create table TO northwind;
GRANT create table TO northwind;

Grant succeeded.

If all went well, we should now be able to exit the SYSTEM session, start a new one with one of these users, and play with a test table:

$ sqlplus northwind@XE

SQL*Plus: Release 11.2.0.2.0 Production on Fri Feb 24 10:20:10 2017

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

SQL> create table test ( name varchar(10) );

Table created.

SQL> insert into test(name) values ('kianda');
insert into test(name) values ('kianda');

1 row created.

SQL> select * from test;

NAME
----------
kianda

SQL> grant select on test to basic;

Grant succeeded.

SQL> Disconnected from Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production
$ sqlplus basic@XE

SQL*Plus: Release 11.2.0.2.0 Production on Fri Feb 24 10:23:04 2017

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Enter password:

Connected to:
Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production

SQL> select * from northwind.test;

NAME
----------
kianda

This all looks quite promising. To recap, we logged in with user northwind, created a table, inserted some random data and selected it back; all looked ok. Then for good measure, we granted the rights to see this test table to user basic; logged in as that user and selected the test table, with the expected results.

At this point we consider our Oracle setup completed and we're ready to enter the application world.

Enter ODB

Setting up ODB is fairly easy, especially if you are on Debian: you can simply obtain it from apt-get or synaptic. The only slight snag is, I could not find the oracle dependencies (i.e. libodb-oracle). Likely this is because they depend on OCI, which is non-free, so Debian either does not bother to package it at all or you need some kind of special (non-free) repo for it. As it was, instead of losing myself on wild goose chases, I thought easier to build from source. And since I had to build one from source, might as well build all (or almost all) to demonstrate the whole process from scratch as it is pretty straightforward, really.

Before we proceed, one warning: when it comes to the libraries, best if you either use your package manager or build from source. You should probably only mix-and-match if you really know what you are doing; if you do and things get tangled up, it may take you a long while to figure out the source of your woes. Note also that this warning applies to the support libraries but not to the ODB compiler itself.

So, the manual approach. I first started by revisiting my previous notes on building ODB; as it happens, I had covered installing ODB from source previously here for version 2.2. However, those instructions have largely bit-rotted at the Dogen end and things have changed slightly since that post, so a revisit is worthwhile.

As usual, we start by grabbing all of the packages from the main ODB website:

odb 2.4.0-1 amd64.deb: the ODB compiler itself.
libodb-2.4.0: the main ODB library, required by all backends.
libodb-pgsql-2.4.0: the PostgreSQL backend. We don't need it today, of course, but since PostgreSQL is my DB of choice I always install it.
libodb-oracle-2.4.0: the Oracle backend. We will need this one.
libodb-boost-2.4.0: the ODB boost profile. This allows using boost types in your Dogen model and having ODB do the right thing in terms of ORM mapping. Our Northwind model does not use boost at present, but I intend to change it as soon as possible as this is a very important feature for customers.

Of course, if you are too lazy to click on links, just use wget:

$ mkdir odb
$ cd odb
$ wget http://www.codesynthesis.com/download/odb/2.4/odb_2.4.0-1_amd64.deb -O odb_2.4.0-1_amd64.deb
$ wget http://www.codesynthesis.com/download/odb/2.4/libodb-2.4.0.tar.gz -O libodb-2.4.0.tar.gz
$ wget http://www.codesynthesis.com/download/odb/2.4/libodb-pgsql-2.4.0.tar.gz -O libodb-pgsql-2.4.0.tar.gz
$ wget http://www.codesynthesis.com/download/odb/2.4/libodb-oracle-2.4.0.tar.gz -O libodb-oracle-2.4.0.tar.gz
$ wget http://www.codesynthesis.com/download/odb/2.4/libodb-boost-2.4.0.tar.gz -O libodb-boost-2.4.0.tar.gz

We start with the DEB, as simple as always:

# dpkg -i odb_2.4.0-1_amd64.deb
Selecting previously unselected package odb.
(Reading database ... 549841 files and directories currently installed.)
Preparing to unpack odb_2.4.0-1_amd64.deb ...
Unpacking odb (2.4.0-1) ...
Setting up odb (2.4.0-1) ...
Processing triggers for man-db (2.7.6.1-2) ...

I tend to store locally built software under my home directory, so that's where we'll place the libraries:

$ mkdir ~/local
$ tar -xaf libodb-2.4.0.tar.gz
$ cd libodb-2.4.0/
$ ./configure --prefix=/full/path/to/local
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-2.4.0'
$ make install
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-2.4.0'

Remember to replace /full/path/to/local with your installation directory. The process is similar for the other three packages, with one crucial difference: you need to ensure the environment variables are set to place all required dependencies in the include and link path. This is achieved via the venerable environment variables CPPFLAGS and LDFLAGS (and LD_LIBRARY_PATH as we shall see). You may bump into --with-libodb. However, be careful; the documentation states:

If these libraries are not installed and you would like to use their build directories instead, you can use the --with-libodb, and --with-boost configure options to specify their locations, for example:

./configure --with-boost=/tmp/boost

So if you did make install, you need the environment variables instead.

Without further ado, here are the shell commands. First boost; do note I am relying on the presence of Debian's system boost; if you have a local build of boost, which is not in the flags below, you will also need to add a path to it.

$ cd ..
$ tar -xaf libodb-boost-2.4.0.tar.gz
$ cd libodb-boost-2.4.0/
$ CPPFLAGS=-I/full/path/to/local/include LDFLAGS=-L/full/path/to/local/lib ./configure --prefix=/full/path/to/local
<snip>
config.status: executing libtool-rpath-patch commands
$ make -j5
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-boost-2.4.0'
$ make install
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-boost-2.4.0'

For PostgreSQL again I am relying on the header files installed in Debian. The commands are:

$ cd ..
$ tar -xaf libodb-pgsql-2.4.0.tar.gz
$ cd libodb-pgsql-2.4.0/
$ CPPFLAGS=-I/full/path/to/local/include LDFLAGS=-L/full/path/to/local/lib ./configure --prefix=/full/path/to/local
<snip>
config.status: executing libtool-rpath-patch commands
$ make -j5
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-pgsql-2.4.0'
$ make install
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-pgsql-2.4.0'

Finally, Oracle. For this we need to supply the locations of the downloaded drivers or else ODB will not find the Oracle header and libraries. If you recall from the previous post, they are located in /usr/include/oracle/12.1/client64 and /usr/lib/oracle/12.1/client64/lib, so we must augment the flags with those two paths. In addition, I found configure was failing with errors finding shared objects, so I added LD_LIBRARY_PATH for good measure. The end result was as follows:

$ cd ..
$ tar -xaf libodb-oracle-2.4.0.tar.gz
$ cd libodb-oracle-2.4.0
$ LD_LIBRARY_PATH=/usr/lib/oracle/12.1/client64/lib CPPFLAGS="-I/full/path/to/local/include -I/usr/include/oracle/12.1/client64" LDFLAGS="-L/full/path/to/local/lib -L/usr/lib/oracle/12.1/client64/lib" ./configure --prefix=/full/path/to/local
<snip>
config.status: executing libtool-rpath-patch commands
$ make -j5
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-oracle-2.4.0'
$ make install
<snip>
make[1]: Leaving directory '/path/to/build/directory/odb/2.4/libodb-oracle-2.4.0'

And there you are; all libraries built and installed into our local directory, ready to be used.

Conclusion

In this part we've configured the Oracle Express database with the application users, and we sanity checked the configuration. Once that was out of the way, we built and installed all of the ODB libraries required by application code.

On Part III we will finally start making use of this setup and attempt to connect to the Oracle database. Stay tuned!

Created: 2017-02-24 Fri 12:37

Emacs 25.1.1 (Org mode 8.2.10)

Validate

Nerd Food: Tooling in Computational Neuroscience - Part III: Data

In God we trust; all others must bring data. -- W. Edwards Deming

Welcome to yet another instalment in our series of posts about tooling in Computational Neuroscience. Previously, we have discussed simulators - a popular one, in particular - and microscopes. We shall now talk about data in Computational Neuroscience, a seemingly broad and somewhat mundane topic but one which is central to any attempt in understanding the status quo of the discipline. The target audience remains as it was - the lay person - but I'm afraid things are getting increasingly technical.

More Data! We Need More Data!

Computational Neuroscience by itself is not particularly interesting if there are no inputs to the models we carefully craft nor detailed outputs to allow us to know what the models are doing. Similarly, one needs to be able to use experimental data to inform our modeling choices and in order to baseline expectations; if this data is not available, one cannot tell how close or how far models are from the real thing. As everywhere else, data is of crucial importance here; we need lots of it and of many different kinds.

Once you need data, you soon need to worry about data representation: how should information be encoded? Clearly, in order for the data to be useful in a general sense, it must be accompanied by a formal or informal specification or else users will not know how to interpret it. Furthermore, given the highly technical nature of the data in question, the specification must be very precise or the data becomes useless or even dangerous; "Was that in microns or nanometres?" is not the sort of question you want to be asking. In a world where producers and consumers of data can be anywhere geographically, the specification assumes an ever larger degree of importance.

In summary, it is just not practical to allow everyone to come up with their own data formats:

writing a clear and concise specification for data interchange is hard work, and requires a lot of experience in both the domain and the specification process in general. The first attempts would probably prove to be incomplete, inconsistent or impractical.
writing code to read and write files according to a specification and in multiple programming languages is also demanding engineering work.
writing code to convert from one data specification to another is even more complicated because it requires intimate knowledge of both.
some data is generated directly by hardware, making it impractical to adapt to different requirements.

Another aspect worth highlighting is the "big data" nature of a lot of the data sets used in this field. Anything to do with the brain gets pretty complex pretty quickly, and this manifests itself in the data dimension by having ever larger data sets with greater levels of detail. On the plus side, thanks to Moore's Law sigmoid, detailed information at all levels is allowing us to answer questions that were unanswerable not so long ago. The flip side is that all those details come at a cost: the data sets are becoming huge. For example, the resolution of the data coming out of microscopy is now so high that a single data set can take as much as 500 TB. And of course, not only are individual data sets getting larger and larger, but we are able to generate more of them at an ever increasing pace because the processes are more streamlined. It is a fire-hose of data.

All of these difficulties are not unique to Computational Neuroscience or even to Neuroscience as a whole, but the complexity of the domain has the effect of greatly exacerbating an already thorny problem.

Neuroinformatics to the Rescue

If you think we're exaggerating then think again. The management of data in Neuroscience is so complex it is a field on its own right, with the cool-sounding name of Neuroinformatics. Wikipedia tells us that:

Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grain experimental data. Neuroinformaticians provide computational tools, mathematical models, and create interoperable databases for clinicians and research scientists.

In layman's terms, Neuroinformatics concerns itself with Neuroscience data and the places where said data is to be stored. It is also implied that one has to deal with a variety of types of data, e.g.: data from experiments (of which there can be many kinds), model inputs, model outputs, the models themselves when viewed as data, etc. The classification of this data is in itself a Neuroinformatics task. Finally, Neuroinformatics also is responsible for the tooling necessary to acquire the data, manipulate it, analyse it, visualise it and so on. Given such a broad definition, one is forced to conclude that there is a big overlap between Computational Neuroscience - the modeling activity - and Neuroinformatics - the management of the data required by it. This lack of clarity is common in science, particularly as new fields develop; take for example Mathematics and Computer Science at its inception.

In truth, such definitions and demarcations are only as useful as the tangible benefits they provide. It is perhaps more fruitful to think of Neuroinformatics as a hat you don on as and when your Computational Science work requires; the definition is there then to allow one to be aware of the separation between the analytic work in modeling and the data storage / retrieval work. For the purposes of this article, we'll continue to refer to the "Neuroinformatics Scientist" and the Computational Neuroscientist personas, but bear in mind they may resolve to the same person in practice.¹

Before we move on, I'd like to point out another interesting challenge Neuroinformatics has to address, and one that is common to all Medical Sciences: the need to handle human-derived data very carefully. After all, making data sets available widely must not have implications for the original patients, so its often a requirement that the data is de-identified; in the cases where the data is patient sensitive, additional requirements may be made to users of the data to avoid leaking this information, such as requiring a registration, etc. This illustrates the peculiar nature of Neuroinformatics, with the constant tension between making data as widely available as possible but at the same time having to ensure there are no side-effects of doing so. Presumably, Primum non nocere - first, do no harm.

Databases, Repositories and Archives

Thanks to the efforts of Neuroinformatics, there is now a wealth of Neuroscience data available to all on the Internet. The roots of this growth were sowed in the nineties when labs started sharing research results online. Sharing always existed in one way or another, of course, but the rise of the Internet simply changed the magnitude of the process. It soon became apparent that there was a need to organise central repositories of data, and to ensure the consistency of the shared data. Papers with a distinct Neuroinformatics tone were written, such as An on-line archive of reconstructed hippocampal neurons (1999). Repositories grew, multiplied, morphed and in many cases died, as these things do, and the evolutionary process left us with the survivors. I'd like to highlight some of the ones I have bumped into so far are (with descriptions in their own words):

ModelDB: "ModelDB provides an accessible location for storing and efficiently retrieving computational neuroscience models. ModelDB is tightly coupled with NeuronDB. Models can be coded in any language for any environment. Model code can be viewed before downloading and browsers can be set to auto-launch the models."
NeuronDB: "NeuronDB provides a dynamically searchable database of three types of neuronal properties: voltage gated conductances, neurotransmitter receptors, and neurotransmitter substances. It contains tools that provide for integration of these properties in a given type of neuron and compartment, and for comparison of properties across different types of neurons and compartments."
NeuroMorpho: "NeuroMorpho.Org is a centrally curated inventory of digitally reconstructed neurons associated with peer-reviewed publications. It contains contributions from over 100 laboratories worldwide and is continuously updated as new morphological reconstructions are collected, published, and shared. To date, NeuroMorpho.Org is the largest collection of publicly accessible 3D neuronal reconstructions and associated metadata."
Functional Connectomes Project: "Following the precedent of full unrestricted data sharing, which has become the norm in molecular genetics, the FCP entailed the aggregation and public release (via www.nitrc.org) of over 1200 resting state fMRI (R-fMRI) datasets collected from 33 sites around the world."
OpenfMRI: "[…] project dedicated to the free and open sharing of functional magnetic resonance imaging (fMRI) datasets, including raw data."
Open Source Brain: "resource for sharing and collaboratively developing computational models of neural systems."

As you can see from this small list - rather incomplete, I'm sure - there is a wealth of information out there, covering all sorts of aspects of the brain. We never had so much data as we do today. And, in many ways, this is fast becoming a problem. As an example, data from each of Neuroscience's plethora of divisions and sub-fields is not designed to talk to each other: Electron Microscopy (EM) data is disconnected from data obtained by Magnetic Resonance Imaging (MRI), which is also totally separate from connectome information² and so forth. In many cases, these sub-fields have evolved in fairly separate paths, and developed their own technical vocabulary in isolation and over long periods of time - an approach perfectly suitable for a "disconnected" world but less than ideal for a world where multiple sources of data are required to make sense of complex phenomena. If one can't even agree on what to call things, how can one be able to explain them?

Thus, the early Neuroinformatics approach is best described as "evolutionary". It is not as if someone sat down and generated a well defined set of file formats for data interchange, covering all different aspects of the areas under study. Instead, what has been emerging is a multitude of file formats in each sub-field, all calling out for attention, and all of them designed for the immediate goal at hand rather than the greater good of Neuroscience.

Taming the Sea of Data

From a Software Engineering perspective, an evolutionary approach makes perfect sense; after all, the Real Programmers had said: "first make it work, then make it right, and, finally, make it fast." In many ways, we are reaching the "make it right" phase, with an increasing interest in efforts towards the creation of broad standards. There have been several papers and initiatives on the subject, such as the Neuroscience Information Framework, or NIF, described in a paper: The Neuroscience Information Framework: A Data and Knowledge Environment for Neuroscience. The paper outlined a lot of the problems that are hampering research, such as:

the need for specialised search engines that are domain aware, and advanced query tools too;
the need to aid integration and to provide connectivity across related data and findings;
a requirement to provide new and enhanced forms of analysing existing data, as data reuse is extremely important - new insights can be obtained on already existing data, often long after the data was generated, and by using it in ways that were not at all envisioned by the original authors;
the need to make contribution to online repositories easier; lowering the "contribution barrier" is important to increase data availability but must be done in ways that do not compromise the quality of the data;
a requirement to make all code open source such that any lab can make use of it, and the community as a whole can share the maintenance load;
a need for an online repository for all tooling, to avoid reinventing the wheel;
the need to create a multi-domain standard vocabulary.

There are many worthwhile points in this paper, and it is highly recommended to anyone interested in the subject matter. For instance, the section discussing the design of the NIF also covers the requirements for any specification that wishes to solve the problems outlined above. They are worth highlighting as - in my humble and lay opinion - they are very well thought out.

The design of such a framework must combine technical specifications choices and broad community support; "open data, access and exchange, via open source and platform, aid Framework-enabled open discover for Neuroscience."
A common framework would reduce costs and enhance benefits of data sharing and knowledge sharing; it would "reduce the cost/benefit ration for data acquisition and utilization."
The framework must be designed by the broader community and with the needs of this broader community in mind, and it must build upon prior development in Neuroinformatics.
A focus on interoperability is crucial, and it is not a static target but one that must be looked after over time. In addition, there is also a need to keep in mind that different resources have very different interoperability potential. In order to maximise interoperability, we should aim to standardise as much as possible all aspects of the process such as user interfaces, terminologies, formats, etc.

To the untrained eye, the NIF initiative appears to be a great effort to solve fundamental problems in the field. It also seems to have spawned and/or helped popularise many useful and lasting resources such as NeuroMorpho. However, the impression one gets from the outside is that the NIF didn't quite fulfil all of its potential. Having said that, I am keenly looking for up-to-date documents that describe the current status across all of its many aspects - alas, I have not yet succeeded in finding any such document. If indeed it is the case that the initiative petered out, it did highlight a few potential problems for anyone working in this space:

large undertakings are hard to pull off; small, organic, incremental changes are easier to do, but of course, that is why we have the problems we currently have.
large initiatives require large amounts of funding; work is technical and very expensive.
it is not easy to understand NIFs deliverables from looking at their documentationa and website. One can clearly see it was an ambitious project, and one which took on the brunt of the problem areas highlighted above, but perhaps it needed a slightly more self-contained view of their achievements rather than a whole-or-nothing approach. This allows preserving some components even whilst others are failing to gain traction.

XML strikes back

Another interesting attempt to tackle these problems is what I call the "XML suite". These are basically a set of different XML-based standards that are able to interoperate and augment each other, a bit like a stack of building blocks. You can find more details in this paper: XML for Model Specification in Neuroscience. Some of the components of the XML Suite are (with descriptions on their own words, copied from the above paper and a link for more details):

LEMS: "the Low Entropy Model Specification […] is being developed to provide a compact, minimally redundant, human-readable, human-writable, declarative way of expressing models of biological systems. It differs from other systems such as CellML or SBML in its requirement to be human writable and the inclusion of basic physical concepts such as dimensionality and physical nesting as part of the language."
NeuroML: "supports the use of declarative model specifications for neuroscience modeling efforts at different scales, from intracellular mechanisms to networks of reconstructed neurons."
MorphML: "provides a common format for exchange of neuronal morphology data. It can also be used to specify cell structure for modeling efforts as part of NeuroML."
BrainML: "application for representing time series data, spike trains, experimental protocols, and other data relevant to neurophysiology experiments."
SBML: "(Systems Biology Markup Language) is an application for specifying models of biochemical reaction networks such as metabolic networks, cell-signaling pathways and gene regulatory networks."
CellML: "is designed for the specification of biological models of cellular and sub-cellular processes such as calcium dynamics, metabolic pathways, signal transduction, and electrophysiology."
MathML: "provides the means for describing the structure and content of mathematical notation in order to serve, receive, and process mathematics on the web. Other XML applications often use MathML language elements for representing mathematical equations."

A positive aspect of the XML Suite is its "discrete" nature. Each of these file formats are free to evolve in isolation, and the nature of their cooperation is very loose in most cases. For example MathML is not at all related to Neuroscience and has the support of the Maths community (to some extent). In addition, the "stacking" approach is also a very interesting one, allowing a good domain focus. For example, NeuroML is built on top of LEMS, so in theory each of these should cover different domains and there should be minimal redundancy.

The key challenge for the XML Suite is for each of their components to find a sustainable user base and sustainable funding to go along with it. This is a broader problem of Neuroinformatics: researchers do not want to spend time on work that is not contributing directly to their research and so the developer pool to do fundamental work on the file formats is limited. Once the developer pool becomes too limited, the file format ends up with a small user base because it is not fit for purpose, and thus starts a downward spiral. This appears to have been the fate of projects such as BrainML.

Conclusion

This post provided an overview of the data landscape in Computational Neuroscience and introduced the sub-field of Neuroinformatics. We also looked at some of the available data stores and reviewed a few of the more popular initiatives to solve the fundamental data problems in the field.

Stay tuned for the next instalment!

Footnotes:

For a bit more details on the two fields see What are Computational Neuroscience and Neuroinformatics?

"A connectome is a comprehensive map of neural connections in the brain, and may be thought of as its "wiring diagram". From this page.

Created: 2016-02-08 Mon 21:41

Emacs 24.5.1 (Org mode 8.2.10)

Validate

Marco Craveiro

Monday, February 08, 2016

Nerd Food: Interesting...

What's in a Schema?

Enter ODB

Conclusion

Nerd Food: Tooling in Computational Neuroscience - Part III: Data

More Data! We Need More Data!

Neuroinformatics to the Rescue

Databases, Repositories and Archives

Taming the Sea of Data

XML strikes back

Conclusion

Footnotes:

Blog Archive

About Me