I am not a professional programmer (my area is medical research), but I am quite capable in C/C++, and various scripting languages. A while back I got intrigued by Lisp, but I never got the time to seriously learn it. After a brief exposure to R I decided to invest more time in a functional programming language.
I would like the practicality of a JVM language and thus narrowed to Clojure and Scala. From what I understand, both can use already existing Java libraries and given at performance-critical code can be delegated to Java, have the potential to perform relatively equally well.
How do these languages compare in the application space I need them for?
Are There any real-life projects in bioinformatics using either?
Already existing code would be a serious plus, as would be good documentation and a fairly gentle learning curve. Also, how does the concurrency model of the two compare with each other?
Any significant advantages/disadvantages any one has?
Practice As Follows
I can personally vouch for Clojure as a great tool for this kind of work. (I believe Scala would be great too, I just have less experience with it).
My personal research is in the field of predictive modelling / machine learning and is very computationally intensive – so I think it has many parallels with bioinformatics or biostatistics.
My personal approach / setup includes:
Incanter used primarily as a data visualisation tool. Great for producing quick visualisations which are usually just 1-liners at the REPL. There are also lots of statistical and numerical processing tools which I believe use the Colt library under the hood. I’m not an expert in R but I understand that Incanter is roughly “R translated to Clojure/Lisp”.
Exploiting quite a few Java libraries as needed. Some of these are my own, for example algorithms that I have written in Java in order to get the best possible fine-tuned performance out of the JVM. But you could equally easily use any of the other great Java libraries available, as calling Java from Clojure is very simple (.methodName object param1 param2)
Quite a lot of higher order functions to automate my workflow. For example I have a higher order function that will run an optimisation algorithm of any kind in a loop for a specified amount of time and then produce an Incanter graph of the improvement on each iteration. Not rocket science, but really easy to code up in a few lines of Clojure.
Never really having to worry about performance. You can make Clojure go pretty fast if you want to (e.g. with type hints, primitive arithmetic support etc.) but normally it’s irrelevant as you’re going to spend 99%+ of your cycles in well-optimised library code anyway. Hence a bit of overhead in the “glue” code is negligible – I feel I gain much more in terms of personal productivity by having a dynamic, high-level, functional language to work in.
Major use of Clojure’s concurrency features – this has to be one of Clojure’s strongest features. I tend to use the STM to code concurrent processes with transactions that can’t interfere with each other, then kick off long-running calculations in a future so that I can get on with other tasks and wait for notification of the result.
A slowly growing collection of macros to “extend the language” when needed. I actually use macros less than I thought I would (higher order functions are often a better choice). But when you need them they are invaluable – this is where you really appreciate the value of a homoiconic language. Since they effectively allow you to add new syntax to the language itself, they are very powerful when used correctly to build the DSL that you need.
In short – I don’t think you can go wrong with Clojure as a researcher.
The one thing I probably wouldn’t use it for (yet) is actually writing a new numerical library – this would probably be better done in Scala or pure Java as you would probably want to adopt a more imperative / OOP style.