About this blog


This website aims to help biologists learn basic computing techniques that they will find useful in their research. It assumes absolutely no prior knowledge and will hopefully provide some basic skills that come in handy when you are confronted with loads and loads of data, as is now so often the case in biology. To develop some programming skills we are going to use Python, and we will also look at using R, particularly for various Next-Generation sequencing tasks.

Each post will have an objective relating to some common bioinformatic task that is likely to arise in the life of a biologist. This will include things like developing automated annotation pipelines for large sequence data sets, processing massive data sets without recourse to Excel, automating the retrieval of biological information from databases with web-based APIs, automating things like BLAST, SignalP and other common programs, and ultimately getting into advanced topics like using decision trees for filtering data and developing algorithms that will help you answer biological questions posed by your research. Hopefully, at the end of each post you will have a program that you can use, and you will have learnt some new concepts in computing that will enable you to further refine the program to your needs.

Having said that, the first couple of posts will be a relatively non-biological look at the Unix command line. I have done this for a number of reasons, the most important being that you need some level of proficiency with this tool if you want to use any type of high performance computing facility, almost all of which use Unix. Since the advent of Next-Generation sequencing, the need for processing power, and more importantly memory, means that if you are assembling 300 Gb of Illumina data your laptop is not going to cut it. Although the Python programming exercises will be doable on Windows, or any other OS that you can run Python on, I figured we may as well bite the bullet and get some command line proficiency going at the outset, so that you will be able to hop on the supercomputer your institution provides and have some idea of what is going on.

As for me, I am a practising scientist with a background in NMR spectroscopy, proteomics, and DNA and RNA sequencing. I am not a trained computer scientist, but the techniques I use have always involved computation and by osmosis I have slowly picked some stuff up. That means you should always take what I say with a grain of salt, and corrections or questions about exercises are welcome either by email (see bottom of the page) or Twitter (@ProgBiologists). Biology is rapidly becoming a computational science, so whether you're sequencing, doing proteomics, developing interesting algorithms or just wasting time when you should be setting up that PCR, this site is for you.