Learning to code: it’s as easy and as complex as that.

Many life scientists have used their time under pandemic-induced lockdown to learn their first programming language. Yay! Coding has many, many advantages (reproducibility, more complex analyses, reusability, time-saving) that I have espoused all too many times before.

Many people start with a tutorial or two, then jump straight in and analyse a dataset. They get a plot, maybe a few statistics and voila! They are now scientists who can code.

Kinda.

See, the thing is, there is more (much more) to implementing programmatic workflows in a biological context than getting a script to run. Unfortunately, these extra skills are the ones that intro-to-programming courses often fail to teach you (or sometimes even mention) – but without them, adding scripts to your workflow can often make your work less reproducible, not more. For the sake of argument, let’s pick one.

Version control.

Imagine a bowl of spaghetti (lots of lines of code), which you admire briefly before throwing at a canvas (your interpreter). You create a beautiful work of art (your plot), and stand back to admire your handiwork. Maybe you show it to a few friends, and they are super impressed. (Awesome!) And then they ask – how did you do it? Not just the throwing motion, or the rough distance between the bowl and the canvas. They want to know the precise location of every single strand of spaghetti in the bowl before you threw it at the canvas. They want to know the recipe you used for the sauce, down to the precise number of grams of oregano.

Now, of course, we aren’t actually talking about a bowl of spaghetti – we are talking about your code. So – of course – you can simply show them the script! Huzzah! The precise instructions that enabled you to make that very specific plot. But here’s the kicker. The code is not static. In fact, chances are that you pored over it for hours (maybe days) adjusting, tweaking, changing, testing and rerunning – all to get the glorious, awe-inspiring plot at the end.

By this stage, you’ve probably made over 100 spaghetti-splattered spectacles (plots) and your house is starting to smell like an Italian pizza joint. If I asked you to produce the exact recipe, down to the very position of each strand of spaghetti, for plot number 47 – could you do it?

My guess is probably no.
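
To make the plot-number-47 problem a little more concrete, here is a minimal sketch in Python of the underlying idea: every time you save an output, also save a small note recording exactly which version of the script made it. Everything named here (the record_provenance function, the .provenance.json suffix, the example file names) is hypothetical and chosen purely for illustration. It is not a replacement for a real version control system such as git, which stores every saved version of the script itself; this sketch only fingerprints the script so you can tell whether it has changed since the plot was made.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_provenance(script_path, output_path, params=None):
    """Write a JSON "sidecar" file describing how an output was produced.

    Hypothetical helper for illustration only: it notes which script made
    the output, a fingerprint of that script, and the parameters used.
    """
    script_path = Path(script_path)
    output_path = Path(output_path)

    # Fingerprint the exact script contents, so any later edit is detectable.
    script_hash = hashlib.sha256(script_path.read_bytes()).hexdigest()

    record = {
        "output": output_path.name,
        "script": script_path.name,
        "script_sha256": script_hash,
        "parameters": params or {},
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }

    # Store the note next to the output, e.g. plot_047.png.provenance.json
    sidecar = output_path.with_name(output_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return record
```

You might call this straight after saving a figure, for example record_provenance("analysis.py", "plot_047.png", {"bins": 30}). If the fingerprint stored alongside plot 47 no longer matches the current script, you know the recipe has changed since that plot was made, which is exactly the situation version control is designed to get you out of.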

Conducting experiments is at the heart of science. However, without the essential extracurricular activities like maintaining backups, organising your results files and optimising project management systems, you wouldn’t last very long in academic research. The same is true of coding. Learning to write a functioning script is just the tip of the iceberg when it comes to implementing reproducible computing workflows in biology.

So, with this in mind, what are those all-important skills, you ask? Well, luckily, far more experienced scholars than I have written on this topic and provided guides (like this and this) for those starting out in the computational space. These are highly applicable to biologists learning to code, and I encourage you to check them out before you stray too far down the spaghetti-on-the-wall path (like I did). As with any system, it is easiest to implement new routines before you have become too set in the old ones! At the very least, here are a few things to consider:

For more, make sure you check out the resources below.

Oh, and one last thing.

Unfortunately, this type of organisational work is not typically measured as a key performance outcome for biologists. Your number of git repositories or test coverage for an analysis suite is unlikely to come up in an award nomination or promotion application. And yet, it is entirely crucial as we move toward bigger and more complicated data and analyses – and so I encourage you to take the time to learn anyway. Maybe one day it will be recognised as essential work by the powers that be – but until then, at the very least, it remains essential for anyone wanting to do good reproducible science. And who doesn’t want that?


Final thoughts

So you want to learn to code? Yes – do it, 100%. I honestly couldn’t recommend or encourage it more. But please do so mindfully. If you want to have the best possible chance of integrating this wonderful tool into your scientific ecosystem in the long term, you have to lay solid foundations and develop sustainable practices and methods.

Have you used any of these techniques as a budding bench-to-bytes biologist? Find me on Twitter or head over to the contact page to tell me more!


Resources:

Image credits: Tomwsulcer via Wikimedia Commons & olamishchenko via Unsplash