Caveats when working with bioinformatics data

This post documents common pitfalls when working with bioinformatics data and how to avoid them.

Headers

Case

Use janitor::clean_names to standardize column names to snake_case.
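As a minimal sketch of what this looks like in practice, the snippet below runs janitor::clean_names on a small made-up table. The data frame and its column names are invented for illustration, and the janitor package is assumed to be installed.

```r
library(janitor)

# Made-up table with three inconsistent header styles (illustrative only).
variants <- data.frame(
  `Sample ID` = c("s1", "s2"),
  GeneName    = c("TP53", "BRCA1"),
  `p.value`   = c(0.01, 0.20),
  check.names = FALSE  # keep the messy names exactly as typed
)

# clean_names() defaults to snake_case.
cleaned <- clean_names(variants)
names(cleaned)
```

With the default settings, the three headers should come back as sample_id, gene_name and p_value.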

Names

Use one standardized name for each kind of column:

  • chr for chromosome, instead of chrom, seqnames, etc. Sometimes you have to change the name to fit a particular package (e.g. GenomicRanges). In that case, convert the name only within the call to that function and change it back immediately afterwards. Never let the renamed column propagate to the next function, because every downstream function then has to know which name it will receive, and those dependencies become a headache to track.
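One way to keep the rename local is a small wrapper that renames on the way in and renames back on the way out. This is only a sketch in base R: with_seqnames and tool_expecting_seqnames are hypothetical names, and the latter is a stand-in for any function that insists on a seqnames column.

```r
# Hypothetical stand-in for a tool that requires a `seqnames` column.
tool_expecting_seqnames <- function(df) {
  stopifnot("seqnames" %in% names(df))
  df$width <- df$end - df$start + 1L
  df
}

# Hypothetical wrapper: rename chr -> seqnames for the call only,
# then restore chr before the result leaves the function.
with_seqnames <- function(df, f, ...) {
  names(df)[names(df) == "chr"] <- "seqnames"
  out <- f(df, ...)
  names(out)[names(out) == "seqnames"] <- "chr"
  out
}

regions <- data.frame(
  chr   = c("chr1", "chr2"),
  start = c(100L, 500L),
  end   = c(200L, 900L)
)

result <- with_seqnames(regions, tool_expecting_seqnames)
names(result)
```

The rest of the pipeline only ever sees chr, so no other function needs to know that the tool wanted a different name.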

Chromosome names

Decide on one naming convention and use it everywhere. For now, I use chr# (e.g. chr1) instead of # (e.g. 1) because most BCF files that I work with use such names.
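A small helper can enforce the chosen convention at the point where data is read in. The function below is a rough base R sketch; to_chr_style is a hypothetical name, and mapping MT to chrM is an assumption about how the mitochondrial chromosome should be labelled, so adjust it to match your own files.

```r
# Hypothetical helper: coerce chromosome labels to the chr# style.
to_chr_style <- function(x) {
  x <- as.character(x)
  # Drop any existing prefix so "chr1", "Chr1" and "1" are treated alike.
  x <- sub("^chr", "", x, ignore.case = TRUE)
  # Assumption: label the mitochondrial chromosome chrM rather than chrMT.
  x <- sub("^MT$", "M", x)
  # Leave missing values missing instead of producing "chrNA".
  ifelse(is.na(x), NA_character_, paste0("chr", x))
}

mixed <- c("1", "chr2", "X", "MT", "Chr3", NA)
standardized <- to_chr_style(mixed)
standardized
```

Running every input through the same helper means a join between two tables cannot silently drop rows because one side says 1 and the other says chr1.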

Tim

Personalizing medicine