This document lists common pitfalls when working with bioinformatics data and how to prevent them.
Headers
Case
Use janitor::clean_names to standardize column names to snake_case.
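A minimal sketch of the Case rule, assuming the janitor package is installed. The sample sheet and its column names are made up for illustration.

```r
library(janitor)

# Made-up sample sheet with inconsistent header casing and spaces.
# check.names = FALSE keeps the raw names so clean_names has something to fix.
samples <- data.frame(
  `Sample ID` = c("s1", "s2"),
  Chrom       = c("chr1", "chr2"),
  `Start Pos` = c(100L, 250L),
  END         = c(200L, 300L),
  check.names = FALSE
)

# Convert every column name to snake_case
samples <- clean_names(samples)
names(samples)
```

Note that clean_names only fixes case and separators. Renaming chrom to the standard chr (see Names) is a separate step.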
Names
Use one standardized column name per field: chr for the chromosome, instead of chrom, seqnames, etc. Sometimes you have to change the name to fit a particular piece of software (e.g. GenomicRanges, which uses seqnames). In that case, rename the column only within the call to that function and change it back immediately afterwards. Never let the renamed column propagate to the next function, because keeping track of which function expects which name quickly becomes a headache.
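A minimal base-R sketch of the rename-inside-the-call rule. count_per_seqname, rename_col and with_seqnames are hypothetical helpers made up for illustration; they are not functions from GenomicRanges or any other package. count_per_seqname stands in for any tool that insists on a seqnames column.

```r
# Hypothetical downstream function that only accepts a "seqnames" column.
count_per_seqname <- function(df) {
  stopifnot("seqnames" %in% names(df))
  out <- aggregate(start ~ seqnames, data = df, FUN = length)
  names(out)[names(out) == "start"] <- "n"
  out
}

# Rename one column; returns a modified copy, the input is left untouched.
rename_col <- function(df, from, to) {
  names(df)[names(df) == from] <- to
  df
}

# Rename "chr" -> "seqnames" only for the duration of the call,
# then rename the result straight back so only "chr" propagates.
with_seqnames <- function(df, fun, ...) {
  res <- fun(rename_col(df, "chr", "seqnames"), ...)
  rename_col(res, "seqnames", "chr")
}

peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(100L, 500L, 250L),
  end   = c(200L, 600L, 300L)
)

counts <- with_seqnames(peaks, count_per_seqname)
counts
```

Wrapping the rename in one helper keeps the foreign name confined to a single line. For GenomicRanges specifically, makeGRangesFromDataFrame() also accepts a seqnames.field argument naming the chromosome column, which may avoid the rename altogether.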
Chromosome names
Decide on one naming convention for chromosomes. For now, I use the chr# style (chr1, chr2, ...) instead of the bare # style (1, 2, ...), because most BCF files that I work with use such names.
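A minimal sketch for normalizing chromosome names to the chr# convention, assuming names arrive either already prefixed or bare. to_chr_style and from_chr_style are hypothetical helpers; the second covers a tool that needs bare names, which, per the Names rule, should only be applied inside that tool's call. For GRanges objects, GenomeInfoDb's seqlevelsStyle() is an existing alternative.

```r
# Convert chromosome names to the "chr#" convention.
# Assumption: inputs are either already "chr"-prefixed or bare ("1", "X", ...).
# Mitochondrial names differ between conventions (e.g. "MT" vs "chrM")
# and are NOT remapped here.
to_chr_style <- function(x) {
  x <- as.character(x)
  ifelse(startsWith(x, "chr"), x, paste0("chr", x))
}

# Strip the "chr" prefix, for tools that expect bare names.
from_chr_style <- function(x) {
  sub("^chr", "", as.character(x))
}

to_chr_style(c("1", "chr2", "X"))
from_chr_style(c("chr1", "chrX", "2"))
```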