Bioinformatics File Formats: A Complete Guide
The Architecture of Biological Information
Modern biology converts the chemical reality of nucleotides into computational strings. The file format is the essential interface between this reality and scientific insight. Understanding these standards is the prerequisite for reproducible research.
Standard Formats & Specifications
Coordinate Systems
Visualizing the "Off-by-One" logic trap
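The trap comes from two conventions for the same bases: BED uses 0-based, half-open intervals, while formats such as GFF, VCF, and SAM use 1-based, closed intervals. The sketch below shows the conversion in both directions and what happens when it is skipped. The `Interval` type and the function names are hypothetical helpers written for this illustration, not part of any library.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval; which convention it uses is up to the caller."""
    chrom: str
    start: int
    end: int


def bed_to_one_based(iv: Interval) -> Interval:
    """Convert 0-based half-open (BED) to 1-based closed coordinates."""
    if not 0 <= iv.start < iv.end:
        # Empty BED intervals (start == end) have no 1-based closed equivalent.
        raise ValueError(f"not a non-empty BED interval: {iv}")
    # Only the start moves; the end is numerically the same in both systems.
    return Interval(iv.chrom, iv.start + 1, iv.end)


def one_based_to_bed(iv: Interval) -> Interval:
    """Convert 1-based closed coordinates to 0-based half-open (BED)."""
    if not 1 <= iv.start <= iv.end:
        raise ValueError(f"not a valid 1-based closed interval: {iv}")
    return Interval(iv.chrom, iv.start - 1, iv.end)


def extract_one_based(seq: str, iv: Interval) -> str:
    """Extract bases from an in-memory sequence using 1-based closed coordinates."""
    return seq[iv.start - 1:iv.end]


reference = "ACGTACGT"
bed = Interval("chr1", 2, 5)  # 0-based half-open: covers bases G, T, A

correct = extract_one_based(reference, bed_to_one_based(bed))
shifted = extract_one_based(reference, bed)  # BED numbers used as if 1-based

print(correct)  # GTA
print(shifted)  # CGTA (one extra leading base)
```

Note that only the start coordinate changes during conversion, which is why the unadjusted interval gains a base at its left edge rather than sliding as a whole.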
Critical Pitfalls
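- The Coordinate Shift
BED intervals are 0-based and half-open, while GATK-style intervals (chr:start-end) are 1-based and closed. Passing BED coordinates to a 1-based tool without adding 1 to the start prepends one extra base to every extracted sequence, which puts any downstream translation out of frame.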
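- Reference Versioning
Some tools will accept an hg19 BAM alongside an hg38 FASTA without complaint, but coordinates differ between the two assemblies, so the results will be biological nonsense. Always verify that the contig names and lengths in the BAM header (the @SQ lines) match the reference.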
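- Legacy Quality Scores
If you encounter data from around 2010 (Illumina 1.5, or any pipeline from 1.3 up to but not including 1.8), it likely uses Phred+64. Modern tools assume Phred+33. Decoding Phred+64 data as Phred+33 inflates every score by 31, so junk bases look like high-quality calls. The reverse mistake, decoding Phred+33 data as Phred+64, pushes scores to implausibly low or negative values.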
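A common way to guard against the quality-score pitfall is to inspect the range of ASCII characters in the quality lines before trusting a pipeline's default. The following is a minimal heuristic sketch, not a definitive detector: `guess_phred_offset` and `decode_quality` are hypothetical names, and the thresholds assume Phred+33 data never exceeds 'J' (Q41) and that Phred+64-era data never drops below ';' (the Solexa minimum). Newer instruments or small samples can violate both assumptions, so the function returns `None` when the evidence is ambiguous.

```python
from typing import Iterable, List, Optional


def guess_phred_offset(quality_lines: Iterable[str]) -> Optional[int]:
    """Guess the ASCII offset (33 or 64) from FASTQ quality strings.

    Returns None when the observed range fits both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters below ';' (59) cannot occur in +64 encodings.
        return 33
    if lowest >= 64 and highest > 74:
        # Nothing below '@' (64) and something above 'J' (74): assumed Phred+64.
        return 64
    return None  # ambiguous: sample more reads or check the run metadata


def decode_quality(quality: str, offset: int) -> List[int]:
    """Convert a quality string to integer scores using the given offset."""
    return [ord(ch) - offset for ch in quality]


# 'B' was the low-quality marker in Illumina 1.5 data (Q2 under Phred+64).
print(decode_quality("B", 64))  # [2]
print(decode_quality("B", 33))  # [33]
```

The last two lines show the inflation directly: the same character is a near-worthless Q2 under the correct offset and a respectable Q33 under the wrong one.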
Research Pipelines
End-to-end workflows: From Raw Data to Biological Insight