Bioinformatics File Formats Guide

Bioinformatics File Formats: A Complete Guide

The Architecture of Biological Information

Modern biology converts the chemical reality of nucleotides into computational strings. The file format is the essential interface between this reality and scientific insight. Understanding these standards is the prerequisite for reproducible research.

Standard Formats & Specifications

Coordinate Systems

Visualizing the "Off-by-One" logic trap
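
The trap can be made concrete with a small sketch. BED intervals are 0-based and half-open, while formats such as VCF and GFF count from 1 and include the end position. The helper names below (`bed_to_one_based`, `extract_bed`, `extract_one_based`) are hypothetical, chosen for illustration rather than taken from any library.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based, half-open BED interval to a 1-based, closed one."""
    if start < 0 or end < start:
        raise ValueError("invalid BED interval")
    # Only the start moves; the end value is numerically the same.
    return start + 1, end


def extract_bed(sequence, start, end):
    """Slice a sequence with 0-based, half-open coordinates."""
    return sequence[start:end]


def extract_one_based(sequence, start, end):
    """Slice a sequence with 1-based, closed coordinates."""
    return sequence[start - 1:end]


seq = "ACGTACGT"
bed_start, bed_end = 2, 5

correct = extract_bed(seq, bed_start, bed_end)
converted = extract_one_based(seq, *bed_to_one_based(bed_start, bed_end))
# Feeding BED numbers straight into a 1-based reader picks up an extra leading base.
unadjusted = extract_one_based(seq, bed_start, bed_end)

print(correct, converted, unadjusted)
```

The unadjusted call returns four bases instead of three, because the BED start is read as a 1-based position and lands one base too early.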

Critical Pitfalls

  • The Coordinate Shift

    BED intervals are 0-based and half-open. Using BED coordinates unadjusted in a 1-based system (like GATK) produces off-by-one errors: the extracted sequence starts one base too early.

  • Reference Versioning

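    An hg19 BAM will technically "work" with an hg38 FASTA in some tools, but the coordinates refer to different assemblies, so the results will be biological nonsense. Always verify that the contig names and lengths in the BAM header (the @SQ lines) match the reference.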

  • Legacy Quality Scores

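    If you encounter data from around 2010 (Illumina 1.5), it likely uses Phred+64. Modern tools assume Phred+33. Decoding Phred+64 data as Phred+33 inflates every score by 31, so low-quality bases look trustworthy. The reverse mistake, decoding Phred+33 data as Phred+64, makes high-quality reads look like junk.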
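
The quality-score pitfall above can be checked with a rough heuristic before any data is decoded. The sketch below guesses the offset from the range of ASCII codes in the quality strings. The function names and the cut-off values (59 and 74) are assumptions for illustration: characters below `;` cannot occur in the +64 encodings, and characters above `J` are unusual for Illumina Phred+33 data, although some other platforms do emit them.

```python
def guess_phred_offset(quality_lines):
    """Guess 33 or 64 from quality strings; return None when ambiguous."""
    codes = [ord(ch) for line in quality_lines for ch in line]
    if not codes:
        raise ValueError("no quality characters supplied")
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Codes this low would be negative scores under a +64 offset.
        return 33
    if highest > 74:
        # Above 'J' would mean Q > 41 under +33 (assumed unlikely here).
        return 64
    return None


def decode_quality(quality, offset):
    """Turn a quality string into a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality]


legacy = "hhhhB"  # example string in the Phred+64 style
offset = guess_phred_offset([legacy])
right = decode_quality(legacy, 64)
wrong = decode_quality(legacy, 33)  # every score inflated by 31

print(offset, right, wrong)
```

A guess of this kind is more reliable over many reads than over one, since a handful of uniformly high-quality reads can fall in the ambiguous range.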

Research Pipelines

End-to-end workflows: From Raw Data to Biological Insight
