NCBI Ecosystem Infographic
NCBI Integrated Bioinformatics Infrastructure

The Digital Laboratory

A comprehensive mapping of the Entrez engine, curated standards, variation frequencies, and high-throughput analytical workflows.

1. The Entrez Engine

Operating across 40+ databases, Entrez combines curated hard links with pre-computed statistical neighbor links to traverse the "biological logic" of disparate datasets.

Hard Links

Logical, verifiable connections established at submission. Example: A sequence record linked to its PubMed citation via accession number.

Neighbor Links

PubMed: "Related Articles" via text-similarity weights.
Sequence: Pre-computed BLAST clusters identifying statistically relevant relatives.

LinkOut: Federated Portal

• Mouse Genome Informatics (MGI)
• BindingDB (drug-target data)
• University plasmid repositories
• PDB structures

E-Utilities API

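EGQuery: global search, returns hit counts per database
ESearch: discovery, returns UIDs
ELink: navigation, maps UIDs across databases
EPost: storage, uploads UIDs to the History Server
ESummary: triage, returns metadata document summaries (DocSums)
EFetch: retrieval, returns full records as XML/FASTA
ESpell: correction, suggests spellings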

ESearch → ELink → EPost → EFetch
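The pipeline above can be sketched as a chain of E-utilities URLs. This is a minimal, network-free illustration: it only builds request URLs and parses an ESearch XML reply. The example query, the placeholder `WEBENV_FROM_ESEARCH` value, and the query keys are hypothetical. With `usehistory=y`, ESearch already stores its UIDs on the History Server, so EPost is only needed when you upload your own UID list; ELink's `cmd=neighbor_history` keeps the linked results on the server for EFetch.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base URL of the public E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool: str, **params: str) -> str:
    """Build a request URL for one E-utility (e.g. tool="esearch" -> esearch.fcgi)."""
    return f"{EUTILS_BASE}{tool}.fcgi?{urlencode(params)}"


def parse_esearch(xml_text: str) -> dict:
    """Pull the hit count, UID list, and History Server keys out of an ESearch reply."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count", default="0")),
        "ids": [node.text for node in root.findall("IdList/Id")],
        "webenv": root.findtext("WebEnv"),
        "query_key": root.findtext("QueryKey"),
    }


if __name__ == "__main__":
    # Step 1 (ESearch): find human MLH1 in Gene and keep the UIDs on the History Server.
    print(eutils_url("esearch", db="gene", term="MLH1[sym] AND human[orgn]", usehistory="y"))
    # Step 2 (ELink): map those Gene UIDs to Nucleotide, storing the result server-side.
    # WebEnv and query_key are placeholders for values returned by Step 1.
    print(eutils_url("elink", dbfrom="gene", db="nuccore", cmd="neighbor_history",
                     WebEnv="WEBENV_FROM_ESEARCH", query_key="1"))
    # Step 3 (EFetch): download the linked records as FASTA (query_key from Step 2).
    print(eutils_url("efetch", db="nuccore", WebEnv="WEBENV_FROM_ESEARCH", query_key="2",
                     rettype="fasta", retmode="text"))
```

To run the chain for real, pass each URL to `urllib.request.urlopen` and feed the ESearch/ELink replies back into the next step. NCBI limits clients to 3 requests per second without an API key.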

2. The Gene Nexus & Standards

Anatomy of a Gene Record

Genomic Context: exon structure via the Genome Data Viewer (GDV)
Expression: tissue specificity via GEO Profiles
Homology: evolutionary grouping via HomoloGene
Pathways: signaling integration via BioSystems

MeSH: Literature Normalization

Standardizes the indexing of 36M+ PubMed citations. Queries for "cancer" or "tumors" are automatically mapped to the MeSH concept "Neoplasms", greatly improving recall.
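The mapping step can be illustrated with a toy version of PubMed's automatic term mapping. This is a hedged sketch: the `ENTRY_TERMS` table is a hypothetical, hand-written stand-in for the MeSH vocabulary, and the query string it emits is simplified relative to what PubMed actually generates.

```python
# Hypothetical, hand-written entry-term table for illustration only.
# The real mapping comes from the entry terms recorded in the MeSH vocabulary.
ENTRY_TERMS = {
    "cancer": "neoplasms",
    "cancers": "neoplasms",
    "tumor": "neoplasms",
    "tumors": "neoplasms",
}


def map_term(user_term: str) -> str:
    """Expand a free-text term into a MeSH-aware PubMed query string."""
    key = user_term.strip().lower()
    heading = ENTRY_TERMS.get(key)
    if heading is None:
        # No MeSH match: fall back to a plain free-text search.
        return f'"{key}"[All Fields]'
    # [MeSH Terms] searches the heading and, by default, its narrower terms too.
    return f'("{heading}"[MeSH Terms] OR "{key}"[All Fields])'


if __name__ == "__main__":
    print(map_term("Tumors"))
```

Both "cancer" and "tumors" resolve to the same heading, which is why the two queries retrieve the same indexed literature.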

Archival vs. Curated

GenBank (The Archive)

Redundant, author-submitted records that may contain sequencing errors, cloning artifacts, or individual haplotypes.

RefSeq (The Standard)

Curated, non-redundant reference sequences that serve as the biological baseline and are optimized for analysis.

NM_: mRNA
NP_: Protein
NC_: Genome
NG_: Genomic region (e.g. RefSeqGene)
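Because the prefix encodes the molecule type, a pipeline can check that it is really working against a RefSeq record before doing anything else. The sketch below covers only the four prefixes listed above; the regular expression and the helper name `parse_refseq` are illustrative assumptions, and other RefSeq prefixes (e.g. predicted XM_/XP_ records) fall into the "other" bucket.

```python
import re
from typing import Optional, Tuple

# Only the prefixes listed in the text; other RefSeq prefixes exist.
REFSEQ_PREFIXES = {
    "NM": "mRNA",
    "NP": "protein",
    "NC": "genome (complete molecule)",
    "NG": "genomic region",
}

# Two capital letters, underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq(acc: str) -> Tuple[str, str, Optional[int]]:
    """Return (molecule type, accession number, version or None)."""
    m = ACCESSION_RE.match(acc)
    if m is None:
        # GenBank accessions such as "U07418" have no underscore and are rejected.
        raise ValueError(f"not a RefSeq-style accession: {acc!r}")
    prefix, number, version = m.groups()
    kind = REFSEQ_PREFIXES.get(prefix, "other RefSeq prefix")
    return kind, number, int(version) if version else None


if __name__ == "__main__":
    print(parse_refseq("NM_000249.4"))
```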

3. Variation & Evolutionary Toolbox

dbSNP: ss# vs rs#

Submission IDs (ss#) cluster into Reference SNPs (rs#). The rsID is the universal key for scientific communication; multiple studies reporting the same variant merge into one rsID.
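The ss-to-rs relationship is essentially a grouping operation. The minimal sketch below groups submissions that describe the same (chromosome, position, reference allele, alternate allele). It assumes positions are already normalized to one assembly and alleles are already left-aligned; real dbSNP clustering is more involved, and all ss IDs and coordinates here are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Submission(NamedTuple):
    ss_id: str   # submitter-level ID (hypothetical values below)
    chrom: str
    pos: int     # assumed already normalized to a single assembly
    ref: str
    alt: str


def cluster_by_variant(subs: list[Submission]) -> dict[tuple, list[str]]:
    """Group ss IDs that describe the same (chrom, pos, ref, alt) variant."""
    clusters: dict[tuple, list[str]] = defaultdict(list)
    for s in subs:
        clusters[(s.chrom, s.pos, s.ref, s.alt)].append(s.ss_id)
    return dict(clusters)


if __name__ == "__main__":
    subs = [
        Submission("ss1", "3", 100, "G", "A"),  # study A
        Submission("ss2", "3", 100, "G", "A"),  # study B, same variant
        Submission("ss3", "3", 250, "C", "T"),
    ]
    for variant, ss_ids in cluster_by_variant(subs).items():
        print(variant, ss_ids)
```

Each resulting group corresponds to one reference SNP: two studies reporting the same change end up behind a single key.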

ALFA Aggregator

Pre-computes and aggregates allele frequencies from 1M+ dbGaP subjects across 12 major populations.

• African Ancestry
• East Asian Ancestry
• European Ancestry
• South Asian Ancestry

South Asian case study: a variant seen at <0.1% in Europeans but at 15% in ALFA South Asians is likely a benign, lineage-specific polymorphism.

Advanced Evolutionary Toolkit

COBALT (Alignment)

Constraint-based multiple alignment. Rather than relying on the sequences alone, it uses CDD (Conserved Domain) domains as constraints, giving the alignment structural sense.

TreeViewer (Phylogeny)

Visualizes paralogs (arising by duplication) vs. orthologs (arising by speciation). Example: the duplication history of the muscle (M) and brain (B) isoforms of creatine kinase.
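The paralog/ortholog distinction can be made concrete with the species-overlap rule, a simple gene-tree method swapped in here for illustration (it is not a description of how TreeViewer works internally). An internal node whose two subtrees share a species is treated as a duplication, so genes split at that node are paralogs; otherwise the node is a speciation and the genes are orthologs. The tree and the `GENE_species` leaf naming convention are hypothetical.

```python
def species_of(leaf: str) -> str:
    """Leaves are named GENE_species (a naming convention assumed for this sketch)."""
    return leaf.rsplit("_", 1)[1]


def leaves(tree) -> list[str]:
    """A tree is either a leaf name (str) or a pair (left, right)."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def label_pairs(tree) -> dict[frozenset, str]:
    """Species-overlap rule: a node whose subtrees share a species is a duplication."""
    if isinstance(tree, str):
        return {}
    left, right = tree
    left_leaves, right_leaves = leaves(left), leaves(right)
    shared = {species_of(a) for a in left_leaves} & {species_of(b) for b in right_leaves}
    relation = "paralogs" if shared else "orthologs"
    # Every pair split at this node has this node as its last common ancestor.
    labels = {frozenset((a, b)): relation for a in left_leaves for b in right_leaves}
    labels.update(label_pairs(left))
    labels.update(label_pairs(right))
    return labels


# Hypothetical gene tree: a duplication produced CKM and CKB before human and mouse split.
GENE_TREE = (("CKM_human", "CKM_mouse"), ("CKB_human", "CKB_mouse"))

if __name__ == "__main__":
    for pair, relation in label_pairs(GENE_TREE).items():
        print(sorted(pair), relation)
```

Human CKM and human CKB come out as paralogs, while human and mouse CKM come out as orthologs. The rule can mislabel nodes when a duplication is followed by gene loss, which is one reason curated tools use richer evidence.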

CD-Search (Domain Detection)

Runs automatically alongside protein BLAST to identify functional units (e.g. ATP-binding cassette or zinc-finger domains).

MMDB Structure Neighbors

Detects relationships where sequence identity is below 20% but 3D folds are conserved, catching distant relatives that sequence-based BLAST misses.

4. BLAST Suite & Algorithms

Algorithm | Query → Database | Best For
megablast | Nuc → Nuc | Highly similar sequences (same or closely related species); optimized for speed.
blastp | Prot → Prot | Functional annotation and domain analysis.
tblastn | Prot → Nuc (translated) | Gene prediction (e.g. Grey Whale creatine kinase discovery).
blastx | Nuc (translated) → Prot | Translating novel transcripts/ESTs into proteins.

Search Parameters

Word Size

Smaller word sizes increase sensitivity for distant homologs at the cost of speed (defaults: 11 for blastn vs 28 for megablast).
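Word size controls seeding: BLAST only extends an alignment where query and subject share an exact word of that length. The sketch below counts exact seed positions only. It ignores extension, scoring, and the two-hit heuristic, and the sequences are hypothetical. It still shows why a smaller word size finds more seeds in diverged sequences.

```python
def word_set(seq: str, word_size: int) -> set[str]:
    """All exact words (k-mers) of length word_size in seq."""
    return {seq[i:i + word_size] for i in range(len(seq) - word_size + 1)}


def seed_hits(query: str, subject: str, word_size: int) -> int:
    """Count query positions whose word occurs exactly somewhere in the subject."""
    subject_words = word_set(subject, word_size)
    return sum(
        query[i:i + word_size] in subject_words
        for i in range(len(query) - word_size + 1)
    )


if __name__ == "__main__":
    subject = "ATGCCGAAGTTCGACAAGCTGGAGAACGTGATCCGCAAGTACGAGCTGAAC"
    # Hypothetical diverged copy: one substitution every 15 bases.
    bases = list(subject)
    for i in range(14, len(bases), 15):
        bases[i] = "A" if bases[i] != "A" else "C"
    query = "".join(bases)
    for k in (28, 11):
        print(f"word size {k}: {seed_hits(query, subject, k)} seed positions")
```

Every query position seeded at word size 28 is also seeded at 11, since the shorter word is a prefix of the longer one. The reverse does not hold once substitutions break up long identical runs.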

E-Value

A stricter (lower) threshold such as 1e-5 filters out chance matches, whose expected number grows with database size.
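The E-value follows the Karlin-Altschul formula, E = K·m·n·e^(−λS), or equivalently E = m·n·2^(−S′) for a normalized bit score S′. Here m is the query length and n is the database length. The sketch below uses raw lengths. BLAST actually applies effective-length corrections, and the λ and K values in the check are hypothetical.

```python
import math


def evalue_raw(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expectation: E = K * m * n * exp(-lambda * S)."""
    return k * m * n * math.exp(-lam * raw_score)


def evalue_bits(bit_score: float, m: int, n: int) -> float:
    """Same expectation from a normalized bit score S': E = m * n * 2**(-S')."""
    return m * n * 2.0 ** (-bit_score)


if __name__ == "__main__":
    query_len, db_len = 300, 10**9  # hypothetical 300-residue query, 1e9-letter database
    for bits in (50, 55):
        e = evalue_bits(bits, query_len, db_len)
        print(f"bit score {bits}: E = {e:.2e}, passes 1e-5: {e < 1e-5}")
```

Because E scales linearly with n, the same alignment score becomes less significant as the database grows. A fixed cutoff such as 1e-5 therefore requires a higher bit score against larger databases.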

Database Bias: ClusteredNR

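The standard nr database is biased toward heavily sequenced organisms such as humans and mice. ClusteredNR groups sequences with ≥90% identity into clusters, so each cluster is represented once.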

Benefit:

Reduces database size by ~40% and helps surface sponge and jellyfish homologs that would otherwise be buried under mammalian hits.
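The effect of clustering at 90% identity can be illustrated with greedy incremental clustering, a generic technique in the style of CD-HIT. This is not a description of how NCBI builds ClusteredNR. To keep the sketch short, identity is measured only between equal-length, ungapped sequences, and the protein fragments are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes an ungapped, equal-length comparison."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters: dict[str, list[str]] = {}
    # Longest sequences first, so they tend to become cluster representatives.
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        for rep in clusters:
            if identity(seqs[name], seqs[rep]) >= threshold:
                clusters[rep].append(name)
                break
        else:
            # No representative is close enough: start a new cluster.
            clusters[name] = [name]
    return clusters


if __name__ == "__main__":
    proteins = {
        "human": "MKTAYIAKQR",
        "mouse": "MKTAYIAKQK",   # 9/10 identical to human -> same cluster
        "sponge": "MSTLYVGKER",  # 5/10 identical -> its own cluster
    }
    print(greedy_cluster(proteins))
```

Near-identical mammalian sequences collapse into one cluster, while the divergent sponge sequence keeps its own representative. As a result, it is no longer crowded out of the hit list by redundant entries.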

Lynch Syndrome Precision Workflow

Step 1: Clinical Suspicion to Targets

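A clinician suspects Lynch syndrome (hereditary colorectal cancer). MedGen links to the GeneReviews chapter, which identifies the target mismatch repair (MMR) genes: MLH1, MSH2, MSH6, and PMS2. GTR (Genetic Testing Registry) lists laboratories offering the relevant tests.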

Step 2: Standardization Baseline

Retrieves the RefSeq MLH1 transcript NM_000249.4. Patient sequences are mapped against this curated, versioned standard to ensure consistent HGVS variant naming (e.g. c.123G>A).
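Consistent naming is easier to enforce when variant strings are validated against the pinned transcript. The sketch below parses only simple coding substitutions of the form `NM_000249.4:c.123G>A`. Intronic offsets, deletions, insertions, and other HGVS forms are deliberately out of scope. The regex and the `CodingSub` type are illustrative assumptions, and c.123G>A is the document's placeholder example, not a known variant.

```python
import re
from typing import NamedTuple

# Versioned NM_/NR_ transcript, then a simple coding-DNA substitution.
HGVS_C_SUB = re.compile(
    r"^(?P<tx>N[MR]_\d+\.\d+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


class CodingSub(NamedTuple):
    transcript: str
    position: int
    ref: str
    alt: str


def parse_c_substitution(hgvs: str) -> CodingSub:
    """Parse 'NM_xxx.v:c.<pos><ref>><alt>'; anything else raises ValueError."""
    m = HGVS_C_SUB.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported or unversioned HGVS string: {hgvs!r}")
    return CodingSub(m["tx"], int(m["pos"]), m["ref"], m["alt"])


if __name__ == "__main__":
    print(parse_c_substitution("NM_000249.4:c.123G>A"))
```

Rejecting unversioned transcripts (e.g. `NM_000249:c.123G>A`) is a deliberate choice. Coordinates can shift between RefSeq versions, so the version number is part of what makes the name unambiguous.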

Step 3: Variant Interpretation & Filtering

ClinVar Star Rating

A review status of "reviewed by expert panel" (3 stars) carries far more weight as pathogenicity evidence than unreviewed submissions.
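Prioritizing by star level can be sketched as a lookup from review status to stars. The status strings below reflect ClinVar's star scheme as commonly documented, but the wording has changed over time, so treat them as assumptions to verify against current ClinVar documentation. The record dictionaries and their `review_status` key are hypothetical.

```python
# Review-status wording assumed from ClinVar's star scheme; verify before relying on it.
REVIEW_STATUS_STARS = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting classifications": 1,
    "criteria provided, single submitter": 1,
    "no assertion criteria provided": 0,
}


def stars(review_status: str) -> int:
    """Map a review-status string to its star level (unknown strings -> 0)."""
    return REVIEW_STATUS_STARS.get(review_status.strip().lower(), 0)


def best_supported(records: list[dict]) -> dict:
    """Return the record whose review status earns the most stars."""
    return max(records, key=lambda r: stars(r["review_status"]))


if __name__ == "__main__":
    records = [  # hypothetical submissions for one variant
        {"classification": "Uncertain significance",
         "review_status": "no assertion criteria provided"},
        {"classification": "Pathogenic",
         "review_status": "reviewed by expert panel"},
    ]
    print(best_supported(records)["classification"])
```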

ALFA Check (VUS)

If the variant's allele frequency exceeds 5% in the relevant ancestry group, it is likely a benign polymorphism.
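The frequency check is a threshold on the highest population-specific allele frequency. This 5% cutoff matches the stand-alone benign (BA1) criterion in the ACMG/AMP guidelines. The sketch below applies it to the South Asian case study from the ALFA section. The frequency dictionaries are hypothetical inputs, not ALFA output, and frequency alone is not a full classification.

```python
BENIGN_AF_THRESHOLD = 0.05  # 5%, as stated in the workflow


def max_population_af(freqs: dict[str, float]) -> tuple[str, float]:
    """Return the population with the highest allele frequency."""
    pop = max(freqs, key=lambda p: freqs[p])
    return pop, freqs[pop]


def frequency_flag(freqs: dict[str, float], threshold: float = BENIGN_AF_THRESHOLD) -> str:
    """Flag a variant whose frequency in any population exceeds the threshold."""
    pop, af = max_population_af(freqs)
    if af > threshold:
        return f"likely benign polymorphism (AF {af:.1%} in {pop})"
    return "frequency does not exclude pathogenicity"


if __name__ == "__main__":
    # Hypothetical frequencies mirroring the case study: <0.1% vs 15%.
    print(frequency_flag({"European": 0.001, "South Asian": 0.15}))
```

The maximum, not the average, across populations is what matters. A variant that is rare overall but common in one ancestry group is still too frequent to cause a highly penetrant disorder in that group.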

Step 4: Evidence Synthesis

Follow LinkOut to PDB structures. If the mutation disrupts the MLH1-PMS2 interface, the structural evidence supports pathogenicity. A PubMed search using the rsID retrieves case reports.
