Protein Sequence Databases
PIR, SWISS-PROT, and TrEMBL
1. Introduction
Protein sequence databases are biological databases that store information about amino acid sequences of proteins, along with their functional, structural, and biochemical characteristics. Since proteins are the functional molecules of the cell, protein databases are essential for understanding gene expression, metabolism, enzymatic activity, signaling pathways, and evolution.
Protein sequence databases mainly contain data derived from translated nucleotide sequences and experimental protein studies.
2. Types of Protein Sequence Databases
Protein sequence databases are broadly classified into:
A. Primary Protein Databases
Contain original protein sequence data
Minimal or no manual annotation
B. Secondary Protein Databases
Derived from primary databases
Provide curated functional and structural information
C. Composite Protein Databases
Combine protein data from multiple sources
Reduce redundancy
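The idea of combining sources while reducing redundancy can be illustrated with a minimal Python sketch. It assumes the simplest possible criterion, namely that two records are redundant only when their sequences are identical; real composite databases apply more elaborate rules. The function name, identifiers, and sequences below are all made up for illustration.

```python
def merge_non_redundant(*sources):
    """Merge (identifier, sequence) records from several sources.

    Records whose sequences are identical (ignoring case) collapse into one
    entry that remembers every identifier it was seen under.
    """
    by_sequence = {}  # sequence -> list of identifiers, in first-seen order
    for records in sources:
        for identifier, sequence in records:
            by_sequence.setdefault(sequence.upper(), []).append(identifier)
    return [
        {"primary_id": ids[0], "other_ids": ids[1:], "sequence": seq}
        for seq, ids in by_sequence.items()
    ]


# Hypothetical records from two hypothetical source databases.
source_a = [("A1", "MKTAYIAKQR"), ("A2", "MSTNPKPQRK")]
source_b = [("B1", "mktayiakqr"), ("B2", "MAHHHHHH")]

for entry in merge_non_redundant(source_a, source_b):
    print(entry["primary_id"], entry["other_ids"], entry["sequence"])
```

Here the record B1 duplicates A1, so four input records become three entries, and the merged entry keeps B1 as an alternative identifier rather than discarding it.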
3. Protein Information Resource (PIR)
Overview
Protein Information Resource (PIR) is one of the earliest protein sequence databases, developed to store and analyze protein sequences.
Maintained by
Georgetown University Medical Center (USA)
Originally established in 1984 at the National Biomedical Research Foundation (NBRF)
Data Content
Protein sequences
Functional information
Evolutionary relationships
Classification into protein families
Unique Features
Organized into protein superfamilies
Emphasis on evolutionary and functional classification
Non-redundant dataset
Advantages
High-quality annotations
Useful for comparative protein studies
Limitations
Smaller than newer databases
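No longer updated as a standalone sequence database: PIR joined EMBL-EBI and SIB in 2002 to form the UniProt consortium, and PIR sequence data were integrated into UniProt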
4. SWISS-PROT Database
Overview
SWISS-PROT is a manually curated, high-quality protein sequence database known for its accuracy and reliability.
Maintained by
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)
Data Content
Amino acid sequences
Protein function
Enzyme activity
Post-translational modifications
Domain structure
Subcellular localization
Key Features
Manual curation by experts
Minimal redundancy
High annotation accuracy
Extensive cross-references
A SWISS-PROT Entry Includes:
Accession number
Protein name
Organism
Function
Sequence length
Amino acid sequence
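The fields listed above can be pulled out of an entry in the UniProtKB flat-file format, where each line starts with a two-letter line code (ID, AC, DE, OS, CC, SQ) and the record ends with `//`. The following is a minimal Python sketch, not a complete parser: it handles only these line codes, reads only the first line of a FUNCTION comment, and runs on a toy entry whose accession, names, and checksum values are invented placeholders.

```python
# A made-up entry in the flat-file layout; every value here is a placeholder.
TOY_ENTRY = """\
ID   TOY_EXAMPLE             Reviewed;          10 AA.
AC   X00000;
DE   RecName: Full=Toy example protein;
OS   Hypothetical organism.
CC   -!- FUNCTION: Placeholder function text for illustration.
SQ   SEQUENCE   10 AA;  0 MW;  0000000000000000 CRC64;
     MKTAYIAKQR
//
"""


def parse_entry(text):
    """Extract a few core fields from one flat-file entry."""
    fields = {"accessions": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines are indented and grouped in blocks; drop all spaces.
            fields["sequence"] += "".join(line.split())
            continue
        code, _, content = line.partition(" ")
        content = content.strip()
        if code == "ID":
            fields["entry_name"] = content.split()[0]
        elif code == "AC":
            fields["accessions"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "DE" and "Full=" in content and "protein_name" not in fields:
            fields["protein_name"] = content.split("Full=", 1)[1].rstrip(";")
        elif code == "OS":
            fields["organism"] = content.rstrip(".")
        elif code == "CC" and "FUNCTION:" in content:
            fields["function"] = content.split("FUNCTION:", 1)[1].strip()
        elif code == "SQ":
            in_sequence = True  # the following lines hold the sequence itself
    fields["length"] = len(fields["sequence"])
    return fields


entry = parse_entry(TOY_ENTRY)
print(entry["accessions"][0], entry["protein_name"], entry["organism"], entry["length"])
```

For real work, an established parser such as the one in Biopython is a safer choice than hand-written code like this.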
Advantages
Highly reliable
Preferred for functional studies
Limitations
Slow growth due to manual annotation
5. TrEMBL (Translated EMBL)
Overview
TrEMBL is a computer-annotated protein database that contains protein sequences translated from nucleotide sequence databases.
Maintained by
EMBL-EBI
Swiss Institute of Bioinformatics
Data Source
Translations of coding sequences from:
EMBL
GenBank
DDBJ
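Translating a coding sequence into a protein sequence, the step that produces TrEMBL entries, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: it uses the standard genetic code, expects the sequence to start in frame, stops at the first stop codon, and marks any codon containing an ambiguous base as X. The function name and the example sequence are made up; the actual TrEMBL pipeline is far more involved.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))  # prints "MAK"
```

Because this step is fully automatic, it scales to every coding sequence deposited in the nucleotide databases, which is why TrEMBL grows much faster than a manually curated resource.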
Key Features
Automatically annotated
Large and rapidly growing database
Supplement to SWISS-PROT
Advantages
Covers newly discovered proteins
Fast data availability
Limitations
Annotation may contain errors
Less reliable than SWISS-PROT
6. UniProt Knowledgebase (UniProtKB)
SWISS-PROT and TrEMBL together form the UniProt Knowledgebase (UniProtKB), which is maintained by the UniProt consortium (EMBL-EBI, SIB, and PIR).
Components
UniProtKB/Swiss-Prot – reviewed, manually curated
UniProtKB/TrEMBL – unreviewed, automatically annotated
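In UniProtKB FASTA files, the two components can be told apart from the header line: it begins with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries, followed by the accession and the entry name. The Python sketch below shows one way to read that prefix; the function name and both example headers are hypothetical and do not refer to real entries.

```python
def classify_header(header):
    """Split a UniProtKB-style FASTA header into accession, entry name, and status."""
    database, accession, rest = header.lstrip(">").split("|", 2)
    status = {
        "sp": "reviewed (Swiss-Prot)",
        "tr": "unreviewed (TrEMBL)",
    }.get(database, "unknown")
    return {
        "accession": accession,
        "entry_name": rest.split()[0],
        "status": status,
    }


# Made-up headers in the UniProtKB FASTA layout.
headers = [
    ">sp|X00000|TOY_EXAMPLE Toy example protein OS=Hypothetical organism OX=0 PE=1 SV=1",
    ">tr|X00001|X00001_EXAMPLE Uncharacterized protein OS=Hypothetical organism OX=0 PE=4 SV=1",
]
for header in headers:
    info = classify_header(header)
    print(info["accession"], info["entry_name"], info["status"])
```

Filtering on this prefix is a quick way to restrict an analysis to manually reviewed entries when annotation reliability matters more than coverage.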
Purpose
Provide comprehensive protein sequence and functional information
Serve as a central protein knowledge hub
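7. Comparison of PIR, SWISS-PROT, and TrEMBL
Annotation: PIR is organized by evolutionary and functional classification into protein superfamilies; SWISS-PROT is manually curated by experts; TrEMBL is annotated automatically by computer.
Data source: PIR and SWISS-PROT hold curated protein sequences with functional information; TrEMBL holds translations of coding sequences from EMBL, GenBank, and DDBJ.
Size and growth: PIR is smaller than newer databases; SWISS-PROT grows slowly because of manual annotation; TrEMBL is large and rapidly growing.
Reliability: PIR and SWISS-PROT offer high-quality annotations; TrEMBL annotations may contain errors and are less reliable than those in SWISS-PROT.
Typical use: PIR for comparative protein studies; SWISS-PROT for functional studies; TrEMBL for access to newly discovered proteins.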
8. Applications of Protein Sequence Databases
Protein function prediction
Identification of conserved domains
Comparative protein analysis
Phylogenetic studies
Drug target identification
Enzyme characterization
9. Importance of Protein Sequence Databases
Link genes to protein function
Support proteomics research
Assist in metabolic pathway analysis
Aid in molecular evolution studies
Help in crop improvement and biotechnology
10. Conclusion
Protein sequence databases such as PIR, SWISS-PROT, and TrEMBL play a vital role in modern bioinformatics. While SWISS-PROT provides high-quality, manually curated protein data, TrEMBL ensures rapid availability of newly sequenced proteins. PIR contributes valuable evolutionary and functional classifications. Together, these databases support comprehensive protein research and biological discovery.