Secondary Databases (PROSITE, PRINTS, BLOCKS)
Introduction
Biological databases are broadly classified into primary and secondary databases.
Primary databases store raw experimental data (e.g., nucleotide or protein sequences), whereas secondary databases contain derived information obtained by analyzing primary sequence data.
Secondary databases are mainly used to:
Identify protein families
Detect conserved motifs, patterns, and domains
Predict protein function
Study structure–function relationships
Examples of secondary databases include PROSITE, PRINTS, BLOCKS, Pfam, etc.
1. PROSITE Database
Definition
PROSITE is a secondary database that documents protein domains, families, and functional sites in the form of patterns and profiles.
Developed by
Swiss Institute of Bioinformatics (SIB)
Maintained alongside UniProtKB/Swiss-Prot, to which its entries are cross-linked
Principle
PROSITE is based on the idea that functionally important regions of proteins are conserved during evolution.
These conserved regions can be represented as:
1. Patterns (regular expressions)
2. Profiles (position-specific scoring matrices)
Components of PROSITE
Patterns
Short conserved motifs
Written as regular expressions
Useful for identifying active sites or binding sites
Example: Serine protease active site
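Because a PROSITE pattern is essentially a regular expression written in its own notation, it can be translated into an ordinary regex and scanned against a sequence. The sketch below is a minimal illustration, not PROSITE's own scanning tool. It covers only the common syntax elements (x for any residue, [ABC] for allowed residues, {ABC} for excluded residues, (n) or (n,m) for repeats, and < or > for terminal anchors). The function name prosite_to_regex, the toy pattern, and the toy sequence are assumptions made for this example; the pattern is loosely modeled on a serine protease active-site motif and is not copied from a PROSITE entry.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    regex_parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Split an element such as 'x(2,4)' into its core 'x' and repeat '2,4'
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":                  # 'x' accepts any residue
            core = "."
        elif core.startswith("{"):       # '{ABC}' excludes the listed residues
            core = "[^" + core[1:-1] + "]"
        # '[ABC]' and single letters are already valid regex syntax
        count = "{" + repeat + "}" if repeat else ""
        regex_parts.append(prefix + core + count + suffix)
    return "".join(regex_parts)

# Hypothetical pattern and sequence, for illustration only
toy_pattern = "G-[DE]-S-G-[GS]"
toy_sequence = "MKAAGDSGGPLVCT"

regex = prosite_to_regex(toy_pattern)
hits = [(m.start(), m.group()) for m in re.finditer(regex, toy_sequence)]
print(regex, hits)
```

A pattern either matches or it does not, which makes hits easy to interpret but gives no score for near misses.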
Profiles
More sensitive than patterns
Can detect distant homologs
Assign a position-specific score to each amino acid (and to gaps) at every position of the aligned region
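The idea behind a profile can be sketched with a small position-specific scoring matrix built from an ungapped alignment. This is a simplified illustration only: real PROSITE profiles also carry gap penalties and use more elaborate sequence weighting and substitution information. The toy alignment, the uniform background frequency, the pseudocount of 1, and the function names build_profile and best_window are all assumptions made for this example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned, pseudocount=1.0):
    """Return one dict of log-odds scores per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform background frequency
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Slide the profile along the sequence; return (score, start) of the best window."""
    width = len(profile)
    scored = [
        (sum(col[res] for col, res in zip(profile, sequence[i:i + width])), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(scored)

# Hypothetical aligned motif instances and query sequence
toy_alignment = ["GDSGG", "GDSGS", "GESGG", "GDSAG"]
profile = build_profile(toy_alignment)
score, start = best_window(profile, "MKAAGESGSPLV")
print(round(score, 2), start)
```

The query window GESGS is not identical to any sequence in the toy alignment, yet it still receives the highest score because every residue was observed at its position. This graded scoring is what lets profiles pick up more divergent family members than an exact pattern would.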
Documentation (PROSITE entries)
Each entry includes:
Description of the protein family/domain
Biological function
References
Links to UniProt
Applications
Protein function prediction
Identification of catalytic and binding sites
Annotation of newly sequenced proteins
Detection of protein families
Advantages
High specificity
Well-curated and annotated
Easy interpretation
Limitations
Patterns may miss distant homologs (false negatives)
Very short patterns may match unrelated sequences by chance (false positives)
2. PRINTS Database
Definition
PRINTS is a secondary protein database that identifies protein families using fingerprints, which are groups of conserved motifs.
Developed by
University of Manchester, UK
Principle
Unlike a PROSITE pattern, which typically describes a single motif, PRINTS uses a set of several conserved motifs (a fingerprint) to characterize a protein family.
A protein is considered a member of a family only if it matches most or all motifs in the fingerprint.
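The membership rule above can be sketched as follows. This is a deliberate simplification: the motifs are written as regular expressions so that the example stays short, whereas PRINTS itself stores motifs as aligned sequence segments and scores matches rather than testing exact patterns. The motifs, the sequences, the in-order requirement, the 0.75 threshold, and the function names match_fingerprint and is_family_member are all assumptions made for illustration.

```python
import re

def match_fingerprint(motifs, sequence):
    """Search for the motifs in order, each one after the end of the previous hit."""
    hits, position = [], 0
    for motif in motifs:
        found = re.compile(motif).search(sequence, position)
        if found:
            hits.append((motif, found.start()))
            position = found.end()       # the next motif must lie downstream
        else:
            hits.append((motif, None))   # motif missing from this sequence
    return hits

def is_family_member(motifs, sequence, min_fraction=0.75):
    """Accept the sequence if at least min_fraction of the motifs are found."""
    hits = match_fingerprint(motifs, sequence)
    matched = sum(1 for _, start in hits if start is not None)
    return matched / len(motifs) >= min_fraction

# Hypothetical four-motif fingerprint and two toy sequences
fingerprint = ["G[DE]SGG", "H.C", "W..Y", "KR[ST]"]
full_match = "AAGDSGGTTHACLLWAAYPPKRS"
partial_match = "AAGDSGGTTTTTTTTTTTTTKRS"

print(match_fingerprint(fingerprint, full_match))
print(is_family_member(fingerprint, full_match))
print(is_family_member(fingerprint, partial_match))
```

The second sequence carries two of the four motifs, so a single-motif search would report a hit, but the fingerprint rule rejects it. This is the sense in which multiple motifs reduce false positives.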
Structure of PRINTS
Each PRINTS entry consists of:
A set of conserved motifs
Alignment of sequences
Functional annotation
Cross-references to other databases
Key Features
Fingerprints improve diagnostic accuracy
Requiring multiple motifs reduces false-positive matches
Useful for family-level classification
Applications
Identification of protein superfamilies
Functional annotation of proteins
Evolutionary studies
Validation of protein family membership
Advantages
High reliability due to multiple motifs
Better discrimination between closely related families
Limitations
Less sensitive to very divergent sequences
Smaller coverage compared to some databases
3. BLOCKS Database
Definition
BLOCKS is a database of conserved regions (blocks) in protein families, represented as ungapped multiple sequence alignments.
Developed by
Fred Hutchinson Cancer Research Center, USA
Principle
A block is a conserved region found in multiple proteins, without insertions or deletions.
These blocks represent functionally or structurally important regions of proteins.
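To make the notion of an ungapped block concrete, the sketch below takes a small gapped alignment and reports the column ranges in which no sequence contains a gap. It illustrates only the "no insertions or deletions" property; it is not the procedure used to build the BLOCKS database, and it does not test how conserved the residues in each range are. The toy alignment, the minimum width of 3, and the function names find_ungapped_blocks and slice_block are assumptions made for this example.

```python
def find_ungapped_blocks(alignment, min_width=3, gap="-"):
    """Return (start, end) column ranges that contain no gap in any sequence."""
    n_columns = len(alignment[0])
    blocks, start = [], None
    # Iterate one column past the end so that a trailing block is closed too
    for col in range(n_columns + 1):
        gap_free = col < n_columns and all(seq[col] != gap for seq in alignment)
        if gap_free and start is None:
            start = col                      # a new gap-free run begins here
        elif not gap_free and start is not None:
            if col - start >= min_width:     # keep only runs that are wide enough
                blocks.append((start, col))
            start = None
    return blocks

def slice_block(alignment, block):
    """Cut the aligned segment of every sequence for one block."""
    start, end = block
    return [seq[start:end] for seq in alignment]

# Hypothetical gapped alignment of three sequences (equal length)
toy_alignment = [
    "MKGDSGG--PLVCW",
    "MRGDSGGA-PLICW",
    "M-GESGGATPLVCW",
]

blocks = find_ungapped_blocks(toy_alignment)
for block in blocks:
    print(block, slice_block(toy_alignment, block))
```

Each reported range can be read as a small ungapped alignment of equal-length segments, one per sequence, which is the form in which a block is stored.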
Characteristics
Originally derived from protein families documented in PROSITE
Focuses on local conserved regions
Uses position-specific scoring matrices (PSSMs)
BLOCKS Format
Each entry contains:
Protein family name
Conserved block sequences
Alignment information
Scoring matrices
Applications
Detection of conserved motifs
Protein classification
Functional prediction
Sequence similarity searches
Advantages
Highly conserved regions improve accuracy
Ungapped alignments are easy to analyze
Limitations
Ignores variable regions
Limited coverage for novel proteins
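Comparison of PROSITE, PRINTS and BLOCKS
PROSITE: describes families, domains, and functional sites with patterns (regular expressions) and profiles; developed by the Swiss Institute of Bioinformatics
PRINTS: describes families with fingerprints (groups of conserved motifs); developed at the University of Manchester, UK
BLOCKS: describes families with blocks (ungapped multiple sequence alignments of conserved regions); developed at the Fred Hutchinson Cancer Research Center, USA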
Importance of Secondary Databases
Help in functional annotation of proteins
Aid in genome annotation projects
Support comparative genomics and evolutionary studies
Serve as essential tools in bioinformatics and proteomics
Conclusion
Secondary databases such as PROSITE, PRINTS and BLOCKS play a crucial role in understanding protein structure and function. By analyzing conserved motifs and domains, these databases help in accurate protein classification, functional prediction, and evolutionary analysis, making them indispensable tools in modern bioinformatics.