Secondary Databases (PROSITE, PRINTS, BLOCKS)

Introduction

Biological databases are broadly classified into primary and secondary databases.

Primary databases store raw experimental data (e.g., nucleotide or protein sequences), whereas secondary databases contain derived information obtained by analyzing primary sequence data.

Secondary databases are mainly used to:

Identify protein families

Detect conserved motifs, patterns, and domains

Predict protein function

Study structure–function relationships

Examples of secondary databases include PROSITE, PRINTS, BLOCKS and Pfam.

1. PROSITE Database

Definition

PROSITE is a secondary database that documents protein domains, families, and functional sites in the form of patterns and profiles.

Developed by

Swiss Institute of Bioinformatics (SIB)

Maintained along with UniProt

Principle

PROSITE is based on the idea that functionally important regions of proteins are conserved during evolution.

These conserved regions can be represented as:

1. Patterns (regular expressions)

2. Profiles (position-specific scoring matrices)

Components of PROSITE

Patterns

Short conserved motifs

Written as regular expressions

Useful for identifying active sites or binding sites

Example: Serine protease active site
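Because a PROSITE pattern is a restricted regular expression, it can be translated almost mechanically into an ordinary regex. The sketch below handles only the common subset of the syntax ('-' between positions, x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the sequence ends). It uses the well-known N-glycosylation site pattern N-{P}-[ST]-{P} rather than the serine protease pattern, and the helper names `prosite_to_regex` and `scan` are hypothetical, not part of any PROSITE tool.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Supported subset: '-' separators, 'x' (any residue), [..] (allowed
    residues), {..} (forbidden residues), (n) or (n,m) repeat counts, and
    '<' / '>' anchors for the N- and C-terminus. A trailing '.' is ignored.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]   # anchored to the N-terminus
        if element.endswith(">"):
            suffix, element = "$", element[:-1]  # anchored to the C-terminus
        # Separate the residue specification from an optional (n) or (n,m) count
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # {P} means "any residue except P"
        else:
            regex = core  # a single residue or an [..] class is already valid regex
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)

def scan(pattern, sequence):
    """Return (1-based start, matched text) for every, possibly overlapping, match."""
    regex = prosite_to_regex(pattern)
    # A zero-width lookahead lets neighbouring matches overlap
    return [(m.start() + 1, m.group(1)) for m in re.finditer(f"(?=({regex}))", sequence)]

print(prosite_to_regex("N-{P}-[ST]-{P}."))          # N[^P][ST][^P]
print(scan("N-{P}-[ST]-{P}.", "ANSTVNPSANGTK"))     # [(2, 'NSTV'), (10, 'NGTK')]
```

The N at position 6 of the toy sequence is not reported because it is followed by P, which the pattern forbids. This all-or-nothing behaviour is what makes patterns highly specific but prone to missing divergent family members.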

Profiles

More sensitive than patterns

Can detect distant homologs

Represent the probability of amino acids at each position.
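The extra sensitivity of profiles comes from scoring each position instead of giving a yes/no answer. The sketch below is a simplification under stated assumptions. Real PROSITE (generalized) profiles also carry position-specific insertion and deletion penalties, which are omitted here. The four-sequence alignment, the pseudocount of 1, the uniform 1/20 background and the helper names `position_probabilities` and `profile_score` are all illustrative. A strict pattern such as G-[DE]-S-G-[GA] (hypothetical) would reject the window ADSGG outright, whereas the profile still gives it a positive log-odds score.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def position_probabilities(alignment, pseudocount=1.0):
    """Estimate P(amino acid) for every column of an ungapped alignment.

    A pseudocount is added for every residue so that unseen amino acids get
    a small non-zero probability instead of ruling a sequence out entirely.
    """
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for i in range(width):
        counts = Counter(seq[i] for seq in alignment)
        profile.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile

def profile_score(profile, window, background=1 / 20):
    """Sum of log2(P(residue at position) / background) over the window.

    Positive totals mean the window looks more like the family than like
    random sequence. Only the 20 standard amino acids are accepted.
    """
    if len(window) != len(profile):
        raise ValueError("window length must equal profile length")
    return sum(math.log2(col[aa] / background) for col, aa in zip(profile, window))

alignment = ["GDSGG", "GDSGG", "GESGG", "GDSGA"]  # toy alignment
profile = position_probabilities(alignment)
for window in ("GDSGG", "ADSGG", "WWWWW"):
    print(window, round(profile_score(profile, window), 2))
```

The consensus window scores highest, the single-mismatch window ADSGG still scores well above zero, and an unrelated window scores below zero.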

Documentation (PROSITE entries)

Each entry includes:

Description of the protein family/domain

Biological function

References

Links to UniProt

Applications

Protein function prediction

Identification of catalytic and binding sites

Annotation of newly sequenced proteins

Detection of protein families

Advantages

High specificity

Well-curated and annotated

Easy interpretation

Limitations

Patterns may miss distant homologs

False negatives may occur: a sequence that deviates from a pattern at even one position is not matched

2. PRINTS Database

Definition

PRINTS is a secondary protein database that identifies protein families using fingerprints, which are groups of conserved motifs.

Developed by

University of Manchester, UK

Principle

Unlike a PROSITE pattern, which describes a single motif, PRINTS uses multiple conserved motifs (fingerprints) to characterize a protein family.

A protein is considered a member of a family only if it matches most or all motifs in the fingerprint.
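The fingerprint rule can be sketched as follows. Real PRINTS motifs are short ungapped alignments scored by dedicated software, not regular expressions. In this sketch each motif is simplified to a regex, the motifs are assumed to occur in the listed order, and the motif strings and helper names (`fingerprint_hits`, `is_family_member`) are hypothetical.

```python
import re

def fingerprint_hits(sequence, motifs):
    """Search for each motif in turn, starting after the previous hit.

    Returns a list with the 1-based start position of each motif, or None
    where that motif was not found.
    """
    hits = []
    pos = 0
    for motif in motifs:
        match = re.compile(motif).search(sequence, pos)
        if match is None:
            hits.append(None)  # missed motif: keep searching from the same place
        else:
            hits.append(match.start() + 1)
            pos = match.end()  # the next motif must lie downstream of this one
    return hits

def is_family_member(sequence, motifs, min_matched=None):
    """Accept the sequence only if at least `min_matched` motifs are found
    (all of them by default)."""
    if min_matched is None:
        min_matched = len(motifs)
    hits = fingerprint_hits(sequence, motifs)
    return sum(h is not None for h in hits) >= min_matched

motifs = ["GDSG", "HC", "WVL"]      # hypothetical three-motif fingerprint
full = "MKGDSGPLAHCTTWVLK"          # contains all three motifs in order
partial = "MKGDSGPLAHATTWVLK"       # second motif destroyed (HC -> HA)
print(fingerprint_hits(full, motifs))
print(fingerprint_hits(partial, motifs))
print(is_family_member(partial, motifs), is_family_member(partial, motifs, min_matched=2))
```

Requiring all motifs by default mirrors the strict reading of "matches all motifs"; lowering `min_matched` corresponds to accepting a partial fingerprint ("most motifs"), trading some specificity for sensitivity.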

Structure of PRINTS

Each PRINTS entry consists of:

A set of conserved motifs

Alignment of sequences

Functional annotation

Cross-references to other databases

Key Features

Fingerprints improve accuracy

Multiple motifs reduce false positive matches

Useful for family-level classification

Applications

Identification of protein superfamilies

Functional annotation of proteins

Evolutionary studies

Validation of protein family membership

Advantages

High reliability due to multiple motifs

Better discrimination between closely related families

Limitations

Less sensitive to very divergent sequences

Smaller coverage than larger databases such as Pfam

3. BLOCKS Database

Definition

BLOCKS is a database of conserved regions (blocks) in protein families, represented as ungapped multiple sequence alignments.

Developed by

Fred Hutchinson Cancer Research Center, USA

Principle

A block is a conserved region found in multiple proteins, without insertions or deletions.

These blocks represent functionally or structurally important regions of proteins.

Characteristics

Derived from PROSITE families

Focuses on local conserved regions

Uses position-specific scoring matrices (PSSMs)
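Because a block has no insertions or deletions, its PSSM can be scored against a query simply by sliding it along the sequence one residue at a time. The sketch below assumes a toy four-sequence block, a uniform background and a pseudocount of 1; real BLOCKS matrices are built with additional refinements not shown here, and `block_pssm` and `best_window` are hypothetical helper names.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def block_pssm(block, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from a block.

    A block must be ungapped: every segment has the same width and
    contains no '-' gap characters.
    """
    width = len(block[0])
    if any(len(seg) != width or "-" in seg for seg in block):
        raise ValueError("a block must be an ungapped alignment of equal-width segments")
    total = len(block) + pseudocount * len(AMINO_ACIDS)
    background = 1 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for i in range(width):
        counts = Counter(seg[i] for seg in block)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_window(pssm, query):
    """Slide the PSSM along the query without gaps and return the best
    (1-based start, window, score), or None if the query is too short."""
    width = len(pssm)
    best = None
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        # residues outside the 20 standard amino acids (e.g. X) contribute 0
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[2]:
            best = (start + 1, window, score)
    return best

block = ["HCGG", "HCGG", "HCAG", "HSGG"]  # toy ungapped block
print(best_window(block_pssm(block), "MAHCGGLKHSAG"))
```

The highest-scoring window is HCGG starting at position 3; the weaker HSAG window further along also scores positively but lower, which is how a PSSM ranks partial matches instead of rejecting them.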

BLOCKS Format

Each entry contains:

Protein family name

Conserved block sequences

Alignment information

Scoring matrices

Applications

Detection of conserved motifs

Protein classification

Functional prediction

Sequence similarity searches

Advantages

Highly conserved regions improve accuracy

Ungapped alignments are easy to analyze

Limitations

Ignores variable regions

Limited coverage for novel proteins

Comparison of PROSITE, PRINTS and BLOCKS

Importance of Secondary Databases

Help in functional annotation of proteins

Aid in genome annotation projects

Support comparative genomics and evolutionary studies

Serve as essential tools in bioinformatics and proteomics

Conclusion

Secondary databases such as PROSITE, PRINTS and BLOCKS play a crucial role in understanding protein structure and function. By analyzing conserved motifs and domains, these databases help in accurate protein classification, functional prediction, and evolutionary analysis, making them indispensable tools in modern bioinformatics.
