
Secondary Databases (PROSITE, PRINTS, BLOCKS)




Secondary Databases


Introduction

Biological databases are broadly classified into primary and secondary databases.
Primary databases store raw experimental data (e.g., nucleotide or protein sequences), whereas secondary databases contain derived information obtained by analyzing primary sequence data.
Secondary databases are mainly used to:
Identify protein families
Detect conserved motifs, patterns, and domains
Predict protein function
Study structure–function relationships
Examples of secondary databases include PROSITE, PRINTS, BLOCKS, and Pfam.


1. PROSITE Database

Definition
PROSITE is a secondary database that documents protein domains, families, and functional sites in the form of patterns and profiles.

Developed by
Swiss Institute of Bioinformatics (SIB)
Maintained by the SIB alongside UniProtKB/Swiss-Prot

Principle
PROSITE is based on the idea that functionally important regions of proteins are conserved during evolution.
These conserved regions can be represented as:

1. Patterns (regular expressions)
2. Profiles (position-specific scoring matrices)


Components of PROSITE

Patterns
Short conserved motifs
Written as regular expressions
Useful for identifying active sites or binding sites
Example: Serine protease active site
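
To make the idea of a pattern concrete, here is a minimal sketch that translates PROSITE-style pattern syntax into a Python regular expression and scans a sequence with it. The function names `prosite_to_regex` and `find_matches` and the test sequence are made up for illustration. The pattern used, N-{P}-[ST]-{P}, is the well-known N-glycosylation site pattern, chosen because it is short; it is not the serine protease pattern. The sketch covers only the basic syntax (x, [..], {..}, repeat counts and the < and > anchors).

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # "<" anchors the match at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # ">" anchors the match at the C-terminus
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as (2) or (2,4).
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":                      # x = any amino acid
            core = "."
        elif core.startswith("{"):           # {P} = any amino acid except P
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]

if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}"))
    print(find_matches("N-{P}-[ST]-{P}", "MKNGTAANPSLNHSA"))
```

In the test sequence, the asparagine followed by proline is skipped because {P} forbids proline at that position. This shows how strict a pattern match is.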

Profiles
More sensitive than patterns
Can detect distant homologs
Assign a position-specific score to each amino acid (and to gaps) at every position of the alignment
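
As a rough illustration of how a profile scores a sequence, the sketch below slides a tiny hand-written scoring matrix along a sequence and reports the best-scoring window. The matrix `PROFILE`, the default score and the test sequence are invented for this example. Real PROSITE profiles are much longer and also carry gap penalties, which are left out here.

```python
# Hypothetical 3-position profile: one dictionary of residue scores per column.
PROFILE = [
    {"G": 3, "A": 1},
    {"D": 2, "E": 2},
    {"S": 4, "T": 1},
]
DEFAULT_SCORE = -2  # assumed score for any residue not listed in a column

def score_window(window: str) -> int:
    """Sum the position-specific scores of the residues in one window."""
    return sum(col.get(res, DEFAULT_SCORE) for col, res in zip(PROFILE, window))

def best_hit(sequence: str):
    """Return (1-based start, score) of the highest-scoring window."""
    width = len(PROFILE)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score_window(sequence[i:i + width]))
    return best + 1, score_window(sequence[best:best + width])

if __name__ == "__main__":
    print(best_hit("MAESGDSK"))
```

Because every residue contributes a score instead of a yes/no match, a window such as AES still scores well even though it is not the ideal GDS. This is why profiles can pick up more divergent family members than patterns.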

Documentation (PROSITE entries)
Each entry includes:
Description of the protein family/domain
Biological function
References
Links to UniProt


Applications

Protein function prediction
Identification of catalytic and binding sites
Annotation of newly sequenced proteins
Detection of protein families

Advantages
High specificity
Well-curated and annotated
Easy interpretation


Limitations

Patterns are strict, so divergent family members may be missed (false negatives)
Very short patterns can match unrelated sequences by chance (false positives)

2. PRINTS Database
Definition
PRINTS is a secondary protein database that identifies protein families using fingerprints, which are groups of conserved motifs.

Developed by

University of Manchester, UK


Principle
Unlike a PROSITE pattern, which usually describes a single motif, PRINTS uses a group of conserved motifs (a fingerprint) to characterize a protein family.
A protein is considered a member of a family only if it matches most or all motifs in the fingerprint.
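
The membership rule above can be sketched in a few lines. This is a deliberate simplification: the motifs in `FINGERPRINT` are invented and written as regular expressions for brevity, and the function name `match_fingerprint` and the 60% threshold are assumptions made for this example. PRINTS itself stores motifs as ungapped aligned segments and scores them.

```python
import re

# Hypothetical fingerprint: three motifs expected in this order along the sequence.
FINGERPRINT = ["GDSG", "C..C", "W[ST]D"]

def match_fingerprint(sequence, motifs=FINGERPRINT, min_fraction=0.6):
    """Return (matched motifs, is_member).

    Motifs must occur in the given order; a sequence counts as a member
    when at least `min_fraction` of the motifs are found.
    """
    position = 0
    matched = []
    for motif in motifs:
        hit = re.compile(motif).search(sequence, position)
        if hit:
            matched.append(motif)
            position = hit.end()  # the next motif must lie downstream of this one
    return matched, len(matched) / len(motifs) >= min_fraction

if __name__ == "__main__":
    print(match_fingerprint("AAGDSGAAACKKCAAAWSDAA"))  # all three motifs
    print(match_fingerprint("AAGDSGAAAWTDAA"))         # two of three motifs
    print(match_fingerprint("AAAWSDAAGDSG"))           # motifs in the wrong order
```

Requiring several motifs in the right order is what makes a fingerprint harder to match by chance than a single motif.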


Structure of PRINTS

Each PRINTS entry consists of:
A set of conserved motifs
Alignment of sequences
Functional annotation
Cross-references to other databases


Key Features
Fingerprints improve accuracy
Multiple motifs reduce false-positive matches
Useful for family-level classification


Applications

Identification of protein superfamilies
Functional annotation of proteins
Evolutionary studies
Validation of protein family membership

Advantages

High reliability due to multiple motifs
Better discrimination between closely related families

Limitations

Less sensitive to very divergent sequences
Covers fewer protein families than some other databases

3. BLOCKS Database
Definition

BLOCKS is a database of conserved regions (blocks) in protein families, represented as ungapped multiple sequence alignments.

Developed by
Fred Hutchinson Cancer Research Center, USA


Principle

A block is a conserved segment shared by the members of a protein family and aligned without gaps (no insertions or deletions).
These blocks represent functionally or structurally important regions of proteins.


Characteristics
Originally built from protein families documented in PROSITE
Focuses on local conserved regions
Uses position-specific scoring matrices (PSSMs)
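
The step from an ungapped block to a position-specific scoring matrix can be sketched as follows. The block `BLOCK` is invented, and the uniform background frequency and the pseudocount of 1 are simplifying assumptions. The actual BLOCKS procedure also weights sequences, which is omitted here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical block: four ungapped, equal-length segments from one family.
BLOCK = ["GDSGG", "GDSGS", "GDAGG", "GESGG"]

def build_pssm(block, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from an ungapped block."""
    if len({len(segment) for segment in block}) != 1:
        raise ValueError("all segments in a block must have the same length")
    background = 1.0 / len(AMINO_ACIDS)  # assumed uniform background frequency
    pssm = []
    for column in zip(*block):           # iterate over alignment columns
        counts = Counter(column)
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def score_segment(pssm, segment):
    """Score a segment of the same width as the block against the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

if __name__ == "__main__":
    pssm = build_pssm(BLOCK)
    print(round(score_segment(pssm, "GDSGG"), 2))
    print(round(score_segment(pssm, "AAAAA"), 2))
```

Because a block has no gaps, each column maps directly onto one row of scores. This is why ungapped alignments are simple to convert and to search with.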

BLOCKS Format


Each entry contains:

Protein family name
Conserved block sequences
Alignment information
Scoring matrices


Applications

Detection of conserved motifs
Protein classification
Functional prediction
Sequence similarity searches

Advantages
Highly conserved regions improve accuracy
Ungapped alignments are easy to analyze


Limitations

Ignores variable regions
Limited coverage for novel proteins


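Comparison of PROSITE, PRINTS and BLOCKS

PROSITE: single motifs written as patterns (regular expressions), plus profiles; developed at the Swiss Institute of Bioinformatics
PRINTS: fingerprints made up of several conserved motifs per family; developed at the University of Manchester
BLOCKS: ungapped multiple sequence alignments of conserved regions (blocks); developed at the Fred Hutchinson Cancer Research Center
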
Importance of Secondary Databases

Help in functional annotation of proteins
Aid in genome annotation projects
Support comparative genomics and evolutionary studies
Essential tools in bioinformatics and proteomics


Conclusion

Secondary databases such as PROSITE, PRINTS and BLOCKS play a crucial role in understanding protein structure and function. By analyzing conserved motifs and domains, these databases help in accurate protein classification, functional prediction, and evolutionary analysis, making them indispensable tools in modern bioinformatics.


