Secondary Databases (PROSITE, PRINTS, BLOCKS)



Introduction

Biological databases are broadly classified into primary and secondary databases.
Primary databases store raw experimental data (e.g., nucleotide or protein sequences), whereas secondary databases contain derived information obtained by analyzing primary sequence data.
Secondary databases are mainly used to:
Identify protein families
Detect conserved motifs, patterns, and domains
Predict protein function
Study structure–function relationships
Examples of secondary databases include PROSITE, PRINTS, BLOCKS, Pfam, etc.


1. PROSITE Database

Definition
PROSITE is a secondary database that documents protein domains, families, and functional sites in the form of patterns and profiles.

Developed by

Swiss Institute of Bioinformatics (SIB)
Maintained along with UniProt

Principle
PROSITE is based on the idea that functionally important regions of proteins are conserved during evolution.
These conserved regions can be represented as:

1. Patterns (regular expressions)
2. Profiles (position-specific scoring matrices)


Components of PROSITE

Patterns
Short conserved motifs
Written as regular expressions
Useful for identifying active sites or binding sites
Example: Serine protease active site
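
Since patterns are written as regular expressions, a PROSITE-style pattern can be translated into an ordinary regex and searched against a sequence. The sketch below is a minimal illustration, not an official PROSITE tool. It assumes the common subset of the pattern syntax: elements separated by "-", x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, and < or > for the sequence ends. The pattern string is modeled on a serine protease histidine active-site motif and should be checked against the actual PROSITE entry before use; the function names and the test sequence are hypothetical.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"empty pattern element in {pattern!r}")
        core, low, high = m.groups()
        if core == "x":                      # any amino acid
            core = "."
        elif core.startswith("{"):           # forbidden residues
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_pattern(pattern: str, sequence: str):
    """Return (start, matched_text) for every non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Illustrative pattern and made-up sequence (assumptions, not database records)
PATTERN = "[LIVM]-[ST]-A-[STAG]-H-C"
print(prosite_to_regex(PATTERN))             # [LIVM][ST]A[STAG]HC
print(find_pattern(PATTERN, "MKLTAAHCGG"))   # [(2, 'LTAAHC')]
```

Because the match is all-or-nothing, a sequence that differs from the pattern at a single position is not reported.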

Profiles
More sensitive than patterns
Can detect distant homologs
Assign each amino acid a position-specific score that reflects how likely it is to occur at that position
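
The idea of scoring each amino acid at each position can be sketched as a simple log-odds matrix built from aligned sequences. This is a simplified illustration only: real PROSITE profiles are richer (they can, for example, penalize insertions and deletions), and the uniform background frequency, the pseudocount of 1, the toy alignment, and all function names here are assumptions made for the example.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log-odds score."""
    background = 1.0 / len(AMINO_ACIDS)   # assumption: uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, window):
    """Sum the position-specific scores of one sequence window."""
    return sum(column[aa] for column, aa in zip(profile, window))


def best_hit(profile, sequence):
    """Slide the profile along the sequence; return (best_score, start)."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Toy ungapped alignment and query (made up for illustration)
profile = build_profile(["GDSGG", "GDSGG", "GDSGS", "GESGG"])
score, start = best_hit(profile, "MKTGDSGGPLV")
print(start, round(score, 2))
```

Unlike a pattern, a window that differs from the consensus at one position still receives a score, only a lower one, which is why profiles can pick up more distant homologs.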

Documentation (PROSITE entries)
Each entry includes:
Description of the protein family/domain
Biological function
References
Links to UniProt


Applications

Protein function prediction
Identification of catalytic and binding sites
Annotation of newly sequenced proteins
Detection of protein families

Advantages
High specificity
Well-curated and annotated
Easy interpretation


Limitations

Patterns are rigid, so they may miss distant homologs
False negatives occur when a true family member has diverged at a pattern position

2. PRINTS Database
Definition
PRINTS is a secondary protein database that identifies protein families using fingerprints, which are groups of conserved motifs.

Developed by

University of Manchester, UK


Principle
Unlike a PROSITE pattern, which typically describes a single motif, a PRINTS entry uses multiple conserved motifs (a fingerprint) to characterize a protein family.
A protein is considered a member of a family only if it matches most or all motifs in the fingerprint.
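
The "most or all motifs" rule can be sketched as follows. This is a hedged illustration, not the PRINTS search method: the motifs are written here as plain regular expressions, the 75% threshold and the requirement that motifs occur in order along the sequence are assumptions chosen for the example, and the motif names and sequences are made up.

```python
import re


def match_fingerprint(sequence, motifs, min_fraction=0.75):
    """Return (is_member, matched_motif_names).

    motifs is an ordered list of (name, regex) pairs. Each motif is searched
    for downstream of the previous hit, so matches must occur in order.
    """
    position = 0
    matched = []
    for name, regex in motifs:
        hit = re.compile(regex).search(sequence, position)
        if hit:
            matched.append(name)
            position = hit.end()
    fraction = len(matched) / len(motifs)
    return fraction >= min_fraction, matched


# Hypothetical four-motif fingerprint
FINGERPRINT = [
    ("motif1", "GDSGG"),
    ("motif2", "C[AG]G"),
    ("motif3", "W[IV]L"),
    ("motif4", "HF"),
]

print(match_fingerprint("MAGDSGGPLCAGTTWILKK", FINGERPRINT))
# (True, ['motif1', 'motif2', 'motif3'])
print(match_fingerprint("MAGDSGGPLKK", FINGERPRINT))
# (False, ['motif1'])
```

The second sequence contains one motif but is still rejected, which illustrates how requiring several motifs reduces false-positive matches.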


Structure of PRINTS

Each PRINTS entry consists of:
A set of conserved motifs
Alignment of sequences
Functional annotation
Cross-references to other databases


Key Features
Using several motifs together improves diagnostic accuracy
Reduces false-positive matches
Useful for family-level classification


Applications

Identification of protein superfamilies
Functional annotation of proteins
Evolutionary studies
Validation of protein family membership

Advantages

High reliability due to multiple motifs
Better discrimination between closely related families

Limitations

Less sensitive to very divergent sequences
Smaller coverage compared to some databases

3. BLOCKS Database
Definition

BLOCKS is a database of conserved regions (blocks) in protein families, represented as ungapped multiple sequence alignments.

Developed by
Fred Hutchinson Cancer Research Center, USA


Principle

A block is a conserved region found in multiple proteins, without insertions or deletions.
These blocks represent functionally or structurally important regions of proteins.
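
To make the idea of an ungapped block concrete, the sketch below checks that a set of aligned segments has equal length and no gap characters, then reports how conserved each column is. It is an illustration under stated assumptions, not the BLOCKS construction procedure: the gap characters "-" and ".", the conservation measure (fraction of sequences sharing the most common residue), the function names, and the toy segments are all chosen for the example.

```python
from collections import Counter

GAP_CHARS = "-."


def is_ungapped_block(segments):
    """True if all segments have the same length and contain no gap characters."""
    lengths = {len(s) for s in segments}
    has_gap = any(ch in GAP_CHARS for s in segments for ch in s)
    return len(lengths) == 1 and not has_gap


def column_conservation(segments):
    """Fraction of sequences sharing the most common residue, per column."""
    if not is_ungapped_block(segments):
        raise ValueError("segments do not form an ungapped block")
    n = len(segments)
    return [Counter(col).most_common(1)[0][1] / n for col in zip(*segments)]


# Toy block: one aligned segment per protein (made up for illustration)
block = ["GDSGG", "GDSGG", "GDSGS", "GESGG"]
print(is_ungapped_block(block))                 # True
print(column_conservation(block))               # [1.0, 0.75, 1.0, 1.0, 0.75]
print(is_ungapped_block(["GDSGG", "GD-GG"]))    # False
```

Because every segment has the same length, each column can be compared directly across proteins without any alignment bookkeeping.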


Characteristics
Originally derived from protein families documented in PROSITE
Focuses on local conserved regions
Uses position-specific scoring matrices (PSSMs)

BLOCKS Format


Each entry contains:

Protein family name
Conserved block sequences
Alignment information
Scoring matrices


Applications

Detection of conserved motifs
Protein classification
Functional prediction
Sequence similarity searches

Advantages
Focusing on highly conserved regions improves the accuracy of matches
Ungapped alignments are easy to analyze


Limitations

Ignores variable regions
Limited coverage for novel proteins


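Comparison of PROSITE, PRINTS and BLOCKS

Representation: PROSITE uses patterns (regular expressions) and profiles; PRINTS uses fingerprints (groups of conserved motifs); BLOCKS uses ungapped multiple sequence alignments (blocks)
Developed by: PROSITE, Swiss Institute of Bioinformatics (SIB); PRINTS, University of Manchester, UK; BLOCKS, Fred Hutchinson Cancer Research Center, USA
Main strength: PROSITE is well curated and easy to interpret; PRINTS is highly reliable because it requires multiple motifs; BLOCKS alignments are ungapped and easy to analyze
Main limitation: PROSITE patterns may miss distant homologs; PRINTS is less sensitive to very divergent sequences; BLOCKS ignores variable regions
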
Importance of Secondary Databases

Help in functional annotation of proteins
Aid in genome annotation projects
Support comparative genomics and evolutionary studies
Essential tools in bioinformatics and proteomics


Conclusion

Secondary databases such as PROSITE, PRINTS and BLOCKS play a crucial role in understanding protein structure and function. By analyzing conserved motifs and domains, these databases help in accurate protein classification, functional prediction, and evolutionary analysis, making them indispensable tools in modern bioinformatics.


