Protein Sequence Databases
PIR, SWISS-PROT and TrEMBL

1. Introduction

Protein sequence databases are biological databases that store information about amino acid sequences of proteins, along with their functional, structural, and biochemical characteristics. Since proteins are the functional molecules of the cell, protein databases are essential for understanding gene expression, metabolism, enzymatic activity, signaling pathways, and evolution.
Protein sequence databases mainly contain data derived from translated nucleotide sequences and experimental protein studies.

2. Types of Protein Sequence Databases

Protein sequence databases are broadly classified into:

A. Primary Protein Databases

Contain original protein sequence data
Minimal or no manual annotation

B. Secondary Protein Databases
Derived from primary databases
Provide curated functional and structural information

C. Composite Protein Databases
Combine protein data from multiple sources
Reduce redundancy
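
The idea behind a composite database (combine several sources, then reduce redundancy) can be sketched in a few lines of Python. This is only an illustration: the function name `merge_non_redundant`, the source names, and the toy sequences are all made up, and it collapses exact duplicate sequences only, which is a simplifying assumption rather than the rule any particular database uses.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, amino acid sequence)


def merge_non_redundant(sources: Dict[str, Iterable[Record]]) -> List[dict]:
    """Merge records from several sources, keeping one entry per distinct sequence."""
    merged: Dict[str, dict] = {}
    for source_name, records in sources.items():
        for identifier, sequence in records:
            key = sequence.upper()  # identical sequences collapse to one key
            entry = merged.setdefault(key, {"sequence": key, "ids": []})
            # Remember every source identifier that pointed at this sequence.
            entry["ids"].append(f"{source_name}:{identifier}")
    return list(merged.values())


# Hypothetical toy input; identifiers and sequences are invented for illustration.
toy_sources = {
    "sourceA": [("a1", "MKTAYIAK"), ("a2", "MSLLTEVE")],
    "sourceB": [("b1", "mktayiak"), ("b2", "MGDVEKGK")],
}

composite = merge_non_redundant(toy_sources)
for entry in composite:
    print(entry["sequence"], entry["ids"])
```

Four input records become three entries here, because `a1` and `b1` carry the same sequence and are stored once with both identifiers attached.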

3. Protein Information Resource (PIR)

Overview
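Protein Information Resource (PIR) is one of the earliest protein sequence databases. It was established in 1984 by the National Biomedical Research Foundation (NBRF), building on Margaret Dayhoff's Atlas of Protein Sequence and Structure, to store and analyze protein sequences.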

Maintained by

Georgetown University (USA)
In collaboration with NBRF (National Biomedical Research Foundation)


Data Content

Protein sequences
Functional information
Evolutionary relationships
Classification into protein families

Unique Features
Organized into protein superfamilies
Emphasis on evolutionary and functional classification
Non-redundant dataset

Advantages
High-quality annotations
Useful for comparative protein studies

Limitations
Smaller than newer databases
No longer updated independently: PIR joined the UniProt consortium in 2002, and its sequence database (PIR-PSD) was integrated into UniProtKB


4. SWISS-PROT Database

Overview
SWISS-PROT is a manually curated, high-quality protein sequence database known for its accuracy and reliability.

Maintained by
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)

Data Content

Amino acid sequences
Protein function
Enzyme activity
Post-translational modifications
Domain structure
Subcellular localization


Key Features

Manual curation by experts
Minimal redundancy
High annotation accuracy
Extensive cross-references


SWISS-PROT Entry Includes
Accession number
Protein name
Organism
Function
Sequence length
Amino acid sequence
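
As a rough illustration of how these fields can be read from an entry, the sketch below parses a small text record laid out in the style of the UniProtKB flat file, which uses two-letter line codes such as ID, AC, DE, OS, CC and SQ and ends each entry with `//`. The entry itself is a made-up placeholder (the accession `X00000`, the names, and the sequence are not real data), the class and function names are hypothetical, and the parser handles only the fields listed above.

```python
from dataclasses import dataclass


@dataclass
class ProteinEntry:
    accession: str
    name: str
    organism: str
    function: str
    length: int
    sequence: str


def parse_entry(text: str) -> ProteinEntry:
    """Extract a few fields from one flat-file-style entry (simplified)."""
    fields = {"AC": [], "DE": [], "OS": [], "CC": []}
    seq_parts = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):      # end-of-entry marker
            break
        if in_sequence:                # lines after SQ hold the residues
            seq_parts.append(line.replace(" ", ""))
            continue
        code, rest = line[:2], line[5:].strip()
        if code == "SQ":
            in_sequence = True
        elif code in fields:
            fields[code].append(rest)
    sequence = "".join(seq_parts)
    description = " ".join(fields["DE"])
    comments = " ".join(fields["CC"])
    function = comments.split("FUNCTION:", 1)[-1].strip() if "FUNCTION:" in comments else ""
    return ProteinEntry(
        accession=fields["AC"][0].split(";")[0],
        name=description.split("Full=", 1)[-1].rstrip(";"),
        organism=" ".join(fields["OS"]).rstrip("."),
        function=function,
        length=len(sequence),
        sequence=sequence,
    )


# Made-up placeholder entry, not a real database record.
TOY_ENTRY = """\
ID   TOY_EXAMPLE             Reviewed;           8 AA.
AC   X00000;
DE   RecName: Full=Toy example protein;
OS   Example organism.
CC   -!- FUNCTION: Placeholder function text.
SQ   SEQUENCE   8 AA;
     MKTAYIAK
//
"""

entry = parse_entry(TOY_ENTRY)
print(entry)
```

For real work, an established parser is the safer choice, since actual entries contain many more line types and multi-line fields than this sketch covers.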

Advantages
Highly reliable
Preferred for functional studies

Limitations
Slow growth due to manual annotation

5. TrEMBL (Translated EMBL)

Overview
TrEMBL is a computer-annotated protein database that contains protein sequences translated from nucleotide sequence databases.
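
The translation step mentioned above can be sketched as follows. This is a minimal example that assumes the standard genetic code and an in-frame coding sequence; the function name `translate_cds` is hypothetical, and a real pipeline also has to deal with annotated coding regions and organisms that use other genetic codes.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")   # accept RNA input as well
    protein = []
    for i in range(0, len(cds) - 2, 3):   # step through complete codons only
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if amino_acid == "*":             # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAATAA"))
```

Here `ATG GCC AAA TAA` is read as Met, Ala, Lys, stop, so the call prints `MAK`.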

Maintained by
EMBL-EBI
Swiss Institute of Bioinformatics

Data Source
Translations of coding sequences from:
EMBL
GenBank
DDBJ

Key Features
Automatically annotated
Large and rapidly growing database
Supplement to SWISS-PROT

Advantages
Covers newly discovered proteins
Fast data availability

Limitations

Annotation may contain errors
Less reliable than SWISS-PROT

6. UniProt Knowledgebase (UniProtKB)

SWISS-PROT and TrEMBL together form the UniProt Knowledgebase (UniProtKB).

Components
UniProtKB/Swiss-Prot – reviewed, manually curated
UniProtKB/TrEMBL – unreviewed, automatically annotated
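
In UniProtKB FASTA files the two components can be told apart from the header line, which begins with `>sp|` for Swiss-Prot entries and `>tr|` for TrEMBL entries. The short sketch below uses that prefix to report the review status; the function name `review_status` and the example headers (accessions, entry names, organism) are made up for illustration.

```python
def review_status(fasta_header: str) -> str:
    """Classify a UniProtKB-style FASTA header by its database prefix."""
    prefix = fasta_header.lstrip(">").split("|", 1)[0]
    labels = {
        "sp": "reviewed (Swiss-Prot)",
        "tr": "unreviewed (TrEMBL)",
    }
    return labels.get(prefix, "unknown")


# Hypothetical headers; the accessions and names are placeholders.
headers = [
    ">sp|X00001|TOY1_EXAMPLE Toy protein one OS=Example organism",
    ">tr|X00002|X00002_EXAMPLE Toy protein two OS=Example organism",
    ">other_format_header",
]

for header in headers:
    print(review_status(header), "<-", header)
```

A check like this is handy when a downstream analysis should use only manually reviewed entries.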

Purpose
Provide comprehensive protein sequence and functional information
Serve as a central protein knowledge hub


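7. Comparison of PIR, SWISS-PROT, and TrEMBL

Annotation
PIR: Curated, organized into protein superfamilies
SWISS-PROT: Manual curation by experts
TrEMBL: Automatic (computer-annotated)

Reliability
PIR: High-quality annotations
SWISS-PROT: Highly reliable
TrEMBL: Annotation may contain errors

Size
PIR: Smaller than newer databases
SWISS-PROT: Grows slowly because of manual annotation
TrEMBL: Large and rapidly growing

Best suited for
PIR: Comparative protein studies and evolutionary classification
SWISS-PROT: Functional studies
TrEMBL: Early access to newly discovered proteins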

8. Applications of Protein Sequence Databases

Protein function prediction
Identification of conserved domains
Comparative protein analysis
Phylogenetic studies
Drug target identification
Enzyme characterization
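
As a small example of comparative protein analysis, the sketch below computes percent identity between two sequences that have already been aligned (gaps written as `-`). The function name `percent_identity` and the toy sequences are made up, and the choice to divide by the number of alignment columns is one convention among several, so treat the numbers as illustrative.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Toy aligned sequences, invented for illustration.
print(percent_identity("MKTAYIAK", "MKTAHIAK"))
print(percent_identity("MK-AY", "MKTAY"))
```

The first pair differs at one of eight positions (87.5), and the second has one gapped column out of five (80.0).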

9. Importance of Protein Sequence Databases
Link genes to protein function
Support proteomics research
Assist in metabolic pathway analysis
Aid in molecular evolution studies
Help in crop improvement and biotechnology

10. Conclusion
Protein sequence databases such as PIR, SWISS-PROT, and TrEMBL play a vital role in modern bioinformatics. While SWISS-PROT provides high-quality, manually curated protein data, TrEMBL ensures rapid availability of newly sequenced proteins. PIR contributes valuable evolutionary and functional classifications. Together, these databases support comprehensive protein research and biological discovery.
