Protein Sequence Databases
PIR, SWISS-PROT and TrEMBL

1. Introduction

Protein sequence databases are biological databases that store the amino acid sequences of proteins, along with their functional, structural, and biochemical characteristics. Because proteins carry out most cellular functions, protein databases are essential for understanding gene expression, metabolism, enzymatic activity, signaling pathways, and evolution.
Protein sequence databases mainly contain data derived from translated nucleotide sequences and experimental protein studies.

2. Types of Protein Sequence Databases

Protein sequence databases are broadly classified into:

A. Primary Protein Databases

Contain original protein sequence data
Minimal or no manual annotation

B. Secondary Protein Databases
Derived from primary databases
Provide curated functional and structural information

C. Composite Protein Databases
Combine protein data from multiple sources
Reduce redundancy
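
The redundancy reduction done by composite databases can be sketched in a few lines. This is a minimal illustration, assuming that "redundant" means an identical amino acid sequence; real composite databases apply their own, more elaborate rules. The function name merge_non_redundant and the toy records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (identifier, amino acid sequence)


def merge_non_redundant(*sources: Iterable[Record]) -> Dict[str, List[str]]:
    """Merge records from several sources, keeping one copy of each sequence.

    Returns a mapping from each unique sequence to the identifiers that
    carried it, so no source identifier is lost when duplicates collapse.
    """
    merged: Dict[str, List[str]] = {}
    for source in sources:
        for identifier, sequence in source:
            key = sequence.strip().upper()  # normalize case and whitespace
            merged.setdefault(key, []).append(identifier)
    return merged


# Hypothetical toy records, not taken from any real database.
source_a = [("A1", "MKTAYIAK"), ("A2", "MSLLTEVE")]
source_b = [("B1", "mktayiak"), ("B2", "MGSSHHHH")]

composite = merge_non_redundant(source_a, source_b)
for sequence, identifiers in composite.items():
    print(sequence, identifiers)
```

Here A1 and B1 collapse into a single entry because their sequences are identical after normalization, while both identifiers are retained as cross-references.
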

3. Protein Information Resource (PIR)

Overview
Protein Information Resource (PIR) is one of the earliest protein sequence databases, developed to store and analyze protein sequences.

Maintained by

Georgetown University Medical Center (USA)
Established in 1984 by the National Biomedical Research Foundation (NBRF)


Data Content

Protein sequences
Functional information
Evolutionary relationships
Classification into protein families

Unique Features
Organized into protein superfamilies
Emphasis on evolutionary and functional classification
Non-redundant dataset

Advantages
High-quality annotations
Useful for comparative protein studies

Limitations
Smaller than newer databases
The PIR sequence database (PIR-PSD) is no longer updated separately; its entries have been integrated into UniProtKB


4. SWISS-PROT Database

Overview
SWISS-PROT is a manually curated, high-quality protein sequence database known for its accuracy and reliability.

Maintained by
Swiss Institute of Bioinformatics (SIB)
European Bioinformatics Institute (EMBL-EBI)

Data Content

Amino acid sequences
Protein function
Enzyme activity
Post-translational modifications
Domain structure
Subcellular localization


Key Features

Manual curation by experts
Minimal redundancy
High annotation accuracy
Extensive cross-references


SWISS-PROT Entry Includes:
Accession number
Protein name
Organism
Function
Sequence length
Amino acid sequence
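
The fields listed above map onto the two-letter line codes of the Swiss-Prot flat-file format (ID, AC, DE, OS, CC, SQ, with // ending an entry). The following is a minimal sketch of reading them, not a complete parser. The toy entry is hypothetical and abridged: the accession X0XXX0, the names, and the sequence are placeholders, and a real SQ line also carries molecular weight and checksum fields.

```python
# Hypothetical, abridged entry laid out like a Swiss-Prot flat-file record.
TOY_ENTRY = """\
ID   TOY1_EXAMPLE            Reviewed;          20 AA.
AC   X0XXX0;
DE   RecName: Full=Toy example protein;
OS   Examplus hypotheticus.
CC   -!- FUNCTION: Placeholder function text for illustration.
SQ   SEQUENCE   20 AA;
     MKTAYIAKQR QISFVKSHFS
//
"""


def parse_entry(text: str) -> dict:
    """Extract the basic fields of one flat-file entry."""
    entry = {"entry_name": "", "accessions": [], "name": "",
             "organism": "", "function": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            break
        code, content = line[:2], line[5:].strip()
        if in_sequence:
            # Sequence lines are indented and split into blocks of ten.
            entry["sequence"] += content.replace(" ", "")
        elif code == "ID":
            entry["entry_name"] = content.split()[0]
        elif code == "AC":
            entry["accessions"] += [a.strip() for a in content.split(";") if a.strip()]
        elif code == "DE" and "Full=" in content and not entry["name"]:
            entry["name"] = content.split("Full=", 1)[1].rstrip(";")
        elif code == "OS":
            entry["organism"] = (entry["organism"] + " " + content).strip().rstrip(".")
        elif code == "CC" and "FUNCTION:" in content:
            # Only the first line of the FUNCTION comment is kept here.
            entry["function"] = content.split("FUNCTION:", 1)[1].strip()
        elif code == "SQ":
            in_sequence = True
    entry["length"] = len(entry["sequence"])
    return entry


entry = parse_entry(TOY_ENTRY)
for field, value in entry.items():
    print(f"{field}: {value}")
```

For real work, an established parser such as the one in Biopython is a safer choice than hand-written code like this.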

Advantages
Highly reliable
Preferred for functional studies

Limitations
Slow growth due to manual annotation

5. TrEMBL (Translated EMBL)

Overview
TrEMBL is a computer-annotated protein database that contains translations of coding sequences from the nucleotide sequence databases that have not yet been integrated into SWISS-PROT.

Maintained by
EMBL-EBI
Swiss Institute of Bioinformatics

Data Source
Translations of coding sequences from:
EMBL
GenBank
DDBJ
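
The translation step behind TrEMBL can be illustrated with a short sketch. It assumes the standard genetic code and a coding sequence that is already in frame; it is not TrEMBL's actual pipeline, which also handles alternative genetic codes and annotation. The function name translate_cds is hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_cds(cds: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unknown codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate_cds("ATGGCCAAGTAA"))  # MAK
```
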

Key Features
Automatically annotated
Large and rapidly growing database
Supplement to SWISS-PROT

Advantages
Covers newly discovered proteins
Fast data availability

Limitations

Annotation may contain errors
Less reliable than SWISS-PROT

6. UniProt Knowledgebase (UniProtKB)

SWISS-PROT and TrEMBL together form the UniProt Knowledgebase (UniProtKB), which is maintained by the UniProt Consortium (EMBL-EBI, SIB, and PIR).

Components
UniProtKB/Swiss-Prot – reviewed, manually curated
UniProtKB/TrEMBL – unreviewed, automatically annotated
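
In UniProtKB FASTA files the two components can be told apart by the header prefix: sp for Swiss-Prot and tr for TrEMBL. The sketch below reads that prefix. The function name classify_header is hypothetical, the first header is modeled on the human insulin entry (P01308), and the second header is an invented placeholder.

```python
SECTIONS = {
    "sp": "UniProtKB/Swiss-Prot (reviewed)",
    "tr": "UniProtKB/TrEMBL (unreviewed)",
}


def classify_header(header: str) -> dict:
    """Split a UniProtKB FASTA header of the form >db|accession|entry_name ..."""
    db, accession, rest = header.lstrip(">").split("|", 2)
    return {
        "accession": accession,
        "entry_name": rest.split()[0],
        "section": SECTIONS.get(db, "unknown"),
    }


reviewed = ">sp|P01308|INS_HUMAN Insulin OS=Homo sapiens OX=9606 GN=INS PE=1 SV=1"
# Hypothetical placeholder header, not a real TrEMBL record.
unreviewed = ">tr|X0XXX0|X0XXX0_EXAMP Uncharacterized protein OS=Examplus hypotheticus OX=0 PE=4 SV=1"

for header in (reviewed, unreviewed):
    print(classify_header(header))
```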

Purpose
Provide comprehensive protein sequence and functional information
Serve as a central protein knowledge hub


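7. Comparison of PIR, SWISS-PROT, and TrEMBL

Annotation
PIR: Curated, with classification into protein superfamilies
SWISS-PROT: Manual curation by experts
TrEMBL: Automatic, computer-generated annotation

Size and growth
PIR: Smaller than newer databases
SWISS-PROT: Slow growth due to manual annotation
TrEMBL: Large and rapidly growing

Reliability
PIR: High-quality annotations
SWISS-PROT: Highly reliable
TrEMBL: Less reliable; annotation may contain errors

Main use
PIR: Comparative and evolutionary protein studies
SWISS-PROT: Functional studies
TrEMBL: Fast access to newly discovered proteins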


8. Applications of Protein Sequence Databases

Protein function prediction
Identification of conserved domains
Comparative protein analysis
Phylogenetic studies
Drug target identification
Enzyme characterization
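
As one small example of comparative protein analysis, the sketch below computes percent identity between two sequences that are already aligned. It assumes gaps are written as "-" and that identity is counted over ungapped columns only; other definitions of the denominator are also in use. The function name percent_identity and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical toy alignment: 6 of 8 positions match.
print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0
```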

9. Importance of Protein Sequence Databases
Link genes to protein function
Support proteomics research
Assist in metabolic pathway analysis
Aid in molecular evolution studies
Help in crop improvement and biotechnology

10. Conclusion
Protein sequence databases such as PIR, SWISS-PROT, and TrEMBL play a vital role in modern bioinformatics. While SWISS-PROT provides high-quality, manually curated protein data, TrEMBL ensures rapid availability of newly sequenced proteins. PIR contributes valuable evolutionary and functional classifications. Together, these databases support comprehensive protein research and biological discovery.
