

Biological Databases – Types of Data and Databases
Nucleotide Sequence Databases (EMBL, GenBank, DDBJ)

1. Introduction

Biological databases are systematic, computerized collections of biological information that allow efficient storage, retrieval, updating, and analysis of large volumes of biological data. With the advent of genome sequencing, molecular biology, and bioinformatics, biological databases have become essential tools in biological research.
These databases support studies in genomics, proteomics, evolutionary biology, taxonomy, medicine, agriculture, and biotechnology.

2. Types of Data Stored in Biological Databases
Biological databases store diverse types of biological information, including:

1. Sequence Data
DNA sequences
RNA sequences
Protein sequences

2. Structural Data

Three-dimensional structures of proteins
Nucleic acid structures

3. Functional Data

Gene functions
Enzyme activity
Regulatory elements

4. Genomic Annotation Data

Gene location
Exons, introns
Promoters and regulatory regions

5. Expression Data

Transcriptome data
Gene expression profiles

3. Classification of Biological Databases
Based on content and level of data processing, biological databases are classified into:

A. Primary Databases

Contain raw experimental data
Direct submissions from researchers
Minimal annotation
Examples:
GenBank, EMBL, DDBJ, Protein Data Bank (PDB)

B. Secondary Databases

Data derived from primary databases
Highly curated and analyzed
Provide functional and structural annotations
Examples:
UniProt, PROSITE, Pfam, SCOP


C. Composite (Integrated) Databases


Combine information from multiple databases
Reduce redundancy
Provide non-overlapping datasets
Examples:
RefSeq, UniGene, Ensembl
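As a rough illustration of how a composite database reduces redundancy, the sketch below merges records from two sources and keeps one entry per distinct sequence. The record identifiers, the sequences, and the function name merge_non_redundant are made up for illustration; real composite databases apply far more elaborate curation rules than exact sequence matching.

```python
def merge_non_redundant(*sources):
    """Merge (identifier, sequence) records, keeping one entry per distinct sequence.

    Sequences are compared case-insensitively. The first identifier seen for a
    sequence is kept; identifiers of later duplicates are stored as aliases.
    """
    merged = {}  # normalized sequence -> {"id": ..., "aliases": [...]}
    for source in sources:
        for identifier, sequence in source:
            key = sequence.upper()
            if key in merged:
                merged[key]["aliases"].append(identifier)
            else:
                merged[key] = {"id": identifier, "aliases": []}
    return merged


# Hypothetical records from two source databases (not real accessions).
source_a = [("A_001", "ATGGCCATTG"), ("A_002", "ATGCGTTAGC")]
source_b = [("B_101", "atggccattg"), ("B_102", "TTGACCGGTA")]

non_redundant = merge_non_redundant(source_a, source_b)
for sequence, entry in non_redundant.items():
    print(entry["id"], sequence, entry["aliases"])
```

Here four input records collapse to three entries, because B_101 carries the same sequence as A_001 and is kept only as an alias.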

4. Nucleotide Sequence Databases

Nucleotide sequence databases store DNA and RNA sequences obtained through sequencing experiments. They are essential for gene discovery, genome analysis, comparative genomics, and evolutionary studies.

The three major global nucleotide sequence databases are:
GenBank (USA)
EMBL-ENA (Europe)
DDBJ (Japan)

These databases function under the International Nucleotide Sequence Database Collaboration (INSDC).


5. International Nucleotide Sequence Database Collaboration (INSDC)


INSDC is a global consortium that ensures:
Free and open access to nucleotide sequence data
Daily exchange of data among databases
Uniform data formats and annotation standards


Members of INSDC:
GenBank – NCBI (USA)
EMBL-ENA – EMBL-EBI (Europe)
DDBJ – National Institute of Genetics (Japan)
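The effect of the daily exchange can be pictured with a toy model: each member starts with its own submissions, and after an exchange every member holds the union of all records. The identifiers below are hypothetical, and the exchange function is only a sketch; the real exchange also has to handle record updates, versions, and withdrawals.

```python
def exchange(*databases):
    """Toy model of the daily exchange: every database ends up with all records."""
    combined = {}
    for db in databases:
        combined.update(db)
    for db in databases:
        db.update(combined)


# Hypothetical records keyed by an "accession.version" style identifier.
genbank = {"DEMO0001.1": "ATGGCC"}
ena = {"DEMO0002.1": "TTGACC"}
ddbj = {"DEMO0003.1": "GGCATA"}

exchange(genbank, ena, ddbj)

# After the exchange, all three hold the same set of identifiers.
print(sorted(genbank) == sorted(ena) == sorted(ddbj))
```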

6. GenBank
Overview
GenBank is a comprehensive nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI), USA. It is one of the largest and most widely used biological databases.

Types of Data Stored

Genomic DNA
cDNA and mRNA sequences
ESTs (Expressed Sequence Tags)
Whole genome sequences
Organelle genomes
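Each GenBank entry is distributed as a flat-file record: keyword lines such as LOCUS, DEFINITION, ACCESSION, and VERSION, followed by the sequence after the ORIGIN keyword and a closing "//" line. The sketch below reads those fields from a synthetic record written for this example (DEMO0001 is not a real accession). It is a minimal illustration, not a full parser: it ignores the feature table and multi-line field continuations.

```python
def parse_flat_file(text):
    """Extract a few header fields and the sequence from one GenBank-style record."""
    record = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines: a position number followed by blocks of bases.
            record["sequence"] += "".join(line.split()[1:]).upper()
        elif line.startswith("ORIGIN"):
            in_sequence = True
        elif line.startswith(("LOCUS", "DEFINITION", "ACCESSION", "VERSION")):
            keyword, _, value = line.partition(" ")
            record[keyword.lower()] = value.strip()
    return record


# Synthetic record for illustration only (not taken from GenBank).
EXAMPLE = """\
LOCUS       DEMO0001                  20 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Synthetic example record for illustration only.
ACCESSION   DEMO0001
VERSION     DEMO0001.1
ORIGIN
        1 atggccattg taatgggccg
//
"""

record = parse_flat_file(EXAMPLE)
print(record["accession"], record["version"], len(record["sequence"]))
```

For real work, an established parser such as the one in Biopython is a better choice than hand-written code like this.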


7. EMBL (European Molecular Biology Laboratory Nucleotide Sequence Database)

Overview
The EMBL nucleotide database is maintained by the European Bioinformatics Institute (EMBL-EBI) and is now part of the European Nucleotide Archive (ENA).



8. DDBJ (DNA Data Bank of Japan)

Overview
DDBJ is maintained by the National Institute of Genetics (NIG), Japan. Most of its submissions come from researchers in Japan and other Asian countries, but it accepts submissions from anywhere and is globally accessible.

Data Stored
DNA and RNA sequences
Whole genome sequences
Environmental and metagenomic data

Special Features
Uses a flat-file format closely modeled on GenBank's, and shares the common INSDC feature table with GenBank and EMBL
Exchanges data daily with other INSDC members
Provides online submission tools

9. Comparison of GenBank, EMBL, and DDBJ




➡ Because the three databases exchange records daily, they contain essentially the same sequence data; they differ mainly in their access portals and managing institutions.


10. Importance of Nucleotide Sequence Databases
Preserve genetic information
Support genome sequencing projects
Enable gene identification and annotation
Facilitate evolutionary and phylogenetic studies
Assist in medical, agricultural, and environmental research

11. Applications
Comparative genomics
Molecular taxonomy
Gene cloning and primer design
Mutation analysis
Crop improvement and breeding programmes
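Mutation analysis, for example, often starts by comparing a sample sequence against a reference sequence retrieved from one of these databases. The sketch below lists point substitutions between two sequences that are assumed to be already aligned and of equal length. The sequences and the function name point_mutations are invented for illustration; real analyses use alignment tools that also handle insertions and deletions.

```python
def point_mutations(reference, sample):
    """List substitutions between two equal-length, pre-aligned sequences.

    Returns (position, reference_base, sample_base) tuples, 1-based positions.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [
        (position, ref, alt)
        for position, (ref, alt) in enumerate(pairs, start=1)
        if ref != alt
    ]


# Hypothetical reference and sample sequences.
reference = "ATGGCCATTGTAATGG"
sample = "ATGGCTATTGTCATGG"

mutations = point_mutations(reference, sample)
for position, ref, alt in mutations:
    print(f"{ref}{position}{alt}")
```

With these inputs the sketch reports two substitutions, C6T and A12C.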

12. Conclusion
Biological databases play a central role in modern biological research. Among them, nucleotide sequence databases such as GenBank, EMBL, and DDBJ are primary repositories that store DNA and RNA sequences. Through the INSDC collaboration, these databases ensure global data sharing, accuracy, and accessibility, making them indispensable resources for genomics, bioinformatics, and biotechnology.



