
Biological Databases – Types of Data and Databases
Nucleotide Sequence Databases (EMBL, GenBank, DDBJ)

1. Introduction

Biological databases are systematic, computerized collections of biological information that allow efficient storage, retrieval, updating, and analysis of large volumes of biological data. With the advent of genome sequencing, molecular biology, and bioinformatics, biological databases have become essential tools in biological research.
These databases support studies in genomics, proteomics, evolutionary biology, taxonomy, medicine, agriculture, and biotechnology.

2. Types of Data Stored in Biological Databases
Biological databases store diverse types of biological information, including:

1. Sequence Data
DNA sequences
RNA sequences
Protein sequences

2. Structural Data

Three-dimensional structures of proteins
Nucleic acid structures

3. Functional Data

Gene functions
Enzyme activity
Regulatory elements

4. Genomic Annotation Data

Gene location
Exons, introns
Promoters and regulatory regions

5. Expression Data

Transcriptome data
Gene expression profiles
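Sequence data and annotation data are linked: an annotation such as exon location is a set of coordinates on a stored sequence. The following Python snippet is a minimal sketch of that link. It assumes 1-based, inclusive coordinates on the forward strand, which is the convention GenBank feature locations use. The function names and the toy gene are hypothetical and chosen only for illustration.

```python
def splice_exons(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments of a genomic sequence into one spliced sequence.

    Exons are (start, end) pairs, 1-based and inclusive, listed in
    ascending order on the forward strand (simplifying assumption).
    """
    pieces = []
    for start, end in exons:
        if not 1 <= start <= end <= len(genomic):
            raise ValueError(f"exon {start}..{end} lies outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice.
        pieces.append(genomic[start - 1:end])
    return "".join(pieces)


def transcribe(dna: str) -> str:
    """Return the RNA form of a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    # Hypothetical toy gene: exon 1..6, intron 7..12 (GT...AG), exon 13..18.
    genomic = "ATGGCCGTAAAGTTCTGA"
    mrna_like = splice_exons(genomic, [(1, 6), (13, 18)])
    print(mrna_like)              # ATGGCCTTCTGA
    print(transcribe(mrna_like))  # AUGGCCUUCUGA
```

Real annotations also include features on the reverse strand and partial exons, which this sketch deliberately ignores.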

3. Classification of Biological Databases
Based on content and level of data processing, biological databases are classified into:

A. Primary Databases

Contain raw experimental data
Direct submissions from researchers
Minimal annotation
Examples:
GenBank, EMBL, DDBJ, Protein Data Bank (PDB)

B. Secondary Databases

Data derived from primary databases
Highly curated and analyzed
Provide functional and structural annotations
Examples:
UniProt, PROSITE, Pfam, SCOP


C. Composite (Integrated) Databases


Combine information from multiple databases
Reduce redundancy by merging duplicate entries
Provide non-redundant datasets
Examples:
RefSeq, UniGene, Ensembl
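The following hypothetical Python sketch shows what "reducing redundancy" means in its simplest form. It merges records from several sources and collapses exactly identical sequences into a single entry. That entry remembers every source identifier. Real integrated resources rely on curation and far more sophisticated clustering, so exact matching here is only an illustrative assumption.

```python
def merge_non_redundant(*sources: dict[str, str]) -> dict[str, list[str]]:
    """Merge {record_id: sequence} dictionaries from several sources.

    Returns {sequence: [ids of every record carrying that sequence]}, so each
    distinct sequence (compared case-insensitively) appears exactly once.
    """
    merged: dict[str, list[str]] = {}
    for source in sources:
        for record_id, seq in source.items():
            key = seq.upper()
            merged.setdefault(key, []).append(record_id)
    return merged


if __name__ == "__main__":
    # Hypothetical record identifiers from two hypothetical sources.
    source_a = {"a1": "ATGC", "a2": "GGTA"}
    source_b = {"b1": "atgc", "b2": "TTAA"}
    for seq, ids in merge_non_redundant(source_a, source_b).items():
        print(seq, ids)
```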

4. Nucleotide Sequence Databases

Nucleotide sequence databases store DNA and RNA sequences obtained through sequencing experiments. They are essential for gene discovery, genome analysis, comparative genomics, and evolutionary studies.

The three major global nucleotide sequence databases are:
GenBank (USA)
EMBL-ENA (Europe)
DDBJ (Japan)

These databases function under the International Nucleotide Sequence Database Collaboration (INSDC).
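Sequences downloaded from these databases are commonly handled as FASTA text. In this format, a header line begins with ">" and is followed by one or more sequence lines. These notes do not describe FASTA, so treat it here as background. The parser below is a minimal sketch: it keeps only the first word of each header as the identifier, and it lets a later duplicate identifier overwrite an earlier one.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA-formatted text into {identifier: sequence}."""
    records: dict[str, str] = {}
    current_id = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Save the previous record before starting a new one.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id = header[0] if header else ""
            chunks = []
        else:
            if current_id is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    toy = ">seq1 toy example\nATGC\nGGTA\n>seq2\nTTAA\n"
    print(parse_fasta(toy))  # {'seq1': 'ATGCGGTA', 'seq2': 'TTAA'}
```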


5. International Nucleotide Sequence Database Collaboration (INSDC)


INSDC is a global consortium that ensures:
Free and open access to nucleotide sequence data
Daily exchange of data among databases
Uniform data formats and annotation standards


Members of INSDC:
GenBank – NCBI (USA)
EMBL-ENA – EMBL-EBI (Europe)
DDBJ – National Institute of Genetics (Japan)
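A toy model helps picture the daily exchange. Each member starts with its own records, and after one exchange round every member holds the same set. The Python sketch below rests on a simplifying assumption of our own: when two copies of an accession differ, the higher version number wins. It does not describe the actual INSDC exchange protocol, and the accession names are placeholders.

```python
Record = tuple[int, str]  # (version, sequence); a simplified record


def exchange(nodes: dict[str, dict[str, Record]]) -> None:
    """Run one toy exchange round, mutating `nodes` in place.

    Afterwards every node holds the newest version of every accession
    seen at any node (the 'highest version wins' rule is an assumption).
    """
    newest: dict[str, Record] = {}
    for records in nodes.values():
        for accession, record in records.items():
            if accession not in newest or record[0] > newest[accession][0]:
                newest[accession] = record
    for records in nodes.values():
        records.clear()
        records.update(newest)


if __name__ == "__main__":
    nodes = {
        "GenBank": {"ACC_A": (1, "ATGC")},
        "ENA": {"ACC_A": (2, "ATGCA"), "ACC_B": (1, "GGTA")},
        "DDBJ": {},
    }
    exchange(nodes)
    for name, records in nodes.items():
        print(name, records)
```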

6. GenBank
Overview
GenBank is a comprehensive nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI), USA. It is one of the largest and most widely used biological databases.
Types of Data Stored

Genomic DNA
cDNA and mRNA sequences
ESTs (Expressed Sequence Tags)
Whole genome sequences
Organelle genomes
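GenBank records can also be retrieved programmatically through NCBI's Entrez E-utilities. The sketch below builds an efetch request for the nucleotide database (db=nuccore) in GenBank flat-file format (rettype=gb, retmode=text). The helper names and the accession "ABC123" are hypothetical. Check NCBI's usage guidelines, including request-rate limits and the optional API key, before you run the fetch function against the live service.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def genbank_efetch_url(accession: str, rettype: str = "gb") -> str:
    """Build an efetch URL for one nucleotide record (rettype 'gb' or 'fasta')."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def fetch_genbank_record(accession: str) -> str:
    """Download the record as text (needs network access; not called below)."""
    with urlopen(genbank_efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only prints the request URL; the accession is a placeholder.
    print(genbank_efetch_url("ABC123"))
```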


7. EMBL (European Molecular Biology Laboratory Database)

Overview
The EMBL nucleotide database is maintained by the European Bioinformatics Institute (EMBL-EBI) and is now part of the European Nucleotide Archive (ENA).
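ENA exposes records over HTTP through its Browser API. The sketch below only builds request URLs for a record in EMBL flat-file or FASTA format. The URL pattern reflects our understanding of that API and should be checked against the current ENA documentation. The helper name and the accession "ABC123" are hypothetical.

```python
from urllib.parse import quote

# Assumed base URL of the ENA Browser API; verify before use.
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"


def ena_record_url(accession: str, fmt: str = "embl") -> str:
    """Build a Browser API URL for one record in 'embl' or 'fasta' format."""
    if fmt not in ("embl", "fasta"):
        raise ValueError("fmt must be 'embl' or 'fasta'")
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession)}"


if __name__ == "__main__":
    print(ena_record_url("ABC123"))           # EMBL flat file
    print(ena_record_url("ABC123", "fasta"))  # FASTA
```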



8. DDBJ (DNA Data Bank of Japan)

Overview
DDBJ is maintained by the National Institute of Genetics (NIG), Japan. Most of its submissions come from researchers in Japan and other Asian countries, but it accepts submissions from anywhere, and its data are globally accessible.
Data Stored
DNA and RNA sequences
Whole genome sequences
Environmental and metagenomic data
Special Features
Uses data formats similar to GenBank and EMBL
Exchanges data daily with other INSDC members
Provides online submission tools
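
GenBank and DDBJ share the same flat-file layout, in which keywords such as LOCUS, ACCESSION and ORIGIN start each line and "//" ends a record. The EMBL flat file carries the same information but uses two-letter line codes such as ID, AC and SQ. The Python sketch below reads only the accession and the sequence from a GenBank/DDBJ-style record. The toy record and the identifier "TOY0001" are hypothetical, and a real parser must handle many more fields.

```python
import re


def parse_flatfile(text: str) -> dict[str, str]:
    """Extract the accession and sequence from a minimal GenBank/DDBJ-style record."""
    accession = ""
    seq_parts: list[str] = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            # Keep the first accession listed on the line.
            accession = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines hold a position number plus blocks of bases;
            # keep only the letters.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
    return {"accession": accession, "sequence": "".join(seq_parts).upper()}


if __name__ == "__main__":
    toy_record = (
        "LOCUS       TOY0001   14 bp    DNA     linear   SYN 01-JAN-2000\n"
        "ACCESSION   TOY0001\n"
        "ORIGIN\n"
        "        1 atgcgtacgt tagc\n"
        "//\n"
    )
    print(parse_flatfile(toy_record))
```
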

9. Comparison of GenBank, EMBL, and DDBJ




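➡ All three hold the same sequence records, kept in step by daily INSDC data exchange, but they differ in access portals, submission tools, record display formats, and management.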


10. Importance of Nucleotide Sequence Databases
Preserve genetic information
Support genome sequencing projects
Enable gene identification and annotation
Facilitate evolutionary and phylogenetic studies
Assist in medical, agricultural, and environmental research

11. Applications
Comparative genomics
Molecular taxonomy
Gene cloning and primer design
Mutation analysis
Crop improvement and breeding programmes
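
Two of these applications rest on simple sequence arithmetic. For primer design, the GC content and a rough melting temperature can be estimated with the Wallace rule:

Tm = 2(A + T) + 4(G + C) °C

For mutation analysis, two aligned sequences can be compared position by position. The Python sketch below shows both. The Wallace rule is only a rule of thumb for short oligonucleotides, and the mutation check assumes the two sequences are already aligned and of equal length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence; non-ACGT letters are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Percentage of G and C over the full sequence length."""
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s) if s else 0.0


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def point_differences(ref: str, query: str) -> list[tuple[int, str, str]]:
    """List (1-based position, ref base, query base) for every mismatch."""
    if len(ref) != len(query):
        raise ValueError("sequences must be pre-aligned and of equal length")
    return [(i + 1, r, q)
            for i, (r, q) in enumerate(zip(ref.upper(), query.upper()))
            if r != q]


if __name__ == "__main__":
    primer = "ATGCGTACGTTAGC"  # hypothetical 14-nt primer
    print(gc_content(primer))            # 50.0
    print(wallace_tm(primer))            # 42
    print(reverse_complement(primer))    # GCTAACGTACGCAT
    print(point_differences("ATGCAT", "ATGTAT"))  # [(4, 'C', 'T')]
```
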

12. Conclusion
Biological databases play a central role in modern biological research. Among them, nucleotide sequence databases such as GenBank, EMBL, and DDBJ are primary repositories that store DNA and RNA sequences. Through the INSDC collaboration, these databases ensure global data sharing, accuracy, and accessibility, making them indispensable resources for genomics, bioinformatics, and biotechnology.



