Biological Databases – Types of Data and Databases

Nucleotide Sequence Databases (EMBL, GenBank, DDBJ)

1. Introduction

Biological databases are systematic, computerized collections of biological information that allow efficient storage, retrieval, updating, and analysis of large volumes of biological data. With the advent of genome sequencing, molecular biology, and bioinformatics, biological databases have become essential tools in biological research.

These databases support studies in genomics, proteomics, evolutionary biology, taxonomy, medicine, agriculture, and biotechnology.

2. Types of Data Stored in Biological Databases

Biological databases store diverse types of biological information, including:

1. Sequence Data

DNA sequences

RNA sequences

Protein sequences

2. Structural Data

Three-dimensional structures of proteins

Nucleic acid structures

3. Functional Data

Gene functions

Enzyme activity

Regulatory elements

4. Genomic Annotation Data

Gene location

Exons, introns

Promoters and regulatory regions

5. Expression Data

Transcriptome data

Gene expression profiles

3. Classification of Biological Databases

Based on content and level of data processing, biological databases are classified into:

A. Primary Databases

Contain raw experimental data

Direct submissions from researchers

Minimal annotation

Examples:

GenBank, EMBL, DDBJ, Protein Data Bank (PDB)

B. Secondary Databases

Data derived from primary databases

Highly curated and analyzed

Provide functional and structural annotations

Examples:

UniProt, PROSITE, Pfam, SCOP

C. Composite (Integrated) Databases

Combine information from multiple databases

Reduce redundancy

Provide non-overlapping datasets

Examples:

RefSeq, UniGene, Ensembl
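
The three categories above can be captured in a small lookup table. The following is a minimal Python sketch that uses only the example databases named in this section; the names `DATABASE_CLASSES` and `classify` are illustrative and do not belong to any real library.

```python
# Illustrative lookup built only from the examples listed in this section.
DATABASE_CLASSES = {
    "primary": ["GenBank", "EMBL", "DDBJ", "PDB"],
    "secondary": ["UniProt", "PROSITE", "Pfam", "SCOP"],
    "composite": ["RefSeq", "UniGene", "Ensembl"],
}


def classify(name):
    """Return the category of a database name, or None if it is not listed."""
    wanted = name.strip().lower()
    for category, members in DATABASE_CLASSES.items():
        if any(member.lower() == wanted for member in members):
            return category
    return None


if __name__ == "__main__":
    for db in ("GenBank", "Pfam", "Ensembl"):
        print(db, "->", classify(db))
```

The lookup is case-insensitive, so `classify("genbank")` and `classify("GenBank")` give the same category, and an unlisted name returns `None` rather than raising an error.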

4. Nucleotide Sequence Databases

Nucleotide sequence databases store DNA and RNA sequences obtained through sequencing experiments. They are essential for gene discovery, genome analysis, comparative genomics, and evolutionary studies.

The three major global nucleotide sequence databases are:

GenBank (USA)

EMBL-ENA (Europe)

DDBJ (Japan)

These databases function under the International Nucleotide Sequence Database Collaboration (INSDC).

5. International Nucleotide Sequence Database Collaboration (INSDC)

INSDC is a global consortium that ensures:

Free and open access to nucleotide sequence data

Daily exchange of data among databases

Uniform data formats and annotation standards

Members of INSDC:

GenBank – NCBI (USA)

EMBL-ENA – EMBL-EBI (Europe)

DDBJ – National Institute of Genetics (Japan)

6. GenBank

Overview

GenBank is a comprehensive nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI), USA. It is one of the largest and most widely used biological databases.

Types of Data Stored

Genomic DNA

cDNA and mRNA sequences

ESTs (Expressed Sequence Tags)

Whole genome sequences

Organelle genomes
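
GenBank distributes each entry as a text record in its flat-file format, with keyword lines such as LOCUS, DEFINITION, and ACCESSION, a sequence block after ORIGIN, and `//` as the end-of-record marker. The sketch below shows, under simplifying assumptions, how those parts might be read in Python. The record is a made-up placeholder (the accession `XX000000` and the sequence are not real database entries), and `parse_genbank` is a hypothetical helper that ignores multi-line fields and the FEATURES table.

```python
# A made-up record in GenBank flat-file layout; the accession and
# sequence are placeholders for illustration, not a real database entry.
TOY_RECORD = """\
LOCUS       XX000000                  24 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example sequence.
ACCESSION   XX000000
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank(text):
    """Extract a few header fields and the sequence from one flat-file record."""
    record = {"sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if line.startswith("ORIGIN"):    # sequence block starts on the next line
            in_sequence = True
            continue
        if in_sequence:
            # Drop the leading position number and the spaces between blocks.
            record["sequence"] += "".join(line.split()[1:]).upper()
        else:
            for key in ("LOCUS", "DEFINITION", "ACCESSION"):
                if line.startswith(key):
                    record[key.lower()] = line[len(key):].strip()
    return record


if __name__ == "__main__":
    parsed = parse_genbank(TOY_RECORD)
    print(parsed["accession"], len(parsed["sequence"]), "bp")
```

For real records, a maintained parser such as Biopython's `Bio.SeqIO.parse(handle, "genbank")` is a more robust choice than a hand-written reader like this one.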

7. EMBL (European Molecular Biology Laboratory Nucleotide Sequence Database)

Overview

The EMBL nucleotide database is maintained by the European Bioinformatics Institute (EMBL-EBI) and is now part of the European Nucleotide Archive (ENA).

8. DDBJ (DNA Data Bank of Japan)

Overview

DDBJ is maintained by the National Institute of Genetics (NIG), Japan. Most of its submissions come from researchers in Japan and other Asian countries, but it accepts submissions from anywhere and is globally accessible.

Data Stored

DNA and RNA sequences

Whole genome sequences

Environmental and metagenomic data

Special Features

Uses data formats similar to GenBank and EMBL

Exchanges data daily with other INSDC members

Provides online submission tools

9. Comparison of GenBank, EMBL, and DDBJ

➡ Because INSDC members exchange data daily, all three hold essentially the same sequence records; they differ mainly in access portals and managing institutions.

10. Importance of Nucleotide Sequence Databases

Preserve genetic information

Support genome sequencing projects

Enable gene identification and annotation

Facilitate evolutionary and phylogenetic studies

Assist in medical, agricultural, and environmental research

11. Applications

Comparative genomics

Molecular taxonomy

Gene cloning and primer design

Mutation analysis

Crop improvement and breeding programmes
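
As a small illustration of the mutation analysis application listed above, the following Python sketch compares a reference sequence with a sample sequence and reports the positions where they differ. It assumes the two sequences are already aligned and of equal length, which real analyses cannot take for granted; the function name `point_differences` and both sequences are made up for this example.

```python
def point_differences(reference, sample):
    """List (position, ref_base, sample_base) mismatches; positions are 1-based."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (pre-aligned)")
    return [
        (i, r, s)
        for i, (r, s) in enumerate(zip(reference.upper(), sample.upper()), start=1)
        if r != s
    ]


if __name__ == "__main__":
    ref = "ATGGCCATTG"   # made-up sequences for illustration
    alt = "ATGGTCATTA"
    print(point_differences(ref, alt))
```

In practice the reference would come from a database record (for example a GenBank entry) and the sample from a new sequencing run, with an alignment step in between.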

12. Conclusion

Biological databases play a central role in modern biological research. Among them, nucleotide sequence databases such as GenBank, EMBL, and DDBJ are primary repositories that store DNA and RNA sequences. Through the INSDC collaboration, these databases ensure global data sharing, accuracy, and accessibility, making them indispensable resources for genomics, bioinformatics, and biotechnology.
