

Biological Databases – Types of Data and Databases
Nucleotide Sequence Databases (EMBL, GenBank, DDBJ)

1. Introduction

Biological databases are systematic, computerized collections of biological information that allow efficient storage, retrieval, updating, and analysis of large volumes of biological data. With the advent of genome sequencing, molecular biology, and bioinformatics, biological databases have become essential tools in biological research.
These databases support studies in genomics, proteomics, evolutionary biology, taxonomy, medicine, agriculture, and biotechnology.

2. Types of Data Stored in Biological Databases
Biological databases store diverse types of biological information, including:

1. Sequence Data
DNA sequences
RNA sequences
Protein sequences

2. Structural Data

Three-dimensional structures of proteins
Nucleic acid structures

3. Functional Data

Gene functions
Enzyme activity
Regulatory elements

4. Genomic Annotation Data

Gene location
Exons, introns
Promoters and regulatory regions

5. Expression Data

Transcriptome data
Gene expression profiles
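As a rough illustration of how genomic annotation data might be held in a program, the sketch below stores a gene's location and exon coordinates and derives the introns as the gaps between consecutive exons. The class name `GeneAnnotation`, the gene name, and all coordinates are hypothetical and chosen only for illustration; real databases use richer feature tables.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneAnnotation:
    """Hypothetical container for one annotated gene."""
    name: str
    chromosome: str
    exons: List[Tuple[int, int]]  # 1-based, inclusive (start, end) pairs

    def introns(self) -> List[Tuple[int, int]]:
        """Return the gaps between consecutive exons as (start, end) pairs."""
        ordered = sorted(self.exons)
        gaps = []
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
            if next_start - prev_end > 1:  # skip exons that touch each other
                gaps.append((prev_end + 1, next_start - 1))
        return gaps


# Made-up coordinates, for illustration only.
gene = GeneAnnotation("demoGene", "chr1", [(100, 200), (301, 400), (501, 650)])
print(gene.introns())  # [(201, 300), (401, 500)]
```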

3. Classification of Biological Databases
Based on content and level of data processing, biological databases are classified into:

A. Primary Databases

Contain raw experimental data
Direct submissions from researchers
Minimal annotation
Examples:
GenBank, EMBL, DDBJ, Protein Data Bank (PDB)

B. Secondary Databases

Data derived from primary databases
Highly curated and analyzed
Provide functional and structural annotations
Examples:
UniProt, PROSITE, Pfam, SCOP


C. Composite (Integrated) Databases


Combine information from multiple databases
Reduce redundancy
Provide non-overlapping datasets
Examples:
RefSeq, UniGene, Ensembl
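
The three categories above can be written down as a simple lookup table. This is only a sketch that mirrors the examples listed in these notes; the names `DATABASE_CLASSES` and `classify` are illustrative, and "PDB" stands for the Protein Data Bank.

```python
# Categories and examples exactly as listed in the notes above.
DATABASE_CLASSES = {
    "primary": ["GenBank", "EMBL", "DDBJ", "PDB"],
    "secondary": ["UniProt", "PROSITE", "Pfam", "SCOP"],
    "composite": ["RefSeq", "UniGene", "Ensembl"],
}


def classify(name: str) -> str:
    """Return the category a database is listed under, or 'unknown'."""
    wanted = name.strip().lower()
    for category, members in DATABASE_CLASSES.items():
        if any(member.lower() == wanted for member in members):
            return category
    return "unknown"


print(classify("Pfam"))     # secondary
print(classify("genbank"))  # primary
```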

4. Nucleotide Sequence Databases

Nucleotide sequence databases store DNA and RNA sequences obtained through sequencing experiments. They are essential for gene discovery, genome analysis, comparative genomics, and evolutionary studies.

The three major global nucleotide sequence databases are:
GenBank (USA)
EMBL-ENA (Europe)
DDBJ (Japan)

These databases function under the International Nucleotide Sequence Database Collaboration (INSDC).


5. International Nucleotide Sequence Database Collaboration (INSDC)


INSDC is a global consortium that ensures:
Free and open access to nucleotide sequence data
Daily exchange of data among databases
Uniform data formats and annotation standards


Members of INSDC:
GenBank – NCBI (USA)
EMBL-ENA – EMBL-EBI (Europe)
DDBJ – National Institute of Genetics (Japan)
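
The effect of the daily exchange can be pictured as each archive ending the day with the union of everything submitted to any of the three. The sketch below is a toy model only: the function name `daily_exchange` and the accession IDs are hypothetical, and the real exchange mechanism is considerably more involved.

```python
from typing import Dict, Set


def daily_exchange(archives: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Give every archive the union of the records held by any archive."""
    combined = set().union(*archives.values())
    return {name: set(combined) for name in archives}


# Hypothetical accession IDs, for illustration only.
archives = {
    "GenBank": {"ACC0001", "ACC0002"},
    "ENA": {"ACC0003"},
    "DDBJ": {"ACC0004", "ACC0005"},
}

synced = daily_exchange(archives)
print(sorted(synced["DDBJ"]))
```

After the exchange, a record submitted to any one member can be retrieved from all three.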

6. GenBank
Overview
GenBank is a comprehensive nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI), USA. It is one of the largest and most widely used biological databases.
Types of Data Stored

Genomic DNA
cDNA and mRNA sequences
ESTs (Expressed Sequence Tags)
Whole genome sequences
Organelle genomes
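
GenBank distributes records as text "flat files" in which keywords such as LOCUS, DEFINITION, ACCESSION and ORIGIN begin each section and `//` ends the record. The sketch below reads only those few fields from a simplified, made-up record (`DEMO0001` is not a real accession); the function name `parse_genbank` is illustrative and the parser ignores most of the real format.

```python
# Simplified, made-up record: not a real GenBank entry.
SAMPLE_GENBANK = """\
LOCUS       DEMO0001      24 bp    DNA     linear
DEFINITION  Made-up example record, not a real GenBank entry.
ACCESSION   DEMO0001
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""


def parse_genbank(text: str) -> dict:
    """Extract accession, definition and sequence from one flat-file record."""
    record = {"accession": None, "definition": None, "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines hold a position number and blocks of bases;
            # keep the letters only.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif line.startswith("ACCESSION"):
            record["accession"] = line.split()[1]
        elif line.startswith("DEFINITION"):
            record["definition"] = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_sequence = True
    return record


rec = parse_genbank(SAMPLE_GENBANK)
print(rec["accession"], len(rec["sequence"]))  # DEMO0001 24
```

For real records, an established parser such as Biopython's `Bio.SeqIO` is the safer choice.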


7. EMBL (European Molecular Biology Laboratory Database)

Overview
The EMBL nucleotide database is maintained by the European Bioinformatics Institute (EMBL-EBI) and is now part of the European Nucleotide Archive (ENA).
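
EMBL-style flat files mark each line with a two-letter code (for example ID, AC, DE, SQ) and also end the record with `//`. The sketch below reads a simplified, made-up record in that style; `DEMO0001` is not a real accession, the function name `parse_embl` is illustrative, and only a handful of line codes are handled.

```python
# Simplified, made-up record: not a real ENA entry.
SAMPLE_EMBL = """\
ID   DEMO0001; SV 1; linear; genomic DNA; STD; UNC; 24 BP.
AC   DEMO0001;
DE   Made-up example record, not a real ENA entry.
SQ   Sequence 24 BP; 5 A; 5 C; 8 G; 6 T; 0 other;
     atggccattg taatgggccg ctga                                               24
//
"""


def parse_embl(text: str) -> dict:
    """Extract accession, description and sequence from one EMBL-style record."""
    record = {"accession": None, "description": "", "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        code, body = line[:2], line[5:]
        if code == "//":  # end-of-record marker
            break
        if in_sequence:
            # Sequence lines carry bases plus a trailing position count.
            record["sequence"] += "".join(ch for ch in line if ch.isalpha())
        elif code == "AC":
            record["accession"] = body.split(";")[0].strip()
        elif code == "DE":
            record["description"] = (record["description"] + " " + body.strip()).strip()
        elif code == "SQ":
            in_sequence = True
    return record


embl_rec = parse_embl(SAMPLE_EMBL)
print(embl_rec["accession"], len(embl_rec["sequence"]))  # DEMO0001 24
```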



8. DDBJ (DNA Data Bank of Japan)

Overview
DDBJ is maintained by the National Institute of Genetics (NIG), Japan. Most of its submissions come from researchers in Japan and other Asian countries, but it accepts submissions from anywhere and its data are globally accessible.
Data Stored
DNA and RNA sequences
Whole genome sequences
Environmental and metagenomic data
Special Features
Uses data formats similar to GenBank and EMBL
Exchanges data daily with other INSDC members
Provides online submission tools
9. Comparison of GenBank, EMBL, and DDBJ
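
GenBank: maintained by NCBI, USA
EMBL-ENA: maintained by EMBL-EBI, Europe
DDBJ: maintained by the National Institute of Genetics (NIG), Japan

➡ Because of the daily INSDC exchange, all three hold the same sequence records; they differ mainly in access portal, record format, and managing institution.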


10. Importance of Nucleotide Sequence Databases
Preserve genetic information
Support genome sequencing projects
Enable gene identification and annotation
Facilitate evolutionary and phylogenetic studies
Assist in medical, agricultural, and environmental research
11. Applications
Comparative genomics
Molecular taxonomy
Gene cloning and primer design
Mutation analysis
Crop improvement and breeding programmes
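
Two of the applications above, primer design and mutation analysis, reduce to simple string operations once a sequence has been retrieved. The sketch below is a minimal illustration with hypothetical function names and made-up sequences: it computes a reverse complement, GC content, a rough melting temperature using the Wallace rule (Tm ≈ 2(A+T) + 4(G+C), a rule of thumb for short oligonucleotides), and the point differences between two already-aligned sequences of equal length.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def point_differences(ref: str, sample: str) -> list:
    """List (position, ref_base, sample_base) for aligned, equal-length sequences."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(ref.upper(), sample.upper())
    return [(i, r, s) for i, (r, s) in enumerate(pairs, start=1) if r != s]


# Made-up sequences, for illustration only.
primer = "ATGGCCATTG"
print(reverse_complement(primer))             # CAATGGCCAT
print(gc_content(primer))                     # 0.5
print(wallace_tm(primer))                     # 30
print(point_differences("ATGGCC", "ATGACC"))  # [(4, 'G', 'A')]
```

Real primer design also weighs factors such as primer length, secondary structure and primer-dimer formation, which this sketch ignores.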
12. Conclusion
Biological databases play a central role in modern biological research. Among them, nucleotide sequence databases such as GenBank, EMBL, and DDBJ are primary repositories that store DNA and RNA sequences. Through the INSDC collaboration, these databases ensure global data sharing, accuracy, and accessibility, making them indispensable resources for genomics, bioinformatics, and biotechnology.



