Abstract
Deoxyribonucleic acid (DNA) is a complex molecule that carries the genetic information of all living organisms.
The discovery of its structure in 1953 by Watson and Crick revolutionized the field of molecular biology and
led to significant advances in our understanding of genetics. DNA can adopt various conformations, and the
nucleotide sequence of a gene determines the amino-acid sequence of the protein it encodes. Genes are segments of
DNA that contain the instructions for making proteins (or functional RNAs), and the human genome contains
approximately 20,000 to 25,000 protein-coding genes. The relationship between DNA and genomes is essential to our
understanding of genetics. This article provides an overview of DNA's structure, function, biologically relevant
conformations and genes, of its relation to genomes, and of the public databases that store DNA data.
Key words
DNA, Genome, DNA conformations, Genetic Code, Biological Function, 3D-Structure, Bio-Databases
Introduction
DNA is a fundamental molecule that plays a crucial role in the storage and transmission of genetic information in all living organisms.
It is a long polymer built from four different nucleotides, distinguished by their bases: adenine (A), cytosine (C), guanine (G), and thymine (T).
The order of these nucleotides encodes the genetic information that shapes the characteristics of an organism (Figure 1).
The discovery of the structure of DNA by Watson and Crick in 1953 was a turning point in the history of biology.
They showed that DNA is a double helix (Figure 2) made of two chains, or strands, each consisting of nucleotides
(A, C, G and T) joined to one another by covalent phosphodiester bonds. The two strands are held together by
hydrogen bonds between complementary bases (A with T, and C with G), which are stacked in the center of the helix.
The strands are antiparallel: one runs in the 5' to 3' direction, while the other runs in the opposite, 3' to 5',
direction. Because each base has a fixed partner, either strand can serve as a template for rebuilding the other,
which is crucial for DNA replication and transcription, the processes by which genetic information is copied and
expressed. This structure allows for the faithful transmission of genetic information from one generation to the next.
Figure 2. The Watson and Crick model of double-helical DNA. The two strands are antiparallel: one runs in the 5' to 3' direction and the other in the opposite, 3' to 5', direction. The strands are held together by hydrogen bonds between the base pairs A-T and C-G.
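Because the strands are antiparallel and each base pairs with a fixed partner (A with T, C with G), the sequence of one strand fully determines the other. The minimal Python sketch below illustrates this: the partner strand, written in the conventional 5' to 3' direction, is the reverse complement of the given strand. The function names are illustrative and not taken from any particular library.

```python
# Watson-Crick base-pairing partners (uppercase DNA alphabet only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read 5'->3', of a DNA strand given 5'->3'."""
    # Complement each base, then reverse the order, because the partner
    # strand runs antiparallel (3'->5' alongside the input strand).
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def is_base_paired(top: str, bottom: str) -> bool:
    """Check whether two 5'->3' strands of equal length form a perfect duplex."""
    return len(top) == len(bottom) and reverse_complement(top) == bottom.upper()


if __name__ == "__main__":
    top = "ATGCGTAC"
    print(reverse_complement(top))           # GTACGCAT
    print(is_base_paired(top, "GTACGCAT"))   # True
```

Applying the function twice returns the original strand, which mirrors the fact that each strand of the duplex is the template for the other.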
DNA can be characterised as the fundamental code and blueprint of life, because it carries the genetic information
that governs the characteristics and functions of all living organisms. The specific sequence of its nucleotides
encodes the instructions for an organism's growth, development, and reproduction.
In particular, the nucleotide sequence serves as a code for the production of proteins, which are crucial
for the structure and function of cells and tissues. Because DNA is copied and passed down from generation to
generation, it provides a blueprint for the development of new individuals with inherited traits.
X-ray Crystallography and Photograph 51:
The photograph known as Photograph 51, Figure 3, played a crucial role in uncovering the structure of DNA.
This X-ray diffraction image was taken by Rosalind Franklin and Raymond Gosling at King's College
London in 1952. By providing important details about the helical structure of DNA, the photograph
contributed significantly to the discovery of the molecule's overall structure.
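Measurements of the diffraction pattern pointed to a helical molecule: the characteristic X-shaped pattern
is the signature of a helix, the spacing of the layer lines gave a helical repeat of about 3.4 nm, and strong
reflections at about 0.34 nm corresponded to the distance between adjacent bases stacked on top of one another.
Together with her other data, this led Franklin to conclude that the sugar-phosphate backbone lies on the
outside of the molecule.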
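Spacings of this kind, read from a B-DNA diffraction pattern, are enough to work out how many base pairs make up one helical turn. The short Python calculation below assumes the commonly quoted B-DNA values of about 3.4 nm per turn and about 0.34 nm between stacked bases; these approximate figures are inputs chosen for illustration.

```python
# Approximate B-DNA spacings (assumed textbook values, in nanometres).
HELICAL_REPEAT_NM = 3.4   # axial distance covered by one full helical turn
BASE_RISE_NM = 0.34       # axial distance between adjacent stacked base pairs


def base_pairs_per_turn(repeat_nm: float, rise_nm: float) -> float:
    """Number of stacked base pairs that fit into one helical turn."""
    return repeat_nm / rise_nm


def twist_per_base_pair(bp_per_turn: float) -> float:
    """Rotation (in degrees) between successive base pairs."""
    return 360.0 / bp_per_turn


bp = base_pairs_per_turn(HELICAL_REPEAT_NM, BASE_RISE_NM)
print(f"{bp:.1f} bp per turn, {twist_per_base_pair(bp):.0f} degrees per bp")
# 10.0 bp per turn, 36 degrees per bp
```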
James Watson and Francis Crick, at the Cavendish Laboratory in Cambridge, were also working on the structure of
DNA at the time. In early 1953, Maurice Wilkins, Franklin's colleague at King's College, showed the photograph
to Watson. Together with other King's College data, the photograph was key evidence that guided and supported
Watson and Crick's model of the DNA double helix.
Initially, Franklin's contributions to the discovery of the structure of DNA were overlooked,
but her work, including the photograph, was eventually recognized as a crucial component of
the discovery. Today, Photograph 51 remains an iconic symbol of the discovery of the structure
of DNA and serves as a reminder of the importance of interdisciplinary scientific research.
Function of DNA:
The primary function of DNA is to carry the genetic information of an organism. This information is encoded
in the sequence of nucleotides, which is transcribed into RNA and, for protein-coding genes, translated into
proteins. Proteins are essential for the structure, function, and regulation of cells, tissues, and organs in
the body. DNA also plays a crucial role in cell division: it is replicated before each division so that each
daughter cell receives a complete copy of the genetic information.
Genetic code:
The genetic code is the set of rules by which the nucleotide sequence of messenger RNA (transcribed from DNA)
is translated into the amino-acid sequence of a protein. It works like a language with a four-letter alphabet
(the nucleotides A, C, G and T in DNA, with U replacing T in RNA) whose three-letter "words", called codons,
each specify a particular amino acid or a stop signal during protein synthesis. There are 64 (4 x 4 x 4)
possible codons: 61 encode the 20 standard amino acids and three are stop signals. The redundancy in the
genetic code provides some flexibility, as most amino acids are specified by more than one codon.
The genetic code is nearly universal among living organisms, with minor variations in some organisms and organelles,
such as mitochondria and certain bacteria (for example, UGA, normally a stop codon, encodes tryptophan in vertebrate
mitochondria and in Mycoplasma). This universality allows a gene from one organism to be expressed in another,
which underlies techniques such as recombinant DNA technology.
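The codon table is small enough to write down directly. The Python sketch below builds the standard genetic code from the compact 64-letter string NCBI uses for translation table 1 (bases ordered T, C, A, G; "*" marks a stop codon), checks the counts quoted above, and translates a short coding sequence. It also shows how a variant code can be expressed as a handful of overrides; the four changes listed are those of the vertebrate mitochondrial code (NCBI table 2). The function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard code (NCBI translation table 1): one amino acid per codon, with codons
# enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...), third base varying fastest.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}

# The vertebrate mitochondrial code (NCBI table 2) differs at four codons.
MITO_CODE = {**STANDARD_CODE, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(cds: str, code: dict[str, str] = STANDARD_CODE) -> str:
    """Translate a DNA coding sequence codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = code[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


stops = [codon for codon, aa in STANDARD_CODE.items() if aa == "*"]
print(len(STANDARD_CODE), len(set(STANDARD_AA) - {"*"}), sorted(stops))
# 64 20 ['TAA', 'TAG', 'TGA']
print(translate("ATGGCTTGGTGATAA"))             # MAW  (TGA is a stop codon)
print(translate("ATGGCTTGGTGATAA", MITO_CODE))  # MAWW (TGA read as tryptophan)
```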
Understanding the genetic code and its variations is essential in the study of genetics and molecular biology.
The discovery of the genetic code was a major milestone in molecular biology, opening the door for the exploration
of gene and protein structure and function.
Genes:
Genes are segments of DNA that contain the instructions for making proteins or functional RNA molecules.
The nucleotide sequence of a protein-coding gene determines the sequence of amino acids in the resulting
protein. The human genome contains approximately 20,000 to 25,000 protein-coding genes. Genes are regulated
by various mechanisms, such as transcription factors, which control the expression of genes
in response to various signals.
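Because a protein-coding sequence must be read as an unbroken run of codons, a simple first step in many gene-finding workflows is to look for open reading frames (ORFs): stretches that begin with ATG and continue in frame to a stop codon. The Python sketch below scans only the three forward reading frames, and its default minimum length of 100 codons is an arbitrary assumption. It is a simplification: real gene prediction also considers the reverse strand, introns in eukaryotes, and regulatory signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) 0-based, half-open coordinates of forward-strand ORFs.

    Each ORF starts at ATG and ends just after the first in-frame stop codon.
    Length is counted in codons, including the stop codon. ATGs nested inside
    an ORF are not reported separately.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGGG", min_codons=3))  # [(2, 14)]
```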
Biology-relevant Conformations:
DNA can adopt various conformations that are essential for its function. The most common conformation of DNA
is the B-form, the double helix first described by Watson and Crick. However, DNA can also adopt other
conformations, such as the A-form and the Z-form (Figure 4), as well as non-B-DNA structures such as the cruciform.
These alternative structures have been implicated in biological processes such as transcription, replication,
and DNA repair.
A-DNA - right-handed double helix, shorter and wider than B-DNA. Favored by low water content (dehydrating conditions) and adopted by RNA-DNA hybrids.
B-DNA - right-handed double helix. The most common conformation in vivo, associated with high water content.
Z-DNA - left-handed double helix with a zigzag sugar-phosphate backbone, which gives the form its name. It forms most readily in sequences of alternating purines and pyrimidines, such as CG repeats, and is stabilized by negative supercoiling. Z-DNA is rare in vivo, though it has been associated with GC-rich regions of the genome, and it may play a role in gene expression and regulation.
Figure 4. The three biology-relevant conformations of DNA.
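The three forms differ mainly in handedness, in how many base pairs make up one turn, and in how far the helix rises per base pair. The Python sketch below stores approximate textbook values for these parameters (assumed here; published figures vary somewhat between sources and sequences) and uses them to estimate the pitch of each helix and the length of the same 100-bp duplex in each form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HelixForm:
    name: str
    handedness: str
    bp_per_turn: float  # base pairs per helical turn (approximate)
    rise_nm: float      # axial rise per base pair, in nm (approximate)

    def length_nm(self, n_bp: int) -> float:
        """Approximate axial length of an n_bp duplex in this form."""
        return n_bp * self.rise_nm

    def pitch_nm(self) -> float:
        """Approximate axial length of one full helical turn."""
        return self.bp_per_turn * self.rise_nm


# Approximate values (assumptions); sources quote slightly different numbers.
FORMS = [
    HelixForm("A-DNA", "right", 11.0, 0.26),
    HelixForm("B-DNA", "right", 10.5, 0.34),
    HelixForm("Z-DNA", "left", 12.0, 0.37),
]

for form in FORMS:
    print(f"{form.name}: {form.handedness}-handed, "
          f"pitch ~{form.pitch_nm():.1f} nm, 100 bp ~{form.length_nm(100):.0f} nm")
```

With these values the same 100 bp is shortest as A-DNA and longest as Z-DNA, consistent with A-DNA being the more compact helix and Z-DNA the more extended one.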
DNA relation to genome:
A genome is the complete set of genetic information of an organism. The human genome, for example, consists of
approximately 3 gigabase pairs (Gbp, billion base pairs) per haploid set, arranged in 23 chromosomes (23 pairs in
most body cells). Among the largest genomes reported is that of the rare plant Paris japonica, at about 149 Gbp,
around 50 times larger than the human genome. At the other extreme, one of the smallest known genomes is that of
the bacterial endosymbiont Carsonella ruddii, at only about 160 kilobase pairs (kbp).
It's important to note that genome size can vary widely even within a single taxonomic group, and new discoveries
could potentially shift the rankings of the smallest and largest genomes.
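Genome sizes are easier to compare when converted to a common unit. The Python sketch below reproduces the ratios quoted above and, assuming the usual B-DNA rise of about 0.34 nm per base pair, estimates how long one haploid human genome would be if stretched out. The 3.1 Gbp human figure is an assumed round value.

```python
BP_PER_KBP = 1_000
BP_PER_GBP = 1_000_000_000

# Haploid genome sizes in base pairs; the human value is an assumed round figure.
genomes_bp = {
    "Homo sapiens": 3.1 * BP_PER_GBP,
    "Paris japonica": 149 * BP_PER_GBP,
    "Carsonella ruddii": 160 * BP_PER_KBP,
}

RISE_M = 0.34e-9  # approximate axial rise per base pair of B-DNA, in metres

human = genomes_bp["Homo sapiens"]
for species, size in genomes_bp.items():
    print(f"{species}: {size:,.0f} bp, {size / human:.2g} x human")

print(f"Stretched-out haploid human genome: ~{human * RISE_M:.2f} m")
```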
The relationship between DNA and genomes is essential to our understanding of genetics. The study of genomes
has led to significant advances in our understanding of human biology, evolution, and disease. Publicly available
databases, such as NCBI and EMBL-EBI, provide access to vast amounts of data on DNA sequences, structures, and
genomes and have been instrumental in advancing our understanding of genetics.
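As an illustration of programmatic access, NCBI's Entrez Programming Utilities (E-utilities) expose an `efetch` endpoint that returns records in formats such as FASTA. The Python sketch below builds the request URL with the standard library and can optionally download the text; the accession NC_012920.1 (the human mitochondrial reference sequence) is only an example, and NCBI's usage guidelines (rate limits, API keys for heavier use) should be checked before running such code at scale.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an E-utilities efetch URL requesting one record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_record(accession: str) -> str:
    """Download the record as text (requires network access)."""
    with urlopen(efetch_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(efetch_url("NC_012920.1"))
    # fasta_text = fetch_record("NC_012920.1")  # uncomment to download
```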
DNA data and available databases:
High-throughput sequencing technologies generate massive amounts of genomic and other DNA sequence data.
Bioinformatics analysis and annotation of these data have produced important databases that contain information
on entire genomes.
These databases contain not only the DNA sequences of genomes, but also annotations of genes, motifs,
transcription factor binding sites, and other genomic features such as regulatory elements and chromatin domains.
The availability of these databases has greatly facilitated the study of genetics and genomics,
allowing researchers to compare and analyze the DNA sequences and structures of different organisms.
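Sequence databases commonly distribute DNA sequences in the FASTA text format: a header line starting with ">" followed by one or more sequence lines. A minimal Python parser, run here on an in-memory string with made-up records, is sketched below together with a simple GC-content calculation; production code would more likely rely on an established library such as Biopython.

```python
from io import StringIO
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


# Illustrative records, not real database entries.
example = StringIO(">seq1 example\nATGC\nGGCC\n>seq2\nATAT\n")
for name, seq in read_fasta(example):
    print(name, len(seq), f"{gc_content(seq):.2f}")
```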
GenBank, a comprehensive database of DNA sequences maintained by the National Center for Biotechnology
Information (NCBI), provides free access to over 250 million nucleotide sequences, including complete genomes,
plasmids, and organelle genomes. Other databases, such as the Protein Data Bank (PDB), contain three-dimensional
structures of DNA and other nucleic acids, alone or in complex with proteins and ligands, in addition to
protein structures.
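Structures in the PDB are distributed in mmCIF and in the legacy fixed-column PDB format. The Python sketch below reads legacy-format ATOM records using the column positions from the PDB format specification and counts nucleotide residues per chain. Because real files contain many more record types, the example input is generated by a hypothetical helper, `atom_line`, with made-up coordinates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """Parse one ATOM record; comments give the 1-based columns of the spec."""
    return Atom(
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name (e.g. DA, DC)
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue number
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


def residues_per_chain(lines: list[str]) -> Counter:
    """Count distinct residues (chain, number, name) per chain among ATOM records."""
    atoms = (parse_atom(ln) for ln in lines if ln.startswith("ATOM  "))
    seen = {(a.chain, a.res_seq, a.res_name) for a in atoms}
    return Counter(chain for chain, _, _ in seen)


def atom_line(serial, name, res_name, chain, res_seq, x, y, z):
    """Hypothetical helper: format a minimal ATOM record (atom names up to 3 chars)."""
    return (f"ATOM  {serial:>5}  {name:<3} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:>8.3f}{y:>8.3f}{z:>8.3f}")


lines = [
    atom_line(1, "P", "DC", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, "OP1", "DC", "A", 1, 1.0, 0.0, 0.0),
    atom_line(3, "P", "DG", "A", 2, 2.0, 0.0, 0.0),
    atom_line(4, "P", "DG", "B", 13, 3.0, 0.0, 0.0),
]
print(residues_per_chain(lines))
```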
Motif databases such as TRANSFAC and JASPAR are examples of resources that contain information
on DNA motifs, which are used to identify and annotate regulatory regions of genes and to predict the
function of non-coding regions of DNA.
These databases are also used to develop models of transcription factor binding sites, which are essential
for understanding gene regulation.
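Binding-site models in resources such as JASPAR are typically distributed as position frequency matrices, which can be converted into log-odds position weight matrices and slid along a sequence to score candidate sites. The Python sketch below shows that idea with a made-up four-position count matrix (not a real JASPAR profile), a pseudocount of 1 and a uniform background; these parameter choices are assumptions for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical counts for a 4-bp motif (rows: positions; columns: A, C, G, T).
COUNTS = [
    [8, 0, 1, 1],
    [0, 9, 0, 1],
    [1, 0, 9, 0],
    [0, 1, 0, 9],
]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores against the background."""
    matrix = []
    for row in counts:
        total = sum(row) + pseudocount * len(BASES)
        matrix.append({
            base: math.log2((n + pseudocount) / total / background)
            for base, n in zip(BASES, row)
        })
    return matrix


def scan(seq, pwm):
    """Score every window of an uppercase ACGT sequence; best-scoring first."""
    width = len(pwm)
    hits = [
        (i, sum(pwm[j][seq[i + j]] for j in range(width)))
        for i in range(len(seq) - width + 1)
    ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


pwm = log_odds_matrix(COUNTS)
best_pos, best_score = scan("TTACGTAAT", pwm)[0]
print(best_pos, round(best_score, 2))  # 2 5.91
```

The highest-scoring window is the one matching the consensus ACGT. In practice, a score threshold or a p-value is used to decide which windows count as predicted binding sites.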
Databases such as the Nucleic Acids and Ligands Database (NALD) provide DNA structural motifs and
ligand-binding data relevant to the design of novel drugs.
The Human Genome Project, completed in 2003, was a collaborative effort to sequence and map the
entire human genome. The project provided valuable insights into the structure and function of
the human genome and paved the way for personalized medicine and genomic medicine.
Methods
The study of DNA has a long and storied history, starting with its initial discovery in the late 1800s.
In 1869, Friedrich Miescher, a Swiss physician, isolated a substance from the nuclei of white blood
cells that he called "nuclein." This substance was later identified as DNA, and subsequent research
into its properties revealed its central role in the transmission of genetic information.
In the early 1950s, James Watson and Francis Crick made a breakthrough in the study of DNA by proposing
its structure as a double helix. Their discovery was based on X-ray diffraction data from DNA fibers collected
by Rosalind Franklin and Maurice Wilkins at King's College London, together with their own model building.
This discovery paved the way for a better understanding of the function and biology of DNA, as well
as its relationship to the transmission of genetic information.
Since then, a great deal of research has been conducted on DNA, including investigations into its
structure, function, biology-relevant conformations, genes, and its relationship to genomes.
This research has been facilitated by the development of numerous experimental techniques and
technologies, including X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy,
and next-generation sequencing (NGS).
The study of genomes has been revolutionized by the development of bioinformatics and computational methods.
One of the most important applications of bioinformatics is the analysis of DNA sequences, which has led to the development
of databases that contain information on genomes, DNA sequences from a variety of sources and studies, motifs, genes, and
other DNA elements.
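A core operation behind comparing DNA sequences is alignment. The Python sketch below computes a global alignment score with the classic Needleman-Wunsch dynamic-programming recurrence. The scoring scheme (match +1, mismatch -1, gap -1) is an arbitrary assumption; real tools use substitution matrices, affine gap penalties, and heuristics for speed.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev holds the previous row of the DP table: prev[j] is the best score
    # for aligning the prefix of `a` processed so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align a[i-1] with b[j-1], gap in b, or gap in a.
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[len(b)]


print(global_alignment_score("GATTACA", "GCATGCU"))
```

Only the score is computed here. Recovering the alignment itself would require keeping the full table and tracing back from the bottom-right cell.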
Discussion
The discovery of the structure of DNA by Watson and Crick in 1953 was a significant milestone
in the history of biology. It revolutionized the field of molecular biology and led to significant
advances in our understanding of genetics. The study of DNA has shed light on many aspects
of human biology, including the basis of diseases and the mechanisms of evolution.
The structure of DNA is crucial to its function. The double helix allows genetic information to be stored
stably and, because each strand can serve as a template for a new partner strand, to be copied during cell
division. DNA also plays a critical role in gene expression and its regulation. DNA is transcribed into
messenger RNA (mRNA), which is then translated into protein, so the nucleotide sequence of a protein-coding
gene determines the sequence of amino acids in the resulting protein.
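The relationship between the template strand, the coding strand and the mRNA can be made concrete. In the Python sketch below (function names are illustrative), the mRNA is built as the complement of the template strand, read in the opposite direction, with uracil (U) in place of thymine. As a result, it has the same sequence as the coding strand except that T becomes U. Splicing, which removes introns from eukaryotic transcripts, is ignored.

```python
# RNA partner for each DNA template base (adenine pairs with uracil in RNA).
PAIR_WITH_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe_from_template(template_5to3: str) -> str:
    """Return the mRNA (5'->3') made on a DNA template strand given 5'->3'.

    RNA polymerase reads the template 3'->5', so the template is traversed
    in reverse and each base is replaced by its RNA partner.
    """
    return "".join(PAIR_WITH_RNA[base] for base in reversed(template_5to3.upper()))


coding = "ATGGCTTGGTAA"
template = "TTACCAAGCCAT"  # reverse complement of the coding strand
mrna = transcribe_from_template(template)
print(mrna)                              # AUGGCUUGGUAA
print(mrna == coding.replace("T", "U"))  # True
```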
DNA can adopt different conformations, such as A-DNA, B-DNA, and Z-DNA. B-DNA is by far the most common form
in cells, while A-DNA and Z-DNA occur less often. A-DNA is a right-handed double helix that is shorter and wider
than B-DNA. It is found in RNA-DNA hybrids and in some DNA-protein complexes. B-DNA is a right-handed double
helix with a uniform diameter and is the predominant conformation of DNA under physiological conditions,
including most of the human genome. Z-DNA is a left-handed double helix that is longer and thinner than B-DNA.
It forms most readily in sequences of alternating purines and pyrimidines, particularly when DNA is under
torsional strain (negative supercoiling), and its formation has been implicated in gene regulation.
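Because Z-DNA forms most readily where purines (A, G) and pyrimidines (C, T) alternate, especially in CG repeats, a crude first screen is to look for long alternating runs. The Python sketch below does only that. Its default 8-bp minimum length is an arbitrary assumption, and it ignores the thermodynamic and supercoiling factors that dedicated Z-DNA prediction tools take into account.

```python
PURINES = set("AG")


def alternating_runs(seq: str, min_len: int = 8) -> list[tuple[int, int]]:
    """Return (start, end) half-open spans where purines and pyrimidines alternate.

    Only A, C, G and T are considered; any other character breaks a run.
    """
    seq = seq.upper()
    runs, start = [], 0
    for i in range(1, len(seq) + 1):
        # The run continues while the base class switches purine <-> pyrimidine.
        continues = (
            i < len(seq)
            and seq[i] in "ACGT" and seq[i - 1] in "ACGT"
            and (seq[i] in PURINES) != (seq[i - 1] in PURINES)
        )
        if not continues:
            if i - start >= min_len:
                runs.append((start, i))
            start = i
    return runs


print(alternating_runs("AAACGCGCGCGTTT"))  # [(2, 12)]
```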
Genes are segments of DNA that contain the instructions for making proteins or functional RNAs. Genes can be
turned on or off, depending on the needs of the cell. The Human Genome Project, completed in 2003, produced a
catalog of the genes in the human genome, although gene counts continue to be refined. The human genome contains
approximately 20,000 to 25,000 protein-coding genes, whose coding sequences make up less than 2% of the total
genome. Some of the remaining, non-coding DNA has been found to play important roles in gene regulation and
other cellular processes.
The relationship between DNA and genomes is essential to our understanding of genetics. A genome is the
complete set of genetic instructions for an organism. The genome of an organism is stored in its DNA,
which is organized into chromosomes. Humans have 23 pairs of chromosomes; each set of 23 contains approximately
3 billion base pairs of DNA. The sequence of DNA in an organism's genome largely determines its traits
and characteristics.
Publicly available databases provide access to DNA sequences, structures, and genomes. Some of the most
popular databases include the National Center for Biotechnology Information (NCBI) and the European Molecular
Biology Laboratory-European Bioinformatics Institute (EMBL-EBI). These databases contain vast amounts of data
on DNA sequences, structures, and genomes from various organisms, including humans.
The NCBI provides access to several databases, including GenBank, which is a comprehensive database
of DNA sequences from over 400,000 organisms. The NCBI also provides access to the RefSeq database,
which contains annotated and curated DNA sequences for over 130,000 organisms. In addition, the NCBI
provides access to several databases that contain information on genetic variation, such as dbSNP and dbVar.
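Variant resources such as dbSNP distribute much of their data in the Variant Call Format (VCF), a tab-separated text format whose first eight columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO. The Python sketch below parses one data line into those fields. The example records are made up, and real files also carry meta-information and header lines, which are skipped here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int              # 1-based position of the first REF base
    variant_id: str       # "." when no identifier is given
    ref: str
    alt: list[str]        # one or more alternate alleles
    info: dict[str, str]  # INFO key/value pairs; flags map to ""

    def is_snv(self) -> bool:
        """True if REF and every ALT allele are single bases."""
        return len(self.ref) == 1 and all(len(a) == 1 for a in self.alt)


def parse_vcf_line(line: str) -> Variant:
    """Parse the first eight tab-separated columns of a VCF data line."""
    chrom, pos, vid, ref, alt, _qual, _filter, info = line.rstrip("\n").split("\t")[:8]
    info_dict = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value
    return Variant(chrom, int(pos), vid, ref, alt.split(","), info_dict)


record = parse_vcf_line("1\t12345\t.\tA\tG,T\t.\tPASS\tDP=10;SOMATIC")
print(record.pos, record.alt, record.is_snv(), record.info)
```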
The EMBL-EBI provides access to several databases, including Ensembl, which is a
comprehensive database of annotated genomes for over 80 species, including humans.
Ensembl provides access to genomic data, including DNA sequences, gene annotations,
and genetic variation data. The EMBL-EBI also provides access to several other databases,
including UniProt, which is a comprehensive database of protein sequences and annotations.
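Ensembl data can also be retrieved programmatically through the Ensembl REST API. The Python sketch below builds a request for a genomic sequence region using the `sequence/region/:species/:region` endpoint pattern, with arbitrary example coordinates. Parameters and rate limits should be confirmed against the current Ensembl REST documentation before use.

```python
from urllib.parse import quote
from urllib.request import urlopen

ENSEMBL_REST = "https://rest.ensembl.org"


def region_url(species: str, chrom: str, start: int, end: int, strand: int = 1) -> str:
    """Build a sequence/region request URL (1-based, inclusive coordinates)."""
    region = f"{chrom}:{start}..{end}:{strand}"
    # Keep ':' literal in the region string; request plain-text output.
    return (f"{ENSEMBL_REST}/sequence/region/{quote(species)}/"
            f"{quote(region, safe=':')}?content-type=text/plain")


def fetch_region(species: str, chrom: str, start: int, end: int) -> str:
    """Download the region sequence as plain text (requires network access)."""
    with urlopen(region_url(species, chrom, start, end), timeout=30) as response:
        return response.read().decode("utf-8")


print(region_url("human", "X", 1_000_000, 1_000_100))
```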
Conclusion
DNA is a complex molecule that carries the genetic information of all living organisms.
The discovery of its structure in 1953 by Watson and Crick revolutionized the field of
molecular biology and led to significant advances in our understanding of genetics.
DNA can adopt various conformations that are essential for its function, and genes are
segments of DNA that contain the instructions for making proteins. The study of DNA and
genomes has provided valuable insights into the structure and function of living organisms,
and publicly available databases have facilitated the discovery of new genes and
the development of novel therapies.
In conclusion, the study of DNA is a critical component of modern biology, with a profound impact on
fields such as medicine and biotechnology. Elucidating its structure, function, and biologically relevant
conformations has provided a detailed understanding of the molecular basis of life. The identification
of genes and the relationship between DNA
and genomes have paved the way for personalized medicine and genomic medicine, and the
availability of publicly accessible databases has enabled researchers worldwide
to advance their research on DNA.
It is important to recognize that the study of DNA is an ongoing process, and there is
still much to learn. New techniques and technologies continue to be developed, and ongoing
research is likely to lead to new discoveries and insights into the molecular basis of life.
Nonetheless, the field of molecular biology and genetics owes a great debt to the pioneering work
of Franklin, Wilkins, Watson and Crick, which revealed the structure of DNA in 1953.
References
🕮 Alberts B, Johnson A, Lewis J, Raff M, Roberts K, Walter P. Molecular Biology of the Cell. 4th edition. New York: Garland Science; 2002. DNA Structure and Replication.
Available at:
https://www.ncbi.nlm.nih.gov/books/NBK26821
🕮 Bailey TL, Boden M, Buske FA, et al. (2009). "MEME Suite: tools for motif discovery and searching". Nucleic Acids Research. 37 (Web Server issue): W202–8.
🕮 Collins FS, Morgan M, Patrinos A. The Human Genome Project: Lessons from Large-Scale Biology. Science. 2003 Apr 25;300(5617):286-90.
Available at:
https://pubmed.ncbi.nlm.nih.gov/12690187
🕮 Kim EJ, Lee YS. DNA Conformation: Z-DNA, an Unusual DNA Structure. Molecules. 2019 Apr 26;24(9):1636.
🕮 Mathelier A, Fornes O, Arenillas DJ, et al. (2016). "JASPAR 2016: a major expansion and update of the open-access database of transcription factor binding profiles". Nucleic Acids Research. 44 (D1): D110–5.
🕮 Nakabachi, A., Yamashita, A., Toh, H., Ishikawa, H., Dunbar, H. E., Moran, N. A., & Hattori, M. (2006). The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science, 314(5797), 267.
Available at:
https://www.science.org/doi/10.1126/science.1134196
🕮 Pennisi, E. (2010). ScienceShot: Biggest Genome Ever Japanese flower has 50 times more DNA than humans do. Science 2010
Available at:
scienceshot-biggest-genome-ever
🕮 Sinden RR. DNA Structure and Function. Academic Press; 2013.
🕮 Watson JD, Crick FHC. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature. 1953 Apr 25;171(4356):737-8.
Available at:
https://www.nature.com/articles/171737a0
🕮 Wilkins, M. H. F., Stokes, A. R., & Wilson, H. R. (1953). Molecular structure of deoxypentose nucleic acids. Nature, 171(4356), 738-740.
Available at:
https://www.nature.com/articles/171738a0
🕮 Wingender E, Schoeps T, Haubrock M, et al. (2015). "TFClass: expanding the classification of human transcription factors to their mammalian orthologs". Nucleic Acids Research. 44 (D1): D300–4.
🕮 Abdelkrim, R. (2023). Bioinformatics: An Exciting Field of Science - Importance and Applications. Journal of Concepts in Structural Biology & Bioinformatics (JSBB), 1(4).
Available at:
Bioinformatics_An_Exciting_Field_of_Science
🕮 Abdelkrim, R. (2023). Databases in Biology. Journal of Concepts in Structural Biology & Bioinformatics (JSBB), 1(4).
Available at:
Dbases_In_Biology
🕮 Abdelkrim R., Khuphukile M. NALD: Nucleic Acids and Ligands Database, Modeling Approaches and Algorithms for Advanced Computer Applications (Springer) 2013, 488: 329-336
Available at:
http://www.springerlink.com/openurl.asp?id=doi:10.1007/978-3-319-00560-7_36
🕮 NCBI. National Center for Biotechnology Information. [Online]
Available at:
https://www.ncbi.nlm.nih.gov/
🕮 NALD. Nucleic Acids and Ligands Database. [Online]
Available at:
https://bioinformatics.univ-saida.dz/bit2/?arg=SB1
🕮 PDB. Protein Data Bank. [Online]
Available at:
https://www.rcsb.org/
🕮 The Genome Reference Consortium.
Available at:
https://www.ncbi.nlm.nih.gov/grc
🕮 Ensembl Genome Browser.
Available at:
https://www.ensembl.org
🕮 The Human Genome Variation Database.
Available at:
https://www.hgvd.genome.med.kyoto-u.ac.jp
🕮 The Genome Aggregation Database (gnomAD), successor to the Exome Aggregation Consortium (ExAC).
Available at:
https://gnomad.broadinstitute.org/