Introduction to Bacterial genomes

From MC Chem Wiki

(BES, June 2021)

This page is designed as a basic introduction to identifying the bacterial species present in a sample from the chromosomal DNA extracted from that sample.

Basic Information

1) Bacteria can be separated into two types: Gram-positive and Gram-negative. The distinction is based on a lab staining procedure (developed by Gram) that uses crystal violet. Gram-positive bacteria have a thick peptidoglycan layer in their cell wall that retains the crystal violet stain. Gram-negative bacteria have a more complex cell envelope, with a thin peptidoglycan layer surrounded by an outer membrane, and do not retain the stain. The specific details of the cell walls can be found by clicking the links above. The type of bacteria is important to note because, as discussed below, we must break open (lyse) the bacteria to collect the chromosomal DNA needed to identify them. Note: archaea are very similar to bacteria, but their cell wall differs from that of bacteria, so Gram staining is not used to identify them.

2) Bacteria contain two different types of DNA: chromosomal (genomic) DNA and plasmid DNA. In the most basic sense, chromosomal DNA is the hardwired set of instructions defining all aspects of a given bacterium. Different bacteria, with different functions, have different chromosomal DNA. There are more than 1 million bacterial species that have been sequenced; Wikipedia has a list of sequenced bacterial "genomes", i.e., chromosomal DNA. Plasmids are considered an "extra" collection of genetic material that is NOT part of the hardwired instructions but can add some functionality to the bacteria. Plasmid DNA is not a unique identifier of a bacterial species; to identify a bacterial species, the chromosomal DNA must be sequenced.

3) DNA is DeoxyriboNucleic Acid. DNA carries the genetic instructions for the development, function, growth, and reproduction (i.e., everything) of all known organisms and many viruses. DNA sequences are very similar among similar organisms, but the information contained within DNA is expressed differently in different individuals, leading to different physical traits. Regardless of how the DNA is expressed, the DNA sequence is characteristic of a particular organism/species.

4) DNA is a chemical substance composed of two polynucleotide chains that coil around each other to form a double helix. Each chain (polymer) is made up of nucleotides, each of which is composed of three parts: a five-carbon sugar called deoxyribose; a nucleobase, which is adenine (A), cytosine (C), guanine (G), or thymine (T); and a phosphate group. Each nucleobase in one polynucleotide chain pairs with a nucleobase in the other chain according to the pairing rules A-T and C-G.
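The A-T / C-G pairing rule can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the input is the first 12 bases of the genome listing shown later on this page.

```python
# Pairing rules from the text: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the base-by-base partner of a DNA strand, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand):
    """Return the partner strand written in its own 5'->3' direction."""
    return complement(strand)[::-1]

print(complement("ATGCTTTTTACA"))          # TACGAAAAATGT
print(reverse_complement("ATGCTTTTTACA"))  # TGTAAAAAGCAT
```

The two chains of the double helix run in opposite directions, which is why the partner strand is usually written as the reverse complement.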

5) One frequently used section of bacterial and archaeal chromosomal DNA is the 16S rRNA gene; this DNA codes for a particular part of the ribosome. This section of DNA has been studied extensively and contains ~1500 DNA base pairs. Although the entire ~1500 base pairs of the 16S rRNA gene can be sequenced, particular parts give enough information to differentiate species of bacteria and archaea. The majority of the 1500 base pairs is the same (i.e., conserved) in all bacteria/archaea, but there are 9 sections of the DNA (called hypervariable regions) where the DNA is different and specific to a particular bacterial/archaeal species. These regions are referred to as V1, V2, V3, V4, V5, V6, V7, V8, and V9, abbreviated as V1-V9. Note that the variable regions are surrounded by conserved regions; this will become important when carrying out the analysis (i.e., PCR).
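To make the idea of conserved versus variable positions concrete, here is a minimal sketch that compares already-aligned sequences column by column. The three short sequences are invented toy data (not real 16S sequences), and the function name is hypothetical.

```python
def conserved_mask(aligned):
    """Mark each column '*' if every sequence has the same base there, '.' otherwise."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return "".join("*" if len(set(column)) == 1 else "." for column in zip(*aligned))

# Toy, made-up aligned sequences: identical except for two middle positions.
toy_alignment = [
    "ACGTACGTTTGACC",
    "ACGTACGAATGACC",
    "ACGTACGCGTGACC",
]

for seq in toy_alignment:
    print(seq)
print(conserved_mask(toy_alignment))  # *******..*****
```

In a real 16S comparison the runs of `*` would correspond to the conserved regions and the runs of `.` to the hypervariable regions.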

Single Bacterial Species DNA (Most Simple Example)

As noted above, the 16S rRNA gene is approximately 1500 base pairs long and includes nine hypervariable regions (V1-V9); because these are flanked by conserved regions, it follows that there are ~10 conserved regions.

References

- The effect of 16S rRNA region choice on bacterial community metabarcoding results, refs 1–3.
- A renaissance for the pioneering 16S rRNA gene (ref 1 from above)
- Conservative Fragments in Bacterial 16S rRNA Genes and Primer Design for 16S Ribosomal DNA Amplicons in Metagenomic Studies (ref 2 from above)
- Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes (ref 3 from above)

Let's work a specific example

Consider the bacterial species that is listed in a phylogenetic tree as belonging to the following:

- Domain Bacteria,
- Phylum Firmicutes,
- Class Bacilli,
- Order Lactobacillales,
- Family Lactobacillaceae,
- Genus Lactobacillus,
- Species acidophilus.

This bacterium would commonly be referred to using the genus-species name, i.e., Lactobacillus acidophilus or just L. acidophilus. This bacterium is Gram-positive, with a known functionality of fermenting sugars into lactic acid. It is very tolerant of low pH environments and is considered a beneficial bacterium in the human mouth and gut. The L. acidophilus chromosomal DNA has been fully sequenced and has 1,993,564 base pairs (bp) and 1,864 genes that each code for a characteristic or specific function (ref). One of these 1,864 genes is the 16S rRNA gene, composed of ~1500 base pairs.
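To put these numbers in perspective, the short calculation below estimates how small the 16S rRNA gene is relative to the whole genome. The variable names are illustrative, and the "length per gene" figure is only a crude average that ignores non-coding DNA.

```python
GENOME_BP = 1_993_564   # L. acidophilus chromosomal DNA length quoted above
GENE_COUNT = 1_864      # number of genes quoted above
RRNA_16S_BP = 1_500     # approximate length of the 16S rRNA gene

# Fraction of the genome occupied by one ~1500 bp 16S rRNA gene.
fraction = RRNA_16S_BP / GENOME_BP
# Crude average: total genome length divided by the number of genes.
bp_per_gene = GENOME_BP / GENE_COUNT

print(f"16S rRNA gene: about {fraction:.3%} of the genome")
print(f"Average genome length per gene: about {bp_per_gene:.0f} bp")
```

The 16S rRNA gene is therefore well under 0.1% of the chromosomal DNA, which is why it has to be located and amplified specifically.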

If you want to see the nucleotide sequence of the entire genome, you can go to the NIH GenBank. There you can search for "Lactobacillus acidophilus NCFM", where NCFM is a particular strain of L. acidophilus. The first hit in the list is "Lactobacillus acidophilus NCFM, complete sequence". Selecting the main title of this entry shows information about the record, but not the sequence. What we really want to see is the FASTA (pronounced "fast A") file format. The first 5 lines (70 bp per line) look like this:

ATGCTTTTTACAAATGGGGGAATATTAACTTTGTTCAATTTTGAAAAATTTTGGCAACATTTTAACGATG
AAATGCGTGCTCGTTTTAATGAGGTTGCCTATAATGCATGGTTTAAAAATACTAAGCCTATCTCGTACAA
CCAAAAAACGCATGAATTAAAAATTCAAGTTCAAAATCCAGTTGCAAAAGGTTATTGGGAAAAAAATCTT
TCTTCGCAACTAATTCAATCTGCATATGGTTATGCTGGTGTTGAACTTCTACCTGTCTTTCAAATCTCCG
AAGACAGTGATACACCTGAAAGAATAGTAACGCCTGAACCTCAACATAATTTGCAAACCACACCAACGCG...
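A FASTA file is plain text: a header line starting with ">" followed by the sequence wrapped over many lines. The sketch below shows one way to read such text in Python without extra libraries. The function name and the header text are hypothetical; only the two sequence lines are copied from the listing above.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                       # skip blank lines
        if line.startswith(">"):
            header = line[1:]              # a new record starts at each '>' line
            records[header] = []
        elif header is not None:
            records[header].append(line)   # sequence lines belong to the last header
    # Join the wrapped lines of each record into one continuous sequence.
    return {name: "".join(parts) for name, parts in records.items()}

# Hypothetical header; the two 70-bp lines are the first two lines shown above.
example = """>example_record hypothetical header text
ATGCTTTTTACAAATGGGGGAATATTAACTTTGTTCAATTTTGAAAAATTTTGGCAACATTTTAACGATG
AAATGCGTGCTCGTTTTAATGAGGTTGCCTATAATGCATGGTTTAAAAATACTAAGCCTATCTCGTACAA
"""

for name, sequence in read_fasta(example).items():
    print(name, len(sequence), "bp")
```

For real work a dedicated library is usually preferable; this sketch only shows how simple the format is.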

16S

Where in these 1,993,564 base pairs is the sequence of the 16S rRNA gene? A similar search for "Lactobacillus acidophilus 16S" returns a few hits showing the ~1,500 bp sequences. Selecting "Lactobacillus acidophilus strain JCM_1132 16S ribosomal RNA gene, partial sequence" and then FASTA generates the following:

GCGTGCTAATACATGCAAGTCGAGCGAGCTGAACCAACAGATTCACTTCGGTGATGACGTTGGGAACGCG
AGCGGCGGATGGGTGAGTAACACGTGGGGAACCTGCCCCATAGTCTGGGATACCACTTGGAAACAGGTGC
TAATACCGGATAAGAAAGCAGATCGCATGATCAGCTTATAAAAGGCGGCGTAAGCTGTCGCTATGGGATG
GCCCCGCGGTGCATTAGCTAGTTGGTAGGGTAACGGCCTACCAAGGCAATGATGCATAGCCGAGTTGAGA
GACTGATCGGCCACATTGGGACTGAGACACGGCCCAAACTCCTACGGGAGGCAGCAGTAGGGAATCTTCC
ACAATGGACGAAAGTCTGATGGAGCAACGCCGCGTGAGTGAAGAAGGTTTTCGGATCGTAAAGCTCTGTT
GTTGGTGAAGAAGGATAGAGGTAGTAACTGGCCTTTATTTGACGGTAATCAACCAGAAAGTCACGGCTAA
CTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGTTGTCCGGATTTATTGGGCGTAAAGCGAGC
GCAGGCGGAAGAATAAGTCTGATGTGAAAGCCCTCGGCTTAACCGAGGAACTGCATCGGAAACTGTTTTT
CTTGAGTGCAGAAGAGGAGAGTGGAACTCCATGTGTAGCGGTGGAATGCGTAGATATATGGAAGAACACC
AGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAACAGGATTA
GATACCCTGGTAGTCCATGCCGTAAACGATGAGTGCTAAGTGTTGGGAGGTTTCCGCCTCTCAGTGCTGC
AGCTAACGCATTAAGCACTCCGCCTGGGGAGTACGACCGCAAGGTTGAAACTCAAAGGAATTGACGGGGG
CCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCAGGTCTTGACATCT
AGTGCAATCCGTAGAGATACGGAGTTCCCTTCGGGGACACTAAGACAGGTGGTGCATGGCTGTCGTCAGC
TCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCATTAGTTGCCAGCATTAAGT
TGGGCACTCTAATGAGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAGTCATCATGCCCC
TTATGACCTGGGCTACACACGTGCTACAATGGACAGTACAACGAGGAGCAAGCCTGCGAAGGCAAGCGAA
TCTCTTAAAGCTGTTCTCAGTTCGGACTGCAGTCTGCAACTCGACTGCACGAAGCTGGAATCGCTAGTAA
TCGCGGATCAGCACGCCGCGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCGTCACACCATGGGAGT
CTGCAATGCCCAAAGCCGGTGGCCTAACCTTCGGGAAGGAGCCGTCTAAGCAG

Base pairs highlighted in color on the original page are conserved regions.

Primers

Forward primer (CS1_341F) (5’) ACACTGACGACATGGTTCTACACCTACGGGAGGCAGCAGCAG (3’) (actual primer in yellow)

Reverse primer (CS2_806R) (5’) TACGGTAGCAGAGACTTGGTCTGGACTACHVGGGTWTCTAAT (3’) (actual primer in yellow)
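The reverse primer contains the letters H, V, and W, which are IUPAC codes for "more than one possible base" (H = A/C/T, V = A/C/G, W = A/T). Because a reverse primer binds to the strand shown in the 16S listing, it is its reverse complement that appears in the listed sequence. The sketch below looks for that site. It rests on an assumption: that the gene-specific part of the CS2_806R oligo is its last 20 bases, since the yellow highlighting is not available here. The function names are illustrative, and the template is two consecutive 70-bp lines copied from the 16S listing above.

```python
import re

# IUPAC nucleotide codes written as regular-expression character classes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement of each IUPAC code (e.g. H = A/C/T pairs with D = T/G/A).
IUPAC_COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}

def reverse_complement(seq):
    """Reverse-complement a sequence that may contain IUPAC degenerate codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(seq.upper()))

def to_regex(seq):
    """Turn a (possibly degenerate) sequence into a regular-expression pattern."""
    return "".join(IUPAC_REGEX[base] for base in seq.upper())

# Assumption: the last 20 bases of the CS2_806R oligo are the gene-specific part.
REVERSE_PRIMER_16S = "GGACTACHVGGGTWTCTAAT"

# Two consecutive 70-bp lines from the 16S rRNA gene listing above.
TEMPLATE = (
    "AGTGGCGAAGGCGGCTCTCTGGTCTGCAACTGACGCTGAGGCTCGAAAGCATGGGTAGCGAACAGGATTA"
    "GATACCCTGGTAGTCCATGCCGTAAACGATGAGTGCTAAGTGTTGGGAGGTTTCCGCCTCTCAGTGCTGC"
)

site = reverse_complement(REVERSE_PRIMER_16S)
match = re.search(to_regex(site), TEMPLATE)
print(site)                                    # ATTAGAWACCCBDGTAGTCC
print(match.group() if match else "no match")  # ATTAGATACCCTGGTAGTCC
```

The match falls in a conserved stretch of the gene, which is what allows one primer to work across many bacterial species.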

If we have isolated a single bacterial species, then we can extract the chromosomal DNA using a Qiagen kit. Since the entire chromosomal DNA is , amplify the 16S rRNA gene DNA (PCR).

The V3 region of the 16S gene is ~ base pairs. The V4 region of the 16S gene is ~254 base pairs.