Transcript for:
Understanding DNA, CpG Sites, and Methylation

DNA is made up of four bases: cytosine, guanine, thymine, and adenine. Cytosine base pairs with guanine, and thymine base pairs with adenine. CpG sites are DNA regions where a cytosine is followed by a guanine in the 5' to 3' direction. Note that a CpG refers to a cytosine connected to a guanine by a phosphodiester bond, as shown here.

It is not a cytosine base paired with a guanine, as shown here. By chance alone, assuming all four bases are equally common, you would expect the frequency of CpGs to be 6.25% (1/4 times 1/4, or 1/16). However, the observed frequency in humans is about 1%. To understand why, we first need to discuss the most common single nucleotide mutation. Spontaneous deamination of an unmethylated cytosine turns it into a uracil.
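As a quick aside on the 6.25% figure above, here is a minimal Python sketch of how the expected and observed dinucleotide frequencies could be compared. The function names and the short example sequence are made up for illustration, and the uniform 25% base frequencies are an assumption, not a property of any real genome.

```python
BASES = "ACGT"

def expected_dinucleotide_frequency(base_freqs, dinucleotide):
    """Chance frequency of a dinucleotide if neighbouring bases are independent."""
    first, second = dinucleotide
    return base_freqs[first] * base_freqs[second]

def observed_dinucleotide_frequency(sequence, dinucleotide):
    """Fraction of adjacent base pairs (read 5' to 3') that match the dinucleotide."""
    sequence = sequence.upper()
    total = len(sequence) - 1  # number of adjacent positions
    if total < 1:
        return 0.0
    hits = sum(1 for i in range(total) if sequence[i:i + 2] == dinucleotide)
    return hits / total

# Assumption: every base has frequency 0.25.
uniform = {base: 0.25 for base in BASES}
expected_cpg = expected_dinucleotide_frequency(uniform, "CG")  # 0.0625, i.e. 6.25%

# Hypothetical toy sequence, far too short to say anything about a genome.
toy_sequence = "ACGTCGAA"
observed_cpg = observed_dinucleotide_frequency(toy_sequence, "CG")
```

On a real genome you would use the measured base frequencies instead of the uniform ones, which shifts the expected value somewhat.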

Uracil isn't one of the four bases making up DNA, but is purely a product of deamination. It is recognized and removed by uracil DNA glycosylase, starting the process of DNA repair.

Easy peasy. Well, sometimes a DNA methyltransferase adds a methyl group at the 5 position of the cytosine in a CpG dinucleotide, forming 5-methylcytosine. In this case, spontaneous deamination turns the methylated cytosine into a thymine.

Thymine is one of the four bases making up DNA, so this can only be corrected through mismatch repair, which is really inefficient. Thymine DNA glycosylase, or TDG, removes T's from T-G mismatches. However, it isn't fast enough to keep up with the rate of mutation, so the DNA replicates before the mutation is fixed, and over time, methylated CG sequences get converted to TG sequences. For vertebrates, this leads to CpGs being observed at a much lower frequency in the DNA sequence than would be expected by pure chance, a phenomenon called CG suppression.

In the genomes of invertebrates such as Drosophila and C. elegans, however, CpGs are observed at the expected frequency. Why is that? Well, unlike vertebrates, these organisms have little or no methylation, so they don't get accidental deamination of 5-methylcytosine to form thymine, only cytosine forming uracil. But vertebrates do have methylation, and cytosine tends to get methylated when it is followed by guanine, leading to its mutation to thymine. So CpGs are rare in vertebrate DNA.

However, there are areas of the genome that have what are called CpG islands. In general, these are defined as regions of at least 200 base pairs with a GC percentage greater than 50% and an observed-to-expected CpG ratio exceeding 60%.
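The three criteria just listed can be sketched as a small Python check. This is an illustration under stated assumptions, not a production island finder: the function names are hypothetical, and the transcript does not say how the observed-to-expected ratio is calculated, so the sketch assumes the commonly used formula (number of CpGs times sequence length, divided by number of C's times number of G's).

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio.

    Assumed formula: (#CpG * length) / (#C * #G).
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)

def looks_like_cpg_island(seq, min_length=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the three thresholds: length, GC content, observed/expected CpG."""
    if len(seq) < min_length:
        return False
    return gc_fraction(seq) > min_gc and cpg_obs_exp(seq) > min_obs_exp

# Hypothetical toy inputs, chosen only to exercise each criterion.
cpg_rich = "CG" * 100                  # 200 bp, all G/C, full of CpGs
gc_rich_no_cpg = "C" * 100 + "G" * 100  # 200 bp, all G/C, but only one CpG
too_short = "CG" * 50                  # CpG-rich but only 100 bp
```

Real tools usually slide a window along the genome and merge neighbouring windows that pass. This sketch only scores a single sequence.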

Unlike the otherwise heavily methylated genome, CpG islands tend to be unmethylated. Why? Although methylation allows a distinction between the parent and newly synthesized strands, which aids in DNA proofreading, spontaneous deamination tends to turn methylated cytosines into thymines over time.
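To get a feel for how deamination erodes methylated CpGs over time, here is a toy Python simulation. It is a deliberately simplified sketch: it assumes every CpG is methylated, looks at one strand only, ignores repair entirely, and uses an arbitrary per-generation deamination rate. The function names, the rate, and the example sequence are all hypothetical.

```python
import random

def deaminate_methylated_cpgs(seq, rate, rng):
    """One generation: each C in a CpG becomes T with probability `rate`.

    Assumption: all CpG cytosines are methylated and nothing gets repaired.
    """
    bases = list(seq.upper())
    for i in range(len(bases) - 1):
        if bases[i] == "C" and bases[i + 1] == "G" and rng.random() < rate:
            bases[i] = "T"  # 5-methylcytosine deaminates to thymine
    return "".join(bases)

def simulate_cpg_decay(seq, rate, generations, seed=0):
    """Return the final sequence and the CpG count after each generation."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    seq = seq.upper()
    counts = [seq.count("CG")]
    for _ in range(generations):
        seq = deaminate_methylated_cpgs(seq, rate, rng)
        counts.append(seq.count("CG"))
    return seq, counts

# Hypothetical toy sequence and an exaggerated rate so the effect is visible.
final_seq, cpg_counts = simulate_cpg_decay("ACGTCGCGAACG", rate=0.2, generations=20)
```

Because a C-to-T change can never create a new CG, the CpG count in this model can only stay flat or fall, which is the CG suppression described earlier. Setting the rate to zero mimics an unmethylated region such as a CpG island, where the count never changes.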

Genes whose expression is suppressed have methylated CpG sequences, so the deamination cytosine massacre begins, and over evolutionary timescales, this eventually leads to the observed deficiency of CpGs in these inactive genes. On the other hand, the existence of CpG islands in active genes means that these parts of the genome have forces acting on them that select for higher CpG content and less methylation. But what could these forces protecting CpG islands from mutations be? Evidence suggests that CpG islands, the CpG-rich areas that have survived through evolutionary time, tend to be protected by remaining unmethylated because of their action as promoters, regions of DNA that initiate gene transcription.

Around 70% of human promoters have a high percentage of CpGs, and they are the most common promoter type in the vertebrate genome. CpG islands are typically near the transcription start site of genes, and are especially common near the transcription start site of housekeeping genes in vertebrates. Housekeeping genes are genes that are necessary for basic survival functions of the cell, and are hence expressed in all cells.

CpGs in promoter regions tend to remain unmethylated if the genes are expressed, but CpGs in promoter regions of inactive genes get methylated. The methylation of CpG island promoters prevents binding of transcription factors and results in gene silencing. Extrapolating from the idea that unmethylated CpG islands are found in the promoters of genes that get expressed, transcription during early development must be important to establish which DNA should be in the methylation-free state.

Specifically, all CpG island promoters need to be active during the waves of de novo methylation that occur at the blastocyst stage. And indeed, large-scale analyses have shown that 90% of genes with CpG island promoters are expressed in the early embryo. However, some CpG islands do become methylated during normal development.

For example, hundreds of CpG islands are heavily methylated on just one of the two X chromosomes in female eutherian mammals, such as primates. The CpG islands on this silenced chromosome only become methylated after gene silencing, locking in the silenced state rather than initiating it. X chromosome inactivation is a normal developmental process, where the methylation of CpG islands results in the stable silencing of the associated promoters.

This ensures that females don't get excess expression of X chromosome genes in comparison to males. One important implication of this silencing via methylation is for cancer. Cancerous tissues have methylation differences from their original tissues, and most of these methylation differences occur at CpG island shores, which are located at short distances from islands, not on the islands themselves. Hypermethylation of CpG islands in promoter regions is 10 times more likely than mutation to cause loss of gene expression.

If this hypermethylation results in the repression of DNA repair genes, this can promote cancer development. So now you can see why CpG islands would be found in areas of the genome which have been protected from mutation, such as the promoters of those DNA repair genes. Interestingly, apart from their elevated CpG density, CpG islands do not have much long-range sequence conservation. Sometimes they even lack core promoter elements such as the TATA box, which specifies where transcription begins.

So why are CpG islands fit to act as promoters? Perhaps it's simply because the high frequency of guanines and cytosines increases the probability that transcription factors will bind.

Non-methylated CpG islands show a characteristic organization of chromatin structure that predisposes them to promoter activity, and chromatin with CpG islands has high levels of acetylation of the H3 and H4 histones, while H1 is depleted, which is characteristic of transcriptionally active chromatin. If you liked this video, like and subscribe.

You can also support me by following the link to my Patreon. If you have any topics you'd like me to cover, please leave a comment.