Effects of repetitive DNA and epigenetics on human genome regulation
The highly developed and specialized anatomical and physiological characteristics of eukaryotes in general, and mammals in particular, are underwritten by an elaborate and intricate process of genome regulation. This precise control of the location, timing and amplitude of gene expression is achieved by a variety of genetic and epigenetic mechanisms, including cis- and trans-transcriptional regulation, epigenetic marks and chromosomal conformation in the nucleus [78, 79]. While all of these regulatory mechanisms have been extensively studied, our understanding of the complex and diverse associations of epigenetic marks and genetic elements with genome regulatory systems has remained incomplete. However, the last few years have seen profound developments in two areas that have significantly improved the depth and breadth to which their functions and relationships can be understood: 1) next-generation sequencing (NGS) and 2) its application to the genome-wide profiling of multiple DNA elements and functional factors. These profiles, generated by the ENCODE consortium and other research laboratories, cover suites of histone modifications, transcription factors, DNA methylation and DNase hypersensitive sites in various mammalian tissues. The objective of this thesis has been to apply bioinformatic, computational and statistical tools to analyze and interpret recent high-throughput datasets generated by next-generation sequencing and chromatin immunoprecipitation sequencing (ChIP-seq) experiments. These datasets have been analyzed to further our understanding of the dynamics of gene regulation in humans, particularly as it relates to repetitive DNA, cis-regulation and DNA methylation. The thesis thus resides at the intersection of three major areas in the overarching domain of human genome regulation: transposable elements, cis-regulatory elements and epigenetics.
It explores how these three aspects of regulation relate to gene expression and the functional implications of those interactions. From this analysis of high-throughput datasets, the thesis provides new insights into: 1) the relationship between the transposable element (TE) environment of human genes and their expression, 2) the role of mammalian-wide interspersed repeats (MIRs) in the function of human enhancers and the enhancement of tissue-specific functions, 3) the existence and function of composite cis-regulatory elements and 4) the dynamics of, and relationship between, human gene-body DNA methylation and gene expression. The specific advances of my research in the field of human genome regulation are summarized as follows:
Research advance 1: Because the TE fractions of genes are highly correlated with gene length (GL), this study evaluated the two parameters together and teased apart their relative contributions to two gene expression parameters: tissue specificity and expression level. By showing that GL is strongly correlated with overall expression level but only weakly correlated with the breadth of expression, this study provides evidence for the selection hypothesis, which attributes the compactness of highly expressed genes to selection for economy of transcription, as opposed to the genomic design hypothesis. In fact, the TE fractions of human genes were shown to be more strongly anti-correlated with gene expression levels, suggesting that TEs, rather than GL, might be the more important targets of selection for transcriptional economy. Finally, MIRs were found to be the only TEs that are positively associated with tissue-specific gene expression. The relevance of the TE environment for gene expression was thus confirmed, and distinct mechanisms by which TEs may contribute to genome regulation were proposed.
Research advance 2: Mammalian-wide interspersed repeats (MIRs), previously shown to be related to tissue-specific gene expression, are shown to execute this function primarily through enhancers.
This study found MIRs to be significantly enriched within enhancers and reports many novel MIR-derived enhancers. Indeed, the density of enhancer-MIRs around genes is shown to be significantly related to both the expression level and the tissue specificity of those genes, and enhancer-MIRs are shown to be involved in tissue-specific cellular functions. MIRs within enhancers are shown to possess significantly more transcription factor binding sites (TFBSs) than the genomic background, a finding that might explain their co-option into enhancers and thus their long-standing conservation and wide distribution in the mammalian clade.
Research advance 3: This research provides evidence supporting earlier proposals that the distinctions between different classes of cis-regulatory elements may not be definitive and that different elements might share regulatory features and mechanisms. Taking boundary elements and enhancers in human CD4+ T cells as examples, we identified 174 composite cis-regulatory elements, in which enhancers and boundary elements are co-located. These composite cis-regulatory elements possess unique chromatin environments and regulatory features and are shown to facilitate cell-type-specific functions.
Research advance 4: This research used a meta-analysis of new high-throughput chromatin, methylation and gene expression datasets to address aspects of the long-standing DNA methylation paradox. Contrary to previous reports [2, 4, 56, 83, 88, 108], it is shown that the relationship between gene-body methylation and gene expression levels is not linear but non-monotonic (bell-shaped). These results confirm that gene-body DNA methylation does serve to repress spurious intragenic transcription.
However, they also indicate that this role is only epiphenomenal: gene-body methylation levels are predominantly determined by the accessibility of the DNA to methylating enzyme complexes rather than by an evolutionary adaptation to minimize spurious intragenic transcription.
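The distinction drawn in research advance 4 between a linear and a bell-shaped relationship can be illustrated with a minimal sketch. It uses purely synthetic values, not the thesis datasets, and the function names (`pearson`, `binned_means`, `peak_is_interior`) and the choice of five bins are assumptions made for illustration rather than the method used in the thesis. The sketch shows that a linear correlation coefficient can be close to zero for a bell-shaped relationship, while averaging methylation within expression-ranked bins reveals the rise and fall.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def binned_means(xs, ys, n_bins):
    """Sort genes by x and return the mean of y in each equal-sized bin.

    Any remainder left after integer division is dropped from the last bin.
    """
    pairs = sorted(zip(xs, ys))
    size = len(pairs) // n_bins
    return [
        mean([y for _, y in pairs[i * size:(i + 1) * size]])
        for i in range(n_bins)
    ]

def peak_is_interior(means):
    """True if the bin means rise strictly to an interior peak and then fall."""
    peak = means.index(max(means))
    rising = all(a < b for a, b in zip(means[:peak], means[1:peak + 1]))
    falling = all(a > b for a, b in zip(means[peak:], means[peak + 1:]))
    return 0 < peak < len(means) - 1 and rising and falling

# Synthetic illustration only: 100 hypothetical genes ranked by expression,
# with gene-body methylation highest at intermediate expression ranks.
expression = list(range(100))
methylation = [1 - ((x - 49.5) / 49.5) ** 2 for x in expression]

r = pearson(expression, methylation)
means = binned_means(expression, methylation, 5)

print(f"Pearson r = {r:.3f}")
print("Mean methylation per expression bin:", [round(m, 3) for m in means])
print("Bell-shaped (interior peak):", peak_is_interior(means))
```

On real data the bin means would be noisy, so a strict rise-and-fall test like `peak_is_interior` would likely need to be replaced by a statistical comparison between bins.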