Data and information related to the article published in PLoS Genetics (DOI:)
Extensive divergence of transcription factor binding in Drosophila embryos with highly conserved gene expression
Mathilde Paris1, Tommy Kaplan1,2, Xiao Yong Li1,3, Jacqueline E. Villalta2, Susan E. Lott1,4, Michael B. Eisen1,2,3
1 Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California, United States of America, 2 School of Computer Science and Engineering, The Hebrew University, Jerusalem, Israel, 3 Howard Hughes Medical Institute, University of California Berkeley, Berkeley, California, United States of America, 4 Department of Evolution and Ecology, University of California, Davis, California, United States of America.
Corresponding authors: MP (firstname.lastname@example.org) and MBE (email@example.com)
To better characterize how variation in regulatory sequences drives divergence in gene expression, we undertook a systematic study of transcription factor binding and gene expression in blastoderm embryos of four species that sample much of the diversity in the 40-million-year-old genus Drosophila: D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis. We compared gene expression, measured by mRNA-seq, to the genome-wide binding, measured by ChIP-seq, of four transcription factors involved in early anterior-posterior patterning. We found that mRNA levels are much better conserved than individual transcription factor binding events, and that changes in a gene's expression were poorly explained by changes in adjacent transcription factor binding. However, highly bound sites, sites in regions bound by multiple factors and sites near genes are conserved more frequently than other binding, suggesting that a considerable amount of transcription factor binding is weakly functional or non-functional and not subject to purifying selection.
- All sequencing data were deposited in NCBI GEO (accession number GSE50773).
- Alignment of the genomes of D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis. The synteny map was created using MERCATOR, and the alignments were generated using PECAN.
- Modified annotations for D. yakuba, D. pseudoobscura and D. virilis. These annotations were created using the reference annotations together with RNA-seq data from pooled embryos (GEO accession number GSE50773). Gene IDs correspond to the IDs in the orthology table described below.
- Expression levels, per species, of each set of orthologous genes shared by D. melanogaster, D. yakuba, D. pseudoobscura and D. virilis. Inferred ancestral values and the parameters of the Brownian motion model are also given. Orthology between genes was assigned from the whole-genome alignment: genes were considered orthologous if their exon coordinates overlapped by more than 40% of their total length and their orientation was the same (or unknown). Because this method is based on the genome alignment, it takes into account both sequence similarity and synteny, and thus favors orthologous over paralogous associations. Genes showing orthology inconsistencies (e.g. genes with different orthologs in different species) were removed from the analysis.
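The orthology criterion and the consistency filter described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the article: it assumes exon coordinates of both genes have already been projected into a shared coordinate system through the whole-genome alignment, it interprets "more than 40% of their total length" as requiring the overlap to exceed 40% of each gene's total exon length, and it treats an orthology inconsistency as a group of transitively linked genes that contains more than one gene from the same species. All names (`Gene`, `is_orthologous`, `ortholog_groups`, `consistent`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    gene_id: str
    strand: Optional[str]  # '+', '-', or None when orientation is unknown
    # Half-open [start, end) exon intervals, ASSUMED already projected
    # into a shared whole-genome-alignment coordinate system.
    exons: list[tuple[int, int]]


def total_length(exons: list[tuple[int, int]]) -> int:
    return sum(end - start for start, end in exons)


def overlap_length(a: list[tuple[int, int]], b: list[tuple[int, int]]) -> int:
    # Sum of pairwise exon intersections; assumes exons within one gene
    # do not overlap each other.
    return sum(max(0, min(e1, e2) - max(s1, s2)) for s1, e1 in a for s2, e2 in b)


def is_orthologous(g1: Gene, g2: Gene, min_fraction: float = 0.4) -> bool:
    # Orientation must match unless either strand is unknown.
    if g1.strand is not None and g2.strand is not None and g1.strand != g2.strand:
        return False
    ov = overlap_length(g1.exons, g2.exons)
    # ASSUMPTION: overlap must exceed the threshold for BOTH genes.
    return ov > min_fraction * total_length(g1.exons) and ov > min_fraction * total_length(g2.exons)


def ortholog_groups(pairs):
    """Union-find over pairwise ortholog calls; nodes are (species, gene_id) tuples."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return list(groups.values())


def consistent(group) -> bool:
    # A group is inconsistent if one species contributes more than one gene.
    species = [sp for sp, _ in group]
    return len(species) == len(set(species))
```

For example, with `mel = Gene("mel_g1", "+", [(100, 200), (300, 400)])` and `yak = Gene("yak_g1", "+", [(120, 210), (310, 380)])`, the exons overlap by 150 bp, which exceeds 40% of both 200 bp and 160 bp, so `is_orthologous(mel, yak)` is `True`. Grouping pairwise calls transitively is one way to catch cases where, say, a D. melanogaster gene and its D. yakuba ortholog point to two different D. pseudoobscura genes.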