Mouse Enhancer Screen Handbook and Methods
Purpose
The goal of this project is to identify distant-acting transcriptional enhancers in the human genome by coupling the identification of evolutionarily conserved noncoding sequences with a moderate-throughput mouse transgenesis enhancer assay.
Data Sets
There are two types of data found at this site. They are (1) an "Experimental Dataset" of conserved noncoding human sequences which have been tested for enhancer activity in transgenic mice and (2) a "Computational Dataset" of whole human genome conserved noncoding elements based on maximizing constraint in human-mouse-rat genome comparisons (Prabhakar et al. Genome Res. 2006;16:855-63). Details on each of these datasets and their availability in the website are further described below. The Experimental Dataset is also integrated into the Computational Dataset to facilitate the various search options described below.
(1) Experimental Dataset: This section contains all elements that have been tested for enhancer activity in transgenic mice. As of October 2006, approximately 300 elements can be found in this portion of the database and this number is expected to grow steadily.
Within the Experimental Dataset (accessible from the Enhancer Browser homepage), a list of tested enhancer elements is provided. The left hand column indicates the location of the tested elements in the human genome (based on build hg17, May 2004, UCSC). The second column describes the known genes flanking the defined element or, in the case of intronic elements, the gene within which it is located. The third column indicates if the element is a positive enhancer (blue mouse) or negative for enhancer activity at e11.5 (white mouse). Finally, the right-most column indicates the species displaying conservation with the human element. The term "ULTRA" refers to human-rodent ultraconserved elements, defined as >=200bp of perfect sequence identity between human/mouse/rat (Bejerano et al., Science 304:1321-5).
Clicking on the coordinates hyperlink (left column) takes the user to the existing data for that element. These drill-down pages list the following for each element:
- coordinates
- flanking genes
- description of the expression pattern in standardized anatomical terms
- medium-resolution images of representative transgenic embryos (click on an image to display the high-resolution version)
- FASTA sequence of the DNA element tested in the in vivo assay
- PCR primers used for cloning
- a UCSC/Vista Browser overview and clickable hyperlink
(2) Computational Dataset: Our primary goal is to facilitate the identification of distant-acting transcriptional enhancers in the human genome. Our approach has been to use human-mouse-rat sequence comparison to identify the top evolutionarily conserved regions in the genome. These conserved regions are filtered for any evidence of coding or exonic sequence, resulting in a set of ~170,000 noncoding sequences that are highly conserved between humans and rodents. Details on the methods can be found below, as well as in Prabhakar et al. (Genome Res 16:855-63).
Caveats
Experimental Dataset: Users should keep in mind that our assay captures only a single embryonic timepoint. A negative result reported in our experimental dataset does not necessarily imply that this conserved element is not a transcriptional enhancer, since it might be active at earlier or later timepoints in development.
Computational Dataset: We set a high threshold for the conservation score (p-value < 0.001) in order to identify only the most constrained genomic elements, and users should keep in mind that it is likely that weakly conserved regulatory elements will not be contained in this set. Moreover, we recommend that users manually examine conserved elements for evidence of function as a protein-coding or RNA gene, as mRNA and EST databases have expanded since the dataset was generated.
Searching the Enhancer Browser
The data can be accessed in several ways from the Enhancer Browser homepage (http://enhancer.lbl.gov):
Keyword Search: Searchable keywords include Gene Symbols, GenBank Accession Numbers, and Entrez Gene Numbers. By default, only the Experimental Dataset is searched. Go to the "Computational Dataset" section of the database to search the larger set of candidate enhancer regions. Since a significant proportion of genes in the human genome do not contain strong conservation in noncoding sequence, the user should not be surprised or discouraged by the lack of elements in our dataset near certain genes. Larger sets of conserved elements can be found on the UCSC (http://genome.ucsc.edu/) or VISTA (http://pipeline.lbl.gov/) Genome Browsers.
Coordinate Search: Both the Computational and Experimental Datasets can be searched using genomic coordinates (human genome, build hg17, May 2004) in "chr16:1-10000000" format. This query will generate a list of all elements in the specified genomic region.
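For users who post-process downloaded element lists, the coordinate format above is easy to handle in a script. The following is a minimal Python sketch, not the Enhancer Browser's own code: the names Region, parse_region and elements_in_region are hypothetical, coordinates are assumed to be 1-based and inclusive, and an element is assumed to match when it overlaps the query region at all (the site may instead require full containment).

```python
import re
from typing import List, NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches strings such as "chr16:1-10000000" (hg17 coordinates in the text).
_COORD = re.compile(r"^(chr[0-9A-Za-z_]+):(\d+)-(\d+)$")


def parse_region(text: str) -> Region:
    """Parse a 'chrN:start-end' string into a Region."""
    match = _COORD.match(text.strip())
    if match is None:
        raise ValueError(f"not in chrN:start-end format: {text!r}")
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start is greater than end: {text!r}")
    return Region(match.group(1), start, end)


def overlaps(a: Region, b: Region) -> bool:
    """True if two inclusive intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def elements_in_region(query: str, elements: List[Region]) -> List[Region]:
    """Return the elements overlapping the query (assumed matching rule)."""
    region = parse_region(query)
    return [element for element in elements if overlaps(region, element)]
```

Calling elements_in_region("chr16:1-10000000", elements) on a locally stored list of Region tuples returns only those on chr16 that fall within the first 10 Mb.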
Chromosome Search: Users can select a human chromosome to enter the "Computational Database" and the cataloged elements will appear in a browsable list based on their consecutive coordinates.
Expression Pattern Search: All elements that have been experimentally validated as transcriptional enhancers are annotated using a standardized anatomical vocabulary. These annotations can be searched using the Expression Pattern Search option. Checking more than one structure in this query form will retrieve all elements with reproducible expression in any of the selected structures.
Additional search options: Additional search options are available in the Advanced Query form. These include searches restricted to elements with a user-defined conservation depth (e.g. human-chicken, human-fugu) as well as the option to list elements that were negative for enhancer activity at e11.5.
Bulk downloads: The results of all queries can be downloaded as a bulk file in FASTA format using the "Download Data" function at the very bottom of the results table. The headers for each sequence indicate if an element was tested positive or negative and, in the case of positive enhancers, the reproducibility and tissue specificity. In order to download a complete list of all experimental results including positive and negative elements, display the entire Experimental Dataset and use this function.
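A downloaded FASTA file can be split into positive and negative elements with a few lines of Python. This is only a sketch under stated assumptions: the exact header layout of the download is not documented here, so the code simply assumes that the words "positive" or "negative" appear somewhere in each header line. The function names read_fasta and split_by_result are hypothetical, and the matching rule should be adjusted after inspecting a real file.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', concatenated sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from the lines of a FASTA file."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_result(records: Iterable[Record]) -> Dict[str, List[Record]]:
    """Group records by assay result.

    ASSUMPTION: the header contains the word 'positive' or 'negative';
    anything else is placed in 'unknown'.
    """
    groups: Dict[str, List[Record]] = {"positive": [], "negative": [], "unknown": []}
    for header, sequence in records:
        lowered = header.lower()
        if "positive" in lowered:
            key = "positive"
        elif "negative" in lowered:
            key = "negative"
        else:
            key = "unknown"
        groups[key].append((header, sequence))
    return groups
```

With a file on disk, the two steps combine as split_by_result(read_fasta(open(path))), where path points at the downloaded file.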
Materials and Methods
The basic flow-chart of events for this project is depicted in Figure 1.
Computational Analysis
We performed comparative analysis between the human genome and a wide range of available species (mouse, rat, chicken, frog, fugu, tetraodon and zebrafish) (http://pga.lbl.gov/gumby). We used PARAGON (Ahituv et al. Hum Mol Genet. 2005;14:3057-63) to identify regions of evolutionarily conserved synteny, MLAGAN (Brudno et al. Genome Res. 2003;13:721-31) to align syntenic blocks, and Gumby (Prabhakar et al. Genome Res. 2006;16:855-63) to identify evolutionarily conserved regions (p-value < 0.001). Since our goal is the identification of gene enhancer sequences, these comparative alignments were filtered to remove regions overlapping exons of known genes, mRNAs or spliced "Expressed Sequence Tags." In general, our primary focus is to study human-fugu conserved noncoding sequences, where the human version is tested in our transgenic enhancer assay. In addition, we have utilized other comparative genomic datasets of extreme conservation such as noncoding "Ultra-conserved Elements" (defined as >=200 bp and 100% identical between human/mouse/rat) (Bejerano et al. Science. 304(5675):1321-5). In total, this comparative analysis and exonic filtering resulted in the identification of ~3,100 human-mouse-fish conserved noncoding elements and 256 ultra-conserved elements in the human genome. As of October 2006, ~300 of these elements have been tested in transgenic mice and deposited in our Enhancer Browser Database (http://enhancer.lbl.gov).
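The two filtering steps described above (a conservation p-value cutoff followed by removal of regions with exonic evidence) can be illustrated with a short Python sketch. It is not the Gumby pipeline: the types Interval and ConservedRegion and the function noncoding_candidates are hypothetical, and the sketch assumes that any overlap with an exon discards the whole conserved region, whereas the actual pipeline may handle partial overlaps differently.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int


class ConservedRegion(NamedTuple):
    interval: Interval
    p_value: float  # conservation p-value, as reported by the scoring step


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def noncoding_candidates(
    regions: List[ConservedRegion],
    exons: List[Interval],
    max_p: float = 0.001,
) -> List[ConservedRegion]:
    """Keep regions with p-value < max_p that overlap no exon."""
    kept = []
    for region in regions:
        if region.p_value >= max_p:
            continue  # not conserved strongly enough
        if any(overlaps(region.interval, exon) for exon in exons):
            continue  # evidence of coding or exonic sequence
        kept.append(region)
    return kept
```

The linear scan over exons is fine for illustration; a genome-wide run would normally use a sorted index or interval tree instead.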
Summary Statistics of the Human Noncoding "Computational Dataset":
All elements: 171,853 (defined in Prabhakar et al. Genome Res. 2006, and available through http://pga.lbl.gov/gumby)
Conserved in chicken: 40,033
Conserved in frog: 14,568
Conserved in fugu: 3,124
Conserved in fugu, tetraodon or zebrafish: 5,668
We should emphasize that these lists are not static but will change as additional genomic data become available. We will also manually examine elements to be tested for potential missed or new coding sequence evidence or false-positive alignments. As is found in all genome-wide filtering and alignment strategies, we observe occasional sequences that do not meet our original criteria. We are also interested in providing outside investigators the opportunity to prioritize elements for transgenic testing through requests at our website. Our current pipeline processes batches of 48 or 96 elements for testing.
Shyam Prabhakar, Francis Poulin, Malak Shoukry, Veena Afzal, Edward M. Rubin, Olivier Couronne, Len A. Pennacchio. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Res. 2006;16:855-63.
Molecular Biology
Primer design
PCR primers are designed to amplify candidate human gene enhancers selected based on evolutionary conservation.
Primers are designed in bulk using a local installation of the program primer3. The program is fed a flat text file of repeat-masked conserved sequence plus 200-400 bp of flanking sequence on each end and uses a uniform Tm of 60 °C.
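As an illustration of how the primer-design template can be derived from an element's coordinates, here is a minimal Python sketch. The function template_window is hypothetical, coordinates are assumed to be 1-based and inclusive, and the default flank of 300 bp is simply a value inside the 200-400 bp range mentioned above, not a documented setting.

```python
from typing import Optional, Tuple


def template_window(
    start: int,
    end: int,
    flank: int = 300,
    chrom_length: Optional[int] = None,
) -> Tuple[int, int]:
    """Extend an element by `flank` bp on each side.

    The window is clipped to the start of the chromosome and, if
    `chrom_length` is given, to its end.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if flank < 0:
        raise ValueError("flank must be non-negative")
    window_start = max(1, start - flank)
    window_end = end + flank
    if chrom_length is not None:
        window_end = min(chrom_length, window_end)
    return window_start, window_end
```

For example, template_window(1000, 1500) returns (700, 1800); the sequence of that window would then be extracted, repeat-masked and passed to the primer-design program.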
Cloning
To increase the throughput of reporter construct generation, we use the Gateway Recombination System (Invitrogen). Primers are ordered in 96-well format, and forward oligos are 5'-tailed with the sequence (CACC) required for cloning into Gateway vectors. PCRs are performed on human genomic DNA (BD Biosciences); the products are sequenced, and successful products are cloned into the standard Gateway entry vector (pENTR/D-TOPO vector, Invitrogen) using the manufacturer-recommended protocol. With the Gateway system, shuffling these insert sequences to other vectors of choice is simple, and these vectors are available by request.
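Adding the CACC tail to a plate of forward primers before ordering is a simple string operation. The Python sketch below shows one way to do it; the function names tail_forward_primer and tail_plate are hypothetical, and any primer sequences passed in are the user's own.

```python
from typing import List, Tuple

GATEWAY_TAIL = "CACC"  # 5' tail required on forward oligos (from the text)


def _clean(primer: str) -> str:
    """Upper-case a primer and check that it contains only A, C, G, T."""
    sequence = primer.strip().upper()
    if not sequence or set(sequence) - set("ACGT"):
        raise ValueError(f"invalid primer sequence: {primer!r}")
    return sequence


def tail_forward_primer(primer: str) -> str:
    """Return the forward primer with the CACC tail added at its 5' end."""
    return GATEWAY_TAIL + _clean(primer)


def tail_plate(pairs: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Tail the forward primer of each (forward, reverse) pair.

    Reverse primers are only cleaned, not tailed.
    """
    return [(tail_forward_primer(fwd), _clean(rev)) for fwd, rev in pairs]
```

The tail is always prepended, even if a primer happens to begin with CACC already, because a genomic CACC at the start of the primer is part of the annealing sequence rather than a cloning tail.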
We have generated a Gateway compatible reporter vector that contains a Gateway cassette and a Hsp68 promoter coupled to the LacZ reporter gene. Each Entry clone is transferred into the destination reporter vector using the LR recombination reaction and vector PCR and/or restriction enzyme analysis are used to confirm successful insert transfer.
Microinjection into fertilized eggs
Plasmid DNA is linearized with XhoI or HindIII, followed by purification on Micropure EZ columns and Montage PCR filter units (Millipore). The DNA is diluted with injection buffer (10 mM Tris, pH 7.5; 0.1 mM EDTA) to a final concentration of 1.5 to 2 ng/µl and used for pronuclear injections of FVB embryos in accordance with standard protocols approved by the Lawrence Berkeley National Laboratory.
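The dilution to the injection concentration follows the usual relation C1 x V1 = C2 x V2. The Python sketch below works this out for a given stock; the function dilution_volumes is hypothetical, and the stock concentration and final volume in the example are arbitrary illustrative numbers, not values from the protocol.

```python
from typing import Tuple


def dilution_volumes(
    stock_ng_per_ul: float,
    target_ng_per_ul: float,
    final_volume_ul: float,
) -> Tuple[float, float]:
    """Return (DNA volume, buffer volume) in µl for a simple dilution.

    Uses C1 * V1 = C2 * V2, i.e. V1 = C2 * V2 / C1.
    """
    if stock_ng_per_ul <= 0 or target_ng_per_ul <= 0 or final_volume_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_ng_per_ul > stock_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    dna_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return dna_ul, final_volume_ul - dna_ul
```

For a hypothetical 50 ng/µl stock diluted to 2 ng/µl in 100 µl, dilution_volumes(50.0, 2.0, 100.0) gives 4 µl of DNA and 96 µl of injection buffer.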
Embryo harvesting and LacZ staining
Embryos are harvested at embryonic day 11.5 and dissected in cold PBS, followed by 30 min of incubation with 4% paraformaldehyde at 4 °C. The embryos are washed three times for 30 min with wash buffer (2 mM MgCl2; 0.01% deoxycholate; 0.02% NP-40; 100 mM phosphate buffer, pH 7.3). Embryos are stained for 24 h at room temperature with freshly made staining solution (0.8 mg/ml X-gal; 4 mM potassium ferrocyanide; 4 mM potassium ferricyanide; 20 mM Tris, pH 7.5 in wash buffer), rinsed three times in PBS and post-fixed in 4% paraformaldehyde. Yolk sacs are dissected from embryos and DNA is prepared by boiling the tissue for 20 min in 75 µl of solution 1 (25 mM NaOH; 0.2 mM EDTA), followed by neutralization with 75 µl of solution 2 (40 mM Tris-HCl). Yolk sac DNA is screened by PCR with LacZ primers (LacZ-F: 5'-TTTCCATGTTGCCACTCGC; LacZ-R: 5'-AACGGCTTGCCGTTCAGCA).
Annotation
Each positive developmental enhancer is annotated based on the observed spatial pattern of expression. These annotations are done by multiple curators in a group setting. To be defined as a POSITIVE enhancer, an element has to show reproducible expression in the same structure in at least three independent transgenic embryos. For each structure we provide the reproducibility of the observed pattern. Elements for which at least five transgenic embryos have been obtained, but for which no structure shows reproducible expression in at least three different embryos, are defined as NEGATIVE.
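The POSITIVE/NEGATIVE rule above can be written down as a small decision function. The Python sketch below is an illustration only, since the real annotation is done by curators: the function classify_element is hypothetical, the structure names in the example are made up, and the label INCONCLUSIVE for elements with fewer than five embryos and no reproducible pattern is an assumption that does not appear in the database.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def classify_element(
    embryo_structures: List[Iterable[str]],
    min_reproducible: int = 3,
    min_embryos_for_negative: int = 5,
) -> Tuple[str, Dict[str, int]]:
    """Classify one element from per-embryo staining annotations.

    `embryo_structures` holds one collection of stained structure names
    per transgenic embryo (empty if the embryo shows no staining).
    Returns the call and, for each reproducible structure, the number
    of embryos showing it.
    """
    counts: Counter = Counter()
    for structures in embryo_structures:
        counts.update(set(structures))  # count each structure once per embryo
    reproducible = {
        name: n for name, n in counts.items() if n >= min_reproducible
    }
    if reproducible:
        return "POSITIVE", reproducible
    if len(embryo_structures) >= min_embryos_for_negative:
        return "NEGATIVE", {}
    return "INCONCLUSIVE", {}  # assumed label: too few embryos to call
```

For example, four embryos of which three stain in the same (hypothetical) structure "forebrain" yield ("POSITIVE", {"forebrain": 3}), while five embryos with no shared staining yield ("NEGATIVE", {}).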
All embryos are preserved and, upon request, can be provided to outside investigators for their own examination and interpretation.
References
Pennacchio, L. A., Ahituv, N., Moses, A., Prabhakar, S., Nobrega, M., Shoukry, M., Minovitsky, S., Dubchak, I., Holt, A., Lewis, K., Plajzer-Frick, I., Akiyama, J., DeVal, S., Afzal, V., Black, B., Couronne, O., Eisen, M., Visel, A., and Rubin, E. M. 2006. In vivo enhancer analysis of human conserved non-coding sequences. Nature, 444(7118):499-502.
Poulin, F., Nobrega, M., Plajzer-Frick, I., Holt, A., Afzal, V., Rubin, E. M., and Pennacchio, L. A. 2005. In vivo characterization of a vertebrate ultra-conserved enhancer. Genomics, 85(6):774-781.
Prabhakar, S., Poulin, F., Shoukry, M., Afzal, V., Rubin, E. M., Couronne, O., and Pennacchio, L. A. 2006. Close sequence comparisons are sufficient to identify human cis-regulatory elements. Genome Research, 16(7):855-63.
Visel, A., Minovitsky, S., Dubchak, I., and Pennacchio, L. A. 2007. VISTA Enhancer Browser-A Database of Tissue-Specific Human Enhancers. Nucleic Acids Research, 2006 Nov 27; [Epub ahead of print].