zCNE

zCNE Resource

Methods

This CNE set relies on a multiple sequence alignment of zebrafish to tetrapod and fish species, using the following species (assemblies): fugu (fr3), medaka (oryLat2), stickleback (gasAcu1), tetraodon (tetNig2), lamprey (petMar1), cow (bosTau4), dog (canFam2), horse (equCab2), chicken (galGal3), human (hg19), elephant (loxAfr3), mouse (mm9), opossum (monDom5), platypus (ornAna1) and frog (xenTro2). Zebrafish is the reference species in the syntenic multiple genome alignment. Each CNE is at least 50 bp long and conserved in at least two species, with at least 65% sequence identity and an alignment entropy of >= 1.8 bits. To also make use of species without assembled genomes, we ran a sensitive last [1] screen against DNA sequences in the NCBI trace archive and GenBank, and added CNEs that align well to only one species in the genome alignment and to at least one other non-tetrapod vertebrate with 75% identity. Please see our manuscript for further details.
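As a rough illustration of the length, species-count and identity thresholds described above, the following Python sketch checks a candidate element against them. It is not the pipeline used to build the zCNE set: the function names are hypothetical, identity is assumed to be the fraction of alignment columns with matching bases (gaps count as mismatches), and the alignment entropy filter is left out because its exact definition is given only in the manuscript.

```python
def percent_identity(ref_row: str, other_row: str) -> float:
    """Percent of alignment columns where both rows carry the same base.

    Assumption: gap columns ('-') count as mismatches.
    """
    if len(ref_row) != len(other_row):
        raise ValueError("aligned rows must have equal length")
    if not ref_row:
        return 0.0
    matches = sum(
        1
        for a, b in zip(ref_row.upper(), other_row.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(ref_row)


def passes_thresholds(ref_row, other_rows, min_len=50, min_identity=65.0,
                      min_species=2):
    """Apply the >= 50 bp, >= 65% identity, >= 2 species criteria."""
    # Length is measured on the reference (zebrafish) row without gaps.
    ref_len = sum(1 for base in ref_row if base != "-")
    if ref_len < min_len:
        return False
    supporting = sum(
        1 for row in other_rows
        if percent_identity(ref_row, row) >= min_identity
    )
    return supporting >= min_species
```

For example, a 60 bp reference row with two identical aligned rows passes, while the same row supported by only one well-aligning species does not.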

We annotate human/mouse conservation if a zCNE overlaps a window that aligns well to human/mouse by at least 15 bp. To annotate human and mouse ancestry more comprehensively, we also used ancestral sequence reconstruction and transitivity, in addition to directly aligning sequences, to identify homologies not detected in the multiple alignment.
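The 15 bp overlap rule can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and intervals are assumed to be half-open (start, end) pairs on the same zebrafish chromosome.

```python
def overlap_bp(start_a, end_a, start_b, end_b):
    """Number of bases shared by two half-open intervals [start, end)."""
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def is_conserved(zcne, windows, min_overlap=15):
    """True if the zCNE overlaps any well-aligning window by >= min_overlap bp."""
    start, end = zcne
    return any(
        overlap_bp(start, end, w_start, w_end) >= min_overlap
        for w_start, w_end in windows
    )
```

With these definitions, a zCNE at (100, 200) is annotated as conserved by a window at (185, 300), which shares 15 bp, but not by a window at (190, 300), which shares only 10 bp.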

Ancestral reconstruction

Computationally reconstructing the likely sequence of ancestors can be used to reduce the large evolutionary distances between zebrafish and human/mouse and to uncover homologies. We used prequel (Phast package, [2]) for ancestral reconstruction and lastz [3] to align ancestral sequences.

Transitivity

Transitivity means that if a sequence is orthologous between species A and B as well as between B and C, then it can be inferred that this sequence is also orthologous between species A and C, even if the alignment between A and C was not directly detected. This is equivalent to using species B as the reference species.
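The inference can be sketched in Python as the composition of two mappings. This is a minimal illustration, not the method applied to the zCNE set: the function name and the dictionary representation of orthology (element in A mapped to element in B, and element in B mapped to element in C) are assumptions made for the example.

```python
def infer_transitive(a_to_b, b_to_c, a_to_c_direct=None):
    """Infer A-to-C orthology through species B.

    Only elements of A without a directly detected A-to-C alignment
    are reported, since those are the ones transitivity adds.
    """
    direct = a_to_c_direct or {}
    inferred = {}
    for a_elem, b_elem in a_to_b.items():
        if a_elem in direct:
            continue  # already found by direct alignment
        if b_elem in b_to_c:
            inferred[a_elem] = b_to_c[b_elem]
    return inferred
```

Here an element of A that maps to B, where that element of B in turn maps to C, gains an inferred A-to-C relationship even though no direct alignment was found.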

Output

UCSC Custom Tracks

All of the zCNEs will be loaded into the UCSC Genome Browser as the custom track "Bej zCNEv1" for Zebrafish Assembly: Wellcome Trust Zv9 (danRer7, Jul/2010). Each zCNE is colored either blue or red; the red elements are those with evidence of conservation to mouse or human. Make sure you click the "Outside Link" in the track details to get more information about each CNE of interest. Example: the details for zCNEv1_47264.

GREAT Results

Why submit the zCNEs to GREAT: Genomic Regions Enrichment of Annotations Tool?

GREAT calculates statistical enrichments for associations between genomic regions and the functional annotations of flanking genes. This allows GREAT to generate hypotheses about the regulatory functions of the set of genomic regions. Such hypotheses can be tested by directed zebrafish experiments to reveal insights into vertebrate biology.

How are the distances to the genes defined?

The zCNE resource defines distances to the gene TSS in the same way as GREAT defines them.

Why are some of the p-values I see in GREAT 0.0000?

Strong clustering of zCNEs around many key genes causes the p-value for the observation to be smaller than the smallest number the computer can represent, so the value becomes 0. These terms can be considered very significant.
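The effect can be reproduced with ordinary double-precision arithmetic. The Python sketch below uses a hypothetical log p-value chosen for illustration, not a value taken from GREAT, and shows that the probability underflows to exactly 0 while its logarithm remains representable.

```python
import math


def underflow_demo(log_p):
    """Return (p, log10 p) for a natural-log p-value.

    For very negative log_p, math.exp underflows to 0.0 in double
    precision, although log10(p) can still be reported.
    """
    p = math.exp(log_p)
    log10_p = log_p / math.log(10)
    return p, log10_p


# Hypothetical natural-log p-value for an extremely enriched term.
p, log10_p = underflow_demo(-800.0)
```

Here p is exactly 0.0, while log10_p is about -347.4, so the term is still extremely significant even though the displayed p-value is 0.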

Filter Data

zCNEs by Target Gene

Enter a target gene of interest to download, or view in the UCSC Genome Browser, all CNEs that putatively regulate that gene. Putative target genes are called using the "basal plus extension" GREAT regulatory domains.

How do you associate zebrafish genes with zCNEs?

Zebrafish genes are associated with zCNEs using GREAT regulatory domains. Please see GREAT genes and GREAT association rules for more information on how the gene set is created and then associated with regions.

zCNEs by Region

Enter the genome coordinates of the locus of interest in Wellcome Trust Zv9 (danRer7, Jul/2010) coordinates to download, or view in the UCSC Genome Browser, all CNEs that are in that locus.

What if a zCNE is partially in, partially out of the specified region?

To avoid confusion, we provide the BED coordinates or FASTA sequence of the full zCNE even if the zCNE only overlaps the specified region by 1 base.
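This behavior can be sketched in Python as follows. The function name and the record layout, (chrom, start, end, name) tuples with half-open coordinates, are assumptions made for the example and do not describe the website's implementation.

```python
def zcnes_in_region(zcnes, chrom, region_start, region_end):
    """Return every zCNE that shares at least 1 base with the region.

    Records are returned unmodified, so the full zCNE coordinates are
    reported even when only part of the element lies inside the region.
    """
    return [
        rec for rec in zcnes
        if rec[0] == chrom and rec[1] < region_end and rec[2] > region_start
    ]
```

For a query region chr1:1000-2000, a zCNE at chr1:950-1001 overlaps by a single base and is returned with its full 950-1001 coordinates, not clipped to the region.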

Data Formats

The zCNE interface provides data for download in either BED9 or FASTA format.
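A downloaded BED9 file can be read with a few lines of Python. The sketch below assumes the standard UCSC BED9 column order (chrom, chromStart, chromEnd, name, score, strand, thickStart, thickEnd, itemRgb) and tab-separated fields; the class name, function name and example line are hypothetical and are not taken from the actual zCNE download.

```python
from typing import NamedTuple, Tuple


class Bed9(NamedTuple):
    chrom: str
    chrom_start: int  # 0-based, inclusive
    chrom_end: int    # 0-based, exclusive
    name: str
    score: int
    strand: str
    thick_start: int
    thick_end: int
    item_rgb: Tuple[int, ...]


def parse_bed9(line: str) -> Bed9:
    """Parse one tab-separated BED9 line into a typed record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, found {len(fields)}")
    rgb = tuple(int(v) for v in fields[8].split(","))
    return Bed9(fields[0], int(fields[1]), int(fields[2]), fields[3],
                int(fields[4]), fields[5], int(fields[6]), int(fields[7]),
                rgb)


# Hypothetical example line, not an actual zCNE record.
record = parse_bed9("chr1\t1000\t1080\texample_cne\t0\t.\t1000\t1080\t255,0,0")
element_length = record.chrom_end - record.chrom_start
```

Because BED coordinates are 0-based and half-open, the element length is simply chrom_end minus chrom_start.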

Troubleshooting

When I click over to the human or mouse genome, why do I sometimes not see an alignment to zebrafish even if the zCNE indicates it is conserved to mouse or human?

We use more sensitive methods, such as transitivity and ancestral reconstruction, to detect conservation than the public UCSC alignment does. Please see the Methods section above and our manuscript for details on how we improve our sensitivity.

Why might some links be broken?

We occasionally link to external resources such as ZFIN or Ensembl for more details. When external resources change the format of their URLs, our links break. Please report broken links so that we can fix them.

Credits

These data were generated by the Bejerano Lab.
Michael Hiller and Saatvik Agarwal generated the pairwise and multiple alignments and the CNE set, Jim Notwell applied transitivity, Michael Hiller applied ancestral reconstruction, and Aaron Wenger contributed tools and data analysis. Ravi Parikh and Harendra Guturu built the website.

Cite Us

If you use the zCNEs in your work, please cite:

  • Michael Hiller, Saatvik Agarwal, Jim H. Notwell, Ravi Parikh, Harendra Guturu, Aaron M. Wenger, Gill Bejerano. "Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish". Nucleic Acids Res., 2013. PMID 23814184

References


[2] Hubisz MJ, Pollard KS & Siepel A. PHAST and RPHAST: phylogenetic analysis with space/time models. Brief Bioinform 12, 41-51 (2011).

[3] Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis, The Pennsylvania State University (2007).