Motivation: Clusters of extremely conserved non-coding elements (CNEs) mark genomic regions
devoted to cis-regulation of key developmental genes in Metazoa. We have recently shown that their
span coincides with that of topologically associating domains (TADs), making them useful for estimating
conserved TAD boundaries in the absence of Hi-C data. The standard approach - detecting CNEs
in genome alignments and then establishing the boundaries of their clusters - requires tuning of several
parameters and breaks down when comparing closely related genomes.
Results: We present a novel, kurtosis-based measure of pairwise non-coding conservation that requires
no pre-set thresholds for conservation level and length of CNEs. We show that it performs robustly
across a large span of evolutionary distances, including across the closely related genomes of
primates for which standard approaches fail. The method is straightforward to implement and enables
detection and comparison of clusters of CNEs and estimation of underlying TADs across a vastly
increased range of Metazoan genomes.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
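As a rough illustration of what a binned, kurtosis-based conservation score might look like, the following Python sketch computes the excess kurtosis of conservation scores in consecutive genomic bins. It is a minimal sketch under stated assumptions, not the implementation used in this work: the input is assumed to be a list of per-position (or per-window) pairwise identity scores, the bin size is arbitrary, and the helper names `excess_kurtosis` and `binned_kurtosis` are hypothetical.

```python
from typing import List, Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis: fourth standardised moment minus 3."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        return float("nan")  # kurtosis is undefined for a constant bin
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / (m2 * m2) - 3.0


def binned_kurtosis(scores: Sequence[float], bin_size: int) -> List[float]:
    """Excess kurtosis of the scores in each consecutive, non-overlapping bin.

    `scores` is assumed (for illustration only) to hold one pairwise
    identity value per position or window; a trailing partial bin is dropped.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return [
        excess_kurtosis(scores[start:start + bin_size])
        for start in range(0, len(scores) - bin_size + 1, bin_size)
    ]


# Toy input: the first bin alternates evenly between low and high identity,
# the second bin is uniformly low apart from one highly conserved position.
toy_scores = [0.0, 1.0] * 5 + [0.0] * 9 + [1.0]
toy_kurtosis = binned_kurtosis(toy_scores, bin_size=10)
```

In this toy input the second bin, where a single conserved position stands out against a uniform background, receives a higher kurtosis than the evenly mixed first bin. This heavy-tail behaviour is the property that lets a kurtosis-type score work without a fixed identity or length threshold.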
3.3 Kurtosis-based GRB Identification in Moderately to Distantly Related Species
In the past, GRB identification has succeeded for moderate to distant evolutionary comparisons
because the CNE density across the genome forms discrete peaks that are easily distinguished from
the genomic background (Engström et al. 2007; Kikuta et al. 2007; Akalin et al. 2009; Harmston et al.
2017). To test how well the kurtosis-based measure of conservation can discriminate highly conserved
regions of the genome from non-conserved regions, we used binned kurtosis values to identify GRBs
from human to moderately and distantly related species which have previously been used for
CNE-based GRB prediction (Harmston et al. 2017). The number and size of GRBs identified for each
comparison are presented in Table 1. The number of GRBs identi-

clearly do not coincide with TADs, it is possible that at greater evolutionary distances, the
kurtosis-based conservation measure is only identifying the core, highly conserved regions of each
GRB, and thus underestimating their true extent – possibly because of some turnover of the boundary
positions themselves. Based on the concordance between the

al. 2017). Since kurtosis-based GRB prediction is also sequence-based, it is not immune to this
problem. For the human-gorilla GRBs a similar issue is visible. The largest third of GRBs display no
visible funnel in the DI heatmaps; however, there is a noisy funnel visible in the rest of the GRBs.
Overall, these results suggest that kurtosis-based conservation can identify signatures of non-coding
conservation in very closely related species, but that GRB boundary prediction becomes less precise
in the most closely related comparisons.
Next, we compared our kurtosis-based GRBs to GRBs identified using the CNE-based approach
described in Harmston et al. 2017. The CNE-based GRB prediction yielded 744 human to rhesus GRBs
with a mean width of 482.9 kb and 2220 human to gorilla GRBs with a mean width of 504.4 kb. The
number of GRBs identified in human-rhesus is greater than for the other species comparisons used so
far, but not exceedingly so. For the human-gorilla comparison, however, there were a very large
number of predicted GRBs. In Figure 3C, the average Hi-C DI is plotted across the predicted GRBs
from both sets. We can clearly see that, for the human-rhesus comparison, the kurtosis-based GRBs
have a stronger peak of the positive and negative DI (at their starts and ends respectively) than the
CNE-based GRBs. There is also a much sharper boundary effect in the kurtosis-based GRBs, with the
DI signal spreading well beyond the boundaries of the CNE-based GRBs. In the human-gorilla
comparison the kurtosis-based GRB boundaries also coincide with peaks in the positive and negative
DI, while the CNE-based GRBs show no enrichment of DI score at either boundary.

Acknowledgements
We thank Dr Ge Tan for generating a number of the CNE datasets used in this analysis, and Dr Nathan
Harmston for processing the Hi-C data. We are also grateful to Dr Leonie Roos, Dr Anja Baresic,
Dr Sasha Murrell and Dr Ben Murrell for
4 Discussion
