Papers by Giovanna Ambrosini
Computing in High Energy Physics '95, 1996
To configure the RD13 data acquisition system, we need many parameters which describe the various hardware and software components. Such information has been defined using an entity-relation model and stored in a commercial memory-resident database. During the last year, Itasca, an object-oriented database management system (OODB), was chosen as a replacement database system. We have ported the existing databases (hw and sw configurations, run parameters etc.) to Itasca and integrated it with the run control system. We believe that it is possible to use an OODB in real-time environments such as DAQ systems. In this paper, we present our experience and impressions: why we wanted to change from an entity-relation approach, some useful features of Itasca, the issues we met during this project, including integration of the database into an existing distributed environment, and factors which influence performance.
BMC Genomics, Apr 7, 2011
Background: Multiplex experimental assays coupled to computational predictions are being increasingly employed for the simultaneous analysis of many specimens at the genome scale, which quickly generates very large amounts of data. However, inferring valuable biological information from the comparisons of very large genomic datasets still represents an enormous challenge. Results: As a study model, we chose the NFI/CTF family of mammalian transcription factors and we compared the results obtained from a genome-wide study of its binding sites with chromatin structure assays, gene expression microarray data, and in silico binding site predictions. We found that NFI/CTF family members preferentially bind their DNA target sites when they are located around transcription start sites when compared to control datasets generated from the random subsampling of the complete set of NFI binding sites. NFI proteins preferably associate with the upstream regions of genes that are highly expressed and that are enriched in active chromatin modifications such as H3K4me3 and H3K36me3. We postulate that this is a causal association and that NFI proteins mainly act as activators of transcription. This was documented for one member of the family (NFI-C), which proved to be a more potent gene activator than repressor in global gene expression analysis. Interestingly, we also discovered the association of NFI with the tri-methylation of lysine 9 of histone H3, a chromatin marker previously associated with the protection against silencing of telomeric genes by NFI. Conclusion: Taken together, we illustrate approaches that can be taken to analyze large genomic data, and provide evidence that NFI family members may act in conjunction with specific chromatin modifications to activate gene expression.
Nature Methods, Jan 16, 2017
Resolving the DNA-binding specificities of transcription factors (TFs) is of critical value for understanding gene regulation. Here, we present a novel, semiautomated protein-DNA interaction characterization technology, selective microfluidics-based ligand enrichment followed by sequencing (SMiLE-seq). SMiLE-seq is neither limited by DNA bait length nor biased toward strong affinity binders; it probes the DNA-binding properties of TFs over a wide affinity range in a fast and cost-effective fashion. We validated SMiLE-seq by analyzing 58 full-length human, mouse, and Drosophila TFs from distinct structural classes. All tested TFs yielded DNA-binding models with predictive power comparable to or greater than that of other in vitro assays. De novo motif discovery on all Jun-Fos heterodimers and several nuclear receptor-TF complexes provided novel insights into partner-specific heterodimer DNA-binding preferences. We also successfully analyzed the DNA-binding properties of uncharacterized human C2H2 zinc-finger proteins and validated several using ChIP-exo.
Nucleic Acids Research, Nov 28, 2016
SNP2TFBS is a computational resource intended to support researchers investigating the molecular mechanisms underlying regulatory variation in the human genome. The database essentially consists of a collection of text files providing specific annotations for human single nucleotide polymorphisms (SNPs), namely whether they are predicted to abolish, create or change the affinity of one or several transcription factor (TF) binding sites. A SNP's effect on TF binding is estimated based on a position weight matrix (PWM) model for the binding specificity of the corresponding factor. These data files are regenerated at regular intervals by an automatic procedure that takes as input a reference genome, a comprehensive SNP catalogue and a collection of PWMs. SNP2TFBS is also accessible over a web interface, enabling users to view the information provided for an individual SNP, to extract SNPs based on various search criteria, to annotate uploaded sets of SNPs or to display statistics about the frequencies of binding sites affected by selected SNPs. Homepage: http://ccg.vital-it.ch/snp2tfbs/.
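As an illustration of the PWM-based estimate mentioned in this abstract, the following is a minimal sketch of how a SNP might be labelled as abolishing, creating or changing a binding site. It is not the SNP2TFBS procedure: the count matrix, the function names (`log_odds_pwm`, `best_score`, `snp_effect`), the uniform background, the pseudocount and the score threshold are all assumptions chosen for the example, and only one DNA strand is scanned.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
COUNTS = [
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert base counts to log2 odds against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def best_score(pwm, seq, snp_index):
    """Best PWM score among all windows of seq that overlap snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    best = -math.inf
    for start in range(first, last + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        best = max(best, score)
    return best


def snp_effect(pwm, ref_seq, snp_index, alt_base, threshold):
    """Label a SNP by comparing reference and alternate allele scores."""
    alt_seq = ref_seq[:snp_index] + alt_base + ref_seq[snp_index + 1:]
    ref = best_score(pwm, ref_seq, snp_index)
    alt = best_score(pwm, alt_seq, snp_index)
    ref_hit, alt_hit = ref >= threshold, alt >= threshold
    if ref_hit and not alt_hit:
        label = "abolished"
    elif alt_hit and not ref_hit:
        label = "created"
    elif ref_hit and alt_hit and ref != alt:
        label = "changed"
    else:
        label = "none"
    return label, ref, alt


if __name__ == "__main__":
    pwm = log_odds_pwm(COUNTS)
    # The threshold of 5.0 is arbitrary and only serves this toy motif.
    print(snp_effect(pwm, "TTACGTTT", 3, "A", threshold=5.0))
```

A fuller version would also scan the reverse complement and calibrate the threshold per matrix; both are omitted here to keep the sketch short.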
Protocol exchange, Jan 17, 2017
Selective Microfluidics-based Ligand Enrichment followed by sequencing (SMiLE-seq) is a rapid, semiautomated method aimed at resolving the DNA binding specificities of full-length transcription factors (TFs). The core of SMiLE-seq is a cross talk-devoid microfluidic platform that performs selection of DNA that is specifically bound to TFs from a pool of randomized DNA. Coupled to high-throughput sequencing, this platform allows the characterization of TF DNA binding preferences at an unprecedented resolution in just a single day. Unlike other, already established in vitro technologies that also aim to determine TF binding specificities, SMiLE-seq operates at micro scale and requires minute amounts of biological material. Moreover, it produces specificity models that characterize even low-affinity and transient molecular interactions and whose predictive power is equal or superior to that of previously reported motifs. Finally, SMiLE-seq enables motif detection for monomers, homodimers, as well as heterodimers. SMiLE-seq should therefore prove highly valuable in deriving unbiased quantitative specificity models for single and dimeric, full-length TFs.
Nature Communications
More than 70% of human breast cancers (BCs) are estrogen receptor α-positive (ER+). A clinical challenge of ER+ BC is that they can recur decades after initial treatments. Mechanisms governing latent disease remain elusive due to lack of adequate in vivo models. We compare intraductal xenografts of ER+ and triple-negative (TN) BC cells and demonstrate that disseminated TNBC cells proliferate similarly as TNBC cells at the primary site whereas disseminated ER+ BC cells proliferate slower, they decrease CDH1 and increase ZEB1,2 expressions, and exhibit characteristics of epithelial-mesenchymal plasticity (EMP) and dormancy. Forced E-cadherin expression overcomes ER+ BC dormancy. Cytokine signalings are enriched in more active versus inactive disseminated tumour cells, suggesting microenvironmental triggers for awakening. We conclude that intraductal xenografts model ER+ BC dormancy and reveal that EMP is essential for the generation of a dormant cell state and that targeting exit from EMP has ...
bioRxiv (Cold Spring Harbor Laboratory), May 1, 2022
Background: Pioneering research has shown that high-throughput epigenomics assays such as ChIP-seq and ATAC-seq are applicable to patient-derived breast tumor samples. A host of public data has been accumulated since then, which are potentially of high value for basic research as well as personalized medicine. Such data sets constitute encyclopedias of biological knowledge. However, their impact has so far been limited by access obstacles, especially with regard to extraction and visualization of small portions of data that could potentially answer specific questions arising in a research context. Results: We developed the breast cancer epigenomics track hub (BC hub), a resource intended to make it easy for occasional users to find, access and view data of their interest. The BC hub harbors ChIP-seq, ATAC-seq and copy number data from breast tumors, normal breast cells, patient-derived xenografts and breast cancer cell lines in a genome browsable track format. The tracks can be accessed via hyperlinks that automatically configure customized views for different interest groups. Here, we present a detailed description of the resource and informative use cases illustrating its potential in answering specific biological questions. Conclusions: We show that track hubs constitute a powerful way of bringing epigenomics data to the user who could benefit from them. The examples presented highlight the added value of joint visualization of breast cancer data from different sources. The proof-of-concept provided here exemplifies and underscores the importance of efforts to make biological data FAIR (findable, accessible, interoperable and reusable), and may serve as an encouragement of similar bottom-up initiatives in other research fields. The BC hub is freely accessible at https://bchub.epfl.ch.
Nucleic Acids Research, Nov 6, 2014
We present an update of EPDnew (http://epd.vital-it.ch), a recently introduced new part of the Eukaryotic Promoter Database (EPD) which has been described in more detail in a previous NAR Database Issue. EPD is an old database of experimentally characterized eukaryotic POL II promoters, which are conceptually defined as transcription initiation sites or regions. EPDnew is a collection of automatically compiled, organism-specific promoter lists complementing the old corpus of manually compiled promoter entries of EPD. This new part is exclusively derived from next generation sequencing data from high-throughput promoter mapping experiments. We report on the recent growth of EPDnew, its extension to additional model organisms and its improved integration with other bioinformatics resources developed by our group, in particular the Signal Search Analysis and ChIP-Seq web servers.
Nucleic Acids Research, Nov 4, 2019
The Eukaryotic Promoter Database (EPD), available online at https://epd.epfl.ch, provides accurate transcription start site (TSS) information for promoters of 15 model organisms plus corresponding functional genomics data that can be viewed in a genome browser, queried or analyzed via web interfaces, or exported in standard formats (FASTA, BED, CSV) for subsequent analysis with other tools. Recent work has focused on the improvement of the EPD promoter viewers, which use the UCSC Genome Browser as visualization platform. Thousands of high-resolution tracks for CAGE, ChIP-seq and similar data have been generated and organized into public track hubs. Customized, reproducible promoter views, combining EPD-supplied tracks with native UCSC Genome Browser tracks, can be accessed from the organism summary pages or from individual promoter entries. Moreover, thanks to recent improvements and stabilization of ncRNA gene catalogs, we were able to release promoter collections for certain classes of ncRNAs from human and mouse. Furthermore, we developed automatic computational protocols to assign orphan TSS peaks to downstream genes based on paired-end (RAMPAGE) TSS mapping data, which enabled us to add nearly 9000 new entries to the human promoter collection. Since our last article in this journal, EPD was extended to five more model organisms: rhesus monkey, rat, dog, chicken and Plasmodium falciparum.
Nucleic Acids Research, Nov 26, 2012
The Eukaryotic Promoter Database (EPD), available online at http://epd.vital-it.ch, is a collection of experimentally defined eukaryotic POL II promoters which has been maintained for more than 25 years. A promoter is represented by a single position in the genome, typically the major transcription start site (TSS). EPD primarily serves biologists interested in analysing the motif content, chromatin structure or DNA methylation status of co-regulated promoter subsets. Initially, promoter evidence came from TSS mapping experiments targeted at single genes and published in journal articles. Today, the TSS positions provided by EPD are inferred from next-generation sequencing data distributed in electronic form. Traditionally, EPD has been a high-quality database with low coverage. The focus of recent efforts has been to reach complete gene coverage for important model organisms. To this end, we introduced a new section called EPDnew, which is automatically assembled from multiple, carefully selected input datasets. As another novelty, we started to use chromatin signatures in addition to mRNA 5′ tags to locate promoters of weakly expressed genes. Regarding user interfaces, we introduced a new promoter viewer which enables users to explore promoter-defining experimental evidence in a UCSC genome browser window.
bioRxiv (Cold Spring Harbor Laboratory), Jul 15, 2016
The recruitment of RNA-Pol-II to the transcription start site (TSS) is an important step in gene regulation in all organisms. Core promoter elements (CPE) are conserved sequence motifs that guide Pol-II to the TSS by interacting with specific transcription factors (TFs). However, only a minority of animal promoters contains CPEs. It is still unknown how Pol-II selects the TSS in their absence. Here we present a comparative analysis of promoters' sequence composition and chromatin architecture in five eukaryotic model organisms, which shows the presence of common and unique DNA-encoded features used to organize chromatin. Analysis of Pol-II initiation patterns uncovers that, in the absence of certain CPEs, there is a strong correlation between the spread of initiation and the intensity of the 10 bp periodic signal in the nearest downstream nucleosome. Moreover, promoters' primary and secondary initiation sites show a characteristic 10 bp periodicity in the absence of CPEs. We also show that DNA natural variants in the region immediately downstream of the TSS are able to affect both the nucleosome-DNA affinity and Pol-II initiation pattern. These findings support the notion that, in addition to CPE-mediated selection, sequence-induced nucleosome positioning could be a common and conserved mechanism of TSS selection in animals.
IWBBIO, 2014
The DNA sequence determinants which direct RNA Pol-II to the correct transcription start site (TSS) are only partly understood. Conserved DNA motifs (core promoter elements, CPEs) or a conserved nucleosome architecture may play a role in TSS selection. A complicating factor is that promoters are quite variable in many respects. Some have very focused while others have highly dispersed initiation site patterns. Promoters also differ by the presence or absence of CPEs. Here we show that promoters without CPEs have a strong sequence-intrinsic nucleosome-positioning signal in the +1 nucleosome region, in both vertebrates and flies. The strength of the signal is inversely proportional to the degree of TSS dispersion. Interestingly, the nucleosome-positioning signal is completely absent in CPE-containing promoters. Together, these findings suggest that transcription factor binding to CPEs and DNA sequence-induced nucleosome positioning are two mutually exclusive pathways of Pol-II recruitment to TSSs in eukaryotic promoters.
Nucleic Acids Research, Nov 28, 2016
We present an update of the Eukaryotic Promoter Database EPD (http://epd.vital-it.ch), more specifically on the EPDnew division, which contains comprehensive organism-specific transcription start site (TSS) collections automatically derived from next generation sequencing (NGS) data. Thanks to the abundant release of new high-throughput transcript mapping data (CAGE, TSS-seq, GRO-cap) the database could be extended to plant and fungal species. We further report on the expansion of the mass genome annotation (MGA) repository containing promoter-relevant chromatin profiling data and on improvements for the EPD entry viewers. Finally, we present a new data access tool, ChIP-Extract, which enables computational biologists to extract diverse types of promoter-associated data in numerical table formats that are readily imported into statistical analysis platforms such as R.
Genomics and computational biology, Sep 18, 2015
Nucleic Acids Research, Oct 24, 2017
The Mass Genome Annotation (MGA) repository is a resource designed to store published next generation sequencing data and other genome annotation data (such as gene start sites, SNPs, etc.) in a completely standardised format. Each sample has undergone local processing in order to meet the strict MGA format requirements. The original data source, the reformatting procedure and the biological characteristics of the samples are described in an accompanying documentation file manually edited by data curators. 10 model organisms are currently represented: Homo sapiens, Mus musculus, Danio rerio, Drosophila melanogaster, Apis mellifera, Caenorhabditis elegans, Arabidopsis thaliana, Zea mays, Saccharomyces cerevisiae and Schizosaccharomyces pombe. As of today, the resource contains over 24 000 samples. In conjunction with other tools developed by our group (the ChIP-Seq and SSA servers), it allows users to carry out a great variety of analysis tasks with MGA samples, such as making aggregation plots and heat maps for selected genomic regions, finding peak regions, generating custom tracks for visualizing genomic features in a UCSC genome browser window, or downloading chromatin data in a table format suitable for local processing with more advanced statistical analysis software such as R. Home page: http://ccg.vital-it.ch/mga/.
Data for the paper "Insights gained from a comprehensive all-against-all transcription factor binding motif benchmarking study".
Additional file 6. Annotation of TFs and motifs.
Additional file 4. Interactive t-SNE plots for HT-SELEX (cut-off 50%).