High-throughput sequencing of mRNA samples, known as RNA-seq, has emerged as a powerful approach for detecting differentially expressed genes from millions of short sequence reads. Although several workflows have been proposed to analyze RNA-seq data, quality control of the experiment as a whole is not usually considered, which can bias the results and/or cause information loss. Experiment quality control refers to the analysis of the experiment as a whole, prior to any downstream analysis. It inspects not only the presence of technical effects but also whether general biological assumptions are fulfilled. Multivariate approaches are therefore crucial for this task. Here, a multivariate approach for quality control in RNA-seq experiments is proposed. The approach relies on simple yet effective, well-known statistical methods. In particular, Principal Component Analysis (PCA) was successfully applied to real data to detect and remove outlier samples. In addition, traditional multivariate exploration tools were applied to assess several controls that help ensure the quality of the results. Differential expression and functional enrichment analyses demonstrate that information retrieval is significantly enhanced by experiment quality control. The results show that the proposed multivariate approach increases the information obtained from RNA-seq data after the removal of outlier samples.
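The abstract does not say how PCA is used to decide that a sample is an outlier. The sketch below shows one common way to do it, and it is not the authors' procedure. It assumes a genes x samples count matrix, a log2(CPM + 1) transform, and PCA computed by SVD of the gene-centered data. As the outlier rule, it assumes that a sample is flagged when its robust z-score (median/MAD) on any retained principal component exceeds a threshold. The function name `pca_outlier_samples` and the defaults `n_components=2` and `threshold=3.0` are hypothetical choices made for illustration.

```python
import numpy as np


def pca_outlier_samples(counts, n_components=2, threshold=3.0):
    """Return indices of samples flagged as outliers by a PCA-based rule.

    Illustrative assumptions (not taken from the source):
    - `counts` is a genes x samples matrix of raw read counts;
    - data are library-size normalized to CPM and log2(CPM + 1) transformed;
    - a sample is an outlier if its robust z-score on any of the first
      `n_components` principal components exceeds `threshold`.
    """
    counts = np.asarray(counts, dtype=float)
    # Library-size normalization: counts per million for each sample (column).
    cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
    # Log transform, then put samples in rows and center each gene.
    x = np.log2(cpm + 1.0).T
    x = x - x.mean(axis=0)
    # PCA via SVD: rows of u * s are the sample scores on each component.
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # Robust z-scores per component using the median and the MAD.
    med = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - med), axis=0)
    mad[mad == 0] = np.finfo(float).eps  # avoid division by zero
    z = 0.6745 * (scores - med) / mad
    # Flag a sample if it is extreme on any retained component.
    return np.where(np.any(np.abs(z) > threshold, axis=1))[0]
```

The robust (median/MAD) scale is used here instead of the mean and standard deviation because a single strong outlier would otherwise inflate the spread it is being compared against. In practice the flagged samples would be inspected on a PCA score plot before they are removed.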
Papers by Laura Prato