Uyên Mai

Papers by Uyên Mai

Quantitative Determination of a Two-Component Mixture Containing Paracetamol and Ibuprofen in Solid Dosage Forms by Ratio Ultraviolet Spectrophotometry

Tạp chí Y học Việt Nam, Jun 15, 2023

Effects of Temperature, Concentration, and Light Wavelength on the Specific Rotation of Optically Active Solutions

Tạp chí Y học Việt Nam, Jun 1, 2023

Expectation-Maximization enables Phylogenetic Dating under a Categorical Rate Model

bioRxiv (Cold Spring Harbor Laboratory), Oct 8, 2022

Dating phylogenetic trees to obtain branch lengths in the unit of time is essential for many downstream applications but has remained challenging. Dating requires inferring mutation rates that can change across the tree. While we can assume to have information about a small subset of nodes from the fossil record or sampling times (for fast-evolving organisms), inferring the ages of the other nodes essentially requires extrapolation and interpolation. Assuming a clock model that defines a distribution over rates, we can formulate dating as a constrained maximum likelihood (ML) estimation problem. While ML dating methods exist, their accuracy degrades in the face of model misspecification where the assumed parametric statistical clock model vastly differs from the true distribution. Notably, existing methods tend to assume rigid, often unimodal rate distributions. A second challenge is that the likelihood function involves an integral over the continuous domain of the rates and often leads to difficult non-convex optimization problems. To tackle these two challenges, we propose a new method called Molecular Dating using Categorical-models (MD-Cat). MD-Cat uses a categorical model of rates inspired by nonparametric statistics and can approximate a large family of models by discretizing the rate distribution into k categories. Under this model, we can use the Expectation-Maximization (EM) algorithm to coestimate rate categories and branch lengths in the time unit. Our model has fewer assumptions about the true clock model than parametric models such as Gamma or LogNormal distribution. Our results on two simulated and real datasets of Angiosperms and HIV and a wide selection of rate distributions show that MD-Cat is often more accurate than the alternatives, especially on datasets with nonmodal or multimodal clock models.

Log Transformation Improves Dating of Phylogenies

Molecular Biology and Evolution, Sep 4, 2020

Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore, do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution unit to time unit using calibration points, but none is universally accepted as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a nonconvex optimization problem where the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.
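
The quantity named in this abstract, the variance of log-transformed rate multipliers, can be illustrated with a minimal sketch. This is not the wLogDate implementation: the function name `log_rate_variance` and the toy branch lengths are hypothetical, the calculation is unweighted, and no calibration constraints or optimizer are included. It relies on one observation: dividing every per-branch rate by a shared global rate shifts all log values by the same constant, so the variance of the log multipliers equals the variance of the log per-branch rates.

```python
import math
from statistics import pvariance


def log_rate_variance(sub_lengths, time_lengths):
    """Population variance of log per-branch rates (hypothetical helper).

    sub_lengths: branch lengths in expected substitutions per site.
    time_lengths: candidate lengths of the same branches in time units.
    """
    if len(sub_lengths) != len(time_lengths):
        raise ValueError("both inputs need one value per branch")
    # Per-branch rate = substitutions / time; a shared global rate would
    # only shift every log value by a constant, leaving the variance as is.
    log_rates = [math.log(b / t) for b, t in zip(sub_lengths, time_lengths)]
    return pvariance(log_rates)


# Toy data (made up): the same substitution lengths under two candidate datings.
sub = [0.02, 0.04, 0.06]
clock_like = log_rate_variance(sub, [1.0, 2.0, 3.0])  # every branch has the same rate
uneven = log_rate_variance(sub, [1.0, 1.0, 1.0])      # rates differ across branches
```

Under this toy scoring, the first candidate dating is preferred because it implies nearly identical rates on every branch, so its variance is close to zero.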

Acquiring professional skills: Virtual facilitator as model for team communication

... Model for Team Communication Tracey Wiggins, Daniel Swift, Uyen Mai, and Ray Luechtefeld Univ...

TreeShrink: Efficient Detection of Outlier Tree Leaves

Comparative Genomics

balabanmetin/Tree-Based-Clustering-Paper: v1.2

This repository contains the data and scripts used in the paper 'TreeCluster: clustering biologic...

Additional file 1 of TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

Supplementary material. Appendix A: Theorem proofs. Appendix B: Supplementary figures and table...

biocore/wol: Initial release for journal publication

Reference Phylogeny for Bacterial and Archaeal Genomes

Completing gene trees without species trees in sub-quadratic time

Bioinformatics, 2021

Motivation: As genome-wide reconstruction of phylogenetic trees becomes more widespread, limitations of available data are being appreciated more than ever before. One issue is that phylogenomic datasets are riddled with missing data, and gene trees, in particular, almost always lack representatives from some species otherwise available in the dataset. Since many downstream applications of gene trees require or can benefit from access to complete gene trees, it will be beneficial to algorithmically complete gene trees. Also, gene trees are often unrooted, and rooting them is useful for downstream applications. While completing and rooting a gene tree with respect to a given species tree has been studied, those problems are not studied in depth when we lack such a reference species tree. Results: We study completion of gene trees without a need for a reference species tree. We formulate an optimization problem to complete the gene trees while minimizing their quartet distance to the gi...

Gene exchange networks define species-like units in marine prokaryotes

Summary: Although horizontal gene transfer is recognized as a major evolutionary process in Bacteria and Archaea, its general patterns remain elusive, due to difficulties tracking genes at relevant resolution and scale within complex microbiomes. To circumvent these challenges, we analyzed a randomized sample of >12,000 genomes of individual cells of Bacteria and Archaea in the tropical and subtropical ocean - a well-mixed, global environment. We found that marine microorganisms form gene exchange networks (GENs) within which transfers of both flexible and core genes are frequent, including the rRNA operon that is commonly used as a conservative taxonomic marker. The data revealed efficient gene exchange among genomes with <28% nucleotide difference, indicating that GENs are much broader lineages than the nominal microbial species, which are currently delineated at 4-6% nucleotide difference. The 42 largest GENs accounted for 90% of cells in the tropical ocean microbiome. Freque...

Log Transformation Improves Dating of Phylogenies

Phylogenetic trees inferred from sequence data often have branch lengths measured in the expected number of substitutions and therefore, do not have divergence times estimated. These trees give an incomplete view of evolutionary histories since many applications of phylogenies require time trees. Many methods have been developed to convert the inferred branch lengths from substitution unit to time unit using calibration points, but none is universally accepted as they are challenged in both scalability and accuracy under complex models. Here, we introduce a new method that formulates dating as a non-convex optimization problem where the variance of log-transformed rate multipliers is minimized across the tree. On simulated and real data, we show that our method, wLogDate, is often more accurate than alternatives and is more robust to various model assumptions.

Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea

Nature Communications, 2019

Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer “core” genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.

TreeCluster: clustering biological sequences using phylogenetic trees

Clustering homologous sequences based on their similarity is a problem that appears in many bioinformatics applications. The fact that sequences cluster is ultimately the result of their phylogenetic relationships. Despite this observation and the natural ways in which a tree can define clusters, most applications of sequence clustering do not use a phylogenetic tree and instead operate on pairwise sequence distances. Due to advances in large-scale phylogenetic inference, we argue that tree-based clustering is under-utilized. We define a family of optimization problems that, given a (not necessarily ultrametric) tree, return the minimum number of clusters such that all clusters adhere to constraints on their heterogeneity. We study three specific constraints that limit the diameter of each cluster, the sum of its branch lengths, or chains of pairwise distances. These three versions of the problem can be solved in time that increases linearly with the size of the tree, a fact that ha...

TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees

BMC genomics, Jan 8, 2018

Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phyloge...
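
The idea of a leaf that inflates the tree diameter can be made concrete with a small sketch. This is not the TreeShrink algorithm: it is a brute-force scan for the single-leaf case (k = 1) on a made-up tree, with hypothetical helper names (`farthest`, `diameter`, `without_leaf`) and a plain adjacency-dictionary representation. It also does not suppress internal nodes left with few neighbors after a removal, which a real implementation would need to handle.

```python
def farthest(adj, start):
    """Return (node, distance) for the node farthest from start in a weighted tree."""
    best = (start, 0.0)
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if dist > best[1]:
            best = (node, dist)
        for nbr, weight in adj[node].items():
            if nbr != parent:  # a tree has no cycles, so skipping the parent suffices
                stack.append((nbr, node, dist + weight))
    return best


def diameter(adj):
    """Longest path length in the tree, found with two farthest-node passes."""
    end, _ = farthest(adj, next(iter(adj)))
    return farthest(adj, end)[1]


def without_leaf(adj, leaf):
    """Copy of the tree with one leaf and its pendant branch removed."""
    return {u: {v: w for v, w in nbrs.items() if v != leaf}
            for u, nbrs in adj.items() if u != leaf}


# Toy tree (made up): leaf D sits on a suspiciously long branch.
tree = {
    "x": {"A": 0.1, "B": 0.1, "y": 0.1},
    "y": {"x": 0.1, "C": 0.1, "D": 2.0},
    "A": {"x": 0.1}, "B": {"x": 0.1},
    "C": {"y": 0.1}, "D": {"y": 2.0},
}
leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
# Pick the single leaf whose removal leaves the smallest diameter.
best_leaf = min(leaves, key=lambda leaf: diameter(without_leaf(tree, leaf)))
```

In this toy tree the scan selects D, since removing it shrinks the diameter from about 2.2 to about 0.3, while removing any other leaf leaves the diameter unchanged.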

Minimum variance rooting of phylogenetic trees and implications for species tree reconstruction

PLOS ONE, 2017

In this section we prove the following propositions, which were used in the main text to support the theory of the MinVar rooting method. Please refer to the main paper for more details. Proposition 1. A point p on tree T is a local MV if and only if it is a balance point. Based on Proposition 1, we refer to local MV and balance point interchangeably. Proposition 2. Any tree has at least one local MV. Proposition 3. The global MV of any tree is one of its local MVs.

Detecting Rule of Balance in Photography

A grounded theory approach to effects of virtual facilitation on team communication and the development of professional skills

2011 Frontiers in Education Conference (FIE), 2011

... edu. Tracey Wiggins, Doctoral Candidate, University of La Verne. ...

Acquiring professional skills: Virtual facilitator as model for team communication

2011 Frontiers in Education Conference (FIE), 2011

... Model for Team Communication Tracey Wiggins, Daniel Swift, Uyen Mai, and Ray Luechtefeld Univ...
