Papers by Ozlem O Garibay
Briefings in Bioinformatics, Jul 12, 2022
In this study, we introduce an interpretable graph-based deep learning prediction model, AttentionSiteDTI, which utilizes protein binding sites along with a self-attention mechanism to address the problem of drug–target interaction prediction. Our proposed model is inspired by sentence classification models in the field of Natural Language Processing, where the drug–target complex is treated as a sentence with relational meaning between its biochemical entities, a.k.a. protein pockets and the drug molecule. AttentionSiteDTI enables interpretability by identifying the protein binding sites that contribute the most toward the drug–target interaction. Results on three benchmark datasets show improved performance compared with the current state-of-the-art models. More significantly, unlike previous studies, our model shows superior performance when tested on new proteins (i.e. high generalizability). Through multidisciplinary collaboration, we further experimentally evaluate the practical potential of our proposed approach. To achieve this, we first computationally predict the binding interactions between some candidate compounds and a target protein, then experimentally validate the binding interactions for these pairs in the laboratory. The high agreement between the computationally predicted and experimentally observed (measured) drug–target interactions illustrates the potential of our method as an effective pre-screening tool in drug repurposing applications.
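The abstract above describes the core architectural idea: the drug and each candidate protein pocket are treated as tokens of a "sentence", and self-attention provides both the interaction prediction and the interpretability signal. The following is a minimal, hypothetical PyTorch sketch of that idea; the dimensions, pooling, and classifier head are illustrative assumptions, not the published AttentionSiteDTI architecture.

```python
# Hypothetical sketch of the "drug-target complex as a sentence" idea: the drug
# embedding and each protein-pocket embedding are tokens, self-attention mixes them,
# and the attention weights over pocket tokens can be inspected for interpretability.
import torch
import torch.nn as nn

class SiteAttentionDTI(nn.Module):
    def __init__(self, embed_dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, drug_emb, pocket_embs):
        # drug_emb: (B, 1, D) graph-level drug embedding (e.g., from a GNN)
        # pocket_embs: (B, P, D) one embedding per candidate binding site
        tokens = torch.cat([drug_emb, pocket_embs], dim=1)   # the "sentence" of tokens
        ctx, attn_w = self.attn(tokens, tokens, tokens)      # self-attention
        logits = self.classifier(ctx.mean(dim=1))            # pooled -> interaction score
        return logits.squeeze(-1), attn_w                    # attn_w highlights pockets

# Toy usage with random embeddings.
model = SiteAttentionDTI()
drug = torch.randn(2, 1, 128)      # batch of 2 drug embeddings
pockets = torch.randn(2, 5, 128)   # 5 candidate binding sites each
score, weights = model(drug, pockets)
print(score.shape, weights.shape)  # torch.Size([2]) torch.Size([2, 6, 6])
```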
arXiv (Cornell University), Sep 1, 2021
With the increasing reliance on automated decision making, the issue of algorithmic fairness has gained increasing importance. In this paper, we propose a Generative Adversarial Network for tabular data generation. The model includes two phases of training. In the first phase, the model is trained to accurately generate synthetic data similar to the reference dataset. In the second phase, we modify the value function to add a fairness constraint and continue training the network to generate data that is both accurate and fair. We test our results in both the unconstrained and constrained fair data generation cases. In the unconstrained case, i.e. when the model is only trained in the first phase and is only meant to generate accurate data following the same joint probability distribution as the real data, the results show that the model beats state-of-the-art GANs proposed in the literature for producing synthetic tabular data. In the constrained case, in which the first phase of training is followed by the second phase, we train the network and test it on four datasets studied in the fairness literature, compare our results with another state-of-the-art pre-processing method, and present the promising results that it achieves. Compared with other studies utilizing GANs for fair data generation, our model is more stable because it uses only one critic, and it avoids major problems of the original GAN model, such as mode dropping and non-convergence, by implementing a Wasserstein GAN.
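As a rough illustration of the two-phase training described above, the sketch below shows a generator value function that is a plain WGAN objective in phase one and gains a fairness penalty (here a demographic-parity gap computed on the generated batch) in phase two. The penalty form, column conventions, and the lam weight are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch: phase 1 = standard WGAN generator term; phase 2 adds a fairness
# penalty on the synthetic batch (demographic-parity gap between protected groups).
import torch

def generator_loss(critic, fake_batch, protected_col, label_col, phase=1, lam=1.0):
    # Phase 1: maximize the critic's score on generated samples (WGAN objective).
    loss = -critic(fake_batch).mean()
    if phase == 2:
        # Phase 2: difference in positive-outcome rates between the two protected
        # groups within the generated batch (binary attribute and label assumed).
        s = fake_batch[:, protected_col]
        y = fake_batch[:, label_col]
        gap = torch.abs(y[s > 0.5].mean() - y[s <= 0.5].mean())
        loss = loss + lam * gap
    return loss

# Toy usage with a stand-in critic and a generated batch of 4 columns.
critic = torch.nn.Linear(4, 1)
fake = torch.sigmoid(torch.randn(64, 4))
print(generator_loss(critic, fake, protected_col=0, label_col=3, phase=2))
```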
Lecture Notes in Computer Science, 2023
arXiv (Cornell University), Sep 18, 2022
With the recent growth in computer vision applications, the question of how fair and unbiased these applications are has yet to be explored. There is abundant evidence that the bias present in training data is reflected in the models, or even amplified. Many previous methods for image dataset de-biasing, including models based on augmenting datasets, are computationally expensive to implement. In this study, we present a fast and effective model to de-bias an image dataset through reconstruction and minimization of the statistical dependence between the intended variables. Our architecture includes a U-net to reconstruct images, combined with a pre-trained classifier that penalizes the statistical dependence between the target attribute and the protected attribute. We evaluate our proposed model on the CelebA dataset, compare the results with a state-of-the-art de-biasing method, and show that the model achieves a promising fairness–accuracy combination.
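A hedged sketch of the training objective described above: reconstruct each image with a U-net while penalizing the statistical dependence between a frozen, pre-trained target-attribute classifier's scores on the reconstruction and the protected attribute. Squared Pearson correlation is used here as a simple dependence proxy; the actual dependence measure, networks, and weighting in the paper may differ.

```python
# Illustrative de-biasing objective: reconstruction loss + a dependence penalty
# between frozen-classifier scores and the protected attribute (assumed form).
import torch
import torch.nn.functional as F

def dependence_penalty(scores, protected):
    # Squared Pearson correlation between prediction scores and protected labels.
    s = scores - scores.mean()
    p = protected.float() - protected.float().mean()
    corr = (s * p).mean() / (s.std() * p.std() + 1e-8)
    return corr ** 2

def debias_loss(unet, target_clf, images, protected, lam=1.0):
    recon = unet(images)                       # U-net reconstruction of the input
    rec_loss = F.mse_loss(recon, images)       # keep reconstructions faithful
    scores = target_clf(recon).squeeze(-1)     # frozen target-attribute classifier
    return rec_loss + lam * dependence_penalty(scores, protected)

# Toy usage with stand-in networks (a real setup would use a U-net and a CNN).
unet = torch.nn.Identity()
clf = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 1))
imgs = torch.rand(16, 3, 8, 8)
prot = torch.randint(0, 2, (16,))
print(debias_loss(unet, clf, imgs, prot))
```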
The widespread use of artificial intelligence algorithms and their role in making consequential decisions for human subjects has resulted in a growing interest in designing AI algorithms that account for fairness considerations. There have been attempts to account for the fairness of AI algorithms without compromising their accuracy, to improve poorly designed algorithms that disregard sensitive attributes (e.g., age, race, and gender) at the peril of introducing or increasing bias against specific groups. Although many studies have examined the optimal trade-off between fairness and accuracy, it remains a challenge to understand the sources of unfairness in decision-making and mitigate them effectively. To tackle this problem, researchers have proposed fair causal learning approaches, which assist us in modeling cause-and-effect knowledge structures, discovering bias sources, and refining AI algorithms to make them more transparent and explainable. In this study, we formalize probabilistic interpretations of both contrastive and counterfactual causality as essential features in order to encourage users' trust and to expand the applicability of such automated systems. We use this formalism to define a novel fairness criterion that we call contrastive counterfactual fairness. This paper introduces, to the best of our knowledge, the first probabilistic fairness-aware data augmentation approach based on contrastive counterfactual causality. We tested our approach on two well-known fairness-related datasets, UCI Adult and German Credit, and conclude that our proposed method has a promising ability to capture and mitigate unfairness in AI deployment. This model-agnostic approach can be used with any AI model because it is applied in pre-processing.
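The augmentation step itself happens in pre-processing, before any model is trained. The sketch below is a deliberately naive, hypothetical illustration of counterfactual-style augmentation (duplicating each row with the sensitive attribute flipped) to show where such a step sits in the pipeline; the published method is probabilistic and grounded in a causal model, which this sketch does not attempt to reproduce.

```python
# Naive, hypothetical counterfactual-style augmentation for tabular data:
# add a copy of each row with the (binary) sensitive attribute flipped.
import pandas as pd

def naive_counterfactual_augment(df: pd.DataFrame, sensitive: str) -> pd.DataFrame:
    flipped = df.copy()
    flipped[sensitive] = 1 - flipped[sensitive]      # binary sensitive attribute assumed
    return pd.concat([df, flipped], ignore_index=True)

# Toy usage on a two-row frame with placeholder column names.
toy = pd.DataFrame({"age": [25, 40], "sex": [0, 1], "income_gt_50k": [0, 1]})
print(naive_counterfactual_augment(toy, sensitive="sex"))
```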
Lecture Notes in Computer Science, 2023
Lecture Notes in Computer Science, 2023
With the recent growth in computer vision applications, the question of how fair and unbiased these applications are has yet to be explored. There is abundant evidence that the bias present in training data is reflected in the models, or even amplified. Many previous methods for image dataset de-biasing, including models based on augmenting datasets, are computationally expensive to implement. In this study, we present a fast and effective model to de-bias an image dataset through reconstruction and minimization of the statistical dependence between the intended variables. Our architecture includes a U-net to reconstruct images, combined with a pre-trained classifier that penalizes the statistical dependence between the target attribute and the protected attribute. We evaluate our proposed model on the CelebA dataset, compare the results with a state-of-the-art de-biasing method, and show that the model achieves a promising fairness–accuracy combination.
bioRxiv (Cold Spring Harbor Laboratory), Dec 9, 2021
In this study, we introduce and implement an interpretable graph-based deep learning prediction model, which utilizes protein binding sites along with self-attention to learn which protein binding sites interact with a given ligand. Our proposed model enables interpretability by identifying the protein binding sites that contribute the most towards the drug–target interaction. Results on three benchmark datasets show improved performance compared to previous graph-based models. More significantly, unlike previous studies, our model's performance remains close to optimal when tested with new proteins (i.e., high generalizability). Through multidisciplinary collaboration, we further experimentally evaluate the practical potential of our proposed approach. To achieve this, we first computationally predict the binding interactions of some candidate compounds with a target protein, then experimentally validate the binding interactions for these pairs in the laboratory. The high agreement between the computationally predicted and experimentally observed (measured) DTIs illustrates the potential of our method as an effective pre-screening tool in drug repurposing applications.
Lecture Notes in Computer Science, 2021
Journal of Biomolecular Structure and Dynamics
arXiv (Cornell University), Mar 14, 2022
Bias in training datasets must be managed for various groups in classification tasks to ensure parity or equal treatment. With the recent growth in artificial intelligence models and their expanding role in automated decision-making, it is vital to ensure that these models are not biased. There is abundant evidence that these models can contain, or even amplify, the bias present in the data on which they are trained, inherent to their objective functions and learning algorithms. Existing methods for mitigating bias result in information loss, fail to provide a suitable balance between accuracy and fairness, or do not ensure that biases are limited during training. To this end, we propose a powerful strategy for training deep learning models, called the Distraction module, which is effective in controlling bias and preventing it from affecting the classification results. This method can be utilized with different data types (e.g., tabular, images, graphs). We demonstrate the potency of the proposed method by testing it on the UCI Adult and Heritage Health datasets (tabular), the POKEC-Z, POKEC-N and NBA datasets (graph), and the CelebA dataset (vision). Considering state-of-the-art methods proposed in the fairness literature for each dataset, we show that our model outperforms these methods in minimizing bias while maintaining accuracy.
While research into Drug-Target Interaction (DTI) prediction is fairly mature, generalizability and interpretability are not always addressed in the existing works in this field. In this paper, we propose a deep learning-based framework, called BindingSite-AugmentedDTA, which improves Drug-Target Affinity (DTA) predictions by reducing the search space of potential binding sites of the protein, thus making binding affinity prediction more efficient and accurate. Our BindingSite-AugmentedDTA is highly generalizable, as it can be integrated with any DL-based regression model while significantly improving its prediction performance. Also, unlike many existing models, our model is highly interpretable due to its architecture and self-attention mechanism, which can provide a deeper understanding of its underlying prediction mechanism by mapping attention weights back to protein binding sites. The computational results confirm that our framework can enhance the prediction performance...
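A minimal sketch of the "augment any DTA regressor with binding sites" idea: rather than scoring a drug against the whole protein, score it against each predicted binding-site region and aggregate the results. The site-prediction step, the string-based interface, and the max aggregation below are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: restrict the affinity search space to predicted binding sites,
# query any DTA regression model per site, and aggregate the per-site scores.
from typing import Callable, List

def binding_site_augmented_dta(
    predict_affinity: Callable[[str, str], float],  # any DTA regression model
    drug_smiles: str,
    binding_sites: List[str],                        # residues of predicted pockets
) -> float:
    scores = [predict_affinity(drug_smiles, site) for site in binding_sites]
    return max(scores)                               # assumed aggregation rule

# Toy usage with a stand-in affinity model (a real model would be a trained network).
toy_model = lambda smiles, site: float(len(set(smiles) & set(site)))
print(binding_site_augmented_dta(toy_model, "CCO", ["GLYSER", "CYSHIS"]))
```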
Machine Learning and Knowledge Extraction
With the increasing reliance on automated decision making, the issue of algorithmic fairness has gained increasing importance. In this paper, we propose a Generative Adversarial Network for tabular data generation. The model includes two phases of training. In the first phase, the model is trained to accurately generate synthetic data similar to the reference dataset. In the second phase, we modify the value function to add a fairness constraint and continue training the network to generate data that is both accurate and fair. We test our results in both the unconstrained and constrained fair data generation cases. We show that, using a fairly simple architecture and applying a quantile transformation to numerical attributes, the model achieves promising performance. In the unconstrained case, i.e., when the model is only trained in the first phase and is only meant to generate accurate data following the same joint probability distribution as the real data, the results show that the model...
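The quantile transformation mentioned above can be done, for example, with scikit-learn's QuantileTransformer, which maps a skewed numeric column onto a fixed (here normal) distribution before GAN training and inverts it afterwards. The column content and parameters below are placeholders.

```python
# Quantile-transform a skewed numeric column so it is easier for a GAN to model,
# then invert the mapping after sampling synthetic data.
import numpy as np
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
income = rng.lognormal(mean=10, sigma=1, size=(1000, 1))   # heavily skewed feature

qt = QuantileTransformer(output_distribution="normal", n_quantiles=100)
income_t = qt.fit_transform(income)           # feed the transformed column to the GAN
income_back = qt.inverse_transform(income_t)  # invert after sampling synthetic data
print(income_t.mean().round(2), income_t.std().round(2))
```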
Molecules
Drug-target interaction (DTI) prediction through in vitro methods is expensive and time-consuming. Computational methods, on the other hand, can save time and money while enhancing drug discovery efficiency. Most computational methods frame DTI prediction as a binary classification task. One important challenge is that the number of negative interactions in all DTI-related datasets far exceeds the number of positive interactions, leading to the class imbalance problem. As a result, a classifier is trained with a bias toward the majority class (the negative class), whereas the minority class (interacting pairs) is the class of interest. This class imbalance problem is not widely taken into account in DTI prediction studies, and the few previous studies considering balancing in DTI do not focus on the imbalance issue itself. Additionally, they do not benefit from deep learning models and experimental validation. In this study, we propose a computational framework along with experimental validation...
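The abstract does not spell out the balancing strategy, so the snippet below only illustrates one standard way to counter this kind of class imbalance in a binary classifier: weighting the rare positive (interacting) class more heavily in the loss. It is a generic example, not the framework proposed in the paper.

```python
# Generic class-imbalance handling for binary DTI classification: up-weight the
# rare positive class in the binary cross-entropy loss.
import torch
import torch.nn as nn

labels = torch.tensor([0.] * 950 + [1.] * 50)   # toy DTI labels: 5% positives
logits = torch.randn(1000)                       # scores from any DTI classifier

pos_weight = (labels == 0).sum() / (labels == 1).sum()   # ~19x weight on positives
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
print(criterion(logits, labels))
```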
Journal of Applied Research in Higher Education, 2021
Purpose: Postsecondary institutions use metrics such as student retention and college completion rates to measure student success. Multiple factors affect the success of first-time-in-college (FTIC) and transfer students. Transfer student success rates are significantly low, with most transfer students nationwide failing to complete their degrees at four-year institutions. The purpose of this study is to better understand the degree progression patterns of both student types in two undergraduate science, technology, engineering and mathematics (STEM) programs: computer science (CS) and information technology (IT). Recommendations concerning academic advising are discussed to improve transfer student success. Design/methodology/approach: This study describes how transfer student success can be improved by thoroughly analyzing degree progression patterns. This study uses institutional data from a public university in the United States. Specifically, this study utilizes the data of FTIC...
Perfiles de Ingeniería, 2020
Higher education institutions traditionally measure student success and the quality of an academic program using standard metrics, such as a student's time to degree, graduation rates, and retention rates. In addition, some programs have instituted a "qualifying exam" as an alternative way to measure the quality of an academic program and assess student mastery of core concepts. The Computer Science (CS) program at the University of Central Florida implemented a qualifying exam, the "Foundation Exam," in 1998 with the purpose of assessing students' mastery of core CS concepts and monitoring the quality of the program. However, the pass rates of students who took this qualifying exam were significantly low over the years. Moreover, some students systematically delayed taking this exam...
Journal of Applied Research in Higher Education, 2020
Purpose: Some degree programs in colleges and universities utilize entrance exams to ensure that students pursuing a given degree have mastered the foundational concepts needed for that program. However, these exams often become a barrier to student success. The purpose of this study is to discuss the impact of policies governing an undergraduate Computer Science (CS) entry/qualifying exam at a large public university in the United States on overall student success in the program. This case study focuses on whether reforming program policies impacts students' time-to-degree, graduation and mastery of CS core skills. Design/methodology/approach: This case study describes how CS student success was improved by updating program policies based on institutional data and the input of course instructors. The policy changes include introducing a maximum limit on attempts at the exam and changing the exam requirements as well as the structure of the exam itself. Findings: The pass rates of students taking...
Studies in Higher Education, 2019
In an increasingly innovation-driven economic environment, universities serve as engines of economic growth by igniting innovation, fueling entrepreneurship, and inspiring the next generation of scientists and professionals. While universities are committed to enhancing their economic impact, university 'economic engagement' is in many ways an emerging field. This research investigates key strategies used by US research universities to drive economic engagement by analysing, with a grounded theory approach, 55 successful applications for the Innovation and Economic Prosperity (IEP) University designation, which consist of extensive self-study exercises. Six key strategies emerge from this corpus: forming mutually beneficial partnerships with industry, developing collaboration networks with relevant communities, building an innovation culture, supporting researchers in bringing research outcomes to market, promoting the transfer of new technologies to industry, and encouraging entrepreneurial activities. These results can serve as a guide for universities seeking best practices to advance their economic engagement.
International Journal of Human–Computer Interaction
Widespread adoption of artificial intelligence (AI) technologies is substantially affecting the human condition in ways that are not yet well understood. Negative unintended consequences abound, including the perpetuation and exacerbation of societal inequalities and divisions via algorithmic decision making. We present six grand challenges for the scientific community to create AI technologies that are human-centered, that is, technologies that are ethical and fair and that enhance the human condition. These grand challenges are the result of an international collaboration across academia, industry and government and represent the consensus views of a group of 26 experts in the field of human-centered artificial intelligence (HCAI). In essence, these challenges advocate for a human-centered approach to AI that (1) is centered in human wellbeing, (2) is designed responsibly, (3) respects privacy, (4) follows human-centered design principles, (5) is subject to appropriate governance and oversight, and (6) interacts with individuals while respecting humans' cognitive capacities. We hope that these challenges and their associated research directions serve as a call to action to conduct research and development in AI that serves as a force multiplier toward more fair, equitable and sustainable societies.