Michel Desmarais

École Polytechnique de Montréal, Département de génie informatique et génie logiciel, Faculty Member

Address: Canada

Papers by Michel Desmarais

A Dataset of Learnersourced Explanations from an Online Peer Instruction Environment

Educational Data Mining, Jul 1, 2020

Online Peer Instruction has become prevalent in many "flipped classroom" settings, yet little work has been done to examine the content students generate in such a learning environment. This study characterizes a dataset generated by an open-source, web-based homework system that prompts students to first answer questions, and then provide explanations of their reasoning. Of particular interest in this dataset is that students are also prompted to evaluate a subset of peer explanations based on how convincing they are, as part of the Peer Instruction learning script. Since these student "votes" are then used in the selection of what is shown to future learners, we cast this as an instance of learnersourcing, a paradigm that presents new research opportunities for the Learning Analytics community. This study characterizes a dataset from one Peer Instruction tool that includes not only the student-generated answers and explanations, but also this novel "vote" attribute, which aims to capture how convincing each explanation is to other learners. The dataset includes longitudinal observations of student responses over the course of a semester, following groups from three STEM disciplines. The data is made available to interested researchers.

Learnersourcing Quality Assessment of Explanations for Peer Instruction

Lecture Notes in Computer Science, 2020

This paper presents the results of a study, carried out as part of the design-based development of an online self-assessment for prospective students in higher online education. The self-assessment consists of a set of tests – predictive of completion – and is meant to improve informed decision making prior to enrolment. The rationale is that better decision making will help to address the ongoing concern of non-completion in higher online education. A prototypical design of the self-assessment was created based on an extensive literature review and correlational research, aimed at investigating validity evidence concerning the predictive value of the tests. The present study focused on investigating validity evidence regarding the content of the self-assessment (including the feedback it provides) from a user perspective. Results from a survey among prospective students (N = 66) indicated that predictive validity and content validity of the self-assessment are somewhat at odds: three out of the five tests included in the current prototype were considered relevant by prospective students. Moreover, students rated eleven additionally suggested tests – currently not included – as relevant concerning their study decision. Expectations regarding the feedback to be provided in connection with the tests include an explanation of the measurement and advice for further preparation. A comparison of the obtained scores to a reference group (i.e., other test-takers or successful students) is not expected. Implications for further development and evaluation of the self-assessment are discussed.

GitHub Copilot AI pair programmer: Asset or Liability?

arXiv (Cornell University), Jun 30, 2022

Automatic program synthesis is a long-lasting dream in software engineering. Recently, a promising Deep Learning (DL) based solution, called Copilot, has been proposed by OpenAI and Microsoft as an industrial product. Although some studies evaluate the correctness of Copilot solutions and report its issues, more empirical evaluations are necessary to understand how developers can benefit from it effectively. In this paper, we study the capabilities of Copilot in two different programming tasks: (i) generating (and reproducing) correct and efficient solutions for fundamental algorithmic problems, and (ii) comparing Copilot's proposed solutions with those of human programmers on a set of programming tasks. For the former, we assess the performance and functionality of Copilot in solving selected fundamental problems in computer science, like sorting and implementing data structures. In the latter, a dataset of programming problems with human-provided solutions is used. The results show that Copilot is capable of providing solutions for almost all fundamental algorithmic problems; however, some solutions are buggy and non-reproducible. Moreover, Copilot has some difficulties in combining multiple methods to generate a solution. Comparing Copilot to humans, our results show that the ratio of correct solutions is greater for humans than for Copilot's suggestions, while the buggy solutions generated by Copilot require less effort to be repaired. Based on our findings, if Copilot is used by expert developers in software projects, it can become an asset since its suggestions could be comparable to humans' contributions in terms of quality. However, Copilot can become a liability if it is used by novice developers who may fail to filter its buggy or non-optimal solutions due to a lack of expertise.

Dev2vec: Representing Domain Expertise of Developers in an Embedding Space

arXiv (Cornell University), Jul 11, 2022

Accurate assessment of the domain expertise of developers is important for assigning the proper candidate to contribute to a project, or to fill a job role. Since the potential candidate can come from a large pool, the automated assessment of this domain expertise is a desirable goal. While previous methods have had some success within a single software project, the assessment of a developer's domain expertise from contributions across multiple projects is more challenging. In this paper, we employ doc2vec to represent the domain expertise of developers as embedding vectors. These vectors are derived from different sources that contain evidence of developers' expertise, such as the description of repositories that they contributed to, their issue resolving history, and API calls in their commits. We name it dev2vec and demonstrate its effectiveness in representing the technical specialization of developers. Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods and improves the F1 score by up to 21%. Moreover, our findings suggest that the "issue resolving history" of developers is the most informative source of information to represent the domain expertise of developers in embedding spaces.

UMAP '17: Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, Bratislava, Slovakia — July 09 - 12, 2017

402 p.

Editorial Acknowledgements and Introduction to the Special Issue on EDM Journal Track

Educational Data Mining, Nov 5, 2016

The EDM Conference was held in Raleigh this year, from June 29 to July 2, and for the second time it held a Journal track which was edited by Kalina Yacef this year. The Journal track allows papers submitted to JEDM to be presented at the conference. A summary is available in the proceedings, and the full text is published in the Journal.

Deep Knowledge Tracing and Dynamic Student Classification for Knowledge Tracing

arXiv (Cornell University), Sep 23, 2018

Proceedings of the International Conference on Educational Data Mining (EDM) (12th, Montreal, Canada, July 2-5, 2019)

Educational Data Mining, Jul 1, 2019

Introduction to the Special Issue on User Modeling, Adaptation and Personalization

ACM UMAP is an annual conference on user modeling, adaptation and personalization. User modeling concerns the process of understanding the user’s needs, preferences, interests, knowledge and other aspects. This is achieved by reasoning about and extracting knowledge from user data, which includes both data that is explicitly provided by the user (such as profile data) and implicitly gathered usage data (such as browsing data). Adaptation and personalization techniques exploit the user models in order to better tailor a software system, such as a website, to the user’s needs. Recommender systems are the best-known type of personalized system, but the field is much wider and includes, among others, personalized search, adaptive user interfaces, personalized advice, and personalized technology-enhanced learning. This special issue contains extended versions of selected papers from UMAP 2017, the 25th edition of the conference series. The conference was hosted in Bratislava, Slovakia, from 9 to 12 July 2017. The conference consisted of five tracks that represent the variety of disciplines and application areas in which user modeling, adaptation and personalization play a role. User interface aspects, including adaptive presentation and navigation, were covered by the tracks Intelligent User Interfaces and Adaptive Hypermedia. As one of the most visible and largest application areas of personalization is the Social Web, the corresponding track received submissions that analyzed user behavior as input for personalization, as well as the effect of personalization on user behavior. Being the most prominent and most applied adaptive technique, Recommender Systems were given a dedicated track as well.
Finally, we dedicated a track to the field of Technology-Enhanced Adaptive Learning, as this is an application area with important and tangible impacts on society. The papers in this special issue belong to the latter two areas. Three papers are situated in the field of Technology-Enhanced Learning. The first paper, “Analysis and Design of Mastery Learning Criteria” (Pelánek and Řihák), shows that, under the assumption of isolated skills, the decision over skill mastery, and whether a system should let the student move on to the next skill to learn, can rest on a simple exponential moving average rather than on the more sophisticated Bayesian and logistic approaches to learner modeling. They also show that the choice of an appropriate mastery threshold and of the source of information is more influential than the choice of the learner modeling technique. The second paper focuses on open learner models, an approach for making a student’s learner model explicit to the student in order to enhance reflection, self-awareness and self-regulation of the learning process. In “Navigation Support in Complex Learner Models: Assessing Visual Design Alternatives” (Guerra, Schunn, Bull, Barría-Pineda and Brusilovsky), six alternative prototypes were investigated in two controlled studies. The results provide several insights into how to balance ease of use and complexity during the design of such open learner models, and open several lines of future research.
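
As an illustration of the mastery criterion mentioned in the abstract above, here is a minimal sketch of a mastery decision based on an exponential moving average of answer correctness. The function name `ema_mastery` and the values of `alpha`, `threshold`, and `initial` are assumptions chosen for this example; they are not taken from the paper by Pelánek and Řihák.

```python
def ema_mastery(answers, alpha=0.2, threshold=0.95, initial=0.0):
    """Return the 1-based position of the answer at which mastery is declared.

    answers   -- iterable of booleans (True = correct answer)
    alpha     -- weight given to the most recent answer (illustrative value)
    threshold -- mastery threshold on the moving average (illustrative value)
    initial   -- starting estimate before any answer is observed

    Returns None if the threshold is never reached.
    """
    estimate = initial
    for position, correct in enumerate(answers, start=1):
        observation = 1.0 if correct else 0.0
        # Standard exponential moving average update.
        estimate = alpha * observation + (1.0 - alpha) * estimate
        if estimate >= threshold:
            return position
    return None


if __name__ == "__main__":
    # A wrong answer delays mastery because it pulls the average down.
    print(ema_mastery([True, True, True, True], alpha=0.5, threshold=0.8))
    print(ema_mastery([True, False, True, True, True], alpha=0.5, threshold=0.8))
```

In this sketch, raising `threshold` or lowering `alpha` both require longer streaks of correct answers before mastery is declared, which is one way to see why the threshold choice can matter as much as the modeling technique.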

Proceedings of the 20th international conference on User Modeling, Adaptation, and Personalization

Introduction to the special issue of the EDM2015 Journal track

Educational Data Mining, Nov 5, 2016

The EDM Conference was held in Madrid, Spain, this year, from June 26 to June 29, and it included for the first time a Journal track which was edited by Michel Desmarais and Mykola Pechenizkiy. The Journal track allows papers submitted to JEDM to be presented at the conference. A summary is available in the proceedings, and the full text is published in the Journal.

Refinement of a Q-matrix with an Ensemble Technique Based on Multi-label Classification Algorithms

Lecture Notes in Computer Science, 2016

There are numerous algorithms and tools to help an expert map exercises and tasks to underlying skills. The last decade has witnessed a wealth of data-driven approaches aiming to refine expert-defined mappings of tasks to skills. This refinement can be seen as a classification problem: for each possible mapping of task to skill, the classifier has to decide whether the expert’s advice is correct or incorrect. Whereas most algorithms work at the level of individual mappings, we introduce an approach based on a multi-label classification algorithm that is trained on the mapping of a task to all skills simultaneously. The approach is shown to outperform the existing task-to-skill mapping refinement techniques.
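
To make the framing in the abstract above concrete, here is a small sketch of a Q-matrix as a binary task-by-skill table, where each row can be read as the multi-label target for one task. The matrices and the helper `proposed_changes` are hypothetical; `predicted_q` is a hand-written stand-in for the output of some classifier, not the ensemble technique described in the paper.

```python
# Rows are tasks, columns are skills; 1 means the task requires the skill.
expert_q = [
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Hypothetical output of a multi-label classifier: one full row of skill
# labels is predicted per task, rather than one cell at a time.
predicted_q = [
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 1],
]


def proposed_changes(expert, predicted):
    """List the cells where the prediction disagrees with the expert.

    Each entry is (task_index, skill_index, expert_value, predicted_value).
    """
    changes = []
    for task, (expert_row, predicted_row) in enumerate(zip(expert, predicted)):
        for skill, (e, p) in enumerate(zip(expert_row, predicted_row)):
            if e != p:
                changes.append((task, skill, e, p))
    return changes


if __name__ == "__main__":
    for task, skill, e, p in proposed_changes(expert_q, predicted_q):
        print(f"task {task}, skill {skill}: expert={e}, predicted={p}")
```

Each reported cell is a candidate refinement that an expert could review and accept or reject.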

Learning aerodynamics with neural network

Scientific Reports

We propose a neural network (NN) architecture, the Element Spatial Convolution Neural Network (ESCNN), for the airfoil lift coefficient prediction task. The ESCNN outperforms existing state-of-the-art NNs in terms of prediction accuracy, with two orders of magnitude fewer parameters. We further investigate and explain how the ESCNN succeeds in making accurate predictions with standard convolution layers. We discover that the ESCNN has the ability to extract physical patterns that emerge from aerodynamics, and such patterns are clearly reflected within a layer of the network. We show that the ESCNN is capable of learning the physical laws and equations of aerodynamics from simulation data.

Encoding User as More Than the Sum of Their Parts

Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization, 2017

Neural networks and word embeddings are powerful tools to capture latent factors. These tools can provide effective measures of similarities between users or items in the context of sparse data. We propose a novel approach that relies on neural networks and word embeddings to address the problem of matching a learner looking for mentoring with a tutor who is willing to provide this mentoring. Tutors and learners can issue multiple offers/requests on different topics. The approach matches over the whole array of topics specified by learners and tutors. Its performance for tutor-learner matching is compared with the state of the art. It yields similar results in terms of precision, but improves the recall.

Spatial Convolution Neural Network for Efficient Prediction of Aerodynamic Coefficients

AIAA Scitech 2021 Forum, 2021

Editorial Acknowledgement

The Editor and Associate Editors would like to warmly thank the editorial board and colleagues wh...

Editorial Acknowledgement

The JEDM editor and associate editors express their sincere gratitude and thank the editorial boa...

Editorial Acknowledgement

The JEDM editor and associate editors express their sincere gratitude and thank the editorial boa...

Editorial Acknowledgement

The Editor and Associate Editors would like to warmly thank the editorial board and colleagues wh...

Filtering non-relevant short answers in peer learning applications
