Skip to main content

Jack Mostow

Carnegie Mellon University, Robotics Institute, Faculty Member

Followers

94

Following

12

Co-authors

9

Mentions

1

Public Views

Interests

Uploads

Papers by Jack Mostow

Toward exploiting EEG input in a reading tutor

Artificial Intelligence in Education, Jun 28, 2011

A new type of sensor for students' mental states is a single-channel EEG headset simple enough to... more A new type of sensor for students' mental states is a single-channel EEG headset simple enough to use in schools. Using its signal from adults and children reading text and isolated words, both aloud and silently, we train and test classifiers to tell easy from hard sentences, and to distinguish among easy words, hard words, pseudo-words, and unpronounceable strings. We also identify which EEG components appear sensitive to which lexical features. Better-than-chance performance shows promise for tutors to use EEG at school.

Automated Vocabulary Instruction in a Reading Tutor

Springer eBooks, 2006

This paper presents a within-subject, randomized experiment to compare automated interventions fo... more This paper presents a within-subject, randomized experiment to compare automated interventions for teaching vocabulary to young readers using Project LISTEN's Reading Tutor. The experiment compared three conditions: no explicit instruction, a quick definition, and a quick definition plus a poststory battery of extended instruction based on a published instructional sequence for human teachers. A month long study with elementary school children indicates that the quick instruction, which lasts about seven seconds, has immediate effects on learning gains that did not persist. Extended instruction which lasted about thirty seconds longer than the quick instruction had a persistent effect and produced gains on a posttest one week later.

Artificial Intelligence in Education

Springer eBooks, 2013

Parsing to learn fine grained rules

National Conference on Artificial Intelligence, Aug 21, 1988

The grain size of rules acquired by explanation-based learning may vary widely depending on the s... more

Intelligent Tutoring Systems

Lecture Notes in Computer Science, 1996

The documents below are the templates used for the creation of skill builders, which were used in... more

Evaluating Human and Automated Generation of Distractors for Diagnostic Multiple-Choice Cloze Questions to Assess Children’s Reading Comprehension

Lecture Notes in Computer Science, 2015

We report an experiment to evaluate DQGen's performance in generating three types of distractors ... more We report an experiment to evaluate DQGen's performance in generating three types of distractors for diagnostic multiple-choice cloze (fill-in-theblank) questions to assess children's reading comprehension processes. Ungrammatical distractors test syntax, nonsensical distractors test semantics, and locally plausible distractors test inter-sentential processing. 27 knowledgeable humans rated candidate answers as correct, plausible, nonsensical, or ungrammatical without knowing their intended type or whether they were generated by DQGen, written by other humans, or correct. Surprisingly, DQGen did significantly better than humans at generating ungrammatical distractors and slightly better than them at generating nonsensical distractors, albeit worse at generating plausible distractors. Vetting its output and writing distractors only when necessary would take half as long as writing them all, and improve their quality.

Exploiting Predictable Response Training to Improve Automatic Recognition of Children’s Spoken Responses

Lecture Notes in Computer Science, 2010

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic... more The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children's free-form spoken responses. We apply this approach in the context of Project LISTEN's Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts questions spoken by children during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances totaling 137 words spoken by 11 second grade children in response to prompts the Reading Tutor inserted as they read. Compared to using a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall 5 times higherso much that the difference was statistically significant despite small sample size.

Using Item Response Theory to Refine Knowledge Tracing

Educational Data Mining, 2013

Previous work on knowledge tracing has fit parameters per skill (ignoring differences between stu... more Previous work on knowledge tracing has fit parameters per skill (ignoring differences between students), per student (ignoring differences between skills), or independently for each <student, skill> pair (risking sparse training data and overfitting, and undergeneralizing by ignoring overlap of students or skills across pairs). To address these limitations, we first use a higher order Item Response Theory (IRT) model that approximates students' initial knowledge as their one-dimensional (or low-dimensional) overall proficiency, and combines it with the estimated difficulty and discrimination of each skill to estimate the probability knew of knowing a skill before practicing it. We then fit skill-specific knowledge tracing probabilities for learn, guess, and slip. Using synthetic data, we show that Markov Chain Monte Carlo (MCMC) can recover the parameters of this Higher-Order Knowledge Tracing (HO-KT) model. Using real data, we show that HO-KT predicts performance in an algebra tutors significantly better than fitting knowledge tracing parameters per student or per skill.

Semi-Supervised Learning to Perceive Children's Affective States in a Tablet Tutor

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

When do Students Interrupt Help? Effects of Time, Help Type, and Individual Differences

Artificial Intelligence in Education, May 6, 2005

When do students interrupt help to request different help? To study this question, we embedded a ... more When do students interrupt help to request different help? To study this question, we embedded a within-subject experiment in the 2003-2004 version of Project LISTEN's Reading Tutor. We analyze 168,983 trials of this experiment, randomized by help type, and report patterns in when students choose to interrupt help. Using the amount of prior help, we fit an exponential curve to predict interruption rate with an r 2 of 0.97 on aggregate data and an r 2 of 0.22 on individual data. To improve the model fit for individual data, we adjust our model to account for different types of help and individual differences. Finally, we report small but significant correlations between a student parameter in our model and external measures of motivation and academic performance.

Toward Learning at Scale in Developing Countries

Proceedings of the Seventh ACM Conference on Learning @ Scale

Advances in education technology are enabling tremendous advances in learning at scale. However, ... more Advances in education technology are enabling tremendous advances in learning at scale. However, they typically assume resources taken for granted in developed countries, including reliable electricity, high-bandwidth Internet access, fast WiFi, powerful computers, sophisticated sensors, and expert technical support to keep it all working. This paper examines these assumptions in the context of a massive test of learning at scale in a developing country. We examine each assumption, how it was broken, and some workarounds used in a 15-month-long independent controlled evaluation of pre-to posttest learning and social-emotional gains by over 2,000 children in 168 villages in Tanzania. We analyze those gains to characterize who gained how much, using test score data, social-emotional measures, and detailed logs from RoboTutor. We quantify the relative impact of pretest scores, literate aspirations, treatment, and usage on learning gains.

Using EEG in Knowledge Tracing

Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student le... more Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student learning. Inexpensive portable electroencephalography (EEG) devices are viable as a way to help detect a number of student mental states relevant to learning, e.g. engagement or attention. This paper reports a first attempt to improve KT estimates of the student’s hidden knowledge state by adding EEG-measured mental states as inputs. Values of learn, forget, guess and slip differ significantly for different EEG states.

A Public Toolkit and ITS Dataset for EEG

We present a data set collected since 2012 containing children’s EEG signals logged during their ... more We present a data set collected since 2012 containing children’s EEG signals logged during their usage of Project LISTEN’s Reading Tutor. We also present EEG-ML, an integrated machine learning toolkit to preprocess EEG data, extract and select features, train and cross-validate classifiers to predict behavioral labels, and analyze their statistical reliability. To illustrate, we describe and evaluate a classifier to estimate a student’s amount of prior exposure to a given word. We make this dataset and toolkit publically available to help researchers explore how EEG might improve intelligent tutoring systems.

Developing, evaluating, and refining an automatic generator of diagnostic multiple choice cloze questions to assess children's comprehension while reading

Natural Language Engineering, 2016

We describe the development, pilot-testing, refinement, and four evaluations of Diagnostic Questi... more We describe the development, pilot-testing, refinement, and four evaluations of Diagnostic Question Generator (DQGen), which automatically generates multiple choice cloze (fill-in-the-blank) questions to test children's comprehension while reading a given text. Unlike previous methods, DQGen tests comprehension not only of an individual sentence but of the context preceding it. To test different aspects of comprehension, DQGen generates three types of distractors: ungrammatical distractors test syntax; nonsensical distractors test semantics; and locally plausible distractors test inter-sentential processing.(1)A pilot study of DQGen 2012 evaluated its overall questions and individual distractors, guiding its refinement into DQGen 2014.(2)Twenty-four elementary students generated 200 responses to multiple choice cloze questions that DQGen 2014 generated from forty-eight stories. In 130 of the responses, the child chose the correct answer. We define thedistractivenessof a distract...

Experimentally augmenting an intelligent tutoring system with human-supplied capabilities: adding human-provided emotional scaffolding to an automated reading tutor that listens

Proceedings. Fourth IEEE International Conference on Multimodal Interfaces

This paper presents the first statistically reliable empirical evidence from a controlled study f... more This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system. We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support-but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.

Evaluating Human and Automated Generation of Distractors for Diagnostic Multiple-Choice Cloze Questions to Assess Children’s Reading Comprehension

Lecture Notes in Computer Science, 2015

We report an experiment to evaluate DQGen's performance in generating three types of distractors ... more We report an experiment to evaluate DQGen's performance in generating three types of distractors for diagnostic multiple-choice cloze (fill-in-theblank) questions to assess children's reading comprehension processes. Ungrammatical distractors test syntax, nonsensical distractors test semantics, and locally plausible distractors test inter-sentential processing. 27 knowledgeable humans rated candidate answers as correct, plausible, nonsensical, or ungrammatical without knowing their intended type or whether they were generated by DQGen, written by other humans, or correct. Surprisingly, DQGen did significantly better than humans at generating ungrammatical distractors and slightly better than them at generating nonsensical distractors, albeit worse at generating plausible distractors. Vetting its output and writing distractors only when necessary would take half as long as writing them all, and improve their quality.

Exploiting Predictable Response Training to Improve Automatic Recognition of Children’s Spoken Responses

Lecture Notes in Computer Science, 2010

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic... more The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children's free-form spoken responses. We apply this approach in the context of Project LISTEN's Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts questions spoken by children during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances totaling 137 words spoken by 11 second grade children in response to prompts the Reading Tutor inserted as they read. Compared to using a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall 5 times higherso much that the difference was statistically significant despite small sample size.

Does Help Help? Introducing the Bayesian Evaluation and Assessment Methodology

Lecture Notes in Computer Science

Most ITS have a means of providing assistance to the student, either on student request or when t... more Most ITS have a means of providing assistance to the student, either on student request or when the tutor determines it would be effective. Presumably, such assistance is included by the ITS designers since they feel it benefits the students. However, whether-and how-help helps students has not been a well studied problem in the ITS community. In this paper we present three approaches for evaluating the efficacy of the Reading Tutor's help: creating experimental trials from data, learning decomposition, and Bayesian Evaluation and Assessment, an approach that uses dynamic Bayesian networks. We have found that experimental trials and learning decomposition both find a negative benefit for help-that is, help hurts! However, the Bayesian Evaluation and Assessment framework finds that help both promotes student long-term learning and provides additional scaffolding on the current problem. We discuss why these approaches give divergent results, and suggest that the Bayesian Evaluation and Assessment framework is the strongest of the three. In addition to introducing Bayesian Evaluation and Assessment, a method for simultaneously assessing students and evaluating tutorial interventions, this paper describes how help can both scaffold the current problem attempt as well as teach the student knowledge that will transfer to later problems.

A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems

Lecture Notes in Computer Science, 2006

This paper describes an effort to model a student's changing knowledge state during skill acquisi... more This paper describes an effort to model a student's changing knowledge state during skill acquisition. Dynamic Bayes Nets (DBNs) provide a powerful way to represent and reason about uncertainty in time series data, and are therefore well-suited to model student knowledge. Many general-purpose Bayes net packages have been implemented and distributed; however, constructing DBNs often involves complicated coding effort. To address this problem, we introduce a tool called BNT-SM. BNT-SM inputs a data set and a compact XML specification of a Bayes net model hypothesized by a researcher to describe causal relationships among student knowledge and observed behavior. BNT-SM generates and executes the code to train and test the model using the Bayes Net Toolbox [1]. Compared to the BNT code it outputs, BNT-SM reduces the number of lines of code required to use a DBN by a factor of 5. In addition to supporting more flexible models, we illustrate how to use BNT-SM to simulate Knowledge Tracing (KT) [2], an established technique for student modeling. The trained DBN does a better job of modeling and predicting student performance than the original KT code (Area Under Curve = 0.610 > 0.568), due to differences in how it estimates parameters.

Comparaison de l'impact de techniques de diagnostic des connaissances sur l'apprentissage d'une stratégie d'aide

Cet article est une traduction de l'article Comparing Student Models in Different Formalisms by P... more Cet article est une traduction de l'article Comparing Student Models in Different Formalisms by Predicting their Impact on Help Success publié dans les actes de la 16th International Conference on Artificial Intelligence in Education. Comparaison de l'impact de techniques de diagnostic des connaissances sur l'apprentissage d'une stratégie d'aide

Toward exploiting EEG input in a reading tutor

Artificial Intelligence in Education, Jun 28, 2011

A new type of sensor for students' mental states is a single-channel EEG headset simple enough to... more A new type of sensor for students' mental states is a single-channel EEG headset simple enough to use in schools. Using its signal from adults and children reading text and isolated words, both aloud and silently, we train and test classifiers to tell easy from hard sentences, and to distinguish among easy words, hard words, pseudo-words, and unpronounceable strings. We also identify which EEG components appear sensitive to which lexical features. Better-than-chance performance shows promise for tutors to use EEG at school.

Automated Vocabulary Instruction in a Reading Tutor

Springer eBooks, 2006

This paper presents a within-subject, randomized experiment to compare automated interventions fo... more This paper presents a within-subject, randomized experiment to compare automated interventions for teaching vocabulary to young readers using Project LISTEN's Reading Tutor. The experiment compared three conditions: no explicit instruction, a quick definition, and a quick definition plus a poststory battery of extended instruction based on a published instructional sequence for human teachers. A month long study with elementary school children indicates that the quick instruction, which lasts about seven seconds, has immediate effects on learning gains that did not persist. Extended instruction which lasted about thirty seconds longer than the quick instruction had a persistent effect and produced gains on a posttest one week later.

Artificial Intelligence in Education

Springer eBooks, 2013

Parsing to learn fine grained rules

National Conference on Artificial Intelligence, Aug 21, 1988

The grain size of rules acquired by explanation-based learning may vary widely depending on the s... more

Intelligent Tutoring Systems

Lecture Notes in Computer Science, 1996

The documents below are the templates used for the creation of skill builders, which were used in... more

Evaluating Human and Automated Generation of Distractors for Diagnostic Multiple-Choice Cloze Questions to Assess Children’s Reading Comprehension

Lecture Notes in Computer Science, 2015

We report an experiment to evaluate DQGen's performance in generating three types of distractors ... more We report an experiment to evaluate DQGen's performance in generating three types of distractors for diagnostic multiple-choice cloze (fill-in-theblank) questions to assess children's reading comprehension processes. Ungrammatical distractors test syntax, nonsensical distractors test semantics, and locally plausible distractors test inter-sentential processing. 27 knowledgeable humans rated candidate answers as correct, plausible, nonsensical, or ungrammatical without knowing their intended type or whether they were generated by DQGen, written by other humans, or correct. Surprisingly, DQGen did significantly better than humans at generating ungrammatical distractors and slightly better than them at generating nonsensical distractors, albeit worse at generating plausible distractors. Vetting its output and writing distractors only when necessary would take half as long as writing them all, and improve their quality.

Exploiting Predictable Response Training to Improve Automatic Recognition of Children’s Spoken Responses

Lecture Notes in Computer Science, 2010

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic... more The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children's free-form spoken responses. We apply this approach in the context of Project LISTEN's Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts questions spoken by children during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances totaling 137 words spoken by 11 second grade children in response to prompts the Reading Tutor inserted as they read. Compared to using a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall 5 times higherso much that the difference was statistically significant despite small sample size.

Using Item Response Theory to Refine Knowledge Tracing

Educational Data Mining, 2013

Previous work on knowledge tracing has fit parameters per skill (ignoring differences between stu... more Previous work on knowledge tracing has fit parameters per skill (ignoring differences between students), per student (ignoring differences between skills), or independently for each <student, skill> pair (risking sparse training data and overfitting, and undergeneralizing by ignoring overlap of students or skills across pairs). To address these limitations, we first use a higher order Item Response Theory (IRT) model that approximates students' initial knowledge as their one-dimensional (or low-dimensional) overall proficiency, and combines it with the estimated difficulty and discrimination of each skill to estimate the probability knew of knowing a skill before practicing it. We then fit skill-specific knowledge tracing probabilities for learn, guess, and slip. Using synthetic data, we show that Markov Chain Monte Carlo (MCMC) can recover the parameters of this Higher-Order Knowledge Tracing (HO-KT) model. Using real data, we show that HO-KT predicts performance in an algebra tutors significantly better than fitting knowledge tracing parameters per student or per skill.

Semi-Supervised Learning to Perceive Children's Affective States in a Tablet Tutor

Proceedings of the ... AAAI Conference on Artificial Intelligence, Apr 3, 2020

When do Students Interrupt Help? Effects of Time, Help Type, and Individual Differences

Artificial Intelligence in Education, May 6, 2005

When do students interrupt help to request different help? To study this question, we embedded a ... more When do students interrupt help to request different help? To study this question, we embedded a within-subject experiment in the 2003-2004 version of Project LISTEN's Reading Tutor. We analyze 168,983 trials of this experiment, randomized by help type, and report patterns in when students choose to interrupt help. Using the amount of prior help, we fit an exponential curve to predict interruption rate with an r 2 of 0.97 on aggregate data and an r 2 of 0.22 on individual data. To improve the model fit for individual data, we adjust our model to account for different types of help and individual differences. Finally, we report small but significant correlations between a student parameter in our model and external measures of motivation and academic performance.

Toward Learning at Scale in Developing Countries

Proceedings of the Seventh ACM Conference on Learning @ Scale

Advances in education technology are enabling tremendous advances in learning at scale. However, ... more Advances in education technology are enabling tremendous advances in learning at scale. However, they typically assume resources taken for granted in developed countries, including reliable electricity, high-bandwidth Internet access, fast WiFi, powerful computers, sophisticated sensors, and expert technical support to keep it all working. This paper examines these assumptions in the context of a massive test of learning at scale in a developing country. We examine each assumption, how it was broken, and some workarounds used in a 15-month-long independent controlled evaluation of pre-to posttest learning and social-emotional gains by over 2,000 children in 168 villages in Tanzania. We analyze those gains to characterize who gained how much, using test score data, social-emotional measures, and detailed logs from RoboTutor. We quantify the relative impact of pretest scores, literate aspirations, treatment, and usage on learning gains.

Using EEG in Knowledge Tracing

Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student le... more Knowledge tracing (KT) is widely used in Intelligent Tutoring Systems (ITS) to measure student learning. Inexpensive portable electroencephalography (EEG) devices are viable as a way to help detect a number of student mental states relevant to learning, e.g. engagement or attention. This paper reports a first attempt to improve KT estimates of the student’s hidden knowledge state by adding EEG-measured mental states as inputs. Values of learn, forget, guess and slip differ significantly for different EEG states.

A Public Toolkit and ITS Dataset for EEG

We present a data set collected since 2012 containing children’s EEG signals logged during their ... more We present a data set collected since 2012 containing children’s EEG signals logged during their usage of Project LISTEN’s Reading Tutor. We also present EEG-ML, an integrated machine learning toolkit to preprocess EEG data, extract and select features, train and cross-validate classifiers to predict behavioral labels, and analyze their statistical reliability. To illustrate, we describe and evaluate a classifier to estimate a student’s amount of prior exposure to a given word. We make this dataset and toolkit publically available to help researchers explore how EEG might improve intelligent tutoring systems.

Developing, evaluating, and refining an automatic generator of diagnostic multiple choice cloze questions to assess children's comprehension while reading

Natural Language Engineering, 2016

We describe the development, pilot-testing, refinement, and four evaluations of Diagnostic Questi... more We describe the development, pilot-testing, refinement, and four evaluations of Diagnostic Question Generator (DQGen), which automatically generates multiple choice cloze (fill-in-the-blank) questions to test children's comprehension while reading a given text. Unlike previous methods, DQGen tests comprehension not only of an individual sentence but of the context preceding it. To test different aspects of comprehension, DQGen generates three types of distractors: ungrammatical distractors test syntax; nonsensical distractors test semantics; and locally plausible distractors test inter-sentential processing.(1)A pilot study of DQGen 2012 evaluated its overall questions and individual distractors, guiding its refinement into DQGen 2014.(2)Twenty-four elementary students generated 200 responses to multiple choice cloze questions that DQGen 2014 generated from forty-eight stories. In 130 of the responses, the child chose the correct answer. We define thedistractivenessof a distract...

Experimentally augmenting an intelligent tutoring system with human-supplied capabilities: adding human-provided emotional scaffolding to an automated reading tutor that listens

Proceedings. Fourth IEEE International Conference on Multimodal Interfaces

This paper presents the first statistically reliable empirical evidence from a controlled study f... more This paper presents the first statistically reliable empirical evidence from a controlled study for the effect of human-provided emotional scaffolding on student persistence in an intelligent tutoring system. We describe an experiment that added human-provided emotional scaffolding to an automated Reading Tutor that listens, and discuss the methodology we developed to conduct this experiment. Each student participated in one (experimental) session with emotional scaffolding, and in one (control) session without emotional scaffolding, counterbalanced by order of session. Each session was divided into several portions. After each portion of the session was completed, the Reading Tutor gave the student a choice: continue, or quit. We measured persistence as the number of portions the student completed. Human-provided emotional scaffolding added to the automated Reading Tutor resulted in increased student persistence, compared to the Reading Tutor alone. Increased persistence means increased time on task, which ought lead to improved learning. If these results for reading turn out to hold for other domains too, the implication for intelligent tutoring systems is that they should respond with not just cognitive support-but emotional scaffolding as well. Furthermore, the general technique of adding human-supplied capabilities to an existing intelligent tutoring system should prove useful for studying other ITSs too.

Evaluating Human and Automated Generation of Distractors for Diagnostic Multiple-Choice Cloze Questions to Assess Children’s Reading Comprehension

Lecture Notes in Computer Science, 2015

We report an experiment to evaluate DQGen's performance in generating three types of distractors ... more We report an experiment to evaluate DQGen's performance in generating three types of distractors for diagnostic multiple-choice cloze (fill-in-theblank) questions to assess children's reading comprehension processes. Ungrammatical distractors test syntax, nonsensical distractors test semantics, and locally plausible distractors test inter-sentential processing. 27 knowledgeable humans rated candidate answers as correct, plausible, nonsensical, or ungrammatical without knowing their intended type or whether they were generated by DQGen, written by other humans, or correct. Surprisingly, DQGen did significantly better than humans at generating ungrammatical distractors and slightly better than them at generating nonsensical distractors, albeit worse at generating plausible distractors. Vetting its output and writing distractors only when necessary would take half as long as writing them all, and improve their quality.

Exploiting Predictable Response Training to Improve Automatic Recognition of Children’s Spoken Responses

Lecture Notes in Computer Science, 2010

The unpredictability of spoken responses by young children (6-7 years old) makes them problematic... more The unpredictability of spoken responses by young children (6-7 years old) makes them problematic for automatic speech recognizers. Aist and Mostow proposed predictable response training to improve automatic recognition of children's free-form spoken responses. We apply this approach in the context of Project LISTEN's Reading Tutor to the task of teaching children an important reading comprehension strategy, namely to make up their own questions about text while reading it. We show how to use knowledge about strategy instruction and the story text to generate a language model that predicts questions spoken by children during comprehension instruction. We evaluated this model on a previously unseen test set of 18 utterances totaling 137 words spoken by 11 second grade children in response to prompts the Reading Tutor inserted as they read. Compared to using a baseline trigram language model that does not incorporate this knowledge, speech recognition using the generated language model achieved concept recall 5 times higherso much that the difference was statistically significant despite small sample size.

Does Help Help? Introducing the Bayesian Evaluation and Assessment Methodology

Lecture Notes in Computer Science

Most ITS have a means of providing assistance to the student, either on student request or when t... more Most ITS have a means of providing assistance to the student, either on student request or when the tutor determines it would be effective. Presumably, such assistance is included by the ITS designers since they feel it benefits the students. However, whether-and how-help helps students has not been a well studied problem in the ITS community. In this paper we present three approaches for evaluating the efficacy of the Reading Tutor's help: creating experimental trials from data, learning decomposition, and Bayesian Evaluation and Assessment, an approach that uses dynamic Bayesian networks. We have found that experimental trials and learning decomposition both find a negative benefit for help-that is, help hurts! However, the Bayesian Evaluation and Assessment framework finds that help both promotes student long-term learning and provides additional scaffolding on the current problem. We discuss why these approaches give divergent results, and suggest that the Bayesian Evaluation and Assessment framework is the strongest of the three. In addition to introducing Bayesian Evaluation and Assessment, a method for simultaneously assessing students and evaluating tutorial interventions, this paper describes how help can both scaffold the current problem attempt as well as teach the student knowledge that will transfer to later problems.

A Bayes Net Toolkit for Student Modeling in Intelligent Tutoring Systems

Lecture Notes in Computer Science, 2006

This paper describes an effort to model a student's changing knowledge state during skill acquisi... more This paper describes an effort to model a student's changing knowledge state during skill acquisition. Dynamic Bayes Nets (DBNs) provide a powerful way to represent and reason about uncertainty in time series data, and are therefore well-suited to model student knowledge. Many general-purpose Bayes net packages have been implemented and distributed; however, constructing DBNs often involves complicated coding effort. To address this problem, we introduce a tool called BNT-SM. BNT-SM inputs a data set and a compact XML specification of a Bayes net model hypothesized by a researcher to describe causal relationships among student knowledge and observed behavior. BNT-SM generates and executes the code to train and test the model using the Bayes Net Toolbox [1]. Compared to the BNT code it outputs, BNT-SM reduces the number of lines of code required to use a DBN by a factor of 5. In addition to supporting more flexible models, we illustrate how to use BNT-SM to simulate Knowledge Tracing (KT) [2], an established technique for student modeling. The trained DBN does a better job of modeling and predicting student performance than the original KT code (Area Under Curve = 0.610 > 0.568), due to differences in how it estimates parameters.

Comparaison de l'impact de techniques de diagnostic des connaissances sur l'apprentissage d'une stratégie d'aide

Cet article est une traduction de l'article Comparing Student Models in Different Formalisms by P... more Cet article est une traduction de l'article Comparing Student Models in Different Formalisms by Predicting their Impact on Help Success publié dans les actes de la 16th International Conference on Artificial Intelligence in Education. Comparaison de l'impact de techniques de diagnostic des connaissances sur l'apprentissage d'une stratégie d'aide