We describe an approach to creating interactive and animated graphical displays using R's graphics engine and Scalable Vector Graphics (SVG), an XML vocabulary for describing two-dimensional graphical displays. We use the svg() graphics device in R and then post-process the resulting XML documents. The post-processing identifies the elements in the SVG that correspond to the different components of the graphical display, e.g., points, axes, labels, lines. One can then annotate these elements to add interactivity and animation effects. One can also use JavaScript to provide dynamic interactive effects to the plot, enabling rich user interactions and compelling visualizations. The resulting SVG documents can be embedded within HTML documents and can involve JavaScript code that integrates the SVG and HTML objects. The functionality is provided via the SVGAnnotation package and makes static plots generated via R graphics functions available as stand-alone, interactive and animated plots for the Web and other venues.
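The post-processing idea can be illustrated with a minimal sketch. It is written in Python rather than R and does not use the SVGAnnotation package or reproduce its API. The input is a hand-written stand-in document, not real output from R's svg() device, whose markup is structured differently. The function name `annotate_points`, the labels, and the choice of a `<title>` tooltip plus hover handlers are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize with SVG as the default namespace (no "ns0:" prefixes).
ET.register_namespace("", SVG_NS)

# Hand-written stand-in for a scatter plot; NOT real output from R's svg() device.
SAMPLE_SVG = f"""<svg xmlns="{SVG_NS}" width="200" height="200">
  <circle cx="40" cy="150" r="4"/>
  <circle cx="100" cy="90" r="4"/>
  <circle cx="160" cy="30" r="4"/>
</svg>"""


def annotate_points(svg_text, labels):
    """Attach a tooltip (<title>) and a hover highlight to each point element.

    Hypothetical helper: it assumes points are <circle> elements that are
    direct children of the root, which holds only for the stand-in above.
    """
    root = ET.fromstring(svg_text)
    points = root.findall(f"{{{SVG_NS}}}circle")
    if len(points) != len(labels):
        raise ValueError("need exactly one label per point")
    for point, label in zip(points, labels):
        # A <title> child is shown as a tooltip by most SVG viewers.
        title = ET.SubElement(point, f"{{{SVG_NS}}}title")
        title.text = label
        # Inline JavaScript handlers give a simple interactive effect.
        point.set("onmouseover", "this.setAttribute('fill', 'red')")
        point.set("onmouseout", "this.removeAttribute('fill')")
    return ET.tostring(root, encoding="unicode")


annotated = annotate_points(SAMPLE_SVG, ["obs 1", "obs 2", "obs 3"])
print(annotated)
```

The annotated string can be saved as an .svg file or embedded in an HTML page. With a real plot, the harder step is the one the abstract emphasizes: identifying which SVG elements correspond to points, axes, and labels.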
Dynamic documents that combine text and code, which is evaluated to dynamically create content when the document is “rendered,” for example, Sweave, are a large step forward in reproducible data analysis and computation. However, to capture the research process, we need richer paradigms and infrastructure. The process includes all the investigations and computations, and not just the final reported ones, and the entirety represents reproducible research. In addition to richer paradigms for reproducibility, we want to be able to capture more complex aspects of the computational process, such as the use of multiple languages, and also engage different communities using other programming languages so that reproducible computations and research become more widespread. We also need to integrate existing and future approaches with commonly used tools such as Microsoft Word and make the resulting documents richer for authors and readers. We present two approaches to structured, dynamic documents that use modern, ubiquitous standard technologies (XML) and provide extensible infrastructure for richer documents. The first integrates R and Microsoft Word for use by a broader audience and provides some innovations in this interface, and the second uses eXtensible Stylesheet Language (XSL) and R to provide a flexible and extensible infrastructure for richer, more accessible dynamic documents.
Recently, there has been a lot of discussion about what a statistics curriculum should contain, and which elements are important for different types of students. For the most part, attention has been understandably focused on the introductory statistics course. This course serves thousands of students who take only one statistics course. In the United States, the course typically fulfills a general education requirement of the university or a degree program. There has also been considerable activity regarding the use of computers to present statistical concepts and to leverage the Web and course management software to interact with students. Recently, there has been debate as to whether statisticians should make ambitious changes using resampling, the bootstrap, and simulation in place of the more traditional mathematical topics that are seen as the fundamentals or origins of the field (Cobb, 2007). It is unclear that we are achieving the goals of basic statistical literacy by focusing on formulae or even by concentrating almost exclusively on methodology. Instead, we believe the field and students would be significantly better served by showing the challenges and applicability of statistics to everyday life, policy, and scientific decision making in many contexts, and by teaching students how to think statistically and creatively. In contrast to the activity at the introductory level, there has been much less attention paid to updating the statistics curricula for other categories of students. While smaller in number, these students (undergraduate majors and minors, master's, and doctoral students) are very important, as they are the ones who will use statistics to further the field and improve the quality of research.
Other disciplines (e.g., biology, geography, and political and social sciences) are increasingly appreciating the importance of statistics and including statistical material in their curricula. Further, statistics has become a broader subject and field. However, the statistics curricula at these levels have not changed much past the introductory courses. Students taking courses for just 2 years may not see any modern statistical methods, leading them to a view that the important statistical ideas have all been developed. More importantly, few students will see how these methods are really used, and even fewer will know at the end of their studies what a statistician actually does. This is because statisticians very rarely attempt to teach this; instead, they labor over the details of various methodologies. The statistics curricula are based on presenting an intellectual infrastructure in order to understand the statistical method. This has significant consequences for improved quantitative literacy. As the practice of science and statistics research continues to change, its perspective and attitudes must also change so as to realize the field's potential and maximize the important influence that statistical thinking has on scientific endeavors. To a large extent, this means learning from the past and challenging the status quo. Instead of teaching the same concepts with varying degrees of mathematical rigor, statisticians need to address what is missing from the curricula. In our work, we look at what statistics students might do and how statistics programs could change to allow graduates to attain their potential.
Karen Kafadar is the 1998 Chair of the Statistical Computing Section. In her column she considers the Section Charter and the opportunities it represents.
It is increasingly clear that computing is becoming an essential skill for statisticians and anybody working with data. Computing is as important as mathematics in both statistical practice and research, yet it occupies a tiny portion of our curricula. We have an obligation to reform our upper-division and graduate curricula and integrate computing. We need to change our view of the role of computing in our programs, and teach computational fundamentals and reasoning, rather than ad hoc "tricks" or templates. Furthermore, we must broaden our notion of "statistical computing" to teach modern data technologies. The needs of statistical computing are different from those of computer science, and we must teach this increasingly diverse topic within the statistics curricula. This requires us to fit more into our curricula and also requires many of us to learn this material. Computing is important in its own right but can also greatly improve how students learn the traditional material and introduce them to a different aspect of statistics.
Significant efforts have been made to overhaul the introductory statistics courses by placing greater emphasis on statistical thinking and literacy and less on rules, methods and procedures. We advocate broadening and increasing this effort to all levels of students and, importantly, using topical, interesting, substantive problems that come from the actual practice of statistics. We want students to understand the thought process of the “masters” in context, seeing their choices, different approaches and explorations. Similar to Open Source software, we think it is vital that the work of the community of researchers is accessible to the community of educators so that students can experience statistical applications and learn how to approach analyses themselves. We describe a mechanism by which one can collect all aspects or fragments of an analysis or simulation into a “document” so that the computations and results are reproducible, reusable and amenable to extensions. These documents contain various pieces of information (e.g., text, code, data, exploration paths) and can be processed to create regular descriptive papers in various formats (e.g., PDF, HTML), as well as acting as a database of the analysis which we can explore in rich new ways. Researchers, instructors and readers can control the various steps in the processing and rendering of the document. For example, this type of document supports interactive components with which a student can easily control and alter the inputs to the computations in a semi-guided fashion, gradually delve deeper into the details, and go on to her own free-form analysis. Our implementation of this system is based on widely used and standardized frameworks and readily supports multiple programming languages. It is also highly extensible, which allows adaptation and future developments.
Statistical computing is part of a more general process, which can be called computing with data. Besides traditional statistical analysis, this involves acquiring, organizing, and visualizing data, often in large, structured datasets organized in database management systems and used for purposes beyond analysis. An important challenge for statistical computing (and statistics in general) is to increase the scope of our involvement in this diverse environment. At the same time, the computing environment itself is becoming more diverse in all respects: data and users are widely spread and using many different systems.
Papers by Duncan Lang