SPSS Statistics for Data Analysis and Visualization
By Keith McCormick, Jesus Salcedo, Jon Peck and
About this ebook
SPSS Statistics for Data Analysis and Visualization goes beyond the basics of SPSS Statistics to show you advanced techniques that exploit the full capabilities of SPSS. The authors explain when and why to use each technique, and then walk you through the execution with a pragmatic, nuts and bolts example. Coverage includes extensive, in-depth discussion of advanced statistical techniques, data visualization, predictive analytics, and SPSS programming, including automation and integration with other languages like R and Python. You'll learn the best methods to power through an analysis, with more efficient, elegant, and accurate code.
IBM SPSS Statistics is complex: true mastery requires a deep understanding of statistical theory, the user interface, and programming. Most users don't encounter all of the methods SPSS offers, leaving many little-known modules undiscovered. This book walks you through tools you may have never noticed, and shows you how they can be used to streamline your workflow and enable you to produce more accurate results.
- Conduct a more efficient and accurate analysis
- Display complex relationships and create better visualizations
- Model complex interactions and master predictive analytics
- Integrate R and Python with SPSS Statistics for more efficient, more powerful code
These "hidden tools" can help you produce charts that simply wouldn't be possible any other way, and the support for other programming languages gives you better options for solving complex problems. If you're ready to take advantage of everything this powerful software package has to offer, SPSS Statistics for Data Analysis and Visualization is the expert-led training you need.
Foreword
In my various roles at SPSS and IBM I met Keith and Jesus many years ago. They both have over 20 years of statistical consulting experience, and they both have been training people on statistics and how to use SPSS for many years. Each has in fact trained thousands of students. They are uniquely qualified to bring the message and content of this book to you, and they have done so with rigor and grace. SPSS has so many techniques and procedures to perform both simple and complex analysis, and Keith and Jesus will introduce you to this rich tapestry so that it pays dividends in benefiting your endeavors in driving societal change based on data and analytics for years to come. This book goes beyond the elementary treatments found in most of the other books on SPSS Statistics but is written for users who do not necessarily have an advanced statistical background. It can make the reader a better analyst by expanding their toolkit to include powerful techniques that he or she might not otherwise consider but that can have a big payoff in increased insight.
Keith and Jesus’ outstanding new book on SPSS Statistics has brought back so many thoughts about this great product and the influence it has had on so many people that I thought I would briefly reminisce.
I first became involved with this software when I went to work for SPSS in 1995 as Director of Quality Assurance. A year earlier, SPSS had released its first Microsoft Windows product—which, while solid, did not really take advantage of the amazing possibilities a true graphical interface could provide. This was a huge and important time for the company as the SPSS team was hard at work revolutionizing both the front-end user interface and the output to create a standard that is still in place and considered best of breed today. These innovations enabled sophisticated pivot table output as well as much more customized graphical output than had ever been attempted before. Indeed, in the years to come it was that spirit of always getting ahead of every technological trend that would keep this software right in the heart of what the data analysis community demanded.
When I say the heart of the data analysis community I am not in any way exaggerating. This software has been used by hundreds of thousands of students in college and graduate school and by similar numbers in government and commercial environments worldwide. Over the years I have literally had hundreds, if not thousands, of people say to me “I used SPSS in college” when I introduced myself. And of course, I can’t leave out the bootleg copies I have seen in innumerable places during my travels and personally purchased on the streets of Santiago and Beijing.
Impressive? Absolutely. But of course the real question is … WHY is SPSS so heavily used and so well loved? WHY has its community of users stayed vibrant and loyal even eight years after the company itself was acquired by IBM?
The answer is the combination of power and simplicity combined with elegance. This is a big statement. To back this up—and apropos of the subject matter—I’ll contribute a data point as my best evidence. A few years ago, when I was still with IBM (which acquired SPSS in 2009), we hired a summer intern who had used our software for a semester in college. After about a month on the job, we debriefed her on the progress of her user interface design assignment. She discussed at length the challenges she was having coming up with a design that was up to the standard of the rest of the product in terms of simplicity, backed by immense power. This led to a discussion of the first time she used the product as a student. Of course, opening a “statistics” product for the first time filled this iPhone-using millennial with much trepidation; however, as she described to us, within just a few minutes she was loading and manipulating data, building predictive models, and producing output for her class. In just a short time beyond that she was digging into the depths of some of the power the product provided. Even a user nearly born and bred with the beautiful user designs of the smartphone consumer era was right at home using SPSS. What an amazing statement in and of itself. Think about it! This is made even more extraordinary because this same student had interactions with professors and researchers on her campus who were using—in fact, relying on—that very same product to do their cutting-edge work. As I said, the answer is the combination of power and simplicity combined with elegance.
This amazing simplicity does not come at the expense of power. As Keith and Jesus make clear in this book, SPSS Statistics is an incredibly powerful tool for data analysis and visualization. Even today there is no tool that works with its users of any level (novice, intermediate, or expert) to uncover meanings and relationships in data as powerfully as SPSS does. Further, once the data has been prepared, the models built, and the analysis done, there is no software available that is better at explaining the results to non-data analysts who have to act on it. This increases the value of the tool immeasurably—since it creates the understanding and confidence to deploy its insights into the real world to create real value. Having seen this done so many times, by so many people, in so many domains, I can say to those starting with this product for the first time that I truly envy you—you are about to start on a journey of learning and getting results that will amaze you—and the people you work with.
Let’s put this all in perspective. This product is now in its sixth decade of existence. That’s right—it first came out in the late 1960s. How many products can you name that have survived and prospered for that long? Not many. The Leica M camera and the Porsche 911 car with their classic timeless designs come to mind, but not much else. How many COMPUTER products? Even fewer; perhaps only the venerable IBM mainframe, in fact. But here we have IBM SPSS Statistics—not only surviving but still as relevant and vital as ever—right in the midst of the new age of big data and machine learning, heavily used by experts who dig deep into data and model building, but usable by novices in the iPhone era as well.
Now, let us switch our focus from celebrating the vibrancy and staying power of the SPSS journey and into the heart of what Keith and Jesus have addressed in this book. This is first and foremost a book for data analysis practitioners at intermediate and advanced levels. The question this begs is how this product can help that audience create the most value in the modern era.
Unlike the world of the late 1960s when SPSS was created, we now live in an age where there are many tools to do quick and fast analysis of datasets. For example, Tableau is a fine tool for more business-oriented users with less data analysis training to get immediate and useful visual insights from their data. So what then is the need for IBM SPSS Statistics in this new world?
To answer that question, let me take you back several years to a conference called “MinneAnalytics,” sponsored by a Minnesota-based organization of analytic professionals, where I delivered a presentation on Advanced Analytics called “What’s Your World View?”
In that presentation, I envisioned a rapidly approaching new age where “big data” would meet advanced analytic techniques running in real time and that combination would drive every decision-making aspect of how our society would work. I compared the importance of this movement to previous huge steps that changed the very foundation of society—including the invention of the automobile and the invention of assembly-line production for manufacturing many different types of goods.
Well, a mere three years later that “future” society is here already—right now. It is happening all around us. Analytics on big data is driving decision making and processes everywhere you look. Hospitals apply real-time analytics to data feeds from patient-monitoring instruments in intensive care units to message doctors automatically that their patient in the ICU will shortly take a turn for the worse. Firms managing trucking use analytics to intervene proactively when the system tells them one of their drivers is predicted to have an accident. Airplanes and cars apply real-time analytics to engine sensors to predict failure and inform the pilots and drivers to take action before such failure occurs. Indeed, big data analytics has become one of the most disruptive forces in business history and is unleashing new value creation quite literally wherever you look. All of these examples clearly show a fundamental point—quick visual understanding is one thing—but deep insight yielding confidence in a predictive model that is deployed in real time at critical decision points at vast scale is quite another. It is in this realm of confirmation and confidence that SPSS Statistics shines like no other.
Mass deployment of advanced analytics will create benefits for society that are for all intents and purposes unimaginable. Assuming, of course, that the deployed analytics are in fact correct (and with the right tweaking and trade-offs between accuracy and stability) and deployed properly. It is the almost unique benefit of SPSS that no matter what language those analytics are built in (SPSS, R, Python, supervised or unsupervised, standard or machine learning, executed programmatically or through visual interfaces, or any other variant you can think of) the product can be used to confirm confidence that the desired results will be achieved, and to understand the risks involved. It can also be used to explain the results to others in the enterprise, aligning those who need to be in the know on exactly and precisely how analytics drive their new business models. There is no better “hub” for data scientists to practice their craft and contribute their value to the creation of a new world—a new world of staggering rates of change guided or driven by data and analytics.
IBM SPSS Statistics is the perfect tool for this new world when used by well-trained analysts who can put all the data and all the insights together without mistakes to create the most value. People who can take the output of machine learning, add traditional data and then other new forms of data (like sensors and social media for example), to get insights well beyond those quick insights from Tableau and other surface-level tools. People who know how to use the advanced capabilities of the tool, such as the ability to do mixed model analysis of data at different levels (for example, within a hierarchy to find even deeper insights). Such a tool, in the hands of such people—well-trained data scientists—can drive us into this new remarkable world with both confidence and safety. To become one of those who drive this societal transformation using SPSS you can benefit from having this book as your guide.
Enjoy the book…and enjoy the next 50 years of IBM SPSS Statistics as well!
— Jason Verlen
Jason Verlen is currently Senior Vice President of Product Management and Marketing at CCC Information Services, based in Chicago. Before moving to CCC he spent 20 years at SPSS and then IBM (after its acquisition of SPSS) in various roles ending with being named Vice President of Big Data Analytics at IBM.
Introduction
This book is a collaboration between me (Keith) and several other career-long “SPSSers,” and the editorial decisions about what to cover, and how to cover it, are greatly affected by that fact. My own career took a turn down a road that led to a life of learning, teaching, and consulting about SPSS almost 20 years ago. I was contemplating a PhD in Psychometrics at the University of North Carolina, Chapel Hill. My plans didn’t get much further than auditing some prerequisites and establishing residency. So, on paper, I hadn’t made much progress, but moving 1000 miles (from Massachusetts) to relocate and purchasing a house represented a milestone in my life and career. I’m still in that same house (more than 22 years now), and I’m still using SPSS almost daily. Like many things in life, it seems almost accidental. I was doing contract statistics work using SPSS, working from home while I planned for a life in graduate school, and I drove up to Arlington, VA to take advantage of what SPSS training then called the “training subscription.”
The concept was to take as many classes as you can manage in a year. It was remarkably cost effective. I was able to convince my primary contract client to pay for the subscription under the condition that I covered all other expenses, and didn’t let it affect my deadlines. I already had several years of daily SPSS use under my belt, so I was hardly a rookie, but it was too good to pass up. I found a summer sublet in Washington, DC, took advantage of the training classes almost daily for a couple of months, learned all the latest features, learned about modules that I had never tried, made some good new friends, and worked late into the evening trying to keep my contract research work on schedule. Then suddenly I was asked if I wanted to relocate and take on teaching the basic classes in that same office. I declined the full-time position (the grad school idea was still alive), but I did start making occasional trips. Within a year they were frequent trips, and it became effectively full time, including training trips all over the United States and Canada.
A bit of nostalgia, perhaps, but there is a good reason to reflect on that time period in SPSS Inc.’s history. As Jason Verlen notes in his foreword to this book, the mid to late ’90s was a pivotal time in the development of SPSS. With Windows 95 came a whole new world, and SPSS Inc. leaped into the fray. Also, in the late ’90s, SPSS Inc. bought ISL, and with it, Clementine. The revolutionary software package then became SPSS Clementine, and is now called IBM SPSS Modeler. While this book is dedicated to SPSS Statistics and not SPSS Modeler, my career certainly was never quite the same since. Although that was the acquisition that most influenced my career, it was certainly not the only one. There were numerous acquisitions during that period, growing the SPSS family to include products like AMOS, SPSS Data Collection, and Showcase.
It was also a bit of a golden age in SPSS training. Almost 20 of us offered SPSS training frequently. On any given day, there were at least a couple of SPSS training events being held in one of several cities that had permanent full-time SPSS training facilities. Traveling to public training was common then—online training hadn’t yet arrived. It simply was how training was done. In light of this very active, live, corporate-managed, instructor-led training economy more than 30 distinct classes were offered that represented 50–60+ days of training content. It took me three years before I found myself teaching 80% of them, and even longer before I taught all of them. Classroom training was seen as a key way to support the user community, so even classes that were infrequent, and therefore not very profitable, were still scheduled to support the product. Everything changes over time, and certainly traveling cross-country to a corporate training center for 5 continuous days of training, with a stack of huge books, along with 16 strangers from other companies seems quaint now.
For all of us who experienced it as trainers and participants, however, we are forever changed. One of the things that always struck me, and that still knocks me off my feet, was that the 32 books we used were not enough! SPSS had so many great new features coming out with each new version that it was hard to keep up, even though we were in the classroom three-quarters of the time. The Arlington office frequently had another trainer teaching in a room next door, so we would have lunch together, and admit to each other that we had left ourselves with a few too many pages for day three. Day three! And that was just the Regression class! We’d sometimes lament that someone had shown up for a class, but had skipped one or more of the three prerequisites. Can you imagine? Seven days of prerequisites to take a training class! It just wouldn’t work to require that many days now, but we worked hard, and covered a lot of ground, and we went through all the software output, step by step. Then we would make a change to the model, or respond to an audience question, and go through the entire output again, step by step. Go ahead and admit it—if you are like us it probably sounds great. And it was.
My friend and coauthor Jesus Salcedo had a similar experience, and in those same classrooms. He also had an interest in psychometrics, except that he actually acted on his interest and earned his PhD. We met in the very busy New York City SPSS Training office when I was sent there as a contractor during his tenure. He was the full-time trainer in that office. We’d often chat about our favorite course guides (and least favorite) and became friends over an occasional shared meal in that Empire State Building office, or nearby in New York’s Koreatown neighborhood. So, the perspective that we both start with is that SPSS is a big topic, a worthy topic, and frankly, a sometimes intimidating topic. We still feel this way today. There is so much to learn that we struggle to keep up with everything new. At a consultancy where we worked for a time as a team, we put together a series of monthly seminars that proved to us again that there was always something new to learn. Each and every month, we discovered new features when we were preparing for a new topic. So tens of thousands of training hours later, we still learn something new all the time.
Of course, we aren’t asked to really show what we can do as often as we used to be. The reason, of course, is that training these days is rushed. We are often asked to cover two days’ worth of information in just one, or five days’ in just two, or ten days’ in just four. It happens all the time. We are pros, and we do as we are asked, but we know, we really know, that to do a proper job it takes more time. The book market is flooded with rookie SPSS books. The more advanced books tend to be more advanced in the theory, but not at all advanced in the practice of using SPSS, its efficient use, or the sophisticated use of its features. A major motivation in writing this book is the loss of organizational memory that has occurred since in-depth specialized SPSS training courses have started to disappear over the last ten years.
So, with this book, we get to call the shots, and what we are trying to offer all of you is a chance to learn some intermediate to advanced topics thoroughly enough that you will be tempted to use them yourself, very possibly for the first time. We don’t try to cover every topic—barely two dozen out of a hundred that we could have chosen, in fact. This is not at all encyclopedic. It certainly is also not a book-length treatment on a single subject. It gives you a taste of what attending one of our classes 15 years ago might have been like—a couple hours’ worth on each of several interesting, powerful topics that you might not even know existed.
The Audience for This Book
We think that this book fills an important niche. Books on the fundamentals of using SPSS Statistics are not in short supply. There are certainly dozens of them. Some are better than others. Naturally, we are proud of our own contribution: IBM SPSS Statistics For Dummies, 3rd Edition (Wiley, 2011). However, this book is certainly not a book about the fundamentals of setting up SPSS properly, or running routine statistics like T-tests or Chi-Square. Nor is this book a good choice for reviewing Statistics 101. Knowledge of topics like Ordinary Least Squares regression and ANOVA is assumed.
Since beginning the quest to contribute something we felt was new and needed for the SPSS Statistics community, Jesus Salcedo and I have consistently thought of the same audience. We have imagined the intermediate-level practitioner, perhaps relatively new or perhaps even a long-time user of SPSS, who is stuck in a rut. We imagine ourselves in a sense. If it wasn’t for our training careers, forcing us to learn the new features as soon as they come out, we probably wouldn’t be familiar with all of the techniques in this book. We use the shortcuts because we are active in the corporate community of SPSS, yet we meet veteran users all the time who don’t even know they exist. We have our own personal favorite techniques, tips, and tricks, but we know many users who know their theory very well, yet haven’t discovered a key feature that could make their analysis more effective, even though it’s been in the last 10 versions. I mention this specifically because it is a constant, even humorous, but telling exchange:
Wow, that is amazing. I’m so glad that they added that feature. It must be brand new.
Actually, we’ve had that since version X. It’s been around for about 8 years.
So the phrase “spread their wings a bit” has been used between us since the early days of this book. We’ve been writing for the kind of SPSS user that we’ve met in class over many years: the kind who might know more about SPSS than their colleagues or their boss; the kind who knows the logistics of SPSS pretty darn well, but sometimes gets frustrated knowing that there is another way to tackle a problem, but there is no time to research that right now; and the kind who wants to know that there are a few interesting professional development opportunities out there, but isn’t quite sure which ones to explore, and there never seems to be much of a training budget to pursue them.
We are exploring this “expand your knowledge” theme rather broadly, including topics like Hierarchical Linear Modeling, but also techniques like Graphics Production Language. This is firmly a software book. We assume that you use SPSS, and you are interested in all aspects of it. In short, we assume that you use it fairly often, and you want to slowly and surely work toward being a “power user” even if you don’t describe yourself that way today.
How This Book Is Organized
The book is organized into 18 chapters in 4 major parts. In addition to this organizational overview, there is a short introduction at the start of each part that discusses the specific techniques covered in each chapter, and whether the techniques are in SPSS Base or require one of the modules. We aren’t shy about showing you a feature in one of the modules when appropriate, and there is a thorough discussion of bundles and modules near the end of this introduction.
Each of this book’s four parts has a collection of techniques that fits a particular theme, and each part begins with an introduction that summarizes how the pieces fit together. It will always be helpful to take a quick look at these introductions before diving into a chapter within a part of the book because they will clarify why the chapters are sequenced the way they are, and anything that you should know about prerequisites. Cross referencing within parts will be more common than between parts, but you’ll find advice will be given about chapters found in other parts as well.
The four major parts are as follows:
Part I: Advanced Statistics: In this section, we focus on statistical techniques that you can turn to either when more traditional or more common techniques might pose problems, or when you face situations where there is a more sophisticated option awaiting you if you are willing to try it. So we tackle options like Structural Equation Modeling, but also options like Bootstrapping.
Part II: Data Visualization: We have not restricted ourselves to how to make bar charts and pie charts in this section. Frankly, those topics would not deserve a major section of this length, wouldn’t be all that interesting to experienced users, and would belong in a different kind of book. In this section, we bring the full power of SPSS to bear on data. We believe that Data Visualization includes properly analyzing and prepping the data to facilitate visualization so techniques like Correspondence Analysis are fair game. Also, we cover advanced features that you may not be familiar with, and brand new features in the most recent versions like GeoSpatial Association Rules.
Part III: Predictive Analytics: Predictive analytics and data mining are more associated with SPSS Modeler than SPSS Statistics. Can SPSS Statistics be an effective option? We answer in the affirmative, walk through the differences between statistics and data mining, and introduce some algorithms available in SPSS Statistics.
Part IV: Syntax, Data Management, and Programmability: Thirty years ago, an SPSS user could not escape learning SPSS Syntax, but now you can. Why tackle SPSS Syntax? What features have been added both to the language and the menus to aid in the logistics of SPSS? What is programmability, and how can you take advantage of its features without having to become a serious programmer? These are the questions we answer in this final part of the book.
You may be curious about the authors’ various contributions. Jesus’ presence was felt in each and every chapter, but he was lead author on Chapters 3, 4, 7, 10, 12, and 17. Jesus collaborated extensively with Keith on Chapters 2 and 5. Andrew contributed Chapter 8, and Jon contributed Chapter 18. Jon’s contribution goes far beyond a single chapter in that his knowledge and role of technical reviewer had a positive impact on the entire book. In addition to the chapters on which he was lead, Keith wrote the front matter and the book and part introductions. Keith will serve as primary contact for the authors and can be reached at keithmccormick.com.
How to Use This Book
All of the examples in this book come with practice datasets, and when necessary, supporting SPSS Syntax. This is a hands-on book. You can read it on a plane or during a commute, but at some point you will want to sit down at the computer and try these techniques. All of the chapters are hands-on in this way, and chapters are rarely a prerequisite for other chapters.
All practice datasets and supporting SPSS Syntax are available on this book’s webpage on Wiley.com. Go to http://www.wiley.com/WileyCDA/ and search for SPSS Statistics.
On the page of results that opens, select this book, then, on the book’s main webpage, locate the “Downloads” section and click the “Click to Download” link.
There are a couple of notable exceptions to the rule that chapters can be read independently. You will always want to read the short introduction of each of the four major parts of the book before reading chapters in that part. Chapters 5 and 6 are a pair and are best read together, and in order. The opening chapter of Part III, “Predictive Analytics,” Chapter 11, should be read before the others in Part III, especially if you are new to data mining. The opening chapter of Part IV should be read before the others if you are new to Syntax.
If you don’t have a module, but the chapter looks interesting, think about taking advantage of the software trial. Trial versions always have the complete complement of modules. AMOS is a little different. It is standalone sibling software that belongs to the SPSS family, but is not part of SPSS Statistics, per se. You can get a trial of it as well. You may want to read a number of chapters in anticipation of using the trial to make the best use of the time period. A popular way to learn SPSS, and get more time to have access to it, is to take a class. Many classes, both online and local to you, probably would allow you to use a student version. The combination of this technique and the trials should allow you to try everything that you read about in the book.
The Themes of the Book
Alternate strategies when statistical assumptions are not met and the ongoing debates surrounding p values
The debates between the frequentists and the Bayesians, traditionalists and data miners, proponents of p values and proponents of effect sizes, can be fascinating, but they can also be frustrating. It is frustrating to have mastered one approach but not the alternative, and just as frustrating to explore other options when your colleagues are not encouraging. This book is not about these debates, but it is about options. The discussions about options will sometimes make it seem that we are entering the fray. Mostly, however, we want to show you that SPSS may offer alternatives that you have not yet mastered. Specifically, when we think that the traditional approach may fail you because you don’t meet the assumptions, there are at least three other options to explore:
Use a technique with different assumptions.
Use a technique that doesn’t have classical assumptions.
Use additional or alternate reporting criteria.
We won’t review the traditional approaches all that much, and we largely assume them (Chapter 1 may be a bit of an exception, so please do read that chapter first). We do try to open up completely new avenues. For example, while we don’t discuss Bayesian approaches, we do try to open the door to new approaches by introducing Bootstrapping and Monte Carlo Simulation. Also, in a very real sense the section on Predictive Analytics will force you to reexamine to a degree what we are doing when we do hypothesis testing. There is a whole literature around these debates and we will occasionally mention books in the text to further pursue these topics.
Expanding the toolkit for data visualization in SPSS Statistics, broadening the notion of what effective visualization is
SPSS users are somewhat notorious for performing analysis in SPSS, but then reporting and charting elsewhere, usually in Excel. Those of us who use SPSS every day are frankly somewhat bemused by this. SPSS gets better and better with each release, and we gave up this kind of patchwork approach in the ’90s. However, we are also trainers. We see lots of end users, and we understand why it seems like a good idea. More rarely, we see situations where something within SPSS truly isn’t working out for a client. It is not misplaced loyalty on our part to encourage a more comprehensive use of SPSS. We’ve seen the effort wasted by constantly moving back and forth between tools, often by cutting and pasting.
This was a major motivation for dedicating such a large portion of the book to visualization. SPSS has tremendous power that many have not yet discovered. Also, we strongly believe that visualization is not just about colors and shapes. Data has to be prepared to support visualization, and that often requires distilling the patterns down to their essence so that they can be visualized. That is why we believe that Chapters 8, 9, and 10—which are all powerful examples of analysis in support of visualization—belong in this book, and specifically in the visualization section. Correspondence Analysis, Multi-Dimensional Scaling, Spatio-Temporal Prediction, and Generalized Spatial Association Rules (all addressed in these three chapters) produce compelling visualizations but they do so by crunching the input data in powerful ways.
Exploring predictive analytics and performing predictive analytics tasks in SPSS Statistics
Data mining, as a phrase, seems a bit out of fashion these days, but the collection of techniques the phrase represents is on the rise. “Data mining,” however, probably is the phrase that makes most salient the potential contrast between itself and the techniques of traditional statistics. The similarities are fairly obvious, and to some, the differences can cause concern. “What are we proving with data mining?” they might ask. It is not a small question. We dedicated the entire “Predictive Analytics” section of five chapters to this theme. Also, the section introduction is very much a part of the discussion. Taken together, these chapters come the closest to forming a book within a book.
Increasing sophistication with the mechanics of SPSS Statistics
Power users of SPSS all use SPSS Syntax, at least occasionally. Back in the ’90s, when the lead authors were really getting started in SPSS, there was a bit of tension between those who used Syntax and those who used only the GUI. As Jason Verlen points out in his foreword to this book, 1995 was a critical and exciting time of transition for SPSS. The GUI was becoming more feature rich than ever before. However, those who already had a great deal of experience recognized that the GUI was only catching up, and it sometimes seemed to them more trouble than it was worth. For a brief time it seemed like the SPSS community was going to become two communities. This never happened. Everyone uses the GUI, and rightly so. It is powerful and elegant. It is hard to imagine not using it. So what about Syntax? Well, SPSS doesn’t force competence in this area as much anymore. But to the expert user, there are absolutely times when it is the best choice.
Experiencing some newer or under-appreciated techniques of SPSS
The module (and bundle) system of features has tended to create a large collection of third-party SPSS training guides that focus solely on SPSS Base. The reason, we speculate, is that the authors of those books don’t want to cover anything that some readers might not have access to. Such guides are truly numerous, and scores of books cover the basics. As career-long members of the SPSS community and as SPSS trainers, we’ve seen the resources on more advanced techniques dwindle, and related courses are rarely offered. These are truly powerful techniques, and they deserve a wider audience. We feel that more advanced users need a support system, too.
We want to reverse that trend to the small degree that one book can. The bundle system makes many of these modules more readily available, so much so that we frequently meet clients who have modules that they don’t know they have. Five of the chapters include material that requires nothing more than SPSS Base; most chapters, however, require at least one add-on module. The alignment of the chapters, modules, bundles, and techniques is outlined in the next section. So while readers should be cautioned to investigate what they have access to at home and at the office, we urge a wider audience of users to become familiar with the full spectrum of what SPSS can do.
Understanding the SPSS Bundles and the SPSS Modules
For decades within the SPSS community, add-on modules have allowed SPSS Base to be priced as a lower entry point than the full package. IBM has introduced bundles of modules, and as a result, you might hear less and less about the individual modules. This could cause confusion if one person who is used to the old system is discussing functionality with someone who has just bought a bundle. There are numerous places on the IBM website to get further clarification, including this URL: http://www-01.ibm.com/software/analytics/spss/products/statistics/edition-comparison.html.
The following chart shows the relevance of the modules to the topics and chapters in this book.
The New SPSS Subscription Bundles
As this book goes to press in early 2017, IBM has announced an SPSS Statistics subscription offering. It is paid monthly, and among its features are that it is easy to update and easy to add capabilities like those discussed in this book. The numbered versions do not go away, and the bundles described in the previous table do not go away. This is just a new option. Noteworthy to readers of this book is that two modules, Data Preparation and Bootstrapping, which each get a dedicated chapter, are included as part of Base.
The following chart shows where in this book the subscription and add-ons are discussed.