The Best Statistics Software
Colleagues who are active in the American Association for Public Opinion Research (AAPOR) recently held an online discussion about their experiences with, and recommendations for, survey analysis and statistics software. Many noted that traditional commercial software is getting extremely expensive, but that newer options do not always have the same capabilities that many of us have come to rely on.
Here is a summary of the pros and cons that were discussed in this forum regarding four software options in particular: SPSS, SAS, Stata, and R.
SPSS is particularly valued for manipulating and managing data. Many use this package for up-front data work such as cleaning, coding, labeling, merging, and so on, and then move their data into other programs for advanced analysis. Older versions of SPSS work just fine, but they can be difficult to move onto newer operating systems because SPSS provides little, if any, support for them. In terms of analysis, the basic statistics are good, but SPSS does not offer advanced statistical procedures unless you pay extra, and there are some things it cannot handle at all (like replicate weighting).
R is an extremely powerful tool with sophisticated capabilities, and it is free. But the learning curve is steep, requiring a significant investment of time before you can begin to do even the simplest work. One downside is that using specific capabilities (like simple weighting for surveys) can be frustrating. That’s because R is open-source software, and specific procedures are usually developed within user-created packages. On the positive side, many of the packages are exceptionally good, and the people creating the most popular packages tend to be among the world’s best statisticians. R uses scripts rather than menus, and coding is simple and elegant. It is awesome for statistical analysis, but not all that useful for up-front data work like cleaning, coding, labeling, merging, and so on.
SAS has powerful capabilities and is known for handling extremely large datasets and interfacing with databases more efficiently than other programs. It involves complicated coding (not nearly as elegant as R), but because it was the most powerful system during the last couple of decades, companies invested a lot in their SAS code, and that “legacy coding” continues to work well for them. The program is very expensive, which makes it impractical for small groups.
Stata is an elegant script-based tool that has robust statistical capabilities, and over the last fifteen years many quantitative researchers in economics, sociology, demography, and political science have moved away from SPSS to Stata. Usage may be shifting again as many academics begin learning (and falling in love with) R.
That is just a quick overview of some comparisons from a small professional audience, but it aligns with our experiences at Versta. If you are interested in a more detailed comparison of these and several other statistical packages, Professor Alan Zaslavsky at Harvard Medical School has a useful website that compares software options for statistical analysis. Also worth a look is an interesting and always-up-to-date website put together by Robert Muenchen that tracks usage trends for statistical software.