Mind Games 2.0

Bloggin' 'bout science and life

The Best Statistical Software For A Scientist To Use

I am now planning the next offering of a Generalized Linear Mixed Models course that I sometimes teach to our graduate students; it runs next spring. All our graduate students are clamoring for a course in R, and I am sure I’ll get a lot of pressure to teach this one using R.

I have found a fantastic textbook for this course: Generalized Linear Mixed Models: Modern Concepts, Methods, and Applications by Walter W. Stroup, a statistician at the University of Nebraska, Lincoln. The textbook uses the GLIMMIX procedure of SAS exclusively. I’ve used SAS since I was an undergraduate, and in my opinion it is the gold standard of statistical computing software. However, students don’t want to learn it anymore because it is not FREE. R is free.

The best comment about R vs. SAS I have ever heard came from Joe Travis at Florida State, who was repeating something Charlie Baer at the University of Florida had told him. According to Joe, Charlie said:

If R gives you the wrong number, some guy in his basement says, “oops.”

If SAS gives you a wrong number, statisticians in Cary, North Carolina get fired.

Who would you trust?

I also always think of the old adage that you get what you pay for.  

6 Comments

  1. Matt Z

    Everyone uses R, not just because it is free, but because it is flexible and transparent. It is becoming the standard, in part, because of ease of collaboration and, in part, because of ease of replication. Researchers can send their data and R scripts to anyone who asks and that person should be able to replicate the analysis given the data. It is much harder with proprietary software, like SAS, that fewer and fewer researchers use. If R gives you the wrong number, you can find it in the code.

    Richard McElreath’s “Statistical Rethinking” is a great textbook that covers GLMMs and has examples in R.

    • Certainly, Matt. But I have yet to find one person who has ever validated an R package they downloaded. I don’t trust anyone that much. If I have to validate everyone else’s code, I’d rather just write it myself. I think R is fantastic if you are a seasoned statistician with years of training under the hood, but that’s not the person you are describing. And it’s just as easy for me to circulate SAS scripts as it is R scripts.

      • Anon

        How does the average R user know that a wrong number was produced? I doubt most users have the mathematical/computational sophistication to find and correct these issues. Certainly, one could argue that everyone’s level of mathematical/computational sophistication should be raised, but I am skeptical that the vast majority of biologists have the desire/ability/time to reach the level of mastery needed to evaluate R code and output routinely and adequately.

        I doubt that a student who has taken a course or two in statistics and used R in their labs could convince me that they have sufficient math/computational skills to analyze data appropriately with R most of the time, just as I would have a hard time believing that an average student who has taken a course or two in botany and mycology would not suffer serious health issues if they lived only off the plants and fungi they collect in the wild. This is not to say that I expect everyone to be perfect; rather, I see more opportunities for problems to arise when an average user relies on R rather than SAS.

        Often, but not always, when I hear about students “using” R, they are either 1) lifting code from someone without evaluating it or knowing what it is really doing, or 2) having someone else write and run the code for them, leaving the student to interpret the output without knowing critical details (e.g., the approach used to partition sums of squares) that are essential to interpreting it correctly. The latter is not necessarily a problem if the person providing the code helps with the interpretation, but we should be honest that not all scientists are really learning and using R. Collaboration is great, but we should make sure that all students understand how to complete rather rudimentary analyses on their own and then seek collaborators for the more sophisticated work when necessary.

        As Mark indicates, people can easily replicate results with SAS, share code, and be transparent about analyses. SAS manuals are so much clearer and better referenced than R documentation, and are therefore more transparent to me.

        I do see that R offers greater flexibility, but I question whether most scientists need that level of flexibility. Some people certainly need it, but not everyone does. Who would argue that everyone needs to be a master chef and invent new dishes when there are lots of great things people can make with existing recipes?

        If you don’t need the flexibility, I see no advantage to R over SAS. R used to be free while SAS was not, but that is no longer true: for non-commercial use, SAS lets users run statistical analyses free of charge through its SAS University Edition and SAS OnDemand for Academics platforms.

  2. Brian McGill

    I strongly agree that R packages are caveat emptor. The reputation of the authors is critical. I have high confidence in the core statistics that ship with R, or in the nlme and lme4 mixed-model packages, which were written by prominent computational statisticians who have put their reputations behind them (ecology’s own Ben Bolker for the latter). Ditto for the vegan package (Jari Oksanen).

    But one of the reasons everybody thinks R is so great is the literally hundreds, probably thousands now, of R packages. And I can tell you for a fact that I’ve downloaded several packages, taken the time to validate them against known quality software, and found them full of bugs. I know of one package (which I’ve seen used in papers during peer review) that spit out utter nonsense: for N > 10000 it started hitting floating-point overflow errors that were ignored, so it was effectively a fancy random number generator. I notified the author of the bug and was told their paper was about to be published, so it “was a bad time” to fix it. Five years later it is still not fixed, and I still hear of people downloading it. And for those of you who think being on CRAN means something, all it means is that the package does what the author says it should; there is no independent validation or peer review.

  3. More tales of Woe & Intrigue from Dan Bolnick, a friend at the University of Texas, Austin.

    http://ecoevoevoeco.blogspot.ca/2016/12/wrong-lot.html?m=1
