RStats

A collection of statistical tests and utility functions in Ruby.

Copyright (c) 2006,2007 Josh Myer <josh@joshisanerd.com>

Released under the terms of the GPL v2 and v2 only (without the upgrade provision). See the file COPYING for more information.

Overview

RStats is a collection of approximate statistical tests and techniques. It's meant mostly as my notes while learning statistics, but I'm releasing it in the hopes that someone else might find it useful.

Most people will only be interested in RStats.chi_square_gof and RStats.chi_square_cont, Ruby implementations of the χ2 (chi-squared) goodness-of-fit test and the χ2 contingency/independence test.
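As a rough illustration of what a goodness-of-fit test computes, here is a minimal, self-contained sketch of the Pearson χ2 statistic. The helper name chi_square_statistic and the sample counts are made up for this example; this is not the RStats API, whose actual signatures are described in the function documentation.

```ruby
# Hypothetical helper for illustration only; NOT part of RStats.
# Computes the Pearson chi-squared statistic: sum of (O - E)^2 / E.
def chi_square_statistic(observed, expected)
  unless observed.length == expected.length
    raise ArgumentError, "observed and expected must have the same length"
  end

  observed.zip(expected).sum do |o, e|
    raise ArgumentError, "expected counts must be positive" unless e > 0
    (o - e)**2 / e.to_f
  end
end

# Made-up data: 60 observations spread over three equally likely categories.
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
df   = observed.length - 1 # degrees of freedom for a simple goodness-of-fit test

puts "chi-squared = #{stat}, df = #{df}"
```

The statistic is then compared against the χ2 distribution with df degrees of freedom to get a probability.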

Downloading

You can obtain a copy by git cloning this URL (right-click, copy URL, and git clone it). This will get you a complete copy of the repo, along with rough instructions on how to make your own purely-static distribution of the site/code. It may also be available at github:jbm9/rstats.

Usage

All the functions are documented, so the best approach is to read the documentation for the function you need. For examples, see the "unit tests" in test/. These are built from the questions at the end of each section of Langley's book.

Included Tests

The following tests are included in the current release:

References and Sources

Langley, Russell. Practical Statistics Simply Explained.
Dover Publications, Inc., New York, 1971.
ISBN 0486227294

Bass, Issa. A Zest of Non Parametric Testing - The Chi Square Test.
SixSigmaFirst, http://www.sixsigmafirst.com/chi_square_test.htm
Accessed 2006-12-20.

The primary reference source for this is Russell Langley's excellent Practical Statistics Simply Explained. It's a Dover print, available as ISBN 0-486-22729-4, for about US$13. The book is a gentle introduction to statistics, which makes it a good primer, but not a terribly good reference over the longer term. It focuses on approximate tests, which give reasonably good results but are easy to run by hand. It was also written before computers were commonly available, which makes many of the tests it describes somewhat obsolete.

Unfortunately, Langley's section on χ2 is messy. χ2 is used in many different circumstances, each with its own procedure. Langley presents these all in parallel, which is confusing. To clarify the uses of χ2 that are most common (for me, at least), I did some quick googling. Since it's such a common test, there's no shortage of tutorial sites on the web; feel free to find your own. I like Bass's A Zest of Non Parametric Testing - The Chi Square Test, because it's short and to the point.
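To show how the contingency/independence procedure differs from goodness-of-fit, here is a small sketch in which the expected counts are derived from the row and column totals of the table itself. The function name contingency_chi_square and the 2x2 table are invented for this example; it is not the RStats implementation and applies no continuity correction.

```ruby
# Hypothetical helper for illustration only; NOT part of RStats.
# table is an array of rows of observed counts, e.g. [[a, b], [c, d]].
# Returns [statistic, degrees_of_freedom].
def contingency_chi_square(table)
  row_totals  = table.map(&:sum)
  col_totals  = table.transpose.map(&:sum)
  grand_total = row_totals.sum.to_f

  stat = 0.0
  table.each_with_index do |row, i|
    row.each_with_index do |obs, j|
      # Expected count under independence: row total * column total / grand total.
      exp = row_totals[i] * col_totals[j] / grand_total
      stat += (obs - exp)**2 / exp
    end
  end

  df = (table.length - 1) * (table.first.length - 1)
  [stat, df]
end

# Made-up 2x2 table of observed counts.
table = [[10, 20],
         [20, 10]]

stat, df = contingency_chi_square(table)
puts "chi-squared = #{stat}, df = #{df}"
```

In the goodness-of-fit case the expected counts come from an outside hypothesis; here they come from the margins of the table, which is why the degrees of freedom are (rows - 1) * (columns - 1).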

Finally, the pochisq function is a direct translation of the C version available in the public domain. The original C was done by Gary Perlman of Wang Institute, Tyngsboro, MA 01879. With that in mind, the pochisq function may be freely adapted into the public domain (you'll need to pull in z_to_prob as well, which is a trivial function).
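For readers who want to see what a function like pochisq returns, here is a sketch of the upper-tail probability of the χ2 distribution for integer degrees of freedom. It uses the standard closed-form series together with Ruby's Math.erfc and Math.gamma; it is not a translation of Perlman's C and not the RStats code, and the name chi_square_upper_tail is made up for this example.

```ruby
# Hypothetical helper for illustration only; NOT RStats' pochisq.
# Returns P(X >= x) for a chi-squared variable X with integer df >= 1.
def chi_square_upper_tail(x, df)
  unless df.is_a?(Integer) && df >= 1
    raise ArgumentError, "df must be a positive integer"
  end
  return 1.0 if x <= 0

  a = x / 2.0
  if df.even?
    # Q = exp(-a) * sum_{k=0}^{df/2 - 1} a^k / k!
    term = 1.0
    sum  = 1.0
    (1...(df / 2)).each do |k|
      term *= a / k
      sum  += term
    end
    Math.exp(-a) * sum
  else
    # Q = erfc(sqrt(a)) + exp(-a) * sum_{k=1}^{(df-1)/2} a^(k - 1/2) / Gamma(k + 1/2)
    term = Math.sqrt(a) / Math.gamma(1.5) # the k = 1 term
    sum  = 0.0
    (1..((df - 1) / 2)).each do |k|
      sum  += term
      term *= a / (k + 0.5)
    end
    Math.erfc(Math.sqrt(a)) + Math.exp(-a) * sum
  end
end

# 3.841459 is the familiar 5% critical value for one degree of freedom.
puts chi_square_upper_tail(3.841459, 1)
```

This direct series is fine for modest values of x and df; a production routine would need more care with very large arguments.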

