Great illustration of the power and flexibility of Ruby! Another, albeit less illustrative, approach would be to use the ruby-gsl library, which wraps the GNU Scientific Library (GSL). GSL is a numerical library written in C that is used in a wide range of scientific computing tasks. http://rb-gsl.rubyforge.org/

rweald

@Sameer Thanks for your feedback. I looked at ruby-gsl but decided against it for this post. My goal was to illustrate the underlying math and show how little Ruby is needed to perform this basic statistical method. If I were deploying a large-scale production system, I would definitely consider a C-based approach. The GPL license of GSL does make it somewhat problematic, though.

Tony Arkles

What did you use to plot the final plot? Is that an R plot or a ruby plot? I just implemented this (plus confidence intervals on alpha and beta) the other day, but haven’t gotten around to visualizing it from Ruby yet.

rweald

I used R to generate the graph, ggplot2 to be specific. I wish there were a Ruby library that provided rich graphing capabilities, but I have been disappointed with most of the ones I have found.

http://jonathanclemons.com Jonathan Clemons

Slick work! Will be nice to use this and Sameer’s suggestion of GSL in custom dashboards!

To echo Tony, I too am curious what you used to plot the final plot.

Tony Arkles

@Sameer: it’s also important to note that GSL is GPL-licensed and they make it really clear that any application that uses GSL must be GPL-licensed as well. That rules it out for a number of potential users.

Marco Falcioni

I think it would be best to coerce the data type to Decimal. Computing averages like you do above is bound to run into rounding errors. For large data sets “sum” will eventually become much larger than “value”, and you lose precision.
It’s neat to have “simple” implementations of mathematical formulas, but floating point math is tricky.
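
A minimal sketch of the effect described here, assuming a plain left-to-right Float accumulation like the one in the post. The method names `float_mean` and `decimal_mean` and the sample values are hypothetical, picked so that the running sum dwarfs the small values; `BigDecimal` from Ruby's standard library stands in for the "Decimal" type mentioned above.

```ruby
require "bigdecimal"

# Mean computed with ordinary Float addition (hypothetical helper).
def float_mean(values)
  values.reduce(0.0) { |sum, x| sum + x } / values.size
end

# Same computation, but each value is converted to BigDecimal first
# so the additions are carried out in decimal arithmetic.
def decimal_mean(values)
  total = values.reduce(BigDecimal("0")) { |sum, x| sum + BigDecimal(x.to_s) }
  total / values.size
end

# Near 1.0e16 adjacent Floats are 2 apart, so adding 1.0 is rounded away.
values = [1.0e16, 1.0, 1.0, -1.0e16]

float_result   = float_mean(values)          # 0.0 (both 1.0 values are lost)
decimal_result = decimal_mean(values).to_f   # 0.5 (the true mean)
```

The trade-off is speed: BigDecimal arithmetic is considerably slower than Float, which is one reason to keep Float for a teaching example.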

rweald

Marco Falcioni, thanks for your feedback. I thought about the risk with large values but decided that, for this particular post, I wanted to keep the code as simple as possible to help people understand the math. In a production setting I would work with a big number library to ensure precision isn’t lost. This code is meant more as a basic example than a fully featured library, but perhaps I should have included a warning about the risk with large numbers. I appreciate you pointing this out.

http://citizen428.net Michael Kohl

In “mean”, “total = values.reduce(0) { |sum, x| x + sum }” is the same as the much shorter “total = values.reduce(0, :+)”. Personally I’d just reduce the whole method body to “values.reduce(0, :+).to_f / values.size”.
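
For comparison, a sketch of both forms side by side. Only the `total = ...` line is quoted from the post; the rest of `mean_long` (the division step) and both method names are assumptions made for illustration.

```ruby
# Long form: block-based reduce, as quoted above. The final division is an
# assumed reconstruction of the rest of the method.
def mean_long(values)
  total = values.reduce(0) { |sum, x| x + sum }
  total.to_f / values.size
end

# Short form: reduce with an initial value and the :+ symbol.
def mean_short(values)
  values.reduce(0, :+).to_f / values.size
end

sample = [1, 2, 3, 4]
long_result  = mean_long(sample)   # 2.5
short_result = mean_short(sample)  # 2.5
```

Both versions return NaN for an empty array, since 0.0 / 0 is NaN in Ruby, so a caller may want to guard against that case.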

It describes the pros and cons of three Ruby/R interfaces (RinRuby, RSRuby, and RServe) and illustrates with a text classification problem.

Probably not a solution for very large data sets, but for the non-trivial stuff where you don’t want to have to dump out of Ruby but you also don’t want to hand-code, could come in handy.

Non Plus

Why not just write this in a different language? I notice you refer to methods as functions in several cases. This seems like a square peg in a round hole. Classes are superfluous when you’re just using conditions and Enumerable methods, and even if you decided to ignore that aspect of Ruby (forced OOP), then you’d still need to contend with the dismal performance of Ruby’s numerics, lack of native threads, and so on.

Jim

I’d like to see a post that shows an example of when Float is not appropriate, and alternatives that can be used. That would be a great follow-up blog post.
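
One alternative that stays within Float is compensated summation (the Kahan-Babuska / Neumaier scheme), which keeps a separate running total of the rounding error dropped at each addition. A rough sketch follows; `compensated_sum` is a hypothetical helper name, not something from the post.

```ruby
# Neumaier's variant of Kahan summation: the low-order bits that each
# Float addition discards are collected in a compensation term.
def compensated_sum(values)
  sum = 0.0
  compensation = 0.0
  values.each do |x|
    t = sum + x
    if sum.abs >= x.abs
      compensation += (sum - t) + x   # low-order bits of x were lost
    else
      compensation += (x - t) + sum   # low-order bits of sum were lost
    end
    sum = t
  end
  sum + compensation
end

values = [1.0e16, 1.0, 1.0, -1.0e16]

naive       = values.reduce(0.0) { |acc, x| acc + x }  # 0.0
compensated = compensated_sum(values)                  # 2.0
```

As far as I know, Ruby's built-in `Array#sum` (2.4 and later) applies a similar compensation when the elements are Floats, so it may already be more accurate than `reduce(:+)`.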

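/**
 * Compute linear least squares regression line.
 *
 * @link http://en.wikipedia.org/wiki/Linear_least_squares#Computation
 * @access public
 * @static
 * @param array $y An array of y values.
 * @param array $x An array of x values or y-keys if not specified.
 * @return array(b,m) for the equation y = mx + b, or false if fewer than two points are given
 */
function linest($y,$x=null) {
    $x = ($x===null) ? array_keys($y) : array_values($x);
    $y = array_values($y);
    $n = count($y);
    if( $n < 2 ) {
        return false;
    }

    // The computation of $m and $b was missing from this snippet. The lines
    // below are an assumed reconstruction using the standard least-squares sums.
    $sx = array_sum($x);
    $sy = array_sum($y);
    $sxx = 0;
    $sxy = 0;
    for( $i = 0; $i < $n; $i++ ) {
        $sxx += $x[$i] * $x[$i];
        $sxy += $x[$i] * $y[$i];
    }
    $d = $n * $sxx - $sx * $sx;
    if( $d == 0 ) {
        return false; // all x values are identical, so the slope is undefined
    }
    $m = ($n * $sxy - $sx * $sy) / $d; // slope
    $b = ($sy - $m * $sx) / $n;        // intercept

    return array($b,$m);
} // END: function linest($y,$x=null)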

Sam Umbach

Minor correction: In the “residual sums of squares” equation, you refer to alpha and beta, but in the other equations these are referred to as beta-not and beta-one.

Thanks for the approachable introduction to linear regression!

rweald

Sam Umbach, thanks for pointing that out. My mistake in editing the LaTeX. Will make the fix now.
