Great illustration of the power and flexibility of Ruby! Another, albeit less illustrative, approach would be to use the ruby-gsl library, which wraps the GNU Scientific Library. GSL is a numerical library written in C that is used in a wide range of scientific computing tasks.

http://rb-gsl.rubyforge.org/

@Sameer Thanks for your feedback. I looked at ruby-gsl but decided against it for this post. My goal was to illustrate the underlying math and show how little Ruby is needed to perform this basic statistical method. If I were deploying a large-scale production system I would definitely consider a C-based approach, although the GPL license of GSL does make it somewhat problematic.

What did you use to draw the final plot? Is that an R plot or a Ruby plot? I just implemented this (plus confidence intervals on alpha and beta) the other day, but haven’t gotten around to visualizing it from Ruby yet.

I used R to generate the graph, ggplot2 to be specific. I wish there were a Ruby library that provided rich graphing capabilities, but I have been disappointed with most of the ones I have found.

Slick work! Will be nice to use this and Sameer’s suggestion of GSL in custom dashboards!

To echo Tony, I too am curious what you used to plot the final plot.

@Sameer: it’s also important to note that GSL is GPL-licensed and they make it really clear that any application that uses GSL must be GPL-licensed as well. That rules it out for a number of potential users.

I think it would be best to coerce the data type to Decimal. Computing averages like you do above is bound to run into rounding errors. For large data sets “sum” will eventually become much larger than “value”, and you lose precision.

It’s neat to have “simple” implementations of mathematical formulas, but floating point math is tricky.
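To make the precision concern concrete, here is a minimal sketch of what happens when the running total dwarfs the values being added. The helper names `naive_sum` and `kahan_sum` are made up for this illustration and are not from the post; Kahan (compensated) summation is shown as one possible mitigation, not as what the post does.

```ruby
# Naive left-to-right summation: once the running total is large enough,
# small terms are rounded away entirely.
def naive_sum(values)
  values.reduce(0.0) { |sum, x| sum + x }
end

# Kahan (compensated) summation: keeps the rounding error of each addition
# in a separate correction term and feeds it back into the next step.
def kahan_sum(values)
  sum = 0.0
  compensation = 0.0
  values.each do |x|
    y = x - compensation
    t = sum + y
    compensation = (t - sum) - y
    sum = t
  end
  sum
end

# One huge value followed by a thousand small ones.
values = [1e16] + [1.0] * 1000

puts naive_sum(values) == 1e16         # true: every 1.0 was lost
puts kahan_sum(values) == 1e16 + 1000  # true: the small terms survive
```

Ruby’s built-in `Array#sum` (Ruby 2.4 and later) is documented as being able to use a compensated algorithm for floats, so it may behave better than a hand-rolled `reduce` here.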

Marco Falcioni, thanks for your feedback. I thought about the risk with large values but decided that for this particular post I wanted to keep the code as simple as possible to help people understand the math. In a production setting I would work with a big number library to ensure precision wouldn’t be lost. This code is meant more as a basic example than a fully featured library, but perhaps I should have included a warning about the risk with large numbers. I appreciate you pointing this out.

In “mean”, “total = values.reduce(0) { |sum, x| x + sum }” is the same as the much shorter “total = values.reduce(0, :+)”. Personally I’d just reduce the whole method body to “values.reduce(0, :+).to_f / values.size”.
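For comparison, the two spellings side by side. This is a sketch that assumes the post’s `mean` converts the total to a float before dividing; the method names `mean_with_block` and `mean_with_symbol` are only for illustration.

```ruby
# Block form, as quoted from the post.
def mean_with_block(values)
  total = values.reduce(0) { |sum, x| x + sum }
  total.to_f / values.size
end

# Symbol form: reduce(0, :+) folds the array with the + method.
def mean_with_symbol(values)
  values.reduce(0, :+).to_f / values.size
end

sample = [2, 4, 4, 5]
puts mean_with_block(sample)   # 3.75
puts mean_with_symbol(sample)  # 3.75
```

Both versions return NaN for an empty array, because the float division is 0.0 / 0.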

I dig. You might be interested in this presentation from a dude who just wrote a neat book on doing rad things with Ruby and R: http://www.slideshare.net/sausheong/rubyand-r

It describes the pros and cons of three Ruby/R interfaces (RinRuby, RSRuby, and RServe) and illustrates with a text classification problem.

Probably not a solution for very large data sets, but it could come in handy for the non-trivial stuff where you don’t want to drop out of Ruby but also don’t want to hand-code everything.

Why not just write this in a different language? I notice you refer to methods as functions in several cases. This seems like a square peg in a round hole. Classes are superfluous when you’re just using conditions and Enumerable methods, and even if you decided to ignore that aspect of Ruby (forced OOP), then you’d still need to contend with the dismal performance of Ruby’s numerics, lack of native threads, and so on.

I’d like to see a post that shows an example of when Float is not appropriate, and alternatives that can be used. That would be a great follow-up blog post.
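As a small taste of what such a follow-up might cover, here is a sketch using Ruby’s core `Rational` class as one exact alternative to Float (`BigDecimal` from the standard library is another). The `exact_mean` helper is hypothetical and assumes its inputs are already Rationals.

```ruby
# Float: 0.1 and 0.2 have no exact binary representation, so the sum
# is not exactly 0.3.
puts 0.1 + 0.2 == 0.3     # false
puts 0.1 + 0.2            # 0.30000000000000004

# Rational literals (the `r` suffix) store exact fractions.
puts 0.1r + 0.2r == 0.3r  # true

# Mean computed entirely in exact arithmetic.
def exact_mean(values)
  values.reduce(Rational(0), :+) / values.size
end

readings = [0.1r, 0.2r, 0.3r]
puts exact_mean(readings)       # 1/5
puts exact_mean(readings).to_f  # 0.2
```

The trade-off is speed: exact fractions grow as the computation proceeds, so this suits small inputs or money-like values better than bulk numerics.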

Linear Regression using PHP:

/**
 * Compute linear least squares regression line.
 *
 * @link http://en.wikipedia.org/wiki/Linear_least_squares#Computation
 * @access public
 * @static
 * @param array $y An array of y values.
 * @param array $x An array of x values or y-keys if not specified.
 * @return array(b,m) for the equation y = mx + b, or false if no line can be fitted
 */
function linest($y, $x = null) {
    $x = ($x === null) ? array_keys($y) : array_values($x);
    $y = array_values($y);
    $n = count($y);
    // A line needs at least two points.
    if ($n < 2) {
        return false;
    }

    // Accumulate the sums used by the closed-form solution.
    $sum_x = 0;
    $sum_xx = 0;
    $sum_y = 0;
    $sum_xy = 0;
    for ($i = 0; $i < $n; $i++) {
        $sum_x += $x[$i];
        $sum_y += $y[$i];
        $sum_xx += $x[$i] * $x[$i];
        $sum_xy += $x[$i] * $y[$i];
    }

    // If every x value is identical the slope is undefined.
    $denominator = ($n * $sum_xx) - ($sum_x * $sum_x);
    if ($denominator == 0) {
        return false;
    }

    $m = (($n * $sum_xy) - ($sum_y * $sum_x)) / $denominator;
    $b = ($sum_y - $m * $sum_x) / $n;
    return array($b, $m);
} // END: function linest($y,$x=null)
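Since the post itself is in Ruby, here is a rough Ruby port of the same closed-form sums, offered as a sketch and not as the post’s implementation. It assumes array indices stand in for PHP’s array keys when no x values are given, and it returns nil where no line can be fitted.

```ruby
# Returns [intercept, slope] for y = slope * x + intercept, or nil when
# the fit is undefined (fewer than two points, mismatched lengths, or
# all x values equal).
def linest(ys, xs = nil)
  xs ||= (0...ys.size).to_a  # assumption: indices replace PHP array keys
  n = ys.size
  return nil if n < 2 || xs.size != n

  sum_x  = xs.sum.to_f
  sum_y  = ys.sum.to_f
  sum_xx = xs.sum { |x| x * x }.to_f
  sum_xy = xs.zip(ys).sum { |x, y| x * y }.to_f

  denominator = n * sum_xx - sum_x * sum_x
  return nil if denominator.zero?

  slope = (n * sum_xy - sum_y * sum_x) / denominator
  intercept = (sum_y - slope * sum_x) / n
  [intercept, slope]
end

# Points that lie exactly on y = 2x + 1.
intercept, slope = linest([1, 3, 5, 7], [0, 1, 2, 3])
puts "y = #{slope}x + #{intercept}"  # y = 2.0x + 1.0
```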

Minor correction: In the “residual sums of squares” equation, you refer to alpha and beta, but in the other equations these are referred to as beta-naught and beta-one.
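For reference, with the beta-naught and beta-one naming, the residual sum of squares and its minimizers would read as follows. This is the standard textbook form for simple linear regression, written out as an assumption about what the post’s equations look like, not a copy of them.

```latex
\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```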

Thanks for the approachable introduction to linear regression!

Sam Umbach, thanks for pointing that out. My mistake in editing the LaTeX. Will make the fix now.

Nice post! I converted the code from Ruby to Scala here: https://gist.github.com/ryanlecompte/5942470