We are hugely proud and excited to announce that Jay Webster has joined Sharethrough as Chief Product Officer. Jay is an ad industry O.G. With more than twenty years of experience leading product strategy and development for highly successful advertising technology startups (and some not so small companies), Jay brings an enormous amount of insight, talent and guitar pedals to Sharethrough.
And he’s getting right to work on Sharethrough’s social video platform, so get ready: we’re about to turn it up to 11.
If you are one of the few that have not heard of the mighty Jay Webster, allow us to inform you. Jay previously served as Chief Product Officer of Quova, where he guided product development until their acquisition by Neustar. Previous to Quova, Jay was President and COO of Consorte Media, GM of Lead Generation for Yahoo!, GM of Performance Marketing and CTO for BlueLithium (acquired by Yahoo for $300 million), VP of Engineering for Adteractive, Inc., as well as President and CTO for Fathom Online.
Wow! Must spend all his time just building amazing advertising technology, you might ask. Nope! Jay is active in philanthropy through his association with Little Kids Rock and manages to fit in an active hobby schedule including motor sports, guitar, and triathlons.
Wanna know more? Read our full announcement here: http://bit.ly/mSyqd8
I’m working on fitting more tech into the title of this post; suggestions are welcome
At Sharethrough we’re enthusiastic users of RVM and Gemsets both locally and on the server. They make maintaining parallel configurations a breeze and are very easy to work with… until your CI (Continuous Integration) and deployment environments need to work with them! Of course we also use Bundler which introduces just enough additional complexity to keep things spicy.
Earlier versions of TeamCity (prior to v6) lacked explicit support for RVM. There were workarounds that allowed you to have a working build but without direct support, it was cumbersome to say the least. You had to manually configure several paths (e.g. GEM_HOME, GEM_PATH, BUNDLER_PATH, PATH) and even then getting a “bundle install” to put gems in the right place was a superhuman endeavor. It was difficult to wrap our heads around what exactly TeamCity was doing to the runtime environment. Some of our steps would be successful but when Cucumber spawned unicorn_rails, it couldn’t see our Gemset. This really was a symptom though of the overall problem that TeamCity just didn’t have baked-in support of RVM. The whole point of TeamCity is that you shouldn’t have to know those things.
Well, you’re in luck, sirs!
TeamCity 6 simplifies all of this with the notion of a Ruby Environment Configurator, a new type of Build Feature (2) located on the Build Steps (1) panel in the Configuration Settings for a Project. If you ask me, this is one of the best and yet least talked-about features of TeamCity 6. The ability to have multiple build steps (as we do in the image shown below) comes in a close second as far as mind-blowingly-handy features of TC6 go.

Adding the Ruby Environment Configurator to a Project lets TeamCity know that this combination of interpreter and Gemset should be used for all build steps, greatly simplifying your build configuration.
Remember to fully specify the Ruby interpreter and Gemset you’d like to use. Note that you cannot use shortcuts for the Ruby interpreter version the way RVM has conditioned us to do.
To TeamCity and the Ruby Environment Configurator, “ree” is not the same as “ree-1.8.7”, which is not the same as “ree-1.8.7-2010.02”.
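As a rough guard against pasting an RVM shortcut into the configurator, something like the following could be run over a build's settings. This is only a sketch: the method name is hypothetical, and the pattern assumes fully specified names look like the “ree-1.8.7-2010.02” example above (implementation, version, then a patch or release suffix). It is not derived from anything TeamCity itself documents.

```ruby
# Assumed shape of a fully specified interpreter name, based only on the
# "ree-1.8.7-2010.02" example in this post: name-major.minor.tiny-suffix
FULLY_SPECIFIED_INTERPRETER = /\A[a-z]+-\d+\.\d+\.\d+-[\w.]+\z/

# Hypothetical helper: true when the name has all three parts.
def fully_specified_interpreter?(name)
  !!(name =~ FULLY_SPECIFIED_INTERPRETER)
end

%w[ree ree-1.8.7 ree-1.8.7-2010.02].each do |name|
  status = fully_specified_interpreter?(name) ? "ok" : "too short for TeamCity"
  puts "#{name}: #{status}"
end
```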
Once specified, notice how simple the build configuration is. We have a several-step build where no step specifies more than a couple of options – really just the rake task(s) to be executed and in what order.
Any TeamCity 6 tips? Comment away!
Rob
If you want to know when a site visitor is leaving a page, one option is the jQuery unload event. Given the following caveat:
“The exact handling of the unload event has varied from version to version of browsers. For example, some versions of Firefox trigger the event when a link is followed, but not when the window is closed. In practical usage, behavior should be tested on all supported browsers, and contrasted with the proprietary beforeunload event.”
…I decided to test support for the event across the latest versions of the most popular browsers on both Windows 7 and OS X. Note, however, that one key part of the caveat is not included in this test:
“…has varied from version to version…”
This is interesting not only because of the given Firefox example, but for Safari as well (which some claim had ‘better’ support for unload in earlier versions).
On each pairing of browser and platform, I tested four different ways one may leave a page:
There is one notable omission – typing in a new URL. If someone would like to do that last bit of testing, I’ll gladly update the chart.
In addition, I tested three types of JavaScript activities:
Here are the results:

Learnings:
Rob Slifka
At Sharethrough, we accumulate roughly 10GB of analytics on our ad units per day. Collecting and processing an ever-increasing amount of data is an ongoing challenge and as such, we’re constantly on the lookout for ways to increase the efficiency and robustness of this process. One important part of that flow is first “simply” capturing log data from our large number of front-end nginx web servers. Rather than roll our own suite of shell scripts for distributed log aggregation, we decided to have a look at Flume. Flume is a relatively recently open-sourced project (mid-2010) from the folks at Cloudera, has a terrific amount of flexibility and is surprisingly well-documented. In addition to solving all of those niggling problems you run into when baking your own set of aggregation scripts, Flume adds things like automatic fail-over, load balancing, throttling, and various reliability and delivery models.
In this post, we’ll walk through setting up a basic “distributed” Flume configuration, with a single agent sending log data to a single collector writing to S3. Because data is flowing from the agent to the collector, ensure your EC2 Security Group configuration is set up to permit access to the appropriate ports. We have agents running in our “front-end” group and collectors and masters in the “flume” group, so we’ve added each group to the other’s Security Group access list.
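Before blaming Flume for a silent agent, it can help to confirm the Security Groups really do let traffic through. Here is a minimal sketch of a TCP reachability check; the method name is hypothetical, and the hosts and ports you pass in are up to your setup (35871 is the master web UI port mentioned later in this post; the collector port is whatever your Flume configuration uses).

```ruby
require 'socket'

# Hypothetical helper: returns true if a TCP connection to host:port can be
# opened within timeout_seconds, false on refusal, DNS failure or timeout.
def port_open?(host, port, timeout_seconds = 3)
  Socket.tcp(host, port, connect_timeout: timeout_seconds) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Example usage from the agent box (hostnames are the placeholders used in
# this post; replace them with your own):
#   port_open?("master.sharethrough.com", 35871)
#   port_open?("collector.sharethrough.com", your_collector_port)
```

A false result only says the port is not reachable from where the script ran; it does not say whether the cause is the Security Group, a stopped daemon, or DNS.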
Our goal is to get you writing to S3 as quickly as possible so you can iterate on other Flume settings unrelated to S3 (paths, filenames, rolling frequency, etc.).
To get started you’ll need:
Flume has terrific installation documentation. I recommend installing the flume-node (on the agent and collector) and flume-master (on the master) packages to take advantage of the bundled init scripts, etc.
On the Master
The master has no configuration per se. It uses Apache Zookeeper to store the configuration you give to the nodes, which we’ll specify at runtime via the web UI.
If you installed using a package manager, the master should already be running as the package manager creates and kicks off the init script. If not, have a look in /var/log/flume for what might be wrong.
On the Collector
The collector only needs to know where the master is (in /etc/flume/conf/flume-site.xml):
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>flume.master.servers</name>
<value>master.sharethrough.com</value>
</property>
</configuration>
The collector is writing to S3 and as such requires a small amount of additional configuration.
Copy the following files (c/o Eric Lubow and FLUME-66):
…to /usr/lib/flume/lib/.
After copying, you’ll need to update the symlink to the Hadoop jar:
cd /usr/lib/flume/lib
sudo ln -s emr-hadoop-core-0.20.jar hadoop-core.jar
On the Agent
Flume has a very functional set of defaults, which we’ll take advantage of to keep things simple. While our config file is relatively empty, it’s actually using the default values from /etc/flume/conf/flume-conf.xml.
Create /etc/flume/conf/flume-site.xml replacing “master.sharethrough.com” and “collector.sharethrough.com”.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>flume.master.servers</name>
<value>master.sharethrough.com</value>
</property>
<property>
<name>flume.collector.event.host</name>
<value>collector.sharethrough.com</value>
</property>
</configuration>
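A typo in flume-site.xml is easy to miss, so a quick sanity check of what the file actually declares can save a restart or two. The sketch below parses the XML with REXML from Ruby's standard library and returns the properties as a Hash; the method name and the inline sample are illustrative only, and on a real box you would read /etc/flume/conf/flume-site.xml instead.

```ruby
require 'rexml/document'

# Inline copy of the agent config from this post, used as sample input.
SAMPLE_SITE_XML = <<-XML
<?xml version="1.0"?>
<configuration>
  <property>
    <name>flume.master.servers</name>
    <value>master.sharethrough.com</value>
  </property>
  <property>
    <name>flume.collector.event.host</name>
    <value>collector.sharethrough.com</value>
  </property>
</configuration>
XML

# Hypothetical helper: maps each <property>'s <name> to its <value>.
def flume_properties(xml)
  doc = REXML::Document.new(xml)
  props = {}
  doc.elements.each('configuration/property') do |prop|
    name = prop.elements['name']
    value = prop.elements['value']
    next unless name && name.text
    props[name.text.strip] = value ? value.text.to_s.strip : ""
  end
  props
end

flume_properties(SAMPLE_SITE_XML).each { |k, v| puts "#{k} = #{v}" }
```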
Configuring the Nodes via the Master
In the Flume DSL, here is the config we’d like to propagate from the master.
agent.sharethrough.com : tail("/opt/nginx/logs/access.log") | agentE2ESink;
collector.sharethrough.com : collectorSource | collectorSink("s3n://sharethrough_access_key:sharethrough_secret_key@sharethrough-bucket-name/", "prefix");
This is tailored to our environment and you’ll need to replace several things:
Head on over to the master’s web UI at port 35871. Click on “config” at the top of the screen.
Paste the above block (after you’ve replaced the relevant values) into the “Configure multiple nodes” text area and click “Submit Query”.
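If you are setting this up for more than one environment, generating the block from parameters makes it obvious which pieces vary. This is a small sketch, not part of Flume: the method and its keyword names are made up for illustration, and it simply reproduces the two lines shown above from the values you pass in.

```ruby
# Hypothetical helper: builds the two-line Flume DSL block from this post.
def flume_node_config(agent_host:, collector_host:, log_path:,
                      access_key:, secret_key:, bucket:, prefix:)
  s3_url = "s3n://#{access_key}:#{secret_key}@#{bucket}/"
  [
    %Q[#{agent_host} : tail("#{log_path}") | agentE2ESink;],
    %Q[#{collector_host} : collectorSource | collectorSink("#{s3_url}", "#{prefix}");]
  ].join("\n")
end

puts flume_node_config(
  agent_host: "agent.sharethrough.com",
  collector_host: "collector.sharethrough.com",
  log_path: "/opt/nginx/logs/access.log",
  access_key: "sharethrough_access_key",
  secret_key: "sharethrough_secret_key",
  bucket: "sharethrough-bucket-name",
  prefix: "prefix"
)
```

The output can be pasted straight into the “Configure multiple nodes” text area.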
Did it Work?
While configuring Flume, we’ve found it helpful to be tailing the flume log on the agent and collector when pushing configuration changes from the master. You should see the agent and collector receiving a new configuration and restarting.
On the collector, success looks like this (asterisks replacing sensitive values):
2011-02-10 04:49:53,018 INFO com.cloudera.flume.handlers.hdfs.CustomDfsSink: Opening HDFS file: s3n://*:*@*/prefixlog.00000028.20110210-044952946+0000.204280657961552.seq
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Starting checksum group called log.00001412.20110210-044949027+0000.210961069233780.seq
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: initial checksum is 12e0de759e3
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Finishing checksum group called 'log.00001412.20110210-044949027+0000.210961069233780.seq'
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Checksum succeeded 12e0de759e3
2011-02-10 04:49:59,032 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: moved from partial to complete log.00001412.20110210-044949027+0000.210961069233780.seq
2011-02-10 04:50:09,171 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Starting checksum group called log.00001412.20110210-044959167+0000.210971209172108.seq
2011-02-10 04:50:09,171 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: initial checksum is 12e0de7817f
2011-02-10 04:50:09,171 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Finishing checksum group called 'log.00001412.20110210-044959167+0000.210971209172108.seq'
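Eyeballing a tail works for a first run, but once data is flowing it is easier to count. Here is a rough sketch that tallies checksum groups started versus succeeded in a chunk of collector log text. The method name is hypothetical, and the patterns are taken only from the log lines shown above, so other Flume versions may word these messages differently.

```ruby
# Two lines lifted from the collector output above, used as sample input.
SAMPLE_COLLECTOR_LOG = <<-LOG
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Starting checksum group called log.00001412.20110210-044949027+0000.210961069233780.seq
2011-02-10 04:49:59,031 INFO com.cloudera.flume.handlers.endtoend.AckChecksumChecker: Checksum succeeded 12e0de759e3
LOG

# Hypothetical helper: counts started groups and collects succeeded checksums.
def checksum_summary(log_text)
  started = log_text.scan(/AckChecksumChecker: Starting checksum group called (\S+)/).flatten
  succeeded = log_text.scan(/AckChecksumChecker: Checksum succeeded (\h+)/).flatten
  { :started => started.length, :succeeded => succeeded }
end

summary = checksum_summary(SAMPLE_COLLECTOR_LOG)
puts "groups started: #{summary[:started]}, succeeded: #{summary[:succeeded].length}"
```

A started count that keeps pulling ahead of the succeeded count would be the cue to look more closely at the agent side.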
Next Steps
From here, you’ll want to examine various settings like rolling frequency, output format, etc.
Any questions, post a comment!
Rob Slifka
This one was really throwing me for a day or two. I couldn’t understand why saying “foo.should == bar” wasn’t the same as “foo.should be bar”. Turns out that all variants of “be” expect the following token to be a method ending in “?” which returns a boolean.
should be
should be_true
should be_false
should be_nil
should be_arbitrary_predicate(*args)
should_not be_nil
should_not be_arbitrary_predicate(*args)
Given true, false, or nil, will pass if given value is true, false or nil (respectively). Given no args means the caller should satisfy an if condition (to be or not to be).
Predicates are any Ruby method that ends in a “?” and returns true or false. Given be_ followed by arbitrary_predicate (without the “?”), RSpec will convert that into a query against the target object.
The arbitrary_predicate feature will handle any predicate prefixed with “be_an_” (e.g. be_an_instance_of), “be_a_” (e.g. be_a_kind_of) or “be_” (e.g. be_empty), letting you choose the prefix that best suits the predicate.
Examples
target.should be
target.should be_true
target.should be_false
target.should be_nil
target.should_not be_nil
collection.should be_empty #passes if target.empty?
"this string".should be_an_instance_of(String)
target.should_not be_empty #passes unless target.empty?
target.should_not be_old_enough(16) #passes unless target.old_enough?(16)
Source
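To make the prefix-to-predicate mapping concrete, here is a toy version of the idea in plain Ruby. It is emphatically not RSpec's implementation, and both method names are hypothetical; it only shows how “be_an_”, “be_a_” and “be_” can be stripped and a “?” appended to get the method that is called on the target.

```ruby
# Hypothetical helper: :be_empty -> :empty?, :be_a_kind_of -> :kind_of?
# Returns nil when the name does not start with "be_".
def predicate_for(matcher_name)
  match = matcher_name.to_s.match(/\Abe_(?:an_|a_)?(.+)\z/)
  match ? "#{match[1]}?".to_sym : nil
end

# Hypothetical helper: calls the derived predicate on the target.
def satisfies?(target, matcher_name, *args)
  predicate = predicate_for(matcher_name)
  raise ArgumentError, "not a be_ matcher: #{matcher_name}" unless predicate
  target.public_send(predicate, *args) ? true : false
end

puts predicate_for(:be_an_instance_of).inspect
puts satisfies?([], :be_empty)
puts satisfies?("this string", :be_an_instance_of, String)
```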
There’s no real technical knowledge to be imparted by this post. Sometimes I just have to sit back and admire the beauty of Ruby, in this case some RSpec tests I’m currently working on.

If it hasn’t been made clear by the previous few technical posts, we’ve made the shift from a PHP shop to a Ruby on Rails shop. The jury is still out on how this technology shift will affect business value (if at all), but from an engineering standpoint it is really obvious how much cleaner and better designed our code is.
Additional reasons why Rails has impressed me so far are:
I have been pretty slow to start blogging about new engineering learnings, but I could not pass this one up after we tried it.
The idea was to check whether the class was a descendant of a particular class, and the class name was stored as a String:
APP_CONFIG['creative_types'].each do |klass, config|
  lambda {
    Kernel.const_get(klass).should < Creative
  }.should_not raise_error
end
So #1, Kernel.const_get(klass) gets the class from the String holding the class name (and this should not raise an error, since the array should only contain defined classes),
and #2, Creative300x250 < Creative checks whether the class is derived from the Creative class, which is the main objective of the test. We can simply use the “<” after the “should” method and it just works.
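Outside of RSpec, the same two steps look like this. The two classes here are empty stand-ins for our real ones, and descendant_of? is just a name picked for the sketch. One thing worth knowing is that “<” on classes returns true for a descendant, false for the class itself, and nil for unrelated classes, hence the explicit comparison with true.

```ruby
# Stand-in classes for illustration only.
class Creative; end
class Creative300x250 < Creative; end

# Hypothetical helper: looks the class up by name, then checks ancestry.
# Kernel.const_get raises NameError if no such constant is defined.
def descendant_of?(class_name, ancestor)
  klass = Kernel.const_get(class_name)
  (klass < ancestor) == true
end

puts descendant_of?("Creative300x250", Creative)
puts descendant_of?("String", Creative)
```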
Pretty cool huh?
More to come.
Not long ago, we created an application in Node.js, which uses MongoDB for persistence. Now we’re rewriting our Rails 2.3 application using MongoDB instead of MySQL. We’ve chosen Mongoid as our ORM.
While Mongoid provides a lot of powerful features, it is still very new and undergoing lots of changes and bug fixes. Here are a few things we’ve discovered this week.
More to come..
I ran into a little frustration today when I just wanted a simple command or script (preferably in Ruby) to recursively convert tabs to spaces on every file in the working directory. I pieced together this little Thor task.
# Thor evaluates task files inside its own sandbox module, so this File
# class does not clash with Ruby's ::File (which is why ::File is used below).
class File < Thor
  desc 'tabs_to_spaces', 'convert all tabs to two spaces in all files under the current directory'
  method_options :spaces => :numeric
  # thor file:tabs_to_spaces --spaces 3
  def tabs_to_spaces
    num_spaces = options[:spaces] || 2
    Dir['**/*'].each do |file|
      if ::File.ftype(file) == 'file'
        # escape characters that are special inside double quotes in the shell
        file = file.gsub(/(["$`\\])/) { "\\#{$1}" }
        system("expand -t#{num_spaces} \"#{file}\" > \"#{file}.tabs_to_spaces\" && mv \"#{file}.tabs_to_spaces\" \"#{file}\"")
      end
    end
  end
end
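One thing to keep in mind is that expand pads each tab out to the next tab stop rather than swapping it for a fixed number of spaces. If you would rather not shell out, a pure-Ruby version of that per-line behaviour might look roughly like this. It is a sketch with a made-up method name, it assumes one character per column, and it has not been checked against expand on every edge case.

```ruby
# Hypothetical helper: replaces each tab in a single line with enough
# spaces to reach the next multiple of tab_width.
def expand_tabs(line, tab_width = 2)
  out = String.new
  line.each_char do |ch|
    if ch == "\t"
      out << " " * (tab_width - (out.length % tab_width))
    else
      out << ch
    end
  end
  out
end

puts expand_tabs("\tdef foo", 2).inspect
puts expand_tabs("a\tb", 4).inspect
```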
I recently was interviewed by a reporter for SearchSoftwareQuality.com regarding SauceLabs, our cloud-hosted testing solution. The SauceLabs guys are working on some really interesting things and really pushing forward the Selenium project and the do’s and don’ts of testing.
Here’s the blurb where I talk about the pain that I had and how SauceLabs solves it:
Rob Fan said the company initially used Selenium to get some basic browser coverage. However, as their testing grew more complex, requiring hundreds of tests running simultaneously, he said they looked into Selenium Grid, “but we realized in playing around with it, you could get it running but to have it usable by the entire team it would become a lot of hassle. With testing you want the results to occur as quickly as possible, so it wouldn’t have been worth the investment of our resources.”
When he came upon Sauce OnDemand, “I liked how they took that concern out of my hands. I could focus on writing the tests I needed to write, plug them in, and run them on their system.”
Check out the entire article here.
At Sharethrough we strongly believe in using what’s already been built to solve our pains rather than re-constructing the “wheel” each time. Here’s a quick list of some of the open source projects that we support and use.
On the Backend
On the Frontend
System
Testing Frameworks
Monitoring, Continuous Integration/Deployment
So far we’ve gotten all these projects working together pretty smoothly. It may look like an integration nightmare, but they are all pretty mature projects and the amount of “time” they have bought us is well worth the effort.