I’ve been writing Ruby code at my day job for a couple years now. The more code I write, the more Ruby becomes my everyday language of choice. There’s just something about its expressiveness; if you can think it, it probably exists as a construct or method. It’s very natural to work in.

But, Ruby has its drawbacks, and they are mostly performance related. Alright, let’s all admit this to each other: Ruby is a historically slow language. It was terrible with allocation, had a bad GC, and was purely interpreted. Over time, different approaches have emerged to solve each one of those issues. The master stroke, though, was porting the language to the JVM: JRuby.

I was planning to make the case for replacing CRuby with JRuby in a production environment at work when I realized that, hey, I’m an engineer. I don’t need an argument; I need a proof! It then occurred to me that I hadn’t even proven anything to myself yet; I’d just been playing around with JRuby and loving it. Playing around does not a production-version-changing proof make.

So where to start? Threading.

Ruby’s threading model has been a source of mystery for me, so I set out to figure it out. Luckily, there are some good references out there.

I have to prove not only that JRuby’s threading model is better than (and outperforms) that of REE 1.8.7 (which we’re currently on) and of 1.9.x (which we’ll be looking to move to), but also that this allegedly superior threading model will in fact be useful in practice.

I’ll sum up what I’ve learned:

CRuby 1.8.x uses green threads. Green threads provide thread-level concurrency without thread-level parallelism. What does that mean? It means the Ruby process itself is single-threaded (in terms of native OS threads), but it uses its own scheduler to pull green threads off the CPU when they block. No two green threads can ever truly run in parallel, since they take turns on a single CPU, but when one blocks another runs.

Here’s an example test (the entire file is included at the bottom):

def execute_in_parallel number_of_threads
    threads = []
    number_of_threads.times do |i|
        threads << Thread.new(i) { yield }
    end
    threads.each {|t| t.join}
end

def test_execute_in_parallel_ping
    mutex = Mutex.new
    commands = [
        "ping -c 5 localhost",
        "echo 'Executing second but you should see me first!'"
    ]

    execute_in_parallel(2) do
        cmd = nil
        while true
            mutex.synchronize do
                cmd = commands.shift
            end
            break unless cmd
            puts "Executing this command now: #{cmd}"
            puts `#{cmd}`
        end
    end
end

When the ping thread executes, it blocks waiting for the command to return (which takes about 5 seconds). Immediately, the echo thread executes.

So this is pretty good. Even though one thread is blocked, the other executes. Neither thread is CPU intensive, so the one CPU we have stays busy instead of sitting idle while we wait on IO.
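To convince ourselves that the overlap comes from blocking calls releasing the scheduler, and not from anything ping-specific, here's a minimal sketch (not part of the test file; it uses sleep as a stand-in for the blocking subprocess call):

require 'benchmark'

# Two threads that each block for a second. sleep hands control back to the
# scheduler, just as waiting on the ping subprocess does, so the two waits
# overlap and the whole thing takes about one second rather than two.
elapsed = Benchmark.realtime do
    threads = []
    2.times { threads << Thread.new { sleep 1 } }
    threads.each {|t| t.join}
end
puts "Two overlapping 1-second waits took %.2f seconds" % elapsed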

Let's try something that is CPU intensive, like the following XOR to a bazillion:

def test_single_thread_count_to_bazillion
    num = 0
    100000000000000.times do |i|
        num = num ^ i
    end
end

When we run this single-threaded loop, it pegs the CPU.

Now let's try to parallelize it:

def test_execute_in_parallel_count_to_bazillion
    mutex = Mutex.new
    commands = [true, true]

    execute_in_parallel(2) do
        cmd = nil
        num = 0

        while true
            mutex.synchronize do
                cmd = commands.shift
            end
            break unless cmd
            100000000000000.times do |i|
                num = num ^ i
            end
        end
    end
end

OK, the CPU is still pegged. If we let this run to completion (we won't, because we don't have all night), it would take roughly twice as long as the single-threaded run: both threads are still sharing the one native thread, so the CPU-bound work just gets serialized. Once a CPU is pegged, green threading becomes essentially useless.

That was REE 1.8.7. What about 1.9.2?

Ruby 1.9 introduced native (OS) threads, but it also uses a global interpreter lock (GIL). The GIL allows only one native thread to execute Ruby code at a time, even when none of the threads are blocked. As far as I can tell, this is to ensure that non-thread-safe code that would not have crashed under 1.8.x continues to not crash under 1.9.x. If someone has a better explanation, please let me know.
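To see the GIL's effect without waiting on the bazillion-iteration tests above, a rough sketch along these lines (the method name and iteration count are mine, picked just so it finishes quickly) compares the same total amount of XOR work done in one thread versus split across two:

require 'benchmark'

ITERATIONS = 10_000_000

def xor_work n
    num = 0
    n.times {|i| num = num ^ i}
    num
end

# All the work in one thread.
single = Benchmark.realtime { xor_work(ITERATIONS * 2) }

# The same total work split across two threads.
threaded = Benchmark.realtime do
    threads = []
    2.times { threads << Thread.new { xor_work(ITERATIONS) } }
    threads.each {|t| t.join}
end

puts "single thread: %.2fs, two threads: %.2fs" % [single, threaded]

On 1.8.7 and 1.9.x the two numbers come out roughly equal, because the threads take turns; on JRuby the two-thread run should take roughly half as long.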

The performance characteristics of the tests above are very similar between 1.8 and 1.9. I suspect 1.9 gives us lower latency when switching between threads, but we're not measuring that here.
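If we did want a rough number, a sketch like this one (again, not part of the test file) bounces a token between two threads through a pair of queues; each round trip forces at least two thread switches:

require 'thread'
require 'benchmark'

ROUND_TRIPS = 10_000
ping_q = Queue.new
pong_q = Queue.new

# The echoer just bounces every token straight back.
echoer = Thread.new do
    ROUND_TRIPS.times { pong_q.push(ping_q.pop) }
end

elapsed = Benchmark.realtime do
    ROUND_TRIPS.times do
        ping_q.push(:token)
        pong_q.pop
    end
end
echoer.join

puts "roughly %.1f microseconds per switch" % (elapsed / (ROUND_TRIPS * 2) * 1_000_000)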

Either way, in the end we get the equivalent of pegging a single CPU.

How does JRuby do? Well, it behaves exactly like CRuby 1.8.7 and 1.9.2 on the blocking IO test, which is what we'd expect.

In the two-threaded XOR test, though, JRuby wins hands down, because it uses native threads with no GIL. JRuby comes very close to pegging two CPUs, which is exactly what we want. Really, there is no comparison in this test.

If we have multiple computationally intensive tasks that share some sort of state, JRuby is simply better.
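Here's a sketch of the shape of program I have in mind: several CPU-heavy workers pulling jobs off a shared queue and writing to a shared, mutex-protected result (the names and numbers are illustrative, not taken from the test file):

require 'thread'

work_queue = Queue.new
8.times {|i| work_queue.push(i)}

results = []
results_mutex = Mutex.new

threads = []
4.times do
    threads << Thread.new do
        while true
            begin
                item = work_queue.pop(true)   # non-blocking pop; raises ThreadError when empty
            rescue ThreadError
                break
            end
            checksum = 0
            10_000_000.times {|i| checksum = checksum ^ (i + item)}   # the CPU-bound part
            results_mutex.synchronize { results << checksum }
        end
    end
end
threads.each {|t| t.join}

puts "computed #{results.size} checksums"

Under JRuby the four workers really do run on four cores; under CRuby 1.8 or 1.9 they take turns, so the wall-clock time is roughly what a single core would give us.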

I kind of feel silly even doing this now. It's like when Apple finally put cut-copy-paste into iOS and we all oohed and ahhed. But then it was, "why wasn't this here all along...?" Wow, pthreads! These are new and exciting!

The final step (for me, not you) is to prove that there is a niche for computationally intensive, multithreaded Ruby programs. But that's for another day...

The following is the entire file used in the above tests, as promised:

require 'thread'
require 'test/unit'

class ThreadTester < Test::Unit::TestCase

    def execute_in_parallel number_of_threads
        threads = []
        number_of_threads.times do |i|
            threads << Thread.new(i) { yield }
        end
        threads.each {|t| t.join}
    end

    def test_execute_in_parallel_ping
        mutex = Mutex.new
        commands = [
            "ping -c 5 localhost",
            "echo 'Executing second but you should see me first!'"
        ]

        execute_in_parallel(2) do
            cmd = nil
            while true
                mutex.synchronize do
                    cmd = commands.shift
                end
                break unless cmd
                puts "Executing this command now: #{cmd}"
                puts `#{cmd}`
            end
        end
    end

    def test_single_thread_count_to_bazillion
        num = 0
        100000000000000.times do |i|
            num = num ^ i
        end
    end

    def test_execute_in_parallel_count_to_bazillion
        mutex = Mutex.new
        commands = [true, true]

        execute_in_parallel(2) do
            cmd = nil
            num = 0

            while true
                mutex.synchronize do
                    cmd = commands.shift
                end
                break unless cmd
                100000000000000.times do |i|
                    num = num ^ i
                end
            end
        end
    end

end