I’ve been writing Ruby code at my day job for a couple of years now. The more code I write, the more Ruby becomes my everyday language of choice. There’s just something about its expressiveness; if you can think it, it probably exists as a construct or method. It’s very natural to work in.
But Ruby has its drawbacks, and they are mostly performance-related. Alright, let’s all admit this to each other: Ruby is a historically slow language. It was terrible with allocation, had a bad GC, and was purely interpreted. Over time, different approaches have emerged to solve each one of those issues. The master stroke, though, was porting the language to the JVM: JRuby.
I was planning to make the case for replacing CRuby with JRuby in a production environment at work when I realized that, hey, I’m an engineer. I don’t need an argument, I need a proof! It then occurred to me that I haven’t even proven anything to myself yet; I’ve just been playing around with JRuby and loving it. Playing around does not a production-changing proof make.
So where to start? Threading.
Ruby’s threading model has been a source of mystery for me, so I set out to figure it out. Luckily there are some good sources of reference.
I have to prove that JRuby’s threading model is better than, and outperforms, not only REE 1.8.7 (which we’re currently on) but also 1.9.x (which we’ll be looking to move to). Beyond that, I have to show that this allegedly superior threading model will in fact be useful in practice.
I’ll sum up what I’ve learned:
CRuby 1.8.x uses green threads. Green threads give you thread-level concurrency without thread-level parallelism. What does that mean? It means the Ruby process itself is single-threaded (in terms of native OS threads), but it runs its own scheduler, which takes a green thread off the CPU when it blocks and lets another one run. No two green threads can ever truly run in parallel, since they take turns on a single CPU, but when one blocks the other runs.
Here’s an example test (the entire file is included at the bottom):
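# Starts number_of_threads threads that each run the given block,
# then waits for all of them to finish.
def execute_in_parallel number_of_threads
  threads = []
  number_of_threads.times do |i|
    threads << Thread.new(i) { yield }
  end
  threads.each {|t| t.join}
end

def test_execute_in_parallel_ping
  mutex = Mutex.new
  commands = [
    "ping -c 5 localhost",
    "echo 'Executing second but you should see me first!'"
  ]
  execute_in_parallel(2) do
    cmd = nil
    while true
      # The command list is shared, so take the next command under the lock.
      mutex.synchronize do
        cmd = commands.shift
      end
      break unless cmd
      puts "Executing this command now: #{cmd}"
      # Backticks block this thread until the shell command returns.
      puts `#{cmd}`
    end
  end
end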
When the ping thread runs, it blocks waiting for the command to return (which is going to take about 5 seconds). In the meantime, the echo thread runs immediately.
So this is pretty good. Even though we're blocking in one thread, the other executes. Neither thread is CPU intensive, so the CPU is always free for whichever thread is ready to run.
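To see the same overlap without depending on ping, here's a minimal sketch that uses sleep as a stand-in for the blocking call. The helper names elapsed_seconds and blocking_pair are my own for illustration (they are not part of the test file), and the 0.2 second delay is arbitrary.

```ruby
# Hypothetical helper (not from the test file): runs the block and returns
# the wall-clock seconds it took.
def elapsed_seconds
  started = Time.now
  yield
  Time.now - started
end

# Two threads that each block for the given number of seconds.
# sleep is an assumed stand-in for the blocking ping call.
def blocking_pair(seconds)
  threads = []
  2.times do
    threads << Thread.new { sleep(seconds) }
  end
  threads.each { |t| t.join }
end

overlapped = elapsed_seconds { blocking_pair(0.2) }
sequential = elapsed_seconds { 2.times { sleep(0.2) } }

puts "two blocking threads:  %.2fs" % overlapped
puts "same waits, one thread: %.2fs" % sequential
```

If the threads really do overlap while blocked, the first number should come out close to 0.2s and the second close to 0.4s.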
Let's try with something that is CPU intensive, like the following XOR to a bazillion:
def test_single_thread_count_to_bazillion
  num = 0
  100000000000000.times do |i|
    num = num ^ i
  end
end
When we run this single-threaded loop, it pegs the CPU.
Now let's try to parallelize it:
def test_execute_in_parallel_count_to_bazillion
  mutex = Mutex.new
  # Two work items, so each of the two threads picks up one full XOR loop.
  commands = [true, true]
  execute_in_parallel(2) do
    cmd = nil
    num = 0
    while true
      mutex.synchronize do
        cmd = commands.shift
      end
      break unless cmd
      100000000000000.times do |i|
        num = num ^ i
      end
    end
  end
end
OK, the CPU is still pegged. If we let this run to completion (we won't, because we don't have all night) it would take roughly twice as long as the single-threaded run, since the two threads do twice the work while taking turns on one CPU. Once a CPU is pegged, green threading becomes essentially useless.
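Since nobody wants to wait for a bazillion iterations, here's a rough sketch of how the comparison could be timed with a much smaller loop. The ITERATIONS value and the helper names xor_up_to and timed are assumptions of mine, not part of the test file, and the timings will vary by machine and interpreter.

```ruby
# Assumed stand-in for the "bazillion": small enough to finish quickly.
ITERATIONS = 2_000_000

# The same XOR loop as in the tests, but returning the result.
def xor_up_to(limit)
  num = 0
  limit.times { |i| num ^= i }
  num
end

# Hypothetical helper: returns [block result, elapsed wall-clock seconds].
def timed
  started = Time.now
  result = yield
  [result, Time.now - started]
end

single_result, single_time = timed { xor_up_to(ITERATIONS) }

pair_results, pair_time = timed do
  threads = []
  2.times { threads << Thread.new { xor_up_to(ITERATIONS) } }
  # Thread#value joins the thread and returns the block's result.
  threads.map { |t| t.value }
end

puts "one loop, one thread:   %.2fs" % single_time
puts "two loops, two threads: %.2fs" % pair_time
```

If the two threads can only take turns on one CPU, the second time should land near double the first. If they truly run in parallel, the two times should be much closer together.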
That was REE 1.8.7. What about 1.9.2?
Ruby 1.9 introduced native (OS) threads, but it also uses a global interpreter lock (GIL). The GIL allows only one native thread to execute Ruby code at a time, even when several threads are ready to run; a thread that blocks on I/O releases the lock so another can run. As far as I can tell, the lock is there to ensure that non-thread-safe code which would not have crashed in 1.8.x will continue to not crash in 1.9.x. If someone has a better explanation, please let me know.
The performance characteristics of our tests above are very similar between 1.8 and 1.9; I suspect we're getting less latency when switching between threads but we're not measuring that.
In the end, we get the equivalent of pegging a single CPU.
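When running the same tests under REE 1.8.7, 1.9.2 and JRuby, it's easy to lose track of which interpreter produced which numbers. Here's a small sketch for labeling the output; interpreter_label is a name I made up for illustration, and the fallback assumes that RUBY_ENGINE may not be defined on older 1.8 interpreters.

```ruby
# Hypothetical helper: describes the interpreter running the tests.
# RUBY_ENGINE may be missing on older 1.8 interpreters, so fall back to "ruby".
def interpreter_label
  engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
  "#{engine} #{RUBY_VERSION} (#{RUBY_PLATFORM})"
end

puts "Results from: #{interpreter_label}"
```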
How does JRuby do? Well, it behaves exactly like CRuby 1.8.7 and 1.9.2 on the blocking IO test, which is what we'd expect.
In the two-threaded XOR test, though, JRuby wins hands down, because it uses native threads and no GIL. JRuby comes very close to pegging two CPUs, which is exactly what we want. Really, there is no comparison in this test.
If we have multiple computationally intensive tasks that share some sort of state, JRuby is simply better.
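To make "computationally intensive tasks that share some sort of state" a bit more concrete, here's a minimal sketch that splits one XOR loop across several threads and folds the partial results into a shared total under a Mutex. The method name parallel_xor and the slicing scheme are my own assumptions for illustration; this is not code from the tests above.

```ruby
require 'thread'

# Hypothetical example: XORs the integers 0...limit using several threads.
# Each thread handles one slice and merges its partial result into the
# shared total while holding the mutex.
def parallel_xor(limit, number_of_threads)
  total = 0
  mutex = Mutex.new
  # Integer ceiling division, so the slices cover the whole range.
  slice = (limit + number_of_threads - 1) / number_of_threads
  threads = []
  number_of_threads.times do |n|
    threads << Thread.new(n) do |index|
      first = index * slice
      last = [first + slice, limit].min
      partial = 0
      i = first
      while i < last
        partial ^= i
        i += 1
      end
      # total is the shared state, so only touch it under the lock.
      mutex.synchronize { total ^= partial }
    end
  end
  threads.each { |t| t.join }
  total
end

puts parallel_xor(1_000_000, 2)
```

The answer is the same on every interpreter; what changes is whether the slices actually run at the same time. The lock is only held for the final merge, so the threads spend almost all of their time in the CPU-bound loop.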
I kind of feel silly even doing this now. It's like when Apple finally put cut-copy-paste into iOS and we all oohed and ahhed. But then it was, "why wasn't this here all along...?" Wow, pthreads! These are new and exciting!
The final step (for me, not you) is to prove that there is a niche for computationally intensive, multithreaded Ruby programs. But that's for another day...
The following is the entire file used in the above tests, as promised:
require 'thread'
require 'test/unit'

class ThreadTester < Test::Unit::TestCase
  # Starts number_of_threads threads that each run the given block,
  # then waits for all of them to finish.
  def execute_in_parallel number_of_threads
    threads = []
    number_of_threads.times do |i|
      threads << Thread.new(i) { yield }
    end
    threads.each {|t| t.join}
  end

  def test_execute_in_parallel_ping
    mutex = Mutex.new
    commands = [
      "ping -c 5 localhost",
      "echo 'Executing second but you should see me first!'"
    ]
    execute_in_parallel(2) do
      cmd = nil
      while true
        # The command list is shared, so take the next command under the lock.
        mutex.synchronize do
          cmd = commands.shift
        end
        break unless cmd
        puts "Executing this command now: #{cmd}"
        # Backticks block this thread until the shell command returns.
        puts `#{cmd}`
      end
    end
  end

  def test_single_thread_count_to_bazillion
    num = 0
    100000000000000.times do |i|
      num = num ^ i
    end
  end

  def test_execute_in_parallel_count_to_bazillion
    mutex = Mutex.new
    # Two work items, so each of the two threads picks up one full XOR loop.
    commands = [true, true]
    execute_in_parallel(2) do
      cmd = nil
      num = 0
      while true
        mutex.synchronize do
          cmd = commands.shift
        end
        break unless cmd
        100000000000000.times do |i|
          num = num ^ i
        end
      end
    end
  end
end