Question
I've discovered a memory leak in my Rails code - that is to say, I've found what code leaks but not why it leaks. I've reduced it down to a test case that doesn't require Rails:
require 'csspool'
require 'ruby-mass'
def report
puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`.strip.split.map(&:to_i)[1].to_s + 'KB'
Mass.print
end
report
# note I do not store the return value here
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))
ObjectSpace.garbage_collect
sleep 1
report
ruby-mass supposedly lets me see all the objects in memory. CSSPool is a CSS parser based on racc. /home/jason/big.css is a 1.5MB CSS file.
This outputs:
Memory 9264KB
==================================================
Objects within [] namespace
==================================================
String: 7261
RubyVM::InstructionSequence: 1151
Array: 562
Class: 313
Regexp: 181
Proc: 111
Encoding: 99
Gem::StubSpecification: 66
Gem::StubSpecification::StubLine: 60
Gem::Version: 60
Module: 31
Hash: 29
Gem::Requirement: 25
RubyVM::Env: 11
Gem::Specification: 8
Float: 7
Gem::Dependency: 7
Range: 4
Bignum: 3
IO: 3
Mutex: 3
Time: 3
Object: 2
ARGF.class: 1
Binding: 1
Complex: 1
Data: 1
Gem::PathSupport: 1
IOError: 1
MatchData: 1
Monitor: 1
NoMemoryError: 1
Process::Status: 1
Random: 1
RubyVM: 1
SystemStackError: 1
Thread: 1
ThreadGroup: 1
fatal: 1
==================================================
Memory 258860KB
==================================================
Objects within [] namespace
==================================================
String: 7456
RubyVM::InstructionSequence: 1151
Array: 564
Class: 313
Regexp: 181
Proc: 113
Encoding: 99
Gem::StubSpecification: 66
Gem::StubSpecification::StubLine: 60
Gem::Version: 60
Module: 31
Hash: 30
Gem::Requirement: 25
RubyVM::Env: 13
Gem::Specification: 8
Float: 7
Gem::Dependency: 7
Range: 4
Bignum: 3
IO: 3
Mutex: 3
Time: 3
Object: 2
ARGF.class: 1
Binding: 1
Complex: 1
Data: 1
Gem::PathSupport: 1
IOError: 1
MatchData: 1
Monitor: 1
NoMemoryError: 1
Process::Status: 1
Random: 1
RubyVM: 1
SystemStackError: 1
Thread: 1
ThreadGroup: 1
fatal: 1
==================================================
You can see the memory going way up. Some of the counters go up, but no objects specific to CSSPool are present. I used ruby-mass's "index" method to inspect the objects that have references like so:
Mass.index.each do |k,v|
v.each do |id|
refs = Mass.references(Mass[id])
puts refs if !refs.empty?
end
end
But again, this doesn't give me anything related to CSSPool, just gem info and such.
I've also tried outputting "GC.stat"...
puts GC.stat
CSSPool::CSS::Document.parse(File.new('/home/jason/big.css'))
ObjectSpace.garbage_collect
sleep 1
puts GC.stat
Result:
{:count=>4, :heap_used=>126, :heap_length=>138, :heap_increment=>12, :heap_live_num=>50924, :heap_free_num=>24595, :heap_final_num=>0, :total_allocated_object=>86030, :total_freed_object=>35106}
{:count=>16, :heap_used=>6039, :heap_length=>12933, :heap_increment=>3841, :heap_live_num=>13369, :heap_free_num=>2443302, :heap_final_num=>0, :total_allocated_object=>3771675, :total_freed_object=>3758306}
As I understand it, if an object is not referenced and garbage collection happens, then that object should be cleared from memory. But that doesn't seem to be what's happening here.
I've also read about C-level memory leaks, and since CSSPool uses Racc which uses C code, I think this is a possibility. I've run my code through Valgrind:
valgrind --partial-loads-ok=yes --undef-value-errors=no --leak-check=full --fullpath-after= ruby leak.rb 2> valgrind.txt
Results are here. I'm not sure if this confirms a C-level leak, as I've also read that Ruby does things with memory that Valgrind doesn't understand.
Versions used:
- Ruby 2.0.0-p247 (this is what my Rails app runs)
- Ruby 1.9.3-p392-ref (for testing with ruby-mass)
- ruby-mass 0.1.3
- CSSPool 4.0.0 from here
- CentOS 6.4 and Ubuntu 13.10
Answer
It looks like you are entering The Lost World here. I don’t think the
problem is with c-bindings in racc
either.
Ruby memory management is both elegant and cumbersome. It stores objects
(named RVALUE
s) in so-called heaps of size of approx 16KB. On a low level,
RVALUE
is a c-struct, containing a union
of different standard ruby object
representations.
So, heaps store RVALUE
objects, which size is not more than 40 bytes. For
such objects as String
, Array
, Hash
etc. this means that small objects
can fit in the heap, but as soon as they reach a threshold, an extra memory
outside of the Ruby heaps will be allocated.
This extra memory is flexible; is will be freed as soon as an object became
GC’ed. That’s why your testcase with big_string
shows the memory up-down
behaviour:
def report
puts 'Memory ' + `ps ax -o pid,rss | grep -E "^[[:space:]]*#{$$}"`
.strip.split.map(&:to_i)[1].to_s + 'KB'
end
report
big_var = " " * 10000000
report
big_var = nil
report
ObjectSpace.garbage_collect
sleep 1
report
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 11788KB
But the heaps (see GC[:heap_length]
) themselves are not released back to
OS, once acquired. Look, I’ll make a humdrum change to your testcase:
- big_var = " " * 10000000
+ big_var = 1_000_000.times.map(&:to_s)
And, voilá:
# ⇒ Memory 11788KB
# ⇒ Memory 65188KB
# ⇒ Memory 65188KB
# ⇒ Memory 57448KB
The memory is not released back to OS anymore, because each element of the
array I introduced suits the RVALUE
size and is stored in the ruby heap.
If you’ll examine the output of GC.stat
after the GC was run, you’ll find
that GC[:heap_used]
value is decreased as expected. Ruby now has a lot of
empty heaps, ready.
The summing up: I don’t think, the c
code leaks. I think the problem is
within base64 representation of huge image in your css
. I have no clue,
what’s happening inside parser, but it looks like the huge string forces the
ruby heap count to increase.
Hope it helps.