Monday, November 16, 2009

Memory Debugging with Meliae

Background of Meliae 0.1.0

Earlier this year I started working on a new memory debugging program for python. I had originally tried to use heapy, but at the time it didn't support Windows, Mac, or 64-bit environments. (Which turned out to be all of my interesting platforms.) The other major problem is that I'm often debugging memory consumption of up to a GB of active data. While I think some of the former issues have been fixed, the latter is still a major issue for me.

So with the help of Michael Hudson, I started putting together a new structure. The code would be split into a scanner and a processor (loader). Such that you can interrupt a running process, dump the memory consumption to disk, and then analyze it in a separate process. (Often after the former has stopped.) The scanner can have a minimal memory profile, so even if your system is already swapping, you can dump out the memory info. (Robert Collins successfully dumped a 6GB memory profile, though analyzing that beast is still an issue.) The other advantage of this system, is that I don't have to play tricks with objects that represent the current state, like Guppy does with all sorts of crazy decorators.

In recent months, I've also focused on improving Bazaar's memory profile, which also meant improving memory profiling. Enough that I felt it was worth releasing the code. So officially Meliae 0.1.0 has been released. (For those wondering about the name, it is from Ash-Wood Nymph in Greek Mythology, aka it is just a fun name.)

Doing real work
So how does one actually use the program. Bazaar has a very nice ability, that you can use SIGQUIT (Ctrl+|) or SIGBREAK (Ctrl+Pause/Break) to drop into a debugger in the middle of a process to figure out what is going on. At that point, you can just:
from meliae import scanner
scanner.dump_all_objects('filename.json')
(There is an alternative scanner.dump_gc_objects() which has even lower memory profile, but will dump some objects more than once, creating a larger dump file.)

This creates a file describing all of the Python objects it was able to find along with their known size, references, and for some objects (strings, ints) their content. From there, you start another shell, and use:
>>> from meliae import loader
>>> om = loader.load('filename.json')
>>> s = om.summarize(); s

This dumps out something like:
Total 17916 objects, 96 types, Total size = 1.5MiB (1539583 bytes)
Index Count % Size % Cum Max Kind
0 701 3 546460 35 35 49292 dict
1 7138 39 414639 26 62 4858 str
2 208 1 94016 6 68 452 type
3 1371 7 93228 6 74 68 code
4 1431 7 85860 5 80 60 function
5 1448 8 59808 3 84 280 tuple
6 552 3 40760 2 86 684 list
7 56 0 29152 1 88 596 StgDict
8 2167 12 26004 1 90 12 int
9 619 3 24760 1 91 40 wrapper_descriptor
10 570 3 20520 1 93 36 builtin_function_or_method
...

Showing the top objects and what data they consume. This can often be revealing it itself. Do you have millions of tuples? One giant dict that is consuming a surprising amount of memory? (A dict with 200k entries is ~6MB on a 32-bit platform.)

There is more that can be done. You can run:
om.compute_referrers()

At this point, you can look at a single node, and find out what was referencing it. (So what was referencing that largest dict?)
>>> om[s.summaries[0].max_address]
MemObject(29351984, dict, 49292 bytes, 1578 refs [...], 1 referrers [26683840])

>>> om[26683840]
MemObject(29337264, function, format_string, 60 bytes, 6 refs...)

However, it also turns out that all 'classic' classes in Python indirect to their data via self.__dict__, which is a bit annoying to walk through. It also makes it looks like 'dict' is the #1 memory consumer, when actually it might be instances of Foo, which happen to use dicts. So you can use
om.collapse_instance_dicts()

Which will find all instances that seem to have trivial references to a __dict__, and then collapse it so that all references are directly from the instance, and all referenced objects then claim the instance as the referrer.

The above dump changes to:
>>> s = om.summarize(); s
Total 17701 objects, 96 types, Total size = 1.5MiB (1539583 bytes)
Index Count % Size % Cum Max Kind
0 7138 40 414639 26 26 4858 str
1 486 2 394632 25 52 49292 dict
2 208 1 94016 6 58 452 type
3 1371 7 93228 6 64 68 code
4 1431 8 85860 5 70 60 function
5 149 0 82844 5 75 556 ReadLineTextBuffer
6 93 0 65384 4 79 6312 module
7 1448 8 59808 3 83 280 tuple
8 552 3 40760 2 86 684 list
9 56 0 29152 1 88 596 StgDict
10 2167 12 26004 1 90 12 int

Which shows that ReadLineTextBuffer is actually a large consumer of memory.

Misc
There are other bits to explore, and improvements to be made. "scanner.get_recursive_size()" can be useful if you don't want to dump out a big file to analyze memory referenced from a given object (such as a cache). It doesn't give the whole picture, but can be useful in an interactive session.


Results
In the end, this code has enabled us to cut the memory consumption of Bazaar
roughly in half (for bzr branch). It also lets you see nice summaries
like this:

Total 2805995 objects, 276 types, Total size = 946.0MiB (991983819 bytes)
Index Count % Size % Cum Max Kind
0 1939090 69 916011611 92 92 5762600 str
1 9449 0 33069868 3 95 3145868 dict
2 132202 4 12506732 1 96 536 unicode
3 383436 13 7048652 0 97 20 bzrlib._static_tuple_c.StaticTuple
4 160027 5 5873744 0 98 304 tuple
5 5429 0 5185252 0 98 412236 list
6 62256 2 4482432 0 99 72 InventoryFile
7 148 0 1334032 0 99 1048692 set
8 2185 0 1214860 0 99 556 GroupCompressBlock
9 8003 0 992372 0 99 124 CHKInventoryDirectory
...


(Note that after seeing this, we changed the code to not cache as many strings in memory, and I managed to decrease memory consumption to about 1/3rd it once was for this operation.)

The code isn't perfect, but being able to get a view of where memory is going, and what objects are holding on to it, is a huge improvement over just being in the dark.

2 comments:

Chris Jones said...

I'm curious how my code is performing and this seems like a great tool, but I sadly don't really understand the output I'm seeing. Any chance of a dummy's guide? :D

jam said...

Thanks for your interest. :)

During vacation time, I'm actually working on a bit of a rewrite of some of the core data structures, which should make it easier to display the information you actually care about.

I'll try to get another post out once I finish.