Saturday, January 9, 2010

Meliae 0.2.0

And here we are, with the release of Meliae 0.2.0. This is a fairly major reworking of the internals, though it should be mostly compatible with 0.1.2. (The disk format did not change, and most of the APIs have deprecated thunks to help you migrate.)

The main difference is how data is stored in memory. Instead of using a Python dict + Python objects, I now use a custom data collection. Python's generic objects are great for getting stuff going, but I was able to cut memory consumption in half with a custom object. This means that, finally, analyzing a 600MB dump takes less than 600MB of memory (currently about 300MB). Of course, that also depends on your data structures. (A 600MB dump that is one 500MB string will take up very little memory for analysis.)
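
To get a rough feel for why a purpose-built record beats a dict plus generic Python objects, here is a minimal sketch. It is not how Meliae is actually implemented; the class names DictRecord and SlotRecord are made up for illustration, and __slots__ merely stands in for the idea of a fixed-layout custom object.

```python
import sys


class DictRecord(object):
    """Generic Python object: attributes live in a per-instance dict."""

    def __init__(self, address, type_str, size):
        self.address = address
        self.type_str = type_str
        self.size = size


class SlotRecord(object):
    """Fixed-layout object: __slots__ removes the per-instance dict."""

    __slots__ = ('address', 'type_str', 'size')

    def __init__(self, address, type_str, size):
        self.address = address
        self.type_str = type_str
        self.size = size


def shallow_size(obj):
    """Size of the object itself plus its instance dict, if it has one."""
    total = sys.getsizeof(obj)
    if hasattr(obj, '__dict__'):
        total += sys.getsizeof(obj.__dict__)
    return total


generic = DictRecord(32874448, 'Thread', 1112)
compact = SlotRecord(32874448, 'Thread', 1112)
print('generic: %d bytes, compact: %d bytes'
      % (shallow_size(generic), shallow_size(compact)))
```

The exact numbers depend on the Python version and platform, but the compact record should come out noticeably smaller, and that difference is multiplied by every object in the dump.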

The second biggest feature is hopefully a cleaner interface.
  1. Call references 'parents' or 'children', indicating objects which point to me and objects which I point to, respectively. 'ref_list' and 'referrers' were confusing: both start with 'ref', so it takes a bit to sort them out.
  2. Add attributes to get direct access to parents and children, rather than having to go back through the ObjManager.
  3. Change the formatting strings to be more compact. No longer show the refs by default, since you can get to the objects anyway.
A third, minor improvement is support for collapsing old-style classes (ones that don't inherit from 'object').

So how about an example? To start with, you need a way to interrupt your running process and get a dump of memory. I can't really give you much help there, but you'll end up wanting:
from meliae import scanner
scanner.dump_all_objects('test-file.dump')

(This is the simplest method. There are others that take less memory while dumping, if overhead is a concern.)

Once you have that dump file, start up another python process and let's analyze it.
$ python
>>> from meliae import loader
>>> om = loader.load('test-file.dump')
loaded line 3579013, 3579014 objs, 377.4 / 377.4 MiB read in 79.6s

I recommend just always running these lines. If you used a different method of dumping, there are other things to do, which is why it isn't automatic (yet).
>>> om.compute_parents(); om.collapse_instance_dicts()
set parents 3579013 / 3579014
checked 3579013 / 3579014 collapsed 383480
set parents 3195533 / 3195534
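
If you are wondering what those two calls are doing conceptually, here is a toy sketch. It is not Meliae's code: the objs table, the addresses, and both helper functions are invented for illustration. Computing parents amounts to inverting the child links, and collapsing folds an instance's __dict__ (its size and its children) into the instance, which is why the object count in the output above drops by 383480.

```python
from collections import defaultdict

# Toy dump: address -> (type name, size in bytes, addresses it references).
# All addresses and sizes here are invented for illustration.
objs = {
    1: ('Thread', 64, [2]),      # an instance, pointing at its __dict__
    2: ('dict', 280, [3, 4]),    # the instance's __dict__
    3: ('str', 40, []),
    4: ('_Condition', 64, []),
    5: ('list', 104, [1]),       # a container holding the Thread
}


def compute_parents(objs):
    """Invert the child links: for each object, list who references it."""
    parents = defaultdict(list)
    for address, (_, _, children) in objs.items():
        for child in children:
            parents[child].append(address)
    return dict(parents)


def collapse_instance_dict(objs, instance, dict_address):
    """Fold an instance's __dict__ into the instance itself."""
    type_str, size, children = objs[instance]
    _, dict_size, dict_children = objs.pop(dict_address)
    children = [c for c in children if c != dict_address] + dict_children
    objs[instance] = (type_str, size + dict_size, children)


collapse_instance_dict(objs, 1, 2)
parents = compute_parents(objs)   # recomputed after collapsing
print(objs[1])      # ('Thread', 344, [3, 4])
print(parents[1])   # [5]
```

After collapsing, the Thread "owns" the bytes of its dict and points directly at what the dict used to hold, which makes the per-type totals much easier to read.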

Now we can look at the data, and get a feel for where our memory has gone:
>>> s  = om.summarize(); s
Total 3195534 objects, 418 types, Total size = 496.8MiB (520926557 bytes)
Index   Count   %       Size   % Cum      Max Kind
    0  189886   5  211153232  40  40     1112 Thread
    1  199117   6   72510520  13  54 12583192 dict
    2  189892   5   65322848  12  66      344 _Condition
    3  380809  11   30464720   5  72       80 instancemethod
    4  397892  12   28673968   5  78     2080 tuple
    5  380694  11   27409968   5  83       72 builtin_function_or_method
    6  446606  13   26100905   5  88    14799 str
    7  189886   5   21267232   4  92      112 _socketobject
    8  197255   6   14568080   2  95    14688 list

At this point, you can see that there are 190k instances of Thread, which consume 40% of all memory. There is also a very large 12.5MB dict. (It turns out that this dict holds all of those Thread objects.)
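
For readers who want to see how a table like this comes together, here is a minimal sketch of the bookkeeping: group by type, track count, total size, and largest single object, then sort by total size and keep a running percentage. The records list and the summarize function are made up for illustration and are not Meliae's implementation.

```python
from collections import defaultdict

# (type name, size in bytes) for each object; values invented for illustration.
records = [
    ('Thread', 1112), ('Thread', 1112),
    ('dict', 280), ('dict', 3352),
    ('str', 54), ('str', 53), ('str', 77),
]


def summarize(records):
    """Return rows of (kind, count, total size, max size), largest total first."""
    stats = defaultdict(lambda: [0, 0, 0])   # count, total, max
    for kind, size in records:
        entry = stats[kind]
        entry[0] += 1
        entry[1] += size
        entry[2] = max(entry[2], size)
    rows = [(kind, c, t, m) for kind, (c, t, m) in stats.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


rows = summarize(records)
grand_total = sum(row[2] for row in rows)
cumulative = 0
for index, (kind, count, total, largest) in enumerate(rows):
    cumulative += total
    # Columns: Index, Count, Size, % of total, cumulative %, Max, Kind
    print('%5d %6d %10d %3d%% %3d%% %7d %s' % (
        index, count, total,
        100 * total // grand_total, 100 * cumulative // grand_total,
        largest, kind))
```

The 'Max' column is what makes a single oversized object (like the big dict above) stand out even when its type does not dominate the totals.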

But how do we determine that? One thing we can do is just get a handle to all of those Thread instances:
>>> threads = om.get_all('Thread')
>>> threads[0]
Thread(32874448 1112B 23refs 3par)

So this thread is at address 32874448 (not particularly relevant), consumes 1112 bytes of memory (including its dict, since we collapsed instance dicts), references 23 Python objects, and is referenced by 3 Python objects.
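
If you ever want to pull those fields back out of a pasted summary line, a small regular expression will do. This is only a sketch: the pattern is inferred from the sample output in this post, the parse_summary helper is hypothetical (it is not part of Meliae), and other object kinds may print differently.

```python
import re

# Pattern inferred from the sample output in this post only.
# Layout: kind(address sizeB [Nrefs] [Npar] ['value'])
SUMMARY_RE = re.compile(
    r"(?P<kind>\w+)\((?P<address>\d+) (?P<size>\d+)B"
    r"(?: (?P<refs>\d+)refs)?(?: (?P<parents>\d+)par)?"
    r"(?: '(?P<value>.*)')?\)$")


def parse_summary(text):
    """Split one compact summary string into a dict of named fields."""
    match = SUMMARY_RE.match(text)
    if match is None:
        raise ValueError('unrecognized summary: %r' % (text,))
    fields = match.groupdict()
    # Missing refs/par counts are treated as zero.
    for name in ('address', 'size', 'refs', 'parents'):
        fields[name] = int(fields[name] or 0)
    return fields


thread = parse_summary('Thread(32874448 1112B 23refs 3par)')
name = parse_summary("str(11409312 54B 189887par '_Thread__block')")
print(thread['size'], thread['refs'], thread['parents'])
print(name['parents'], name['value'])
```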

Let's see those references:
>>> threads[0].c # shortcut for 'children'
[str(11409312 54B 189887par '_Thread__block'), _Condition(32903248 344B 11refs
1par), str(11408976 53B 189887par '_Thread__name'), str(32862080 77B 1par
'PoolThread-twisted.internet.reactor-1'), str(1...

It looks like there might be something interesting there, but it is a bit hard to sort out. Step one is to try using Python's pprint utility.
>>> from pprint import pprint as pp
>>> pp(threads[0].c)
[str(11409312 54B 189887par '_Thread__block'),
_Condition(32903248 344B 11refs 1par),
str(11408976 53B 189887par '_Thread__name'),
str(32862080 77B 1par 'PoolThread-twisted.internet.reactor-1'),
str(11429168 57B 189887par '_Thread__daemonic'),
bool(7478912 24B 572370par 'False'),
str(11409200 56B 189887par '_Thread__started'),
bool(7478944 24B 571496par 'True'),

That's a bit better, but I also know that instances have a dict, so let's try:
>>> pp(threads[0].refs_as_dict())
{'_Thread__args': tuple(140013759823952 56B 2008par),
'_Thread__block': _Condition(32903248 344B 11refs 1par),
'_Thread__daemonic': False,
'_Thread__initialized': True,
'_Thread__kwargs': dict(32516192 280B 1par),
'_Thread__name': 'PoolThread-twisted.internet.reactor-1',
'_Thread__started': True,

(Note to self: find a good way to shorten 'refs_as_dict'; too much typing.) Now that is starting to look like something you can actually understand.
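
The idea behind that view can be sketched in a few lines. Judging from the pprint output earlier, the children of a collapsed instance alternate key, value, key, value, so pairing them up recovers the attribute dict. This assumes that alternating layout; the pairs_to_dict helper and the placeholder string for the _Condition object are made up for illustration.

```python
def pairs_to_dict(children):
    """Pair up a flat [key, value, key, value, ...] list into a dict."""
    if len(children) % 2:
        raise ValueError('expected an even number of children')
    # Even positions are keys, odd positions are the matching values.
    return dict(zip(children[0::2], children[1::2]))


# Stand-ins for the child objects shown earlier (plain values, not proxies).
children = [
    '_Thread__block', '<_Condition 32903248>',
    '_Thread__name', 'PoolThread-twisted.internet.reactor-1',
    '_Thread__daemonic', False,
    '_Thread__started', True,
]

as_dict = pairs_to_dict(children)
print(as_dict['_Thread__name'])
```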

Another question to ask: who is referencing this object (why is it still active)?
>>> pp(threads[0].p)
[list(33599432 104B 1refs 1par),
list(33649944 104B 1refs 1par),
dict(11279168 1048B 10refs 1par)]

So this thread is in 2 lists and a dict with 10 items. So what about the parents of the parents?
>>> pp(threads[0].p[0].p)
[ThreadPool(32888520 1120B 21refs 2par)]

So the first list is held by a ThreadPool. We can quickly check info about that object:
>>> pp(threads[0].p[0].p[0].refs_as_dict())
{'joined': False,
'max': 10,
'min': 0,
'name': 'twisted.internet.reactor',
'q': Queue(32888592 1120B 15refs 1par),
'started': True,
'threads': list(33599432 104B 1refs 1par),
'waiters': list(33649944 104B 1refs 1par),
'workers': 1,
'working': list(33649656 72B 1par)}

So that seems to be a Twisted thread pool.
What about the other parents?
>>> pp(threads[0].p[1].p)
[ThreadPool(32888520 1120B 21refs 2par)]

Also a list held by a ThreadPool (the same one, judging by the address). That leaves the third parent, the dict:
>>> pp(threads[0].p[2].p)
[dict(11253824 3352B 98refs 70par)]

Hmmm, now we have a dict pointing to 98 objects which is, itself, referenced by 70 objects. This at least seems worth investigating further.
>>> d = threads[0].p[2].p[0]
>>> d
dict(11253824 3352B 98refs 70par)

Yep, that's the one. We can try to dump it as a dict:
>>> pp(d.refs_as_dict())
{'BoundedSemaphore': 'BoundedSemaphore',
'Condition': 'Condition',
'Event': 'Event',
'Lock': builtin_function_or_method(10872592 72B 1refs 7par),
'RLock': 'RLock',
'Semaphore': 'Semaphore',
'Thread': 'Thread',

Now, the formatting here is actually hiding something, namely that the referred-to object is actually a type:
>>> d.c[1]
type(11280288 880B 4refs 2par '_BoundedSemaphore')

From experience, I know that this is probably a module's dict. It has a lot of objects, and a lot of objects referencing it. (All functions reference their module's global dict.) I'm still working on how to collapse a module's __dict__ into the module itself for clarity. Anyway, let's look at the parents to see what module this is.
>>> pp([p for p in d.p if p.type_str == 'module'])
[module(11411416 56B 1refs 18par 'threading')]

And there you go, the threading module.

And that's how you walk around the object graph. To finish analyzing this memory, I would probably poke at all the thread objects and see what they are trying to accomplish. But mostly, the summary tells you that something is wrong. You shouldn't really be able to have 190k active threads doing real work. So you probably have something that is accidentally preserving threads that are no longer active.
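
The manual walk up the parent links can also be automated. Here is a toy sketch of a breadth-first search for the nearest ancestors of a given type. The types and parents tables and the find_ancestors helper are invented for illustration; with a real dump you would follow each object's parents instead.

```python
from collections import deque

# Toy graph shaped like the walk above; addresses are invented.
types = {1: 'Thread', 2: 'list', 3: 'dict',
         4: 'ThreadPool', 5: 'dict', 6: 'module'}
parents = {1: [2, 3], 2: [4], 3: [5], 4: [], 5: [6], 6: []}


def find_ancestors(start, wanted_type, max_depth=5):
    """Breadth-first walk up the parent links.

    Returns (address, depth) pairs for every ancestor of the wanted type
    found within max_depth steps of the starting object.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        address, depth = queue.popleft()
        if depth >= max_depth:
            continue
        for parent in parents.get(address, []):
            if parent in seen:
                continue   # object graphs have cycles; visit each node once
            seen.add(parent)
            if types[parent] == wanted_type:
                found.append((parent, depth + 1))
            queue.append((parent, depth + 1))
    return found


print(find_ancestors(1, 'module'))       # [(6, 3)]
print(find_ancestors(1, 'ThreadPool'))   # [(4, 2)]
```

The depth limit matters on real dumps: something like a module dict has so many parents that an unbounded walk quickly touches most of the graph.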