
Fetching objects in a folder optimally

by hedley posted on Sep 08, 2009 09:43 PM last modified Sep 08, 2009 09:43 PM

I frequently iterate over the objects in a folder, and this is usually painfully slow for large folders in large object databases. But did you know that it is easy to speed up this iteration significantly?

Many of you with a computer science background will be familiar with locality of reference. Here we are interested in spatial locality, which refers to the use of data elements stored relatively close to one another. This sounds a lot like the situation with items in a folder.

What this all means is that if you need to fetch items in a folder, do not jump around in the folder. Make your iteration follow the order in which the items are stored in the ZODB.
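To make the idea concrete, here is a toy model of spatial locality. It is not ZODB code: BucketStore, count_loads and the bucket size are all hypothetical, and the model simply assumes that items are stored in fixed-size buckets with only one bucket held in memory at a time. It counts how often a new bucket has to be loaded for a given iteration order.

```python
import random


class BucketStore:
    """Toy stand-in for an object database (hypothetical, not the ZODB API).

    Items live in fixed-size buckets, in the order the ids were given,
    and only the most recently used bucket is kept in memory.
    """

    def __init__(self, ids, bucket_size=50):
        self.bucket_of = {id_: i // bucket_size for i, id_ in enumerate(ids)}
        self.cached_bucket = None
        self.loads = 0  # number of simulated bucket loads from storage

    def get(self, id_):
        bucket = self.bucket_of[id_]
        if bucket != self.cached_bucket:
            # Simulated cache miss: a different bucket must be loaded.
            self.cached_bucket = bucket
            self.loads += 1
        return id_


def count_loads(ids, order, bucket_size=50):
    """Count bucket loads when fetching every id in the given order."""
    store = BucketStore(ids, bucket_size)
    for id_ in order:
        store.get(id_)
    return store.loads


ids = ['item-%d' % i for i in range(1000)]
shuffled = ids[:]
random.Random(0).shuffle(shuffled)

sequential_loads = count_loads(ids, ids)    # follows storage order
random_loads = count_loads(ids, shuffled)   # jumps around
```

In this model, walking the ids in storage order loads each bucket exactly once, while the shuffled walk reloads buckets over and over. A real object database has a much larger cache, so the gap there will be less extreme, but the direction of the effect is the same.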

You may want to do

for brain in context.getFolderContents(contentFilter={'review_state':'somestate'}):
    ob = context._getOb(brain.id) # or brain.getObject() yields the same result

but in practice this

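from Products.CMFCore.utils import getToolByName

workflow_tool = getToolByName(context, 'portal_workflow')
for id in context.objectIds():
    ob = context._getOb(id)
    if workflow_tool.getInfoFor(ob, 'review_state') != 'somestate':
        continue
    # ob is in the wanted state; process it here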

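may be faster. For this example it all depends on your data, of course: the catalog query only wakes up the matching objects, so it can still win when few items in the folder are in the wanted state.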

For the trivial case of iterating over all objects in a folder, I have found that sticking to the order provided by objectIds is about 5 times faster than scanning the folder randomly. This was the case for two different folders, each containing 250 000 objects.
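If you want to check this on your own folders, a small timing harness along the following lines may help. It is only a sketch: time_iteration and compare_orders are hypothetical helper names, and a plain dict stands in for the folder so that the snippet runs anywhere.

```python
import random
import time


def time_iteration(ids, fetch):
    """Return the seconds taken to call fetch(id) for every id, in order."""
    start = time.perf_counter()
    for id_ in ids:
        fetch(id_)
    return time.perf_counter() - start


def compare_orders(ids, fetch, seed=0):
    """Time one pass in the given order and one pass in shuffled order."""
    ordered = list(ids)
    shuffled = ordered[:]
    random.Random(seed).shuffle(shuffled)
    return {
        'ordered': time_iteration(ordered, fetch),
        'shuffled': time_iteration(shuffled, fetch),
    }


# Stand-in data: a dict plays the role of the folder here.
objects = dict(('item-%d' % i, i) for i in range(10000))
timings = compare_orders(sorted(objects), objects.__getitem__)
```

For a real folder you would presumably pass context.objectIds() as ids and context._getOb as fetch. Keep in mind that whichever pass runs first is likely to warm the object cache for the second, so a fair comparison probably needs each pass to start from a cold cache.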

If your folders are not that large and you do not have algorithms that scan over them then you may merrily continue on your current path :)
