Finding fat free alternatives for upfront.diet
A list of Plone catalog dependencies and alternative implementations that don't rely on the catalog.
This is a follow-up to my previous post about upfront.diet. Although we have done some work on folder contents and references, we decided to take a step back, document Plone's catalog usage and propose alternative fat free implementations for comment by other developers. Below is a list of features that currently use the catalog. For each one we consider whether it can work without the catalog and, where possible, propose an alternative implementation.
Portal tabs
Portal tabs query the catalog for top-level folders only. This query can easily be replaced by upfront.diet's foldercontents adapter, which fetches objects directly. The adapter indexes allowed roles and users directly on the adapted container and uses this index to return only the children a user has access to. It is already used in upfront.diet's folder_contents implementation.
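To make the idea concrete, here is a minimal pure-Python sketch of a container that keeps such a security index. The names (FolderContentsIndex, Item, visible_to) are hypothetical and not upfront.diet's actual API; roles and user ids are modelled as plain strings, and folder order is simply insertion order. Real code would also have to reindex a child whenever its local roles or workflow permissions change.

```python
"""Hypothetical sketch of a container-level index of allowed roles and users."""
from dataclasses import dataclass, field


@dataclass
class Item:
    """Stand-in for a child content object."""
    name: str
    # Roles and user ids allowed to view this child, e.g. {"Anonymous"} or {"user:bob"}.
    allowed: frozenset


@dataclass
class FolderContentsIndex:
    """Keeps children in folder order plus a map of principal -> child names."""
    children: dict = field(default_factory=dict)      # name -> Item, insertion ordered
    by_principal: dict = field(default_factory=dict)  # principal -> set of names

    def add(self, item):
        self.children[item.name] = item
        for principal in item.allowed:
            self.by_principal.setdefault(principal, set()).add(item.name)

    def remove(self, name):
        item = self.children.pop(name)
        for principal in item.allowed:
            self.by_principal[principal].discard(name)

    def visible_to(self, principals):
        """Return the children any of the given principals may see, in folder order."""
        allowed_names = set()
        for principal in principals:
            allowed_names |= self.by_principal.get(principal, set())
        return [item for name, item in self.children.items() if name in allowed_names]
```

Portal tabs would then ask the portal root for the children visible to the current user's roles and user id, with no catalog query involved.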
Folder contents and friends
We already have a working implementation of folder_contents, but the other folder views (folder_listing, folder_summary_view and folder_tabular_view) still need to be fixed. The catalog queries used by these templates need to be replaced by iterating over the contents returned by the foldercontents adapter.
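A listing view built this way only needs to filter and batch what the adapter returns. The sketch below assumes the adapter yields children in folder order, already filtered for the current user; Entry and listing_page are hypothetical names used for illustration.

```python
"""Hypothetical sketch: a folder listing batch built from folder contents, not the catalog."""
from dataclasses import dataclass
from itertools import islice


@dataclass
class Entry:
    """Stand-in for a content object returned by the foldercontents adapter."""
    title: str
    portal_type: str
    description: str = ""


def listing_page(contents, start=0, size=20, portal_types=None):
    """Return one batch of entries from `contents`, in folder order.

    `contents` is assumed to be already filtered for the current user's
    permissions. Iteration stops as soon as the batch is full.
    """
    if portal_types is not None:
        contents = (obj for obj in contents if obj.portal_type in portal_types)
    return list(islice(contents, start, start + size))
```

folder_summary_view and folder_tabular_view would differ only in which attributes (title, description, portal_type) they render for each entry in the batch.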
Navigation portlet
A navigation portlet that doesn't use the catalog could be significantly slower if it isn't designed carefully. The existing navigation portlet can render multiple levels deep, and it renders all the children of every folder on the path to the current folder. Before burdening the catalog with indexes and metadata to support the navigation portlet, one should reconsider fetching objects to build the tree, determine exactly when this becomes too costly, and check whether caching can provide good performance. If cataloging turns out to be required after all, one should consider a separate catalog to support navigation. A separate catalog ensures that we don't end up with a bloated catalog like the one we have now, and makes it possible to clear the portal_catalog without breaking navigation. It makes more sense to separate catalogs meant for content searches from catalogs required for specific use cases like navigation.
The next step is to benchmark a navigation portlet that fetches objects, where the navigation tree has no more than 100 objects at each level. A navigation portlet that renders more than 100 objects doesn't really make sense anyway, as it is most probably unusable.
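As a starting point for such a benchmark, here is a minimal sketch of building the tree by walking objects instead of querying the catalog. Node, build_navtree and synthetic_path are hypothetical; plain in-memory objects stand in for content, so the timing printed here ignores ZODB object loading and security checks, which are exactly what a real benchmark has to measure.

```python
"""Hypothetical sketch: build a navigation tree by walking objects, not the catalog."""
from dataclasses import dataclass, field

MAX_CHILDREN = 100  # per-level cap suggested in the text


@dataclass
class Node:
    """Stand-in for a folderish content object."""
    name: str
    children: list = field(default_factory=list)


def build_navtree(path, max_children=MAX_CHILDREN):
    """Build a tree for `path`, a list of Nodes from the root to the current folder.

    Every folder on the path lists its children (up to max_children); only the
    child that lies on the path is expanded further.
    """
    def expand(depth):
        node = path[depth]
        next_on_path = path[depth + 1] if depth + 1 < len(path) else None
        entries = []
        for child in node.children[:max_children]:
            if child is next_on_path:
                entries.append(expand(depth + 1))
            else:
                entries.append({"name": child.name, "children": []})
        return {"name": node.name, "children": entries}

    return expand(0)


def synthetic_path(depth, width):
    """Create `depth` levels of `width` folders; return the path down the first column."""
    root = Node("root")
    path = [root]
    current = root
    for level in range(depth):
        current.children = [Node(f"l{level}-{i}") for i in range(width)]
        current = current.children[0]
        path.append(current)
    return path


if __name__ == "__main__":
    import time

    path = synthetic_path(depth=4, width=MAX_CHILDREN)
    started = time.perf_counter()
    tree = build_navtree(path)
    print(f"built navtree in {time.perf_counter() - started:.6f}s")
```

Note that a folder on the path that falls beyond the per-level cap is not expanded in this sketch; a real portlet would need to decide how to handle that case. Because the result is plain data, it is also a natural candidate for caching per path and user.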
Sitemap
There is significant overlap between the current implementations of the navigation portlet and the site map: both use the NavtreeQueryBuilder to build a tree, so both could be optimised using the same strategy.
Review list
The review portlet asks the workflow tool for the worklists of all workflows. In a standard Plone site, all the workflows have a worklist named reviewer_queue, and they all query the catalog for content in the pending state. It could be argued that review_state is an important index for site-wide searches anyway, so at least this attribute must be indexed in the portal_catalog. This is certainly a convenient conclusion, but I'm not going to accept it just yet. An alternative implementation might use a real queue of objects that need to be reviewed. This has a couple of advantages over indexing review_state in the catalog: you don't need to index review_state on all objects, and the queue stays small because objects are popped from it as they are processed. The reviewer queue can be a simple persistent object, possibly using one of the queue classes in zc.queue. Besides the queue, event subscribers are needed to add pending objects to the queue and to remove processed objects from it.
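The queue and its subscriber could look roughly like this. The sketch uses an OrderedDict instead of zc.queue so that it runs on its own; ReviewerQueue and on_state_changed are hypothetical names, and the key is assumed to be a stable identifier such as a UID. In Plone the subscriber would presumably be registered for a workflow transition event (for example DCWorkflow's IAfterTransitionEvent), which is not shown here.

```python
"""Hypothetical sketch of a reviewer queue kept in sync by an event subscriber."""
from collections import OrderedDict

PENDING = "pending"


class ReviewerQueue:
    """FIFO of items awaiting review, keyed by a stable identifier (e.g. a UID)."""

    def __init__(self):
        self._items = OrderedDict()

    def push(self, key, item):
        # Re-submitting an item that is already queued keeps its original position.
        self._items[key] = item

    def discard(self, key):
        self._items.pop(key, None)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items.values())


def on_state_changed(queue, key, item, new_state):
    """Hypothetical subscriber: queue pending items, drop everything else."""
    if new_state == PENDING:
        queue.push(key, item)
    else:
        queue.discard(key)
```

The review portlet then only iterates over the queue. Deleted objects would also need a subscriber that discards them from the queue.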
Calendar portlet
The calendar portlet queries the catalog through the CalendarTool.catalog_getevents method. This method still carries the comment "XXX: this method violates the rules for tools/utilities: it depends on a non-utility tool", so we have an opportunity to fix that as well. The comment isn't entirely clear, but I assume it refers to the dependency on the catalog tool. The method queries the catalog on the portal_type, review_state, start and end indexes. The calendar clearly needs to index the start and end dates of events. However, if one only indexes content whose portal type and review state are configured in the calendar tool, the portal_type and review_state indexes are unnecessary. If the calendar configuration changes, one can always re-index the portal's content to include other portal types or additional review states.
A possible fix would be to add a utility, named IEventIndexer, that indexes the start and end dates of content types listed in the calendar tool configuration. An additional utility, IQueryEvents, can be used to query for events. Persistent (ZODB) implementations of these utilities can use a ZCatalog instance for indexing and querying.
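Here is a sketch of what the two utilities might look like, with typing.Protocol standing in for zope.interface and an in-memory implementation instead of a ZCatalog. The method names (index, unindex, events_between) are hypothetical; only the utility names come from the proposal above. Content is indexed only when its portal type and review state match the calendar configuration, so no portal_type or review_state index is needed.

```python
"""Hypothetical sketch of the IEventIndexer / IQueryEvents utilities."""
import bisect
from datetime import datetime
from typing import List, Protocol


class IEventIndexer(Protocol):
    def index(self, uid: str, portal_type: str, review_state: str,
              start: datetime, end: datetime) -> None: ...

    def unindex(self, uid: str) -> None: ...


class IQueryEvents(Protocol):
    def events_between(self, start: datetime, end: datetime) -> List[str]: ...


class InMemoryEventCatalog:
    """Implements both protocols; a persistent version might delegate to a small ZCatalog."""

    def __init__(self, portal_types, review_states):
        # Mirrors the calendar tool configuration: only these types/states are indexed.
        self.portal_types = set(portal_types)
        self.review_states = set(review_states)
        self._dates = {}     # uid -> (start, end)
        self._by_start = []  # sorted list of (start, uid)

    def index(self, uid, portal_type, review_state, start, end):
        self.unindex(uid)  # drop any stale entry first
        if portal_type in self.portal_types and review_state in self.review_states:
            self._dates[uid] = (start, end)
            bisect.insort(self._by_start, (start, uid))

    def unindex(self, uid):
        dates = self._dates.pop(uid, None)
        if dates is not None:
            self._by_start.remove((dates[0], uid))

    def events_between(self, start, end):
        """Return uids of events overlapping [start, end), ordered by start date."""
        # (end,) sorts before every (end, uid), so events starting at `end` or later are cut off.
        cutoff = bisect.bisect_left(self._by_start, (end,))
        return [uid for ev_start, uid in self._by_start[:cutoff]
                if self._dates[uid][1] >= start]
```

The calendar portlet would call events_between with the first day of the displayed month and the first day of the next month.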
News and Events Smart Folders
The searches performed by smart folders are clearly site-wide content searches, so it makes perfect sense to use the portal_catalog here. The Collection configlet already lets one choose which fields are enabled for smart folders, so we don't need to develop anything to allow searching on fewer indexes. Note that the News and Events smart folders installed by Plone only require the portal_type, review_state and start indexes, should you want to reduce the number of indexes in your catalog.
Search
The site search clearly needs the portal_catalog, but hopefully with far fewer indexes. Live search and the regular site search query the SearchableText, portal_type and path indexes. The portal_type query restricts results to "user friendly types", which are the types configured in the search configlet. If a portal restricts the types that can be searched, it probably doesn't make sense to index the types that are no longer searched. Given that we use the portal_catalog only for content searches and not to support the functions listed above, we can safely stop indexing these types without breaking the rest of the Plone user interface. This should probably be made configurable through an option on the search configlet reading "Don't index types that are excluded from searches".
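The proposed option boils down to a check before cataloging, plus a search query that uses only the three indexes mentioned above. In this sketch the excluded types are assumed to be available as a set (much like the types_not_searched site property); skip_unsearched models the proposed option, which does not exist in Plone, and both function names are hypothetical.

```python
"""Hypothetical sketch of the proposed 'don't index unsearched types' option."""


def should_index(portal_type, types_not_searched, skip_unsearched):
    """Return True if portal_catalog should index objects of `portal_type`.

    `skip_unsearched` models the proposed configlet option (hypothetical).
    """
    return not (skip_unsearched and portal_type in types_not_searched)


def build_search_query(text, searchable_types, path=None):
    """Build a ZCatalog-style query dict using only the indexes site search needs."""
    query = {"SearchableText": text, "portal_type": sorted(searchable_types)}
    if path is not None:
        query["path"] = path
    return query
```

With the option disabled, should_index always returns True and the catalog behaves as it does today.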
References
By using Simple References we removed the dependency on the reference catalog, and this has proven to perform significantly better in some of our reference-heavy applications. However, we still depend on the UID catalog and are considering an approach that doesn't require any catalog for references. ZODB already supports references between objects: simply assign a persistent object to an attribute of an existing persistent object, e.g. folder.item = object. The only reason ZODB references are not used for Archetypes references is that most applications need the reference to return the object in its original context. If you access a ZODB reference (like folder.item), 'item' is wrapped in the context of 'folder', regardless of its original context. Luckily, Zope 3 introduced the idea of a location (see ILocation in zope.location.interfaces) for objects with a structural location. An object's location can be determined by following the __parent__ attribute of the object and its parents. This means we can develop an Archetypes reference field that uses ZODB references to reference objects, but still returns them in their original location by wrapping them accordingly. Plone containers currently don't implement ILocation, so we need to investigate what has to be done to fix this; the interface suggests it is straightforward to implement.
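The following sketch illustrates the idea without Zope: a descriptor stores the target object directly (as a ZODB reference would) and, when read, rebuilds the target's location chain from __parent__. Located, Wrapped and DirectReference are hypothetical stand-ins; in Zope 2 the final step would presumably rewrap the object using Acquisition's __of__ from the root down, which the plain Wrapped class only imitates.

```python
"""Hypothetical sketch: direct references re-wrapped in their original location."""


class Located:
    """Minimal stand-in for an ILocation object."""

    def __init__(self, name, parent=None):
        self.__name__ = name
        self.__parent__ = parent


def location_chain(obj):
    """Return [root, ..., obj] by following __parent__ pointers."""
    chain = []
    while obj is not None:
        chain.append(obj)
        obj = obj.__parent__
    chain.reverse()
    return chain


class Wrapped:
    """Stand-in for an acquisition wrapper: the target seen through its own containers."""

    def __init__(self, obj, chain):
        self.obj = obj
        self.chain = chain

    def path(self):
        return "/" + "/".join(o.__name__ for o in self.chain if o.__name__)


class DirectReference:
    """Hypothetical reference field: stores the target itself, returns it in its own location."""

    def __set_name__(self, owner, name):
        self.attr = "_ref_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        target = getattr(instance, self.attr, None)
        if target is None:
            return None
        return Wrapped(target, location_chain(target))

    def __set__(self, instance, target):
        setattr(instance, self.attr, target)


class Page(Located):
    related = DirectReference()


site = Located("")
news = Located("news", site)
item = Located("item", news)
events = Located("events", site)
page = Page("page", events)
page.related = item
```

Reading page.related yields item located at /news/item rather than somewhere under /events/page.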
Next steps
Knowing which parts of Plone depend on the portal catalog is already helpful if you are developing for Plone or trying to optimise it. The list above is by no means exhaustive, but it covers the most common dependencies. Please tell us if we've missed something obvious. We will continue to implement the alternatives suggested above, in no particular order, and will gladly incorporate suggestions or improvements.

