<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">




    



<channel rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/RSS">
  <title>Where I'm calling from</title>
  <link>http://www.upfrontsystems.co.za</link>
  
  <description>A blog by Roché Compaan about Zope, Plone, Open Source, freedom, life.</description>
  
  
  
            <syn:updatePeriod>daily</syn:updatePeriod>
            <syn:updateFrequency>1</syn:updateFrequency>
            <syn:updateBase>2007-10-19T20:56:24Z</syn:updateBase>
        
  
  <image rdf:resource="http://www.upfrontsystems.co.za/logo.jpg"/>

  <items>
    <rdf:Seq>
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-free-alternatives-for-upfront-diet"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/upfront-diet"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/reconsidering-pair-programming"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/progress-bar-for-plone"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/simple-references"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/pair-programming"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/project-management-checklist"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/portlet-love"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/meeting-plone-3"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/plone-conference-2007-naples"/>
        
    </rdf:Seq>
  </items>

</channel>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-free-alternatives-for-upfront-diet">
        <title>Finding fat free alternatives for upfront.diet</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-free-alternatives-for-upfront-diet</link>
        <description>A list of Plone catalog dependencies and alternative implementations that don't rely on the catalog.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>A list of Plone catalog dependencies and alternative implementations that don't rely on the catalog.</p>
<p>This is a follow-up on my previous post about <a title="The Upfront Diet for Plone" class="internal-link" href="upfront-diet">upfront.diet</a>. Although we have done some work around folder contents and references, we decided to take a step back, document Plone's catalog usage and propose alternative fat-free implementations for comment by other developers. Below is a list of features that currently use the catalog. We consider whether each of them can work without the catalog and propose an alternative implementation where possible.</p>
<h3>Portal tabs</h3>
<p>Portal tabs query the catalog for top level folders only. This can easily be replaced by using upfront.diet's foldercontents adapter, which fetches objects directly. This adapter indexes allowed roles and users directly on the adapted container and uses this index to return only the children a user has access to. This adapter is already used in upfront.diet's folder_contents implementation.</p>
<h3>Folder contents and friends</h3>
<p>We already have a successful implementation for folder_contents but some of the other folder views (folder_listing, folder_summary_view and folder_tabular_view) still need to be fixed. The catalog query used by these templates needs to be replaced by iterating over the contents of the foldercontents adapter.</p>
<h3>Navigation portlet</h3>
<p>Replacing the navigation portlet with an implementation that doesn't use the catalog could be significantly slower if not thought through carefully. The existing navigation portlet can render multiple levels deep and it renders all the children of all the parent folders in the path of the current folder. Before one burdens the catalog with indexes and metadata to support the navigation portlet, one should reconsider fetching objects to build the tree and determine exactly when this becomes too costly and if caching can be used to achieve good performance. If it turns out that cataloging is indeed required, one should consider using a separate catalog to support navigation. Having a separate catalog ensures that we don't end up with a bloated catalog like we currently have and makes it possible to clear the portal_catalog without breaking navigation. It makes more sense to separate catalogs that are meant for content searching from catalogs that are required to support specific use cases like navigation.</p>
<p>The next step is to benchmark a navigation portlet that fetches objects where the navigation tree has no more than 100 objects at each level. A navigation portlet that renders more than 100 objects doesn't really make sense; anything beyond that is most probably unusable.</p>
<h3>Sitemap</h3>
<p>There is a significant overlap between the current implementation for the navigation portlet and the site map in that they both use the NavtreeQueryBuilder to build a tree and one could optimise them using the same strategy.</p>
<h3>Review list</h3>
<p>The review portlet asks the workflow tool for the worklists of all workflows. In a standard Plone site all the workflows have a worklist named reviewer_queue and they all query the catalog for content in the pending state. It could be argued that review_state is an important index for site wide searches in any case and at least this attribute must be indexed in the portal_catalog. This is definitely a convenient conclusion but I'm not going to accept it just yet. An alternative implementation might be to use a real queue of objects that need to be reviewed. This has a couple of advantages over indexing review_state in the catalog in that you don't need to index review_state on all objects and the queue will always be small since objects will be popped from the queue as they are processed. The reviewer queue can be a simple persistent object possibly using one of the queue classes in zc.queue. Besides the queue, event subscribers are necessary to add pending objects to and remove processed objects from the queue.</p>
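<p>A minimal sketch of the queue idea, written in plain Python so that it runs without Zope. The ReviewQueue class and the on_transition function are hypothetical names; in a real site the queue would be a persistent object (possibly one of the zc.queue classes) and the handler would be registered as a workflow event subscriber.</p>

```python
from collections import deque


class ReviewQueue:
    """Hypothetical in-memory stand-in for a persistent reviewer queue."""

    def __init__(self):
        self._uids = deque()

    def add(self, uid):
        # Queue each pending object only once.
        if uid not in self._uids:
            self._uids.append(uid)

    def remove(self, uid):
        # Drop an object once it has left the pending state.
        if uid in self._uids:
            self._uids.remove(uid)

    def pending(self):
        return list(self._uids)


def on_transition(queue, uid, new_state):
    """Hypothetical subscriber called after every workflow transition."""
    if new_state == "pending":
        queue.add(uid)
    else:
        queue.remove(uid)


queue = ReviewQueue()
on_transition(queue, "doc-1", "pending")
on_transition(queue, "doc-2", "pending")
on_transition(queue, "doc-1", "published")
print(queue.pending())
```

<p>The review portlet would then only need to read the queue, which stays small because processed objects are removed as soon as they change state.</p>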
<h3>Calendar portlet</h3>
<p>The calendar portlet queries the catalog using the CatalogTool.catalog_getevents method. This method still has a comment that states "XXX: this method violates the rules for tools/utilities: it depends on a non-utility tool" so we have an opportunity to fix this as well. The comment isn't entirely clear but I assume it is referring to the dependency on the catalog tool. This method queries the catalog on the <em>portal_type</em>, <em>review_state</em>, <em>start</em> and <em>end</em> indexes. The calendar clearly needs to index start and end date of events. If one only indexes portal types and review states configured in the calendar tool, you don't need indexes for portal_type and review_state. If you change the calendar configuration you can always re-index content in your portal to include other portal_types or additional review states.</p>
<p>A possible fix would be to add a utility, named IEventIndexer, that indexes the start and end dates for content types listed in the calendar tool configuration. An additional utility, IQueryEvents, can be used to query for events. The ZODB versions of these utilities can index and query a ZCatalog instance.</p>
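<p>To make the proposal more concrete, here is a rough sketch in plain Python. The EventIndex class is a hypothetical stand-in that combines the roles of IEventIndexer and IQueryEvents and keeps its data in a dictionary instead of a ZCatalog. It only illustrates why filtering at index time removes the need for portal_type and review_state indexes.</p>

```python
from datetime import date


class EventIndex:
    """Hypothetical stand-in for the IEventIndexer and IQueryEvents pair."""

    def __init__(self, portal_types, review_states):
        # Only content matching the calendar configuration is indexed,
        # so no portal_type or review_state index is needed at query time.
        self.portal_types = set(portal_types)
        self.review_states = set(review_states)
        self._events = {}

    def index(self, uid, portal_type, review_state, start, end):
        if portal_type in self.portal_types and review_state in self.review_states:
            self._events[uid] = (start, end)
        else:
            # Unindex content that no longer matches the configuration.
            self._events.pop(uid, None)

    def query(self, first_day, last_day):
        # An event matches when its date range overlaps the requested one.
        return sorted(
            uid
            for uid, (start, end) in self._events.items()
            if start <= last_day and first_day <= end
        )


index = EventIndex(["Event"], ["published"])
index.index("sprint", "Event", "published", date(2010, 1, 10), date(2010, 1, 12))
index.index("draft", "Event", "private", date(2010, 1, 11), date(2010, 1, 11))
index.index("news", "News Item", "published", date(2010, 1, 11), date(2010, 1, 11))
print(index.query(date(2010, 1, 1), date(2010, 1, 31)))
```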
<h3>News and Events Smart Folders</h3>
<p>The searches performed by smart folders are clearly site wide content searches so it makes perfect sense to use the portal_catalog in this case. The Collection configlet already allows one to customise which fields must be enabled for smart folders so we don't really need to develop anything to allow searching on fewer indexes. Note that the News and Events smart folders installed by Plone only require the portal_type, review_state and start indexes, should you want to reduce the number of indexes in your catalog.</p>
<h3>Search</h3>
<p>The site search clearly needs the portal_catalog, but hopefully we require a lot fewer indexes in the catalog. Live search and the regular site search query the SearchableText, portal_type and path indexes. The query on portal_type is used to restrict results to "user friendly types" which are types configured in the search configlet. If one has a portal where one restricts the number of types that can be searched, it probably doesn't make sense to index types that are not searched anymore. Given that we use the portal_catalog only for content searches and not to support functions listed above we can safely stop indexing these types without breaking the rest of the Plone user interface. One should probably make this configurable by adding an option reading "Don't index types that are excluded from searches" on the search configlet.</p>
<h3>References</h3>
<p>By using <a class="external-link" href="http://svn.plone.org/svn/collective/upfront.simplereferencefield/">Simple References</a> we removed the dependency on the reference catalog and this has proven to perform significantly better in some of our reference-heavy applications. However, we still depend on the UID catalog and are considering an approach that doesn't require any catalog for references. ZODB already supports references between objects, by simply assigning a persistent object to an attribute of an existing persistent object, e.g. folder.item = object. The only reason ZODB references are not used for Archetypes references is that most applications require the reference to refer to the object in its original context. If you access a ZODB reference (like folder.item) it will wrap 'item' in the context of 'folder', regardless of its original context. Luckily Zope3 introduced the idea of a location (see ILocation in zope.location.interfaces) for objects with a structural location. The location of an object can be determined by reading the __parent__ attribute of the object and of its parents. This means that we can develop an Archetypes reference field that uses ZODB references to reference objects but still returns the objects in their original location by wrapping them in their original location. Currently Plone containers don't implement ILocation so we will need to investigate what needs to be done to fix this - the interface suggests it is straightforward to implement.</p>
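<p>The following sketch shows, in plain Python, how an object's original location can be recovered from __parent__ alone. The Located class and the physical_path function are hypothetical helpers for illustration; they only mimic the __parent__ and __name__ attributes that ILocation describes and do not involve acquisition or the ZODB.</p>

```python
class Located:
    """Hypothetical object with the __parent__ and __name__ attributes of ILocation."""

    def __init__(self, name, parent=None):
        self.__name__ = name
        self.__parent__ = parent


def physical_path(obj):
    """Rebuild the original path by following __parent__ up to the root."""
    names = []
    while obj is not None:
        names.append(obj.__name__)
        obj = obj.__parent__
    return "/".join(reversed(names))


site = Located("plone")
news = Located("news", site)
article = Located("article", news)
events = Located("events", site)

# A plain attribute reference from another folder, as in folder.item = object.
events.related = article

print(physical_path(events.related))
```

<p>Even though the article is reached through the events folder, the path that is printed is the one of its original location under news.</p>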
<h3>Next steps</h3>
<p>Knowing what pieces of Plone depend on the portal catalog is already helpful if you are developing for Plone or trying to optimise it. The above list is definitely not an exhaustive list of catalog dependencies but covers the most common ones. Please tell us if we've missed something obvious. In no particular order, we will continue to implement the alternatives suggested above and will gladly incorporate suggestions or improvements.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2010-01-12T14:29:09Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/upfront-diet">
        <title>The Upfront Diet for Plone</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/upfront-diet</link>
        <description>It's time to put Plone on a diet that works. Lose weight now, ask me how!</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>It's time to put Plone on a diet that works. Lose weight now, ask me how!</p>
<p>Over the years I have tried very hard to unpack the truth about Plone's performance problems and many other developers have done so too. A few products have seen the light to address the speed of Plone, of which <a class="external-link" href="http://plone.org/products/cachefu">CacheFu</a>, <a href="http://pypi.python.org/pypi/experimental.catalogqueryplan">experimental.catalogqueryplan</a>, <a class="external-link" href="http://pypi.python.org/pypi/archetypes.schematuning">archetypes.schematuning</a> and <a class="external-link" href="http://svn.plone.org/svn/collective/SimpleReferenceField/">SimpleReferenceField</a> are the ones that I use often. Besides these products, the Plone core has seen many optimisations over the years but naturally, and unfortunately, some new performance bugs are introduced along the way. Generally though we have enough tools to save the day but I don't think that we have done enough to solve Plone's performance problems and with that the perception that Plone is slow.</p>
<p>There is something that complicates how this perception is formed, specifically the fact that Plone is not only a CMS, it is a development platform as well. If it wasn't, I think it wouldn't be any good in its primary role as a CMS. With tools like ArchgenXML and paster, developers can churn out new content types and Plone products faster than McDonald's slides hamburgers across the counter. Yes, they are equally unhealthy! These content types inherit too much fat from the existing Plone environment and add even more fat by design. A lot of the fat can be attributed to Plone's obsession with the catalog and its irrational fear of waking up objects. The truth is, you can't use Plone if you empty your portal catalog: folder contents won't work, navigation will break, portlets will be empty and there won't be any breadcrumbs to help you run from the Grimm brothers' evil old witch. We really don't need all content types to be indexed, to have metadata, workflow, history and complicated references. In short, you don't need <strong>all</strong> content types to have <strong>all</strong> the features. This is the biggest problem with developing for Plone. Unless you know your way around 10 million lines of code and 10 years of history really well, you will find it surprisingly hard to make simple content types with fewer features. In other development frameworks you typically start with nothing and add the features as you need them. For example, nothing is searchable by default; you add the index when required. I believe that the same is possible with Plone, and that such an approach will serve it much better in the long run. This brings me to upfront.diet.</p>
<p><strong>upfront.diet</strong> is an experimental product that will explore the ways that Plone can be turned into a development platform that lets you add features as needed, rather than taking them away. By default no content type should require being indexed or being part of a workflow. To make this work, we will rewrite the existing content listings to not use the catalog but rather list the contents of the folder through the traditional ObjectManager API, using objectIds or objectValues. To filter the content for a specific role or user, we will store allowedRolesAndUsers as a local keyword index on the folder itself. From this point on we will carefully consider any additional index requirements imposed by searching and sorting and rather than installing all these indexes by default, we will allow adding them through configuration.</p>
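<p>As a rough illustration of the local index, here is a sketch in plain Python. The Folder class and its method names are hypothetical and do not reflect the actual upfront.diet code; the point is only that the folder itself can map each role or user token to the ids of the children that token may view, so a listing needs no catalog query.</p>

```python
class Folder:
    """Hypothetical folder that keeps a local allowedRolesAndUsers index."""

    def __init__(self):
        self._objects = {}
        # Maps a role or user token to the ids of the children it may view.
        self._allowed = {}

    def add(self, object_id, obj, allowed_roles_and_users):
        self._objects[object_id] = obj
        for token in allowed_roles_and_users:
            self._allowed.setdefault(token, set()).add(object_id)

    def object_ids(self):
        # Unfiltered listing, in the spirit of ObjectManager.objectIds.
        return list(self._objects)

    def visible_ids(self, roles_and_users):
        visible = set()
        for token in roles_and_users:
            visible.update(self._allowed.get(token, ()))
        # Keep the order in which the children were added to the folder.
        return [oid for oid in self._objects if oid in visible]


folder = Folder()
folder.add("public-page", object(), ["Anonymous"])
folder.add("draft-page", object(), ["Manager", "user:roche"])
print(folder.visible_ids(["Anonymous"]))
print(folder.visible_ids(["Anonymous", "user:roche"]))
```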
<p>upfront.diet will patch existing content types to use SimpleReferenceField instead of the default Archetypes ReferenceField to remove the dependency on the reference catalog. In our experience the reference catalog is a major performance bottleneck on content types with as few as two or three references.</p>
<p>Both the portal_catalog and reference_catalog are often responsible for conflict errors in a Plone site so avoiding them for simpler operations that don't require them must be a step in the right direction. I hope that one can end up with an extremely lightweight portal_catalog that only contains indexes that are required for site-wide content searches. This should greatly reduce conflict errors and catalog bloat. Additionally, moving to an implementation where modifying content doesn't involve a single tool but modifies attributes in a local context instead is a much better fit when using the ZODB.</p>
<p>Let's hope this diet works and we can make a toast on Plone's health. Prost!</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-12-14T19:10:07Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/reconsidering-pair-programming">
        <title>Reconsidering Pair Programming</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/reconsidering-pair-programming</link>
        <description>We gave pair programming a fair chance but I'm afraid it didn't work out.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>We gave pair programming a fair chance but I'm afraid it didn't work out.</p>
<p>After a nine month experiment with pair programming, we decided that it is not a good fit for us. Here's why:</p>
<ol><li>It doesn't guarantee that the best design is chosen. It might be slightly better than what a single programmer would come up with but ideally the developers with the most experience in software design should take the lead when it comes to the design of the system.</li><li>Since a pair discuss their implementation amongst themselves they neglect to document their decisions and communicate them to other members in the team. It is still better than a single developer keeping it all to himself but it doesn't guarantee discussion about design and architecture in cases where it is really required.</li><li>Mistakes are very costly. If you bill your customer by the hour and a pair took the wrong avenue, your customer has to pay double for that day's mistake.</li><li>It generally requires too much coordination and synchronisation and really doesn't cater for individual differences. Some developers like to code in the early mornings, some like to burn the midnight oil and others like to keep on going when they are in the zone. So if one developer in a pair has a family and needs to knock off at five, he often breaks his partner's rhythm.</li></ol>
<p>To achieve the same level of quality and knowledge transfer that pair programming promises, while being more flexible at the same time, we practice the following:</p>
<ol><li>Standup meetings every morning.</li><li>Daily code review by a senior developer.</li><li>Developers have to document their design and implementation before they start coding.</li><li>Designs need to be reviewed by a senior developer before coding can start.</li></ol>
<p>Developers generally don't like documenting what lies ahead of them, but they have to fight against the urge to just jump in and start coding. I think it is essential for developers to learn to express the design of a system in a narrative, both verbally and in writing. This helps them to design better systems and increases the maintainability of their software. In short, I think good developers are good storytellers.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-12-04T18:16:16Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/progress-bar-for-plone">
        <title>Progress bar for Plone</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/progress-bar-for-plone</link>
        <description>Don't let long running jobs time out, just wire in collective.progressbar.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Don't let long running jobs time out, just wire in collective.progressbar.</p>
<p>If you have a big import job on the server and don't want to tell your customer to ignore the browser timeout and just reload later to see if it completed, just install collective.progressbar. It's basic but it doesn't time out :-)</p>
<p>This product provides a basic HTML and JavaScript progress bar for Plone
that is useful for long running server side processes like imports or
exports.</p>
<p>To use it, you simply have to fire two events. The first event
initialises the progress bar view:</p>
<pre>from zope.event import notify
from collective.progressbar.events import InitialiseProgressBar
from collective.progressbar.events import ProgressBar
# Create the bar for the current context and request, then announce it.
title = 'Importing file'
bar = ProgressBar(self.context, self.request, title)
notify(InitialiseProgressBar(bar))</pre>
<p>The ProgressBar class above can take an optional view parameter if you
want to customise the view that renders the progress bar further. You
only have to include the progressbar macro in your custom view.</p>
<p>To update progress, you simply have to fire the appropriate event:</p>
<pre>from zope.event import notify
from collective.progressbar.events import UpdateProgressEvent
from collective.progressbar.events import ProgressState
# Fire one update event per step of the job.
for index in range(101):
    progress = ProgressState(self.request, index)
    notify(UpdateProgressEvent(progress))
</pre>
<p>To see how the progress bar works, install the package and browse to the
demo view, e.g. <a class="external-link" href="http://localhost:8080/plone/@@collective.progressbar-demo">http://localhost:8080/plone/@@collective.progressbar-demo</a></p>
<p>It's available as an egg on PyPI (<a class="external-link" href="http://pypi.python.org/pypi/collective.progressbar/0.5">http://pypi.python.org/pypi/collective.progressbar/0.5</a>) or you can check it out from the collective (<a class="external-link" href="http://svn.plone.org/svn/collective/collective.progressbar/">http://svn.plone.org/svn/collective/collective.progressbar/</a>).</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-09-28T13:16:24Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/simple-references">
        <title>Simple References</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/simple-references</link>
        <description>Simple references can speed up your Plone application significantly.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Simple references can speed up your Plone application significantly.</p>
<p>While optimising the import of objects from a CSV file, I re-discovered how slow the Archetypes reference implementation is. In my experience, one rarely queries the reference catalog directly or interrogates the metadata of a relationship. I'm not saying these are not legitimate use cases, only that I rarely need to do this in the applications I write. Most of the time, I simply need to call the accessor of the reference field to get the referenced objects. Even Plone's related content reference field doesn't need to do more than get the referenced content.</p>
<p>Plone's overzealous use of catalogs is often the cause of performance problems, and the reference implementation is no exception. For most use cases, you don't need to do much more than store the UIDs of referenced objects on the source object itself. Years ago I developed SimpleReferenceField to address this problem, and dusted it off for this import job. It stores referenced UIDs on the object itself using annotation storage.</p>
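<p>The idea can be sketched in a few lines of plain Python. Nothing below is the actual SimpleReferenceField code: the Content class, the REFERENCES_KEY annotation key and the lookup dictionary (standing in for whatever resolves a UID to an object) are all assumptions made for illustration.</p>

```python
REFERENCES_KEY = "simple.references"  # hypothetical annotation key


class Content:
    """Hypothetical content object with a UID and an annotations mapping."""

    def __init__(self, uid):
        self.uid = uid
        self.annotations = {}  # stand-in for annotation storage


def set_references(source, targets):
    # Store only the UIDs of the targets on the source object itself.
    source.annotations[REFERENCES_KEY] = [target.uid for target in targets]


def get_references(source, lookup):
    # Resolve the stored UIDs, skipping targets that no longer exist.
    uids = source.annotations.get(REFERENCES_KEY, [])
    return [lookup[uid] for uid in uids if uid in lookup]


page = Content("uid-page")
image = Content("uid-image")
report = Content("uid-report")
lookup = {obj.uid: obj for obj in (page, image, report)}

set_references(page, [image, report])
print([obj.uid for obj in get_references(page, lookup)])
```

<p>Setting references touches only the source object, so no separate reference catalog has to be updated for each relationship.</p>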
<p> Using the original ReferenceField, it took almost <strong>30 minutes</strong> to import <strong>4000</strong> instances of a fairly basic Archetype with a single multivalued reference field. It took only <strong>5 minutes</strong> to import them using SimpleReferenceField. You can find it here: <a class="external-link" href="http://svn.plone.org/svn/collective/SimpleReferenceField/">http://svn.plone.org/svn/collective/SimpleReferenceField/</a></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-09-12T12:46:57Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/pair-programming">
        <title>Pair Programming</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/pair-programming</link>
        <description>How do you take the fun you have at Zope and Plone sprints and make it a way of life?</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>How do you take the fun you have at Zope and Plone sprints and make it a way of life?</p>
<p>I've only had the opportunity to sprint at Plone conferences and this was already a profound experience. I have been a believer in <a href="http://www.extremeprogramming.org/">extreme programming</a> long before I attended Plone conferences but never found a way to implement it in our own company. Mostly because it seemed really difficult to map the disciplines of extreme programming, especially <a href="http://www.extremeprogramming.org/rules/pair.html">pair programming</a>, to an environment where you have to manage multiple projects at once. We eventually found a way to do it, and given what we have learned in the last couple of months, I don't think we will ever go back to our previous way of working.</p>
<p>Before I explain what we did, let me explain how we used to work and what the challenges were. We are a team of eight developers and in the past we simply took all the projects that we were involved in and divided them up amongst the developers. So if we were working on 4 projects at a time, we would assign about two developers to each project, depending on the size of the project of course. For a given project, we further subdivided the tasks and assigned them to developers working on that project. In this scenario developers didn't work together on tasks, they simply tried to complete the task on their own in the shortest possible time. An important thing to take note of is the fact that all our developers are contracted to do development and are not employees earning a salary. This arrangement worked well for our team of developers who, I believe, like to think that they are independent knowledge workers and don't have to work for a boss. Simply put, I call this the model of the individual contractor.</p>
<p>Although we have always been very effective in getting the job done, I have never been extremely happy with the quality of code that we produced. Even in cases where experienced developers produced code to solve particular complex problems in a way that worked and satisfied the customer, this code was not of the same quality as what would have been produced if we were pair programming. In my definition of quality code, I rate simplicity and the code's ability to communicate to the next developer working on it, the highest. It is these qualities that make it easy and fun for developers to work together.</p>
<p>Besides problems with quality, I found that I didn't get enough feedback from individual contractors on how long it took them to complete certain tasks. As a result we didn't have the required information to quote more for complex tasks on future quotes.</p>
<p>Other problems were:</p>
<ul><li>You can't effectively rate the experience of developers while they are working on their own.</li><li>The risk of maintaining a product is very high when too few developers work on it.</li><li>Knowledge transfer is low.</li><li>Developers did not talk enough about design before coding.</li></ul>
<p>I could go into a lot more depth discussing problems with the individual contractor model but you probably know them already. I should emphasize that in my opinion, if you are not doing pair programming, you are following the individual contractor model even if your developers are full-time employees.</p>
<p>So five months ago we decided to change how we worked. Besides deciding to do pair programming we decided to put the whole team on a single project at a time. At this time we were still contracted to do development on multiple projects so we decided to simply schedule them in sequence instead of doing them in parallel. When we compared our schedule for projects where only two or three developers were working on them in parallel, to a schedule where we worked on these projects in sequence, we realised that we would still complete all the projects in time. For example, you might schedule 3 months for 2 developers on a thousand hour project, but you could wrap up that project in a month if you put a whole team of 8 developers on it. We took a big risk to change our way of working, but it absolutely paid off.</p>
<p>We have a team with both experienced and inexperienced developers and the pace at which novices learn while driving is remarkable. The most profound change that has come about is that developers now have to talk about the design and architecture of the code a lot more and they have to motivate it to each other before coding. In the past even experienced developers would often code to get the job done. The feedback loop on the complexity and duration of tasks is a lot shorter. I know almost immediately now when we have underquoted on a task. In the past I probably never found out that a developer took 2 days on a task we quoted 1 day for.</p>
<p>Most good developers will agree that coding is design and good design communicates. It is difficult to write code and design at the same time, or write code while thinking strategically about the code that you are writing. It is completely impossible to write good code without communicating. This is why pair programming works! And it works not only in the more relaxed environments offered by sprints, but also on projects with tight deadlines and demanding customers.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-04-02T19:36:09Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter">
        <title>Fat doesn't matter</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/fat-doesnt-matter</link>
        <description>Big transaction sizes caused by indexing don't really matter.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Big transaction sizes caused by indexing don't really matter.</p>
<p>In a previous post, <a href="catalog-indexes">Big Fat Catalog Indexes</a>, I was really concerned about the amount of data that is indexed and how this results in big transaction sizes. Recently I went back to that test and decided to pack the Data.fs. After packing, the 2 Gbyte Data.fs shrank to 95 MB. Duh! So although this test seemed like a real bummer for cataloging in the ZODB, it turned out to be a non-issue. The size of the transaction doesn't really affect indexing speed, as more benchmarks proved. The schema lookup in Archetypes is the major bottleneck when indexing documents in Plone. See the <a href="http://thread.gmane.org/gmane.comp.web.zope.plone.devel/20102">thread</a> on plone-dev for more detail if you missed it.</p>
<p>I'm always surprised when I do these benchmarks, and they often make me feel really stupid. And since I'm stupid I keep doing benchmarks, even though the conclusions in this month's stats totally contradict last month's. Maybe I'm not so stupid and we're all learning what the ZODB can do as we grow with it year after year. If this is not the case, just remember ignorance is bliss ;-) You gotta admit though, it is one awesome piece of technology! Thanks Jim.<br /></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-07-02T19:10:08Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes">
        <title>Big Fat Catalog Indexes</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/catalog-indexes</link>
        <description>Some benchmarks that show how abnormally big Zope's catalog indexes are in relation to the data that is indexed.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Some benchmarks that show how abnormally big Zope's catalog indexes are in relation to the data that is indexed.</p>
<p>I've been doing some indexing benchmarks on Plone and got some surprising stats on the pickle size of btrees and their buckets that are persisted with each transaction. Surprising in the sense that they are very big in relation to the actual data indexed.</p>
<p>In the benchmark I add and index 10000 ATDocuments. I commit after each document to simulate a transaction-per-request environment. Each document has a 100 byte long description and 100 bytes in its body. The total transaction size, however, is about 40K at the beginning. The transaction sizes grow linearly to about 350K when reaching 10000 documents.</p>
What concerns me is that the footprint of indexed data in terms of BTrees, Buckets and Sets is huge! The total amount of data committed that relates directly to ATDocument is around 30 Mbyte. The total for BTrees, Buckets and IISets is more than 2 Gbyte. Even taking into account that Plone has a lot of catalog indexes and metadata columns (I think 71 in total), this seems very high. I hope that this benchmark will alert developers to the negative side effects of stuffing more indexes in the catalog.<br />
This is a summary of total data committed per class:<br /><br />
<table>
	<colgroup><col width="373" /><col width="89" /><col width="127" /></colgroup>
<tbody>
<tr>
<td align="left"><b>Classname</b></td>
<td align="right"><b>Object Count</b></td>
<td align="right"><b>Total Size (Kbytes)</b></td>
</tr>
<tr>
<td align="left">BTrees._IIBTree.IISet</td>
<td align="right">640686</td>
<td align="right">1024506</td>
</tr>
<tr>
<td align="left">BTrees._IOBTree.IOBucket</td>
<td align="right">655025</td>
<td align="right">1007623</td>
</tr>
<tr>
<td align="left">BTrees._IIBTree.IIBucket</td>
<td align="right">252121</td>
<td align="right">163524</td>
</tr>
<tr>
<td align="left">BTrees._OIBTree.OIBucket</td>
<td align="right">132417</td>
<td align="right">101472</td>
</tr>
<tr>
<td align="left">BTrees._IOBTree.IOBTree</td>
<td align="right">25645</td>
<td align="right">71072</td>
</tr>
<tr>
<td align="left">BTrees._OOBTree.OOBucket</td>
<td align="right">115332</td>
<td align="right">70789</td>
</tr>
<tr>
<td align="left">BTrees._IIBTree.IIBTree</td>
<td align="right">143942</td>
<td align="right">53566</td>
</tr>
<tr>
<td align="left">BTrees._OOBTree.OOBTree</td>
<td align="right">15875</td>
<td align="right">52354</td>
</tr>
<tr>
<td align="left">BTrees._IIBTree.IITreeSet</td>
<td align="right">49383</td>
<td align="right">25975</td>
</tr>
<tr>
<td align="left">BTrees._OIBTree.OIBTree</td>
<td align="right">4613</td>
<td align="right">23008</td>
</tr>
<tr>
<td align="left">Products.ATContentTypes.content.document.ATDocument</td>
<td align="right">10000</td>
<td align="right">15077</td>
</tr>
<tr>
<td align="left">Persistence.mapping.PersistentMapping</td>
<td align="right">20000</td>
<td align="right">8261</td>
</tr>
<tr>
<td align="left">Products.Archetypes.BaseUnit.BaseUnit</td>
<td align="right">30000</td>
<td align="right">7504</td>
</tr>
<tr>
<td align="left">BTrees.Length.Length</td>
<td align="right">220107</td>
<td align="right">6382</td>
</tr>
<tr>
<td align="left">OFS.Folder.Folder</td>
<td align="right">10000</td>
<td align="right">537</td>
</tr>
<tr>
<td align="left">Products.PlonePAS.tools.memberdata.MemberData</td>
<td align="right">1</td>
<td align="right">0</td>
</tr>
</tbody>
</table>
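<p>As a rough cross-check of the table above, the per-class totals can be added up in a few lines of Python. This is only a sketch: the figures are copied from the table, and treating ATDocument, PersistentMapping and BaseUnit as the data that "relates directly to ATDocument" is an assumption about how the 30 Mbyte figure was derived. Length, Folder and MemberData are left out of both groups.</p>

```python
# Total size (Kbytes) per class, copied from the summary table above.
INDEX_KB = {
    "IISet": 1024506,
    "IOBucket": 1007623,
    "IIBucket": 163524,
    "OIBucket": 101472,
    "IOBTree": 71072,
    "OOBucket": 70789,
    "IIBTree": 53566,
    "OOBTree": 52354,
    "IITreeSet": 25975,
    "OIBTree": 23008,
}

# Assumption: these three classes make up the document data itself.
DOCUMENT_KB = {
    "ATDocument": 15077,
    "PersistentMapping": 8261,
    "BaseUnit": 7504,
}


def total_kb(sizes):
    """Sum the Kbyte totals of one group of classes."""
    return sum(sizes.values())


index_kb = total_kb(INDEX_KB)
document_kb = total_kb(DOCUMENT_KB)
ratio = index_kb / document_kb

print(f"index structures: {index_kb} Kbytes")
print(f"document data:    {document_kb} Kbytes")
print(f"ratio:            {ratio:.0f}x")
```

<p>Under these assumptions the index structures account for roughly 84 times as much committed data as the documents themselves.</p>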
<br />Here is a summary of transaction sizes for the first few transactions:<br />
<table>
	<colgroup><col width="81" /><col width="89" /><col width="127" /></colgroup>
<tbody>
<tr>
<td align="left"><b>Txn id</b></td>
<td align="right"><b>Object count</b></td>
<td align="right"><b>Txn size (bytes)</b></td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="right">179</td>
<td align="right">42119</td>
</tr>
<tr>
<td align="left">#00100</td>
<td align="right">175</td>
<td align="right">40021</td>
</tr>
<tr>
<td align="left">#00101</td>
<td align="right">167</td>
<td align="right">41746</td>
</tr>
<tr>
<td align="left">#00102</td>
<td align="right">171</td>
<td align="right">45480</td>
</tr>
<tr>
<td align="left">#00103</td>
<td align="right">171</td>
<td align="right">48411</td>
</tr>
<tr>
<td align="left">#00104</td>
<td align="right">173</td>
<td align="right">51524</td>
</tr>
<tr>
<td align="left">#00105</td>
<td align="right">171</td>
<td align="right">54265</td>
</tr>
<tr>
<td align="left">#00106</td>
<td align="right">175</td>
<td align="right">57744</td>
</tr>
<tr>
<td align="left">#00107</td>
<td align="right">175</td>
<td align="right">60380</td>
</tr>
<tr>
<td align="left">#00108</td>
<td align="right">180</td>
<td align="right">64854</td>
</tr>
<tr>
<td align="left">#00109</td>
<td align="right">172</td>
<td align="right">61819</td>
</tr>
<tr>
<td align="left">#00110</td>
<td align="right">176</td>
<td align="right">66281</td>
</tr>
<tr>
<td align="left">#00111</td>
<td align="right">173</td>
<td align="right">66906</td>
</tr>
<tr>
<td align="left">#00112</td>
<td align="right">176</td>
<td align="right">70307</td>
</tr>
<tr>
<td align="left">#00113</td>
<td align="right">174</td>
<td align="right">71629</td>
</tr>
<tr>
<td align="left">#00114</td>
<td align="right">184</td>
<td align="right">78853</td>
</tr>
<tr>
<td align="left">#00115</td>
<td align="right">181</td>
<td align="right">79756</td>
</tr>
<tr>
<td align="left">#00116</td>
<td align="right">188</td>
<td align="right">84928</td>
</tr>
</tbody>
</table>
<br />
<p>And the last few transactions:</p>
<table>
	<colgroup><col width="81" /><col width="89" /><col width="127" /></colgroup>
<tbody>
<tr>
<td align="left"><b>Txn id</b></td>
<td align="right"><b>Object count</b></td>
<td align="right"><b>Txn size (bytes)</b></td>
</tr>
<tr>
<td align="left">#10081</td>
<td align="right">234</td>
<td align="right">343926</td>
</tr>
<tr>
<td align="left">#10082</td>
<td align="right">226</td>
<td align="right">341061</td>
</tr>
<tr>
<td align="left">#10083</td>
<td align="right">245</td>
<td align="right">394237</td>
</tr>
<tr>
<td align="left">#10084</td>
<td align="right">237</td>
<td align="right">367932</td>
</tr>
<tr>
<td align="left">#10085</td>
<td align="right">228</td>
<td align="right">338461</td>
</tr>
<tr>
<td align="left">#10086</td>
<td align="right">184</td>
<td align="right">310049</td>
</tr>
<tr>
<td align="left">#10087</td>
<td align="right">189</td>
<td align="right">314684</td>
</tr>
<tr>
<td align="left">#10088</td>
<td align="right">246</td>
<td align="right">405305</td>
</tr>
<tr>
<td align="left">#10089</td>
<td align="right">215</td>
<td align="right">334854</td>
</tr>
<tr>
<td align="left">#10090</td>
<td align="right">221</td>
<td align="right">346977</td>
</tr>
<tr>
<td align="left">#10091</td>
<td align="right">195</td>
<td align="right">318492</td>
</tr>
<tr>
<td align="left">#10092</td>
<td align="right">224</td>
<td align="right">351770</td>
</tr>
<tr>
<td align="left">#10093</td>
<td align="right">221</td>
<td align="right">345032</td>
</tr>
<tr>
<td align="left">#10094</td>
<td align="right">206</td>
<td align="right">332271</td>
</tr>
<tr>
<td align="left">#10095</td>
<td align="right">241</td>
<td align="right">541394</td>
</tr>
<tr>
<td align="left">#10096</td>
<td align="right">191</td>
<td align="right">283578</td>
</tr>
<tr>
<td align="left">#10097</td>
<td align="right">236</td>
<td align="right">323354</td>
</tr>
<tr>
<td align="left">#10098</td>
<td align="right">242</td>
<td align="right">329099</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="right">226</td>
<td align="right">339302</td>
</tr>
</tbody>
</table>
<br />
<p>Transaction detail for txn #00099 (first document):</p>
<table>
	<colgroup><col width="81" /><col width="382" /><col width="116" /><col width="86" /></colgroup>
<tbody>
<tr>
<td align="left"><b>Txn id</b></td>
<td align="left"><b>Classname</b></td>
<td align="right"><b>Object count</b></td>
<td align="right"><b>Size (bytes)</b></td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._IIBTree.IIBTree</td>
<td align="right">3</td>
<td align="right">286</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">OFS.Folder.Folder</td>
<td align="right">1</td>
<td align="right">55</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._IOBTree.IOBucket</td>
<td align="right">9</td>
<td align="right">4572</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._OIBTree.OIBucket</td>
<td align="right">5</td>
<td align="right">2964</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._IOBTree.IOBTree</td>
<td align="right">39</td>
<td align="right">17552</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees.Length.Length</td>
<td align="right">27</td>
<td align="right">768</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">Persistence.mapping.PersistentMapping</td>
<td align="right">2</td>
<td align="right">846</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">Products.ATContentTypes.content.document.ATDocument</td>
<td align="right">1</td>
<td align="right">1544</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._OOBTree.OOBTree</td>
<td align="right">20</td>
<td align="right">3986</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._IIBTree.IISet</td>
<td align="right">3</td>
<td align="right">184</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._OIBTree.OIBTree</td>
<td align="right">9</td>
<td align="right">1404</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">Products.Archetypes.BaseUnit.BaseUnit</td>
<td align="right">3</td>
<td align="right">767</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._OOBTree.OOBucket</td>
<td align="right">2</td>
<td align="right">3286</td>
</tr>
<tr>
<td align="left">#00099</td>
<td align="left">BTrees._IIBTree.IITreeSet</td>
<td align="right">55</td>
<td align="right">3905</td>
</tr>
</tbody>
</table>
<br />
<p>Transaction detail for txn #10099 (last document):</p>
<table>
	<colgroup><col width="81" /><col width="382" /><col width="116" /><col width="86" /></colgroup>
<tbody>
<tr>
<td align="left"><b>Txn id</b></td>
<td align="left"><b>Classname</b></td>
<td align="right"><b>Object count</b></td>
<td align="right"><b>Size (bytes)</b></td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IIBTree.IIBTree</td>
<td align="right">8</td>
<td align="right">2517</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">OFS.Folder.Folder</td>
<td align="right">1</td>
<td align="right">55</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IOBTree.IOBucket</td>
<td align="right">57</td>
<td align="right">81564</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._OIBTree.OIBucket</td>
<td align="right">13</td>
<td align="right">9872</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IIBTree.IIBucket</td>
<td align="right">29</td>
<td align="right">20024</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IOBTree.IOBTree</td>
<td align="right">1</td>
<td align="right">85</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">Persistence.mapping.PersistentMapping</td>
<td align="right">2</td>
<td align="right">846</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees.Length.Length</td>
<td align="right">22</td>
<td align="right">655</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">Products.ATContentTypes.content.document.ATDocument</td>
<td align="right">1</td>
<td align="right">1544</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._OOBTree.OOBTree</td>
<td align="right">6</td>
<td align="right">30455</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IIBTree.IISet</td>
<td align="right">65</td>
<td align="right">182708</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">Products.Archetypes.BaseUnit.BaseUnit</td>
<td align="right">3</td>
<td align="right">767</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._OOBTree.OOBucket</td>
<td align="right">16</td>
<td align="right">8088</td>
</tr>
<tr>
<td align="left">#10099</td>
<td align="left">BTrees._IIBTree.IITreeSet</td>
<td align="right">2</td>
<td align="right">122</td>
</tr>
</tbody>
</table>
<br /> For a discussion on the above benchmarks, read the thread on ZODB-DEV at<br /><a href="http://mail.zope.org/pipermail/zodb-dev/2008-August/012055.html">http://mail.zope.org/pipermail/zodb-dev/2008-August/012055.html</a><br /><br />Since the discussion on this thread, I've tried out collective.solr, but since I don't really know it that well I haven't spent too much time with it. I started developing collective.alchemyindex (not checked in yet), which indexes data in an RDBMS using sqlalchemy. Doing the above benchmark using Postgres for indexing resulted in a Data.fs of around 368MB and a total Postgres database of only 135MB. This seems a lot more acceptable size-wise. I'll document this benchmark in a future post.<br />]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-09-24T20:23:32Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/project-management-checklist">
        <title>Project Management Checklist</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/project-management-checklist</link>
        <description>Checklist to keep your projects on track.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Checklist to keep your projects on track.</p>
<p>
We manage between 10 and 20 projects concurrently. On a weekly basis we try and ask the questions below on each project to make sure our projects stay on track. It is hard to manage software development. It is even harder to manage client expectations.<br /></p>
<p></p>
<ol><li>Is there anything in the project that can impact on the deadline that the client is not aware of?</li><li>Is there any additional information we need from the client to ensure we make the deadline?</li><li>Does the client know what progress is being made?</li><li>When last did we speak to the client?</li><li>Is the client expecting what we will deliver? Is there a mismatch in expectations?</li><li>Are we spending more time on this project than what we quoted?</li><li>Is time invoiced in line with deliverables completed?</li><li>Is payment for this project up to date?</li><li>Are developers aware of exactly what they must do next?</li><li>Do contracted developers know what they can invoice at the end of the month?</li><li>Are we happy and are we having fun?</li></ol>
<p></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-03-05T17:02:15Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited">
        <title>ZODB Benchmarks revisited</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks-revisited</link>
        <description>The truth is that the ZODB is faster than your RDBMS.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>The truth is that the ZODB is faster than your RDBMS.</p>
<p>My <a title="ZODB Benchmarks" href="zodb-benchmarks">previous post</a> about ZODB benchmarks <b>incorrectly</b> confirmed the general opinion people had of the ZODB, which is that it is not very fast when you want to insert millions of objects into it. Many developers will tell you that the ZODB is a low-write, high-read database. Many more developers will reduce this to a dismissive "the ZODB is slow".</p>
<p> Fortunately this is not true! I overlooked a basic thing when I compared ZODB performance with Postgres. I only realised that the Postgres table did not have an index on the key field when I started testing lookup speed. After adding the index I neglected to re-run the insertion test on Postgres. I realised this soon after I wrote up my findings and ran the test again. With the index in place, the insertion rate on Postgres drops logarithmically, and at a higher rate than the ZODB's.</p>
<p><img class="image-inline" src="zodbvspostgres.png" /><br /></p>
<p>For the most part of the test, insertion is faster in the ZODB than in Postgres. Wow, I didn't expect that! I don't know about you but this is a very comforting result for me.</p>
<p> There was a bug in my lookup test as well. After fixing this bug the times for lookups were looking a lot better:</p>
<table class="listing">
<thead>
<tr>
<th>Number of Objects</th>
<th>Average Lookup Time in Seconds</th>
</tr>
</thead>
<tbody>
<tr>
<td>100000</td>
<td>0.00000311<br /></td>
</tr>
<tr>
<td>1000000</td>
<td>0.00000648<br /></td>
</tr>
<tr>
<td>10000000</td>
<td>0.00230820<br /></td>
</tr>
</tbody>
</table>
<p>From the above table it is clear that a lookup on a BTree with 10 million objects is very fast at around 2 milliseconds. On a Postgres table with 10 million records the average lookup time was 14 milliseconds.<br /></p>
<p>It wasn't my original goal to compare the ZODB with Postgres - I simply used Postgres as a reference point. I do think that the comparison was necessary to "fix" the perception people have of the ZODB. I love the ZODB and was looking really hard for reasons to justify my extensive use of it, but benchmarks weren't available. Nevertheless, this is just a starting point. The tests are still very superficial, almost deliberately so, since that allows one to compare apples with apples. I think applications like Zope and Plone generally do a lot more with the ZODB than what an RDBMS allows. I mean, how many systems have you heard of that have per-object security and indexed metadata for the majority of the content? I think it would be worthwhile to start quantifying how many objects are modified in a single transaction by common actions in Zope and Plone. This should give us an appreciation of how hard the ZODB is working. Our work is not done.<br /></p>
<p><br /></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-03-04T19:37:50Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks">
        <title>ZODB Benchmarks</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/zodb-benchmarks</link>
        <description>There don't seem to be any ZODB benchmarks readily available. Compelled by a project that needs to scale to very high numbers I started compiling some.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>There don't seem to be any ZODB benchmarks readily available. Compelled by a project that needs to scale to very high numbers I started compiling some.</p>
<p><b>PLEASE NOTE THAT THE RESULTS IN THIS ARTICLE ARE INCORRECT. ACCURATE RESULTS ARE DOCUMENTED IN A FOLLOW-UP POST NAMED <a title="ZODB Benchmarks revisited" href="zodb-benchmarks-revisited">ZODB BENCHMARKS REVISITED</a>.</b></p>
<br />
<p>During the Plone conference in Naples this year I started working on <a href="http://svn.plone.org/svn/collective/collective.zodbbench/">collective.zodbbench</a> with the goal of collecting performance data for basic operations like inserts and lookups, so that anybody using the ZODB can make more informed decisions when developing ZODB applications or when choosing a database for an application. Another goal I had was to take this data to the ZODB-DEV mailing list, get explanations for areas where performance is poor, and even inspire some improvements where we really need them.</p>
<p>The first benchmark shows how fast you can insert 1 KB objects into an OOBTree. This is not a lot of data and is probably not representative of what we typically store in the ZODB, but it does exercise the BTree implementation. You will notice that performance deteriorates linearly from around 2000 inserts per second when we start to around 250 inserts per second when the BTree contains 10 million objects. One very important observation I made when doing this benchmark is how much impact the cache_size parameter has. When I ran the test with the default ZODB cache size the insertion rate deteriorated a lot more rapidly, and it was barely 50 inserts per second on a BTree with 10 million records.</p>
<p><img src="zodb_postgres.png" alt="ZODB vs Postgres" /></p>
<p>In an attempt to determine if the ZODB's drop in performance is normal I
created a test with Postgres purely to observe transaction rate and not
to compare it with the ZODB. Notice how the insert rate remains fairly consistent at around 7000 inserts per second. Roughly every million inserts there is a sharp drop in the insert rate. Laurence Rowe on ZODB-DEV explained this as follows:</p>
<blockquote>
<p>
It looks like ZODB performance in your test has the same O(log n) 
performance as PostgreSQL checkpoints (the periodic drops in your 
graph). This should come as no surprise. B-Trees have a theoretical 
Search/Insert/Delete time complexity equal to the height of the tree, 
which is (up to) log(n).

So why is PostgreSQL so much faster? It's using a Write-Ahead-Log for 
inserts. Instead of inserting into the (B-Tree based) data files at 
every transaction commit it writes a record to the WAL. This does not 
require traversal of the B-Tree and has O(1) time complexity. The 
penalty for this is that read operations become more complex, they must 
look first in the WAL and overlay those results with the main index. The 
WAL is never allowed to get too large, or its in memory index would 
become too big.</p>
</blockquote>
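<p>Laurence's point about tree height can be made concrete with a small sketch. The helper btree_height is hypothetical, and the fanout of 50 is an arbitrary figure chosen for illustration, not the bucket size that the ZODB's BTrees or PostgreSQL actually use.</p>

```python
def btree_height(n_keys, fanout):
    """Return the smallest height h such that fanout ** h >= n_keys."""
    height = 1
    capacity = fanout
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height


FANOUT = 50  # assumption for illustration only

for n_keys in (100_000, 1_000_000, 10_000_000):
    print(n_keys, "keys ->", btree_height(n_keys, FANOUT), "levels")
```

<p>Going from a hundred thousand to ten million keys adds only a couple of levels to the tree, which is the slow, logarithmic growth in per-insert work that the quote describes.</p>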
<p>
I was digging further, trying to find any bottlenecks whose removal might speed up the ZODB, so I did some profiling and noticed that there was a huge number of calls to the persistent_id method of ObjectWriter in serialize.py. There were 1.3 million calls to this method while only 20000 objects were being persisted during profiling.</p>
<pre>6108977 function calls (6108973 primitive calls) in 57.280 CPU seconds<br /><br />   Ordered by: cumulative time<br />   List reduced from 232 to 20 due to restriction &lt;20&gt;<br /><br />   ncalls  tottime  percall  cumtime  percall filename:lineno(function)<br />        1    0.000    0.000   57.280   57.280 profile_zodb.py:70(run)<br />        1    0.000    0.000   57.280   57.280 :1(?)<br />        1    0.260    0.260   57.280   57.280 profile_zodb.py:24(_btrees_insert)<br />        1    0.000    0.000   57.280   57.280 profile:0(run())<br />     1001    0.030    0.000   51.060    0.051 _manager.py:88(commit)<br />     1001    0.040    0.000   50.990    0.051 _transaction.py:365(commit)<br />     1001    0.110    0.000   50.730    0.051 _transaction.py:486(_commitResources)<br />     1001    0.020    0.000   48.060    0.048 Connection.py:496(commit)<br />     1001    0.220    0.000   48.040    0.048 Connection.py:512(_commit)<br />     9889    0.940    0.000   47.340    0.005 Connection.py:561(_store_objects)<br />    20372    0.480    0.000   39.790    0.002 serialize.py:381(serialize)<br />    20372    0.500    0.000   38.950    0.002 serialize.py:409(_dump)<br />    40750    7.790    0.000   38.020    0.001 :0(dump)<br />  1338046   17.560    0.000   30.230    0.000 serialize.py:184(persistent_id)<br />  2177223    9.150    0.000    9.150    0.000 :0(isinstance)<br />    20373    1.550    0.000    5.240    0.000 FileStorage.py:631(store)<br />     2964    0.050    0.000    4.980    0.002 Connection.py:749(setstate)<br />     2964    0.100    0.000    4.930    0.002 Connection.py:769(_setstate)<br />     2964    0.080    0.000    4.180    0.001 serialize.py:603(setGhostState)<br />     2964    0.030    0.000    4.100    0.001 serialize.py:593(getState)<br /></pre>
<p>Jim Fulton explained that this is because it's called for *all* objects, not just persistent objects. This includes ints, strings (including attribute names), etc. He then <a href="http://mail.zope.org/pipermail/zodb-dev/2007-November/011281.html">revealed</a> an undocumented feature that made an impressive difference to ZODB performance:<br /></p>
<pre>Note that there is a undocumented feature in cPickle that I added  <br />years ago to deal with this issue but never got around to pursuing.   <br />Maybe someone else would be able to spend the time to try it out and  <br />report back.<br /><br />If you set inst_persistent_id, rather than persistent_id, on a  <br />pickler, then the hook will only be called for instances.  This  <br />should eliminate that vast majority of the calls.</pre>
<p>At least for the first million inserts the improvement is significant:</p>
<p><img src="zodb_persistent_id.png" alt="ZODB persistent_id" /><br /></p>
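<p>The behaviour Jim describes, where the persistent_id hook fires for every object and not only for persistent ones, can be observed with a minimal sketch. It uses the standard library pickle module in Python 3 rather than the ZODB's ObjectWriter, so it only illustrates the mechanism. Record and CountingPickler are hypothetical names made up for this example, and the sketch does not use inst_persistent_id, which Jim describes as a cPickle feature.</p>

```python
import io
import pickle
from collections import Counter


class Record:
    """Hypothetical stand-in for a persistent object with an object id."""

    def __init__(self, oid):
        self.oid = oid


class CountingPickler(pickle.Pickler):
    """Pickler that counts how often persistent_id is called, per type."""

    def __init__(self, file):
        super().__init__(file, protocol=2)
        self.calls = Counter()

    def persistent_id(self, obj):
        # The pickler calls this hook for every object it is about to save.
        self.calls[type(obj).__name__] += 1
        if isinstance(obj, Record):
            return obj.oid  # store a reference instead of the object
        return None  # pickle everything else as usual


def count_hook_calls(payload):
    """Pickle payload into memory and return the per-type call counts."""
    pickler = CountingPickler(io.BytesIO())
    pickler.dump(payload)
    return pickler.calls


payload = {"title": "doc", "size": 100, "refs": [Record(1), Record(2)]}
calls = count_hook_calls(payload)
for type_name, count in sorted(calls.items()):
    print(type_name, count)
```

<p>Only the two Record instances yield a persistent id, yet the hook is also invoked for the dict, the list, the strings and the int, which is the kind of overhead visible in the profile above.</p>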
<p>The final benchmark tested average lookup speed on BTrees:</p>
<p>
<table class="listing">
<thead>
<tr>
<th>Number of Objects   <br /></th>
<th>Average Lookup Time in Seconds<br /></th>
</tr>
</thead>
<tbody>
<tr>
<td>100000</td>
<td>0.000311</td>
</tr>
<tr>
<td>1000000</td>
<td>0.000648</td>
</tr>
<tr>
<td>10000000</td>
<td>0.23082</td>
</tr>
</tbody>
</table>
<br /></p>
<p>
Notice that lookup speed drops to 230 milliseconds when there are about 10 million records in the database. <b><i>There was a bug in the code that caused this result to be incorrect. The true average lookup time is 0.0023082s (about 2.3 ms), which is very fast and completely acceptable.</i></b><br /></p>
<p>I tried many things during my benchmarks, and it took up to 3 months to compile them in between working on projects. It takes a very long time to run these tests and I often had them running for a whole weekend. Some of the things I tried that did not yield any significant performance gain were:</p>
<ul><li>Reducing the number of calls to fsync</li><li>Removing calls to fsync completely</li><li>Recompiling BTrees with increased bucket sizes to reduce the overhead of bucket splits<br /></li></ul>
<p>I find the statistics gathered from these benchmarks worrying and I hope that they inspire further investigation into the ZODB and encourage improvement. It is clear to me that ZODB performance (the BTree implementation specifically) for large datasets must be investigated. For an application like Plone an insertion rate of 250 objects per second results in terrible performance if you consider that storing a single document instance leads to close to 120 objects being inserted in a single transaction. And this explains why you can hardly insert more than 2 or 3 documents per second if you are uploading them into a Plone site. Separating data into separate BTrees or files will not solve the problem either, since catalog indexes can easily contain 10 million or more objects in large deployments. If the lookup speed on a large index is 200 milliseconds and 1 second is the maximum response time you can afford, realise that you can only do 5 lookups in that second. You see the problem? <i><b>As pointed out above, there is nothing wrong with the lookup speed. So the only real concern I have is that the insertion rate is dropping too rapidly.</b></i><br /></p>
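<p>The arithmetic behind the paragraph above can be spelled out in a short sketch. All figures are taken from this post; the function names are hypothetical, and the corrected lookup time is rounded to 2308 microseconds.</p>

```python
def documents_per_second(object_inserts_per_second, objects_per_document):
    """Documents stored per second when each one costs several object inserts."""
    return object_inserts_per_second / objects_per_document


def lookups_within(budget_us, lookup_us):
    """Whole lookups that fit in a time budget, both in microseconds."""
    return budget_us // lookup_us


docs = documents_per_second(250, 120)
slow_lookups = lookups_within(1_000_000, 200_000)  # 200 ms per lookup
fast_lookups = lookups_within(1_000_000, 2_308)    # corrected 0.0023082 s

print(f"documents per second: {docs:.2f}")
print(f"lookups per second at 200 ms: {slow_lookups}")
print(f"lookups per second at 2.3 ms: {fast_lookups}")
```

<p>The first two results match the "2 or 3 documents" and "5 lookups" quoted above, while the third shows why the corrected lookup time is not a concern.</p>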
<p>More conclusions in a later post.</p>
<p><b><b>PLEASE NOTE THAT THE RESULTS IN THIS ARTICLE ARE INCORRECT. ACCURATE RESULTS ARE DOCUMENTED IN A FOLLOW-UP POST NAMED <a title="ZODB Benchmarks revisited" href="../zodb-benchmarks-revisited">ZODB BENCHMARKS REVISITED</a>.</b></b></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2008-03-04T14:04:31Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/portlet-love">
        <title>Portlet love</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/portlet-love</link>
        <description>"Therefore, since brevity is the soul of wit,
And tediousness the limbs and outward flourishes, I will be brief" 
- Shakespeare, Hamlet, 1603</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>"Therefore, since brevity is the soul of wit,
And tediousness the limbs and outward flourishes, I will be brief" 
- Shakespeare, Hamlet, 1603</p>
<p>In a community where people donate their time and genius to develop software like Plone, the worst thing one can do is complain without offering help or advice, because that just doesn't help the evolution of the software at all! So apologies for <a href="meeting-plone-3">complaining</a> without suggesting what I think would be better. I would like to help the evolution of Plone in a constructive way so let me try again ;-)<br /></p>
<p>First, let me clarify my position somewhat. I don't think that going back to Plone 2.5 style portlets is the way to go. It is unwieldy for all the reasons mentioned by Martin and Geir in their posts. At the same time I don't think it should be that complicated to create a simple portlet in Plone 3.0. So both Martin and Geir suggested I use the "not recommended" classic portlet. I admit this is very easy but it doesn't solve my use cases from a development point of view. I want to register a portlet by name and use a Zope 3 view for any Python logic. So the classic portlet does not give me the same flexibility offered to other developers that use the new portlet engine, and the new dance is too complex.</p>
<p>The new portlets engine is not so complex that you can't get your head around it or get used to it. I still managed to rewrite my portlet in less than an hour. It is just not as simple as it should be. Ideally (for simple cases) one shouldn't need more than one zcml, a view class, an interface and a template. Even this still sounds like too much and I believe this is something that Grok is trying to solve. It shouldn't be necessary to define assignment, renderer, addview and editview if I don't need them. And it shouldn't be necessary to define something like a NullAddForm. In this sense the portlet engine introduces too much new terminology when you don't need it.<br /></p>
<p>Maybe it was unfair of me to shout at the portlets engine specifically - I'm sure there are other warts. What I would like to emphasize though is that we shouldn't use the power of Zope 3 to build a framework with unnecessary layers of indirection that alienate new developers even further. We should use the power of Zope 3 to make things simple and more powerful at the same time. And it should be fun and easy to learn.<br /></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2007-11-12T19:18:37Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/meeting-plone-3">
        <title>Portlet insanity in Plone 3</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/meeting-plone-3</link>
        <description>Something that used to be a single template is now an interface, at least 3 view classes, an unnecessarily complex zcml directive, a generic setup profile and more zcml to register the profile. Serenity now!</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Something that used to be a single template is now an interface, at least 3 view classes, an unnecessarily complex zcml directive, a generic setup profile and more zcml to register the profile. Serenity now!</p>
<p>So I just started migrating my first product to Plone 3. I get an AttributeError on right_slots so the first thing it seems I need to migrate are portlets. I've heard the portlet engine mentioned before so I'm not confused yet. I head over to the documentation section on plone.org and find a tutorial on overriding portlets in Plone 3.0. I couldn't find any docs on creating a new portlet. At least this tutorial points me to the plone.app.portlets package. For a brief moment I stare at the products in parts/plone in my buildout directory until I realize this is egg land. Looking at the portlets package I immediately realise that portlets are now Zope 3 views. I'm fairly familiar with those since I've been using them in Plone 2.5 projects for a while now.</p>
<p>I pick the news portlet as an example and open up news.py. I close it immediately after all the unfamiliar classes in there jump at me like a pack of wild dogs. Mmm, the login portlet seems like a more basic starting point. Not! These bloody dogs will follow you no matter what portlet view you open. My immediate reaction is that this is insanity. Something that used to be a single template is now an interface, at least 3 view classes, namely Assignment, Renderer and AddForm, and an unnecessarily complex zcml directive. Wait, I'm not done yet. You still need to define a GenericSetup profile and register the profile. This is insane. I wonder how long this feeling will last until I bear this burden like a chastising monk and even start to praise the glorious separation that is now required to make portlets. I must resist.</p>
<p>The most basic portlet should be no more difficult to write than a couple of lines of HTML and TAL. I really beg whoever developed this to consider an approach that favours convention over configuration. It should not be this difficult to write a portlet. This is not better. It might be more flexible, probably infinitely so, but it is not better software.</p>
<p>I do realize that you can do portlets pre-3-era style but this is not really recommended. The point of my blog post is that one should really balance pristine framework and pragmatism. We shouldn't make it more difficult for new developers to use our framework. Maybe this level of separation is not pristine ...<br /></p>
<p>The rest of the migration was pretty straightforward. I didn't have to modify any of my Archetypes content types. Not that I would have minded, provided that it was simpler than what I just encountered in the portlets code. Retaining Archetypes was a wise and pragmatic decision. Turning portlets into something that now requires an in-depth knowledge of Zope 3 was not!</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2007-11-11T11:55:46Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/plone-conference-2007-naples">
        <title>Inspired by another great Plone Conference in Naples</title>
        <link>http://www.upfrontsystems.co.za/Members/roche/where-im-calling-from/plone-conference-2007-naples</link>
        <description>This year, I was looking forward to travelling only 10 hours to the conference in Italy. Last year it took me a whole 20 hours to get to Seattle. And you know what, I'll travel 40 hours if it means I can experience the magic that a Plone conference delivers. What impressed me most about this year's conference was the level of self-reflection the community had about how we got to where we are and what the current state of Plone is.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>This year, I was looking forward to travelling only 10 hours to the conference in Italy. Last year it took me a whole 20 hours to get to Seattle. And you know what, I'll travel 40 hours if it means I can experience the magic that a Plone conference delivers. What impressed me most about this year's conference was the level of self-reflection the community had about how we got to where we are and what the current state of Plone is.</p>
<p>Where are we? Some say we have the best content management system on the planet. Whether this is true or not is not really relevant, since Plone cares more about what its users want and experience than the amount of market speak that rings true. The self-reflection I'm talking about was most obvious in Lennart Regebro's talk, <i><a href="http://regebro.wordpress.com/2007/10/14/plone-conference-2007-what-zope-did-wrong-the-slides/">What Zope did wrong</a></i>, Andy McKay's talk, <a href="http://www.agmweb.ca/blog/andy/2018/"><i>What Plone can learn from Rails</i></a> and the case study panel, <i>So you want to be a Plone consultant</i>.</p>
<p>Lennart's talk was enlightening on many levels. He presented a very clear picture of how Zope 2 was too tightly coupled and how Zope 3 tried to solve this. Zope 2 has often been called unpythonic and the fact that its components are so tightly coupled makes it very difficult to reuse Zope code outside of a Zope application. So although the rest of the Python community can learn from Zope, there is no easy way for them to actually use the code that is written in Zope land. I finally understood why I have been getting blank stares and no interest when I presented the stack of Zope technologies at our local Python user group - they simply didn't feel safe swallowing the whole pill when they were only interested in a few cool things like ZPublisher or Page Templates. After listening to Lennart's talk one certainly feels that we are moving in the right direction with the adoption of Zope 3. The separation that Zope 3 offers might make more of Zope accessible to the rest of the Python community, but that doesn't mean that they will actually use it though. This is not really the point I'm trying to make. The fact that we, the Zope and Plone community, do consider our flaws in an open and public forum and go to great trouble to correct them is what is unique about open source communities and how open source software evolves. Limi ended this session with a profound comment in which he emphasized that Plone is not the software but the community and that Plone might well run on Java in future (god forbid ;-) ) if the community decides that this is the best way forward.<br /></p>
<p>If you like revolution, you would have loved Andy's talk. I certainly did! He was shooting controversial ideas into the air backed by so much real life experience and reason that it was impossible not to consider them seriously. The notions that Plone should use a relational database, that any new term or technology (like KSS) is a barrier to entry and that TTW development should not be abandoned in favour of file system development, all sound radical out of context, but you can't ignore them in context. What is significant about Andy's talk is that there are people in the community that can go beyond the borders of our own technology and community and check out what other projects are doing. Alexander Limi often encourages people to attend a Drupal conference to see what they're up to, to start a dialogue and exchange ideas. Doing so shows great maturity in that we are secure enough with what we have achieved to the extent that we are open to learn from others and share with them what we have learned.</p>
<br />
<p>The case study panel for Plone consultants, led by Nate Aune of Jazkarta, had the principals of the most prominent Plone companies present and they all shared the stories of how they built their business on Plone and made a success of it. It was clear from this panel that nobody really had the perfect answer or strategy to running a Plone company but that we all adapt through our experience until we find a recipe that works. You might think that a group of companies that all do Plone are really competition and would be reluctant to share their secrets. The reality however is that these companies all realise that the more Plone companies there are, the bigger the market becomes. Every year at the conference, the keynote echoes "Everybody is busy", emphasising just how much Plone work there is. For this reason this panel is a significant milestone in Plone's growth and we should keep having it in years to come.<br /></p>
<br />
<p>I really made an effort this year to talk to as many core Plone developers as I could. They are often referred to as rock stars but they remain down to earth, and I want to thank all of them for their humility and eagerness to discuss Plone and share their knowledge.<br /></p>
<p>Thank you Plone!<br /></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>roche</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2007-10-31T04:52:25Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>





</rdf:RDF>

