<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         xmlns:syn="http://purl.org/rss/1.0/modules/syndication/"
         xmlns="http://purl.org/rss/1.0/">




    



<channel rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/RSS">
  <title>sysadman</title>
  <link>http://www.upfrontsystems.co.za</link>
  
  <description>
    
       The Sysadmin-man blog
       
  </description>
  
  
  
            <syn:updatePeriod>daily</syn:updatePeriod>
            <syn:updateFrequency>1</syn:updateFrequency>
            <syn:updateBase>2009-02-03T12:08:15Z</syn:updateBase>
        
  
  <image rdf:resource="http://www.upfrontsystems.co.za/logo.jpg"/>

  <items>
    <rdf:Seq>
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/cleaning-documents-polluted-by-copy-paste-from-msword"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/postgresqls-confusing-authentication-configuration"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-commit-a-transaction-even-when-sqlalchemy-thinks-the-session-is-clean"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-compile-python2.4-packages-for-newer-versions-of-ubuntu"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/using-xslt-to-shorten-some-links"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/why-i-hate-mysql"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/internet-security-2010"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/varnish-zope-and-backend-checking"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/choose-your-index-carefully"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/a-decorator-for-doing-things-in-a-subprocess"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/migrating-an-entire-linux-vserver-virtual-server-to-another-machine"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/import-considered-harmful"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/using-zope-schemas-with-a-complex-vocabulary-and-multi-select-fields"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/logsplit"/>
        
        
            <rdf:li rdf:resource="http://www.upfrontsystems.co.za/Members/izak/sysadman/spreadmirror"/>
        
    </rdf:Seq>
  </items>

</channel>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/cleaning-documents-polluted-by-copy-paste-from-msword">
        <title>Cleaning documents polluted by copy-paste from MSWord</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/cleaning-documents-polluted-by-copy-paste-from-msword</link>
        <description></description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>This problem is much less severe now that Plone uses TinyMCE in the newer versions, but we still run into problems with older documents created in Kupu on older versions of Plone.</p>
<p>Case in point, yesterday I dumped the content of such a document to a file and cleaned it up. This resulted in a reduction in file size of more than 90%.</p>
<pre class="literal-block">
-rw-r--r-- 1 izak izak 3.2M 2011-09-15 15:52 /tmp/before.html
-rw-r--r-- 1 izak izak 205K 2011-09-15 16:09 /tmp/after.html
</pre>
<p>One thing that TinyMCE definitely doesn't handle as well as Kupu is 3.2M documents, so we can no longer ignore the MSWord bloat. I wrote the following bit of code to make the cleanup easier. It uses lxml, which provides an ElementTree-compatible API.</p>
<pre class="literal-block">
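import sys
from lxml import etree
from lxml.etree import HTMLParser

# Parse the (possibly malformed) HTML file named on the command line.
parser = HTMLParser()
with open(sys.argv[1], 'rb') as fp:
    tree = etree.parse(fp, parser)

# Identity transform that drops comments, style and link elements.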

xslt = etree.XML(&quot;&quot;&quot;\
&lt;xsl:stylesheet xmlns:xsl=&quot;http://www.w3.org/1999/XSL/Transform&quot; version=&quot;1.0&quot;&gt;
    &lt;xsl:template match=&quot;comment()&quot; /&gt;
    &lt;xsl:template match=&quot;style&quot; /&gt;
    &lt;xsl:template match=&quot;link&quot; /&gt;
    &lt;xsl:template match=&quot;&#64;*|node()&quot;&gt;
        &lt;xsl:copy&gt;&lt;xsl:apply-templates select=&quot;&#64;*|node()&quot;/&gt;&lt;/xsl:copy&gt;
    &lt;/xsl:template&gt;
&lt;/xsl:stylesheet&gt;&quot;&quot;&quot;)
transform = etree.XSLT(xslt)

newtree = transform(tree)
print(str(newtree))
</pre>
<p>I hope this is useful to someone.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2011-09-16T09:13:36Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/postgresqls-confusing-authentication-configuration">
        <title>Postgresql's confusing authentication configuration</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/postgresqls-confusing-authentication-configuration</link>
        <description>Most distributions ship postgresql configured in "ident" mode. This is the first thing a MySQL user changes.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Most distributions ship postgresql configured in "ident" mode. This is the first thing a MySQL user changes.</p><p>This is a somewhat opinionated post on security practices used with projects that use relational databases.</p>
<p>My number one gripe of course is the use of &quot;trust&quot; in pg_hba.conf. If you use &quot;trust&quot;, your thinking needs readjustment. I think the traditional first step in these situations is to admit that you have a problem. Hi, my name is (fill in your name here) and I use &quot;trust&quot; in my postgresql authentication setup. I don't care about security, I just want it to work.</p>
<p>You don't want to use &quot;trust&quot;. It means that anyone is who they say they are. If my mail server has a remotely exploitable vulnerability, as did Exim last year, and someone manages to get shell access to that unprivileged user, they can connect to postgresql and demand to be called god. And it shall be granted.</p>
<p>If you don't want to be bothered with understanding how this works, then use a password mechanism such as md5. For example, on most Linux systems you want this in your pg_hba.conf:</p>
<pre class="literal-block">
local   all         postgres                          ident
local   all         all                               md5
host    all         all         127.0.0.1/32          md5
host    all         all         ::1/128               md5
</pre>
<p>You need to keep that one ident line so the cron job that does the vacuuming will continue to work. Everything else will require a password, so this will work just like MySQL.</p>
<p>I do not like this setup though. Why you may ask? Because I do not like embedding login details in configuration files. Login details are supposed to be there for security purposes. Storing those details in a file, a file that is most likely readable by the application server you expose to the big bad world, provides little security. It also means that you cannot check that configuration file into your source version control system as is without exposing potentially sensitive information to anyone who has access to it, and if you are anything like me, you've done this accidentally at least once. Storing a non-functional configuration file means that each time you check it out, you have to modify a configuration file before it will work. Not big problems I'll admit, but completely avoidable.</p>
<p>This is why I like &quot;ident&quot; authentication. In its default setup, it says simply that if the username of the logged in user, as seen from the operating system's point of view, is the same as the database user he claims to be, then access is granted. This works fairly well for development. Simply set the owner of your database to be the same as your unix user name, and no login details are necessary.</p>
<pre class="literal-block">
local   all         postgres                          ident
local   all         all                               ident
host    all         all         127.0.0.1/32          md5
host    all         all         ::1/128               md5
</pre>
<p>At this point I want to explain how the lines in pg_hba.conf work. The first word in the line specifies whether this line applies to local connections, through the local unix socket, or to network (TCP/IP) connections. The second column contains a database name, or &quot;all&quot; for all databases. The third column contains a user name. This is followed by an address and a netmask for &quot;host&quot; authentication, and finally by an authentication method, of which there are several. These lines are evaluated from top to bottom, and whichever one matches first is used for authentication. Order is therefore important: generic rules need to come after specific ones. In the above setup, all users connecting through the local unix socket need to connect with a username that matches their unix user name (ident). All users connecting through the network need a username and password (md5).</p>
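<p>The first-match rule can be sketched in a few lines of Python. This only illustrates the evaluation order and is not how postgresql implements it. The RULES list and the pick_method function are names made up for this example, and address matching for &quot;host&quot; lines is left out.</p>

```python
# Hypothetical, simplified model of pg_hba.conf evaluation: each rule is
# (connection type, database, user, method); "all" matches anything.
RULES = [
    ("local", "all", "postgres", "ident"),
    ("local", "all", "all", "ident"),
    ("host", "all", "all", "md5"),
]


def pick_method(conn_type, database, user, rules=RULES):
    """Return the method of the first matching rule, or None if none match."""
    for rule_type, rule_db, rule_user, method in rules:
        if rule_type != conn_type:
            continue
        if rule_db not in ("all", database):
            continue
        if rule_user not in ("all", user):
            continue
        # Rules are checked top to bottom and the first match wins.
        return method
    return None


print(pick_method("local", "myprojectdb", "john"))
print(pick_method("host", "myprojectdb", "john"))
```

<p>Because the loop returns on the first match, a generic &quot;all&quot; line placed above a specific one would shadow it.</p>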
<p>Using &quot;ident&quot; on its own will require the unix user and the postgresql user to match. But there is a way you can allow users to connect as a different user while still using &quot;ident&quot;. I'm told that this is what is most confusing to new users of postgresql.</p>
<p>You set this up in a file called pg_ident.conf. This file contains maps from unix users to postgresql users. It is a simple three-column file:</p>
<pre class="literal-block">
# MAPNAME      SYSTEM-USERNAME    PG-USERNAME
devel          john               zope
devel          jane               zope
devel          peter              zope
</pre>
<p>Combine this with a pg_hba.conf like this:</p>
<pre class="literal-block">
local   all         postgres                          ident
local   myprojectdb all                               ident map=devel
local   all         all                               ident
host    all         all         127.0.0.1/32          md5
host    all         all         ::1/128               md5
</pre>
<p>This will allow john, jane and peter to log into myprojectdb with the username &quot;zope&quot;.</p>
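<p>As a rough mental model, the map is a list of allowed pairings that is consulted when a rule names it. The sketch below is an assumption-laden illustration, not postgresql's implementation. IDENT_MAP and may_connect are names invented here, and &quot;mallory&quot; is a made-up user who is not in the map.</p>

```python
# Hypothetical model of the "devel" map: each entry is
# (map name, system user name, postgresql user name).
IDENT_MAP = {
    ("devel", "john", "zope"),
    ("devel", "jane", "zope"),
    ("devel", "peter", "zope"),
}


def may_connect(map_name, system_user, pg_user, ident_map=IDENT_MAP):
    """True if the map lets this unix user connect as this database user."""
    return (map_name, system_user, pg_user) in ident_map


print(may_connect("devel", "john", "zope"))
print(may_connect("devel", "mallory", "zope"))
```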
<p>However, complex setups that require setup in pg_ident.conf can also be avoided. If your application server runs as the &quot;zope&quot; user, then simply create a zope user in postgresql, and grant that user the required access in the relevant databases. End of story.</p>
<p>It also means the connection string in your configuration file is now</p>
<pre class="literal-block">
postgresql://&#64;:5432/myprojectdb
</pre>
<p>Instead of</p>
<pre class="literal-block">
postgresql://john:b00j4h&#64;:5432/myprojectdb
</pre>
<p>And that means you can check it straight into source control without exposing sensitive login details.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2011-08-19T21:17:28Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-commit-a-transaction-even-when-sqlalchemy-thinks-the-session-is-clean">
        <title>How to commit a transaction even when sqlalchemy thinks the session is clean</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-commit-a-transaction-even-when-sqlalchemy-thinks-the-session-is-clean</link>
        <description>This happens when you call session.execute() or session.connection().execute()</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>This happens when you call session.execute() or session.connection().execute()</p><p>For the most part I try not to bother with low-level SQL, even though I'm quite proficient in doing the occasional complex join operation. But occasionally you just want to delete a whole bunch of records quickly. In this specific instance, I just wanted to delete large numbers of users stored in a postgresql database and used in zope by means of pas.plugins.sqlalchemy.</p>
<p>I tried to do it the obvious way.</p>
<pre class="literal-block">
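from sqlalchemy import delete
from pas.plugins.sqlalchemy import model
from z3c.saconfig import named_scoped_session

# Look up the scoped session registered for the PAS plugin.
Session = named_scoped_session(&quot;pas.plugins.sqlalchemy&quot;)
session = Session()
# Issue a bulk DELETE against the users table, bypassing the ORM.
session.execute(delete(model.User.__table__))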
</pre>
<p>It didn't work. The records were all still there. So I turned statement logging on in the database to see what it was doing, and I noticed that at the end of the transaction, it always calls ROLLBACK. Google suggested that this sneaky low-level modification of the database does not mark the session as dirty, and therefore it is not committed. Eventually I dumbed down my search string and found this solution, which I will now (re)share with the world. If you are using zope.sqlalchemy, all you need to do is this:</p>
<pre class="literal-block">
from zope.sqlalchemy import mark_changed
mark_changed(session)
</pre>
<p>It's even documented on the project page on PyPI.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2011-08-19T13:02:10Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-compile-python2.4-packages-for-newer-versions-of-ubuntu">
        <title>How to compile python2.4 packages for newer versions of ubuntu</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/how-to-compile-python2.4-packages-for-newer-versions-of-ubuntu</link>
        <description>Since Jaunty Jackalope, Ubuntu linux no longer ships with python2.4. You can compile it from source, but if you are like me, you want a package.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Since Jaunty Jackalope, Ubuntu linux no longer ships with python2.4. You can compile it from source, but if you are like me, you want a package.</p><p>There are a few people out there who provide such packages, myself included, but often they do not provide packages for all architectures, of which there are now several combinations. Just calculate the cross-product of lucid, maverick, natty and debian squeeze and i386 and amd64 as a start. Sure, debootstrap is your friend and someone like me could build them all if I wanted to, but I tend to stop working once I've scratched my own itches.</p>
<p>Here then, by popular request from my co-workers, is how you build your own. This package is based off the python 2.4.6 package that was in Debian Lenny, and was only adapted to remove most of the documentation, because it didn't build properly and you can just install the python2.6 versions if you must, and to deal with the multi-arch feature that is now in Natty.</p>
<p>Start by pointing your browser at <a class="reference" href="http://public.upfronthosting.co.za/debian/sources/">http://public.upfronthosting.co.za/debian/sources/</a>. Download the .orig.tar.gz file, and the .dsc and .diff.gz files that correspond with your distribution. Place them in a convenient spot; personally I prefer $HOME/debian/python2.4.</p>
<p>You will need to install a few things to get started</p>
<pre class="literal-block">
apt-get install build-essential fakeroot
</pre>
<p>Then unpack your package and check the build dependencies</p>
<pre class="literal-block">
dpkg-source -x python2.4_2.4.6-1+natty1.dsc
cd python2.4-2.4.6
dpkg-checkbuilddeps
</pre>
<p>Install any listed dependencies, then compile your package</p>
<pre class="literal-block">
dpkg-buildpackage -b -uc -rfakeroot
</pre>
<p>The -uc option will skip any signing of the resulting binary packages. This is okay, since this is for local consumption only. If you insist on signing them, you'll have to set up gpg keys and update debian/changelog. The exact details are beyond the scope of this post, and frankly, beyond the time limits of my own memory as well.</p>
<p>When the building completes, you will have a number of debian packages in the parent directory, that is $HOME/debian/python2.4 if you followed my example. These can be installed with dpkg -i, or with gdebi. For the most part:</p>
<pre class="literal-block">
sudo dpkg -i python2.4-minimal_2.4.6-1+natty1_i386.deb python2.4_2.4.6-1+natty1_i386.deb python2.4-dev_2.4.6-1+natty1_i386.deb
</pre>
<p>The natty version build-depends on dpatch. This is simply because I was too lazy to convert the patch to work without it, and because an additional patch was required to deal with its multi-arch nature, or simply, the fact that zlib is not where it used to be previously.</p>
<p>Also note that even though a -doc package is produced, it's largely useless.</p>
<p>Also, for the impatient who just need a binary package and happen to run an i386 architecture, you can add one of these to /etc/apt/sources.list</p>
<pre class="literal-block">
deb http://public.upfronthosting.co.za/debian/lucid-i386 /
deb http://public.upfronthosting.co.za/debian/maveric-i386 /
deb http://public.upfronthosting.co.za/debian/natty-i386 /
</pre>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2011-08-16T09:16:34Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/using-xslt-to-shorten-some-links">
        <title>Using XSLT to shorten some links</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/using-xslt-to-shorten-some-links</link>
        <description>When plone uses resolveuid and relative linking, the link can often be much shorter</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>When plone uses resolveuid and relative linking, the link can often be much shorter</p><p>This is not a long post. It's just something we had to do this morning, something that was necessary because some software on the windows platform has trouble with very long urls.</p>
<p>We noticed that the majority of urls on the site end in resolveuid/someuid, but that they are very long because the folder structure is both deeply nested and has long and descriptive names. What we wanted to do is to shorten them simply to /resolveuid/someuid. But we didn't want to modify the content, because it would just break again the next time it's opened in TinyMCE.</p>
<p>The solution is simple, because we were already using xdv.</p>
<pre class="literal-block">
&lt;xsl:template match=&quot;&#64;href[starts-with(., 'resolveuid/') or contains(., '/resolveuid/')]&quot;&gt;
  &lt;xsl:attribute name=&quot;href&quot;&gt;/resolveuid/&lt;xsl:value-of select=&quot;substring-after(., 'resolveuid/')&quot; /&gt;&lt;/xsl:attribute&gt;
&lt;/xsl:template&gt;
</pre>
<p>This has to be placed in your rules file directly inside the &lt;rules&gt; tag. You cannot place it in a file that is included using xinclude, as I wanted to do. That does not work, because xincluded rules end up in nested &lt;rules&gt; tags, and the xsl stylesheet that generates the final stylesheet only looks for inline xsl in xdv:rules/xsl:*, that is, the topmost level.</p>
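<p>For readers who don't speak XSLT, here is roughly what the template does to each href, written as a small Python function. This is only a sketch for illustration. shorten_href is a made-up name, the sample paths are invented, and the real work is still done by the stylesheet.</p>

```python
def shorten_href(href):
    """Mimic the template: keep only what follows the first 'resolveuid/'."""
    if href.startswith("resolveuid/") or "/resolveuid/" in href:
        # partition splits on the first occurrence, like substring-after.
        _, _, uid = href.partition("resolveuid/")
        return "/resolveuid/" + uid
    # Links that do not go through resolveuid are left untouched.
    return href


print(shorten_href("a/deeply/nested/folder/resolveuid/abc123"))
print(shorten_href("http://example.com/some/page"))
```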
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2011-08-12T09:36:55Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/why-i-hate-mysql">
        <title>Why I hate MySQL</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/why-i-hate-mysql</link>
        <description>Certain things in MySQL really really get to me...</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Certain things in MySQL really really get to me...</p><div class="section">
<h3><a id="mysql-has-a-weird-non-standard-way-of-quoting-literals" name="mysql-has-a-weird-non-standard-way-of-quoting-literals">MySQL has a weird non-standard way of quoting identifiers</a></h3>
<p>For example, if you named a table &quot;order&quot;, you need to quote it because order is also a reserved word in SQL:</p>
<pre class="literal-block">
Postgresql: select * from &quot;order&quot;
MySQL: select * from `order`
</pre>
</div>
<div class="section">
<h3><a id="triggers-are-useless" name="triggers-are-useless">Triggers are useless</a></h3>
<p>Triggers are a controversial feature that many people prefer not to use, but they can be useful for validation.
But not if you run an older version of MySQL, and they may turn out to be useless even in the newer versions. I am told this issue has been fixed in the latest versions, but we often have to deploy on older versions so I include this little gripe. Although you can &quot;make things happen&quot; upon an insert, update or delete, you cannot reject the data by raising some kind of error. The only way to do this is to do something illegal in your trigger to effectively crash the transaction, and then the error message does not match the crime at all.</p>
</div>
<div class="section">
<h3><a id="procedural-support-is-limited" name="procedural-support-is-limited">Procedural support is limited</a></h3>
<p>It's hard to write stored procedures or triggers (and therefore useful aggregate functions) for MySQL, because the languages at your disposal are so limited. Postgresql does spoil you with a plethora of choices, but even just one slightly-more-powerful-than-SQL yet slightly-higher-level than C language would help a lot.</p>
</div>
<div class="section">
<h3><a id="it-has-no-sequences" name="it-has-no-sequences">It has no sequences</a></h3>
<p>Yes, it has auto_increment. Yes, you can implement your own makeshift sequences, but then you have to use locking to ensure mutual exclusion to make sure two clients do not get the same sequence value.</p>
</div>
<div class="section">
<h3><a id="default-values-for-a-column-must-be-a-constant" name="default-values-for-a-column-must-be-a-constant">Default values for a column must be a constant</a></h3>
<p>When I define a column in a table, I can specify a default for the column. In Postgresql this can be any function, and in the normal auto-increment use case you use a function to get the next available value from a sequence. In MySQL, there are two special cases: NOW and auto_increment. All other default values must be constants.</p>
</div>
<div class="section">
<h3><a id="group-by-allows-selection-of-columns-not-in-the-group-by-clause" name="group-by-allows-selection-of-columns-not-in-the-group-by-clause">GROUP BY allows selection of columns not in the GROUP BY clause</a></h3>
<p>The documentation calls this a feature. MySQL will pick a representative value from the group if you don't tell it how to group a selected column. This allows people to write queries that appear to work, and they might even rely on the selected representative value without realising that there is no explicit way to know how it was picked. If you explicitly use an aggregate function in your select list, you know exactly what to expect. Explicit is better than implicit. And besides, these queries are incompatible with other databases and hinder portability.</p>
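<p>A minimal sketch of the explicit form follows. It uses Python's sqlite3 module only because it ships with Python, and the orders table and its columns are made up for this example. Every selected column is either in the GROUP BY clause or wrapped in an aggregate, so nothing is left for the database to guess.</p>

```python
import sqlite3

# In-memory database with a hypothetical orders table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("john", 10), ("john", 30), ("jane", 20)],
)

# Every selected column is either grouped on or aggregated explicitly.
rows = conn.execute(
    "SELECT customer, MAX(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(rows)
conn.close()
```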
</div>
<div class="section">
<h3><a id="strings-sometimes-evaluate-to-zero" name="strings-sometimes-evaluate-to-zero">Strings sometimes evaluate to zero</a></h3>
<p>This is where things become zopeish. On a recent project using <a class="reference" href="http://zope-alchemist.googlecode.com/">ore.alchemist</a> I spent an hour trying to figure out why acquisition did not work and I ended up with a ghost object from the database instead of a template somewhere higher up. It turns out that when I asked for context/index_html, and due to an omission in my code that did not check for numeric keys, MySQL was asked to find a user with USERID='index_html'. Because USERID is an integer column, MySQL correctly raised a warning (which disappeared among many other lines in the log file), assumed that what I really meant is USERID=0, and returned that row instead. Obvious errors like this should not pass with just a warning but with a very loud and clear error.</p>
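<p>The omission on my side could have been plugged with a check along these lines before the key ever reaches the database. This is a sketch of the idea rather than the code from that project, and parse_user_id is a name made up for this example.</p>

```python
def parse_user_id(key):
    """Return key as an int if it is purely numeric, otherwise None."""
    text = str(key)
    # isdigit() rejects names such as "index_html", signs and empty strings;
    # isascii() keeps out digit-like characters that int() cannot parse.
    if text.isascii() and text.isdigit():
        return int(text)
    return None


print(parse_user_id("42"))
print(parse_user_id("index_html"))
```

<p>A traversal hook can then skip the database lookup when the result is None and let acquisition carry on.</p>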
</div>
<div class="section">
<h3><a id="it-is-not-as-fast-as-you-might-expect" name="it-is-not-as-fast-as-you-might-expect">It is not as fast as you might expect</a></h3>
<p>I don't know if this might be limited to older versions or to the open source versions, but during development we noticed that MySQL only uses one core on the CPU. It's faster, but only when it runs on a single-core single-cpu system using the MyISAM engine. On a recent project, I found that Postgresql can easily outperform MySQL (with InnoDB) on a more powerful machine.</p>
</div>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2010-11-09T10:54:29Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/internet-security-2010">
        <title>A Rogue Antivirus called Internet Security 2010</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/internet-security-2010</link>
        <description>I don't use Windows. This is a religious conviction. As a rule I don't fix other people's Windows PC's either, because once you've done that anything that goes wrong is your fault. But every once in a while a really good friend asks you to help and you take pity on him.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>I don't use Windows. This is a religious conviction. As a rule I don't fix other people's Windows PC's either, because once you've done that anything that goes wrong is your fault. But every once in a while a really good friend asks you to help and you take pity on him.</p><p>The PC boots up and immediately goes berserk, telling you there is spyware on the machine and you had better run a full scan pronto. Then this rather official looking program pops up and tells you you have about two dozen files infected by as many root kits, trojans and viruses. It will remove those at a cost of $50. Internet posts suggest you might be charged arbitrary amounts ranging in the hundreds for this little favour.</p>
<p>This is the story of a Rogue aka fake Antivirus program. It is incredibly sneaky. First things first, it does this to your registry:</p>
<pre>
    HKCU\Software\Microsoft\Windows\CurrentVersion\Policies\System\DisableTaskManager = 1
</pre>
<p>This makes it impossible to get into the task manager and kill anything. In addition, trying to start anything, be it taskmgr, regedit or a web browser, results in a popup telling you that the application is infected and cannot run. It completely killed off the two other antivirus programs on the machine, McAfee and AVG, and stopped Adobe Flash from updating itself. This is of course an attempt to stop you from using any kind of tool that might get you out of this situation, under the guise that doing anything at this point other than running a full scan (which will cost you a random amount) is too dangerous.</p>
<p>But the programmer made a small mistake that is helpful here. By the time you read this, it might no longer work. When the dialog box pops up, don't click OK. Leave it there. In this position, the rogue app is blocked and does not kill apps that you start. Now run regedit and fix your registry (just delete the above key), then hit ctrl+alt+del and start the task manager. Not that it is going to help much, but it feels sort of good.</p>
<p>It doesn't help because this irritating piece of buffalo dung has taken over what is supposedly a valid windows binary called wscntfy.exe, and it cannot be killed. This is a surprise to me, since I have often lamented the crazy KillProcess stuff that does not use signals that can be caught. Something must be supervising this and messing us around.</p>
<p>If you delete wscntfy.exe, it will be replaced. You cannot overwrite it, for windows does not allow files to be modified while they are open. You could boot into your favourite linux-on-cd rescue system and work away at it, but in my case this machine had sufficiently weird hardware that Linux could not find its root filesystem after switching to protected mode. Nice.</p>
<p>So I took the easy way out. Download <a href="http://download.bleepingcomputer.com/malwarebytes/mbam-setup.exe">Malwarebytes'</a> anti-malware program. Run it and let it clean up. Do not interact with any of the dialog boxes of the rogue program.</p>
<p>Why do I write this? Because I know this blog is syndicated in at least one place where a couple of readers might still run windows. Because it is interesting in the way it would be interesting to Doctor Gregory House. Because there needs to be more good search engine findable links to a solution. And because I wasted the better part of an afternoon on this.</p>
<p>Normal programming, the kind that never speaks a kind word about a certain OS from Redmond, will resume shortly.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2010-01-30T21:44:37Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/varnish-zope-and-backend-checking">
        <title>Varnish, Zope and Backend Checking</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/varnish-zope-and-backend-checking</link>
        <description>I sent this explanation about some trouble we had with varnish's backend probing to a client a while ago. The information is useful enough that it should be in a blog post.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>I sent this explanation about some trouble we had with varnish's backend probing to a client a while ago. The information is useful enough that it should be in a blog post.</p><p>Varnish has a feature called backend probing, which I still think is a very unfortunate name. You basically tell it what url to hit on the backend, and it will periodically hit that url and check the responsiveness of the backend.</p>
<p>The idea is to use backend probing on the zope backends, so that when you restart, you do it in such a way that you always have enough healthy backends left to carry the load while the other ones restart and warm their caches.</p>
<p>This is different to what squid does. Squid does an ICP probe on the backend, but this is where the problem starts. When zope starts, ICP becomes responsive long before that zope instance is ready to serve content, causing squid to pass requests to cold backends. It also seems that the newly started zope instance responds faster, and squid passes the request to whichever backend responds first to the ICP query.</p>
<p>So what you get is a website that is partially non-responsive while you have n-1 perfectly healthy backends. Varnish was going to solve all that.</p>
<p>But varnish has an odd way of checking the backend. It sends the request, then it half-closes the connection (it closes the writing side, read the man page for shutdown system call if you're interested) and waits for the response on the open read side. Zope, or more specifically python's asyncore module, cannot distinguish between a half-close and a full close, and it shuts down the connection without sending a response.</p>
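<p>To make the half-close concrete, here is a minimal Python sketch. It is not varnish or zope code: socket.socketpair() stands in for the TCP connection, and the names probe and backend are made up for illustration. The backend here is the well-behaved kind that treats end-of-file on its read side as "the request is complete" and still sends a reply.</p>

```python
import socket
import threading

def backend(conn):
    """A peer that tolerates a half-close: read until EOF, then still reply."""
    request = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:          # EOF: the client closed its writing side only
            break
        request += chunk
    conn.sendall(b"HTTP/1.1 200 OK\r\n\r\n")
    conn.close()

def probe(conn):
    """Send a request, half-close, then wait for the reply on the read side."""
    conn.sendall(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n")
    conn.shutdown(socket.SHUT_WR)   # half-close: no more writes from us
    reply = b""
    while True:
        chunk = conn.recv(4096)
        if not chunk:
            break
        reply += chunk
    conn.close()
    return reply

# socketpair() gives two connected sockets; one plays varnish, one the backend
client, server = socket.socketpair()
worker = threading.Thread(target=backend, args=(server,))
worker.start()
reply = probe(client)
worker.join()
print(reply.split(b"\r\n")[0].decode())  # prints "HTTP/1.1 200 OK"
```

<p>A server that cannot tell the half-close from a full close, as described above for asyncore, would drop the connection at the EOF and the probe would read an empty reply.</p>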
<p>Depending on how you interpret HTTP/1.1, zope actually does the right thing when it closes the connection. Under "8.1.4 Practical Considerations", RFC 2616 says:
<pre>
   When a client or server wishes to time-out it SHOULD issue a graceful
   close on the transport connection. Clients and servers SHOULD both
   constantly watch for the other side of the transport close, and
   respond to it as appropriate. If a client or server does not detect
   the other side's close promptly it could cause unnecessary resource
   drain on the network.
</pre>
</p>
<p>As I understand it, a "graceful close" means closing your end of the connection, i.e. a half-close. In other words, half-closing signals that you're timing out.</p>
<p>I then took the shutdown() call out of the code and backend probing started to work. I also posted the entire reasoning and the thread on zope-dev to varnish-dev and one of their developers indicated that the shutdown() call will be removed.</p>
<p>The thread on the varnish list can be found <a href="http://projects.linpro.no/pipermail/varnish-dev/2009-October/002287.html">here</a></p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-12-03T09:49:58Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/choose-your-index-carefully">
        <title>Choose your index carefully</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/choose-your-index-carefully</link>
        <description>When designing a relational database schema, just adding an index on every column that might be involved in a where-clause might not be enough. It might be downright wrong.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>When designing a relational database schema, just adding an index on every column that might be involved in a where-clause might not be enough. It might be downright wrong.</p><p>Suppose we have a simple table in a SQL database used to store measurements (for example temperature) taken at certain locations. We'd start out with a database table similar to this:
<pre>
  CREATE TABLE measurement (
    id bigserial NOT NULL,
    LogDate timestamp NOT NULL,
    Location varchar(255) NOT NULL,
    Value double precision NOT NULL,
    PRIMARY KEY(id)
  );
</pre>
</p>
<p>We expect that this table might eventually contain millions of rows, and that we might want to locate rows based on the date of the measurement or where it was logged. So we create two indexes on the relevant columns:
<pre>
  CREATE INDEX measurement_logdate on measurement(LogDate);
  CREATE INDEX measurement_location on measurement(Location);
</pre>
</p>
<p>The application turns out to be a little simpler than we expected: the only queries we use are a simple INSERT statement to append data to the measurement table, and SELECT queries similar to the following:
<pre>
  SELECT LogDate,Value FROM measurement WHERE
    Location='Gobabis'
    AND LogDate &gt;= '2009-10-01 00:00:00' AND LogDate &lt; '2009-10-02 00:00:00';
</pre>
</p>
<p>Initially it seems to work well and the system goes into production. But as the number of measurements approaches a hundred million it slows down noticeably.</p>
<p>Analysis reveals that when you run the above query, the database first consults measurement_logdate to find rows matching the time range we're interested in, then scans measurement_location to find rows matching the location, and finally it calculates the intersection between the aforementioned sets of rows, retrieves the rows and returns the result.</p>
<p>This is bad for at least two reasons. The set of rows matching the date criteria might be a very big set, and a large portion of the measurement_logdate index is scanned only to be ignored later because it involves locations other than the one we want.</p>
<p>My first attempt at improving this situation was to use the partial index feature in PostgreSQL. Creating a simple index per location quadrupled the speed:
<pre>
  CREATE INDEX partial_gobabis on measurement(Location) WHERE Location='Gobabis';
</pre>
</p>
<p>I wasn't particularly fond of this solution, because it involved a potentially large number of indexes, and I'm sure that deciding which index to use will introduce unnecessary overhead in the planner. And then it dawned on me that a multi-column index might just do the same job:
<pre>
  DROP INDEX partial_gobabis;
  CREATE INDEX measurement_location_logdate on measurement(Location, LogDate);
</pre>
</p>
<p>According to the PostgreSQL documentation, columns that the query compares for equality (here Location) should come first in the index, followed by columns used in range comparisons (here LogDate). This index also quadruples the original speed.</p>
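<p>If you want to try the compound index without a PostgreSQL server at hand, the following sketch uses Python's built-in sqlite3 module as a stand-in. That is an assumption on my part: SQLite's planner is not PostgreSQL's, the column types are SQLite's, and the sample rows are invented. It only illustrates the idea that an index on (Location, LogDate) lets the database narrow down on both conditions at once. BETWEEN is inclusive, so the end of the range is written as 23:59:59.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE measurement (
        id INTEGER PRIMARY KEY,
        LogDate TEXT NOT NULL,
        Location TEXT NOT NULL,
        Value REAL NOT NULL
    )
""")
# Equality column (Location) first, range column (LogDate) second
conn.execute(
    "CREATE INDEX measurement_location_logdate "
    "ON measurement(Location, LogDate)"
)
# Invented sample rows, for illustration only
rows = [
    ("2009-10-01 06:00:00", "Gobabis", 14.5),
    ("2009-10-01 12:00:00", "Gobabis", 31.0),
    ("2009-10-01 12:00:00", "Windhoek", 28.0),
    ("2009-10-02 06:00:00", "Gobabis", 15.5),
]
conn.executemany(
    "INSERT INTO measurement (LogDate, Location, Value) VALUES (?, ?, ?)", rows
)

query = (
    "SELECT LogDate, Value FROM measurement "
    "WHERE Location = ? AND LogDate BETWEEN ? AND ? "
    "ORDER BY LogDate"
)
params = ("Gobabis", "2009-10-01 00:00:00", "2009-10-01 23:59:59")

# Ask the planner how it intends to run the query; the last column of each
# row is the human-readable description of the step
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]
result = conn.execute(query, params).fetchall()
print(plan)
print(result)
```

<p>The plan should mention measurement_location_logdate with both columns as search terms. On PostgreSQL the equivalent check is to run EXPLAIN on the real query.</p>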
<p>When deciding what indexes your tables need, also consider that every index must be updated whenever new data is appended to the table, and this is generally an O(log n) operation. You therefore don't want to maintain any indexes that do not add value. In our application, I found that measurements are always queried in the context of a specific location, never by date alone or by location alone, so my original two indexes only wasted time when data was appended. I therefore dropped the original indexes and kept only the new compound index.</p>
<p>And if you found yourself wondering where Gobabis is, it is a small town in Namibia close to where I grew up.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-12-01T10:47:48Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/a-decorator-for-doing-things-in-a-subprocess">
        <title>A decorator for doing things in a subprocess</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/a-decorator-for-doing-things-in-a-subprocess</link>
        <description>If you need to fork or drop privileges often, this will help.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>If you need to fork or drop privileges often, this will help.</p><p>
This is a simple idea I came up with to make it easier to perform the often repeated task of forking a process and dropping privileges, as is often required of processes on a unix host. I came up with two simple decorators. This is not complete: the subprocess can return anything that can be pickled, but communication is not bi-directional. Perhaps it should be :-)
</p>

<p>
Note: The source code below displays badly because of a bug in deliverance, but it should cut-and-paste fine.
</p>

<pre>
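  import os
  import pickle
  import pwd
  import traceback

  def fork_and_exec(fn):
      """Run fn in a forked child and return its pickled result to the parent."""
      def _fork_and_run(*args, **kwargs):
          readend, writeend = os.pipe()
          # Binary mode, because pickle data is bytes
          readend = os.fdopen(readend, "rb")
          writeend = os.fdopen(writeend, "wb")
          pid = os.fork()
          if pid == 0:
              # Child: run the function and send the result up the pipe
              readend.close()
              try:
                  result = fn(*args, **kwargs)
                  pickle.dump(result, writeend)
                  writeend.flush()
              except Exception:
                  # Never let the child fall back into the parent's code
                  traceback.print_exc()
                  os._exit(1)
              os._exit(0)

          # Parent: read the result, then reap the child
          writeend.close()
          result = pickle.load(readend)
          readend.close()
          pid, status = os.waitpid(pid, 0)
          return result
      return _fork_and_run

  def drop_privileges(user, ignore=False):
      """Switch to the gid and uid of user before calling fn.

      With ignore=True, an unknown user or a failed switch is ignored.
      """
      def _wrap(fn):
          def _new(*args, **kwargs):
              try:
                  pw = pwd.getpwnam(user)
                  # Change the group first; once the uid is dropped we
                  # may no longer be allowed to change it.
                  os.setregid(pw.pw_gid, pw.pw_gid)
                  os.setreuid(pw.pw_uid, pw.pw_uid)
              except (KeyError, OSError):
                  if not ignore:
                      raise
              return fn(*args, **kwargs)
          return _new
      return _wrap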
</pre>

<p>
I'm sure it can be made simpler by using the subprocess module, but this old C coder is sticking with what he knows.
</p>

<p>Here is an example of how you'd use it:</p>

<pre>
    @fork_and_exec
    @drop_privileges("nobody")
    def getpid(name):
        return [os.getpid(), os.geteuid(), 'Bob', 'Alice', name]

    print("my pid is %d" % os.getpid())
    print("my uid is %d" % os.geteuid())
    print('====')
    print(getpid('Roger'))
</pre>]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-07-23T12:04:32Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/migrating-an-entire-linux-vserver-virtual-server-to-another-machine">
        <title>Migrating an entire linux-vserver virtual server to another machine</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/migrating-an-entire-linux-vserver-virtual-server-to-another-machine</link>
        <description>I always forget the exact rsync command used for this purpose. Here it is for me and the rest of the world too.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>I always forget the exact rsync command used for this purpose. Here it is for me and the rest of the world too.</p><p>The method I use is to rsync the whole machine to the new location while it is still running, then take it down, rsync it again, and bring it up in the new location.</p>
<p>Because we do not allow root logins, we also need to add sudo to the mix, and a few other options. The whole command is:
<pre>
  rsync -aH -v -e ssh --rsync-path 'sudo rsync' \
  --numeric-ids --partial --delete \
  /var/lib/vservers/vsname/ \
  myuser@remotehost:/var/lib/vservers/vsname/
</pre>
</p>
<p>The --rsync-path option is required to use sudo on the other end. --numeric-ids is important, otherwise rsync maps UIDs and GIDs by name using /etc/passwd and /etc/group, which may differ between the two hosts. -a puts it in archive mode, which preserves symlinks, permissions, ownership and timestamps, and -H also preserves hard links. --partial keeps partially transferred files so that an interrupted transfer does not have to start over. --delete tells rsync to delete files on the remote end that no longer exist locally.</p>
<p>You also need to set up the required files in /etc/vservers of course.</p>
]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-06-23T10:42:01Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/import-considered-harmful">
        <title>Import * considered harmful</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/import-considered-harmful</link>
        <description>This is old news, but it needs to be said again.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>This is old news, but it needs to be said again.</p><p>
Here is an example from a day in my coding life, yesterday to be exact, of how things go pear-shaped when you use <i>import *</i> without fully considering the future ramifications of your laziness. For this example we will use sqlalchemy and reportlab, which both implement a String class.
</p>

<pre>
"""
util.py
"""
from sqlalchemy import *
</pre>

<pre>
"""
strangeness.py
"""
from reportlab.graphics.shapes import String
from util import *
print(String.__module__)
</pre>

<p>
Now strangeness.py spits out <i>sqlalchemy.types</i> rather than the expected <i>reportlab.graphics.shapes</i>. Naturally it all makes perfect sense once you've inspected modules util, sqlalchemy and sqlalchemy.types and finally find the String class, but I suspect it is glaringly obvious that all this could have been avoided if you kept your name space clean to start with. Importing an entire module might give you more than you bargained for, especially if the module you're importing does some gratuitous imports of its own.
</p>
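<p>
If you don't have sqlalchemy and reportlab installed, the same effect can be reproduced with throw-away modules. The sketch below is only an illustration: the module names fake_shapes, fake_orm and fake_util are made up and built in memory, standing in for reportlab.graphics.shapes, sqlalchemy and util.py respectively.
</p>

```python
import sys
import types

# Two stand-in libraries that each define a class called String
shapes = types.ModuleType("fake_shapes")
exec("class String(object):\n    pass", shapes.__dict__)
sys.modules["fake_shapes"] = shapes

orm = types.ModuleType("fake_orm")
exec("class String(object):\n    pass", orm.__dict__)
sys.modules["fake_orm"] = orm

# fake_util re-exports everything from fake_orm, just like util.py above
util = types.ModuleType("fake_util")
exec("from fake_orm import *", util.__dict__)
sys.modules["fake_util"] = util

# This namespace plays the role of strangeness.py
namespace = {}
exec("from fake_shapes import String", namespace)
before = namespace["String"].__module__
exec("from fake_util import *", namespace)
after = namespace["String"].__module__
print(before, after)  # prints "fake_shapes fake_orm"
```

<p>
The second import silently replaces the first String. Importing only the names you need from util would have left it alone.
</p>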

<p>
This is perhaps a good time to rant a little about the code examples used in documentation. An example is usually neither long nor complex, and does not use that many classes to illustrate the point. Using a gratuitous <i>import *</i> in an example should, in my opinion, be avoided, because it is often copied into the final code, where it leads to weird errors.
</p>

<p>
On the other hand I must point out that there are good uses for the <i>import *</i> pattern. This is typically used in plone products to import configuration variables:
</p>

<pre>
from config import *
</pre>

<p>
The point I want to make is that you should use it sparingly.
</p>]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-04-02T10:46:08Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/using-zope-schemas-with-a-complex-vocabulary-and-multi-select-fields">
        <title>Using Zope schemas with a complex vocabulary and multi-select fields</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/using-zope-schemas-with-a-complex-vocabulary-and-multi-select-fields</link>
        <description>Using multi-select fields beyond the obvious simple examples is not well documented. This is my attempt to explain a way to do this.</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>Using multi-select fields beyond the obvious simple examples is not well documented. This is my attempt to explain a way to do this.</p><p>
For background I would like to explain a little about the setup of the project we needed this for. This product uses a SQL database to store information. This information is then made available to zope by SQLAlchemy and Alchemist. This example depends heavily on the above products, but the basic idea as well as the example ObjectVocabulary should be useful to others.
</p>

<p>
We use SQLAlchemy's relational abilities to handle relations with other tables, for example, if table A has a field b_id that points to table B, SQLAlchemy creates an object A (corresponding with table A) with an attribute b that is a list of the referenced "objects" in table B.
</p>

<p>
We had one case where this relation was a many-to-many relation. We wanted to implement this as a selection of zero or more objects from a vocabulary of possible choices. This is where the multi-select comes in.
</p>

<p>
For simplicity I will not show the sqlalchemy definitions here and focus only on what we had to do to make zope work with the alchemist schemas. The SQLAlchemy documentation is very complete and explains this in detail.
</p>

<p>
Because SQLAlchemy is an object relational mapper and it deals in objects only (if you use it properly), we needed a vocabulary that holds objects, rather than the usual key-value pairs as implemented by SimpleVocabulary. This is to avoid unnecessary glue to map a simple key back to an object. The first thing we did was implement our own Vocabulary class:
</p>

<pre>
from zope.interface import implements
from zope.schema.interfaces import IVocabulary, IVocabularyTokenized
from zope.schema.vocabulary import SimpleTerm

class ObjectVocabulary(object):
    """
    Vocabulary implementation for alchemy content types.
    Class is constructed as follows:

    vocab = ObjectVocabulary(listOfObjects, "id", "description")
    """
    implements(IVocabulary, IVocabularyTokenized)

    def __init__(self, objects, primaryField, displayField):
        self.primaryField = primaryField
        self.displayField = displayField
        self.objects = objects

    def __iter__(self):
        return iter([SimpleTerm(value=ob,
                token=getattr(ob, self.primaryField),
                title=getattr(ob, self.displayField)) for ob in self.objects])

    def __len__(self):
        return len(self.objects)

    def __contains__(self, value):
        return value in self.objects

    def getQuery(self):
        return None

    def getTerm(self, ob):
        if ob not in self.objects:
            raise LookupError(ob)
        return SimpleTerm(value=ob,
                               token=getattr(ob, self.primaryField),
                               title=getattr(ob, self.displayField))

    def getTermByToken(self, token):
        for ob in self.objects:
            if str(getattr(ob, self.primaryField)) == str(token):
                return SimpleTerm(value=ob,
                    token=getattr(ob, self.primaryField),
                    title=getattr(ob, self.displayField))
        raise LookupError(token)
</pre>

<p>
This vocabulary encapsulates a list of objects and creates SimpleTerms for the original object, token and title values. The tokens and titles are of course used to render a form. When the user selects something, the token is used once again to get the original Term, and the value attribute of the Term gives you the object.
</p>
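<p>
The round trip from object to token and back can be tried without zope installed. The following is a stripped-down sketch, not the class above: Term is a plain namedtuple standing in for SimpleTerm, Item stands in for a mapped SQLAlchemy class, and the name MiniObjectVocabulary and the sample items are made up.
</p>

```python
from collections import namedtuple

# Stand-ins for zope.schema's SimpleTerm and for a mapped SQLAlchemy class
Term = namedtuple("Term", ["value", "token", "title"])
Item = namedtuple("Item", ["id", "title"])

class MiniObjectVocabulary(object):
    def __init__(self, objects, primaryField, displayField):
        self.objects = objects
        self.primaryField = primaryField
        self.displayField = displayField

    def _term(self, ob):
        return Term(value=ob,
                    token=getattr(ob, self.primaryField),
                    title=getattr(ob, self.displayField))

    def __iter__(self):
        return iter([self._term(ob) for ob in self.objects])

    def getTermByToken(self, token):
        # Tokens come back from the browser as strings, so compare as strings
        for ob in self.objects:
            if str(getattr(ob, self.primaryField)) == str(token):
                return self._term(ob)
        raise LookupError(token)

items = [Item(id=1, title="PDF"), Item(id=2, title="ODT")]
vocab = MiniObjectVocabulary(items, "id", "title")

# Rendering: one (token, title) pair per checkbox
choices = [(term.token, term.title) for term in vocab]
# Submission: the form posts token "2"; the term's value is the object itself
selected = vocab.getTermByToken("2").value
print(choices)
print(selected)
```

<p>
The point is the last step: the form only ever sees tokens and titles, yet what you get back is the object, with no extra lookup code in the view.
</p>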

<p>
Next we needed a multi-select widget. Zope already includes MultiCheckBoxWidget, but this cannot be used as is, because the form machinery expects to instantiate widgets with two parameters (field and request) while MultiCheckBoxWidget also expects a vocabulary.
</p>

<pre>
from zope.app.form.browser import MultiCheckBoxWidget as MultiCheckBoxWidget_

class MultiCheckBoxWidget(MultiCheckBoxWidget_):
    def __init__(self, field, request):
        MultiCheckBoxWidget_.__init__(self, field, field.value_type.vocabulary, request)
</pre>

<p>
The exact reasons for this implementation should become clear in a moment.
</p>

<p>
In our interfaces.py, where we define the annotation used by alchemist, we define the properties of our multi-select field as follows:
</p>

<pre>
from zope.schema.vocabulary import SimpleVocabulary
from zope.schema import Choice, Set
from ore.alchemist.annotation import TableAnnotation
from ore.alchemist.sa2zs import transmute
from schema import TableATable

Annotation = TableAnnotation(
                                "TableA",
                                properties = {
                                    'single': Choice(
                                        title=u'Single',
                                        vocabulary=SimpleVocabulary.fromItems([])),
                                    'multi': Set(title=u'Multi',
                                        value_type=Choice(
                                          vocabulary=SimpleVocabulary.fromItems([])))
                                })

ITableATable = transmute(TableATable,
                           Annotation,
                           __module__="Products.example.content.tablea.interfaces" )
</pre>

<p>
While I am not certain how you would do this in pure zope, the trick lies in the Set and Choice field-types. We're saying that 'single' is one of the items (objects in our example) provided by a vocabulary, and 'multi' is a set of choices from another vocabulary. For the moment we use blank vocabularies. You are of course welcome to insert your vocabulary at this point, but we chose not to do this as this sometimes gets us into trouble with circular imports.
</p>

<p>
Finally, right before we render our form, we modify the vocabulary and tell it what widget to use for multi:
</p>

<pre>
    fi = self.form_fields.get('single')
    fi.field.vocabulary = ObjectVocabulary(objectListA, "id", "title")

    fi = self.form_fields.get('multi')
    fi.field.value_type.vocabulary = ObjectVocabulary(objectListB, "id", "title")
    self.form_fields['multi'].custom_widget=MultiCheckBoxWidget
</pre>

<p>
From examples on the internet it seems this is also possible in zcml:
</p>

<pre>
  &lt;addform
    name="add_foo"
    label="Add Foo"
    schema=".interfaces.IFoo"
    fields="filetypes"
    content_factory=".foo.Foo"
    template="foo.pt"
    permission="zope.Public"
  &gt;
  &lt;widget
    field="filetypes"
    class=".mywidgets.MultiCheckBoxWidget"
  /&gt;
  &lt;/addform&gt;
</pre>

<p>
I certainly hope this helps someone to write an even better howto on the subject.
</p>]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-04-02T10:46:34Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/logsplit">
        <title>Logsplit</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/logsplit</link>
        <description>A Log File Splitter</description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>A Log File Splitter</p><p>
This is a small regex-based logfile splitter born out of a client requirement that each hosted domain has its own <a href="http://awstats.sourceforge.net/">awstats</a> report page.
</p>

<p>
Initially we used awstats' ability to read logs from a pipe, thereby allowing awstats to extract the relevant lines from a common log file using grep. This works fine if you have a small number of domains, but one of our clients is now hosting over 200 domains on a zope/plone with a squid accelerator setup. This means that we now grep through a fairly large log file 200 times whenever we update stats.
</p>

<p>
Logsplit is a python program. It reads logs from stdin and splits them into separate log files using regexes from a configuration file. It can be used directly with later versions of squid; earlier versions do not include support for logging to a pipe. Apache has had support for logging to a pipe for a long time though.
</p>
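<p>
The core idea fits in a few lines. The sketch below is not the logsplit source: the rule list, the function name split_log and the sample log lines are made up, and the real program reads its regexes from a configuration file whose format is not shown here. It does show why this beats running grep once per domain: every line is read exactly once.
</p>

```python
import io
import re

# Hypothetical rules: (regex, output name)
RULES = [
    (re.compile(r"example\.org"), "example.org.log"),
    (re.compile(r"example\.com"), "example.com.log"),
]

def split_log(lines, rules, open_output):
    """Append each line to the output of the first rule that matches it."""
    outputs = {}
    for line in lines:
        for pattern, name in rules:
            if pattern.search(line):
                if name not in outputs:
                    outputs[name] = open_output(name)
                outputs[name].write(line)
                break
    return outputs

# Invented sample lines; a real access log has more fields
sample = [
    "GET http://www.example.org/ 200\n",
    "GET http://www.example.com/news 200\n",
    "GET http://www.example.org/about 404\n",
]
# StringIO stands in for real files; on a server you would pass
# sys.stdin and a function that opens files in append mode.
outputs = split_log(sample, RULES, lambda name: io.StringIO())
for name in sorted(outputs):
    print(name, outputs[name].getvalue().count("\n"))
```

<p>
Lines that match no rule are simply dropped in this sketch. Whether that is what you want depends on your setup.
</p>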

<p>
It is available from svn <a href="https://svn.upfronthosting.co.za/svn/izak/logsplit/trunk/">here</a>.
</p>]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-04-02T10:46:59Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>


    <item rdf:about="http://www.upfrontsystems.co.za/Members/izak/sysadman/spreadmirror">
        <title>SpreadMirror</title>
        <link>http://www.upfrontsystems.co.za/Members/izak/sysadman/spreadmirror</link>
        <description>This is a simple tool I implemented to synchronise files between a cluster of machines. </description>
        <content:encoded xmlns:content="http://purl.org/rss/1.0/modules/content/"><![CDATA[<p>This is a simple tool I implemented to synchronise files between a cluster of machines. </p><p>
The specific problem we had to solve involved a Zope product called <a href="http://ingeniweb.sourceforge.net/Products/FileSystemStorage/">FileSystemStorage</a> (FSS). We have a small cluster of two machines running against a single ZEO server that lives on one of the two machines, with heartbeat and drbd to guarantee availability.
</p>

<p>
When we put together this architecture we (the sysadmins) didn't know that FSS was going to be used for some of the content, so we found ourselves in the rather unfortunate position where not all the content was available from both nodes in our cluster.
</p>

<p>
There was also a further possible future complication in the form of a third node that might be added to the cluster.
</p>

<p>
So we needed a simple tool for synchronising files between nodes, with no specific master node: whoever operated on the file at the time would be the master and everyone would have to follow. Normally you'd require a mutual exclusion mechanism to do this safely, but the way FSS handles this made that unnecessary.
</p>

<p>
I called the result spreadmirror. It uses <a href="http://www.spread.org/">spread</a> as a messaging bus between the nodes, and a simple protocol where file rename, creation and deletion events are simply replicated to all nodes. A zope product called fssspread and a simple patch for FileSystemStorage are also provided. Unfortunately there have been at least three changes to the utils.py file in FSS recently, making it a little hard to provide a definitive version. You may have to patch it by hand.
</p>

<p>
When a file is modified, the patched FSS will generate an event. The fssspread product subscribes to these events and relays the event to the spreadmirror daemon running on the host. Spreadmirror then relays the change to all other nodes.
</p>

<p>
The synchronisation protocol is extremely simple. File deletion and renaming are handled by performing the same operation on all the nodes. File creation is handled by sending the entire file over spread to all the nodes. The idea is that spread's multicast features can be used to optimise this. Because spread imposes a maximum message size, an additional internal "append" event is used between nodes, so that a file bigger than 16k ends up as one create event followed by several append events.
</p>
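<p>
The create-then-append scheme can be sketched as follows. This is an illustration only and not the spreadmirror wire format: the event tuples, the function names to_events and apply_events, the example path and the exact 16k payload size are assumptions, and no spread messages are actually sent.
</p>

```python
CHUNK = 16 * 1024  # assumed payload limit, after the 16k mentioned above

def to_events(path, data, chunk=CHUNK):
    """Turn one file into a create event plus zero or more append events."""
    events = [("create", path, data[:chunk])]
    for start in range(chunk, len(data), chunk):
        events.append(("append", path, data[start:start + chunk]))
    return events

def apply_events(events, files):
    """Replay events on a receiving node; files maps path to contents."""
    for kind, path, payload in events:
        if kind == "create":
            files[path] = payload
        elif kind == "append":
            files[path] += payload
    return files

data = b"x" * (40 * 1024)          # a 40k file needs three messages
events = to_events("/srv/fss/blob", data)
replica = apply_events(events, {})
print([kind for kind, _, _ in events])  # ['create', 'append', 'append']
```

<p>
Replaying the events in order on another node rebuilds the file byte for byte, which is all the receiving side has to do.
</p>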

<p>
We've been using it for a few months now without problems.
</p>

<p>
Spreadmirror has been debianised, which makes it easy to install on any debian or ubuntu machine.
</p>

<p>
It is available from svn <a href="https://svn.upfronthosting.co.za/svn/izak/spreadmirror/trunk/">here</a>.
</p>]]></content:encoded>
        <dc:publisher>No publisher</dc:publisher>
        <dc:creator>izak</dc:creator>
        <dc:rights></dc:rights>
        <dc:date>2009-04-02T10:47:21Z</dc:date>
        <dc:type>Blog Entry</dc:type>
    </item>





</rdf:RDF>

