
Variable pricing for open source

Matt Asay makes some good points about a company called Trium, which provides its customers with the option of variable pricing. Customers can pay anywhere from 50% of the quoted price to over 100% of the quoted price. He suggests that open source could follow a similar model: get customers to pay for the value of the software to them, rather than paying for licenses. I think this is the right kind of approach, though I'm not sure how you'd administer it, or how you'd present it to a customer.

This reminds me of my last job, where we were using MySQL for some big applications. We hadn't paid a penny for it, and were doing our own support. A colleague and I suggested we buy a license anyway (this was in the days before MySQL Network, around version 3.23), as we were getting so much value from the product. It would be a gesture of good will. Our manager agreed and we bought a license, which we just stuck up on a wall. If you provide something great, people will pay for it. Yes, people are generally quite tight with cash; but they will pay for something if they feel they are getting value for money (which is why the Pound Shop in Northfield, where I live, is always packed on Saturdays).

Ruby Tuesday: libxml, closures

I've been doing more work on my S3 code this week, fixing a few bugs, adding features. Not ready for a new release just yet, as the functionality is a bit brittle and incomplete, but getting there.

While on this hackathon, I decided to use libxml properly for the first time (following Coda Hale's recommendation). I'm using it to parse responses from S3. It's a damn sight faster than REXML, to the point of making a visible difference when running the test suite. Anyway, I thought it would be useful to give a pointer on how to parse a string with it, as this isn't obvious from the documentation, and I had to work it out from the libxml-ruby test suite. Here it is:

require 'xml/libxml'

def get_xml_doc(xml_str)
  parser = XML::Parser.new
  parser.string = xml_str
  # next line returns an XML::Document instance
  parser.parse
end

I also discovered that on Linux, the libxml-ruby gem doesn't appear to work: to get anything happening, I had to build and install it from source using extconf.rb.

For the first time, I also discovered a real use for closures, and understood why symbols are important. I had a piece of code like this:

# metadata
@name = doc.find('//ListBucketResult/Name').to_a.first.content
@delimiter = doc.find('//ListBucketResult/Delimiter').to_a.first.content
@prefix = doc.find('//ListBucketResult/Prefix').to_a.first.content
@marker = doc.find('//ListBucketResult/Marker').to_a.first.content
@max_keys = doc.find('//ListBucketResult/MaxKeys').to_a.first.content
@is_truncated = doc.find('//ListBucketResult/IsTruncated').to_a.first.content

Now, this is very repetitive and wasteful. There are other issues too: if an element does not exist, doc.find(...).to_a.first comes back nil and calling content on it blows up with a horrible error; and the paths could be refactored so '//ListBucketResult' is added to each automatically. So potentially the code needs to be refactored into a function. But, by thinking of a closure as a way of refactoring repeated, similar function calls, I realised I could collapse this to:

# prop: a property to set
# path: path to find the XML element in the response
prop_setter = lambda do |prop, path|
  node = doc.find("//ListBucketResult/#{path}").to_a.first
  self.send("#{prop}=", node.content) if node
end

# metadata
prop_setter.call(:name, 'Name')
prop_setter.call(:delimiter, 'Delimiter')
prop_setter.call(:prefix, 'Prefix')
prop_setter.call(:marker, 'Marker')
prop_setter.call(:max_keys, 'MaxKeys')
prop_setter.call(:is_truncated, 'IsTruncated')

Much neater. I've got my function (generated by the call to lambda), but it's local to this block of code and doesn't need to be declared as a method in the class definition. Also note that it's the name of the attribute (:name, :prefix etc.) that gets passed to the prop_setter function, not its value: this means that within the function, I can set the instance variable by referring to its name. If I'd done:

prop_setter.call(@name, 'Name')

This would have passed the value of the name instance variable to the function, not a reference to the variable itself. It's quite an obscure distinction, but I was really pleased to happen across a real use case for symbols, which helped me understand them. (Although, thinking about it, because I'm using send, I could have just used a string for the attribute name, I suppose...)
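
To make that distinction a bit more concrete, here's a tiny standalone example of the same trick (the Bucket class is made up purely for illustration; send is happy with either a symbol or a string naming the setter):

class Bucket
  attr_accessor :name, :prefix
end

b = Bucket.new

# passing the *name* of the attribute lets the receiving code decide
# which setter to call via send
b.send(:name=, 'my-bucket')
b.send('prefix=', 'photos/')

puts b.name    # => my-bucket
puts b.prefix  # => photos/

# passing b.name instead would just hand over its current value
# ('my-bucket'), with no way to get back to the attribute itself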

S3 "golden buckets"

A neat solution to the problem of how to back a service with S3 while getting someone else to pay for the storage. The basic idea is for a supplier to sponsor a premium ("golden") bucket on S3, for which the consumer pays Amazon; the consumer pays slightly more than for a standard bucket, Amazon takes a cut, and the supplier takes a cut. I think it could work.

Thoughts on rsync and S3

rsync is a great utility. I bloody love it, and use it to do all my backups: over the network via SSH onto remote filesystems, and locally onto USB drives. I also use it to restore my home directory when I switch machine. Great.

However, the network part of my backup solution is expensive. I use Strongspace at the moment, which works really well, but it costs too much: $8 a month for 4GB of storage, which means I have to keep a close eye on my files to make sure nothing massive slips in and maxes out my account. There are alternatives, of course, and the GDrive might solve all of my problems; but for now I'm stuck without enough network storage.

S3 gives me the right price, but there is no mature client which does the job for me (JungleDisk looked promising, but it seems a bit buggy on my Linux machine - it sometimes just hangs - and Nautilus doesn't properly mount WebDAV shares onto the filesystem, so I can't rsync to it). So I've been working on an S3 library for Ruby to let me sync a local filesystem to S3 automatically, only transferring the files which have changed.

I've been reading up on rsync a bit to work out what it does, and it's pretty clever: here's a technical report which explains the inner workings. It works block by block, using checksums of the target file's blocks plus a so-called "rolling checksum" over the source file to decide whether each chunk of the source already exists somewhere in the target or needs to be sent. In this way, only changed blocks are copied from the source file to the target file (plus some checksums and block indexes). This is what makes rsync so fast.
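
To make the "rolling" part concrete, here's a toy Ruby sketch in the spirit of rsync's weak checksum (an illustration, not the real implementation): when the window slides by one byte, the checksum is updated from the outgoing and incoming bytes instead of being recomputed from scratch.

MOD = 1 << 16

# weak checksum over a window of byte values
def weak_checksum(bytes)
  a = 0
  b = 0
  len = bytes.length
  bytes.each_with_index do |x, i|
    a = (a + x) % MOD
    b = (b + (len - i) * x) % MOD
  end
  [a, b]
end

# update the checksum when the window slides one byte along:
# drop the outgoing byte, add the incoming one
def roll(a, b, outgoing, incoming, block_size)
  a = (a - outgoing + incoming) % MOD
  b = (b - block_size * outgoing + a) % MOD
  [a, b]
end

data = "the quick brown fox jumps over the lazy dog".unpack('C*')
block_size = 16

a, b = weak_checksum(data[0, block_size])
a, b = roll(a, b, data[0], data[block_size], block_size)

# the rolled checksum matches a full recomputation over the shifted window
puts [a, b].inspect
puts weak_checksum(data[1, block_size]).inspect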

I'm not sure whether you could do the same thing with S3. Each resource (accessible via a key in a bucket) on S3 has an MD5 checksum associated with it, so you could get a checksum for individual blocks or file fragments; but I'm not sure I'd want to split a file over multiple S3 keys just to do block-level copying. A bucket can contain as many keys as you like (even though you are limited to 100 buckets), so it would be possible; but to retrieve a file you would need to recompose it from the fragments on S3, which is a bit crap.
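
For what it's worth, the block-splitting idea might look something like this (the block size and key naming are invented for illustration; each block would be stored under its own key, e.g. photo.jpg/0, photo.jpg/1, and a GET would have to stitch them back together):

require 'digest/md5'

# arbitrary block size for the sketch
BLOCK_SIZE = 512 * 1024

# read a local file in fixed-size blocks and compute an MD5 per block;
# only blocks whose MD5 differs from the stored copy would be re-uploaded
def block_checksums(path)
  checksums = []
  File.open(path, 'rb') do |f|
    while (block = f.read(BLOCK_SIZE))
      checksums << Digest::MD5.hexdigest(block)
    end
  end
  checksums
end

# example usage:
#   block_checksums('photo.jpg').each_with_index do |md5, i|
#     puts "photo.jpg/#{i} => #{md5}"
#   end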

The obvious approach would be to associate one key with each object you put onto S3. This has the advantage of making each file addressable by URL, rather than requiring a reconstruction of the file from S3 fragments. Then, when you sync your filesystem to S3, you compare the MD5 checksum of the local file to the MD5 checksum of the remote file. This could be pretty painful, though, and take forever: one call to S3 to get the target checksum, a checksum on the local file, then transfer of the whole file up to S3 if it has changed. It could perhaps be streamlined: instead of requesting the object with a GET, you could use a HEAD request, which just gets the metadata; and the client could send an If-None-Match header with the request, passing the local MD5 checksum as its value. If that checksum matches the one on S3, S3 returns a 304 response code (Not Modified), so we know the object hasn't changed and we don't need to parse the MD5 checksum out of the response. This could save a few cycles.
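
As a rough sketch of that conditional check (untested, ignoring request signing, and assuming the object's ETag is a plain MD5 of its content; the bucket and key names are made up):

require 'net/http'
require 'digest/md5'

# HEAD the object with If-None-Match set to the local file's MD5.
# A 304 means the copy on S3 already matches; anything else means
# the file should be uploaded. (Reads the whole file into memory to
# compute the MD5, which is fine for a sketch.)
def needs_upload?(bucket, key, local_path)
  local_md5 = Digest::MD5.hexdigest(File.read(local_path))
  response = Net::HTTP.start("#{bucket}.s3.amazonaws.com", 80) do |http|
    http.head("/#{key}", 'If-None-Match' => "\"#{local_md5}\"")
  end
  !response.is_a?(Net::HTTPNotModified)
end

# example usage:
#   needs_upload?('my-bucket', 'photos/cat.jpg', 'photos/cat.jpg')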

An alternative might be to put a piece of "file modified" metadata onto S3 (not the same as the date on the S3 resource, but a copy of the local file's modification time when it was transferred). Then the algorithm just compares the local file's modification time to the S3 metadata when deciding whether a file needs to be transferred. A file stat to get the modification time is likely to be far faster than computing an MD5 hash.
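
Something like this, perhaps, using S3's x-amz-meta-* user metadata headers (request signing is ignored again, and the header name x-amz-meta-mtime is my own invention):

require 'net/http'

# On upload, the PUT would include a header like
#   'x-amz-meta-mtime' => File.mtime(local_path).to_i.to_s
# which S3 hands back on every GET/HEAD of the object. Deciding whether
# to transfer is then just a HEAD plus a local stat, no MD5 needed.
def changed_since_upload?(bucket, key, local_path)
  response = Net::HTTP.start("#{bucket}.s3.amazonaws.com", 80) do |http|
    http.head("/#{key}")
  end
  # a missing header comes back as nil, so to_i gives 0 and the file
  # is treated as changed
  remote_mtime = response['x-amz-meta-mtime'].to_i
  File.mtime(local_path).to_i > remote_mtime
end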

Yet another approach would be to keep track locally (e.g. in a database) of file paths and modification times from the last time they were sent to S3; I think this is what JungleDisk does. Anything that has been added, removed or changed gets transferred without needing to reference S3 at all, so no expensive network operations. However, this only works if S3 is being sync'd from a single location; it breaks down if you are trying to sync from multiple machines. Is a local database of file modification times enough, or should local files always be compared to S3 to determine whether they should be transferred?

One more idea: perhaps you could combine the database of local file modification times with a cache of MD5 checksums for those local files. You could then:

  1. Find any new files and transfer them to S3, while their checksum is generated and stored in the local database in the background
  2. Find any files which have changed, generate new checksums, cache them, and then compare those generated checksums to S3 checksums: the file gets transferred if its checksum differs from the one on S3
  3. Determine which local files have disappeared and optionally remove them from S3; then delete their cached MD5 checksum from the database
  4. Any files which haven't changed can have their checksum compared to the S3 resource

This has the advantage of some local caching, but isn't dependent on it; so you could use this approach to sync multiple machines to a single S3 bucket (unlike the database-only method). And the local cache could be optional, so you could use this approach from any machine, even if it was unable to do the caching (though with the ubiquity of file-based databases like SQLite it's unlikely you'll be working on a system with no database).
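
Here's a very rough sketch of the bookkeeping for that combined approach, using SQLite for the local cache. The table layout and directory name are invented, and upload_to_s3, remote_md5 and delete_from_s3 are placeholders standing in for real S3 calls (step 4, re-checking unchanged files against S3, is left out):

require 'sqlite3'
require 'digest/md5'
require 'find'

# placeholders standing in for real S3 operations
def upload_to_s3(path); puts "would upload #{path}"; end
def delete_from_s3(path); puts "would delete #{path}"; end
def remote_md5(path); nil; end

db = SQLite3::Database.new('sync_cache.db')
db.execute('CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime INTEGER, md5 TEXT)')

# current state of the local tree
local = {}
Find.find('backup_root') { |p| local[p] = File.mtime(p).to_i if File.file?(p) }

# state recorded at the last sync
cached = {}
db.execute('SELECT path, mtime, md5 FROM files') { |path, mtime, md5| cached[path] = [mtime, md5] }

local.each do |path, mtime|
  md5 = Digest::MD5.hexdigest(File.read(path))
  if !cached[path]
    upload_to_s3(path)                      # 1. new file
  elsif cached[path][0] != mtime && md5 != remote_md5(path)
    upload_to_s3(path)                      # 2. changed locally, and differs from S3
  end
  db.execute('INSERT OR REPLACE INTO files VALUES (?, ?, ?)', [path, mtime, md5])
end

(cached.keys - local.keys).each do |path|
  delete_from_s3(path)                      # 3. deleted locally
  db.execute('DELETE FROM files WHERE path = ?', [path])
end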

You could use this to do two-way synchronisation too, potentially. So if a file hasn't changed locally, you could get the client to compare to the S3 version, and fetch that to replace the local file if they are different. Though I think I will be concentrating on doing it in one direction first.

Ruby Tuesday: Rails West Midlands

I run a couple of Rails training courses at OpenAdvantage in the West Midlands, and thought it would be good if "alumni" had a way to stay in touch. While there are several good Rails mailing lists, I thought it might be nice to have one which is:

  1. West Midlands, UK-focused (much of the UK Rails activity is London-focused)
  2. Intended for new Rails programmers (e.g. people who've been on OpenAdvantage courses)

Jono set up a similar group for PHP in the West Midlands which has taken off very successfully, and I wanted to follow his approach: low key, informal, a place to discuss Rails and Rails-related topics. I didn't want to put up another Rails howto website or blog, rather provide somewhere where Rails enthusiasts in the West Midlands could meet each other.

So, I bought http://railswm.org/ and set up a mailing list there. We've had a few posts so far, but it would be good to get some more people involved. It's not restricted to the West Midlands, of course, but the focus will be on that locality. Feel free to join if you're interested.

FlickrStorm

I don't know, I work my fingers to the bone writing FlickrLilli, and FlickrStorm gets the TechCrunching. Maybe if I'd got round to doing the "favourites" area (where you can bookmark images you like) I'd be the toast of the month. Ho hum.

And once again...

Updated AxleGrease again, to Rails 1.1.6.

AxleGrease 0.6.1

Had to release a new version of AxleGrease to keep up with the ever-so-severe security warning released by the Rails team yesterday. Upgraded to Rails 1.1.5.

Ruby Tuesday: announcing AxleGrease 0.6 (was ROROX)

I got fed up with the name "ROROX" for my Linux Rails package (Ruby on Rails on XAMPP). So I have now changed its name to AxleGrease. The project is still hosted on RubyForge under the old name, though.

To accompany this "exciting" name change, I am pleased to announce the new version, which brings the stack bang up to date. You can get the latest version from RubyForge. Full details are in the CHANGELOG, but here are the highlights:

  1. Added a whole load of new goodies: gruff, rspec, rmagick, libxml, mongrel, fakeweb, hpricot, gem_plugin, daemons, mongrel_cluster
  2. Upgraded everything in the old version (Rails, Builder, Rubygems) to latest versions
  3. Updated the included scripts to automate configuration of existing or new Rails applications, so that they work with XAMPP's Apache server; this means using mod_scgi for SCGI, and mod_proxy for Mongrel; you can mongrelise your applications with a simple:
./rails_add_app.sh mongrel /path/to/app app_name <port>

or de-mongrelise with:

./rails_add_app.sh /path/to/app app_name

As ever, it is designed for my environment: XAMPP 1.5.3a running on Ubuntu Dapper. It may work on other Linux distributions, but I've made no effort to test it. I might do eventually. Using XAMPP means I can also integrate the start/stop scripts into the XAMPP start/stop sequence, and can do:

/opt/lampp/lampp stopmyapp
/opt/lampp/lampp startmyapp
/opt/lampp/lampp restartmyapp

to control my Rails applications.

My approach to configuring Rails applications also goes against the grain of other packaged systems: rather than creating new virtual host configurations, I mount individual applications on directories within Apache's document root. This means there's no need to edit /etc/hosts, and setup is simpler. However, this is an area where I'd like some feedback, as I'm not a mod_rewrite or mod_proxy expert. Currently, the Mongrel/Apache configuration looks like this for a typical application available at http://localhost/emaildir/:

ProxyPreserveHost Off
ProxyRequests On
# so redirects work properly
ProxyPassReverse /emaildir/ http://127.0.0.1:9999/

RewriteEngine On
# redirect requests for dispatcher to the application root
RewriteRule ^/emaildir/dispatch(.*)$ /emaildir/ [R]
# requests for the root get redirected to index.html if it exists
RewriteCond /opt/lampp/htdocs/emaildir/index.html -f
RewriteRule ^/emaildir/$ /emaildir/index.html [R]
# any requests which can't be served from static files go to the proxy
RewriteCond /opt/lampp/htdocs%{REQUEST_URI} !-f
RewriteRule ^/emaildir/(.*)$ http://127.0.0.1:9999$0 [P,QSA,L]

For this to work, you also need a line at the end of config/environment.rb like this:

ActionController::AbstractRequest.relative_url_root = '/emaildir'

(My scripts automate all this, by the way, and the remove script undoes any changes made. This is likely to work better with later versions of Rails, which are being patched to play nicely with Mongrel.)

It has taken me some time to get this straight. About 4 hours, to be precise. So hopefully it should work OK. Fingers crossed.

A couple of nice things: shfs and Meld

I like the way in Gnome/Nautilus you can create a "pseudo filesystem" over an SSH session: you can treat a remote filesystem sort of like part of your regular filesystem. However, it's not a real mount, as you can't reference it using a standard path (there's no /media/remote path you can point to from the command line, for example).

I was facing this with a client this morning, where we wanted to make a remote filesystem mountable over an SSH session. After a quick google, we found shfs. It lets you mount a filesystem that is accessible over SSH onto a mount point on the local machine. We then found a simple installation guide and had it up and running in about 10 minutes. On Ubuntu Dapper, here's an overview of the process:

  1. Get root
  2. apt-get install build-essential shfs-source shfs-utils linux-headers-`uname -r`
    (this installs everything you need to compile it up for your architecture; you may need to enable the Universe and Multiverse repositories)
  3. module-assistant build shfs
  4. module-assistant install shfs
  5. To mount a directory over SSH:
    1. Create a directory on the local machine (e.g. remote)
    2. Run this command: shfsmount --persistent user@server.com /path/to/mount/onto
      (the --persistent flag maintains the connection in the event of temporary outages at the server end causing breaks in the SSH session)
  6. To ensure the shfs module is loaded during boot:
    echo 'shfs' >> /etc/modules
  7. To mount the filesystem automatically at boot time, add a line like this to /etc/fstab:
    user@server.com /path/to/mount/onto shfs
  8. If you mount onto a directory inside /media, it will pop up as an icon on your desktop too.

We needed to set a few permissions to finish up; here's where I'm at so far in /etc/fstab:

user@server.com /media/ssh shfs rmode=755,auto,rw,group,cmd-user=ell,type=shell,stable,user,uid=1000,gid=100 0 0

This is useful as it logs into my remote machine using my personal public key, and sets the user and group to my user account and the users group. I'm still having a few issues with writing into the root of the mounted directory from Nautilus (I can from the command line, though).

We then explored whether tools like TortoiseSVN are available for Linux. EasyEclipse is great, but a bit heavyweight for some users; perhaps a simple Explorer-like interface which allows visual diffs would do. Enter Meld. It's packaged for Debian/Ubuntu, so you can do:

apt-get install meld

I hadn't looked at it for a while, and when I last did it was quite simple and only did file-to-file comparisons. But the latest version is really cool. You can use it like a Subversion client and do visual comparisons of a local working copy and a repository without much fuss. I plan to look at this a bit more, as it could be a great tool for companies who don't want to go as far as Subversion (you can do directory-to-directory comparisons too). Coupled with shfs, it could make a great lightweight "revision control" system for Linux.
