elliot's blog

Decoding FLV files with ffmpeg

I'm using Ubuntu Intrepid Ibex, but the ffmpeg package it ships with doesn't support some recent FLV encodings (such as those used by some YouTube videos). You get an error like this when you try to do anything with them:

[flv @ 0xb800e4c8]Unsupported video codec (7)

My solution was to check out ffmpeg from its Subversion repository and compile it myself:

$ svn checkout svn://svn.ffmpeg.org/ffmpeg/trunk ffmpeg-svn
$ cd ffmpeg-svn
$ ./configure --prefix=$HOME/apps/ffmpeg-svn --enable-libmp3lame
$ make install

To compile it, you'll probably need the build-essential package, as well as libmp3lame-dev and the *-dev versions of any other codec libraries you want to use.

Use the resulting binary to do the conversion:

$ ~/apps/ffmpeg-svn/bin/ffmpeg -i infile.flv outfile.mpg

Happily, this version does support those recent FLV files.
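The number in that error message is the FLV video codec ID, and 7 means AVC (H.264). If you want to check which codec a file uses before bothering with a rebuild, here's a rough Python sketch that reads the FLV header and reports the codec of the first video tag. It's based on my reading of the FLV tag layout (11-byte tag headers, codec ID in the low four bits of the first byte of a video tag's body) and skips most error handling; first_video_codec and describe_file are hypothetical helper names, not part of ffmpeg.

```python
import struct

# FLV video CodecIDs (low nibble of the first byte of a video tag's body).
VIDEO_CODECS = {
    2: "Sorenson H.263",
    3: "Screen video",
    4: "On2 VP6",
    5: "On2 VP6 with alpha channel",
    6: "Screen video version 2",
    7: "AVC (H.264)",
}


def first_video_codec(data: bytes):
    """Return the CodecID of the first video tag in some FLV bytes, or None."""
    if data[:3] != b"FLV":
        raise ValueError("not an FLV file")
    # Bytes 5-8 of the file header hold the header length (normally 9).
    header_size = struct.unpack(">I", data[5:9])[0]
    pos = header_size + 4  # skip the header and PreviousTagSize0
    while pos + 11 <= len(data):
        tag_type = data[pos]  # 8 = audio, 9 = video, 18 = script data
        data_size = int.from_bytes(data[pos + 1:pos + 4], "big")
        body = pos + 11  # every tag header is 11 bytes long
        if tag_type == 9 and data_size > 0 and body < len(data):
            return data[body] & 0x0F
        pos = body + data_size + 4  # skip the tag body and PreviousTagSize
    return None


def describe_file(path):
    """Read the start of an FLV file and name its video codec."""
    with open(path, "rb") as f:
        codec = first_video_codec(f.read(1 << 20))  # the first MB is usually enough
    return VIDEO_CODECS.get(codec, "unknown (%r)" % codec)
```

If describe_file("infile.flv") says AVC (H.264), you're probably looking at the same problem I had.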

Describing (finding) subjects which don't have a particular predicate in SPARQL

If you want to do something like a SQL NOT EXISTS in SPARQL, here's what the query looks like:

PREFIX rs: <http://schemas.talis.com/2006/recordstore/schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

DESCRIBE ?tenancy {
  ?tenancy rdf:type rs:Tenancy .
  OPTIONAL { ?tenancy rs:platformStoreUri ?o } .
  FILTER ( !bound(?o) )
}

Here I'm looking for subjects with an rdf:type of http://schemas.talis.com/2006/recordstore/schema#Tenancy which don't have a http://schemas.talis.com/2006/recordstore/schema#platformStoreUri predicate. The important bit is that you make the pattern for the predicate which might not be "set" OPTIONAL, then add a FILTER which only includes subjects where that pattern's variable (?o) is not bound to a value. This screens out any subjects which do have the predicate, leaving only those where it was never added. The SPARQL recommendation calls this pattern negation as failure, a term that comes from logic programming. Feels a bit like being back at university.
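To make the negation-as-failure idea concrete, here's a toy Python sketch that evaluates the same pattern by hand over an in-memory list of triples. It isn't a SPARQL engine, and the ex:t1/ex:t2 subjects are made up for the example. OPTIONAL yields either a binding for ?o or an unbound ?o (None here), and the FILTER keeps only the unbound ones.

```python
# Toy model of the query above over an in-memory list of (s, p, o) triples.
RS = "http://schemas.talis.com/2006/recordstore/schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def tenancies_without_store(triples):
    """Subjects typed rs:Tenancy that have no rs:platformStoreUri value."""
    tenancies = [s for s, p, o in triples if p == RDF_TYPE and o == RS + "Tenancy"]
    results = []
    for t in tenancies:
        # OPTIONAL { ?t rs:platformStoreUri ?o }: one solution per value found,
        # or a single solution with ?o unbound (None) if there are none.
        values = [o for s, p, o in triples if s == t and p == RS + "platformStoreUri"]
        solutions = values or [None]
        # FILTER (!bound(?o)): keep only the solutions where ?o is unbound.
        results.extend(t for o in solutions if o is None)
    return results


# Hypothetical data: ex:t1 has a store URI, ex:t2 doesn't.
triples = [
    ("ex:t1", RDF_TYPE, RS + "Tenancy"),
    ("ex:t1", RS + "platformStoreUri", "ex:store1"),
    ("ex:t2", RDF_TYPE, RS + "Tenancy"),
]
print(tenancies_without_store(triples))  # ['ex:t2']
```

It's the same shape as a SQL LEFT JOIN followed by WHERE ... IS NULL.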

Installing Windows on the second hard disk of a Linux machine

I recently upgraded the hardware of my old desktop PC, with the aim of providing the house with a new-ish Linux machine for watching movies and using the internet, and a Windows machine for writing music and playing (old) games. My plan was to use two hard disks: one for Linux, another for Windows, and choose which to use at boot time.

The normal procedure is to install Windows first, then install Linux into a spare partition on the same hard drive (Windows tends to overwrite the boot loader on any disk you install it on). But it's easier to get a Linux machine up and running, see what hardware you've got, and get a decent system without needing to hunt down loads of old drivers. So I decided to install Linux first. I plugged in a drive for it as the primary IDE drive, and installed Ubuntu Linux onto it.

Then, I unplugged the Linux drive, plugged the other drive in, and installed Windows 2000 onto the second drive (just to make sure Windows couldn't overwrite Linux). Got that working too.

Then I plugged both drives in: the Linux drive as the first drive on the IDE cable, and the Windows disk as the second.

The trick then is to get grub (the Linux bootloader I'm using) to present both disks as options at boot time. There's a sample configuration in /boot/grub/menu.lst, but that didn't work for me: it looked like it was working, then just hung. I tried a couple of other things, but none of them worked.

Finally, I found this blog entry and used the configuration there. The trick is to make Windows think it's installed on the first disk on the IDE cable. I added this to the bottom of menu.lst:

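title Windows 2000
# Windows is on the first partition of the second disk; don't try to mount it
rootnoverify (hd1,0)
# Swap the two disks so Windows thinks it's on the first one
map (hd0) (hd1)
map (hd1) (hd0)
# Boot from the first sector of that partition
chainloader +1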

which does the trick! Now I get a working Windows 2000 option in my grub boot menu.

Creating a self-signed SSL certificate for Apache on Linux

(This is extracted from my Apache course materials, but it's a useful howto in its own right.)

To generate a self-signed SSL certificate, you will need openssl installed first.

Then follow these steps:

  1. Generate the server's private key; we'll use a 2048-bit key using the RSA algorithm:
    openssl genrsa -out server.key 2048
  2. Generate a certificate-signing request:
    openssl req -new -key server.key -out server.csr
  3. Fill in the required information at the prompts:
       Country Name (2 letter code) [GB]:GB
       State or Province Name (full name) []:.
       Locality Name (eg, city) [Newbury]:Birmingham
       Organization Name (eg, company) [My Company Ltd]:Talis
       Organizational Unit Name (eg, section) []:Library Products
       Common Name (eg, your name or your server's hostname) []:prism.talis.com
       Email Address []:.
       Please enter the following 'extra' attributes to be sent with your certificate request
       A challenge password []:.
       An optional company name []:.
    The really important one is the Common Name: this must match the domain name which will serve the SSL site; otherwise connecting clients will get a prompt about a mismatch between the certificate's host name and the actual host name of the server.

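    Note that we left the challenge password blank: it isn't needed for a self-signed certificate. What really matters for Apache is that the private key from step 1 isn't encrypted (we didn't pass a cipher option such as -des3 to genrsa). If it were, Apache would prompt you for the pass phrase each time you start the server, which is a pain in the arse.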
  4. Create a self-signed certificate from the certificate-signing request (.csr file):
    openssl x509 -req -days 3650 -in server.csr -signkey server.key -out server.crt
  5. Remove the certificate-signing request, as you don't need it any more:
    rm server.csr
  6. Put the .crt and .key files into Apache's SSL directory and configure Apache to use them.
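If you'd rather not answer the prompts every time (in a build script, say), here's a minimal Python sketch that assembles the same openssl commands and supplies the subject with openssl req's -subj option instead. The subject values are the ones from the prompts above; the function names are my own, and the key size is a parameter (the sketch defaults to 2048 bits). Treat it as a starting point rather than a drop-in script.

```python
import os
import shutil
import subprocess

# Answers to the prompts above; fields answered with "." are simply left out.
SUBJECT = {
    "C": "GB",
    "L": "Birmingham",
    "O": "Talis",
    "OU": "Library Products",
    "CN": "prism.talis.com",  # must match the SSL site's host name
}


def subject_string(fields):
    """Build an openssl -subj value such as /C=GB/L=Birmingham/..."""
    # Escape any '/' inside a value so openssl doesn't treat it as a separator.
    return "".join("/%s=%s" % (k, v.replace("/", "\\/")) for k, v in fields.items())


def openssl_commands(fields, bits=2048, days=3650):
    """The key, CSR and self-signed certificate commands, in order."""
    return [
        ["openssl", "genrsa", "-out", "server.key", str(bits)],
        ["openssl", "req", "-new", "-key", "server.key", "-out", "server.csr",
         "-subj", subject_string(fields)],
        ["openssl", "x509", "-req", "-days", str(days), "-in", "server.csr",
         "-signkey", "server.key", "-out", "server.crt"],
    ]


def make_self_signed_cert(fields=SUBJECT):
    """Run the commands in the current directory, then delete the CSR."""
    if shutil.which("openssl") is None:
        raise RuntimeError("openssl is not on the PATH")
    for cmd in openssl_commands(fields):
        subprocess.run(cmd, check=True)  # stop at the first failing step
    os.remove("server.csr")  # not needed once server.crt exists
```

Run make_self_signed_cert() from the directory where you want server.key and server.crt to end up.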

If I get round to it I'll do another entry explaining how to make Apache use them.

FRBR explained pretty well

I've been struggling for a while to understand FRBR. It's basically (I quote) "a conceptual model for the bibliographic universe". At its core are concepts describing bibliographic "things": books, works, scores, audio books, novels, all that litter. But there are two oddly vague things sitting in between Items (physical things you can hold) and Works (the broad idea of "a work of art", separate from how it occurs in the world): Manifestations and Expressions. I kind of understood the difference, but their boundaries seemed smudged.

This comment on the futurelib wiki by jrochkind cleared up some of the confusion for me:

An item is an actual individual concrete book in your hand.

A manifestation is the set of all items that are identical (or close enough) in physical form as well as content.

An expression is the set of all manifestations that are identical in textual or information content (or close enough for our purposes; an archeologist would consider the coffee stain on the back to be distinguishing information content; we do not).

And a work is the set of all expressions that, well, consist of the same intellectual work. This is definitely a cultural concept, but it's one we have and find useful. We consider the audio book version of a book to be the same book, just a different version. That's work.

Thanks Jonathan.
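Reading those definitions as sets nested inside sets, a toy Python model might look like the following. The class and field names are my own illustration rather than anything from FRBR or its RDF translation, and the example novel is hypothetical. It's also a simplification: FRBR proper lets one manifestation embody several expressions (an anthology, say), which a simple tree can't show.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    """One concrete copy you can hold in your hand."""
    label: str


@dataclass
class Manifestation:
    """Items identical (or close enough) in physical form and content."""
    label: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """Manifestations with the same textual or information content."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """Expressions of the same intellectual work."""
    title: str
    expressions: List[Expression] = field(default_factory=list)

    def items(self) -> List[Item]:
        # Walk down the hierarchy and collect every physical copy.
        return [i for e in self.expressions for m in e.manifestations for i in m.items]


# Hypothetical example: a printed novel and its audio book are one work.
novel = Work("Some Novel", [
    Expression("English text", [
        Manifestation("2005 paperback", [Item("copy 1"), Item("copy 2")]),
    ]),
    Expression("Audio book reading", [
        Manifestation("CD edition", [Item("copy 3")]),
    ]),
])
print(len(novel.items()))  # 3
```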

Also ran across Ian Davis' translation of FRBR concepts to RDF. He's my boss.

And the Resource Description and Access cataloguing standard, which I hadn't encountered before. And by coincidence, a recent UKOLN guest lecture on RDA just appeared in one of my RSS feeds.

Most of this was triggered by a colleague tipping me off about eXtensibleCatalog, a new open source discovery layer for bibliographic data, built on Drupal (amongst other things). It has its own metadata format, plus tools for translating common library metadata formats (like MARC) into it.

It's quite fascinating, this whole library metadata lark, once you get your teeth into it.

Valuing your website

A bit of fun.

Check yours?

Seems roughly right. I haven't made that much out of it yet, though.

One of my favourite things about Wednesdays

On Wednesdays, my employer lets me work from home. I find it one of my most productive and enjoyable days of the week: without the distractions of the office I find it easier to focus, and I get to take my daughter to school and pick her up, which gives my wife the chance to do student visits and gives me some time with my daughter.

On top of those great things, Wednesday is also the day when I receive a rather excellent email from 14tracks. This is a fine idea put together by the equally marvellous Boomkat music store: each week they send you a list of 14 tracks exemplifying a particular musical style, label, producer, artist etc., with short reviews, plus links to play previews and buy on Boomkat.

This week's selection is 14 tracks relating to Surgeon, the techno/dubstep producer. It's a great way to find out about new music, particularly if you're into electronica of any stripe.

West Midlands SME completes its migration to open source

Mercian Labels is a West Midlands SME with a 40-year pedigree, specialising in label printing. In early 2007 (when I was still working at OpenAdvantage), some Mercian Labels staff attended our PHP and Asterisk courses at OpenAdvantage (I may well have taught on the PHP one). Afterwards, assisted by Paul and Jono, my old colleagues, they began migrating as much of the business to open source as possible. They did this with considerable consultancy help from Senokian, a company I got to know well while at OpenAdvantage.

I've been following the Mercian Labels blog with great interest since the start of the process. Adrian Steele, the Managing Director, has painstakingly, honestly and openly described the whole transition, explaining the business costs and benefits, snags, setbacks, victories etc.: an invaluable resource for any other business doing a similar migration.

So I was pleased to read today that the migration is finally complete. Impressively, they've replaced Windows throughout their organisation, except for a few machines to run legacy software.

Excellent news. Well done to Adrian, Mercian Labels, and Senokian. I feel proud to have been a tiny part of it.

Dealing with self-signed SSL certificates when running Selenium server with Firefox

Selenium is a decent tool for testing web UIs, with client libraries for a variety of languages. We use it to test the Talis Prism UI, running a Selenium server instance and then firing Ruby RSpec tests and an older HTML suite at it. Here's the part of the Ant build script which runs the HTML suite using Selenium:

<target name="prism-selenium-tests" description="Run the old Prism Selenium tests">
  <echo message="Running old Selenium tests against Prism" />
  <java jar="test/dependencies/Selenium/selenium-server.jar" fork="true" maxmemory="1024m">
    <arg line="-debug -timeout 500 -htmlSuite '*chrome ${firefox.bin}' http://${prism.host}
       test/selenium/testSuite.html doc/seleniumResults.html" />
  </java>
</target>

where the Ant properties we substitute in are:

${firefox.bin} = path to the Firefox binary to use
${prism.host} = HTTP host to run the tests against

This works without a hitch if you're not using HTTPS. But as soon as your tests redirect to an HTTPS URL on the same host (we serve parts of Prism over SSL) and that host's SSL certificate is self-signed, things go wrong. Because Selenium effectively runs Firefox with a new profile every time, any certificate exceptions you accept are lost between runs.

One technique we were using was to create a custom profile, run Firefox using that profile, browse to the HTTPS URL and accept the certificate exception into that profile, then close Firefox.

This kind of worked, but we still got odd popups from Firefox about new extensions being installed. Just annoying.

I think I've now worked out the solution, which was largely based on http://kapanka.com/2008/12/selenium-rc-firefox-and-the-self-signed-ssl-c.... It's a bit of a pain in the arse, but it does seem to work. Here goes.

  1. Close down any running Firefox instances.
  2. Start Firefox (the one you're going to run your tests with) with the profile manager: firefox -ProfileManager
  3. Create a new profile. You'll be prompted to choose a directory for the profile. Put it somewhere inside the project where you're writing the tests.
  4. Select the profile and run Firefox using it.
  5. Browse to the HTTPS URL (with self-signed certificate) you're going to be testing against.
  6. Accept the self-signed certificate when prompted. This creates an exception for it in the profile.
  7. Close the browser.
  8. Go to the profile directory you created in step 3.
  9. Delete everything in the directory except for the cert_override.txt and cert8.db files.
  10. When you run your Selenium server (like in my Ant example above), pass a -firefoxProfileTemplate /path/to/profile/dir argument to it. This tells Selenium to use your partial profile (with certificate exceptions) as a basis for minting its new profile. So you get the certificate exceptions, but without any of the other clutter you would get if you used a whole profile.
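Step 9 is the fiddly bit, so here's a small Python sketch that prunes a profile directory down to the two certificate files. prune_profile is my own hypothetical helper, not part of Selenium or Firefox, and it deletes things, so only point it at the throwaway profile you created in step 3. Newer Firefox releases keep certificates in cert9.db rather than cert8.db, so you may need to adjust the keep set.

```python
import os
import shutil

# Files holding Firefox's certificate exceptions (as listed in step 9).
KEEP = {"cert_override.txt", "cert8.db"}


def prune_profile(profile_dir, keep=KEEP):
    """Delete everything in profile_dir except the names in keep.

    Returns the names that were removed, in sorted order.
    """
    removed = []
    for name in sorted(os.listdir(profile_dir)):
        if name in keep:
            continue
        path = os.path.join(profile_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # e.g. extensions/, cache directories
        else:
            os.remove(path)  # plain files and symlinks
        removed.append(name)
    return removed
```

Printing the returned list is a cheap way to double-check you pruned the right directory.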

With this option added, the Ant target above looks like this:

<target name="prism-selenium-tests" description="Run the old Prism Selenium tests">
  <echo message="Running old Selenium tests against Prism" />
  <java jar="test/dependencies/Selenium/selenium-server.jar" fork="true" maxmemory="1024m">
    <arg line="-debug -timeout 500 -firefoxProfileTemplate test/firefoxProfile
       -htmlSuite '*chrome ${firefox.bin}' http://${prism.host} test/selenium/testSuite.html doc/seleniumResults.html" />
  </java>
</target>

Outside of Ant, the command might look something like:

java -jar test/dependencies/Selenium/selenium-server.jar -firefoxProfileTemplate /path/to/profile \
-htmlSuite '*chrome firefox-bin' http://host.com testSuite.html seleniumResults.html

Works for me.

