MD5s

Using an MD5 digest (or md5 sum) is a neat way of building a predictable key for data. Obviously there is the issue of MD5 collisions (where two completely different pieces of source data both produce the same MD5 digest), but unless you're building medical or safety equipment, for general text manipulation it's pretty negligible.

However, MD5s can be represented in several ways. Let's discount the binary encoding of the 128 bits (16 bytes) of data, as that's rather cumbersome, and if you're storing this in a database such as MySQL, there isn't a 16-byte numeric data type; BIGINT is 8 bytes, so you'd have to use two BIGINTs and do lots of horrible stuff.

That brings us to the base encodings. Base 16, or hexadecimal, would require us to use a text data type to store the results – as the Base16 encoding contains the numbers 0-9 and the letters A-F (or a-f – the case is irrelevant/insensitive in Base 16). It would be 32 "characters" long. We can stuff that in a column with no trouble (char(32)).

We can also use a Base64 encoding, using upper and lower case letters and a couple of symbols (+ and /) as well as the numerals 0-9. This comes to 22 characters (you'll sometimes see == appended to a Base64 string to pad it to 24 characters). Using 22 chars as a key instead of 32 is 31.25% less data. That makes your indexes that much more compact, as well as the column data.

It may not be a perfect primary key, but it's arguably reasonable. But then comes the question of converting between Base16 and Base64. Here's one way:

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use MIME::Base64;

my $data = "foobarbasbifffoobarbasbiff";
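# md5_base64() returns the 22-character digest with no "==" padding;
# decode_base64() below copes happily with the missing padding.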
my $md5_base64 = Digest::MD5::md5_base64($data);
printf "%s in base64 as hex: %s\n", $md5_base64, unpack('H*', MIME::Base64::decode_base64($md5_base64));

Rusty is coming to Perth to talk!

If you've not seen any of the PLUG news, then you may not know that PLUG is organising for Rusty Russell to come to Perth for our October presentation. This is the first time that PLUG has flown a speaker in to Perth, and if successful, it probably won't be the last time! The cost of doing this is being split amongst attendees by way of ticket sales at $20/member, $50/non-member (and PLUG membership is $20/full, $10/concession, so you can work out what's cheaper!). In order to help us with our budget, door sale tickets on the night will be $50 for everyone, so please get your ticket now (details).

On top of this, free tickets are being given to UWA 3rd year Comp-Sci students… except… the PLUG committee has just voted to extend this to all full-time tertiary computer science students at any university. You still need to get a ticket – email tickets@plug.org.au to request one. This event will be on Tuesday October 11th, starting from 6:30pm (doors open 6:00pm).

For those unable to attend, PLUG will again be endeavouring to live-video stream the session, and to make the recording available afterwards.

Liboping install

I use Florian octo Forster's fantastic parallel ping library liboping, and over the last 9 months or so have contributed a few minor bug fixes (mostly build-related stuff, so not exactly core to the library's code). I use this for doing parallel pings of hosts and gathering stats, as part of a wider heuristic for site web speed. Of course not all hosts respond to ping, but that's fine; some do.

I went to build version 1.6.1 from source, as I was on a platform that is not Debian (otherwise I'd just use the package and be done with it), and found that by default everything gets installed into /opt. I prefer using /usr/local, so I built it with:

./configure --prefix=/usr/local && make && make install && make install-data

The reason for the last target is that I want the headers installed. I then want to build the Perl library, and for that I need the linker to be able to find liboping. I found that I needed a file in /etc/ld.so.conf.d/ listing this location, so along came "echo /usr/local/lib > /etc/ld.so.conf.d/usrlocal.conf", followed by a quick ldconfig run to update the paths. I could then cpan -i Net::Oping.
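
With the linker sorted, Net::Oping works as expected. Here's a rough sketch of the kind of parallel ping I use it for (the host names are only placeholders):

#!/usr/bin/perl
use strict;
use warnings;
use Net::Oping;

my $ping = Net::Oping->new();
$ping->timeout(2);                                    # seconds, may be fractional
$ping->host_add(qw(www.example.com www.example.org));

# One ICMP echo request to every registered host, sent in parallel.
my $latency = $ping->ping();
die $ping->get_error() unless defined $latency;

for my $host (sort keys %$latency) {
    # Latency is in milliseconds; undef means no reply within the timeout.
    printf "%-20s %s\n", $host,
        defined $latency->{$host} ? sprintf('%.2f ms', $latency->{$host}) : 'no reply';
}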

Why blog this? Because every few months I have to try and remember it. Now I just need to remember that I blogged it.

With thanks to Florian octo Forster for the library, of course.

MySQL UTF8 and Perl

It's been quite annoying; DBI and DBD::mysql seem to default to Latin 1, and it appears that the client-side way of "updating" to UTF8 is to issue "SET NAMES utf8" as your first query when you connect to MySQL (in my case, 5.5.x). The alternative is to tell the server that it should automatically run this query each time a client connects, or alternatively, to disable encoding negotiation and use UTF8 for everything. Here are a few links I found useful:

And a quote from the second:

[mysqld]
default-character-set=utf8
default-collation=utf8_general_ci
character-set-server=utf8
collation-server=utf8_general_ci
init-connect='SET NAMES utf8'

[client]
default-character-set=utf8

As you'll see in the first link above (Stack Overflow), adding parameters with spaces to Amazon RDS is a little tricky from a Windows platform – you have no choice but to use the RDS CLI tools to do this.
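
On the Perl client side, something along these lines also works (the DSN and credentials are placeholders); DBD::mysql's mysql_enable_utf8 flag makes it treat strings as UTF8, and the explicit SET NAMES covers a server that still defaults to Latin 1:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# DSN, user and password are placeholders.
my $dbh = DBI->connect(
    'DBI:mysql:database=mydb;host=localhost',
    'user', 'password',
    { RaiseError => 1, mysql_enable_utf8 => 1 },   # DBD::mysql's UTF8 flag
);

# Belt and braces: make sure the connection character set really is utf8.
$dbh->do('SET NAMES utf8');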

MySQL varchar not case sensitive

I managed to overlook an issue with creating a varchar column in an app I have been working on. I basically have a normalised table, with a foreign key to a table of values. In this case, it's a set of HTML document titles, keyed off an auto-increment column called Title_ID. What I want to do is look up a title, and get a Title_ID back.

Great; I can do this with a stored function, which I did, and it worked. But it was slow. So I decided that I'd normalise these en masse with one big INSERT statement into the normalised table (protected by a unique index constraint), and then store the resulting Title_IDs.
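
In rough terms the bulk pattern looks like this – a sketch only, with placeholder connection details, and with INSERT IGNORE standing in as one way of letting the unique index swallow duplicates:

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# DSN and credentials are placeholders; Titles/Title/Title_ID are as described above.
my $dbh = DBI->connect('DBI:mysql:database=mydb;host=localhost',
                       'user', 'password', { RaiseError => 1 });

my @titles = ('Some title', 'Another title');

# One big multi-row INSERT; the unique index on Title turns duplicates into no-ops.
my $placeholders = join ', ', ('(?)') x @titles;
$dbh->do("INSERT IGNORE INTO Titles (Title) VALUES $placeholders", undef, @titles);

# Read the generated Title_IDs back into a Perl hash keyed by title.
my %title_id;
my $sth = $dbh->prepare('SELECT Title_ID FROM Titles WHERE Title = ?');
for my $title (@titles) {
    $sth->execute($title);
    ($title_id{$title}) = $sth->fetchrow_array;
}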

There be dragons. When one title came through as "[Q] help me", it was duly inserted and given a Title_ID. However, when a lower-case "[q] help me" came through, it matched as a duplicate of the original and therefore was not inserted again. I then pulled the strings into a Perl hash and, of course, couldn't find a key for "[q] help me", only "[Q] help me".
It turns out that the issue was my column definition: varchar(x) is not case sensitive, but varchar(x) binary is. The unique index I had on there was doing its job, comparing values according to the case-insensitive column collation – not its fault.

ALTER TABLE Titles CHANGE COLUMN `Title` `Title` VARCHAR(600) BINARY NULL DEFAULT NULL ;

And now I see my column as "`Title` varchar(600) CHARACTER SET utf8 COLLATE utf8_bin DEFAULT NULL".