MySQL views versus tables

An annoyance: at work we’ve been using MySQL 5.0.x for a reasonably high volume service. We recently swapped a table we were using from MyISAM, to a view containing several normalised MyISAM tables. Despite having indexes to make the joins in the table efficient, it seems that a SQL query with a “NOT IN (x, y, z)” style WHERE clause performs so much worse than the non-normalised original MyISAM table. So while a view may look equivalent to the original table, it won’t perform the same for all queries.

EasyCAP 4 channel input USB Video Capture

Video capture card; a gotcha

Once again, life proves when it comes to commercial offerings, you get what you pay for.

I found a £13 (~US$20) USB device with 4 composite video inputs, and one audio input, called an EasyCAP 4 channel device, on Amazon. I was planning on streaming 4 video captures from it at some quality. The unit turned up very efficiently, and emblazoned across the stick is “4 CHUSB DVR”, and a neat little Win32 utility called Multiviewer 2.0, by Zhong Kai Ran.

Fire up the software and an impressive 4-screen display is shown, however, its not until you plug in a video camera to one input that you can see that the device rotates through each of the video inputs; it doesnt appear to be able to stream all four simultaneously. A pity. So, if you’re looking for a “poor man’s multi-camera CCTV”, this is a very good choice, but be aware that you’ll only get (configurable) number of seconds from each channel in sequence (or selectable) but not a live continuous stream of all four inputs.

At least, that’s my experience under MS Windows thus far. I haven’t yet thrown it at a Linux system yet to see if it detects the device and/or it works any differently, but I’d suspect not.

Either way, it’s a start at what I’m looking to do anyway, so I am not too disappointed.

Next up, a small PC to house it in that I can run my GStreamer video processing on… hmmm… I want an Asus Eee Box B204 with its cute built in HDMI… its flirting with me promising Full-HD 1080p, but I guess if/when I have a spare £300 to play with I may find its also a little short!

Debian 5.0 Lenny is out

As of around 23:00 UTC today, Debian GNU Linux 5.0 is kind-of out and about. The official web site isn’t updated yet, and final CD images are being generated now, but the symlink of stable has come to rest on Lenny; testing is now Squeeze, and dear old 4.0 Etch is oldstable. For those not aware, the code names are all characters from Disney/Pixar’s Toy Story.

Lenny runs on 11 different CPU architectures, including the standard i386 32-bit and its equivalent 64-bit (AMD64). It ships with kernel 2.6.26. It has MySQL 5.0.51a (which has interesting STATISTICS table in the INFORMATION_SCHEMA).

Some of the gotchas that may come up:

  • Kernel 2.4 is dropped
  • Firmware for various devices may have been split out into separate packages; example, on an HP BL460 blade system, you’ll need to install firmare-bnx2
  • Apache 1 has been dropped; use Apache 2

See this for a rough summary and the release news story for more.

Logging to MySQL in 3rd Normal Form

I’m at it again with my Log3NF! When last I did this, Debian‘s Perl packages were in no shape for using MySQL stored procedures, but time has passed and everything is ready….

Any web server software, like Apache, can log requests that come in when people browse sites. Typically people record the accesses and do statistical analysis on it – to see visitor numbers, people stealing graphics, preferred browser versions of the visitors, where people are being linked-to from, etc. All of this data can be quite voluminous, and much of it is repetitive.

For a long time there has existed the ability to log this data to a simple flat MySQL (or other) database. However, most of those implementations have used just one table to store all the records in a log line. This means the data still has to be split apart for analysis.

So, what have I done? Well, I have written a bunch of table structures to handle each component of a standard “combined” log file, and a table that joins each of these components of a log line together. Plus I have written some table structures to hold summary data of this, so over time I can delete the original log entries and just keep the summaries. Then I have written some stored procedures to parse the incoming log entry and split it into these tables, and update the summary statistics. Here’s the main table that ties everything together – you’ll see it’s indexed in every way possible, so you cna see the possibilities for reporting from it…

CREATE TABLE Access (
ID bigint unsigned auto_increment primary key,
IPv4 int unsigned not null,
index index_IP(IPv4),
Ident_ID int unsigned,
User_ID int unsigned,
At datetime not null,
index index_At(At),
Protocol_ID tinyint unsigned,
index index_Protocol_ID(Protocol_ID),
Method_ID tinyint unsigned not null,
index index_Method_ID(Method_ID),
Status_ID tinyint unsigned not null,
index index_Status_ID(Status_ID),
Path_ID bigint unsigned,
index index_Path_ID(Path_ID),
Referer_ID bigint unsigned,
index index_Referer_ID(Referer_ID),
UserAgent_ID bigint unsigned,
index index_UserAgent_ID(UserAgent_ID),
Bytes int unsigned,
index index_Bytes(Bytes),
Server_ID smallint unsigned,
index index_Server_ID(Server_ID),
Site_ID smallint unsigned,
index index_Site_ID(Site_ID),
Timezone_ID tinyint unsigned not null
);

This supports having multiple web sites logging to it (think virtual hosting several sites) and server farms (multiple servers for big web sites, distributed global delivery).

Next up, I wrote a small script to load a pre-existing access log using this stored procedure. But thats rather slow, so I have written a “Log Handler” for Apache 2 with Mod_Perl 2. This means that as each access is performed, it is logged live to 3rd normal form in MySQL. The handler is very brief:

package JEB::Log3NFHandler;
use strict;
use warnings;
use Apache2::RequestRec ();
use Apache2::Const -compile => qw(OK DECLINED);
use Apache::DBI;
use Time::Zone;
my $dbh;

sub handler {
my $r = shift;
$dbh = DBI->connect('dbi:mysql:database=' . $r->dir_config("Log3NFDatabase"),  $r->dir_config("Log3NFDatabaseUser"),  $r->dir_config("Log3NFDatabasePassword")||"") unless $dbh;
return Apache2::Const::DECLINED unless $dbh;
my $sql = "call Log3NF(?, ?, ?, from_unixtime(?), ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";
my $sth = $dbh->prepare($sql);
$sth->bind_param(1, $r->connection->remote_ip);
$sth->bind_param(2, "-"); # Ident
$sth->bind_param(3, $r->user());
$sth->bind_param(4, $r->request_time());
$sth->bind_param(5, $r->protocol());
$sth->bind_param(6, $r->method());
$sth->bind_param(7, $r->status());
$sth->bind_param(8, $r->uri());
$sth->bind_param(9, $r->headers_in->get('Referer')||'-'); # Referer
$sth->bind_param(10, $r->headers_in->get('User-Agent')); # Useragent
$sth->bind_param(11, $r->bytes_sent()); # Bytes
$sth->bind_param(12, $ENV{'SERVER_NAME'}); # Server name
$sth->bind_param(13, $r->hostname()); # Site name
#tz_local_offset()/60
$sth->bind_param(14, "+0000"); # Timezone
$sth->execute();
$sth->finish;
return Apache2::Const::OK;
}
1; # modules must return true

You’ll notice the Timezone set to “+0000”; while the TZ variable in Mod_Perl says a location (“Europe/London”), it doesn’t give an offset from GMT. I’m also always logging ident as “-“, since I cant see how Mod_Perl makes that available. The configuration of the Database, DB User and Password are all taken from the Apache configuration file from the PerlSetVar directive.

With this data in 3rd normal form, viewing it means several joins, or making use of another of the newer facilities that saw daylight in MySQL 5.1: views. So a couple of views sit around to make this data easily accessible.

With this data being stored as it happens, I wrote a CGI script to render this data – to give me some graphs of the last 5 minutes of activity, in real time. In fact, its dynamic, so I can zoom in to the last 5 mins, or out to the last 800 minutes. This real-time analysis shows HTTP status codes, popular paths being requested (by hits and by bytes), plus a per-minute hits and bytes.

But there’s more… lets to some analysis on where these hits are coming from. MaxMind distribute a free Country CSV database that shows roughly where all these IPs are coming from. We load this CSV into a normalised form, and start to integrate this into the live and summary tables…

… at least, that’s where I am up to now.

I’ve been looking at this approach since around 2002, when I had to perform all the normalisation in client-side Perl. But abstracting away the normalisation into the MySQL stored procedure makes this much neater, and less prone to inconsistencies (the client doesn’t have to update the main table and ensure it puts in the correct foreign keys).

I will put this code up for public consumption soon, so if you’re interested in 3rd normal form logging, drop me an email!

The intelligent mailbox

Close up of the IR \"beam break\" circuitFor some time I have been looking to create an intelligent mailbox and security entry interface; something that has several functions, and interfaces back to a PC, which will send messages and log data. Now amongst all of this is sitting my Arduino Decimellia microprocessor, which leads me to the electronics that play with these ideas; while this may not be the perfect unit for this, it is an easy start to playing with microprocessors.

Let me start with a dream list of functions to happen at/in/near my mailbox:

  • Detect the presence of snail-mail in the mailbox: give a visual indication on the outside that I should stop and empty the mailbox, and send an alert to the network of the arrival of mail (and an alert when the mailbox is emptied).
  • Optionally count the number of deliveries of mail I have waiting, keeping count until I empty the mailbox
  • Read an RFID card to authenticate someone at the gate, and present this serial number to the network
  • Optionally read a fingerprint scan from someone at the gate, and present this as a unique serial number to the network, just like for the RFID card above
  • Read an RFID card from the vehicle in the driveway, and present this to the network
  • Detect if the gate is fully open or fully closed
  • Close the circuit to activate the gate to move (just like pressing the remote control button)
  • Detect via PIR any movement near the mailbox, and report that to the network

The next bit gets a little tricky to do with a microprocessor, so probably involves putting components in/on the mailbox, but taking the feed back to a PC for processing:

  • Have a microphone, speaker, and video camera capture the audio and video of the person, and essentially hook them up to a SIP phone that will call a SIP destination if the person is not authenticated (ie, they are a visitor pressing a traditional “doorbell”).

In theory, someone who fails authentication or presses a traditional doorbell would be connected (as in, a SIP call with the camera/mic/speaker to a call group that starts with video phones and displays in the house, and then (depending on rules such as time-of-day, etc) cascade to a 3G video cell-phone, potentially. Thus if I am home, I can see who is at the gate, and then give a response to the control system to tell the Intelligent Mailbox to throw the circuit on the gate to open it (if it is not open already), and furthermore, close the gate again.

Indeed, let’s go one better:

  • Detect the presence of an object in the path of the gate, so we know if it can be safely opened/closed
  • Detect any PIR movement inside the gate so it can send an alert to the network

I’m ignoring the “one better” bits for a “phase 2” while I concentrate on the “easier” bits for “phase 1”. So what have I got thus far:

  • An IR transmitter and receiver pair from Maplin; CH10L and CH10M
  • A pair of LEDs to visually debug if the send signal is being sent and the receive path is activating
  • A transistor, a few resistors, and voilà…

… it flashes the IR LED, reads the high values from the IR receiver and averages them a bunch of times, then goes low, reads the low values a few times and averages those, and then looks for significant deviations during the “on” cycle; a simple “beam break detection” circuit; aside from the obvious GND and +5V connections, we have one PIN of the Decimillia being used to turn the IR LED on and off, and one analogue pin used to read the IR receive circuit.

In ignoring the “count” of the mail deliveries, and just opting for “there is mail” or “there is no mail”, I can place the IR emitter no the base of the mailbox (inside, of course) and the IR receiver on the roof (inside, obviously), and any letters that are sitting in the mailbox should obscure the light path. When that happens, the Arduino does a serial write, with an API I am yet to think through, basically saying “You’ve got snail mail”.

The Arduino could live in the mailbox, and have a long, long USB or serial cable going from the mailbox, under the ground, and up the driveway (in conduit), to the house, up to the roof, and across the crawlspace to a silent low power PC (Eee box, Mini ITX, etc). Hence when the driveway comes up for replacement (its sinking all over the place, I am thinking of some heavy duty conduit runs under it (as Anand, one of my mates, has previously done). Perhaps a concrete pipe with multiple plastic conduit runs? Only needs to be around 25 metres long…

What’s next; well, I have just purchased a 555 timer and a 339 voltage comparator IC. Now, this seems geeky, even for me, but it may be possible to make the 555 and the 339 do all the logic of the timing of the flashes and the detection of the beam break, and reduce this down to just one DIGITAL pin on the Microprocessor; if the voltage is high, then the beam is broken. That reduces down the amount of code I will have loaded into the Arduino, which is a bit limited anyway, especially if I want to cram a whole bunch of other functions (see above) on the other pins.

And through all this, while I (and my family) have been having a terrible cold, all over Christmas. Oh well, only human.