Technical Overview: History, Design and Use
© May 2001 James Bromberger, <james@rcpt.to>
Chapter 0
What is Debian
DEBIAN is an operating system, plus all the tools you could ever want to use, wrapped and bundled in easy to digest parcels, configurable, reasonably secure, tested, peer-reviewed, corrected, distilled, and then… given away in the true (old) spirit of the Internet and Open Source (or, as one colleague puts it, Open Sores).
The Debian organisation (www.debian.org) pulls together volunteers from around the globe to orchestrate a coherent operating system, drawing on freely available sources. Debian’s aim is to produce a complete operating system that is devoid of any restrictions on use or modification; Debian’s “Social Policy” and “Debian Free Software Guidelines” (DSFG) indicate that this is a requirement for the project, forever, and for any software to be considered a part of Debian, it must also be ‘free’ for anyone to use or modify as they please, a bold goal in a world where Intellectual Property (IP) and commerce is a key part of software development.
Debian was started around 1990 by Ian Murdock and Bruce Perens, and quickly became a favourite of the Linux community, whom previously had only used the *BSDs and Slackware. Its name is derived from Ian’s and his wife, Deborah. The version names that have been used for each release have been drawn from the Pixar movie “Toy Story”. So far we have had releases of “Bo” (Little Bo Peep), “Hamm” (the pig), “Slink” (the slinky toy), and have under development, “Woody” (the main character from the movie). Obviously the latest one has succumbed to the odd joke regarding the freezing and releasing of… .
There are two Debian flavours to choose from: the more popular GNU/Linux distribution, and the little known, and still very experimental GNU/Hurd distribution. The two centre around a different core part of an operating system: the kernel. The kernel is effectively the ‘main program’ of a computer, that looks after scheduling CPU time for other programs on the system, interacting with network devices, hard disks, memory, peripherals, and so forth. The Linux Kernel is written by Linus Tolvaalds, now a developer at Transmetta corporation, and a team of hundreds of wacky coders: Alan Cox from Red Hat, Rusty Russell (networking stack), Rick van Reil (sp?) (virtual machine), Donald Becker (most of the older NIC drivers), H. Peter Alvin, etc. The list goes on and on. It is reasonably stable, reliable, and yet continuing to grow. The alternative is the GNU/Hurd kernel (www.gnu.org/software/hurd). A project of the Gnu Free Software Foundation, it attempts to create a micro kernel, based on the old Mach kernel, and improve scalability and performance over the GNU/Linux kernel.
The Debian organisation has a parent organisation which controls financial transactions, etc, called SPI, or Software in the Public Interest.
The Debian GNU/Linux system is available to run on traditional PC (Intel archectecture x86 32-bit, or ia32) hardware, as well as Intel 64 bit (ia64), Sun Sparc and Sparc 64 bit, Mac m68k, Mac PowerPC, HP PA, MIS and MIS EL, and ARM CPUs. This makes it one of the most ported Linux distributions. Where possible, every package on one CPU chip-set is available on every other one.
There is much more to the Debian system that what I can give in a general background here; for more information, see www.debian.org.
Chapter 1
History of the Debian Package System.
SOFTWARE rarely comes as one file you find on your file-system. One typically finds some associated documentation, manual pages, copyright notices, system wide configuration files, user configuration files, cron schedules, inetd entries, log files, … the list goes on and on.
Traditionally the distribution of these programs had been in “Tarballs”, compressed Tape Archives (.tar.gz). The tarballs usually has a path relative to where they were unpacked, inheriting the ownership and umask (file permissions) of the person who actually unpacked the tarball. What you got from the tarball is just a set of files.
What was seen was a need for a system that could create installable images of software, that could have configuration scripts that would act on behalf of the package and correct settings upon installation (or removal), be upgradable in place, have file permissions and ownership set properly, and be easily maintainable.
This was part of the initial specification that Ian and Bruce used. Hence, a binary file format containing the package, and various control information was required. The named the package format a “.deb”, and created a low level tool (dpkg) and user tool (dselect) to manipulate them.
Unfortunately the user tool, dselect, was harder to use than a light bulb in a power blackout; luckily dpkg was good enough that many people just used it directly.
Using dpkg is really quite simple; while there are a hoard of options and ways in can be use, I found that there were five that I normal used. Just before I go through those, let me point out that a package has a name, a version, and a filename, where the filename is usually similar to the package name and the version with “.deb” on the end (unless you have these packages on certain file systems where the filenames are 8.3 *cough*). Eg: the package I currently maintain is “libapache-mod-backhand”, version is “1.1.1-0.pre3-7”, and the file name is “./libapache-mod-backhand_1.1.1-0.pre3-7.deb”, or whatever the path happens to be to the file. I can rename that file to whatever I like, it still contains libapache-mod-backhand version 1.1.1-0.pre3-7.
Install a package:
dpkg -i filename.deb
Which would either install, or after unpacking and getting ready to configure would complain about missing dependencies (see later), which meant I had to install something else it needed first.
Remove a package:
dpkg --remove package_name
Which will remove the executables and documentation, but would leave behind any of the configuration files that were installed (and possibly changed by the administrator). Even the configuration files can be removed if you purge the package:
dpkg --purge package_name
To list the contents of a currently installed package:
dpkg -L package_name
Or to find which package owns a file:
dpkg -S /path/to/file
Chapter 2
Features of current packages
ALL of the interesting parts of the package are really in the “control” section. This is where the version numbers, name, installation and removal scripts are located.
One concept that has really put Debian at the front of the packaging pack has been the idea of interdependencies between packages, and the “virtual package”. The idea is quite simple, and saves a lot of time and headaches for weary sysadmins.
A package may depend on another package if that other packe is required for the first package to run properly. Derr! Lets give an example, because that sounds like rubbish to me. Lets take an example of “grep”, the Gnu Regular Expression Parser utility. It is in a package; you want grep, install it (it a part of the base system, so don’t worry about it). Grep is a pretty simple utility, but it does depend on something that a large number of things need: the C library (libc).
Indeed, the dependency can be stated in one of two ways: either the depended-upon package needs to be installed for this to run, or more severely, needs to be already installed and operational before the new package is even unpacked. This is the difference between “depends” and “pre-depends”. In the case of “grep(1)” above, it says:
Pre-Depends: libc6 (>= 2.1.2)
You can draw a diagram of the dependencies between packages and see it is a connected tree or web (this is probably an interesting project if anyone is bothered; I think it was Rusty Russell who did this for Linux.conf.au in Sydney, Jan 2001, of the function calls in the Linux Kernel).
On top of this web of dependencies, we can specify that a package “provides” some functionality. Perhaps there are a number of packages that provide this functionality, and only one can be installed. For example, mail transport applications (MTA). You can have (normally) one MTA on one system at any given time, since an MTA is listening on the ‘standard’ remote port (port 25), and only one program can listen to any one port at a given time[1]. Thus, the control fields say:
Replaces: mail-transport-agent
Provides: mail-transport-agent
Now, there is no package called “mail-transport-agent”, but this functionality is provided by several packages (send mail, email, and probably version 12a of GNU hello(1)[2]).
[1] I once saw a job agency put an add out asking for people to apply, but set a challenge for prospective job seekers; you had to find out how to email them. After finding the correct server, you had to find the MTA, which was not running on port 25 but some ‘high port’. The trick to that is that you cannot use a regular mail client to send to it; the easiest thing is to speak RFC822 SMTP to it via telnet(1).
[2] If you are an author of GNU hello, one contraction for you: DON’T!
Chapter 3
Structure of the package
SO, what went into a Debian package. Well, basically it is a concatenation of a version header, a control information tar.gz, and a payload tar.gz. This was formalised a bit more in the revision when the Debian Package 2.0 format was created. You can see all the Gary details in the manual page, deb(5). Basically, the format now starts with a file specification and version information, and tarball with control information, and the payload tarball. You can see these sections with the following commands:
parrot ~> nm libapache-mod-backhand_1.1.1-0.pre4.6_i386.deb
nm: debian-binary: File format not recognised
nm: control.tar.gz: File format not recognised
nm: data.tar.gz: File format not recognised
You can see the contents of the first section quite easily using ar(1). Its contents is ASCII:
parrot:~> ar -p libapache-mod-backhand_1.1.1-0.pre4.6_i386.deb debian-binary
2.0
Which corresponds to the 2.0 of the package file format. The other two sections are much bigger, and probably of more interest, but this first part is important in that it is what enables the format to be changeable over time in a controlled and reliable manner. Every new tool is expected to read and use this “debian-binary” archive to understand the rest of the package.
Now, ar(1) and nm(1) are GNU system tools; Debian recommends that people use dpkg-deb(1) for playing with these things, because dpkg-deb knows how to read these fields and handle things for you, so the format can change in the future and you will be protected from the pain of getting lost!
The control information is, however, very simple to understand. It is ASCII text, with a one word label, a colon, and then a (set of) value(s), with continuations if the subsequence line starts with a space (which I guess was deemed better than using backslashes (\) at the end of the current line).
Here is the control information from my package:
parrot:~> dpkg-deb -I libapache-mod-backhand_1.1.1-0.pre4.6_i386.deb
new debian package, version 2.0.
size 61134 bytes: control archive= 1221 bytes.
564 bytes, 13 lines control
1133 bytes, 14 lines md5sums
313 bytes, 8 lines * postinst #!/bin/sh
222 bytes, 6 lines * prerm #!/bin/sh
Package: libapache-mod-backhand
Version: 1.1.1-0.pre4.6
Section: web
Priority: optional
Architecture: i386
Depends: libc6 (>= 2.2.2-2), libdb2 (>= 2:2.7.7-4), apache (>= 1.3.19-1)
Installed-Size: 208
Maintainer: James Bromberger <james@rcpt.to>
Description: Load balancing module for Apache web server
mod_backhand is project that allows seamless redirection of HTTP requests
from one web server to another. This redirection can be used to target
machines with under-utilized resources, thus providing fine-grained,
per-request load balancing of web requests.
A few more pieces of self-explanatory information are present. I wont go on about them, but I will talk about the first few lines. Version, sure, we can see that above using ar(1). Size of the archive and the control data, OK, thats somewhere useful and we know what that is. Next comes the information on what’s in the control tarball. There is a file called “control”, the contents of which is the rest of the screen above, a file containing md5sums of scripts and package payload, a post installation script, and a pre removal script.
There are four possible scripts that can be used in a Debian package:
- Pre installation
- Post installation
- Pre removal
- Post removal
Which would do things like:
- Start up a service after installation (postinst)
- Shutdown a service before removal (prerm)
- Download external files before installation (preinst, eg realplayer)
One of the newer features to be created in newer package is debconf. The problem it addresses is sys admin laziness. 😉 Traditionally when a set of packages are being installed, dpkg will run the post install script after each one, where those scripts may prompt the admin for configuration settings. Debconf defines a framework in which these questions can be in a control tarball, extracted off before the installation of the package, and prompted for all together before anything is installed. Thus, the admin answers all the questions, then goes back to more important things in life: harassing users[1]. Post install scripts still run after each package is run, but use the answers supplied to debconf at the beginning of the install run.
[1] See BOFH.
Chapter 4
Security
MOST of the people I know who use Debian have updated their systems off of the Internet. Some people have become very concerned that they don’t know if they can trust their downloaded packages, so signing is being implemented. What makes this easy is Debian’s stance on having strong developer authentication in place by the use of PGP or (preferably) GPG. People are worried that packages they get may be altered and not clean (ie, you run www.fbi.gov, you get packages from ftp.us.debian.org, but someone has poisend your DNS, and you don’t realise!). Signing packages, along with the Debian key ring, will give you a reassurance that the package has not been tampered with.
This is an advantage. I personally have machines with cron entries of:
apt-get update
apt-get upgrade -d
Which runs weekly to update the list of packages, and download (but not install) those that have been updated. Why: laziness. I can’t be bothered to wait while they download, especially when this link was a 56 K modem. So if this fires up at 3 am, it can download the new X Windows packages (50 MB) without me watching it for a few hours. I can then come along at 10 when I wake up, and see the list of new packages, and then install them (or check them first, md5sum them, etc). Signing these packages will give me an extra assurance that they are kosher.
Chapter 5
Apt.
THE Advanced Package Tool. Really quite easy to use, and something that Red Hat and Eazle are now getting into. Debian is available on around 50 archives worldwide, and many more private archives. A sys admin plugs in an ordered list of archives into a configuration file, and apt can get list of the packages available at each one, download newer version, resolve dependencies and get required packages to satisfy those dependencies and install them.
The commands are quite simple, and in reality, comes down to the following things:
- Your sources are in /etc/apt/sources.list
- You get a list of packages using apt-get update
- You download and install using apt-get upgrade
Repeat steps 2 and 3 as needed (weekly/daily). Apt keeps a cached copy of (partially, or completely) downloaded packages, so you can uninstall and reinstall and not download a second time. Some people have even NFS exported apt cache directories to share it between machines. Probably possible, but why not use a proxy cache? *shrug*
Chapter 6
Alien
PROBLEM: everyone with a commercial product supporting “Linux”, releases an RPM. SOLUTION: Alien.
Alien is a tool that groks the RPM package format, rips it open, reformats the control information, and re packs it into a Debian archive. Its pretty rudimentary; no dependencies or conflicts are allowed for, so you can get into trouble here, but it is convenient. You can remove the package after you have finished with it if you so desire, and dpkg will remove it faithfully.
Chapter 7
New Features in 1.9
Woody has a new version of dpkg, 1.9. Aside from speedups and clean ups and avoiding SEGFAULTs, it is portable to OpenBSD (haha!), support for GPG signed packages (above), and a dpkg configuration file has been added. Dpkg is the crux of the system, and although personally I have had no problems using it since 1994, this rework is aparntly quite big and important. The change log entry for it is quite large, and I recommend those interested to glance over it (/usr/share/doc/dpkg/changelog.Debian.gz).
Chapter 8
Creating Packages
There are two types of packages: binary and source. Source are the newer, and it is recommended that everyone make source packages where possible. Binary packages are compiled by hand on a system, sorted into order, and then archived into a .deb for a specific architecture (i386, etc).
Source packages take this back a step. You arrange your source in such a way that Debian tools can unpack the source, run configure scripts, make files, and everything needed to non-interactively compile the binary files required, assemble a binary package, and create the archive.
The advantage of source packages becomes very clear when you look at all of the architectures supported. In place of you manually creating your binary package on each machine, you can submit it to an autobuild system (Debian’s one is called Katie; I don’t know why…). Source packages are given to Katie, and it/she gives back an array of platform specific binary packages.
In creating packages, you need to adhear to the Debian Policy for file system layout. Much debate rages on the appropriate debian lists on who is putting what into where in the Debian file system tree, and while this may seem petty, it is this clear direction and consistency that makes Debian so easy to use. The policy covers items such as shared libraries, configuration files, Perl modules, etc.
For more information, see Debian.org, the manual page for deb(5), the manual page for dpkg-deb(1), the manual page for dpkg(1), the manual page for apt(1).