Official Debian Images on Amazon Web Services EC2

Official Debian AMIs are now on Amazon web Services

Please Note: this article is written from my personal perspective as a Debian Developer, and is not the opinion or expression of my employer.

Amazon Web Service‘s EC2 offers customers a number of Operating Systems to run. There are many Linux Distributions available, however for all this time, there has never been an ‘Official’ Debian Image – or Amazon Machine Image (AMI), created by Debian.

For some Debian users this has not been an issue as there are several solutions of creating your own personal AMI. However for the AWS Users who wanted to run a recognised image, it has been a little confusing at times; several Debian AIMs have been made available by other customers, but the source of those images has not been ‘Debian’.

In October 2012 the AWS Marketplace engaged in discussions with the Debian Project Leader, Stefano Zacchiroli. A group of Debian Developers and the wider community formed to generated a set of AMIs using Anders Ingemann’s ec2debian-build-ami script. These AMIs are published in the AWS Marketplace, and you can find the listing here:

No fees are collected for Debian for the use of these images via the AWS Marketplace; they are listed here for your convenience. This is the same AMI that you may generate yourself, but this one has been put together by Debian Developers.

If you plan to use this AMI, I suggest you read http://wiki.debian.org/Cloud/AmazonEC2Image, and more explicity, SSH as the user ‘admin and then ‘sudo -i‘ to root.

Additional details

Anders Ingemann and others maintain a GitHub project called ec2debian-build-ami which generates a Debian AMI. This script supports several desired features, an was also updated to add in some new requirements. This means the generated image supports:

  • non-root SSH (use the user ‘admin)
  • secure deletion of files in the generation of the image
  • using the Eucalyptus toolchain for generation of th eimage
  • ensuring that this script and all its dependencies are DFSG compliant
  • using the http.debian.net redirector service in APT’s sources.list to select a reasonably ‘close’ mirror site
  • and the generated image contains only packages from ‘main’
  • plus minimal additional scripts (nuder the Apache 2.0 license as in ec2debian-build-ami) to support:
    • fetching the SSH Public Key for the ‘admin’ user (sudo -i to gain root)
    • executing UserData shell scripts (example here)

Debian Stable (Squeeze; 6.0.6 at this point in time) does not contain the cloud-init package, and neither does Debian Testing (Wheezy).

A fresh AWS account (ID 379101102735) was used for the initial generation of this image. Any Debian Developer who would like access is welcome to contact me. Minimal charges for the resource utilisation of this account (storage, some EC2 instances for testing) are being absorbed by Amazon for this. Co-ordination of this effort is held on the debian-cloud mailing list.

The current Debian stable is 6.0.6 ‘Squeeze‘, and we’re in deep freeze for the ‘Wheezy‘ release. Squeeze has a Xen kernel that works on the Parallel Virtual Machine (PVM) EC2 instance, and hence this is what we support on EC2. (HVM images are a next phase, being headed up by Yasuhiro Akarki <ar@d.o>).

Marketplace Listing

The process of listing in the AWS Marketplace was conducted as follows:

  • A 32 bit and 64 bit image was generated in US-East 1, which was AMI IDs:
    • ami-1977f070: 379101102735/debian-squeeze-i386-20121119
    • ami-8568efec: 379101102735/debian-squeeze-amd64-20121119
  • The image was shared ‘public’ with all other AWS users (as was the underlying EBS snapshot, for completeness)
  • The AWS Marketplace team duplicated these two AMIs into their AWS account
  • The AWS Marketplace team further duplicated these into other AWS Marketplace-supported Regions

This image went out on the 19th of November 2012. Additional documentation was put into the Wiki at: http://wiki.debian.org/Cloud/AmazonEC2Image/Squeeze

A CloudFormation template may help you launch a Debian instance by containing a mapping to the relevent AMI in the region you’re using: see the wiki link above.

What’s Next

The goal is to continue stable releases as they come out. Further work is happening to support generation of Wheezy images, and HVM (which may all collapse into one effort with a Linux 3.x kernel in Wheezy). If you’re a Debian Developer and would like a login to the AWS account we’ve been using, then please drop me a line.

Further work to improve this process has come from Marcin Kulisz, who is starting to package ec2debian-build-ami into a Debian: this will complete the circle of the entire stack being in main (one day)!

Thanks goes to Stefano, Anders, Charles, and everyone who  contributed to this effort.

Resources

Courier IMAP and FAM

Last Friday, while tracking Debian Testing, the courier package was updated, and while authentication could be seen to be successful, actually using IMAP seemed to fail.

Turns out the FAM package was somehow to blame; installing fam and libfam0 was the solution. This uninstalled gamin for me. So if you’re pulling your hair out with a similar courier/imap issue, then perhaps have a look at the courier-imap mailing list.

Goodbye Linux.2.6.x

It’s taken some time, but now none of my personal Linux hosts (4 in total) are running the 2.6 kernel any more.

From the start (January) my company web host on Amazon EC2 has been running a 3.x kernel. My little Acer Aspire Revo low power home server, with attached disk pack that sits in my shed in a network cabinet has run 3.x for the last 6 months or so. My Linux laptop (Dell Studio 1558) which only recently got installed (and, since removing Windows, hasn’t overheated once!) went to 3.x immediately. And the last piece of the puzzel is a virtual machine I’ve had for many years with Bytemark.co.uk – they’re now offering a 3.2 kernel in their menu of selectable kernels.

Not that 3.x is that much different than 2.6.3x; but its a line in the sand of feature and security thats easy to identify. But with nearly 15 years of looking at a 2.x kernel, its about time we moved to 3.x!

Hurricane Electric IPv6 tunnel MTU

I’ve been running an IPv6 tunnel for a long time, but occasionally I’ve been seeing traffic hang on it. It looks like it was the MTU, defaulting at 1500 bytes, causing issues when large amounts of data were being shuffled OUT from my Linux box, back to the ‘net’.

The fix is easy: /etc/network/interfaces should have an “up” line for the interface definition saying: up ip link set mtu 1280 dev henet, where henet is the name of your tunnel interface.

Easy enough to skip this line if your tunnel appears to be working OK, but interesting to track down.

Debian Wheezy: US$19 Billion. Your price… FREE!

As many would know, Debian GNU/Linux is one of the oldest, and the largest Linux distributions that is available for free. Since it was first released in 1993, several people have analysed the size and produced cost estimates for the project.

In 2001, Jesús M. González-Barahona et al produced an article entitled “Counting Potatoes“, an analysis of Debian 2.2 (code named Potato). When Potato was released in June 2003, it contained 2,800 source packages of software, totalling around 55 million lines of source code. When using David A. Wheeler’s sloccount tool to apply the COCOMO model of development, and an average developer salary of US$56,000, the projected development cost that González-Barahona calculated to start-from-scratch and build Debian 2.2 in 2003 was US$1.9 billion.

In 2007 an analysis entitled ‘Macro-level software evolution: a case study of a large software compilation‘ by Jesús M. González-Barahona, Gregorio Robles, Martin Michlmayr, Juan José Amor and Daniel M. German was released. It found that Debian 4.0 (codename Etch released April 2007) had just over 10,000 source packages of software and 288 million lines of source code. This analysis also delved into the dependencies of software packages, and the update flow between Debian release (not all packages are updated with each release).

Today (February 2012) the current development version of Debian, codenamed Wheezy, contains some 17,141 source packages of software, but as it’s still in development this number may change over the coming months.

I analysied the source code in Wheezy, looking at the content from the “original” software that Debian distributes from its upstream authors without including the additional patches that Debian Developers apply to this software, or the package management scripts (used to install, configure and de-install packages). One might argue that these patches and configuration scripts are the added value of Debian, however the in my analysis I only examined the ‘pristine’ upstream source code.

By using David A Wheeler’s sloccount tool and average wage of a developer of US$72,533 (using median estimates from Salary.com and PayScale.com for 2011) I summed the individual results to find a total of 419,776,604 source lines of code for the ‘pristine’ upstream sources, in 31 programming languages — including 429 lines of Cobol and 1933 lines of Modula3!

In my analysis the projected cost of producing Debian Wheezy in February 2012 is US$19,070,177,727 (AU$17.7B, EUR€14.4B, GBP£12.11B), making each package’s upstream source code wrth an average of US$1,112,547.56 (AU$837K) to produce. Impressively, this is all free (of cost).

Zooming in on the Linux “Kernel”

In 2004 David A. Wheeler did a cost analysis of the Linux Kernel project by itself. He found 4,000,000 source lines of code (SLOC), and a projected cost between US$175M and US$611M depending on the complexity rating of the software. Within my analysis above, I used the ‘standard’ (default) complexity with the adjusted salary for 2011 (US$72K), and deducted that Kernel version 3.1.8 with almost 10,000,000 lines of source code would be worth US$540M at standard complexity, or US$1,877M when rated as ‘complex’.

Another Kernel Costing in 2011 put this figure at US$3 billion, so perhaps there’s some more variance in here to play with.

Individual Projects

Other highlights by project included:

Project Version Thousands
of SLOC
Projected cost
at US$72,533/developer/year
Samba 3.6.1 2,000 US$101 (AU$93M)
Apache 2.2.9 693 US$33.5M (AU$31M)
MySQL 5.5.17 1,200 US$64.2M (AU$59.7M)
Perl 5.14.2 669 US$32.3M (AU$30M)
PHP 5.3.9 693 US$33.5M (AU$31.1M)
Bind 9.7.3 319 US$14.8M (AU$13.8M)
Moodle 1.9.9 396 US$18.6M (AU$17.3M)
Dasher 4.11 109 US$4.8M (AU$4.4M)
DVSwitch 0.8.3.6 6 US$250K (AU$232K)

Debian Wheezy by Programming Language

The upstream code that Debian distributes is written in many different languages. ANSI C with 168,536,758 is the dominant language (40% of all lines), followed by C++ at 83,187,329 (20%) and Java with 34,698,990 (8%).

Line chart
Break down of Wheezy by Language

If you are intersted in finding the line count and cost projections for any of the 17,000+ projects, you will find them in the raw data CSV.

Other Tools and Comparisons

Ohcount is another source code cost analysis tool. In March 2011 Ohcount was run across Debian Sid: its results are here. In comparison, its results  appear much lower than the sloccount tool. There’s also the Ohloh.net Debian Estimate which only finds 55 Million source lines of code and a projected cost of US$1B. However Ohloh uses Ohcount for its estimates, and seems to be to be around 370 million SLOC missing compared to my recent analysis.

Summary

Over the last 10 years the cost to develop Debian has increased ten-fold. It’s intersting to know that US$19 billion of software is available to use, review, extend, and share, for the bargain price of $0. If we were to add in Debian patches and install scripts then this projected figure would increase. If only more organisations would realise the potential they have before them.

Need help with Linux (including Debian), Perl, or AWS? See www.jamesbromberger.com.