AWS CodeBuild: Lambda Support

A few days ago, AWS announced Lambda support for their CodeBuild service.

CodeBuild sits amongst a slew of Code* services, which developers can pick and choose from to get their jobs done. I’m going to concentrate on just three of them:

  • CodeCommit: a managed Git repository
  • CodeBuild: a service to launch compute to execute software compile/build/test jobs
  • CodePipeline: a CI/CD pipeline that helps orchestrate the pattern of build and release actions across different environments

My common use case is for publishing (mostly) static web sites. Being a web developer since the early 1990s, I’m pretty comfortable with importing some web frameworks, writing some HTML and additional CSS, grabbing some images, and then publishing the content.

That content is typically deployed to an S3 Bucket, with CloudFront sitting in front of it, and Route53 doing its duty for DNS resolution… times two or three environments (dev, test, prod).

CodePipeline can be automatically kicked off when a developer pushes a commit to the repo. For several years I have used the native CodePipeline service to deploy this artifact, but there have always been a few niggles.

As a developer, I also like having some pre-commit hooks. I like to ensure my HTML is reasonable, that I haven’t put any credentials in my code, and so on. For this I use the pre-commit framework.

Here’s my “.pre-commit-config.yaml” file, that sits in the base of my content repo:

repos:
-   repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
    -   id: mixed-line-ending
    -   id: trailing-whitespace
    -   id: detect-aws-credentials
    -   id: detect-private-key
-   repo: https://github.com/Lucas-C/pre-commit-hooks-nodejs
    rev: v1.1.2
    hooks:
    -   id: htmllint

There are a few more “dot” files, such as “.htmllintrc”, that also get created and persisted to the repo, but here’s the catch: I want them there for the developers, but I want them purged when being published.

Using the original CodePipeline with the native S3 Deployer was simple, but it didn’t give me the opportunity to tidy up. That would require CodeBuild.

However, until this new announcement, using CodeBuild meant defining a whole EC2 instance (and a VPC for it to live in) and waiting the 20-60 seconds for it to start before running your code. The time, and cost, wasn’t worth it in my opinion.

Until now, with Lambda.

I defined (and committed to the repo) a buildspec.yml file, and the commands in the build section show what I am tidying up:

version: 0.2
phases:
  build:
    commands:
      - rm -f buildspec.yml .htmllintrc .pre-commit-config.yaml package-lock.json package.json
      - rm -rf .git

Yes, the buildspec.yml file is one of the files I don’t want to publish.

Time to change the pipeline order, and include a Build stage that creates a new output artifact. The above buildspec.yml file then gains an additional section at the end:

artifacts:
  files:
    - '**/*'

In the CodeBuild job config, we define a new name for the output artifact; in this case I called it “TidySource”. However, there was an issue with the output artifact from this.

When CodeCommit triggers a build, it makes a single artifact available to the pipeline: the ZIP contents from the repo, in an S3 Bucket for the pipeline. The format of this object’s key (name) is:

s3://codepipeline-${REGION}-${ID}/${PIPELINENAME}/SourceArti/${BUILDID}

The original S3 Deployer in CodePipeline understood that, and gave you the option to decompress the object (zip file) when it put it in the configured destination bucket.

CodeBuild supports multiple artifacts, and its format for the output object defined from the buildspec is:

s3://codepipeline-${REGION}-${ID}/${PIPELINENAME}/SourceArti/${BUILDID}/${CODEBUILDID}

As such, the S3 Deployer would look for an object matching the first syntax, and fail.

Hmmm.

OK, I had one more niggle about the S3 Deployer: it doesn’t tidy up. If you delete something from your repo, the S3 deploy does not delete it from the deployment location; it just unpacks over the top, leaving previously deployed files in place.

So my last change was to ditch both the output artifact from CodeBuild and the original S3 Deployer itself, and use the trusty aws s3 sync command, with a few variables in the CodePipeline:

- aws s3 --region $AWS_REGION sync . s3://${S3DeployBucket}/${S3DeployPrefix}/ --delete
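Putting the pieces together, the whole buildspec then looks something like the sketch below. This assumes the S3DeployBucket and S3DeployPrefix variables are supplied as environment variables on the CodeBuild action; your variable names and phases may differ.

```yaml
version: 0.2
phases:
  build:
    commands:
      # Purge the developer-only dot files so they are never published
      - rm -f buildspec.yml .htmllintrc .pre-commit-config.yaml package-lock.json package.json
      - rm -rf .git
      # Sync the remaining content to the deployment bucket, deleting
      # anything in the bucket that no longer exists in the repo
      - aws s3 --region $AWS_REGION sync . s3://${S3DeployBucket}/${S3DeployPrefix}/ --delete
```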

So the pipeline now looks like:

You can view the resulting web site at https://cv.jamesbromberger.com/. The footnotes there describe some of the pipeline. Now I have a new place to play in: automating some of the framework management via NPM during the build phase, and running a few sed commands to update resulting paths in HTML content.

But my big wins are:

  1. You can’t hit https://cv.jamesbromberger.com/.htmllintrc any more (not that there was anything sensitive in there, but I like to be… tidy).
  2. Older versions of frameworks (Bootstrap) are no longer lying around in /assets/bootstrap-${version}/.
  3. It’s not costing me more time or money to do this tidy-up, thanks to Lambda.

IPv6 for AWS Lambda connections (outbound)

Another step forward recently with the announcement that AWS Lambda now supports IPv6 for connections made from your Lambda-executed code.

It’s great to see another minor improvement like this. External resources that your service depends upon (APIs, etc.) should now see connections over IPv6.

If you host an API, then you should be making it dual-stack in order to facilitate your clients making IPv6 connections, avoiding things like the small charges and complexity that come with using up scarce IPv4 addresses.

However, this is also useful if you’re trying to access private resources within an AWS VPC.

VPC subnets can be IPv4-only, dual-stack, or IPv6-only. Taking the IPv6-only approach permits you to provision vast numbers of EC2 instances, RDS databases, etc. Now your Lambda code can access those services directly, without needing a proxy (and the bottleneck that implies) to do so.
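Enabling this for an existing VPC-attached function is a single configuration flag. A sketch, assuming a dual-stack subnet; the function name and resource IDs below are placeholders:

```shell
# Allow a dual-stack VPC-attached Lambda function to make outbound
# IPv6 connections, via the Ipv6AllowedForDualStack setting added
# with this announcement. All IDs are illustrative placeholders.
aws lambda update-function-configuration \
  --function-name my-function \
  --vpc-config 'SubnetIds=subnet-0abc123,SecurityGroupIds=sg-0abc123,Ipv6AllowedForDualStack=true'
```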

At some stage, we’ll be looking at VPCs that are IPv6 only, with only API Gateways and/or Elastic Load Balancers being dual stack for external inbound requests.

Presumably Lambda will be dual-stack for some time, but perhaps there is a future possibility that IPv6-only Lambda could be a thing, ditching the IPv4 requirement completely for use cases that support it. Even then, having a VPC Lambda connecting to an IPv6-only subnet, but with DNS64 and NAT64 enabled, would still permit backwards connectivity to IPv4-only services. It’s a few hoops to jump through, but could be useful for the rare IPv4-only services being accessed from your code.

AWS SkillBuilder: moving to the new AWS Builder ID

For anyone learning AWS, the online learning platform that is SkillBuilder.aws is a well-known resource. It took over from training.aws as the training platform several years ago, though the latter lives on in a very minor way, as a bridge between SkillBuilder and the AWS Certification portal.

Many individuals hold AWS certifications; personally, I currently hold 9 of them. The majority of people have those certifications in an AWS Training & Certification account that is accessed via a personal login, something not linked to your employer. This is because certifications remain attached to the individual, not the employer.

Historically, this meant using Login with Amazon, to use the same credentials you could potentially use on the Amazon.com retail platform. Yes, the same credentials that some people buy underpants with also link to your AWS Certifications.

In the AWS Partner space, individuals often end up needing to acquire AWS Accreditations; however, these are not exposed on SkillBuilder.aws to individuals not linked to a recognised AWS Partner organisation. Instead, individuals must register in the AWS Partner portal, using an email address whose domain links to a given AWS Partner entity.

From the Partner portal, a user can then access SkillBuilder, using the AWS Partner Portal as a login provider, triggering the creation of a (second) SkillBuilder login, but this time linked as a Partner account.

This is clearly confusing, having multiple logins.

SkillBuilder has evolved, and now also offers paid subscriptions for enterprises. For that set of organisations, federated logins (SSO) are available.

Now, the situation is changing again for individuals. AWS has introduced their separate identity store, called the AWS Builder ID.

Login prompt from Skillbuilder, 2023

Luckily, this new AWS Builder ID does not create a separate identity within SkillBuilder, but adds an additional login for the account linked to your existing Login with Amazon identity.

Onboarding to this is easy. The process validates that you control the email address, and then you can simply use the AWS Builder ID login option where you used to use Login with Amazon.

I’d suggest if you are learning AWS, start using your AWS Builder ID to access SkillBuilder.

Software License Depreciation in a Cloud World

Much effort is spent on preserving and optimising software licenses when organisations shift their workloads to a cloud provider. It’s seen as a “sunk cost”, something that needs to be taken whole into the new world, without question.

However, some vendors don’t like their customers using certain cloud providers, and are making things progressively more difficult for organisations that want, or are required, to keep their software stack well maintained.

Case in point: one software vendor, which has its own cloud platform, made significant changes to its licensing, progressively removing customers’ rights to run their acquired licences in a competitor’s cloud.

I say progressively because customers can continue to run (now) older versions of the software from before the point in time the licensing was modified.

The Security Focus

Security in IT is a moving target. There are always better ways of doing something, and previous ways which once were the best, but are now deemed obsolete.

Let me give you a clear example: network encryption in flight. The dominant protocol used to negotiate this is called Transport Layer Security (TLS), and it’s something I’ve written about many times. There are different versions (and if you dig back far enough, it even had a different name: SSL, or Secure Sockets Layer).

Older TLS versions have been found to be weak, and newer versions have been implemented.

Certain industry regulators have mandated that only the latest versions be used.

Support for TLS is embedded in both your computer’s operating system and certain applications that you run. This permits an application to make outbound connections using TLS, as well as to listen for and receive connections protected with TLS.

Take a database server: it’s listening for connections. Unless you’ve been living under a rock, the standard approach these days is to insist on encryption in flight in each segment of your application. Application servers may access your database, but only if the connection is encrypted, despite them sitting in the same data centre, possibly in the same rack or on the same physical host! It’s an added layer of security, and the optimisations done mean it’s rarely a significant overhead compared to the eavesdropping protection it grants you.

Your operating system from, say, 2019 or before may not support the latest TLS 1.3; some vendors were pretty slow to implement support for it, and only did so when you installed a new version of the entire operating system. And some application providers didn’t integrate the increased capability (or a control to permit or limit the version of TLS) into those older versions of their software from 2019 or earlier.
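A quick way to see what your local TLS stack is capable of is to ask OpenSSL which TLS 1.3 ciphersuites it supports; an empty list suggests a pre-1.1.1 library from exactly that 2019-and-earlier era. (This checks the local library only, not any particular application.)

```shell
# List the ciphersuites available when negotiating TLS 1.3;
# OpenSSL builds older than 1.1.1 (circa 2018) have none.
openssl ciphers -s -tls1_3
```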

But in newer versions they have fixed this.

Right now, most compliance programs require only TLS 1.2 or newer, but it is foreseeable that in future, organisations will be required to “raise the bar” (or drawbridge) to use only TLS 1.3 (or newer), at which time, all that older software becomes unusable.

Those licences become worthless.

Of course, the vendor would love you to take a new licence, but only if you don’t use other cloud providers.

Vendor Stickiness

At this time, you may be thinking that this is not a great customer relationship. You have an asset that, over time, will become useless, and you are being restricted from using your licence under newer terms.

The question then turns to “why do we use this vendor?”. And often it is for historical reasons: “we’ve always used XYZ database”, “we already have a site licence for their products, so we use it for everything”. Turns out, that’s a trap. Trying to squeeze out cost savings by forcing technology decisions based on what you already have may preclude you from having flexibility in your favour.

For some in the industry, the short-term goal is the only objective; they sign a purchase order to reach an immediate objective, without taking the longer-term view of where that is leading the organisation, even if it’s backing them into a corner. They celebrate the short-term win, get a few games of golf out of it, and then go hunting for their next role elsewhere, using the impressive short-term saving as their report card.

A former colleague of mine once wrote that senior executive bonuses shouldn’t be paid out in the same calendar year, but delayed (perhaps 3 years) to ensure that the longer term success was the right outcome.

Those with more appetite for change have, over the last decade, been embracing Open Source solutions for more of their software stack. The lack of licence restrictions, and licence costs, makes it palatable.

The challenge is having a team who can not only implement potential software changes, but also support a new component in your technology stack. For incumbent operations and support teams, this can be an upskilling challenge; some won’t want to learn something new, and will churn up large amounts of Fear, Uncertainty and Doubt (FUD). Ultimately, they argue it is better to just keep doing what we’ve always done and pay the financial cost, instead of making the effort to do something better.

Because better is change, and change is hard.

An Example

Several years ago, my colleagues helped rewrite a Java-based application, changing the database from Oracle to PostgreSQL. It was a few months from start to finish, with significant testing. Both the Oracle and PostgreSQL databases ran happily on the AWS Relational Database Service (RDS). The database was simple table storage, but the original application developers already had a site licence for Oracle, and since that’s what they had, that’s what they used.

At the end of the project, the cost savings were significant. The return on investment for the project services to implement the change was around 3 months, and now, years later, the client is so much better off financially. It changed the trajectory of the TCO spend.

The coming software apocalypse

So all these licences that hold back innovation are becoming progressively problematic. The next time security requirements tighten, you’re going to hear a lot of very large, legacy software licence agreements disintegrate.

Meanwhile, some cloud providers can bundle the software licence into the hourly compute usage fee. If you use it, you pay for it; when you don’t use it, you don’t pay for it. If you want a newer version, you have the flexibility to move to it. Or perhaps even to stop using it.

Time to minimise public IPv4 usage in the AWS Cloud

It was always going to happen. We’ve been watching the exhaustion of the 32-bit address space of IPv4 for more than 20 years, and we’ve had the solution available for even longer: IPv6.

I’ve written many times about IPv6 adoption and migration on this blog. I’ve spoken many times with colleagues about it. I’ve presented at AWS User Groups about using IPv6 in AWS. And when I worked at AWS 10 years ago, I championed it as a competitive advantage to IPv6-enable all the things where IPv4 was in use.

The adoption has been slow. Outside of the Cloud, ISP support has been mixed, depending on whether they have the engineering capability to uplift legacy networks or not. Let’s be clear: those ISPs who removed their engineers and minimised innovation are about to have a lot of work to do, or face tough conversations with customers.

For those that have already done the work, this week’s AWS announcement about charging for public IPv4 address space from 2024 is a non-issue. For others, it’s going to mean some action.


Let’s start with the basics; go have a read of the AWS announcement: New – AWS Public IPv4 Address Charge + Public IP, posted 28 July 2023.

You’re back? OK, so at the time of blogging, charges start in 2024. Currently, the first IPv4 address assigned to an instance is not charged for, but soon it will be half a US cent per hour, or, on a 744-hour month, US$3.72 a month. Not much, unless you have hundreds of them.
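Checking that arithmetic:

```shell
# USD 0.005 per hour, over a 744-hour (31-day) month,
# per public IPv4 address
awk 'BEGIN { printf "USD %.2f per month\n", 0.005 * 744 }'
# → USD 3.72 per month
```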

Selling an IPv4 netblock

In the last few years I helped a government agency “sell” an unused /16 IPv4 netblock for several million dollars. They had two of them, and had only ever used a few /24 ranges from their first block; the second block was not even announced anywhere. There was no sound plan for keeping them.

The market price for a large contiguous block of addresses keeps going up; four years ago it was around US$22 per IPv4 address (and a /16 is 65,536 of them, so just over US$1.4M). Over time, large contiguous address blocks have become more valuable. Only one event would stop this: no one needing them any more. And that event is the tipping point into widespread (default) usage of IPv6, at which point they drop towards worthless.
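The numbers, for the record:

```shell
# A /16 holds 2^16 = 65,536 addresses; at USD 22 per address:
echo $(( 65536 * 22 ))
# → 1441792, i.e. just over USD 1.4M
```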

The tipping point just got closer.

Bringing it back to now

So with this announcement, what do we see? Well, this kind of sums it up:

Congratulations, your IPv6 migration plan just got a business case, AWS is now charging for v4 addresses. v6 is free, and the sky has finally fallen:

Nick Matthews @nickpowpow

There have been many IPv6 improvements over the years, but few deployments are ready to ditch IPv4 altogether. Anything with an external dependency that only supports IPv4 is going to be a bit of a pain.

Luckily, AWS has made NAT64 and DNS64 available, which let IPv6-only hosts contact IPv4-only hosts.
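DNS64 works by synthesising AAAA records that embed the target’s IPv4 address in the well-known 64:ff9b::/96 prefix (RFC 6052); NAT64 then translates packets sent to those synthetic addresses back to IPv4. A small illustration of the mapping, using the documentation address 192.0.2.1:

```shell
# Map an IPv4 address into the NAT64/DNS64 well-known prefix
# 64:ff9b::/96: the last 32 bits of the IPv6 address are the
# four octets of the IPv4 address.
printf '64:ff9b::%x%02x:%x%02x\n' 192 0 2 1
# → 64:ff9b::c000:201
```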

The time has come to look at the business partners you work with (those you have API interfaces to) and have the IPv6 conversation. It’s going to be a journey, but at this stage it’s one that some in the industry have been on since the last millennium (I used Hurricane Electric’s TunnelBroker IPv6 tunnelling service in the late 1990s from UWA).

Looking at your personal ISP and Mobile/Cell provider

It’s also time to start reconsidering your home ISP and mobile/cell provider if they aren’t already providing you with real IPv6 addresses. I swapped home Internet providers in Australia several years ago, tired of the hollow promises of native IPv6 from one of Australia’s largest and oldest ISPs, started by an industry friend of mine in Perth many years ago (who has not been associated with it for several years). When the ISP was bought out, many of the talented engineers left (one way or another), and it was clear they weren’t going to implement new and modern transport protocols any time soon.

Looking at your corporate IT Dept

Your office network is going to need to step up eventually. This is likely to be difficult, as corporate IT departments are often understaffed when it comes to these kinds of changes. They often outsource to managed service providers, many of whom don’t look to the future to anticipate what their customers will need, but instead minimise the present cost of “keeping the lights on”. This is because customers often buy on cost, not on quality or value, in which case the smart engineers are elsewhere.

Your best hope is to find the few technically minded people in your organisation who have already done this, or are talking about it, and get them involved.

Looking at your internet-facing services

There’s only one thing to do, ASAP: dual-stack everything that is [public] Internet facing. Monitor your integration partners for traffic that uses IPv4, and talk to them about your IPv6 migration plans.

It’s worth watching for when organisations make this switch. There are many ways to do it.

For web sites and HTTP/HTTPS APIs, consider using a CDN that sits in front of your origin server and, as the front door to your service, can be dual-stack for you. Amazon CloudFront has been a very flexible way to do this for years, but you must remember both steps:

  1. Tick “Enable IPv6” on the CloudFront distribution.
  2. Add an AAAA record to your DNS for the desired hostname, alongside the existing A record.
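If your DNS is in Route53, the second step can be done as an alias record. A sketch, where the hosted zone ID, hostname, and distribution domain are placeholders for your own (Z2FDTNDATAQYW2 is CloudFront’s fixed hosted zone ID for alias targets):

```shell
# Add an AAAA alias record pointing at a CloudFront distribution.
# ZEXAMPLE12345, www.example.com and d1234abcd.cloudfront.net are
# illustrative placeholders.
aws route53 change-resource-record-sets \
  --hosted-zone-id ZEXAMPLE12345 \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "AAAA",
        "AliasTarget": {
          "HostedZoneId": "Z2FDTNDATAQYW2",
          "DNSName": "d1234abcd.cloudfront.net",
          "EvaluateTargetHealth": false
        }
      }
    }]
  }'
```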

The Long Term Future

IPv4 will go away, one day.

It may be another 20 years, or it may now be sooner, given the economic pressures starting to appear. Eventually the world will move on past Vint Cerf’s experiment that, from the 1970s, has outlasted all expectations. IPv4 was never supposed to scale to all of humanity. But its replacement, IPv6, is likely to outlast all of us alive today.


EDIT: Cross link to Greg Cockburn’s recent AWS IPv6 post, and Corey Quinn’s post on the topic.