Log4Shell and your apps in AWS

There’s a great XKCD cartoon entitled Depencency that cuts to the heart of today’s software engineering world: developers (and in turn organisations) everywhere love the use of libraries to accelerate their development efforts, particularly if that library of code is free to use, and typically that’s Open Source Free.

The image speaks about large complex systems, critical to organisations, needing the unpaid, thankless contributors of these libraries but upon whom everything relies.

In the last week, we’ve seen Log4J, a Java logging utility, come under such focus due to a critical remote code execution bug that can see the server side triggered to make outbound requests. A vast amount of Java based solutions for the last 15+ years has dependencies on logging messages being implemented using this library.

It’s lead to articles like “The Internet is on Fire” by the Australian Broadcasting Corporation, and thousands of posts on Infosec specific news sites (eg: (Kaspersky, Postswigger, SecurityScorecard, BleepingComputer).

Java is widely used, as Oracle corporation points out clearly:

3 Billion Devices Run Java – Oracle

There’s two sides to this: invalid requests coming in that should be handled with sensible data validation, and the resulting external requests that servers can be tricked into making.

Now I am not saying everyone should use their own logging library; that would be even more on fire. But we should stand ready to update these things rapidly, and we should help with either code contributions or financial donations (or both) to help improve this for the common good.

Untrusted Data Validation

Validating untrusted data sources is critical. The content of a local configuration file is vastly different from the query from the Internet. I’ve often joked about setting my Browser user-agent string to the EICAR test file content, used as a dummy value to trigger Antivirus software to match on this text.

In this case, we have remote attackers stuffing custom generated data strings in HTTP requests (and email and other sources that accept external traffic/data) to try and trick the Log4j library into processing and interpreting this data instead of just writing it to a log file.

Web servers always accept data from the Internet, and Web Application Firewalls can offer some protection, but in this case, the actual “string to check” can be escaped, making it harder to write simple rules that match.

Restricting outbound traffic

An attacker is often trying to get a better access into the systems they target; their initial foothold may be tentative. In this example, the ability to trick a target server to fetch additional data (payload) from an external service is key. There’s two main types of external data egress: direct, and indirect.

In the direct model, your server, which you installed and thus trust, may be running behind a firewall, but have you checked if you have restrictions on what it can fetch directly from the Internet?

In AWS, the default AWS Security Group for egress is to permit all traffic; this is a terrible idea, but is the element of least surprise for those new to the AWS VPC environment. It is strongly recommended that you pair this down for all applications, to end up with only the minimum network access you need, even when behind a (managed) NAT Gateway or routing rules, and even if you think your server only has internal network access.

I wrote a whitepaper on this topic for Modis in 2019 about Lateral Movement within the AWS VPC, and some of the concepts there are relevant now.

Your VPC-deployed virtual machine instance probably only needs to initiate connections to S3 on 443, and its database server on the local CIDR (address) range. For example, if you have three Subnets for databases:

  • (Databases in AZ-A)
  • (Databases in AZ-B)
  • (Databases in AZ-C)
  • (reserved for future expansion of Databases in a yet-to-be announced AZ-D)

… and are running MySQL (eg, RDS MySQL) in those AZs, then you probably want an egress rule on your Application Server/instance of (Note, be ready for making this all IPv6 in future). However, your inbound rule on the same group is probably referential to your managed Load Balancer, on port 443.

What about DNS and Time Sync?

If you have cut down your egress to just the two rules (HTTPS for S3 to bootstrap, CFN-init to signal ASG creation, and database traffic), what about things like DNS and Time. These are typically UDP based (ports 53, 123).

Indeed, the typical DNS firewall used for NTP, when syncing from external time services, is *:123 inbound and *:123 outbound. Ouch.

AWS Time Sync Service

The good news is you do not need to permit this in your security group rules IF you are using the AWS VPC provided Time Sync service and DNS Resolvers. These are available over the link-local network, and security groups do not restrict this traffic; hence can be left closed for UDP port 123.

This time service is also scalable; you don’t need to have thousands of hosts pointing at one or two of your own NTP servers; the AWS Time Sync service runs from the hyper-visor, so as you randomly add instances, you have more physical nodes (droplets) involved in provisioning this, so your time services scale.

Managed & Scalable DNS Resolution

DNS can be used for data exfiltration. If you run your own DNS resolver (eg, on a Windows Domain Controller(s) or Linux host(s) and set your DHCP to hand this resolver address to clients, then you may be at risk of not even seeing this happen. This is an indirect way of being exploited; your end server may not have access to egress to the Internet, but it can egress to your DNS resolver to… well, look up addresses. If you do run your own DNS server, you should be looking at the log of what is being looked up, and managing the process to match this against a threat list, and issuing warnings of potential compromise.

Managed DNS Security Checks: Guard Duty

If that’s too much effort, then there is a managed solution for this: AWS Guard Duty and the VPC-provided DNS resolver. In order for Guard Duty to inspect and warn on this traffic, you must be sending DNS queries via the VPC resolver. Turning on Guard Duty while not sending DNS traffic through the AWS provided service – for example, running your own root-resolving DNS server, means the warnings from Guard Duty will probably never trigger.

By contrast, having your self-managed resolver (eg Active Directory server) use the VPC resolver means that it is the one that will be reported upon when any other instance uses it as a resolver with a risky lookup! I’m sue that will be a mild panic.

Managed DNS Proactive Blocking: DNS Firewall

Going beyond simply retrospectively telling you that traffic happened is pro-actively blocking DNS traffic. Route53 DNS Firewall was introduced in 2021, using managed block lists for malicious domains. This gives some level of protection that clients (instances) will get a failed DNS lookup when trying to resolve these bad domains.

My Recommendations

So here’s the approach I tell my teams when using VPCs:

  1. Always use the link-local time Sync service; it scales, and reduces SPOFs and bad firewall rules.
  2. Always use the link-local DNS resolver; it scales. use a Resolver Rule if you need to then hook the DNS traffic up to your own DNS server (AD Domain Controller).
  3. Turn on Guard Duty, set up notifications of the Findings it generates.
  4. Turn on DNS Firewall to actively BLOCK DNS lookups for bad domains.
  5. Turn on your own Route53 query logs for yourself, with some retention period (90 days?)
  6. For inbound Web traffic, use a managed Web Application Firewall with managed rules, and/or scope your application to the country you’re intending to serve traffic to. In particular, block access to administrative URL paths that don’t come from trusted source ranges.
  7. Leverage any additional managed services that you can, so you minimise the hand-crafted solutions in your application.
  8. Template your workload, and implement updates from template automation; no local changes. Deploy changes rapidly using DevOps principles. Socialiase with your team/management the importance of full stack maintenance and least privilege access — including at the network layer ingress and egress — and schedule and prioritise time to include technical debt in each iteration, including the updating of every third party library in your app.
  9. If you have a DevOps pipeline with something like SonarQube or Whitesource, have it report on dependencies (libraries), and get reports on how out-of-date those libraries are, and/or if those out of date versions have known CVEs against them. Google Lighthouse (in the browser) does a great job of his for JavaScript web frameworks.

For this exploit you need to go widerthat what you run in cloud: your company printer (MFP), network security cameras, VoIP phones, UPS units, air-conditioners, Smart Hubs, TVs, Home Internet Gateways, and other devices will probably have an update. Your games console, and the games on it (this started from an update in Minecraft to address this and has… escalated quickly!). Even the physical on–prem firewalls and virtual appliances themselves – but ensure you don’t just do firewalls and ignore the larger landscape of equipment you have.

PS: I highly recommend my colleague Elliot Segler’s recent post entitled Learnings with AWS WAF and Log4Shell.

PPS: You may want to apply something like this to Apache:

 <IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{THE_REQUEST} ^.*%7Bjndi:.* [NC]
 RewriteRule ^(.*)$ - [F,L]

CodeCommit: Mono Repo, Multiple Pipelines – part I: repackaging the repo

As an experiment, I have a CodeCommit repository that has a combination of CloudFormation Templates, and some static web content, checked in to two separate prefixes or folders: /Templates/ and /Website/.

What I am trying to do is, upon any commit to the repo, determine if the Website prefix needs an update, or the Templates have to trigger CFN Stack Update.

Starting with the most basic piece, I want the web content to go via CodePipeline, and unpack into an S3 Bucket, against which there is a CloudFront distribution pointing (with an Origin Access Identity already in place).

By default, an S3 unpack expects the entire repo to unpack into S3, but I want to only have a particular sub folder, so I’ve implemented a “repackage” step as a Lambda function in the pipeline, which grabs the Original Artifact the pipeline has, unpacks it, and then create as new Artifact containing just the folder /Website/ and below. Turned out to be around 50 lines of code in Python:

import json
import boto3
import os
import zipfile

def lambda_handler(event, context):
    if (event["CodePipeline.job"]["data"]["inputArtifacts"][0]["location"]["type"] != "S3"):
        return { 'statusCode': 500, 'body': json.dumps('Not on S3') }
    dl_filename = event["CodePipeline.job"]["data"]["inputArtifacts"][0]["location"]["s3Location"]["objectKey"].split('/')[-1]

    s3client = boto3.client('s3',
    with open("/tmp/" + dl_filename, 'wb') as data:
    with zipfile.ZipFile("/tmp/" + dl_filename, 'r') as zip:
    ul_filename = event["CodePipeline.job"]["data"]["outputArtifacts"][0]["location"]["s3Location"]["objectKey"].split('/')[-1]
    zipf = zipfile.ZipFile("/tmp/" + ul_filename, 'w', zipfile.ZIP_DEFLATED)
    for root, dirs, files in os.walk('.'):
        for file in files:
            zipf.write(os.path.join(root, file))
    #WARNING: CodePipeline artifacts may ave a default BucketPolicy requiring an explict KMS key. Remove that SSE requirement, turn on dfault encryption for the bucket.
        "/tmp/" + ul_filename,
        ExtraArgs={"ServerSideEncryption": "AES256"}
    client_cp = boto3.client('codepipeline')
    response_cp = client_cp.put_job_success_result(jobId=event["CodePipeline.job"][ "id"])
    return {
        'statusCode': 200,
        'body': json.dumps('Done repacking.')

This runs reasonably quickly, and means I am not unpacking the entire CodeCommit repo into my CloudFront distribution.

AWS Ambassador Program 2021: #2 in ANZ!

I’ve been tracking my blog posts and other “contributions” to the AWS developer community since 2017 when the program was originally called the AWS Cloud Warrior Program. This morphed into the Partner Ambassador program, for the top engineering talent in the partner community, and then became a global program.

You can find the ambassadors here. At the time of writing (Nov 1 2021), three are 227 people listed: 114 in APAC, 43 in Europe, 9 in LATAM, and 46 in North America.

I submitted some 28 items to the program in 2021 (until mid-October 2021), from Blog Posts to Case Studies, Open Source work, Event Hosting, and Certification Subject Matter Expert contributions.

This was enough to land me in the #2 position for 2021, as shared during the online Global Ambassador Summit recently – shown in this slide:

And while I sit here with 9 (of 11) AWS certifications (more set to launch during re:Invent), I don’t yet hold the coveted Gold Jacket for holding all available cert (which looks as loud and proud as you can imagine; I think I saw something similar in The Hangover movie).

Arjen and Ian are both amazing engineers; I am honoured to be considered amongst them in this program.

Sharing ideas and solutions has been core to my work in the technology field since I was at University when I first discovered open-source and then became a Debian Linux Developer. Indeed, as developers (and Sys Admins, these days deemed as DevOps Engineers) become more senior, sharing and mentoring becomes more of the job.

Occasionally I get feedback from people that my posts have helped them save time and find a solution quickly, or avoid problems. Often I find myself reading back my own posts years later, thanking younger-me for putting some notes online.

But as with most in this industry, we stand on the shoulders of others; the only right thing to do is to support those coming after us.

Thoughts on the IPv6 Transition

I’ve been discussing the IPv6 transition with our customers more recently; for over 3 years we’ve been dual-stack IPv4 and IPv6 for public-facing AWS-Cloud-based solutions and services for our customers.

So what?“, you’re thinking?

It’s worth noting that from Google’s numbers, global IPv6 is now approaching 36%, while at home in Australia 27%, helped by TelCo carriers like Telstra enabling IPv6 to their mobile phone subscribers, and advanced ISPs like Aussie Broadband and Internode making IPv6 trivial to enable.

Google IPv6 Adoption, as of 12/Oct/2021

I first had an IPv6 tunnel established to Hurrican Electric in 1999 when I worked for The University of Western Australia. I championed the adoption of IPv6 as a first-class citizen in the cloud when I worked at Amazon Web Services as a Solution Architect, and these days, a large majority of AWS public-facing services already support dual-stack approaches, and more are on the way.

As the next billion people come online, the unavailability of more existing IPv4 Internet is a limiting factor. The temporary value of the IPv4 address space, being reallocated (“sold“) between assignees will eventually presumably peak when a majority of clients (people) and the services they are accessing are all on IPv6.

I have been advising a government body, who had two IPv4 “Class B” sized IPv4 subnets allocated to them. Each of these subnets is a “/16” netblock (65,535 addresses); they had only ever used a handful of /24 ranges from within their first allocation.

Most services they use, both for staff and for public-facing services, now run on the cloud, from cloud-provider address space. They’re unlikely to need all of the address blocks they currently have from the first /16 block, let alone the second.

This netblock has a current value of a couple of million dollars (AUD).

It’s likely that many public sector agencies have IPv4 address netblocks that they’re unlikely to ever use, and could also benefit from reallocating to service providers desperate for their own address space to host solutions from.

Well, desperate until most clients are using IPv6.

I’d urge any public sector organisation to review their plans for using their address space, and if they have large unused, contiguous address space, consider reallocating that. The funds raised can then help with further modernisation of workloads – including those workloads to move to IPv6 addressing.

For any managed service providers, I would urge you to “dual-stack” all public-facing Internet services. You should continue to use strong encryption in flight, modern TLS protocols, and strong authentication, regardless of the network transport protocol version.

If you are using AWS CloudFront as a CDN in front of your origin service, then enable IPv6 in the CloudFront configuration, and then publish the corresponding AAAA DNS record just as you have to the A DNS record. Similar works if using CloudFlare, Akamai, Fastly or others.

For those who use managed service providers for their corporate business networking, ask why your work Internet connection is not dual-stacked already. It’s typically a configuration question, and rarely has any actual cost associate with it. If you have a corporate proxy service, then if it is dual-stacked, the clients (on your internal corporate network) already get some benefit of being able to talk to IPv6 services.

If you have DNS services, check they not only can serve IPv6 records (AAAA), but they are reachable using IPv6. Services like AWS Route53 have done this for years (see my earlier point about getting IPv6 as a first-class citizen within AWS).

While you’re looking at DNS, have a look at creating a simple CAA record, to list the Certificate Authorities you obtain certs from.

IoT and AWS IoT Core for Lorawan: Getting Started

Oskar loves sailing. He’s been doing it for a little over a year, and it’s the first time that he’s really taken to a sport. We’ve found a very inclusive mob of people around East Fremantle who are encouraging children to get into sailing, coupled with some awesome massively overqualified coaches (eg, State, National. and Olympic sailers) who are keen to see their little fleets of junior sailors take up the sport.

I’ve done my bit; I learnt to sail in a Mirror Dinghy, many many years ago, and in my late teen’s early twenties, learnt to sail a much larger, locally famous three-masted barquentine, Sail Training Ship Leeuwin. My formative late school years were spent around the B Sheds on the Fremantle wharf; I managed to sneak out of home to the ship as I had family with me: a second cousin who at the time was the permanent 1st mate (later captain). I used to crew, navigate, rig, and refit; the summer period in port we’d spend the time sleeping onboard taking shifts on the boat to keep it safe.

When you sail on the ship on a voyage, the young sailors are split into four Watches (teams): Red, Blue, Green, Yellow. When we were in port, and working the day down in the bilges, and guarding the ship at night, the small team would call ourselves the Black Watch. New Year’ s Eve we would even get the (small) canon out, for the strike of midnight. Yo ho ho!

Aaany ways… Oskar’s taken to sailing a small boat originally designed by the Bic pen company. This small skiff is basically a piece of hollowed plastic, a small sail, centreboard and rudder. It flips about as readily as a politician faced with the truth and facts, but luckily, as a flat piece of plastic, there’s no bailing and it rights easily. The design is now open and in a great riff on the fact that Bic started it, it’s now called the O’pen skiff. Heh – pen, get it?

Some of these kids can see when they are about to capsize, and calmy step out over the side just as the boat keels over, and have been known to seamlessly step onto the exposed centreboard, and counter the capsize!

I’ve done my bit, helping the coaches in the support boats (rigid inflatable boat or RIBs, or a classic tinny), which has mostly been about helping do running repairs, help tow stricken vessels, or swap kids in and out of boats (I’m not a soon-to-be Paris-2024 Olympian; I’ll let the pros do the instruction). But to become a little more useful, I sat the Skipper’s Ticket license (Dept Transport WA) to I can now drive the powerboats and not just be a passenger.

The Fleet of 6 – 9 boats also race in the Swan River by the East Fremantle Yacht Club. And thus, as parents stand around the shoreline at Hillary’s Marina a few weeks back, our children taking their 2-metre plastic, one parent says to me ” could we get real-time positioning and a map of their boats”.

And herein starts the rabbit hole of my first foray into IoT.

I first saw Lorawan at the AWS Summit in Sydney around 2016 or so. Back in the early 2000’s, I was playing with long-distance 802.11b, with cantennas (antennas made from large cans, and old commercial-sized coffee tin if I recall), and at one stage had a 17-metre antenna on my father’s Osborne Park factory roof, with an Apple Access Point, powered via PoE, rigged at the top. I’ve done a bit of networking over the years (I hold the AWS Certified Network – Specialty certification, and have contributed Items (questions) to it).

So now was the time to look at how do this log distance, low power, low throughput data now.


We want to get frequent (second or two) GPS location of between 5 and 20 boats. They’ll be travelling along the Swan River, mostly (occasionally a few coastal regattas). We want to have a map showing the location of all boats, and a tail of their last few moments, and last known speed and direction. We’ll then display this map in the public spaces around the venue.


Pretty quickly we zoomed in on the Dragino LGT-92, LoRaWAN GPS Tracker. It’s around AUD$100, and has a good battery life. It recharges by a micro USB port. It can be adjusted via a TTL serial interface (for which I don’t yet have a device to chat with it).

Noticing that I was not covered by any The Things Network (TTN) public gateways in my area, I also purchased a RAK7246 LoRaWAN Developer Gateway at $225 delivered (IoT Store Perth). And having seen the data rates I’d like to do, I’m glad I have my own gateway.


So how does the cloud come into this? Well, the gateway device is just one part of it; it’s effectively a data forwarder. There may be multiple gateways in my network to extend coverage; yes, they could be a mesh of device, or they could be separately homed to the Internet. Each Gateway registers against a Lorawan Network Server (LNS). It is the LNS that has the central configuration of gateways and end devices, and processes the data coming from them all.

I could deploy my own LNS, or I can use the AWS Managed version of it, and then trundle the data out to the application that I want to have consume it. At this point, that application is probably just DyanmoDB, with items containing the device unique identifier, timestamp, latitude, longitude, battery level, and firmware revision. And thus, the IoT Core for Lorawan.

Getting started

As an initial overview, thanks to Greg Breen from AWS, is this YouTube video in which Ali Benfattoum describes putting these together. This video from December 2020 is now slightly out of date with the AWS Console (things move pretty quickly), but you can follow along easily enough.

The first thing I did was update the installed Raspian. A new major release has come out, so an apt-get update && apt-get dist-upgrade is in order. Some CA certificates have expired (in the chain of) one of the repositories listed in /etc/apt/apt.sources.d/, so a little bit of work to get this amenable. A quick reboot (having updated the Raspian OS) and I dutifully pulled in git as described in the above video, cloned the Lorawan BasicStation, and built it (make).

I found that the Gateway device registered exactly as shown in the video, and showed up with no problems. However, my radio devices weren’t attaching. Well, turns out there was a process running on the gateway for The Things Network, which had exclusive access to the local Lora radio. So I stopped that process, repeated, and data flowed through. Knowing I didn’t want that TTN process to restart, I found its SystemD config file in /etc/systemd/, and removed it (well, copied it away to my home directory).

The first hurdle

I rebooted the device overnight, and the next day went to restart the basic station service from the command line. But no matter what, it couldn’t turn on the local Lorawan radio.

I lucked upon a post that suggested the radio have a GPO pin reset, and that it was either pin 25 or 17 that would do the trick. Hence, I had this small script that I called reset_gw.sh:

gpioset --mode=time --usec=500 pinctrl-bcm2835 17=1
gpioset --mode=time --usec=500 pinctrl-bcm2835 17=0

I ran this, and then the radio reset! Browsing through posts it appears that the basicstation doesn’t initalise or reset the radio; I can only presume that the TTN daemon did, and when I initially killed it and fired up basicstation, the radio was good to go. So rule now is reset the radio as part of the initalisation of basicstation; I found basicstation has support for a command line argument to call the above script.

Given I want basicstation to start and connect on boot, it needed its own startup script in /etc/system.d/system/:


ExecStart=/home/pi/basicstation/build-rpi-std/bin/station --radio-init=/home/pi/reset_gw.sh


Note I also put a symlink to this in /etc/systemd/system/multi-user.target.wants/.

The other optimisation I did was to go into the WiFi settings for this little device, in /etc/wpa_supplicant/. I want to list a few networks (and preshared keys/passwords) that I want the device to just connect to. Hence my /etc/wpa_supplicant/wpa_supplicant.conf file now looks like:

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev

        ssid="22A Home"
        ssid="JEB Phone"

No, that’s not the real password or ssid. But the JEB Phone one means that if I take the gateway on the road, I can power it up (USB) and then have it tethered to my mobile phone to backhaul the data.

The Data Flows

Following the above demo, I now have data showing up. I long pressed the one and only button on it for 5 seconds, and this is what ends up on the IoT Topic:

  "WirelessDeviceId": "46d524e5-88f6-8852-8886-81c3b8f38888",
  "PayloadData": "AAAAAAAAAABPd2Q=",
  "WirelessMetadata": {
    "LoRaWAN": {
      "ADR": true,
      "Bandwidth": 125,
      "ClassB": false,
      "CodeRate": "4/5",
      "DataRate": "0",
      "DevAddr": "01838d9f",
      "DevEui": "a8408881a182fb39",
      "FCnt": 5,
      "FOptLen": 1,
      "FPort": 2,
      "Frequency": "917800000",
      "Gateways": [
          "GatewayEui": "b888ebfffe88f958",
          "Rssi": -48,
          "Snr": 11
      "MIC": "600d5102",
      "MType": "UnconfirmedDataUp",
      "Major": "LoRaWANR1",
      "Modulation": "LORA",
      "PolarizationInversion": false,
      "SpreadingFactor": 12,
      "Timestamp": "2021-10-05T12:41:44Z"

That’s a lot of metadata for the payload of “AAAAAAAAAABPd2Q=“. That’s 11 bytes, and behold, it has data embedded in it. I used the following python to decode it:

import base64
import sys

v = base64.b64decode(sys.argv[1])
lat_raw = v[0]<<24 | v[1]<<16 | v[2]<<8 | v[3]
long_raw = v[4]<<24 | v[5]<<16 | v[6] << 8 | v[7]
if (lat_raw >> 31):
  lat_parsed = (lat_raw - 2 ** 32) / 1000000
  lat_parsed = lat_raw/1000000

if (long_raw >> 31):
  long_parsed = (long_raw - 2 ** 32) / 1000000
  long_parsed = long_raw/1000000

alarm = (v[8] & 0x40) > 0
batV = ((v[8] & 0x3f)<<8 | v[9]) / 1000
motion = v[10]>>6
fw = 150 + (v[10] & 0x1f)
print("Lat: {}, Long: {}".format(lat_parsed, long_parsed))
print("Alarm: {}, Battery: {}, Motion mode:{}, Fw: {}".format(alarm, batV, motion, fw))

The end result of running this with the payload a parameter on the command line shows the result:

Lat: 0.0, Long: 0.0
Alarm: True, Battery: 3.959, Motion mode:1, Fw: 154

And that’s what we expected: the alarm button was depressed. and the documentation says that when this is the case, lat and long are set to zero on the initial packet sent in alarm state.

And so

Now that I have a gateway I can move around, reboot, and have it uplink on home wifi or mobile phone tether, I can wander around and then put the token out there. What I am up to next is to pull out that payload and push it to Dynamo. Stay tuned for the next update….