Frustrating IoT Devices!

I’ve been continuing my IoT journey, finding that IoT devices are a little fickle.

My first LGT92 GPS Tracker device failed back in 2021, and I tried contacting both the retailer (IoT Store Perth) and the manufacturer. The manufacturer instructed me to open the clamshell case and take numerous photos to send to them. They suggested a fault and said they would organise a replacement, but after six months, nothing has happened.

During that period, I ordered a second LGT92, and it failed on first use. I contacted IoT Store again – by web form, email, and phone – and after many weeks spoke to “Sam”, who from the sound of it was on the phone in his car. While he said he would look into this (and the original), nothing came of it, and I tried following up several times.

I then tried to get an IP67-rated, solar-powered device; however, what IoT Store sent me had no solar panel or GPS tracker device, just a box with some wires and screws. Again I spoke with “Sam” (in his car again), having tried web forms, email and his mobile number multiple times, and again he said he’d follow up on it; that was three months ago, with no success.

So I’m never buying anything from IoT Store again, and I strongly advise against anyone else doing so. The customer service is terrible. Not one of the emails I’ve sent has been replied to. Not one of the contact-me forms has been responded to. And when I have managed to speak with Sam, he is evasive and does not follow up on the actions he says he’ll take.

Next up is the RAKwireless RAK10700, a new GPS tracker device, again IP67 rated, with solar power. Released in 2022, these devices shipped from China after about three months, but without a battery for the solar panel to charge. I ordered a LiPo battery from Amazon.com.au, but naturally it had a different connector, so I found myself soldering again after 15 years.

But they do power up, with device firmware 1.0.4 installed. I connected a serial port and entered the AT commands to dump the config: Dev EUI, App EUI and App Key.

I entered this into the AWS IoT Core device registration, and ensured things like the frequency were correct, but the device refuses to join the LoRaWAN network via the local gateway running basicstation (current build at this time). The best log output from the basicstation gateway shows:

Mar 28 14:15:19 rak-gateway basicstation[12538]: 2022-03-28 13:15:19.629 [S2E:VERB] RX 917.0MHz DR2 SF10/BW125 snr=-14.8 rssi=-89 xtime=0x6900001BA46F94 - jreq MHdr=00 JoinEUI=ac1f:9ff:f915:4631 DevEUI=ac1f:9ff:fe06:7117 DevNonce=35258 MIC=1390227384

Mar 28 14:15:20 rak-gateway basicstation[12538]: 2022-03-28 13:15:20.093 [S2E:WARN] Unknown field in dnmsg - ignored: regionid

And the output on the tracker device showing:

+EVT:JOIN FAILED

Out of interest, the AT+STATUS shows (with some of the keys and addresses hidden with underscores):

Device status:
   Auto join enabled
   Mode LPWAN
   Network not joined
LPWAN status:
   Dev EUI AC1F09FFFE______
   App EUI AC1F09__________
   App Key AC1F09__________________________
   Dev Addr 26021F__
   NWS Key 323D155A000DF335307A16DA0C______
   Apps Key 3F6A66459D5EDCA63CBC4619CD______
   OTAA enabled
   ADR enabled
   Public Network
   Dutycycle disabled
   Send Frequency 2
   Join trials 2
   TX Power 0
   DR 3
   Class 0
   Subband 1
   Fport 2
   Unconfirmed Message
   Region AU915
LoRa P2P status:
   P2P frequency 916000000
   P2P TX Power 22
   P2P BW 125
   P2P SF 7
   P2P CR 1
   P2P Preamble length 8
   P2P Symbol Timeout 0

I did notice the documentation from RAKwireless says that firmware 1.0.1 supports LoRaWAN MAC version 1.0.2 (not the 1.0.3 that the LGT92 supported); this version difference is defined in a device profile in AWS IoT Core for LoRaWAN.

What I also noticed was that the documentation for the RAK10700 at https://docs.rakwireless.com/Product-Categories/WisBlock/RAK10700/Datasheet/#software mentions that the firmware version available is 1.0.1, so older than what shipped to me on the device:

+VER:1.0.4 Jan 14 2022 14:17:02

But on that same documentation page there is a link to download the firmware, which unfortunately returns a 404!

So my journey continues, but I’ve learnt a few lessons. The IoT device landscape seems… littered with failures. The quality of commodity devices is low, the compatibility is bewildering, and the standards are evolving.

Transitioning to IPv6 in AWS

There are a large number of workloads that operate in the AWS Cloud using traditional virtual machines (instances) on traditional IPv4 networking. And for the last few years, we’ve seen steady growth in IPv6 adoption globally. For those who haven’t started this journey yet, here are some notes on what you may want to look at as you start to embrace the future of the Internet.

It should be noted that this transition is a two way street:

  1. you need to get ready to offer your digital services to your clients over both IPv4 and IPv6 (dual stack)
  2. you need your dependent services (the ones you consume) to listen on IPv6 addresses, probably via a gradual transition in which they offer both IPv4 and IPv6 for a (long) period of time

Within your internal (to your VPC) network architecture you can use either network protocol: the initial focus needs to be on enabling your incoming traffic to use either IPv4 or IPv6.

Your transport layer security (TLS) should be identical over either network protocol; the IP version is just the underlying transport.

Here are the steps:

  1. VPC Changes
  2. Subnet Changes
  3. Load Balancers Changes
  4. Routing Changes
  5. Security Group Changes
  6. DNS Changes

VPC Configuration

Adding an IPv6 address block to a VPC is reasonably simple. While you can allocate from your own assigned pool, it’s far easier to use the AWS pool; it’s ready to go and doesn’t need any other preparation.

There are three ways to add an IPv6 address allocation:

  • In the console, via ClickOps
  • Via the API (including the CLI)
  • Via the CloudFormation template that defines your VPC – highly recommended

Assigning the address block to the VPC does not actually use it, and should have zero impact on already-running workloads. You should be safe to apply this at any time.
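For example, a minimal CLI sketch of requesting an Amazon-provided IPv6 block (the VPC ID here is a placeholder):

# Ask AWS to allocate an Amazon-provided /56 IPv6 block to an existing VPC
aws ec2 associate-vpc-cidr-block \
    --vpc-id vpc-0123456789abcdef0 \
    --amazon-provided-ipv6-cidr-block

# Confirm the association
aws ec2 describe-vpcs --vpc-ids vpc-0123456789abcdef0 \
    --query 'Vpcs[0].Ipv6CidrBlockAssociationSet'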

Subnet Configuration

Once the VPC has an allocation, we can then update existing subnets to also include an allocation from within the VPC’s range. The key difference here is that with IPv4 we can choose the size of the subnet; with IPv6 we cannot: every IPv6 allocation to a subnet is a /64, which is about 18 billion billion IP addresses.

You can undo an allocation as long as no network interfaces (ENIs) in the subnet are using those addresses.

The configuration is relatively simple: you get to choose which slice of the VPC IPv6 address block will be used for which subnet. I follow a pretty simple rule: I anticipate that my VPCs may one day spread across four Availability Zones, so I allocate subnets sequentially across Availability Zones in order to be able to reference the range via a supernet.

The reason for this is:

  • subnetting is done in powers of two: so for contiguous addressing (supernetting) we’re looking at using two AZs, four AZs, or eight AZs, etc.
  • two Availability Zones is insufficient. If one fails, then you are running on a single Availability Zone for the duration of the incident (which may last several hours). That AZ may be constrained in capacity, while other AZs may be underutilised. Hence we want at least three AZs, so that fault tolerance can be restored DURING a single-AZ outage

Most Regions have between three and five AZs. Preparing for eight in most Regions would be reserving address space we’re unlikely to ever allocate.

Hence, starting with public subnets, we want to sequentially allocate them with space to accommodate four AZs. These allocations are a hexadecimal number between 00 and FF – hence a limit of 256 subnets per VPC. If we recall the four-AZ allocation, that’s 64 sets of subnets across all AZs.

Again, you can allocate these by:

  • Click Ops in the console on each existing subnet (or when creating new subnets)
  • API call (including the CLI)
  • CloudFormation template – recommended – in which case, look at Fn::Cidr to calculate the allocation. Check out my post from March 2018 on this.

If your focus is to start with your services being dual-stack available, then the only subnets you need to allocate initially are the public subnets: the subnets where your client-facing (Internet-facing) load balancers are.

Once again, there’s no interruption to existing traffic during this change; indeed, you’re less than halfway through the required changes.

You may also allocate the rest of your private subnets at this time if you wish.
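If you’re scripting rather than templating, the per-subnet association is a single call. A hedged sketch (the subnet ID and the /64 are placeholders; your /64s come from whatever block AWS assigned your VPC):

# Attach one /64 slice of the VPC's IPv6 block to an existing subnet.
# The two hex digits before "::/64" are the sequential subnet number (00-FF).
aws ec2 associate-subnet-cidr-block \
    --subnet-id subnet-0123456789abcdef0 \
    --ipv6-cidr-block 2406:da1c:abc:de00::/64

# Optionally have new ENIs in the subnet auto-assign an IPv6 address
aws ec2 modify-subnet-attribute \
    --subnet-id subnet-0123456789abcdef0 \
    --assign-ipv6-address-on-creation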

Routing Changes

For public subnets to function with IPv6, they need a route for the default IPv6 destination via the existing Internet Gateway (IGW). This looks like “::/0”, and when pointing to the IGW, it permits two-way traffic just like IPv4. Your set of public subnets will need this route, and this can be done at any time: permitting IPv6 routing won’t start clients using it.

If you have private subnets with IPv6 allocations, and you want them to be able to make outbound requests over IPv6 to the Internet, then you may want to consider an Egress-Only IGW as the destination for “::/0” in the private subnets’ route tables. Note your public subnets will still use the standard IGW.

The Egress-Only IGW resource does what it says, and supplants the need for the NAT Gateway used with IPv4 (more on NAT GW later); a CLI sketch follows the list below.

Again, you can add the Egress Only IGW and the Routing changes in several ways:

  • Click Ops on the console
  • Via the API (including the CLI)
  • In your CloudFormation template for your VPC – recommended
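As a rough CLI sketch of both route changes (all resource IDs are placeholders):

# Public route table: default IPv6 route via the existing Internet Gateway
aws ec2 create-route \
    --route-table-id rtb-0aaaaaaaaaaaaaaaa \
    --destination-ipv6-cidr-block ::/0 \
    --gateway-id igw-0bbbbbbbbbbbbbbbb

# Private route tables: create an Egress-Only IGW and point ::/0 at it
aws ec2 create-egress-only-internet-gateway --vpc-id vpc-0123456789abcdef0
aws ec2 create-route \
    --route-table-id rtb-0cccccccccccccccc \
    --destination-ipv6-cidr-block ::/0 \
    --egress-only-internet-gateway-id eigw-0dddddddddddddddd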

Load Balancer Changes

Now that you have public load balancers in public subnets with IPv6 available, you can modify your load balancer to give it an IPv6 address (dual-stack). This is yet another action that will have no impact on current traffic; a CLI sketch follows the list below.

You can modify the existing load balancers by:

  • Click ops on the console
  • An API call (including the CLI)
  • In your CloudFormation template for your Workload – recommended
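For an Application or Network Load Balancer, this is a one-liner via the CLI; a sketch with a placeholder ARN:

# Switch an existing ALB/NLB from ipv4 to dualstack
aws elbv2 set-ip-address-type \
    --load-balancer-arn arn:aws:elasticloadbalancing:ap-southeast-2:111122223333:loadbalancer/app/my-alb/0123456789abcdef \
    --ip-address-type dualstack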

Security Group Changes

Now we’re down to the last two items. By default, your security group is closed unless you have made changes. Your typical load balancer will be listening on TCP 80 and/or 443 for web traffic, and be open to the entire [IPv4] Internet with a source of 0.0.0.0/0.

To enable this security group for IPv6, we add a set of rules with a source of ::/0 for the same ports you have open for IPv4 (typically 80 and 443 for web traffic, different for other protocols).
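A hedged CLI sketch (the security group ID is a placeholder):

# Allow HTTP and HTTPS from the entire IPv6 Internet, mirroring the IPv4 rules
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --ip-permissions \
    'IpProtocol=tcp,FromPort=80,ToPort=80,Ipv6Ranges=[{CidrIpv6=::/0}]' \
    'IpProtocol=tcp,FromPort=443,ToPort=443,Ipv6Ranges=[{CidrIpv6=::/0}]'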

It’s at this point you can test connectivity to your load balancer using IPv6 end-to-end – assuming you have another end on the IPv6 Internet somewhere.

If your workstation/cellphone is using IPv6, then you could browse to the IPv6 address – but you’ll probably get a certificate warning, as the name in the certificate doesn’t match the raw IP address.

If you’re not familiar yet, this should also be a CloudFormation template update.

DNS Changes

This is when we announce to the world that your service can be accessed with IPv6. You want to make sure you have done the above test to ensure you can connect, as this is the final piece in the puzzle.

Typically a custom DNS name for a load balancer is a Route53 ALIAS record of type A (Address). The custom DNS name is what also appears in any TLS certificates.

To finally flick the switch on IPv6, you add an additional Route53 ALIAS record of type AAAA (four As), with the destination being the same as you have used for the existing Alias A record (one A).
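A hedged Route53 change-batch sketch (the zone ID, record name, and the load balancer’s DNS name and hosted zone ID are all placeholders you’d substitute):

# Add an AAAA ALIAS alongside the existing A ALIAS, pointing at the same target
aws route53 change-resource-record-sets \
    --hosted-zone-id Z0123456789EXAMPLE \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "my.custom.load.balancer.name",
          "Type": "AAAA",
          "AliasTarget": {
            "HostedZoneId": "Z2EXAMPLELBZONEID",
            "DNSName": "dualstack.my-alb-123456789.ap-southeast-2.elb.amazonaws.com",
            "EvaluateTargetHealth": false
          }
        }
      }]
    }'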

You should now be able to check that you can resolve your service using the nslookup utility. From a command prompt or PowerShell, type:

  • nslookup -type=AAAA my.custom.load.balancer.name
  • nslookup -type=A my.custom.load.balancer.name

Your Dependencies

Now that you’re up and running, you need to think about the services you depend upon. Services within your VPC, such as RDS, require AWS to make them dual-stack. Some already are, such as the link-local metadata service, the Time Sync Service and the VPC DNS resolver (note: always use the VPC DNS resolver).

Some services will be outside of your VPC but still AWS-run, like SQS, and S3: in which case, look to use VPC Endpoints to communicate with them.

But other third-party resources across the Internet may be stuck on IPv4. If you have an EC2 Linux instance, it’s sometimes worth running tcpdump to inspect the traffic still using IPv4. A command like tcpdump ip and port not 22 may be useful. You can extend that to also exclude HTTP/HTTPS traffic with tcpdump ip and port not 22 and port not 80 and port not 443. Remember, your service port on your instance may be a different number on the inside of your network.

You’ll need to ask your dependencies to include dual-stack support on their services. In the meantime, you’ll have to fall back to using IPv4 from your network to communicate with these dependencies. There are two ways this can happen:

  1. If the subnet with your EC2 instance in it is dual-stack, then the host can use an IPv4 connection itself, possibly via a NAT Gateway, to communicate with the external IPv4 dependency
  2. If the subnet with your EC2 instance is IPv6-only (which is rather new), then the subnet can be configured to use DNS64 (a subnet-level configuration), and can route its traffic via the NAT Gateway, which will translate from IPv6 on the VPC-internal network to IPv4 across the Internet (and back).

Moving to IPv6 only internal networks is a long term goal, probably in the order of half a decade or so. A number of additional AWS updates will be needed before this becomes a default.

Additional IPv6 Notes in AWS

In this transition period (which has been going on for nearly 25 years), you’re going to find stuff that silently falls back to IPv4. With hosts able to simultaneously have two addresses (IPv4 A, and IPv6 AAAA), the things that look them up have a choice. For most things this is the newer AAAA, with a fall-back to A if needed (see the Happy Eyeballs RFC).

However, at this time (March 2022), CloudFront still prefers IPv4 origins when the origin is dual-stack. CloudFront also still talks to origins with TLS 1.2 instead of the newer and faster TLS 1.3, and HTTP/1.1 instead of the slightly more efficient HTTP/2 request protocol.

AWS IoT Core exposes IPv4 endpoints, which is unusual, as a key element of IoT is having millions of devices connected – a situation best served by IPv6.

Similar considerations exist for Route53 Health Checks, and others.

Summary

If you’re thinking this is all very new in cloud, you’d be mistaken. I was transitioning customer environments (including production) in AWS to dual-stack in 2018 – four years ago. I’ve been dual-stack on my home Internet connection since I swapped to Aussie Broadband (I churned away from iiNet, who once had an IPv6 blog and strong implementation plans).

For several years, Australia’s dominant telco, Telstra, has had IPv6 dual stack for its consumer mobile broadband, something that the other players like Optus are yet to enable.

But these changes are inevitable.

The future is here, it’s just not evenly distributed.

IoT and AWS IoT Core for LoRaWAN: Getting Started

Oskar loves sailing. He’s been doing it for a little over a year, and it’s the first time that he’s really taken to a sport. We’ve found a very inclusive mob of people around East Fremantle who are encouraging children to get into sailing, coupled with some awesome, massively overqualified coaches (e.g., State, National, and Olympic sailors) who are keen to see their little fleets of junior sailors take up the sport.

I’ve done my bit; I learnt to sail in a Mirror Dinghy, many many years ago, and in my late teens and early twenties, learnt to sail a much larger, locally famous three-masted barquentine, Sail Training Ship Leeuwin. My formative late school years were spent around the B Sheds on the Fremantle wharf; I managed to sneak out of home to the ship as I had family there: a second cousin who at the time was the permanent 1st mate (later captain). I used to crew, navigate, rig, and refit; during the summer period in port we’d spend the time sleeping onboard, taking shifts on watch to keep the boat safe.

When you sail on the ship on a voyage, the young sailors are split into four Watches (teams): Red, Blue, Green, Yellow. When we were in port, working the day down in the bilges and guarding the ship at night, the small team would call ourselves the Black Watch. On New Year’s Eve we would even get the (small) cannon out for the stroke of midnight. Yo ho ho!

Aaany ways… Oskar’s taken to sailing a small boat originally designed by the Bic pen company. This small skiff is basically a piece of hollowed plastic, a small sail, centreboard and rudder. It flips about as readily as a politician faced with truth and facts, but luckily, as a flat piece of plastic, there’s no bailing and it rights easily. The design is now open and, in a great riff on the fact that Bic started it, it’s now called the O’pen Skiff. Heh – pen, get it?

Some of these kids can see when they are about to capsize, and calmly step out over the side just as the boat keels over; some have even been known to seamlessly step onto the exposed centreboard and counter the capsize!

I’ve done my bit, helping the coaches in the support boats (rigid inflatable boats, or RIBs, or a classic tinny), which has mostly been about doing running repairs, towing stricken vessels, or swapping kids in and out of boats (I’m not a soon-to-be Paris-2024 Olympian; I’ll let the pros do the instruction). But to become a little more useful, I sat the Skipper’s Ticket licence (Dept of Transport WA) so I can now drive the powerboats and not just be a passenger.

The fleet of 6–9 boats also races in the Swan River by the East Fremantle Yacht Club. And thus, as parents stood around the shoreline at Hillarys Marina a few weeks back, our children taking out their 2-metre plastic boats, one parent said to me: “could we get real-time positioning and a map of their boats?”

And herein starts the rabbit hole of my first foray into IoT.


I first saw LoRaWAN at the AWS Summit in Sydney around 2016 or so. Back in the early 2000s, I was playing with long-distance 802.11b, with cantennas (antennas made from large cans; an old commercial-sized coffee tin, if I recall), and at one stage had a 17-metre antenna on my father’s Osborne Park factory roof, with an Apple access point, powered via PoE, rigged at the top. I’ve done a bit of networking over the years (I hold the AWS Certified Advanced Networking – Specialty certification, and have contributed Items (questions) to it).

So now was the time to look at how to do this long-distance, low-power, low-throughput data transfer today.

Requirements

We want to get frequent (every second or two) GPS locations for between 5 and 20 boats. They’ll be travelling along the Swan River, mostly (with the occasional coastal regatta). We want a map showing the location of all boats, a trail of their recent positions, and their last known speed and direction. We’ll then display this map in the public spaces around the venue.

Hardware

Pretty quickly we zeroed in on the Dragino LGT-92 LoRaWAN GPS Tracker. It’s around AUD$100, and has a good battery life. It recharges via a micro-USB port. It can be adjusted via a TTL serial interface (for which I don’t yet have a device to chat with it).

Noticing that I was not covered by any The Things Network (TTN) public gateways in my area, I also purchased a RAK7246 LoRaWAN Developer Gateway at $225 delivered (IoT Store Perth). And having seen the data rates I’d like to use, I’m glad I have my own gateway.

Cloud

So how does the cloud come into this? Well, the gateway device is just one part of it; it’s effectively a data forwarder. There may be multiple gateways in my network to extend coverage; yes, they could be a mesh of devices, or they could be separately homed to the Internet. Each gateway registers against a LoRaWAN Network Server (LNS). It is the LNS that has the central configuration of gateways and end devices, and processes the data coming from them all.

I could deploy my own LNS, or I can use the AWS-managed version of it, and then trundle the data out to the application that I want to have consume it. At this point, that application is probably just DynamoDB, with items containing the device unique identifier, timestamp, latitude, longitude, battery level, and firmware revision. And thus, AWS IoT Core for LoRaWAN.
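As a rough sketch of the shape of that data (the table name and attribute names are my placeholders, not a finalised schema), a single position report might land in DynamoDB like this:

# Hypothetical table keyed on device EUI + timestamp; one item per position report
aws dynamodb put-item \
    --table-name BoatTracker \
    --item '{
      "DevEui":    {"S": "a8408881a182fb39"},
      "Timestamp": {"S": "2021-10-05T12:41:44Z"},
      "Latitude":  {"N": "-32.036"},
      "Longitude": {"N": "115.748"},
      "BatteryV":  {"N": "3.959"},
      "Firmware":  {"N": "154"}
    }'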

Getting started

As an initial overview (thanks to Greg Breen from AWS for the pointer), there is this YouTube video in which Ali Benfattoum describes putting these together. This video from December 2020 is now slightly out of date with the AWS Console (things move pretty quickly), but you can follow along easily enough.

The first thing I did was update the installed Raspbian. A new major release had come out, so an apt-get update && apt-get dist-upgrade was in order. Some CA certificates had expired in the chain of one of the repositories listed in /etc/apt/sources.list.d/, so a little bit of work was needed to get this sorted. A quick reboot (having updated the Raspbian OS), and I dutifully pulled in git as described in the above video, cloned the LoRa Basics Station (basicstation) source, and built it (make).

I found that the gateway device registered exactly as shown in the video, and showed up with no problems. However, my radio devices weren’t attaching. Well, it turns out there was a process running on the gateway for The Things Network, which had exclusive access to the local LoRa radio. So I stopped that process, repeated the exercise, and data flowed through. Knowing I didn’t want that TTN process to restart, I found its systemd config file in /etc/systemd/ and removed it (well, copied it away to my home directory).

The first hurdle

I rebooted the device overnight, and the next day went to restart the basicstation service from the command line. But no matter what, it couldn’t turn on the local LoRa radio.

I lucked upon a post that suggested the radio needed a GPIO pin reset, and that it was either pin 25 or 17 that would do the trick. Hence, I made this small script that I called reset_gw.sh:

#!/bin/sh
# Pulse GPIO 17 (high for 500us, then low) to reset the LoRa concentrator radio
gpioset --mode=time --usec=500 pinctrl-bcm2835 17=1
gpioset --mode=time --usec=500 pinctrl-bcm2835 17=0

I ran this, and then the radio reset! Browsing through posts, it appears that basicstation doesn’t initialise or reset the radio; I can only presume that the TTN daemon did, and when I initially killed it and fired up basicstation, the radio was already good to go. So the rule now is: reset the radio as part of the initialisation of basicstation; I found basicstation has support for a command-line argument to call the above script.

Given I want basicstation to start and connect on boot, it needed its own unit file in /etc/systemd/system/:

[Unit]
Description=basicStation

[Service]
WorkingDirectory=/home/pi
Environment=RADIODEV=/dev/spidev0.0
ExecStart=/home/pi/basicstation/build-rpi-std/bin/station --radio-init=/home/pi/reset_gw.sh
SyslogIdentifier=basicstation
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Note I also put a symlink to this in /etc/systemd/system/multi-user.target.wants/.

The other optimisation I did was to go into the WiFi settings for this little device, in /etc/wpa_supplicant/. I want to list a few networks (and preshared keys/passwords) that I want the device to just connect to. Hence my /etc/wpa_supplicant/wpa_supplicant.conf file now looks like:

ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1
country=AU

network={
        ssid="22A Home"
        psk="Password"
        priority=1
}
network={
        ssid="JEB Phone"
        psk="AnotherPasswoed"
        priority=2
}

No, that’s not the real password or ssid. But the JEB Phone one means that if I take the gateway on the road, I can power it up (USB) and then have it tethered to my mobile phone to backhaul the data.

The Data Flows

Following the above demo, I now have data showing up. I long pressed the one and only button on it for 5 seconds, and this is what ends up on the IoT Topic:

{
  "WirelessDeviceId": "46d524e5-88f6-8852-8886-81c3b8f38888",
  "PayloadData": "AAAAAAAAAABPd2Q=",
  "WirelessMetadata": {
    "LoRaWAN": {
      "ADR": true,
      "Bandwidth": 125,
      "ClassB": false,
      "CodeRate": "4/5",
      "DataRate": "0",
      "DevAddr": "01838d9f",
      "DevEui": "a8408881a182fb39",
      "FCnt": 5,
      "FOptLen": 1,
      "FPort": 2,
      "Frequency": "917800000",
      "Gateways": [
        {
          "GatewayEui": "b888ebfffe88f958",
          "Rssi": -48,
          "Snr": 11
        }
      ],
      "MIC": "600d5102",
      "MType": "UnconfirmedDataUp",
      "Major": "LoRaWANR1",
      "Modulation": "LORA",
      "PolarizationInversion": false,
      "SpreadingFactor": 12,
      "Timestamp": "2021-10-05T12:41:44Z"
    }
  }
}

That’s a lot of metadata for the payload of “AAAAAAAAAABPd2Q=“. That’s 11 bytes, and behold, it has data embedded in it. I used the following python to decode it:

#!/usr/bin/python3
import base64
import sys

# Decode the Base64 payload passed as the first command-line argument
v = base64.b64decode(sys.argv[1])

# Bytes 0-3: latitude, bytes 4-7: longitude (signed 32-bit, millionths of a degree)
lat_raw = v[0]<<24 | v[1]<<16 | v[2]<<8 | v[3]
long_raw = v[4]<<24 | v[5]<<16 | v[6]<<8 | v[7]
if (lat_raw >> 31):
  lat_parsed = (lat_raw - 2 ** 32) / 1000000
else:
  lat_parsed = lat_raw/1000000

if (long_raw >> 31):
  long_parsed = (long_raw - 2 ** 32) / 1000000
else:
  long_parsed = long_raw/1000000

# Byte 8: alarm flag (bit 6) plus the high bits of the battery voltage;
# byte 9: low bits of the battery voltage (millivolts)
alarm = (v[8] & 0x40) > 0
batV = ((v[8] & 0x3f)<<8 | v[9]) / 1000

# Byte 10: motion mode (top two bits) and firmware revision (low five bits, offset from 150)
motion = v[10]>>6
fw = 150 + (v[10] & 0x1f)

print("Lat: {}, Long: {}".format(lat_parsed, long_parsed))
print("Alarm: {}, Battery: {}, Motion mode:{}, Fw: {}".format(alarm, batV, motion, fw))

Running this with the payload as a parameter on the command line shows the result:

Lat: 0.0, Long: 0.0
Alarm: True, Battery: 3.959, Motion mode:1, Fw: 154

And that’s what we expected: the alarm button was pressed, and the documentation says that when this is the case, lat and long are set to zero in the initial packet sent in the alarm state.

And so

Now that I have a gateway I can move around, reboot, and have it uplink via home WiFi or a mobile phone tether, I can wander around and then put the tracker out there. What I’m up to next is to pull out that payload and push it to DynamoDB. Stay tuned for the next update….

Scalable, secure, static web sites in AWS

Hosting web content has been a mainstay of AWS for many years. From running your own virtual machine with your favourite web software, to load balancing web traffic, DNS from Route53 and CDN from CloudFront, it’s been one of the world’s preferred ways to publish content for over a decade.

But these days, it’s the Serverless suite of services that helps make this much cheaper, faster, more scalable, and repeatable. In this article, we’ll show how we can host a vast number of websites. We’ll also set up a series of security features to get as secure and available as possible, even though we’ll be allowing anonymous access.

In a future post, we’ll dive through setting up a complete CI/CD pipeline for the content of your websites, with Production and non-production URLs for workflow review and acceptance.

High Level Features

  1. No application servers to manage/patch/scale
  2. Highly scalable
  3. Globally available (cached)
  4. IPv4 and IPv6 (dual-stack)
  5. HTTP/2 (request multiplexing, and compressed request headers)
  6. Brotli compression, alongside gzip/deflate
  7. TLS 1.2 minimum; strong rating on SSLLabs.com
  8. Modern security headers: strong rating on securityheaders.com

Basic Architecture

The basic architecture of the content is:

  • An S3 bucket to host our S3 Access Logs (from the below content bucket) and the CloudFront Access Logs we will be making
  • An S3 Bucket to host the file (object) content
  • A CloudFront distribution, with an Origin Access Identity to securely fetch content from S3.
  • A TLS certificate, issued from Amazon Certificate Manager with DNS validation
  • DNS in Route53 (not strictly necessary, but it makes things easier if we have control of our own domain immediately, and we can handle CloudFront at the APEX of a DNS domain (ie, foo.com) with ALIAS records)

While there is a lot to configure here, there are no servers to administer, per se. This means the scaling, OS patching, and all other maintenance activities are managed – so we can get on with the content.

A Canonical URL

It is strongly recommended to have one hostname for your website. While you can have multiple names in a TLS certificate and serve the same content, you’ll get down-weighted in search engines for doing so, and it’s confusing to users.

In particular, you need to decide if the URL your users should get your content from is www.example.com, or just example.com. Choose one, and stick to it; the other should be a redirect if you need to (as a separate, almost empty, website). Indeed, there’s a CloudFront Function or Lambda@edge function you can write to do your redirects.

Don’t be tempted to use an S3 Bucket for your web redirections, as there’s a limit on the number of S3 Buckets you can have, and you can’t customise the TLS certificate or TLS profile (protocols, ciphers) on S3 website endpoints directly.

S3 Logging Bucket

This is the destination for all our logs. The key element is the automated retention (S3 lifecycle) policy – we want logs, but we don’t want them forever! Some key points:

  • S3 Versioning enabled
  • S3 Lifecycle policy: delete current objects after 365 days, and previous revisions after 7 days (just in case we have to undelete) – a CLI sketch of this rule follows this list
  • Default encryption, Amazon S3 master-key (SSE-S3)
  • Ironically, probably no server access logging for this Bucket; otherwise if we log server access to the same bucket, we end up with an infinite event trigger loop
  • Permissions: Block Public Access
  • Object ownership: Bucket Owner preferred
  • Permit CloudFront to log, using the canonical ID that AWS documents for CloudFront log delivery
  • Permit S3 logging for the Log Delivery permission
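For reference, a hedged CLI sketch of that lifecycle rule (the bucket name is a placeholder):

# Expire current log objects after 365 days, and previous versions after 7 days
aws s3api put-bucket-lifecycle-configuration \
    --bucket my-logging-bucket \
    --lifecycle-configuration '{
      "Rules": [{
        "ID": "expire-logs",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "Expiration": {"Days": 365},
        "NoncurrentVersionExpiration": {"NoncurrentDays": 7}
      }]
    }'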

S3 Content Bucket

Again we want to Block Public Access. While that may sound counter-intuitive for a public-facing anonymously accessible website, we do not want external visitors poking around in our S3 Bucket directly – they have to go via the CloudFront Distribution.

S3 does have a (legacy, IMHO) website hosting option, but it hasn’t traditionally given you access to have a custom TLS certificate with your own hostname, nor permitted you to restrict various compression and TLS options – that’s what CloudFront lets us customise.

The basic configuration of the Content S3 Bucket is:

  • S3 Versioning enabled (hey, it’s pretty much a standard pattern)
  • S3 Lifecycle Policy, to only delete Previous revisions after a period we’d use for undelete (7 days)
  • Default encryption, Amazon S3 master-key (SSE-S3)
  • Access logs sent to the above Logging Bucket, with a prefix of /S3/content-bucket-name/. Note: include the trailing slash in the prefix, otherwise you’ll have a horrible mess of object names
  • Permissions: Block Public Access (CloudFront Origin ID will take care of this)
  • We’ll come back later for the Bucket Policy…

ACM Certificate

The next component we need is a TLS certificate; it will need to already be available when we set up the CloudFront distribution.

ACM is pretty simple: tell it the name (or names) you want on a certificate, and then ensure the validation steps happen.

My preference is DNS validation: once the secret is stored in DNS, then subsequent re-issues of the certificate get automatically approved, and then automatically deployed.

Ideally, your website will have one, and only one, authoritative (canonical) DNS hostname. And for that service, you may want to have just one name in the certificate. It’s up to you if you want the name to be “www.domain.com”, or just “domain.com”. I would avoid having certificates with too many alternate names, as any one of those names having its DNS secret removed will block the re-issuance of your certificate.
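A hedged CLI sketch of requesting the certificate (the domain name and certificate ARN are placeholders; note that certificates for CloudFront must be issued in us-east-1):

# Request a DNS-validated certificate in us-east-1 (required for CloudFront)
aws acm request-certificate \
    --region us-east-1 \
    --domain-name www.example.com \
    --validation-method DNS

# Retrieve the CNAME record that ACM wants you to create for validation
aws acm describe-certificate \
    --region us-east-1 \
    --certificate-arn arn:aws:acm:us-east-1:111122223333:certificate/EXAMPLE \
    --query 'Certificate.DomainValidationOptions[0].ResourceRecord'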

Lambda@Edge

There are two major functions for which we’ll use Lambda@Edge: one to transform some incoming requests, and one to inject some additional HTTP headers into the response.

All Lambda@Edge functions need to be created in us-east-1; and the CloudFront service needs access to invoke them.

Handling the default document in sub-prefixes

CloudFront, as a CDN, has the concept of a default object: a file name that will be fetched when no filename is supplied. Historically (as in, before IIS existed), this was index.html (if you’re wondering how index.htm came about, then you probably don’t recall Microsoft DOS and Windows with their 8.3 filename limits). However, this configuration setting only applies to one request URL: the root object, or “/”. It does not cater for “subdirectories” or folders, which is often not what we want; in that case, when a path of “/foo/” is requested, we want to update the request that will hit the origin (S3) to “/foo/index.html”, and mask the fact we’ve done this.

As of May 2021, CloudFront also has a new code execution service, CloudFront Functions. This would be suitable for this purpose as well.

Here’s a simple Node.js function to achieve this:

const path = require('path');

exports.handler = (event, context, callback) => {
  const { request } = event.Records[0].cf;
  const url = request.uri;

  // Requests for real files (anything with an extension) pass through untouched
  const extension = path.extname(url);
  if (extension && extension.length > 0) {
    return callback(null, request);
  }

  // "Directory" requests ending in "/" are rewritten to the default document,
  // masking the fact that S3 is actually serving /foo/index.html
  const last_character = url.slice(-1);
  if (last_character === "/") {
    request.uri = `${url}index.html`;
    return callback(null, request);
  }

  // Extensionless paths without a trailing slash get redirected to the "/" form
  const new_url = `${url}/`;
  console.log(`Rewriting ${url} to ${new_url}...`);
  const redirect = {
    status: '301',
    statusDescription: 'Moved Permanently',
    headers: {
      location: [{ key: 'Location', value: new_url }],
    },
  };
  return callback(null, redirect);
};

Injecting HTTP Security Headers

The second function we want will inject additional HTTP headers to help web clients (browsers) enforce stricter security. There’s a set of headers that do this, some of which need customising to your site and code:

'use strict';

exports.handler = (event, context, callback) => {
  function add(h, k, v) {
    h[k.toLowerCase()] = [ { key: k, value: v } ];
  }

  const response = event.Records[0].cf.response;
  const requestUri = event.Records[0].cf.request.uri;
  const headers = response.headers;

  add(headers, 'Strict-Transport-Security', "max-age=31536000; includeSubdomains; preload");
  add(headers, 'Content-Security-Policy', "default-src 'self'; img-src 'self' data: ; script-src 'self' 'unsafe-inline' 'unsafe-eval' ; style-src 'self' 'unsafe-inline'; object-src 'none'; frame-src 'self' ; connect-src 'self' ; frame-ancestors 'none' ; font-src 'self'; base-uri 'self'; manifest-src 'self'; prefetch-src 'self' ; form-action 'self' ;");
  add(headers, 'X-Content-Type-Options', "nosniff");
  add(headers, 'X-Frame-Options', "DENY");
  add(headers, 'Referrer-Policy', "same-origin");
  add(headers, 'Expect-CT', "enforce, max-age=7257600");
  add(headers, 'Permissions-Policy', "geolocation=(), midi=(), notifications=(), push=(), sync-xhr=(self), microphone=(), camera=(), magnetometer=(), gyroscope=(), speaker=(), vibrate=(), fullscreen=(), payment=(), autoplay=(self)");
  delete headers['server'];

  if (requestUri.startsWith('/assets/')) {
    add(headers, 'Cache-Control', 'max-age=15552000');
  } else if (requestUri.endsWith(".jpg")) {
    add(headers, 'Cache-Control', 'max-age=1209600');
  } else if (requestUri.endsWith('.html')) {
    add(headers, 'Cache-Control', 'max-age=43200');
  }

  callback(null, response);
};

The exact headers that are recommended change over time, as the capabilities of commonly deployed (and updated) browsers change.

The most important header is HSTS, or HTTP Strict Transport Security, which informs clients that your service on this hostname should always (for the time period specified) be considered HTTPS-only.

Next on my list of security headers is the Permissions Policy, formerly the Feature Policy. This administratively disables some capability that browsers can surface to web applications, such as the ability to fetch fine-grained location or use a device’s camera. Typically we don’t want any of this, and we probably wouldn’t want any introduced JavaScript (possibly coming from a 3rd party site) to try this.

The most specific header, which truly needs customising to your site’s content and structure, is the Content Security Policy, or CSP. This permits you to express in great detail the permitted sources for content to be loaded from, as well as where your content can be embedded into (as iframe content in another page), or what it can embed (as iframe content within your page).

As of May 2021, CloudFront also has a new code execution service, CloudFront Functions. However, such a function would have to be executed every time an object is served to a client, as at this time CloudFront Functions cannot hook into the request life cycle at the Origin Response phase. The difference is important: with Lambda@Edge, these static headers can be generated once, attached to a cached object, and then served an infinite number of times.

CloudFront Origin Identity & S3 Content Bucket Policy

An Origin Access Identity is a way to permit CloudFront edge locations to make authenticated calls against an S3 Bucket, using credentials that are fully managed, dynamic, and secure.

An Origin Access Identity has one attribute, a “comment”, which we’ll call “Website-Bucket-Access”. In response, we’ll get an ID.
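If you prefer the CLI over the console, a hedged sketch (the caller reference is just any unique string of your choosing):

# Create the Origin Access Identity; note the Id and S3CanonicalUserId returned
aws cloudfront create-cloud-front-origin-access-identity \
    --cloud-front-origin-access-identity-config \
    CallerReference=website-bucket-access-2021,Comment=Website-Bucket-Access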

We can now go back to the S3 console, and update our Content Bucket with a Policy that permits this ID to be able to Get objects (it only needs Get, not List, Put or anything else).

{
  "Version": "2008-10-17",
  "Id": "PolicyForCloudFrontPrivateContent",
  "Statement": [
    {
      "Sid": "1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E2VOSAJS533EMJ"
      },
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::my-bucket-for-websites/*"
    }
  ]
}

CloudFront Distributions

Each web site requires its own CloudFront distribution, with a unique origin path within our S3 Content Bucket that it will fetch its content from. It will also have a unique Hostname, and a TLS certificate.

In order to facilitate some testing, we’re going to define two Distributions per web site we want: one for Production, and one for Testing. That way we can push content to the first S3 Bucket, ensure that it is correct, and then duplicate this to the second (production) location.

To make this easier, we’re going to use the following prefixes in our Content S3 Bucket:

  • /testing/${sitename}
  • /production/${sitename}

For the two distributions, we’ll create one for test.sitename, and the production one with just the target sitename we’re after.

In this case, we’re using the same AWS account for both the non-production and production URLs; we could also split this into separate AWS accounts (and thus duplicate a separate S3 bucket to hold content, etc). We can also add additional phases: development, testing, UAT, production. One deciding factor is how big a team is working on this: if it is just one individual, two levels (testing, production) is probably enough; if a separate team will review and approve, then you probably need an additional development environment to keep working while a test team reviews the last push.

Here’s the high level configuration of the CloudFront distribution configuration:

  • Enable all locations – we want this to be fast everywhere.
  • Enable HTTP/2 – this compresses the headers in the request, and permits multiplexing of multiple requests over the one TCP connection
  • Enable IPv6 as well as IPv4 – significant traffic still silently falls back to IPv4, but the deployment is easy, fast, and doesn’t cost anything. Note that you need to create both an A record in DNS and an AAAA record (ALIAS in Route53) for this; just ticking the IPv6 option here (or in the template) does not make this work by itself.
  • For the default behaviour, set up a Viewer Request handler for the default-document rewrite Lambda (in us-east-1), and the security header injection on Origin Response.
  • Set logging to the S3 log bucket, in a prefix of “CloudFront/${environment}/${sitename}”
  • Enable compression
  • Redirect all requests to HTTPS; one day in a few years’ time this won’t be necessary, but for now….
  • Only permit GET and HEAD operations
  • Set the Alternate Domain name to the one in your ACM certificate, and assign the ACM certificate to this distribution

Template the steps

In order to make this as efficient as possible, and support maintenance in a scalable way, we’re going to template these. Let’s start with these template ideas:

Shared Templates (only one instantiation)

  • CloudFront Origin Identity – used by all CloudFront distributions to securely access the S3 Bucket holding the content
  • Lambda@Edge Default Document Function, to map prefixes to a default document under each prefix.
  • Lambda CloudFront Invalidate (flush) function (so we can test updates quickly) – very useful with CI/CD pipelines! A one-line CLI equivalent is sketched after this list
  • Logging S3 Bucket
  • Content S3 Bucket
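For reference, the one-off CLI equivalent of an invalidation (the distribution ID is a placeholder):

# Flush everything in a distribution so fresh content is fetched from S3
aws cloudfront create-invalidation \
    --distribution-id E2EXAMPLE12345 \
    --paths "/*"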

Templates per distribution (per web site)

  • Lambda@Edge Security Headers; with unique values per site, to fit security like a glove around the content
  • ACM certificate
  • CloudFront distribution (depends on the above two)

Download templates

These may need some customisation, but are a reasonable start:

Summary

Now that you have a way to deploy a number of websites, it’s worth looking at the costs and administration overhead.

Bandwidth is always a cost no matter what the rate is, so optimising your service to reduce the size of downloads is key; not only will costs decrease, but it’s also going to make your service ever so slightly faster.

Serving images in current-generation formats, such as webp (instead of jpeg) may give an improvement; but you need to be confident that only modern clients are using your service. Of course, if you’re restricting TLS protocols for security requirements, then you probably already have mostly modern clients!

Even if you can’t use contemporary image formats, you can ensure that images are served at the resolution they are actually used at in the browser; we’ve seen people take a 2 MB phone photo, thousands of pixels wide and high, only to display it at a width and height of 50 pixels! If nothing else, ensure your compression of JPEGs is reasonable (you probably have a default of 90% quality, when 60% may do).

You should now test your public-facing services with SSLLabs.com/ssltest/, SecurityHeaders.com, and Hardenize.com. You may also want to hook up browser reporting with report-uri.com as well.

Next steps

In a subsequent post, we’ll look at having Production, UAT and Development copies of our sites, as well as using CodeCommit to store the content, and CodePipeline to check it out into the various environments.

UniFi: Should I wait for the next Dream Machine Pro?

I switched to a 1 Gb/s NBN connection a few months back, but it soon became apparent that the original UniFi Security Gateway (USG) is no match for a 1 Gb/s link.

While I love the WiFi access points, the management interface, and the rapid firmware updates, the throughput limitations of the USG only became noticeable when the link speed went up. Ubiquiti Networks, the manufacturer of the UniFi range, has released faster products – throughput being the major selling point. And of course, the pricing goes up accordingly.

But thinking about the current top-of-the-line device, the Dream Machine Pro, gives me some pause for consideration.

The device has two “WAN links”: one is an RJ45 gigabit Ethernet port, the other is a 10 Gb/s SFP+ slot for a fibre module. I’d love to have a fail-over Internet connection, but the fibre connection isn’t an advantage to me.

Ubiquiti is not selling its LTE fail-over device in Australia. I’d have to drop the 10 Gb/s SFP port back to a vanilla 1 Gb/s RJ45 copper port to plug into an alternate LTE device. But then again, carrier plans for this pattern are expensive.

However, it could be that I have two RJ45 Internet connections; my NBN connection affords me up to four ISP connections on the fibre-to-the-premises service that I have available. Now, the upstream link from my CPE to the Point of Interconnect (PoI) may be limited to 1 Gb/s, but having the ability to fail over to another ISP may be useful. Or I may want to route traffic by port or service to a different link (e.g., VPN traffic over Link #2, or web traffic over Link #2, or perhaps just streaming video from some specific providers over the alternate link).

The Dream Machine Pro also has a built-in 8-port switch (RJ45), but none of the ports are Power over Ethernet (PoE). After all, the majority of links going into this are going to be WiFi access points, and having to hang a separate 8-port PoE switch off it seems a waste. A long tail of customers would find built-in PoE fills their needs without having additional switches to worry about.

I would also have expected more ports here, given the cost of the device: say 16 ports, even if only half of them were PoE.

The inclusion of Protect for video cameras is a neat idea, but having two local disks to RAID together would be nice. I have shied away from on-premises storage, but for large volumes of video, I still like having the highest-bandwidth version not traversing the WAN. So it’s great we have one disk option available, but it could be so much more awesome if we just had some local resilience.

Of course, if Japan has residential 2 Gb/s Internet connections, would this device still be usable? I’m guessing Australia will max out at 1 Gb/s for a while…

So, I’m trying to decide whether to dive in for the current Dream Machine Pro, or wait until it’s tweaked…..