The case for NEL

For the last few years I’ve been talking about some of the security changes that have happened in the web/browser/PKI landscape, many of which have happened quietly and may have been missed by those not paying attention.

One very interesting element has been a modern browser capability called Network Error Logging (NEL). It’s significant because it fixes a long-standing problem on the web: errors may happen on the client side, and the server side hears nothing about them.

You can read the NEL spec here.

Adopting NEL is another tool in your DevOps armoury, to drive operational excellence and continuous improvement of your deployed web applications, helping to retain customers and increase business value.

Essentially, this is a simple HTTP header string that can be set, and browsers that recognise it will cache that policy for a period (that you specify). If the browser then has any of a whole set of operational issues while using your web site, it has an opportunity to submit a report to your error logging endpoint (or cache it to submit later).
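To make this concrete, here is a minimal sketch (in Python, to match the endpoint code later in this post) of the two response headers involved: Report-To defines the reporting endpoint group, and NEL tells the browser to send network error reports to that group. The endpoint URL and the one-day max_age are just example values.

import json

# A sketch of the NEL-related response headers a server could emit. The
# endpoint URL, group name and max_age below are illustrative values only.
def nel_headers(endpoint="https://nel.example.com/", max_age=86400):
    report_to = {
        "group": "default",
        "max_age": max_age,
        "endpoints": [{"url": endpoint}],
    }
    nel_policy = {
        "report_to": "default",
        "max_age": max_age,
        "include_subdomains": True,
        "failure_fraction": 1.0,  # report every failure; tune this down in production
    }
    return {
        "Report-To": json.dumps(report_to),
        "NEL": json.dumps(nel_policy),
    }

print(nel_headers())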

Prior to this, you would have to rely on the generosity of users reporting issues for you. The chance of a good Samaritan (white hat) getting through a customer support line to report issues is… small! Try calling your local grocery store to tell them they have a JavaScript error. Indeed, for this very problem we have RFC 9116 and security.txt.

Having your users report your bad news is kind of like this scene from the 1995 James Bond film GoldenEye:

“Unlike the American Government we prefer not to get our bad news from CNN” – M, GoldenEye

So, what’s the use case for Network Error Logging? I’ll split this into four scenarios:

  1. Developer environments
  2. Testing environments
  3. Non-production UAT environments
  4. Production Environments

Developer Environments

Developers love to code, will produce vast amounts of it, and will have it work in their own browser. They’ll also respond to errors in their own environments. But without NEL error logs, they may miss the critical point at which a bug occurs when something fails to render in the browser.

NEL gives developers the visibility they otherwise lack when they hit issues; without it, they need screen captures from the browser session (which are still useful, particularly if the screen capture includes the complete time (including seconds) and the system is NTP synchronised).

With stricter requirements coming (such as Content Security Policies being mandated on payment processing pages by the PCI DSS version 4 draft), the sooner you can give developers visibility of why operations have failed, the more likely the software will succeed when the project makes it to a higher environment.

So, developers should have a non-production NEL endpoint, just to collect logs, so they can review and sort them, and effect change. It’s not likely to be high-volume reporting here – it’s just your own development team using it, and old reports quickly become worthless (apart from identifying regressions).

Testing Environments

Like developers, Test Engineers are trying to gather evidence of failures to feed into trouble-ticketing systems. A NEL endpoint gives Testers this evidence. Again the volume of reporting may be pretty low, but the value of that reporting will help stop errors reaching higher environments.

Non-Production UAT Environments

This is your last chance to ensure that the next release into production is not hitting silly issues. The goal here is to make the volume of NEL reports approach zero, and any that come in are likely to be blockers. Depending on the size of your UAT validation, the volume will still be low.

Production Environments

This is where NEL becomes even more important. Your security teams and your operational teams both need real-time visibility of reporting, as the evidence here could be indicative of active attempts to subvert the browser. Of course the volume of reports could also be much larger, so be prepared to tune the sampled fraction of reports (the failure_fraction and success_fraction fields of the NEL policy) to balance that volume. It may also be worth using a commercial NEL endpoint provider for this environment.
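For example, a production NEL policy might sample only a small fraction of failures; this Python sketch just prints an illustrative header value (the 5% figure is not a recommendation, merely an example):

import json

# Illustrative production-leaning NEL policy: report 5% of failures and no
# successes, to keep report volume manageable.
production_nel = {
    "report_to": "default",
    "max_age": 86400,
    "include_subdomains": True,
    "failure_fraction": 0.05,
    "success_fraction": 0.0,
}
print("NEL: " + json.dumps(production_nel))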

Running your own NEL Endpoint

There is nothing stopping you from running your own NEL endpoint, and this is particularly useful in low-volume, non-production scenarios. It’s relatively simple; you just need to think about your rollout:

  • Let every project define their own NEL endpoints, perhaps one per environment, and let them collect and process their own reports
  • Provide a single central company-wide non-production NEL endpoint, but then give all developers access to the reports
  • Somewhere in the middle of these above two options?

Of course, none of these are One Way Doors. You can always adjust your NEL endpoints by just updating the NEL headers you have set on your applications. If you don’t know how to adjust HTTP header strings on your applications and set arbitrary values, then you already have a bigger issue in that you don’t know what you’re doing in IT, so please get out of the way and allow those who do know to get stuff done!

Your NEL endpoint should be defined as a HOSTNAME, with a valid (trusted) TLS certificate, listening for an HTTP POST over HTTPS. It receives a simple JSON payload that you will want to store somewhere. You have a few choices as to what to do with this data when submitted:

  1. Reject it outright. Maybe someone is probing you, or submitting malicious reports (malformed, buffer overflow attempt, SQL injection attempt, etc)? But these events themselves may be of interest…
  2. Store it:
    • In a relational database
    • In a No-SQL database
    • In something like ElasticSearch
  3. Relay it:
    • Via email to a Distribution List or alias
    • Integration to another application or SIEM

My current preference is to use a No-SQL store, with a set retention period on each report.

NEL in AWS API Gateway, Lambda and DynamoDB

The architecture for this is very simple:

Simple NEL endpoint with API Gateway, Lambda and DynamoDB

We have one table which controls access for reports to be submitted; it holds config items that are persistent. I like to have a config for banned IPs (so I can block malicious actors), and perhaps banned DNS domains in the NEL report content. Alternatively, I may have an allow list of DNS domains (possibly with wildcards, such as *.example.com).

My Lambda function will get this content, then evaluate the source IP address of the report and the target DNS domain in the report, and work out if it’s going to store it in the Reports table.

When inserting the JSON into the report table, I’m also going to record:

  • The current time (UTC)
  • The source address the report came from

Here’s an example report that’s been processed into the table:

This is what the report body looks like (in this case a CSP violation report, which the same endpoint also receives):

{
   "csp-report":{
      "document-uri":"https://www.example.com/misc/video.html",
      "referrer":"",
      "violated-directive":"default-src 'none'",
      "effective-directive":"child-src",
      "original-policy":"default-src 'none'; img-src 'self' data: https://docs.aws.amazon.com https://api.mapbox.com/  https://unpkg.com/leaflet@1.7.1/dist/images/ ; font-src data: ; script-src 'self' blob: https://cdnjs.cloudflare.com/ajax/libs/video.js/ https://ajax.googleapis.com https://cdnhs.cloudflare.com https://sdk.amazonaws.com/js/ https://player.live-video.net/1.4.1/amazon-ivs-player.min.js https://player.live-video.net/1.4.1/ https://unpkg.com/leaflet@1.7.1/dist/leaflet.js https://unpkg.com/leaflet.boatmarker/leaflet.boatmarker.min.js  https://unpkg.com/leaflet.marker.slideto@0.2.0/ 'unsafe-inline';  style-src 'self' https://cdnjs.cloudflare.com/ajax/libs/video.js/ https://unpkg.com/leaflet@1.7.1/dist/leaflet.css ; frame-ancestors 'none'; form-action 'self'; media-src blog:; connect-src 'self' *.live-video.net wss://e2kww8wsne.execute-api.ap-southeast-2.amazonaws.com/production wss://boat-data.example.com ; object-src: self ; base-uri 'self'; report-to default; report-uri https://nel.example.com/",
      "blocked-uri":"blob",
      "status-code":0,
      "source-file":"https://player.live-video.net",
      "line-number":6,
      "column-number":63072
   }
}

And here is the Lambda code that is validating the reports:

import json
import boto3
import datetime
import ipaddress
import re
import uuid
from urllib.parse import urlparse
from boto3.dynamodb.conditions import Key

# DynamoDB tables used below; these table names are placeholders - point them
# at the config and report tables in your own deployment.
dynamodb = boto3.resource('dynamodb')
config_table = dynamodb.Table('NELConfig')
report_table = dynamodb.Table('NELReports')

def lambda_handler(event, context):
    address = report_src_addr(event)
    if address is not False:
        if report_ip_banned(address) or not report_ip_permitted(address):
            return {
                'statusCode': 403,
                'body': json.dumps({ "Status": "rejected", "Message": "Report was rejected from IP address {}".format(address)})
            }
    if not report_hostname_permitted(event):
        return {
            'statusCode': 403,
            'body': json.dumps({ "Status": "rejected", "Message": "Reports for subject not allowed"})
        }
    report_uuid = save_report(event)
    if not report_uuid:
        return {
            'statusCode': 403,
            'body': json.dumps({ "Status": "rejected"})
            }
    return {
        'statusCode': 200,
        'body': json.dumps({ "Status": "accepted", "ReportID": report_uuid})
    }


def save_report(event):
    report_uuid =  uuid.uuid4().hex
    client_ip_str = str(report_src_addr(event))
    print("Saving report {} for IP {}".format(report_uuid, client_ip_str))
    response = report_table.put_item(
        Item={
            "ReportID": report_uuid,
            "Body": event['body'],
            "ReportTime": str(datetime.datetime.utcnow()),
            "ClientIp": client_ip_str
            }
        )
    if response['ResponseMetadata']['HTTPStatusCode'] == 200:
        return report_uuid
    return False


def report_ip_banned(address):
    fe = Key('ConfigName').eq("BannedIPs")
    response = config_table.scan(FilterExpression=fe)
    if 'Items' not in response.keys():
        print("No items in Banned IPs")
        return False
    if len(response['Items']) != 1:
        print("Found {} Items for BannedIPs in config".format(len(response['Items'])))
        return False
    if 'IPs' not in response['Items'][0].keys():
        print("No IPs in first item")
        return False
    ip_networks = []
    for banned in response['Items'][0]['IPs']:
        try:
            #print("Checking if we're in {}".format(banned))
            ip_networks.append(ipaddress.ip_network(banned))
        except Exception as e:
            print("*** EXCEPTION")
            print(e)
            return False
    for banned in ip_networks:
        if address.version == banned.version:
            if address in banned:
                print("{} is banned (in {})".format(address, banned))
                return True
    #print("Address {} is not banned!".format(address))
    return False


def report_ip_permitted(address):
    fe = Key('ConfigName').eq("PermittedIPs")
    response = config_table.scan(FilterExpression=fe)
    if len(response['Items']) == 0:
        return True
    if len(response['Items']) != 1:
        print("Found {} Items for PermittedIPs in config".format(len(response['Items'])))
        return False
    if 'IPs' not in response['Items'][0].keys():
        print("IPs not found in permitted list DDB response")
        return False
    ip_networks = []
    for permitted in response['Items'][0]['IPs']:
        try:
            ip_networks.append(ipaddress.ip_network(permitted, strict=False))
        except Exception as e:
            print("permit: *** EXCEPTION")
            print(e)
            return False
    for permitted in ip_networks:
        if address.version == permitted.version:
            if address in permitted:
                print("permit: Address {} is permitted".format(address))
                return True
    print("Address {} not permitted?".format(address))
    return False

def report_hostname_permitted(event):
    if 'body' not in event.keys():
        print("No body")
        return False
    if 'httpMethod' not in event.keys():
        print("No method")
        return False
    elif event['httpMethod'].lower() != 'post':
        print("Method is {}".format(event['httpMethod']))
        return False
    if len(event['body']) > 1024 * 100:
        print("Body too large")
        return False
    try:
        reports = json.loads(event['body'])
    except Exception as e:
        print(e)
        return False

    # Reports may arrive as a JSON array (Reporting API) or as a single JSON
    # object (e.g. a CSP report-uri submission); normalise to a list
    if isinstance(reports, dict):
        reports = [reports]

    for report in reports:
        if 'url' not in report.keys():
            # No target URL in this report (e.g. a CSP report): nothing to match against
            continue
        url = urlparse(report['url'])
        fe = Key('ConfigName').eq("BannedServerHostnames")
        response = config_table.scan(FilterExpression=fe)
        if len(response['Items']) == 0:
            print("No BannedServerHostnames")
            return True
        for item in response['Items']:
            if 'Hostname' not in item.keys():
                continue
            for expression in item['Hostname']:
                match = re.search(expression + "$", url.netloc)
                if match:
                    print("Rejecting {} as it matched on {}".format(url.netloc, expression))
                    return False
    return True


def report_src_addr(event):
    try:
        addr = ipaddress.ip_address(event['requestContext']['identity']['sourceIp'])
    except Exception as e:
        print(e)
        return False
    #print("Address is {}".format(addr))
    return addr


def parse_X_Forwarded_For(event):
    if 'headers' not in event.keys():
        return False
    if 'X-Forwarded-For' not in event['headers'].keys():
        return False
    address_strings = [x.strip() for x in event['headers']['X-Forwarded-For'].split(',')]
    addresses = []
    for address in address_strings:
        try:
            new_addr = ipaddress.ip_address(address)
            if new_addr.is_loopback or new_addr.is_private:
                print("X-Forwarded-For {} is loopback/private".format(new_addr))
            else:
                addresses.append(new_addr)
        except Exception as e:
            print(e)
            return False
    return addresses

You’ll note that I have a limit on the size of a NEL report – 100KB of JSON is more than enough. I’m also handling CIDR notation for blocking (e.g., 130.0.0.0/16).

Operational Focus

Clearly, to use this you’ll want to push the Lambda function into a repeatable template, along with the API Gateway and the DynamoDB tables.

You may also want to put a Time To Live (TTL) attribute on the Item being submitted in the save_report() function – perhaps the current time (Unix time) plus a number of seconds to retain (say, a month) – and configure TTL expiry on the DynamoDB table.
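As a sketch, the extra attribute could be computed like this (the ExpiresAt attribute name and 30-day retention are just examples; enable TTL on whichever attribute you choose):

import time

# Returns a Unix-epoch expiry timestamp suitable for a DynamoDB TTL attribute.
def ttl_timestamp(retention_days=30):
    return int(time.time()) + retention_days * 24 * 60 * 60

# In save_report(), add it to the Item, e.g.:
#     "ExpiresAt": ttl_timestamp(),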

You may also want to generate some custom CloudWatch metrics, based upon the data being submitted; perhaps per hostname or environment, to get metrics on the rate of errors being reported.
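As a sketch, assuming a hypothetical "NEL" namespace and a per-hostname dimension, the Lambda function could emit a count for each accepted report:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Emit a simple per-hostname counter metric; the namespace, metric name and
# dimension here are illustrative choices, not an established convention.
def record_report_metric(hostname):
    cloudwatch.put_metric_data(
        Namespace="NEL",
        MetricData=[{
            "MetricName": "ReportsReceived",
            "Dimensions": [{"Name": "Hostname", "Value": hostname}],
            "Value": 1,
            "Unit": "Count",
        }],
    )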

Summary

Hopefully the above is enough to help you understand NEL and start capturing these reports from your web clients; in a production environment you may want to look at report-uri.com, while in non-production you may want to roll your own as above.

Australian National Cyber Security Conference, Melbourne 2022

This year I put forward my first proposal to speak at the Australian National Cyber Security Conference, organised by the Australian Information Security Association (AISA) of which I have been a member for around 5 years.

I have previously spoken at the AISA local Perth branch conference, and figured that there was a lack of content around my area of interest, web security (something I have spoken about at other conferences in the past, and have been teaching students and colleagues since 2014).

I was thrilled to be selected, based on merit (and not sponsorship), to present.

Damien Manuel, AISA Chair and Adjunct Professor at Deakin University opening CyberConf 2022

Held at the Melbourne Convention and Exhibition Centre, spanning three floors, there were just shy of 400 speakers, and over 4,000 attendees.

MCEC Main Auditorium, with 5000 seat capacity, with delegates starting to file in…

It’s a big venue, and there were at times some 15 simultaneous breakout streams running over the three days of the conference, along with a large exhibitor hall. The catering budget alone for the event was in excess of AU$1M.

James Bromberger, listening to the opening presentations and keynote at CyberConf 2022

We started with a word from Clare O’Neil, the federal minister presenting via pre-recorded video:

Clare O’Neil

This was followed by Dylan Alcott giving a no-holds-barred, authentic blast of his personality: how he sees himself, his challenges, and his opportunities:

Dylan Alcott at CyberConference 2022

Later in the day came The Woz, here speaking with conference host Juanita Phillips:

Steve Wozniak (The Woz), Apple Co-Founder, and Juanita Phillips

Steve came across as a genuine engineer, taking joy in the machines he could build with the chipsets he played with. It was heartening to hear his desire to avoid conflict and disappointment, and to focus on achievement and joy.

Next up was Juliette Wilcox CMG, Cyber Security Ambassador for UK Defence and Security Exports at the Department for International Trade, UK Government.

Juliette Wilcox, UK Government

Juliette spoke well about the importance of strong cybersecurity, sharing advances, and having reliable systems to ensure that trade and economics could proceed smoothly.

The Hon Julie Bishop.

Next up was Julie Bishop, who also spoke about the importance of strong cyber security in our digital systems and the reliance on these systems for international trade and relations.

Julie Bishop

Next up was environmental advocate (not activist), Erin Brockovich.

Erin Brockovich

Erin spoke of her stick-to-it-iveness, her determination to right a wrong, and managing conflict. She rejects the title of Environmental Activist, as it’s deemed too negative, preferring to be an advocate for the environment.

Dr Vyom Sharma

Next was Dr Vyom Sharma, talking about managing stress: Workload, Reward, Fairness, Autonomy, Community and Values are all factors in stress that can lead to burnout.

The Hon Matt Thistlethwaite, MP

A surprise addition to the line-up was Matt Thistlethwaite, who spoke about the Department of Defence programs on Critical Infrastructure and outreach via the ACSC and its programs.

Paula Januszkiewicz

Finally a pentester got to the stage – Paula Januszkiewicz – who proceeded to punch holes through Windows Server processes and WMI, demonstrating live to the audience the risks of misconfigured and under-configured systems.

And then, we came to Capt Sully Sullenberger:

Capt Sully Sullenberger

Capt Sully was the calmest person on stage. He spoke about being passionate about what you love, and becoming a master of it. He says he’s loved two aircraft: an old DC, and the Airbus he was in when he encountered the bird strike in 2009 on flight 1549 out of New York. His passion meant that he had internalised the entire manual, knew which pages he would be turning to, and knew what the first few actions would be before any manual was opened.

He spoke of his roles and activities since 2009, working with aviation safety, and the improving record on US domestic flights (no deaths since 2009).

The Awards Dinner

As a speaker, I had a ticket to the awards gala dinner.

AISA Awards Gala Dinner, Crown Towers

It was great to see my local North Metro TAFE pick up one award, and Chris Bolan and friends at Seamless Intelligence pick up another. Congratulations to all the nominees and the winners.

A few sessions of note

I quite liked the presentation on Cyber Asset Attack Surface Management, new in the Gartner graphs of wonder from July 2021. At its core, it’s about having more visibility of all the assets, including those SaaS apps that staff sign up for; at its most basic, it can be just a spreadsheet of what’s in use.

Next up was a session on the Ukrainian power grid attacks of 2015:

This involved a remote access tool, whereby engineers would see their mouse cursors moving and keystrokes being entered, and then custom firmware was flashed onto PLCs, turning the lights out for three regions of Ukraine. Power company staff had to drive to the remote substations to physically turn power back on, as all remote operation had been lost.

The company had firewalls and VPN services in place, but clearly not strict and restricted enough to block this behaviour – let alone network segregation (air-gapping).

Of course, my session:

James Bromberger, about to go in and present
James Bromberger presenting at CyberConf 2022, thanks to Kelly Taylor

Another session (no pics) spoke about securing domains (something I look to tools like Ivan Ristic’s hardenize.com for). A new (minor) DNS record is the BIMI record, which indicates the marketing icon (a square SVG) to be displayed to users for authenticated mail from your domain. Personally I see that as just another record that a typo-squatting domain could copy and use as well, so it won’t actually elevate security, but it was a new one for me.

But my highlight was meeting Cricket Liu, the author of the original DNS & Bind O’Reilly book.

James Bromberger & Cricket Liu

Cricket spoke about the 30 years that have passed since then, and the more recent use of Response Policy Zones (RPZ) in DNS to provide blocking and logging of DNS queries for malicious domains – including generated domains that are registered and activated at particular times to be Command & Control services for botnets. With BIND (and alternatively products from his company) you can easily share the policies to block these services, IMHO akin to the capability now in Amazon GuardDuty and the Route 53 Resolver DNS Firewall. We also spoke about DNS over HTTPS, DNSSEC, and more.

Of course, I wished I had a mug for the occasion.

But this discussion was by far and away the best of the conference for me. DNS is such a critical piece of our network engineering, and in so many environments it’s set up, works, and is then ignored – despite the fact that it is feasible to exfiltrate data (20 bytes at a time) over DNS, probably across millions of requests, and that will likely be invisible to most network operators.

Optus Breach Sept 2022: Drivers Licence Western Australia and DoT WA

Optus (part of Singtel) was breached due to poor development practices in September 2022.

UPDATE 28/Sept/2022: The Premier has announced that new drivers licences will be available, with new IDs. 

It appears the team implementing their APIs did not have the skills to apply authentication, firewalling, rate limiting, alerting, and/or simulated data in non-production environments. It appears the management for this team did not know or enforce these protections either. And it appears the upper management did not check that lower management was taking necessary precautions and standards when handling PII.

There’s going to be some implications for this. Perhaps better engineering will be one of them.

I’m in the breach data as an Optus customer, and after a few days of news items, I received a confirmation email from Optus.

I’ve seen that in NSW, the digital-savvy minister Victor Dominello is already discussing re-issuing drivers licences in NSW. I thought I’d call the Western Australian Department of Transport and see what they are doing.

It’s been a public holiday Monday this week, so on Tuesday after 55 minutes in a queue, I got through to someone at DoT. Of course, to authenticate me on the phone they asked for the same information as shown in the data breach.

I learned:

  • DoT WA are not re-issuing licences at this stage
  • the ID number of the licence cannot currently be changed – it is perpetual
  • if they were to re-issue them with the same ID but a new expiry date, it would be on the same day and month, but 5 years later; so for any attacker trying combinations, the correct expiry date is the one in the breach plus one, two, three, four or five years.

The WA Department of Transport needs to look at this issue and fix a few items: The ID number issued to the public should be temporary and rotating for every issuance. I suspect there’s a few databases with this public number as a primary key. Perhaps the expiry date will need to be investigated to have 5 years +/- 30 days or so, and every re-issue should include the same variance. Indeed, perhaps reduce the lifetime from 5 years to two years to force rotation of the ID number, or let customers pay for the number of days they would like pro-rata, from 180 to 3650.

I know a few people at the Department, and I know they’re going to get a lot of focus from this issue. They’re welcome to reach out and chat with me; they have my details, after all. I know it’s a busy week for my contacts, so for anyone else out there, let’s stand back and wait.

CloudFormation and CloudFront Origin Access Control

I recently wrote about the change of Amazon CloudFront’s support for accessing content from S3 privately.

It’s bad practice to leave an origin server open to the world; if an attacker can overwhelm your origin server, then your CDN can’t insulate you from that, and the CDN cannot fetch content to serve legitimate traffic. There are tricks for this, such as having a secret header value injected into origin requests and having the origin validate it, but that’s essentially a static credential. Origin Access Identity was the first approach to move this authentication into the AWS domain, and Origin Access Control is the newer way, supporting the v4 Signature algorithm (at this time).

(If you like web security, read up on the v4 Signature, look at why we don’t use v1/2/3, and think about a time if/when this gets bumped – we’ve already seen v4a)

CloudFormation Support

When Origin Access Control launched last month, it was announced with CloudFormation support! Unfortunately, that CloudFormation support was “in documentation only” by the time I saw & tried it, and thus didn’t actually work for a while (the resource type was not recognised). CloudFormation OAC documentation was rolled back, and has now been published again, along with the actual implementation in the CloudFormation service.

It’s interesting to note that the original documentation for AWS::CloudFront::OriginAccessControl had some changes between the two releases: DisplayName became Name, for example.

Why use CloudFormation for these changes?

CloudFormation is an Infrastructure as Code (IaC) way of deploying resources in the cloud. It’s not the only IaC approach – others being Terraform and the AWS CDK. All of these approaches give the operator an artefact (document/code) that can itself be checked in to revision control, giving us the ability to easily track differences over time and compare the current deployment to what is in revision control.

Using IaC also gives us the ability to deploy to multiple environments (Dev, Test, … Prod) with repeatability, consistency, and as little manual effort as possible.

IaC itself can also be automated, further reducing the human effort. With CloudFormation as our IaC, we also have the concept of Drift Detection within the deployed Stack resources as part of the CloudFormation service, so we can validate if any local (e.g., console) changes have been introduced as a deviation from the prescribed template configuration.
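As a sketch of what that looks like programmatically (the stack name is a placeholder), drift detection can be triggered and polled with boto3:

import time
import boto3

cfn = boto3.client("cloudformation")

# Kick off drift detection for a stack and wait for the verdict.
def check_drift(stack_name="my-web-stack"):
    detection_id = cfn.detect_stack_drift(StackName=stack_name)["StackDriftDetectionId"]
    while True:
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id)
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            return status.get("StackDriftStatus")  # e.g. IN_SYNC or DRIFTED
        time.sleep(5)

print(check_drift())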

Migrating from Origin ID to OAC with CloudFormation

When using CloudFormation to migrate between the old and the new ways of securely accessing content in S3, you need a few steps to implement the change and then tidy up.

1. Create the new Origin Access Control:

  OriginAccessControl:
    Type: AWS::CloudFront::OriginAccessControl
    Properties:
      OriginAccessControlConfig:
        Name: !Ref OriginAccessControlName
        Description: "Access to S3"
        OriginAccessControlOriginType: s3
        SigningBehavior: always
        SigningProtocol: sigv4

If you had a template that created the old OriginAccessIdentity, then you could put this new resource alongside it (and later come back and remove the OID resource).

2. Update your S3 Bucket to trust both the old Origin Access ID, and the new Origin Access Control.

 PolicyDocument:
    Statement:
      -
        Action:
          - s3:GetObject
        Effect: Allow
        Resource: 
          - !Sub arn:aws:s3:::${S3Bucket}/*
        Principal:
          "AWS": !Sub 'arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${OriginAccessIdentity}'
          "Service": "cloudfront.amazonaws.com"

If you wish, you can split that new Principal (cloudfront.amazonaws.com) into a separate statement, and be more specific as to which CloudFront distribution Id is permitted to this S3 bucket/prefix.

In my case, I am using one Origin Access Control for all my distributions to access different prefixes in the same S3 bucket, but if I wanted to raise the bar I’d split that with one OAC per distribution, and a unique mapping of Distribution Id to S3 bucket/prefix.

3. Update the Distribution to use OAC, per Origin:

    Origins:
      - Id: S3WebBucket
        OriginAccessControlId: !Ref OriginAccessControl
        ConnectionAttempts: 2
        ConnectionTimeout: 5
        DomainName: !Join
          - ""
          - - !Ref ContentBucket
            - ".s3.amazonaws.com"
        S3OriginConfig:
          OriginAccessIdentity: ""
        OriginPath: !Ref OriginPath

You’ll note above we still have the S3OriginConfig defined, with an OriginAccessIdentity that is empty. It took a few hours to figure out that the empty string is required; without it, the S3OriginConfig element is invalid, and a CustomOriginConfig is not for accessing S3. At least at this time.

If you’re adopting this, be sure to also look at your CloudFront distributions’ HttpVersion setting; you may want to adopt http2and3 to turn on HTTP/3.

4. Remove the existing S3 Bucket Policy line that permitted the old OID

"AWS": !Sub 'arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity ${OriginAccessIdentity}' is no longer needed:

 PolicyDocument:
    Statement:
      -
        Action:
          - s3:GetObject
        Effect: Allow
        Resource: 
          - !Sub arn:aws:s3:::${S3Bucket}/*
        Principal:
          "Service": "cloudfront.amazonaws.com"

5. Delete the now unused OID from CloudFront

Back in step 1, where you created the new OriginAccessControl, remove the OriginAccessIdentity resource and update your stack to delete it.

Summary

Of course, run this in your development environment first, and roll steps out to higher environments in an orderly fashion.

Amazon CloudFront: Origin Access Control

Amazon CloudFront, the AWS Content Delivery Network (CDN) service, has come a long way since I first saw it launch; I recall a slight chortle when it had 53 points of presence (PoPs) around the world, as CloudFront often (normally?) shares edge location facilities with the Amazon Route 53 (hosted DNS) service.

Today it’s over 400 PoPs, and is used for large and small web acceleration workloads.

One common pattern is having CloudFront serve static objects (files) that are stored in AWS’s Simple Storage Service, S3. Those static objects are often HTML files, images, Cascading Style Sheet documents, and more. And while S3 has a native website-serving function, it has long been my strong recommendation to friends and colleagues not to use it, but to put CloudFront in front of S3. There are many reasons for this, one of which is that you can configure the TLS certificate handed out, set the minimum permitted TLS version, and inject the various HTTP security headers we’ve come to see as minimal requirements for asking web browsers to help secure workloads.

Indeed, having any CDN sit in front of an origin server is an architecture that’s as old as Web 2.0 (or older). One consideration here is that you don’t want end users circumventing the CDN and going direct to your origin server; if that origin gets overloaded, then the CDN (which caches) may not be able to fetch content for its viewers.

It’s not uncommon for CDNs to exceed 99.99% caching of objects (files), greatly reducing the load on the origin server(s) that host the content. CDNs can also do conditional GET requests against an origin, to check that a cached version of an object (file) has not changed, which helps ensure the cached object can still be served out to visitors.
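For illustration, here’s a small Python sketch of the kind of conditional GET a CDN makes against an origin; the URL and ETag value are placeholders:

import requests

# Revalidate a cached object: present the ETag we cached earlier, and a 304
# response means the cached copy is still current and can keep being served.
cached_etag = '"abc123"'
resp = requests.get(
    "https://origin.example.com/index.html",
    headers={"If-None-Match": cached_etag},
)
if resp.status_code == 304:
    print("Cached copy is still fresh")
else:
    print("Object changed; refresh the cache with the new body")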

Ensuring that the origin doesn’t get overloaded then becomes a question of blocking all requests to the origin except those from the CDN. Amazon CloudFront has evolved its pattern over the years, starting with each edge operating independently. As the number of PoPs grew, this became an issue, so a mid-tier cache, called the CloudFront Regional Edge Cache, was introduced to help absorb some of that traffic. It’s a pattern that Akamai was using in the 2000s when it had hundreds/thousands of PoPs.

For S3, the initial approach was to use a CloudFront Origin Access Identity (OID), which would cause a CloudFront origin request (from the edge, to the origin) to be authenticated against the S3 endpoint. An S3 Bucket Policy could then be applied that would permit access for this identity, and thus protect the origin from denial of service.

The S3 documentation to restrict access to S3 for this is useful.

Here’s an example S3 Bucket policy from where I serve my web content (from various prefixes therein):

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E6BL78W5XXXXX"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::xxxxxxxx-my-dev-web-bucket/*"
        }
    ]
}

This has now been revised and, in one release post, labelled as legacy and deprecated. The new approach is called Origin Access Control (OAC), and gives finer-grained control.

One question I look at is the migration from one to the other, trying to achieve this with minimal (or no) downtime.

In my case, I am not concerned with restricting access to the S3 objects to a specific CloudFront distribution ID; I am happy to have one identity that all my CloudFront distributions share against the same S3 Bucket (with different prefixes). As such, my update is straightforward, in that I am going to start by updating the above Bucket policy:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity E6BL78W5XXXXX",
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::xxxxxxxx-my-dev-web-bucket/*"
        }
    ]
}

With this additional Service line, any CloudFront distribution can now grab objects from my account (possibly across accounts as well). I can add conditions to this policy as well, such as checking the Distribution IDs, but as part of the migration from OID to OAC we’ll come back to that.

Next up, in the CloudFront Console (or in a CloudFormation update) we create a new OAC entry, with SigV4 signing enabled for origin requests. Here’s the CloudFormation snippet:

  OriginControl:
    Type: AWS::CloudFront::OriginAccessControl
    Properties:
      OriginAccessControlConfig:
        Name: S3Access
        Description: "Access to S3"
        OriginAccessControlOriginType: s3
        SigningBehavior: always
        SigningProtocol: sigv4

Now we have an Origin Access Control, which in the console looks like this:

With this in place, we then need to update the CloudFront distributions to use it for each S3 origin.

Give it a few minutes, check the content is still being delivered, and then it’s time to back out the old CloudFront Origin Access Identity from the S3 Bucket Policy:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "cloudfront.amazonaws.com"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::xxxxxxxx-my-dev-web-bucket/*"
        }
    ]
}

Then pop back to the CloudFront world and remove the old Origin Access Id (again, via either a CloudFormation update if that’s how you created it, or via the console or API).

This is also a good time to look at the Condition options in that policy, and see if you want to place further restrictions on access to your S3 Bucket, possibly like:

        "Condition": {
            "StringEquals": {
                "AWS:SourceArn": "arn:aws:cloudfront::111122223333:distribution/*"
            }
        }

(where 111122223333 is your AWS account number).

AWS has been keen to say that:

Any distributions using Origin Access Identity will continue to work and you can continue to use Origin Access Identity for new distributions.

AWS

However, that position may change in future, and given AWS has already marked the existing OID approach as “legacy”, it’s time to start evaluating your configuration changes.