AWS Sydney Summit 2023

Last week I was fortunate enough to attend the AWS Sydney Summit 2023, along with my colleague and friend, Elliot Segler, on behalf of Akkodis.

A Partner Day was arranged that ran on Monday 3rd April – I’ll avoid details on that here at this time to focus on the main Customer Summit – the Big Event.

This was really the first major in-country Summit since the “end” of the Covid-19 pandemic – the 2022 version was much smaller, and that was only announced as a small in-person event just a few weeks before it was held. Thus in 2022 this event only had a small crowd; this year, attendance tickets were capped at 5,000.

In 2019, some 19,000 people attended over two days.

I’ve been fortunate to never miss an AWS Summit in Australia, and it gives me insight into the state of the Cloud when comparing year-on-year.

The floorplan of the AWS Sydney Summit 2023
The exhibitor floor at the ASW Sydney Summit 2023
AWS Managing Director A/NZ, Rianne van Veldhuizen, opens the Summit
Cameron Adams, founder of Canva, shared some of their AWS Cloud architecture.

Cameron also spoke about the move to sustainable computing, Canva signing The Climate Pledge, and Canva racing to integrate AI models into their service offering, such as magic erase.

We also saw Nicole Sheffield from Wesfarmers “One Digital” service who spoke about the implementation of their digital platform that offers customer-facing digital services across the Wesfarmers portfolio. This is significant, as Wesfarmers had, in 2014, been quite alarmed at Amazon entering Australia.

The exhibitors were split between Consulting Services partners, and ISV partners. Those ISVs had a distinct developer tooling flavour to them, but we also saw Canva talking to end customers about their product, and both Bespoke and Lumnify (formerly DDLS) training providers discussing their educational schedules and offerings.

Early in the morning I took a photo of the Builder lab, set up for individuals to undertake self-paced digital training – which was full for the rest of the day:

The Builder Lab (taken first thing in the morning before it opened)

In Financial Services, Australian bank (and one of the Big Four) Westpac provided their take on some of the cloud security approaches. This is notable to me as one of their staff in 2013 said they would “never use Public Cloud”; they seem to be doing well putting Public Cloud to good use.

Suncorp (finance, insurance and banking) also spoke about their exit strategy from their existing data centers using AWs as a target platform.

This year there were no major service announcements or releases at this Summit, but then these days the major announcements happen almost every few days anyway! In previous years, the Summit has always had a sort of “revolution” (when major new concepts or services were released, such as AI services) or “evolution” (incremental updates announced for existing services). This year the theme was more “steady-as-she goes” and stable.

What’s clear is that commentary around Lift & Shift migrations are now evolving to Migrate & Modernise (which is what I have always focused on – deeper expertise and a better short and long term outcome). This isn’t surprising, as often naïve Lift & Shift has left customers with workloads costing more and unable to take advantage of key cloud attributes, such as scalability or cloud-platform-managed-services to reduce TCO.

Of course, those who implemented cloud-native solutions, and paid close attention to Cloud Operating Models (and tech team org structures) with an eye to the Well-Architected Framework have enjoyed optimised and reliable operation.

AWS claimed that perhaps only 15% of all workloads that will eventually run in the cloud, is now doing so. Of course, that 15% requires the maintenance, care and attention to ensure it remains operational, and optimal.

So where are we in the Cloud evolution timeline? I suspect we’re in the middle of furious catch-up by software providers who now are focusing on adopting IaaS and PaaS to take their legacy solutions, and reimplement them as cloud-native SaaS. More vertical-specific SaaS products are coming to market.

The individual services within the cloud are maturing. ARM-based Graviton chips continue to uphold Moore’s Law (RIP Gordon Moore, 2023). IPv6 is progressing, but as noted by one of my fellow Partner Ambassadors and long term friend, Greg Cockburn, the rate of change in modern IPv6 networking appears to be slowing (with notable exception, VPC Network Firewall now supporting IPv6 only subnets). I suspect the major requests are now satisfied; workloads that want to be dual-stacked to the outside world are fully supported. Of course, many ISPs, telcos and carriers are continuing to slowly adopt IPv6 for their consumers; some advanced end user providers in Australia have been Telstra, Internode (ironically part of iiNet who dropped the IPv6 ball in 2013, and thus part of Vodafone), and Aussie Broadband.

From speaking to people, it did appear that a 5,000 person cap had limited many people from attending, particualy those who live in the same Australian state of New South Wales who may have left booking a ticket too late, and missed out. Perhaps in 2024 we’ll see another increase (and who knows).

Meanwhile, outside around Darling Harbour, much construction happens, watched over by warships and tall ships.

Warships and Tall Ships in Darling Harbour, 2023

And in case you’re wondering, its a long way from Perth (where I live) to Sydney, and the Great Circle line directly between the two cities takes us far over the Southern Ocean:

3,300 kms from Sydney to Perth

AWS Local Zone: Perth Launch

In the last few days, the AWS Local Zone service launched in Perth.

Articles in the media are claiming this is a significant uplift:

AWS has officially opened a new cloud infrastructure region in Perth.

Kate Weber, IT News

Only problem, this is not a Region (big R) in the AWS vernacular, but a single Availability Zone, attached as an adjunct to the AWS Sydney Region (ap-southeast-2).

Media Statements and press releases are designed to excite the media, but the reality is a little more circumspect. The Local Zone does add local compute (EC2) and block storage (in the form of EBS) within Perth. However, its what many would expect that leads a local zone to be perhaps a little limiting.


First up, the compute is only certain, limited types. The same with the EBS volume types. You may be forced to use more expensive compute and storage, because the cheaper instance sizes or EBS types are not available.

Second, the costs for the Compute and Storage is more expensive than in the full Region, where economies of scale and usage patterns drive down costs. You may see up to a 50% overhead on the same 4xLarge instance type compared to Sydney.

Third, some basics like IPv6 support is not (yet?) supported. That may upset your VPC conventions.

Fourth, you may have wanted close access to managed services, like RDS (databases), or even Workspaces (desktop) and AppStream (application virtualisation). Well, that’s back in the main Region, not in the Local Zone, and at 50ms across Australia, that may be a bit too far, which means you’re back to the stone age of running databases on Instances yourself.

Fifth, being a single AZ, you wont get any multi-AZ resilience that you’re comfortable with an a full Region.

Sixth: the Local zone’s operation is tied to the designated Region: issues in ap-southeast-2 (Sydney) may impact management (control plane) or availability (data plane) of that AZ. Your CloudFormation has to execute in Sydney to stand up a stack.


In essence, a Local Zone needs to be looked at from the Well-Architected Framework perspective, and utilised accordingly.

Once you have reconciled those issues, there is the bright side: compute in cloud, managed in a uniform way, without having to deploy an Outpost and have a datacentre for the Outpost to run in.

Many organisations do want Cloud services locally, and for some high volume, low latency, idempotent, loosely coupled, edge processing, this may be perfect.

An AWS Local Zone is also a small stepping stone to perhaps being a full Region one day, as was seen in Osaka, Japan. It just takes demand to validate the point.

Its 10 years since AWS opened first its point of presence (CloudFront, Route53) in Australia, and then the Sydney Region. We’re on the cusp of a second Local zone up in Brisbane, and a second Australian Region – yes, a full Region – in Melbourne.

It’s the new Region that suddenly opens up Multi-region application architecture, which, for public sector in particular, has not been permissible for data jurisdiction purposes (even if that is just a desire and not a mandated legal requirement).

AWS Consulting Services Partners and Certifications, December 2022

I had previously posted about AWS Consulting Services Partners and Certifications in July 2022. Its been six months, so I decided to run the numbers again.

This time, there were 945 partners in the data set, up from 882 in July by 62 new organisations or a 7.1% increase.

What was clear is that much of this growth is at the lower levels but we can also see some growth in the larger buckets of certs, with two exceptions.

The 100-200 category saw a drop in the number of partners in this space, but only by 5 organisations. Similarly, the 200-300 space remained stable at 43 organisations, but I know of several businesses that have migrated from the 200+ to 300+ category. so others have moved into this grouping from the lower levels.

1+20+50+100+200+300+400+500+1000+2000+
Dec 2022276262158107432614251519
Jul 202224425214811243258201218
ASW Certifications held by Consulting Services partners in the AWS Partner Finder

Is at the 400-500 grouping that an increase of 75% has been seen, from 8 organisations to 14.

Furthermore, raw data shows that there are 9 organisations who would fit in a 5000+ certification grouping – who combined would have 117,317 AWS certifications.

The largest? 36,098 certifications goes to Accenture, followed by 14,740 at IBM, and Deloitte on 14,605. Deloitte would be #2 if they grouped Deloitte Shanghai and Deloitte Tohmatsu together. Similarly, NTT is in the partner data as 5 separate organisations.

Cert CountConsulting Services Company Name
36098Accenture
14750IBM
14605Delitte
12790Cognizant
10994Capgemini
9673Tata Consutling
6517Kyndryl
6423Infosys
5467Wipro
4527EPAM
Top 10 by Certification count, December 2022

Increased AWS IPv6 Announcements

IPv6 has been something I have long championed, ever since establishing the first tunnels in the last 1990’s when I was working at UWA. Its also something I was pushing to AWS Service Delivery teams when I worked at AWS in the early 2010s, and in my time at Ajilon/Modis/Akkodis, I have set IPv6 as a stretch goal for all AWS projects to support as a standard.

What’s interesting to see is the increase in IPv6 related announcements from ASW in the What’s New page by year:

Number of IPv6 related posts on AWS What’s New page, as at 19 Dec 2022.

It’s clear that IPv6 is now a first class citizen. Coverage is pretty strong, and customers not only can, but in my opinion should, be targeting dual stack solutions, or in some use cases, IPv6-only deployments.

As System Operators (Sys Admins), DevOps and Developer folk, you should be fully comfortable with another transport protocol. Any considerations around addressing should be minimal. In the on-premises desktop/end-user-compute environment, your internal networks are possibly all IPv4 only, but your corporate proxy should now be dual-homed. (It should also be supporting TLS 1.3 and HTTP/3).

The case for NEL

I’ve been talking for the last number of years about some of the web security changes that have happened in the web/browser/PKI landscape, many of which have happened quietly and those not paying attention may have missed.

One very interesting element has been a modern browser capability called Network Error Logging. Its significant as it fixes a problem in the web space where errors may happen on the client side, and the server side hears nothing about it.

You can read the NEL spec here.

Adopting NEL is another tool in your DevOps armoury, to drive operational excellence and continuous improvement of your deployed web applications, helping to retain customers and increase business value.

Essentially, this is a simple HTTP header string that can be set, and browsers that recognise it will cache that information for a period (that you specify). If the browser has any of a whole set of operational issues while using your web site, then it has an opportunity to submit a report to your error logging endpoint (or cache it to submit later).

Prior to this, you would have to rely on the generosity of users reporting issues for you. The chance of a good Samaritan (white hat) getting through a customer support line to report issues is.. small! Try calling your local grocery store and tell them they have a JavaScript error. Indeed, for this issue, we have RFC 9116 for security.txt.

Having your users report your bad news is kind of like this scene from the 1995 James Bond film GoldenEye:

“Unlike the American Government we prefer not to get our bad news from CNN” – M, GoldenEye

So, what’s the use-case of Network Error Logging. I’ll split this into four scenarios:

  1. Developer environments
  2. Testing environments
  3. Non-production UAT environments
  4. Production Environments

Developer Environments

Developers love to code, and will produce vast amounts of code, and have it work in their own browser. They’ll also respond to error sin their own environments. But without the NEL error logs, they may miss the critical point a bug is at, when something fails to render in the browser.

NEL gives developers the visibility they are lacking when they hit issues, with otherwise, need screen captures from the browser session (which are still useful, particularly if your screen capture includes the complete time (including seconds) and the system is NTP synchronised).

With stricter requirements coming (such as Content Security Policies being mandated by the PCI DSS version 4 draft on payment processing pages), the sooner you can give developers the visibility of why operations have failed, the more likely of success when the software project makes it to a higher environment.

So, developers should have a non-production NEL endpoint, just to collect logs, so they can review and sort them, and affect change. Its not likely to be high volume reporting here – it’s just your own development team using it, and old reports are quickly worthless (apart from identifying regressions).

Testing Environments

Like developers, Test Engineers are trying to gather evidence of failures to feed into trouble ticketing systems. A NEL endpoint gives Testers this evidence. Again the volume reporting may be pretty low, but the value of the reporting will again help errors hitting higher environments.

Non-Production UAT Environments

This is your last chance to ensure that the next release into production is not hitting silly issues. The goal here to is make the volume of NEL reports approach zero, and any that come in are likely to be blockers. Depending on the size of your UAT validation, the volume will still be low.

Production Environments

This is where NEL becomes even more important. Your Security teams and your operational teams need to both have real-time visibility of reporting, as the evidence here could be indicative of active attempts to subvert the browser. Of course the volume of reports could also be much larger, so be prepared to trade the fraction of reporting to balance the volume of reports. It may also be worth using a commercial NEL endpoint provider for this environment.

Running your own NEL Endpoint

There is nothing stopping you from running your own NEL endpoint, and this is particularly useful in low volume, non-production scenarios. It’s relatively, simple, you just need to think of your roll out:

  • Let every project define their own NEL endpoints, perhaps one per environment, and let them collect and process their own reports
  • Provide a single central company-wide non-production NEL endpoint, but then give all developers access to the reports
  • Somewhere in the middle of these above two options?

Of course, none of these are One Way Doors. You can always adjust your NEL endpoints, by just updating the NEL headers you have set on your applications. If you don’t know how to adjust HTTP header strings on your applications and set arbitrary values, then you already have a bigger issue in that you don’t know what you’re doing in IT, so please get out of the way and allow those who do know to get stuff done!

Your NEL endpoint should be defined as a HOSTNAME, with a valid (trusted) TLS certification, listening for an HTTP post over HTTPS. Its a simple JSON payload, that you will want to store somewhere. You have a few choices as to what to do with this data when submitted:

  1. Reject it outright. Maybe someone is probing you, or submitting malicious reports (malformed, buffer overflow attempt, SQL injection attempt, etc)? But these events themselves may be of interest…
  2. Store it:
    • In a relational database
    • In a No-SQL database
    • In something like ElasticSearch
  3. Relay it:
    • Via email to a Distribution List or alias
    • Integration to another application or SIEM

My current preference is to use a No-SQL store, with a set retention period on each report.

NEL in AWS API Gateway, Lambda and DynamoDB

The architecture for this is very simple:

Simple NEL endpoint with API Gateway, Lambda and DynamoDB

We have one table which controls access for reports to be submitted; its got config items that are persistent. I like to have a config for banned IPs (so I can block malicious actors), and perhaps banned DNS domains in the NEL report content. Alternately, I may have an allow list of DNS domains (possibly with wildcards, such as *.example.com).

My Lambda function will get this content, and then evaluate the source IP address of the report, the target DNS domain in the report, and work out if its going to store it in the Reports table.

When inserting the JSON into the report table, I’m also going to record:

  • The current time (UTC)
  • The source address the report came from

Here’s an example report that’s been processed into the table:

This is what the NEL report (Body) looks like:

{
   "csp-report":{
      "document-uri":"https://www.example.com/misc/video.html",
      "referrer":"",
      "violated-directive":"default-src 'none'",
      "effective-directive":"child-src",
      "original-policy":"default-src 'none'; img-src 'self' data: https://docs.aws.amazon.com https://api.mapbox.com/  https://unpkg.com/leaflet@1.7.1/dist/images/ ; font-src data: ; script-src 'self' blob: https://cdnjs.cloudflare.com/ajax/libs/video.js/ https://ajax.googleapis.com https://cdnhs.cloudflare.com https://sdk.amazonaws.com/js/ https://player.live-video.net/1.4.1/amazon-ivs-player.min.js https://player.live-video.net/1.4.1/ https://unpkg.com/leaflet@1.7.1/dist/leaflet.js https://unpkg.com/leaflet.boatmarker/leaflet.boatmarker.min.js  https://unpkg.com/leaflet.marker.slideto@0.2.0/ 'unsafe-inline';  style-src 'self' https://cdnjs.cloudflare.com/ajax/libs/video.js/ https://unpkg.com/leaflet@1.7.1/dist/leaflet.css ; frame-ancestors 'none'; form-action 'self'; media-src blog:; connect-src 'self' *.live-video.net wss://e2kww8wsne.execute-api.ap-southeast-2.amazonaws.com/production wss://boat-data.example.com ; object-src: self ; base-uri 'self'; report-to default; report-uri https://nel.example.com/",
      "blocked-uri":"blob",
      "status-code":0,
      "source-file":"https://player.live-video.net",
      "line-number":6,
      "column-number":63072
   }
}

And here is the Lambda code that is validating the reports:

import json
import boto3
import ipaddress
import datetime
import uuid
def lambda_handler(event, context):
    address = report_src_addr(event)
    if address is not False:
        if report_ip_banned(address) or not report_ip_permitted(address):
            return {
                'statusCode': 403,
                'body': json.dumps({ "Status": "rejected", "Message": "Report was rejected from IP address {}".format(address)})
            }
    if not report_hostname_permitted(event):
        return {
            'statusCode': 403,
            'body': json.dumps({ "Status": "rejected", "Message": "Reports for subject not allowed"})
        }
    report_uuid = save_report(event)
    if not report_uuid:
        return {
            'statusCode': 403,
            'body': json.dumps({ "Status": "rejected"})
            }
    return {
        'statusCode': 200,
        'body': json.dumps({ "Status": "accepted", "ReportID": report_uuid})
    }


def save_report(event):
    report_uuid =  uuid.uuid4().hex
    client_ip_str = str(report_src_addr(event))
    print("Saving report {} for IP {}".format(report_uuid, client_ip_str))
    response = report_table.put_item(
        Item={
            "ReportID": report_uuid,
            "Body": event['body'],
            "ReportTime": str(datetime.datetime.utcnow()),
            "ClientIp": client_ip_str
            }
        )
    if response['ResponseMetadata']['HTTPStatusCode'] is 200:
        return report_uuid
    return False


def report_ip_banned(address):
    fe = Key('ConfigName').eq("BannedIPs")
    response = config_table.scan(FilterExpression=fe)
    if 'Items' not in response.keys():
        print("No items in Banned IPs")
        return False
    if len(response['Items']) is not 1:
        print("Found {} Items for BannedIPs in config".format(len(response['Items'])))
        return False
    if 'IPs' not in response['Items'][0].keys():
        print("No IPs in first item")
        return False
    ip_networks = []
    for banned in response['Items'][0]['IPs']:
        try:
            #print("Checking if we're in {}".format(banned))
            ip_networks.append(ipaddress.ip_network(banned))
        except Exception as e:
            print("*** EXCEPTION")
            print(e)
            return False
    for banned in ip_networks:
        if address.version == banned.version:
            if address in banned:
                print("{} is banned (in {})".format(address, banned))
                return True
    #print("Address {} is not banned!".format(address))
    return False


def report_ip_permitted(address):
    fe = Key('ConfigName').eq("PermittedIPs")
    response = config_table.scan(FilterExpression=fe)
    if len(response['Items']) is 0:
        return True
    if len(response['Items']) is not 1:
        print("Found {} Items for PermittedIPs in config".format(len(response['Items'])))
        return False
    if 'IPs' not in response['Items'][0].keys():
        print("IPs not found in permitted list DDB response")
        return False
    ip_networks = []
    for permitted in response['Items'][0]['IPs']:
        try:
            ip_networks.append(ipaddress.ip_network(permitted, strict=False))
        except Exception as e:
            print("permit: *** EXCEPTION")
            print(e)
            return False
    for permitted in ip_networks:
        if address.version == permitted.version:
            if address in permitted:
                print("permit: Address {} is permitted".format(address))
                return True
    print("Address {} not permitted?".format(address))
    return False

def report_hostname_permitted(event):
    if 'body' not in event.keys():
        print("No body")
        return False
    if 'httpMethod' not in event.keys():
        print("No method")
        return False
    elif event['httpMethod'].lower() != 'post':
        print("Method is {}".format(event['httpMethod']))
        return False
    if len(event['body']) > 1024 * 100:
        print("Body too large")
        return False
    try:
        reports = json.loads(event['body'])
    except Exception as e:
        print(e)
        return False

    for report in reports:
        print(report)
        return True
        if 'url' not in report.keys():
            return False
        url = urlparse(report['url'])
        fe = Key('ConfigName').eq("BannedServerHostnames")
        response = config_table.scan(FilterExpression=fe)
        if len(response['Items']) == 0:
            print("No BannedServerHostnames")
            return True
        for item in response['Items']:
            if 'Hostname'not in item.keys():
                continue
            for exxpression in item['Hostname']:
                match = re.search(expression + "$", url.netloc)
                if match:
                    print("Rejecting {} as it matched on {}".format(url.netloc, expression))
                    return False
    return True


def report_src_addr(event):
    try:
        addr = ipaddress.ip_address(event['requestContext']['identity']['sourceIp'])
    except Exception as e:
        print(e)
        return False
    #print("Address is {}".format(addr))
    return addr


def parse_X_Forwarded_For(event):
    if 'headers' not in event.keys():
        return False
    if 'X-Forwarded-For' not in event['headers'].keys():
        return False
    address_strings = [x.strip() for x in event['headers']['X-Forwarded-For'].split(',')]
    addresses = []
    for address in address_strings:
        try:
            new_addr = ipaddress.ip_address(address)
            if new_addr.is_loopback or new_addr.is_private:
                print("X-Forwarded-For {} is loopback/private".format(new_addr))
            else:
                addresses.append(new_addr)
        except Exception as e:
            print(e)
            return False
    return addresses

You’ll note that I have a limit on the size of a NEL report – 100KB of JSON is more than enough. I’m also handling CIDR notation for blocking (eg, 130.0.0.0/16).

Operational Focus

Clearly to use this, you’ll want to push the Lambda function into a repeatable template, along with the API Gateway and DynamoDB table.

You may also want to have a Time To Live (TTL) on the Item being submitted in save_report() function, with perhaps the current time (Unix time) plus a number of seconds to retain (perhaps a month), and configure TTL expiry on the DynamoDB table.

You may also want to generate some custom CloudWatch metrics, based upon the data being submitted; perhaps per hostname or environment, to get metrics on the rate of errors being reported.

Summary

Hopefully the above is enough to get you to understand NEL, and help capture these reports from your web clients; in a production environment you may want to look at report-uri.com, in non-production, you may want to roll your own as above.