What does a 2nd AWS Region for Australia mean to the Australian IT Industry?

I was there launching the 1st AWS Region in Sydney in 2013 as an AWS staff member, and as the AWS Solution Architect with a focus (depth) on security meant that it was a critical time for customers who were looking to meet strict (or perhaps, even stricter) security requirements.

Back in 2013, the question was the US Patriot Act. That concern and question has long gone.

Subsequently came cost effectiveness. Then domestic IRAP PROTECTED status.

And back in 2013, secrecy was everything about a Region. We launched Sydney the day it was announced, as ready-for-service. This made recruiting operational staff, securing data centre space (or even building data centres), and having large amounts of fibre run between buildings by contractors, difficult to keep under wraps. These days, pre-announcements like this help ease the struggle to execute the deployment without the need for code names and secrecy.

So, what does the launch of a second AWS Region in Australia, with three Availability Zones, and the general set of services within AWS being present, going to mean to the domestic and international markets?

Proximity to customers & revenue

Let’s look at some population and revenue statistics for these few cities in Australia (and use NZ as a comparison):

LocalityPopulation 2018% population of AustraliaGross State Product 2018/19 (USD$B)
All of Australia25M100%1,462 (100%)
NSW (Sydney)7.5M (5.2M)30% (20.8%)482 (32%)
Victoria (Melbourne)6.3M (4.9M)25.2% (19.6%)344 (23%)
New Zealand (for comparison)4.8M19.2%209 (14%)
Population comparisons in AN/Z

So with a Melbourne Region launch, we see 55.2% of the Australian population in the same state as a Region, and 40.4% in the same city as a Region. This also represents being close to where 55% of the GDP of Australia comes from.

Moreover this is coverage for where the headquarters of most Australian national organisations are based, and typically their IT departments are helmed from their national HQ.

Where does Latency matter?

The main industry that will see the latency impact from the new Melbourne Region is probably the video/media production vertical. There’s a sizable video media production industry in Melbourne that will now not discount AWS for the 11ms or so that was previously seen to Sydney.

Of course, latency doesn’t imply bandwidth.

Melbourne has been a Direct Connect location for some time, with customers able to take a partial port, or a whole 1GB or 10 GB port, and multiples three of with Link Aggregation Control Protocol (LACP) to deliver higher throughputs.

But the latency remained. And thus the Big Fat Pipe Problem would be a consideration: the amount of data sitting IN the pipe since transmission, and before being confirmed as received on the other end. For some devices and TCP/IP stacks, as the bandwidth increases, this becomes a problem.

You canna change the laws of physics

Mr Scott, Enterprise

Then there are applications that make multiple sequential connections from on-cloud to on-premises. An application that opens perhaps 100 SQL connections to a remote database over 11ms latency in series will see three round trips of TCP/IP handshake, and perhaps a TLS 1.2 handshake of 3 round trips, for 6.6 seconds of wall time before any actual query data and response is sent and processed.

The death of Multi-Cloud in Australia

Despite extremely strong multiple-Availability Zone architectures (see the Well-Architected principles), the noise of “multi-Region” has escalated in the industry. From the AWS-customer perspective, multi-cloud has become recognised as a “how about me” call from desperate competitors.

Of course the complexity of multi-cloud is huge, and not understood by most (any?) CIO who endorses this strategy. It’s far better to get the best out of one cloud provider, rather than try and dilute your talent base across implementing, maintaining and securing more than one.

However, some industry regulators have demanded more than one Region for some workloads, again mostly as a lack of understanding of what a strong Well-Architected design looks like.

With this announcement, multi-Region domestically within Australia will be a reality.

But we’re a Melbourne based infrastructure provider, we’re local!!

Sorry, time’s up.

You’re about to lose your customers, and your staff to an unstoppable force, with easy on-boarding, pay as you go, no commitment required terms.

It’s self-service, so there’s no cost of sale.

It’s commodity.

And it’s innovating at a clip far faster than any infrastructure organisation around. There’s almost nothing special about your offering that isn’t faster, cheaper and better in cloud. It’s time to work out what happens when 1/2 your staff leave, and 1/2 your customers are gone.

Getting customers to sign a 5 year contract at this point is only going to sound like entrapment.

Where next in Australia & NZ?

There’s a lot to consider when planning an AWS Region.

First, their are tax & legal overheads in establishing multiple entities in a country to implement, operate and own infrastructure. That means that even if New Zealand was next in line by population, or GDP, it may fall to another location in Australia to follow Melbourne.

And while the state of Queensland may look like it’s third, the latency it already has between Brisbane and Sydney of around 17ms may be outweighed by the fourth in the pack, Western Australia at 50ms.

Lots of variables and weightings to consider. And despite all of this, we have some time to see what the customer cost for AWS resources will be in the new Melbourne Region when it becomes available.

UniFi: Should I wait for the next DreamMachine Pro?

I switched to a 1 Gb/s NBN connection a few months back, but it soon became apparent that the original Unifi Security Gateway (USG) is no match for 1 Gb/s link.

While I love the Wifi access points, the management interface, and the rapid firmware updates, throughput limitations of the USG only became noticeable when the link speed went up. Ubiquity Networks, the manufacturers of the Unifi range, has released faster products – the throughput being the major selling point. And of course, the pricing goes up accordingly.

But in thinking of the current top-of-the-line device, the DreamMachine Pro, it kind of gives me some pause for consideration.

The device has two “WAN links”; one is an RJ45 gigabit Ethernet port, the other is a 10Gb/s SFP slot for a fibre GBIC. I’d love to have a fail-over Internet connection, but the fibre connection isn’t an advantage to me.

Ubiquiti are not selling their LTE fail-over device in Australia. I’d have to drop the 10 Gb/s SFP port back to a vanilla 1 Gb/s RJ34 copper port to plug into an alternate LTE device. But then again, carrier plans for this pattern are expensive.

However, it could be that I have two RJ45 Internet connections; my NBN connection affords me up to 4 ISP connections in the fiber-to-the-premesis that I have available. Now, the upstream link from my CPE to the Point of Interconnect (PoI) may be limited to 1 GB, but having the ability to fail-over to another ISP may be useful. Or I may ant to route traffic by port or Service to a different link (eg, VPN traffic over Link #2, or Web Traffic over Link #2, or perhaps just streaming video from some specific providers over alternate link.).

The Dream Machine also has a built in 8 port switch (EJ45), but none of the ports are Power over Ethernet (POE). After all, the majority of links going in to this are going to be WiFi access points directly, and linking an 8 port PoE switch in here seems a waste. A long tail of customers would find this fills their needs without having additional switches to worry about.

I would also have expected more ports here, given the cost of the device: say 16 ports, even if only half of them were PoE.

The inclusion of Protect for video cameras is a neat idea, but having two local disks to RAID together would be nice. I have shied away form on=premise storage, but for large volumes of video, I still like having the highest bandwidth version not traversing the WAN. So Its great we have one disk option available, but it could be so much more awesome if we just had some local resilience.

Of course, if Japan has residential 2 Gb/sec Internet connection, then would this device still be usable? I’m guessing Australia will max out on 1 Gb/sec for a while…

So, trying to decide if I dive in for the current DreamMachine Pro, or wait until it’s tweaked…..

“Well Maintained”

I touched on this in a recent article, but I wanted to dive deeper on this.

AWS makes much about Well-Architected principles, something I worked on the early stages of circa 2013, and applied to $work. I strongly recommend anyone deploying to any cloud provider think about these principles and their responsibility in implementing these, or in ensuring they are implemented.

Around the same time (2013/2014), the term DevOps and the rise of CI/CD pipelines was also coming to the fore. Looking back, the biggest advantage that Well-Architected lent on DevOps for was the ability to make rapid, incremental improvements to an architecture.

Poor architectural implementations traditionally went unchanged during he lifecycle of the deployed solution. Poor software would eventually be replaced in a follow up project; replaced with massive fanfare and budget.

So while Well-Architected starts a project, the concept of Well Maintained is the constant re-application of Well-Architected to a workload post go-live. It’s also the rapid adoption of software patches throughout the stack: the database version, the SDKs and libraries in the code base, the uplift of runtime versions (such as Java 8 -> 11, and beyond), the enabling of new TLS protocols and sun-setting of the old (TLS 1.3 turned on, TLS <= 1.1 turned off at this time).

A project that always adopts the current version of SDKs, and is always in good compliance with current best practice over time is Well Maintained. It’s almost evergreen. Its ages well – in the fact it doesn’t really age.

How can you tell if something is Well-Maintained?

Check the versions of its components. Dive deep. Find out what prevents you from updating these items. Find the known vulnerabilities in the versions between what your project has now, and the current released version.

Integrating McAfee ESM SIEM and CloudTrail

McAfee Enterprise Security Manager (ESM) is a Security Information and Event Management (SIEM) product that helps collect and then correlate events from various data sources (McAfee info).

Gartner rates it in their Magic quadrant, shown here from 2019:

Gartner Magic Quadrant for SIEM, 2019

I was working with a customer recently who wanted to ingest their AWS CloudTrail logs into this product.

McAffee have implemented support for this, using an IAM Access and Secret key, reading from an SQS Queue, and fetching the referenced files from S3.

AWS CloudTrail itself supports sending a notification to the Simple Notification Service for each log file delivered; it is left as an exercise to the customer to plumb the rest together. It’s not hard, and can be put into a CloudFormation template. Here it is: Download the CloudFormation Template.

Lets pull apart the simple architecture that is wrapped up in the above linked template.

Parameters

The parameters to a template are the only items you likely need to customise to your environment.

  1. The S3 Bucket name that you have CloudTrail being deployed in to (from the Trail config)
  2. The SNS Topic name that you have CloudTrail sending file delivery notifications sent to (from the Trail config)
  3. The Name of the new SQS Queue to be made that will catch the notifications, and buffer them for McAfee to then read. You can probably leave this as the default.
  4. The name of the IAM User to be created to permit an off-cloud SIEM to have API credentials. You can probably leave this as McAfeeSIEM.
  5. The Public IPv4 address range that requests will originate for your ESM making its outbound API calls to AWS.
  6. Your AWS Organizations “Organization ID”. Its a string that starts with “o-” that is in the prefix for the files appearing in S3 from CloudTrail.

Inside the template

The resources created in the template are the guts of what gets created on our behalf.

The SQS queue is configured with long polling enabled, to reduce the number of polls being made in case McAfee tries a tight continual loop, but also so that when the queue is already drained, the McAfee will remain connected for up to this duration to immediately get the message to fetch a file.

An SQS Policy is added to the Queue to permit the selected SNS Topic to publish messages to the queue, and then a Subscription to hook these together is defined.

Lastly, an IAM user is created, with a policy that permits access to process messages from the queue (read and delete messages, plus a few other APIs that McAfee documented), as well as access to List the contents of the target bucket, and Get the objects within the defined prefix.

Admin actions after deploying the template

With this CloudFormation stack deployed, go to the IAM console, find the new IAM User (McAfeeSIEM was the default), go to the Credentials tab, and issue an Access key pair. Take care to record the secret key – this is the only time you’ll see this; if you lose it, then start a key rotation to get a fresh Access/Secret key pair.

On McAfee SIEM, insert the access key and secret key into the AWS CloudTrail config. If your SIEM has outbound Internet access (possibly via a proxy) then this should start to fetch messages form the SQS Queue and process files.

You can look for the number of messages in the SQS queue as a help to debug: if the queue is non-zero (and growing) then your SIEM is probably not fetching and clearing its Queue messages.

Evolution of Compute: Physical to Serverless

Unless you’ve been under a rock, you’ve seen the impact that Hyperscale Public Cloud has made on the IT industry. Its invention wasn’t to be a thing, but to be a continually evolving, improving thing.

And while many organisations will use SaaS platforms, those platforms themselves often run atop the IaaS and PaaS platforms of a hyperscale cloud platform.

One person’s SaaS is another person’s IaaS.

Me, James Bromberger

But its worth just checking on the evolution of IT service delivery at a low level, for not everyone who is in the IT industry has seen what that looks like at this time.

Evolution of service delivery from Physical servers to Serverless.

Change is hard. Humans are bad at it. I’ve seen many who evolved from column 1 to column 2, and have felt they are “done”. They aren’t on-board for the next wave of the evolution.

I suffer from this too. But three is a short cut that I can offer: try to jump from where you are now, to as far to the right as you can in one step.

Every one of these phases is a monumental shift in the way that services are delivered, requiring training, and experience. There is an overhead knowledge baggage that engineers take with them, trying to work out what functions the same as before, and what is different. This is taxing, stressful, and unpleasant.

So rather than repeat this process in sequence, over years for each change, my recommendation is to see how far to the right you can jump. Some limitations will crop up that prevent you from leap-frogging all the way to Serverless, but that’s OK. Other services will not be thus constrained.

Well Architected, meet Well Maintained

In 2012, the Well Architected concept was born inside AWS. It is a set of principles that helps lead to success in the Cloud; at that time, that was the AWS EC2 environment. It’s well worth a read if you have not seen it. At this time, its also been adopted my Microsoft for the Azure environment as well.

However, I want to move your attention from Architecture time, to operations time.

If you look at the traditional total life-cycle activities, there’s a lot of time and effort spent learning, adjusting, and implementing supporting technologies that are starting to become invisible in the Serverless world.

Lets look at the operational activities done in a physical environment, and compare that to Serverless. I’ll skip the middle phases of evolution as shown above:

ActivityPhysicalServerless
Physical securityRequiredManaged
Physical installationRequiredManaged
Capacity PlanningRequiredManaged
Network switchingRequiredManaged
Hardware power planningRequiredManaged
Physical coolingRequiredManaged
Hardware procurementRequiredManaged
Hardware firmware updatesRequiredManaged
OS installationRequiredManaged
OS patchingRequiredManaged
OS upgradeRequiredManaged
OS licensingOften RequiredManaged
Runtime selectionRequiredRequired
Runtime minor patchingRequiredManaged
Runtime major version upgradeRequiredRequired
App server selectionRequiredManaged
App server minor patchingRequiredManaged
App server major version upgradeRequiredManaged
Code base maintenanceRequiredRequired
Code base 3rd party library updates (SDKs)RequiredRequired
Network encryption protocol and cipher upgrades (TLS, etc)RequiredRequired

As you can see, a large number of activities that should be done regularly to ensure operational excellence. However, I am yet to see a traditional physical environment, or virtualised on-prem environment that actively does all of the above well.

It’s an easy test: wander into any Java environment, and ask what version of the Java runtime is deployed in production. The typical response is “we updated to Java 8 two years ago“. What that means if “we haven’t touched the exact deployed version of Java for two years“.

Likewise, ask what version of Windows Server is deployed? Anything older than 2016 (even that, with 2019 has been out for nearly 2 years at this time is generous) shows a lack of agility and maintenance.

I challenge those in IT operations to think through the above table and check the last time their service updated each row – post project launch. If its a poor show, the change is your in “support mode”, and not “DevOps Operations”.

So what can be done to help do this maintenance?

Take it away. Stop it. While it can be argued to be important, and interesting, you’re possibly better off spending that effort on the smaller list that remains in a Serverless environment.

Evolution Continues

We can’t see where this evolution will go next. We do see that identity, authentication, authorisation, in-flight encryption, remain key elements to be aware of.

What comes next, I can’t predict. I know many ideas will be thrown about, new or recycled, and some will work, while others will wither and disappear again.

The only constant in life is change.

Heraclitus, b. 565 BC