Is it safe to move to The Cloud?

I try and stay as up-to-date with all things Cloud, and have done for the better part of a decade and a bit. But I recently came across a social media post entitled “Is it safe to move to the cloud?“, and with this much experience, I had so many immediate thoughts, that this post thus precipitated.

My immediate reaction was “Is it safe to NOT move to The Cloud?“, but then I thought about the underlying problems with all digital solutions. And the key issue is understanding TCO, and ensuring the right cost is being endured over the operating time of the solution, rather than the least cost as is so typical.

The truth is that with digital systems, things change all the time. And if those systems are facing untrusted networks (such as the Internet), or processing untrusted data (such as came from humans) then there are issues lurking.

Let me take a moment to point out, as an example, any Java implementation that used the very popular Log4J library to handle error messages. Last December (2021) a serious vulnerability arose that meant that if you logged a certain message, then it would trigger an issue. Quite often error messages being raised include the offending input that failed validation or caused an exception, and thus, you could have untrusted data triggering a vulnerability via this (wildly popular and heavily used) library.

It’s not that anyone had done anything bad on purpose. No one had spotted it (and reported it to the developer of the library) earlier.

Of course, the correct thing happened: an updated version of this library was released. And then other vendors of solutions updated their products that included this newer version of the Log4J library. And then your operations team updated your deployment of this application.

Or did they.

There’s a phrase that fills me with fear in IT operations: “Transition to Support“. It indicates we’re punting the operational responsibility of the solution to a team that a did not build it, and do not now how to make major changes to the application. We’re sending to to a team that already look after other digital solutions, and adding one more thing to their work for them to check is operational, and for them to maintain — which, as they are often overwhelmed with multiple solutions, they do the simplest thing: check it is operational, not that it is Well Maintained.

Transition to support: the death knell for Well-Maintained systems

James Bromberger

I’ve seen first hand that critical enterprise systems, line-of-business processing that is the core of the business, is best served when the smart people who built it, stay to operate it in a DevOps approach. This team can make the major surgical changes that happen after deployment, and as business conditions and cyber threats change.

The concern here is cost. Development teams cost more than dumping large numbers of systems on under staffed Support teams. Or support gets sent offshore to external providers who may spend 30 seconds checking the system works, but no time investigating the error messages and their resolution that may require a software update.

It’s a question of cost.

A short-term CIO makes their hero status by cutting costs. Immediately this has only a positive impact on the balance sheet. But as time goes on, the risks of poor maintenance goes up. But after the financial year has ended, and short term EBITDA shows massive growth, and a heroes party is given for the CIO, they then miraculously depart for another job based on the short term success.

Next up, the original company finds that their digital solution needs to be updated, but there is no one who understands it to make such a change.

The smart people were let go of. They were seen as a cost, not part of the business.

So lets rephrase the question: “Is it safe to move to the cloud with your current IT management and maintenance approach?” Possibly not: you probably have to modify the way you do a lot of things, including how you structure your teams and Org Unit. You may need to up-weight training for teams who will now take on full responsibility for workloads, instead of just being “the network guy”. But this is an opportunity; those teams can now feel that THEY are the service team for a workload that supports something more substantial than just rack-and-stack of storage. Moving to separate DevOps teams per critical workload, you can then have them independently innovate – but collaborate on standards and improvements. a friendly competition on addressing technical debt, or number of user feature improvements requested – and satisfied.

So is it safe to move to the Cloud? It depends on who is doing, how much knowledge and experience they have, and what happens next in your operating model.

The Cloud is not just another data centre. And TCO isn’t just cloud costs, and it isn’t just people cost. Sometimes the cost is the compliance failure and fine you get by inadvertently removing the operating model that would have prevented a data breach.

Its been 7 years since I (and my colleagues at Ajilon/Modis, soon to be Akkodis) moved the Land Registry of Western Australia, the critical government registry of property ownership of the state, into the AWS Cloud for Landgate. We’ve kept a DevOps approach for the solution – ensuring it was not just Well-Architected, but Well Maintained. It’s a small DevOps crew now that ensure that Java Updates, 3rd party library updates and more get imported, but also maintenance of the Cloud environment such as load balancing, virtual machine types & images (AMIs) get updated, managed relational database versions get updated, newer TLS versions get supported and — more importantly — older versions get deprecated and disabled. FinOps, DevOps, and collaboration.

AWS RDS Goes Dual-stack: IPv4 and IPv6

I’ve spoken of the IPv6 transition for many, many years. Last month I gave a presentation at the AWS User Group (Perth) on this, and included a role play on packets through the network.

Earlier in 2022 we saw AWS VPC support IPv6-only subnets, a great way to scale out vast numbers of instances with 18 billion billion addresses per subnet. Today sees one of the most commonly used services with virtual machines – managed databases via the Relational Database Service – finally get its first bit of IPv6 support!

When creating a database, you now have a new option as shown here:

AWS Console Wizard for starting an RDS instance

It’s worth noting that, the Database Subnet defined in RDS can (at this point in time) select subnets that are either IPv4 only, or dual stack IPv4 and IPv6. To put this clearer, RDS is not (yet?) supporting IPv6 only deployment.

But that’s a small limitation. The power of scale-out of application servers in vast subnets can now natively talk to a dual-stack deployed RDS Instances using IPv6 as the transport protocol. No other proxies or adaptors or work-around required.

Of course, there’s more managed AWS services to even get this far – ElasticCache, for example, or even IPv6 as first class (eg: CloudFront origin fetch).

This is incremental improvement.

AWS Local Zones expansion 2022

AWS recently made a bold announcement; at re:invent in specified a few countries it planned to open Local Zones in, but last week it revealed some 32 locations, including Perth, Brisbane, and Auckland

Perth is isolated by the vast distances between east and west coast of Australia – 2044 miles, similar distances to the continental United States between DC and LA (2200 miles), or London to Moscow (2500 miles). The Round Trip Time (RTT) of packets online is around 50ms, which for many applications is not immediately noticeable.

But for some time-critical workloads, its a deal breaker.

Local Zones offer a very cut down version of an AWS Region, targeting compute workloads that use a virtual machine Instance. First available in Japan, there are currently 16 in service; this recent announcement of 32 more will make 48 Local Zones.

While many have become familiar with AWS, the minimal viable product of a Local Zone may leave some confused: the options at your disposal are listed here.

Local Zone attachments

Local Zones are attached to a host Region. In the case of the announced Perth Local Zone, the API designation for this indicates this will be linked to the yet-to-launch Melbourne Region.

When it comes to load balancing within the Local Zone, typically only Application Load Balancing (ALB) is available. That’s perfect for HTTP based workloads with multiple local application servers, but if you’re looking to then add a managed RDS database behind that, you’ll be reaching back to the host Region. Same for SQS, SNS, and most everything else.

Instance types will also be limited, typically focusing on a subset of the latest general purpose families; this is likely to be true of the Elastic Block Store (EBS) volumes, where until now, GP2 (General Purpose SSD) has been the primary option.

When it comes to networking, it appears that Local Zones do not yet support IPv6 dual-stack addressing, as shown in the Console option for defining a subnet with the current Oregon/Los Angeles Local Zone:

IPv4 only subnet creation in Oregon/LA

So, what would benefit from Local Zones? Well architectures with local access direct to instances, that perhaps transform and validate requests on the edge, or perhaps cache responses at the edge before forwarding more efficient queries across the “VPC-internal” connectivity to the host Region. Another use case may be local EC2 Windows Instances, where the reduced latency may make RDP access a seamless desktop experience.

Perhaps some Local Zones will supplant the need for on-premesis Outposts deployments.

Perhaps over time more architectural patterns will come about, and more services will start to make their way into the common Local Zone implementation. Some Local Zones may grow to become full Regions, as happened with the original Osaka (Japan) Local Zone.

Regardless of the way it ends up being used, the expansion is a massive step up in the globally deployed infrastructure.

Stronger SSH Keys for EC2

For those not familiar, SSH is the Secure Shell, an encrypted login system that has been in use for over 25 years. It replaced unencrypted Telnet for remote (text) terminal connections used to access (and administer) systems over remote networks.

Authentication for SSH can be done in multiple ways: simple passwords (not recommended), SSH Keys, and even MFA.

SSH keys is perhaps one of the most common ways; its simple, free, and relatively easy to understand. It uses asymmetric key pairs, consisting of a Private key, and a Public Key.

Understandably, the Private key is kept private, only on your local system perhaps, and the Public key which is openly distributed to any system that wishes to give you access.

For a long time, the Key algorithm used here was the RSA algorithm, and keys had a particular size (length) measured in bits. In the 1990s, 128 bits was considered enough,but more recently, 2048 bits and beyond has been used. The length of the key was one factor to the complexity of guessing the correct combination: fewer bits means smaller numbers. However, the RSA algorithm becomes quiet slow when key sizes start to get quite large, and people (and systems) start to notice a few seconds of very busy CPU when trying to connect across the network.

Luckily, a replacement key algorithm has been around for some time, leveraging Elliptical Curves. This article gives some overview of the Edwards Curve Elliptical Curve for creating the public and private key.

What we see is keys that are smaller compared to RSA keys of similar cryptographic strength, but more importantly, the CPU load is not as high.

OpenSSH and Putty have supported Edwards curves for some time (as at 2022), and several years ago, I requested support from AWS for the EC2 environment. Today, that suggestion/wish-list item has come to fruition with this:

Amazon EC2 customers can now use ED25519 keys for authentication with EC2 Instance Connect

AWS has been one of the last places I was still using RSA based keys, so now I can start planning their total removal.

  • Clearly generating a new ED25519 key is the first step. PuttyGen can do this, as can ssh-keygen. Save the key, and make sure you grab a copy of the OpenSSH format of the key (a single line that starts with ssh-ed25519 and is followed by a string representing the key, and optionally a space and comment at the end). I would recommend having the Comment include the person name, year and possibly even the key type, so that you can identify which key for which individual.
  • You can publish the Public Key to systems that will accept this key – and this can be done in parallel to the existing key still being in place. The public key has no problem with being shared and advertised publicly – its in the name. The worse thing that someone can do with your public key is give you access to their system. In Linux systems, this is typically by adding a line to the ~/.ssh/authorized_keys file (note: US spelling); just add a new line starting with “ssh-ed25519”. From this point, these systems will trust the key.
  • Next you can test access using this key for the people (or systems) that will need access. Ensure you only give the key to those systems or people that should use it. Eg, yourself. When you sign in, look for evidence that shows the new key was used. For example, the Comment on the key (see point 1 above) may be displayed, such as:
  • Lastly you can remove the older key being trusted for remote access from those systems. For your first system, you may one to leave one SSH session connected, remove the older SSH key from the Authorized Keys file, and then initiate a second new connection to ensure you still have access.

Now that we have familiarity with this, we need to look at places where the older key may be used.

In the AWS environment, SSH Public Keys are stored in the Amazon EC2 environment for provision to new EC2 instances (hosts). This may be referenced and deployed during instance start time; but it can also be referenced as part of a Launch Configuration (LC) or Launch Template (LT). These LCs and LTs will need to be updated, so that any subsequent EC2 launches are provisioned with the new key. Ideally you have these defined in a CloudFormation Template; hence adjusting this template and updating the stack is necessary; this will likely trigger a replacement of the current instances, so schedule this operation accordingly (and test in lower environments first).

There’s no sudden emergency for this switch; it is part of the continual sunrise and sunset of technologies, and address the technical debt in a systematic and continual way, just as you would migrate in AWS from GP2 to GP3 SSD EBS volumes, from one EC2 instance family to the next, from the Instance MetaData v1 to v2, and or from IPv4 to dual-stack IPv6.

Gartner Magic Quadrant for Cloud Database: 2021 v 2020

In December, Gartner produced another one of their Magic Quadrants comparing the offering from various Cloud service providers focusing on their database offerings. While its like reading tea leaves, its interesting to see the jostling of the players, the new departures who are excited (funded) enough to run an analyst relations ream, and those who are dropping out.

You can get a copy of the current report from Gartner, AWS, or the 2020 version from Google.

Here’s a mash up comparing the two years; the darker navy blue is 2021, and the lighter blue dots are 2020.

New to this in 2021 are:

  • Intersystems
  • MariaDB
  • Single Store
  • Exasol
  • Cockroach Labs

Leaving the magic quadrant in 2021 are Tencent.

Much improved are AWS and Microsoft who continue to lead – these two are now ranked neck and neck, with Oracle sitting behind them (but also improved). IMHO, those increasing in position are SAP, Teradata, Snowflake, Databricks an Cloudera, and even Huawei.

At the same time, relative to the others in this list, are two that are dropping in comparison: Redis and Marklogic – but only slightly.