5 AWS Trends and Wishes for early 2019

AWS is the largest public Cloud provider in the world. It is constantly evolving at a rapid clip, and it uses the scale of its service to reap the economies that can be brought to bear at that scale.

The IT industry is itself evolving, with new patterns, protocols, and approaches being created in and out of the cloud. AWS is well placed to embrace many of these trends: things like WebSockets, IPv6, and more. Not everything is “done” in AWS; it’s all a continuous work-in-progress. But AWS’s approach (independent Service Teams, loose coupling, well-documented API interfaces) and track record put it far ahead of the competition in the race to stay current.

I’ve been using AWS for >10 years now, hold 8 AWS Certifications at this point in time, served nearly 3 years as the only Solution Architect with a “depth” in Security for Australia & New Zealand, have been a Cloud Warrior for 2 years, and am now an AWS Ambassador. I’ve developed and delivered critical government solutions in Australia that the entire population depends upon every day, so I have a reasonably deep understanding of the requirements that organisations have around their digital systems. With nearly 20 years as a Debian Linux developer, and >20 years delivering online services, my experience puts me in a reasonable position to understand the ecosystem.

Here’s a list of things I foresee becoming commonplace in early 2019:

  • Organisation CloudTrail: enforcing company-wide API logging standards, leading to better analysis of CloudTrail logs and the activity they expose
  • Enforced patterns around serving static content via S3: public access blocked by default, with content stored in S3 served only via CloudFront and an Origin Access Identity. Side effect: appropriate TLS Certificates, and TLS Protocol and Cipher enforcement.
  • Virtual Private Cloud: enforced company-wide standards on routing: Transit Gateway from a corporate “production services” account, once DirectConnect is supported by Transit Gateway
  • CloudFront and ALB set to HTTPS only (possibly with HTTP -> HTTPS redirect), with TLS 1.2 only!

5 Things I’d still like to see in AWS:

  • Improved health checks for Network and Application Load Balancers, similar to the existing ELB (Classic).
  • ECDSA certificates from AWS Certificate Manager
  • TLS 1.3 on ALB, CloudFront, and the ability to restrict TLS Protocols to TLS 1.2+, or TLS 1.3+.
  • VPC: IPv6-only comms for intra-VPC services (RDS, ElastiCache, ALB/ELB, Redshift, etc.), IPv6-only subnets leading to IPv6-only VPCs, helped by service discounts for adopting IPv6-only
  • In Australia: AWS finally added to the ASD Protected Cloud list, without a Consumer Guide!

None of these are surprises to those who have extensively used AWS and hold those valuable AWS certifications. These items don’t preclude your immediate extensive usage of the Cloud; they give visibility of the continuing evolution that is required in IT.

AWS Re:Invent: rest of the releases

Well, that was a busy week. It was almost impossible to keep up with the announcements; it felt something akin to playing Tetris, as announcements poured down faster than I could read, understand and appreciate them.

So, having got past day 1, here’s the rest of what I think of what happened next:

DynamoDB Transactions and on-demand

ACID Compliance (atomicity, consistency, isolation, and durability) was one of the constraints that those new to NoSQL were always trying to understand. For some workloads it was OK to move this validation to the user space (app server); for others, not so much.

On-demand DynamoDB removes the need to provision read and write capacity in advance, and lets DynamoDB scale (and charge) as required from usage patterns.

CloudWatch Logs Insights

When I first saw this console, it just yelled “This is Sumo Logic” at me.

Outposts

For many years, infrastructure being delivered to a Region was a pre-configured rack with all equipment ready to run. This release effectively shifts the delivery address from “AWS Region” to a customer’s data centre. It means there is a new channel for delivery of the equipment, which produces more scale and, ultimately, drives down cost further.

But who still wants to run data centres? The compliance, maintenance, and physical security burdens are all very compelling reasons not to. Plus, an on-premises deployment has maintenance overheads, and capacity limits that are way lower than the Region.

S3 Glacier Deep Archive

From Glacier with infrequent retrieval, to a deeper retention – Deep Archive requires data retention for half a year, and has a 12 hour restore time. But the benefit is a huge price saving: US$1/TB/month (yes, Terabyte). That’s US$0.0009765/GB/month – so it’s about time we changed units of measurement to the TB. Compare that to Azure blob storage at US$0.002/GB/month (US$2.048/TB/month): Deep Archive is less than half the cost.

When combined with some sensible data work-flows for backups, you’ll save a ton of money. But the biggest win will be when 3rd party backup solutions can instrument this themselves automatically. For example, the last 7 days of backup may sit on S3 Standard durability, and then get migrated to Glacier for 3 months, and then to Glacier Deep Archive after that.

Using the above tiering, let’s do a 2 TB full backup once per week, and a 100 GB daily incremental. We’ll take 2.6 TB in S3 Std, then around 11 weeks of S3 Glacier at 26 TB, and then around 9 months of S3 Glacier Deep Archive at 93.6 TB. The total monthly cost is 2.6 * 1024 * 0.025 + 26 * 1024 * 0.005 + 93.6 * 1 = 66.56 + 133.12 + 93.6 = US$293.28/month = US$3,519.36/year, assuming a one year retention.

If we had kept this all on S3 standard durability, then we would have been looking at US$37,539.84/year.
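The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: it uses the per-GB and per-TB prices and the stored volumes exactly as quoted in this post (with 1 TB = 1024 GB), the names are illustrative, and real AWS pricing varies by Region and over time.

```python
GB_PER_TB = 1024

# (tier, TB stored, US$ per TB per month), using the figures quoted in this post
TIERS = [
    ("S3 Standard", 2.6, 0.025 * GB_PER_TB),
    ("S3 Glacier", 26.0, 0.005 * GB_PER_TB),
    ("S3 Glacier Deep Archive", 93.6, 1.0),
]


def monthly_cost(tiers):
    """Monthly storage cost when each tier is billed at its own rate."""
    return sum(tb * usd_per_tb for _, tb, usd_per_tb in tiers)


def all_standard_monthly_cost(tiers):
    """Monthly storage cost if the same data all stayed in S3 Standard."""
    total_tb = sum(tb for _, tb, _ in tiers)
    return total_tb * 0.025 * GB_PER_TB


tiered = monthly_cost(TIERS)
flat = all_standard_monthly_cost(TIERS)
print(f"Tiered:          US${tiered:,.2f}/month, US${tiered * 12:,.2f}/year")
print(f"All S3 Standard: US${flat:,.2f}/month, US${flat * 12:,.2f}/year")
```

Retrieval, request and early-deletion charges are left out, as they are in the figures above.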

So, who’s going to make the first move? CommVault? Synology? StorSimple? Storage Gateway VTL?

Managed Blockchain

Previously AWS had said it didn’t want to run a managed Blockchain service, saying no company should sit at the centre of this, but customer demand has won out: there are now two services filling the space, Managed Blockchain and the Quantum Ledger Database (QLDB).

Both of these are interesting to me, and I’ll be speaking with customers to see if they want us to integrate this into their solutions. Neither will replace using a relational database for temporal processing, state, etc. But for point in time authoritative signed data, they look interesting.

Textract

This one requires some testing. I’ve previously looked at Mechanical Turk for doing human-intelligence level OCR, but as a service this may be better. Any process that does text extraction should have a multi-pronged approach to ensure accuracy; so perhaps a pass of Textract, followed by a pass of Mech Turk (or other humans), and then, if there is a conflict/mismatch, flag for management inspection.

Security Hub

This is huge for me, and one I am actively getting my head around before recommending into customer environments. It’s also enthused me to get back to AWS Config, which I’d previously discounted on cost.

Security Hub unites several AWS security services. Each of these has had its own interface, cross-account capabilities, etc. Of course, for me, and my Public Sector customers, the lack of Macie in Australia is still a consideration here.

AWS Organisational CloudTrails

I’ve been a fan of CloudTrail since I first heard of it. The fact that it could always deliver API logs cross-account to a dedicated security account, without any fear or possibility of them being filtered or edited by the source account, was a key enabler in enterprise workloads.

It’s developed well since its initial launch, with multi-region support, digest files to detect tampering, and more. But with all these options came the possibility of inconsistent deployments across a large fleet of accounts.

And while my perception has always been of consistency, it’s only after circling back that you realise that not everything is consistent, with new AWS accounts being added at different times.

It is only with me starting to play with Config and Security Hub (see above) that these inconsistencies have come to light; and the new solution to this is just in time: Organisation Trails, that apply from the Billing/Organisation account, down to all dependent accounts.

An Organisation trail in a dependent account cannot be deleted or modified. Organisation trails can log cross-account in almost the same way as the previous implementation, with the exception of a few new permissions required on the destination S3 Bucket policy.
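As a rough sketch of what those bucket policy additions might look like: the usual CloudTrail statements, plus a write statement scoped to the organisation ID prefix. The bucket name, organisation ID and account ID below are placeholders, the function name is illustrative, and the exact statements should be checked against the current CloudTrail documentation before use.

```python
import json


def organisation_trail_policy(bucket: str, org_id: str, management_account_id: str) -> dict:
    """Build a destination bucket policy for an organisation trail (sketch only)."""
    principal = {"Service": "cloudtrail.amazonaws.com"}
    owner_control = {"StringEquals": {"s3:x-amz-acl": "bucket-owner-full-control"}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # CloudTrail checks the bucket ACL before delivering logs
                "Sid": "CloudTrailAclCheck",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:GetBucketAcl",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {   # Logs for the account that owns the trail
                "Sid": "CloudTrailAccountWrite",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{management_account_id}/*",
                "Condition": owner_control,
            },
            {   # Assumed addition for organisation trails: the organisation ID prefix
                "Sid": "CloudTrailOrganisationWrite",
                "Effect": "Allow",
                "Principal": principal,
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{org_id}/*",
                "Condition": owner_control,
            },
        ],
    }


# Placeholder values, not real identifiers
policy = organisation_trail_policy("example-security-logs", "o-exampleorgid", "111111111111")
print(json.dumps(policy, indent=2))
```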

Lambda Ruby, BYO Runtime, and Firecracker

Firecracker is a strong story, but in the end, having a managed environment for it is worth it if I can do so (ie, if latency, sovereignty, etc can be met). What will be interesting is the opportunity for more eyes to review its source code.

FSx (Lustre & Windows Fileshare)

Managed file shares sound great, but now there’s confusion between EFS and FSx (and to some degree, Storage Gateway as an NFS and CIFS file share).

And much more

I won’t go into detail on the large list of other services; my interest is in the vast majority of web, security and DevOps-enabling services that continue to incrementally improve. But what happens next is interesting.

Config revisited

When first launched, I got bill shock from turning Config on with just a few rules. But now it’s much richer, and easier to understand. As it is one of the security tools that feeds into Security Hub, it’s forced me to circle back to Config and start re-evaluating some of its rules. It’s come a long way, and much of the tooling I have written myself in the past to do cross-account checks, which Config also does, can now feed via Security Hub back to a central (organisation-wide) interface for alerting and actioning.

Summary

With some 50,000 people at re:Invent this year, the pace of innovation continues to put AWS far ahead of its competitors.

AWS Re:Invent Day 1 thoughts

This is going to be a long week of learning how the world has changed. I’m already tired, and I’m not even there. My brain hurts (you’d not believe how many typos I am correcting here).

While (once again) I am not at Re:Invent in Las Vegas, Nevada, I’m tuned in to as many news sources as possible to try and catch what parts of the undifferentiated heavy lifting have changed. I’ve been one of the AWS Cloud Warriors for the last two years (2017-2018), which has meant I was lucky enough to be given a conference ticket, but unfortunately I’ve not been able to get there.

While I may not be physically there, I am in spirit, having been nominated as one of the AWS Ambassadors.

However, the live stream video (which has improved dramatically since 2014), the Tweets from various people, the updates on LinkedIn, RSS feeds, Release Notes, What’s New page, AWS Blog (hi Jeff), and indeed, the Recent Changes/Release History sections of lots of the documentation pages (such as this Release History page for CloudFormation) have given me more information to trawl through.

It’s now Tuesday night in Perth, Western Australia, and day two of Re:Invent, but it’s only 7am Tuesday morning in Las Vegas (yes, I’m 16 hours in the future). Here are my thoughts on the releases thus far:

100 Gb/s networking in VPC

The ENA network interface was previously limited to 25 Gb/sec per instance on the largest instance types. Indeed, it’s worth noting that most network resources are limited to some degree by the instance size within an instance family. But now a new family – the C5n instances – has interfaces capable of up to 100 Gb/sec (that’s 12.5 GB/s – little b is bits, big B is bytes).

Much has been said about network throughput, and the comparison between ENA and SR-IOV in the AWS Cloud, and comparisons to other Cloud environments. 100 Gb/s now sets a new high bar that other vendors are yet to reach.

While it’s wonderful to have that level of throughput, it’s also worth noting that scale-out is still sometimes a good idea. 100 instances at 1 Gb/s each may provide a better solution sometimes, but then again sometimes a problem doesn’t split nicely between multiple server instances. YMMV.

Transit gateway

Managing an enterprise within AWS is usually a case of managing multiple AWS accounts. That ultimate separation at the console/account level sometimes comes back as integration questions around network, governance and other considerations.

In March 2014 (yes, 4 and a half years ago), the VPC service team introduced VPC Peering, a non-transitive peering arrangement between VPCs: a non-throttling, no single point of failure way of meshing two separate VPCs together (including in separate accounts).

This announcement now gives a transitive way (hence the name) of meshing a spread enterprise deployment. There are multiple reasons for doing so:

  • Compliance: all outbound (to Internet) traffic is deemed by your corporate policy to funnel via specific centralised gateways.
  • Management overhead: organising N-VPCs to mesh together means creating (N-1)*N/2 peering arrangements, and double that number for routing table entries. If we have 4 environments (dev, test, UAT and Production), and 10 applications in their own environments, then that’s 40 VPCs, and 780 peering relationships and 1560 routing table updates.
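The full-mesh numbers in the list above follow from the N*(N-1)/2 formula, which can be sanity-checked with a short sketch. The function names are illustrative, and the route count assumes at least one route entry on each side of every peering.

```python
def peering_connections(vpc_count: int) -> int:
    """Peering connections needed to fully mesh vpc_count VPCs."""
    return vpc_count * (vpc_count - 1) // 2


def route_table_updates(vpc_count: int) -> int:
    """Each peering needs a route on both sides, so double the peering count."""
    return 2 * peering_connections(vpc_count)


environments = 4    # dev, test, UAT and Production
applications = 10
vpcs = environments * applications

print(vpcs, peering_connections(vpcs), route_table_updates(vpcs))  # 40 780 1560
```

The mesh grows quadratically, so doubling to 80 VPCs takes the peering count to 3,160, whereas a hub model needs one attachment per VPC.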

It’s worth noting that in some organisations, an account’s administrative users may themselves not have access to create an IGW for access to the internet; a Transit gateway may be the only way permitted for connectivity so it can be centrally managed.

But in taking central management, you now have a few considerations:

  1. Blast radius: if you stuff up the Transit gateway configuration, you take down the organisation. With separation and peering, each VPC is its own blast radius.
  2. Cost: Transit gateway isn’t free. You probably want to permit S3 Endpoints for large volume object storage
  3. Throughput: 50 Gb/s may seem a lot, but now there are 100 Gb/s instances

ARM based A1 Instances

In 2013, when I worked at AWS, I spoke with friends at ARM and AWS Service teams about the possibility of this happening. The attractiveness of the reduced power envelope, and the cost comparison of the chip itself, made it already look compelling then. This was before Windows was compiled to ARM – and that support is only strengthening. It’s heartening now to see this coming out the door, giving customers choice.

Earlier this month we saw an announcement about AMD CPUs. Now we have three CPU manufacturers to choose from in the cloud when looking to run Virtual Machines. Customers can now vote with their workloads as to what they want to use. The CPU manufacturers now have more reason to innovate and make better, faster or cheaper CPUs available. When you can switch platforms easily (you do DevOps, right? All scripted installs?) then it’s perhaps down to the cost question now.

Now, recently it was announced there will be a t3a. Wonder if there will be a t3a1?

Compute in the cloud just got even more commodity. Simon Wardley, fire up your maps.

S3 improvements (lots here)

Gosh, so much here already.

Firstly, an admission that AWS Glacier is no longer its own service, but folded under S3 and renamed as S3 Glacier. There’s a new API for glacier to make it easier to work with, and the ability to put objects to S3 and have them stored immediately as Glacier objects without having to have zero day archive Lifecycle policies.

SFTP transfers – finally, a commodity protocol for file uploads that simple integrators can use, without having to deploy your own maintained, patched, fault-tolerant, scalable ingestion fleet of servers. This right here is the definition of undifferentiated heavy lifting being simplified, but with a price of 30c/hour, you’re looking at US$216 per month (720 hours) before you include any data transfer charges.

Object Lock: the ability to put files and not be able to delete them for a period. For when you have strict compliance requirements. Currently can only be defined on a Bucket during Bucket creation.

S3 events seem to have got a lot more detailed as well, with more trigger types that can be sent to SQS, SNS, or straight to a Lambda function.

KMS with dedicated HSM storage

KMS has simplified the way that key management is done, but some organisations require a dedicated HSM for compliance reasons. Now you can tell KMS to use your custom key store (single-tenanted CloudHSM devices in your VPC) as the storage for these keys, but still use KMS APIs for your own key interaction, and use those keys for your services.

A dedicated Security Conference

Boston, End of June. Two days.

Not so new (but really recent)

CLI Version 2

Something so critical – the CLI – used by so many poor-man (poor-person) integrations and CI/CD pipelines, now with a version 2 in the works. It’s breaking-changes time – but in the meantime, the v1 CLI continues to get updates.

Predictive AutoScaling

Having EC2 AutoScaling reactively scale when thresholds are breached has been great, but combining that with machine learning based upon previous scaling events to make predictive scaling is next-level.

Lambda Support for Python 3.7

You may initially think this is trivial, stepping up from Python 3.6 to Lambda with Python 3.7, but it means that Python Lambda code can now make TLS 1.3 requests (where the OpenSSL library underneath the runtime supports it). Updating from Python 3.6 to 3.7 is mostly trivial; from 2.7 to 3.x normally means re-factoring urllib/requests client code and liberal use of parentheses where previously they weren’t required (eg, for print()).
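A quick way to see whether a given Python runtime can actually speak TLS 1.3 is to ask the ssl module. This is a minimal sketch; the result depends on the OpenSSL build the interpreter is linked against, so it may differ between a laptop and a Lambda runtime.

```python
import ssl
import sys


def tls13_available() -> bool:
    """True when this interpreter's ssl module reports TLS 1.3 support."""
    # ssl.HAS_TLSv1_3 was added in Python 3.7; treat older versions as unsupported.
    return bool(getattr(ssl, "HAS_TLSv1_3", False))


print("Python:", ".".join(str(part) for part in sys.version_info[:3]))
print("Linked against:", ssl.OPENSSL_VERSION)
print("TLS 1.3 available:", tls13_available())
```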

S3: Public Access Blocking

Block Public Access finally removes the need for custom Bucket policies to prevent accidental uploads with acl:public (which, when you’re using a 3rd party s3 client for which you can’t see or control the ACL used, may be scary). The downside of the previous policies that rejected uploads if ACL:public (or not acl:private) was used is that they interfered with the ability to do multi-part puts (different API).

There have been way too many cases of customers leaving objects publicly accessible. This will become a critical control in future. Most organisations don’t want public access to S3: those that do want public, anonymous access probably should be using CloudFront to do so (and a CloudFront Origin Access Identity for this as well, with Lambda@Edge to handle auto indexing and trailing slash redirects).
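For illustration, turning on all four Block Public Access settings for a single bucket could look something like the sketch below, using boto3’s `put_public_access_block` call. The helper names and the bucket name are hypothetical, and it assumes a boto3 S3 client with suitable credentials and permissions is passed in.

```python
def public_access_block_config() -> dict:
    """All four Block Public Access settings switched on."""
    return {
        "BlockPublicAcls": True,        # reject new public ACLs
        "IgnorePublicAcls": True,       # ignore any public ACLs already present
        "BlockPublicPolicy": True,      # reject bucket policies that grant public access
        "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
    }


def block_public_access(s3_client, bucket_name: str) -> None:
    """Apply the settings to one bucket using a boto3 S3 client."""
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=public_access_block_config(),
    )


# Usage (requires the third-party boto3 package and AWS credentials):
#   import boto3
#   block_public_access(boto3.client("s3"), "example-bucket-name")
```

The same settings can also be applied account-wide, which is probably the better default for organisations that never intend to serve content directly from S3.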

DynamoDB: Encrypted by default

A big step up. In reality, the ‘encryption at rest’ scenario within AWS is a formality: as one of the few people in Australia who has actually been inside a US-East-1 facility (hey QuinnyPig, I recall that from your slide two weeks ago at Latency Conf), I can say the physical security is superb, and the logical allocation of data and the knowledge of its physical location are the responsibility of separate teams.

So given that someone in the facility doesn’t know where your data is, and someone who knows where it is doesn’t have physical access (and those with physical access can’t smuggle storage devices in or out), we’re at a high bar (physical devices only leave facilities when crushed into a very fine powder, particularly for SSD based storage).

So the Encrypted At Rest capability is more a nice-to-have: an extra protection should the standard storage wiping techniques (already very robust) have an issue. And given the bulk of the AES algorithm has been in CPU extensions for years, the overhead of processing encryption has essentially no impact.

Summary

I’ve tried my best to stay aware of so much, but the last 24 months have stretched the definition of what Cloud is so very wide. IoT, Robotics, Machine Learning, Vision Processing, Connect, Alexa, Analytics, DeepLens: this list seems so wide before you even dive deep into the details. And the existing stalwarts (EC2, S3, SQS, and even VPC) keep getting richer and richer.

The above are the services I’ve been interested in – there is definitely a hell of a lot more in the last 24 hours as well.

What’s today (US time) going to bring? I need to get some sleep, because this is exhausting just trying to keep the brain up to date.

(Previously-) Symantec Run Certificate Authority distrust is about to hit

Sometime in the next week, a large swathe of web sites around the world, from Fortune 500 companies, to governments and beyond will stop being available securely. All with the next release version of Google Chrome (version 70) and Firefox. The words “NET::ERR_CERT_SYMANTEC_LEGACY” are about to become well known.

For those wishing to look into the future, Google Chrome makes its future releases available to those interested under the labels of Beta, and prior to that, as a “Canary” (ie, in the coal mine). And if you’d cast your eyes over a few sites with these pre-release versions, you’ll see examples like that shown here (name removed).

A web site not available as the issuing certificate authority has been distrusted.

Sadly, the operators of these sites may well be looking at the embedded certificate expiry (Valid Until) date, and thinking this is not an issue for them. Some of these certificates may have many more years of appearing to be valid.

The case is much worse: the organisation that these web site operators obtained the certificate from, the Certificate Authority, is about to have its status revoked, having been caught acting in ways that undermine the trust instilled in it. These are all powered by Symantec’s legacy root certificates, which include the Thawte, GeoTrust, and RapidSSL brands.

You can read plenty online about this, for example from DigiCert, Mozilla, and Google. Here’s Scott Helme’s February 2018 post, and his follow up from a recent Alexa Top sites crawl. Several of these have since updated their certificates (well done, Tigereair.com.au, stratco.com.au, naati.com.au; fixed it!).

So what’s actually going to happen?

Some disruption.

I’m sure a large number of these will be smeared across mainstream media for being “hacked”, or “offline”, when in reality, “oblivious” is closer to the point.

Poor service providers are going to tell vulnerable people to “ignore the security warnings”, and to “proceed” to the site regardless. This is BAD advice. If you are told this, you are better off ceasing to do business with the organisation, as they do not understand the security they are dealing with. If this is the advice of your employer, then you should consider what this means for the security of your personal HR (and other) data.

There are far too many people operating, controlling, or otherwise “responsible” for large numbers of web sites who have no idea about what they are actually operating. It’s evident from scanning sites and seeing those that still have legacy, vulnerable encryption on their HTTPS configuration, or worse, serve content over unencrypted HTTP. Just because you don’t value protecting your content from modification doesn’t mean your web visitors don’t value NOT being compromised when visiting you.

Web traffic interception happens every second of every day. In Wifi Cafes, Airports, aeroplanes, corporate LANs. TLS (formerly SSL) is the best way we have to protect the integrity of the content across untrusted networks, but we’re in a constant capability race to ensure that services only offer ways to connect that minimise the risk of using untrusted networks.

Driven by a desire to not change things that appear to be working (or indeed, being either lazy, overworked, under resourced/funded, or unaware), organisations are not bringing up their drawbridge of security on their most vulnerable interfaces: those services that are facing the Internet, such as their web site or web services. This issue, when it breaks, will help highlight that some organisations and individuals should probably not be in charge of the services they currently operate.

Case in point: check out BankGradeSecurity.com, a ranking of financial institutions around the world and how well they have adopted modern encryption and security capabilities on their web site and Internet banking services.

It’s clear we’re constantly in the middle of technology transitions – IT Services are not simply done; they are either in-use and actively well-maintained, or they should be archived or removed. Anything else demonstrates cost cutting and under-valuation of the digital capability that allows an organisation to operate.

Organisations face a choice of two types of Managed Services providers today: those that understand service maintenance on behalf of their customers, and those that do not (and are still running with the same HTTPS configuration they went live with years ago).

It’s easy to spot these services: they haven’t enabled GCM based AES ciphers or Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) key exchange mechanisms. Worse are those permitting the use of SSLv2, SSLv3, or TLS 1.0, or not yet permitting the use of TLS 1.2 (or the shiny new TLS 1.3). And, unbelievably, there are those that don’t enable HTTPS at all.
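As a rough illustration of that kind of spot check, the sketch below connects to a site with Python’s ssl module and applies the criteria from the paragraph above to whatever was negotiated. It is deliberately simplistic: the function names are made up, it only sees the one protocol and cipher this client happens to negotiate (not everything the server permits), and it would wrongly mark other modern suites such as ChaCha20-Poly1305 as legacy. A proper scanner tests each protocol and cipher in turn.

```python
import socket
import ssl

# Protocol versions named above as ones that should no longer be permitted
LEGACY_PROTOCOLS = {"SSLv2", "SSLv3", "TLSv1"}


def looks_modern(protocol: str, cipher_name: str) -> bool:
    """Apply this post's criteria to one negotiated protocol and cipher."""
    if protocol in LEGACY_PROTOCOLS:
        return False
    if protocol == "TLSv1.3":
        # TLS 1.3 only offers AEAD ciphers with ephemeral key exchange
        return True
    # OpenSSL-style names look like ECDHE-RSA-AES128-GCM-SHA256
    return "ECDHE" in cipher_name and "GCM" in cipher_name


def negotiated(host: str, port: int = 443, timeout: float = 5.0):
    """Return (protocol, cipher name) negotiated with host by this client."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cipher_name, _protocol, _bits = tls.cipher()
            return tls.version(), cipher_name


# Usage (needs network access; the host is just an example):
#   protocol, cipher = negotiated("example.com")
#   print(protocol, cipher, looks_modern(protocol, cipher))
```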

There are more signs of stagnation if you know what to look for: lack of HTTP/2, lack of IPv6, long TTLs on DNS records, etc, all indicate organisations that are stifled, or don’t have the capability to understand what they are doing. Sometimes it’s corporate direction to use 3rd party IT operations who, again, use the cheapest unskilled and unqualified labour to deliver IT services, dressed up in marketing to make it look like they save the earth.


If you’re affected by this, consider attending Nephology’s Web Security training.

Gartner Magic Quadrant for Cloud 2018: Half the players decimated

It’s as if the left hand side of the 2017 Gartner Cloud MQ just imploded! And just as interesting, most of those that were in the Visionaries quadrant are now relegated to the Niche Players sector.

Only six players remain compared to last year’s 14. Let’s see who has gone this year:

  • Skytap
  • NTT
  • Joyent
  • Interoute
  • Fujitsu
  • Rackspace
  • CenturyLink
  • Virtustream

It’s no surprise to those close to the ground that the only real survivors here thus far are AWS and MS Azure, with Google barely making it into the top right corner, which actually looks like they just flopped over the line from last year – but progress nonetheless. And while the gap between the top two is closing, AWS is still far above in “Ability to execute”, and slightly ahead in “Completeness of vision”.

For those that had chosen one of the departed 8 as their Cloud provider, it’s time to question what their strategy is, and what yours is. Gartner may have upped its inclusion requirements, resulting in some of these players being filtered out, and that may have no impact on those providers. But it may resonate poorly for their sales prospects going forward, all of which could have a long term downward trend, increased cost per customer, loss of economies of scale, etc.

Meanwhile, the rate of innovation by service improvement, refinement, or new offerings continues.