AWS GuardDuty: taking on the undifferentiated heavy lifting of network security analytics

GuardDuty is a machine-learning security analytics service for AWS

Several years ago AWS introduced CloudTrail, the ‘almost’ audit log of API calls performed by a customer against an AWS account. This was a huge security milestone: customers could finally play back what they had asked for.

I say ‘almost’ because a critical design decision was that CloudTrail should in no way inhibit the already-authenticated API call made by the customer: if CloudTrail's internal logging mechanism were ever to fail, it should not stop the API call that was issued. Other logging mechanisms in computing may place logging in the critical path of call execution, so if logging fails, the API call fails.

With CloudTrail in place (and the ability to deliver logs cross-account, directly from AWS to a trusted independent account) came the second task: looking at the data. It's all JSON text, with a corresponding chain of check-summed and signed digest files, meaning the set of log files cannot be tampered with, nor removed, without breaking the chain.

Numerous solutions were put in place, but they were mostly basic, individual pattern matches against single lines of logs: if you see X, then alert with message Y. For example: if there is a Console Login event, and it doesn't come from XX.YY.ZZ.AA/32, then alert.
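
As a sketch of that style of single-line matching, a few lines of Python against a downloaded CloudTrail log file is about all it takes (the file name and the trusted CIDR below are illustrative assumptions, not anything from a real deployment):

#!/usr/bin/python3
# A minimal sketch of single-line CloudTrail matching: alert on any
# ConsoleLogin event whose source address is outside a trusted range.
# The file name and the trusted CIDR are illustrative assumptions.
import json
import ipaddress

TRUSTED = ipaddress.ip_network("203.0.113.0/24")


def check_console_logins(log_file):
    with open(log_file) as f:
        records = json.load(f).get("Records", [])
    for record in records:
        if record.get("eventName") != "ConsoleLogin":
            continue
        source = ipaddress.ip_address(record["sourceIPAddress"])
        if source not in TRUSTED:
            print("ALERT: ConsoleLogin from unexpected address %s" % source)


check_console_logins("cloudtrail-log.json")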

Similarly, VPC introduced VPC flow logs, tracking the authorisation or rejection of connections through the VPC (no payload content, just payload size, start time, ports, addresses).

In December, AWS introduced a managed service that takes a private copy of your VPC Flow Logs, a private copy of your CloudTrail logs and your Route 53 query logs, supplements these with centrally managed, maintained and updated threat lists, mixes in customer-defined threat lists and white lists, applies a bit of machine learning, and produces much richer alerting: GuardDuty.
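
Once GuardDuty has a detector enabled in an account, its findings can be pulled back over the API. A minimal boto3 sketch (assuming a detector already exists in the region and credentials are configured; not a complete alerting pipeline):

#!/usr/bin/python3
# A hedged sketch: list GuardDuty findings via boto3, assuming a detector
# already exists in the region and credentials are configured.
import boto3

guardduty = boto3.client("guardduty")

for detector_id in guardduty.list_detectors()["DetectorIds"]:
    finding_ids = guardduty.list_findings(DetectorId=detector_id)["FindingIds"]
    if not finding_ids:
        continue
    findings = guardduty.get_findings(DetectorId=detector_id,
                                      FindingIds=finding_ids)["Findings"]
    for finding in findings:
        print("%s  severity=%s  %s" %
              (finding["Type"], finding["Severity"], finding["Title"]))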

GuardDuty isn't finished yet. At re:Invent, Tom Stickle showed a graph indicating that a slew of additional capability is coming shortly to GuardDuty, and now that it's GA, more customers will have feedback and input into the future direction of the service.

However, this doesn’t replace the need to have your own, secured and trusted copy of your CloudTrail logs, and your own alerting for events that you think are particularly significant, such as a SAML Identity Provider being updated with a new Metadata document!
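
For that particular example, CloudTrail's event history can also be queried directly. A small boto3 sketch looking for recent UpdateSAMLProvider calls (the lookup covers roughly the last 90 days of events; this is an illustration, not a substitute for proper alerting):

#!/usr/bin/python3
# A small sketch: look for recent UpdateSAMLProvider calls in the CloudTrail
# event history, the sort of event worth alerting on in your own tooling.
import boto3

cloudtrail = boto3.client("cloudtrail")

response = cloudtrail.lookup_events(
    LookupAttributes=[{"AttributeKey": "EventName",
                       "AttributeValue": "UpdateSAMLProvider"}])
for event in response["Events"]:
    print("%s  %s  by %s" %
          (event["EventTime"], event["EventName"], event.get("Username")))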

But between GuardDuty and Amazon Macie (which analyses and helps you review and secure your S3 documents), your visibility of security compliance and issues continues to improve.

Inspecting the AWS RDS CA Certificates

Here is a small script to fetch all the RDS CA certificates as a bundle and inspect them:

#!/usr/bin/python3
# vim: tabstop=8 expandtab shiftwidth=4 softtabstop=4
import urllib.request
import re
from OpenSSL import crypto
from datetime import datetime


def get_certs():
    # Fetch the combined RDS CA bundle and split it into individual
    # PEM-encoded certificates.
    url = ("https://s3.amazonaws.com/rds-downloads/"
           "rds-combined-ca-bundle.pem")
    with urllib.request.urlopen(url=url) as f:
        pem_certs = []
        current_cert = ''
        for line in f.read().decode('utf-8').splitlines():
            current_cert = current_cert + line + "\n"
            if re.match("^-----END CERTIFICATE-----", line):
                pem_certs.append(current_cert)
                current_cert = ""
        return pem_certs


def validate_certs(certs):
    # Print issuer, subject, serial and expiry for each certificate,
    # flagging any that are expired or not yet active.
    # Identify the self-signed root (issuer CN == subject CN); noted here
    # for reference, though no chain validation is performed.
    ca = None
    for cert_pem in certs:
        cert = crypto.load_certificate(crypto.FILETYPE_PEM, cert_pem)
        if cert.get_issuer().CN == cert.get_subject().CN:
            ca = cert
    for cert_pem in certs:
        cert = crypto.load_certificate(crypto.FILETYPE_PEM, cert_pem)
        # notBefore/notAfter are ASN.1 time strings such as b'20200305091131Z'
        start_time = datetime.strptime(
            cert.get_notBefore().decode('utf-8')[0:14], "%Y%m%d%H%M%S")
        end_time = datetime.strptime(
            cert.get_notAfter().decode('utf-8')[0:14], "%Y%m%d%H%M%S")
        print("%s: %s (#%s) exp %s" %
              (cert.get_issuer().CN, cert.get_subject().CN,
               cert.get_serial_number(), end_time))
        if end_time < datetime.now():
            print("EXPIRED: %s on %s" % (cert.get_subject().CN,
                                         cert.get_notAfter()))
        if start_time > datetime.now():
            print("NOT YET ACTIVE: %s on %s" % (cert.get_subject().CN,
                                                cert.get_notBefore()))
    return

pem_certs = get_certs()
validate_certs(pem_certs)

Output

Today this gives me:

Amazon RDS Root CA: Amazon RDS Root CA (#66) exp 2020-03-05 09:11:31
Amazon RDS Root CA: Amazon RDS ap-northeast-1 CA (#68) exp 2020-03-05 22:03:06
Amazon RDS Root CA: Amazon RDS ap-southeast-1 CA (#69) exp 2020-03-05 22:03:19
Amazon RDS Root CA: Amazon RDS ap-southeast-2 CA (#70) exp 2020-03-05 22:03:24
Amazon RDS Root CA: Amazon RDS eu-central-1 CA (#71) exp 2020-03-05 22:03:31
Amazon RDS Root CA: Amazon RDS eu-west-1 CA (#72) exp 2020-03-05 22:03:35
Amazon RDS Root CA: Amazon RDS sa-east-1 CA (#73) exp 2020-03-05 22:03:40
Amazon RDS Root CA: Amazon RDS us-east-1 CA (#67) exp 2020-03-05 21:54:04
Amazon RDS Root CA: Amazon RDS us-west-1 CA (#74) exp 2020-03-05 22:03:45
Amazon RDS Root CA: Amazon RDS us-west-2 CA (#75) exp 2020-03-05 22:03:50
Amazon RDS Root CA: Amazon RDS ap-northeast-2 CA (#76) exp 2020-03-05 00:05:46
Amazon RDS Root CA: Amazon RDS ap-south-1 CA (#77) exp 2020-03-05 21:29:22
Amazon RDS Root CA: Amazon RDS us-east-2 CA (#78) exp 2020-03-05 19:58:45
Amazon RDS Root CA: Amazon RDS ca-central-1 CA (#79) exp 2020-03-05 00:10:11
Amazon RDS Root CA: Amazon RDS eu-west-2 CA (#80) exp 2020-03-05 17:44:42

S3 MFA Delete

The Simple Storage Service (S3) has made long-term durable storage simple for the masses. The democratisation of object storage with well documented, stable APIs has been incorporated into many products. The API is part of the product.

But despite the word Simple, there are more and more advanced features: storage tiers, security policies, life-cycle policies, logging, versioning, Requester Pays and, more recently, Inventory generation.

S3 features prominently in long-term retention of important data due to its high durability. But today I'm diving into another benefit: MFA Delete.

Simple CRUD

Create, Read, Update, Delete: the basics of a REST interface for sending data to, and manipulating, a data store. In AWS, IAM policy (or Bucket Policy) can permit or limit the actions that a user can perform. If you delete an object, then it's gone. If you overwrite an object (using the same prefix or name), then the original is lost, as you would expect.

We can limit calls to s3:DeleteObject, either with an explicit DENY, or by carefully permitting only the fine-grained actions we intend (s3:PutObject, s3:GetObject) for the roles, groups or users we confer privileges to. However, we still run the risk of an unintended overwrite.
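
As a hedged sketch of that approach (the group name, policy name and bucket name below are placeholders), an inline IAM policy can allow the fine-grained actions and explicitly deny the delete:

#!/usr/bin/python3
# A sketch only: allow Put/Get on a bucket but explicitly deny DeleteObject
# for a group. Group, policy and bucket names are illustrative placeholders.
import json
import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": ["s3:PutObject", "s3:GetObject"],
         "Resource": "arn:aws:s3:::my-important-bucket/*"},
        {"Effect": "Deny",
         "Action": "s3:DeleteObject",
         "Resource": "arn:aws:s3:::my-important-bucket/*"},
    ],
}

iam.put_group_policy(GroupName="DataWriters",
                     PolicyName="NoObjectDelete",
                     PolicyDocument=json.dumps(policy))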

Furthermore, there may be privileged users or roles with elevated access, so while your general work-flow is protected by policy from accidental deletion, you're not protected from accidents from other sources (e.g. humans with admin privileges).

S3 Versioning

To help with this, S3 Versioning permits you to retain multiple revisions of the same object. When listing the bucket naturally, you see the current revision in the list, but with a few API calls you can drill into the previous revisions of the same object, helping you recover from object overwrites.

When a file is deleted from a versioned S3 bucket, it's really just updated with a new version that is a designated Delete Marker. This marker prevents the object being included in a natural bucket listing. Without further action, the previous versions are still present, and you're still paying for their storage.
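
A few lines of boto3 show what is really sitting in a versioned bucket, both the object versions and the delete markers (the bucket name below is a placeholder):

#!/usr/bin/python3
# A small sketch: list object versions and delete markers in a versioned
# bucket, which is what you are still paying to store. Bucket name is a
# placeholder.
import boto3

s3 = boto3.client("s3")

for page in s3.get_paginator("list_object_versions").paginate(
        Bucket="MyVersionBucket"):
    for version in page.get("Versions", []):
        print("version        %s  %s  latest=%s" %
              (version["Key"], version["VersionId"], version["IsLatest"]))
    for marker in page.get("DeleteMarkers", []):
        print("delete-marker  %s  %s" % (marker["Key"], marker["VersionId"]))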

Lifecycle Policies

I always recommend agreeing a life cycle retention policy for S3 buckets – possibly by agreed prefix – upon creation of the Bucket. It makes the creator of the data set really consider how permanent their data must be.

Lifecycle policies can change data storage tiers, but my favourite is the expiry of “previous revisions” after a customer-defined number of days. This gives me a kind of “S3 undelete” window, and it's saved my bacon on several occasions; an accidental Admin delete can easily be undone within the number of days you have specified.
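
As a sketch of that undelete window (the bucket name and the 30-day figure are illustrative), the lifecycle rule looks something like this via boto3:

#!/usr/bin/python3
# A hedged sketch: expire non-current object versions 30 days after they
# stop being current. Bucket name and day count are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="MyVersionBucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "ExpirePreviousRevisions",
            "Filter": {"Prefix": ""},
            "Status": "Enabled",
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }]
    })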

But I want to go further: I want some buckets that I know are my “keep forever” buckets, where any kind of delete, even of previous revisions, is difficult. Enter MFA Delete.

Enabling MFA Delete

MFA Delete works on versioned S3 buckets, and protects all revisions (including delete markers) from being deleted except via a special delete command that includes a valid MFA token from an authorised user.

In my experimentation, I had an existing bucket with Versioning enabled. To enable this feature I had to turn to the API; it isn't available in the AWS Console at this time. I also had to use an IAM User with MFA, or the master Root identity: federated users or EC2 instances in IAM Roles cannot do this, as they have no MFA associated with them directly.

In this example, I created a profile for the AWS CLI called MasterUser, and had root IAM keys created (which I immediately rescinded). I had a bucket called MyVersionBucket, that I had set up just as I liked it. I also grabbed the ARN of my Virtual MFA I had for the Root user in this account (the ARN is listed as a SERIAL number in the console).

To enable MFA Delete:

aws s3api put-bucket-versioning --profile MasterUser --bucket MyVersionBucket --versioning-configuration MFADelete=Enabled,Status=Enabled --mfa 'arn:… 012345'

Note: the MFA argument is quoted, as the single argument contains a space between the serial (the ARN, in this case) and the current code shown on the MFA.

To then see the configuration:

aws s3api get-bucket-versioning --profile MasterUser --bucket MyVersionBucket
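
The same pair of calls is available from boto3 if you prefer scripting it. A hedged sketch, where the MFA serial and token are placeholders and the credentials used must belong to the MFA's owner:

#!/usr/bin/python3
# A sketch of the same operation via boto3. The MFA serial/token and bucket
# name are placeholders; the credentials used must belong to the MFA owner.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_versioning(
    Bucket="MyVersionBucket",
    MFA="arn:aws:iam::123456789012:mfa/root-account-mfa-device 012345",
    VersioningConfiguration={"MFADelete": "Enabled", "Status": "Enabled"})

print(s3.get_bucket_versioning(Bucket="MyVersionBucket"))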

With this in place it was time to test it out.

(Not really) Deleting from an MFA-Delete protected Bucket

The first thing I did was upload a file (same as normal), and then delete it. Using the “current view” of the bucket, the file vanished. In the new AWS console I could see the deleted item listed, and drilling into it, I could see the revisions there as with a regular Versioning bucket.

The next thing I tried was to “undelete” an object, an option that has just appeared in the revised S3 console, however this silently failed.

I then looked at the revisions of my sample file, and could see the delete marker sitting there. I attempted to delete the Delete Marker, but without an MFA I was blocked. This seemed to make sense: previously “undeleting” an object from S3 meant removing the delete marker, and clearly that’s just a version that I cannot really delete.

I looked at the other revisions of my sample file, and I was likewise blocked from deleting them.

Next I looked at adding a Lifecycle policy to the bucket, and discovered that no Lifecycle policies can be added to an MFA Delete protected bucket. So there's no opportunity to automatically move objects to the Infrequent Access storage tier after a period.

To truly empty the bucket, I deleted the versions of the file explicitly:

aws s3api delete-object --bucket MyVersionBucket --key sample.png --version-id Foo1234 --mfa 'arn:… 123456'

The Version ID was displayed to me in the output of the list-object-versions call.

Of course, I could potentially have suspended MFA delete, tidied up, and then re-enabled it.

At the end of my experiment, with MFA Delete still enabled, I could simply delete the (now empty) bucket as normal; there were no further challenges.

When to use MFA Delete

As MFA Delete is a bucket-wide setting, you need to ensure that all objects that will live in this bucket are right to be considered permanent. You'll also want to limit who can change the bucket versioning configuration (perhaps your PowerUsers should have an explicit deny on the s3:PutBucketVersioning API call). If you have temporary or staging data in the bucket, or data that you want a lifecycle policy to automatically clean up, then MFA Delete is not for you.

CloudPets security fail is not a Cloud failure

I spent several years at Amazon Web Services as the Solution Architect with a depth in Security in A/NZ. I created and presented the Security keynotes at the AWS Summits in Australia and New Zealand. I teach Advanced Security and Operations on AWS. I have run online share-trading systems for many of the banks in Australia. I help create the official Debian EC2 AMIs. I am the National Cloud Lead for AWS Partner Ajilon, and via Ajilon, I also secure the State Government Land Registry in Ec2 with Advara.

So I am reasonably familiar with configuring AWS resources to secure workloads.

Last week saw an awful security failure: the compromise of CloudPets, a company that makes internet-connected plush toys for children that let users record and play back audio via the toys. Coverage from Troy Hunt, The Register, ArsTechnica.

As details emerged, a few things became obvious. Here are the highlights (low-lights, really) of what apparently occurred:

  • A production database (MongoDB) was exposed directly to the Internet with no authentication required to query it
  • Audio files in S3 were publicly, anonymously retrievable. However, they were not listable directly (no matter: the object URLs were in that open MongoDB database)
  • Non-production and production systems were co-tenanted

There are a number of steps that should have been taken technically to secure this:

  1. Each device should have had a unique certificate or credential of its own
  2. This certificate/credential should have been used to authenticate to an API Endpoint
  3. Each certificate/credential could then be uniquely invalidated if someone stole the keys from it
  4. Each certificate/credential should only have been permitted access to fetch/retrieve its own recordings, not any recording from any customer
  5. The Endpoint that authenticates the certificate should have generated Presigned URLs for the referenced recordings. Presigned URLs contain a timestamp set in the future, after which the Presigned URL is no longer valid. Each time the device (pet) wanted a file, it could ask the Endpoint to generate the Presigned URL, and then fetch it from S3
  6. The Endpoint could rate-limit the number of requests per certificate per minute/hour/day. E.g. 60 per minute (for burst fetches), 200 per hour, 400 per day?

If the Endpoint for the API was an EC2 instance (or better yet, an Auto Scaling group of them), then it could itself run in the context of an IAM Role, with permission to create these Presigned URLs. Similarly, an API Gateway invoking a Lambda function running in a Role.
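
As a hedged sketch of the presigned-URL step above (the bucket name, key layout and expiry time are illustrative assumptions), the endpoint, running in a Role, would hand out a short-lived URL per recording rather than making the objects public:

#!/usr/bin/python3
# A sketch only: an endpoint running under an IAM Role generates a
# short-lived presigned URL for a single recording. Bucket name, key layout
# and expiry time are illustrative assumptions.
import boto3

s3 = boto3.client("s3")


def recording_url(device_id, recording_id):
    key = "recordings/%s/%s.wav" % (device_id, recording_id)
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-recordings-bucket", "Key": key},
        ExpiresIn=300)  # the URL stops working after five minutes


print(recording_url("device-0001", "recording-42"))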

Indeed, that Endpoint is what would have used the MongoDB (privately), removing the publicly facing database.

I've often quoted Voltaire (or Uncle Ben from Spider Man, take your pick): “with great power comes great responsibility”. There's no excuse for the series of failures that occurred here; the team apparently didn't understand security in their architecture.

Yet security is in all the publicly facing AWS customer documents (shared responsibility). It's impossible to miss. AWS even offers a free security fundamentals course, which I recommend as a precursor to my own training.

Worse is the response and lack of action from the company when they were alerted last year.

PII and PHI are stored in the cloud: information that the economy, indeed modern civilisation, depends upon. The techniques used to secure workloads are not overly costly; they mostly require knowledge and implementation.

You don't need to be using Hardware Security Modules (HSMs) to have a good security architecture, but you do need current protocols, ciphers, authentication and authorisation. The protocols and ciphers will change over time, so IoT devices like this also need to update over time to support protocols and ciphers that may not exist today. It's this constant stepping-stone approach, continually moving to the next implementation of transport and at-rest ciphers, that is becoming a pattern.
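
A quick way to see where a given endpoint sits on that stepping-stone path is to check what it actually negotiates. A small sketch (the hostname is a placeholder):

#!/usr/bin/python3
# A small sketch: report the TLS protocol version and cipher a server
# negotiates with a modern client. The hostname is a placeholder.
import socket
import ssl

context = ssl.create_default_context()
with socket.create_connection(("example.com", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="example.com") as tls:
        print("protocol:", tls.version(), "cipher:", tls.cipher())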

Security architecture is not an after-thought that can be left on the shelf of unfulfilled requirements, but a core enabler of business models.

Looking back at 2016, and forward to the future

It's going to be interesting to see how the Gartner Magic Quadrant for Infrastructure as a Service looks when it comes out later this year (assuming around August again): the gap between the players, and the names that disappear.

2016 saw five competitors drop out compared to Gartner's 2015 edition, and more recently Cisco's $1B investment in Intercloud appears to have ended; however, they have now purchased AppDynamics, who have been pushing very heavily into the cloud, especially around the microservices world. It's interesting to see the players shuffle around:

Year  Count  Differences to previous year
2013  15
2014  15     Merged IBM + SoftLayer; -Tier3, -Savvis; +VMware, +Google, +CenturyLink
2015  15     -GoGrid, -HP; +NTT, +Interoute
2016  10     -Joyent, -DimensionData, -Verizon, -CSC, -Interoute

Meanwhile at AWS, services continued to innovate, reliably and without any major interruptions. May 2015 saw VPC S3 Endpoints launched, permitting private interconnect between VPCs and the S3 service, and there have been promises of more of this to follow. re:Invent 2016 saw enhanced distributed account controls announced with AWS Organizations (only in preview, so subject to change), enhancing the corporate controls in a multi-AWS-account set-up.

AWS did open up four additional Regions in 2016 as promised — Ohio, Canada, London, and India. The footprint of its Edge Locations also expanded — although some of these were additional Edges in the same cities (at different interconnect/peering providers). That’s OK; as the Edges can be turned on and off transparently around maintenance windows, so having multiple Edges in a location may indicate how important this location is.

I’ve found it particularly interesting to see CloudFront move from a flat network of Points of Presence (POPs), to a two-tier caching model with “Regional Edges” servicing requests from “Global Edges”. As CloudFront has spread wider into more locations, there’s an increase in the number of origin requests (misses) made to your origin service, which even with modest TTLs on objects can still be an overwhelming volume of traffic.

From a networking perspective, the availability of IPv6 on Service Endpoints, and now within the VPC is also a sign of evolution. These EC2 evolutions have happened in the past — perhaps not so noticeable:

  • from 32 bit to 64 bit VMs
  • from Para-Virtualisation (PV) to Hardware-assisted Virtualisation (HVM) for EC2
  • to newer generations of Instance types (helped by an improved pricing point)

And now we see the start of the move from IPv4 to IPv6. It will take a few years, but we're standing at the edge of massive change: yet another migration. Only yesterday we saw the launch of IPv6 for ELB within VPC (something that used to exist for ELB in what is now called “Classic”, the original shared-network EC2), and today IPv6 within the VPC is available in all existing Regions, up from just us-east-2 at launch; it was interesting to see Ohio used as a canary for the new feature deployment instead of the traditional us-east-1.

For the Debian EC2 images that I help maintain, we started to support the Elastic Network Adapter (ENA) at the end of 2016, after I attended the first Debian Cloud Sprint in Seattle, with thanks to Marcin Kulisz for his assistance. For those not familiar, Debian is a 23-year-old non-profit, open-source operating system which underlies much of the modern Linux ecosystem. I've been participating since the late 1990s, and have been a member of the project since 2000. Today I help maintain the Debian AMIs on EC2 for (at least) tens of thousands of AWS customers (possibly many more).

Debian has been selected as one of the operating system options in AWS's new Lightsail product: a point-and-click VPS that neatly wraps up the details of VPC, Security Groups and storage into a simple model. This brings the beauty of Debian to even more people, dispelling the long-held myth that Linux is hard.

What’s in store for 2017

For Debian: in 2017 we'll move to make the images even more transparent to consumers than they are now, with the help of Mr Thomas Lange, the very talented maintainer of FAI for the last 20 years or so (whom I have had the pleasure of knowing for many of those years, since we met at DebConf 1). Marcin Kulisz, Anders Ingemann and others have played a major part in this, as have the other 800+ Debian Developers world-wide, and the contributors who report bugs, review code and help ensure that Debian remains as transparent as possible and true to its goals.

For the AWS platform, storage pricing continues to drop; while it took a while to get to cents-per-GB-per-month, I'm sure we'll see cents-per-TB-per-month before too long. Others say cloud storage will become “free” (little “f”), but I just think the order of magnitude for charging will change. Compute edges down in price too; new instance types will come, and those who architect (and automate) their deployments well (CloudFormation, Auto Scaling and Launch Configurations) can and will easily adopt them.

Status Quo: All Change

What's become clear is that any cloud deployment needs constant change and maintenance in order to take advantage of improvements to the platform over time, be that re-deploying your app servers with new operating system patches, or modifying VPC architectures (Endpoints, NAT Gateway, IPv6). I guess the main thing these days is to be pretty comfortable with a quote from Heraclitus (535-475 BC): “Change is the only constant in life”.

Meanwhile, there’s another whole story around my work that’s been very satisfying and exciting, but that’s a story for another day…


If you're interested in AWS and Security, then please check out my training at https://nephology.net.au/, where in a two-day in-person class we go above and beyond the AWS courses to ensure you have the knowledge and are prepared for the agile world of running and securing environments in the AWS Cloud.