CloudTrail, now with scalable logging of AWS APIs

AWS CloudTrail had some quiet updates in 2015 to make it a smoother ride when new Regions launch.

When AWS CloudTrail launched in 2013 as a free service (except for the consumed storage of its logs it dumped into S3) it was filling a hole — not advertised as an audit trail, but as close as AWS could get without fear of it becoming a blocking internal service on legitimate API calls. CloudTrail has to work quick enough to keep up with the constant stream of APIs.

Having a log of API calls that a customer makes is a key enabler for compliance reasons. CloudTrail did (at least) one thing that was pretty awesome — cross-account logging. Logs from CloudTrail in Account A could log to Account B, without anyone in Account A having been able to modify the log. For this to work, the recipient account had to configure their S3 bucket with appropriate permissions to receive the logs, with the correct originating identity, and to the specific paths — matching the account numbers of the source account(s) that will be permitted to log to it.

Clearly, one wouldn’t authorise the entire name-space too wide, or you would potentially let any account chose to log to you. They’d have to know the name of your bucket, but once discovered, they could generate enough API activity to start generating logs into your receiving account. Now these logs are quite small (and gziped), but its the principal!

If we think of this ‘receiving’ account as being our security and governance team, then they workload was to:

  1. Add additional paths (account numbers) as the organisation added AWS accounts
  2. White-list user IDs matching the AWS CloudTrail identity in each region as it came on line.

This second item is important. As AWS expands — it’s added a region already this year (2016), with plans for another 5 to come before Christmas — then the Security team in this account would have a race to find the CloudTrail ID for the new region, add it to the S3 Bucket policy for receiving logs, and then contact each of its sending accounts and get them to visit the region purely to turn on CloudTrail. Here’s what that looked like in the S3 bucket policy:

    {
      "Sid": "AWSCloudTrailAclCheck20131101",
      "Effect": "Allow",
      "Principal": {"AWS": [
        "arn:aws:iam::903692715234:root",
        "arn:aws:iam::859597730677:root",
        "arn:aws:iam::814480443879:root",
        "arn:aws:iam::216624486486:root",
        "arn:aws:iam::086441151436:root",
        "arn:aws:iam::388731089494:root",
        "arn:aws:iam::284668455005:root",
        "arn:aws:iam::113285607260:root"
      ]},
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::my-sec-team-logs"
    },

But the CloudTrail and IAM teams didn’t stand still. In mid 2015, the race to find the new region ID was removed with the ability to specify a global Service Principal ID that mapped to CloudTrail in all Regions – with AWS updating this to include new Regions as they come on line:

    {
	"Sid": "AWSCloudTrailAclCheck20150319",
	"Effect": "Allow",
	"Principal": { "Service": "cloudtrail.amazonaws.com" },
	"Action": "s3:GetBucketAcl",
	"Resource": "arn:aws:s3:::myBucketName"
    },

Turning to the ‘sending’ account, it had that same race – to turn on CloudTrail in a new region. Some questioned the need for doing this – if you’re not planning on using AWS in ap-northeast-2, then why turn on logs there? The simple reason is – to catch any activity that may happen, that you’re not aware of or expecting. Again during 2015, CloudTrail updated to change what used to be 1 ‘Trail’ per region, to having a ‘ShadowTail’ that was actually configured in one Region, but applied to all, with AWS turning on CloudTrail in new regions as they come online.

This replaced a CloudFormation template I’d developed to uniformly do the old Region-by-Region turn on of CloudTrail — and helps future proof the rapidly expanding service to reduce the ‘fog of war’ — the blind spots where activity may happen, but you don’t have any logging of it.

Lastly, a single trail per region was the default – and if you configured that to be handed immediately and directly to a separate account, then you may miss out on being able to inspect it yourself in the service account that generated the events! CloudTrail team fixed that too – permitting multiple trails per Region. This means I can pass one copy of the API log to the central security team, and then direct a duplicate stream to my own bucket for me to review should I need to.

When configuring the delivery of these logs, its also important to think about the long term retention – and automatic deletion of these logs. S3 LifeCycle policies are perfect for this – setting a deletion policy couldn’t be easier – just specify the number of days until deleted.

Should you be worried about one of your security team deleting the log – turn on Versioning for the receiving S3 bucket, and MFA delete. Whenever you access a log, you can always check to see if there are any “previous revisions” that are a result of an overwrite.

Lastly, its important to do something with these logs. CloudTrail Logs, Alerts, or a 3rd party suite like Splunk or managed service like SumoLogic works OK; but the key element is starting to wrap rules around your APIs calls that map to your activity. If you know you’re only ever going to access the API from a certain range, then set up an alert for when this happens from somewhere else. If you know you’re only going to access during office hours, set up an alert for when this happens outside of these hours. Easy stuff! Here’s a few others I like:

  • If using federated identity (SAML, OAuth), look for the definition of a new Identity Provider. Also look for updates (overwrites) of the MetadataDocument for existing Identity Providers – this will happen in (often yearly as SAML metadata contains X509 certificates that have expiry dates in them)
  • If using local users, check for additional users being created, especially if you have a pattern for usernames or know that you only create additional users from the office IP range
  • Check for IAM policies being modified
  • Check for new VPNs being established, new DirectConnect interfaces being offered (including sub-interfaces from another account), new Peering Requests ebing offered/accepted
  • Check for routing table changes; this is often stable after initial set-up

There’s many more situations to think about, and your profile of what wraps around your use of AWS may vary from account to account (eg, between Development and Production, or a payroll workload versus an account used purely for off-site backup.

If you’re already receiving S3 logs, swap over to the Service Principal; it will stop you from having to react when the AWS notification of a new region happens. If you’re already sending CloudTrail logs to your security team then switch to a global Trail and rest easy as the new Regions come on line.

There’s more that CloudTrail has done – including validation files containing signatures of log files delivered, along with a chain of delivery proof where each validation file also has information about the previous one, so there can be no break in the chain of log files delivered.