How did you get started in AWS?

Someone posed the question recently: how did you get started in using AWS?

Once upon a time…. I was working in London (2003-2010), and during my time at Vibrant Media running the IT operations team for their contextual advertising platform, I was looking for ways to serve content and process requests efficiently.

Vibrant had thousands of customers, and comScore reporting indicated our advertising services were seen by some 49% of the US population each month (the platform was worldwide, but the comScore report covered the US market). It was fairly busy.

In 2008 I stumbled across the recently launched AWS (which started in 2006). At that time the controls were rudimentary, and the architectural patterns for VPC did not suit our requirements (all traffic from the VPC had to egress to the customer VPN – no IGW!). So I parked the idea, and moved on.

In 2010 I returned to Australia, and was approached by the team at Netshelter to implement a crawler for forum sites to identify the influencers in the network. Unlike my previous role at Vibrant, Netshelter had no data centres, no infrastructure, just AWS.

It was Richard Brindley who said “we just have AWS, don’t worry about the bill, because anything you do in AWS is going to be vastly cheaper than what we would have done on premises”.

With only myself to architect, implement and operate the solution, I had to find ways to make myself scale. Platform as a service, built from managed components, was key. Any premium in pricing was worth it, because it meant that I didn’t have to deal with the details of operations.

As a Linux developer and System Admin for the 15 years prior to that, I started with the EC2 platform. Finding images, launching them, and configuring them. Then came the automation of installation: scripting the deployment of packages as required for the code I was writing (back then, in Perl).

Pretty quickly, I realised I needed to scale horizontally to get through the work, and I would need some capability to distribute the work. I turned to SQS, and within a day had the epiphany that a reliable queue system was more important than a fleet of processing nodes. Individual nodes could fail, but a good approach to queuing and message processing could overcome many obstacles.
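The idea that the queue matters more than the nodes can be sketched in a few lines. This is a toy in-memory model, not SQS or its API: the class, the method names and the 30 second timeout are all made up for illustration. It mimics the one behaviour that makes the pattern robust: a received message is only hidden for a while, and it comes back if the worker never deletes it.

```python
import itertools


class ToyQueue:
    """In-memory stand-in for a queue with a visibility timeout (hypothetical, not the SQS API)."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}              # message id -> [body, time it becomes visible again]
        self._ids = itertools.count(1)

    def send(self, body):
        msg_id = next(self._ids)
        self._messages[msg_id] = [body, 0]
        return msg_id

    def receive(self, now):
        """Return (id, body) of the first visible message and hide it, or None."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Called by a worker only after the work has succeeded."""
        self._messages.pop(msg_id, None)


def run_worker(queue, now, handler):
    """Process at most one message; a crash in handler leaves the message in the queue."""
    received = queue.receive(now)
    if received is None:
        return False
    msg_id, body = received
    try:
        handler(body)
    except Exception:
        return False                     # no delete: the message reappears after the timeout
    queue.delete(msg_id)
    return True
```

A node that dies halfway through simply never calls delete, and some other node picks the message up later. The price is that handlers may see the same message twice, so they need to be safe to repeat.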

In storing my results, I needed a database. I had been MySQL certified for years, writing stored procedures, creating schemas, and managing server updates. All of which was fascinating, but time consuming. RDS MySQL was the obvious choice to save me time.

As VPC capability evolved, additional layers of security became easier to implement without introducing Single Points of Failure (SPOFs), or pinch-points and bottlenecks.

From an Australian perspective, this was an interesting era: it was pre-Region in Australia. That meant that, at that time, most organisations dismissed cloud as not being applicable to them. True, some organisations addressing European and US markets were all-in, but latency and fears around the then-relevant Patriot Act kept usage low (this obviously changed in 2012!).

But in essence, the getting-started advice of not worrying about the bill held up. Compared with the equivalent all-in cost of co-location fees, bandwidth commitments, compute and storage hardware, rack and stack time and costs, and the overhead of managing all these activities, the immediacy and control of an AWS environment was far more effective.

I didn’t go wild on cost. Keeping an eye on the individual components meant the total charges remained sensible. As they say, look after the pennies, and the pounds look after themselves.

What was key was the approach to continuously learn. And then relearn something when it changes slightly, or unlearn past behaviours that no longer made sense.

It was also useful to always push the boundaries; reach out and ask service teams to add new capabilities, be they technical, compliance, policy, etc.

How would I start today…well, that’s another article for another day….

Death of the Data Centre; long live the Data Centre

The last time I worked inside a data centre co-lo was in 2009. From East Coast US to West, from the UK to Europe, and here in Australia, I spent many long hours in these windowless hubs of electronic existence.

It’s been 10 years.

I started making a data centre at my father’s manufacturing organisation in the early 1990s. As a small business it had a number of computer systems, a small 100 Mbit/sec LAN, and a room with air-conditioning that we sealed off and filled with dedicated physical servers and UPS units. I recall naming every host on the network with a name starting with the letter P:

  • Parrot
  • Pigeon
  • Pardalote
  • Potoo
  • Peacock
  • Pacific Black Duck

You get the idea. The company was called Pelican.

By the time I attended The University of Western Australia I was of course gravitating to the University Computer Club, a student association I would end up being Vice President and then President of. During my time there with friends, we furnished a small data centre out of recycled materials in order to contain the cooling for our server farm within the expanse of the vast Cameron Hall building; this structure still stands today (webcams).

In 1997 my interest in networking and digital rights led me to help found The Western Australian Internet Association, now known as Internet.asn.au. Thus not being a network.

Despite not creating or working at an ISP in these earlier years of the Internet, I was reasonably proficient in deploying physical IT infrastructure. My professional career saw me spend 20 years within the data centres of banks, education, government and financial services. I used to order millions of dollars of server blade enclosures, remote control power distribution units and dual-power transfer units for reliability, switches, load balancers and remote KVM units. Upon notification of delivery at a data centre in Manhattan (111 8th Avenue, or 6th Ave), Seattle, China Basin in San Francisco, Andover MA, Amsterdam or more, I would organise for myself or my team to parachute in, un-box, unwrap, stack, crimp Ethernet leads, power on and deploy clusters of servers, then kick off the initial install server deployment, before retreating home to finish the software install remotely and bring servers online and into service.

It was all about dependencies; have the right equipment in the right place at the right time to minimise the time spent in the co-lo.

The last one that I worked in was in 2009. The last one that I visited was in 2013 – and that was one of the massive halls within the sprawling Amazon Web Services (AWS) US-East-1 complex; a facility that few people ever get to see (no photos).

All that effort, the logistics and physical work of installing equipment, is now largely redundant. I create virtual data centres on cloud providers from templates with more fault tolerance, scalability, and privacy in literally 5 minutes, across the planet without having to spend my time hidden for days (to weeks) crimping Ethernet cables, balancing redundant power usage, and architecting spanning tree powered reliable layer 2 networks.

While some write of the death of the data centre, I think the data centre has changed who its direct customers are. I’m not interested in touring facilities and planning cabinet layouts. I have better things to do. The hyper-scale cloud providers have automated and abstracted so much, that it is not cost effective for me to do any of that manual work any more.

Vive la Data Centre. You don’t need to market to me any more. Just to those Cloud providers; cut your costs, you’re a commodity, and have been for a decade.

Put your CAA in DNS!

There are hundreds of public, trusted* certificate authorities (CAs) in the world. These CAs have had their root CA Certificate published into the Trust Store of many solutions that the world uses. These Trust Stores range from widely used web browsers (like the one you’re using now) to the various programming language runtimes and individual operating systems.

A trust store is literally a store of certificates which are deemed trusted. While users can edit their trust store, or make their own, it comes with a set that has been selected by your software vendor. Sometimes these are manipulated in the corporate environment to include a company Certificate Authority, or remove specific distrusted authorities.

Over time, some CAs fall into disrepute, and eventually software distributors will issue updates that remove a rogue CA. Of course, issuing an update that the public never applies to their systems doesn’t change much in the short term (tip: patch your environments, including the trust store).

Like all X.509 certificates, the CA root certificates have an expiry, typically over a very long 20+ year period, and before expiry, much effort is put into creating a new root Certificate and having it issued, distributed and updated in deployed applications.

Legitimate public certificate authorities are required to undertake some mandatory checks when they issue their certificates to their customers. These checks are called the Baseline Requirements, and are governed by the CA/Browser Forum industry body. CAs that are found to be flouting the Baseline Requirements are expelled from the CA/Browser Forum, and subsequently, most software distributions then remove them from their products (sometimes retrospectively via patches as mentioned above).

Being a Certificate Authority has been a lucrative business over the years. In the early days, it was enough to make Mark Shuttleworth a tidy packet with Thawte – enough for him to become a very early Space Tourist, and then start Canonical. With a trusted CA Root certificate widely adopted, a CA can then issue certificates for whatever they wish to charge.

What’s important to note though, is that the certificate in use has no bearing on the strength of encryption or negotiation protocol being used when a client connects to an HTTPS service. The only thing a CA-issued certificate gives you is a reasonably strong validation that the controller of the DNS name you’re connecting to has validated themselves through the CA vetting process.

It doesn’t tell you that the other end of your connection is someone you can TRUST, but you can reasonably TRUST that a given Certificate Authority thinks the entity at the other end of your connection may be the controller of their DNS (in Domain Validated (DV) certificates). Why reasonably? Well, what if the controller of the web site you’re trying to talk to accidentally published their PRIVATE key somewhere; a scammer could then set up a site that may look legitimate, poison some DNS or control a network segment your traffic routes over….

When a CA issues a certificate, it adds a digital signature (typically RSA based) around the originating certificate request. Within the certificate data are the various fields about the subject of the certificate, as well as information about who the issuer is, including a fingerprint (hash) of the issuer’s public certificate.

Previously CAs would issue certificates with an MD5 of their certificate. MD5 was replaced with SHA1, and around 2014, SHA1 was replaced with SHA2-256.

This signature algorithm is effectively the strength of the trust between the issuing CA, and the subject’s certificate that you see on a web site. RSA gets very slow as key sizes get larger; today’s services typically use RSA at 2048 bits, which is currently strong enough to be deemed secure, and fast enough not to be a major performance overhead; make that 4096 bits and it’s another story.

Not only is the RSA algorithm being replaced, but eventually SHA2-256 will be as well. The replacement for RSA is likely to be Elliptic Curve based, and SHA2-256 will either grow longer (SHA2-384), or move to a new algorithm (SHA3-256), or a completely new method.

But back to the hundreds of CAs: you probably only use a small number in your organisation. Let’s Encrypt, Amazon, Google, Verisign, GlobalTrust, etc. However, all CAs are seen as equally trusted when a client is presented with a validly signed certificate. What can you do to prevent other CAs from issuing certificates in your (DNS) name?

The answer is simple: the DNS CAA record, Certification Authority Authorization. It’s a list that says which CA(s) are allowed to issue certificates for your domain. It’s a record in DNS that is looked up by CAs just before they’re about to issue a certificate: if a CAA record exists and their identifier is not in it, they don’t issue.
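The decision a CA makes at that point can be sketched roughly as below. This is a deliberately simplified reading of the CAA rules, not any CA's actual code: the function name is made up, and it ignores wildcard handling (the "issuewild" tag), the critical flag, and the walk up the DNS tree to parent domains when a name has no CAA records of its own.

```python
def ca_may_issue(caa_records, ca_identifier):
    """Simplified CAA check for a non-wildcard name.

    caa_records: list of (flags, tag, value) tuples found for the domain.
    ca_identifier: the domain name the CA recognises as its own, e.g. "amazon.com".
    """
    issue_values = [
        value.split(";")[0].strip().lower()      # drop any parameters after ';'
        for _flags, tag, value in caa_records
        if tag.lower() == "issue"
    ]
    if not issue_values:
        # No "issue" records at all: CAA places no restriction on issuance.
        return True
    return ca_identifier.lower() in issue_values


records = [(0, "issue", "amazon.com")]
print(ca_may_issue(records, "amazon.com"))        # True
print(ca_may_issue(records, "letsencrypt.org"))   # False
```

Note the first branch: a domain with no CAA record at all is open to every public CA, which is exactly why it is worth publishing one.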

As it is so rarely looked up, you can set this DNS record up with an extremely low TTL (say, 60 seconds). If you get the record wrong, or you forget to whitelist a new CA you’re moving to, update the record.

DNS isn’t perfect, but this slight incremental step may help ensure that certificates in your name are only issued by the CAs you’ve made a decision to trust, and for your customers to trust as well.

DNS CAA was defined in 2010, and became an IETF RFC (RFC 6844) in 2013. I worked with the AWS Route53 team to have the record type supported in 2015. You can inspect CAA records using the dig command:

dig caa advara.com
; <<>> DiG 9.10.6 <<>> caa advara.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5546
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;advara.com. IN CAA
;; ANSWER SECTION:
advara.com. 60 IN CAA 0 issue "amazon.com"

Here you can see that advara.com has permitted AWS’s Certificate Manager, with its well known identifier of “amazon.com” (and it’s a 60 second TTL).
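If you want to check this from a script rather than by eye, the answer line splits into fields easily. The sketch below only parses text that dig has already printed; it does no DNS lookup itself, and the function and class names are my own.

```python
import shlex
from typing import NamedTuple


class CaaRecord(NamedTuple):
    name: str
    ttl: int
    flags: int
    tag: str
    value: str


def parse_caa_answer(line):
    """Parse one CAA line from a dig ANSWER SECTION into its fields."""
    # shlex takes care of the double quotes around the value.
    name, ttl, rr_class, rr_type, flags, tag, value = shlex.split(line)
    if rr_class != "IN" or rr_type != "CAA":
        raise ValueError("not a CAA answer line: " + line)
    return CaaRecord(name, int(ttl), int(flags), tag, value)


record = parse_caa_answer('advara.com. 60 IN CAA 0 issue "amazon.com"')
print(record.ttl, record.tag, record.value)   # 60 issue amazon.com
```

Reading the fields left to right: the 60 is the TTL, the 0 is the flags byte, "issue" is the tag, and "amazon.com" is the CA being authorised.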

You’ll also see that various online services will let you inspect this, including SSLLabs.com, Hardenize.com, and more.

Putting a CAA record in DNS typically costs nothing; it’s rarely looked up and can easily be changed. It protects you from someone tricking another CA into issuing certificates they think are legitimate; and this has been seen several times (think how valuable a google.com certificate would be to intercept (MITM) mobile phones, searches, gmail, etc) – and while mis-issuance like this may lead to CA/Browser Forum expulsion, and eventual client updates to distrust this CA, it’s far easier to prevent issuance with this simple record.

Of course, DNSSEC would be nice too…

Project & Support versus DevOps and Service teams

The funding model for the majority of the world’s IT projects is fundamentally flawed, and the fallout is, over time, broken systems, lacking security and legacy systems.

It’s pretty easy to see that digital systems are the lifeblood of most organisations today, from banking to stock inventory and tracking to HR systems. The majority of these critical operations have been deployed as “projects”, and then “migrate to support”. And it’s that “migrate to support” that is the problem.

Support roles are typically oversubscribed, and under empowered. It’s a cost saving exercise to minimise the overhead, by taking the more expensive development resources and moving them to a fresh project, while more commodity problem solving labour comes along to triage operational run time issues. However, that support function has no history in the design and architecture, and often either has no access to the development and test environments to continue doing managed change, or is not empowered to do so. The end result is that Support teams use the deployed production features (eg: manually add a user to a standalone system) instead of driving incremental improvements (eg: automatically add a user based on the HR system being updated).

Contrast with a DevOps team, of dynamic size over time. The team that builds & tests & deploys & automates this more complete lifecycle, and stays with the critical line-of-business system, becomes a Service Team. Any changes they need to perform are not applied in production locally, as is often the case with “Support teams”, but in the Development environment. This then should pass automated testing and feedback loops before being promoted to a higher environment. Sounds great, yeah?

Unfortunately, economic realities are the constraint here. Both the customer and the consultancy are trying to minimise cost, not maximise capability. And navigating a procurement and legal team is something that organisations want to do as rarely as possible, not on a continuous basis.

Contrast a Service team focus, of variable size over time, containing different capabilities over time. The cost for this team varies over time, based upon the required skill set. The team objective is to make the Best Service they can, and they need to be driven by metrics: Availability, Latency, Accuracy, all while meeting strict security requirements.

From the Service team’s perspective, they obviously need remuneration for their time, but also want to take a sense of pride in their work, and a sense of achievement.

A Support Team is not a Service Team, as they don’t have the full Software Lifecycle Management capability and/or Data Lifecycle Management capability. A Service Team should never be one person; that’s one step away from being zero people. A Service Team may look after more than one service, but not so many that they do not have crystal clear focus on any service.

S3 Public Access: Preventable SNAFUs

It’s happened again.

This time it is Facebook who left an Amazon S3 Bucket with publicly (anonymously) accessible data. 540 million breached records.

Previously it was Verizon, Pocket iNet, GoDaddy, Booz Allen Hamilton, Dow Jones, WWE, Time Warner, the Pentagon, Accenture, and more. Large, presumably trusted names.

Let’s start with the truth: objects (files, data) uploaded to S3, with no options set on the bucket or object, are private by default.
Someone has to either set a Bucket Policy to make objects anonymously accessible, or set a public ACL on each object, for objects to be shared.

Let’s be clear.

These breaches are the result of someone uploading data and setting the acl:public-read, or editing a Bucket’s overriding resource policy to facilitate anonymous public access.
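To make the second case concrete, here is a rough sketch of spotting a bucket policy that grants anonymous access. It is a simplification and the function name is my own: it only looks for Allow statements whose principal is the wildcard, and it ignores any Condition block that might narrow the grant, so treat a hit as "go and look", not as proof.

```python
import json


def statements_allowing_anonymous(policy_json):
    """Return the Allow statements in a bucket policy whose principal is everyone."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):      # a single statement may not be wrapped in a list
        statements = [statements]

    public = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal")
        aws = principal.get("AWS") if isinstance(principal, dict) else None
        if principal == "*" or aws == "*" or (isinstance(aws, list) and "*" in aws):
            public.append(statement)
    return public


# Example policy with a placeholder bucket name: anyone may read every object.
example = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})
print(len(statements_allowing_anonymous(example)))   # 1
```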

Having S3 accessible via authenticated http(s) is great. Having it available directly via anonymous http(s) is not, but historically that was a valid use case.

This week I have updated a client’s account, that serves a static web site hosted in S3, to have the master “Block Public Access” enabled on their entire AWS account. And I sleep easier. Their service experienced no downtime in the swap, no significant increase in cost, and the CloudFront caching CDN cannot be randomly side-stepped with requests to the S3 bucket.

Serving from S3 is terrible

So when you set an object public it can be fetched from S3 with no authentication. It can also be served over unencrypted HTTP (which is a terrible idea).

When hitting the S3 endpoint, the TLS certificate used matches the S3 endpoint hostname, which is something like s3.ap-southeast-2.amazonaws.com. Now that hostname probably has nothing to do with your business brand name, and something like files.mycompany.com may at least give some indication of affiliation of the data with your brand. But with the S3 endpoint, you have no choice.

Ignoring the unencrypted HTTP; the S3 endpoint TLS configuration for HTTPS is also rather loosely curated, as it is a public, shared endpoint with over a decade of backwards compatibility to deal with. TLS 1.0 is still enabled, which would be a breach of PCI DSS 3.2 (and TLS 1.1 is there too, which IMHO is next to useless).

It’s worth noting that there are dual-stack IPv4 and IPv6 endpoints, such as s3.dualstack.ap-southeast-2.amazonaws.com.

So how can we fix this?

CloudFront + Origin Access Identity

CloudFront allows us to select a TLS policy, pre-defined by AWS, but permitting us to restrict available protocols and ciphers. This lets us remove “early crypto” and be TLS 1.2 only.

CloudFront also permits us to use a customer specific name, for SNI enabled clients for no additional cost, or a dedicated IP address (not worth it, IMHO).

Origin Access Identities give CloudFront a rolling API keypair that the service can use to access S3. Your S3 bucket then has a policy permitting this Identity access to the bucket.
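A bucket policy of that kind looks roughly like the one this sketch builds. The bucket name and identity ID are placeholders, and the helper function is my own; the principal ARN follows the form AWS documents for Origin Access Identities.

```python
import json


def oai_bucket_policy(bucket_name, oai_id):
    """Build a bucket policy that lets one CloudFront Origin Access Identity read objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontOriginAccessIdentity",
                "Effect": "Allow",
                "Principal": {
                    "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity " + oai_id
                },
                "Action": "s3:GetObject",
                "Resource": "arn:aws:s3:::" + bucket_name + "/*",
            }
        ],
    }


# Placeholder bucket name and identity ID, for illustration only.
print(json.dumps(oai_bucket_policy("example-bucket", "EXAMPLEOAIID"), indent=2))
```

The important part is what is absent: there is no wildcard principal anywhere, so the only reader is CloudFront.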

With this access in place, you can then flick the “Block Public Access” setting account-wide, possibly on the bucket first, then the account-wide settings last.
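As a sketch of that bucket-first, account-last order, the two calls look something like this with boto3. The functions take the client as an argument, so nothing here talks to AWS until you pass in a real one; the bucket name and account ID you supply are yours, and the helper names are made up.

```python
# The four settings that together make up S3 Block Public Access.
BLOCK_EVERYTHING = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def block_bucket(s3_client, bucket_name):
    """Turn on all four settings for a single bucket."""
    s3_client.put_public_access_block(
        Bucket=bucket_name,
        PublicAccessBlockConfiguration=BLOCK_EVERYTHING,
    )


def block_account(s3control_client, account_id):
    """Turn on all four settings for the whole account."""
    s3control_client.put_public_access_block(
        AccountId=account_id,
        PublicAccessBlockConfiguration=BLOCK_EVERYTHING,
    )
```

With boto3 installed, the clients would come from boto3.client("s3") and boto3.client("s3control") respectively. Doing the bucket first gives you a chance to confirm the site still serves through CloudFront before the account-wide switch is flipped.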

One thing to work out is your use of URLs ending in “/”. Using Lambda@Edge, we convert these to a request for “/index.html”. Similarly, URL paths that end in “/foo” with no typical suffix get mapped to “/foo/index.html”.
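A minimal sketch of such a rewrite, written as a Python origin-request handler, is below. It is an illustration of the mapping described above and not the exact function we run; in particular, "no typical suffix" is approximated here as "the last path segment contains no dot", which is an assumption.

```python
import posixpath


def rewrite_uri(uri):
    """Map directory-style URIs onto the index.html object stored in S3."""
    if uri.endswith("/"):
        return uri + "index.html"
    last_segment = posixpath.basename(uri)
    if "." not in last_segment:
        return uri + "/index.html"
    return uri


def handler(event, context):
    """Origin-request handler: rewrite the URI before CloudFront asks S3 for it."""
    request = event["Records"][0]["cf"]["request"]
    request["uri"] = rewrite_uri(request["uri"])
    return request


print(rewrite_uri("/"))              # /index.html
print(rewrite_uri("/foo"))           # /foo/index.html
print(rewrite_uri("/css/site.css"))  # /css/site.css
```

Returning the modified request tells CloudFront to carry on to the origin with the new path; the viewer still sees the original URL.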

Governance FTW?

So, have you checked if Block Public Access is enabled in your account(s)? How about a sweep through right now?
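For the sweep itself, something along these lines would do per account. It is a sketch that expects a boto3 "s3control" client to be passed in, and the function name is my own. One caveat I am assuming from the API's behaviour: when an account has never had any Block Public Access configuration, the real call raises an error instead of returning an empty result, and the caller should treat that as "nothing enabled".

```python
REQUIRED_SETTINGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def missing_account_blocks(s3control_client, account_id):
    """Return the Block Public Access settings that are not enabled for the account."""
    response = s3control_client.get_public_access_block(AccountId=account_id)
    config = response.get("PublicAccessBlockConfiguration", {})
    return [name for name in REQUIRED_SETTINGS if not config.get(name, False)]
```

An empty list back means all four settings are on; anything else is the to-do list.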

If you’re not sure about this, contact me.