The AWS console (and Twitter, and LinkedIn) has just lit up with the EC2 Dashboard showing a new Availability Zone (AZ) in its service status: ap-southeast-2c.
Before I go any further, I should be clear on my position here – I do not work for AWS. I used to in the past (~a year and a half ago). These opinions disclosed here are mine and not based upon any inside knowledge I may have – that data is kept locked up.
What is an AZ?
AWS Regions (currently 11 in the main public global set, with around 5 publicly disclosed as coming soon) are composed of many data centres. For the EC2 services (and those services that exist within the Virtual Private Cloud or VPC world) these exist within customer-defined networks that live in Availability Zones. You can think of an Availability Zone as a logical collection of data centre facilities (one or more) that appear as one virtual data centre.
Each Region generally has at least two Availability Zones so that customers can split workloads geographically; but that separation is generally within the same city. You can guess at the separation by deploying services in each AZ and then doing a ping from one to the other: they should be less than 10 milliseconds apart.
This separation should be sufficient to put the AZs on separate power grids and flood plains, mitigating those and other shared risk factors, but close enough to make synchronous replication suitable for this environment. Any further separation and synchronous replication becomes a significant performance overhead.
So each AZ is at least one building, and, transparently to the customer, this can grow (and shrink) as a physical footprint over time.
What’s New?
Until now, customers have had a choice of two Availability Zones in the Sydney AWS Region, and the general advice was to deploy your service by spreading across both of them evenly in order to get some level of high availability. Indeed, the EC2 SLA talks about having availability zones as part of your strategy for obtaining their 99.95% SLA. Should one of those AZs “become unavailable to you” then you stand a reasonable chance of remaining operational.
In the event of such unavailability, those customers that had designed AutoScale groups around their EC2 compute fleet would find their lost capacity being redeployed automatically (subject to their sizings, and any scale-up/down alarms) in the surviving AZ. The cost implication was running two instances instead of one – though potentially two slightly smaller instances than you may otherwise have chosen – and the benefit of this automatic recovery of service was wonderful. It did mean that you ran the risk of losing ~50% of your capacity in one hit (one AZ, evenly split), but that's better than cold standby elsewhere.
With three AZs, you now have a chance to rethink this. Should you use a third AZ?
Divide by 3!
If your EC2 fleet is already >= 3 instances, then this is probably a no-brainer. You're already paying for the compute, so why not spread it around to reduce the loss-of-AZ risk exposure? Should an AZ fail, you're only risking 1/3 of your footprint. The inter-AZ cost (@1c/GB) is in my experience negligible – and if you were split across two AZs anyway, you're already paying it.
Your ELBs can be expanded to be present in the new AZ as well, at no real increase in cost; if ELBs in front of instances is your architecture, then you would not spread compute across 3 AZs without also adjusting the ELBs they sit behind to do likewise.
But I don’t need three EC2 instances for my service!
That's fine – if you're running two instances, and you're happy with the risk profile, SLA, and service impact of losing an AZ that you already have in place, then do nothing. Your existing VPCs won't sprout a new Subnet in this new AZ by themselves; that's generally a customer-initiated action.
What you may want to do is review any IAM Policies you have in place that are explicit in their naming of AZs and/or subnets. You can’t always assume there will only ever be 2 AZs, and you can’t always assume there will only ever be 3 from now on!
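For example, a hypothetical policy statement like the following would silently exclude the new AZ until someone remembers to update it:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "LaunchOnlyInKnownAZs",
      "Effect": "Allow",
      "Action": "ec2:RunInstances",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:AvailabilityZone": ["ap-southeast-2a", "ap-southeast-2b"]
        }
      }
    }
  ]
}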
Why is there a 3rd AZ in Sydney?
We’re unlikely to ever know for sure (or be permitted to discuss). Marketing (hat tip to my friends there) will say “unprecedented customer demand”. This may well be true. The existing AZs may be starting to become quite busy. There may be no more additional data centre capacity within a reasonable distance of the existing building(s) of the existing two AZs. And as we know, certain AWS services require a third AZ: for example, RDS SQL Server uses a witness server in a 3rd AZ as part of the multi-AZ solution – perhaps there’s been lots of customer demand for these services rather than exhaustion on the existing services.
But there are other possible reasons. Cost optimisation on the data centre space may mean the time is right to expand in a different geographical area. There's the constant question as to whether the AWS services run from AWS-owned buildings or 3rd-party facilities. At certain scales some options become more palatable than others. Some options become more possible. Tax implications, staffing implications, economies of scale, etc. Perhaps a new piece of industrial land became available – perhaps at a good price. Perhaps a new operator built a facility and leased it at the right price for a long term.
Perhaps the existing data centre suppliers (and/or land) in the existing areas became more expensive as AWS swallowed up the available capacity. As Mark Twain allegedly said: "buy land, they're not making any more of it". If you owned a data centre and were aware of your competitors nearby being out of spare capacity, surely that supply-and-demand equation would push pricing up.
So what is clear here?
In my humble opinion, this is a signal that the Cloud market in Australia is a strong enough prospect that it warrants the additional overhead of developing this third AZ. That's good news for customers who are required – or desire – to keep their content in this Region (such as the public sector), as a whole lot of the more modern AWS services that depend upon three *customer accessible* AZs being present in a Region now become a possibility. I say possibility, as each of those individual service teams needs to justify its expansion on its own merits – it's not a fait accompli that a 3rd AZ means these services will come. What helps is customers telling AWS what their requirements are – via the support team, via the forums, and via the AWS team in-country. If you don't ask, you don't get.
How do I balance my VPC?
Hm, so you have an addressing scheme you've used to split by two? Something like an even-numbered third octet in an IPv4 address is in AZ A, and odd-numbered is in AZ B?
I’d suggest letting go of those constraints. Give your subnets a Name tag (App Servers A, App Servers B, App Servers C), and balance with whatever address space you have. You’re never going to have a long term perfect allocation in the uncharted future.
If you've exhausted your address space, then you may want to renumber – over time – into smaller, more distributed subnets. If you're architecting a VPC, make it large enough to contain enough residual address space that you can use it in future in ways you haven't even thought of yet. The largest VPC you can define is a /16, but you may feel quite comfortable allocating each subnet within that VPC as a /24. That's 256 subnets of /24 size that you could make; but you don't have to define them all now. Heck, you may (in an enterprise/large corporate) need a /22 network one day for 1000+ big data processing nodes or Workspaces desktops.
AWS CloudTrail had some quiet updates in 2015 to make it a smoother ride when new Regions launch.
When AWS CloudTrail launched in 2013 as a free service (except for the consumed storage of the logs it dumps into S3) it was filling a hole — not advertised as an audit trail, but as close as AWS could get without fear of it becoming a blocking internal service on legitimate API calls. CloudTrail has to work quickly enough to keep up with the constant stream of API calls.
Having a log of the API calls that a customer makes is a key enabler for compliance reasons. CloudTrail did (at least) one thing that was pretty awesome — cross-account logging. CloudTrail in Account A could log to Account B, without anyone in Account A being able to modify the log. For this to work, the recipient account had to configure their S3 bucket with appropriate permissions to receive the logs, with the correct originating identity, and to the specific paths — matching the account numbers of the source account(s) that will be permitted to log to it.
Clearly, one wouldn't authorise the entire namespace too widely, or you would potentially let any account choose to log to you. They'd have to know the name of your bucket, but once discovered, they could generate enough API activity to start generating logs into your receiving account. Now these logs are quite small (and gzipped), but it's the principle!
If we think of this 'receiving' account as being our security and governance team, then their workload was to:
Add additional paths (account numbers) as the organisation added AWS accounts
White-list user IDs matching the AWS CloudTrail identity in each region as it came on line.
This second item is important. As AWS expands — it’s added a region already this year (2016), with plans for another 5 to come before Christmas — then the Security team in this account would have a race to find the CloudTrail ID for the new region, add it to the S3 Bucket policy for receiving logs, and then contact each of its sending accounts and get them to visit the region purely to turn on CloudTrail. Here’s what that looked like in the S3 bucket policy:
{
     "Sid": "AWSCloudTrailAclCheck20131101",
     "Effect": "Allow",
     "Principal": {"AWS": [
       "arn:aws:iam::903692715234:root",
       "arn:aws:iam::859597730677:root",
       "arn:aws:iam::814480443879:root",
       "arn:aws:iam::216624486486:root",
       "arn:aws:iam::086441151436:root",
       "arn:aws:iam::388731089494:root",
       "arn:aws:iam::284668455005:root",
       "arn:aws:iam::113285607260:root"
     ]},
     "Action": "s3:GetBucketAcl",
     "Resource": "arn:aws:s3:::my-sec-team-logs"
   },
But the CloudTrail and IAM teams didn’t stand still. In mid 2015, the race to find the new region ID was removed with the ability to specify a global Service Principal ID that mapped to CloudTrail in all Regions – with AWS updating this to include new Regions as they come on line:
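The bucket policy statement then trusts the CloudTrail service principal rather than a per-region account, something like (the Sid here is arbitrary):
     {
       "Sid": "AWSCloudTrailAclCheck",
       "Effect": "Allow",
       "Principal": {"Service": "cloudtrail.amazonaws.com"},
       "Action": "s3:GetBucketAcl",
       "Resource": "arn:aws:s3:::my-sec-team-logs"
     },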
Turning to the 'sending' account, it had that same race – to turn on CloudTrail in a new region. Some questioned the need for doing this – if you're not planning on using AWS in ap-northeast-2, then why turn on logs there? The simple reason is to catch any activity that may happen that you're not aware of or expecting. Again during 2015, CloudTrail changed from what used to be one 'Trail' per Region to a trail that is configured in one Region but applied to all (with shadow trails in the others), with AWS turning on CloudTrail in new Regions as they come online.
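From the AWS CLI, such an all-Regions trail can be created in a single call; a sketch, with the trail name being just an example:
aws cloudtrail create-trail --name all-regions-trail \
  --s3-bucket-name my-sec-team-logs --is-multi-region-trail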
This replaced a CloudFormation template I'd developed to uniformly do the old Region-by-Region enabling of CloudTrail — and it helps future-proof against the rapidly expanding service footprint, reducing the 'fog of war': the blind spots where activity may happen but you don't have any logging of it.
Lastly, a single trail per region used to be the limit – and if you configured that to be delivered immediately and directly to a separate account, then you may miss out on being able to inspect it yourself in the account that generated the events! The CloudTrail team fixed that too – permitting multiple trails per Region. This means I can pass one copy of the API log to the central security team, and direct a duplicate stream to my own bucket to review should I need to.
When configuring the delivery of these logs, it's also important to think about long-term retention – and automatic deletion – of these logs. S3 Lifecycle policies are perfect for this; setting a deletion policy couldn't be easier – just specify the number of days until objects are deleted.
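A sketch of that from the CLI (the rule name and the 400-day retention are just example values):
aws s3api put-bucket-lifecycle-configuration --bucket my-sec-team-logs \
  --lifecycle-configuration '{"Rules": [{"ID": "ExpireOldTrailLogs", "Status": "Enabled",
    "Filter": {"Prefix": "AWSLogs/"}, "Expiration": {"Days": 400}}]}'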
Should you be worried about one of your security team deleting a log, turn on Versioning for the receiving S3 bucket, and MFA Delete. Whenever you access a log, you can always check to see if there are any "previous revisions" that are the result of an overwrite.
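Enabling those two protections from the CLI looks roughly like this; note that MFA Delete can only be enabled with the root credentials of the bucket-owning account, and the MFA serial and code below are placeholders:
aws s3api put-bucket-versioning --bucket my-sec-team-logs \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"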
Lastly, it's important to do something with these logs. Feeding CloudTrail into CloudWatch Logs with Alarms, a 3rd-party suite like Splunk, or a managed service like SumoLogic all work OK; but the key element is starting to wrap rules around your API calls that map to your activity. If you know you're only ever going to call the API from a certain IP range, then set up an alert for when calls come from somewhere else. If you know you're only going to access during office hours, set up an alert for activity outside of those hours. Easy stuff! Here are a few others I like:
If using federated identity (SAML, OAuth), look for the definition of a new Identity Provider. Also look for updates (overwrites) of the MetadataDocument for existing Identity Providers – this will happen periodically (often yearly, as SAML metadata contains X.509 certificates that have expiry dates in them)
If using local users, check for additional users being created, especially if you have a pattern for usernames or know that you only create additional users from the office IP range
Check for IAM policies being modified
Check for new VPNs being established, new DirectConnect interfaces being offered (including sub-interfaces from another account), new Peering Requests being offered/accepted
Check for routing table changes; this is often stable after initial set-up
There are many more situations to think about, and your profile of what wraps around your use of AWS may vary from account to account (e.g., between Development and Production, or a payroll workload versus an account used purely for off-site backup).
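If your trail is also delivering into a CloudWatch Logs group, a rule like the 'IAM policies being modified' item above can become a metric filter you can alarm on; a rough sketch, where the log group and namespace names are assumptions:
aws logs put-metric-filter --log-group-name CloudTrail/DefaultLogGroup \
  --filter-name IamPolicyChanges \
  --filter-pattern '{ ($.eventName = PutUserPolicy) || ($.eventName = PutRolePolicy) || ($.eventName = CreatePolicyVersion) }' \
  --metric-transformations metricName=IamPolicyChanges,metricNamespace=CloudTrailMetrics,metricValue=1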
If you're already receiving CloudTrail logs into S3, swap over to the Service Principal; it will stop you from having to react when the AWS notification of a new Region happens. If you're already sending CloudTrail logs to your security team, then switch to a global Trail and rest easy as the new Regions come online.
There's more that CloudTrail has added – including validation files containing signatures of the log files delivered, along with a chain of delivery proof where each validation file also has information about the previous one, so there can be no undetected break in the chain of delivered log files.
The recent AWS introduction of the Elastic File System gives you an automatic grow-and-shrink capability as an NFS mount, an exciting option that takes away the previous overhead in creating shared block file systems for EC2 instances.
However it should be noted that the same auto-management of capacity is not true of the EC2 instance's Elastic Block Store (EBS) block storage disks; sizing (and resizing) is left to the customer. With EBS today (as at June 2015), one cannot simply increase the size of an EBS volume as the storage becomes full; an EBS volume, once created, has a fixed size. For many applications, that lack of a resize function on its local EBS disks is not a problem; many server instances come into existence for a brief period, process some data and then get Terminated, so long-term management is not needed.
However for a long term data store on an instance (instead of S3, which I would recommend looking closely at from a durability and pricing fit), and where I want to harness the capacity to grow (or shrink) disk for my data, then I will need to leverage some slightly more advanced disk management. And just to make life interesting, I wish to do all this while the data is live and in-use, if possible.
Enter: Logical Volume Management, or LVM. It’s been around for a long, long time: LVM 2 made a debut around 2002-2003 (2.00.09 was Mar 2004) — and LVM 1 was many years before that — so it’s pretty mature now. It’s a powerful layer that sits between your raw storage block devices (as seen by the operating system), and the partitions and file systems you would normally put on them.
In this post, I’ll walk through the process of getting set up with LVM on Debian in the AWS EC2 environment, and how you’d do some basic maintenance to add and remove (where possible) storage with minimal interruption.
Getting Started
First a little prep work for a new Debian instance with LVM.
As I’d like to give the instance its own ability to manage its storage, I’ll want to provision an IAM Role for EC2 Instances for this host. In the AWS console, visit IAM, Roles, and I’ll create a new Role I’ll name EC2-MyServer (or similar), and at this point I’ll skip giving it any actual privileges (later we’ll update this). As at this date, we can only associate an instance role/profile at instance launch time.
Now I launch a base Debian EC2 instance with this IAM Role/Profile; the root file system is an EBS Volume. I am going to put the data that I'll be managing on a separate disk from the root file system.
First, I need to get the LVM utilities installed. It’s a simple package to install: the lvm2 package. From my EC2 instance I need to get root privileges (sudo -i) and run:
apt update && apt install lvm2
After a few moments, the package is installed. I’ll choose a location that I want my data to live in, such as /opt/. I want a separate disk for this task for a number of reasons:
Root EBS volumes cannot currently (mid 2015) be encrypted using Amazon's Encrypted EBS Volumes. If I want to also use AWS' encryption option, it'll have to be on a non-root disk. Note that instance-size restrictions also exist for EBS Encrypted Volumes.
It's possibly not worth making a snapshot of the Operating System at the same time as the user data I am saving. The OS install (except the /etc/ folder) can almost entirely be recreated from a fresh install, so why snapshot that as well (unless that's your strategy for preserving /etc, /home, etc.)?
The type of EBS volume that you require may be different for different data: today (Apr 2015) there is a choice of Magnetic, General Purpose 2 (GP2) SSD, and Provisioned IO/s (PIOPS) SSD, each with different costs; and depending on our volume, we may want to select one for our root volume (operating system), and something else for our data storage.
I may want to use EBS snapshots to clone the disk to another host, without the base OS bundled in with the data I am cloning.
I will create this extra volume in the AWS console and present it to this host. I’ll start by using a web browser (we’ll use CLI later) with the EC2 console.
The first piece of information we need to know is where my EC2 instance is running. Specifically, the AWS Region and Availability Zone (AZ). EBS Volumes only exist within the one designated AZ. If I accidentally make the volume(s) in the wrong AZ, then I won’t be able to connect them to my instance. It’s not a huge issue, as I would just delete the volume and try again.
I navigate to the “Instances” panel of the EC2 Console, and find my instance in the list:
Here I can see I have located an instance and it’s running in US-East-1A: that’s AZ A in Region US-East-1. I can also grab this with a wget from my running Debian instance by asking the MetaData server:
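Something like (169.254.169.254 being the instance metadata service):
wget -q -O - http://169.254.169.254/latest/meta-data/placement/availability-zone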
Time to navigate to “Elastic Block Store“, choose “Volumes” and click “Create“:
You’ll see I selected that I wanted AWS to encrypt this and as noted above, at this time that doesn’t include the t2 family. However, you have an option of using encryption with LVM – where the customer looks after the encryption key – see LUKS.
What's nice is that I can do both: have AWS Encrypted Volumes, and then use encryption on top of this. But I have to manage my own keys with LUKS, and should I lose them, then all I'm left with is the ciphertext!
I deselected this for my example (with a t2.micro), and continued; I could see the new volume in the list as "creating", and then shortly afterwards as "available". Time to attach it: select the disk, and either right-click and choose "Attach", or from the menu at the top of the list, choose "Actions" -> "Attach" (both do the same thing).
At this point in time your EC2 instance will now notice a new disk; you can confirm this with “dmesg |tail“, and you’ll see something like:
[1994151.231815] xvdg: unknown partition table
(Note the time-stamp in square brackets will be different).
Previously at this juncture you would format the entire disk with your favourite file system, mount it in the desired location, and be done. But we’re adding in LVM here – between this “raw” device, and the filesystem we are yet to make….
Marking the block device for LVM
Our first operation with LVM is to put a marker on the volume to indicate it's being used for LVM – so that when we scan the block device, we know what it's for. It's a really simple command:
pvcreate /dev/xvdg
The device name above (/dev/xvdg) should correspond to the one we saw from the dmesg output above. The output of the above is rather straightforward:
 Physical volume "/dev/xvdg" successfully created
Checking our EBS Volume
We can check on the EBS volume – which LVM sees as a Physical Volume – using the “pvs” command.
# pvs
 PV        VG  Fmt Attr PSize PFree
 /dev/xvdg      lvm2 --- 5.00g 5.00g
Here we see the entire disk is currently unused.
Creating our First Volume Group
Next step, we need to make an initial LVM Volume Group which will use our Physical volume (xvdg). The Volume Group will then contain one (or more) Logical Volumes that we’ll format and use. Again, a simple command to create a volume group by giving it its first physical device that it will use:
# vgcreate OptVG /dev/xvdg
 Volume group "OptVG" successfully created
And likewise we can check our set of Volume Groups with "vgs":
# vgs
 VG   #PV #LV #SN Attr  VSize VFree
 OptVG  1  0  0 wz--n- 5.00g 5.00g
The Attribute flags here indicate this is writable, resizable, and allocating extents in "normal" mode. Let's proceed to make our (first) Logical Volume in this Volume Group:
# lvcreate -n OptLV -L 4.9G OptVG
 Rounding up size to full physical extent 4.90 GiB
 Logical volume "OptLV" created
You’ll note that I have created our Logical Volume as almost the same size as the entire Volume Group (which is currently one disk) but I left some space unused: the reason for this comes down to keeping some space available for any jobs that LVM may want to use on the disk – and this will be used later when we want to move data between raw disk devices.
If I wanted to use LVM for Snapshots, then I’d want to leave more space free (unallocated) again.
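As a hypothetical aside, had we left more of the Volume Group unallocated (say 1 GB rather than the roughly 100 MB above), a snapshot could be created with something like:
# the snapshot name and 512M size here are illustrative assumptions
lvcreate -s -n OptLVsnap -L 512M /dev/OptVG/OptLV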
We can check on our Logical Volume:
# lvs
 LV   VG   Attr      LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
 OptLV OptVG -wi-a----- 4.90g
The attributes indicate that the Logical Volume is writeable, is allocating its data to the disk in inherit mode (i.e., as the Volume Group is doing), and that it is active. At this stage you may also discover we have a device /dev/OptVG/OptLV, and this is what we're going to format and mount. But before we do, we should review what file system we'll use.
Filesystems
Popular Linux file systems
Name    Shrink      Grow  Journal  Max File Sz  Max Vol Sz
btrfs   Y           Y     N        16 EB        16 EB
ext3    Y off-line  Y     Y        2 TB         32 TB
ext4    Y off-line  Y     Y        16 TB        1 EB
xfs     N           Y     Y        8 EB         8 EB
zfs*    N           Y     Y        16 EB        256 ZB
For more details see the Wikipedia comparison. Note that ZFS requires a 3rd-party kernel module or a FUSE layer, so I'll discount that here. BTRFS only went stable with Linux kernel 3.10, so with Debian Jessie that's a possibility; but for tried and trusted, I'll use ext4.
The selection of ext4 also means that I’ll only be able to shrink this file system off-line (unmounted).
I’ll make the filesystem:
# mkfs.ext4 /dev/OptVG/OptLV
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 1285120 4k blocks and 321280 inodes
Filesystem UUID: 4f831d17-2b80-495f-8113-580bd74389dd
Superblock backups stored on blocks:
       32768, 98304, 163840, 229376, 294912, 819200, 884736
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
And now mount this volume and check it out:
# mount /dev/OptVG/OptLV /opt/
# df -HT /opt
Filesystem             Type Size Used Avail Use% Mounted on
/dev/mapper/OptVG-OptLV ext4  5.1G   11M  4.8G   1% /opt
Lastly, we want this to be mounted next time we reboot, so edit /etc/fstab and add the line:
/dev/OptVG/OptLV /opt ext4 noatime,nodiratime 0 0
With this in place, we can now start using this disk. With the noatime and nodiratime options I've chosen, the filesystem doesn't record an access time every time I read a file or folder – writes are recorded as normal, but access times are simply ignored.
Time to expand
After some time, our 5 GB /opt/ disk is rather full, and we need to make it bigger, but we wish to do so without any downtime. Amazon EBS doesn’t support resizing volumes, so our strategy is to add a new larger volume, and remove the older one that no longer suits us; LVM and ext4’s online resize ability will allow us to do this transparently.
For this example, we’ll decide that we want a 10 GB volume. It can be a different type of EBS volume to our original – we’re going to online-migrate all our data from one to the other.
As when we created the original 5 GB EBS volume above, create a new one in the same AZ and attach it to the host (perhaps a /dev/xvdh this time). We can check the new volume is visible with dmesg again:
[1999786.341602] xvdh: unknown partition table
And now we initialise this as a Physical Volume for LVM:
# pvcreate /dev/xvdh
 Physical volume "/dev/xvdh" successfully created
And then add this disk to our existing OptVG Volume Group:
# vgextend OptVG /dev/xvdh
 Volume group "OptVG" successfully extended
We can now review our Volume group with vgs, and see our physical volumes with pvs:
# vgs
 VG   #PV #LV #SN Attr  VSize VFree
 OptVG  2  1  0 wz--n- 14.99g 10.09g
# pvs
 PV        VG   Fmt Attr PSize PFree
 /dev/xvdg OptVG lvm2 a--  5.00g 96.00m
 /dev/xvdh OptVG lvm2 a-- 10.00g 10.00g
There are now 2 Physical Volumes – we have a 4.9 GB filesystem taking up space, so 10.09 GB of unallocated space in the VG.
Now it's time to stop using the /dev/xvdg volume for any new requests:
# pvchange -x n /dev/xvdg
 Physical volume "/dev/xvdg" changed
 1 physical volume changed / 0 physical volumes not changed
At this time, our existing data is on the old disk, and our new data is on the new one. It's now that I'd recommend running GNU screen (or similar) so you can detach from this shell session and reconnect, as the process of migrating the existing data can take some time (hours for large volumes):
# pvmove /dev/xvdg /dev/xvdh
 /dev/xvdg: Moved: 0.1%
 /dev/xvdg: Moved: 8.6%
 /dev/xvdg: Moved: 17.1%
 /dev/xvdg: Moved: 25.7%
 /dev/xvdg: Moved: 34.2%
 /dev/xvdg: Moved: 42.5%
 /dev/xvdg: Moved: 51.2%
 /dev/xvdg: Moved: 59.7%
 /dev/xvdg: Moved: 68.0%
 /dev/xvdg: Moved: 76.4%
 /dev/xvdg: Moved: 84.7%
 /dev/xvdg: Moved: 93.3%
 /dev/xvdg: Moved: 100.0%
During the move, checking the Monitoring tab in the AWS EC2 Console for the two volumes should show one with a large data Read metric, and one with a large data Write metric – clearly data should be flowing off the old disk, and on to the new.
A note on disk throughput
The above move was of a pretty small, mostly empty volume. Larger disks will take longer, naturally, so getting some speed out of the process may be key. There are a few things we can do to tweak this:
EBS Optimised: a launch-time option that reserves network throughput from certain instance types back to the EBS service within the AZ. Depending on the instance type, this dedicated throughput ranges from 500 Mbps up to 4 Gbps. Note that for the c4 family of instances, EBS Optimised is on by default.
Size of GP2 disk: the larger the disk, the longer it can sustain high IO throughput – but read this for details.
Size and speed of PIOPS disk: if consistent high IO is required, then moving to Provisioned IOPS disk may be useful. Looking at the (two-week) history of CloudWatch metrics for the old volume will give me some idea of the duty cycle of the disk IO.
Back to the move…
Upon completion I can see that the disk in use is the new disk and not the old one, using pvs again:
# pvs
 PV        VG   Fmt Attr PSize PFree
 /dev/xvdg OptVG lvm2 ---  5.00g 5.00g
 /dev/xvdh OptVG lvm2 a-- 10.00g 5.09g
So all 5 GB is now unused (compare to above, where only 96 MB was PFree). With that disk not containing data, I can tell LVM to remove the disk from the Volume Group:
# vgreduce OptVG /dev/xvdg
 Removed "/dev/xvdg" from volume group "OptVG"
Then I cleanly wipe the labels from the volume:
# pvremove /dev/xvdg
 Labels on physical volume "/dev/xvdg" successfully wiped
If I really want to clean the disk, I could choose to use shred(1) on the device to overwrite it with random data; this can take a long time.
Now that the disk is completely unused and disassociated from the VG, I can return to the AWS EC2 Console and detach the disk:
Wait for a few seconds, and the disk is then shown as “available“; I then chose to delete the disk in the EC2 console (and stop paying for it).
Back to the Logical Volume – it’s still 4.9 GB, so I add 4.5 GB to it:
# lvresize -L +4.5G /dev/OptVG/OptLV
 Size of logical volume OptVG/OptLV changed from 4.90 GiB (1255 extents) to 9.40 GiB (2407 extents).
 Logical volume OptLV successfully resized
We now have 0.6GB free space on the physical volume (pvs confirms this).
Finally, it's time to expand our ext4 file system:
# resize2fs /dev/OptVG/OptLV
resize2fs 1.42.12 (29-Aug-2014)
Filesystem at /dev/OptVG/OptLV is mounted on /opt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/OptVG/OptLV is now 2464768 (4k) blocks long.
And with df we can now see:
# df -HT /opt/
Filesystem             Type Size Used Avail Use% Mounted on
/dev/mapper/OptVG-OptLV ext4Â 9.9GÂ Â 12MÂ 9.4GÂ Â 1% /opt
Automating this
The IAM Role I made at the beginning of this post is now going to be useful. I'll start by adding an IAM Policy to the Role to permit me to List Volumes, Create Volumes, Attach Volumes and Detach Volumes for my instance-id. Let's start with creating a volume, with a policy like this:
{
 "Version": "2012-10-17",
 "Statement": [
   {
     "Sid": "CreateNewVolumes",
     "Action": "ec2:CreateVolume",
     "Effect": "Allow",
     "Resource": "*",
     "Condition": {
       "StringEquals": {
         "ec2:AvailabilityZone": "us-east-1a",
         "ec2:VolumeType": "gp2"
       },
       "NumericLessThanEquals": {
         "ec2:VolumeSize": "250"
       }
     }
   }
 ]
}
This policy puts some restrictions on the volumes that this instance can create: only within the given Availability Zone (matching our instance), only GP2 SSD (no PIOPs volumes), and size no more than 250 GB. I’ll add another policy to permit this instance role to tag volumes in this AZ that don’t yet have a tag called InstanceId:
{
 "Version": "2012-10-17",
 "Statement": [
   {
     "Sid": "TagUntaggedVolumeWithInstanceId",
     "Action": [
       "ec2:CreateTags"
     ],
     "Effect": "Allow",
     "Resource": "arn:aws:ec2:us-east-1:1234567890:volume/*",
     "Condition": {
       "Null": {
         "ec2:ResourceTag/InstanceId": "true"
       }
     }
   }
 ]
}
Now that I can create (and then tag) volumes, it becomes a simple matter of deciding what else I can do to this volume. Deleting the volume and creating snapshots of it are two obvious options; here's the corresponding policy:
{
 "Version": "2012-10-17",
 "Statement": [
   {
     "Sid": "CreateDeleteSnapshots-DeleteVolume-DescribeModifyVolume",
     "Action": [
       "ec2:CreateSnapshot",
       "ec2:DeleteSnapshot",
       "ec2:DeleteVolume",
       "ec2:DescribeSnapshotAttribute",
       "ec2:DescribeVolumeAttribute",
       "ec2:DescribeVolumeStatus",
       "ec2:ModifyVolumeAttribute"
     ],
     "Effect": "Allow",
     "Resource": "*",
     "Condition": {
       "StringEquals": {
         "ec2:ResourceTag/InstanceId": "i-123456"
       }
     }
   }
 ]
}
Of course it would be lovely if I could use a variable inside the policy condition instead of the literal string of the instance ID, but that’s not currently possible.
Clearly some of the more important actions I want to take are to attach and detach a volume to my instance:
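A sketch of such a policy, reusing the example account and instance IDs from above (AttachVolume and DetachVolume act on both the volume and the instance, hence both ARNs appear in the Resource list):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AttachDetachVolumesToThisInstance",
      "Action": [
        "ec2:AttachVolume",
        "ec2:DetachVolume"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:ec2:us-east-1:1234567890:volume/*",
        "arn:aws:ec2:us-east-1:1234567890:instance/i-123456"
      ]
    }
  ]
}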
Now with this in place, we can start to fire up the AWS CLI we spoke of. We'll let the CLI inherit its credentials from the IAM Instance Role and the policies we just defined.
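For example, creating, tagging and attaching a new 10 GB GP2 volume might look like this (vol-abcd1234 stands in for the volume ID returned by the create call):
# create a new gp2 volume in our AZ, then tag and attach it
aws ec2 create-volume --availability-zone us-east-1a --size 10 --volume-type gp2
aws ec2 create-tags --resources vol-abcd1234 --tags Key=InstanceId,Value=i-123456
aws ec2 attach-volume --volume-id vol-abcd1234 --instance-id i-123456 --device /dev/xvdh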
…and at this stage, the above manipulation of the raw block device with LVM can begin. Likewise you can then use the CLI to detach and destroy any unwanted volumes if you are migrating off old block devices.
Add constraints when using long-term (IAM User) API keys
Never put any API keys in your code
Never check API keys into revision control
Rotate credentials
Use EC2 Instance Roles
Let’s go into some detail to explain why these are good ideas…
1. Never use Root API keys
Root API keys cannot be controlled or limited by any IAM policy. They have an effective “god mode”. The best thing you can do with your Root keys is delete them; you’ll see the AWS IAM Console now recommends this (the Dashboard has a traffic light display for good practice).
Instead, create an IAM User and assign an appropriate IAM Policy. You can assign the Policy directly to a user, or they can inherit it via a simple Group membership (the policy being applied to the Group). IAM Policy can be very detailed, but let's start with a simple policy: the same "god mode" you had with Root API keys:
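Something along these lines (allow every action on every resource):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "*",
      "Resource": "*"
    }
  ]
}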
This policy is probably a bad idea for anyone: it's waaay too open. But for the insistent partner who wants your Root API keys, consider handing them an IAM User with this policy (to start with).
2. Minimal permissions for the task at hand
So that service provider (or your script) doesn't need access to your entire AWS account. Face it: when ServiceNow wants to handle your EC2 fleet, they don't need to control your Route53 DNS. This is where you can build out the Action section of the policy. Multiple actions can be bundled together in a list. Let's think of a program (script) that accesses some data file sitting in S3; we can apply this policy to the IAM user:
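Perhaps something like this (listing and getting from any bucket):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": "*"
    }
  ]
}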
With this policy in place for this IAM User, they are now limited to listing and getting objects from any S3 bucket we have created (or have been granted access to in other AWS accounts). It's good, but we can do better. Let's narrow the Resource down to a specific bucket. We do that with an Amazon Resource Name, or ARN, which can be quite specific.
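Scoped to a single bucket, that statement might read something like this (the bucket ARN covers listing, the /* object ARN covers getting):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mybucket",
        "arn:aws:s3:::mybucket/*"
      ]
    }
  ]
}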
Now we have said we want to list the root of the bucket “mybucket“, and every object within it. That’s kept our data in other buckets private from this script.
3. Add Policy Constraints
Let's now add some further restrictions on this script. We know we're going to run it always from a certain ISP or location, so we can have a pretty good guess at the set of IP addresses our authenticated requests should come from. In this case, I'm going to suggest that anywhere in the 106.0.0.0/8 CIDR range should be OK, and that any requests from elsewhere cannot get this permission applied (unless another policy permits it). I'm also going to insist on using an encrypted transport (SSL). Here's our new policy:
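Something like this, using the aws:SourceIp and aws:SecureTransport condition keys:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::mybucket",
        "arn:aws:s3:::mybucket/*"
      ],
      "Condition": {
        "IpAddress": {"aws:SourceIp": "106.0.0.0/8"},
        "Bool": {"aws:SecureTransport": "true"}
      }
    }
  ]
}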
That’s looking more secure. Even if credentials got exfiltrated, then the culprit would have to guess what my policy is to use these keys. I could add a set of IP addresses here; I could also check CloudTrail for attempted use of those credentials from IPs that don’t have access (and rotate them ASAP and figure out how they were carried off to start with).
We could further constrain with conditions such as a UserAgent string, which while not the best security in the world, can be a nice touch. Imagine if we had two policies, one with a UserAgent condition StringEquals match of “MyScript” that lets you read your data, and one with a UserAgent condition StringEquals of “MyScriptWithDelete” that lets you delete data. It’s a simple Molly guard, but when you do that delete you also set your UserAgent string to let you delete. (See also MFA Delete).
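As a sketch, the delete-capable statement gated on that second User-Agent string might look like:
{
  "Effect": "Allow",
  "Action": "s3:DeleteObject",
  "Resource": "arn:aws:s3:::mybucket/*",
  "Condition": {
    "StringEquals": {"aws:UserAgent": "MyScriptWithDelete"}
  }
}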
If we’re running as a cron/Scheduled Task, then we could also limit this policy to a small time window per day to be active. Perhaps +/- 5 minutes for a set time? Easy.
4. Never put any API keys in your code
Hard-coding your credentials should raise the hackles on the back of your neck. There are a number of much better options:
use a config file (outside of your code base)
or an environment variable
or a command line switch
or all of the above
There’s plenty of libraries for most languages that let you handle config files quite easily, even multi-layered config files (eg. first read /etc/$prog/config, then read ~/.$prog/config).
5. Never check API keys into revision control
Consider checking for API keys in your code and rejecting commits that contain them. Here’s a simple check for an Access Key or Secret Key:
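A rough sketch over staged changes (access keys typically begin with AKIA plus 16 characters; secret keys are 40 base64-style characters):
# refuse the commit if anything staged looks like an AWS access or secret key
git diff --cached -U0 | grep -E '(AKIA[0-9A-Z]{16}|[A-Za-z0-9/+=]{40})' && echo "Possible AWS key found" && exit 1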
So perhaps having this in your pre-commit hooks to stop you from checking in something that matches it would be useful? The patterns for Secret and Access keys are a little rough, and could always change in future, so prepare to be flexible. If you opted above for a config file, then perhaps that's a good candidate for adding to your .gitignore file.
6. Rotate Credentials
Change credentials regularly. The IAM Console now has a handy Credential Report to show the administrator when credentials are used and updated, and IAM lets you create a second Access and Secret key pair while the current one is still active, letting you update your script(s) in production to the new credential without an interruption to access. Just remember to complete the process by marking the old key inactive for a short spell (to confirm all is well), and then delete the old one. Don't be precious about the credentials themselves – use and discard. It's somewhat manual (so see the next point).
7. Use IAM Roles for EC2 Instances (where possible)
The AWS IAM team have solved the chicken-and-egg problem of getting secured credentials to a new EC2 instance: IAM Roles for EC2 Instances. In a nutshell, it's the ability for the instance to ask for a set of credentials (via an HTTP call) and get back temporary role credentials (Access Key, Secret Key, and Token) to then make API calls with; you set the Policy (similar to above) on the Role, and you're done. If you're using one of the AWS-supplied libraries/SDKs, then your code won't need any modifications – these libraries check for an EC2 role and use it if you don't explicitly set credentials.
It's not magic – it's the EC2 metadata service that's handing this to your instance. Word of warning though – any process on your instance that can make an outbound HTTP call can grab these creds, so don't do this on a multi-user SSH host.
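You can see this for yourself from an instance; the first call lists the attached role name, the second returns the temporary keys (EC2-MyServer here is just a placeholder role name):
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/EC2-MyServer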
The metadata service also tells your instance the life span of these credentials, and makes fresh creds available multiple times per day. With that information, the SDKs know when to request updated credentials from the metadata service, so you're now auto-rotating your credentials!
Of course, if the API keys you seek are for an off-cloud script or client, then this is not an option.
Summary
IAM Policy is a key piece in securing your AWS resources, but you have to implement it. If you have AWS Support, then raise a ticket for them to help you with your policies. You can also use the IAM Policy Generator to help make them, and the IAM Policy Simulator to help test them.
It's worth keeping an eye on the AWS Security Blog, and as always, if you suspect something, contact AWS Support for assistance from the relevant AWS Security Team.
What's interesting is if you take this to the next level and programmatically scan for violations of the above in your own account. A simple script to test how many policies have no Conditions (sound of klaxon going off)? Any policies that Put objects into S3 but don't also enforce s3:x-amz-server-side-encryption: AES256 and s3:x-amz-acl: private?
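A rough sketch of that first check, flagging customer-managed policies whose documents never mention a Condition block (the logic is deliberately naive):
# list customer-managed policies and shout about any whose default version has no Condition at all
for arn in $(aws iam list-policies --scope Local --query 'Policies[].Arn' --output text); do
  ver=$(aws iam get-policy --policy-arn "$arn" --query 'Policy.DefaultVersionId' --output text)
  aws iam get-policy-version --policy-arn "$arn" --version-id "$ver" --output json \
    | grep -q '"Condition"' || echo "KLAXON: no Condition in $arn"
done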
PS: I really loved the video from the team at Intuit at Re:Invent this year – so much, it’s right here:
In the mid 80s, my Dad wrote the lyrics for a song for my primary school – Woodlands – with one of our neighbours creating the music. It became the official school song for nearly 30 years, and was only recently supplanted. I was trying to remember the lyrics, and found only one document with them left online, so I thought I'd paste it here to preserve it a little longer.