I touched on this in a recent article, but I wanted to dive deeper.
AWS makes much of its Well-Architected principles, something I worked on in the early stages circa 2013, and applied at $work. I strongly recommend anyone deploying to any cloud provider think about these principles and their responsibility for implementing them, or for ensuring they are implemented.
Around the same time (2013/2014), the term DevOps and the rise of CI/CD pipelines were also coming to the fore. Looking back, the biggest thing that Well-Architected leaned on DevOps for was the ability to make rapid, incremental improvements to an architecture.
Poor architectural implementations traditionally went unchanged during the lifecycle of the deployed solution. Poor software would eventually be replaced in a follow-up project; replaced with massive fanfare and budget.
So while Well-Architected starts a project, the concept of Well Maintained is the constant re-application of Well-Architected to a workload post go-live. It’s also the rapid adoption of software patches throughout the stack: the database version, the SDKs and libraries in the code base, the uplift of runtime versions (such as Java 8 -> 11, and beyond), the enabling of new TLS protocols and sun-setting of the old (TLS 1.3 turned on, TLS <= 1.1 turned off at this time).
A project that always adopts the current version of SDKs, and stays in good compliance with current best practice over time, is Well Maintained. It’s almost evergreen. It ages well, in that it doesn’t really age.
How can you tell if something is Well-Maintained?
Check the versions of its components. Dive deep. Find out what prevents you from updating these items. Find the known vulnerabilities in the versions between what your project has now, and the current released version.
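As a rough illustration of that check, here is a minimal Python sketch that compares what is deployed against the current release and reports the gap. The component names and version numbers are made up for the example; in practice the inventory would come from your own build files and the vendors' release pages.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Component:
    name: str
    deployed: str  # version running in production
    current: str   # latest released version


def parse(version: str) -> tuple[int, ...]:
    """Turn a dotted numeric version such as '11.0.8' into (11, 0, 8)."""
    return tuple(int(part) for part in version.split("."))


def majors_behind(component: Component) -> int:
    """How many major versions the deployed copy trails the current release."""
    return max(0, parse(component.current)[0] - parse(component.deployed)[0])


def stale(components: list[Component]) -> list[Component]:
    """Components whose deployed version is older than the current release."""
    return [c for c in components if parse(c.deployed) < parse(c.current)]


# Hypothetical inventory; replace with what is actually deployed.
inventory = [
    Component("java-runtime", "8.0.252", "11.0.8"),
    Component("postgresql", "12.3", "12.3"),
    Component("aws-sdk", "1.11.300", "1.11.820"),
]

for c in stale(inventory):
    print(f"{c.name}: {c.deployed} -> {c.current} "
          f"({majors_behind(c)} major versions behind)")
```

The list this prints is the starting point for the harder questions: what blocks each update, and which known vulnerabilities sit in the gap.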
McAfee Enterprise Security Manager (ESM) is a Security Information and Event Management (SIEM) product that helps collect and then correlate events from various data sources (McAfee info).
Gartner rates it in their Magic Quadrant, shown here from 2019:
I was working with a customer recently who wanted to ingest their AWS CloudTrail logs into this product.
McAfee has implemented support for this, using an IAM Access and Secret key, reading from an SQS Queue, and fetching the referenced files from S3.
AWS CloudTrail itself supports sending a notification to the Simple Notification Service for each log file delivered; it is left as an exercise to the customer to plumb the rest together. It’s not hard, and can be put into a CloudFormation template. Here it is: Download the CloudFormation Template.
Let’s pull apart the simple architecture that is wrapped up in the above linked template.
Parameters
The parameters to a template are the only items you likely need to customise to your environment.
The name of the S3 Bucket that CloudTrail delivers its log files to (from the Trail config)
The name of the SNS Topic that CloudTrail sends file delivery notifications to (from the Trail config)
The Name of the new SQS Queue to be made that will catch the notifications, and buffer them for McAfee to then read. You can probably leave this as the default.
The name of the IAM User to be created to permit an off-cloud SIEM to have API credentials. You can probably leave this as McAfeeSIEM.
The Public IPv4 address range from which your ESM’s outbound API calls to AWS will originate.
Your AWS Organizations “Organization ID”. It’s a string that starts with “o-” that is in the prefix for the files appearing in S3 from CloudTrail.
Inside the template
The resources created in the template are the guts of what gets created on our behalf.
The SQS queue is configured with long polling enabled. This reduces the number of polls being made in case McAfee tries a tight continual loop, and it also means that when the queue is already drained, McAfee will remain connected for up to the long-poll wait time and immediately get the next message telling it to fetch a file.
An SQS Policy is added to the Queue to permit the selected SNS Topic to publish messages to the queue, and then a Subscription to hook these together is defined.
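To make the plumbing concrete, here is a minimal sketch of those three resources, built as a Python dictionary and printed as CloudFormation JSON. It is not the linked template: the parameter names, the default queue name and the 20 second wait time are assumptions for illustration (20 seconds is the longest wait SQS allows).

```python
import json

# Assumed parameter name; the linked template may call this something else.
TOPIC_ARN = {
    "Fn::Sub": "arn:aws:sns:${AWS::Region}:${AWS::AccountId}:${TopicName}"
}
QUEUE_ARN = {"Fn::GetAtt": ["Queue", "Arn"]}

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Parameters": {
        "TopicName": {"Type": "String"},
        # Hypothetical default queue name.
        "QueueName": {"Type": "String", "Default": "cloudtrail-siem"},
    },
    "Resources": {
        "Queue": {
            "Type": "AWS::SQS::Queue",
            "Properties": {
                "QueueName": {"Ref": "QueueName"},
                # Long polling: a receive call waits up to this many seconds.
                "ReceiveMessageWaitTimeSeconds": 20,
            },
        },
        "QueuePolicy": {
            "Type": "AWS::SQS::QueuePolicy",
            "Properties": {
                "Queues": [{"Ref": "Queue"}],  # Ref on a queue yields its URL
                "PolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [{
                        "Effect": "Allow",
                        "Principal": {"Service": "sns.amazonaws.com"},
                        "Action": "sqs:SendMessage",
                        "Resource": QUEUE_ARN,
                        # Only the selected topic may publish to the queue.
                        "Condition": {
                            "ArnEquals": {"aws:SourceArn": TOPIC_ARN}
                        },
                    }],
                },
            },
        },
        "Subscription": {
            "Type": "AWS::SNS::Subscription",
            "Properties": {
                "TopicArn": TOPIC_ARN,
                "Protocol": "sqs",
                "Endpoint": QUEUE_ARN,
            },
        },
    },
}

print(json.dumps(template, indent=2))
```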
Lastly, an IAM user is created, with a policy that permits access to process messages from the queue (read and delete messages, plus a few other APIs that McAfee documented), as well as access to List the contents of the target bucket, and Get the objects within the defined prefix.
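A policy along those lines might look like the following sketch. It is an approximation, not a copy of the template: the extra SQS APIs that McAfee documents are not listed here, the `AWSLogs/<organization id>/` prefix is an assumed layout, and applying the IP range as an `aws:SourceIp` condition is my assumption about how that parameter would be used. All example values are placeholders.

```python
import json


def siem_policy(queue_arn: str, bucket: str, org_id: str, source_cidr: str) -> dict:
    """Build an IAM policy document for an off-cloud SIEM reader (a sketch)."""
    # Assumption: every statement is limited to the SIEM's public address range.
    ip_condition = {"IpAddress": {"aws:SourceIp": source_cidr}}
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ProcessQueue",
                "Effect": "Allow",
                # Read and delete only; add the other documented APIs as needed.
                "Action": ["sqs:ReceiveMessage", "sqs:DeleteMessage"],
                "Resource": queue_arn,
                "Condition": ip_condition,
            },
            {
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": ip_condition,
            },
            {
                "Sid": "GetLogObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Assumed prefix layout for the organisation's log files.
                "Resource": f"arn:aws:s3:::{bucket}/AWSLogs/{org_id}/*",
                "Condition": ip_condition,
            },
        ],
    }


# Placeholder values for illustration only.
policy = siem_policy(
    queue_arn="arn:aws:sqs:ap-southeast-2:111122223333:cloudtrail-siem",
    bucket="example-cloudtrail-bucket",
    org_id="o-exampleorgid",
    source_cidr="203.0.113.0/24",
)
print(json.dumps(policy, indent=2))
```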
Admin actions after deploying the template
With this CloudFormation stack deployed, go to the IAM console, find the new IAM User (McAfeeSIEM was the default), go to the Credentials tab, and issue an Access key pair. Take care to record the secret key – this is the only time you’ll see this; if you lose it, then start a key rotation to get a fresh Access/Secret key pair.
On McAfee SIEM, insert the access key and secret key into the AWS CloudTrail config. If your SIEM has outbound Internet access (possibly via a proxy) then this should start to fetch messages from the SQS Queue and process files.
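I can't see inside McAfee's implementation, but any consumer of this queue has to do roughly the following with each message: unwrap the SNS envelope, then read the bucket and object keys from the CloudTrail notification inside it. The sketch below assumes raw message delivery is not enabled on the subscription, and the sample message is fabricated to show the shape only.

```python
from __future__ import annotations

import json


def log_files_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs named in one SNS-wrapped CloudTrail notice."""
    envelope = json.loads(body)               # SNS envelope delivered to SQS
    notice = json.loads(envelope["Message"])  # CloudTrail's own JSON payload
    bucket = notice["s3Bucket"]
    return [(bucket, key) for key in notice.get("s3ObjectKey", [])]


# Fabricated sample message; bucket and key are placeholders.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({
        "s3Bucket": "example-cloudtrail-bucket",
        "s3ObjectKey": [
            "AWSLogs/o-exampleorgid/111122223333/CloudTrail/"
            "ap-southeast-2/2020/06/01/example.json.gz"
        ],
    }),
})

for bucket, key in log_files_from_sqs_body(sample_body):
    print(f"fetch s3://{bucket}/{key}")
```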
You can look at the number of messages in the SQS queue as a help to debug: if the queue depth is non-zero (and growing) then your SIEM is probably not fetching and clearing its Queue messages.
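The same check can be scripted. The sketch below reads the approximate depth through an SQS client object and flags a queue that is non-zero and growing between samples. `FakeSqs` is a stand-in so the example runs without AWS credentials; with boto3 installed you would pass `boto3.client("sqs")` instead. The queue URL and sample numbers are placeholders.

```python
from __future__ import annotations


def queue_depth(sqs_client, queue_url: str) -> int:
    """Approximate number of messages waiting in the queue."""
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(response["Attributes"]["ApproximateNumberOfMessages"])


def is_backing_up(samples: list[int]) -> bool:
    """True when the latest depth is non-zero and larger than the first sample."""
    return len(samples) >= 2 and samples[-1] > 0 and samples[-1] > samples[0]


class FakeSqs:
    """Stand-in for a real SQS client, returning a fixed depth."""

    def __init__(self, depth: int) -> None:
        self.depth = depth

    def get_queue_attributes(self, QueueUrl, AttributeNames):
        return {"Attributes": {"ApproximateNumberOfMessages": str(self.depth)}}


# Placeholder queue URL and depths, as if sampled a few minutes apart.
url = "https://sqs.ap-southeast-2.amazonaws.com/111122223333/cloudtrail-siem"
samples = [queue_depth(FakeSqs(depth), url) for depth in (12, 40, 95)]
print(samples, "backing up" if is_backing_up(samples) else "draining")
```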
Unless you’ve been under a rock, you’ve seen the impact that Hyperscale Public Cloud has made on the IT industry. Its invention wasn’t to be a thing, but to be a continually evolving, improving thing.
And while many organisations will use SaaS platforms, those platforms themselves often run atop the IaaS and PaaS platforms of a hyperscale cloud platform.
One person’s SaaS is another person’s IaaS.
Me, James Bromberger
But it’s worth just checking on the evolution of IT service delivery at a low level, for not everyone who is in the IT industry has seen what that looks like at this time.
Change is hard. Humans are bad at it. I’ve seen many who evolved from column 1 to column 2, and have felt they are “done”. They aren’t on-board for the next wave of the evolution.
I suffer from this too. But there is a shortcut that I can offer: try to jump from where you are now, to as far to the right as you can in one step.
Every one of these phases is a monumental shift in the way that services are delivered, requiring training, and experience. There is an overhead knowledge baggage that engineers take with them, trying to work out what functions the same as before, and what is different. This is taxing, stressful, and unpleasant.
So rather than repeat this process in sequence, over years for each change, my recommendation is to see how far to the right you can jump. Some limitations will crop up that prevent you from leap-frogging all the way to Serverless, but that’s OK. Other services will not be thus constrained.
Well Architected, meet Well Maintained
In 2012, the Well Architected concept was born inside AWS. It is a set of principles that helps lead to success in the Cloud; at that time, that was the AWS EC2 environment. It’s well worth a read if you have not seen it. At this time, it’s also been adopted by Microsoft for the Azure environment.
However, I want to move your attention from Architecture time, to operations time.
If you look at the traditional total life-cycle activities, there’s a lot of time and effort spent learning, adjusting, and implementing supporting technologies that are starting to become invisible in the Serverless world.
Let’s look at the operational activities done in a physical environment, and compare that to Serverless. I’ll skip the middle phases of evolution as shown above:
| Activity | Physical | Serverless |
| --- | --- | --- |
| Physical security | Required | Managed |
| Physical installation | Required | Managed |
| Capacity Planning | Required | Managed |
| Network switching | Required | Managed |
| Hardware power planning | Required | Managed |
| Physical cooling | Required | Managed |
| Hardware procurement | Required | Managed |
| Hardware firmware updates | Required | Managed |
| OS installation | Required | Managed |
| OS patching | Required | Managed |
| OS upgrade | Required | Managed |
| OS licensing | Often Required | Managed |
| Runtime selection | Required | Required |
| Runtime minor patching | Required | Managed |
| Runtime major version upgrade | Required | Required |
| App server selection | Required | Managed |
| App server minor patching | Required | Managed |
| App server major version upgrade | Required | Managed |
| Code base maintenance | Required | Required |
| Code base 3rd party library updates (SDKs) | Required | Required |
| Network encryption protocol and cipher upgrades (TLS, etc) | Required | Required |
As you can see, there are a large number of activities that should be done regularly to ensure operational excellence. However, I am yet to see a traditional physical environment, or virtualised on-prem environment, that actively does all of the above well.
It’s an easy test: wander into any Java environment, and ask what version of the Java runtime is deployed in production. The typical response is “we updated to Java 8 two years ago”. What that means is “we haven’t touched the exact deployed version of Java for two years”.
Likewise, ask what version of Windows Server is deployed. Anything older than 2016 (and even that is generous, given 2019 has been out for nearly 2 years at this time) shows a lack of agility and maintenance.
I challenge those in IT operations to think through the above table and check the last time their service updated each row, post project launch. If it’s a poor show, the chances are you’re in “support mode”, and not “DevOps Operations”.
So what can be done to help do this maintenance?
Take it away. Stop it. While it can be argued to be important, and interesting, you’re possibly better off spending that effort on the smaller list that remains in a Serverless environment.
Evolution Continues
We can’t see where this evolution will go next. We do see that identity, authentication, authorisation, and in-flight encryption remain key elements to be aware of.
What comes next, I can’t predict. I know many ideas will be thrown about, new or recycled, and some will work, while others will wither and disappear again.
I have spent many years working with Landgate, the state government Department of Land Administration. It’s a well known AWS Case Study, and a platform that is available for other land jurisdictions of the world if they wish to move to it.
One of the integrations that has been implemented is from/to the Electronic Lodgement Network Operators (ELNOs), which facilitate electronic settlement of property transactions. Only one ELNO is currently active in Australia: Property Exchange Australia, otherwise known as PEXA.
Using PEXA saves settlement agencies and banks from having to send representatives to a specific location at a specific time with the assortment of cheques, paperwork, and other administration that, should one thing be out of order, causes settlement to be delayed (a costly exercise). Many transaction types have now been mandated to be done via electronic interfaces, one of the first of which in Western Australia was the Discharge of Mortgage.
More than 80% of transactions on the land registry are a property being sold while under mortgage, to someone else who has also taken out a mortgage. This is called DTM; Discharge, Transfer, Mortgage. It’s one of the first transactions for which the Advara platform automated the validation of the submitted data, saving huge amounts of manual effort.
For a transaction submitted by PEXA, the general turnaround time on data validation and transaction approval has now dropped to around 10.8 seconds, down from historical highs of around 30 days.
My Transaction
I was recently purchasing a new property (my home study is occupied by a rather adorable 5 year old girl, IMHO) and armed with the workings of the land titling system, I figured I’d actively watch my settlement transaction.
PEXA has created a user application called PEXA Key, for Android and iPhone, that permits sellers and purchasers to be invited to their property settlement transaction.
All a settlement agent needs to do is collect a mobile phone number and email address from the seller or purchaser, and enter them into the PEXA workspace.
I asked the real estate agent selling the property about this, and then in turn my settlement agent, and neither of them had heard of it, much less actually done it. So I pushed on, and lo, managed to have them submit my details.
This post shows what happens next.
A Text Message
I received a text message almost immediately – with variables shown where real names were used:
Hi JAMES,
${SETTLEMENT_AGENT} has invited you to download the PEXA Key app to track your settlement. Check your email for more details. Get the app free here key.pexa.com.au/download or exclusively on Google Play or Apple App Store.
Text message I received after my settlement agent registered me in the workspace.
I quickly complied, and was then sent a security activation code.
The app then told me when my settlement had been scheduled for, and any pending tasks that I was responsible for (as it happened, I had already done everything, so it was fine).
This immediately gave me peace of mind, knowing the transaction workspace was set up and pending.
The morning of settlement came around, and I was greeted with this:
I nervously checked the application every few minutes to see what would happen next.
And so it begins
It turned out that the process was initiated around 10:25am or so, after which the PEXA Key application showed:
OK, strap in, the wheels are in motion.
It took around 40 minutes until all was done and dusted, and the final result came through:
A few hours later, a new set of house keys were in my hand.
This has to be the most expensive testing I have ever personally done! 😉
The inclusion of the end customer in this process, with just simple visibility, is something that I think should be offered to all parties in the transaction, to bring confidence and clarity on the progress of, or inhibitors to, the transaction.
Key to this (pun intended, in two ways) is the swift and efficient recognition of land property transactions. My colleagues and I have worked hard to uplift the validation and security of the land registry system for years, and continue to do so. And as a customer of this system, it worked smoothly.
Some coverage of PEXA Key is here in Cyber Security Magazine (saying this stops an avenue of attack).
I recommend anyone buying or selling property to ask their agent to invite them into the settlement on PEXA using PEXA Key. As many in the real estate industry I have spoken to are unaware of this, you may need to explain it (send them this article’s URL), but it’s worth it.
Disclosure: I do not work for PEXA, nor have I been asked by them (or anyone else) to write this. I share the above to assist anyone else who would like to see their property transactions being processed. While PEXA is a national (Australian) electronic settlement platform, the turnaround time from each separate land jurisdiction to validate and register the transaction will vary. Indeed, I’d challenge any of them to beat 10.8 seconds full validation!
Many organisations are today able to access their email, corporate video conferencing and other services while mobile and without being connected to their company VPN endpoint.
Universal access to these services over the Internet, on IPv4 and IPv6, just works seamlessly wherever you are. It’s liberating, and no one is jumping up and down asking about the firewall or VPN.
Key amongst the platforms providing this is Microsoft Office 365 and its various services.
So why do you still have a corporate VPN? Why do your existing corporate IT services require you to jump through hoops to access them?
Let me be direct: your corporate strategy on security is based around lowest cost, lowest effort. This budget approach also means the least amount of work for the technology staff who operate these services for your organisation.
Office 365, Salesforce, and a slew of other solutions that universally just work over the Internet have something that the bespoke solutions you have in-house do not: funding to operate as such.
The main premise when you make services available over the internet is a commitment to do several things from an operational perspective:
Support newer encryption protocols (TLS) over time, and remove older encryption protocols (TLS) over time
Add new encryption ciphers over time, and remove older encryption ciphers over time
Use federated sign-on (single sign-on)
Maintain (update) the single sign on service over time, with continual uplift (eg, introduce MFA)
Examine logs and look for anomalies in access, and then automatically lock out a user, and iterate improvements into the application
Your organisation probably does not do this. Your company’s IT operations team probably “keep the lights on”, ensuring the currently deployed application is responsive, poking it with a stick to ensure it moves. They probably didn’t uplift to TLS 1.3 in the last 2 years, and they probably haven’t removed TLS 1.1 and below.
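The protocol side of that uplift is often a small configuration change. As a minimal sketch using Python's standard `ssl` module, the following builds a server-side context that refuses TLS 1.1 and below, while still negotiating TLS 1.3 where the underlying OpenSSL supports it. The certificate paths in the comment are placeholders.

```python
import ssl


def modern_server_context() -> ssl.SSLContext:
    """Server-side TLS context that refuses anything older than TLS 1.2."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Drop TLS 1.1 and below; the maximum is left at the library default,
    # so TLS 1.3 is negotiated when both ends support it.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real service would also load its certificate, for example:
    # ctx.load_cert_chain("server.crt", "server.key")  # placeholder paths
    return ctx


ctx = modern_server_context()
print("minimum:", ctx.minimum_version.name)
print("TLS 1.3 available in this build:", ssl.HAS_TLSv1_3)
```

The hard part is not this code; it is having someone revisit the setting every time the protocol landscape moves.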
And while they collect application logs, any review is probably pretty basic.
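Even a simple automated review is better than none. As a sketch of the "look for anomalies, then lock out" step, the following flags any user with too many failed sign-ins inside a sliding time window. The event format, the threshold of five failures and the ten minute window are all assumptions for illustration, not a recommendation.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def users_to_lock(events, threshold=5, window=timedelta(minutes=10)):
    """Return usernames with at least `threshold` failed sign-ins in `window`.

    `events` is an iterable of (timestamp, username, succeeded) tuples,
    sorted by timestamp.
    """
    recent = defaultdict(deque)  # username -> recent failure timestamps
    flagged = set()
    for when, user, succeeded in events:
        if succeeded:
            continue
        failures = recent[user]
        failures.append(when)
        # Forget failures that have aged out of the window.
        while when - failures[0] > window:
            failures.popleft()
        if len(failures) >= threshold:
            flagged.add(user)
    return flagged


# Fabricated sign-in events for illustration.
start = datetime(2020, 1, 1, 9, 0)
events = [(start + timedelta(minutes=i), "alice", False) for i in range(5)]
events += [(start, "bob", False), (start + timedelta(hours=1), "bob", False)]
events += [(start + timedelta(minutes=2), "carol", True)]
events.sort(key=lambda event: event[0])

print(users_to_lock(events))
```

In a real service the flagged set would feed the identity provider's lock-out call, and the pattern that triggered it would feed back into the application.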
Why?
Doing so requires time, training, effort, experience and knowledge. Until you have a 24×7 DevOps team able to turn on a dime, a CISO who represents the security risk and operational response to the board, and a few other tell-tale signs, then your organisation is not ready.
All of the above requires a strong vision, strong senior leadership from the top, and a strong funding model that prioritises the digital security of the company.
A traditional VPN means there is a controlled ingress point (in theory) as a single point to protect. Here you need to have the focus on encryption and authentication, but quite often organisations just deploy firmware on a device, install an initial config, and leave the device untouched for years.
I’ve seen some MSPs deploy minor version updates on their security endpoints, but never adopt the major version updates they are entitled to, despite the customer paying support for the major upgrades. And even when the major version upgrades were installed, the config was not adjusted to enable newer capabilities, or disable outdated options.
So, next time you have to VPN in to the company, ask yourself: why? Why are you spending money on expensive bottlenecks that slow you down, instead of mature operations? The value proposition isn’t there. Budget. Focus. Leadership.