Software License Depreciation in a Cloud World

Much effort is spent on preserving and optimising software licenses when organisations shift their workloads to a cloud provider. It’s seen as a “sunk cost”, something that needs to be taken whole into the new world, without question.

However, some vendors don’t like their customers using certain cloud providers, and are making things progressively more difficult for those organisations that value keeping (or are required to keep) their software stack well maintained.

Case in point: one software vendor that has its own cloud platform made significant changes to its licensing, progressively removing customers’ rights to run their acquired licences in a competitor’s cloud.

I say progressively because customers can continue to run (now) older versions of the software, released before the point in time the licensing was modified.

The Security Focus

Security in IT is a moving target. There are always better ways of doing something, and previous ways which were once the best way but are now deemed obsolete.

Let me give you a clear example: network encryption in flight. The dominant protocol used to negotiate this is called Transport Layer Security (TLS), and it’s something I’ve written about many times. There are different versions (and if you dig back far enough, it even had a different name – SSL or Secure Sockets Layer).

Older TLS versions have been found to be weaker, and newer versions implemented.

But certain industry regulators have mandated only the latest versions be used.

Support for TLS is embedded in both your computer operating system and certain applications that you run. This permits the application to make outbound connections using TLS, as well as listen for and receive connections protected with TLS.

Take a database server: it’s listening for connections. Unless you’ve been living under a rock, the standard approach these days is to insist on using encryption in flight in each segment of your application. Application servers may access your database, but only if the connection is encrypted – despite them sitting in the same data centre, possibly in the same rack or same physical host! It’s an added layer of security, and the optimisations done mean it’s rarely a significant overhead compared to the eavesdropping protection it grants you.

Your operating system from say 2019 or before may not support the latest TLS 1.3 – some vendors were pretty slow with implementing support for it, and only did so when you installed a new version of the entire operating system. And then some application providers didn’t integrate the increased capability (or a control to permit or limit the version of TLS) in their software in those older versions from 2019 or earlier.

But in newer versions they have fixed this.

Right now, most compliance programs require only TLS 1.2 or newer, but it is foreseeable that in future, organisations will be required to “raise the bar” (or drawbridge) to use only TLS 1.3 (or newer), at which time, all that older software becomes unusable.
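
To make “raising the bar” concrete, here is a minimal sketch of a server-side TLS floor using Python’s standard ssl module. The function name and the require_tls13 flag are my own illustrative choices, and a real server would also need to load its certificate and key before accepting connections.

```python
import ssl


def server_context(require_tls13: bool = False) -> ssl.SSLContext:
    """Build a server-side TLS context with an explicit minimum protocol version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    if require_tls13:
        # Older operating systems may ship a TLS library without TLS 1.3 support.
        if not ssl.HAS_TLSv1_3:
            raise RuntimeError("this platform's TLS library does not support TLS 1.3")
        ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    else:
        # Today's common compliance floor: TLS 1.2 or newer.
        ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


# A real server would also call ctx.load_cert_chain(certfile, keyfile)
# before wrapping its listening socket.
```

The point of the flag is that the floor becomes a one-line change, provided the platform underneath has kept up. On an old operating system the TLS 1.3 branch simply fails, which is the situation described above.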

Those licences become worthless.

Of course, the vendor would love you to take a new licence, but only if you don’t use other cloud providers.

Vendor Stickiness

At this time, you may be thinking that this is not a great customer relationship. You have an asset that, over time, will become useless, and you are being restricted from using your licence under newer terms.

The question then turns to “why do we use this vendor”. And often it is because of historical reasons. “We’ve always used XYZ database”, “we already have a site licence for their products, so we use it for everything”. Turns out, that’s a trap. Trying to smear cost savings by forcing technology decisions because of what you already have may preclude you from having flexibility in your favour.

For some in the industry, the short term goal is the only objective; they sign a purchase order to reach an immediate objective, without taking the longer term view of where that is leading the organisation – even if that’s backing them into a corner. They celebrate the short term win, get a few games of golf out of it, and then go hunting for their next role elsewhere, using the impressive short term saving as their report card.

A former colleague of mine once wrote that senior executive bonuses shouldn’t be paid out in the same calendar year, but delayed (perhaps 3 years) to ensure that the longer term success was the right outcome.

Those with more fortitude with change have, over the last decade, been embracing Open Source solutions for more of their software stack. The lack of licence restriction – and licence cost – makes it palatable.

The challenge is having the team who can not only implement potential software changes, but also support a new component in your technology stack. For incumbent operations and support teams, this can be an upskilling challenge; some won’t want to learn something new, and will churn up large amounts of Fear, Uncertainty and Doubt (FUD). Ultimately, they argue it is better to just keep doing what we’ve always done, and pay the financial cost, instead of making the effort to do something better.

Because better is change, and change is hard.

An Example

Several years ago, my colleagues helped rewrite a Java based application and change the database from Oracle to PostgreSQL. It was a few months from start to finish, with significant testing. Both Oracle and PostgreSQL were running happily on AWS Relational Database Service (RDS). The database was simple table storage, but the original application developers already had a site license for Oracle, and since that’s what they had, that’s what they used.

At the end of the project, the cost savings were significant. The return on investment for the project services to implement the change was around 3 months, and now, years later, the client is so much better off financially. It changed the trajectory of the TCO spend.

The coming software apocalypse

So all these licences that are starting to hold back innovation are becoming progressively problematic. The next time that security requirements tighten, you’re going to hear a lot of very large, legacy software license agreements disintegrate.

Meanwhile, some cloud providers can bundle the software licence into the hourly compute usage fee. If you use it, you pay for it; when you don’t use it, you don’t pay for it. If you want a newer version, then you have flexibility to do so. Or perhaps even to stop using it.

More TLS 1.3 on AWS

Earlier this week, AWS posted about their expanded support for TLS 1.3, clearly jumping on the reduced handshake as a speed improvement in their blog post entitled: Faster AWS cloud connections with TLS 1.3.

Back in 2017, (yes, 6 years ago) we started raising Product Feature Requests for AWS products to enable this support, and at the same time, customer control to be able to limit the acceptable TLS versions. This makes perfect sense in customer applications (the data plane). Not only do we not want our applications supporting every possible historic version of cryptography, various compliance programs require us to disable them.

Most notable in this was PCI DSS 3.1, the Payment Card (credit card) Industry Association’s Data Security Standard, which drove the nail in to the coffin of TLS 1.1 and everything before it.

Over time, TLS versions (and SSL before it) have fallen from grace. Indeed, SSL 1.0 was so bad it never saw the light of day outside of Netscape.

And it stands to reason that, in future, newer versions of TLS will come to life, and older versions will, eventually, have to be retired; and between those two is another transition. However, this transition requires deep upgrades of cryptography libraries, and sometimes of client code to support the lower level library’s new capability.

On the server side, we often see a more proactive implementation of what currently supported TLS versions are permitted. Great services like SSLLabs.com, Hardenize.com, and testssl.sh have guided many people to what today’s current state of “acceptable” and “good” would generally look like. And the key item of those services, is their continual uplift as the state of “acceptable” and “good” changes over time.

On the client side, it’s not always been as useful. I may have a process that establishes outbound connections to a server, but as a client, I may want to specify some minimum version for my compliance, and not just rely upon the remote party to do this for me. Not many software packages do this – the closest control you get is an integration possibly using HTTPS (or TLS), and not the next level down of “yeah, so which versions are OK to use when I connect outbound”. Of course, having specified HTTPS (or TLS) and doing server certificate validation against our local trust store, we then have a degree of confidence that it’s probably the right provider, given that one of my 500 trusted CAs signed the certificate we were given back during the handshake.
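
Where the client is your own code, that “next level down” control can be quite small. A minimal sketch using Python’s standard ssl module follows; the function name client_context is my own, and the TLS 1.2 default is just an example floor.

```python
import ssl


def client_context(minimum: ssl.TLSVersion = ssl.TLSVersion.TLSv1_2) -> ssl.SSLContext:
    """Outbound TLS context: validate the server certificate AND enforce our own version floor."""
    # create_default_context() turns on certificate and hostname verification
    # against the local trust store.
    ctx = ssl.create_default_context()
    # The handshake now fails if the server cannot offer at least this version,
    # rather than relying on the remote party to enforce it for us.
    ctx.minimum_version = minimum
    return ctx
```

The resulting context can be handed to anything that accepts one, for example urllib.request.urlopen(url, context=client_context()).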

This sunrise/sunset is even more important to understand in the case of managed services from hyperscaler cloud providers. AWS speaks of the deprecation of TLS 1.1 and prior in this article (June 2022).

If you have solutions that use AWS APIs, such as applications talking to DynamoDB, then this is part of your technical debt you should be actively, regularly addressing. If you haven’t been including updated AWS SDKs in your application, and updating your installed SSL libraries, updating your OS, then you may not be prepared for this. Sure, it may be “working” fine right now.

One option you have is to look at your application connection logs, and see if the TLS version for connections is being logged. If not, you probably want to get that level of visibility. Sure, you could Wireshark (packet dump) a few sample connections, but it would probably be better not to have to resort to that. Having the right data logged is all part of Observability.
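
As a rough sketch of what that visibility could feed, the snippet below tallies TLS versions from connection log lines. The log format (a tls=... field) and the sample lines are purely hypothetical; adjust the pattern to whatever your load balancer or application actually writes.

```python
import re
from collections import Counter

# Hypothetical log format: each line carries a field such as "tls=TLSv1.2".
TLS_FIELD = re.compile(r"\btls=(\S+)")
ACCEPTABLE = {"TLSv1.2", "TLSv1.3"}


def tally_tls_versions(lines):
    """Count connections per logged TLS version; lines without the field count as 'unlogged'."""
    counts = Counter()
    for line in lines:
        match = TLS_FIELD.search(line)
        counts[match.group(1) if match else "unlogged"] += 1
    return counts


def below_floor(counts):
    """Return only the entries that are not known to meet the floor, unlogged ones included."""
    return {version: n for version, n in counts.items() if version not in ACCEPTABLE}


# Hypothetical sample lines, for illustration only.
sample = [
    "client=10.0.1.15 tls=TLSv1.3",
    "client=10.0.1.16 tls=TLSv1.2",
    "client=10.0.1.99 tls=TLSv1.1",
    "client=10.0.1.17",
]
report = below_floor(tally_tls_versions(sample))
```

Counting the unlogged connections alongside the old ones is deliberate: a connection whose version you cannot see is one you cannot vouch for.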

June 28 is the (current) deadline for AWS to raise the minimum supported TLS version. That’s a month away from today. Let’s see who hasn’t been listening…

Cyber Insurance: EoL?

The Chief Executive of insurance company Zurich, Mario Greco, recently said:

“What will become uninsurable is going to be cyber,” he said. “What if someone takes control of vital parts of our infrastructure, the consequences of that?” 

Mario Greco, Zurich

The same article mentions Lloyd’s looking for exceptions in Cyber insurance for attacks by state based actors, which is a difficult thing to prove with certainty.

All in all, one reason that Cyber Insurance exists is that, from a risk perspective, it offers the opportunity to spend less on insurance premiums (and receive financial recompense to cover operational costs) than on having competent processes around software maintenance: coding securely to start with, detecting threats quickly, and maintaining (patching/updating) rapidly over time.

The structure of most organisations is to have a “support team” who are responsible for an ever growing list of digital solutions, goaled on cost minimisation, and not measured against the number of maintenance actions per solution operated.

It’s one of the reasons I like the siloed approach of DevOps and Service Teams. Scope is contained to one (or a small number of similar) solution(s). Same tech base, same skill set. With a remit to have observability, metrics and focus on one solution, the team can go deep on full-stack maintenance, focusing on a job well done, rather than a system that is just turned on.

It’s the difference between a grand painter, and a photocopier. Both make images; and for some low-value solutions, perhaps a photocopier is all they are worth investing in from a risk-reward perspective. But for those solutions that are the digital-life-blood of an organisation, the differentiator to competitors, and those that have the biggest end-customer impact, then perhaps they need a more appropriate level of operational investment — as part of the digital solution, not as a separate cost centre that can be seen to be minimised or eradicated.

If Cyber insurance goes end-of-life as a product in the insurance industry, then the war on talent, the focus to find those artisans who can adequately provide that, increases. All companies want the smartest people, as one smarter person may be more cost effective than 3 average engineers.

Is it safe to move to The Cloud?

I try to stay as up-to-date as I can with all things Cloud, and have done for the better part of a decade and a bit. But I recently came across a social media post entitled “Is it safe to move to the cloud?”, and with this much experience, I had so many immediate thoughts that this post precipitated.

My immediate reaction was “Is it safe to NOT move to The Cloud?“, but then I thought about the underlying problems with all digital solutions. And the key issue is understanding TCO, and ensuring the right cost is being endured over the operating time of the solution, rather than the least cost as is so typical.

The truth is that with digital systems, things change all the time. And if those systems are facing untrusted networks (such as the Internet), or processing untrusted data (such as came from humans) then there are issues lurking.

Let me take a moment to point out, as an example, any Java implementation that used the very popular Log4J library to handle error messages. Last December (2021) a serious vulnerability arose that meant that if you logged a certain message, then it would trigger an issue. Quite often error messages being raised include the offending input that failed validation or caused an exception, and thus, you could have untrusted data triggering a vulnerability via this (wildly popular and heavily used) library.

It’s not that anyone had done anything bad on purpose. No one had spotted it (and reported it to the developer of the library) earlier.

Of course, the correct thing happened: an updated version of this library was released. And then other vendors of solutions updated their products that included this newer version of the Log4J library. And then your operations team updated your deployment of this application.

Or did they?

There’s a phrase that fills me with fear in IT operations: “Transition to Support”. It indicates we’re punting the operational responsibility of the solution to a team that did not build it, and do not know how to make major changes to the application. We’re sending it to a team that already look after other digital solutions, and adding one more thing to their work for them to check is operational, and for them to maintain – which, as they are often overwhelmed with multiple solutions, means they do the simplest thing: check it is operational, not that it is Well Maintained.

Transition to support: the death knell for Well-Maintained systems

James Bromberger

I’ve seen first hand that critical enterprise systems, line-of-business processing that is the core of the business, is best served when the smart people who built it, stay to operate it in a DevOps approach. This team can make the major surgical changes that happen after deployment, and as business conditions and cyber threats change.

The concern here is cost. Development teams cost more than dumping large numbers of systems on under staffed Support teams. Or support gets sent offshore to external providers who may spend 30 seconds checking the system works, but no time investigating the error messages and their resolution that may require a software update.

It’s a question of cost.

A short-term CIO makes their hero status by cutting costs. Immediately this has only a positive impact on the balance sheet. But as time goes on, the risks of poor maintenance go up. After the financial year has ended, short term EBITDA shows massive growth, and a hero’s party is given for the CIO, they then miraculously depart for another job based on the short term success.

Next up, the original company finds that their digital solution needs to be updated, but there is no one who understands it to make such a change.

The smart people were let go of. They were seen as a cost, not part of the business.

So let’s rephrase the question: “Is it safe to move to the cloud with your current IT management and maintenance approach?” Possibly not: you probably have to modify the way you do a lot of things, including how you structure your teams and Org Unit. You may need to up-weight training for teams who will now take on full responsibility for workloads, instead of just being “the network guy”. But this is an opportunity; those teams can now feel that THEY are the service team for a workload that supports something more substantial than just rack-and-stack of storage. Moving to separate DevOps teams per critical workload, you can then have them independently innovate – but collaborate on standards and improvements: a friendly competition on addressing technical debt, or number of user feature improvements requested – and satisfied.

So is it safe to move to the Cloud? It depends on who is doing it, how much knowledge and experience they have, and what happens next in your operating model.

The Cloud is not just another data centre. And TCO isn’t just cloud costs, and it isn’t just people cost. Sometimes the cost is the compliance failure and fine you get by inadvertently removing the operating model that would have prevented a data breach.

It’s been 7 years since I (and my colleagues at Ajilon/Modis, soon to be Akkodis) moved the Land Registry of Western Australia, the critical government registry of property ownership of the state, into the AWS Cloud for Landgate. We’ve kept a DevOps approach for the solution – ensuring it was not just Well-Architected, but Well Maintained. It’s a small DevOps crew now that ensures that Java Updates, 3rd party library updates and more get imported, but also that the Cloud environment is maintained: load balancing, virtual machine types & images (AMIs) get updated, managed relational database versions get updated, newer TLS versions get supported and – more importantly – older versions get deprecated and disabled. FinOps, DevOps, and collaboration.

Log4Shell and your apps in AWS

There’s a great XKCD cartoon entitled Dependency that cuts to the heart of today’s software engineering world: developers (and in turn organisations) everywhere love the use of libraries to accelerate their development efforts, particularly if that library of code is free to use, and typically that’s Open Source Free.

The image speaks about large complex systems, critical to organisations, needing the unpaid, thankless contributors of these libraries but upon whom everything relies.

In the last week, we’ve seen Log4J, a Java logging utility, come under such focus due to a critical remote code execution bug that can see the server side triggered to make outbound requests. A vast number of Java based solutions from the last 15+ years have dependencies on logging messages being implemented using this library.

It’s led to articles like “The Internet is on Fire” by the Australian Broadcasting Corporation, and thousands of posts on Infosec specific news sites (eg: Kaspersky, PortSwigger, SecurityScorecard, BleepingComputer).

Java is widely used, as Oracle corporation points out clearly:

3 Billion Devices Run Java – Oracle

There are two sides to this: invalid requests coming in that should be handled with sensible data validation, and the resulting external requests that servers can be tricked into making.

Now I am not saying everyone should use their own logging library; that would be even more on fire. But we should stand ready to update these things rapidly, and we should help with either code contributions or financial donations (or both) to help improve this for the common good.

Untrusted Data Validation

Validating untrusted data sources is critical. The content of a local configuration file is vastly different from the query from the Internet. I’ve often joked about setting my Browser user-agent string to the EICAR test file content, used as a dummy value to trigger Antivirus software to match on this text.

In this case, we have remote attackers stuffing custom generated data strings in HTTP requests (and email and other sources that accept external traffic/data) to try and trick the Log4j library into processing and interpreting this data instead of just writing it to a log file.

Web servers always accept data from the Internet, and Web Application Firewalls can offer some protection, but in this case, the actual “string to check” can be escaped, making it harder to write simple rules that match.
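
To illustrate why simple rules struggle, here is a small sketch comparing a naive substring match with one that first undoes URL encoding and one known nesting trick (wrapping a character in a lower or upper lookup). The function names are my own, other lookup forms are not handled, and this is in no way a complete defence; it only shows how quickly a plain match is bypassed.

```python
import re
from urllib.parse import unquote

# Simple nested lookups such as ${lower:j} or ${upper:N}; group 1 is the literal inside.
_NESTED = re.compile(r"\$\{(?:lower|upper):([^${}]*)\}", re.IGNORECASE)


def normalise(value: str, max_rounds: int = 5) -> str:
    """Repeatedly URL-decode and unwrap simple nested lookups until the string stops changing."""
    for _ in range(max_rounds):
        decoded = _NESTED.sub(r"\1", unquote(value))
        if decoded == value:
            break
        value = decoded
    return value


def naive_match(value: str) -> bool:
    """What a simple rule does: look for the literal marker."""
    return "${jndi:" in value.lower()


def normalised_match(value: str) -> bool:
    """Look for the marker after normalisation."""
    return "${jndi:" in normalise(value).lower()
```

With this, a URL-encoded string such as %24%7Bjndi:... or a nested one such as ${${lower:j}ndi:...} slips past naive_match but is caught by normalised_match. An attacker only needs one encoding you did not think of, which is why patching the library matters more than the filter.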

Restricting outbound traffic

An attacker is often trying to get better access into the systems they target; their initial foothold may be tentative. In this example, the ability to trick a target server to fetch additional data (payload) from an external service is key. There are two main types of external data egress: direct, and indirect.

In the direct model, your server, which you installed and thus trust, may be running behind a firewall, but have you checked if you have restrictions on what it can fetch directly from the Internet?

In AWS, the default AWS Security Group for egress is to permit all traffic; this is a terrible idea, but is the element of least surprise for those new to the AWS VPC environment. It is strongly recommended that you pare this down for all applications, to end up with only the minimum network access you need, even when behind a (managed) NAT Gateway or routing rules, and even if you think your server only has internal network access.

I wrote a whitepaper on this topic for Modis in 2019 about Lateral Movement within the AWS VPC, and some of the concepts there are relevant now.

Your VPC-deployed virtual machine instance probably only needs to initiate connections to S3 on 443, and its database server on the local CIDR (address) range. For example, if you have three Subnets for databases:

  • 10.0.0.0/26 (Databases in AZ-A)
  • 10.0.0.64/26 (Databases in AZ-B)
  • 10.0.0.128/26 (Databases in AZ-C)
  • 10.0.0.192/26 (reserved for future expansion of Databases in a yet-to-be announced AZ-D)

… and are running MySQL (eg, RDS MySQL) in those AZs, then you probably want an egress rule on your Application Server/instance of 10.0.0.0/24:3306. (Note, be ready for making this all IPv6 in future). However, your inbound rule on the same group is probably referential to your managed Load Balancer, on port 443.
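
The subnet arithmetic is easy to get wrong by hand, so here is a minimal sketch that checks it with Python’s standard ipaddress module, assuming the fourth range starts on the /26 boundary at 10.0.0.192. The rule dictionary is shaped like an IpPermissions entry as used by the EC2 security group API, but the code only builds the dictionary and makes no AWS calls; the description text is my own.

```python
import ipaddress

DATABASE_SUBNETS = [
    "10.0.0.0/26",    # Databases in AZ-A
    "10.0.0.64/26",   # Databases in AZ-B
    "10.0.0.128/26",  # Databases in AZ-C
    "10.0.0.192/26",  # Reserved for a future AZ-D
]


def summarise(cidrs):
    """Collapse adjacent subnets into the smallest list of covering networks."""
    # ip_network() is strict by default: it raises ValueError if a range
    # does not start on its own prefix boundary.
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [str(net) for net in ipaddress.collapse_addresses(networks)]


def mysql_egress_rule(cidr):
    """Build one egress permission: TCP 3306 to the given range only."""
    return {
        "IpProtocol": "tcp",
        "FromPort": 3306,
        "ToPort": 3306,
        "IpRanges": [{"CidrIp": cidr, "Description": "MySQL to database subnets"}],
    }


egress_rules = [mysql_egress_rule(cidr) for cidr in summarise(DATABASE_SUBNETS)]
```

Because the four /26 ranges are contiguous and aligned, they collapse to a single 10.0.0.0/24, so the application tier needs only one database egress rule.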

What about DNS and Time Sync?

If you have cut down your egress to just the two rules (HTTPS for S3 to bootstrap and for CFN-init to signal ASG creation, and database traffic), what about things like DNS and Time? These are typically UDP based (ports 53, 123).

Indeed, the typical firewall rule used for NTP, when syncing from external time services, is *:123 inbound and *:123 outbound. Ouch.

AWS Time Sync Service

The good news is you do not need to permit this in your security group rules IF you are using the AWS VPC provided Time Sync service and DNS Resolvers. These are available over the link-local network, and security groups do not restrict this traffic; hence the security group can be left closed for UDP port 123.
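
A quick way to audit a host is to check whether its configured time and DNS servers are link-local. The sketch below uses Python’s standard ipaddress module; the two constants are the addresses AWS documents for the Time Sync service and the VPC resolver, while the function name and the sample server list are my own illustrations.

```python
import ipaddress

# Link-local addresses documented by AWS for these VPC services.
AWS_TIME_SYNC = "169.254.169.123"
AWS_VPC_DNS = "169.254.169.253"


def servers_needing_egress(servers):
    """Return the configured servers that are NOT link-local, and so would need an egress rule."""
    return [s for s in servers if not ipaddress.ip_address(s).is_link_local]


# Hypothetical host configuration: one AWS-provided server, one self-managed one.
configured = [AWS_TIME_SYNC, "10.0.0.10"]
needs_rules = servers_needing_egress(configured)
```

An empty result means the security group can stay shut for these protocols. Anything left in the list is a server you are running, scaling, and writing firewall rules for yourself.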

This time service is also scalable; you don’t need to have thousands of hosts pointing at one or two of your own NTP servers; the AWS Time Sync service runs from the hypervisor, so as you randomly add instances, you have more physical nodes (droplets) involved in provisioning this, so your time services scale.

Managed & Scalable DNS Resolution

DNS can be used for data exfiltration. If you run your own DNS resolver (eg, on Windows Domain Controller(s) or Linux host(s)) and set your DHCP to hand this resolver address to clients, then you may be at risk of not even seeing this happen. This is an indirect way of being exploited; your end server may not have access to egress to the Internet, but it can egress to your DNS resolver to… well, look up addresses. If you do run your own DNS server, you should be looking at the log of what is being looked up, and managing the process to match this against a threat list, and issuing warnings of potential compromise.
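
As a rough sketch of what matching lookups against a threat list could look like, the snippet below flags a queried name if it (or any parent domain) is on a list, or if its first label is long and random-looking, which is a common shape for data smuggled out via DNS. The threat list entry and both thresholds are hypothetical values of my own, not recommendations.

```python
import math
from collections import Counter

THREAT_LIST = {"bad-domain.example"}  # hypothetical entry
MAX_LABEL_LENGTH = 40                 # assumption: longer first labels are unusual
MIN_ENTROPY_BITS = 3.5                # assumption: bits per character that looks "random"


def shannon_entropy(text: str) -> float:
    """Average bits of information per character in the text."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def on_threat_list(name: str, threat_list) -> bool:
    """True if the name or any of its parent domains is on the list."""
    labels = name.rstrip(".").lower().split(".")
    return any(".".join(labels[i:]) in threat_list for i in range(len(labels)))


def suspicious(name: str, threat_list=THREAT_LIST):
    """Return the list of reasons a queried name looks risky (empty if none)."""
    reasons = []
    if on_threat_list(name, threat_list):
        reasons.append("threat-list")
    first_label = name.split(".")[0]
    if len(first_label) > MAX_LABEL_LENGTH and shannon_entropy(first_label) >= MIN_ENTROPY_BITS:
        reasons.append("long-random-label")
    return reasons
```

Running every logged query through something like suspicious() and alerting on non-empty results is the hand-crafted version of this control, and you would also own the upkeep of the threat list.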

Managed DNS Security Checks: Guard Duty

If that’s too much effort, then there is a managed solution for this: AWS Guard Duty and the VPC-provided DNS resolver. In order for Guard Duty to inspect and warn on this traffic, you must be sending DNS queries via the VPC resolver. Turning on Guard Duty while not sending DNS traffic through the AWS provided service (for example, running your own root-resolving DNS server) means the warnings from Guard Duty will probably never trigger.

By contrast, having your self-managed resolver (eg Active Directory server) use the VPC resolver means that it is the one that will be reported upon when any other instance uses it as a resolver with a risky lookup! I’m sure that will be a mild panic.

Managed DNS Proactive Blocking: DNS Firewall

Going beyond simply retrospectively telling you that traffic happened is pro-actively blocking DNS traffic. Route53 DNS Firewall was introduced in 2021, using managed block lists for malicious domains. This gives some level of protection that clients (instances) will get a failed DNS lookup when trying to resolve these bad domains.

My Recommendations

So here’s the approach I tell my teams when using VPCs:

  1. Always use the link-local time Sync service; it scales, and reduces SPOFs and bad firewall rules.
  2. Always use the link-local DNS resolver; it scales. Use a Resolver Rule if you need to then hook the DNS traffic up to your own DNS server (AD Domain Controller).
  3. Turn on Guard Duty, set up notifications of the Findings it generates.
  4. Turn on DNS Firewall to actively BLOCK DNS lookups for bad domains.
  5. Turn on your own Route53 query logs for yourself, with some retention period (90 days?)
  6. For inbound Web traffic, use a managed Web Application Firewall with managed rules, and/or scope your application to the country you’re intending to serve traffic to. In particular, block access to administrative URL paths that don’t come from trusted source ranges.
  7. Leverage any additional managed services that you can, so you minimise the hand-crafted solutions in your application.
  8. Template your workload, and implement updates from template automation; no local changes. Deploy changes rapidly using DevOps principles. Socialise with your team/management the importance of full stack maintenance and least privilege access – including at the network layer ingress and egress – and schedule and prioritise time to include technical debt in each iteration, including the updating of every third party library in your app.
  9. If you have a DevOps pipeline with something like SonarQube or Whitesource, have it report on dependencies (libraries), and get reports on how out-of-date those libraries are, and/or if those out of date versions have known CVEs against them. Google Lighthouse (in the browser) does a great job of this for JavaScript web frameworks.
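
To give a feel for the dependency reporting in the last item, here is a minimal sketch that compares an inventory of library versions against a minimum-version baseline. The inventory, the baseline dictionary, and the function names are all illustrative assumptions; the log4j-core floor shown should be checked against current advisories rather than taken from here, and real tools handle far messier version strings than plain dotted numbers.

```python
def parse_version(text: str):
    """Turn '2.17.1' into (2, 17, 1) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def outdated(inventory, minimums):
    """Return (name, installed, required) for every library below its minimum version."""
    findings = []
    for name, installed in sorted(inventory.items()):
        required = minimums.get(name)
        if required and parse_version(installed) < parse_version(required):
            findings.append((name, installed, required))
    return findings


# Illustrative baseline and inventory; the values are assumptions for the example.
MINIMUM_VERSIONS = {"log4j-core": "2.17.1"}
inventory = {"log4j-core": "2.14.1", "commons-lang3": "3.12.0"}
report = outdated(inventory, MINIMUM_VERSIONS)
```

Parsing into tuples matters: compared as strings, "2.9.0" sorts after "2.17.1", which would quietly hide an old library.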

For this exploit you need to go wider than what you run in cloud: your company printer (MFP), network security cameras, VoIP phones, UPS units, air-conditioners, Smart Hubs, TVs, Home Internet Gateways, and other devices will probably have an update. Your games console, and the games on it (this started from an update in Minecraft to address this and has… escalated quickly!). Even the physical on-prem firewalls and virtual appliances themselves – but ensure you don’t just do firewalls and ignore the larger landscape of equipment you have.

PS: I highly recommend my colleague Elliot Segler’s recent post entitled Learnings with AWS WAF and Log4Shell.

PPS: You may want to apply something like this to Apache:

 <IfModule mod_rewrite.c>
 RewriteEngine On
 RewriteCond %{THE_REQUEST} ^.*%7Bjndi:.* [NC]
 RewriteRule ^(.*)$ - [F,L]
 </IfModule>