Dunelm’s CSPM puzzle
There are 2 things the devsecops / cybersecurity scene has in abundance:
- acronyms
- security tooling that promises to make all your problems magically disappear
Today I’m going to talk about the latter, whilst using an awful lot of the former.
Let me start by explaining the CSPM acronym, so we all understand what we are talking about.
What’s a CSPM?
CSPM stands for Cloud Security Posture Management.
In traditional information-security speak, people refer to the CIA triad when they want to explain how to secure or harden IT systems:
- Confidentiality: keeping data secure or private
- Integrity: your data is trustworthy and has not been tampered with
- Availability: you can provide access to data at any point
There is actually a fourth pillar, which doesn’t always get mentioned because you would be hard-pressed to still call it a triad:
- Non-repudiation: having proof that someone has committed a certain act, or proof that a certain piece of information is valid.
A good CSPM tool shows you the current implementation status of the CIA triad across your cloud estate and is able to provide recommendations to further strengthen that implementation.
Dunelm is heavily invested in AWS’s public cloud offering, so let’s have a look at how you could start to secure your AWS accounts.
There are a million different ways of securing your cloud:
- Logically segregating your different environments (development, QA, UAT, pre-production, production) using multiple AWS Virtual Private Clouds (VPCs) inside a single AWS account, having an AWS account per environment, or splitting it up even further
- Applying encryption wherever possible, both in transit and at rest
- Using role-based access control (RBAC), preferably with some sort of single sign-on (SSO) and MFA through an established identity provider (IdP), to implement a least-privilege access control model
- Inspecting and controlling traffic going in and out of your accounts across the different layers of the OSI (Open Systems Interconnection) model, using a combination of firewalls, security groups, network access control lists and internet proxies
And the list goes on and on and on…
But how do you know when you have done a good enough job at securing all your resources and the data they potentially could hold?
Keeping tabs on things
Every cloud provider has a set of best practices on how to create (secure) cloud resources and workloads.
For AWS this is the Well-Architected Framework Review (WAFR). A more security-oriented view of your cloud estate can be found in the AWS Security Reference Architecture or the AWS Security Maturity Model.
At Dunelm we use these 3 artefacts to assess what our security posture looks like right now, where there is room for improvement and how to best implement the necessary changes.
There are also sets of controls known as security standards. Most of those are cloud and platform-agnostic by default but have specifically tailored rulesets for the major public cloud providers and infrastructure platforms.
There are several well-known industry standards and control sets:
- the Center for Internet Security (CIS) has an AWS Foundations Benchmark
- the Payment Card Industry (PCI) Data Security Standard (DSS) has different compliance levels you need to adhere to, depending on the volume of card transactions you process.
- NIST, the US National Institute of Standards and Technology, has created Special Publication 800-53, which is a catalog of security and privacy controls to protect your assets.
- AWS themselves provide a Foundational Security Best Practices security standard, which has the most security controls of all the standards.
We ended up using the AWS Foundational Security Best Practices as our starting point, because it overlaps with the other 3 standards almost entirely.
AWS Security Hub
AWS provide a CSPM service out of the box, called AWS Security Hub. This is what AWS has written on the AWS Security Hub tin:
AWS Security Hub provides you with a comprehensive view of your security state in AWS and helps you assess your AWS environment against security industry standards and best practices.
AWS Security Hub, being a native AWS Service, neatly integrates with other AWS services to control and enhance your security posture.
It does a lot of things very well:
- entirely configurable using Infrastructure as Code (IaC)
- support for 4 security standards out of the box, with automated security checks for each of these standards
- a long list of third-party integrations whose findings you can import into AWS Security Hub as additional findings
- neatly integrated into AWS Organizations. This, together with the fact that it can be configured using IaC, means that AWS Security Hub scales very efficiently across your organisation.
- cost-effective, you only pay for what you use
It is, however, lacking in certain areas as well:
- visualisation
- reporting
- historical analysis
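To give a feel for the raw material behind those gaps, here is a minimal sketch of pulling findings out of Security Hub and counting them per severity. It is an illustration, not what we run in production: the function names and the `eu-west-1` default region are assumptions, and the sample records are hand-written in the shape of the AWS Security Finding Format. Only the summary function is exercised here; the fetch function needs boto3 and AWS credentials.

```python
from collections import Counter
from typing import Iterable, Iterator


def fetch_active_findings(region: str = "eu-west-1") -> Iterator[dict]:
    """Yield active Security Hub findings (needs boto3 and AWS credentials).

    The default region is an assumption made for this sketch.
    """
    import boto3  # imported lazily so the summary below works without it

    client = boto3.client("securityhub", region_name=region)
    paginator = client.get_paginator("get_findings")
    filters = {"RecordState": [{"Value": "ACTIVE", "Comparison": "EQUALS"}]}
    for page in paginator.paginate(Filters=filters):
        yield from page["Findings"]


def summarise_by_severity(findings: Iterable[dict]) -> Counter:
    """Count findings per severity label, falling back to UNKNOWN."""
    return Counter(
        finding.get("Severity", {}).get("Label", "UNKNOWN")
        for finding in findings
    )


# Hand-written sample records, not real findings.
sample = [
    {"Id": "finding-1", "Severity": {"Label": "HIGH"}},
    {"Id": "finding-2", "Severity": {"Label": "HIGH"}},
    {"Id": "finding-3", "Severity": {"Label": "LOW"}},
    {"Id": "finding-4"},  # no severity recorded
]
summary = summarise_by_severity(sample)
print(summary.most_common())
```

A snapshot like this is easy to produce. Keeping it over time, and turning it into something people want to look at, is where the extra tooling comes in.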
AWS Config vs Cloud Custodian: let the techies decide
As mentioned before, AWS Security Hub uses a lot of other AWS Services to surface its security findings.
To execute the controls of the various Security Standards it relies on a service called AWS Config.
There is also a different tool that, on the surface, does exactly what AWS Config does in this context: Cloud Custodian.
Cloud Custodian is an open-source tool that allows you, amongst other things, to write detective and corrective controls/policies in YAML and perform actions on a variety of infrastructure platforms and cloud providers. But why go with Cloud Custodian if there is already an AWS-native service?
The answer is simple:
Let the team decide.
We already had a lot of Cloud Custodian knowledge in the team, so it was a lot easier to get started with Cloud Custodian to write custom policies.
The team needs to write the policies and controls. It should be up to them to decide which tools they want to use to accomplish this.
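For readers who have never seen one, the sketch below shows roughly what a simple detective Cloud Custodian policy looks like. Policies are normally written in YAML; the structure is built as a Python dictionary and printed as JSON here only to stay within Python's standard library. The policy name and description are made up for illustration, and this is not one of our actual policies.

```python
import json

# Hypothetical detective policy: report EBS volumes that are not encrypted.
# It has no "actions" section, so it only detects and does not remediate.
policy_file = {
    "policies": [
        {
            "name": "ebs-volumes-unencrypted",  # illustrative name
            "resource": "aws.ebs",
            "description": "Detect EBS volumes without encryption at rest",
            "filters": [{"Encrypted": False}],
        }
    ]
}


def render_policy(policy: dict) -> str:
    """Serialise the policy structure so it can be inspected or saved."""
    return json.dumps(policy, indent=2)


rendered = render_policy(policy_file)
print(rendered)
```

A corrective control would add an `actions` list to the same policy, which is where most of the care and review effort tends to go.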
And so the quest begins…
We set out to overcome the AWS Security Hub shortcomings mentioned before.
First stop: Amazon QuickSight
Amazon QuickSight is AWS’s business intelligence service, which you can hook up to a number of other AWS services to visualise and present their data.
Integrating QuickSight with AWS Security Hub was relatively straightforward, but ultimately we found it quite cumbersome to work with, and it had the potential to become quite expensive if you wanted to store a lot of data in there.
Second stop: Commercial off-the-shelf (COTS) tooling
We performed proofs of concept with 3 commercial off-the-shelf tools, but each one had drawbacks that made it hard to justify the extra financial investment over AWS Security Hub.
Third stop: DIY
One of our engineers decided to write a Node front-end for AWS Security Hub which gave us enough flexibility to display the data in ways that we wanted.
We did end up with a front-end we could tailor to our liking. The downside of this iteration was the extra management overhead it meant for the team. We are a relatively small team, only 4 engineers, so our time is best spent supporting the business in making sure our cloud estate is secure and defended against potential attacks.
Fourth stop: CloudQuery & Metabase
We then finally stumbled on yet another open-source tool called CloudQuery. It positions itself as an open-source, high-performance data integration platform built for developers.
You can see it as an ETL (extract, transform, load) tool that can be tailored to fit a lot of different security use-cases.
With CloudQuery you can extract data from a number of different sources and perform some transformation on that data before sending it to your destination of choice. That destination can be S3, Postgres, BigQuery or Snowflake, just to name a few.
During the PoC phase we started off with a Postgres RDS database and CloudQuery running on an EC2 instance.
CloudQuery performs nightly runs to scrape all metadata from our AWS accounts, including the metadata around services like AWS Security Hub.
To visualise all of this we use Metabase, an open-source business insights tool.
Metabase allows us to combine and visualise a host of different log sources we send its way, using either bog-standard SQL queries or the query builder.
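As an illustration of the kind of bog-standard SQL that sits behind a dashboard card, the sketch below counts failed findings per account and severity. Everything in it is an assumption made for the example: the `findings` table, its columns and the account IDs are made up, and an in-memory SQLite database stands in for the real data store so the snippet runs anywhere.

```python
import sqlite3

# In-memory stand-in for the database that the nightly sync writes to.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (account_id TEXT, severity TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO findings VALUES (?, ?, ?)",
    [
        ("111111111111", "CRITICAL", "FAILED"),
        ("111111111111", "HIGH", "FAILED"),
        ("111111111111", "HIGH", "PASSED"),
        ("222222222222", "HIGH", "FAILED"),
        ("222222222222", "HIGH", "FAILED"),
    ],
)

# The query a dashboard card could run: failed findings per account/severity.
QUERY = """
SELECT account_id, severity, COUNT(*) AS open_findings
FROM findings
WHERE status = 'FAILED'
GROUP BY account_id, severity
ORDER BY account_id, severity
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the data lands in an ordinary SQL database, anyone comfortable with SQL can build their own views without learning a vendor-specific query language.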
We currently have metadata and logs from:
- AWS
- Auth0
- Gitlab
- Azure
- Cloud Custodian
- our EDR tool
- CISA Known Exploited Vulnerabilities (KEV) catalog
being sent into Metabase and ready to be visualised.
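One reason for pulling these sources together is that they become more useful in combination. A minimal sketch of that idea: flagging vulnerability findings whose CVE also appears in the KEV catalog. The `cveID` field under `vulnerabilities` follows the published KEV JSON feed; the shape of the findings (`resource`, `cve`) and the host names are hypothetical, and `CVE-2099-0001` is a made-up placeholder.

```python
from typing import Iterable, List, Set


def kev_cve_ids(kev_catalog: dict) -> Set[str]:
    """Collect the CVE identifiers listed in a KEV catalog document."""
    return {entry["cveID"] for entry in kev_catalog.get("vulnerabilities", [])}


def known_exploited(findings: Iterable[dict], kev_catalog: dict) -> List[dict]:
    """Return only the findings whose CVE appears in the KEV catalog."""
    known = kev_cve_ids(kev_catalog)
    return [finding for finding in findings if finding["cve"] in known]


# Trimmed-down catalog with a single entry, for illustration only.
kev_sample = {"vulnerabilities": [{"cveID": "CVE-2021-44228"}]}

# Hypothetical findings, as they might arrive from a vulnerability scanner.
findings_sample = [
    {"resource": "host-a", "cve": "CVE-2021-44228"},
    {"resource": "host-b", "cve": "CVE-2099-0001"},  # placeholder CVE
]

urgent = known_exploited(findings_sample, kev_sample)
print([finding["resource"] for finding in urgent])
```

In a SQL-backed setup the same idea becomes a join between two tables, which is the sort of question a dashboard can answer once both sources sit in the same place.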
We can create alerts or send out scheduled Metabase reports to Slack channels, so we can serve our customers, for example the Engineering chapter at Dunelm, where they live.
Life was starting to look good.
Fifth stop: Snowflake
We knew we were close to hitting gold with the last iteration, but we were not quite there yet.
The EC2 instance running CloudQuery and Metabase was a single point of failure (SPoF), and the Postgres RDS instance was having an increasingly hard time churning through all the data we were sending its way.
Salvation, for our poor Postgres RDS database at least, came in the form of upgrades to CloudQuery and Metabase, with which both officially started supporting Snowflake.
Snowflake is a data platform which is heavily used across the Data Chapter at Dunelm.
This is where our close ties with some of our Data teams at Dunelm started paying dividends.
They agreed to provision a small data warehouse in Snowflake for us, so we could migrate the data from the Postgres database there.
The other part of this fifth iteration was moving CloudQuery and Metabase off that single EC2 instance. Both are now happily running in Amazon Elastic Container Service (ECS) on Fargate, AWS’s managed container hosting service.
Taking stock
As of August 2023, this is the current state of our CSPM-stack:
- Founded on expertise from the AWS Well-Architected Framework Review (WAFR), the AWS Security Maturity Model and the AWS Security Reference Architecture (SRA)
- All our AWS accounts reporting to AWS Security Hub
- All security standards in AWS Security Hub enabled and monitored
- Various other AWS services, like Amazon GuardDuty, Amazon Inspector and AWS Systems Manager, enabled and producing findings
- Cloud Custodian performing custom detective and corrective controls
- CloudQuery scraping metadata from all our cloud environments
- a Snowflake data warehouse to store all of the above
- Metabase to visualise and report all this data
What’s next
Over the coming year we are looking at the following projects to further enhance our CSPM:
- continue the refactor of our current AWS Organizations layout to align with the AWS Security Reference Architecture
- perform a proof of concept of Amazon Security Lake, a very promising new AWS security service
- further enhance our current stack so we can surface security findings better and faster, whilst keeping the bill lean at the same time
Thank you for taking time out of your busy lives to stop and read this post. I sincerely hope this was helpful to you in any way, shape or form.
If you have any questions around this feel free to reach out!