Linters have a long and enduring history in software. From their origins in the late 1970s to the present, they’ve caught things like programming errors, confusing formatting, unsafe functions, and everything in between. The static analysis approach lends itself to general-purpose templates that are easy to share between projects, making best practices more accessible.
With infrastructure-as-code (IaC) becoming more ubiquitous and mature for cloud infrastructure, linters have made their impact in this area as well. IaC linters range from catching errors like illegal resource types and deprecated syntax to enforcing best practices like tagging conventions. As they continue to mature, specialized security-oriented linters are also becoming more commonplace; preventing the creation of overly permissive IAM policies, firewall rules, and storage bucket policies is getting easier to codify as lint.
The constant flux and churn of production infrastructure ensures that there is always a delta between the intended infrastructure and what’s actually deployed. Whether it’s in response to an outage, maintenance, or developer experimentation, changes to infrastructure can occur via CLI, the vendor’s UI, or some other out-of-band channel. This gap between IaC’s intent and what’s actually deployed is a reality even for teams reaching the asymptote of 100% IaC-controlled infrastructure. We refer to the view of what’s actually live in production as infrastructure-as-deployed, and looking at the world through this prism opens up the door to finding new types of lint.
Our tool Introspector makes it easy to find new categories of lint such as:
- Unused resources (e.g. access keys created but never used)
- Stale resources (e.g. user principals that haven’t logged in for some configurable amount of time)
- Orphaned resources (e.g. security groups no longer attached)
The purpose of cleaning up this lint is to reduce surface area—thereby improving your security posture and reducing the complexity of your infrastructure.
Finding unused IAM groups:
SELECT G.uri, G.groupid, G.groupname, G.createdate, age(G.createdate) AS age
FROM aws_iam_group AS G
WHERE G._id NOT IN (
  SELECT DISTINCT group_id FROM aws_iam_group_user
)
Finding unused elastic IPs:
SELECT A.uri, A.allocationid
FROM aws_ec2_address AS A
WHERE A.associationid IS NULL
Finding users that have never logged in:
SELECT username, createdate, age(createdate) AS age
FROM aws_iam_user
WHERE password_enabled = true
  AND age(createdate) > '3 months'::interval
  AND passwordlastused IS NULL
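The third category above, orphaned resources, can be found the same way. As a sketch only: the table and column names below follow the naming convention of the queries above but are assumptions, so verify them against your snapshot’s actual schema.

```sql
-- Sketch: find security groups not attached to any network interface.
-- aws_ec2_securitygroup and aws_ec2_networkinterface_securitygroup are
-- assumed names modeled on the tables used in the queries above.
SELECT SG.uri, SG.groupid, SG.groupname
FROM aws_ec2_securitygroup AS SG
WHERE SG._id NOT IN (
  SELECT DISTINCT securitygroup_id
  FROM aws_ec2_networkinterface_securitygroup
)
```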
In the next part of this series, we’ll blog about linting your infrastructure for stale resources. Querying resources with a configurable idleness threshold makes it easy to find the needles in the haystack.
Engineering teams are steadily adopting a “cattle, not pets” attitude towards infrastructure. Cloud providers are enabling easy-on, easy-off services. As a result, churn in production deployments has become a fact of life. Engineers have begun to apply the tools of the trade to this problem: infrastructure-as-code (IaC) tools such as Terraform and CloudFormation allow practitioners to express their desired state of the world. Because IaC fits into the paradigm of source code, the same supporting tools are available: IDEs, code review, continuous integration, and continuous delivery. This movement towards structure is helping teams to effectively take advantage of the benefits of the cloud at larger and larger scales.
Describing how your infrastructure should be deployed is, however, only a piece of the puzzle. As companies grow, it becomes more and more important to also understand Infrastructure-as-Deployed.
Infrastructure-as-Deployed is what is actually running, regardless of what you intended to run. And this is where the constant flux can bite you.
So how can we tame this churn? Just as we did with expressing our intent, we can use the tools of the trade. In this case, the relational database. Infrastructure-as-Deployed is snapshotted (via Introspector or other tools) and made available for querying via SQL. Much like IaC enables code review of infrastructure changes, cloud deployments in a database enable the whole universe of existing analysis tools that speak SQL. We can leverage those to replace sampling for compliance with certainty.
But what about the data model? Cloud infrastructure configurations are both relational and graph-based in nature. Network interfaces have a many-to-many relationship with Security Groups, for instance, while the question of which principals have a particular permission involves traversing a graph of groups and policy attachments. Modern databases allow both paradigms to coexist and intermingle, even in the same query. In particular, recursive common table expressions allow traversing graph relationships in a SQL context, with the ability to apply traditional relational algebra at any point in the process.
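As a minimal sketch of the recursive CTE approach: the tables below are assumptions for illustration, not Introspector’s actual schema (role-assumption edges would need to be derived from trust policies), but they show how graph traversal and relational filtering intermingle in one query.

```sql
-- Sketch: find every role transitively assumable from a starting role.
-- role_can_assume(assuming_role_id, assumed_role_id) is a hypothetical
-- edge table; aws_iam_role follows the naming convention used elsewhere.
WITH RECURSIVE reachable(role_id) AS (
  SELECT R._id FROM aws_iam_role AS R WHERE R.rolename = 'developer'
  UNION
  SELECT E.assumed_role_id
  FROM role_can_assume AS E
  JOIN reachable AS RE ON E.assuming_role_id = RE.role_id
)
SELECT R.rolename
FROM aws_iam_role AS R
WHERE R._id IN (SELECT role_id FROM reachable)
```

The `UNION` (rather than `UNION ALL`) deduplicates visited nodes, which also keeps the traversal from looping forever on cyclic trust relationships.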
This power of expression enables writing complex regulatory policies in a way that can be answered by a database engine (see an example assessing AWS Resource Policies here), replacing what was previously a manual assessment. Furthermore, once you pull in multiple data sources, the questions you can ask escape single provider APIs. Pull in your version control provider and query for every unreviewed line of code running in a particular container. Or pull in your org chart, and find out who outside the engineering department has access to an S3 bucket. Using SQL lets you explore your infrastructure-as-deployed with the tools you are already familiar with.
AWS’ policy language is notoriously challenging. As you build out your infrastructure, you commonly run into situations where two components ought to be able to communicate, but can’t. In an attempt to unstick your development progress, you reach for progressively larger and larger hammers as you broaden the permissions in your policies. You promise yourself that once everything is working, you will come back and lock things down to just what is necessary. The accumulation of this type of technical debt is a common cost of product development.
Avoiding the predictable conclusion of this scenario is a matter of visibility. If you can see the problem, it’s easier to prioritize fixing it. Several tools exist to help with assessing IAM policies. AWS has Access Analyzer. Cloudsplaining is also a good starting point for assessing your exposure. Today, we’re adding to this mix with rpCheckup. rpCheckup covers resource policies specifically, looking for outside access to your resources. This is what it looks like, run against an account exploited by Endgame:
Any resources that show up as externally accessible or public ought to be recognizable to you. Some examples include intentionally public buckets and roles used by 3rd party vendors. For example, if you are a Gold Fig customer, you will see an IAM Role that allows access from Gold Fig’s account. Things to watch out for include resources that are unintentionally public, like an SNS Topic or SQS Queue, and access by accounts that you don’t expect.
We hope this will help teams follow through on their intentions to properly secure their infrastructure, along with other tools in the ecosystem.
Gold Fig can help you with your resource policies, IAM policies, and more.
If you imagine your organization as a sea-faring vessel, infosec’s goal is to ensure the boat can survive krakens or cannon-wielding pirates and successfully complete its journey. If you ignore the existence of sea terrors, you may not make it to your destination unless Poseidon grants you merciful passage. If you prioritize defense above your vessel’s mission, you will find yourself aboard a battleship that is entirely inadequate for transporting revenue-generating cargo. — On YOLOsec and FOMOsec, Kelly Shortridge
Startups are all about focusing on the right thing at the right time. Juggling everything through the fog of product development, managing your runway, and growing a team are tough on their own. Unless it’s a primary piece of your product offering1, security is rarely prioritized in the early days of a startup. Contemporary startups do benefit from accumulated best practices becoming more commonplace and accessible: appsec issues get caught in code reviews, infrastructure providers bias toward secure defaults, engineers are accustomed to using things like password managers and MFA apps. Beyond that, however, founders in search of product-market fit do not have the cycles to focus on infosec. It’s a type of technical debt that is accrued while focus is elsewhere.
As your traction begins to grow, paying down technical debt becomes a recurring focus for the team. This typically takes the form of application, infrastructure, and ops related debt. Preventing the CEO from accidentally deleting the production database is a more immediate threat than a targeted attack. Improving your processes to prevent shooting yourself in the foot will pay immediate dividends. Solving your startup’s problems around self-incurred outages and data loss is more pressing than infosec.
Security offers no positive benefit on its own — the best outcome you can expect, and should actually aim for, is a reduction of negative impact. Your product or customer experience is not directly improved by an increased security posture. However, the amount of downside is unknown and potentially large. This is what will start keeping the founders up at night and eroding their peace of mind. It all changes when there’s something at stake. Reputation. Customer trust. Reliability. Anything that’ll erode that hard-earned product-market fit. Any bad press that’ll reduce the slope of your week-over-week growth.
This is when a startup should start prioritizing infosec.
The Startup Curve - Overlaid with paying down tech debt and when to start thinking about infosec
For all the fear-mongering related to security that’s out there, even for well-established companies, security’s priority with respect to product can be a tricky thing to pin down. Is it just another sign-off like legal review? Was it just bolted on because an enterprise sale necessitated it? The earlier your startup weaves infosec into the engineering culture, the longer head start you have in paying down security-related technical debt. The dividends you get from this yield a resilient engineering organization which treats security as a partner in building the product and not an impediment.
- How Early-Stage Startups Can Enlist The Right Amount of Security As They Grow
- A Comprehensive Guide to Security for Startups
- A Startup's Guide to Implementing a Security Program
1 Some other scenarios where security is an early priority for a startup: 1) Product-mandated security considerations: it’s in the value prop of the product, or mandated by a vendor (e.g. using the GMail API requires an external security audit); 2) Externally mandated security considerations: government or industry regulatory requirements, with product penalties for non-compliance (e.g. limits on loan origination volume if your banking startup fails external security audits); 3) Customer-mandated considerations: AWS GovCloud, SOC2, etc. — forced by the need to acquire specific customers and/or drive sales.
How do engineers make the seemingly-obvious mistake of opening their infrastructure to the world? Usually, with the best of intentions. When you’re building out your infrastructure, you tend to accept the first set of permissions that makes things “just work”. I just need this lambda to talk to that database. I just need to read files from that bucket. And quickly.
Yes, maybe now your Lambda is a bit over-provisioned and it could overwrite the data in that S3 bucket, but you wrote the Lambda, and it doesn’t do that. All good. Except when it isn’t. Except when you opened some resource to everyone with an AWS account, instead of everyone in your AWS account. Security misconfigurations aren’t like the other bugs in your application. They don’t break functionality, and usually, customers don’t notice. You probably don’t have an integration test that fails if it can successfully publish to your SNS topic from the wrong AWS account.
Being a responsible engineer, you set out to rectify the problem. You peruse blog posts and look up standards. You determine that you need to enforce least privilege, follow the Swiss cheese model, and enable network flow logs. Your ship date slips, a lot. It’s easy to go overboard.
What’s missing is a prioritized list of basic checks and settings. Just like launching without every feature built, you don’t need every security principle fulfilled to the highest level. With that in mind, here’s our short list for when you’re starting out:
Secure your perimeter. You should know, off the top of your head, every resource that is public. It’s probably a short list. A load balancer, an API gateway, or an EC2 instance. Maybe an S3 bucket, or maybe a CDN in front of one. If it’s not a short list, determine how to shorten it. Use this list to conduct a quick audit of your resources. Is it public? It had better be on the list. Otherwise, make sure it’s private.
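If you have a snapshot of your infrastructure in a database, that quick audit can be a query. This is a sketch only: the table and column names are assumptions modeled on the naming convention in the earlier queries, so check them against your actual schema.

```sql
-- Sketch: enumerate EC2 instances with a public IP. Every row returned
-- should appear on your known-public list; anything else needs attention.
SELECT I.uri, I.publicipaddress
FROM aws_ec2_instance AS I
WHERE I.publicipaddress IS NOT NULL
```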
Secure your credentials. Know which humans have admin access. Use an IAM Group for this. It should be trivial to look up this information. Ensure they all use multi-factor authentication. Know which third parties you’ve given credentials to. Remove the ones you no longer use. Delete any user API credentials that are not needed.
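The MFA check can also be codified as lint. The `password_enabled` column appears in the earlier queries; `mfa_active` is assumed here, following the AWS credential report field of the same name, so verify it exists in your schema.

```sql
-- Sketch: console users (password enabled) who have not set up MFA.
SELECT username, createdate
FROM aws_iam_user
WHERE password_enabled = true
  AND mfa_active = false
```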
Turn on multi-region CloudTrail. Start building your audit log. You might not need it anytime soon, but someday you’ll be glad you had it enabled.
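Whether every trail is actually multi-region is itself queryable. The names below are assumptions modeled on the CloudTrail API’s Trail attributes, not a confirmed schema.

```sql
-- Sketch: flag trails that are not multi-region.
SELECT T.uri, T.name
FROM aws_cloudtrail_trail AS T
WHERE T.ismultiregiontrail = false
```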
These steps will largely keep you from falling victim to automated sweeps for infrastructure mistakes. As you grow your team and accumulate data from running your product, your needs will change. The threats will change. You will read about defense-in-depth, and about the perils of a hard exterior shell with a gooey center, and the importance of intrusion detection and auditing. There will be encryption-at-rest, log verification, and IAM access analysis. All of these things are important, but what is often unstated is that they only matter if you have done the basics. Security features must be layered on a foundation, or else they will only end up causing headaches for little benefit. Do the basics first.
Gold Fig can help with the basics, and beyond! Talk to us about getting an assessment of the next steps to take, tailored to the stage of your company.
No one doubts that security is important for cloud infrastructure. The potential for harm to your business, your customers, and your reputation is real, and that potential increases with your business’ success. And yet, customers will not reward you for weeks spent locking each credential down to the barest minimum of permissions. You will not increase your site traffic numbers by meticulously applying network segmentation to your cloud environment. Your product doesn’t become more useful if everyone on your team has MFA enabled.
So, should you do these things? Well, it depends (except for MFA: do that). It depends on who you are. It depends on what you have at stake. And it depends on what the threat is. Finding and fixing what really matters to stay secure is not one-size-fits-all.
Who are you? Are you a single developer, or small shop? Broad permissions for you and your team are probably ok. These should still be applied at a group level, rather than to an individual, but you probably don’t need to think too hard about limits yet.
On the other hand, if you don’t personally know everyone with access to your account, it’s past time to start applying some stricter grouping and permissioning.
What do you have at stake? Do you avoid collecting Personally Identifying Information? Do you avoid hosting content uploaded by users? You may not need a full audit trail for every data access in your system.
On the other hand, if you host sensitive information, verifiable logging starts to look like a pretty good idea. Effort spent ensuring your data is encrypted at every stage is probably worth it.
What is the threat? Are you a smaller or mostly unknown business? Your biggest risk is probably from automated scans and phishing attacks. Keep your buckets and credentials private, keep your firewall locked down, and enable multi-factor authentication.
On the other hand, if you are worried about targeted attacks, you’ll need more serious measures. Intrusion detection and limiting blast radius become requirements, rather than distractions.
At Gold Fig, we think it’s important to understand your situation before making a security assessment. Presenting a red wall of security failures guarantees that nothing will be addressed. Prioritization matters. Prioritization means more than just attaching a severity score to each security check in a scan. Once you’ve closed gaping holes, ROI becomes a major driver in the discussion. Meaningful security improvements come from matching a company’s current stage to a handful of immediate steps to make. Peace of mind for your infrastructure team comes from building this process into your routine.
Want help prioritizing your security projects? Talk to us!