Once upon a time you needed to hand-tune and architect infrastructure to serve static assets performantly at scale. Now, getting started with a CDN or storage buckets is as easy as can be; many services abstract such concerns away altogether.
Once upon a time you needed to know the details of distributed consensus algorithms to build out a scalable and fault-tolerant database system. Now, all the major cloud providers offer database services that any startup can utilize at the click of a console button.
Burdening those who wanted these benefits with learning how to do it all themselves, or with building that expertise in-house, was the wrong way to go about things. Making these capabilities easy to use and attain was the right way to increase adoption.
Every piece of cloud infrastructure has security-related surface area.
Devops tools benefit from the stratification that occurs in the toolchain — each layer is a prime candidate for simplification and ease of use. You plug the tool into your workflow and it just works. Fast moving teams have high expectations of their tools and are at a competitive advantage if their engineering teams don’t need to reinvent the wheel.
In 2021 these are minimums that effective engineering teams have come to expect. Shifts in abstraction continue to make product development easier and faster.
Security, on the other hand, hasn’t made the same strides. Further exacerbating the situation for startups is that the hurdle of security squarely falls in the bucket of tech-debt. It is only an impediment — with zero product benefit for end-users (of course, unless, the product itself is security focused).
At some inflection point stakeholders begin to realize that it’s time to start paying down security-related tech-debt.
The industry expects developers to broadly and deeply educate themselves about security and all the technical details that come with it. Furthermore, it doesn’t fit neatly in the devops paradigm.
Do dependency confusion attacks fit within the build system or is that a deployment concern?
Is the security of an expiring domain name the responsibility of an engineer or the person managing the credit card that’s expired?
Whose responsibility is it if someone has been temporarily added as an external contributor to a git repo?
What happens if an SRE needed to make an emergency change outside of IaC directly to a cloud console in response to an outage? Where is the drift tracked?
Asking teams to shift left is an antiquated way to reason about security in a modern cloud-native context. The problem is not that developers are isolated from security concerns.
Of course all teams and stakeholders want to be doing the right thing. If presented with a choice, any reasonable engineer would pick the more secure option. However, they aren’t presented with an easy-to-digest, actionable route. After a startup founder has made a slew of changes, all they want to know is what’s wrong and how to change it to the sensibly secure option. Some complicated severity-and-priority matrix with a link to a CVE isn’t helpful. They need to be presented with easy, actionable guidance. Expecting them to act upon esoteric advice and become experts in all things security is a recipe for them to continue to ignore it and accumulate security tech debt.
Turning the gain up on getting people to care more about security is the wrong approach. It needs to be made easier across the board. Expecting users to grant all permissions and then gradually ratchet them down has been shown time and again to be insufficient. Services need to provide engineers an easy path from a default state of no permissions up to the required permissions for a feature to work. Time spent blazing this path is far more valuable than educating developers to care more about security; they already do.
Unless you’re an extremely high value target, advanced persistent threats and exotic attacks aren’t a startup’s concern. Diligence around securing the basics of your cloud infrastructure settings is the highest ROI engineering activity you can undertake in the early days. Paying down technical debt around security can seem intractable given infosec’s seemingly never ending surface area. A startup needs to do the basics across a broad range of concerns.
In our continuing series, here are some easy wins you can do to stay secure. (Previously: Do the basics, Mediocre persistent threats: infosec basics for startups (part 2))
Encrypt all of your employees’ company-issued hard disks
When the team is small, it’s easy to get everyone to flip the setting that encrypts their local disk. It’s widely supported on modern computers and there isn’t any noticeable performance impact. Bonus: Use an MDM like Jamf to also ensure laptops are up to date and patched.
Use SSO where possible and tie all services your team uses to those credentials (force MFA across the org)
Use GSuite or an equivalent to get your employees into the systems they use. This not only makes it easy to onboard and offboard team members, but also leverages the provider’s security infrastructure for authentication.
Corollary: Do not use shared logins for any services. They end up being a nightmare to track who has access to what, rotating credentials is a pain, and identifying who used the account to do something is a game of guessing timestamps and IPs.
Bonus: For other important accounts that do not support SSO, make sure the team is using a password manager that can generate and handle this for them.
Centralize logging and set up basic alerting
As you start making strides toward a stronger security posture, you’ll be glad you set up logging and alerts. Centralize your logs and plug into infrastructure logs (GCP’s Cloud Audit Logs, AWS’ CloudTrail, GitHub’s audit log) as well.
Corollary: If an engineer needs to log in to an instance or service to debug logs, the logging hasn’t been sufficiently centralized.
Bonus: Deliver these logs to isolated accounts that don’t share surface area with your application. For example, AWS makes it easy to emit CloudTrail from multiple accounts into a single one. This isolated account can have extremely limited access, users, and exposure.
Secure your backups and ensure their fidelity
This one is highly specific to your application and infrastructure. In general, ensure that access to backups is held to the same bar or higher as access to the source data. Having historic data all sitting together in one place is a treasure if found. Data backups are a perfect candidate for encryption at rest. Keep encryption and decryption keys separate. Verify the integrity of restoration procedures and have them checked automatically. Further reading.
Stay tuned for more posts in this series where we’ll continue to enumerate other areas startups can get easy wins toward their infosec posture.
Most “start here” for infosec guides begin with an exercise in assessing and enumerating the threats and risks. However, for most startups there isn’t a clear answer here. Unless you’re a high-value target attracting focused attacks the main threat to your company and products is the ambient background noise on the internet: port scans, dorks, or vulnerabilities in widely deployed software. While one specific actor might be interested in trawling for some narrow niche, taken in aggregate all of this background noise sums to a collective threat. A startup needs to do just the basics to stay ahead of it all.
In our continuing series, here are some easy wins you can do to keep your startup secure. (Previously: Do the basics)
- 2FA/MFA every service that your team interacts with
Ensure multi-factor auth is turned on for all your users across all of your services—your cloud providers (AWS, GCP, Azure), your source control (GitHub, GitLab), and your account systems (GSuite, Rippling, Office 365).
- Delete old users and stale access keys
As team members, contractors, and collaborators come and go be sure to delete old accounts and access keys. This is especially important in your private code repos where collaborators might be added in a one-off manner that isn’t visible at the organization level.
- Keep widely deployed software (Jenkins, Grafana, Jupyter, etc) up to date and optionally behind a VPN
Keeping your deployed software up to date is extremely important. Security updates are a frequent occurrence in all widely deployed open source projects. If your endpoints are available on the public internet, it’s crucial to keep them updated. If you have the cycles to trade off some convenience, put them all behind a VPN.
- Firewall off and close all ports that don’t need to be on the public internet
Modern providers have made strides toward sensible, secure defaults for many services (e.g. SSH requires key exchange, with password logins disabled). If you have other services open to the internet, such as a database port, RPC endpoints, or other administrative interfaces, they are susceptible to brute-forcing.
Cloud and other B2B infrastructure providers are notorious for creating a never-ending stream of new buzzwords and acronyms. As things progress, the marketing speak muddies where each term fits and who it applies to. However, as with most such terms, they start with a kernel of genuine good faith: the intent to communicate a niche offering.
Cloud security posture management (CSPM) is a relatively new acronym in the secops space. In a nutshell, CSPM can be described as an offering that:
- Snapshots your cloud settings and other cloud metadata
- Checks for misconfigurations with a focus on security
That’s all it is in its narrowest incarnation: a point-in-time analysis of your cloud infrastructure. Common issues that a CSPM tool can surface include:
- Inconsistent MFA settings across your users
- Misconfigured or test firewall/security groups left attached in production
- Overly permissive IAM policies that can allow for privilege escalation
- Configuration deviations across your source control repos such as code-review settings
- Asset and inventory analysis across all of your accounts, regions, and providers
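As a concrete (and hedged) example, the MFA check above might be expressed as a query like the following. This is a sketch only: the mfa_active column is an assumption borrowed from AWS’s credential report, not a documented Introspector schema.

```sql
-- Console users without an MFA device enabled.
-- The mfa_active column is an assumption modeled on AWS's credential report.
SELECT username, createdate
FROM aws_iam_user
WHERE password_enabled = true
  AND mfa_active = false
```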
Providers further overload CSPM to sell more things like auto-remediation, continuous monitoring, or pre-packaged compliance and regulatory checks. However, these areas start to overlap with other teams’ responsibilities covered by SIEM tools, compliance/regulatory reports, and IaC static analysis. In true software fashion, there isn’t a one-size-fits-all approach; it’s completely normal to see overlap between these areas of concern.
At Gold Fig we believe that CSPM is a specialization of infrastructure-as-deployed. Not only can a CSPM tool such as Introspector answer questions about security and compliance, but having a full inventory of your cloud provider’s settings, configurations, and the relationships between them all gives infrastructure teams the tools to consolidate one-off code that lives in brittle bash scripts and the like.
SIEM tools are focused on the logs and events through the prism of security. Compliance tools are focused on items that fall squarely into external requirements. CSPM tools let your team encode your team’s own internal priorities.
In our last blog post we introduced the concept of linting through the prism of infrastructure-as-deployed and went over three simple example queries. Continuing our series on linting your cloud infrastructure, we’ll go over three new queries of successively increasing complexity.
A common source of infrastructure lint, one that cascades into additional cruft, is engineers writing one-off Python/Go or bash/jq to get answers about their infrastructure. Whether it ends up as throwaway code or kept running, maintaining these scripts creates unnecessary toil and busywork that modern teams really ought not to deal with. Further exacerbating the situation is the fact that as teams shift to multi-account layouts within their organizations, correctly iterating through all accounts and regions while juggling credentials can be tricky. Finding infrastructure lint shouldn’t incur more technical debt just to get to the point of asking the question.
Just like osquery made it easy to ask questions that were previously squirreled away in brittle scripts, Introspector makes it easy to ask questions about your infrastructure.
We’ll look at some use cases and example Introspector queries we’ve seen our customers ask of their infrastructure that previously lived in one-off Go programs or bash-scripted AWS CLI calls strung together.
- As a team we have a policy that users who have not logged in to the console in over 3 months get their credentials disabled until they need them again. Give me a list of all idle users.
- As a team we have a policy that states all records that point to internal resources must be of type ALIAS; to support our migration of our Route53 config over to Terraform, give me a list of all A records that point to RFC1918/private network IP addresses.
- As a team we have a policy to disable access keys that haven’t been used in over 3 months. Give me a list of all users who have active access keys that are stale. Note that AWS users can have up to two access keys.
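The third policy might translate to a query along these lines. Treat this as a sketch: the access-key columns are assumed to mirror AWS’s credential report and may not match Introspector’s actual schema.

```sql
-- Users with an active access key that is stale (unused for 3+ months).
-- AWS allows up to two access keys per user, so both slots are checked.
-- Column names are assumptions modeled on AWS's credential report.
SELECT username
FROM aws_iam_user
WHERE (access_key_1_active = true
       AND (access_key_1_last_used_date IS NULL
            OR age(access_key_1_last_used_date) > '3 months'::interval))
   OR (access_key_2_active = true
       AND (access_key_2_last_used_date IS NULL
            OR age(access_key_2_last_used_date) > '3 months'::interval))
```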
While these SQL queries appear lengthy, teams get the benefit of deprecating their brittle one-off scripts. As more and more tools are shifted to structured configuration systems like Introspector teams can have a centrally maintainable place for asking questions about their infrastructure settings and configs.
Linters have a long and enduring history in software. From their origins in the late 1970s to the present, they’ve caught things like programming errors, confusing formatting, unsafe functions, and everything in between. The static analysis approach lends itself to general-purpose templates that are easy to share between projects, making best practices more accessible.
With infrastructure-as-code (IaC) becoming more ubiquitous and mature for cloud infrastructure, linters have made their impact in this area as well. IaC linters range from catching errors like illegal resource types, deprecated syntax, to enforcing best practices like tagging conventions. As they continue to mature, specialized security oriented linters are also becoming more commonplace; preventing the creation of overly permissive IAM policies, firewall rules, and storage bucket policies are getting easier to codify as lint.
The constant flux and churn of production infrastructure ensures that there is always a delta between the intended infrastructure and what’s actually deployed. Whether it’s in response to an outage, maintenance, or developer experimentation, changes to infrastructure can occur via the CLI, the vendor’s UI, or some other out-of-band channel. This gap between IaC’s intent and what’s actually deployed is a reality even for teams approaching the asymptote of 100% IaC-controlled infrastructure. We refer to the view of what’s actually live in production as infrastructure-as-deployed, and viewing the world through this prism opens the door to finding new types of lint.
Our tool Introspector makes it easy to find new categories of lint such as:
- Unused resources (e.g. access keys created but never used)
- Stale resources (e.g. user principals that haven’t logged in for some configurable amount of time)
- Orphaned resources (e.g. security groups no longer attached)
The purpose of cleaning up this lint is to reduce surface area—thereby improving your security posture and reducing complexity of your infrastructure.
Finding unused IAM groups:
SELECT G.uri, G.groupid, G.groupname, G.createdate, age(G.createdate) AS age
FROM aws_iam_group AS G
WHERE G._id NOT IN (
  SELECT DISTINCT(group_id) FROM aws_iam_group_user
)
Finding unused elastic IPs:
SELECT A.uri, A.allocationid
FROM aws_ec2_address AS A
WHERE A.associationid IS NULL
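Orphaned security groups (mentioned above) can be found with a similar shape of query. This is a sketch; the join table name here is an assumption in the same style, not verified Introspector schema:

```sql
-- Security groups not attached to any network interface.
-- The join table name is assumed for illustration.
SELECT SG.uri, SG.groupid, SG.groupname
FROM aws_ec2_securitygroup AS SG
WHERE SG._id NOT IN (
  SELECT DISTINCT(securitygroup_id) FROM aws_ec2_networkinterface_securitygroup
)
```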
Finding users that have never logged in:
SELECT username, createdate, age(createdate) AS age
FROM aws_iam_user
WHERE password_enabled = true
  AND age(createdate) > '3 months'::interval
  AND passwordlastused IS NULL
In the next part in this series, we’ll blog about linting your infrastructure for stale resources. Querying resources with a parameter around how long it has been idle makes it easy to find the needles in the haystack.
Update: Continue reading part 2, Linting cloud infrastructure
Engineering teams are steadily adopting a “cattle, not pets” attitude towards infrastructure. Cloud providers are enabling easy-on, easy-off services. As a result, churn in production deployments has become a fact of life. Engineers have begun to apply the tools of the trade to this problem: infrastructure-as-code (IaC) tools such as Terraform and CloudFormation allow practitioners to express their desired state of the world. Because IaC fits into the paradigm of source code, the same supporting tools are available: IDEs, code review, continuous integration, and continuous delivery. This movement towards structure is helping teams to effectively take advantage of the benefits of the cloud at larger and larger scales.
Describing how your infrastructure should be deployed is, however, only a piece of the puzzle. As companies grow, it becomes more and more important to also understand Infrastructure-as-Deployed.
Infrastructure-as-Deployed is what is actually running, regardless of what you intended to run. And this is where the constant flux can bite you.
So how can we tame this churn? Just as we did with expressing our intent, we can use the tools of the trade. In this case, the relational database. Infrastructure-as-Deployed is snapshotted (via Introspector or other tools) and made available for querying via SQL. Much like IaC enables code review of infrastructure changes, cloud deployments in a database enable the whole universe of existing analysis tools that speak SQL. We can leverage those to replace sampling for compliance with certainty.
But what about the data model? Cloud infrastructure configurations are both relational and graph-based in nature. Network interfaces have a many-to-many relationship with Security Groups, for instance, while the question of which principals have a particular permission involves traversing a graph of groups and policy attachments. Modern databases allow both paradigms to coexist and intermingle, even in the same query. In particular, recursive common table expressions allow traversing graph relationships in a SQL context, with the ability to apply traditional relational algebra at any point in the process.
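As a sketch of the recursive-CTE approach (the table and column names below are illustrative, not Introspector’s actual schema), finding every policy a user inherits through group-style memberships might look like:

```sql
-- Walk from a user through membership edges, then join to policy attachments.
-- Table and column names are illustrative only.
WITH RECURSIVE reachable(principal_id) AS (
    SELECT user_id FROM aws_iam_user WHERE username = 'alice'
  UNION
    SELECT m.group_id
    FROM aws_iam_group_membership AS m
    JOIN reachable AS r ON m.member_id = r.principal_id
)
SELECT p.policy_arn
FROM aws_iam_policy_attachment AS p
JOIN reachable AS r ON p.principal_id = r.principal_id
```

The recursive step is what lets ordinary relational filters (a WHERE clause, a join against another table) be applied at any depth of the traversal.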
This power of expression enables writing complex regulatory policies in a way that can be answered by a database engine (see an example assessing AWS Resource Policies here), replacing what was previously a manual assessment. Furthermore, once you pull in multiple data sources, the questions you can ask escape single provider APIs. Pull in your version control provider and query for every unreviewed line of code running in a particular container. Or pull in your org chart, and find out who outside the engineering department has access to an S3 bucket. Using SQL lets you explore your infrastructure-as-deployed with the tools you are already familiar with.
AWS’ policy language is notoriously challenging. As you build out your infrastructure, you commonly run into situations where two components ought to be able to communicate, but can’t. In an attempt to unstick your development progress, you reach for progressively larger and larger hammers as you broaden the permissions in your policies. You promise yourself that once everything is working, you will come back and lock things down to just what is necessary. The accumulation of this type of technical debt is a common cost of product development.
Avoiding the predictable conclusion of this scenario is a matter of visibility. If you can see the problem, it’s easier to prioritize fixing it. Several tools exist to help with assessing IAM policies. AWS has Access Analyzer. Cloudsplaining is also a good starting point for assessing your exposure. Today, we’re adding to this mix with rpCheckup. rpCheckup covers resource policies specifically, looking for outside access to your resources. This is what it looks like, run against an account exploited by Endgame:
Any resources that show up as externally accessible or public ought to be recognizable to you. Some examples include intentionally public buckets and roles used by 3rd party vendors. For example, if you are a Gold Fig customer, you will see an IAM Role that allows access from Gold Fig’s account. Things to watch out for include resources that are unintentionally public, like an SNS Topic or SQS Queue, and access by accounts that you don’t expect.
We hope this will help teams follow through on their intentions to properly secure their infrastructure, along with other tools in the ecosystem.
Gold Fig can help you with your resource policies, IAM policies, and more.
If you imagine your organization as a sea-faring vessel, infosec’s goal is to ensure the boat can survive krakens or canon-wielding pirates and successfully complete its journey. If you ignore the existence of sea terrors, you may not make it to your destination unless Poseidon grants you merciful passage. If you prioritize defense above your vessel’s mission, you will find yourself aboard a battleship that is entirely inadequate for transporting revenue-generating cargo. — On YOLOsec and FOMOsec, Kelly Shortridge
Startups are all about focusing on the right thing at the right time. Juggling everything through the fog of product development, managing your runway, and growing a team are tough on their own. Unless it’s a primary piece of your product offering1, security is rarely prioritized in the early days of a startup. Contemporary startups have the benefit of the accumulation of best practices becoming more commonplace and accessible: appsec best practices get caught in code reviews, infrastructure providers bias toward secure defaults, engineers are accustomed to using things like password managers and MFA apps. However, beyond that, founders in search of product-market-fit do not have the cycles to focus on infosec. It’s a type of technical debt that is accrued while focus is elsewhere.
As your traction begins to grow, paying down technical debt becomes a recurring focus for the team. This typically takes the form of application, infrastructure, and ops related debt. Preventing the CEO from accidentally deleting the production database is a more immediate threat than a targeted attack. Improving your processes to prevent shooting yourself in the foot will pay immediate dividends. Solving your startup’s problems around self-incurred outages and data loss are more pressing than infosec.
There is no positive benefit when it comes to security — the best outcome you can expect and actually aim to get is a reduction of negative impact. Your product or customer experience is not directly improved by an increased security posture. However, the amount of downside is unknown and potentially large. This is what will start keeping the founders up at night. Peace of mind. It all changes when there’s now something at stake. Reputation. Customer trust. Reliability. Anything that’ll erode that hard earned product market fit. Any bad press that’ll reduce the slope of your week over week growth.
This is the right time a startup should start prioritizing infosec.
The Startup Curve - Overlaid with paying down tech debt and when to start thinking about infosec
For all the fear-mongering related to security that’s out there, even for well-established companies, security’s priority with respect to product can be a tricky thing to pin down. Is it just another sign off like legal review? Was it just bolted on because an enterprise sale necessitated it? The earlier your startup weaves infosec into the engineering culture, the longer head start you have in paying down security related technical debt. The dividends you get from this yields a resilient engineering organization which treats security as a partner in building the product and not an impediment.
- How Early-Stage Startups Can Enlist The Right Amount of Security As They Grow
- A Comprehensive Guide to Security for Startups
- A Startups Guide to Implementing a Security Program
1 Some other scenarios where security is an early priority for a startup: 1) Product-mandated security considerations: it’s in the value prop of the product, or mandated by a vendor (e.g. using the Gmail API requires an external security audit); 2) Externally mandated security considerations: government or industry regulatory requirements, with product penalties if you are non-compliant (e.g. limits on loan origination if your banking startup fails external security audits); 3) Customer-mandated considerations: AWS GovCloud, SOC 2, etc., forced by the need to acquire specific customers and/or drive sales.
How do engineers make the seemingly-obvious mistake of opening their infrastructure to the world? Usually, with the best of intentions. When you’re building out your infrastructure, you tend to accept the first set of permissions that makes things “just work”. I just need this lambda to talk to that database. I just need to read files from that bucket. And quickly.
Yes, maybe now your Lambda is a bit over-provisioned and it could overwrite the data in that S3 bucket, but you wrote the Lambda, and it doesn’t do that. All good. Except when it isn’t. Except when you opened some resource to everyone with an AWS account, instead of everyone in your AWS account. Security misconfigurations aren’t like the other bugs in your application. They don’t break functionality, and usually, customers don’t notice. You probably don’t have an integration test that fails if it can successfully publish to your SNS topic from the wrong AWS account.
Being a responsible engineer, you set out to rectify the problem. You peruse blog posts and look up standards. You determine that you need to enforce least privilege, follow the Swiss cheese model, and enable network flow logs. Your ship date slips, a lot. It’s easy to go overboard.
What’s missing is a prioritized list of basic checks and settings. Just like launching without every feature built, you don’t need every security principle fulfilled to the highest level. With that in mind, here’s our short list for when you’re starting out:
Secure your perimeter. You should know, off the top of your head, every resource that is public. It’s probably a short list. A load balancer, an api gateway, or an EC2 instance. Maybe an S3 bucket, or maybe a CDN in front of one. If it’s not a short list, determine how to shorten it. Use this list to conduct a quick audit of your resources. Is it public? It had better be on the list. Otherwise, make sure it’s private.
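One way to make this audit repeatable is to query a snapshot of your deployed configuration. As a rough sketch, with purely illustrative table and column names rather than any tool’s documented schema:

```sql
-- Firewall rules open to the entire internet: each one should map to an
-- entry on your known-public list. Names here are illustrative only.
SELECT group_id, group_name, from_port, to_port
FROM ec2_security_group_rules
WHERE cidr_ipv4 = '0.0.0.0/0'
```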
Secure your credentials. Know which humans have admin access. Use an IAM Group for this. It should be trivial to look up this information. Ensure they all use multi-factor authentication. Know which third parties you’ve given credentials to. Remove the ones you no longer use. Delete any user API credentials that are not needed.
Turn on multi-region CloudTrail. Start building your audit log. You might not need it anytime soon, but someday you’ll be glad you had it enabled.
These steps will largely keep you from falling victim to automated sweeps for infrastructure mistakes. As you grow your team and accumulate data from running your product, your needs will change. The threats will change. You will read about defense-in-depth, and about the perils of a hard exterior shell with a gooey center, and the importance of intrusion detection and auditing. There will be encryption-at-rest, log verification, and IAM access analysis. All of these things are important, but what is often unstated is that they only matter if you have done the basics. Security features must be layered on a foundation, or else they will only end up causing headaches for little benefit. Do the basics first.
Gold Fig can help with the basics, and beyond! Talk to us about getting an assessment of the next steps to take, tailored to the stage of your company.
No one doubts that security is important for cloud infrastructure. The potential for harm to your business, your customers, and your reputation is real, and that potential increases with your business’ success. And yet, customers will not reward you for weeks spent locking each credential down to the barest minimum of permissions. You will not increase your site traffic numbers by meticulously applying network segmentation to your cloud environment. Your product doesn’t become more useful if everyone on your team has MFA enabled.
So, should you do these things? Well, it depends (except for MFA: do that). It depends on who you are. It depends on what you have at stake. And it depends on what the threat is. Finding and fixing what really matters to stay secure is not one-size-fits-all.
Who Are You? Are you a single developer, or small shop? Broad permissions for you and your team are probably ok. These should still be applied at a group level, rather than to an individual, but you probably don’t need to think too hard about limits yet.
On the other hand, if you don’t personally know everyone with access to your account, it’s past time to start applying some stricter grouping and permissioning.
What do you have at stake? Do you avoid collecting Personally Identifying Information? Do you avoid hosting content uploaded by users? You may not need a full audit trail for every data access in your system.
On the other hand, if you host sensitive information, verifiable logging starts to look like a pretty good idea. Effort spent ensuring your data is encrypted at every stage is probably worth it.
What is the threat? Are you a smaller or mostly unknown business? Your biggest risk is probably from automated scans and phishing attacks. Keep your buckets and credentials private, keep your firewall locked down, and enable multi-factor authentication.
On the other hand, if you are worried about targeted attacks, you’ll need more serious measures. Intrusion detection and limiting blast radius become requirements, rather than distractions.
At Gold Fig, we think it’s important to understand your situation before making a security assessment. Presenting a red wall of security failures guarantees that nothing will be addressed. Prioritization matters. Prioritization means more than just attaching a severity score to each security check in a scan. Once you’ve closed gaping holes, ROI becomes a major driver in the discussion. Meaningful security improvements come from matching a company’s current stage to a handful of immediate steps to make. Peace of mind for your infrastructure team comes from building this process into your routine.
Want help prioritizing your security projects? Talk to us!