The Next-Gen Cyber Range: Bringing Incident Response Exercises to the Cloud

Author: Matthew Dobbs

At IBM X-Force, we keep our customers on the cutting edge with cybersecurity experiences centered on incident response, including response in cloud native environments.

What is cloud native? It is a concept that grew out of the rapid transition to cloud computing, and it represents a fundamental shift in how applications and infrastructure are built, deployed and secured. The driving forces behind cloud native are the popularity of Kubernetes, the fast-growing open-source container orchestration system, and the Cloud Native Computing Foundation (CNCF). The CNCF serves as the vendor-neutral home for many of the fastest-growing open-source projects, including Kubernetes.

At IBM, we define cloud native applications and infrastructure as those that “consist of discrete, reusable components known as microservices that are designed to integrate into any cloud environment. These microservices act as building blocks and are often packaged in containers. Microservices work together as a whole to comprise an application, yet each can be independently scaled, continuously improved and quickly iterated through automation and orchestration processes.”

With cloud native becoming a key part of many organizations’ IT infrastructure planning, we recognize all related capabilities need to extend into cloud native. That includes detecting and responding to incidents that can impact these environments.

From Physical to Virtual to Cloud Native

The concept of the cyber range was born in the physical era. If an enterprise deployed a server, security or IT administrators (admins) controlled the physical box and the cage, including the network interface cards, the hard drives and everything else. If an enterprise ran software, it controlled the code it was installing. In the virtual and cloud native realms, admins and security teams have little to no control over the physical hardware or even the primitive control plane running beneath cloud native applications. This means that approaches to cybersecurity must take into account key differences in architecture, monitoring and control.

Virtualization can also introduce complexity. That’s especially true when organizations work in hybrid mode, where some of their assets are still in-house, but they also have cloud-based workloads and data spread across a number of different vendors.

To meet the real-world needs of businesses, both today and in the future, we have been designing a cloud native version of our cyber-range experience.

The need for incident response capabilities in the cloud is clearly growing more urgent as entire workforces work remotely and the pandemic continues. We are seeing attackers increasingly target weak links in the cloud native universe, exploiting popular platforms and apps and adapting malware to run on Linux and in cloud environments. That malware is often written in the Go programming language so that it can run across hybrid infrastructure.

For example, Amazon Web Services’ (AWS) S3 storage system is notoriously challenging to secure properly. It is also the most popular object storage service in the cloud, and companies continue to put sensitive data into S3 buckets. The result is a greater risk of either inadvertently exposing a bucket to the open internet or suffering a devastating attack like the one Capital One experienced in the summer of 2019. In that attack, a rogue former AWS employee exploited a misconfigured web application firewall (WAF) to expose the credit card applications of over 106 million people.

How Do Cloud Native Security Exercises Differ?

In the old world, when an issue was detected, a security team could opt to isolate an affected server from the network, removing connectivity while preserving the physical machine for further examination and analysis. This is not possible in the cloud native world because there is no physical machine to preserve. While virtual instances can be removed from the company’s network, there is no easy or quick way for the security team to remove the physical server from the cloud provider’s network. In fact, doing so would disrupt applications for many other customers.

Cloud native applications and architectures require a different way of thinking about and responding to security threats. For example, when a container or virtual machine is compromised, it is critical that the security response team immediately freeze and isolate that compute instance to allow for proper forensics. This goes against the initial impulse to shut down the cloud server, which might stop a breach in its tracks but would also obliterate the forensic trail required for proper root cause analysis.
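The "freeze and isolate, do not shut down" ordering can be sketched in a few lines of Python. This is an illustrative model only: the Instance class, its fields and the contain_for_forensics function are hypothetical stand-ins, and a real response would call the cloud provider's or orchestrator's API at each step.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """Hypothetical stand-in for a compromised VM or container."""
    instance_id: str
    running: bool = True
    network_rules: list = field(default_factory=lambda: ["allow-app-traffic"])
    tags: dict = field(default_factory=dict)
    evidence: list = field(default_factory=list)

def contain_for_forensics(instance):
    """Preserve evidence, then isolate; never terminate the instance."""
    # 1. Capture evidence first, while the instance is still running,
    #    so volatile state such as memory is not lost.
    instance.evidence.append(f"disk-snapshot:{instance.instance_id}")
    instance.evidence.append(f"memory-capture:{instance.instance_id}")
    # 2. Cut network access by swapping in a deny-all rule set.
    instance.network_rules = ["deny-all"]
    # 3. Mark the instance so automation does not recycle or delete it.
    instance.tags["status"] = "quarantined"
    return instance

suspect = contain_for_forensics(Instance("instance-hypothetical-01"))
print(suspect.running, suspect.network_rules, suspect.tags)
```

The point of the ordering is that the instance is still running at the end: the breach path is closed by isolation, while the forensic trail stays available for root cause analysis.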

These differences are multiplied when a DevSecOps team is working to secure its applications and infrastructure across multiple clouds. Security conventions and capabilities differ widely across clouds. For example, the default configuration of storage services like S3 may have different levels of security. The conventions for collecting and analyzing cloud logs differ between AWS CloudTrail and Microsoft Azure Sentinel, two services that handle cloud log data. The base-level application programming interfaces (APIs) that run on AWS and Azure also have fundamental differences, which require developers communicating with those APIs to use different language conventions in command scripts and in DevOps tools for continuous integration (CI) and continuous deployment (CD).
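One way to picture the log-convention problem is a small normalization layer that maps each cloud's records onto a shared schema. The Python sketch below is illustrative only. The AWS field names follow the CloudTrail record format; the Azure-style field names, the shared schema, the function names and the sample records are assumptions made for this example.

```python
def from_cloudtrail(record):
    """Map a CloudTrail-style record onto the shared (hypothetical) schema."""
    return {
        "time": record["eventTime"],
        "action": record["eventName"],
        "source_ip": record.get("sourceIPAddress"),
        "actor": record.get("userIdentity", {}).get("arn"),
        "cloud": "aws",
    }

def from_azure_style(record):
    """Map an Azure-style record; these field names are assumed."""
    return {
        "time": record["time"],
        "action": record["operationName"],
        "source_ip": record.get("callerIpAddress"),
        "actor": record.get("caller"),
        "cloud": "azure",
    }

def merge_timeline(aws_records, azure_records):
    """Build one timeline across both clouds.

    Assumes every timestamp is ISO 8601 in UTC with the same precision,
    so plain string sorting matches chronological order.
    """
    events = [from_cloudtrail(r) for r in aws_records]
    events += [from_azure_style(r) for r in azure_records]
    return sorted(events, key=lambda e: e["time"])

# Hypothetical sample records.
aws_sample = [{
    "eventTime": "2021-03-01T10:05:00Z",
    "eventName": "GetObject",
    "sourceIPAddress": "203.0.113.10",
    "userIdentity": {"arn": "arn:aws:iam::111122223333:user/example"},
}]
azure_sample = [{
    "time": "2021-03-01T10:01:30Z",
    "operationName": "Microsoft.Storage/storageAccounts/listKeys/action",
    "callerIpAddress": "198.51.100.7",
    "caller": "user@example.com",
}]

timeline = merge_timeline(aws_sample, azure_sample)
for event in timeline:
    print(event["time"], event["cloud"], event["action"])
```

Once records share one shape, the same detection logic and dashboards can run over events from either cloud.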

To detect potential security incidents, proper monitoring must enable teams to see across the entire environment they have to secure. Unfortunately, with complex hybrid infrastructures, creating a single-pane-of-glass view that is apples-to-apples and easy to use presents real challenges. That is why we often generate this type of dashboard and connectivity layer as part of our security consultancy work for enterprise and government clients.

Beyond monitoring hurdles, security tools and controls available to teams can also vary across clouds. For example, AWS has three types of load balancers with different (or no) WAFs. Managed Kubernetes services running on AWS and Azure do not allow the easy placement of a WAF in front of the cluster.

In addition, how a security team might use a continuous security control validation platform, like SafeBreach, differs significantly between on-prem and in-cloud paradigms. In the cloud, the majority of applications are deployed using container technologies. The ability of containers to create immutable and ephemeral infrastructure at scale is what makes cloud native possible. So, for these exercises, we use attack simulations against containers, targeting their network stack, their security processes and any other attack surface that security teams must manage as part of building out and securing a multicloud environment.

The shift to container-based security also expands the circle of participants. In this universe, developers are continuously deploying code and applications, and they control what goes into the containers. Security reviews tend to be automated due to the rapid release cycle. So, the Cloud Native Cyber Range includes participants from the developer team as well as DevSecOps, which was not always the case for classic incident response teams.

Tying It All Together in One Place

When we spoke to our customers about their needs for incident response training, we heard that they wanted a practice arena where security teams can analyze, monitor and orchestrate not only across hybrid (legacy) implementations and cloud but also across multiple clouds. This is the driving idea behind IBM X-Force’s Cloud Native Cyber Range.

On the cyber range, exercises start with an urgent trigger, like a call from a journalist or an email from the FBI, notifying the business’s team of a breach. The source of the breach in these drills can be one of the numerous attack points in the public cloud. We also make sure to include portions of the cloud that are not under the control of the participants, to teach them how to react to this new reality and how to work with the public cloud provider’s security teams to maximize their chances of a successful response and minimize the impact.

With SafeBreach, we run a container-focused attack playbook to focus on potential kill chain progressions and then highlight which controls worked and which controls failed. The remediations prescribed by SafeBreach can serve as guidance for improving the cloud native security posture of participants. After remediations are enacted, we run the same playbook to ensure that controls are now working as expected.
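The run, remediate and re-run loop described above comes down to comparing two sets of results. The Python sketch below shows that comparison under stated assumptions: the result format (a step name mapped to "blocked" or "missed"), the step names and the compare_runs function are hypothetical and do not represent SafeBreach's actual output or API.

```python
def compare_runs(before, after):
    """Compare playbook results from before and after remediation.

    Each argument maps a simulated attack step to "blocked" (the control
    worked) or "missed" (the control failed). Format is an assumption.
    """
    fixed = sorted(
        s for s in before
        if before[s] == "missed" and after.get(s) == "blocked"
    )
    still_failing = sorted(
        s for s in after
        if after[s] == "missed" and before.get(s) == "missed"
    )
    regressed = sorted(
        s for s in after
        if after[s] == "missed" and before.get(s) == "blocked"
    )
    return {"fixed": fixed, "still_failing": still_failing,
            "regressed": regressed}

# Hypothetical results for a container-focused playbook.
first_run = {
    "container-escape": "missed",
    "lateral-movement": "missed",
    "data-exfiltration": "blocked",
}
second_run = {
    "container-escape": "blocked",
    "lateral-movement": "missed",
    "data-exfiltration": "blocked",
}

report = compare_runs(first_run, second_run)
print(report)
```

Tracking regressions as well as fixes matters in a fast release cycle, because a control that worked in the first run can stop working after an unrelated deployment.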

A primary goal in building the Cloud Native Cyber Range is to help security organizations and developer teams understand the best ways to validate security in distributed application environments with higher-velocity deployments. This requires that teams up their security metabolism, because the required remediation steps work better when they are baked into the CI/CD pipeline than when they are added as reactions to indicators of compromise.

Most teams understand this already, but helping them live it is the best way for them to internalize this shift and change security stances to reflect the realities of the new cloud native way.

To learn more about IBM Security’s cyber ranges, please visit the Command Center site here.
