Tuning the signal-to-noise ratio

Ask any security operations analyst about their biggest frustrations, and alert fatigue will be among them. They constantly struggle to identify the serious threat indicators while ignoring the false positives. Scientists and engineers have a name for this balance between useful and irrelevant data. It’s called the signal-to-noise ratio.

The signal is the important data, while the noise is everything else; the white noise that gets in the way. When the signal-to-noise ratio is too low, the noise drowns out what’s important. Experts from radio operators to genome scientists grapple with these issues in some form.

Improving the signal-to-noise ratio is also a problem for modern IR teams who face information overload. They are swamped with rising levels of network event data. They have trouble sifting through it all to find the real threats. Sometimes they fail, with potentially disastrous consequences.

The problem

The problem facing SOCs is twofold. The first issue is data volume. There’s a lot of it. Modern networks are information firehoses, churning out rivers of data. Every year, better network telemetry increases that volume. The result is a surplus of alerts, which we can call ‘candidate signals’. These are interesting data points that might warrant further investigation.

This is compounded by the second problem: resource scarcity. SOCs continually struggle to find enough talent to cope with the flood of data from increasingly complex infrastructures. Without those manual skills, many find themselves overburdened and unable to get the intelligence they need from the data that’s coming in.

The natural reaction to not having enough of a signal is to add more data. For many SOCs, this means buying more tools and telemetry, typically in the form of endpoint detection and response (EDR) or endpoint protection platform (EPP) products.

This is the wrong approach. Many SOCs’ incident response platforms are already disjointed, comprising tools from different vendors, acquired over time, that don’t play well together. This makes it difficult to get an end-to-end view of the incident response process, and in most cases it also stops operators from handing off interesting telemetry investigations to each other.

Adding to these platforms might create more relevant signals, but it won’t help SOCs to spot them. It will do the opposite, creating more noise that drowns those signals out. Any attempt to fix the SOC by generating more data amplifies the underlying problem.

If the signal-to-noise ratio remains low, then the growth in network telemetry becomes a greater source of risk. Poor candidate signal filtering leaves operators unsure where to begin and blinds them to real, time-critical attacks. The results can be catastrophic.

The fix

SOCs can’t dig themselves out of this hole by generating more data. Instead, they must address the underlying problem. They must find better ways to spot the right signals in the data they already have. To do that, they must alter the signal-to-noise ratio.

In practice, this means reducing the number of candidate signals. SOCs must present their analysts with fewer alerts so that they can focus their attention on what really matters.

The key to increasing the signal-to-noise ratio is a tightly integrated end-to-end tool chain. This is a set of tools that work together seamlessly with little overlap, and that can all exchange data with each other smoothly throughout the entire cycle of detection, containment, mitigation, cleanup, and post-incident analysis.

This approach helps in several ways. First, it reduces the noise from different tools that would otherwise overlap with each other. This eliminates the shadow signals that can distract busy operators.

It also combines events and alerts into incidents, which are larger, more visible data elements that are easier to track. This gives analysts a top-down view of candidate signals without having to trawl through low-level events and correlate them manually.

Finally, it enables SOCs to better automate the detection, analysis, and reporting of incidents. This automation is a key part of the event correlation process.
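The first two effects described above, removing overlap between tools and rolling alerts up into incidents, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration rather than a description of any particular product: the Alert and Incident records, their field names, and the rule that alerts on the same host within five minutes of each other belong to the same incident.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    host: str         # machine the alert refers to (hypothetical field)
    indicator: str    # what was observed, e.g. "outbound_c2"
    tool: str         # which product raised the alert
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    host: str
    first_seen: float
    last_seen: float
    # indicator -> set of tools that reported it; duplicates collapse here
    indicators: dict = field(default_factory=dict)


def correlate(alerts, window_seconds=300.0):
    """Group alerts into incidents, one per burst of activity on a host.

    An alert joins the host's latest incident if it arrives within
    window_seconds of that incident's last alert; otherwise it opens
    a new incident. The same indicator reported by several tools is
    stored once, with the reporting tools recorded alongside it.
    """
    incidents = []
    latest = {}  # host -> most recent Incident for that host
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        incident = latest.get(alert.host)
        if incident is None or alert.timestamp - incident.last_seen > window_seconds:
            incident = Incident(alert.host, alert.timestamp, alert.timestamp)
            incidents.append(incident)
            latest[alert.host] = incident
        incident.last_seen = alert.timestamp
        incident.indicators.setdefault(alert.indicator, set()).add(alert.tool)
    return incidents
```

In this sketch, two tools reporting the same indicator on the same host produce one entry rather than two alerts, and the analyst sees a handful of incidents instead of the raw event stream. A real tool chain would correlate on far richer attributes than host and time.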

A well-formed tool chain detects candidate signals early, developing them through several stages of analysis. This allows the SOC to either confirm and escalate candidate signals or dismiss them quickly if they are found to be benign. This helps to automatically mitigate many incidents without having to alert human operators, leaving them to focus on those alerts that need their attention.
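As a rough illustration of that staged approach, the Python sketch below passes a candidate signal through a series of analysis stages, each of which nudges a suspicion score up or down. The scoring scheme, the thresholds, the stage functions, and the "playbook" field are all hypothetical and chosen only to make the idea concrete.

```python
from enum import Enum


class Verdict(Enum):
    DISMISS = "dismiss"              # found benign, no human needed
    AUTO_MITIGATE = "auto_mitigate"  # confirmed, known automated response
    ESCALATE = "escalate"            # needs an analyst's attention


def triage(candidate, stages, dismiss_below=0.2, confirm_above=0.8):
    """Run a candidate signal through analysis stages until it resolves.

    Each stage returns a score adjustment. The loop stops as soon as
    the score falls below dismiss_below or rises above confirm_above.
    Confirmed signals with a playbook are mitigated automatically;
    confirmed signals without one, and signals that stay ambiguous,
    go to a human.
    """
    score = 0.5  # assumed neutral starting point
    for stage in stages:
        score = max(0.0, min(1.0, score + stage(candidate)))
        if score < dismiss_below:
            return Verdict.DISMISS
        if score > confirm_above:
            if candidate.get("playbook"):
                return Verdict.AUTO_MITIGATE
            return Verdict.ESCALATE
    return Verdict.ESCALATE


# Two illustrative stages; the lists they check are made-up examples.
KNOWN_BENIGN = {"backup.exe"}
KNOWN_BAD_HASHES = {"bad-hash"}


def allowlist_stage(candidate):
    return -0.4 if candidate.get("process") in KNOWN_BENIGN else 0.0


def threat_intel_stage(candidate):
    return 0.4 if candidate.get("hash") in KNOWN_BAD_HASHES else 0.0
```

Called as triage(candidate, [allowlist_stage, threat_intel_stage]), only the ESCALATE verdicts would reach an analyst's queue. The point of the design is that cheap checks run first, so most benign candidates are dismissed before any expensive analysis or human time is spent on them.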

The result

SOCs that invest in tool chain integration will enjoy a smaller, refined set of alerts that come with the appropriate data, ready for human operators to deal with efficiently.

This higher signal-to-noise ratio will show up on analyst screens, reducing their cognitive load. It will mean fewer investigations and shorter investigation times. This will lead to better outcomes for SOCs in the form of shorter containment times and an overall reduction in response times. Ideally, this will prevent attackers from getting close to your infrastructure, but in the event of a successful compromise, it can also reduce attacker dwell time, mitigating the effect of the attack.

When it comes to handling fast-moving cybersecurity incidents, the sharper focus that comes from a less cluttered data environment can be the difference between containing an incident before it does any damage, and making the next week’s headlines for all the wrong reasons.

The time for change is now

This optimization process should begin as early as possible in the incident response process. The longer that the SOC allows less relevant candidate signals to linger, the more they will proliferate and the more difficult it will be to discern what’s important. Triaging candidate signals as soon as possible frees up analysts to apply their skills to the signals that matter. In an industry where talent is hard to come by, it’s imperative to keep those analysts as productive as possible.

With that in mind, now is the time to support these goals by revising your process chain to look for improvement opportunities. Take a beat and step back to examine your overall tool set and your team structure. At some point, you might find that generating more telemetry yields results, but only if you have the capabilities to weed out the noise quickly. In the meantime, less is more.

Jan Tietze, Director of Security for EMEA, SentinelOne 
