
Introducing Correlated Alerting: a new method of detection that optimizes for high-signal alerts

Today RunReveal is announcing the beta release of correlated alerting, a new security alerting technique that is already running for all customers and is designed to deliver significantly higher signal for cloud detection and SIEM use cases.

Current stream processing techniques and log query languages are really bad at searching for threat actors. No single indicator or log can reliably indicate a compromise, yet all of the log processing and search tools available to us today are built to look for one set of conditions at a time. The fact that most security teams struggle with alert fatigue is a direct result of this old paradigm, and of the current set of vendors repeating the same old mistake.

The out of the box detections you get with RunReveal already use this technique. However, with this release we're also providing a framework for you to easily build your own correlated alerts! There's no magic when it comes to hard data problems, though, so let's look at the details of how this works.

Eliminate false positives with "the Swiss cheese model"

If you look at the baseline detections that your average SIEM comes with, they might have a few hundred that look something like this:

my_log_source
	| where eventName=="SuperSensitiveThing"

This seems great at first glance. Your security team probably worries about SuperSensitiveThing so why not get a notification whenever that happens? Makes a lot of sense, right?

Well…maybe. If this isn't something that fits your organization's practices, then it's likely to cause a false positive every time SuperSensitiveThing happens. After adding another 50 similar alerts you're bound to add detections that don't fit your practices, and the false positives will add up. The result is usually a lot of unhappy incident responders and security engineers. When alerting on individual events, your organization's detections need to closely match your company's security controls in order to be effective, so most security teams slowly add detections one at a time to avoid creating noise.

But what if we could take a lot of relatively simple alerts and have them work together to make better alerts and reduce false positives? Plenty of other fields have dealt with this kind of problem, and we can borrow from them. What would an abstraction like that look like?

The Swiss cheese model is usually used for accident prevention, but it works well for false positive prevention too!

The Swiss cheese model is usually referenced in the context of accident prevention in fields like aviation, where stacking multiple layers of defense means an accident is prevented even if any single layer fails. Could we apply the same strategy when writing alerts? Instead of alerting on single events that match a detection, why not alert when a set of bad conditions all occur at once?

How would you express this kind of thing in a query language or within an event stream? Good luck! Doing it in a programming language like Python, or in a query language, quickly becomes nightmarish and unmaintainable. Supporting this type of detection strategy requires a system designed for it.
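
To see why, here's a hedged sketch of the naive approach in plain SQL: self-joining a raw log table once per condition. The table and event name come from the example above, while userEmail and the other event names are hypothetical. Every additional condition means yet another join.

-- A hypothetical sketch only: correlating three conditions with no purpose-built
-- support means one self-join per condition, which quickly becomes unmaintainable.
SELECT a.userEmail, a.eventTime
FROM my_log_source AS a
JOIN my_log_source AS b
    ON b.userEmail = a.userEmail
    AND b.eventTime BETWEEN a.eventTime - INTERVAL 1 HOUR AND a.eventTime
JOIN my_log_source AS c
    ON c.userEmail = a.userEmail
    AND c.eventTime BETWEEN a.eventTime - INTERVAL 1 HOUR AND a.eventTime
WHERE a.eventName = 'SuperSensitiveThing'
    AND b.eventName = 'NewCountryLogin'
    AND c.eventName = 'ServiceAccountCreated'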

An abstraction for correlating detections

When RunReveal first built our detection engine, prior to thinking about correlation or false positives, we made the decision to log the results of every detection query that ran. We store the results of these detections in a table called detections, along with a bunch of metadata like the detection name, user-defined categories, mitreAttack classifications, riskScore, etc.

This ended up being a good choice.

After gaining some experience building detections on our platform, we realized that we could greatly reduce the false positive rate for alerts by stacking detections on top of one another. We decided to split detections into two distinct types, Signals and Alerts, and added a view for each to make them easy to query:

  • signals - Signals are detections that extract information from your logs but don't send an alert on a notification channel.
  • alerts - Alerts are detections where a notification was sent to at least one notification channel.
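
As a quick, hedged example (assuming only the two views above and the eventTime column shown in the next section), comparing how much fired quietly as signals versus how much actually notified someone is a one-liner per view:

-- A minimal sketch, assuming the signals and alerts views described above.
SELECT 'signals' AS kind, count() AS hits FROM signals WHERE eventTime >= today()
UNION ALL
SELECT 'alerts' AS kind, count() AS hits FROM alerts WHERE eventTime >= today()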

Here I am querying my signals and you can see that I logged in from a new country.

rr> select * from signals LIMIT 1;\G

Row 0

 id              | 2d5iQuQmIopGZWLrZBlqHOL9BMf
 scheduledRunID  | 2d5iQbq8niSdzk0rjjbcegp6yk3
 workspaceID     | 2KUOdhvRreF5RZfQX8ILneT4fSd
 detectionID     | 2c9D7GZIJsyVrgYVSERAQJXVHs4
 detectionName   | new_country_login
 recordsReturned | 1
 runTime         | 1.13707854e+08
 query           | ...
 params          | map[from:2024-02-09 table:logs to:2024-02-10]
 columnNames     | [workspaceID sourceID ...
 columnTypes     | [String String ...]
 results         | ["2KUOdhvRReF5RZf", "2SO1VPGVFtF5bp" ...]
 severity        | medium
 actor           | map[email:evan@runreveal.com]
 resources       | []                            
 srcIP           | 1.2.3.4
 dstIP           |      
 categories      | ['okta', 'signal']
 mitreAttacks    | ['initial-access']
 riskScore       | 50
 error           |
 createdAt       | 2024-03-01T14:45:24Z
 eventTime       | 2024-03-01T14:13:33Z
 receivedAt      | 2024-03-01T14:32:51Z
 
Ran Query: select * from signals LIMIT 1

Retrieved 1 rows in 810.770875ms

This intermediate signals table is a critical staging area for building more advanced alerts. Let's look at an example to show why that is.

Let's say we wanted to alert when a single user did a lot of risky activity in a short window of time. We could collect all of the risky activity in the signals table, sum the riskScore for each actor within that timeframe, and set a threshold.

WITH all_risk_scores AS (
    SELECT
        actor['email'] AS email,
        sum(riskScore) AS riskTotal
    FROM signals
    WHERE
        email != ''
        AND eventTime > {from:DateTime}
        AND eventTime < {to:DateTime}
    GROUP BY email
)
SELECT * FROM all_risk_scores WHERE riskTotal > 100

This is a pretty basic example, but we think it illustrates the idea well. To recap: an individual event might contain enough information to know whether an alert should be raised, but that isn't always the case, and matching one event at a time will never be enough to find malicious patterns hidden in otherwise normal behavior.

Easily writing correlated alerts

The example above isn't the most advanced SQL query or the most difficult to read, but subqueries aren't ideal and can quickly become tedious. We wanted correlated alerts to be incredibly simple to write, so our customers could build their own even if they aren't SQL experts or have never heard of a subquery.

To make it simpler we first had to recognize that the correlated alert above is really doing two things:

  • Grouping the relevant information by user within the time span
  • Checking the conditions you want to alert on, in this case riskTotal > 100

Since most correlated alerts rely on the detection metadata, we can make writing an alert simpler by doing all of the grouping and aggregation by identity ourselves, through more helpful views built on top of the original detections table. Our customers only need to worry about writing conditions; we worry about the complexity.

SELECT * FROM signals_grouped(from={from:DateTime}, to={to:DateTime}, window={window:Int64})
WHERE length(mitreAttacks) > 3

WOW! That's finally really simple. But what's this magic signals_grouped doing? It's a view that aggregates information from the signals table by identity, pulls along the associated bits of metadata, and handles windowing by time – everything up to the WHERE clause is boilerplate (for now). The output of this query (expressed here as JSON for readability) is much higher signal, and much more useful for deciding when to alert and where to start an investigation, than the raw data would have been. Notice the ids field, which contains the original event identifiers so you can easily pivot to the raw data that was alerted on.

 {
    "workspaceID": "2KUOdhvRReF5RZfQX8ILneT4fSd",
    "email": "evan@runreveal.com",
    "userIDs": [
      "deadbeefdeadbeefdeadbeefdeadbeef"
    ],
    "srcIPs": [
      "34.227.127.165",
      "2607:fb90:87e3:4c35:e066:a6d8:6856:4eaf",
      "70.122.134.143",
      "77.211.4.216"
    ],
    "srcASOrganizations": [
      "AMAZON-AES",
      "T-Mobile USA Inc.",
      "TWC-11427-TEXAS",
      "VODAFONE_ES"
    ],
    "srcASCountryCodes": [
      "US",
      "ES"
    ],
    "detectionNames": [
      "impossible-travel",
      "service-account-created",
      "okta-user-session-impersonation"
    ],
    "totalRiskScore": 155,
    "mitreAttacks": [
      "initial-access",
      "persistence",
      "lateral-movement"
    ],
    "categories": [
      "signal",
      "gcp",
      "okta"
    ],
    "ids": [
      "2fWEjgIRxPfksdwmqjFYQMGUqUs",
      "2fWEjezcji2uRZE5MCY7joFk2Yt",
      "2fWEjZMiaxtlHtjLfXZjdpA7fWH"
    ],
    "severities": [
      "Medium",
      "High"
    ]
  },
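
Those ids make the pivot mentioned above a simple lookup. As a hedged sketch, using the ids from the example output and the column names shown in the signals row earlier:

-- Hypothetical pivot: pull the original signal rows behind a correlated alert,
-- using the ids surfaced by signals_grouped.
SELECT detectionName, eventTime, srcIP, severity
FROM signals
WHERE id IN (
    '2fWEjgIRxPfksdwmqjFYQMGUqUs',
    '2fWEjezcji2uRZE5MCY7joFk2Yt',
    '2fWEjZMiaxtlHtjLfXZjdpA7fWH'
)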

All of this grouping is some heavy-duty data management, but luckily you don't need to worry about how it works if you don't want to. We give this to our customers so they can read the documentation and write similar queries that utilize signals_grouped.
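
For the curious, here's a rough, hedged sketch of the kind of aggregation a view like signals_grouped performs, assuming ClickHouse-style SQL and the signals columns shown earlier. The real view does more than this (time windowing, additional identity types, and enrichment), so treat it as illustrative only:

-- A minimal sketch only: group signals by actor email and roll up the metadata
-- that correlated-alert conditions are written against.
SELECT
    actor['email'] AS email,
    groupUniqArray(srcIP) AS srcIPs,
    groupUniqArray(detectionName) AS detectionNames,
    sum(riskScore) AS totalRiskScore,
    arrayDistinct(arrayFlatten(groupArray(mitreAttacks))) AS mitreAttacks,
    arrayDistinct(arrayFlatten(groupArray(categories))) AS categories,
    groupArray(id) AS ids,
    groupUniqArray(severity) AS severities
FROM signals
WHERE email != ''
    AND eventTime > {from:DateTime}
    AND eventTime < {to:DateTime}
GROUP BY email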

The amazing thing about this is that signals_grouped is a view, so as we add more enrichments and expand its capabilities, your existing correlated alerts automatically pick up that new context and stay up to date.

What's next

SIEM vendors have been giving their customers "out of the box" detections for decades that are designed to create noise rather than to alert on meaningful threat scenarios. Now the industry is collectively complaining about false positives, alert fatigue, and burned-out SOCs. As an industry we need to reckon with this truth.

Whether you're building your own SIEM or spending millions each year on a commercial tool, you need to transition away from alerting on individual logs. RunReveal is giving this capability to all of our customers, from the free tier to the enterprise. The highest-performing detection and response teams rely on the right abstractions to collect the signal and filter out the noise, and that signal usually isn't in the raw telemetry.

RunReveal is here to help, and our aim is to build a SIEM that is truly loved, which means innovating on every individual aspect of detection and response. To accomplish this we peeled back the curtain on detection and response programs across the industry and read a lot of what had already been written on this topic before building this ourselves.

We have a few other big releases planned for the next few weeks. If you're interested in this product or what else is on our planned roadmap, please get in contact with us using this form.


RunReveal will be at BSidesSF and RSA this year in San Francisco. If you're buried in false positives and paying the price (figuratively and literally), reach out and the RunReveal founders will buy you a coffee to commiserate.