
Introducing sigmalite: RunReveal's open-source sigma rule evaluator for detection

Today RunReveal is announcing support for sigma detections directly in RunReveal, and releasing an open-source sigma rule evaluator called sigmalite.

Today RunReveal is announcing support for sigma detections within RunReveal's product and open-sourcing our sigma evaluation engine. RunReveal's sigma engine is being released under the Apache 2.0 license and is built for stream processing. These features are a core component of our pipeline and are fully integrated in RunReveal today!

Here's the codebase and a website with a live demo of sigmalite in action:

RunReveal has integrated sigmalite into our pipeline to provide our customers with the ability to use the sigma rule format for detection within the RunReveal ecosystem. We believe our customers and the security community at large will benefit from the library being open-source by being able to embed sigmalite into their own data pipelines and perform detection outside of their SIEM!

But to understand why we built sigmalite we have to start with what sigma is and how it's used today.

A refresher on sigma

The sigma project was released with the goal of being an open format that could be used to describe detections. The motivations behind Sigma are to make detections more portable, decrease switching costs between SIEM vendors, and increase collaboration between security teams. Most SIEM vendors today are actually selling a fancy database with proprietary indexes and a bunch of pre-built rules on top, which means the detections are usually rules written in the query language of the underlying database.

Sigma was designed to be agnostic to the underlying SIEM. It solves this problem with translators between the sigma YAML format and many different target SIEM query languages. The YAML detection rules are passed to the sigma CLI, and the output is a detection rule in your SIEM's query format.

Here's a basic example of this process. This detection looks for AWS root account usage and ignores events whose eventType is AwsServiceEvent. Note that the condition section is an expression that combines the named searches defined under detection (here, selection and filter) to decide which logs match.

title: AWS Root Credentials
description: Detects AWS root account usage
logsource:
    product: aws
    service: cloudtrail
detection:
    selection:
        userIdentity.type: Root
    filter:
        eventType: AwsServiceEvent
    condition: selection and not filter
falsepositives:
    - AWS Tasks That Require Root User Credentials
level: medium

This rule can then be translated to any format for which a pySigma backend has been built. Sigconverter.io is an online service that does this, and here is the Splunk query for this rule:

userIdentity.type="Root" NOT eventType="AwsServiceEvent"

The Loki query:

{job=~".+"} | logfmt | userIdentity_type=~`(?i)Root` and eventType!~`(?i)AwsServiceEvent`

And the QRadar query:

SELECT UTF8(payload) as search_payload from events
where "userIdentity.type"='Root' AND NOT "eventType"='AwsServiceEvent'

This process can be a little clunky but in general it works and sigma rules can easily be converted to numerous SIEM backends. This process only takes a little bit of exploration of the sigma-cli and a security team can start leveraging the sigma format for basic use cases.
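To make the translation step concrete, here is a minimal sketch of a toy translator that renders the rule above as a Splunk-style search string. It is not how pySigma or the sigma CLI is implemented; the fieldMatch type and toSplunk function are hypothetical names for illustration, and the sketch only handles the "selection and not filter" shape with plain equality.

```go
package main

import (
	"fmt"
	"strings"
)

// fieldMatch is one "field: value" pair from a sigma search identifier.
// Hypothetical type for this sketch only.
type fieldMatch struct {
	Field string
	Value string
}

// toSplunk renders "selection and not filter" as a Splunk-style search string.
// Real backends also handle modifiers, escaping, value lists, and arbitrary
// conditions; this toy version covers plain equality only.
func toSplunk(selection, filter []fieldMatch) string {
	var parts []string
	for _, m := range selection {
		parts = append(parts, fmt.Sprintf("%s=%q", m.Field, m.Value))
	}
	for _, m := range filter {
		parts = append(parts, fmt.Sprintf("NOT %s=%q", m.Field, m.Value))
	}
	return strings.Join(parts, " ")
}

func main() {
	selection := []fieldMatch{{"userIdentity.type", "Root"}}
	filter := []fieldMatch{{"eventType", "AwsServiceEvent"}}
	fmt.Println(toSplunk(selection, filter))
}
```

Slices are used instead of maps so the output order is deterministic. For the rule above, the sketch produces the same string as the Splunk query shown earlier.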

Advanced and full adoption of this standard takes a lot more work. Security teams need to have CI/CD processes set up to perform the validation, translation, and upload of their detections. This is problematic since most SIEM vendors haven't adopted basic configuration as code or "detection as code" support, and setting these pipelines up is manual. Additionally, full adoption of Sigma can often involve a lot of rewriting of existing detections. Mature security orgs might have a few hundred to a few thousand custom detections and the work to do this rewriting can be tedious to validate and debug.

Today we're living in a world where Sigma is popular, used by many security teams, and reasonably well liked. However, it's rare that a security team has a significant number of its rules expressed in the Sigma format, or has actually used sigma to make switching SIEMs easier.

Sigmalite is sigma without the SIEM

Our goal with this project was to decouple the detection from the underlying database while still supporting community-written detections. Building sigmalite as an embeddable Go package and runtime for sigma rules made a lot of sense to RunReveal from an architecture and product perspective. It allowed us to adopt real-time streaming detections while expanding our support for community-written detections at the same time.

We wanted a simple design where, given a sigma rule and a structured log, we could determine whether there was a match. Here's a basic hello world example of how to use sigmalite.

package main

import (
	"fmt"

	"github.com/runreveal/sigmalite"
)

func main() {
	rule, err := sigmalite.ParseRule([]byte(`
title: My example rule
detection:
  keywords:
    - foo
    - bar
  selection:
    EventId: 1234
  condition: keywords and selection
`))
	if err != nil {
		panic(err)
	}
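	// A log entry is a message plus a flat map of string fields.
	entry := &sigmalite.LogEntry{
		Message: "Hello foo",
		Fields: map[string]string{
			"EventId": "1234",
		},
	}
	// Evaluate the rule's detection against the entry and print the result.
	fmt.Println(rule.Detection.Matches(entry, nil))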
}
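To illustrate what the condition "keywords and selection" asks for, here is a minimal standard-library sketch of the same check written by hand. It does not reflect sigmalite's internals: the logEntry type and both helper functions are hypothetical, and the sketch assumes keywords match as case-insensitive substrings of the message (any one suffices) while selection fields must all be equal, ignoring case.

```go
package main

import (
	"fmt"
	"strings"
)

// logEntry is a hypothetical stand-in for a structured log.
// It is not sigmalite's type.
type logEntry struct {
	Message string
	Fields  map[string]string
}

// matchKeywords reports whether the message contains any of the keywords,
// ignoring case (assumed semantics for this sketch).
func matchKeywords(e logEntry, keywords []string) bool {
	msg := strings.ToLower(e.Message)
	for _, k := range keywords {
		if strings.Contains(msg, strings.ToLower(k)) {
			return true
		}
	}
	return false
}

// matchSelection reports whether every expected field is present and equal,
// ignoring case (assumed semantics for this sketch).
func matchSelection(e logEntry, want map[string]string) bool {
	for field, value := range want {
		got, ok := e.Fields[field]
		if !ok || !strings.EqualFold(got, value) {
			return false
		}
	}
	return true
}

func main() {
	e := logEntry{
		Message: "Hello foo",
		Fields:  map[string]string{"EventId": "1234"},
	}
	// condition: keywords and selection
	matched := matchKeywords(e, []string{"foo", "bar"}) &&
		matchSelection(e, map[string]string{"EventId": "1234"})
	fmt.Println(matched)
}
```

Under these assumptions the entry matches: the message contains "foo" and EventId equals "1234".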

A major benefit of evaluating logs prior to saving them to the database is that it gives us the ability to make decisions about what to do with each individual log. Most security teams have more data than they can afford to store in their SIEM, yet storing a log is always a prerequisite for detection in an architecture where detections are queries. By using sigmalite and other matching mechanisms we can help security teams choose whether a log should go into low-cost object storage, their high-cost SIEM, or somewhere else entirely (like /dev/null).
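A per-log routing decision like this can be sketched as a first-match-wins rule list. This is an illustrative sketch only, not RunReveal's implementation: the destination names, the routeRule type, and the route function are all hypothetical, and the match functions stand in for whatever mechanism (a sigma rule, a regex) decides a match.

```go
package main

import "fmt"

// destination names where a log should be sent (hypothetical values).
type destination string

const (
	destSIEM    destination = "siem"
	destArchive destination = "object-storage"
	destDrop    destination = "drop"
)

// routeRule pairs a match function with a destination.
// In practice Match could wrap a sigma rule or a regex.
type routeRule struct {
	Name  string
	Match func(fields map[string]string) bool
	Dest  destination
}

// route returns the destination of the first matching rule,
// or the fallback when nothing matches.
func route(fields map[string]string, rules []routeRule, fallback destination) destination {
	for _, r := range rules {
		if r.Match(fields) {
			return r.Dest
		}
	}
	return fallback
}

func main() {
	rules := []routeRule{
		{"root-usage", func(f map[string]string) bool { return f["userIdentity.type"] == "Root" }, destSIEM},
		{"health-checks", func(f map[string]string) bool { return f["eventName"] == "HealthCheck" }, destDrop},
	}
	fmt.Println(route(map[string]string{"userIdentity.type": "Root"}, rules, destArchive))
	fmt.Println(route(map[string]string{"eventName": "HealthCheck"}, rules, destArchive))
	fmt.Println(route(map[string]string{"eventName": "ListBuckets"}, rules, destArchive))
}
```

The three example logs route to the SIEM, get dropped, and fall back to object storage, respectively.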

This type of flexibility is a missing link in the way we think about detection. Stream processing prior to the SIEM gives us better logs through enrichments, and lowers the cost of detection since it can save us from needing to store the log at all. This ability to decide on the priority of a log is something that RunReveal is working with our customers on supporting and should be coming to our product very soon.

Sigmalite at RunReveal

When we started work integrating sigma into RunReveal we needed to figure out where in our data pipeline sigma belonged. We decided that putting it after enrichments but before logging made the most sense, and the resulting pipeline looked something like this.

RunReveal gives our customers many detections out of the box, but we wanted our customers to be able to bring their own custom sigma rules to RunReveal. Creating a new sigma detection is as simple as copying and pasting the rule and validating it; the rest is automatic.

RunReveal will automatically run the detection on all logs that match the logsource from the sigma rule. Once a sigma detection finds a log that matches the detection condition, we save that log to the detections table along with the rule's metadata!
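The flow described above, running only the rules whose logsource fits the incoming log and recording each match, can be sketched as follows. This is a simplified illustration, not RunReveal's pipeline: the logSource, rule, and detection types are hypothetical, the match function stands in for a parsed sigma detection, and treating an empty logsource key as "any" is an assumption of this sketch.

```go
package main

import "fmt"

// logSource mirrors the product/service keys of a sigma rule's logsource section.
type logSource struct {
	Product string
	Service string
}

// rule is a hypothetical, simplified rule: a logsource plus a match function.
type rule struct {
	Title  string
	Source logSource
	Match  func(fields map[string]string) bool
}

// detection is a hypothetical row for a detections table.
type detection struct {
	RuleTitle string
	Fields    map[string]string
}

// applies reports whether a rule's logsource covers the log's source.
// An empty key in the rule is treated as "any" (an assumption of this sketch).
func applies(ruleSrc, src logSource) bool {
	return (ruleSrc.Product == "" || ruleSrc.Product == src.Product) &&
		(ruleSrc.Service == "" || ruleSrc.Service == src.Service)
}

// evaluate runs only the applicable rules and collects one detection per match.
func evaluate(src logSource, fields map[string]string, rules []rule) []detection {
	var out []detection
	for _, r := range rules {
		if applies(r.Source, src) && r.Match(fields) {
			out = append(out, detection{RuleTitle: r.Title, Fields: fields})
		}
	}
	return out
}

func main() {
	rules := []rule{{
		Title:  "AWS Root Credentials",
		Source: logSource{Product: "aws", Service: "cloudtrail"},
		Match: func(f map[string]string) bool {
			return f["userIdentity.type"] == "Root" && f["eventType"] != "AwsServiceEvent"
		},
	}}
	fields := map[string]string{"userIdentity.type": "Root", "eventType": "AwsApiCall"}

	// Same fields, different sources: only the CloudTrail log is evaluated by the rule.
	fmt.Println(len(evaluate(logSource{"aws", "cloudtrail"}, fields, rules)))
	fmt.Println(len(evaluate(logSource{"okta", "audit"}, fields, rules)))
}
```

The CloudTrail log yields one detection, while the same fields arriving from a different source yield none because the rule never runs against them.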

Saving the matching log to the detections table allows our customers to use sigma rules for correlation in our detection queries. In practice, here's how quick and easy it is to use this feature with RunReveal.

This may seem minor and simple, but sigma is integrated with our platform in a way that allows our customers to use it with all of our other features. Logs that match a sigma rule can be forwarded to notification channels for Slack alerting, sent to SOAR playbooks, saved for long-term storage, or used for whatever other use case our customers have.

We're excited about sigmalite because it provides an alternative architecture to SIEM which can reduce costs by decoupling detection from the datastore, and it can be deployed alongside any detection stack.

What's next

Correlations are now part of the sigma spec; they were added just a couple of weeks ago. Because correlations relate multiple events over time, they can't be supported in a stateless stream processing library like sigmalite, so we'll likely support these detections transparently in the near future.

We are focused on providing our customers with the ability to make a decision about where to send each individual log they receive using sigma, regex and other matching mechanisms. This is functionality we already support but haven't exposed to customers, and we are working on exposing it this year.

Sigmalite is just one piece of the puzzle in disrupting the SIEM category. We plan to talk about this in one of our next blog posts, so you'll have to stay tuned.


RunReveal loves to show off our tech and if you'd like to get a demo from the founders, ask questions, or see if RunReveal is a fit for your company then reach out today.