Introducing RunReveal Transforms: Easily ingest any log data

Solve your onboarding and data quality challenges with transforms.

Today, we’re excited to announce RunReveal Transforms. Transforms gives our customers the power to quickly ingest any type of security log by running customer-provided Python functions. These functions can transform, validate, or discard ingested security data before it reaches downstream alerting, notification, and storage systems.


An updated (for now) RunReveal product diagram.

When we first started RunReveal and asked security leaders about their biggest SIEM pain points, data quality and data onboarding issues were among the most common answers. We believe that letting customers write custom code to aid the ingest process solves a major pain point for security teams everywhere.

How does it work?

We receive the data, parse it into an object, and invoke a function you’ve written that formats the data into our canonical, normalized format. The code you write is just like any other Python function, except that it returns the transformed log in our standard normalized format. All of this code runs safely within a WebAssembly runtime environment.

from runreveal import deep_get

def transform(event):
    # Keep source-specific fields in tags so they remain searchable.
    tags = {}
    node_id = deep_get(event, "log", "nodeID")
    if node_id is not None:
        tags["nodeID"] = node_id

    # Return the event in the normalized format.
    return {
        "id": event.get("id"),
        "event_name": deep_get(event, "log", "type"),
        "actor_ip": deep_get(event, "log", "sourceIPAddress"),
        "actor_email": deep_get(event, "log", "data", "actor"),
        "tags": tags,
    }
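
To try a transform like this outside RunReveal, one option is a small local harness. The sketch below is only an illustration: the `deep_get` defined here is a hypothetical stand-in written for this example (the real helper from `runreveal` may behave differently), and the sample event is made up to match the field names used above.

```python
# Hypothetical local stand-in for runreveal.deep_get; the real helper may differ.
def deep_get(obj, *keys):
    """Walk nested dicts key by key, returning None once a key is missing."""
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def transform(event):
    tags = {}
    node_id = deep_get(event, "log", "nodeID")
    if node_id is not None:
        tags["nodeID"] = node_id
    return {
        "id": event.get("id"),
        "event_name": deep_get(event, "log", "type"),
        "actor_ip": deep_get(event, "log", "sourceIPAddress"),
        "actor_email": deep_get(event, "log", "data", "actor"),
        "tags": tags,
    }


# Made-up sample event, shaped only to match the fields the transform reads.
sample_event = {
    "id": "evt-1",
    "log": {
        "type": "Login",
        "sourceIPAddress": "203.0.113.7",
        "nodeID": "node-42",
        "data": {"actor": "alice@example.com"},
    },
}

normalized = transform(sample_event)
print(normalized)

# An event with no "log" key still transforms; missing fields come back as None.
sparse = transform({"id": "evt-2"})
print(sparse)
```

With this stand-in, a missing `nodeID` simply leaves `tags` empty rather than raising an error.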

Having a single basic format across all of your logs makes searching simple. It also means that when writing triggers, you can write "universal triggers" that will work the same way across each of your log sources. Imagine the power of writing one alert for all Login events across all of your SaaS tools and infrastructure.
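
To make the "universal trigger" idea concrete, here is a minimal sketch in plain Python. It is not RunReveal's trigger syntax; the `is_login` predicate and the sample events are hypothetical, and the only thing taken from this post is the normalized field names.

```python
# Made-up events that have already been normalized. The first came from a
# SaaS tool and the other two from infrastructure logs, but after the
# transform they all share the same top-level fields.
normalized_events = [
    {"id": "a1", "event_name": "Login", "actor_ip": "198.51.100.4",
     "actor_email": "bob@example.com", "tags": {}},
    {"id": "b7", "event_name": "Login", "actor_ip": "203.0.113.7",
     "actor_email": "alice@example.com", "tags": {"nodeID": "node-42"}},
    {"id": "b8", "event_name": "Logout", "actor_ip": "203.0.113.7",
     "actor_email": "alice@example.com", "tags": {"nodeID": "node-42"}},
]


def is_login(event):
    """One check that works for every source, because the field name is shared."""
    return event.get("event_name") == "Login"


login_ids = [event["id"] for event in normalized_events if is_login(event)]
print(login_ids)
```

The point of the sketch is that the check never needs to know which source an event came from.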

This design does raise an important question, though. If we require all transformed objects to be returned in a specific format, does that mean RunReveal is schema on write?

Schema on read and write

Within the world of security data, schema on read versus schema on write is a never-ending tradeoff. With schema on write, having a strict format for your logs makes search performant and keeps your data clean. It is not without drawbacks, however. Schema on write is rigid, and making changes to your schema can be difficult. Schema on read attempts to solve these problems by making a different set of tradeoffs. Querying and ingesting with a schema-on-read system is simple, but it is expensive to scale and slow to query under reasonable load.

So which is better? The reality is that security teams need the benefits of both, and fortunately we now have the technology to avoid these firm tradeoffs from the past.

Transforms provides a flexible tags system that can store whatever special data you need and still be searched in a performant manner. The first transform example above stores nodeID within tags using a Python dictionary. When searching and triggering on events produced by that transform, using nodeID works exactly like searching the standard fields.

Discard data using transforms

Security teams regularly tell us that they end up filtering the data they collect, because some data sources are noisier and lower signal than others, and collecting everything can lead to runaway costs.

We took this into account and built it directly into the Transforms product. Dropping the data you don't want to collect is as simple as returning None. In this example, all logs that don't have a type of Login are discarded.

from runreveal import deep_get

def transform(event):
    # Returning None discards the event, so only Login events are kept.
    if deep_get(event, "log", "type") != "Login":
        return None

    return {
        "id": event.get("id"),
        "event_name": deep_get(event, "log", "type"),
        "actor_ip": deep_get(event, "log", "sourceIPAddress"),
        "actor_email": deep_get(event, "log", "data", "actor"),
    }
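
To see the discard behavior on a batch of events, the sketch below runs a Login-only transform locally and counts what is kept and dropped. The `deep_get` here is a hypothetical stand-in for the `runreveal` helper, and `apply_transform` is an illustrative name for a loop that treats None as "drop"; neither is RunReveal's actual ingest code.

```python
# Hypothetical local stand-in for runreveal.deep_get; the real helper may differ.
def deep_get(obj, *keys):
    for key in keys:
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def transform(event):
    # Returning None signals that the event should be discarded.
    if deep_get(event, "log", "type") != "Login":
        return None
    return {
        "id": event.get("id"),
        "event_name": deep_get(event, "log", "type"),
        "actor_ip": deep_get(event, "log", "sourceIPAddress"),
        "actor_email": deep_get(event, "log", "data", "actor"),
    }


def apply_transform(events):
    """Illustrative loop: keep transformed events, count the ones dropped."""
    kept, dropped = [], 0
    for event in events:
        result = transform(event)
        if result is None:
            dropped += 1
        else:
            kept.append(result)
    return kept, dropped


# Made-up events: one Login, one other type, one with no type at all.
raw_events = [
    {"id": "1", "log": {"type": "Login", "sourceIPAddress": "203.0.113.7"}},
    {"id": "2", "log": {"type": "FileRead"}},
    {"id": "3", "log": {}},
]

kept, dropped = apply_transform(raw_events)
print(len(kept), "kept,", dropped, "dropped")
```

Note that in this sketch an event with no type at all is also discarded, since a missing value is not equal to "Login".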

What's next?

We will be announcing a lot more in the coming weeks. If you aren't subscribed to our blog, you really ought to be. I couldn't be more excited about the next blog post we're writing. If you're interested in learning more about what we're building, please reach out to contact @ runreveal.