Introducing Object Storage Search with RunReveal
RunReveal's Custom Views feature now lets you search raw logs directly from any S3-compatible object storage bucket without ingesting the data into your SIEM. Query historical logs, cold storage, or data you've never indexed, using the same interface and AI agents you already use in RunReveal.
Today RunReveal is announcing that you can search raw data in any S3-compatible object storage using RunReveal’s Custom Views feature. Customers can keep their data in their own object storage and still use it in the RunReveal platform, without RunReveal needing to ingest and store it.
This is especially useful when you need to search logs from long ago, or logs that you aren’t ingesting into your SIEM. Even though these logs sit unindexed in an object storage bucket, they appear as an ordinary table in your database and can be queried like one.
Let’s take a look at how it works, what the setup is like, and what you can expect from this feature.
Setting up and searching an object storage bucket
Within RunReveal, a searchable object storage bucket is configured under Custom Views in Settings. Setup requires a few key details about the bucket, such as:
- Access Method: bucket name, region, base URL, filenames, etc.
- Authentication: Access Key and Secret Key.
- Data Format: NDJSON, Parquet, CSV, TSV, or ORC.
- Partition Key: the folder structure the logs are organized by.
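To illustrate how these settings fit together, here is a minimal Python sketch that turns them into a ClickHouse `s3()` table function call. It assumes a ClickHouse-backed view, since RunReveal queries support ClickHouse functions, but it is not RunReveal’s implementation. The `BucketViewConfig` class, its field names, and the format mapping are hypothetical. The target names (`JSONEachRow`, `Parquet`, `CSVWithNames`, `TSVWithNames`, `ORC`) are ClickHouse’s own input formats.

```python
from dataclasses import dataclass

# Hypothetical mapping from the formats listed above to ClickHouse input
# format names. CSV/TSV assume the files start with a header row.
CLICKHOUSE_FORMATS = {
    "ndjson": "JSONEachRow",
    "parquet": "Parquet",
    "csv": "CSVWithNames",
    "tsv": "TSVWithNames",
    "orc": "ORC",
}


def quote(value: str) -> str:
    """Render a Python string as a single-quoted ClickHouse string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


@dataclass(frozen=True)
class BucketViewConfig:
    """Hypothetical container for the Custom View settings described above."""

    base_url: str      # endpoint plus bucket, e.g. an R2 or S3 URL
    path_pattern: str  # partition folders plus a filename glob
    access_key: str
    secret_key: str
    data_format: str   # "NDJSON", "Parquet", "CSV", "TSV", or "ORC"

    def table_function(self) -> str:
        """Build an s3(url, access_key, secret_key, format) call."""
        fmt = CLICKHOUSE_FORMATS.get(self.data_format.lower())
        if fmt is None:
            raise ValueError(f"unsupported format: {self.data_format}")
        url = self.base_url.rstrip("/") + "/" + self.path_pattern.lstrip("/")
        args = [url, self.access_key, self.secret_key, fmt]
        return "s3(" + ", ".join(quote(a) for a in args) + ")"


if __name__ == "__main__":
    # Placeholder values only; ACCOUNT_ID and the bucket name are hypothetical.
    cfg = BucketViewConfig(
        base_url="https://ACCOUNT_ID.r2.cloudflarestorage.com/example-logs",
        path_pattern="app/year=*/month=*/day=*/*.json",
        access_key="EXAMPLE_ACCESS_KEY",
        secret_key="EXAMPLE_SECRET_KEY",
        data_format="NDJSON",
    )
    print(f"SELECT count() FROM {cfg.table_function()}")
```

This sketch passes credentials inline to keep it short. A real deployment would more likely keep them out of query text, for example in a ClickHouse named collection.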
In this case I’m configuring a Cloudflare R2 bucket to be searchable, which lets me search it without paying egress fees. You can see the various settings filled out. The Verification button queries a small sample of logs, and the “Auto-fill columns from samples” button then fills in the table’s column structure based on that sample. This means you don’t need to spend time building out parsers or describing the data being searched, which can be incredibly painful.
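The post doesn’t describe how “Auto-fill columns from samples” works internally. The general idea of inferring a schema from sample records can still be sketched in Python. This hypothetical `infer_columns` reads NDJSON lines and maps JSON values to ClickHouse-style type names, which is an assumption. It widens `Int64` to `Float64` when both appear and falls back to `String` on conflicts or nested values. It marks a column `Nullable` if the column is ever null or missing from a record.

```python
import json
from typing import Iterable, Optional


def infer_type(value: object) -> str:
    """Map one JSON value to a ClickHouse-style type name."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Int64"
    if isinstance(value, float):
        return "Float64"
    # Strings, nested objects, and arrays are all kept as String here.
    return "String"


def widen(current: Optional[str], new: str) -> str:
    """Combine the type seen so far with the type of a new value."""
    if current is None or current == new:
        return new
    if {current, new} == {"Int64", "Float64"}:
        return "Float64"
    return "String"  # incompatible types: fall back to raw text


def infer_columns(lines: Iterable[str]) -> dict[str, str]:
    """Infer column names and types from sample NDJSON lines.

    Assumes each non-blank line is a JSON object.
    """
    types: dict[str, Optional[str]] = {}
    seen: dict[str, int] = {}
    nullable: set[str] = set()
    records = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        records += 1
        for key, value in json.loads(line).items():
            seen[key] = seen.get(key, 0) + 1
            if value is None:
                nullable.add(key)
                types.setdefault(key, None)
            else:
                types[key] = widen(types.get(key), infer_type(value))
    columns = {}
    for key, col_type in types.items():
        col_type = col_type or "String"  # only ever null: default to String
        # A column missing from some records, or ever null, becomes Nullable.
        if key in nullable or seen[key] < records:
            col_type = f"Nullable({col_type})"
        columns[key] = col_type
    return columns
```

For example, if `ts` is `1` in one sample record and `2.5` in another, the sketch assigns `ts` the type `Float64`. Sampling only a few records can still guess wrong, which is why the inferred columns are best reviewed before saving the view.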
Next comes querying the logs. You can query them through our AI agent or manually in our query console, and the view behaves exactly like any other table in RunReveal. Commands like DESCRIBE TABLE, queries that group results, and ClickHouse functions all still work properly.
Because this feature is exposed as a Custom View, our chat agent and SOC automation agents needed no changes to use it. They submit queries as they normally would and are none the wiser that a completely different data access path is being used.
Pros and cons
While this is an extremely convenient way to access your data, there’s no free lunch in data engineering. These examples query a relatively small amount of data, about 30MB. Larger queries cost significantly more in bandwidth, compute, and memory.
For buckets that contain pre-indexed data, such as Parquet or ORC files, those indexes are respected. But the most common case for users is likely to be NDJSON, where the only index available is the partition key, so queries can be much slower than normal.
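To make the partition-key point concrete, here is a minimal Python sketch of partition pruning over object keys. It assumes a Hive-style `year=/month=/day=` folder layout, which is hypothetical and may not match your bucket. It shows that a date-bounded query only needs to read objects whose partition falls in range. The `prune` function and the byte sizes are illustrative only.

```python
import re
from datetime import date
from typing import Optional

# Assumed Hive-style layout: .../year=YYYY/month=MM/day=DD/<file>
PARTITION_RE = re.compile(r"year=(\d{4})/month=(\d{2})/day=(\d{2})/")


def partition_date(key: str) -> Optional[date]:
    """Extract the partition date from an object key, if it has one."""
    match = PARTITION_RE.search(key)
    if match is None:
        return None
    return date(int(match[1]), int(match[2]), int(match[3]))


def prune(
    objects: list[tuple[str, int]], start: date, end: date
) -> list[tuple[str, int]]:
    """Keep (key, size) pairs whose partition date falls in [start, end].

    Keys without a recognizable partition can't be ruled out, so they are kept.
    """
    selected = []
    for key, size in objects:
        day = partition_date(key)
        if day is None or start <= day <= end:
            selected.append((key, size))
    return selected


if __name__ == "__main__":
    objects = [
        ("logs/year=2024/month=01/day=01/a.json", 100),
        ("logs/year=2024/month=01/day=02/b.json", 200),
        ("logs/year=2024/month=02/day=01/c.json", 300),
    ]
    chosen = prune(objects, date(2024, 1, 2), date(2024, 1, 31))
    scanned = sum(size for _, size in chosen)
    total = sum(size for _, size in objects)
    print(f"reading {scanned} of {total} bytes")  # reading 200 of 600 bytes
```

Pruning only decides which files to open. Within each selected NDJSON file, every line still has to be parsed. Columnar formats like Parquet reduce that remaining work by letting a reader skip unneeded columns and, using stored statistics, whole row groups.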
RunReveal plans to support Parquet as a destination format, which will make a dent in this problem, but there’s no substitute for indexing your data and storing it in a columnar format. This feature is great when you’re in a pinch, but it isn’t likely to suit day-to-day operations on large datasets.
What’s next
We couldn’t be more excited about supporting object storage search, bringing that data into your SOC automation, and supporting so many different S3-compatible APIs. We’re huge fans of Cloudflare’s R2 offering, since it doesn’t charge egress fees and it’s very user-friendly.
But we’re most excited about unlocking huge amounts of data for our customers, and that from day one this feature is available in both the RunReveal Kubernetes on-premise offering and RunReveal Cloud.
We have more in development, especially around on-premise capabilities, so please stay tuned for what’s next.