One Step Ahead: How Harvey Uses RunReveal for Proactive Network Threat Detection

Harvey processes ~8 TB of network data every day. See how their security team uses RunReveal to run real-time threat detection at scale, and why network logs are now their first line of defense.

A collaboration between Mike Parowski, Detection and Response at Harvey, and the RunReveal team.

Harvey processes ~8 TB of network data every day. RunReveal turns it into real-time threat detection.

AT A GLANCE: Harvey replaced a reactive, retrospective approach to network security with proactive, real-time threat detection using RunReveal. By ingesting Azure flow logs directly into RunReveal, which is built on ClickHouse, and restructuring how that data is queried and stored, the security team went from drowning in raw log files to running complex detection queries in seconds. Along the way, ~8 TB of daily data now compresses down to ~350 GB.

"On the security team at Harvey, we like to say that if we do our jobs well, and if we protect this perimeter, we let all the brilliant people at Harvey do their jobs. RunReveal helps us do just that."

Mike Parowski, Detection and Response, Harvey

Harvey is rewriting what's possible in legal work — using AI and LLMs to help lawyers research, draft, and reason through complex problems faster than ever before. But legal AI comes with serious responsibility. Harvey's customers are law firms. The data they share is some of the most sensitive in the world.

When Mike Parowski joined Harvey as the company's first Detection and Response hire in April 2024, protecting that data from network threats was job one. The approach he wanted to build wasn't the standard one.

In most security programs, network logs are treated as a secondary resource: something you reach for after an endpoint alert fires, to reconstruct what happened. Mike wanted to flip that model entirely and use network logs proactively, as a first-line signal for threats that endpoint detection and response (EDR) tools miss.

The problem was the data volume.

~8 TB of daily data, and queries that timed out before they finished

Harvey's network activity is captured through Azure flow logs, which record every connection across Harvey's infrastructure: source IPs, destination IPs, ports, protocols, and whether traffic was allowed or denied. At Harvey's scale, that amounts to ~8 TB of data per day, spanning roughly 100-210K unique IP addresses and 3-5 billion rows.

And that number is only going up. "We expect this data to grow proportional to our customer base," Mike explains. "As Harvey gets more customers, this data will only go up."

The logs themselves arrive in "block blob" format in Azure storage: raw, unstructured chunks that have to be traversed before any meaningful queries can run. A single service alone generates hundreds of thousands of log entries every minute, and the file paths themselves form a complex web to navigate.

"At this volume, doing any sort of aggregate statistics and proactive detection becomes nearly impossible," Mike says.

Running queries against this data was slow. Out-of-memory errors were common. Complex queries timed out before they finished. The team was stuck reacting to incidents rather than getting ahead of them.

Getting off the back foot: RunReveal and ClickHouse as the foundation

Harvey turned to RunReveal to rebuild how their network logs were stored, queried, and acted on. RunReveal is built on ClickHouse, a high-performance columnar database designed for exactly this kind of workload — massive volumes of structured data that need to be queried fast.

RunReveal ingests Harvey's Azure flow logs directly, normalizing and enriching the raw data as it arrives. But the bigger shift was in how the data is structured for querying. Working with RunReveal, Mike's team carefully selected primary indexes around the fields they knew they'd use most: source IPs, destination IPs, and destination ports. They built materialized views to pre-compute aggregations, so common queries don't need to recalculate from raw data every time.
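To make the idea of pre-computed aggregations concrete, here is a minimal Python sketch of the principle behind a materialized view: totals are updated once, at insert time, so later queries read small running aggregates instead of rescanning raw rows. This is an illustration only, not RunReveal's or ClickHouse's implementation. The `FlowRecord` fields, the `FlowRollup` class, and its methods are all hypothetical names chosen for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowRecord(NamedTuple):
    """One flow log row. Field names are hypothetical, not Harvey's schema."""
    src_ip: str
    dst_ip: str
    dst_port: int
    bytes_sent: int


class FlowRollup:
    """Running totals keyed by (src_ip, dst_ip, dst_port), updated on insert.

    Loosely mimics what a materialized view does: the aggregation work
    happens as data arrives, not when a query runs.
    """

    def __init__(self):
        # key -> [connection_count, total_bytes]
        self._totals = defaultdict(lambda: [0, 0])

    def insert(self, record: FlowRecord) -> None:
        key = (record.src_ip, record.dst_ip, record.dst_port)
        totals = self._totals[key]
        totals[0] += 1
        totals[1] += record.bytes_sent

    def connections_to_port(self, dst_port: int) -> int:
        """Answer from the rollup only; raw rows are never rescanned."""
        return sum(
            count
            for (_, _, port), (count, _) in self._totals.items()
            if port == dst_port
        )

    def top_talkers(self, n: int = 3):
        """Source IPs ranked by total bytes sent, largest first."""
        by_src = defaultdict(int)
        for (src, _, _), (_, nbytes) in self._totals.items():
            by_src[src] += nbytes
        return sorted(by_src.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Keying the rollup on source IP, destination IP, and destination port mirrors the fields the team chose to index on. The trade-off is the usual one: a little more work per insert in exchange for queries that touch far less data.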

"By really understanding how we would use these logs, and by understanding the domain itself, we could carefully select the indexes we'd use in the materialized views to drive those efficiency gains," Mike says.

The results were immediate. Queries that used to time out now return in seconds. The ~8 TB of daily raw data compresses down to ~350 GB without losing the fidelity the team needs for investigation. Complex detection logic that was previously impractical to run continuously now runs in real time.

From raw IPs to readable context

Raw network logs are full of IP addresses. Thousands of them. On their own, they're hard to interpret quickly, especially when you're trying to spot an anomaly in the middle of an incident.

Harvey's security team built enrichment workflows on top of RunReveal that map raw IP addresses to human-readable labels: known services, internal infrastructure, flagged external addresses. Instead of scanning a stream of numbers, the security team now gets a clear, high-level view of network activity as it unfolds.
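As a rough sketch of what this kind of enrichment can look like, the Python below maps an IP address to a label by matching it against a table of CIDR ranges, preferring the most specific match. It uses only the standard-library `ipaddress` module. The ranges, the labels, and the `label_ip` helper are hypothetical and are not Harvey's actual mappings or workflow.

```python
import ipaddress

# Hypothetical label table. A real one would likely be sourced from an
# asset inventory, cloud metadata, or a threat intelligence feed.
LABELED_NETWORKS = [
    ("10.0.1.0/24", "internal: api-gateway"),
    ("10.0.0.0/8", "internal: other"),
    ("203.0.113.0/24", "external: flagged"),
]

# Parse once and order by prefix length so the most specific range wins.
_PARSED = sorted(
    ((ipaddress.ip_network(cidr), label) for cidr, label in LABELED_NETWORKS),
    key=lambda item: item[0].prefixlen,
    reverse=True,
)


def label_ip(ip: str) -> str:
    """Return a human-readable label for an IP, or a default if unmatched."""
    addr = ipaddress.ip_address(ip)
    for network, label in _PARSED:
        if addr in network:
            return label
    return "external: unknown"
```

With this table, `label_ip("10.0.1.5")` resolves to the gateway label rather than the broader internal range, because the `/24` is checked before the `/8`.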

"Combining these fields lets us effectively answer the question of who is doing what," Mike says, "and gives us quantifiable metrics to compare today to yesterday to the day before."

That longitudinal comparison is the core of the proactive approach. Anomalies become visible immediately: unexpected traffic spikes, unusual port activity, new external destinations appearing in the flow. The team isn't sifting through raw logs to find what went wrong. They're watching for patterns that point to real security concerns before those concerns escalate.
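One simple way to express a day-over-day comparison is sketched below in Python. It assumes connection counts have already been aggregated per destination label and port for each day, then flags destinations that are new today or whose volume jumped sharply. The `find_anomalies` function, its thresholds, and the sample labels are illustrative assumptions, not Harvey's detection logic.

```python
def find_anomalies(yesterday, today, spike_ratio=3.0, min_count=100):
    """Compare two {(destination_label, dst_port): connection_count} dicts.

    Flags:
      - "new":   a destination/port pair seen today but not yesterday
      - "spike": today's count is at least spike_ratio times yesterday's
                 and at least min_count (to ignore low-volume noise)
    Thresholds are arbitrary example values.
    """
    findings = []
    for key, count in sorted(today.items()):
        baseline = yesterday.get(key)
        if baseline is None:
            findings.append(("new", key, count))
        elif count >= min_count and count >= spike_ratio * baseline:
            findings.append(("spike", key, count))
    return findings


# Hypothetical daily rollups.
yesterday = {
    ("internal: api-gateway", 443): 1000,
    ("external: unknown", 22): 50,
}
today = {
    ("internal: api-gateway", 443): 1100,
    ("external: unknown", 22): 400,
    ("external: flagged", 4444): 7,
}

for kind, (label, port), count in find_anomalies(yesterday, today):
    print(f"{kind}: {label} port {port} ({count} connections)")
```

A fixed ratio against a single prior day is the bluntest possible baseline. Comparing against several previous days, as the quote above suggests, would reduce false positives from ordinary daily variation.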

Detection that stays ahead of growth

Harvey is growing fast, and the threat surface grows with it. More customers means more network activity, more data to protect, more potential vectors to monitor.

RunReveal and ClickHouse give Harvey’s security team the infrastructure to scale alongside that growth. Pre-computed aggregations mean the team spends time acting on signals, not waiting for them. Enriched context means fewer dead ends during investigation. And the underlying architecture means that as the data volume increases, the detection capability doesn't degrade.

The philosophy Mike built Harvey's security program around has held up: network logs, used proactively and at scale, can catch threats that endpoint detection misses.

To learn more about RunReveal and see how it can improve your team's detection and response capabilities, request a demo.

The original version of this article was published on ClickHouse's website under the title "One step ahead: How Harvey uses ClickHouse for proactive threat detection", written by Mike Parowski. To learn more about ClickHouse, please check out their site.