Why Security Logging and Monitoring Is So Hard to Get Right
Security teams hoard data, fight slow search, and overspend on fragmented stacks. Learn the real trade-offs behind security logging and how to build a stack that actually works.
If you ask ten CISOs what keeps them up at night, logging and monitoring will come up almost every time. It's not a new problem — it's a persistent one. And despite decades of tooling, budgets, and engineering effort thrown at it, most security teams are still struggling. Here's why.
The psychology of a security team
To understand the problem, you first have to understand how security teams think. They rarely know which logs are going to matter. Every alert, every incident, every breach investigation comes with the haunting possibility that the critical evidence was a log they never thought to collect.
This creates a hoarding mentality — and it's completely rational. Companies are dynamic; their infrastructure changes constantly, and with it, the logs that matter change too. New services spin up, old ones retire, third-party tools get added to the stack. The result is a never-ending list of log sources you're supposed to be monitoring. As soon as you get one under control, three more appear.
The data problem nobody warned you about
Once you commit to collecting everything, you end up with a genuinely unique data engineering challenge. Security teams sit on enormous volumes of data — data they need to search quickly during an incident, but also data they're legally required to retain for a year or more to satisfy compliance requirements.
Think about what that means operationally. You need to be able to run a search across twelve months of log data in near real-time when an incident hits. At the same time, you need to store that data cost-effectively because the volume is massive. These two requirements pull in opposite directions, and the way teams try to reconcile them is where things get messy.
The indexing challenges
The most common approach to fast search is indexing — pre-computing data structures at ingest time so queries can find relevant data quickly instead of scanning everything raw. Done well, indexing is powerful. Done poorly, it's a foot gun.
The challenge is that building and managing indexes at scale is expensive and full of surprises. Time-based indexes seem obvious, but what about other dimensions? You might assume a field is low cardinality and worth indexing — only to discover later it has millions of unique values, making your index massive, slow, and costly to maintain. Every new log source is a new opportunity for your indexing strategy to break down in unexpected ways.
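One way to catch the cardinality trap before committing to an index is to sample a log source and measure unique-value ratios up front. A minimal sketch, assuming dict-shaped events; the field names and sample size are hypothetical, not any particular SIEM's API:

```python
from collections import Counter

def cardinality_report(events, fields, sample_size=10_000):
    """Estimate per-field cardinality from a sample of log events.

    Returns, for each field, the ratio of unique values to sampled
    events. A ratio near 1.0 means nearly every event has a distinct
    value -- a poor indexing candidate.
    """
    counters = {f: Counter() for f in fields}
    sample = events[:sample_size]
    for event in sample:
        for f in fields:
            if f in event:
                counters[f][event[f]] += 1
    n = len(sample)
    return {f: len(c) / n for f, c in counters.items()}

# Hypothetical data: 'status' has 3 values; 'request_id' is unique per event.
events = [{"status": i % 3, "request_id": i} for i in range(1000)]
report = cardinality_report(events, ["status", "request_id"])
# report["status"] is 0.003 (index-friendly); report["request_id"] is 1.0 (trap).
```

Running a check like this against a day's sample won't guarantee the field stays low-cardinality forever, but it turns "we assumed" into "we measured."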
And the compute cost compounds. Every byte of data you ingest has to be processed through your indexing pipeline. The more data you collect (and security teams collect a lot), the more that bill grows.
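To see how linearly that bill scales, a back-of-envelope model helps; the daily volume and per-GB rate below are invented for illustration, not vendor pricing:

```python
def monthly_index_cost(gb_per_day, cost_per_gb_indexed, days=30):
    """Indexing cost scales linearly with ingest: every byte collected
    is a byte that must pass through the indexing pipeline."""
    return gb_per_day * days * cost_per_gb_indexed

# Assumed figures: 500 GB/day at $0.25 per GB indexed.
cost = monthly_index_cost(gb_per_day=500, cost_per_gb_indexed=0.25)
# cost == 3750.0 -- and doubling the log sources doubles it.
```

The point is not the specific numbers but the shape of the curve: there is no economy of scale at ingest, so "collect everything" is a standing commitment to a growing bill.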
The stack sprawl problem
So, security teams start getting creative. Instead of putting everything in one place and indexing it, they split their data across multiple systems. Hot data goes into the primary SIEM for fast querying. Cold data gets dumped into S3 for cheap storage. Historical searches run through Athena or BigQuery when someone absolutely needs to go back that far.
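The hot/cold split above amounts to an age-based router in front of the pipeline. A minimal sketch; the tier names, the 30-day hot window, and the function shape are all illustrative assumptions, not any particular vendor's API:

```python
from datetime import datetime, timedelta, timezone

HOT_RETENTION_DAYS = 30  # assumption: the SIEM keeps 30 days "hot"

def route_event(event_time, now=None):
    """Pick a storage tier for a log event based purely on its age.

    'siem' and 's3_archive' are placeholders for whatever hot store
    and object store a given stack actually uses.
    """
    now = now or datetime.now(timezone.utc)
    age = now - event_time
    return "siem" if age <= timedelta(days=HOT_RETENTION_DAYS) else "s3_archive"

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recent = route_event(datetime(2024, 5, 20, tzinfo=timezone.utc), now)  # "siem"
old = route_event(datetime(2024, 1, 1, tzinfo=timezone.utc), now)      # "s3_archive"
```

The router itself is trivial; everything painful lives downstream of it, once a single investigation needs both tiers at the same time.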
The problem? Now you've built a distributed system out of mismatched tools that weren't designed to work together. When an incident requires correlating log sources across multiple systems, you're doing data engineering in the middle of a crisis. Joins are slow, schemas don't match, and your team is copy-pasting queries between three different interfaces while the clock ticks.
The duplication and fragmentation that was supposed to save money ends up costing you in a different currency: speed, clarity, and analyst sanity.
The fundamental trade-off we should talk about honestly
Here's the uncomfortable truth: every architecture decision is a trade-off. There are no free lunches in security data management.
When a vendor promises lightning-fast search, that speed comes from aggressive indexing at ingest time. That indexing costs money and compute. When another vendor promises rock-bottom pricing, they're almost certainly just dumping your logs into object storage and making you pay the performance penalty at query time. Searches will be slow (very slow).
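That trade-off can be made concrete with a toy cost model: one profile pays at ingest (indexing), the other pays at query time (scanning everything because nothing is indexed). All rates and volumes below are assumptions for illustration only:

```python
def monthly_cost(ingest_gb, scanned_gb, ingest_rate, scan_rate):
    """Total monthly cost = pay-at-ingest + pay-at-query."""
    return ingest_gb * ingest_rate + scanned_gb * scan_rate

INGEST_GB = 15_000  # assumed: ~500 GB/day for a month

# "Fast search" profile: expensive indexing, queries touch little data.
fast_search = monthly_cost(INGEST_GB, scanned_gb=2_000,
                           ingest_rate=0.50, scan_rate=0.001)

# "Cheap storage" profile: near-free ingest, but every query scans
# the full dataset -- here, 10 investigations x 15 TB each.
cheap_storage = monthly_cost(INGEST_GB, scanned_gb=150_000,
                             ingest_rate=0.02, scan_rate=0.005)
# fast_search ~ 7502; cheap_storage ~ 1050 in dollars, but the second
# profile's real price is paid in query latency during an incident.
```

Neither profile is wrong; the mistake is buying one while believing you bought the other.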
The sweet spot — affordable storage with genuinely usable search — requires a thoughtful, balanced approach. It requires understanding your data, knowing what you're optimizing for, and choosing a system flexible enough to handle both ends of the spectrum without punishing you for it.
Practical steps that actually help
A few principles from my experience that help cut through the noise:
- Simplify your stack: The more tools you have talking to each other, the more places things break. The tools in your average security stack were built by thousands of engineers who never coordinated their design decisions. Hoping that a patchwork of integrations holds together under incident pressure is a gamble you'll eventually lose.
- Learn to say no: Not every log source deserves a place in your pipeline. The discipline of saying "we don't need this" comes from having a clear picture of your actual business risks. When you know what you're protecting and why, you can make principled decisions about what data matters — instead of collecting everything and searching for meaning later.
- Understand your trade-offs before you buy: The worst outcome is spending a year's budget on a tool, getting it deployed, and then realizing it can't do what you actually need. Ask hard questions about where the costs live. Is this fast because it's expensive to ingest? Is this cheap because search is terrible? Get honest answers before you sign anything.
The bottom line
Most security teams just want their data stack to work. They buy tools, deploy them, and then gradually discover that what they built doesn't scale, doesn't search fast enough, and costs more than anyone budgeted. It's a pattern that repeats itself because the problem is genuinely hard — and because the marketing often obscures the real trade-offs.
Getting logging and monitoring right starts with understanding the constraints: you're managing a massive, dynamic dataset that has to be simultaneously cheap to store and fast to search, for compliance periods that stretch a year or more. No single tool erases that tension. But knowing it exists — and choosing your stack with eyes open — puts you in a much better position than most teams start from.