How We Built Our Model-Agnostic Agent for Log Analysis

Learn how RunReveal built a model-agnostic AI agent for security log analysis. From LLM reasoning to tool calling: an under-the-hood look at our architecture.

When Evan and I started RunReveal in 2023, we knew we wanted to build a data platform that would work well with AI. However, we thought that we'd eventually need to spend months doing feature engineering work to extract relevant fields from structured logs and train our own foundational models, using a classic machine learning development approach with custom datasets.

We were wrong.

We had no idea that large language models (LLMs), combined with reasoning capabilities, would be this effective at security log analysis when given proper context and prompts. We knew that representing logs as structured data would be essential for AI success, but we were still surprised by how well general models performed without custom training.

We’re getting real transparent today: Here's how we built our AI agent and what we learned about making AI actually work with security data.

How LLM inference actually works

Every LLM request follows the same pattern: send complete conversation history plus new message, receive generated response. No persistent memory between sessions—just a stateless function processing entire conversations from scratch each time.

When a user asks the RunReveal AI agent to analyze failed logins, the system doesn't send just that question; it sends the full conversation: initial system prompt explaining the agent's role, every previous question and answer, and all tool calls and results. Ten queries deep means sending a novel-length prompt for each new request.

func (a *Agent) SendMessage(ctx context.Context, message string) (string, error) {
    // Add user message to history
    a.messageHistory = append(a.messageHistory, llms.TextParts(llms.ChatMessageTypeHuman, message))

    // Generate response with ENTIRE conversation history
    resp, err := a.llm.GenerateContent(ctx, a.messageHistory, llms.WithTools(a.tools))
    if err != nil {
        return "", fmt.Errorf("error generating response: %w", err)
    }

    // Add AI response to history for next request
    if len(resp.Choices) > 0 {
        response := resp.Choices[0].Content
        a.messageHistory = append(a.messageHistory, llms.TextParts(llms.ChatMessageTypeAI, response))
        return response, nil
    }

    return "", fmt.Errorf("no response choices returned")
}

This architecture makes context window management critical. Each tool call consumes tokens from your limit. Complex investigations hit context limits mid-stream. The solution requires designing conversation flows that compress information as investigations progress—sometimes truncating early context to preserve working memory. We're still working on optimizing this across different providers, each with varying context window sizes and token counting methods.
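
To make that concrete, here's a minimal sketch of one truncation strategy, assuming a crude characters-per-token estimate and a keep-the-most-recent-messages policy. It's an illustration of the idea, not our production logic.

// truncateHistory keeps the system prompt plus as many of the most recent
// messages as fit in a rough token budget, dropping older turns first.
func truncateHistory(history []llms.MessageContent, maxTokens int) []llms.MessageContent {
    if len(history) == 0 {
        return history
    }

    // Crude estimate: roughly four characters per token.
    estimate := func(m llms.MessageContent) int {
        total := 0
        for _, part := range m.Parts {
            if text, ok := part.(llms.TextContent); ok {
                total += len(text.Text) / 4
            }
        }
        return total
    }

    // Always keep the system prompt at index 0, then walk backwards from the
    // newest message until the budget is spent.
    budget := maxTokens - estimate(history[0])
    var kept []llms.MessageContent
    for i := len(history) - 1; i >= 1; i-- {
        cost := estimate(history[i])
        if budget < cost {
            break
        }
        budget -= cost
        kept = append([]llms.MessageContent{history[i]}, kept...)
    }
    return append([]llms.MessageContent{history[0]}, kept...)
}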

Why reasoning models matter

Not all LLMs solve problems the same way. Traditional models like GPT-3.5 Turbo generate immediate responses through pattern matching. They're fast and cheap, but they struggle with multi-step reasoning.

Reasoning models like Claude 3.5 Sonnet and GPT-4o work differently. They decompose complex problems step-by-step and self-correct initial assumptions. When analyzing "failed logins followed by successful ones from the same IP within five minutes," they mentally break down the task: identify failed authentication events, find successful events, establish temporal relationships, and correlate by source IP.
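
As a rough illustration of what that decomposition turns into, here's the shape of query a reasoning model might construct. The dialect, table, and column names are assumptions for the sketch, not an actual RunReveal schema.

// Hypothetical query for "failed logins followed by successful ones from the
// same IP within five minutes"; schema and dialect are illustrative only.
const suspiciousAuthQuery = `
SELECT f.srcIP,
       f.eventTime AS failedAt,
       s.eventTime AS succeededAt
FROM auth_events AS f
JOIN auth_events AS s
  ON s.srcIP = f.srcIP                               -- correlate by source IP
WHERE f.outcome = 'failure'                          -- failed authentication events
  AND s.outcome = 'success'                          -- successful events
  AND s.eventTime > f.eventTime                      -- temporal relationship:
  AND s.eventTime <= f.eventTime + INTERVAL 5 MINUTE -- within five minutes
ORDER BY f.eventTime`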

The difference becomes stark with sophisticated queries. "Show me 404 errors from the last hour" works fine with any model; "Identify potential credential stuffing by finding patterns of failed logins across multiple accounts followed by successful authentications" separates reasoning models from pattern-matching ones. One-shot models generate plausible SQL that misses the logical connections, while reasoning models construct queries that actually answer the intended question.

For security investigations, confidently wrong analysis can cause more damage than admitting uncertainty. Reasoning models cost more and respond slower, but they build the reliability security workflows demand.

Building for provider independence

We avoided single-vendor lock-in from day one partly because the AI landscape changed too rapidly in 2023-2024 to bet on one provider. LangChain Go became our abstraction layer, creating unified interfaces for OpenAI, Anthropic, Google AI, and AWS Bedrock.

Despite the abstraction layer, supporting multiple providers requires significant engineering effort. Each has different authentication mechanisms, tool-calling APIs, and capability limitations that need specific workarounds.

switch cfg.Provider {
case "openai":
    llmModel, err = openai.New(
        openai.WithToken(apiKey),
        openai.WithModel(cfg.Model),
    )
case "anthropic":
    llmModel, err = anthropic.New(
        anthropic.WithToken(apiKey),
        anthropic.WithModel(cfg.Model),
    )
// ... additional providers
}

Each provider has implementation quirks—Google AI needs JSON-to-Go map conversion, Bedrock requires IAM role assumptions. We accepted this complexity because it keeps our customers flexible as the AI landscape evolves and prevents vendor dependency from constraining product decisions.

Tool calling: Implementation with LangChain Go

Tool calling enables LLMs to execute functions and retrieve data in real-time during conversations. Instead of describing what authentication failure queries might look like, the AI constructs SQL, executes it against actual log databases, analyzes results, and presents actionable insights. Here's how we implemented this with LangChain Go:

tools := []llms.Tool{
    {
        Type: "function",
        Function: &llms.FunctionDefinition{
            Name:        "LogsQueryV3",
            Description: "Query security logs using SQL",
            Parameters: map[string]any{
                "type": "object",
                "properties": map[string]any{
                    "query": map[string]any{
                        "type":        "string",
                        "description": "SQL query to execute",
                    },
                },
            },
        },
    },
}
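
The definition above only advertises the tool; the agent still has to detect tool calls in the model's response, execute them, and feed the results back into the conversation. Here's a minimal sketch of that loop, assuming langchaingo's ToolCall and ToolCallResponse content parts and a hypothetical executeTool helper.

// Sketch of handling one round of tool calls; executeTool is a hypothetical
// helper that runs the named tool with the model-supplied JSON arguments.
choice := resp.Choices[0]
for _, tc := range choice.ToolCalls {
    // Record the assistant's tool call so the provider sees it on the next turn.
    a.messageHistory = append(a.messageHistory, llms.MessageContent{
        Role:  llms.ChatMessageTypeAI,
        Parts: []llms.ContentPart{tc},
    })

    // Execute the tool (e.g. LogsQueryV3) with the arguments the model produced.
    result, err := executeTool(ctx, tc.FunctionCall.Name, tc.FunctionCall.Arguments)
    if err != nil {
        result = fmt.Sprintf("tool error: %v", err)
    }

    // Feed the result back as a tool-response message, keyed by the call ID.
    a.messageHistory = append(a.messageHistory, llms.MessageContent{
        Role: llms.ChatMessageTypeTool,
        Parts: []llms.ContentPart{llms.ToolCallResponse{
            ToolCallID: tc.ID,
            Name:       tc.FunctionCall.Name,
            Content:    result,
        }},
    })
}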

Tool calling quality varies significantly across providers. In our experience across millions of interactions, OpenAI's function calling rarely produces malformed calls, and Anthropic excels at chaining multiple tools together. Google works reliably but sometimes needs parameter validation. Open-source models can be inconsistent, with missing parameters or incorrect formatting.
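
Because of that variability, it pays to validate tool-call arguments before executing anything. Here's a minimal sketch of what validating LogsQueryV3 arguments could look like; the struct and helper are assumptions for illustration, not our actual code.

// logsQueryArgs mirrors the LogsQueryV3 parameter schema shown above;
// validating here catches missing or malformed arguments from weaker models.
type logsQueryArgs struct {
    Query string `json:"query"`
}

func parseLogsQueryArgs(raw string) (logsQueryArgs, error) {
    var args logsQueryArgs
    if err := json.Unmarshal([]byte(raw), &args); err != nil {
        return args, fmt.Errorf("malformed tool arguments: %w", err)
    }
    if strings.TrimSpace(args.Query) == "" {
        return args, fmt.Errorf("missing required parameter: query")
    }
    return args, nil
}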

To scale the work of adding and maintaining a growing list of tools, we use deterministic code generation with go generate, not LLMs, to produce the tool-calling code directly from our API endpoints. This approach gives us precise control over parameter validation and error handling, and it enables cleaner authentication patterns that inherit user permissions seamlessly.
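
As a rough sketch of that pattern (the generator path and file names below are placeholders, not our actual tooling), the wiring looks something like this:

// Placeholder go:generate directive: a small internal generator reads the API
// route definitions and writes one llms.Tool definition per endpoint.
//go:generate go run ./internal/cmd/toolgen -routes ./server/routes.go -out tools_gen.go

// The generated file then exposes the full tool list to the agent, e.g.:
//
//	func generatedTools() []llms.Tool { /* one llms.Tool per API endpoint */ }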

Authentication and authorization: Inheriting user context

The RunReveal AI agent inherits the exact authentication context of requesting users. This sounds simple but fundamentally changes security models compared to systems where AI runs as separate services with independent permissions.

When the RunReveal AI executes queries, it uses precisely the same access controls as if users wrote the SQL themselves. No separate AI service accounts. No permission escalation risks. No complex RBAC rules for AI behavior. The agent proxies user HTTP requests with authentication headers intact.

func executeHTTPToolCall(
    ctx context.Context,
    server *Server,
    method, path string,
    authHeader string,
    args json.RawMessage,
    originalReq *http.Request,
) (string, error) {
    // Tool arguments become the body of the proxied request
    proxyReq, err := http.NewRequestWithContext(ctx, method, path, bytes.NewReader(args))
    if err != nil {
        return "", err
    }

    // Copy authentication headers from original request
    if originalReq != nil {
        for k, v := range originalReq.Header {
            if k != "Content-Length" && k != "Accept" {
                proxyReq.Header[k] = v
            }
        }
    }

    // Execute with user's exact permissions and capture the response
    recorder := httptest.NewRecorder()
    server.router.ServeHTTP(recorder, proxyReq)
    return recorder.Body.String(), nil
}

This eliminates entire categories of security concerns that plague AI systems added to products as an afterthought. No risk of AI accessing unauthorized data. No duplicate authorization logic for AI workflows. No complex mapping between user permissions and AI capabilities. Audit logs automatically capture appropriate user context because AI literally acts as that user.

The elegance shows in multi-tenant environments. A security analyst with access to a specific business unit's logs asks the AI to investigate suspicious activity. Queries automatically respect those boundaries because the AI inherits identical access controls. No additional configuration. No cross-tenant data leakage risk. No separate permission management overhead.

Real investigation workflows

Reasoning models plus robust tool calling plus proper authentication context enable investigation workflows that seemed impossible just a few years ago.

Example: "Have we seen unusual authentication patterns in the last 24 hours?"

The reasoning model breaks this down: unusual could mean failed logins, new locations, off-hours access, or dormant account activity. It starts broad, then narrows based on findings. Its first query examines authentication events with above-normal failure rates.
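
For instance, that first pass might look something like the query below; the schema and thresholds are assumptions for illustration, not what the agent actually generates.

// Hypothetical first-pass query: accounts with an unusually high share of
// failed authentications over the last 24 hours. Illustrative only.
const failureRateQuery = `
SELECT actor,
       SUM(CASE WHEN outcome = 'failure' THEN 1 ELSE 0 END) AS failedAttempts,
       COUNT(*) AS totalAttempts
FROM auth_events
WHERE eventTime > now() - INTERVAL 24 HOUR
GROUP BY actor
HAVING failedAttempts >= 10
   AND failedAttempts > totalAttempts * 0.5
ORDER BY failedAttempts DESC`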

Results show elevated failures for several accounts, mostly routine password expiration issues. But one account shows failures from multiple IP addresses followed by a successful login from a completely different location.

This triggers follow-up queries examining the geographical distribution of authentication attempts. The AI notices failed attempts from residential broadband networks across different countries, while the successful login originated from the user's typical office location.

This pattern suggests credential stuffing or brute-force attempts that succeeded through other means, such as a compromised VPN or lateral network movement. The reasoning model identified it by following logical investigation steps, executing appropriate queries, and synthesizing results across data sources. Without explicit programming for this attack pattern, the RunReveal AI agent conducted the investigation within the analyst's existing access controls and left a complete audit trail of its decision-making.

Try it yourself

RunReveal's AI agent works with your preferred LLM provider: OpenAI for cutting-edge capabilities, Anthropic for reliable reasoning, or AWS Bedrock for compliance requirements. We're unopinionated about which model you choose or how you deploy it.

Want to give it a go for yourself? Sign up for RunReveal for free and see how AI-native architecture transforms complex analytical workflows into intuitive conversations.

This blog post is part of our #RunRevealWeek, where our team is sharing product announcements, behind-the-scenes looks at how we approach problems, and the unique features that make us, us.