News Featured

Introducing pql, a pipelined query language that compiles to SQL (written in Go).

Evan Johnson

27 Feb 2024 • 5 min read

Today we're open-sourcing pql under the Apache 2.0 license and announcing that all RunReveal customers can use pql to query their logs. We built pql because the major security vendors use proprietary languages as a source of vendor lock-in and there were no open-source alternatives. pql is SQL agnostic, allows utilization of underlying database functions, and can easily be added to existing products that support SQL.

Here's a link to the code, and the language's website:

Let's start with an example pql statement query, which uses the underlying Clickhouse function toIntervalMinute , to look for how many logs were created in the past 15 minutes.

cloudtrail_logs 
	| where eventTime > minus(now(), toIntervalMinute(15))
	| count

The pql translator will parse and lex this statement, validate the syntax, and translates it to the following SQL statement:

WITH "subquery0" AS (
	SELECT * 
	FROM "runreveal_logs"
	WHERE "eventTime" > minus(now(), toIntervalMinute(15))
)
SELECT COUNT(*) FROM "subquery0";

But why not just use SQL? SQL is still the gold "standard" of query languages despite turning 50 years old this year, it should be good enough right?

Why pql?

SQL has it's share of problems. When we started RunReveal we very quickly heard that SQL was not the language that most security teams used to search their security logs.

One challenge with SQL is that adding a GROUP, AGGREGATE, JOIN, subquery, etc all come with large structural changes or major re-factoring. Piped query languages more naturally embraces the Unix philosophy of a simple interface that builds upon the previous command, similar to the way awk and jq is normally used.

The largest SIEM vendors offer proprietary languages that are flexible, powerful, and simpler than SQL. Here's an example query using Splunk Processing Language from Splunk's docs:

sourcetype=access_* status=200 action=purchase
  | top categoryId

Other vendors followed suit. Notably, Sumologic and Azure Sentinel have their own variations following a similar syntax.

The major open source databases have no interface that is comparable to these proprietary piped query languages, but nearly every enterprise SIEM has their own language. Enterprises today have trained their security teams for decades to use these languages, and it is a major contributor to vendor lock-in. Not only do companies need to figure out how to move all their data, but they'll need to learn a new language, and re-train their analysts.

RunReveal built PQL to provide an open-source alternative to piped languages wherever SQL is supported. This not only allows security teams to minimize re-training costs but it allows open-source tooling to support piped query languages without needing to invent one.

Building a pql query

The syntax of a pql statements is inspired by Microsoft's KQL. Your query starts with the table name that you want to query, and in this below example it would query the entire table.

logs

To filter the results down you might use the where operator. The where operator will validate that the syntax is valid, but it will pass unknown function calls through to the underlying database. In RunReveal's case, we use Clickhouse under the hood, so if we wanted to do a case-insensitive match we could still use Clickhouse's lower function.

logs
	| where lower(eventName) == "decrypt"

The same is true of functions available to postgres or any other databases. These functions can also be used when 'projecting' or specifying what columns you want to see.

logs
  | where lower(eventName)=="decrypt"
  | project upperEventName=upper(eventName), eventName

These examples are basic but you can really see the power of this type of language while using an operator like summarize, which manages grouping and aggregation. In this query we're able to group all of the eventNames together using the Clickhouse function groupUniqArray by their associated sourceType. This works the same way with ARRAY_AGG and postgres.

logs
	| where eventName != ''
	| summarize AllEventNames=groupUniqArray(eventName) by sourceType

The results are one column called sourceType (a string) and another column called AllEventNames which is an Array(String). Currently, the supported pql operators are:

as
count
join
project
sort/order
summarize
take/limit
top
where

These operators should function nearly identically to the way kql functions, and over-time we will likely implement more, but today these operators make for a powerful query language when paired with the functionality of your underlying database.

How does pql work?

The current conversation in tech is dominated by AI/LLMs but lexers, parsers, and old-school computer science will never go out of style. The PQL go library has a function called Compile that takes in a pql query as a string and returns your SQL query as a string (plus an error, if applicable)

Here's an example program that does a simple translation.

package main

import (
	"github.com/runreveal/pql"
)

func main() {
	sql, err := pql.Compile("users | project id, email | limit 5")
	if err != nil {
		panic(err)
	}
	println(sql)
}

When you run this program it will print out the following SQL!

:) go run test.go

WITH "__subquery0" AS (SELECT "id" AS "id", "email" AS "email" FROM "users")
SELECT * FROM "__subquery0" LIMIT 5;

The Compile(string) (string, error) function is what you'll call to perform the translation. This function calls the parser.Parse method which will parse the string into an AST and validate the PQL syntax based on the definitions stored in ast.go.

Parsing the queries into an AST is only half the work. The AST still needs to be converted to a SQL query and to perform this conversion, each tabular expression is chained together, generally with each subsequent query referencing the previous subquery's name. You can see this in the above example, with __subquery0 selecting from the table and querying the desired columns, while the main query references __subquery0 and limiting it to only 5 results.

This method of chaining subqueries together has the same effect as piping each query into the next one, despite the compiled SQL syntax looking unusual.

What's next?

We want pql to become the de-facto query language for security professionals everywhere, and intend to support it as such. Now there's one fewer reason to be stuck on Splunk, Sentinel, or Sumologic but regardless we hope the interface is useful to security teams and developers alike; which is why we licensed it under the Apache 2.0 license.

At RunReveal we're at day one of building the platform for security data to ensure no breach goes undetected. We're going to be blogging a lot over the next 2 months, so you should sign up and stay tuned.

You can find pql on github and at https://pql.dev.

Acknowledgements: To build pql RunReveal partnered with Ross Light, who built the project over the course of February 2024. We really enjoyed working with them and hope to again for pql maintenance and further contributions!

If you want to get rid of Splunk or Sumologic, get in contact with us below:

Why pql?

Building a pql query

How does pql work?

What's next?

Sign up for more like this.