Introducing pql, a pipelined query language that compiles to SQL (written in Go).
Today we're open-sourcing pql
under the Apache 2.0 license and announcing that all RunReveal customers can use pql
to query their logs. We built pql
because the major security vendors use proprietary languages as a source of vendor lock-in and there were no open-source alternatives. pql
is SQL agnostic, allows utilization of underlying database functions, and can easily be added to existing products that support SQL.
Here's a link to the code, and the language's website:
Let's start with an example pql
statement query, which uses the underlying Clickhouse function toIntervalMinute
, to look for how many logs were created in the past 15 minutes.
cloudtrail_logs
| where eventTime > minus(now(), toIntervalMinute(15))
| count
The pql translator will parse and lex this statement, validate the syntax, and translates it to the following SQL statement:
WITH "subquery0" AS (
SELECT *
FROM "runreveal_logs"
WHERE "eventTime" > minus(now(), toIntervalMinute(15))
)
SELECT COUNT(*) FROM "subquery0";
But why not just use SQL? SQL is still the gold "standard" of query languages despite turning 50 years old this year, it should be good enough right?
Why pql?
SQL has it's share of problems. When we started RunReveal we very quickly heard that SQL was not the language that most security teams used to search their security logs.
One challenge with SQL is that adding a GROUP
, AGGREGATE
, JOIN
, subquery, etc all come with large structural changes or major re-factoring. Piped query languages more naturally embraces the Unix philosophy of a simple interface that builds upon the previous command, similar to the way awk
and jq
is normally used.
The largest SIEM vendors offer proprietary languages that are flexible, powerful, and simpler than SQL. Here's an example query using Splunk Processing Language from Splunk's docs:
sourcetype=access_* status=200 action=purchase
| top categoryId
Other vendors followed suit. Notably, Sumologic and Azure Sentinel have their own variations following a similar syntax.
The major open source databases have no interface that is comparable to these proprietary piped query languages, but nearly every enterprise SIEM has their own language. Enterprises today have trained their security teams for decades to use these languages, and it is a major contributor to vendor lock-in. Not only do companies need to figure out how to move all their data, but they'll need to learn a new language, and re-train their analysts.
RunReveal built PQL to provide an open-source alternative to piped languages wherever SQL is supported. This not only allows security teams to minimize re-training costs but it allows open-source tooling to support piped query languages without needing to invent one.
Building a pql query
The syntax of a pql
statements is inspired by Microsoft's KQL. Your query starts with the table name that you want to query, and in this below example it would query the entire table.
logs
To filter the results down you might use the where
operator. The where operator will validate that the syntax is valid, but it will pass unknown function calls through to the underlying database. In RunReveal's case, we use Clickhouse under the hood, so if we wanted to do a case-insensitive match we could still use Clickhouse's lower
function.
logs
| where lower(eventName) == "decrypt"
The same is true of functions available to postgres or any other databases. These functions can also be used when 'projecting' or specifying what columns you want to see.
logs
| where lower(eventName)=="decrypt"
| project upperEventName=upper(eventName), eventName
These examples are basic but you can really see the power of this type of language while using an operator like summarize
, which manages grouping and aggregation. In this query we're able to group all of the eventNames together using the Clickhouse function groupUniqArray
by their associated sourceType. This works the same way with ARRAY_AGG
and postgres.
logs
| where eventName != ''
| summarize AllEventNames=groupUniqArray(eventName) by sourceType
The results are one column called sourceType (a string) and another column called AllEventNames
which is an Array(String)
. Currently, the supported pql
operators are:
These operators should function nearly identically to the way kql functions, and over-time we will likely implement more, but today these operators make for a powerful query language when paired with the functionality of your underlying database.
How does pql work?
The current conversation in tech is dominated by AI/LLMs but lexers, parsers, and old-school computer science will never go out of style. The PQL go library has a function called Compile
that takes in a pql query as a string and returns your SQL query as a string (plus an error, if applicable)
Here's an example program that does a simple translation.
package main
import (
"github.com/runreveal/pql"
)
func main() {
sql, err := pql.Compile("users | project id, email | limit 5")
if err != nil {
panic(err)
}
println(sql)
}
When you run this program it will print out the following SQL!
:) go run test.go
WITH "__subquery0" AS (SELECT "id" AS "id", "email" AS "email" FROM "users")
SELECT * FROM "__subquery0" LIMIT 5;
The Compile(string) (string, error)
function is what you'll call to perform the translation. This function calls the parser.Parse
method which will parse the string into an AST and validate the PQL syntax based on the definitions stored in ast.go
.
Parsing the queries into an AST is only half the work. The AST still needs to be converted to a SQL query and to perform this conversion, each tabular expression is chained together, generally with each subsequent query referencing the previous subquery's name. You can see this in the above example, with __subquery0
selecting from the table and querying the desired columns, while the main query references __subquery0
and limiting it to only 5 results.
This method of chaining subqueries together has the same effect as piping each query into the next one, despite the compiled SQL syntax looking unusual.
What's next?
We want pql
to become the de-facto query language for security professionals everywhere, and intend to support it as such. Now there's one fewer reason to be stuck on Splunk, Sentinel, or Sumologic but regardless we hope the interface is useful to security teams and developers alike; which is why we licensed it under the Apache 2.0 license.
At RunReveal we're at day one of building the platform for security data to ensure no breach goes undetected. We're going to be blogging a lot over the next 2 months, so you should sign up and stay tuned.
You can find pql on github and at https://pql.dev.
Acknowledgements: To build pql
RunReveal partnered with Ross Light, who built the project over the course of February 2024. We really enjoyed working with them and hope to again for pql
maintenance and further contributions!
If you want to get rid of Splunk or Sumologic, get in contact with us below: