Introducing Detection of Tor Exit Nodes

Today we’re announcing Tor exit node detection across all RunReveal log sources. This is available for all of our customers immediately, including our free tier, and is enabled by default.

The Tor exit node detection is available under the "Detections" tab and can be fully customized or disabled.

This feature is part of a larger initiative to build a backend for our log enrichment process. We wanted to support enrichments from any platform or feed, including customer-provided feeds. We have a lot more to come in this space soon.

Why detect Tor exit nodes?

Tor launched in the early 2000s out of a project at the Naval Research Lab. BusinessWeek reported the inside story, which is informative and worth the read.

Today, besides being too slow for the modern web, Tor has many positive uses. It's an important tool for censorship resistance under repressive regimes, it's a free service for online privacy, it's used by law enforcement, and more. The Tor Project is a 501(c)(3) that operates legally towards many noble goals.

Tor also has less agreeable use cases. The anonymity properties of the onion network provide a fantastic tool for crime. Cyber criminals use it to hide who they are while they access computer systems without authorization, participate in marketplaces selling illegal goods, and distribute malware. Tor is much more widely known for its bad uses than its good ones.

Tor works by funneling all the traffic going into the onion network out of a few thousand "exit nodes". Good or bad, you should never see the IPs of these exit nodes in your internal security logs. Seeing Tor exit node IPs there likely indicates that something very bad is happening and that you need to investigate immediately.

Keeping track of changing exit nodes

We wanted to build this feature so that customers could bring their own IP enrichments in the future. We recognized that these IPs would change regularly, so we needed a data structure that could efficiently handle regular updates to a very large set of data.

We decided to use a VersionedCollapsingMergeTree in ClickHouse to handle this use case, and store all the enrichment data in a single table. A what? Why, a VersionedCollapsingMergeTree of course. This table type handles the kind of data we wanted to ingest perfectly, but it's not likely to be a data structure that anyone studied in a data structures textbook.

Let's break down what this is from the end to the beginning.

  • MergeTrees are a data structure that can be used for efficiently writing and indexing large amounts of data. ClickHouse uses this data structure heavily, and that's what RunReveal uses for most of our underlying data storage.
  • Collapsing means that the MergeTree additionally has a sign indicating the status of the data. The docs are more informative about the details of how collapsing works, but any data that is written to the table has a sign of either 1 to create a row or -1 to cancel a previously written row. This efficiently marks rows for deletion rather than immediately deleting them.
  • Versioned means that the CollapsingMergeTree isn't limited to 1 and -1, but includes a version component too. A row written with a sign of 1 still needs a matching row with a sign of -1 to be cancelled, but the version means the rows don't have to be inserted in strict order.
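To make the sign and version mechanics concrete, here is a minimal Python sketch that imitates the collapsing behavior described in the list above. It is a simulation for illustration only, not ClickHouse code and not RunReveal's implementation. The `Row` type, its `sign` and `version` field names, and the `collapse` helper are all hypothetical, and the example IPs come from reserved documentation ranges.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    """One written row. Field names are assumptions for this sketch."""
    feed_name: str
    ip: str
    sign: int      # +1 creates a row, -1 cancels a previously written row
    version: int   # lets a cancel row find its match regardless of insert order


def collapse(rows: list[Row]) -> list[Row]:
    """Return the rows that survive once matching +1/-1 pairs cancel out.

    Rows pair up when they share the same key (feed_name, ip) and version.
    Summing the signs per group approximates the pairwise cancellation.
    """
    balance: dict[tuple[str, str, int], int] = defaultdict(int)
    for row in rows:
        balance[(row.feed_name, row.ip, row.version)] += row.sign
    return sorted(
        Row(feed, ip, 1, version)
        for (feed, ip, version), total in balance.items()
        if total > 0
    )


rows = [
    Row("Tor Exit", "192.0.2.10", -1, 1),   # cancel row arrives before its match
    Row("Tor Exit", "192.0.2.10", 1, 1),    # cancelled by the row above
    Row("Tor Exit", "198.51.100.7", 1, 1),  # never cancelled, so it survives
    Row("Tor Exit", "192.0.2.10", 1, 2),    # same IP written again, newer version
]

for row in collapse(rows):
    print(row.ip, row.version)
# 192.0.2.10 2
# 198.51.100.7 1
```

Note that the cancel row is listed first and still removes its match, which is the ordering freedom the version provides. In ClickHouse itself the collapsing happens during background merges, so queries usually account for rows that have not been collapsed yet, for example by aggregating on the sign column.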

With this data structure we can easily replace previously written feeds, and use SQL to look for Tor exit nodes that show up in our customers' logs with a single query.

SELECT
  eventTime,
  sourceType,
  eventName,
  srcIP,
  srcCity,
  srcASCountryCode,
  srcASOrganization,
  dstIP,
  dstCity,
  dstASCountryCode,
  dstASOrganization
FROM
  runreveal_logs
  INNER JOIN threat_feed_ip_list ON runreveal_logs.srcIP = threat_feed_ip_list.ip
  OR runreveal_logs.dstIP = threat_feed_ip_list.ip
WHERE
  receivedAt >= now() - INTERVAL 15 MINUTE
  AND feedName = 'Tor Exit'
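
Replacing a previously written feed, as mentioned before the query, could work roughly as follows. This Python sketch is one possible approach under stated assumptions, not a description of how RunReveal actually does it: it assumes each feed snapshot is written with a single version number, and that the table carries `sign` and `version` columns alongside the `feedName` and `ip` columns used in the query. The `FeedRow` type and `replacement_rows` helper are hypothetical names.

```python
from __future__ import annotations

from typing import Iterable, NamedTuple


class FeedRow(NamedTuple):
    """A row to insert into the enrichment table (column names assumed)."""
    feed_name: str
    ip: str
    sign: int
    version: int


def replacement_rows(
    feed_name: str,
    old_ips: Iterable[str],
    old_version: int,
    new_ips: Iterable[str],
    new_version: int,
) -> list[FeedRow]:
    """Build the rows that swap one feed snapshot for another.

    Every row of the old snapshot gets a cancel row (-1) carrying the old
    version, and every IP in the new snapshot gets a fresh row (+1) carrying
    the new version.
    """
    cancels = [FeedRow(feed_name, ip, -1, old_version) for ip in sorted(set(old_ips))]
    states = [FeedRow(feed_name, ip, 1, new_version) for ip in sorted(set(new_ips))]
    return cancels + states


# Example IPs are taken from reserved documentation ranges.
previous = {"192.0.2.10", "198.51.100.7"}
current = {"198.51.100.7", "203.0.113.5"}

feed_rows = replacement_rows("Tor Exit", previous, 1, current, 2)
for row in feed_rows:
    print(row.ip, row.sign, row.version)
# 192.0.2.10 -1 1
# 198.51.100.7 -1 1
# 198.51.100.7 1 2
# 203.0.113.5 1 2
```

Cancelling the whole old snapshot keeps the logic simple at the cost of writing more rows than a diff-based update would. An IP that appears in both snapshots is cancelled under the old version and written again under the new one.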

What's next?

RunReveal is hiring product engineers to help us grow our team and help solve detection. The RunReveal founders will be at LASCON this week, so look out for us or reach out if you're in Austin, TX.

Detecting Tor exit nodes is low-hanging fruit, and every company should be doing it across all of its security logs. In the coming weeks we have announcements planned for more types of enrichments, custom enrichments, and much more. Make sure you stay tuned and get in touch if you are interested in what we're doing!