Making Splunk frozen data searchable

Splunk is generally our go-to tool for data analytics and security incident response, and there are good reasons for that. It’s relatively quick to set up a small deployment, our team can easily find the information they need, and we don’t have to worry about missing logs.

The problem with some installs that use Splunk as a SIEM is that once data has rolled from cold to frozen, it becomes very difficult to go back and track down individual records that might be needed during incident response. If you want to use Splunk on its own for this task, you’re going to need to thaw quite a bit of data to get there.

Why Store Frozen Data at All

Most of the systems that Threat Informant installs are scoped for a minimum of 90 days of searchable log retention. Ninety days covers most compliance standards, especially when we’re only talking about readily searchable data. That window covers normal operations and even some trending, so why can’t we just send older logs to /dev/null?

First, having the logs available for long-term trending and machine learning can be really useful. Keeping the data lets us answer questions about resource sizing or security, even questions nobody thought of when the system was engineered.

The main reason, though, is security incidents. The average time to discover a breach is roughly 285 days. If we had deleted everything older than six months, we wouldn’t have the data we need to fully understand the source and timeline of the breach.

A Cloud Hybrid Approach

Since Splunk has already parsed the data and extracted the fields we need, we can export that data as it rolls to frozen and convert it to Parquet. Parquet is a compressed, columnar format, and Amazon S3 Select can query Parquet objects without us decompressing them first. That lets us upload the files to low-cost storage on AWS S3 and still search them with SQL.
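To make the S3 side concrete, here is a minimal sketch of the request arguments for an S3 Select query against one Parquet object. The keyword names match the boto3 S3 client’s `select_object_content` parameters; the helper name `build_select_request` and the bucket and key values are hypothetical placeholders, not part of our actual tooling.

```python
from typing import Any, Dict


def build_select_request(bucket: str, key: str, expression: str,
                         output_format: str = "JSON") -> Dict[str, Any]:
    """Return kwargs for boto3's s3_client.select_object_content(**kwargs).

    Hypothetical helper: bucket/key are placeholders for wherever the
    converted frozen buckets were uploaded.
    """
    if output_format == "JSON":
        # One JSON object per matching row, newline-delimited.
        output = {"JSON": {"RecordDelimiter": "\n"}}
    elif output_format == "CSV":
        output = {"CSV": {}}
    else:
        raise ValueError(f"unsupported output format: {output_format!r}")

    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        # Parquet input takes no extra options; compression lives inside
        # the Parquet file itself, so no CompressionType is set here.
        "InputSerialization": {"Parquet": {}},
        "OutputSerialization": output,
    }
```

With boto3 installed and credentials configured, the result would be passed straight through, e.g. `boto3.client("s3").select_object_content(**build_select_request("frozen-archive", "main/db_1.parquet", "SELECT * FROM S3Object s"))`.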

S3 Select doesn’t give us the full feature set we expect from Splunk, but it’s a good fit for what we typically do during incident response: we usually want to search for a specific IP address or user rather than evaluate complex conditions. With S3 Select we can quickly query each of our uploaded files and get back only the entries that match our search, in CSV or JSON format.
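A simple "find this IP or this user" search translates into a short S3 Select SQL expression. The sketch below is one way to build it, assuming hypothetical column names such as `src_ip`, `dest_ip`, or `user` survived the export; use whatever field names your Parquet files actually contain. It restricts column names to plain identifiers and escapes quotes in the search value so the input can’t alter the query.

```python
import re
from typing import Optional, Sequence

# Only plain identifiers are accepted as column names so a typo or a
# hostile string can't change the structure of the query.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_match_expression(fields: Sequence[str], value: str,
                           limit: Optional[int] = None) -> str:
    """Build an S3 Select query matching `value` in any of `fields`."""
    if not fields:
        raise ValueError("at least one field is required")
    for field in fields:
        if not _IDENTIFIER.fullmatch(field):
            raise ValueError(f"unsafe field name: {field!r}")
    # SQL string literals escape an embedded single quote by doubling it.
    literal = value.replace("'", "''")
    # Double-quoted column names are matched case-sensitively.
    conditions = " OR ".join(f"s.\"{field}\" = '{literal}'" for field in fields)
    expression = f"SELECT * FROM S3Object s WHERE {conditions}"
    if limit is not None:
        expression += f" LIMIT {int(limit)}"
    return expression
```

For example, `build_match_expression(["src_ip", "dest_ip"], "10.0.0.5")` produces a query that matches the address on either side of a connection, and that string would be sent as the `Expression` of an S3 Select request for each archived file.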

We can then upload those results into a Splunk index and work with the matching entries directly in Splunk. This gives us the historical data we need without having to thaw all of it back into Splunk.
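Getting the results back into Splunk mostly means collecting the JSON rows S3 Select streams back and writing them to a file Splunk can ingest. The sketch below assumes the JSON output format; boto3 returns the results as an event stream under the response’s `Payload` key, with `Records` events carrying raw bytes alongside `Stats` and `End` events. We don’t assume chunk boundaries line up with records, so the bytes are joined before decoding. The function names and output path are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_records(event_stream: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Join the Records payloads from an S3 Select event stream and decode JSON lines."""
    chunks = []
    for event in event_stream:
        # Stats/Progress/End events carry no rows, so only Records are kept.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
    # Decode only after joining, since a row may be split across two events.
    text = b"".join(chunks).decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def write_ndjson(records: Iterable[Dict[str, Any]], path: str) -> int:
    """Write one JSON object per line and return how many were written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, sort_keys=True) + "\n")
            count += 1
    return count
```

In practice `collect_records(response["Payload"])` would run once per queried file, and the resulting newline-delimited JSON file could be loaded into an index through the Splunk Web "Add Data" upload or a one-time input such as `splunk add oneshot <file> -index <index>`.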
