If Cribl Search meets its roadmap goals, it could challenge established observability players with queries that don’t require a central data repository or the related infrastructure costs. And the company’s potentially disruptive product plans don’t end there, according to its CEO.
Cribl Search and a related data processing agent, Cribl Edge, became generally available in 2022, along with a rebranded data pipeline product, Cribl Stream, formerly LogStream. The Cribl Search SaaS tool connects to Cribl Edge agents to index and search observability data locally. At least theoretically, Cribl’s in-place search eliminates the need for users to send data to a separate system for conversion to another data format, indexing and long-term retention, all of which incur licensing fees, storage costs or both, depending on the vendor. Instead, Cribl Edge indexes the data at its point of origin without converting it to a proprietary format and performs a federated search in response to queries.
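To make the federated model described above more concrete, here is a minimal Python sketch. It is not Cribl's implementation: the `EdgeNode` class and `federated_search` function are hypothetical names invented for illustration, and the token index is deliberately simplistic. Each node indexes its own raw log lines where they live, and a query is fanned out to every node, with only the matches merged centrally.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class EdgeNode:
    """Hypothetical stand-in for an agent that indexes logs where they originate."""

    def __init__(self, name, lines):
        self.name = name
        self.lines = list(lines)        # raw data stays in its original format
        self.index = defaultdict(set)   # token -> line numbers, built locally
        for lineno, line in enumerate(self.lines):
            for token in line.lower().split():
                self.index[token].add(lineno)

    def search(self, term):
        """Return (node name, line) pairs whose tokens include the term."""
        hits = sorted(self.index.get(term.lower(), ()))
        return [(self.name, self.lines[i]) for i in hits]


def federated_search(nodes, term):
    """Fan the query out to every node and merge only the matching lines."""
    with ThreadPoolExecutor() as pool:
        per_node = list(pool.map(lambda node: node.search(term), nodes))
    return [hit for hits in per_node for hit in hits]


if __name__ == "__main__":
    nodes = [
        EdgeNode("web-1", ["ERROR timeout on checkout", "INFO request ok"]),
        EdgeNode("db-1", ["WARN slow query", "ERROR disk full"]),
    ]
    for node_name, line in federated_search(nodes, "error"):
        print(node_name, line)
```

In this toy version, only matching lines ever leave a node, which is the property that lets an in-place search avoid a second, centralized copy of the data.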
With these updates, Cribl has begun to move away from a business model that conflicted with Splunk log monitoring, but it may start to step on more competitors’ toes if it can lure customers away from the search services offered by Elastic Inc., AWS and others.
“Cribl Search works with data wherever it is,” said Clint Sharp, CEO and co-founder of Cribl. “We’re giving people the ability to still own their data… When I put data into Splunk, or Snowflake, or Datadog, that data becomes the vendor’s data… which gives you a great experience retrieving that data, but the downside is that I have to maintain a relationship with that vendor to retrieve the data.”
Early-stage Cribl Search has tradeoffs
Cribl Search is still a version 1.0 product, Sharp said. So far it only supports log data, although Cribl expanded Cribl Stream to support metrics, events and traces in 2019 and 2020. Cribl Search also has yet to deliver granular monitoring and alerting capabilities based on search results. These are all features Sharp said Cribl will implement soon.
That roadmap has one customer considering replacing its OpenSearch deployment with Cribl Search in the long term. OpenSearch requires data to be copied to search cluster nodes; the AWS OpenSearch service offers low-cost storage tiers, but also requires separate copies of the original application data.
“Cribl Search could eliminate the need for us to host our OpenSearch and to store S3 data twice,” said Bob Chen, director of infrastructure engineering at iHerb, an online retailer of health and wellness products in Irvine, Calif. “We need some other elements of feature parity to match … OpenSearch [such as] threshold alerts and dashboards.”
Meanwhile, Cribl Stream, Edge and the Cribl.Cloud managed service have already affected iHerb’s observability costs, Chen said. Cribl Stream prevents insignificant data from being sent to OpenSearch and accelerates the retrieval of historical data from cold storage into S3 when needed. This reduced the storage and top-level compute resources iHerb must maintain for OpenSearch by 25%, saving tens of thousands of dollars per month. The Cribl.Cloud SaaS also offloaded Cribl Stream and Edge agent updates and maintenance from iHerb’s five-person SRE team, which supports approximately 300 developers.
“We’ve gone from three [search] clusters down to one,” Chen said. “We also reduced the number of support tickets SREs received for missing logs or assessing a backlog to nearly zero.”
Since S3 buckets can’t host the Cribl Edge agent, Cribl Search triggers short-lived AWS Lambda functions to run it for S3 data searches. Such an approach may present cost trade-offs for S3 users at scale, in the form of network egress charges incurred when the Cribl-hosted Lambda function accesses data within the user’s AWS account.
“In your model, I’m paying to send data to you so you can process it,” said Carl Fugate, director and cloud technology consultant at electronic health record software maker Netsmart in Overland Park, Kan., during a recorded presentation and Q&A session with Cribl representatives at a Tech Field Day event in November. Fugate asked if Cribl had any plans to allow customers to host Lambda functions and Cribl Edge agents themselves to avoid those charges.
“We could certainly do that, but it would require [giving us] permissions to access the compute resources that live in your account,” Oliver Draese, senior principal software engineer at Cribl, said during the Q&A session. “You should still account for some network egress, because the Lambdas are producing some filtered data… in the cloud environment where we do UI, post-processing, keep query history and so on.”
Cribl Search supports cloud region-based search to minimize egress costs, according to a company spokesperson. It does not yet search data in Amazon Glacier cold storage.
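The trade-off raised in this exchange comes down to simple arithmetic, which the following Python sketch illustrates. It rests on loudly stated assumptions: the `estimate_egress_usd` function is hypothetical, the per-gigabyte rate is a placeholder rather than real AWS pricing, and the model simply assumes that vendor-hosted compute pulls every scanned byte across the account boundary, while customer-hosted compute sends out only the filtered matches.

```python
def estimate_egress_usd(scanned_gb, match_ratio, usd_per_gb,
                        compute_in_customer_account):
    """Rough egress estimate for one search under a simplified billing model.

    scanned_gb: data the search has to read, in gigabytes
    match_ratio: fraction of scanned data that matches the query (0.0 to 1.0)
    usd_per_gb: placeholder egress rate; an assumption, not actual AWS pricing
    compute_in_customer_account: True if the search function runs next to the data
    """
    if scanned_gb < 0 or usd_per_gb < 0:
        raise ValueError("scanned_gb and usd_per_gb must be non-negative")
    if not 0.0 <= match_ratio <= 1.0:
        raise ValueError("match_ratio must be between 0 and 1")

    if compute_in_customer_account:
        # Assumption: only filtered results leave the customer's account.
        billable_gb = scanned_gb * match_ratio
    else:
        # Assumption: every scanned byte crosses the account boundary.
        billable_gb = scanned_gb
    return billable_gb * usd_per_gb


if __name__ == "__main__":
    # Hypothetical workload: 1,000 GB scanned, 2% of it matching the query.
    for local in (False, True):
        cost = estimate_egress_usd(1000, 0.02, 0.05, compute_in_customer_account=local)
        print(f"compute in customer account={local}: ${cost:.2f}")
```

Under these placeholder numbers, the customer-hosted case is charged for only the matching 2% of the data, which is consistent with Draese's point that some egress remains even when the compute runs in the customer's account.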
Cribl CEO addresses product roadmap, Splunk lawsuit
Cribl does not market its own back-end data storage and analytics; it focuses primarily on refining the data sent to those systems. But Cribl also plans to roll out its own version of a data lake in the next couple of years, according to Sharp.
“We’re going to do some sort of offering maybe this year, maybe next year that will help people build their own data lakes,” Sharp said. “And if you want to pay us to own the S3 bucket, you can. … But if you choose to feed us the data and feed it into a lake we orchestrated for you, you are never locked into what Cribl is doing. You can replace us immediately with any other vendor.”
Cribl will also eventually make further strides into the realm of artificial intelligence and machine learning, where Cribl Search has already established the company’s first foothold, Sharp said. Future efforts here for Cribl would likely focus on network infrastructure and security datasets, which lend themselves better to analysis with machine learning than application data, according to Sharp.
“There are certain areas that we will move into over the next couple of years – I don’t have a specific timeline,” Sharp said. “AI is really good at finding new things that it’s been trained on, and it’s very difficult to train it on something that’s never happened before, so these approaches tend not to work as well in general observability.”
Meanwhile, Cribl has filed a motion to dismiss the lawsuit Splunk filed against it in October. That motion, based on arguments about Splunk’s patent claims, won’t be considered until March. It doesn’t mention Sharp’s October social media rebuttal to Splunk’s allegations of intellectual property theft, which pointed out that part of the data-gathering IP in question is available as open source code on GitHub.
“We’re going through the legal process in the way our law firm advises… and we’re really optimistic about our chances,” Sharp said. “Now there are many ways for [customers] to get value from Cribl… and we have many successful joint customers [with Splunk]. … I am very confident that we will continue to coexist in this world with them.”
Beth Pariseau, senior writer at TechTarget, is an award-winning veteran of computer journalism. She can be reached on Twitter @PariseauTT.