Webhose takes aim at the Dark Web

Webhose is not just a science project. The company is earning more than $250,000 a month with a profit margin of close to 50 percent. Its clients include a fintech startup in France and Luxembourg, a US startup that detects fake news and a cryptocurrency compliance watchdog.

The nicest company that crawls the Web

Most of the company’s revenue is not from the Dark Web. That tool was unveiled only last year. Webhose spends most of its resources searching the “visible” Internet. “We are the nicest company that crawls the Web,” says Geva. “We comply with everything; everyone can see what we’re doing. On the Dark Web, though, it’s a very different story.”

Webhose doesn’t do anything with the data it finds. “We focus on content collection,” Geva explains. “We let our clients analyze the content.”

That partly explains the company’s odd name. Twitter calls the stream of tweets that come from its platform the “firehose.” Webhose applies that same concept to data it collects from the Internet, creating what Geva dubs a “firehose for the web.”

Geva’s previous company, Omgili (Oh My God I Love It), was a search engine for message boards, an area where Google wasn’t so successful back in 2007 when Geva, now 41, started Omgili. As the initial crawler technology evolved, Geva imagined using it to track publicity campaigns on the Internet. He even built an app to do so, but it wasn’t reliable enough.

Geva kept plugging away until the technology worked. This is what enables Webhose to deliver highly structured data to its clients, “much more than the title, snippet and summary Google provides.”

Webhose organizes message-board posts, blogs, news, reviews, ratings and stock data. It also has historical pricing information on some 20 million products, making it an ideal tool for e-commerce businesses and for banks looking for patterns to ferret out fraud.

Webhose stores everything it has collected in machine-readable format in its own database. Customers tap into that database by creating an XML or JSON feed. Webhose has a nearly four-year archive of millions of websites, although the company keeps only 30 days’ worth of data live on its system, in order to keep searches snappy.

Webhose is not the only company structuring data from around the web. Silicon Valley-based Import.io uses machine learning to return relevant results to customers. DeepCrawl, which has its headquarters in London, is popular with SEO providers.

Webhose sells the data it collects on a freemium basis. A student, for example, can make “up to 1,000 API calls a month, which can return up to 100 results each,” Geva explains. “So in theory you could get access to up to 100,000 articles or message-board posts for free. After that, pricing starts at $50 per month. We have tens of thousands of free users.”
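Geva’s free-tier arithmetic can be checked in a few lines of Python. The two limits come from his quote; the function names and the ceiling-division helper are illustrative assumptions, not part of any Webhose client library.

```python
# Free-tier limits as quoted by Geva.
FREE_CALLS_PER_MONTH = 1_000
MAX_RESULTS_PER_CALL = 100


def max_free_results(calls=FREE_CALLS_PER_MONTH, per_call=MAX_RESULTS_PER_CALL):
    """Upper bound on results per month if every call returns a full page."""
    return calls * per_call


def calls_needed(n_results, per_call=MAX_RESULTS_PER_CALL):
    """Fewest API calls needed to fetch n_results (ceiling division)."""
    return -(-n_results // per_call)


print(max_free_results())     # theoretical monthly ceiling on the free tier
print(calls_needed(250))      # calls needed for 250 results
```

The 100,000 figure is a ceiling: it assumes every one of the 1,000 calls returns a full 100 results, which is why Geva says “in theory.”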

The one area where Webhose is not free: searching the Dark Web. So, can Webhose keep a real-world Mr. Robot from hacking the world’s banks and ushering in financial ruin?

“We love that show,” says Geva. “But it’s much crazier than what we’re doing. Still, it feels good to be useful on that front.”

Brian Blum writes about new local startups, pharmaceutical advances, and scientific discoveries for Israel21c. This article is published courtesy of Israel21c.