Couchdoop and other tales of Big Data at Avira
Thinking Fast and Deep
- Processing the historical data in the company to glean deep learnings for topics ranging from customer behaviour to threat vectors. For this challenge we have Apache Hadoop. It takes 5-10 minutes to run an analytical job, whether your algorithm requires a year’s worth of data or just a few days.
- Making real-time decisions based on the live data stream at hand, where you only have 50-100 milliseconds to make up your mind. This has been the more elusive of the two challenges, though improvements have accelerated these past two years.
For example, in E-Commerce the difference between deep and fast insight is knowing that a) someone is a pretty serious triathlete, but b) right now he is shopping for himself on Valentine’s Day.
Both are massive engineering challenges with hardware constraints (e.g., hard-drive and RAM physics), so innovation is largely happening in open-source software projects. Thousands of engineers across the globe actively contribute to the bevy of projects that have sprung up these past few years.
The trick in an algorithmic consumer business like Avira’s is to design a system that can address both questions equally well and support massive consumer applications like malware detection, search engines or online shopping recommendations. It’s the silicon analogue to Thinking, Fast and Slow (Kahneman, 2012).
German Engineering meets Silicon Valley
At Avira the Big Data journey that led us to Couchdoop began back in late 2012. We were looking for bridging technologies that could span both types of analytics and were underwhelmed by the open-source options available. HBase, for example, was too brittle for our real-time use-cases. Even with some of the frameworks, you needed big dedicated infrastructure teams and server clusters to ensure robust availability. That is fine for a massive operation with hundreds of data scientists and engineers, like Google or Facebook, but too unwieldy for a think-on-your-feet (mobile) internet business like ours. We were essentially a start-up with a million sessions per hour. Other technologies seemed to lack momentum in the engineering community.
After a few months of bending our swords on HBase, and a particularly rainy Californian winter, the breakthrough came in a conversation with a VC friend who put me in touch with Bob Wiederhold, CEO at Couchbase. Couch DB had been around in the open source community for a few years and, true to its moniker, had already gained a reputation for speed and bullet-proof availability.
Bob and his Mountain View-based team were working on connecting it to real-world consumer problems like ours at Avira. We had a perfect match, and a German-Californian engineering partnership was born.
This Thursday evening at 18:00 CET we are hosting a meetup in Munich of the local Hadoop User Group (HUG).
We’re showcasing some of our fast and deep use-cases at Avira, including a technical overview of Couchdoop: essentially a massive pipe that an Avira Romania database engineer built to flush data between the two systems. It is the machine equivalent of a power nap with deep REM.
With 300M installations, the proliferation of fast-moving threat-vectors since Android, and our nascent E-Commerce business, including applications such as Avira Offers, these days at Avira we probably have some of Europe’s toughest Big Data challenges. Exciting times ahead.