Kaggle provides statistical and analytical outsourcing via global data modelling competitions to a range of high-profile companies, researchers and governments from around the world. With over 11,500 members already, the company is growing fast. Having the right IT system in place is an absolute must for any small business, and for Kaggle it has been instrumental in our short but fast-paced journey.
In a world where data is being collected faster and in greater volumes than ever before, most modern organisations now support analytics functions – making the potential market opportunity enormous. IDC predicts that $38 billion will be spent on predictive analytics services in 2011. We saw a gap in the market to use a crowdsourced competition model to inspire rapid innovation by introducing the analytics function to a wide audience of experts.
Kaggle gives large organisations the opportunity to tap into a global network of data analysts that includes more than 11,000, often PhD-level, specialists from over 200 universities across more than 100 countries. Competitions typically run for 60 days and are scored objectively, based on the accuracy of the predictions. When competitions end, the host organisation receives all intellectual property behind the winning model in exchange for prizes, typically cash.
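As a rough illustration of what objective scoring can look like, the sketch below ranks submissions by their error against a held-out answer set. It is a minimal example, not Kaggle's actual scoring code: the choice of root-mean-square error as the metric, the function names `rmse` and `leaderboard`, and all team names and numbers are assumptions made for illustration.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("prediction and answer lengths differ")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def leaderboard(submissions, actual):
    """Return (team, score) pairs sorted from lowest error to highest."""
    scores = {team: rmse(preds, actual) for team, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical held-out answers and team submissions (made-up numbers).
actual = [10.0, 12.0, 15.0, 11.0]
submissions = {
    "team_a": [10.5, 11.5, 15.5, 10.5],
    "team_b": [9.0, 13.0, 17.0, 11.0],
    "team_c": [10.0, 12.0, 14.5, 11.0],
}

ranking = leaderboard(submissions, actual)
for place, (team, score) in enumerate(ranking, start=1):
    print(f"{place}. {team}: RMSE = {score:.3f}")
```

Because the score depends only on the predictions and the hidden answers, every team is measured in the same way, whoever they are and whatever method they used.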
The unique thing about our business model is that we are able to deliver cheaper, faster and more powerful analytics to our customers, who range from banks, telecommunications organisations, insurance companies and internet providers to pharmaceutical companies, governments and universities.
Setting up a business such as Kaggle would not have been possible without the computing power to manage all the data modelling and scoring. However, the sheer cost of such an IT infrastructure can be completely out of reach for a start-up business. We knew we had a unique business model, which is why we turned to the cloud.
Harnessing the cloud
Our need for computing power that could scale up and down at the flick of a switch, in line with the competitions running and the datasets we had to score, brought us to Amazon Web Services (AWS). As a result, our competition platform runs in the cloud on Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3) and Amazon Relational Database Service (Amazon RDS).
Due to the nature of the business we wanted to create, we knew we had to embrace the cloud. With cloud computing we can fast-track the solutions to complex data problems which otherwise would have taken years – not months – to resolve. The solutions would also have typically cost organisations hundreds of thousands of dollars.
For Kaggle, there have been three clear benefits that the cloud has brought to the business:
- Cost-effectiveness – For a young and growing data analytics business like ours, being sure that our IT infrastructure could handle a large number of entries and datasets from the very beginning of our operation was crucial. Using Amazon EC2 meant we had access to resizable compute capacity that could scale up and down as competitions started and submissions peaked. We also use a cloud storage solution, Amazon S3, to store and retrieve the data, meaning we didn’t have to purchase our own storage infrastructure. By paying only for the services we need, when we need them, we have been able to free up vital resources, allowing us to take on new markets, win new business and grow quickly.
- Speed to market – The flexibility cloud computing provides is particularly important for small businesses that are looking to grow and expand quickly. With a cloud-based IT infrastructure we are able to deliver a faster and more robust approach to analytics. By feeding data problems to data experts via the cloud, we are able to process data rapidly and help organisations resolve problems in months, not years.
- Business growth and expansion – Because we are not limited by the computing power of our own IT infrastructure, we have been able to grow rapidly and pursue new market opportunities as they arise. For example, we are now based in Silicon Valley and talking to sectors such as astronomy, where large storage capacity is required. Supported by Amazon’s cloud infrastructure, we believe we have strong chances of success in these new, international markets.
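The cost argument in the first point can be made concrete with some simple arithmetic. The sketch below compares provisioning enough servers for peak demand around the clock with paying only for the servers used in each hour. All figures are invented for illustration: the hourly demand profile and the rate of 10 cents per server-hour are assumptions, not Kaggle's workload or Amazon's pricing.

```python
def fixed_cost_cents(demand, rate_cents):
    """Cost of keeping enough servers for peak demand running every hour."""
    return max(demand) * len(demand) * rate_cents


def elastic_cost_cents(demand, rate_cents):
    """Cost of paying only for the servers actually used in each hour."""
    return sum(demand) * rate_cents


# Hypothetical servers needed per hour, with a spike near a competition deadline.
demand = [1, 1, 2, 8, 3, 1]
rate_cents = 10  # hypothetical price per server-hour, in cents

fixed = fixed_cost_cents(demand, rate_cents)
elastic = elastic_cost_cents(demand, rate_cents)

print(f"Provisioned for peak: ${fixed / 100:.2f}")
print(f"Pay per use:          ${elastic / 100:.2f}")
print(f"Saving:               {100 * (fixed - elastic) / fixed:.0f}%")
```

The more uneven the demand, the larger the gap between the two figures, which is why a workload that peaks around competition deadlines is a natural fit for resizable capacity.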
What it means for our customers
A recent project with the NSW Government’s Roads and Traffic Authority (RTA), to find a data model to help reduce traffic congestion on Sydney’s roads, is just one example of our hosted competitions. In November 2010, the RTA became the first government department in the world to host a global predictive modelling competition. A $10,000 prize was earmarked for the submission that best predicted travel times on the M4 freeway.
For the RTA, this kind of data analytics would previously have required the support of an external consultant, and a six-figure bill would not have been uncommon. However, with Kaggle running 100 percent in the cloud, the government not only got rapid access to the best and brightest minds across the globe, but also paid only around 10 percent of that cost for the infrastructure, hosting of the competition and consultancy support.
In just two and a half weeks, 750 people downloaded the data for the RTA competition and more than 350 teams responded. The cloud infrastructure stood up to this without any problem at all, giving us the power to harness the world’s best expertise and deliver a high-calibre solution. External consultants could have taken years to complete the project, whereas we were able to deliver the results in less than four months. Success stories like this are helping us to accelerate our time to market, as well as keeping our customers very happy.
Despite our short history, the power of cloud computing is having a real, positive impact on our global expansion. We recently launched a competition for NASA, involving large data sets, that addresses one of the big questions in physics – mapping dark matter – by measuring the ellipticity of 100,000 simulated galaxies. We are also currently running the biggest algorithm competition to date, the $3 million Heritage Health Prize. This prize challenges our members to develop a predictive algorithm that uses historical claims data to identify patients who will be admitted to hospital within the next year.
The use of cloud computing has been crucial in allowing Kaggle to grow quickly and expand, particularly in handling larger data sets and hosting bigger and more popular competitions. Looking to the future, the cloud will play an even more important role in enabling us to move quickly with our services across global markets. By using cloud computing, we have been able to grow the business at a rate well beyond our initial planning, and we hope to continue on this trajectory.
–Anthony Goldbloom is CEO, Kaggle