As most of you know, I don’t work in IT – I just write about it. I do try to look out for trends, so when I kept seeing this word, Hadoop, popping into my Google Alerts, I thought I’d better pay attention.
According to a recent survey from TDWI, overall Hadoop adoption by enterprises is on the rise, with 60 percent of respondents planning on having Hadoop clusters in production by the first quarter of 2016. But what exactly is Hadoop and what can it do for us?
Forrester analyst Mike Gualtieri offered a “Hadoop for Dummies” (otherwise known as regular business people) tutorial on the firm’s blog last summer. Here’s how he explained the technology:
Hadoop is an open source project that offers a platform to store and manage big data. There are two important things to understand about it. The first is how Hadoop stores files, and the second is how it processes data.
Hadoop’s storage capabilities are extremely powerful. With it, an organization can store both very large files and a great many of them, because the data is spread across a cluster of machines rather than kept on one. Companies are no longer encumbered by the storage limits of a particular node or server.
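For readers who like to see the idea in code, here is a toy sketch of how a big file can outgrow any single server: cut it into fixed-size blocks and keep copies of each block on several machines. This is not Hadoop's actual API or placement logic. The function names, the tiny block size, and the simple round-robin placement are all assumptions made up for illustration.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replication):
    """Assign each block to `replication` distinct nodes.

    Assumption: plain round-robin placement, chosen only to keep the
    example short. A real cluster uses a smarter placement policy.
    """
    if replication > len(nodes):
        raise ValueError("replication cannot exceed the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    placement = place_blocks(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    for block_id, holders in placement.items():
        print(block_id, blocks[block_id], holders)
```

Because no one machine has to hold the whole file, capacity grows by adding nodes, and the extra copies mean a single failed server does not lose data.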
Hadoop also has a cool framework for processing data called MapReduce. Because the files are so large, moving them over a network can be painfully slow. So instead of hauling all the data to one place, MapReduce splits the data sets into smaller, independent chunks and processes them in parallel on the machines where they are stored – thus speeding up processing time.
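The split-and-process-in-parallel idea can be sketched in a few lines of Python using the classic word count example. This is an illustration of the MapReduce pattern only, not Hadoop code: the function names are hypothetical, and a local thread pool stands in for the cluster nodes that would each work on their own chunk.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(lines, chunk_size):
    """Break the input into small, independent chunks."""
    return [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]


def map_chunk(chunk):
    """Map step: emit a (word, 1) pair for every word in one chunk."""
    return [(word, 1) for line in chunk for word in line.lower().split()]


def shuffle(mapped_chunks):
    """Group the emitted pairs by word so each word can be reduced on its own."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for word, count in pairs:
            grouped[word].append(count)
    return grouped


def reduce_counts(grouped):
    """Reduce step: add up the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines, chunk_size=2, workers=4):
    chunks = split_into_chunks(lines, chunk_size)
    # Assumption: threads play the role of cluster nodes here.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data", "big files", "data data"]
    print(word_count(sample))
```

Each chunk is mapped without knowing anything about the others, which is what makes it safe to run the map step on many machines at once. Only the much smaller intermediate results need to be brought together for the reduce step.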
How Hadoop is Conquering the Enterprise
As an open source technology that got its start in digital organizations, Hadoop’s challenge now is to scale to a variety of industries and types of companies, and to successfully integrate with more traditional IT platforms. As it expands across the enterprise and its ownership moves back and forth from citizen developers to central IT, run-of-the-mill IT professionals have to become data architects, analysts, and scientists.
If the TDWI survey is any indication, Hadoop is not only meeting this challenge, but proving essential. “Hadoop for the enterprise is driven by several rising needs,” said Philip Russom in a 2015 TDWI white paper. “On a technology level, organizations need data platforms to handle exploding data volumes. They also need a scalable extension for existing IT systems in warehousing, archiving, and content management. On a business level, everyone wants to get business value and other organizational advantages out of big data instead of merely managing it as a cost center.”
And Hadoop has another trick up its sleeve – analytics. “Hadoop is not just a storage platform for big data: it’s also a computational platform for business analytics,” said Russom. “This makes Hadoop ideal for firms that wish to compete on analytics, as well as retain customers, grow accounts, and improve operational excellence.”
Then there are companies like Cloudera that have built a data management and analytics platform on Apache Hadoop and the latest open source technologies. Operations departments are using it to create an enterprise data “hub”: a single analytic data management platform that handles a variety of data to ensure optimal service and product delivery.
But What Was That MapReduce Thing Again?
Hadoop may provide improvements to data warehousing, data scalability, and analytics – and 89 percent of the TDWI respondents consider it a major opportunity for innovation. But getting into the weeds on planet Hadoop can be complex. So how do you sell an implementation to executives and non-IT colleagues?
The answer is to simplify. “Mainstream business users don’t need to know how Hadoop works,” Forrester’s Gualtieri told InformationWeek. “But they do need to understand that the constraints they once had on storing and processing data are removed when Hadoop is installed. The business can start thinking big again when it comes to data.”
This may be true, but even when it’s well understood, Hadoop isn’t perfect. The TDWI survey cites several barriers to successful implementation, including inadequate technical skills, weak business support, security issues, and weak open source tools.
Hadoop’s Future Promise
By the end of this year, the open source community will be even further along in its ability to offer best practices for enterprise Hadoop. As clusters move to the cloud, the leading use cases will likely be enterprise data hubs, archives, and business intelligence/data warehousing. And half of the TDWI survey respondents expect to improve existing Hadoop clusters by integrating them with data quality and data management tools.
Do you have experience with Hadoop? Has the platform been everything you hoped it would be?