To be successful, big data projects need to deliver actionable insights quickly. If you can’t analyze the data thoroughly and quickly, new opportunities will be missed and the door will be left open for competitors. The problem today is that people are recycling old technology to handle modern problems, and the result is a significant slowdown in the process.

The industry wants to use x86-based clusters to handle big and complex data, but that architecture, even when retrofitted with the latest versions of Hadoop and Spark software, was not designed for the task at hand. More often than not, it takes weeks or months to go from raw data to analytics that provide valuable insight. And, in some cases, the projects the business wants to tackle—such as simultaneously analyzing batch and streaming data or expanding fuzzy search capabilities—aren’t even feasible with the technology.

Time Measurements That Mean Something
Before digging into alternatives to x86 clusters, it is important to talk about time—specifically, the actual time it takes for a data project. The numbers most vendors tout represent only a portion of the full data pipeline activity. For example, a Hadoop system may crunch through data in an hour, but the real time sink was the weeks of programming required—programming often delayed because finding Hadoop experts can be extremely difficult.

What is needed is a number that encapsulates all the activities required to go from raw data to insight. Mean time to decision, or MTTD, does exactly that. Beyond simply timing your data project, it lets you judge competing solutions, estimate timelines for new projects and determine return on investment (ROI).
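The idea behind MTTD can be sketched in a few lines: total every stage of the pipeline, not just the compute step a vendor benchmark reports. The stage names and durations below are illustrative assumptions, not figures from this article.

```python
# Hypothetical sketch: MTTD as the sum of every pipeline stage,
# from finding talent through to the final analytics run.
# All stage names and hour counts are illustrative assumptions.

def mttd_hours(stages):
    """Mean time to decision: total hours from raw data to insight."""
    return sum(stages.values())

hadoop_project = {
    "finding Hadoop experts": 80.0,   # hiring/search time, amortized
    "programming":            120.0,  # custom job development
    "data loading":           6.0,
    "compute":                1.0,    # the number vendors usually quote
}

print(mttd_hours(hadoop_project))  # 207.0 hours; compute is <1% of it
```

Framed this way, shaving minutes off the compute stage barely moves MTTD; reducing programming complexity dominates the metric.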

Reimagining Hardware
With a realistic measurement in hand, you can focus on the technology best suited to the data challenge you face. Critical factors include the cost to buy and maintain infrastructure, setup time, programming complexity, security, management needs and performance. Currently, the most important data projects involve big and complex data, meaning any solution must handle a large volume of data as well as different types, such as structured and unstructured or batch and streaming.

The default for many years has been x86 servers in distributed clusters, stitched together with Hadoop software. More recently, Spark has been introduced to help overcome some inefficiencies. However, Spark or any other add-on can’t overcome the architecture’s inherent limitations. Distributed x86 technology is simply not a good fit for high-performance data analysis (HPDA). Tacking on parallel programming via software, instead of relying on parallel-optimized hardware, limits performance. Spreading data across hundreds of nodes introduces latencies as time is wasted transferring data between them.

If one lets go of the recycling concept and designs a new HPDA-centric device for the data challenges we deal with today, the hardware and software will look very different. Instead of a compute focus, the device will sport a balanced architecture in which compute, storage and I/O are all given significant weighting. For compute performance, leveraging the power of FPGA chips combined with large amounts of onboard memory makes sense, especially if the complexity of FPGA is abstracted and hidden under easy-to-use software interfaces and accessible APIs. For storage needs, embedding large, fast SSD storage overcomes the need to continuously load and unload data. And for I/O, a fast connection that can’t be saturated by streaming data ensures that the solution is never waiting for something to do.

More to Come
For a time, Hadoop-type solutions were about the only way to deal with HPDA challenges outside of spending lots of money on one-off, proprietary and very expensive hardware and software. However, if organizations cast aside the notion of recycling, it becomes clear that a radically different solution makes more sense for today’s data challenges and can fit in with current budgets.

Stay tuned for additional articles that delve deeper into the mainstreaming of HPDA, balanced architecture, software primitives tailored to new-age hardware, MTTD and other related topics.
