In the new era of fast data, we must free analytic performance from the prison of x86 and GPU sequential processors, which are architecturally incapable of scaling to analyze and identify trends and anomalies in massive quantities of data. Don’t get me wrong—these processors have their place, but most CPU and GPU clusters max out at 1 to 2 GB/second for the simplest analytics under ideal conditions.

In part two of this series on transforming high performance analytics (see part one), we’re diving into how you can use converged storage and FPGA-accelerated compute to unlock fast, actionable insights from your data.

With FPGA acceleration, an entire algorithm can be executed in a single tick of the clock. You can also build multiple parallel execution units into an FPGA to push very large volumes of data through the same algorithm at speeds orders of magnitude higher than multi-core CPU or GPU resources can achieve. With true hardware parallelism, you can reach multi-GB/second speeds for instant insights into all types of data, from both batch and streaming sources, without first preparing, normalizing, translating and indexing the data.

Ryft’s converged NAS and compute appliance, the Ryft ONE, capitalizes on the power of FPGA acceleration with an open API and x86 ease of integration—otherwise known as heterogeneous computing. With this groundbreaking appliance, users can now explore any type of data the moment it is captured, without the need for ETL, indexing and other data normalization, which saves hours or days.

Ryft’s Converged NAS and Compute Appliance Massively Outperforms AWS’s Fastest Spark Clusters in New Benchmark Testing
To demonstrate the stark contrast between sequential processors and massively parallel processing engines like the one that powers the Ryft ONE, Ryft recently conducted benchmark testing against the fastest high-speed servers at Amazon. We pitted a single 1U Ryft ONE that requires only 750 Watts of power at peak usage against an Apache Spark cluster running on AWS EC2, which included up to 200 c3.8xlarge “Compute Optimized” 2U servers that pull 1100 Watts each.
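
To put the power figures above side by side, here is a minimal sketch of the arithmetic. It uses only the wattages quoted in this post (750 W for the Ryft ONE at peak, 1100 W per c3.8xlarge server); the function names are ours for illustration, and the result is a nameplate comparison, not a measured energy figure.

```python
# Wattages quoted in the post; everything else here is illustrative.
RYFT_ONE_WATTS = 750       # single 1U Ryft ONE at peak usage
SPARK_NODE_WATTS = 1100    # one c3.8xlarge 2U server


def cluster_power_watts(nodes: int, watts_per_node: int = SPARK_NODE_WATTS) -> int:
    """Total power draw of a cluster, assuming every node pulls the quoted wattage."""
    return nodes * watts_per_node


def power_ratio(nodes: int) -> float:
    """How many times more power the cluster draws than one Ryft ONE."""
    return cluster_power_watts(nodes) / RYFT_ONE_WATTS


if __name__ == "__main__":
    for nodes in (100, 200):
        print(f"{nodes} nodes: {cluster_power_watts(nodes):,} W, "
              f"about {power_ratio(nodes):.0f}x one Ryft ONE")
```

At the full 200 nodes, that works out to 220,000 W, or roughly 293 times the peak draw of the single appliance.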

The Ryft ONE Dramatically Reduces Latency on Any Type of Data
As the test results show, a single Ryft ONE outperforms an AWS-hosted high-speed Spark cluster on complex fuzzy searches of both structured and unstructured data sets. We ran three tests on un-indexed data of both types, using fuzzy searches with different Hamming and edit distances.

In the first test, on 1 TB of unstructured, un-indexed Reddit comments, Ryft conducted a fuzzy search with a Hamming distance of two. Hamming-distance search is well suited to detecting slight differences between renditions of individual words, which enables rapid and comprehensive searching of medical, customer and other records for abbreviated or misspelled matches. A single 1U Ryft ONE reduced the latency of that complex query from 10.05 hours across a 100-node Apache Spark cluster to just three minutes, and eliminated many of the ETL and indexing steps that add latency to the front end of the search.
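
For readers new to the term, a Hamming-distance fuzzy search can be sketched in a few lines of Python. This is a plain software illustration of the matching rule only, not the Ryft ONE's FPGA implementation or its API; the function names and the sample record are hypothetical.

```python
from typing import List, Tuple


def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def fuzzy_find(text: str, pattern: str, max_distance: int) -> List[Tuple[int, str]]:
    """Slide a pattern-sized window over un-indexed text and keep every
    window within max_distance substitutions of the pattern."""
    width = len(pattern)
    hits = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        if hamming_distance(window, pattern) <= max_distance:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    record = "pt has diabetis and diabetes"  # made-up sample record
    print(fuzzy_find(record, "diabetes", max_distance=2))
    # [(7, 'diabetis'), (20, 'diabetes')]
```

Note that the scan touches every byte of the raw text and needs no index, which is why this kind of search is expensive on sequential processors and a natural candidate for parallel hardware.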

[Figure: unstructured Reddit data]

For our second test, we looked at the same fuzzy search use case but applied it to a 1 TB structured data set, keeping the Hamming distance the same. Here, we see results similar to those of the initial unstructured data set test, with a single Ryft ONE completing the search in just 23 minutes and AWS requiring 11.38 hours.
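
The latency figures from these first two tests translate into speedup factors as follows. This is a minimal sketch using only the times quoted above; because those times are rounded in the post, the ratios are approximate.

```python
def speedup(baseline_hours: float, accelerated_minutes: float) -> float:
    """Ratio of baseline latency to accelerated latency, both converted to minutes."""
    return (baseline_hours * 60.0) / accelerated_minutes


# (Spark hours, Ryft ONE minutes) as quoted in the post
TESTS = {
    "unstructured Reddit comments, Hamming distance 2": (10.05, 3),
    "structured data set, Hamming distance 2": (11.38, 23),
}

if __name__ == "__main__":
    for name, (spark_hours, ryft_minutes) in TESTS.items():
        print(f"{name}: about {speedup(spark_hours, ryft_minutes):.1f}x faster")
```

That is roughly 201x for the unstructured test and roughly 29.7x for the structured one.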

[Figure: structured JSON Reddit data]

For our final test, we switched the data set to 1 TB of genome data for 100 humans and used an edit distance of four. Edit distance enables approximate string matching, phrase matching, natural language processing and automatic spelling correction, making it a more natural fit for human language. It is also well suited for more sophisticated applications including bioinformatics to quantify the similarity of DNA and protein sequences. Once again, even with increasingly complex workloads, a single 1U Ryft ONE had the lowest latency of just one hour versus 12 hours using 200 Apache Spark nodes configured in the AWS high-speed server cluster.
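
To make the difference from Hamming distance concrete, here is a minimal sketch of edit distance in Python, using the standard Levenshtein dynamic-programming recurrence. It illustrates the metric only and is not how the Ryft ONE computes it; the DNA-like strings are made up for the example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,                      # delete ca
                current[j - 1] + 1,                   # insert cb
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))      # 3
    # A one-base shift differs at every position, so its Hamming distance
    # is 8, but it is only two edits away (one deletion, one insertion).
    print(edit_distance("ACGTACGT", "CGTACGTA"))   # 2
```

Because it tolerates insertions and deletions as well as substitutions, edit distance still matches sequences that have shifted by a character or two, which is what makes it useful for both misspellings and DNA comparison.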

[Figure: unstructured genome data]

Faster and More Efficient Analytics in Place Eliminates Network Bottlenecks
Data networks are a critical bottleneck for enterprises and cloud services providers coping with an explosion of traffic. Centralizing data analytics to overcome legacy system limitations requires a lot of data movement over networks typically limited to 10 Gb/second throughput under ideal conditions. That creates choke points throughout your IT infrastructure, and the delays they add compound. The net result is that data is generated faster than it can be processed or delivered to the processing elements.
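
A quick back-of-the-envelope calculation shows what that 10 Gb/second ceiling means for a data set the size of the ones in our benchmark. The sketch below assumes a decimal terabyte (10^12 bytes), a link running at full line rate and no protocol overhead, so real transfers would take longer; the function name is ours for illustration.

```python
BITS_PER_BYTE = 8


def transfer_seconds(data_bytes: float, link_gigabits_per_second: float) -> float:
    """Best-case time to move data over a link: ideal line rate, zero overhead."""
    link_bits_per_second = link_gigabits_per_second * 1e9
    return data_bytes * BITS_PER_BYTE / link_bits_per_second


if __name__ == "__main__":
    one_terabyte = 1e12  # assumption: decimal TB
    seconds = transfer_seconds(one_terabyte, 10)
    print(f"1 TB over 10 Gb/s: {seconds:.0f} s, about {seconds / 60:.1f} minutes")
```

Even in the best case, moving 1 TB takes 800 seconds, or a little over 13 minutes, before any analysis can begin on the far side of the link.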

With faster and more efficient converged NAS and FPGA-accelerated compute appliances embedded throughout your big data ecosystem, you can now use the tools you know, such as SQL, Spark, Tableau, Zoomdata, Birst, Qlik or your own applications, to get faster and more accurate insights into growing volumes of data with less cost and complexity. You will not only get insights into data in the moment but also be able to intelligently thin data throughout your environment to reduce load on the network.

See Ryft Benchmarks with Your Own Data
By enabling the fastest high performance analytics throughout your existing IT infrastructure, Ryft products are transforming data analysis in the data center, in the cloud and at the network edge.

In the last installment in this blog series, we’ll discuss how Ryft’s converged storage and powerful FPGA-accelerated compute will speed your mean time to actionable results. Ready to take the first step toward real-time responsiveness? Contact us today to discuss a free no-risk trial of the Ryft ONE. We will walk you through how our clients are using Ryft throughout their infrastructures and show you benchmarks based on your own data.
