by Pat McGarry
Running complex analytics against massive amounts of all types of data (legacy, streaming, structured, semistructured and unstructured) is difficult. At Ryft, we’ve blazed the trail to solve this problem with FPGA/x86 heterogeneous compute accelerators that eliminate data preparation latency and make data discovery fast and simple. Until now, only companies willing to modernize their data centers with FPGA-accelerated hardware have been able to take advantage of the resulting performance and cost benefits. That changes with today’s announcement from Amazon that it has doubled down on FPGA-accelerated infrastructures with its new F1 heterogeneous compute instance on AWS.
We’re even more excited to announce that we’ve had the opportunity to work with Amazon and will release Ryft Virtual, a heterogeneous cloud-based version of our core Ryft ONE technology, on the AWS Marketplace in Q1 2017. With that release, whatever infrastructure you’re using (on-premises data centers, hybrid environments or pure cloud ecosystems), you can eliminate data indexing and transformation to achieve real-time actionable insights from all your data with Ryft. Ryft Virtual enables users with AWS-based data and analytics applications to fully leverage FPGA-accelerated analytics with Ryft’s high-performance and easy-to-use analytics algorithms and application connectors.
Analytics at the Speed of Your Business
Amazon’s announcement arrives as the industry has finally come to grips with the fact that businesses must embrace heterogeneous computing practices to solve the complex problems that today’s fast and varied data creates. For important classes of analytics problems, including various types of high-speed searches, these heterogeneous architectures improve performance by several orders of magnitude in comparison to contemporary monolithic CPU- and GPU-based architectures. Difficult problems such as high-distance searches, which cannot be solved in a reasonable time with those monolithic architectures, are now achievable using heterogeneous computing techniques.
Amazon’s move is critical since an increasing number of businesses are making the decision to migrate more of their data and systems to the cloud. The cloud has to keep up, and not just from the storage perspective, but also from the networking and—maybe most importantly—compute perspectives.
To successfully compete in a fast-changing marketplace, businesses require fast answers from growing, dynamic data. Until recently, many businesses were forced to segment their operations by putting some of their data in the cloud while keeping some of their data in home-grown data centers. This segmentation enabled businesses to crunch through the data at desired speeds while avoiding data movement penalties that kill performance and impact total cost. As the cloud providers continue to innovate, there is less and less of a need for this often complex segmentation.
However, cloud-based heterogeneous computing techniques must be simple to use if they are to achieve widespread adoption. And not just “lip service” simple but truly simple. Businesses must be able to use the same applications they use today without having to deal with any of the nuances of cloud-based heterogeneous computing.
Make Big Data Relevant Now With Ryft
At Ryft, we’re experts at abstracting away the complexities of heterogeneous computing principles, as evidenced in our Ryft ONE platform. We achieve this with open APIs that seamlessly integrate into any data analytics ecosystem to provide the full benefits of heterogeneous computing elements without any of the difficulty. We are extending these same open APIs into the AWS F1 instance, allowing for simplified integration with any type of input data, including cloud-based data such as data stored on Amazon S3.
Our initial release leverages the powerful open source Elasticsearch ecosystem, which has received wide-scale adoption as a common cloud platform for general searching needs. For all that Elasticsearch does well, it still struggles with approximate matching. For example, Elasticsearch only supports Levenshtein distance matches up to a distance of two, which is quite limiting for most real-world approximate searches. Furthermore, to achieve even this, Elasticsearch pays a significant upfront performance penalty, since data transformation and indexing time grows significantly when Levenshtein searching is required. Making matters worse, only certain fields in Elasticsearch JSON records can be searched using the complex Levenshtein algorithm.
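To make the two-edit limit concrete, here is a minimal Python sketch of the textbook Levenshtein distance, followed by the general shape of an Elasticsearch fuzzy query body. The function is a plain software illustration, not Ryft’s or Elasticsearch’s implementation, and the field name "title" is an assumption made up for the example. Note also that Elasticsearch’s fuzzy matching by default treats an adjacent transposition as a single edit, which this plain version does not.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    # Keep only two rows of the dynamic-programming table at a time.
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1,     # delete char_a
                               current[j - 1] + 1,  # insert char_b
                               substitution))       # replace (or keep) char_a
        previous = current
    return previous[-1]


# A one-character typo sits comfortably inside a two-edit limit.
typo_distance = levenshtein("Elasticsearch", "Elastiksearch")

# This classic pair needs three edits, so a two-edit limit cannot match it.
kitten_distance = levenshtein("kitten", "sitting")

# Shape of an Elasticsearch fuzzy query body. "title" is a hypothetical
# field name; 2 is the largest fuzziness value the query accepts.
fuzzy_query = {
    "query": {"fuzzy": {"title": {"value": "sitting", "fuzziness": 2}}}
}

print(typo_distance, kitten_distance)
```

Real-world misspellings, transliterated names and OCR errors frequently differ by more than two edits, which is why the cap matters in practice.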
All of these limitations are overcome when using Ryft’s virtual implementation on Amazon’s F1 instance:
- Absolutely no indexing is required, and distance values can be very large: as high as 10 or more, depending on the search string size (a large distance on a short search string would produce too many false positives).
- Any field in Elasticsearch JSON records can be searched using Levenshtein techniques.
- Searches occur at blazing speeds of many gigabytes per second due to the true hardware parallelization afforded by FPGA-accelerated heterogeneous architectures. Yet the end users have no idea this is happening—they just see results, fast.
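The first two points can be illustrated in ordinary software. The sketch below scans raw newline-delimited JSON records with no index, checks every string field at any nesting depth, and accepts any distance threshold. It is a slow, CPU-only illustration of the idea and makes no claim about how Ryft’s FPGA implementation works; the function names, the word-by-word matching rule and the sample records are all assumptions made up for the example.

```python
import json
from typing import Any, Iterable, Iterator


def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (char_a != char_b)))
        previous = current
    return previous[-1]


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def fuzzy_scan(lines: Iterable[str], term: str, max_distance: int) -> Iterator[dict]:
    """Yield records in which any word of any field is within max_distance of term."""
    needle = term.lower()
    for line in lines:
        record = json.loads(line)
        for text in iter_strings(record):
            if any(edit_distance(word.lower(), needle) <= max_distance
                   for word in text.split()):
                yield record
                break  # one matching field is enough for this record


# Hypothetical sample data, one JSON record per line.
records = [
    '{"name": "Jon Smyth", "city": "Boston"}',
    '{"name": "Ann Lee", "notes": ["met Jonathan Smithe"]}',
    '{"name": "Bob", "city": "Austin"}',
]

for hit in fuzzy_scan(records, "smith", max_distance=1):
    print(hit["name"])
```

Because nothing is precomputed, the threshold is just a parameter, but every query pays for a full pass over the data. That cost is what the hardware parallelism described in the last point is meant to absorb.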
All the users see is the Elasticsearch application. Behind the scenes, Ryft’s technology retrieves source data from stores such as Amazon S3, pipelines the required computations through the F1 instance’s FPGA-enabled compute fabric, executes multiple complex algorithms, and returns data in native Elasticsearch JSON formats for eventual output through the Elasticsearch application and/or native Amazon S3 file output.
At Ryft, we believe in the power of the cloud. We believe in extending ubiquitous platforms like Elasticsearch by enhancing them with the power afforded by heterogeneous computing principles. And Elasticsearch is just the beginning: we have exciting plans for the Ryft ONE and Ryft Virtual platforms in the heterogeneous cloud. Stay tuned!