A ‘fail fast’ solution for end-to-end machine learning

January 15, 2020

Enterprise AI solutions are characterized by an end-to-end workflow that involves data sourcing, querying, ETL, feature engineering, and training the machine learning algorithms. Did you know you can build an end-to-end machine learning pipeline with Pythonic frameworks that lets you fail fast at TeraScale data levels?

Big data versus AI conundrum

Training machine learning algorithms with large amounts of relevant data addresses overfitting and helps with building more robust models. Consequently, handling big data in a distributed compute environment is an integral part of engineering enterprise AI solutions. Spark has become synonymous with big data processing, so let’s take a look at the traditional Spark-based workflow for AI solutions, as shown in Figure 1.

Figure 1. A typical Spark-based workflow for training machine learning algorithms

Currently, a major drawback of this workflow is that the Spark ecosystem doesn’t support seamless GPU integration. At TeraScale data levels and beyond, GPUs offer huge benefits for accelerating the entire machine learning pipeline, including data processing and feature engineering. Though several efforts are underway to facilitate easy GPU usage within Spark (Apache Arrow, Project Hydrogen, and others), there’s not yet a ready-to-use Spark API that effectively hides the complexity of GPU programming (and the architectural details thereof) from the user. Furthermore, any intermediate CPU operations in the pipeline necessitate transferring data back and forth between CPU and GPU memory.

However, more recently developed machine learning frameworks (IBM Snap ML, TensorFlow, PyTorch, and others) have been highly successful at executing machine learning algorithms on GPUs transparently. This Pythonic ecosystem is also rapidly evolving around big data (cuDF and Dask-cuDF) to catch up with the machine learning frameworks in utilizing GPUs.
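To make this concrete, cuDF is designed to mirror the pandas API, so a typical pandas-based ETL or feature-engineering step can, in principle, be moved to the GPU by swapping a single import. The sketch below uses pandas so it runs anywhere; the commented line shows the hedged substitution, assuming a CUDA-capable system with cuDF installed:

```python
# cuDF mirrors much of the pandas API. On a GPU system with cuDF
# installed, the import below could (as an illustrative assumption)
# be replaced with:
#   import cudf as pd
import pandas as pd

# Toy transaction data standing in for a TeraScale table
df = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 20.0, 5.0, 15.0, 25.0],
})

# A typical feature-engineering step: per-customer aggregates.
# This groupby/agg pattern is supported by both pandas and cuDF.
features = df.groupby("customer")["amount"].agg(["sum", "mean"])
print(features)
```

For data that exceeds a single GPU's memory, Dask-cuDF applies the same idea across a cluster of GPUs, partitioning the DataFrame while keeping a pandas-like interface.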

An alternative fail fast solution

Failing fast, by adopting a flexible end-to-end machine learning pipeline that delivers quick turnaround times for training workloads, is critical for the development of enterprise AI applications. Read about why the fail fast methodology is relevant in AI application development.

Figure 2 shows an end-to-end pipeline designed by IBM Systems Lab Services consultants using Pythonic frameworks as an alternative to a Spark-based workflow. It relies on open source tools and the IBM cognitive product line, including IBM Power Systems, IBM Spectrum Scale, and Snap ML, to leverage GPUs throughout the machine learning pipeline.

Figure 2. An end-to-end machine learning pipeline built using open source Pythonic tools and IBM cognitive product line

Similar to Spark, the workflow shown in Figure 2 seamlessly integrates data processing, querying, feature engineering, and machine learning algorithms. Additionally, it allows developers to fail fast by providing the following benefits over a Spark-based workflow:

  • Unified GPU integration through an easy-to-use API that hides GPU programming complexity
  • Shortened time to development
  • Easy setup and maintenance of the popular machine learning frameworks on IBM Power Systems through conda
  • GPU-optimized TeraScale machine learning training through IBM Snap ML
  • High-bandwidth CPU-GPU data transfers using NVLink
  • Interactive mode using Python notebooks
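On the training side of the pipeline, Snap ML exposes scikit-learn-compatible estimators, so an existing CPU-based training step can be switched to GPU-accelerated training with a minimal code change. The sketch below uses scikit-learn so it runs anywhere; the commented lines show the hedged substitution, assuming Snap ML is installed on a GPU-equipped IBM Power system:

```python
# Snap ML provides scikit-learn-compatible estimators. On an IBM Power
# system with GPUs and snapml installed, the estimator below could
# (as an illustrative assumption) be replaced with:
#   from snapml import LogisticRegression
#   clf = LogisticRegression(use_gpu=True)
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for engineered features from the pipeline
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
print(f"held-out accuracy: {acc:.3f}")
```

Because the estimator interface is the same, the fail fast loop of iterating on features, retraining, and evaluating carries over unchanged when moving from a laptop-scale prototype to GPU-accelerated TeraScale training.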

Contact IBM Lab Services today to get more information on how you can achieve your AI goals by designing the workflows appropriate for your enterprise.

Chekuri S. Choudary
