Accelerated Distributed AI Applications on the AWS Cloud
For the past decade or so, many companies have focused their deep learning training on accelerators such as GPUs to make deep learning more accessible and cost-effective over the long run. However, some AI/ML workloads may require different hardware, depending on their complexity. As computations and algorithms grow increasingly intricate, Alvarez argued, it is essential for developers and researchers to become more compute-aware: to understand how deeply an AI workload has been optimized. To become a compute-aware AI/ML programmer, he encouraged aspiring developers to keep performance in mind, making conscious decisions about the software and hardware at every layer of the stack.
In the rapidly growing field of data science, one skill compute-aware developers should possess, according to Alvarez, is the ability to build and deploy a Kubernetes application. Kubernetes manages a user's infrastructure through features such as autoscaling, helping programmers run containerized (Dockerized) applications in a distributed fashion across a cluster of machines. Alvarez explained the general anatomy and function of a Kubernetes system, including how it dynamically recovers from compute failures by rescheduling workloads.
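A minimal sketch of that pattern, assuming the official `kubernetes` Python client and a cluster reachable through the local kubeconfig: it creates a Deployment that Kubernetes keeps at three replicas (rescheduling pods when one fails) and attaches a HorizontalPodAutoscaler so the replica count grows with CPU load. The image name, namespace, and resource figures are illustrative assumptions, not details from the talk.

```python
from kubernetes import client, config

config.load_kube_config()  # reads ~/.kube/config for cluster credentials
apps = client.AppsV1Api()

# A Deployment asks Kubernetes to keep 3 replicas of a containerized app
# running; if a pod or its node fails, the control plane reschedules it.
deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="inference-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels={"app": "inference"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "inference"}),
            spec=client.V1PodSpec(
                containers=[
                    client.V1Container(
                        name="inference",
                        image="example.com/inference:latest",  # hypothetical image
                        resources=client.V1ResourceRequirements(
                            requests={"cpu": "1"}, limits={"cpu": "2"}
                        ),
                    )
                ]
            ),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)

# A HorizontalPodAutoscaler scales the Deployment between 3 and 10 replicas,
# targeting 70% average CPU utilization across the pods.
autoscaling = client.AutoscalingV1Api()
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="inference-app"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="inference-app"
        ),
        min_replicas=3,
        max_replicas=10,
        target_cpu_utilization_percentage=70,
    ),
)
autoscaling.create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

Declaring desired state this way, rather than starting containers by hand, is what lets Kubernetes address failures dynamically: the control plane continuously reconciles the cluster back toward the three healthy replicas the Deployment requests.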
Alvarez briefly discussed Intel's AI toolkit, showcasing its Intel Cloud Optimization Modules (ICOMs), open-source codebases with codified AI optimizations and instructions for selecting appropriate hardware on each Cloud Service Provider (CSP). To show a module in action, he demonstrated how to configure and deploy an application for loan default risk prediction, using the oneDAL hardware-level accelerations exposed by the daal4py library, which are available only on Intel hardware. Additionally, he gave a tour of the module's codebase, as well as the supporting AWS cloud infrastructure.
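As a sketch of that acceleration pattern (assuming the module follows the common XGBoost-to-daal4py conversion flow; the synthetic data, feature shapes, and hyperparameters below are illustrative stand-ins, not the ICOM's actual code): a gradient-boosted classifier is trained with stock XGBoost, then converted into oneDAL's model format so inference runs through daal4py's Intel-optimized kernels.

```python
import numpy as np
import xgboost as xgb
import daal4py as d4p

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 20)).astype(np.float32)  # stand-in loan features
y = (rng.random(10_000) < 0.2).astype(np.int32)           # stand-in default labels

# Train a binary default-risk classifier with stock XGBoost.
booster = xgb.train(
    {"objective": "binary:logistic", "max_depth": 6, "eta": 0.1},
    xgb.DMatrix(X, label=y),
    num_boost_round=100,
)

# Convert the trained booster into oneDAL's gradient-boosted-tree format.
daal_model = d4p.get_gbt_model_from_xgboost(booster)

# Run prediction through daal4py, which dispatches to oneDAL's
# hardware-accelerated kernels on Intel CPUs.
result = d4p.gbt_classification_prediction(nClasses=2).compute(X, daal_model)
print(result.prediction[:5].ravel())  # predicted class labels
```

The design point here is that training code stays unchanged; only the inference path is swapped to the accelerated library, which is typically where a deployed risk model spends most of its compute.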