CPU Accelerated Fine-Tuning for Image Segmentation Using PyTorch
Ben has a background in oil and gas and had worked on projects in seismic imaging before becoming a manager in AI solutions.
More specifically, the objective of the talk was to accelerate a PyTorch training job on Intel’s new 4th Gen Xeon Scalable Processor, released in January 2023. “This interests me because a lot of training usually happens on GPUs and I’m excited to present on the ability to train a deep learning workload on a CPU,” Ben stated. He demonstrated that the 4th Gen Xeon is attractive for many reasons.
This CPU offers a 3x to 10x speedup and 7.7 times the performance per watt. The 4th Gen Intel CPU also has a built-in AI acceleration engine called Intel Advanced Matrix Extensions (AMX), which allows for faster matrix multiplication and supports a technique called “mixed precision training.” Furthermore, using CPUs can often be more cost- and time-effective, as GPUs are typically more expensive and less available. The new Intel CPU bare metal machine is a hardware beast: it has two sockets, each with 56 physical cores, a total of 504 GB of memory, and a total of 224 virtual CPUs (vCPUs).
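The core counts above can be checked with a little arithmetic. The sketch below assumes two hardware threads per physical core, which is not stated in the talk but is the factor that reconciles 112 physical cores with 224 vCPUs.

```python
# Figures quoted in the talk for the bare metal machine.
SOCKETS = 2
PHYSICAL_CORES_PER_SOCKET = 56
REPORTED_VCPUS = 224

# Assumption (not stated in the talk): two hardware threads per physical core.
THREADS_PER_CORE = 2

physical_cores = SOCKETS * PHYSICAL_CORES_PER_SOCKET
vcpus = physical_cores * THREADS_PER_CORE

print(f"physical cores: {physical_cores}")  # physical cores: 112
print(f"vCPUs: {vcpus}")                    # vCPUs: 224
```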
The previously mentioned AMX consists of two components: tiles, which are 2D register files, and the “TMUL” (Tile Matrix Multiply) unit. The tiles store bigger chunks of data, and the TMUL provides instructions that operate on those tiles to compute larger matrices in a single operation.
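To make the tile idea concrete, here is a minimal sketch of a blocked (tiled) matrix multiplication in plain Python. It does not use AMX and is far slower than real hardware; it only shows how a large product can be broken into block-sized multiply-accumulate steps. The function name `matmul_tiled` and the default tile shape are illustrative assumptions, not a description of the instruction set.

```python
def matmul_tiled(a, b, tile_m=16, tile_n=16, tile_k=32):
    """Multiply a (M x K) by b (K x N), one block of the output at a time."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for k0 in range(0, k, tile_k):
                # One "tile multiply": a block of A times a block of B,
                # accumulated into the matching block of C.
                for i in range(i0, min(i0 + tile_m, m)):
                    for j in range(j0, min(j0 + tile_n, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile_k, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


# Small usage example with tiny tiles so that several blocks are visited.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
print(matmul_tiled(a, b, tile_m=2, tile_n=1, tile_k=2))
# [[58.0, 64.0], [139.0, 154.0]]
```

In hardware, the innermost three loops correspond roughly to what a single tile instruction does at once, which is where the speedup comes from.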
After introducing Intel’s new CPU, Ben transitioned to talking through the actual training of a model on a dataset of satellite images with matching street labels, a task known in computer vision as pixel segmentation. At this point, attendees on the call were given the link to the Jupyter Notebook, which you can find here.
The main repository used throughout the talk for road mapping is called CRESI (City Scale Road Extraction from Satellite Imagery). In the talk, Ben’s aim was to show that only a few lines of the training scripts in this repository need to change to get optimal performance on the new CPU. Ben stated that “the goal here is to map roads to [the] satellite images” pulled from the city of Moscow.
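As a rough illustration of what “a few line changes” can look like, here is a minimal sketch of one training step using BF16 autocast on CPU and, when it is installed, the Intel Extension for PyTorch. This is not the CRESI training script: the tiny network, the random tensors, and the hyperparameters are hypothetical stand-ins, and the extension import is made optional so the sketch also runs with stock PyTorch.

```python
import torch
import torch.nn as nn

try:
    # Optional: Intel Extension for PyTorch, if available in the environment.
    import intel_extension_for_pytorch as ipex
except ImportError:
    ipex = None

torch.manual_seed(0)

# Hypothetical stand-in for a segmentation network (not the model from the talk).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=1),  # one output channel: road vs. background
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.BCEWithLogitsLoss()

model.train()
if ipex is not None:
    # Added line 1: let the extension optimize the model and optimizer for BF16.
    model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=torch.bfloat16)

# Random stand-ins for a batch of satellite image crops and road masks.
images = torch.rand(2, 3, 32, 32)
masks = (torch.rand(2, 1, 32, 32) > 0.5).float()

optimizer.zero_grad()
# Added line 2: run the forward pass under BF16 autocast on the CPU.
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    logits = model(images)

# The loss is computed in FP32 here, a conservative choice for this sketch.
loss = criterion(logits.float(), masks)
loss.backward()
optimizer.step()

print(f"logits dtype: {logits.dtype}, loss: {loss.item():.4f}")
```

The rest of the loop (data loading, loss, backward pass, optimizer step) stays as it would be in an ordinary PyTorch script, which is the point Ben was making.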
Ben noted that those thinking “we already have our roads mapped with Google Maps, etc.” should consider the following: “If you are in an emergency situation and want to be able to map the usable roads very quickly, or without having someone go and trace all the roads, this [approach] would be helpful.”
For the remainder of the workshop, Ben went through the process of running the code. In the end, Ben created 1352 masks corresponding to the satellite images of Moscow to show that the trained PyTorch model correctly predicted the locations of the roads.
The workshop showcased optimizations such as the Intel Extension for PyTorch and BF16 mixed precision training, along with the ResNet34-UNet encoder-decoder model architecture, all running on the new Intel 4th Gen Xeon Scalable Processor.
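For readers unfamiliar with BF16: it keeps the sign bit and the 8 exponent bits of a 32-bit float but only 7 mantissa bits, so it covers the same range with less precision. The sketch below emulates that rounding in plain Python. The helper name `to_bfloat16` is hypothetical, and it handles only finite values within the float32 range.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Round a finite value to the nearest bfloat16, returned as a Python float."""
    # Reinterpret the value as the 32 bits of an IEEE 754 single-precision float.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 low bits that BF16 discards.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(to_bfloat16(3.14159265))  # 3.140625
print(to_bfloat16(1.0))         # 1.0
print(to_bfloat16(1.00390625))  # 1.0 (a tie, rounded to the even mantissa)
```

Mixed precision training uses this lower-precision format for much of the computation while keeping sensitive values in FP32, trading a small amount of precision for speed and memory.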
Ben concluded with the fact that “the new [Intel 4th Gen] brings home the new reality of fine-tuning training on a CPU, which is exciting!”