Exploring PyTorch’s Profiler

Subhankar Halder
3 min read · Oct 17, 2020

You may have heard of Machine Learning Tokyo, a nonprofit that runs events and study groups to disseminate machine learning knowledge. For the past few months, I have been attending their virtual study group on High Performance Python. Every Saturday evening, several machine learning enthusiasts, including me, read and discuss a chapter from Gorelick and Ozsvald’s High Performance Python book. The study group has, so far, been an extremely fun and rewarding experience!

During one of the chapter discussions, I learnt that PyTorch has its own profiler that reports the time and memory consumption of a Torch model. Several of my peers asked me to investigate and, if possible, write a blog post about it. So here’s a brief account of the profiler.

Profiling in software engineering is the time and memory analysis of a program. In simple terms, profiling allows us to answer questions such as: How much time and memory does the program take to execute? Which part of the program takes the most time to execute? How performant is a snippet of code relative to some other program that produces the same output?
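As a quick, generic illustration of the idea, Python’s standard-library cProfile module answers the timing question for ordinary code. This toy example is my own sketch, not part of the PyTorch tooling:

```python
import cProfile

def toy_workload(n):
    # A deliberately slow, pure-Python computation
    total = 0
    for _ in range(n):
        total += sum(j * j for j in range(1000))
    return total

# Print per-function call counts and cumulative times
cProfile.run("toy_workload(500)")
```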

PyTorch comes with its own profiler that assists a developer in analyzing the time and memory consumption of a PyTorch model. For instance, let us profile the VGG16 model. First, I wrote two lines to get a summary of the model.

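A minimal sketch of those two lines, assuming the VGG16 implementation bundled with torchvision:

```python
import torchvision

# Instantiate VGG16 and print its layer-by-layer structure
model = torchvision.models.vgg16()
print(model)
```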

This would print out the following:

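Abridged here; the exact listing depends on the torchvision version:

```
VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): ReLU(inplace=True)
    ...
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    ...
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
```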

Next, I modified the code to feed a random tensor through the model inside the profiler’s context manager, in order to measure execution time:

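A sketch of the modified code, assuming the torch.autograd profiler API that was current at the time:

```python
import torch
import torchvision
from torch.autograd import profiler

model = torchvision.models.vgg16()
x = torch.randn(1, 3, 224, 224)  # one random ImageNet-sized input

# Run the forward pass under the profiler's context manager
with profiler.profile() as prof:
    model(x)

# Aggregate the recorded events into a per-operator table
print(prof.key_averages().table(sort_by="cpu_time_total"))
```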

This gave the following result:

Profiler Statistics

The statistics table gives the time taken for the random tensor to pass through the convolution layers. One can note that the timing data is quite granular: for each conv2d, times are given for convolution, contiguous, and mkldnn_convolution. To be honest, I do not know much about these sub-operations of conv2d; they appear to be the C++ functions that make up a PyTorch conv2d layer. If I am not mistaken, MKL-DNN stands for Math Kernel Library for Deep Neural Networks, Intel’s optimized library for neural nets.

At the end, the total CPU and CUDA times were reported like so:

CPU and CUDA time
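As far as I know, CUDA timings are only collected when the profiler is asked for them. Continuing the earlier snippet, and assuming a GPU is available, that would look roughly like this:

```python
# Move the model and input to the GPU so there is CUDA work to measure
model = model.cuda()
x = x.cuda()

with profiler.profile(use_cuda=True) as prof:
    model(x)

print(prof.key_averages().table(sort_by="cuda_time_total"))
```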

Unfortunately, I couldn’t get the memory profiler to work. When I modified my code like so:

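Presumably the change was to pass the profile_memory flag that newer releases document. Continuing the earlier snippet, the attempt looked something like:

```python
# profile_memory reports per-operator tensor allocations, but the keyword
# only exists in newer PyTorch releases, hence (presumably) the TypeError
with profiler.profile(profile_memory=True) as prof:
    model(x)

print(prof.key_averages().table(sort_by="self_cpu_memory_usage"))
```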

I got a TypeError:

Type Error

Perhaps it was a PyTorch version issue; memory profiling was only added to the profiler in a fairly recent release, so an older installation would not recognize the new argument. I didn’t investigate the root cause of this problem.

One could also record the execution timeline in a trace.json file and visualize it in the Chrome browser using the chrome://tracing functionality:

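Continuing the earlier snippet, the export call for this is export_chrome_trace:

```python
with profiler.profile() as prof:
    model(x)

# Write a Chrome-compatible trace; open chrome://tracing and load the file
prof.export_chrome_trace("trace.json")
```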

So that’s a brief look at the PyTorch profiler. In the future, I would like to look into whether we can profile a custom model.

Connect with me on LinkedIn and Twitter.
