Using line_profiler in Python

I am currently reading a chapter of the Python Data Science Handbook and saw that IPython has some magic commands.

One of them, %timeit, lets you time how long a statement takes to run. Going further, there are other commands such as %prun, but these are not available in plain old Python, so I asked Gemini, the lovely chatbot, what the alternatives were. Here is its answer:

Unfortunately, there’s no single command in standard Python that perfectly replicates the functionality of IPython’s %prun magic command. Let me explain why and provide the closest alternatives:

What %prun Does

IPython’s %prun offers detailed profiling of a code statement. It shows you:

  • Number of times each function was called.
  • Time spent in each function (cumulative and per-call).
  • Other profiling information

Python Equivalents

Here are the most common alternatives:

  1. cProfile Module:
  • Usage:
    import cProfile
    cProfile.run('my_function()')
  • Provides similar profiling information to %prun.
  • Advantage: More granular control over profiling.
  2. line_profiler Module (External Install):
  • Install: pip install line_profiler
  • Usage: Decorate functions with @profile
  • Provides line-by-line profiling, showing execution time for each code line within a function.
  3. timeit Module:
  • Best for measuring short code snippets’ execution time.
  • Not a full-fledged profiler, but useful for quick timing comparisons.

Example: Using cProfile

import cProfile

def my_function():
    # Code you want to profile
    pass

if __name__ == '__main__':
    cProfile.run('my_function()')

Choosing the Right Tool

  • For general profiling: cProfile is a good starting point.
  • For line-by-line analysis: Consider line_profiler.
  • For simple timing: timeit is sufficient.
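
The answer lists timeit without showing it in use. As a minimal sketch from plain Python, it could look like the snippet below. The work() function and the repeat counts are placeholders I picked for illustration, not something from the book or from Gemini.

```python
import timeit

def work():
    # Hypothetical placeholder for the code you want to time.
    return sum(i * i for i in range(10_000))

# Take 5 measurements; each measurement runs work() 3 times.
timings = timeit.repeat(work, number=3, repeat=5)

# The minimum is usually the least noisy estimate; divide by `number` for per-call time.
best_per_call = min(timings) / 3
print(f"best of 5: {best_per_call:.6f} s per call")
```

This only gives one number for the whole call, so it says nothing about where the time goes inside the function.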

So I installed line_profiler with pip and created a test .py file with the example from the book:

@profile
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

sum_of_lists(1000000)
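
One thing to watch out for: the script never defines or imports @profile, so running it with plain python fails with a NameError. As far as I understand, kernprof -l makes the name available as a builtin while it runs the script. A small sketch of a fallback that lets the same file run both ways could look like this (the no-op decorator is my own addition, not from the book):

```python
import builtins

try:
    # Present when the script is run through `kernprof -l`.
    profile = builtins.profile
except AttributeError:
    def profile(func):
        """No-op stand-in used when the script runs under plain python."""
        return func

@profile
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

if __name__ == "__main__":
    print(sum_of_lists(1_000_000))
```

With this in place the file still runs normally when kernprof is not involved, just without any profiling.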

It turns out that to collect the statistics you need to run the script through kernprof, which I luckily found in my Anaconda installation.

I ran the script with kernprof and then inspected the results with the command it suggested. Here is the output of both steps:

kernprof -l lineprofiler_test.py 
Wrote profile results to lineprofiler_test.py.lprof
Inspect results with:
python -m line_profiler -rmt "lineprofiler_test.py.lprof"
python -m line_profiler -rmt "lineprofiler_test.py.lprof"
Timer unit: 1e-06 s

Total time: 0.906589 s
File: lineprofiler_test.py
Function: sum_of_lists at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     1                                           @profile
     2                                           def sum_of_lists(N):
     3         1         21.0     21.0      0.0      total = 0
     4         6         51.0      8.5      0.0      for i in range(5):
     5         5     862782.0 172556.4     95.2          L = [j ^ (j >> i) for j in range(N)]
     6         5      43735.0   8747.0      4.8          total +=sum(L)
     7         1          0.0      0.0      0.0      return total

  0.91 seconds - lineprofiler_test.py:1 - sum_of_lists

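The % Time column shows that the list comprehension on line 5 accounts for 95.2% of the run time, while summing the list takes only 4.8%. Which is kind of cool, considering that in the past we wrote a lot of scripts without optimising for performance.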
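
For comparison, here is a rough sketch of the function-level view that cProfile (the first option in Gemini's list) gives for the same function. The smaller N, the sort order, and the limit of 5 rows are my own choices to keep the output short.

```python
import cProfile
import io
import pstats

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

# Profile a single call to the function.
profiler = cProfile.Profile()
profiler.enable()
sum_of_lists(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 rows only.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Unlike line_profiler, this reports totals per function call, so it does not tell you which line inside sum_of_lists is the slow one.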

Cheers