Show a Progress Bar in Python

Janne Kemppainen

Python is a great language for writing command line scripts. When you need to run long-running processes, it is polite to indicate the overall progress to your user. You don't want the user to think that your script has hung and terminate the execution after a minute. Luckily, adding a progress indicator is really easy!

Progress indicators are already very common in everyday command line tools. When you install a package with pip, or download a file with wget, you will see progress bars that tell you exactly what's happening.

The problem of displaying progress is two-fold:

  1. First, you'll have to determine the actual progress on the code level. This can be a proportion of files processed, the amount of data downloaded, the time that has elapsed, or whatever your requirements are. This is the “difficult part”, which usually isn't that difficult.
  2. Finally, you need to show the status to the user. Here you can use external packages, so no need to reinvent the wheel. This is the “easy part”.

Calculating progress

Let's start with the first problem, determining the amount of progress. In many cases this comes naturally: maybe you get a list of files that you need to process one by one, or your user wants to repeat something a hundred times. In those cases you already know the number of iterations needed.

If your script needs to process a large file line by line then the situation is a little different. When you open a file you can't really know how many lines it contains before you've looped through them all. One solution is to open the file and loop through it once before doing any processing to count the iterations. This extra pass over the file will increase the execution time a little.

def get_linecount(filename):
    with open(filename, "r") as infile:
        i = -1
        for i, _ in enumerate(infile):
            pass
        return i + 1

The above code defines a function called get_linecount. It opens the file, enumerates all lines, and returns the total line count.

The enumerate function starts indexing from zero, so we need to increment the index by one to get the actual line count. If the file is empty, then the for loop won't be executed at all, and the value of i is not touched. This is why the index is initially set to minus one.
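As a quick sanity check, the sketch below runs get_linecount on a few temporary files, including an empty one and one whose last line has no trailing newline. The helper count_lines_in_text and the sample contents are made up for this illustration.

```python
import os
import tempfile


def get_linecount(filename):
    with open(filename, "r") as infile:
        i = -1
        for i, _ in enumerate(infile):
            pass
        return i + 1


def count_lines_in_text(text):
    """Hypothetical helper: write text to a temporary file and count its lines."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write(text)
        path = tmp.name
    try:
        return get_linecount(path)
    finally:
        os.remove(path)


empty_count = count_lines_in_text("")          # loop body never runs, i stays -1
three_count = count_lines_in_text("a\nb\nc\n")
no_newline_count = count_lines_in_text("a\nb")  # last line has no newline
print(empty_count, three_count, no_newline_count)  # prints: 0 3 2
```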

When you are reading the file in binary mode you are in luck. You can use the file.tell() method to get the current position in the file (in bytes).

The tell() function guarantees proper results in binary mode only. If you open a file in text mode it returns an opaque number that does not have to match the number of bytes read, so you cannot use the value to indicate progress, even if you know the file size. In text mode the value is only useful in conjunction with the seek() method, to jump back to a position that tell() returned earlier.

If you're reading the file sequentially, and not hopping back and forth in the file, then the file position method should work well. If you're doing random access to the file the progress calculation must also be more specific to your use case.
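To make the idea concrete, here is a minimal sketch of turning tell() into a progress fraction during a sequential read. It assumes the total size is known up front. The function name read_with_progress, the tiny chunk size, and the sample payload are all made up, and io.BytesIO stands in for a file opened with "rb".

```python
import io


def read_with_progress(stream, total_size, chunk_size=4):
    """Yield (chunk, fraction_done) pairs while reading a binary stream sequentially."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # In binary mode tell() is the byte offset from the start of the stream
        fraction = stream.tell() / total_size if total_size else 1.0
        yield chunk, fraction


payload = b"0123456789"  # made-up sample data
stream = io.BytesIO(payload)
fractions = [fraction for _, fraction in read_with_progress(stream, len(payload))]
print(fractions)  # prints: [0.4, 0.8, 1.0]
```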

Display progress with Rich

There are many libraries that support creating command line progress bars, and one of them is called Rich. It is a library that was designed for writing rich text and displaying other advanced content on the terminal. It can go way beyond our use case here so it's a perfect candidate when you want to create beautiful command line applications. For now we'll concentrate on the progress.

You can install Rich with pip:

>> python3 -m pip install rich

Let's start with a simple dummy example. The script reads a list of items from a file and does some processing with each line:

# progress.py
import sys
import time
from rich.progress import track


def get_linecount(filename):
    with open(filename, "r") as infile:
        i = -1
        for i, _ in enumerate(infile):
            pass
        return i + 1


def process(line):
    time.sleep(1)
    

def main():
    try:
        filename = sys.argv[1]
    except IndexError:
        print("Please provide a filename")
        sys.exit(1)
    linecount = get_linecount(filename)
    with open(filename, "r") as f:
        for line in track(f, description="Progress:", total=linecount):
            process(line)


if __name__ == "__main__":
    main()

The script uses the first argument as the file to be processed, uses the get_linecount function that we defined earlier to count the lines, and then creates the progress bar with the track function from rich.progress.

The track() function accepts a sequence as its first argument, so we can pass the file object directly to process it line by line. The items are returned transparently, so track() can be easily added to existing loops. For iterables that don't support len(), such as our file example, you can specify the number of iterations with the total argument.
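The same applies to generators, which have no length either. The following sketch assumes Rich is installed. The squares generator is a made-up stand-in for any lazily produced work items.

```python
from rich.progress import track


def squares(n):
    """Hypothetical generator: it has no len(), so track() is given total."""
    for i in range(n):
        yield i * i


results = []
for value in track(squares(5), description="Squaring:", total=5):
    # track() hands each item through unchanged
    results.append(value)

print(results)  # prints: [0, 1, 4, 9, 16]
```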

The progress bar can be customized with a description and styles if needed.

The process() function just sleeps for a while in order to simulate a job that would take a while to complete.

If we run this script with a file that contains five lines the end result will look like this.

As you can see, the output is automatically colorized: ongoing progress is shown in red, and when the processing is complete the color changes to green. The progress bar also calculates an estimate for the remaining time automatically!

If your work can be put in a list then the whole thing becomes even simpler since Rich automatically uses the list length to calculate the progress.

import time

from rich.progress import track

items = ["banana", "apple", "strawberry", "lemon"]
for item in track(items):
    # do something
    time.sleep(1)

It can't possibly get simpler than that!

Handling binary files

Earlier, I mentioned reading a file in binary mode. Binary files can be read in chunks, where you read a specific amount of data to a buffer, handle it, and continue with the next part. Byte strings can also be read line by line. Random access patterns need more sophisticated progress calculation, so we won't be talking about them.

This script opens a binary file, reads it sequentially in chunks, and displays the progress.

import sys
import time
from io import DEFAULT_BUFFER_SIZE
from rich.progress import Progress


def main():
    try:
        filename = sys.argv[1]
    except IndexError:
        print("Please provide a filename")
        sys.exit(1)
    with open(filename, "rb") as f, Progress() as progress:
        # get size
        f.seek(0, 2)
        filesize = f.tell()
        # seek back to beginning of the file
        f.seek(0)
        
        task = progress.add_task("Processing", total=filesize)

        while (data := f.read(DEFAULT_BUFFER_SIZE)):
            # do something with data
            time.sleep(0.01)
            progress.update(task, completed=f.tell())

if __name__ == "__main__":
    main()

Here the with statement opens the file and also creates a Progress instance, which works as a context manager. Progress lets us manage the progress bar in finer detail than track().

Within the with block the first thing we need to do is to get the total file size. This is easy as we can seek to the end of the file, check the position with tell(), and then seek back to the beginning to start the actual processing. The second argument to seek() defines the point from which to start seeking, 2 means that it should seek from the end. By default, seek uses absolute positioning, so in the second call zero means seeking to the start of the file.
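The seek-and-tell trick can be checked in isolation. The sketch below uses the named constant os.SEEK_END, which has the value 2, and compares the result with os.path.getsize(). The helper name size_via_seek and the temporary file are only there for the demonstration.

```python
import os
import tempfile


def size_via_seek(f):
    """Return the size of an open binary file and rewind it to the start."""
    f.seek(0, os.SEEK_END)  # os.SEEK_END == 2: offset is relative to the end
    size = f.tell()
    f.seek(0)  # whence defaults to os.SEEK_SET (0): absolute position
    return size


with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 1000)  # made-up content, 1000 bytes
    path = tmp.name

try:
    with open(path, "rb") as f:
        seek_size = size_via_seek(f)
        position_after = f.tell()
    stat_size = os.path.getsize(path)
finally:
    os.remove(path)

print(seek_size, stat_size, position_after)  # prints: 1000 1000 0
```

If you only have a filename and not an open file object, os.path.getsize() gives the same number without moving any file position.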

Next, we need to create a new task. You can add multiple tasks to a single progress instance, allowing you to display many simultaneous progress bars.
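As a rough illustration of multiple tasks, the sketch below adds two tasks to one Progress instance and advances them in the same loop. It assumes Rich is installed. The task names, totals, and step sizes are arbitrary, and there is no real work or delay.

```python
from rich.progress import Progress

with Progress() as progress:
    # Task descriptions and totals are arbitrary examples
    download = progress.add_task("Downloading", total=100)
    extract = progress.add_task("Extracting", total=50)

    # progress.finished becomes True once every task has reached its total
    while not progress.finished:
        progress.update(download, advance=10)
        progress.update(extract, advance=5)

    all_done = progress.finished
    download_done = progress.tasks[0].completed
```

Each call to add_task() returns a task ID, and that ID tells update() which bar to move.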

The while loop reads chunks of data until it runs out of content. It uses the “walrus operator” (:=), introduced in Python 3.8, to assign each chunk to data as part of evaluating the loop condition. At the end of the file read() returns an empty bytes object, which is falsy, so the loop stops. Within the loop there is again a small delay that simulates actual processing.
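For comparison, this sketch shows the walrus loop next to the equivalent loop written without it. The function names, the four-byte chunk size, and the sample bytes are invented for the example, and io.BytesIO stands in for a real binary file.

```python
import io

CHUNK_SIZE = 4  # small made-up chunk size so the result is easy to follow


def read_chunks_walrus(stream):
    chunks = []
    # Assign and test in one expression (Python 3.8+)
    while (data := stream.read(CHUNK_SIZE)):
        chunks.append(data)
    return chunks


def read_chunks_classic(stream):
    chunks = []
    while True:
        data = stream.read(CHUNK_SIZE)
        if not data:  # read() returns b"" at end of stream
            break
        chunks.append(data)
    return chunks


walrus_chunks = read_chunks_walrus(io.BytesIO(b"abcdefghij"))
classic_chunks = read_chunks_classic(io.BytesIO(b"abcdefghij"))
print(walrus_chunks)  # prints: [b'abcd', b'efgh', b'ij']
```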

Finally, the status is updated using the Progress.update() method. The first argument is the task that you want to update, then the changed values need to be provided using keyword arguments.

The argument naming can be a little confusing at first. The amount of finished work is set with completed, here using the value from tell(). The total amount of work can be changed with total when needed (though users don't typically like it when the progress goes backwards). You can also use advance to increment the progress by a certain value; in this case it could be the number of bytes processed in the current chunk.
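A variant of the read loop that uses advance instead of completed might look like the sketch below. It assumes Rich is installed and Python 3.8 or newer. The in-memory payload and the 4096-byte chunk size are made up, with io.BytesIO standing in for a file opened in binary mode.

```python
import io

from rich.progress import Progress

payload = b"x" * 10_000  # made-up data standing in for a binary file
stream = io.BytesIO(payload)

with Progress() as progress:
    task = progress.add_task("Processing", total=len(payload))
    while (data := stream.read(4096)):
        # advance adds the size of this chunk to the completed amount
        progress.update(task, advance=len(data))
    completed = progress.tasks[0].completed
```

With advance you don't need tell() at all, which helps when the data comes from a source that has no file position, such as a network stream.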

You should try to process a binary file, such as an image, to see the code in action. It should take some seconds to run through a few megabytes.

Conclusion

No matter how simple or complicated your script is, it is always useful to give relevant information to whoever needs to call it. Libraries like Rich make it super easy to improve the user experience with very little code.
