Speeding up Python code using multithreading

We often end up writing Python code that makes remote requests, reads multiple files, or processes some data. In many of these cases, I've seen programmers use a simple for loop that takes forever to finish executing. For example:


import requests
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]

def download_file(url):
    html = requests.get(url, stream=True)
    return html.status_code

start = time()

for url in url_list:
    print(download_file(url))

print(f'Time taken: {time() - start}')


Output:
Time taken: 6.340632677078247



This is a simple, sequential example: the code opens each URL, waits for the response, prints its status code, and only then moves on to the next URL. Code like this is a very good candidate for multithreading.

Modern systems can run many threads at once, which means you can perform multiple tasks concurrently with very low overhead. And because downloading a URL is I/O-bound work, the threads spend most of their time waiting on the network, so Python's GIL is not a bottleneck here. Why don't we make use of this to process these URLs faster?

We will use ThreadPoolExecutor from the concurrent.futures library. It's super easy to use. Let me show you some code first and then explain how it works.


 

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]

def download_file(url):
    # Fetch the URL and return its HTTP status code
    html = requests.get(url, stream=True)
    return html.status_code

start = time()

# Submit each download to a pool of 10 worker threads;
# submit() returns a Future, which we collect in a list
processes = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for url in url_list:
        processes.append(executor.submit(download_file, url))

# as_completed yields each future as soon as it has finished
for task in as_completed(processes):
    print(task.result())

print(f'Time taken: {time() - start}')

 


Output:
Time taken: 0.8704216480255127


We just sped up our code by a factor of about 7! And we didn't even do anything super involved. The performance benefit would have been even greater with more URLs.

So what's going on? When we call executor.submit, we add a new task to the thread pool. Each call returns a Future object, which we store in the processes list. Later, we iterate over processes and print each result.
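As an aside, if you want the results back in the same order as the URLs rather than in completion order, executor.map is a slightly more compact alternative. Here is a minimal sketch, reusing the download_file and url_list defined above:

# Sketch: executor.map applies download_file to every URL and
# yields the results in submission order, not completion order.
with ThreadPoolExecutor(max_workers=10) as executor:
    for status_code in executor.map(download_file, url_list):
        print(status_code)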

The as_completed function yields the tasks from the processes list as soon as they are completed. A task reaches the completed state for one of two reasons: it has finished running or it has been cancelled. We could also pass a timeout parameter to as_completed; if some tasks are still pending after that many seconds, the iterator raises a TimeoutError instead of yielding them.
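Here is a minimal sketch of that timeout behaviour. The 5-second limit is just an illustrative value, and since the with block in the script above already waits for every task to finish, treat this purely as an illustration:

from concurrent.futures import TimeoutError as FutureTimeoutError

# Sketch: as_completed raises TimeoutError if any future in
# `processes` is still pending when the 5-second limit expires.
try:
    for task in as_completed(processes, timeout=5):
        print(task.result())
except FutureTimeoutError:
    print("Some downloads did not finish within 5 seconds")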

You should explore multithreading a little more. For simple projects, it's the fastest way to speed up your code. If you want to learn more, read the official docs. They are super helpful.


This post is taken from: https://yasoob.me/2019/05/29/speeding-up-python-code-using-multithreading/


