Speeding up Python code using multithreading

We often end up writing Python code that makes remote requests, reads multiple files, or processes some data. In many of these cases, I've seen programmers use a simple for loop that takes forever to finish executing. For example:


import requests
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]

def download_file(url):
    html = requests.get(url, stream=True)
    return html.status_code

start = time()

for url in url_list:
    print(download_file(url))

print(f'Time taken: {time() - start}')


Output:
Time taken: 6.340632677078247



This is a simple, sequential example: the code opens each URL, waits for the response, prints its status code, and only then moves on to the next URL. Code like this is a very good candidate for multithreading.

Modern systems can run many threads at once, which means you can perform multiple tasks concurrently with very low overhead. And because downloading a URL is I/O-bound work, the threads spend most of their time waiting on the network, so Python's GIL is not a bottleneck here. Why don't we make use of this to process these URLs faster?

We will use ThreadPoolExecutor from the concurrent.futures library. It's super easy to use. Let me show you some code first and then explain how it works.


 

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from time import time

url_list = [
    "https://via.placeholder.com/400",
    "https://via.placeholder.com/410",
    "https://via.placeholder.com/420",
    "https://via.placeholder.com/430",
    "https://via.placeholder.com/440",
    "https://via.placeholder.com/450",
    "https://via.placeholder.com/460",
    "https://via.placeholder.com/470",
    "https://via.placeholder.com/480",
    "https://via.placeholder.com/490",
    "https://via.placeholder.com/500",
    "https://via.placeholder.com/510",
    "https://via.placeholder.com/520",
    "https://via.placeholder.com/530",
]

def download_file(url):
    # Fetch the URL and return its HTTP status code
    html = requests.get(url, stream=True)
    return html.status_code

start = time()

# Submit each download to a pool of 10 worker threads;
# submit() returns a Future, which we collect in a list
processes = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for url in url_list:
        processes.append(executor.submit(download_file, url))

# as_completed yields each future as soon as it has finished
for task in as_completed(processes):
    print(task.result())

print(f'Time taken: {time() - start}')

 


Output:
Time taken: 0.8704216480255127


We just sped up our code by a factor of about 7! And we didn't even do anything super involved. The performance benefit would have been even greater with more URLs.

So what's going on? When we call executor.submit, we add a new task to the thread pool. Each call returns a Future object, which we store in the processes list. Later, we iterate over processes and print each result.
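As an aside, if you want the results back in the same order as the URLs rather than in completion order, executor.map is a slightly more compact alternative. Here is a minimal sketch, reusing the download_file and url_list defined above:

# Sketch: executor.map applies download_file to every URL and
# yields the results in submission order, not completion order.
with ThreadPoolExecutor(max_workers=10) as executor:
    for status_code in executor.map(download_file, url_list):
        print(status_code)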

The as_completed function yields the tasks from the processes list as soon as they are completed. A task reaches the completed state for one of two reasons: it has finished running or it has been cancelled. We could also pass a timeout parameter to as_completed; if some tasks are still pending after that many seconds, the iterator raises a TimeoutError instead of yielding them.
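Here is a minimal sketch of that timeout behaviour. The 5-second limit is just an illustrative value, and since the with block in the script above already waits for every task to finish, treat this purely as an illustration:

from concurrent.futures import TimeoutError as FutureTimeoutError

# Sketch: as_completed raises TimeoutError if any future in
# `processes` is still pending when the 5-second limit expires.
try:
    for task in as_completed(processes, timeout=5):
        print(task.result())
except FutureTimeoutError:
    print("Some downloads did not finish within 5 seconds")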

You should explore multithreading a little more. For simple projects, it's the fastest way to speed up your code. If you want to learn more, read the official docs. They are super helpful.


This post is taken from: https://yasoob.me/2019/05/29/speeding-up-python-code-using-multithreading/


