How to speed up multiple sequential HTTP requests in Python using http.client

I want to get data from many pages (about 10,000), each containing an array of numbers. Fetching them one by one takes too long, and I'm a beginner in Python, so I don't know much about multithreading or asynchronous programming.

The code works fine and retrieves all the expected data, but it takes several minutes. I know it could probably be faster if I made more than one request at a time.

import http.client
import json

def get_all_data():
    connection = http.client.HTTPConnection("localhost:5000")
    page = 1
    data = {}

    try:
        while True:
            api_url = f'/api/numbers?page={page}'
            connection.request('GET', api_url)
            response = connection.getresponse()

            # Compare with ==/!=, not "is"; stop on an error status
            # instead of requesting the same page forever.
            if response.status != 200:
                break

            data[f'{page}'] = json.loads(response.read())['numbers']
            items_returned = len(data[f'{page}'])
            print(f'Please wait. Fetching data... Request: {page} -- Items returned: {items_returned}')
            page += 1
            # An empty page marks the end of the data.
            if items_returned == 0:
                break
    finally:
        # Always close the connection, even if a request fails.
        connection.close()

    print('All requests completed!')
    return data

How can I refactor this code to send several requests at once instead of one at a time?

2 answers

  • answered 2019-01-11 09:23 ACE Fly

    Your parameter page (producer) is dynamic and it relies on the last request (consumer). Unless you can separate the producer, you can't use coroutines or multithreading.

  • answered 2019-01-11 09:43 minglyu

    There are basically three ways to do this kind of job: multithreading, multiprocessing, and async. As ACE mentioned, the page parameter exists because the server generates the pages dynamically, and the number of pages may change over time as the database is updated. The easiest approach is to work in batches: send each batch of requests inside a try/except block, and handle the last, partial batch (fewer pages than a full batch) separately. You can make the number of jobs per batch a variable and try different settings.
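
The batch idea could be sketched roughly as follows, using only the standard library. This is a minimal sketch, not a definitive implementation: the names fetch_page, get_all_data_batched, HOST, and batch_size are made up for illustration, and it assumes that an empty 'numbers' list (or a non-200 status) marks the end of the data, as in the question. Unlike the original loop, the final empty page is not stored in the result. Each request opens its own connection, because a single http.client connection should not be shared between threads.

```python
import http.client
import json
from concurrent.futures import ThreadPoolExecutor

HOST = "localhost:5000"  # same host as in the question


def fetch_page(page):
    """Fetch one page; return its 'numbers' list, or [] on a non-200 status."""
    # Each call opens its own connection so threads never share one.
    connection = http.client.HTTPConnection(HOST)
    try:
        connection.request("GET", f"/api/numbers?page={page}")
        response = connection.getresponse()
        body = response.read()
        if response.status != 200:
            # Assumption: treat an error status like the end of the data.
            return []
        return json.loads(body)["numbers"]
    finally:
        connection.close()


def get_all_data_batched(fetch=fetch_page, batch_size=20):
    """Request pages in concurrent batches until a page comes back empty."""
    data = {}
    first_page = 1
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        while True:
            pages = range(first_page, first_page + batch_size)
            # pool.map runs the calls concurrently and yields results in page order.
            for page, numbers in zip(pages, pool.map(fetch, pages)):
                if not numbers:
                    # First empty page: stop and discard the rest of this batch.
                    return data
                data[str(page)] = numbers
            first_page += batch_size
```

Calling get_all_data_batched(batch_size=50) would then request 50 pages at a time. The last batch may ask for a few pages past the end, which is the price of not knowing the page count in advance. The batch size is worth tuning against what the server can handle.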