Multi-threaded image generation in Python

For an ML/DL project I have a set of features that I want to convert to images.

The data format looks like:

Group  Name  Feat X  Feat Y  ...  Feat Z
1      A
1      B
1      C
2      D
2      A
2      E

Here features X to Z form an ordered list of 60 numbers. The goal is to plot them with 1 to 60 on the X axis and the feature value on the Y axis, drawing one line per Name within each group. So the figure for group 1 would have 3 lines (A, B, C).

This is the function I have so far, where subs is the pandas DataFrame for a single group after a groupby on ['Group']:

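import io
from PIL import Image

def data2img(subs, path):
    ids = list(subs['Group'])[0]
    subs.set_index('Name', inplace=True)
    subs = subs.T
    fig = subs.plot(figsize=(32, 32), legend=False).get_figure()
    # convert figure to PIL image
    buf = io.BytesIO()
    fig.savefig(buf, bbox_inches='tight')
    buf.seek(0)  # rewind the buffer before reading it back
    img = Image.open(buf).convert('LA')
    img.save("{}/{}.png".format(path, ids), "PNG", optimize=True, quality=50)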

This is applied with df.groupby('Group').apply(lambda x: data2img(x, img_path)).

The code works and generates the correct figures, but I have millions of groups and it takes forever.

My go-to for this kind of thing is usually dask, but when I tried it, it raised a safety issue because matplotlib is not thread-safe. Any ideas on how to circumvent this issue, or other ideas on how to speed up image generation?

1 answer

  • answered 2021-11-29 05:44 rezan21

    Your question is not reproducible, but try something like:

    import concurrent.futures
    groups = list(df.groupby("Group").groups.keys())
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as ex:
        ex.map(your_func, groups)

    where your_func will be a function similar to the lambda function you got.
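
To make the process-pool idea above more concrete, here is a minimal sketch rather than a definitive implementation. Worker processes each have their own interpreter and their own pyplot state, so matplotlib's lack of thread safety should not come into play. The names render_group and render_all, the 4x4 inch figure size, and the chunksize value are assumptions for illustration, not taken from the question. The sketch also writes the PNG directly with savefig and skips the PIL grayscale conversion.

```python
import concurrent.futures
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; workers need no GUI
import matplotlib.pyplot as plt
import pandas as pd


def render_group(task):
    """Draw one group as a PNG; intended to run inside a worker process."""
    group_id, frame, out_dir = task
    # Assumption: a much smaller canvas than the original 32x32 inches.
    fig, ax = plt.subplots(figsize=(4, 4))
    # One line per Name: each row holds the ordered feature values.
    for name, row in frame.set_index("Name").iterrows():
        ax.plot(range(1, len(row) + 1), row.to_numpy(dtype=float))
    out_path = os.path.join(out_dir, "{}.png".format(group_id))
    fig.savefig(out_path, bbox_inches="tight")
    plt.close(fig)  # release the figure so memory does not grow per group
    return out_path


def render_all(df, out_dir, max_workers=4, chunksize=64):
    """Fan the groups out over worker processes; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    # Each task carries only the rows of one group, without the Group column.
    tasks = (
        (group_id, frame.drop(columns="Group"), out_dir)
        for group_id, frame in df.groupby("Group")
    )
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as ex:
        # chunksize batches many small tasks per inter-process round trip.
        return list(ex.map(render_group, tasks, chunksize=chunksize))
```

Call render_all(df, img_path) from under an if __name__ == "__main__": guard, which process pools need on platforms that spawn workers. Note that ex.map submits all tasks up front, so with millions of groups it may be worth feeding the DataFrame in slices.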
