How do I combine N dictionaries in a list of dictionaries based on a matching key:value pair?

I want to achieve the following. It is essentially the merging of N dictionaries that come from multiple data sources: entries with a duplicate id should be combined into a single dictionary, and the values of every field except id and updated_at should be accumulated from all the dictionaries into the final result.

class A:
    def __init__(self):
        pass

    def run(self):
        return {"data":[{"id":"ID-2002-0201","updated_at":"2018-05-14T22:25:51Z","html_url":["https://github.com/ID-2002-0201"],"source":"github"},{"id":"ID-2002-0200","updated_at":"2018-05-14T21:49:15Z","html_url":["https://github.com/ID-2002-0200"],"source":"github"},{"id":"ID-2002-0348","updated_at":"2018-05-11T14:13:28Z","html_url":["https://github.com/ID-2002-0348"],"source":"github"}]}

class B:
    def __init__(self):
        pass

    def run(self):
        return {"data":[{"id":"ID-2002-0201","updated_at":"2006-03-28","html_url":["http://sample.com/files/1622"],"source":"sample"},{"id":"ID-2002-0200","updated_at":"2006-06-05","html_url":["http://sample.com/files/1880"],"source":"sample"},{"id":"ID-2002-0348","updated_at":"2007-03-09","html_url":["http://sample.com/files/3441"],"source":"sample"}]}
        
results = {}
data_sources = [A(),B()]
for data in data_sources:
    data_stream = data.run()
    for data in data_stream.get('data'):
        for key, value in data.items():
            if key in ['html_url']:
                results.setdefault(key, []).extend(value)
            elif key in ['source']:
                results.setdefault(key, []).append(value)
            else:
                results[key] = value
print(results)

Desired output:

[
    {
        "id":"ID-2002-0201",
        "updated_at":"2018-05-14T22:25:51Z",
        "html_url":[
            "https://github.com/ID-2002-0201",
            "https://github.com/ID-2002-0202",
            "https://github.com/ID-2002-0203",
            "https://github.com/ID-2002-0204"
        ],
        "source": [
            "github",
            "xxx",
            "22aas"
        ]
    },
]

1 answer

  • answered 2021-05-04 11:05 dracarys

    I am a little confused because the desired output you gave does not match the sample classes in your code. However, I think I understand what you want; correct me if I have misinterpreted your question.

    I use your results variable as a dictionary of dictionaries. The outer dictionary has each unique id as a key, and each inner dictionary holds the merged data for that id. After the loops finish, list(results.values()) returns the list of combined dictionaries. Note that scalar fields such as updated_at end up with the value from the last data source that contains that id.

    Here is the code:

    class A:
        def __init__(self):
            pass
    
        def run(self):
            return {"data":[{"id":"ID-2002-0201","updated_at":"2018-05-14T22:25:51Z","html_url":["https://github.com/ID-2002-0201"],"source":"github"},{"id":"ID-2002-0200","updated_at":"2018-05-14T21:49:15Z","html_url":["https://github.com/ID-2002-0200"],"source":"github"},{"id":"ID-2002-0348","updated_at":"2018-05-11T14:13:28Z","html_url":["https://github.com/ID-2002-0348"],"source":"github"}]}
    
    class B:
        def __init__(self):
            pass
    
        def run(self):
            return {"data":[{"id":"ID-2002-0201","updated_at":"2006-03-28","html_url":["http://sample.com/files/1622"],"source":"sample"},{"id":"ID-2002-0200","updated_at":"2006-06-05","html_url":["http://sample.com/files/1880"],"source":"sample"},{"id":"ID-2002-0348","updated_at":"2007-03-09","html_url":["http://sample.com/files/3441"],"source":"sample"}]}
            
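    results = {}  # maps each id to its merged dictionary
    data_sources = [A(), B()]
    for source in data_sources:
        data_stream = source.run()
        for record in data_stream.get('data', []):
            # Fetch (or create) the merged entry for this id
            result = results.setdefault(record["id"], {})
            for key, value in record.items():
                if key == 'html_url':
                    # html_url is already a list, so extend with its items
                    result.setdefault(key, []).extend(value)
                elif key == 'source':
                    # source is a single string, so append it
                    result.setdefault(key, []).append(value)
                else:
                    # id and updated_at: the last source seen wins
                    result[key] = value
    print(list(results.values()))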
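
If you would rather keep the first updated_at seen for each id (which is what the desired output in the question appears to show), the same idea can be wrapped in a small function. This is only a sketch: merge_by_id and its list_keys and append_keys parameters are names I made up for illustration, and the sample records are a shortened copy of the data from classes A and B.

```python
def merge_by_id(records, list_keys=("html_url",), append_keys=("source",)):
    """Merge dictionaries that share the same "id" (hypothetical helper)."""
    merged = {}
    for record in records:
        # Fetch (or create) the merged entry for this id
        target = merged.setdefault(record["id"], {})
        for key, value in record.items():
            if key in list_keys:
                # Value is already a list: extend with its items
                target.setdefault(key, []).extend(value)
            elif key in append_keys:
                # Value is a scalar: collect it into a list
                target.setdefault(key, []).append(value)
            else:
                # Assumption: keep the first value seen (e.g. updated_at)
                target.setdefault(key, value)
    return list(merged.values())


records = [
    {"id": "ID-2002-0201", "updated_at": "2018-05-14T22:25:51Z",
     "html_url": ["https://github.com/ID-2002-0201"], "source": "github"},
    {"id": "ID-2002-0200", "updated_at": "2018-05-14T21:49:15Z",
     "html_url": ["https://github.com/ID-2002-0200"], "source": "github"},
    {"id": "ID-2002-0201", "updated_at": "2006-03-28",
     "html_url": ["http://sample.com/files/1622"], "source": "sample"},
]

merged = merge_by_id(records)
print(merged)
```

Using setdefault in the else branch is what keeps the first value; swapping it back to a plain assignment restores the "last source wins" behaviour of the code above.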