How to change names of scraped images with Python?

I need to download the image of every coin in the list on CoinGecko, so I wrote the following code:

import requests
from bs4 import BeautifulSoup
from os.path  import basename

def getdata(url): 
    r = requests.get(url) 
    return r.text 
    
htmldata = getdata("https://www.coingecko.com/en") 
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    with open(basename(link), "wb") as f:
            f.write(requests.get(link).content)

However, I need each image to be saved under the ticker of its coin as shown in the CoinGecko list (for example, bitcoin.png?1547033579 should become BTC.png, ethereum.png?1595348880 should become ETH.png, and so forth). There are over 7000 images to rename, and many of them have quite unique names, so slicing the file name does not work here.

What is the way to do it?

3 answers

  • answered 2021-06-10 11:05 Luiz F. Bianchi

    I believe you could achieve this very easily using string slicing:

    import requests
    from bs4 import BeautifulSoup
    from os.path  import basename
    
    def getdata(url): 
        r = requests.get(url) 
        return r.text 
        
    htmldata = getdata("https://www.coingecko.com/en") 
    soup = BeautifulSoup(htmldata, 'html.parser')
    for item1 in soup.select('.coin-icon img'):
        link = item1.get('data-src').replace('thumb', 'thumb_2x')
        # Cut the link at the '?' so the query string is not part of the file name.
        with open(basename(link[:link.find('?')]), "wb") as f:
            f.write(requests.get(link).content)
    

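    I slice the link string with [:] up to the question mark that marks the beginning of the query string, so the file is saved as bitcoin.png rather than bitcoin.png?1547033579. This assumes the link contains a question mark; if it does not, find returns -1 and the slice drops the last character of the link.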
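
    As a variation on the slicing above, the query string can also be dropped with the standard library's URL parser, which behaves the same whether or not a question mark is present. This is only a sketch; the helper name filename_from_url is made up for illustration and is not part of the original answer.

```python
from os.path import basename
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string.

    Hypothetical helper for illustration only.
    """
    # urlsplit separates the path from the query, so no manual search for '?' is needed.
    return basename(urlsplit(url).path)


link = "https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579"
print(filename_from_url(link))  # bitcoin.png
```

    Note that this still yields the original file name (bitcoin.png), not the ticker.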

  • answered 2021-06-10 11:40 Luiz F. Bianchi

    I was browsing the HTML and found that the tag you are selecting has an alt attribute with the ticker in parentheses at the end of the string.

    <div class="coin-icon mr-2 center flex-column">
    <img class="" alt="bitcoin (BTC)" data-src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x" src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x">
    </div>
    

    So we can use that to get the correct name like so:

    
    import requests
    from bs4 import BeautifulSoup
    from os.path  import basename
    
    def getdata(url): 
        r = requests.get(url) 
        return r.text 
        
    htmldata = getdata("https://www.coingecko.com/en") 
    soup = BeautifulSoup(htmldata, 'html.parser')
    for item1 in soup.select('.coin-icon img'):
        link = item1.get('data-src').replace('thumb', 'thumb_2x')
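        raw_name = item1.get('alt')
        # Take the text between '(' and the final ')', e.g. "bitcoin (BTC)" -> "BTC".
        name = raw_name[raw_name.find('(') + 1:-1]
        # Append the extension so the file is saved as BTC.png rather than BTC.
        with open(f"{basename(name)}.png", "wb") as f:
            f.write(requests.get(link).content)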
    

    We are basically extracting the value between the parentheses using string slicing.
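
    The slice assumes every alt text ends with a closing parenthesis. A slightly more defensive sketch of the same extraction, using a regular expression and returning None when no ticker is found, might look like the following. The function name ticker_from_alt and the upper-casing are assumptions made for illustration; only the "bitcoin (BTC)" format comes from the HTML shown above.

```python
import re

# Matches a parenthesised group at the very end of the alt text, e.g. "(BTC)".
TICKER_RE = re.compile(r"\(([^()]+)\)\s*$")


def ticker_from_alt(alt_text):
    """Return the ticker from text like 'bitcoin (BTC)', or None if absent.

    Hypothetical helper for illustration only.
    """
    if not alt_text:
        return None
    match = TICKER_RE.search(alt_text)
    # Upper-casing is an assumption, so that the file names look like BTC.png.
    return match.group(1).strip().upper() if match else None


print(ticker_from_alt("bitcoin (BTC)"))  # BTC
```

    Images for which the function returns None can then be skipped or saved under their original file name.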

  • answered 2021-06-10 12:08 MITHU

    Alternatively, you could do this:

    import requests
    from bs4 import BeautifulSoup
    from os.path  import basename
    
    url = "https://www.coingecko.com/en"
    
    r = requests.get(url) 
    soup = BeautifulSoup(r.text, 'html.parser')
    for item1 in soup.select('td.coin-name[data-text]'):
        ticker_name = item1.select_one(".center > span").get_text(strip=True)
        image_link = item1.select_one(".coin-icon > img").get('data-src').replace('thumb','thumb_2x')
        # Save under the ticker with a .png extension, e.g. BTC.png.
        with open(f"{basename(ticker_name)}.png", "wb") as f:
            f.write(requests.get(image_link).content)
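
    When tickers are used directly as file names, two things can go wrong: a ticker may contain characters that are not valid in a file name, and two coins could end up with the same ticker, in which case the second image would overwrite the first. The sketch below is one possible way to guard against both. The helper safe_filename, its numbering scheme, and the .png suffix are assumptions for illustration and not something taken from the answers above.

```python
import re


def safe_filename(ticker, used, suffix=".png"):
    """Build a unique, filesystem-friendly file name from a ticker.

    `used` is a set of stems already handed out; it is updated in place.
    Hypothetical helper for illustration only.
    """
    # Replace anything other than letters, digits, '.', '_' and '-' with '_'.
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", ticker.strip()).upper() or "UNNAMED"
    candidate = stem
    counter = 2
    # Append _2, _3, ... until the name has not been used before.
    while candidate in used:
        candidate = f"{stem}_{counter}"
        counter += 1
    used.add(candidate)
    return candidate + suffix


used_names = set()
print(safe_filename("BTC", used_names))  # BTC.png
print(safe_filename("btc", used_names))  # BTC_2.png
```

    In the loop above, the result could be passed to open() in place of the bare ticker.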