Scrapy ValueError: url can't be None

Introduction

I need to create a spider that scrapes the product information on https://www.karton.eu/einwellig-ab-100-mm, plus the weight of each product, which can only be scraped after following the product link to its own page.

After running my code, I get the following error message: ValueError: url can't be None

I have already checked that the URL is not broken: I can fetch it in the Scrapy shell.

This is the code I used:

import scrapy
from ..items import KartonageItem

class KartonSpider(scrapy.Spider):
    name = "kartons"
    allow_domains = ['karton.eu']
    start_urls = [
        'https://www.karton.eu/einwellig-ab-100-mm'
        ]
    custom_settings = {'FEED_EXPORT_FIELDS': ['SKU', 'Title', 'Link', 'Price', 'Delivery_Status', 'Weight']}

    def parse(self, response):
        card = response.xpath('//div[@class="text-center artikelbox"]')

        for a in card:
            items = KartonageItem()
            link = a.xpath('@href')
            items['SKU'] = a.xpath('.//div[@class="signal_image status-2"]/small/text()').get()
            items['Title'] = a.xpath('.//div[@class="title"]/a/text()').get()
            items['Link'] = link.get()
            items['Price'] = a.xpath('.//div[@class="price_wrapper"]/strong/span/text()').get()
            items['Delivery_Status'] = a.xpath('.//div[@class="signal_image status-2"]/small/text()').get()
            yield response.follow(url=link.get(), callback=self.parse, meta={'items': items})

    def parse_item(self, response):
        table = response.xpath('//span[@class="staffelpreise-small"]')

        items = KartonageItem()
        items = response.meta['items']
        items['Weight'] = response.xpath('//span[@class="staffelpreise-small"]/text()').get()
        yield items

What causes this error?

1 answer

  • answered 2020-07-29 17:52 renatodvc

    The problem is that link.get() returns None, and response.follow() raises this ValueError whenever it is given None as the URL. The underlying cause appears to be your XPath.

    def parse(self, response):
        card = response.xpath('//div[@class="text-center artikelbox"]')
    
        for a in card:
            items = KartonageItem()
            link = a.xpath('@href')
    

    While card selects several div elements, none of those divs has an href attribute of its own, so '@href' (evaluated on the div's self axis) returns nothing. The href belongs to the a tag inside each div, so I believe this should give you the expected result:

    def parse(self, response):
        card = response.xpath('//div[@class="text-center artikelbox"]')
    
        for a in card:
            items = KartonageItem()
            link = a.xpath('a/@href') # FIX HERE <<<<<