XPath - all elements except those in header

Trying to figure out an XPath expression which matches all elements except the header or anything inside the header. Let's assume that the header can be detected by three conditions:

  1. the outer tag is header, e.g. <header><div.....></header>
  2. the outer tag has an id which contains the string "header"
  3. the outer tag has a class which contains the string "header"

My XPath: //*[not(ancestor::header)] and //*[not(ancestor::*[contains(@id,"header")])] and //*[not(ancestor::*[contains(@class,"header")])]

is not correct.

EDIT: This should match all links which are inside the header:

//*[ancestor::*[contains(@id,"header") or contains(@class,"header") or header]]

Now I want to get all elements except these.

Do you know how to make it work?

1 answer

  • answered 2018-03-11 14:01 Mads Hansen

    Each of the expressions in your original XPath was being evaluated separately, testing whether there is an element in the XML document that satisfies those conditions, and returning a boolean().

    Now that you have combined the predicates in order to select the particular element(s) that you don't want, you just need to negate the test:

    //*[not(ancestor-or-self::header) and
        not(ancestor::*[contains(@id,"header") or contains(@class,"header")])]
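
    As a quick sanity check, here is a minimal sketch of running that expression from Python with lxml; the sample HTML and variable names are illustrative, not taken from the question:

    from lxml import html

    # Tiny illustrative document: a <header>, a div whose id contains "header",
    # and some ordinary content.
    doc = html.fromstring("""
    <html>
      <body>
        <header><a href="/home">home</a></header>
        <div id="page-header"><span>logo</span></div>
        <div class="content"><a href="/article">article</a></div>
      </body>
    </html>
    """)

    # Negated predicate: keep elements that are not a <header> (or inside one)
    # and have no ancestor whose id or class contains "header". Note that with
    # the ancestor:: axis the element carrying the id/class is itself still
    # matched; use ancestor-or-self:: there as well if it should be excluded.
    outside_header = doc.xpath(
        '//*[not(ancestor-or-self::header) and '
        'not(ancestor::*[contains(@id,"header") or contains(@class,"header")])]'
    )

    print([el.tag for el in outside_header])
    # ['html', 'body', 'div', 'div', 'a'] -- the header, its link, and the
    # span inside div#page-header are filtered out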

Related questions

  • Unable to create Docker container for Scrapoxy

    Attempting to get Scrapoxy up and running using Docker, following this answer. I can see that I've successfully pulled the Scrapoxy image:

    REPOSITORY                   TAG      IMAGE ID       CREATED       SIZE
    fabienvauchelles/scrapoxy   latest    bcf119a67836   7 weeks ago   146MB

    However, when I run the command to create the container:

    docker create --name string scrapoxy -e COMMANDER_PASSWORD='text_password'\ 
    -it -p 8888:8888 -p 8889:8889 fabienvauchelles/scrapoxy

    ...I get the following error:

    Unable to find image 'scrapoxy:latest' locally
    Error response from daemon: pull access denied for scrapoxy, repository does 
    not exist or may require 'docker login' 

    If you can point me in the right direction, that would be most appreciated. I'm pretty green when it comes to Docker.
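
    For what it's worth: in docker create [OPTIONS] IMAGE [COMMAND], the first non-option argument is taken as the image, so here the word scrapoxy right after --name string is being parsed as the image name, which is exactly what the "Unable to find image 'scrapoxy:latest'" error suggests. Moving all options before the image and passing fabienvauchelles/scrapoxy as the last argument should avoid that. Below is a minimal sketch of the equivalent create call using the Docker SDK for Python (docker-py), where the image is an explicit parameter; the container name "scrapoxy" is an assumed intent, not something stated in the question:

    import docker  # Docker SDK for Python: pip install docker

    client = docker.from_env()

    # Equivalent of docker create, with the image given explicitly so it
    # cannot be confused with the container name. Values are taken from the
    # question; the name "scrapoxy" is assumed.
    container = client.containers.create(
        "fabienvauchelles/scrapoxy",                       # image
        name="scrapoxy",
        environment={"COMMANDER_PASSWORD": "text_password"},
        ports={"8888/tcp": 8888, "8889/tcp": 8889},
        tty=True,          # -t
        stdin_open=True,   # -i
    )
    print(container.id)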

  • Instagram scraper: how to retrieve all followers of a user

    I am using Selenium with Python to scrape Instagram data, for instance: user followers, followings, posts, post likes and post comments.

    For example: a user has 3 million followers.

    I am able to scrape only 1000 followers for that user, but after that all I see is the loading icon. Is there a way to fetch all of the followers?

    PS: I see the same situation for the user's followings, post likes, and post comments.

    Any insights would be really appreciated.
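
    In case it helps: a common workaround is to keep scrolling the followers dialog itself so that Instagram lazily loads further batches. A rough Selenium sketch is below; the locators are placeholders (Instagram changes its markup and class names frequently), and for very large accounts the web UI may rate-limit or simply stop serving results, so there is no guarantee of reaching all 3 million followers this way.

    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    # ... log in and open the profile's followers dialog first (not shown) ...

    # Placeholder locator for the scrollable container inside the followers
    # dialog; inspect the live page and adjust, as this is only an assumption.
    scroll_box = driver.find_element(
        By.XPATH, "//div[@role='dialog']//div[contains(@style, 'overflow')]"
    )

    last_count = 0
    while True:
        # Scroll the dialog (not the page) to the bottom to trigger loading
        # of the next batch of followers.
        driver.execute_script(
            "arguments[0].scrollTo(0, arguments[0].scrollHeight);", scroll_box
        )
        time.sleep(3)  # crude pause; consider explicit waits and a back-off

        links = scroll_box.find_elements(By.TAG_NAME, "a")
        if len(links) == last_count:
            break  # nothing new loaded, so stop (or retry with a longer pause)
        last_count = len(links)

    followers = {a.get_attribute("href") for a in links}
    print(len(followers))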

  • R Web scraping: How to automatically go to next page with R?


    Given the previous webpage, I would need a line of R code to go to the next page and obtain the corresponding URL.

    Please find attached the screenshot. Thanks!

    [screenshot]