PHP DOM getAttribute manipulation

I'm struggling to find an answer to the following. I suspect I don't really know what I'm asking for or how to ask it, so let me describe the problem:

I would like to grab some links from a page. I only want the links that have the word "advertid" as part of the URL. For example, the URL would be something like http://thisisanadvertid.com/questions/ask.

I've got this far:

<?php
// This is our starting point. Change this to whatever URL you want.
$start = "https://example.com";

function follow_links($url) {
    // Create a new instance of PHP's DOMDocument class.
    $doc = new DOMDocument();
    // Use file_get_contents() to download the page, pass the output of file_get_contents()
    // to PHP's DOMDocument class.
    @$doc->loadHTML(@file_get_contents($url));
    // Get a DOMNodeList of all of the <a> elements we find on the page.
    $linklist = $doc->getElementsByTagName("a");
    // Loop through all of the links we find.
    foreach ($linklist as $link) {
        echo $link->getAttribute("href")."\n";
    }
}
// Begin the crawling process by crawling the starting link first.
follow_links($start);
?>

This returns all URLs on the page, which is OK. To try to get only the URLs I wanted, I tried a few things, including amending the getAttribute call:

echo $link->getAttribute("href"."*advertid*")."\n";

I've tried a few things but can't get what I want. Can someone point me in the right direction? I'm a bit stuck.

Many thanks in advance.

4 answers

  • answered 2018-10-11 20:02 sietse85

    foreach ($linklist as $link) {
       if (strpos($link->getAttribute("href"), 'advertid') !== false) {
           echo $link->getAttribute("href")."\n";
       }
    }
    

  • answered 2018-10-11 20:02 Felippe Duarte

    You can check whether the href attribute contains the text you want, with logic that depends on your case:

    foreach ($linklist as $link) {
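        // Compare strictly against false: "false >= 0" is true in PHP, so ">= 0" would match every link.
        if (strpos($link->getAttribute("href"), 'advertid') !== false) {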
            echo $link->getAttribute("href")."\n";
        }
    }
    

  • answered 2018-10-11 20:03 Jimmy Surprenant

    I would suggest using the PHP function strpos.

    strpos takes at least two parameters. The first is the string you're searching in. The second is what you're looking for in the first string.

    strpos returns the position of the string if it's found, or false if it's not found.

    So your loop would look something like:

    foreach ($linklist as $link) {
        if( strpos($link->getAttribute("href"), 'advertid') !== false ){
           echo $link->getAttribute("href")."\n";
        }
    }
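
The reason for comparing with `!== false` rather than a loose check is that strpos returns 0 when the match sits at the very start of the string, and 0 is falsy in PHP. A minimal sketch of that behaviour follows. The sample hrefs and the helper name `href_contains` are hypothetical and only there for illustration.

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $href.
function href_contains(string $href, string $needle): bool
{
    // strpos() returns an int position (possibly 0) or false,
    // so the result must be compared strictly against false.
    return strpos($href, $needle) !== false;
}

// Hypothetical sample values, not taken from a real page.
$hrefs = [
    'advertid-123/page',            // match at position 0
    'https://example.com/advertid', // match later in the string
    'https://example.com/about',    // no match
];

foreach ($hrefs as $href) {
    echo $href . ' => ' . (href_contains($href, 'advertid') ? 'match' : 'no match') . "\n";
}
```

A loose test such as `if (strpos($href, 'advertid'))` would wrongly skip the first sample, because its match position is 0. On PHP 8.0 or later, `str_contains()` avoids the issue entirely by returning a boolean.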
    

  • answered 2018-10-11 20:08 zkemppel

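    // Collect the matching hrefs instead of echoing them.
    $links = [];
    foreach ($linklist as $link) {
        $href = $link->getAttribute("href");
        // preg_match() returns 1 when the pattern is found anywhere in $href,
        // so the surrounding ".*" is not needed.
        if (preg_match('/advertid/', $href)) {
            $links[] = $href;
        }
    }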
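
To tie the pieces together, here is a self-contained sketch that combines the DOMDocument loading from the question with the strpos filtering suggested in the answers. It assumes the page is already available as a string. The inline HTML and the function name `filter_links` are hypothetical, and `libxml_use_internal_errors()` is swapped in for the `@` suppression used in the question.

```php
<?php
// Hypothetical inline HTML standing in for a downloaded page.
$html = <<<EOT
<html><body>
<a href="https://example.com/advertid/1">Ad one</a>
<a href="https://example.com/about">About</a>
<a href="/offers?advertid=42">Ad two</a>
<a>No href here</a>
</body></html>
EOT;

// Hypothetical helper: return every href in $html that contains $needle.
function filter_links(string $html, string $needle): array
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally rather than hiding them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        // getAttribute() returns an empty string when the attribute is missing.
        $href = $link->getAttribute('href');
        if (strpos($href, $needle) !== false) {
            $links[] = $href;
        }
    }
    return $links;
}

print_r(filter_links($html, 'advertid'));
```

Returning an array rather than echoing inside the loop keeps the filtering reusable, for example if the matching links are later passed back into a crawler function like `follow_links()`.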