Inject <customTag> within text using DOMDocument

I would like to wrap a certain part of a text node in a custom tag using DOMDocument. My problem is that I can't figure out how to locate that specific part. For example:

"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

My goal is to add the tag like this:

"Lorem ipsum dolor sit amet, <emphasis>consectetur adipiscing</emphasis> elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."

The problem is that every text node is an instance of DOMNode, so I can't properly get the text content of the node and "inject" the tag right in. Any suggestions? Thanks.

2 answers

  • answered 2019-06-24 11:16 weegee

    Do you want something like this? A little string logic and a regex will do it; the steps are explained in the comments.

    <?php
    // Example: wrap everything from the first marker through the second marker in a tag.
    $string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.';
    $post = from("consectetur", "ut", $string, "a"); // pass the tag name, not "<a>", so a closing tag can be built
    echo $post;
    
    function from($from, $to, $string, $tag) {
        $frompost = strpos($string, $from); // position of the first marker
        if ($frompost === false) {
            return $string; // first marker not found: nothing to wrap
        }
        $topost = strpos($string, $to, $frompost + strlen($from)); // position of the second marker, searched after the first
        if ($topost === false) {
            return $string; // second marker not found: nothing to wrap
        }
        $substrfirst = substr($string, 0, $frompost) . "<$tag>"; // text before the first marker, plus the opening tag
        $substrsecond = $substrfirst . $from; // append the first marker itself
        $strinbetweenregex = '/(?<=' . preg_quote($from, '/') . ')(.*?)(?=' . preg_quote($to, '/') . ')/s'; // regex for the text between the two markers
        preg_match($strinbetweenregex, $string, $matches); // get the regex result
        $restString = substr($string, $topost + strlen($to)); // everything after the second marker
        return $substrsecond . $matches[0] . $to . "</$tag>" . $restString; // reassemble, ending the wrapped part with the closing tag
    }
    

    This gives: Lorem ipsum dolor sit amet, <a>consectetur adipiscing elit, sed do eiusmod tempor incididunt ut</a> labore et dolore magna aliqua.
    It also imposes a constraint on the two positions:

    $frompost < $topost
    

    In other words, the first argument must appear in the string before the second argument, reading left to right.
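
    A minimal sketch of checking that ordering up front. The helper name markersInOrder() is hypothetical and not part of the code above; it also looks for the second marker only after the end of the first one, which is slightly more lenient than comparing the two first positions.

    ```php
    <?php
    // Hypothetical helper: true only if $from occurs in $string and $to occurs somewhere after it.
    function markersInOrder(string $from, string $to, string $string): bool {
        $frompost = strpos($string, $from);
        if ($frompost === false) {
            return false; // first marker is missing
        }
        // Search for the second marker only past the end of the first one.
        $topost = strpos($string, $to, $frompost + strlen($from));
        return $topost !== false;
    }

    $string = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.';
    var_dump(markersInOrder("ipsum", "elit", $string)); // bool(true)
    var_dump(markersInOrder("elit", "ipsum", $string)); // bool(false)
    ```

    If the check fails, the simplest option is probably to return the string unchanged rather than wrap anything.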

  • answered 2019-06-24 12:32 Nigel Ren

    This is a somewhat long-winded route to the solution, but it starts with a DOMNode (or DOMElement) and ends by putting the same content back with the changes applied. It also tries to preserve everything around the changed text, including markup and other structure.

    The idea is to save the markup of the node to be updated and then use str_replace() to change the content. The result is then imported back into the document: it is parsed with SimpleXML (which I think is easier), the new node is imported into the DOMDocument, and the original node is replaced with the new one.

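    $source = '<div class="ToReplace">Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</div>';
    
    $textToTag = "consectetur adipiscing";
    $tag = "emphasis";
    
    $doc = new DOMDocument();
    $doc->loadHTML($source);
    
    // Copy the live node list into an array first, because the loop replaces nodes in the document.
    foreach ( iterator_to_array($doc->getElementsByTagName("div")) as $div )    {
        // Serialize the node as XML so that simplexml_load_string() can parse it back.
        $nodeHTML = $doc->saveXML($div);
        // Wrap the target text in the new tag (note: this also matches inside attribute values).
        $newHTML = str_replace($textToTag, "<$tag>$textToTag</$tag>", $nodeHTML);
        // Parse the modified markup and import it into the original document.
        $newNode = simplexml_load_string($newHTML);
        $import = $doc->importNode(dom_import_simplexml($newNode), true);
        // Swap the original node for the modified copy.
        $div->parentNode->replaceChild($import, $div);
    }
    echo $doc->saveHTML();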
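
    For comparison, here is a minimal sketch that stays inside the DOM API instead of round-tripping through a string, using DOMText::splitText(). It rests on a few assumptions: the target text sits inside a single text node, it is ASCII-only (so the byte offsets from strpos() line up with the character offsets splitText() expects), and only the first occurrence per text node is wrapped. The XPath expression is just an illustration for this example's markup.

    ```php
    <?php
    $source = '<div class="ToReplace">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</div>';
    $textToTag = "consectetur adipiscing";
    $tag = "emphasis";

    $doc = new DOMDocument();
    $doc->loadHTML($source);
    $xpath = new DOMXPath($doc);

    // Snapshot the text nodes first; the loop below modifies the tree.
    $textNodes = iterator_to_array($xpath->query('//div//text()'));
    foreach ($textNodes as $textNode) {
        $pos = strpos($textNode->nodeValue, $textToTag);
        if ($pos === false) {
            continue; // this text node does not contain the target text
        }
        // Split into [before][match][after]; splitText() returns the new right-hand node.
        $match = $textNode->splitText($pos);
        $match->splitText(strlen($textToTag));
        // Put the wrapper element where the matched text was, then move the text into it.
        $wrapper = $doc->createElement($tag);
        $match->parentNode->replaceChild($wrapper, $match);
        $wrapper->appendChild($match);
    }
    echo $doc->saveHTML();
    ```

    Because only text nodes are touched, attribute values such as class="ToReplace" cannot be altered by accident, which is the main trade-off against the str_replace() version.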