RSS Feed with Missing Tags

I'm working on a very simple RSS feed. I pull the information from a database and transform it into XML using PHP. However, when I open the feed in Chrome to check that everything appears as it should, I get this error at the top of the page:

error on line 1322 at column 12: Encoding error

Here is the code that I am using to pull from my database and create the RSS Feed.

<?php
include('connectDatabaseScript.php');
$sql = "SELECT * FROM table ORDER BY id DESC";
$query = mysql_query($sql) or die(mysql_error());

header("Content-type: text/xml"); 

echo "<?xml version='1.0' encoding='UTF-8'?> 
<rss version='2.0'>
<channel>
<title>My RSS Feed</title>
<link>http://www.mywebsite.com/rss.php</link>
<description>The description for the feed.</description>
<language>en-us</language>"; 

while($row = mysql_fetch_array($query)) {
$title=$row['title'];
$finalTitle = str_replace("&", "and", $title);
$link=$row['link'];
$newLink = str_replace("&", "&amp;", $link);
$category = $row['category'];
$date = $row['date'];
$description = $row['description'];

echo "<item> 
<title>$finalTitle</title>
<link>$newLink</link>
<description>$description</description>
<author>John Doe</author>
<pubDate>$date</pubDate>
<category>$category</category>
</item>"; 
} 
echo "</channel></rss>"; 
?>

This code usually gets stuck on the title tag. When that happens, the output merges the title with the link, and it can also merge in the rest of the item and several items after it. Here is an example of what is happening.

<item> 
<title>Title No 415: Title <item> 
<title>Title No 291: Another Title</title>
<link>http://www.mywebsite.com/post.php?id=291</link>
<description>description</description>
<author>John Doe</author>
<pubDate>Jan. 1, 2000</pubDate>
<category>Generic</category>
</item>

I have figured out which character causes this: the "–" character that appears in some of my titles. I have been trying to remove it with the str_replace function. That works for "&", but it does not work for "–". Is there another way to get rid of the "–" in the title, or is it still possible with str_replace?

2 answers

  • answered 2018-01-11 20:20 Syscall

    You should not write your XML by hand like this. To avoid this kind of error, you can build the XML with DOMDocument and output it with saveXML().
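
    A minimal sketch of that approach, assuming PHP 7 or later and text that is already valid UTF-8 (DOMDocument expects UTF-8 input). The $rows array and the add_text_element() helper are hypothetical stand-ins for the database rows and are not part of the original question:

```php
<?php
// Hypothetical sample rows standing in for the rows fetched from the database.
// Assumption: the strings are already valid UTF-8, which DOMDocument expects.
$rows = [
    [
        'title'       => "Title No 415: Title \u{2013} Q&A", // en dash and ampersand
        'link'        => 'http://www.mywebsite.com/post.php?id=415&ref=rss',
        'description' => 'description',
        'category'    => 'Generic',
    ],
];

// Hypothetical helper: appends <$name>$text</$name> to $parent.
// Text added through createTextNode() is escaped when the document is serialized.
function add_text_element(DOMDocument $doc, DOMNode $parent, string $name, string $text): DOMElement
{
    $element = $doc->createElement($name);
    $element->appendChild($doc->createTextNode($text));
    $parent->appendChild($element);
    return $element;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$rss = $doc->createElement('rss');
$rss->setAttribute('version', '2.0');
$doc->appendChild($rss);

$channel = $doc->createElement('channel');
$rss->appendChild($channel);

add_text_element($doc, $channel, 'title', 'My RSS Feed');
add_text_element($doc, $channel, 'link', 'http://www.mywebsite.com/rss.php');
add_text_element($doc, $channel, 'description', 'The description for the feed.');
add_text_element($doc, $channel, 'language', 'en-us');

foreach ($rows as $row) {
    $item = $doc->createElement('item');
    $channel->appendChild($item);
    add_text_element($doc, $item, 'title', $row['title']);
    add_text_element($doc, $item, 'link', $row['link']);
    add_text_element($doc, $item, 'description', $row['description']);
    add_text_element($doc, $item, 'category', $row['category']);
}

// Serialize the whole tree; every element is closed and every "&" is escaped.
$xml = $doc->saveXML();

header('Content-Type: text/xml; charset=UTF-8');
echo $xml;
```

    Because every element is created as a node, a closing tag cannot be forgotten, and the "&" in the title and link does not need a separate str_replace call. If the database text is not UTF-8, it would have to be converted before it is passed to createTextNode().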

  • answered 2018-01-11 23:34 rcade

    I have some PHP scripts that make a MySQL query and use it to produce an RSS feed. The text for RSS elements such as title and description needs to be cleaned up for presentation as XML.

    Here's a function to do that:

    function clean_text($in_text) {
        return utf8_encode(
            htmlspecialchars(
                stripslashes($in_text)));
    }
    

    I think a simpler function might solve the problem you're having:

    function clean_text($in_text) {
        return htmlspecialchars(
                stripslashes($in_text));
    }
    

    The call to utf8_encode() encodes an ISO-8859-1 string as UTF-8. It was necessary for me because my database stored its text in ISO-8859-1. The htmlspecialchars() function in PHP turns & into &amp;, < into &lt; and > into &gt; (and, by default, " into &quot;).

    Here's a statement that uses the function to output some RSS:

    echo "<description>" . clean_text($row['description']) . "</description>";
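
    A variation on the same idea, offered as a sketch rather than a drop-in fix. utf8_encode() is deprecated as of PHP 8.2 and only handles ISO-8859-1, which has no "–" character. If the "–" is stored as a Windows-1252 byte (an assumption; the question does not say which encoding the database uses), the text can be converted with mb_convert_encoding() before it is escaped. The function name clean_text_utf8() and its $from parameter are hypothetical, the mbstring extension is assumed to be installed, and stripslashes() is left out on the assumption that the stored text carries no added backslashes:

```php
<?php
// Hypothetical helper: converts text from an assumed source encoding to UTF-8,
// then escapes the characters that are special in XML.
// Assumption: the database returns Windows-1252 bytes; change $from to match your data.
function clean_text_utf8(string $in_text, string $from = 'Windows-1252'): string
{
    // Convert first, so that htmlspecialchars() receives valid UTF-8.
    $utf8 = mb_convert_encoding($in_text, 'UTF-8', $from);

    // ENT_XML1 selects XML escaping rules; ENT_QUOTES also escapes both quote characters.
    return htmlspecialchars($utf8, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// "\x96" is the byte that Windows-1252 uses for the en dash.
$title = "Title No 415: Title \x96 Q&A";

echo "<title>" . clean_text_utf8($title) . "</title>\n";
```

    The order matters: escaping is done after the conversion, so htmlspecialchars() never sees bytes that are invalid in UTF-8.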