Missing < > & characters from article content

Postby adq » 06 Oct 2008, 09:01

This isn't a support request, just a problem I've tracked down. I updated my server recently and suddenly noticed that tt-rss was stripping HTML special characters (<, > and &) from feeds, seemingly at random. A bit of hunting showed it was a problem with libxml2 and PHP. The bug at http://bugs.php.net/bug.php?id=45996 describes exactly what I am seeing.

I downgraded my server to libxml2 2.6.32 and it started working properly again.

I also tried upgrading to libxml2 2.7.1 and rebuilding PHP against that (in case it had some compile-time detection logic), but the problem recurred.

I'm happy with an older version of libxml2, and obviously this isn't a tt-rss issue; just thought I'd mention it in case someone else has the same problem.
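
For anyone wanting to check whether their own server shows the symptom, here is a minimal sketch. It assumes a reasonably recent PHP with the xml extension loaded; the function name `collect_cdata` and the sample string are made up for illustration and are not part of tt-rss.

```php
<?php
// Sketch: check whether PHP's XML parser drops the predefined entities,
// which is the symptom described in PHP bug 45996.

// Hypothetical helper: returns all character data the parser reports.
function collect_cdata(string $xml): string
{
    $text = '';
    $parser = xml_parser_create('UTF-8');
    xml_set_character_data_handler(
        $parser,
        function ($p, $chunk) use (&$text) {
            // The parser may deliver text in several chunks; join them.
            $text .= $chunk;
        }
    );
    xml_parse($parser, $xml, true);
    return $text;
}

$sample   = '<d>a &lt;b&gt; &amp; c</d>';
$expected = 'a <b> & c';
$actual   = collect_cdata($sample);

echo 'libxml2 version: ', LIBXML_DOTTED_VERSION, "\n";
echo $actual === $expected
    ? "entities decoded correctly\n"
    : "entities lost, got: '" . $actual . "'\n";
```

If the second line reports lost entities, the PHP and libxml2 combination is probably affected in the way this thread describes.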

Postby fox » 06 Oct 2008, 09:25

Eh, I never encountered this problem. Thanks for reporting!

Postby Qwark » 06 Oct 2008, 19:14

I have (had) the same problem as well.

I modified simplepie.inc to replace the problematic characters with their numeric equivalents just before parsing. The XML parser leaves those alone, and everything works again as expected (for me).

Remember to set ENABLE_SIMPLEPIE to true in config.php to make tt-rss actually use SimplePie. I guess this works just as well in Magpie if you paste these 3 lines at the right place.



Code:
diff -Naur --exclude .backups --exclude icons --exclude '*.png*' tt-rss-20080919.org/simplepie/simplepie.inc tt-rss-20080919.alex/simplepie/simplepie.inc
--- tt-rss-20080919.org/simplepie/simplepie.inc 2008-09-19 02:00:03.000000000 +0200
+++ tt-rss-20080919.alex/simplepie/simplepie.inc    2008-09-21 01:29:13.000000000 +0200
@@ -12761,6 +12761,10 @@
        xml_set_character_data_handler($xml, 'cdata');
        xml_set_element_handler($xml, 'tag_open', 'tag_close');

+       $data=str_replace("&lt;","&#60;",$data);
+       $data=str_replace("&gt;","&#62;",$data);
+       $data=str_replace("&amp;","&#38;",$data);
+
        // Parse!
        if (!xml_parse($xml, $data, true))
        {
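
The same substitution can be sketched as a standalone function, which makes it easier to try outside of simplepie.inc. The name `numeric_entities` is hypothetical; the patch above inlines three separate str_replace() calls instead, and this sketch assumes the array form behaves the same way because the pairs are applied in the order given.

```php
<?php
// Hypothetical helper: rewrite the three named entities as numeric
// character references before the feed text reaches xml_parse().
function numeric_entities(string $data): string
{
    // str_replace() applies the pairs in array order, so this matches
    // the three consecutive calls in the patch. None of the replacement
    // strings contains a later search string, so nothing is rewritten twice.
    return str_replace(
        ['&lt;', '&gt;', '&amp;'],
        ['&#60;', '&#62;', '&#38;'],
        $data
    );
}

echo numeric_entities('<title>Tom &amp; Jerry &lt;3</title>'), "\n";
```

One caveat with this approach: it also rewrites the same strings inside CDATA sections, where the parser does not expand references, so the literal text there changes from `&lt;` to `&#60;`.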

Postby thecount » 16 Oct 2008, 14:02

Same problem, Qwark's fix works for me.

Postby fox » 21 Oct 2008, 09:52

I'm not sure I can merge the fix into trunk, though - I think it might break things for those with non-broken libxml.

I'll keep this thread sticky as a reference for people whose systems experience this problem.

Edit: The patch above also solves the problem with missing & (&amp;) in article links.

Postby padde » 28 Oct 2008, 15:33

Same problem here... php 5.2.6 + libxml2 2.7.2 (Gentoo).

Postby candrews » 28 Oct 2008, 20:00

I didn't notice this thread, so I reported a bug here: http://tt-rss.org/trac/ticket/224

Postby fox » 28 Oct 2008, 20:23

I've added the entry to the FAQ which links to this thread and the ticket you created.

Postby padde » 29 Oct 2008, 07:35

Umm, could somebody provide a patch for magpie?

I'm trying to package tt-rss for Gentoo, but this is a showstopper (as the bug shows up with the versions of php/libxml2 that are being shipped with Gentoo currently). Until the problems in php/libxml2/wherever are fixed, I'll automatically apply the patch(es) during installation to provide a working tt-rss to Gentoo users.

Postby fox » 29 Oct 2008, 08:32

Try adding the same three str_replace() calls after magpierss/rss_parse.inc:158, e.g.

Code:
    xml_set_character_data_handler( $this->parser, 'feed_cdata' );

    // add these three lines
    $source = str_replace("&lt;", "&#60;", $source);
    $source = str_replace("&gt;", "&#62;", $source);
    $source = str_replace("&amp;", "&#38;", $source);

    // the existing xml_parse() call follows here, etc.
    ...

Postby padde » 29 Oct 2008, 12:19

Great, that worked. Thanks :)

I attached the two patches (in -Nur format).
Attachment: patches.tar.gz (569 bytes)

Postby fox » 29 Oct 2008, 12:53

I'll try to check how unbroken libxml operates with those tomorrow.

Postby paulproteus » 30 Nov 2008, 21:18

Howdy Fox,

Any updates on this?

It seems to me that the suggested patches should have zero impact on a working libxml2.
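
A rough way to test that claim on an unaffected system is to parse the same text once with named entities and once with numeric references, then compare what the parser hands back. This is only a sketch: `root_text` is a made-up helper, and it assumes the xml extension is available.

```php
<?php
// Hypothetical helper: returns the text of the root element as seen by
// PHP's XML parser.
function root_text(string $xml): string
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parse_into_struct($parser, $xml, $values);
    // For a root element that contains only text, the first entry
    // carries the decoded character data under the 'value' key.
    return $values[0]['value'] ?? '';
}

$named   = root_text('<d>1 &lt; 2 &amp;&amp; 3 &gt; 2</d>');
$numeric = root_text('<d>1 &#60; 2 &#38;&#38; 3 &#62; 2</d>');

// With a working libxml2 the two forms should decode to the same text.
var_dump($named === $numeric);
```

If the comparison is true, the substitution does not change the parsed text on that system, at least for content outside CDATA sections.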

Postby fox » 01 Dec 2008, 08:20

Oh crap, I forgot all about it. I'll merge the patches into trunk and see whether it breaks stuff for me.

Postby Bernd » 03 Dec 2008, 09:32

There seem to be some more problems with special characters within (image) links.

The source
Code:
<p><img class="alignnone size-full wp-image-957" title="bestandene Diplompr&#252;fungen an Fachhochschulen" src="http://blog.bernd-distler.net/wp-content/uploads/2008/12/vdi_diplompruefungen.png" alt="" width="480" height="320" /></p>

becomes
[image: umlauts.png]


I have this problem with different feeds, not only with my own :|