Heads up: native fulltext search on pgsql

Development-related discussion, including bundled plugins
pcause
Bear Rating Master
Posts: 144
Joined: 23 Aug 2013, 19:52

Re: Heads up: native fulltext search on pgsql

Postby pcause » 06 Aug 2015, 14:58

Sorry. Since nginx was shut down before I ran the update, and the only thing I did after restarting was log in and click Preferences, I thought the message might be enough. There's too much crud in the logs now, but I'll see if I can dig anything out.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 06 Aug 2015, 15:16

you can try restoring that backup to a test database and running the indexer again, but tbh if it works now i don't see why you'd bother

unless the system panel literally crashes or w/e

AngryChris
Bear Rating Master
Posts: 135
Joined: 08 Apr 2013, 02:42

Re: Heads up: native fulltext search on pgsql

Postby AngryChris » 06 Aug 2015, 18:41

I just came to say that I have about 690k articles and absolutely nothing went wrong with the update for me. I did the update like this:

* Shut down update_daemon2.php process (sudo service tt-rss stop)
* Do a git pull (~/tools/tt-rss/qupdate_ttrss, a script I wrote)
* Use psql to apply the schema update (psql -U ttrss ttrssdb -f /var/www/html/tt-rss/schema/versions/pgsql/128.sql)
* Set cache/*, feed-icons and lock to be owned by me (necessary, or the update.php script would not run)
* Run the ./update.php --gen-search-idx command (self-explanatory)
* Set cache/*, feed-icons and lock to be owned by the web server (necessary, or tt-rss does not work)
* Restart update_daemon2.php process (sudo service tt-rss start)

The index rebuild took somewhere between an hour and an hour and a half; I didn't time it. I have a really slow server (it's 8 years old), but I was fine reading a book while waiting. The script reported that it was going to process about 690k articles and did them 500 at a time. I think the last thing it reported was 216 articles processed, and then it finished and gave me the command prompt back. I didn't encounter any errors or other issues. It all just worked.

Relevant server software: Ubuntu Server 14.04 LTS, PostgreSQL 9.3.9, PHP 5.5.9.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 06 Aug 2015, 18:59

yeah tbh it's not like the script is doing anything arcane: it literally fetches 500 articles from the database, copies the content field of those articles into another field in a special reduced format, and then moves on to the next batch of articles that don't have that field set yet

i am somewhat surprised that there would be MILLIONS OF ERROR LOG ENTRIES and TOTAL BREAKAGE and shit

it does lock the table, i guess, so the whole ttrss database would be slow while it runs, and it's probably a good idea to turn off feed updates; leaving them on won't break anything either, it would just make both processes slower

anyway, ¯\_(ツ)_/¯
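
To make the batching fox describes a bit more concrete, here is a minimal PHP sketch. It is not the actual update.php code: a plain array stands in for the articles table, and the field names (content, search_text), the helpers reduce_content(), fetch_unindexed_batch() and build_search_index(), and the tag-stripping/lowercasing "reduced format" are all hypothetical.

```php
<?php
// HYPOTHETICAL sketch of the batch indexer described above: a PHP array
// stands in for the articles table, 'search_text' for the index field.

// Hypothetical "reduced format": drop HTML tags and lowercase the text.
function reduce_content($html)
{
    return strtolower(strip_tags($html));
}

// Return the ids of up to $limit articles whose index field is not set yet.
function fetch_unindexed_batch(array $articles, $limit)
{
    $batch = [];
    foreach ($articles as $id => $article) {
        if ($article['search_text'] === null) {
            $batch[] = $id;
            if (count($batch) >= $limit) {
                break;
            }
        }
    }
    return $batch;
}

// Fill the index field batch by batch; returns the size of each batch.
function build_search_index(array &$articles, $batchSize = 500)
{
    $batchSizes = [];
    while (true) {
        $ids = fetch_unindexed_batch($articles, $batchSize);
        if (count($ids) === 0) {
            break; // nothing left without an index field
        }
        foreach ($ids as $id) {
            $articles[$id]['search_text'] = reduce_content($articles[$id]['content']);
        }
        $batchSizes[] = count($ids);
    }
    return $batchSizes;
}
```

Because every processed article gets its field set, the next "fetch unindexed" pass skips it automatically, so the loop needs no offset and simply stops when a fetch comes back empty. For example, 1,216 articles with a batch size of 500 are handled as batches of 500, 500 and 216.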

Skibbi
Bear Rating Disaster
Posts: 61
Joined: 15 Mar 2013, 14:59
Location: Poland

Re: Heads up: native fulltext search on pgsql

Postby Skibbi » 15 Aug 2015, 22:19

Not sure if it's related, but I recently discovered that searching for articles with specific keywords does not give accurate results. For example, I have a feed with local news, and searching for specific single-word keywords almost always returns no results, even when the article contains the word. I'm using a Postgres database, but I haven't run the indexing script yet. Is this required to get more accurate search results on Postgres?

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 15 Aug 2015, 22:54

it is required to get results, yes *facepalm*

sleeper_service
Bear Rating Overlord
Posts: 884
Joined: 30 Mar 2013, 23:50
Location: Dallas, Texas

Re: Heads up: native fulltext search on pgsql

Postby sleeper_service » 15 Aug 2015, 23:08

you mean the search doesn't run on magic? I'm surprised and strangely disappointed in you, Fox.

Skibbi
Bear Rating Disaster
Posts: 61
Joined: 15 Mar 2013, 14:59
Location: Poland

Re: Heads up: native fulltext search on pgsql

Postby Skibbi » 16 Aug 2015, 22:27

It seems the search magic is not working as expected. I've reindexed my database, but some keywords are still not searchable. I did some analysis and found that the PostgreSQL tsvector data for some articles is not tokenized correctly. For example, if an article has the following content:

Code:

<b>Category:</b> KEYWORD<br/><b>Type:</b> Some text<br/>

the tokenized data looks like this (the position numbers are arbitrary):

Code:

...'category':123 'keywordtyp':13...

It seems PostgreSQL somehow concatenated the category KEYWORD with the next line of the article content. So when I search for "KEYWORD" nothing is returned, but if I search for "keywordtyp" I get results.
Not sure if it's a PostgreSQL limitation, crappy feed content or a tt-rss bug.

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 16 Aug 2015, 23:23

tt-rss also runs strip_tags() on the content before passing it to postgres for tokenization, maybe that's why this happens

Skibbi
Bear Rating Disaster
Posts: 61
Joined: 15 Mar 2013, 14:59
Location: Poland

Re: Heads up: native fulltext search on pgsql

Postby Skibbi » 17 Aug 2015, 15:20

Yes, this seems to be the issue. strip_tags() removes <br/> tags, so words separated only by those tags get concatenated. Would you consider replacing all HTML tags with spaces instead? That should prevent words from being concatenated and make search more reliable. There is a discussion of this kind of solution on Stack Overflow that might be usable here (I guess).
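
A quick way to see the effect is to run plain strip_tags() on the sample content from the earlier post. The preg_split() call below is only a crude stand-in for PostgreSQL's text search parser (which is far more elaborate); it is there to show which "words" survive.

```php
<?php
// Sample feed content from the post above.
$html = '<b>Category:</b> KEYWORD<br/><b>Type:</b> Some text<br/>';

// strip_tags() deletes tags without leaving anything in their place,
// so "KEYWORD" and "Type:" end up glued together.
$plain = strip_tags($html);
echo $plain, "\n"; // Category: KEYWORDType: Some text

// Crude word split (NOT the real PostgreSQL parser): "KEYWORD" never
// appears on its own, only as part of "KEYWORDType".
$words = preg_split('/[^A-Za-z0-9]+/', $plain, -1, PREG_SPLIT_NO_EMPTY);
print_r($words);
```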

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 17 Aug 2015, 15:50

Code:

// put a space before every tag so strip_tags() can't glue two words together
$spaceString = str_replace('<', ' <', $string);
$doubleSpace = strip_tags($spaceString);


this simple hack would probably be good enough
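
Here is the same hack wrapped in a small function and applied to the problematic sample. The function name strip_tags_spaced() is hypothetical, and the whitespace collapsing at the end is an optional extra that is not part of the two-line hack; it is probably not needed for tokenization itself, since runs of spaces just act as separators.

```php
<?php
// HYPOTHETICAL helper wrapping the hack above: a space goes in front of
// every '<', so removing a tag can never join two words.
function strip_tags_spaced($html)
{
    $spaced = str_replace('<', ' <', $html);
    $text = strip_tags($spaced);
    // Optional tidy-up: collapse the runs of whitespace the hack creates.
    return trim(preg_replace('/\s+/', ' ', $text));
}

$html = '<b>Category:</b> KEYWORD<br/><b>Type:</b> Some text<br/>';
echo strip_tags_spaced($html), "\n"; // Category: KEYWORD Type: Some text
```

One trade-off worth knowing: inline markup inside a single word is now split as well, so something like <b>Key</b>word becomes "Key word". For feed content that is usually a much smaller problem than words being merged.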

Skibbi
Bear Rating Disaster
Posts: 61
Joined: 15 Mar 2013, 14:59
Location: Poland

Re: Heads up: native fulltext search on pgsql

Postby Skibbi » 18 Aug 2015, 11:18

I've made the change on my instance, and it seems to be working OK, at least for the problematic feed I have. Patch attached.
Attachments
0001-Prevent-concatenating-words-for-full-text-search-ind.patch
(2.2 KiB) Downloaded 76 times

fox
^ me reading your posts ^
Posts: 6318
Joined: 27 Aug 2005, 22:53
Location: Saint-Petersburg, Russia

Re: Heads up: native fulltext search on pgsql

Postby fox » 18 Aug 2015, 12:04

thanks!

