The whole thing is a *work in progress* but seems to work surprisingly well so far.
Repository: https://tt-rss.org/gitlab/fox/ttrss-per ... image-hash (clone to plugins.local/af_zz_img_phash)
2. Enough memory to load potentially large images into GD and disk space to hold them
3. If using PostgreSQL, the count-bits extension: https://github.com/sldab/count-bits
Stuff to do:
1. Clone plugin to plugins.local
2. Import schema in (pluginroot)/sql
3. Enable plugin for feeds containing potential reposts in preferences and set maximum Hamming distance (the default should be ok).
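For anyone unsure what the maximum Hamming distance setting means: two perceptual hashes are compared bit by bit, and the distance is the number of bits that differ. Below is a rough Python sketch of that idea, not the plugin's actual code. The function names and the threshold of 5 are made up for illustration (I am not quoting the plugin's default), and 64-bit hashes are assumed.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions in which two 64-bit hashes differ."""
    # XOR leaves a 1 wherever the hashes disagree; the mask keeps the
    # comparison to 64 bits even if a hash is stored as a signed integer.
    return bin((hash_a ^ hash_b) & 0xFFFFFFFFFFFFFFFF).count("1")


def is_potential_repost(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """Treat two images as likely duplicates if their hashes are close enough.

    max_distance=5 is a hypothetical value chosen for this example only.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance
```

A lower maximum distance means fewer false positives but more missed reposts (e.g. recompressed or resized copies); a higher one does the opposite.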
If the plugin catches a potential duplicate, it rewrites the image to a plain-text link with a dialog next to it showing all related stuff in the database.
How to check similarity manually:
select url, phash,
       unique_1bits((select phash from ttrss_plugin_img_phash_urls where url = 'http://imgur.com/something.jpg'), phash) as distance
from ttrss_plugin_img_phash_urls
order by distance
limit 5;
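On MySQL, use bit_count instead of unique_1bits. Note that MySQL's bit_count takes a single argument, so XOR the two hashes first, i.e. bit_count(a ^ b) in place of unique_1bits(a, b).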
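If you want to play with the query without touching your real database, here is a small self-contained Python sketch that runs the same lookup against an in-memory SQLite table. Everything about the setup is an assumption made for the example: only the table name and the url/phash columns come from the query above, the column types and sample hashes are invented, and unique_1bits is re-implemented in Python as a stand-in for the database function.

```python
import sqlite3


def unique_1bits(a, b):
    """Stand-in for the database function: number of differing bits."""
    if a is None or b is None:
        return None  # e.g. the reference URL is not in the table
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


def closest_images(conn, url, limit=5):
    """Return (url, phash, distance) rows ordered by distance to `url`."""
    query = """
        SELECT url, phash,
               unique_1bits(
                   (SELECT phash FROM ttrss_plugin_img_phash_urls WHERE url = ?),
                   phash
               ) AS distance
        FROM ttrss_plugin_img_phash_urls
        ORDER BY distance
        LIMIT ?
    """
    return conn.execute(query, (url, limit)).fetchall()


# Assumed minimal schema and made-up sample hashes, for illustration only.
conn = sqlite3.connect(":memory:")
conn.create_function("unique_1bits", 2, unique_1bits)
conn.execute("CREATE TABLE ttrss_plugin_img_phash_urls (url TEXT, phash INTEGER)")
conn.executemany(
    "INSERT INTO ttrss_plugin_img_phash_urls (url, phash) VALUES (?, ?)",
    [
        ("http://imgur.com/something.jpg", 0b11110000),
        ("http://imgur.com/repost.jpg", 0b11110001),
        ("http://imgur.com/other.jpg", 0b00001111),
    ],
)

rows = closest_images(conn, "http://imgur.com/something.jpg")
for row in rows:
    print(row)
```

The reference image itself always comes back first with distance 0; anything right below it with a small distance is a likely repost.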