Stripping twitter/sharing image
Daniel - Feb 14, 2017 at 10:11AM CET
Is there any way to strip a 'sharing image'? It is usually contained in <meta content="http://example.com/example.jpg" property="og:image" /> or something like that.
I tried to strip it using several methods without any success.
I'm struggling with this particular feed (as well as with some others):
I use my custom pattern...
find_string: <div class="b-document__body b-social__layout-mutation">
replace_string: <p><div class="b-document__body b-social__layout-mutation">
strip_id_or_class: b-button__social__wrapper no-print
strip_id_or_class: b-inset no-print
...but as you can see, the basic version of Full-Text RSS also includes these images:
Please help me strip this useless junk.
4 Community Answers
Keyvan Minoukadeh - Feb 14, 2017 at 11:03AM CET
This is a case of Full-Text RSS inserting the image found in og:image at the start of the article. We do this when the article content we've extracted appears not to contain any images. Usually the og:image is the image associated with the article and it's used on sharing sites like Facebook and Twitter to show an appropriate image. In this case, the images do not seem to be relevant to the article.
If you're using Full-Text RSS 3.7, which we released a few days ago, you can pass the &images=0 request parameter to have all images removed. So your feed URL would look like:
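As a rough sketch of how the parameter could be appended, here is a small Python helper. The host `example.org`, the source feed address and the `feed_url` helper are placeholders for illustration, and the sketch assumes the standard `makefulltextfeed.php` endpoint with its `url` parameter; only `images=0` comes from the answer above.

```python
from urllib.parse import urlencode

# Hypothetical installation URL (assumption); replace with your own Full-Text RSS host.
BASE = "http://example.org/full-text-rss/makefulltextfeed.php"


def feed_url(source_feed: str, strip_images: bool = True) -> str:
    """Build a Full-Text RSS feed URL, optionally asking for all images to be removed."""
    params = {"url": source_feed}
    if strip_images:
        # images=0 is the request parameter mentioned for Full-Text RSS 3.7.
        params["images"] = "0"
    return BASE + "?" + urlencode(params)


# Placeholder source feed, for illustration only.
print(feed_url("http://example.com/feed.xml"))
```

Note that this removes every image from the output, not just the one taken from og:image.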
Daniel - Feb 14, 2017 at 11:21AM CET
Thanks for your answer. Yes, I use v3.7, but removing all images is not an option, because some articles contain important graphs, diagrams, etc.
Is there absolutely no way to strip it via pattern editing?
I managed to strip og:image from yet another feed using the replace method:
find_string: <meta property="og:image" content="http://s.rbk.ru/v7_top_static/current/images/rbc-share.png" /><link rel="http://s.rbk.ru/v7_top_static/current/images/rbc-share.png">
However, this is possible only when the same image is used for all articles. Can I somehow use wildcards or something to find and replace strings?
Keyvan Minoukadeh - Feb 15, 2017 at 12:10PM CET
You make a good point. We need to add a way to control when og:image elements get added to the start of each article, without having to strip all images.
One way to do this in a site config file is to do what you're doing, but focus only on the og:image string. So:
This means Full-Text RSS will see no og:image element in the document (because we've removed the attribute value it looks for). The downside is that if you use our JSON output, extract.php, for this site, it will return og_image as blank. But I don't think this will affect you. And if the site doesn't really have an appropriate image in there anyway, it's probably no great loss.
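To illustrate why this works, here is a minimal Python simulation. It is not Full-Text RSS code: it assumes the find/replace rule behaves like a plain string replacement on the raw HTML before any parsing, and the replacement value `og:image-disabled`, the sample page and the helper names are all hypothetical.

```python
from html.parser import HTMLParser


class OgImageFinder(HTMLParser):
    """Records the content of the last <meta property="og:image"> tag seen."""

    def __init__(self):
        super().__init__()
        self.og_image = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("property") == "og:image":
            self.og_image = attributes.get("content")


def find_og_image(html):
    """Return the og:image URL in the document, or None if there is none."""
    finder = OgImageFinder()
    finder.feed(html)
    return finder.og_image


def apply_rule(html, find_string, replace_string):
    """Stand-in for a find_string/replace_string pair: plain text replacement."""
    return html.replace(find_string, replace_string)


# Sample page (made up) using the meta tag form quoted in the question.
PAGE = (
    '<html><head>'
    '<meta content="http://example.com/example.jpg" property="og:image" />'
    '</head><body><p>Article text</p></body></html>'
)

# Hypothetical replacement value; any string other than og:image would do.
patched = apply_rule(PAGE, 'property="og:image"', 'property="og:image-disabled"')

print(find_og_image(PAGE))     # the image URL is found in the original page
print(find_og_image(patched))  # nothing is found once the attribute value is changed
```

Because the rule matches only the attribute value and not the image URL, the same rule applies to every article on the site, whatever image each one declares.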
Ideally we'd have a way to signal that we don't want the og:image inserted into the article, while still letting the extractor return it in the JSON output.
Hopefully the method above will do the trick for now. Let us know if you have trouble with it.
Keyvan Minoukadeh - Feb 15, 2017 at 12:11PM CET
I've made a note to support this in the next release.