I submitted a similar question at the end of an older thread (Turning APC off), but wasn’t sure if that thread is still actively monitored. My apologies for the duplication.
I am using FTR to monitor several hundred client feeds (300+) and am trying to figure out the most efficient way to extract only NEW content. I currently check the feeds every few hours for new articles. When a new article is found, I store it in our db, and from that point forward I only extract articles newer than the last one stored. Most of the feeds do not publish every day, and those that do rarely publish more than one article a day. Still, because I can't know if, when or how often they publish, I check regularly. Also, to do the date compare, I have to pull back whatever is in the feed, check each article's pub date, stop once I reach an older date, and then move on to the next feed.
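To make the date compare concrete, here is a simplified sketch of the check I run per feed. It assumes the items arrive newest first and that each item carries a pubDate string; the function name and the item shape are only for illustration and are not the actual FTR output format.

```php
<?php
/**
 * Return the items published after $lastStored.
 * Assumes $items is ordered newest first and that each item is an
 * array with a 'pubDate' string (an illustrative shape, not FTR's own).
 */
function newItemsSince(array $items, DateTimeImmutable $lastStored): array
{
    $new = [];
    foreach ($items as $item) {
        $published = new DateTimeImmutable($item['pubDate']);
        if ($published <= $lastStored) {
            break; // reached an article we already have, so stop here
        }
        $new[] = $item;
    }
    return $new;
}
```

The early break is what lets me move on to the next feed quickly, but it only works if the feed really is sorted newest first.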
Currently, I am looping through my list of hundreds of feeds, and calling makefulltextfeed.php for each feed with array('format' => 'json', 'max' => 100, 'summary' => 1, 'url' => $this->feed_url)
I am able to do this for about 70 at a time before I get a server error (500), which I am presently trying to debug.
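For clarity, this is roughly how each request URL gets built from those parameters. It is a minimal sketch: the endpoint host and path are placeholders I made up for this post, and buildFtrRequestUrl is just an illustrative helper name.

```php
<?php
// Placeholder endpoint: substitute the location of your own FTR install.
const FTR_ENDPOINT = 'http://localhost/full-text-rss/makefulltextfeed.php';

// Build the request URL for one feed using the parameters listed above.
function buildFtrRequestUrl(string $feedUrl, string $endpoint = FTR_ENDPOINT): string
{
    $params = [
        'format'  => 'json',
        'max'     => 100,
        'summary' => 1,
        'url'     => $feedUrl,
    ];
    // http_build_query() URL-encodes the feed address for us.
    return $endpoint . '?' . http_build_query($params);
}

echo buildFtrRequestUrl('http://example.com/feed.xml'), PHP_EOL;
```

The loop simply calls this once per feed and fetches the result, one feed after another.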
I’m wondering if there is a more efficient way to do this. From another thread (Feature Request: Support array of URLs on the extract.php endpoint) I see I can combine URLs in a single request to makefulltextfeed.php. I’m wondering if this strategy supports hundreds of URLs concatenated together, and if this would be more efficient. I could break it into fewer URLs per request, if that would help.
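If smaller groups are the way to go, the splitting I have in mind would look something like the sketch below. The batch size of 25 is an arbitrary guess on my part, the feed URLs are placeholders, and I have left out how the URLs in a batch would actually be combined into one request, since that is the part I am unsure about.

```php
<?php
// Split the full feed list into smaller batches.
function batchFeeds(array $feedUrls, int $batchSize = 25): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    return array_chunk($feedUrls, $batchSize);
}

// Placeholder URLs standing in for the real client feeds.
$feedUrls = [];
for ($i = 1; $i <= 300; $i++) {
    $feedUrls[] = "http://example.com/feed{$i}.xml";
}

foreach (batchFeeds($feedUrls) as $n => $batch) {
    // Each batch would become one request (or one short run of requests).
    printf("Batch %d: %d feeds\n", $n + 1, count($batch));
}
```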
I'm also wondering whether there is a more efficient way to do the date compare, so that I only check for new content instead of pulling back the full feed each time.
Your assistance is immensely appreciated. This is the last piece of the puzzle for publishing this service.
This tool has been incredibly useful!