Usage and Request Parameters
The simplest way to use Full-Text RSS is to use the form provided.
In the URL field, enter the URL of a partial feed or web page and click ‘Create Feed’. The resulting page should show you a newly generated feed with the full content. To use this feed in your application, copy the URL in the address bar. You can now use this new URL in place of the original partial feed URL.
If you're a developer and need to integrate Full-Text RSS in your application, we have a simple code example to give you an idea of how it can be used.
In addition to the URL, you can also specify a number of other options in the form:
Set the maximum number of feed items we should process. The smaller the number, the faster the new feed is produced.
If your URL refers to a standard web page, this will have no effect: you will only get 1 item.
By default, links within the content are preserved. Change this field if you'd like links removed, or included as footnotes.
Extraction Failure Handling
If the extraction pattern above fails to match, FTR can remove the item from the feed or keep it in.
Keeping the item will keep the title, URL and original description (if any) found in the feed. In addition, FTR inserts a message before the original description notifying you that extraction failed.
Check the box and we'll include a brief plain text excerpt from the extracted content in the output.
We'll output JSON if selected (useful if your application already parses JSON and you want to avoid importing an RSS parsing library).
Check the box to see what's happening behind the scenes.
Query String Parameters
Using the form is the simplest way to create a Full-Text RSS URL, but you can also construct one yourself. The form fields above are turned into query string parameters when you submit the form. Let's look at those parameters here, and a few more that are not presented on the form.
Append these parameters to the base URL. The base URL is where you installed Full-Text RSS, e.g. http://example.org/full-text-rss/makefulltextfeed.php. Because this will differ from installation to installation, in this guide we'll simply use makefulltextfeed.php in examples.
These parameters can be combined in the URL.
A note on encoding: if you're constructing URLs without using the form, make sure you URL encode the parameter values (anything after the '=' and before the '&'). In PHP the function to use is urlencode(). If you're doing it by hand, you can paste the parameter values into the form field at http://meyerweb.com/eric/tools/dencoder/ and click 'Encode' to get the encoded value.
This is the only required parameter. It should be the URL to a partial feed or a standard HTML page. You can omit the ‘http://’ prefix if you like.
Note: %2F is the encoded value for '/'
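As a minimal sketch of the encoding note above, the Python snippet below builds a request URL by encoding each parameter value and joining the pairs onto the example base URL. The parameter name `url` and the article address are assumptions for illustration, and `build_request_url` is a hypothetical helper, not part of Full-Text RSS.

```python
from urllib.parse import quote

# Example base URL from this guide; replace it with your own installation.
BASE_URL = "http://example.org/full-text-rss/makefulltextfeed.php"


def build_request_url(base_url, params):
    """Join URL-encoded parameters onto the base URL (hypothetical helper)."""
    # safe="" makes quote() encode '/' as %2F instead of leaving it as-is.
    pairs = [
        f"{quote(name, safe='')}={quote(str(value), safe='')}"
        for name, value in params.items()
    ]
    return base_url + "?" + "&".join(pairs)


# 'url' as the parameter name is an assumption; the address is made up.
request_url = build_request_url(BASE_URL, {
    "url": "example.com/news/article.html",
    "format": "json",
    "links": "remove",
})
print(request_url)
```

Passing `safe=""` matters here: by default `quote()` leaves '/' unencoded, which would not match the %2F form described in the note.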
|format||rss (default), json||
The default Full-Text RSS output is RSS. The only other valid output format is JSON. To get JSON output, pass format=json in the querystring. Exclude it from the URL (or set it to ‘rss’) if you’d like RSS.
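To illustrate how an application might handle the two output formats, here is a small Python sketch that picks a parser based on the format value that was requested. The helper name `parse_output` and both sample bodies are made up for this example; the sample JSON in particular is not the actual output schema.

```python
import json
import xml.etree.ElementTree as ET


def parse_output(body, fmt="rss"):
    """Parse a response body according to the requested format value."""
    if fmt == "json":
        return json.loads(body)
    if fmt == "rss":
        return ET.fromstring(body)
    raise ValueError(f"unsupported format: {fmt!r}")


# Made-up sample bodies; real output will contain more fields.
sample_json = '{"rss": {"channel": {"title": "Example"}}}'
sample_rss = "<rss version='2.0'><channel><title>Example</title></channel></rss>"

data = parse_output(sample_json, "json")  # a plain dict
root = parse_output(sample_rss)           # an ElementTree element
print(type(data).__name__, root.tag)
```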
|summary||0 (default), 1||
If set to 1, an excerpt will be included for each item in the output.
|content||0, 1 (default)||
If set to 0, the extracted content will not be included in the output.
|links||preserve (default), footnotes, remove||
Links can either be preserved, made into footnotes, or removed. None of these options affect the link text, only the hyperlink itself.
|exc||0 (default), 1||
If Full-Text RSS fails to extract the article body, the generated feed item will include a message saying extraction failed, followed by the original item description (if present in the original feed). You can ask Full-Text RSS to remove such items from the generated feed completely by passing 1 in this parameter.
|html||0 (default), 1||
Treat input source as HTML (or parse-as-html-first mode). To enable, pass html=1 in the querystring. If enabled, Full-Text RSS will not attempt to parse the response as a feed. This increases performance slightly and should be used if you know that the URL is not a feed.
Note: If excluded, or set to 0, Full-Text RSS first tries to parse the server’s response as a feed, and only if it fails to parse as a feed will it revert to HTML parsing. In the default parse-as-feed-first mode, Full-Text RSS will identify itself as PHP first and only if a valid feed is returned will it identify itself as a browser in subsequent requests to fetch the feed items. In parse-as-html-first mode, Full-Text RSS will identify itself as a browser from the very first request.
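The fallback order described above can be sketched roughly as follows. This is only an illustration of the decision logic in Python, not how Full-Text RSS implements it; `looks_like_feed` and `choose_parse_mode` are hypothetical names, and checking the root element name is an assumed stand-in for real feed parsing.

```python
import xml.etree.ElementTree as ET

# Root element names for RSS 2.0, Atom and RSS 1.0 (RDF) documents.
FEED_ROOTS = {"rss", "feed", "RDF"}


def looks_like_feed(body):
    """Return True if the body parses as XML with a feed-like root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Drop a leading '{namespace}' from the tag, if there is one.
    local_name = root.tag.rsplit("}", 1)[-1]
    return local_name in FEED_ROOTS


def choose_parse_mode(body, html=0):
    """Mirror the html parameter: 1 skips the feed attempt entirely."""
    if html == 1:
        return "html"
    return "feed" if looks_like_feed(body) else "html"


print(choose_parse_mode("<rss version='2.0'><channel/></rss>"))
print(choose_parse_mode("<html><body><p>Hi</body></html>"))
print(choose_parse_mode("<rss version='2.0'><channel/></rss>", html=1))
```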
|xss||0 (default), 1||
Use this to enable XSS filtering. We have not enabled this by default because we assume the majority of our users do not display the HTML retrieved by Full-Text RSS in a web page without further processing. If you subscribe to our generated feeds in your news reader application, it should, if it’s good software, already filter the resulting HTML for XSS attacks, making it redundant for Full-Text RSS to do the same. The same applies to frameworks and CMSs which display feed content: the content should be treated like any other user-submitted content.
If enabled, we’ll pass retrieved HTML content through htmLawed (safe flag on and style attributes denied). Note: if enabled this will also remove certain elements you may want to preserve, such as iframes.
|lang||0, 1 (default), 2, 3||
Language detection. If you’d like Full-Text RSS to find the language of the articles it processes, you can use one of the following values:
If language detection is enabled and a match is found, the language code will be returned in the <dc:language> element inside the <item> element.
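As a rough sketch of reading the detected language from a generated feed, the Python example below looks up the `<dc:language>` element in each `<item>`. The sample feed is made up, `item_languages` is a hypothetical helper, and the namespace URI is the standard Dublin Core one, which is assumed here to be what the `dc` prefix is bound to.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core namespace; assumed to be what the dc prefix maps to.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Made-up sample feed: one item with a detected language, one without.
sample_feed = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example feed</title>
    <item><title>First</title><dc:language>en</dc:language></item>
    <item><title>Second</title></item>
  </channel>
</rss>"""


def item_languages(feed_xml):
    """Return (title, language) pairs; language is None if no match was found."""
    root = ET.fromstring(feed_xml)
    result = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        language = item.findtext("dc:language", default=None, namespaces=DC_NS)
        result.append((title, language))
    return result


print(item_languages(sample_feed))
```

Items without a match simply lack the element, so the sketch returns `None` for them rather than raising an error.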
|debug||[no value], rawhtml, parsedhtml||
If this parameter is present, Full-Text RSS will output the steps it is taking behind the scenes to help you debug problems.
If the parameter value is rawhtml, Full-Text RSS will output the HTTP response (headers and body) of the first response after redirects.
If the parameter value is parsedhtml, Full-Text RSS will output the reconstructed HTML (after its own parsing). This version is what the extraction rules are applied to, and it may differ from the original (rawhtml) output. If your extraction rules are not picking out any elements, this will likely help identify the problem.
Note: Full-Text RSS will stop execution after the HTML output if either of the last two parameter values (rawhtml or parsedhtml) is passed. Otherwise it will continue showing debug output until the end.
The default parser is libxml as it’s the fastest. HTML5-PHP is an HTML5 parser implemented in PHP. It’s slower than libxml, but can often produce better results. You can request HTML5-PHP be used as the parser in a site-specific config file (to ensure it gets used for all URLs for that site), or explicitly via this request parameter.
|proxy||0, 1, string (proxy name)||
This parameter has no effect if proxy servers have not been entered in the config file. If they have been entered and enabled, you can pass the following values: 0 to disable proxy use (uses a direct connection), 1 for default proxy behaviour (whatever is set in the config), or a string to identify a specific proxy server (it has to match the name given to the proxy in the config file).
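Since several of the parameters above accept only a fixed set of values, an application could check them before building a request. The Python sketch below does this with the values listed in this guide; `ALLOWED_VALUES` and `invalid_params` are hypothetical names, and parameters without a fixed value list (such as proxy, which also accepts a proxy name) are deliberately left unchecked.

```python
# Allowed values as listed in this guide; "" stands for debug with no value.
ALLOWED_VALUES = {
    "format": {"rss", "json"},
    "summary": {"0", "1"},
    "content": {"0", "1"},
    "links": {"preserve", "footnotes", "remove"},
    "exc": {"0", "1"},
    "html": {"0", "1"},
    "xss": {"0", "1"},
    "lang": {"0", "1", "2", "3"},
    "debug": {"", "rawhtml", "parsedhtml"},
}


def invalid_params(params):
    """Return the names of parameters whose values are not listed above."""
    bad = []
    for name, value in params.items():
        allowed = ALLOWED_VALUES.get(name)
        # Parameters that are not in the table are not checked at all.
        if allowed is not None and str(value) not in allowed:
            bad.append(name)
    return bad


print(invalid_params({"format": "json", "links": "strip", "lang": 3, "proxy": "myproxy"}))
```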
Feed-only parameters — These parameters only apply to web feeds. They have no effect when the input URL points to a web page.
By default, if the input URL points to a feed, item titles in the generated feed will not be changed, because we assume item titles in feeds are not truncated. If you’d like them to be replaced with titles Full-Text RSS extracts, use this parameter in the request (the value does not matter). To enable or disable this for all feeds, see the config file, specifically $options->favour_feed_titles.
The maximum number of feed items to process. See section on max items in form options above. (The default and upper limit will be found in the configuration file.)