# Form fields

The simplest way to use Feed Creator is to use the form provided. See below for information about the main form fields you can use.

# Web page URL

Feed Creator will fetch this page and look for content using the selectors that follow.

Remember, this should be a regular web page URL, not a feed URL.

A note on JavaScript

At this time Feed Creator does not process JavaScript when loading a page. If the site you're trying to extract content from loads the desired content using JavaScript, Feed Creator will most likely not be able to extract it.

If you're unsure whether the content is accessible without JavaScript, the easiest way to check is to disable JavaScript in your browser and then reload the page.

Find links inside elements whose id or class attribute matches this value.

If you enter the string 'story' here, the result should be something similar to the following CSS selector:

#story a, a#story, .story a, a.story

If you're familiar with XPath, what you enter here will be used in the following XPath expression as the string value:

//a[@href and ancestor-or-self::*[@id="string" or contains(concat(" ",normalize-space(@class)," "), " string ")]]

Example

Given the following HTML, use 'entry' or 'story' to select the desired links.

<span class="entry">
  <a href="[url 1]" class="story">title 1</a>
</span>
<span class="entry">
  <a href="[url 2]" class="story">title 2</a>
</span>
<span class="entry">
  <a href="[url 3]" class="story">title 3</a>
</span>

Feed Creator will select the <a> elements and use the href attributes for item URLs and the text content ("title 1", "title 2", etc.) for item titles.
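The matching rule described above can be sketched in Python. This is only an illustration of the id/class rule using the standard library, not Feed Creator's own code. The names `LinkFinder` and `find_links`, the sample URLs and the extra "About" link are made up for the example, and the sketch assumes well-formed HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not stay on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class LinkFinder(HTMLParser):
    """Collect <a href> links that match, or sit inside an element that matches."""

    def __init__(self, value):
        super().__init__()
        self.value = value
        self.stack = []      # one boolean per open element: does it match?
        self.links = []      # collected (href, text) pairs
        self.current = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        matches = attrs.get("id") == self.value or self.value in classes
        if tag not in VOID_TAGS:
            self.stack.append(matches)
        # any(self.stack) covers the <a> itself and all of its ancestors.
        if tag == "a" and attrs.get("href") and any(self.stack):
            self.current = {"href": attrs["href"], "text": ""}

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            self.links.append((self.current["href"], self.current["text"].strip()))
            self.current = None
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


def find_links(html, value):
    finder = LinkFinder(value)
    finder.feed(html)
    return finder.links


SAMPLE = """
<span class="entry"><a href="/one" class="story">title 1</a></span>
<span class="entry"><a href="/two" class="story">title 2</a></span>
<p><a href="/about">About</a></p>
"""

print(find_links(SAMPLE, "entry"))
print(find_links(SAMPLE, "story"))
```

Both calls return the same two links, while the "About" link is skipped because neither it nor any of its ancestors carries the value.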

# Item selector (CSS)

Look inside elements matching this CSS selector.

Example

Given the following HTML, use 'div.news .item' to select the desired item elements.

<div class="news">
  <div class="item">
    <a href="[url 1]" class="story">title 1</a>
  </div>
  <div class="item">
    <a href="[url 2]" class="story">title 2</a>
  </div>
  <div class="item">
    <a href="[url 3]" class="story">title 3</a>
  </div>
</div>

Feed Creator will select the <div> elements with class attribute 'item'.

By default, it will look for the first <a> element inside each of these, and will use the href attributes for item URLs and the text content ("title 1", "title 2", etc.) for item titles.
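As a rough illustration of this default behaviour, the sketch below selects the item elements and then takes the first link inside each one. It uses Python's standard library and a simple path expression in place of the CSS selector 'div.news .item', so it only handles well-formed markup and exact class values. The function name `extract_items`, the sample URLs and the footer block are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<body>
  <div class="news">
    <div class="item"><a href="/one" class="story">title 1</a></div>
    <div class="item"><a href="/two" class="story">title 2</a></div>
  </div>
  <div class="footer"><a href="/about">About</a></div>
</body>
"""


def extract_items(markup):
    """Return one {'url', 'title'} dict per item element."""
    root = ET.fromstring(markup)
    items = []
    # Approximation of the CSS selector 'div.news .item' (exact class match only).
    for node in root.findall(".//div[@class='news']//*[@class='item']"):
        link = node.find(".//a[@href]")  # first <a> with an href inside the item
        if link is not None:
            title = "".join(link.itertext()).strip()
            items.append({"url": link.get("href"), "title": title})
    return items


for item in extract_items(SAMPLE):
    print(item["url"], item["title"])
```

The link in the footer is ignored because it does not sit inside an element matched by the item selector.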

# Item title selector (CSS)

Extract item title from selected element. This is applied within the context of each item selected by the item selector.

If left empty, the text of the first matching <a> element will be used. This selector is useful if the title is in a different element.

If set to 0, titles will not be included in the output.

To use an element's attribute value rather than text content, use @attr, for example: 'img @alt'.

To use the text in the context element itself (the element selected by the item selector), enter ':scope'. At the moment, this only works if entered by itself, i.e. you cannot write ':scope > h2'.

Example

Given the following HTML, and assuming we've set the item selector to 'div.news .item', to select the desired item title elements we'd pass 'h3' as the item title selector.

<div class="news">
  <div class="item">
    <h3>title 1</h3>
    <a href="[url 1]">Read more...</a>
  </div>
  <div class="item">
    <h3>title 2</h3>
    <a href="[url 2]">Read more...</a>
  </div>
  <div class="item">
    <h3>title 3</h3>
    <a href="[url 3]">Read more...</a>
  </div>
</div>

Without specifying 'h3' as the item title selector, Feed Creator would use the link titles ('Read more...'), which is not what we want. But the item URLs will still be correctly extracted from those <a> elements.

# Item description selector (CSS)

Extract item description from selected element. This is applied within the context of each item selected by the item selector.

If left empty, the generated feed will not include item descriptions.

To use an element's attribute value rather than text content, use @attr, for example: 'img @alt'.

To use the text in the context element itself (the element selected by the item selector), enter ':scope'. At the moment, this only works if entered by itself, i.e. you cannot write ':scope > p'.

Example

Given the following HTML, and assuming we've set the item selector to 'div.news .item', to select the desired item description elements we'd pass 'p' as the item description selector.

<div class="news">
  <div class="item">
    <a href="[url 1]">Title 1</a>
    <p>description 1</p>
  </div>
  <div class="item">
    <a href="[url 2]">Title 2</a>
    <p>description 2</p>
  </div>
  <div class="item">
    <a href="[url 3]">Title 3</a>
    <p>description 3</p>
  </div>
</div>

# Item URL selector (CSS)

Extract item URL from selected element. This is applied within the context of each item selected by the item selector.

If left empty, the URL of the first matching <a> element will be used.

If set to 0, URLs will not be included in the output. If set to 1, all item URLs will point to the input URL.

To use a different element or attribute value, use 'selector @attr', for example: 'img @src'.

Example

Given the following HTML, and assuming we've set the item selector to 'div.news .item', to select the desired item URL we'd pass 'a[2]' or 'a.story' as the item URL selector.

<div class="news">
  <div class="item">
    <a href="/news">News:</a>
    <a class="story" href="[url 1]">title 1</a>
  </div>
  <div class="item">
    <a href="/opinion">Opinion:</a>
    <a class="story" href="[url 2]">title 2</a>
  </div>
  <div class="item">
    <a href="/news">News:</a>
    <a class="story" href="[url 3]">title 3</a>
  </div>
</div>

# Item date selector (CSS)

Extract item date from selected element. This is applied within the context of each item selected by the item selector.

If left empty, the generated feed will not include item dates.

To use an element's attribute value rather than text content, use @attr, for example: 'time @datetime'.

To use an attribute of the context element itself (element selected by item selector), use ':scope' and @attr, for example ':scope @datetime'.

Example

Given the following HTML, and assuming we've set the item selector to 'div.news .item', to select the desired item date elements we'd pass 'time' as the item date selector.

<div class="news">
  <div class="item">
    <time>28 June 2020</time>
    <a href="[url 1]">title 1</a>
  </div>
  <div class="item">
    <time>10 June 2020</time>
    <a href="[url 2]">title 2</a>
  </div>
  <div class="item">
    <time>25 May 2020</time>
    <a href="[url 3]">title 3</a>
  </div>
</div>

With dates, many sites will display relative dates to visitors, for example:

<div class="item">
  <time datetime="2020-06-26">2 days ago</time>
  <a href="[url 1]">title 1</a>
</div>

In such cases, Feed Creator will ignore the date because depending on server time zones, the calculated date could end up changing on subsequent requests.

If an absolute date is available in an attribute value, as in the example above, you can specify it with @attr: 'time @datetime'.
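The difference between selecting an element's text and selecting an attribute with @attr can be sketched as follows. This is a simplified illustration, not Feed Creator's parser: the helpers `split_selector` and `select_value` are hypothetical, and they only understand a bare tag name or ':scope' as the selector part.

```python
import xml.etree.ElementTree as ET
from datetime import date

ITEM = (
    '<div class="item">'
    '<time datetime="2020-06-26">2 days ago</time>'
    '<a href="/one">title 1</a>'
    '</div>'
)


def split_selector(field):
    """Split 'selector @attr' into (selector, attr); attr is None if absent."""
    selector, sep, attr = field.partition("@")
    return selector.strip(), (attr.strip() if sep else None)


def select_value(item, field):
    """Return the attribute value if @attr is given, else the text content."""
    selector, attr = split_selector(field)
    node = item if selector == ":scope" else item.find(f".//{selector}")
    if node is None:
        return None
    return node.get(attr) if attr else "".join(node.itertext()).strip()


item = ET.fromstring(ITEM)
relative = select_value(item, "time")            # text content: relative date
absolute = select_value(item, "time @datetime")  # attribute: absolute date
print(relative)
print(date.fromisoformat(absolute))
```

Only the attribute value can be turned into a stable date, which is why 'time @datetime' is the better choice for markup like this.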

# Item date format

If the date is not recognised correctly (e.g. treated as US format instead of European or vice versa), you can specify the format it appears in here.

This should follow the format pattern accepted by PHP's DateTime::createFromFormat, e.g. 'j-M-Y'.

If left empty, the date selected by the item date selector will be processed using PHP's strtotime function.

Example

The following HTML contains dates which should be processed according to the European date format of day/month/year.

<div class="news">
  <div class="item">
    <time>28/6/2020</time>
    <a href="[url 1]">title 1</a>
  </div>
  <div class="item">
    <time>10/6/2020</time>
    <a href="[url 2]">title 2</a>
  </div>
  <div class="item">
    <time>11/5/2020</time>
    <a href="[url 3]">title 3</a>
  </div>
</div>

However, this format is ambiguous because without context, someone from the US would read 2/5/2020 as 5th of February 2020, not 2nd of May 2020.

Feed Creator uses PHP, which uses the US interpretation when the slash separator is used, and European interpretation when a hyphen is used. So '2/5/2020' will be interpreted as 5th February 2020, but '2-5-2020' will be 2nd May 2020.

To handle the example above, we pass the following format: 'j/n/Y'
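The ambiguity is easy to reproduce outside PHP. The sketch below uses Python's `datetime.strptime`, which has its own format codes: when parsing, '%d/%m/%Y' plays roughly the same role as 'j/n/Y' does in createFromFormat. The function name `parse_day_first` is made up for the example.

```python
from datetime import datetime


def parse_day_first(raw):
    """Parse a day/month/year date such as '28/6/2020'."""
    return datetime.strptime(raw, "%d/%m/%Y").date()


raw = "2/5/2020"
as_us = datetime.strptime(raw, "%m/%d/%Y").date()  # month first
as_european = parse_day_first(raw)                 # day first

print(as_us)        # 2020-02-05
print(as_european)  # 2020-05-02
```

The same input string yields two different dates, so stating the format explicitly removes the guesswork.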

# Item image (CSS)

Extract item image URL from the selected element. This is applied within the context of each item selected by the item selector.

If left empty, images will not be included in the output.

This should point to an img element. Feed Creator will extract the URL from the src attribute.

To use a different element or attribute value, use 'selector @attr', for example: 'img @data-src'.

Example

Given the following HTML, and assuming we've set the item selector to 'div.news .item', use 'img' as the item image selector to select the image elements.

<div class="news">
  <div class="item">
    <h3>title 1</h3>
    <img src="[image url 1]">
    <a href="[url 1]">Read more...</a>
  </div>
  <div class="item">
    <h3>title 2</h3>
    <img src="[image url 2]">
    <a href="[url 2]">Read more...</a>
  </div>
  <div class="item">
    <h3>title 3</h3>
    <img src="[image url 3]">
    <a href="[url 3]">Read more...</a>
  </div>
</div>

# Remove HTML elements (CSS)

Remove elements matching CSS selector. This will be processed before we start looking for items.

This can be used as an alternative way to narrow the selection of elements. Instead of increasing the specificity of the selector, remove the elements that should not be included.

It is also useful when the text content being processed contains text nodes from multiple elements, some of which should not be extracted.

Example

Given the following HTML, and assuming we're using 'p.summary' as the description selector, we can remove the time elements with 'p.summary time'.

<div class="news">
  <div class="item">
    <a href="[url 1]" class="story">title 1</a>
    <p class="summary">
      <time>1 hour ago</time> description 1
    </p>
  </div>
  <div class="item">
    <a href="[url 2]" class="story">title 2</a>
    <p class="summary">
      <time>6 hours ago</time> description 2
    </p>
  </div>
  <div class="item">
    <a href="[url 3]" class="story">title 3</a>
    <p class="summary">
      <time>2 days ago</time> description 3
    </p>
  </div>
</div>
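To see why removing elements changes the extracted text, here is a small sketch using Python's standard library. It removes the time element from one summary paragraph while keeping the text that follows it. The helpers `remove_children` and `text_of` are hypothetical, the sample URL is a placeholder, and the code only handles direct children in well-formed markup.

```python
import xml.etree.ElementTree as ET

ITEM = """
<div class="item">
  <a href="/one" class="story">title 1</a>
  <p class="summary">
    <time>1 hour ago</time> description 1
  </p>
</div>
"""


def remove_children(parent, tag):
    """Remove direct children named `tag`, keeping any text that follows them."""
    previous = None
    for child in list(parent):
        if child.tag == tag:
            # Text after the removed element is stored in its tail; move it
            # to the preceding sibling, or to the parent if there is none.
            if previous is None:
                parent.text = (parent.text or "") + (child.tail or "")
            else:
                previous.tail = (previous.tail or "") + (child.tail or "")
            parent.remove(child)
        else:
            previous = child


def text_of(node):
    """Return the node's text content with whitespace collapsed."""
    return " ".join("".join(node.itertext()).split())


item = ET.fromstring(ITEM)
summary = item.find(".//p[@class='summary']")
before = text_of(summary)
remove_children(summary, "time")
after = text_of(summary)

print(before)  # 1 hour ago description 1
print(after)   # description 1
```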

# Clean query string params

Keep or remove query string parameters from item URLs.

The query string in a URL appears after the question mark symbol, e.g.

http://example.org/article?id=879&session=19382

The URL above has two query string parameters, named 'id' and 'session'. On some sites, query string parameters identify content, and should be preserved. On others, they are used for tracking and can be stripped.

We recommend stripping non-essential query string parameters because they can affect whether feed items are treated as new or not by your feed reader.

Possible values

  • 1 = preserve all (default)
  • 0 = remove all
  • param1,param2 = remove all except param1 and param2

In the example URL above, the 'id' parameter identifies the article and should be preserved, but the 'session' parameter is non-essential.

The site might generate a new session ID for its links the next time Feed Creator fetches the page. The URLs would then look different from before, so a feed reader might treat the same feed items as new.

To prevent that happening, we can tell Feed Creator to only preserve the 'id' parameter by entering 'id' in this field.
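The three possible values can be sketched with Python's `urllib.parse`. This is an illustration of the behaviour described above, not Feed Creator's implementation, and the function name `clean_query` is made up for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_query(url, setting="1"):
    """Apply a 'clean query string params' style setting to one URL."""
    if setting == "1":  # preserve all
        return url
    parts = urlsplit(url)
    if setting == "0":  # remove all
        kept = []
    else:               # comma-separated list of parameter names to keep
        names = {name.strip() for name in setting.split(",")}
        pairs = parse_qsl(parts.query, keep_blank_values=True)
        kept = [(key, value) for key, value in pairs if key in names]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://example.org/article?id=879&session=19382"
print(clean_query(url, "1"))   # http://example.org/article?id=879&session=19382
print(clean_query(url, "0"))   # http://example.org/article
print(clean_query(url, "id"))  # http://example.org/article?id=879
```

With 'id' as the setting, the item URL stays the same across fetches even if the session value changes.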

# Keep filters

Submit text or URL segments that should appear in each item. Feed Creator will remove items that do not match any of these.

For example, if every item needs to have either /news/ or /opinion/ in its URL, or 'corona' or 'covid' in its text, these filters can help.

# Remove filters

Submit text or URL segments that should not appear in the items. Feed Creator will remove items that match these.

For example, if you want to keep all items except those with URLs that contain /opinion/ or /blog/, these filters can help.

Similarly, you can remove items that contain the given text. For example, if you want to keep all items except those that contain the text 'corona' or 'covid', these filters can help.
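The effect of keep and remove filters can be sketched as a simple filtering function. The exact matching rules are not specified here, so this sketch assumes case-insensitive substring matching against each item's URL and title. The function name `apply_filters` and the sample items are made up for the example.

```python
def apply_filters(items, keep=(), remove=()):
    """Keep items matching any `keep` term, then drop items matching any `remove` term."""

    def matches(item, terms):
        # Assumption: terms are matched as case-insensitive substrings.
        haystack = f"{item['url']} {item['title']}".lower()
        return any(term.lower() in haystack for term in terms)

    result = []
    for item in items:
        if keep and not matches(item, keep):
            continue
        if remove and matches(item, remove):
            continue
        result.append(item)
    return result


items = [
    {"url": "/news/covid-update", "title": "Covid update"},
    {"url": "/opinion/budget", "title": "Budget thoughts"},
    {"url": "/blog/redesign", "title": "Site changes"},
]

kept = apply_filters(items, keep=["/news/", "/opinion/"])
trimmed = apply_filters(items, remove=["/opinion/", "/blog/"])
print([item["url"] for item in kept])     # ['/news/covid-update', '/opinion/budget']
print([item["url"] for item in trimmed])  # ['/news/covid-update']
```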

# User-Agent HTTP header

How should Feed Creator present itself when fetching the content? If a site only serves content to certain browsers, you can use this field to identify as one of those browsers.

This is sent in the HTTP request in a 'User-Agent' header.

Explore User Agent strings used by different browsers and software applications.

# Referer HTTP header

The Referer header is used to tell the requested page the URL of the previous page you were on.

It's not common today for sites to give you a different response based on this header, so in most cases you should not have to edit this. But if you know a site does base its response on this value, you can set it here.

Set to 0 to disable sending the HTTP header. Set to 1 to use the source page URL entered at the beginning as the Referer. Or specify a custom Referer header, e.g. 'https://www.google.com'.

# Cookie HTTP header

Use this field if the site needs you to be logged in to view content, or to bypass GDPR and cookie walls. In most cases cookies are used to identify you, but won't affect the content the site serves. But in situations where they are needed to load the desired content, enter them here.

If you want to examine the cookies your browser sends when you visit a certain page, you can open Firefox's Storage Inspector after loading the page. Under Cookies you will see a list of cookies sent.

Cookies should be entered in the following format: name=value for one cookie and name=value; name2=value2; name3=value3 for multiple cookies.
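As an illustration of how the User-Agent, Referer and cookie values could end up in an HTTP request, here is a minimal Python sketch. It follows the 0 / 1 / custom convention described for the Referer field and the 'name=value; name2=value2' cookie format. The helper names, the example URL, the user agent string and the cookie values are all hypothetical, and the request is only built, never sent.

```python
from urllib.request import Request


def format_cookies(pairs):
    """Join (name, value) pairs into the 'name=value; name2=value2' format."""
    return "; ".join(f"{name}={value}" for name, value in pairs)


def build_headers(page_url, referer, user_agent=None, cookies=None):
    """Translate the header fields into a dict of HTTP request headers."""
    headers = {}
    if user_agent:
        headers["User-Agent"] = user_agent
    if referer == "1":
        headers["Referer"] = page_url  # reuse the source page URL
    elif referer != "0":
        headers["Referer"] = referer   # custom value
    if cookies:
        headers["Cookie"] = cookies    # already in 'name=value; ...' form
    return headers


page_url = "https://example.org/news"
headers = build_headers(
    page_url,
    referer="1",
    user_agent="ExampleBrowser/1.0",
    cookies=format_cookies([("session", "abc123"), ("consent", "yes")]),
)
request = Request(page_url, headers=headers)  # built for illustration, not sent

for name, value in headers.items():
    print(f"{name}: {value}")
```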

# Feed title

The feed title to use in the generated feed. If omitted, whatever's in the <title> element of the web page will be used.

Note: this should be the actual title, not a selector.

# Item guid

A guid is an identifier that's usually used by feed readers to determine if a feed item is new or not. It's not required by the RSS spec, but some feed readers might want it included.

By default, the guid is not included when you generate a feed with Feed Creator.

If you'd like it included, you can tell Feed Creator to generate an ID based on each item's url, title or both.

If the guid is omitted, most feed readers will use the item URL to determine if a feed item is new or not.
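How the ID is derived is not described here. One common way to build a stable identifier from an item's URL, title or both is to hash the chosen fields, as in the sketch below. The use of SHA-1, the separator and the function name `make_guid` are assumptions for illustration only.

```python
import hashlib


def make_guid(item, based_on=("url",)):
    """Derive a stable identifier from the chosen item fields."""
    source = "|".join(item[field] for field in based_on)
    return hashlib.sha1(source.encode("utf-8")).hexdigest()


item = {"url": "http://example.org/article?id=879", "title": "title 1"}
retitled = {"url": "http://example.org/article?id=879", "title": "title 1 (updated)"}

# Based on the URL only: a changed title does not produce a new ID.
print(make_guid(item) == make_guid(retitled))

# Based on URL and title: a changed title produces a different ID.
print(make_guid(item, ("url", "title")) == make_guid(retitled, ("url", "title")))
```

The choice of fields decides what counts as a new item: with both fields included, an edited title would make a feed reader treat the item as new.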

# Premium access key

If you have a premium access key, enter it here to remove certain restrictions.

The key itself will not appear in the final feed URL. It will be replaced by a key index and a hash based on the input URL.