
acalejos/Reed


Reed


Streaming RSS parser with a built-in Req plugin for network-enabled chunked streaming.

Installation

def deps do
  [
    {:reed, "~> 0.2.0"}
  ]
end

Reed implements a SAX-based parser for RSS feeds using the Saxy library.

You can use Reed.Handler (which implements the Saxy.Handler behaviour) directly with Saxy to parse strings or streams, but Reed's killer feature is the Reed.ReqPlugin module, which powers the top-level Reed.get / Reed.get! API.

Reed grew out of the need to read an RSS feed from a remote URL by first reading the feed-level metadata and then streaming the items one by one, without loading the entire feed into memory.

Reed combines the Saxy.Partial module with Req's streaming :into option to do just that.

Reed.ReqPlugin takes advantage of Req's chunked streaming to parse RSS feeds directly over the network, applying transformation functions to each RSS item lazily.

This means you do not have to store the entire RSS feed in memory or on disk in order to convert it to a traditional Elixir Stream (as Saxy.parse_stream/4 requires). Instead, Reed uses Saxy.Partial to parse the feed chunk by chunk as it arrives over the wire.
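Why chunk-by-chunk parsing needs state that survives between chunks can be seen with a deliberately simplified sketch in plain Elixir. It is not how Reed works (Reed hands each chunk to Saxy.Partial, a real SAX parser): the hypothetical `ChunkedItems.feed/2` below only buffers raw bytes and emits the body of each `<item>...</item>` element as soon as its closing tag has arrived, carrying any incomplete item over to the next chunk. It assumes every `</item>` is preceded by a plain `<item>` tag and does no real XML handling (entities, CDATA, comments, attributes).

```elixir
defmodule ChunkedItems do
  # Hypothetical toy for illustration only; Reed itself uses Saxy.Partial.
  # Splits raw feed bytes into complete <item> bodies while the bytes
  # arrive in arbitrary chunks, never holding more than one partial item.

  @open "<item>"
  @close "</item>"

  # Appends `chunk` to the leftover `buffer`; returns {item_bodies, new_buffer}.
  def feed(buffer, chunk), do: extract(buffer <> chunk, [])

  defp extract(buffer, acc) do
    case :binary.match(buffer, @close) do
      {pos, len} ->
        # Everything up to and including </item> is a finished element.
        split_at = pos + len
        <<complete::binary-size(split_at), rest::binary>> = buffer
        extract(rest, [body(complete) | acc])

      :nomatch ->
        # The closing tag has not arrived yet: keep the bytes for the next chunk.
        {Enum.reverse(acc), buffer}
    end
  end

  # Drops anything before <item> (e.g. channel metadata) plus both tags.
  defp body(complete) do
    [_before, inner] = :binary.split(complete, @open)
    binary_part(inner, 0, byte_size(inner) - byte_size(@close))
  end
end

# The <item> tags are deliberately split across chunk boundaries.
chunks = [
  "<rss><channel><title>Demo</title><it",
  "em><title>A</title></item><item><ti",
  "tle>B</title></item></channel></rss>"
]

{items, leftover} =
  Enum.reduce(chunks, {[], ""}, fn chunk, {done, buffer} ->
    {new_items, buffer} = ChunkedItems.feed(buffer, chunk)
    {done ++ new_items, buffer}
  end)

items
# → ["<title>A</title>", "<title>B</title>"]
```

The buffer only ever holds the bytes after the last complete item, which is the property that lets a streaming parser avoid materializing the whole feed.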

The Reed.Transformers module provides some convenient transformation functions to be used during the parsing.

The transformation pipeline is invoked each time a new RSS item is read, and it operates on an accumulated state that persists for the entire read of the feed.

Reed provides a dead-simple API that also allows flexible handling of items during the stream through transformation pipelines (see Reed.Transformers). These pipelines define how to handle the item stream and act as a reduction over an input state containing the feed-level metadata, the current item, and a private field where you can store other data.

The state also maintains a :halted field that controls whether to halt the stream after the current item has been processed (that is, once the whole :transform pipeline has been applied to it).
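As a rough mental model of this reduction, the hypothetical `PipelineSketch.run/3` below keeps a single state map, runs a pipeline function once per item, and stops reading only after a full pipeline run has set `halted: true`. The key names (`:feed`, `:item`, `:private`, `:halted`) mirror the description above but are assumptions; this is not Reed's actual code or state struct.

```elixir
defmodule PipelineSketch do
  # Hypothetical model of the per-item reduction; the state keys are assumptions.
  def run(feed_metadata, items, pipeline) do
    initial = %{feed: feed_metadata, item: nil, private: %{}, halted: false}

    Enum.reduce_while(items, initial, fn item, state ->
      # Expose the current item, then apply the whole pipeline to it.
      state = pipeline.(%{state | item: item})

      # :halted is checked only after the full pipeline has run for this item.
      if state.halted, do: {:halt, state}, else: {:cont, state}
    end)
  end
end

# Example pipeline: accumulate titles in :private and halt after two items.
take_two_titles = fn state ->
  titles = [state.item["title"] | Map.get(state.private, :titles, [])]
  %{state | private: Map.put(state.private, :titles, titles), halted: length(titles) >= 2}
end

items = [%{"title" => "A"}, %{"title" => "B"}, %{"title" => "C"}]
final = PipelineSketch.run(%{"title" => "Demo feed"}, items, take_two_titles)

Enum.reverse(final.private.titles)
# → ["A", "B"]
```

Because the state is threaded through every item, data stored in the private field on one item is still there when the next item arrives, and the third item is never read once the stream halts.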

You can also move on to the next item before the pipeline has finished by returning either :halt or {:halt, state} from any step in the transformation pipeline (see Reed.Transformers.filter/2 for an example).
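The difference between stopping the pipeline for one item and halting the whole stream can be sketched with a hypothetical step runner. In this sketch each step receives the state and returns a new state, `:halt` (skip the remaining steps and keep the state from before the step), or `{:halt, state}` (skip the remaining steps but keep the returned state). Those exact semantics, and the helpers `filter_step/1` and `collect_step/0`, are illustrative assumptions, not Reed.Transformers functions.

```elixir
defmodule StepSketch do
  # Hypothetical: runs steps in order, stopping early for the current item only.
  def run_steps(state, steps) do
    Enum.reduce_while(steps, state, fn step, acc ->
      case step.(acc) do
        # Skip the rest of this item's steps, keeping the previous state.
        :halt -> {:halt, acc}
        # Skip the rest of this item's steps, keeping the returned state.
        {:halt, new_state} -> {:halt, new_state}
        # Otherwise continue with the next step.
        new_state -> {:cont, new_state}
      end
    end)
  end

  # Hypothetical filter: items failing `pred` skip the remaining steps.
  def filter_step(pred) do
    fn state -> if pred.(state.item), do: state, else: :halt end
  end

  # Hypothetical collector: prepends the current item to a list in :private.
  def collect_step do
    fn state ->
      items = Map.get(state.private, :items, [])
      %{state | private: Map.put(state.private, :items, [state.item | items])}
    end
  end
end

steps = [
  StepSketch.filter_step(&String.starts_with?(&1["title"], "#10")),
  StepSketch.collect_step()
]

feed_items = [%{"title" => "#10 Intro"}, %{"title" => "#2 Other"}, %{"title" => "#101 More"}]

final =
  Enum.reduce(feed_items, %{item: nil, private: %{}}, fn item, state ->
    StepSketch.run_steps(%{state | item: item}, steps)
  end)

Enum.reverse(final.private.items)
# → [%{"title" => "#10 Intro"}, %{"title" => "#101 More"}]
```

Here the filtered-out item never reaches the collector, yet the outer reduction keeps going, which is the "move on to the next item" behaviour rather than a stream halt.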

You can compose the built-in pipeline functions from Reed.Transformers, or write your own steps, to build simple yet powerful parsing instructions that read only the parts of the RSS stream you are interested in.

Examples

Get the feed metadata

import Reed.Transformers
Reed.get!(rss_url, transform: transform(halt()))

Get all items in a list

import Reed.Transformers
Reed.get!(rss_url, transform: pipeline(collect()))

Get the first 5 items in a list

import Reed.Transformers
Reed.get!(rss_url, transform: collect() |> limit(5) |> pipeline())

Get all itunes: namespaced elements from the first 2 items as a list

import Reed.Transformers

Reed.get!(rss_url,
  transform:
    transform(
      &Map.filter(&1, fn
        {<<"itunes:", _rest::binary>>, _v} -> true
        _ -> false
      end)
    )
    |> collect()
    |> limit(2)
    |> pipeline()
)

Get the description, title, and publication date of the first episode whose title starts with #10

import Reed.Transformers

Reed.get!(rss_url,
  transform:
    filter(&match?(%{"title" => <<"#10", _rest::binary>>}, &1))
    |> transform(&Map.take(&1, ["description", "title", "pubDate"]))
    |> limit(1)
    |> collect()
    |> pipeline()
)

About

Streaming RSS reader for Elixir
