Construct the cdx API query using a URI object
This avoids problems related to URL encoding.

Obsoletes: hartator#116
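
To illustrate the encoding problem the commit message refers to, here is a minimal sketch comparing plain string concatenation with a query built through `URI.encode_www_form`. The `target` value is a hypothetical example chosen because it contains `?`, `&` and `=`; it is not taken from the project.

```ruby
require 'uri'

# Hypothetical target URL containing characters that are significant in a query string.
target = "example.com/search?q=a&b=c"

# Concatenation: the "&b=c" part of the target is indistinguishable from a
# separate query parameter of the CDX request.
naive = "https://web.archive.org/cdx/search/xd?url=" + target + "&gzip=false"

# URI object: the query is built from key/value pairs, so each value is
# percent-encoded before the pairs are joined with "&".
request_url = URI("https://web.archive.org/cdx/search/xd")
request_url.query = URI.encode_www_form([["url", target], ["gzip", "false"]])

puts naive
puts request_url
```

Decoding the second query with `URI.decode_www_form` gives back the original pairs, which is not guaranteed for the concatenated string.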
pabs3 committed May 3, 2021
1 parent e6707a9 commit afab72c
Showing 1 changed file with 12 additions and 11 deletions.
23 changes: 12 additions & 11 deletions lib/wayback_machine_downloader/archive_api.rb
@@ -1,28 +1,29 @@
require 'uri'

module ArchiveAPI

   def get_raw_list_from_api url, page_index
-    request_url = "https://web.archive.org/cdx/search/xd?url="
-    request_url += url
-    request_url += parameters_for_api page_index
+    request_url = URI("https://web.archive.org/cdx/search/xd")
+    params = [["url", url]]
+    params += parameters_for_api page_index
+    request_url.query = URI.encode_www_form(params)

     URI.open(request_url).read
   end

   def parameters_for_api page_index
-    parameters = "&fl=timestamp,original&collapse=digest&gzip=false"
-    if @all
-      parameters += ""
-    else
-      parameters += "&filter=statuscode:200"
+    parameters = [["fl", "timestamp,original"], ["collapse", "digest"], ["gzip", "false"]]
+    if !@all
+      parameters.push(["filter", "statuscode:200"])
     end
     if @from_timestamp and @from_timestamp != 0
-      parameters += "&from=" + @from_timestamp.to_s
+      parameters.push(["from", @from_timestamp.to_s])
     end
     if @to_timestamp and @to_timestamp != 0
-      parameters += "&to=" + @to_timestamp.to_s
+      parameters.push(["to", @to_timestamp.to_s])
     end
     if page_index
-      parameters += "&page=#{page_index}"
+      parameters.push(["page", page_index])
     end
     parameters
   end
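
As a rough, self-contained sketch of how the list of pairs returned by `parameters_for_api` ends up in the query string, the snippet below reimplements the same logic as a standalone method. The name `cdx_parameters` and its keyword arguments are hypothetical stand-ins for the instance variables used in the module; they are not part of the project.

```ruby
require 'uri'

# Hypothetical standalone variant of parameters_for_api: keyword arguments
# replace @all, @from_timestamp and @to_timestamp.
def cdx_parameters(all: false, from_timestamp: nil, to_timestamp: nil, page_index: nil)
  parameters = [["fl", "timestamp,original"], ["collapse", "digest"], ["gzip", "false"]]
  parameters.push(["filter", "statuscode:200"]) unless all
  parameters.push(["from", from_timestamp.to_s]) if from_timestamp && from_timestamp != 0
  parameters.push(["to", to_timestamp.to_s]) if to_timestamp && to_timestamp != 0
  # 0 is truthy in Ruby, so the first page is still sent as page=0.
  parameters.push(["page", page_index]) if page_index
  parameters
end

params = [["url", "example.com"]] + cdx_parameters(from_timestamp: 2020, page_index: 0)
query = URI.encode_www_form(params)
puts query
# prints url=example.com&fl=timestamp%2Coriginal&collapse=digest&gzip=false&filter=statuscode%3A200&from=2020&page=0
```

`URI.encode_www_form` converts each value with `to_s`, so the integer page index can be pushed without an explicit conversion, and the `,` and `:` inside values are percent-encoded rather than passed through raw.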