
Fannychini/HTA_workshop


R for HTA 2021: Fantastic Data and Where to Scrape Them

Repository for the presentation by Fanny Franchini at the online R for HTA 2021 conference

📖 Abstract

Please note that this presentation is aimed at beginners.

Health Technology Assessment (HTA) and Health Economic (HE) analyses rely partly on data hosted across multiple websites curated by different governmental bodies. There is no unified repository containing all the information needed for data mining and subsequent analyses.

Web scraping is a technique for automatically extracting information from websites. Scrapers work by parsing a page's source code and retrieving programmatically specified elements. This workshop introduces participants to scraping HTML-based websites in R.
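The parse-then-select idea can be illustrated with a small sketch using the rvest package (see "Scraping with R" below). It assumes rvest 1.0 or later (for html_element() and html_text2()) and parses a made-up HTML snippet instead of a live page, so the selectors, drug names and prices are hypothetical. They are not taken from the PBS website or from the workshop scripts.

```r
# Minimal sketch of parsing page source and extracting elements with CSS selectors.
# Assumes rvest >= 1.0. The HTML below is a hypothetical stand-in for a real page;
# with a live site you would pass its URL to read_html() instead.
library(rvest)

page_source <- '
<html>
  <body>
    <h1 class="title">Example listing</h1>
    <table id="items">
      <tr><th>Item</th><th>Price</th></tr>
      <tr><td>Drug A</td><td>12.50</td></tr>
      <tr><td>Drug B</td><td>7.80</td></tr>
    </table>
  </body>
</html>'

# Parse the source code into a navigable document tree
doc <- read_html(page_source)

# Select elements with CSS selectors and extract their content
title <- html_text2(html_element(doc, "h1.title"))     # plain text of the heading
items <- html_table(html_element(doc, "table#items"))  # table as a tibble (data frame)

print(title)
print(items)
```

Here html_table() uses the row of <th> cells as column names and converts the Price column to numbers, giving a structured data frame ready for analysis.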

In the case study presented, we scrape the Pharmaceutical Benefits Scheme (PBS) website to produce a structured dataframe containing all drugs listed by the Australian Therapeutic Goods Administration (TGA), with their restrictions of use, doses, current unit cost and historical cost.


🔍 What is in here?

The repository contains two R scripts.

  1. The first, pbs_scraper.R, is the script used for the case study, i.e. scraping the PBS.
  2. The second, function_scraper.R, contains the functions used in 1.

🎬 Slides used in the workshop

Please head over here to access the slide deck: https://fannychini.github.io/


🔧 Want to learn more?

HTML basics: introduction to web structure @ Mozilla
HTML elements: complement to the above @ W3Schools
Scraping with R: rvest package homepage
Scraping etiquette: polite package homepage
CSS selectors: CSS selectors @ Interneting is hard
Scraping JavaScript: RSelenium, for when rvest is not enough


Have fun! 💪

Please get in touch with any questions or suggestions: [email protected]
