GitHub - Fannychini/HTA_workshop: Fantastic data and where to scrape them

R for HTA 2021: Fantastic Data and Where to Scrape Them

Repository for the presentation by Fanny Franchini at the online R for HTA 2021 conference

📖 Abstract

Please note that this presentation is aimed at beginners.

Health Technology Assessment (HTA) and health economic (HE) analyses rely in part on data hosted across multiple websites curated by different governmental bodies. As a result, there is no unified repository containing all the information necessary for data mining and subsequent analyses.

Web scraping is a technique for automatically extracting information from websites. Scrapers work by parsing the page source code to retrieve programmatically specified elements. This workshop introduces participants to scraping HTML-based websites in R.
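As a rough illustration of parsing page source and retrieving specified elements, here is a minimal sketch using the rvest package. The HTML snippet, the CSS classes (`drug`, `name`, `price`) and the drug names are made up for this example; they are not the PBS markup, and this is not the code in pbs_scraper.R.

```r
library(rvest)

# Hypothetical page source standing in for a real website
# (illustrative markup only, NOT the structure of the PBS site).
page_source <- paste0(
  "<html><body><table id='drugs'>",
  "<tr class='drug'><td class='name'>Drug A</td><td class='price'>12.50</td></tr>",
  "<tr class='drug'><td class='name'>Drug B</td><td class='price'>30.00</td></tr>",
  "</table></body></html>"
)

# Parse the source into a document that can be queried.
page <- read_html(page_source)

# Select every table row carrying the (assumed) class "drug".
rows <- html_elements(page, "tr.drug")

# For each row, pull out one cell per column and build a data frame.
drugs <- data.frame(
  name  = html_text2(html_element(rows, "td.name")),
  price = as.numeric(html_text2(html_element(rows, "td.price")))
)

print(drugs)
```

For a live site, `read_html()` would typically be given a URL instead of a string, and the CSS selectors would need to be worked out by inspecting that site's page source.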

In the case study presented, we scrape the Pharmaceutical Benefits Scheme (PBS) website to produce a structured data frame containing all drugs listed by the Australian TGA, together with their restrictions of use, doses, current unit cost, and historical cost.

🔍 What is in here?

The repository contains two R scripts.

The first one, pbs_scraper.R, is the script used for this case study, i.e. scraping the PBS.
The second one, function_scraper.R, contains the functions used by pbs_scraper.R.

🎬 Slides used in the workshop

Please head over here to access the slide deck: https://fannychini.github.io/

🔧 Want to learn more?

HTML basics : introduction to web structure @ Mozilla
HTML elements : complement to the above @ W3Schools
Scraping with R : rvest package homepage
Scraping etiquette : polite package homepage
CSS selectors : CSS selectors @ Interneting Is Hard
Scraping JavaScript : RSelenium when rvest is not enough

Have fun! 💪

Please get in touch with any questions or suggestions: [email protected]
