A Retracted Publication Citation Network

Neil Saunders 2020-12-18 20:39:17

1 Introduction

This report has 4 aims:

  • obtain the identifiers for all retracted publications in PubMed
  • obtain the identifiers for all articles in PubMed that cite those retracted publications
  • generate citation networks based on these datasets
  • explore the networks with some basic analysis

2 The dataset

We search PubMed using the rentrez package. Since there are currently around 8700 retracted articles, we can simply set retmax to a suitably high number. Alternatively, run an initial search, then use the value of es$count as retmax in a second search. Either way, the result is a data frame with the PMIDs (PubMed article identifiers) in one column.

```r
library(rentrez)

es <- entrez_search("pubmed", "Retracted Publication[PTYP]", retmax = 10000)
articles <- data.frame(pmid = es$ids)
```
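The alternative mentioned above, running an initial search and then reusing its count, can be sketched as follows. This is a minimal sketch, not the code used for this report: search_all_ids is a hypothetical helper, and search_fun is a parameter only so that the two-pass logic can be exercised without network access (by default it calls rentrez::entrez_search). It assumes that retmax = 0 returns the hit count without any IDs, and it does not deal with the upper limit NCBI places on how many IDs a single ESearch call returns.

```r
# Hypothetical helper: two-pass PubMed search.
# Pass 1 asks only for the hit count (retmax = 0 is assumed to return no IDs);
# pass 2 requests exactly that many IDs.
# search_fun defaults to rentrez::entrez_search; it is injectable for offline testing.
search_all_ids <- function(term, db = "pubmed",
                           search_fun = rentrez::entrez_search) {
  first_pass <- search_fun(db = db, term = term, retmax = 0)
  # NCBI caps the number of IDs one ESearch call returns; very large counts
  # would need paging or the history server, which is not handled here.
  second_pass <- search_fun(db = db, term = term, retmax = first_pass$count)
  second_pass$ids
}
```

With network access, articles <- data.frame(pmid = search_all_ids("Retracted Publication[PTYP]")) would take the place of the fixed retmax = 10000 search.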

We use entrez_link to find the articles in PubMed that cite a given PMID. Since one article can have many citing articles, the citing PMIDs are stored as a vector in each row of a list column in the data frame.

Applying get_cites to every retracted PMID took around 2.5 hours, but completed successfully.

The final step is to unnest the cites column, generating one row per pair of retracted-article PMID and citing-article PMID. For articles without citations, get_cites returns NULL and unnest drops those rows, so only PMIDs with one or more citations are retained. This is what we want.

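```r
library(dplyr)
library(tidyr)

# Return PMIDs of PubMed articles citing `id`, or NULL if there are none
get_cites <- function(id) {
  el <- entrez_link(dbfrom = "pubmed", id = id, db = "pubmed")
  el$links$pubmed_pubmed_citedin
}

# One request per PMID; lapply always returns a list, which becomes a list column
articles$cites <- lapply(articles$pmid, get_cites)

# One row per (retracted PMID, citing PMID) pair; NULL entries are dropped
dataset <- articles %>% 
  unnest(cites)
```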
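To make concrete what the unnest step does with the NULL results, here is a base-R sketch of the same reshaping on toy data. It is an illustration rather than the code used in this report, and unnest_cites is a hypothetical name. Each PMID is repeated once per citing PMID, so a NULL (zero-length) entry contributes no rows.

```r
# Hypothetical base-R equivalent of tidyr::unnest() for a list column of citing PMIDs
unnest_cites <- function(pmid, cites) {
  n <- lengths(cites)  # NULL entries have length 0 and so produce no rows
  data.frame(pmid = rep(pmid, n),
             cites = as.character(unlist(cites, use.names = FALSE)),
             stringsAsFactors = FALSE)
}

# Toy data: the second article has no citations (NULL), as get_cites would return
toy_pmid <- c("111", "222", "333")
toy_cites <- list(c("901", "902"), NULL, "903")
unnest_cites(toy_pmid, toy_cites)
```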

Each pair of article PMID and citing article PMID looks like this.

```r
library(knitr)
library(kableExtra)

dataset %>% 
  head(10) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "condensed"))
```

| pmid     | cites    |
|----------|----------|
| 32873781 | 33262330 |
| 32696949 | 33006362 |
| 32683951 | 32873282 |
| 32668870 | 33281107 |
| 32649709 | 33048995 |
| 32646999 | 32760174 |
| 32646999 | 32666253 |
| 32623526 | 33141364 |
| 32598092 | 32875064 |
| 32581016 | 33281107 |

3 Analysis

Counting rows per pmid gives the number of citing articles for each retracted article, from which we take the top 10 most-cited retracted articles.

Then we can retrieve the PubMed XML records for those articles using entrez_fetch and parse them for the article titles.

```r
library(xml2)

top10 <- dataset %>% 
  count(pmid, sort = TRUE) %>% 
  head(10)

x <- entrez_fetch("pubmed", top10$pmid, rettype = "xml")
titles <- read_xml(x) %>% 
  xml_find_all("//ArticleTitle") %>% 
  xml_text()

top10 %>% 
  bind_cols(title = titles) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "condensed"))
```

| pmid     | n (citing articles) | title |
|----------|--------------------:|-------|
| 23432189 | 1081 | Primary prevention of cardiovascular disease with a Mediterranean diet. |
| 12609035 | 681 | An enhanced transient expression system in plants based on suppression of gene silencing by the p19 protein of tomato bushy stunt virus. |
| 16642001 | 598 | Lysyl oxidase is essential for hypoxia-induced metastasis. |
| 22088800 | 563 | Cardiac stem cells in patients with ischaemic cardiomyopathy (SCIPIO): initial results of a randomised phase 1 trial. |
| 24711954 | 475 | A comprehensive review on metabolic syndrome. |
| 15604363 | 419 | Visfatin: a protein secreted by visceral fat that mimics the effects of insulin. |
| 19524507 | 415 | A pleiotropically acting microRNA, miR-31, inhibits breast cancer metastasis. |
| 21753854 | 366 | Selective killing of cancer cells by a small molecule targeting the stress response to ROS. |
| 15222900 | 351 | TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics. |
| 9500320 | 339 | Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children. |
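The code above attaches titles to PMIDs by position, which assumes that entrez_fetch returns the records in the same order as the requested IDs, with exactly one ArticleTitle per record. A more defensive sketch, assuming the standard PubmedArticle layout of PubMed XML, reads the PMID and the title from each record together so they cannot drift apart. It is shown on a small hand-written XML string rather than a live fetch, and pmid_titles is a hypothetical helper name.

```r
library(xml2)

# Hypothetical helper: one row per PubmedArticle, PMID and title read from the same record
pmid_titles <- function(xml_string) {
  records <- xml_find_all(read_xml(xml_string), "//PubmedArticle")
  data.frame(
    # ./MedlineCitation/PMID skips PMIDs nested deeper, e.g. in CommentsCorrections
    pmid = xml_text(xml_find_first(records, "./MedlineCitation/PMID")),
    title = xml_text(xml_find_first(records, ".//ArticleTitle")),
    stringsAsFactors = FALSE
  )
}

# Toy PubMed-style XML; records deliberately not in the requested order
toy_xml <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>222</PMID>
    <Article><ArticleTitle>Second title</ArticleTitle></Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>111</PMID>
    <Article><ArticleTitle>First title</ArticleTitle></Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>'

pmid_titles(toy_xml)
```

Joining this result onto top10 by pmid (for example with dplyr's left_join) would then take the place of bind_cols.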

4 Convert to a graph

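Now we bring out the igraph package. graph.data.frame converts the dataset to a directed graph, with an edge from each retracted article (pmid) to each article that cites it (cites). Then we add two vertex attributes: a label, and a retracted flag that is 1 for vertices appearing in the pmid column.

We’ll write out the graph as GraphML to use later in Gephi.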

```r
library(igraph)

dataset_graph <- graph.data.frame(dataset)

V(dataset_graph)$label <- V(dataset_graph)$name
# 1 if the vertex appears as a cited (retracted) article in dataset, otherwise 0
V(dataset_graph)$retracted <- ifelse(V(dataset_graph)$name %in% dataset$pmid, 1, 0)

write.graph(dataset_graph, file = "../../data/retracted_pmids_citations.graphml", format = "graphml")
```
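One caveat about the retracted attribute: dataset$pmid contains only retracted articles with at least one citation. A citing article that was itself retracted, but is never cited, appears only in the cites column and so is flagged 0. Flagging against the full list of retracted PMIDs from the search (articles$pmid) would avoid this. The base-R sketch below shows the difference on toy vectors; all values are invented for illustration.

```r
# Toy edge list: pmid = retracted (cited) article, cites = citing article
edges <- data.frame(pmid  = c("1", "1", "2"),
                    cites = c("2", "3", "4"),
                    stringsAsFactors = FALSE)
# Hypothetical full search result: article 4 is retracted but never cited
all_retracted <- c("1", "2", "4")

vertices <- unique(c(edges$pmid, edges$cites))

# Flag as in the code above: only vertices that appear in the pmid column
flag_from_edges <- as.integer(vertices %in% edges$pmid)
# Flag against every retracted PMID returned by the search
flag_from_search <- as.integer(vertices %in% all_retracted)

data.frame(vertex = vertices, flag_from_edges, flag_from_search)
```

In this report's terms, that would mean using articles$pmid instead of dataset$pmid in the ifelse call; the retracted-only results in section 5.2 were computed with the flag as defined above.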

5 Graph analysis

5.1 Components and Groups

components finds the connected components of the graph, and groups lists the vertices in each component.

Applying length to each group with sapply gives the component sizes, from which we take the 10 largest components, i.e. the largest sets of articles linked to one another by citation.

```r
dataset_components <- components(dataset_graph)
dataset_groups <- groups(dataset_components)

top10 <- sapply(dataset_groups, length) %>% 
  sort(decreasing = TRUE) %>% 
  head(10)

top10
```

```
##    26     4  2459  1288  1692  1902  2937  1204   882  1305 
## 55444   585   567   224   195   195   191   150   144   132
```

So the largest connected component alone contains 55444 of the 84992 vertices in the full graph.

We can create a subgraph of just those articles from the largest connected group, and write it out for later use.

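```r
# Pick the largest component by size rather than hard-coding its index (26 above)
largest <- which.max(dataset_components$csize)
dataset_subgraph <- subgraph(dataset_graph, which(V(dataset_graph)$name %in% dataset_groups[[largest]]))

write.graph(dataset_subgraph, "../../data/retracted_pmids_subgraph.graphml", format = "graphml")
```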
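For a directed graph such as this one, components defaults to weakly connected components: edge direction is ignored, so two articles share a component if a chain of citations links them in either direction. The base-R sketch below computes the same kind of grouping with a breadth-first search over an edge list. It illustrates the idea only (weak_components is a hypothetical name); igraph's implementation is what this report uses on the 84992-vertex graph.

```r
# Hypothetical illustration: weakly connected components of a directed edge list.
# Edge direction is ignored, as igraph's components() does by default.
weak_components <- function(from, to) {
  nodes <- unique(c(from, to))
  # Undirected adjacency list: every edge is recorded in both directions
  adj <- split(c(to, from), factor(c(from, to), levels = nodes))
  membership <- setNames(rep(NA_integer_, length(nodes)), nodes)
  comp <- 0L
  for (start in nodes) {
    if (!is.na(membership[[start]])) next  # already assigned to a component
    comp <- comp + 1L
    membership[[start]] <- comp
    queue <- start
    # Breadth-first search: label everything reachable from `start`
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in adj[[v]]) {
        if (is.na(membership[[w]])) {
          membership[[w]] <- comp
          queue <- c(queue, w)
        }
      }
    }
  }
  membership
}

# Toy citation edges: a-b-c form one component, d-e another
m <- weak_components(from = c("a", "b", "d"), to = c("b", "c", "e"))
sort(table(m), decreasing = TRUE)  # component sizes, largest first
```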

5.2 A subgraph of only retracted articles

We can create another subgraph containing only retracted articles, i.e. one in which both the cited and the citing articles were retracted.

```r
dataset_onlyretracted_subgraph <- subgraph(dataset_graph, V(dataset_graph)[retracted == 1])

write.graph(dataset_onlyretracted_subgraph, "../../data/onlyretracted_pmids_subgraph.graphml", format = "graphml")
```
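Taking the subgraph on a set of vertices (an induced subgraph) keeps exactly those edges whose two endpoints are both in the set. On an edge list, that amounts to the filter sketched below. induced_edges is a hypothetical name, the flags are invented, and unlike subgraph this filter returns only edges, so vertices left with no edges inside the set (retracted articles with no retracted neighbours) do not appear in its output.

```r
# Hypothetical sketch: edges of the subgraph induced by the vertex set `keep`
induced_edges <- function(edges, keep) {
  both_kept <- edges$pmid %in% keep & edges$cites %in% keep
  edges[both_kept, , drop = FALSE]
}

edges <- data.frame(pmid  = c("1", "1", "2", "5"),
                    cites = c("2", "3", "4", "6"),
                    stringsAsFactors = FALSE)
retracted <- c("1", "2", "4", "5")  # invented flags for illustration
induced_edges(edges, retracted)     # keeps 1 -> 2 and 2 -> 4 only
```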

As before, we can find the connected components in this graph.

```r
dataset_onlyretracted_components <- components(dataset_onlyretracted_subgraph)
dataset_onlyretracted_groups <- groups(dataset_onlyretracted_components)

top10 <- sapply(dataset_onlyretracted_groups, length) %>% 
  sort(decreasing = TRUE) %>% 
  head(10)

top10
```

```
##  350  155 1957 3145  921 1314 2083 2410  509 2460 
##   55   36   31   29   26   19   19   17   14   14
```

As before, we retrieve the XML and article titles for groups of interest, starting with the largest group. We show only the first 20 of its 55 articles.

```r
x <- entrez_fetch("pubmed", dataset_onlyretracted_groups[[names(top10)[1]]], rettype = "xml")
titles <- read_xml(x) %>% 
  xml_find_all("//ArticleTitle") %>% 
  xml_text()

data.frame(pmid = dataset_onlyretracted_groups[[names(top10)[1]]],
           title = titles) %>% 
  head(20) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "condensed"))
```

| pmid     | title |
|----------|-------|
| 30233176 | Synthesis and characterization of a novel peptide-grafted Cs and evaluation of its nanoparticles for the oral delivery of insulin, in vitro, and in vivo study. |
| 26586942 | PLGA-encapsulated tea polyphenols enhance the chemotherapeutic efficacy of cisplatin against human cancer cells and mice bearing Ehrlich ascites carcinoma. |
| 26164001 | SUMO-specific protease 6 promotes gastric cancer cell growth via deSUMOylation of FoxM1. |
| 26032092 | Curcumin inhibits growth of prostate carcinoma via miR-208-mediated CDKN1A activation. |
| 25792385 | Curcumin enhances the radiosensitivity of U87 cells by inducing DUSP-2 up-regulation. |
| 23399702 | RETRACTED: Tea polyphenols enhance cisplatin chemosensitivity in cervical cancer cells via induction of apoptosis. |
| 23349727 | The different role of Notch1 and Notch2 in astrocytic gliomas. |
| 22806240 | Activated K-Ras and INK4a/Arf deficiency promote aggressiveness of pancreatic cancer by induction of EMT consistent with cancer stem cell phenotype. |
| 22363731 | 3,3’-Diindolylmethane exhibits antileukemic activity in vitro and in vivo through a Akt-dependent process. |
| 22261338 | RETRACTED: Increased Ras GTPase activity is regulated by miRNAs that can be attenuated by CDF treatment in pancreatic cancer cells. |
| 22213426 | Inactivation of Ink4a/Arf leads to deregulated expression of miRNAs in K-Ras transgenic mouse model of pancreatic cancer. |
| 21673986 | Activated K-ras and INK4a/Arf deficiency cooperate during the development of pancreatic cancer by activation of Notch and NF-κB signaling pathways. |
| 21503965 | Over-expression of FoxM1 leads to epithelial-mesenchymal transition and cancer stem cell phenotype in pancreatic cancer cells. |
| 21463919 | Notch-1 induces epithelial-mesenchymal transition consistent with cancer stem cell phenotype in pancreatic cancer cells. |
| 21408027 | Anti-tumor activity of a novel compound-CDF is mediated by regulating miR-21, miR-200, and PTEN in pancreatic cancer. |
| 20824697 | Restoring sensitivity to oxaliplatin by a novel approach in gemcitabine-resistant pancreatic cancer cells in vitro and in vivo. |
| 20658545 | Down-regulation of Notch-1 is associated with Akt and FoxM1 in inducing cell growth inhibition and apoptosis in prostate cancer cells. |
| 20599780 | Cyclodextrin-complexed curcumin exhibits anti-inflammatory and antiproliferative activities superior to those of curcumin through higher cellular uptake. |
| 20388782 | Gemcitabine sensitivity can be induced in pancreatic cancer cells through modulation of miR-200 and miR-21 expression by curcumin or its analogue CDF. |
| 20379844 | Platelet-derived growth factor-D contributes to aggressiveness of breast cancer cells by up-regulating Notch and NF-κB signaling pathways. |

Clearly a network of cancer-related articles. How about at the other end of the top 10?

```r
x <- entrez_fetch("pubmed", dataset_onlyretracted_groups[[names(top10)[10]]], rettype = "xml")
titles <- read_xml(x) %>% 
  xml_find_all("//ArticleTitle") %>% 
  xml_text()

data.frame(pmid = dataset_onlyretracted_groups[[names(top10)[10]]],
           title = titles) %>% 
  kable() %>% 
  kable_styling(bootstrap_options = c("striped", "condensed"))
```

| pmid     | title |
|----------|-------|
| 23173109 | Strategy for prevention of hip fractures in patients with Parkinson’s disease. |
| 22372723 | Efficacy of antiresorptive agents for preventing fractures in Japanese patients with an increased fracture risk: review of the literature. |
| 21825080 | Once-weekly risedronate for prevention of hip fracture in women with Parkinson’s disease: a randomised controlled trial. |
| 21050796 | Amelioration of osteoporosis and hypovitaminosis D by sunlight exposure in Parkinson’s disease. |
| 19499964 | Efficacy of menatetrenone (vitamin K2) against non-vertebral and hip fractures in patients with neurological diseases: meta-analysis of three randomized, controlled trials. |
| 18384711 | Efficacy of risedronate against hip fracture in patients with neurological diseases: a meta-analysis of randomized controlled trials. |
| 18306478 | Comparison of effects of alendronate and raloxifene on lumbar bone mineral density, bone turnover, and lipid metabolism in elderly women with osteoporosis. |
| 17372126 | Risedronate and ergocalciferol prevent hip fracture in elderly men with Parkinson disease. |
| 16538619 | Alendronate and vitamin D2 for prevention of hip fracture in Parkinson’s disease: a randomized controlled trial. |
| 16087822 | Risedronate sodium therapy for prevention of hip fracture in men 65 years or older after stroke. |
| 16087821 | The prevention of hip fracture with risedronate and ergocalciferol plus calcium supplementation in elderly women with Alzheimer disease: a randomized controlled trial. |
| 15664003 | RETRACTED: Menatetrenone and vitamin D2 with calcium supplements prevent nonvertebral fracture in elderly women with Alzheimer’s disease. |
| 12913194 | Amelioration of osteoporosis and hypovitaminosis D by sunlight exposure in stroke patients. |
| 12110423 | Amelioration of osteoporosis by menatetrenone in elderly female Parkinson’s disease patients with vitamin D deficiency. |

Something has gone awry in the world of aging bones.

5.3 Gephi Visualisation

In summary: nice pictures, but not many insights.

We load the graphml files into Gephi for manipulation and visualisation. The OpenOrd layout was found to be fastest, and effective in arranging the graphs.

5.3.1 Connected subgraph

First, the largest connected subgraph. Vertices are coloured by modularity class.

Not sure we can conclude much from this, other than that there are several highly connected areas of the graph which presumably correspond to articles on particular topics.

We can zoom into the graph, with some difficulty as it is large. This shows just how connected a retracted article can be. PMID 19524507 is an article titled A pleiotropically acting microRNA, miR-31, inhibits breast cancer metastasis. This article was retracted due to concerns regarding statistical analysis and data presentation.

5.3.2 Retracted-only PMIDs subgraph

We turn now to the subgraph containing only retracted articles and retracted citing articles. This is clearly less connected and easier to read.

Vertices are again coloured by modularity class, and vertex size reflects “authority”, the HITS authority score computed in Gephi, which is high for vertices that are linked to by many strong hubs.
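HITS scores can be computed by power iteration: a vertex's authority is the sum of the hub scores of the vertices pointing to it, a vertex's hub score is the sum of the authority scores of the vertices it points to, and both are renormalised at every step. The base-R sketch below is a textbook version, not Gephi's implementation (hits_scores is a hypothetical name). It also illustrates a point about direction: graph.data.frame(dataset) draws edges from the cited (retracted) article to the citing article, the reverse of the usual citing-to-cited convention, and reversing every edge swaps hub and authority scores. If Gephi treated the graph as directed, authority here would therefore favour articles that cite heavily cited retracted work, rather than the heavily cited articles themselves.

```r
# Hypothetical textbook HITS by power iteration on an adjacency matrix A,
# where A[i, j] = 1 means an edge from vertex i to vertex j.
# Assumes A has at least one edge (otherwise the normalising sums are zero).
hits_scores <- function(A, iterations = 100) {
  hub <- rep(1, nrow(A))
  for (k in seq_len(iterations)) {
    # Authority: sum of hub scores of the vertices pointing in
    auth <- as.vector(t(A) %*% hub)
    auth <- auth / sqrt(sum(auth^2))
    # Hub: sum of authority scores of the vertices pointed to
    hub <- as.vector(A %*% auth)
    hub <- hub / sqrt(sum(hub^2))
  }
  list(hub = hub, authority = auth)
}

# Toy graph in this report's direction: article 1 (retracted) -> its citers 2, 3, 4
A <- matrix(0, nrow = 4, ncol = 4)
A[1, 2:4] <- 1

as_built <- hits_scores(A)     # authority sits on the citing articles 2-4
reversed <- hits_scores(t(A))  # citing -> cited: authority moves to article 1
```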

Zooming in allows inspection of connected articles.

The large vertex PMID 22851539 is the article Tracking chromatid segregation to identify human cardiac stem cells that regenerate extensively the infarcted myocardium. It was retracted for somewhat mysterious reasons related to a figure (2E) in the article.