Visualize the Top500 and Green500 Supercomputer Lists
This describes the Visual Data Analysis of the evolution of supercomputing trends from 2009 to present. Since the Green500 and Top500 describe different aspects of essentially the same population of supercomputers, the assumption here is there's inherent value in looking at the two lists combined.
The main effort here is combining the historical lists into a common formatted data set. While both lists have fundamentally measured the same quantities over time, data names, etc have shifted. Since older lists require a significant amount of "hand tuning" of variable names, etc., I've included
The data set is extremely rich and there are literally hundreds of questions that can be asked of it. To keep this archive sane, I've only included a few analyses. However, I hope that by providing a cleaned archive of data others will find this useful for their own exploration.
If you use this data, please include a reference to the archive in any publication or public mention of the data.
###About Exascalar
Exascalar reveals answers to the question "how does the population of "top" super computers evolve?"
Exascalar looks at both the Top500 super computer list (based on performance) and the Green500 super computer list (based on efficiency) in a single visually digestable graph. It overlays a transverse rectilinear coordinate system of power and "Exascalar" onto the Performance and Efficiency axes.
You can more read about the history of Exascalar here, here, and here.
###Data Sources Data are downloaded from the Green500.org and the Top500.org websites.
The data cleaning program assumes the top500 lists are locally stored in a directory called "Exascalar" as .csv files in the sub-directories Top500 and Green500. These directories are cloned in this repository. Green500.org lists are downloadable directly as .csv files from the Green500 website. Top500.org lists are stored on the Top500 site as .xls. Since this anlaysis assumes .csv I have converted them using numbers or Excel.
Currently I have download files back to 2009.
#####Exascalar_Cleaner.R
This reads in the Top500 and Green500 lists stored locally, cleans the data, and creates data.frames with descriptive names of columns. The cleaning function gets updated frequently since the cleaning of individual lists is a bit customized (naming and data entry has not been consistent across the years)
Naming conventions are:
Nov13.csv - the combined Top500 and Green 500 list from November 2013
Jun09.csv - the combined Top500 and Green 500 list from June 2009
It also creates a file
BigExascalar.csv - which is the combined cleaned files with a date column added
The program saves the files in a folder results
currently the data in the cleaned files are:
"ExaRank" Numerical rank of computers based on Exascalar
"exascalar" The computed Exacalar Value
"green500rank" The rank of the system in the Green 500 (Efficiency)
"top500rank" The rank of the system in the Top500 (Performance)
"rmax" System Performance
"power" System Power
"mflopswatt" Efficiency
"computer" A descriptive name of the computer
#####Exascalar_Trend.R This program creates a plot of the most recent Green500 data and plots the trend lines of the Top and Median exascalar.
#####PlotWholeBigExascalar.R
This is a exploratory program which plots all the supercomputing data on one plot. It only prints to the screen.
#####PowerGap2.R
This program extracts the power and performance data of the most efficient and the least advanced (lowest Exascalar)
Note that the while the power consumption of the worst (lowest exascalar) is 100 times greater than the lowest power system, the performance of the systems are the same.
The output is stored as PowerCompare.png
#####TechTrend.R
This program helps visualize how different technologies contribute to supercomputing leadership by plotting the data for systems against the data of leading supercoputer. For example the grpah below shows how Intel's Xeon Phi systems have evolved.
The are stored as files named TechTrend_xxx.png_
####Fin