A comprehensive data analysis pipeline for NOAA global temperature data, using R and SQL Server.
It uses the NOAAGlobalTemp dataset, including:
- Global Historical Climatology Network-Monthly (GHCNm) for land
- Extended Reconstructed Sea Surface Temperature (ERSST) for sea
- ICOADS and IABP for Arctic Ocean
- Time Series Data: `combined_time_series.csv` (1850-present, anomalies vs. 1901-2000 average)
- Gridded Data: `gridded_data.csv` (5° x 5° grid, anomalies vs. 1991-2020 base)
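
To make "anomalies vs. a base period" concrete, here is a minimal sketch in base R. The data frame, its column names (`year`, `temp_c`), and all values are made up for illustration; they are not the layout of the NOAA files, which already ship as anomalies.

```r
# Illustrative annual mean temperatures (degrees C). All values are made up.
years <- 1899:2003
temps <- data.frame(
  year   = years,
  temp_c = 13.5 + 0.008 * (seq_along(years) - 1)
)

# Anomaly = value minus the mean over the base period [base_start, base_end].
compute_anomaly <- function(df, base_start, base_end) {
  in_base <- df$year >= base_start & df$year <= base_end
  if (!any(in_base)) stop("No rows fall inside the base period")
  baseline <- mean(df$temp_c[in_base])
  df$anomaly <- df$temp_c - baseline
  df
}

# Same base period as the time series file described above.
anomalies <- compute_anomaly(temps, 1901, 2000)
head(anomalies)
```

Switching the base period (for example to 1991-2020, as used for the gridded data) shifts every anomaly by a constant, so values from the two files are not directly comparable without re-baselining.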
- Climate researchers
- Data scientists working with environmental data
- Anyone interested in global temperature patterns
- Automated data download and processing of raw NOAA temperature data
- Robust SQL database for data storage and querying
- Comprehensive data cleaning and analysis
- Calculates statistics and identifies temperature trends
- Exports results for further use
- Enhanced error handling and detailed logging
- Improved data consistency checks
- Modular SQL script execution
- Automated database setup and table creation
- Progress tracking for data conversion and processing
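
As a rough illustration of what error handling with logging can look like in R, the sketch below wraps a pipeline step in `tryCatch` and writes timestamped log lines. The helper names `log_message` and `run_step` are hypothetical and are not taken from `runner.R` or `utils.R`.

```r
# Hypothetical helper: write a timestamped log line to a connection.
log_message <- function(level, msg, con = stdout()) {
  line <- sprintf("[%s] %s: %s",
                  format(Sys.time(), "%Y-%m-%d %H:%M:%S"), level, msg)
  cat(line, "\n", sep = "", file = con)
  invisible(line)
}

# Hypothetical helper: run one step, log the outcome, and never abort the caller.
run_step <- function(name, step_fn) {
  log_message("INFO", paste("Starting", name))
  tryCatch({
    result <- step_fn()
    log_message("INFO", paste("Finished", name))
    list(ok = TRUE, value = result)
  }, error = function(e) {
    log_message("ERROR", paste0(name, " failed: ", conditionMessage(e)))
    list(ok = FALSE, value = NULL)
  })
}

ok_step  <- run_step("demo step", function() 1 + 1)
bad_step <- run_step("failing step", function() stop("simulated failure"))
```

Returning a small status list instead of re-raising lets the caller decide whether a failed step should stop the whole run.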
- R (3.6.0+)
- SQL Server (2019+)
- R packages: curl, DBI, dplyr, httr, ncdf4, odbc, readr, xml2, progress, lubridate, tidyverse
The script will automatically install and load these packages if they are not already available in your R environment.
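
A common way to implement this install-if-missing behaviour is sketched below. The function names `missing_packages` and `ensure_packages` and the CRAN mirror URL are assumptions for illustration; the project's own script may do this differently.

```r
required_packages <- c("curl", "DBI", "dplyr", "httr", "ncdf4", "odbc",
                       "readr", "xml2", "progress", "lubridate", "tidyverse")

# Return the subset of `pkgs` that cannot be loaded from the current library.
# Note: requireNamespace() loads a package's namespace as a side effect.
missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# Install whatever is missing, then attach every package.
# The default mirror is an assumption; use whichever repository you prefer.
ensure_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Uncomment to install and attach everything in one call:
# ensure_packages(required_packages)
```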
- Ensure you have R and SQL Server installed on your system.
- Clone the repository.
- Open R or RStudio and set the working directory to the `R/` folder.
- Run `runner.R`.
- The script will automatically install any missing packages.
- Check `data/processed/` for results and the SQL database for exploration data.
The script handles package installation, database setup, data download, and processing automatically.
Before running, your project should look like this. After running, you'll find the empty folders filled with downloaded raw temperature data, processed CSVs, and more.
temperature-analysis-project/
│
├── data/
│ ├── raw/
│ └── processed/
│
├── docs/
│ ├── database_schema.md
│ ├── data_dictionary.md
│ └── data_processing_pipeline.md
│
├── outputs/
│ ├── plots/
│ ├── tableau/
│ └── tables/
│
├── R/
│ ├── runner.R
│ └── utils.R
│
├── sql/
│ ├── drop_tables.sql
│ ├── explore_data.sql
│ ├── process_data.sql
│ ├── run_diagnostics.sql
│ └── setup_database.sql
│
├── .gitignore
└── README.md
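
Git does not track empty directories, so a fresh clone may lack some of the empty folders shown above. The sketch below creates any that are missing. The helper name `ensure_project_dirs` is hypothetical, and the demo runs against a temporary directory rather than a real checkout.

```r
# Folders from the layout above, relative to the project root.
expected_dirs <- c("data/raw", "data/processed", "docs",
                   "outputs/plots", "outputs/tableau", "outputs/tables",
                   "R", "sql")

# Create any expected folder that is missing; return the paths that were created.
ensure_project_dirs <- function(root, dirs = expected_dirs) {
  paths <- file.path(root, dirs)
  missing <- paths[!dir.exists(paths)]
  for (p in missing) {
    dir.create(p, recursive = TRUE)
  }
  invisible(missing)
}

# Demo against a throwaway location; point `root` at your clone instead.
demo_root <- file.path(tempdir(), "temperature-analysis-project")
created <- ensure_project_dirs(demo_root)
```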
- Automated data download and conversion
- SQL database creation and management
- Enhanced error handling and logging in R scripts
- Improved SQL script execution with support for multiple statements
- Automated database and table creation
- Data consistency checks for `TimeSeries` and `GriddedData` tables
- Detailed diagnostic queries for data verification
- Progress bars for data conversion and processing tasks
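
One way to run a SQL script that contains multiple statements is to split it into batches first. The sketch below splits on lines consisting only of `GO`, the batch separator used by SQL Server client tools. The function name `split_sql_batches` and the example table columns are assumptions for illustration, not the project's actual code or schema.

```r
# Split a SQL script into batches on lines that contain only "GO"
# (case-insensitive). Empty batches are dropped.
split_sql_batches <- function(sql_text) {
  lines <- unlist(strsplit(sql_text, "\r?\n"))
  is_sep <- grepl("^\\s*GO\\s*$", lines, ignore.case = TRUE)
  batch_id <- cumsum(is_sep)  # increments after every separator line
  groups <- split(lines[!is_sep], batch_id[!is_sep])
  batches <- vapply(groups, paste, character(1), collapse = "\n")
  batches <- trimws(unname(batches))
  batches[nzchar(batches)]
}

# Example script; the column definitions are hypothetical.
script <- paste(
  "CREATE TABLE TimeSeries (Year INT, Anomaly FLOAT);",
  "GO",
  "SELECT COUNT(*) FROM TimeSeries;",
  "go",
  sep = "\n"
)
batches <- split_sql_batches(script)

# With an open DBI connection `con` (assumed), each batch could then be run:
# for (b in batches) DBI::dbExecute(con, b)
```

This simple splitter does not handle a `GO` line that appears inside a multi-line string literal or block comment, so it is best suited to scripts you control.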
- Advanced statistical analysis
- Machine learning integration
- Interactive visualization dashboard
- Geospatial analysis
- Correlation with other climate indicators
Contributions, bug reports, and feature requests are welcome!
When using this data, please cite:
NOAA National Centers for Environmental Information, Climate at a Glance: Global Time Series, published [Month] 2024, retrieved on [Date] from https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series
Note: Replace [Month] and [Date] with the actual publication month and retrieval date.
MIT License