Rewrite to support additional nutrient databases
Showing 25 changed files with 107 additions and 56 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,33 @@ | ||
This project converts the information contained within the USDA National Nutrient Database for Standard Reference into SQL files for import into specific database systems. | ||
DATABASE SYSTEMS SUPPORTED: | ||
- MySQL | ||
|
||
The database was downloaded from the USDA site at http://www.ars.usda.gov/nutrientdata and was not altered in any way. | ||
NUTRIENT DATABASES INCLUDED: | ||
- USDA National Nutrient Database for Standard Reference | ||
http://www.ars.usda.gov/nutrientdata | ||
|
||
Run the build.sh file to generate the SQL files. Currently, only a MySQL file is created, but it's easy to add files for other database systems. | ||
Simply copy one of the Perl module files (*.pm) and alter it to output the format that the other database system requires. | ||
To alter the database name or user credentials, edit the "generate_sql.pl" file. | ||
PROJECT DESCRIPTION: | ||
This project converts the food composition data released by various official | ||
sources in the world to more modern formats. Often this data is provided by the | ||
source as an Access database, an Excel file, or a set of character delimited | ||
text files. For programmatic access, however, some sort of SQL format is | ||
usually preferred to any of the above formats. | ||
|
||
HOW IT WORKS: | ||
A human being is needed to create a description file for a given nutrient database. The JSON format was chosen for readability and portability reasons. | ||
Nutriana always converts the official data without modification. However, some changes may be necessary to ensure a successful database creation and data import. For example: | ||
- the database schema as indicated by the official source is not compatible with the raw data files provided | ||
- additional data rows are needed to avoid conflicts with foreign key constraints | ||
See the *.MODIFICATIONS files for more details. | ||
|
||
The "db_description.json" file was created manually by extracting the information from the "data/sr24_doc.pdf" file. | ||
Modifications were made to the information in the "data/sr24_doc.pdf" file as well as the resulting SQL to remove any problems importing the nutrient database data; see the "MODIFICATIONS" file. | ||
IF YOUR PREFERRED DATABASE IS NOT SUPPORTED: | ||
It should be easy to add other SQL-based databases by copying one of the Perl module files (*.pm) and editing it to output the format that your database system requires. (If you find it's not, let me know by creating an issue.) | ||
Run the build.sh file to (re)generate the SQL files. The script will automatically detect the new .pm file and attempt to output SQL for it. | ||
To alter the database name or user credentials, edit the "generate_sql.pl" file. | ||
|
||
Author: | ||
AUTHOR: | ||
- Maarten van Egmond | ||
|
||
License: | ||
- Usda-nutrient-database-sql-port is released under the MIT license; see the LICENSE file. | ||
- The USDA Nutrient Database "USDA food composition data" is in the public domain and there is no copyright or licensing fees. | ||
LICENSE: | ||
- Nutriana is released under the MIT license; see the LICENSE file. | ||
- Full licensing and usage information for the included nutrient databases is available in the *.LICENSE files; below is a summary: | ||
- The USDA Nutrient Database "USDA food composition data" is in the public domain and there is no copyright or licensing fees. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,25 +1,34 @@ | ||
#!/bin/sh | ||
|
||
# The SQL files are generated via perl, so make sure it's installed. | ||
# The SQL files are generated via Perl, so make sure it's installed. | ||
PERL=`which perl` | ||
if [ "$PERL" == "" ]; then echo "Please install perl" ; exit 1 ; fi | ||
if [ "$PERL" == "" ]; then echo "Please install Perl" ; exit 1 ; fi | ||
|
||
# Check that the data files do not contain any special characters. | ||
# Because in shell scripts `file data/*.txt` does not preserve newlines, defer to perl for this. | ||
$PERL ./check_data_files.pl | ||
# Process all nutrient databases included. | ||
for NUTDBDIR in `find . -type d -depth 1`; do | ||
# Extract nutrient database identifier. | ||
NUTDBID=`expr "$NUTDBDIR" : "\./\(.*\)"` | ||
|
||
# The perl modules indicate the databases to generate SQL for. | ||
for PMFILE in `find . -type f -name \*.pm`; do | ||
# Extract database identifier. | ||
DBID=`expr "$PMFILE" : "\.\/\(.*\).pm"` | ||
# Convert outfile to lowercase. | ||
OUTFILE="$(tr [A-Z] [a-z] <<< "usda_nndsr_$DBID.sql")" | ||
# Ignore .git dir. | ||
if [ "$NUTDBID" == ".git" ]; then continue; fi | ||
|
||
# Generate the SQL file for this database. | ||
# Make sure to add the current directory to the beginning of @INC | ||
# to avoid accidentally using official modules with the same name. | ||
$PERL -I . -M$DBID ./generate_sql.pl > $OUTFILE | ||
# Check that the data files do not contain any special characters. | ||
# Because in shell scripts `file $NUTDBID/*.txt` does not preserve newlines, | ||
# defer to Perl for this. | ||
$PERL ./check_data_files.pl $NUTDBID | ||
|
||
echo "$DBID file generated: $OUTFILE" | ||
done | ||
# The Perl modules indicate the databases to generate SQL for. | ||
for PMFILE in `find . -type f -name \*.pm`; do | ||
# Extract database identifier. | ||
RDBMSID=`expr "$PMFILE" : "\./\(.*\).pm"` | ||
# Convert outfile to lowercase. | ||
OUTFILE="$(tr [A-Z] [a-z] <<< $NUTDBID"_"$RDBMSID.sql)" | ||
|
||
# Generate the SQL file for this database. | ||
# Make sure to add the current directory to the beginning of @INC | ||
# to avoid accidentally using official modules with the same name. | ||
$PERL -I . -M$RDBMSID ./generate_sql.pl $RDBMSID $NUTDBID > $OUTFILE | ||
|
||
echo "$RDBMSID file for $NUTDBID generated: $OUTFILE" | ||
done | ||
done |
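For readers who want to follow the new control flow outside the diff, here is a minimal sketch of the nested loop in portable POSIX sh. It is not the committed script: the helper names (`out_name`, `module_id`, `list_nutdb_dirs`, `build_all`) are hypothetical, and it assumes the `*.pm` modules sit in the top-level directory and each nutrient database has its own immediate subdirectory. It also sidesteps `==`, `<<<`, and `find -depth 1`, which may not behave the same under every `/bin/sh` or `find` implementation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of the commit.

# Build the lowercase output file name, e.g. "USDA" + "MySQL" -> usda_mysql.sql.
out_name() {
    printf '%s_%s.sql\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

# Strip a leading "./" and a trailing ".pm" from a module path.
module_id() {
    id=${1#./}
    printf '%s\n' "${id%.pm}"
}

# Print each immediate subdirectory name, one per line.
# The ./*/ glob does not match hidden directories, so .git is skipped.
list_nutdb_dirs() {
    for dir in ./*/; do
        [ -d "$dir" ] || continue
        name=${dir#./}
        printf '%s\n' "${name%/}"
    done
}

# Outer loop: nutrient databases. Inner loop: RDBMS modules (*.pm).
# Assumes check_data_files.pl and generate_sql.pl take the arguments
# shown in the diff above.
build_all() {
    perl_bin=$(command -v perl) || { echo "Please install Perl" >&2; return 1; }
    for nutdb in $(list_nutdb_dirs); do
        "$perl_bin" ./check_data_files.pl "$nutdb"
        for pm in ./*.pm; do
            [ -f "$pm" ] || continue
            rdbms=$(module_id "$pm")
            outfile=$(out_name "$nutdb" "$rdbms")
            "$perl_bin" -I . "-M$rdbms" ./generate_sql.pl "$rdbms" "$nutdb" > "$outfile"
            echo "$rdbms file for $nutdb generated: $outfile"
        done
    done
}
```

Calling `build_all` from the repository root would run the whole conversion. The functions are kept separate so the naming logic can be checked without Perl installed.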
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
The USDA Nutrient Database "USDA food composition data" is in the public domain and there is no copyright or licensing fees. |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
#!/usr/bin/perl | ||
# | ||
# Fixes data rows in preparation of adding foreign keys. | ||
# This file is part of http://github/m5n/nutriana | ||
|
||
use strict; | ||
|
||
my $project_url = $ARGV[0]; | ||
|
||
print sql_insert("DERIV_CD", ("Deriv_Cd" => "", "Deriv_Desc" => "Added by $project_url to avoid foreign key error")) . "\n"; | ||
print sql_insert("FOOD_DES", ("NDB_No" => "", "FdGrp_Cd" => "0100", "Long_Desc" => "Added by $project_url to avoid foreign key error", "Shrt_Desc" => "See Long_Desc")) . "\n"; | ||
print sql_insert("NUTR_DEF", ("Nutr_No" => "", "Units" => "g", "NutrDesc" => "Added by $project_url to avoid foreign key error", "Num_Dec" => "0", "Sr_Order" => 0)) . "\n"; | ||
|
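The `sql_insert` helper called above is defined elsewhere in the project and is not shown in this diff, so its exact output is unknown here. As a rough illustration of the idea (emit one placeholder row with an empty key per referenced table so that rows pointing at an empty key still satisfy the foreign key), the following shell sketch builds a generic `INSERT` statement. The `col=value` argument convention and the statement layout are assumptions made for illustration, not the project's actual format.

```shell
#!/bin/sh
# Hypothetical stand-in for the project's Perl sql_insert helper.
# Usage: sql_insert TABLE col1=value1 col2=value2 ...
sql_insert() {
    table=$1
    shift
    cols=""
    vals=""
    for pair in "$@"; do
        col=${pair%%=*}    # text before the first "="
        val=${pair#*=}     # text after the first "=" (may be empty)
        # Double any single quotes so the value stays a valid SQL string.
        val=$(printf '%s' "$val" | sed "s/'/''/g")
        cols="${cols:+$cols, }$col"
        vals="${vals:+$vals, }'$val'"
    done
    printf 'INSERT INTO %s (%s) VALUES (%s);\n' "$table" "$cols" "$vals"
}

# Placeholder row with an empty key, mirroring the DERIV_CD line above.
sql_insert DERIV_CD "Deriv_Cd=" "Deriv_Desc=Added to avoid foreign key error"
```

Every value is quoted as a string in this sketch. The real helper may treat numeric columns such as `Sr_Order` differently.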
15 more files renamed without changes.