Update usda_nndsr v24 -> v25
m5n committed Sep 30, 2012
1 parent cd4f1fb commit 2635f18
Showing 21 changed files with 212,953 additions and 187,319 deletions.
35 changes: 22 additions & 13 deletions README
@@ -4,29 +4,38 @@ DATABASE SYSTEMS SUPPORTED:

NUTRIENT DATABASES INCLUDED:
- Canadian Nutrient File, Health Canada, 2010
http://www.healthcanada.gc.ca/cnf
- USDA National Nutrient Database for Standard Reference, Release 24
http://www.ars.usda.gov/nutrientdata
(electronic version at www.healthcanada.gc.ca/cnf)
- U.S. Department of Agriculture, Agricultural Research Service. 2012.
USDA National Nutrient Database for Standard Reference, Release 25.
Nutrient Data Laboratory Home Page, http://www.ars.usda.gov/ba/bhnrc/ndl

PROJECT DESCRIPTION:
This project converts the food composition data released by various official
sources in the world to more modern formats. Often this data is provided by the
source as an Access database, an Excel file, or a set of character delimited
text files. For programmatic access, however, some sort of SQL format is
usually preferred to any of the above formats.
Nutriana takes the food composition data released by various official sources in the
world and converts it into formats specific to the database systems mentioned above.

HOW IT WORKS:
A human being is needed to create a description file for a given nutrient database. The JSON format was chosen for readability and portability reasons.
Nutriana never modifies the nutrient database's data files. However, to ensure successful database creation and data import, some changes may be necessary to the prescribed database schema or to the casing of the values (i.e. convert values to uppercase). See the */MODIFICATIONS files for more details.
A human being is needed to extract the description and constraints of a given nutrient
database into a file that can be programmatically processed. The JSON format was chosen
for readability and portability reasons.
Nutriana never modifies the nutrient database's official data files. However, to ensure
successful database creation and data import, some changes may be necessary. The
modifications are fully disclosed in the */MODIFICATIONS files, and usually involve
schema definition corrections (e.g. field size or primary or foreign key adjustments).
Occasionally though, values are converted to uppercase or trailing spaces are removed
from the data files.
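
As an illustration of how a JSON description file can be processed programmatically, here is a minimal Python sketch (the project itself uses Perl). It compares the "records" count declared for each table against the number of non-empty lines in the data file. The top-level "tables" list and the "file" and "records" keys appear in this commit; the function name, the directory layout, and the one-record-per-line assumption are hypothetical.

```python
import json
import os


def check_record_counts(schema_path, data_dir):
    """Return (file, expected, actual) for every table whose count differs.

    Assumes each non-empty line of a data file is exactly one record.
    """
    with open(schema_path, encoding="utf-8") as fh:
        schema = json.load(fh)

    mismatches = []
    for table in schema["tables"]:
        path = os.path.join(data_dir, table["file"])
        # Read as bytes so no character encoding has to be assumed.
        with open(path, "rb") as data:
            actual = sum(1 for line in data if line.strip())
        if actual != table["records"]:
            mismatches.append((table["file"], table["records"], actual))
    return mismatches
```

A check like this would surface the kind of record count corrections listed in the MODIFICATIONS files before any SQL is generated.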

IF YOUR PREFERRED DATABASE IS NOT SUPPORTED:
It should be easy to add other SQL-based databases by copying one of the Perl module files (*.pm) and edit it to output the format that your database system requires. (If you find it's not, let me know by creating an issue.)
Run the build.sh file to (re)generate the SQL files. The script will automatically detect the new .pm file and attempt to output SQL for it.
It should be easy to add support for other databases by copying one of the Perl module
files (*.pm) and editing it as needed to output the format for your database system.
(If you find it's not, let me know by creating an issue.)
Run the build.sh file to (re)generate the database vendor files. The script will
automatically detect the new .pm file and attempt to output SQL for it.
To alter the database name or user credentials, edit the "generate_sql.pl" file.

AUTHOR:
- Maarten van Egmond

LICENSE:
- Nutriana is released under the MIT license; see the LICENSE file.
- Full licensing and usage information for the included nutrient databases is available in the */LICENSE files.
- Full licensing and usage information for the included nutrient databases is available in
the */LICENSE files.
2 changes: 1 addition & 1 deletion canadian_nf/schema.json
@@ -10,7 +10,7 @@
"file": "FOOD_NM.txt",
"description": "Food Name. This is a principal file. It stores information about each food in the database. It contains a description of each food in English and French as well as dates and comments.",
"records": 5807,
"convert_rows_to_remove_trailing_whitespace": true,
"convert_rows_to_remove_trailing_whitespace_and_empty_lines": true,
"fields": [
{
"name": "FD_ID",
12 changes: 6 additions & 6 deletions generate_sql.pl
@@ -21,8 +21,7 @@


my $project_url = "http://github.com/m5n/nutriana";
my $pwd = `pwd`; chomp $pwd;
my $file = "$pwd/$nutdbid/schema.json";
my $file = "./$nutdbid/schema.json";
my $json = do { local $/ = undef; open my $fh, "<", $file or die "Could not open $file: $!"; <$fh>; };
my $data = decode_json($json);
my $header = $data->{"description"} . " (" . $data->{"url"} . ")";
@@ -83,17 +82,18 @@
my $line_separator = "\\r\\n";

# Convert files if needed.
if ($table{"convert_rows_to_remove_trailing_whitespace"}) {
if ($table{"convert_rows_to_remove_trailing_whitespace_and_empty_lines"}) {
# Only convert once.
if (-e "$datafile.trimmed") {
$datafile = "$datafile.trimmed"; # Update datafile.
$datafile = "$datafile.trimmed"; # Update file used in sql_load_data below.
} else {
open INFILE, "<", $datafile or die "Could not open $datafile: $!";
$datafile = "$datafile.trimmed"; # Update datafile.
$datafile = "$datafile.trimmed"; # Update file used in sql_load_data below.
open OUTFILE, ">", $datafile or die "Could not open $datafile: $!";
# TODO: doing this while loop ends up setting @{$data->{"tables"}}[0] to undefined, leading to trouble below. Why?
while (<INFILE>) {
$_ =~ s/\s$//g;
print OUTFILE $_ . "\r\n"; # Interpreted version of $line_separator.
print OUTFILE $_ . "\r\n" if length($_) > 0; # Interpreted version of $line_separator. # TODO: find Perl function to convert "\\r\\n" -> "\r\n".
}
close INFILE;
close OUTFILE;
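
The conversion step above can be sketched in Python as follows (the project's own code is Perl; this is an illustration, not the author's implementation). It writes a `.trimmed` copy once, strips trailing whitespace from each line, drops lines that end up empty, and terminates every kept line with CRLF. The function name `trim_data_file` is hypothetical. On the TODO above: `while (<INFILE>)` assigns each line to the global `$_` without localizing it, so if the enclosing loop over the tables also relies on the implicit `$_`, the aliased array element would be overwritten; that enclosing loop is not shown here, so this is only a guess.

```python
import os


def trim_data_file(src_path):
    """Create src_path + '.trimmed' (if missing) and return its path."""
    dst_path = src_path + ".trimmed"
    # Only convert once, mirroring the existence check in the Perl code.
    if os.path.exists(dst_path):
        return dst_path

    # Work on bytes so the data file's character encoding is left untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for raw in src:
            line = raw.rstrip(b" \t\r\n")
            if line:  # skip lines that are empty after trimming
                dst.write(line + b"\r\n")
    return dst_path
```

Using a loop variable with its own name (`raw`) avoids any dependence on shared global state between the outer and inner loops.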
80 changes: 43 additions & 37 deletions usda_nndsr/MODIFICATIONS
@@ -1,54 +1,60 @@
Modifications made that were missing or incorrect in the data description file ./data/sr24.pdf:
Modifications made that were missing or incorrect in the data description file ./data/sr25_doc.pdf:

DATA_SRC table
- change Title to allow null because entry S1921 doesn't specify this value

DATSRCLN table
- add foreign key NUTR_DEF.Nutr_No for Nutr_No
- change record count from 187019 to 187156
- there are DataSrc_ID records in DATSRCLN which are not in DATA_SRC:
(Once fixed, add back "foreign_key": "DATA_SRC.DataSrc_ID", to DATSRCLN.DataSrc_ID in schema.json)
mysql> select distinct DataSrc_ID from DATSRCLN where DataSrc_ID not in (select DataSrc_ID from DATA_SRC);
+------------+
| DataSrc_ID |
+------------+
| S7821 |
| S7441 |
| S7521 |
| S7282 |
| S7321 |
| S7601 |
| S7701 |
| S6962 |
+------------+
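
The orphan check above can be reproduced on toy data with Python's built-in sqlite3 module. This is a sketch only: SQLite stands in for MySQL, the IDs `S1`, `S2`, and `S9` are invented, and the helper name `find_orphans` is hypothetical.

```python
import sqlite3


def find_orphans(conn, child, column, parent):
    """List distinct child values of `column` that are absent from `parent`.

    Table and column names are interpolated directly into the SQL, so only
    pass trusted identifiers.
    """
    sql = (
        f"select distinct {column} from {child} "
        f"where {column} not in (select {column} from {parent}) "
        f"order by {column}"
    )
    return [row[0] for row in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table DATA_SRC (DataSrc_ID text primary key);
    create table DATSRCLN (NDB_No text, Nutr_No text, DataSrc_ID text);
    insert into DATA_SRC values ('S1'), ('S2');
    insert into DATSRCLN values
        ('01001', '203', 'S1'),
        ('01001', '204', 'S9'),
        ('01002', '203', 'S9');
""")
orphans = find_orphans(conn, "DATSRCLN", "DataSrc_ID", "DATA_SRC")
```

An empty result would indicate that the foreign key on DATSRCLN.DataSrc_ID can be restored in schema.json.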

DERIV_CD table
- change Deriv_Desc size from 120 to 263 because entry NP's Deriv_desc length is 263
- add back SR24 rows NR and O, which are still referenced by NUT_DATA
- (MySQL) SR24 was ok, but SR25 has ^M character after every line
and there's one empty line between NP and PAE, which causes problems with MySQL.
  Also NP's Deriv_Desc value is not terminated by a ~ (instead there's that empty line).
There are no ways around this other than pre-processing the file.

FOOTNOTE table
- change Footnt_No to allow null because entry 12737 doesn't specify this value
- change Footnt_Typ to allow null because entries (35234, 01) through (35234, 06) don't specify this value
- (MySQL) convert empty string Nutr_No values to null to avoid foreign key error added on FOOTNOTE table
+ this is to fix ERROR 1452 (23000) at line 387: Cannot add or update a child row:
a foreign key constraint fails (`usda_nndsr`.<result 2 when explaining filename '#sql-36d3_48'>,
CONSTRAINT `#sql-36d3_48_ibfk_2` FOREIGN KEY (`Nutr_No`) REFERENCES `NUTR_DEF` (`Nutr_No`))
+ the error above occurs with this statement: alter table FOOTNOTE add foreign key (Nutr_No) references NUTR_DEF(Nutr_No)
+ "select distinct Nutr_No from FOOTNOTE order by Nutr_No" reveals there's an empty value, which the NUTR_DEF table does not contain

LANGUAL table
- change record count from 40205 to 39085

NUT_DATA table
- change DF size from 2 to 4 because entry (04025, 307) and others length is 3, and (14096, 207) and others length is 4
- (MySQL) convert empty string Deriv_Cd values to null to avoid foreign key error added on NUT_DATA table
+ this is to fix ERROR 1452 (23000) at line 368: Cannot add or update a child row:
a foreign key constraint fails (`food`.<result 2 when explaining filename '#sql-36d3_44'>,
a foreign key constraint fails (`usda_nndsr`.<result 2 when explaining filename '#sql-36d3_44'>,
CONSTRAINT `#sql-36d3_44_ibfk_4` FOREIGN KEY (`Deriv_Cd`) REFERENCES `DERIV_CD` (`Deriv_Cd`))
+ the error above occurs with this statement: alter table NUT_DATA add foreign key (Deriv_Cd) references DERIV_CD(Deriv_Cd)
+ "select distinct DERIV_CD from NUT_DATA order by DERIV_CD" reveals there's an empty value, which the DERIV_CD table does not contain

FOOD_DES table
- change record count from 7906 to 7907
- (MySQL) convert empty string NDB_No values to null to avoid foreign key error added on NUT_DATA table
- (MySQL) convert empty string Ref_NDB_No values to null to avoid foreign key error added on NUT_DATA table
+ this is to fix ERROR 1452 (23000) at line 382: Cannot add or update a child row:
a foreign key constraint fails (`food`.<result 2 when explaining filename '#sql-36d3_47'>,
a foreign key constraint fails (`usda_nndsr`.<result 2 when explaining filename '#sql-36d3_47'>,
CONSTRAINT `#sql-36d3_47_ibfk_5` FOREIGN KEY (`Ref_NDB_No`) REFERENCES `FOOD_DES` (`NDB_No`))
+ the error above occurs with this statement: alter table NUT_DATA add foreign key (Ref_NDB_No) references FOOD_DES(NDB_No)
+ "select distinct Ref_NDB_No from NUT_DATA order by Ref_NDB_No" reveals there's an empty value, which the FOOD_DES table does not contain
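
To illustrate why empty strings are converted to null, here is a small Python sketch using sqlite3 as a stand-in for MySQL (an assumption; the project loads the data first and adds the foreign key afterwards). An empty string is a real value that must exist in the parent table, whereas NULL is exempt from the foreign key check. The helper name `empty_to_null` and the sample rows are hypothetical.

```python
import sqlite3


def empty_to_null(row):
    """Replace empty-string fields with None so they are stored as NULL."""
    return tuple(None if value == "" else value for value in row)


conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite does not enforce FKs by default
conn.executescript("""
    create table DERIV_CD (Deriv_Cd text primary key);
    create table NUT_DATA (
        NDB_No text,
        Nutr_No text,
        Deriv_Cd text references DERIV_CD(Deriv_Cd)
    );
    insert into DERIV_CD values ('A'), ('NP');
""")

rows = [("01001", "203", "A"), ("01001", "204", "")]

# Inserting the raw row with an empty Deriv_Cd violates the foreign key.
try:
    conn.execute("insert into NUT_DATA values (?, ?, ?)", rows[1])
    raw_insert_failed = False
except sqlite3.IntegrityError:
    raw_insert_failed = True

# After conversion, the same row is accepted with Deriv_Cd stored as NULL.
conn.executemany(
    "insert into NUT_DATA values (?, ?, ?)",
    [empty_to_null(r) for r in rows],
)
null_count = conn.execute(
    "select count(*) from NUT_DATA where Deriv_Cd is null"
).fetchone()[0]
```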

FOOTNOTE table
- change Footnt_Typ to allow null because entries (35234, 01) through (35234, 06) don't specify this value
- add foreign key NUTR_DEF.Nutr_No for Nutr_No

LANGDESC table
- change record count from 779 to 774

LANGUAL table
- add Factor_Code as second primary key as NDB_No is marked as primary key but is not unique
- (Oracle) convert Factor_Code values to uppercase as some databases (e.g. Oracle) use case-sensitive key values:
SQL> select distinct Factor_Code from LANGUAL where Factor_Code not in (select distinct Factor_Code from LANGDESC);
FACTO
-----
a0149
f0014

NUT_DATA table
- change DF size from 2 to 4 because entry (04025, 307) and others length is 3, and (14096, 207) and others length is 4
- add foreign key FOOD_DES.NDB_No for Ref_NDB_No

NUTR_DEF table
- (MySQL) convert empty string Nutr_No values to null to avoid foreign key error added on FOOTNOTE table
+ this is to fix ERROR 1452 (23000) at line 387: Cannot add or update a child row:
a foreign key constraint fails (`food`.<result 2 when explaining filename '#sql-36d3_48'>,
CONSTRAINT `#sql-36d3_48_ibfk_2` FOREIGN KEY (`Nutr_No`) REFERENCES `NUTR_DEF` (`Nutr_No`))
+ the error above occurs with this statement: alter table FOOTNOTE add foreign key (Nutr_No) references NUTR_DEF(Nutr_No)
+ "select distinct Nutr_No from FOOTNOTE order by Nutr_No" reveals there's an empty value, which the NUTR_DEF table does not contain

WEIGHT table
- change Msre_Desc size from 80 to 84 because entries (14400, *) have lengths of 83 and 84.