added first iter of proper-nouns pig solution
Aaron Kimball committed Jun 4, 2009
1 parent 0396e1e commit 6ffc9f1
Showing 1 changed file with 24 additions and 0 deletions.
24 changes: 24 additions & 0 deletions exercises/intro-to-pig/solution/proper_nouns.pig
@@ -0,0 +1,24 @@

-- Load in the jar that contains our UDFs
REGISTER textudf.jar;

-- Load the (word, sentence) pairs.
bard = LOAD 'input_idx' USING PigStorage(',') AS (word, sentence);

-- Keep only capitalized words (an uppercase letter followed by a lowercase one).
-- This drops FULLY capitalized words as well as the word 'I'.
caps = FILTER bard BY word matches '[A-Z][a-z][a-zA-Z]*';

-- Keep the capitalized words that do not start their sentence.
not_starting = FILTER caps BY NOT textudf.StartsWith(sentence, word);

-- Throw away the sentence; we just want the word part of the record.
justword = FOREACH not_starting GENERATE word;

-- deduplicate...
nodups = DISTINCT justword;

-- and write out the results.
STORE nodups INTO 'proper_nouns';
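
For checking the logic on a handful of rows without a cluster, the script's steps can be mirrored in plain Python. This is only a rough sketch under stated assumptions: `starts_with` is a stand-in for `textudf.StartsWith` (its source is not part of this commit, so a simple prefix test is assumed), and `proper_nouns` and `sample` are hypothetical names made up for the illustration. Pig's `matches` has to match the entire string, which is why `re.fullmatch` is used here.

```python
import re

# Same pattern as the FILTER in proper_nouns.pig. Pig's `matches` must match
# the whole string, so re.fullmatch is the closest Python analogue.
CAPITALIZED = re.compile(r"[A-Z][a-z][a-zA-Z]*")


def starts_with(sentence, word):
    # ASSUMPTION: stands in for textudf.StartsWith, whose source is not shown.
    return sentence.startswith(word)


def proper_nouns(pairs):
    """pairs: iterable of (word, sentence) tuples, like the rows of 'input_idx'."""
    # caps = FILTER bard BY word matches ...
    caps = [(w, s) for w, s in pairs if CAPITALIZED.fullmatch(w)]
    # not_starting = FILTER caps BY NOT StartsWith(sentence, word)
    not_starting = [(w, s) for w, s in caps if not starts_with(s, w)]
    # justword + DISTINCT: keep only the word; the set removes duplicates
    return {w for w, _ in not_starting}


# Hypothetical sample rows, invented for this illustration.
sample = [
    ("The", "The play was written by Shakespeare"),
    ("Shakespeare", "The play was written by Shakespeare"),
    ("play", "The play was written by Shakespeare"),
    ("I", "Then I met HAMLET in Denmark"),
    ("HAMLET", "Then I met HAMLET in Denmark"),
    ("Denmark", "Then I met HAMLET in Denmark"),
    ("Denmark", "He sailed to Denmark"),
]

print(sorted(proper_nouns(sample)))  # ['Denmark', 'Shakespeare']
```

In this sample, `The` is dropped because it starts its sentence, while `I` and `HAMLET` never pass the capitalization pattern.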

