wangdxf/ProPPR
ProPPR: PROGRAMMING WITH PERSONALIZED PAGERANK
==============================================

This is a Java package for using graph walk algorithms to perform
inference tasks over local groundings of first-order logic programs.
The package uses parallelization to substantially speed up processing,
making it practical even for large databases.

Contents:

1. Build
2. Run
   2.0. Overview of Java main() classes
   2.1. Experiment pipeline
        2.1.0. Experiment: full pipeline
        2.1.1. ExampleCooker: Grounding or "Cooking"
        2.1.2. Trainer: SGD learning
        2.1.3. Tester: Evaluating results
   2.2. Utilities
        2.2.0. QueryAnswerer: Answering queries against a program
        2.2.1. Prompt: Interact with ProPPR machinery in a shell
3. File Formats


1. BUILD
========

ProPPR $ ant clean build


2. RUN
======

For all run phases, control logging output using conf/log4j.properties.


2.0. RUN: JAVA MAIN CLASSES
===========================

edu.cmu.ml.praprolog.Experiment
  - full pipeline: ground, train, test

edu.cmu.ml.praprolog.ExampleCooker
  - for grounding (used by ground.sh)

edu.cmu.ml.praprolog.trove.Trainer
  - single-threaded parameter-learning over labeled examples

edu.cmu.ml.praprolog.trove.MultithreadedTrainer
  - multithreaded parameter-learning
  - if the trove *Trainer classes have trouble, there are less
    memory-efficient but potentially easier-to-debug versions
    which use string keys:
      edu.cmu.ml.praprolog.Trainer
      edu.cmu.ml.praprolog.MultithreadedTrainer

edu.cmu.ml.praprolog.Tester
  - single-threaded evaluation of trained parameters over labeled examples

edu.cmu.ml.praprolog.prove.Prompt
  - interactive prompt for exploring state trees in grounding proofs
  - good for debugging rulefiles

edu.cmu.ml.praprolog.QueryAnswerer
  - compute query solutions, using trained or untrained parameters


2.1. RUN: EXPERIMENT PIPELINE
=============================

2.1.0. RUN: EXPERIMENT PIPELINE: EXPERIMENT
===========================================

Using the file formats described in section 3, create files that
specify your program's rules and fact DBs, and optionally a graph.
Create training and testing data files of positive and negative
examples for queries on the program. Place these files in their own
directory.

ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data test.data

Use the provided script 'compile.sh' to convert these files into
ProPPR-readable formats:

ProPPR $ scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts

Now you can run the full pipeline on the resulting files. The compile
script generates the --programFiles argument for you:

ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg`
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Experiment \
--programFiles ${PROGRAMFILES%:} --train my_program/toytrain.data --output my_program/train.cooked \
--test my_program/toytest.data --params my_program/params.wts
INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
INFO [Experiment] Cooking training examples from my_program/toytrain.data...
INFO [Experiment] Training model parameters...
INFO [Trainer] Importing cooked examples from my_program/train.cooked
INFO [Trainer] Imported 11 examples
INFO [Trainer] Training on cooked examples...
INFO [Trainer] epoch 1 ...
INFO [Trainer] epoch 2 ...
INFO [Trainer] epoch 3 ...
INFO [Trainer] epoch 4 ...
INFO [Trainer] epoch 5 ...
INFO [Trainer] epoch 6 ...
INFO [Trainer] epoch 7 ...
INFO [Trainer] epoch 8 ...
INFO [Trainer] epoch 9 ...
INFO [Trainer] epoch 10 ...
INFO [Trainer] epoch 11 ...
INFO [Trainer] epoch 12 ...
INFO [Trainer] epoch 13 ...
INFO [Trainer] epoch 14 ...
INFO [Trainer] epoch 15 ...
INFO [Trainer] epoch 16 ...
INFO [Trainer] epoch 17 ...
INFO [Trainer] epoch 18 ...
INFO [Trainer] epoch 19 ...
INFO [Trainer] epoch 20 ...
INFO [Trainer] Finished in 1816 ms
INFO [Experiment] Saving parameters to my_program/params.wts...
INFO [Experiment] Testing on my_program/toytest.data...
INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 2836
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0

To see the full list of options (prover type, number of threads, loss
output, etc.), run Experiment with no arguments.


2.1.1. RUN: EXPERIMENT PIPELINE: EXAMPLECOOKER
==============================================

Use the provided script 'ground.sh'.

Using the file formats described in section 3, create files that
specify your program's rules and fact DBs, and optionally a graph.
Create a training data file of positive and negative examples for
queries on the program. Place these files in their own directory.

ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data

Run the grounding script on the directory, and specify the training
data file explicitly:

ProPPR $ scripts/ground.sh my_program my_program/train.data
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/rules.rules to my_program/rules.crules
parsing my_program/rules.rules
Converting my_program/facts.facts
--- Cooking my_program/train.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/rules.crules:my_program/facts.cfacts:my_program/graph.graph --data my_program/train.data --output my_program/train.cooked
INFO [Component] Loading from file 'my_program/rules.crules'...
INFO [Component] Loading from file 'my_program/facts.cfacts'...
INFO [Component] Loading from file 'my_program/graph.graph'...
171 Done.
--- Done.
ProPPR $ ls my_program/
programFiles.arg rules.crules rules.rules facts.cfacts facts.facts train.cooked train.data graph.graph

You can pass additional arguments to the cooker if you wish to specify
prover or multithreading settings:

[--prover { ppr | dpr[:eps] | tr[:depth] }]
    (default prover is ppr)
    (default dpr epsilon=0.0001)
    (default tr depth=5)
[--threads NUMBER]
    use multithreading (default implementation is single-threaded)

ProPPR $ scripts/ground.sh my_program my_program/toytrain.data --prover dpr:0.00001 --threads 3
--- Compiling my_program first:
scripts/compile.sh my_program
compiling my_program/textcat.rules to my_program/textcat.crules
parsing my_program/textcat.rules
Converting my_program/toylabels.facts
--- Cooking my_program/toytrain.data:
java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.ExampleCooker --programFiles my_program/textcat.crules:my_program/toylabels.cfacts:my_program/toywords.graph --data my_program/toytrain.data --output my_program/toytrain.cooked --prover dpr:0.00001 --threads 3
INFO [Component] Loading from file 'my_program/textcat.crules'...
INFO [Component] Loading from file 'my_program/toylabels.cfacts'...
INFO [Component] Loading from file 'my_program/toywords.graph'...
1124 Done.
--- Done.

To reset your project directory to just the original rule, fact,
graph, and data files, use the clean script:

ProPPR $ scripts/clean.sh my_program/
ProPPR $ ls my_program
facts.facts graph.graph rules.rules train.data


2.1.2. RUN: EXPERIMENT PIPELINE: TRAINER
========================================

Given a cooked file, we can learn the parameter weights that best
support the training examples:

ProPPR $ java edu.cmu.ml.praprolog.trove.Trainer my_program/toytrain.cooked my_program/toytrain.params

You can specify additional options if you like:

--epochs {int}    Number of epochs (default 5)
--traceLosses     Turn on traceLosses (default off)
                  NB: example count for losses is
                  sum(x.length() for x in examples)
                  and won't match `wc -l cookedExampleFile`

For multithreaded learning, use:

ProPPR $ java edu.cmu.ml.praprolog.trove.MultithreadedTrainer my_program/toytrain.cooked my_program/toytrain.params

With the same options as above, plus:

--threads {int}   Number of threads (default 4)
--rr              Use round-robin scheduling (default: queue) (RECOMMENDED)


2.1.3. RUN: EXPERIMENT PIPELINE: TESTER
=======================================

Given a file of testing examples and the model parameters output from
Trainer, we can get the MAP of the trained model over the test data.

If you are running this step you should already have a compiled
program my_program, likely cooked training data, and the trained
parameters:

ProPPR $ ls my_program/
facts.cfacts facts.facts graph.graph params.wts programFiles.arg rules.crules rules.rules test.data train.cooked train.data

Run the tester:

ProPPR $ export PROGRAMFILES=`cat my_program/programFiles.arg`
ProPPR $ java -cp .:bin/:lib/*:conf/ edu.cmu.ml.praprolog.Tester \
--programFiles ${PROGRAMFILES%:} --test my_program/test.data --params my_program/params.wts
INFO [Tester] flags: 0x59
INFO [Component] Loading from file 'my_program/rules.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/facts.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'my_program/graph.graph' with alpha=0.0 ...
INFO [Tester] Testing on my_program/test.data...
INFO [Tester] pairTotal 7.0 pairErrors 0.0 errorRate 0.0 map 1.0
result= running time 113
result= pairs 7.0 errors 0.0 errorRate 0.0 map 1.0


2.2. RUN: UTILITIES
===================

2.2.0. RUN: UTILITIES: QUERYANSWERER
====================================

If you want to use a program to answer a series of queries, you can
use the QueryAnswerer class.

If you are running this step you should already have a compiled
program and a file containing a list of queries, one per line. Each
query is a single goal.

ProPPR $ cat testcases/family.queries
sim(william,X)
sim(rachel,X)
ProPPR $ java edu.cmu.ml.praprolog.QueryAnswerer \
--programFiles testcases/family.cfacts:testcases/family.crules \
--queries testcases/family.queries --output answers.txt
INFO [Component] Loading from file 'testcases/family.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'testcases/family.crules' with alpha=0.0 ...
ProPPR $ cat answers.txt
# proved sim(william,-1) 47 msec
1 0.8838968104504825 -1=c[william]
2 0.035512510088781264 -1=c[lottie]
3 0.035512510088781264 -1=c[rachel]
4 0.035512510088781264 -1=c[sarah]
5 0.002391414820793351 -1=c[poppy]
6 0.0017935611155950133 -1=c[lucas]
7 0.0017935611155950133 -1=c[charlotte]
8 0.0017935611155950133 -1=c[caroline]
9 0.0017935611155950133 -1=c[elizabeth]
# proved sim(rachel,-1) 18 msec
1 0.9094251636624519 -1=c[rachel]
2 0.0452874181687741 -1=c[caroline]
3 0.0452874181687741 -1=c[elizabeth]


2.2.1. RUN: UTILITIES: PROMPT
=============================

An interactive prompt can be useful while debugging logic program
issues, because you can examine a single query in detail.

If you are running this step you should already have a compiled
program.

Starting up the prompt:

"""
ProPPR $ java -cp conf/:bin/:lib/* edu.cmu.ml.praprolog.prove.Prompt --programFiles ${PROGRAMFILES%:}
Starting up beanshell...
prv set: edu.cmu.ml.praprolog.prove.TracingDfsProver@57fdc2d
INFO [Component] Loading from file 'kbp_prototype/doc.crules' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/kb.cfacts' with alpha=0.0 ...
INFO [Component] Loading from file 'kbp_prototype/lp_predicate_SF_ENG_001-50doc.graph' with alpha=0.0 ...
lp set: edu.cmu.ml.praprolog.prove.LogicProgram@2225a091

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
BeanShell 2.0b4 - by Pat Niemeyer ([email protected])
bsh %
"""

When it starts up, Prompt instantiates the logic program from the
command line as 'lp', and a default prover which prints a
depth-first-search-style proof of a query (default maximum depth is
5). You can specify a different prover on the command line if you
wish.

For information on built-in commands and interpreter syntax, type
'help();':

"""
bsh % help();
This is a beanshell, a command-line interpreter for java. A full beanshell
manual is available at <http://www.beanshell.org/manual/contents.html>.

Type java statements and expressions at the prompt. Don't forget semicolons.

Type 'help();' for help, 'quit();' to quit; 'list();' for a variable listing.
'show();' will toggle automatic printing of the results of expressions.
    Otherwise you must use 'print( expr );' to see results.
'javap( x );' will list the fields and methods available on an object.
    Be warned; beanshell has trouble locating methods that are only
    defined on the superclass.
'[sol = ]run(prover,logicprogram,"functor(arg,arg,...,arg)")' will prove the
    associated state.
'pretty(sol)' will list solutions first, then intermediate states in
    descending weight order.
bsh %
"""


3. FILE FORMATS
===============

****** File format: *.rules

Example:

predict(X,Y) :- hasWord(X,W),isLabel(Y),related(W,Y) #r.
related(W,Y) :- # w(W,Y).

Grammar:

line= rhs ':-' lhs ('#' featureList)? '.'
rhs= goal
lhs=
  |= goal (',' goal)*
featureList=
  |= goal (',' goal)*
goal= functor
  |= functor '(' argList ')'
argList= constantArgList
  |= variableArgList
  |= constantArgList ',' variableArgList
constantArgList= constantArg (',' constantArg)*
variableArgList= variableArg (',' variableArg)*
constantArg= [a-z][a-zA-Z0-9]*
variableArg= [A-Z][a-zA-Z0-9]*
functor= [a-z][a-zA-Z0-9]*

****** File format: *.facts

Example:

isLabel(pos)
isLabel(neg)

Grammar:

line= goal

****** File format: *.graph

Example:

hasWord bk punk
hasWord bk queen
hasWord bk barbie
hasWord bk and
hasWord bk ken
hasWord rb a
hasWord rb little
hasWord rb red
hasWord rb bike
hasWord mv a
hasWord mv big
hasWord mv 7-seater
hasWord mv minivan
hasWord mv with
hasWord mv an
hasWord mv automatic
hasWord mv transmission
hasWord hs a
hasWord hs big
hasWord hs house
hasWord hs in
hasWord hs the
hasWord hs suburbs
hasWord hs with
hasWord hs crushing
hasWord hs mortgage

Grammar:

line= edge '\t' sourcenode '\t' destnode
edge= functor
sourcenode,destnode= constantArg

****** File format: *.data

Example:

predict(bk,Y) -predict(bk,neg) +predict(bk,pos)
predict(rb,Y) -predict(rb,neg) +predict(rb,pos)
predict(mv,Y) +predict(mv,neg) -predict(mv,pos)
predict(hs,Y) +predict(hs,neg) -predict(hs,pos)

Grammar:

line= query '\t' exampleList
query= goal
exampleList= example ('\t' example)*
example= positiveExample
  |= negativeExample
positiveExample= '+' goal
negativeExample= '-' goal
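
As a sanity check on hand-written *.data files, the grammar above can
be exercised with a few lines of Java. The following is a minimal
sketch, not part of the ProPPR package: the class DataLineSketch and
its nested types are hypothetical names chosen for illustration. It
assumes fields are separated by single tab characters, as the grammar
states, and it does not validate the goal syntax itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper (not a ProPPR class): splits one *.data line per the grammar above. */
public class DataLineSketch {

    /** One labeled example: a goal plus its '+' or '-' label. */
    static final class LabeledExample {
        final boolean positive;
        final String goal;

        LabeledExample(boolean positive, String goal) {
            this.positive = positive;
            this.goal = goal;
        }
    }

    /** A parsed line: the query goal followed by its labeled examples. */
    static final class DataLine {
        final String query;
        final List<LabeledExample> examples;

        DataLine(String query, List<LabeledExample> examples) {
            this.query = query;
            this.examples = Collections.unmodifiableList(examples);
        }
    }

    /** line= query '\t' exampleList ; example= '+' goal | '-' goal */
    static DataLine parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2 || fields[0].isEmpty()) {
            throw new IllegalArgumentException(
                    "expected: query TAB example (TAB example)*");
        }
        List<LabeledExample> examples = new ArrayList<>();
        for (int i = 1; i < fields.length; i++) {
            String field = fields[i];
            // Each example needs a sign character followed by a non-empty goal.
            boolean signed = field.length() > 1
                    && (field.charAt(0) == '+' || field.charAt(0) == '-');
            if (!signed) {
                throw new IllegalArgumentException(
                        "example must be '+' goal or '-' goal: " + field);
            }
            examples.add(new LabeledExample(field.charAt(0) == '+', field.substring(1)));
        }
        return new DataLine(fields[0], examples);
    }

    public static void main(String[] args) {
        // First line of the *.data example above, with explicit tabs.
        DataLine parsed = parse("predict(bk,Y)\t-predict(bk,neg)\t+predict(bk,pos)");
        System.out.println("query: " + parsed.query);
        for (LabeledExample e : parsed.examples) {
            System.out.println((e.positive ? "positive: " : "negative: ") + e.goal);
        }
    }
}
```

A line whose fields are separated by spaces instead of tabs, or whose
example lacks a leading '+' or '-', is rejected with an
IllegalArgumentException, which makes this a quick way to catch
formatting slips before grounding.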