tez/tez-tools/tez-tfile-parser at master · 15801024150/tez

README.txt

Downloading logs via "yarn logs -applicationId <appId> | grep something" can be time consuming, and mining large volumes of logs on a single node is slow.
This is a simple Pig loader that parses TFiles and emits them line by line, as tuples of (machine, key, line), so that logs can be processed in a distributed fashion.

Build/Install:
==============
1. "mvn clean package" should create "tfile-parser-x.y.z-SNAPSHOT.jar" in the ./target directory

Running pig with tez:
=====================
1. Install pig
2. $PIG_HOME/bin/pig -x tez (to open grunt shell)

Sample pig script:
==================
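/* Turn off Pig's own split combination and let Tez group the input splits instead */
set pig.splitCombination false;
/* Min and max group size are both 52428800 bytes (50 MB) */
set tez.grouping.min-size 52428800;
set tez.grouping.max-size 52428800;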

/* Register all tez jars. Replace $TEZ_HOME, $TEZ_TFILE_DIR with absolute path */
register '$TEZ_HOME/*.jar';
register '$TEZ_TFILE_DIR/tfile-parser-1.0-SNAPSHOT.jar';
raw = load '/app-logs/root/logs/application_1411511669099_0769/*' using org.apache.tez.tools.TFileLoader() as (machine:chararray, key:chararray, line:chararray);
filterByLine = FILTER raw BY (key MATCHES '.*container_1411511669099_0769_01_000001.*')
                   AND (line MATCHES '.*Shuffle.*');
dump filterByLine;
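
Note that MATCHES has to match the whole string, which is why both patterns above start and end with '.*'. Below is a minimal Python sketch of the same filter, only to illustrate those semantics; it is not part of this tool. The sample rows are made up, the key format the loader really produces is an assumption, and filter_by_line is a hypothetical helper name.

```python
import re

# Hypothetical rows in the (machine, key, line) shape; all values are invented.
SAMPLE_ROWS = [
    ("node1", "container_1411511669099_0769_01_000001", "INFO Shuffle phase started"),
    ("node1", "container_1411511669099_0769_01_000001", "INFO Task finished"),
    ("node2", "container_1411511669099_0769_01_000002", "INFO Shuffle phase started"),
]


def filter_by_line(rows, key_pattern, line_pattern):
    """Keep rows whose key and line both fully match their patterns."""
    key_re = re.compile(key_pattern)
    line_re = re.compile(line_pattern)
    # fullmatch mirrors the whole-string behaviour of MATCHES
    return [row for row in rows
            if key_re.fullmatch(row[1]) and line_re.fullmatch(row[2])]


if __name__ == "__main__":
    matches = filter_by_line(
        SAMPLE_ROWS,
        r".*container_1411511669099_0769_01_000001.*",
        r".*Shuffle.*",
    )
    for row in matches:
        print(row)
```

Dropping the surrounding '.*' (for example using just 'Shuffle' as the line pattern) would match nothing here, since no line consists of that word alone.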
