Angel 3.0.1-hadoop3, hadoop3.1.1, angel-submit on yarn Failed for Hadoop NoClassDefFoundError #1199

Open
ifeela opened this issue Jan 12, 2022 · 5 comments

ifeela commented Jan 12, 2022

error log:
2022-01-12 11:01:53,433 FATAL [main] com.tencent.angel.master.AngelApplicationMaster: Error starting AppMaster
java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/lib/input/CombineTextInputFormat
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at com.tencent.angel.conf.AngelConf.(AngelConf.java:123)
at com.tencent.angel.master.AngelApplicationMaster$RunningAppContext.getRunningMode(AngelApplicationMaster.java:387)
at com.tencent.angel.master.AngelApplicationMaster.initAndStart(AngelApplicationMaster.java:779)
at com.tencent.angel.master.AngelApplicationMaster$1.run(AngelApplicationMaster.java:673)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at com.tencent.angel.master.AngelApplicationMaster.main(AngelApplicationMaster.java:671)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
... 20 more
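
To see whether any jar on a node actually ships the missing class, the jars can be scanned for the corresponding class-file entry. The sketch below is illustrative only: `class_to_entry` and `find_class_in_jars` are hypothetical helper names, and it assumes `unzip` is installed on the node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: turn a fully qualified class name into its jar entry path,
# e.g. org.foo.Bar -> org/foo/Bar.class
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Hypothetical helper: print every jar under <dir> that contains <class>.
# Usage: find_class_in_jars <dir> <fully.qualified.ClassName>
find_class_in_jars() {
  local dir="$1"
  local entry
  entry="$(class_to_entry "$2")"
  local jar
  find "$dir" -name '*.jar' -type f 2>/dev/null | sort | while read -r jar; do
    # unzip -l lists the archive entries without extracting them
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
      echo "$jar"
    fi
  done
}
```

For example, `find_class_in_jars /usr/hdp/3.1.4.0-315 org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat` would print each jar under that directory that contains the class. Any directory it reports would then need to be visible to the ApplicationMaster container.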

Hadoop environment setup I added to angel-submit:
if command -v hadoop >/dev/null 2>&1; then
  # Export the cluster's Hadoop classpath for the client JVM
  export HADOOP_CLASSPATH="$(hadoop classpath)"
else
  echo "hadoop command not found in path!"
fi
HADOOP_HOME="/usr/hdp/3.1.4.0-315/hadoop"
if [ "${HADOOP_HOME}" != "" ]; then
  echo "HADOOP_HOME is set"
  DEFAULT_LIBEXEC_DIR="${HADOOP_HOME}"/libexec
  HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
  # Source Hadoop's own environment setup
  . "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh"
fi

runtime log:
HADOOP_HOME is set
WARNING: DEFAULT_LIBEXEC_DIR ignored. It has been replaced by HADOOP_DEFAULT_LIBEXEC_DIR.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.1.4.0-315/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/root/angel-3.0.1-bin/lib/slf4j-log4j12-1.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
22/01/12 11:20:32 INFO utils.AngelRunJar: angelHomePath conf path=/root/angel-3.0.1-bin/bin/..//conf/angel-site.xml
22/01/12 11:20:32 INFO utils.AngelRunJar: load system config file success
22/01/12 11:20:32 INFO utils.AngelRunJar: for libJars..../root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,/root/angel-3.0.1-bin/bin/..//lib/jniloader-1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/native_system-java-1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/arpack_combined_all-0.1.jar,/root/angel-3.0.1-bin/bin/..//lib/all-1.1.2.pom,/root/angel-3.0.1-bin/bin/..//lib/core-1.1.2.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-armhf-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-i686-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-armhf-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-i686-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-x86_64-1.1-natives.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-annotations-2.6.5.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-core-2.7.7.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-core-asl-1.9.13.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-databind-2.7.7.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-jaxrs-1.9.13.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-mapper-asl-1.9.13.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-module-paranamer-2.6.5.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-module-scala_2.11-2.6.5.jar,/root/angel-3.0.1-bin/bin/..//lib/jackson-xc-1.9.13.jar,/root/angel-3.0.1-bin/bin/..//lib/json4s-ast_2.11-3.2.11.jar,/root/angel-3.0.1-bin/bin/..//lib/json4s-core_2.11-3.2.11.jar,/root/angel-3.0.1-bin/bin/..//lib/json4s-jackson_2.11-3.2.11.jar,/root/angel-3.0.1-bin/bin/..//lib/netty-all-4.1.17.Final.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-ps-mllib-3.0.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-ps-tools-3.0.1.jar,/root/angel-3.0.1-bin/bin/..//lib/scala-reflect-2.11.8.jar,/root/angel-3.0.1-bin/bin
/..//lib/memory-0.8.1.jar,/root/angel-3.0.1-bin/bin/..//lib/sketches-core-0.8.1.jar,/root/angel-3.0.1-bin/bin/..//lib/commons-pool-1.6.jar,/root/angel-3.0.1-bin/bin/..//lib/kryo-shaded-4.0.0.jar,/root/angel-3.0.1-bin/bin/..//lib/kryo-serializers-0.42.jar,/root/angel-3.0.1-bin/bin/..//lib/scala-library-2.11.8.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-ps-core-3.0.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-ps-psf-3.0.1.jar,/root/angel-3.0.1-bin/bin/..//lib/fastutil-7.1.0.jar,/root/angel-3.0.1-bin/bin/..//lib/sizeof-0.3.0.jar,/root/angel-3.0.1-bin/bin/..//lib/minlog-1.3.0.jar,/root/angel-3.0.1-bin/bin/..//lib/breeze_2.11-0.13.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,/root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,/root/angel-3.0.1-bin/bin/..//lib/commons-math-2.2.jar
22/01/12 11:20:32 INFO utils.AngelRunJar: jars loaded: file:///root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jniloader-1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/native_system-java-1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/arpack_combined_all-0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/all-1.1.2.pom,file:///root/angel-3.0.1-bin/bin/..//lib/core-1.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-armhf-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-i686-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-armhf-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-i686-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-x86_64-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-annotations-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-core-2.7.7.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-core-asl-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-databind-2.7.7.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-jaxrs-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-mapper-asl-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-module-paranamer-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-module-scala_2.11-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-xc-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-ast_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-core_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-jackson_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netty-all-4.1.17.Final.jar,file:///root/an
gel-3.0.1-bin/bin/..//lib/angel-ps-mllib-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-tools-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/scala-reflect-2.11.8.jar,file:///root/angel-3.0.1-bin/bin/..//lib/memory-0.8.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/sketches-core-0.8.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/commons-pool-1.6.jar,file:///root/angel-3.0.1-bin/bin/..//lib/kryo-shaded-4.0.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/kryo-serializers-0.42.jar,file:///root/angel-3.0.1-bin/bin/..//lib/scala-library-2.11.8.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-core-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-psf-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/fastutil-7.1.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/sizeof-0.3.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/minlog-1.3.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/breeze_2.11-0.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/commons-math-2.2.jar
22/01/12 11:20:32 INFO utils.AngelRunJar: angel python file: null
22/01/12 11:20:32 INFO utils.UGITools: UGI_PROPERTY_NAME is null
22/01/12 11:20:32 INFO utils.AngelRunJar: submitClass: com.tencent.angel.ml.core.graphsubmit.GraphRunner
22/01/12 11:20:33 INFO utils.UGITools: UGI_PROPERTY_NAME is null
22/01/12 11:20:33 INFO yarn.AngelYarnClient: userName:root,stagingDir:,stagingDir, conf:Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, /usr/hdp/3.1.4.0-315/hadoop/etc/hadoop/yarn-site.xml, /usr/hdp/3.1.4.0-315/hadoop/etc/hadoop/hdfs-site.xml, /root/angel-3.0.1-bin/conf/angel-site.xml
22/01/12 11:20:33 INFO client.RMProxy: Connecting to ResourceManager at slaves01/172.13.1.95:8050
22/01/12 11:20:34 INFO client.AngelClient: running mode = ANGEL_PS_WORKER
22/01/12 11:20:34 INFO utils.HdfsUtil: tmp output dir is hdfs://slaves01:8020/tmp/root/application_1641874296476_0015_a84d16c7-bddf-4ab9-905e-ff2eac84d361
22/01/12 11:20:34 INFO utils.HdfsUtil: tmp output dir is hdfs://slaves01:8020/tmp/root/application_1641874296476_0015_5cef412a-67a2-4cb2-bbc4-a684f3f0b280
22/01/12 11:20:34 INFO client.AngelClient: angel.tmp.output.path=hdfs://slaves01:8020/tmp/root/application_1641874296476_0015_a84d16c7-bddf-4ab9-905e-ff2eac84d361
22/01/12 11:20:34 INFO client.AngelClient: internal state file is hdfs://slaves01:8020/tmp/root/application_1641874296476_0015_5cef412a-67a2-4cb2-bbc4-a684f3f0b280/state
22/01/12 11:20:34 INFO yarn.AngelYarnClient: default FileSystem: hdfs://slaves01:8020
22/01/12 11:20:34 INFO yarn.AngelYarnClient: libjarsDir=/tmp/hadoop-yarn/root/.staging/application_1641874296476_0015/libjars
22/01/12 11:20:34 INFO yarn.AngelYarnClient: libjars=file:///root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jniloader-1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/native_system-java-1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/arpack_combined_all-0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/all-1.1.2.pom,file:///root/angel-3.0.1-bin/bin/..//lib/core-1.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-armhf-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-i686-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_ref-linux-x86_64-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-armhf-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-i686-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netlib-native_system-linux-x86_64-1.1-natives.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-annotations-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-core-2.7.7.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-core-asl-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-databind-2.7.7.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-jaxrs-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-mapper-asl-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-module-paranamer-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-module-scala_2.11-2.6.5.jar,file:///root/angel-3.0.1-bin/bin/..//lib/jackson-xc-1.9.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-ast_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-core_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/json4s-jackson_2.11-3.2.11.jar,file:///root/angel-3.0.1-bin/bin/..//lib/netty-all-4.1.17.Final.jar,file:///root/ange
l-3.0.1-bin/bin/..//lib/angel-ps-mllib-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-tools-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/scala-reflect-2.11.8.jar,file:///root/angel-3.0.1-bin/bin/..//lib/memory-0.8.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/sketches-core-0.8.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/commons-pool-1.6.jar,file:///root/angel-3.0.1-bin/bin/..//lib/kryo-shaded-4.0.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/kryo-serializers-0.42.jar,file:///root/angel-3.0.1-bin/bin/..//lib/scala-library-2.11.8.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-core-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-ps-psf-3.0.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/fastutil-7.1.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/sizeof-0.3.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/minlog-1.3.0.jar,file:///root/angel-3.0.1-bin/bin/..//lib/breeze_2.11-0.13.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-format-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-math-0.1.1.jar,file:///root/angel-3.0.1-bin/bin/..//lib/angel-mlcore-0.1.2.jar,file:///root/angel-3.0.1-bin/bin/..//lib/commons-math-2.2.jar
22/01/12 11:20:36 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.4.0-315/0/resource-types.xml
AppMaster capability = <memory:2048, vCores:1>
22/01/12 11:20:36 INFO yarn.AngelYarnClient: Command to launch container for ApplicationMaster is : $JAVA_HOME/bin/java -Dlog4j.configuration=log/angel.properties -Dlog4j.logger.com.tencent.ml=DEBUG -Dyarn.app.container.log.dir=<LOG_DIR> -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1536M -Xms1536M -XX:PermSize=100M -XX:MaxPermSize=200M -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintCommandLineFlags -XX:+PrintTenuringDistribution -XX:+PrintAdaptiveSizePolicy -Xloggc:<LOG_DIR>/gc.log com.tencent.angel.master.AngelApplicationMaster 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
22/01/12 11:20:36 INFO util.AngelApps: yarn.application.classpath=$HADOOP_CONF_DIR,/usr/hdp/3.1.4.0-315/hadoop/,/usr/hdp/3.1.4.0-315/hadoop/lib/,/usr/hdp/current/hadoop-hdfs-client/,/usr/hdp/current/hadoop-hdfs-client/lib/,/usr/hdp/current/hadoop-yarn-client/,/usr/hdp/current/hadoop-yarn-client/lib/
22/01/12 11:20:36 INFO util.AngelApps: mapreduce.application.classpath=$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp//hadoop/lib/hadoop-lzo-0.6.0..jar:/etc/hadoop/conf/secure
22/01/12 11:20:36 INFO yarn.AngelYarnClient: ApplicationSubmissionContext Queuename : default
22/01/12 11:20:37 INFO impl.YarnClientImpl: Submitted application application_1641874296476_0015
22/01/12 11:20:43 ERROR yarn.AngelYarnClient: submit application to yarn failed.
java.io.IOException: Failed to run job : Application application_1641874296476_0015 failed 2 times due to AM Container for appattempt_1641874296476_0015_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2022-01-12 11:20:42.801]Exception from container-launch.
Container id: container_1641874296476_0015_02_000001
Exit code: 1

ifeela commented Jan 12, 2022

I solved this problem by adding all the lib jars to the angel.job.libjars property in angel-site.xml,
but then I ran into a new problem.
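
Building that jar list by hand is error-prone, so it can be generated. The following is a minimal sketch, assuming angel.job.libjars accepts the same comma-separated form that the client log prints for libJars. `join_jars` is a hypothetical helper name, and the sketch deliberately skips non-jar files such as all-1.1.2.pom.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join every *.jar directly under <dir> into one
# comma-separated string, sorted for a stable result.
join_jars() {
  local dir="$1"
  find "$dir" -maxdepth 1 -name '*.jar' -type f | sort | paste -sd, -
}
```

Running `join_jars /root/angel-3.0.1-bin/lib` would print a single line that can be pasted into the property value in angel-site.xml.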

ifeela commented Jan 12, 2022

PSAttempt_0_0 failed due to: [2022-01-12 14:56:33.901]Exception from container-launch.
Container id: container_e01_1641970481943_0001_01_000002
Exit code: 1
[2022-01-12 14:56:33.906]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/root/appcache/application_1641970481943_0001/container_e01_1641970481943_0001_01_000002/launch_container.sh: line 39: $PWD:$PWD/:$PWD/all-1.1.2.pom:$HADOOP_CONF_DIR:/root/angel-3.0.1-bin/:/root/angel-3.0.1-bin/lib/:/usr/hdp/3.1.4.0-315/hadoop/:/usr/hdp/3.1.4.0-315/hadoop/lib/:/usr/hdp/current/hadoop-hdfs-client/:/usr/hdp/current/hadoop-hdfs-client/lib/:/usr/hdp/current/hadoop-yarn-client/:/usr/hdp/current/hadoop-yarn-client/lib/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
[2022-01-12 14:56:33.907]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/hadoop/yarn/local/usercache/root/appcache/application_1641970481943_0001/container_e01_1641970481943_0001_01_000002/launch_container.sh: line 39: $PWD:$PWD/:$PWD/all-1.1.2.pom:$HADOOP_CONF_DIR:/root/angel-3.0.1-bin/:/root/angel-3.0.1-bin/lib/:/usr/hdp/3.1.4.0-315/hadoop/:/usr/hdp/3.1.4.0-315/hadoop/lib/:/usr/hdp/current/hadoop-hdfs-client/:/usr/hdp/current/hadoop-hdfs-client/lib/:/usr/hdp/current/hadoop-yarn-client/:/usr/hdp/current/hadoop-yarn-client/lib/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/:$PWD/mr-framework/hadoop/share/hadoop/common/:$PWD/mr-framework/hadoop/share/hadoop/common/lib/:$PWD/mr-framework/hadoop/share/hadoop/yarn/:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
PSAttempt_0_1 failed due to: [2022-01-12 14:56:52.949]Exception from container-launch.
Container id: container_e01_1641970481943_0001_01_000003
Exit code: 1
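
The "bad substitution" message comes from the shell itself: launch_container.sh still contains the literal text ${hdp.version}, and a dot is not allowed in a shell variable name, so bash rejects the expansion. A small reproduction, independent of Angel and YARN (the path is only an example):

```shell
#!/usr/bin/env bash

# Single quotes keep the outer shell from touching the string; the inner
# bash then tries to expand ${hdp.version} and fails, because '.' is not
# a valid character in a shell variable name.
if err="$(bash -c 'echo "/usr/hdp/${hdp.version}/hadoop"' 2>&1)"; then
  status=0
else
  status=$?
fi

echo "exit status: $status"
echo "message: $err"
```

The placeholder therefore has to be resolved before the launch script is written, for instance by defining hdp.version as a Java system property on the submitting side.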

ifeela commented Jan 12, 2022

I solved this problem by adding -Dhdp.version=xxx to ${HADOOP_OPT} in angel-submit.
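
Rather than hard-coding the version, it can be derived from the install path. This is a rough sketch, assuming HADOOP_HOME follows the /usr/hdp/<version>/hadoop layout seen in the logs above. `hdp_version_from_home` and `HDP_VERSION` are hypothetical names that are not part of angel-submit.

```shell
#!/usr/bin/env bash

# Hypothetical helper: extract the HDP build string from a path such as
# /usr/hdp/3.1.4.0-315/hadoop
hdp_version_from_home() {
  local rest="${1#/usr/hdp/}"   # strip the assumed /usr/hdp/ prefix
  echo "${rest%%/*}"            # keep the first remaining path component
}

# Fall back to the path from this thread if HADOOP_HOME is not set.
HADOOP_HOME="${HADOOP_HOME:-/usr/hdp/3.1.4.0-315/hadoop}"
HDP_VERSION="$(hdp_version_from_home "$HADOOP_HOME")"

# Append the system property to whatever options are already present.
HADOOP_OPT="${HADOOP_OPT:-} -Dhdp.version=${HDP_VERSION}"
echo "$HADOOP_OPT"
```

If the install lives outside /usr/hdp, the prefix stripping does nothing useful and the version should be set explicitly instead.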

ifeela commented Jan 12, 2022

A new problem:

ifeela commented Jan 12, 2022

2022-01-12 16:32:12,467 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:13,471 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:14,476 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:15,484 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:15,733 INFO [RMCommunicator Allocator] com.tencent.angel.master.deploy.ContainerAllocator: ask request={AllocationRequestId: -1, Priority: 20, Capability: <memory:2048, vCores:1>, # Containers: 0, Location: *, Relax Locality: true, Execution Type Request: {Execution Type: GUARANTEED, Enforce Execution Type: false}, Node Label Expression: null}
2022-01-12 16:32:15,734 INFO [RMCommunicator Allocator] com.tencent.angel.master.deploy.ContainerAllocator: ask request={AllocationRequestId: -1, Priority: 20, Capability: <memory:2048, vCores:1>, # Containers: 0, Location: /default-rack, Relax Locality: true, Execution Type Request: {Execution Type: GUARANTEED, Enforce Execution Type: false}, Node Label Expression: null}
2022-01-12 16:32:15,734 INFO [RMCommunicator Allocator] com.tencent.angel.master.deploy.ContainerAllocator: ask request={AllocationRequestId: -1, Priority: 20, Capability: <memory:2048, vCores:1>, # Containers: 0, Location: slaves02, Relax Locality: true, Execution Type Request: {Execution Type: GUARANTEED, Enforce Execution Type: false}, Node Label Expression: null}
2022-01-12 16:32:15,734 INFO [RMCommunicator Allocator] com.tencent.angel.master.deploy.ContainerAllocator: ask request={AllocationRequestId: -1, Priority: 20, Capability: <memory:2048, vCores:1>, # Containers: 0, Location: slaves03, Relax Locality: true, Execution Type Request: {Execution Type: GUARANTEED, Enforce Execution Type: false}, Node Label Expression: null}
2022-01-12 16:32:15,734 INFO [RMCommunicator Allocator] com.tencent.angel.master.deploy.ContainerAllocator: ask request={AllocationRequestId: -1, Priority: 20, Capability: <memory:2048, vCores:1>, # Containers: 0, Location: slaves04, Relax Locality: true, Execution Type Request: {Execution Type: GUARANTEED, Enforce Execution Type: false}, Node Label Expression: null}
2022-01-12 16:32:15,844 INFO [ML-server-3-4] com.tencent.angel.master.MasterService: PSAgent register:psAgentId: 1
location {
ip: "172.13.1.96"
port: 27943
}

2022-01-12 16:32:16,488 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:16,776 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: keep alive clientId: 1

2022-01-12 16:32:17,490 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:18,492 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:19,494 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:20,497 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:21,499 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: loc == null || container == null
2022-01-12 16:32:21,779 INFO [ML-server-3-2] com.tencent.angel.master.MasterService: keep alive clientId: 1
