Yarn web UI is not running
I'm working with Hadoop 2.9 and a very minimal configuration. In particular, I didn't set any ports; everything runs on the defaults.
When I start Hadoop with start-all.sh, the YARN ResourceManager and NodeManager are running, according to the output of ps -ef | grep yarn.
Yet the YARN web UI doesn't seem to be running: netstat -l shows nothing listening on port 8088. I haven't figured out how it is supposed to start. Is there a configuration parameter that enables it?
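For reference, my understanding is that the web UI is served by the ResourceManager process itself; there is no separate daemon to start. A minimal yarn-site.xml sketch that pins the bind address explicitly (the hostname and bind address below are only examples) would be:

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>localhost</value>
  </property>
  <property>
    <!-- the default is ${yarn.resourcemanager.hostname}:8088 -->
    <name>yarn.resourcemanager.webapp.address</name>
    <value>0.0.0.0:8088</value>
  </property>
</configuration>

If the port still isn't listed by netstat -tlnp after restarting YARN, the ResourceManager log under $HADOOP_HOME/logs should show why the embedded web server didn't come up.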
See also questions close to this topic
-
How to synchronise RDBMS data with HDFS data
I have an Oracle database comprising 300 tables, and all types of DML operations (insert/update/delete) are performed on these tables. I have already moved my present data from the RDBMS to HDFS using Sqoop. Now I want to synchronise data with HDFS in real time whenever any DML operation is performed. Can I use Kafka for this purpose, and will it support update and delete operations?
-
Java Mapper: finding the number of transactions that include both item IDs 21 and 27
I would like to create a map function for a retail dataset, with the input key being a long integer offset and the input value a line of text. The output key of the map is the text Both_21_27 and the output value is the constant integer 1.
In the map, two boolean variables item_21 and item_27 should be created and initialized to false. After converting the value to a string, a StringTokenizer is used to split it into tokens.
Each token then has to be checked to see whether it matches 21 or 27; if there is a match, the corresponding boolean variable is set to true. A switch statement can be used for this check.
After going through all the tokens, both boolean variables need to be checked. If both are false, or one is true and one is false, a return statement should be used to skip the transaction and move on to the next one.
A sample retail dataset is shown below:
2 7 15 21 32 41 5 14 19 21 25 27 45 57 62 75 80 1 3 7 15 19 21 26 27 35 44 54 2 9 16 24 35 41 49 57 68 72 88 4 23 31 33 42 45 67 73 92 9 12 18 21 22 24 27 43 74 15 19 45 47 53 58 64 79 83 94 99 107 3 7 15 17 21 23 26 27 33 42 44 47 49 55 62 77 82
Here is what I tried so far:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RetailMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Constant output key and value for transactions that contain both items.
    private final Text Both_21_27 = new Text("Both_21_27");
    private final static IntWritable one = new IntWritable(1);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        boolean item_21 = false;
        boolean item_27 = false;
        // Split the transaction line into individual item IDs.
        StringTokenizer items = new StringTokenizer(value.toString());
        while (items.hasMoreTokens()) {
            switch (items.nextToken()) {
                case "21": item_21 = true; break;
                case "27": item_27 = true; break;
            }
        }
        // Skip the transaction unless it contains both item 21 and item 27.
        if (item_21 && item_27) {
            context.write(Both_21_27, one);
        }
    }
}
I am stuck on this map function. Any help, advice, or suggestions?
-
Unable to print PARQUET in SPARK SCALA
Folks, I am trying to print a simple Parquet file with the two different approaches below, and I get the same error from both.
Approach 1)
val sparkConf = new SparkConf().setAppName("Json Test").setMaster("local[*]")
val sc = new SparkContext(sparkConf)
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val df = sqlContext.read.parquet(args(0))  // df is perfect and I can view the fields
df.show()                                  // throws the error below
Approach 2 )
val spark = SparkSession.builder().appName("SomeAppName").config("spark.master", "local").getOrCreate()
import spark.implicits._
val ds = spark.read.format("com.databricks.spark.parquet").parquet(args(0))
ds.show()  // throws the same error below
I have searched around a lot and tried to find a solution. Can someone please help point me to the issue? A second pair of eyes sometimes catches the error immediately.
Exception:-
java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78) at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 18/04/25 23:35:22 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78) at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 18/04/25 23:35:22 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job 18/04/25 23:35:22 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 18/04/25 23:35:22 INFO TaskSchedulerImpl: Cancelling stage 1 18/04/25 23:35:22 INFO DAGScheduler: ResultStage 1 (show at allocationsReading.scala:25) failed in 0.038 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78) at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: 18/04/25 23:35:22 INFO DAGScheduler: Job 1 failed: show at allocationsReading.scala:25, took 0.061315 s Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78) at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814) at scala.Option.foreach(Option.scala:257) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at 
org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062) at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:333) at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38) at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2371) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2765) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2370) at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2377) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2113) at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2112) at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2795) at org.apache.spark.sql.Dataset.head(Dataset.scala:2112) at org.apache.spark.sql.Dataset.take(Dataset.scala:2327) at org.apache.spark.sql.Dataset.showString(Dataset.scala:248) at org.apache.spark.sql.Dataset.show(Dataset.scala:636) at org.apache.spark.sql.Dataset.show(Dataset.scala:595) at org.apache.spark.sql.Dataset.show(Dataset.scala:604) at com.rbccm.Allocations.allocationsReading$.main(allocationsReading.scala:25) at com.rbccm.Allocations.allocationsReading.main(allocationsReading.scala) Caused by: java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback()Lscala/Option; at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.<init>(FileScanRDD.scala:78) at org.apache.spark.sql.execution.datasources.FileScanRDD.compute(FileScanRDD.scala:71) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 18/04/25 23:35:22 INFO SparkContext: Invoking stop() from shutdown hook 18/04/25 23:35:22 INFO SparkUI: Stopped Spark web UI at http://10.209.231.197:4040
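A NoSuchMethodError on a Spark-internal method like SparkHadoopUtil.getFSBytesReadOnThreadCallback usually means the spark-core and spark-sql artifacts on the classpath come from different Spark versions. A hedged build sketch that keeps them aligned (the version number is only an example):

// build.sbt -- keep all Spark modules on one version
val sparkVersion = "2.2.0"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql"  % sparkVersion % "provided"
)

The same idea applies to a Maven build: the spark-core and spark-sql versions, and the Spark version used by spark-submit at runtime, should all match.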
-
webpack build failing: apparent dependency mismatches, how to resolve?
I'm converting a largish project to webpack and am encountering a build error (and in fact different errors depending on how I attempt to resolve it). I'm not super skilled with webpack yet and would love some help with what the typical solution to such problems is.
There seems to be a conflict between some dependencies:
"@babel/core": "^7.0.0-beta.42", "@babel/preset-env": "^7.0.0-beta.42", "backbone.radio": "2.0.0",
Webpack complains:
Cannot find module 'babel-preset-es2015'
It looks like backbone.radio has a .babelrc file with "presets": ["es2015"], which Babel seems to be trying to honor even though backbone.radio is a dependency; but it lists babel-preset-env as a devDependency, so the preset isn't found. OK, so I try to install babel-preset-es2015@6.3.13 (the same one backbone.radio referenced) explicitly as a top-level dependency so it can be found. Then it is found, but webpack (or really Babel) complains:
Error: Plugin/Preset files are not allowed to export objects, only functions.
Fine, so I attempt to install a newer version of the preset that's compatible with Babel 7 and, hopefully, with backbone.radio. It seems the name has changed, so I install @babel/preset-es2015@7.0.0-beta.42. Of course the name doesn't match, so I add a webpack alias, but that doesn't work and the preset still can't be found; I suppose Babel resolves presets on its own rather than through webpack's resolver.
So now I'm stuck. What do people usually do to resolve these sorts of problems? Other potential ideas, which I haven't determined are possible:
- tell babel (somehow?) to alias babel-preset-es2015 to @babel/preset-es2015?
- configure yarn (somehow?) to omit or remove backbone.radio's .babelrc file (the problem goes away entirely if that file is gone, but of course I can't keep deleting it from node_modules by hand)
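One workaround that often comes up, sketched here without any guarantee that it fits this exact setup, is to keep Babel away from node_modules entirely (so backbone.radio's .babelrc is never consulted) and to tell babel-loader not to pick up .babelrc files at all:

// webpack.config.js (sketch; paths and preset choice are illustrative)
module.exports = {
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules/,      // don't run Babel over dependencies at all
        use: {
          loader: 'babel-loader',
          options: {
            babelrc: false,           // ignore any .babelrc encountered
            presets: ['@babel/preset-env']
          }
        }
      }
    ]
  }
};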
-
Yarn installs latest version unwanted
In my package.json I have the following:
"@types/webpack": "3.8.3",
"@types/webpack-notifier": "1.5.1"
webpack-notifier has the following in its package.json:
"dependencies": {
  "@types/webpack": "*"
},
When I run npm install, it installs @types/webpack 3.8.3 overall and no other versions.
But when I use Yarn, it installs @types/webpack@3.8.3 in the root and @types/webpack@4.1.4 in webpack-notifier's node_modules, which breaks my application since I'm using webpack 3.
How can I prevent this?
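One thing worth trying, sketched under the assumption that you are on a Yarn version with selective dependency resolutions, is to force every copy of @types/webpack to the version you want via a resolutions block in the root package.json:

"resolutions": {
  "@types/webpack": "3.8.3"
}

After adding it, re-run yarn install (removing node_modules first if needed) so the forced resolution takes effect.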
-
yarn run script with parameters
How do I pass a parameter? When I run "yarn generate" it makes both a "-p" directory and a "test" directory, yet it works fine when I run "mkdir -p test" directly in bash. I tried ["-p"] as well, but then it only creates that one directory.
"scripts": { "generate": "mkdir -p test" }
-
Remove item in a hierarchical structure in map reduce
I have a data set with below structure:
key1, {id1, id2, id3}
key2, {id2, id3, id4, id5}
key3, {id4}
key4, {id5, id6, id7}
Now I have another data set containing ids that I need to remove from the structure above. Suppose that data set contains id2 and id5; I want to get the result below after a MapReduce job:
key1, {id1, id3}
key2, {id3, id4}
key3, {id4}
key4, {id6, id7}
Both data sets are too big to fit into memory. Is there any way to get this done in one map reduce job?
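This shape of problem is commonly handled with a reduce-side join keyed on the id, followed by a second, trivial pass that regroups the surviving (key, id) pairs by key, so a single job is usually not enough. A hedged sketch of the join reducer, with illustrative class and marker names, assuming one mapper inverts the first data set into (id, originalKey) pairs and another mapper emits (id, "!") for every id in the removal set:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FilterIdsReducer extends Reducer<Text, Text, Text, Text> {

    @Override
    protected void reduce(Text id, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        List<String> keys = new ArrayList<>();
        boolean remove = false;
        for (Text v : values) {
            String s = v.toString();
            if ("!".equals(s)) {
                remove = true;          // this id appears in the removal data set
            } else {
                keys.add(s);            // an original key that contains this id
            }
        }
        if (!remove) {
            for (String k : keys) {
                context.write(new Text(k), id);   // survives; regrouped by key in the next pass
            }
        }
    }
}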
-
I am running my first MapReduce program and there seems to be some permission issue with the output file
[mykishore231087@ip-172-31-20-58 ~]$ hadoop jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-streaming.jar \
  -input mayank/data/upx/wc_data.txt \
  -output /mayank/output/res.txt \
  -file /home/mykishore231087/d/wordcount_mapper.py \
  -file /home/mykishore231087/d/wordcount_reducer.py \
  -mapper "/home/mykishore231087/d/wordcount_mapper.py" \
  -reducer "/home/mykishore231087/d/wordcount_reducer.py"
WARNING: Use "yarn jar" to launch YARN applications.
18/04/25 21:28:18 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/home/mykishore231087/d/wordcount_mapper.py, /home/mykishore231087/d/wordcount_reducer.py] [/usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-streaming-2.7.1.2.3.4.0-3485.jar] /tmp/streamjob8387910308600265451.jar tmpDir=null
18/04/25 21:28:20 INFO impl.TimelineClientImpl: Timeline service address: http://ip-172-31-13-154.ec2.internal:8188/ws/v1/timeline/
18/04/25 21:28:20 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-53-48.ec2.internal/172.31.53.48:8050
18/04/25 21:28:21 INFO impl.TimelineClientImpl: Timeline service address: http://ip-172-31-13-154.ec2.internal:8188/ws/v1/timeline/
18/04/25 21:28:21 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-53-48.ec2.internal/172.31.53.48:8050
18/04/25 21:28:22 INFO mapred.FileInputFormat: Total input paths to process : 1
18/04/25 21:28:22 INFO mapreduce.JobSubmitter: number of splits:2
18/04/25 21:28:23 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1524162637175_1044
18/04/25 21:28:23 INFO impl.YarnClientImpl: Submitted application application_1524162637175_1044
18/04/25 21:28:23 INFO mapreduce.Job: The url to track the job: http://a.cloudxlab.com:8088/proxy/application_1524162637175_1044/
18/04/25 21:28:23 INFO mapreduce.Job: Running job: job_1524162637175_1044
18/04/25 21:28:32 INFO mapreduce.Job: Job job_1524162637175_1044 running in uber mode : false
18/04/25 21:28:32 INFO mapreduce.Job: map 0% reduce 0%
18/04/25 21:28:33 INFO mapreduce.Job: Job job_1524162637175_1044 failed with state FAILED due to: Job setup failed : org.apache.hadoop.security.AccessControlException: Permission denied: user=mykishore231087, access=WRITE, inode="/mayank/output/res.txt/_temporary/1":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:422) at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106) at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3020) at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2988) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1057) at org.apache.hadoop.hdfs.DistributedFileSystem$21.doCall(DistributedFileSystem.java:1053) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1053) at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1046) at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1877) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:343) at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131) at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:265) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobSetup(CommitterEventHandler.java:254) at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:234) at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=mykishore231087, access=WRITE, inode=" /mayank/output/res.txt/_temporary/1":hdfs:hdfs:drwxr-xr-x at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213) at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1771) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1755) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1738) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3905) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1048) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:622) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145) at org.apache.hadoop.ipc.Client.call(Client.java:1427) at org.apache.hadoop.ipc.Client.call(Client.java:1358) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy9.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:558) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy10.mkdirs(Unknown Source) at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3018) ... 15 more 18/04/25 21:28:33 INFO mapreduce.Job: Counters: 2 Job Counters Total time spent by all maps in occupied slots (ms)=0 Total time spent by all reduces in occupied slots (ms)=0 18/04/25 21:28:33 ERROR streaming.StreamJob: Job not successful! Streaming Command Failed!
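The key line is the AccessControlException: the job is trying to create output under /mayank/output/res.txt, whose parent directory is owned by hdfs:hdfs with mode drwxr-xr-x, so user mykishore231087 has no write permission there. Two common ways around this, sketched with illustrative paths, are to write the output under your own HDFS home directory (relative output paths resolve to /user/<username>), or to have an HDFS superuser hand the target directory over to your user:

# Option 1: relative output path, which lands under /user/mykishore231087
hadoop jar /usr/hdp/2.3.4.0-3485/hadoop-mapreduce/hadoop-streaming.jar \
  -input mayank/data/upx/wc_data.txt \
  -output mayank/output/res \
  ...   # remaining -file/-mapper/-reducer options unchanged

# Option 2: as the hdfs superuser, change ownership of the target directory
sudo -u hdfs hdfs dfs -chown -R mykishore231087 /mayank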
-
How to store web data crawled by Nutch into HDFS
I have installed Hadoop 2.x and Nutch 1.x. Can anyone guide me on how to store Nutch's crawled web data in HDFS, for example by pointing me to documentation or a link covering that configuration? Thank you all.
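As a starting point, and only as a sketch with illustrative paths: a crawl directory produced by a local Nutch run can be copied into HDFS with the ordinary filesystem shell, while running Nutch from its runtime/deploy directory on top of the Hadoop cluster writes the crawl data (crawldb, linkdb, segments) to HDFS paths directly.

# copy a locally generated crawl directory into HDFS ($NUTCH_HOME is assumed)
hdfs dfs -mkdir -p /user/$USER/nutch
hdfs dfs -put $NUTCH_HOME/runtime/local/crawl /user/$USER/nutch/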