Spark error: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.158.1.112:8020/spark/01/out already exists
The error is the one in the title; the full console output follows:
[hdfs@cdh3 java-spark]$ spark-submit --class com.dprototype.cn.SparkMe --executor-memory 1G --total-executor-cores 4 /software/java-spark/SparkMe-1.0-SNAPSHOT.jar hdfs://10.158.1.112:8020/bigdata/01/shakespeare.txt hdfs://10.158.1.112:8020/spark/01/out
18/05/22 16:36:29 INFO spark.SparkContext: Running Spark version 1.6.0
18/05/22 16:36:31 INFO spark.SecurityManager: Changing view acls to: hdfs
18/05/22 16:36:31 INFO spark.SecurityManager: Changing modify acls to: hdfs
18/05/22 16:36:31 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
18/05/22 16:36:32 INFO util.Utils: Successfully started service 'sparkDriver' on port 38936.
18/05/22 16:36:32 INFO slf4j.Slf4jLogger: Slf4jLogger started
18/05/22 16:36:32 INFO Remoting: Starting remoting
18/05/22 16:36:33 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.158.1.116:38762]
18/05/22 16:36:33 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.158.1.116:38762]
18/05/22 16:36:33 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 38762.
18/05/22 16:36:33 INFO spark.SparkEnv: Registering MapOutputTracker
18/05/22 16:36:33 INFO spark.SparkEnv: Registering BlockManagerMaster
18/05/22 16:36:33 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-6b35ef71-fed4-4a3d-b583-b4dea07f47e4
18/05/22 16:36:33 INFO storage.MemoryStore: MemoryStore started with capacity 534.5 MB
18/05/22 16:36:33 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/05/22 16:36:34 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/05/22 16:36:34 INFO ui.SparkUI: Started SparkUI at http://10.158.1.116:4040
18/05/22 16:36:34 INFO spark.SparkContext: Added JAR file:/software/java-spark/SparkMe-1.0-SNAPSHOT.jar at spark://10.158.1.116:38936/jars/SparkMe-1.0-SNAPSHOT.jar with timestamp 1526978194585
18/05/22 16:36:34 INFO client.RMProxy: Connecting to ResourceManager at cdh1/10.158.1.112:8032
18/05/22 16:36:35 INFO yarn.Client: Requesting a new application from cluster with 4 NodeManagers
18/05/22 16:36:35 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (1536 MB per container)
18/05/22 16:36:35 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
18/05/22 16:36:35 INFO yarn.Client: Setting up container launch context for our AM
18/05/22 16:36:35 INFO yarn.Client: Setting up the launch environment for our AM container
18/05/22 16:36:35 INFO yarn.Client: Preparing resources for our AM container
18/05/22 16:36:36 INFO yarn.Client: Uploading resource file:/tmp/spark-3195aabe-e274-4c2c-9033-8c27f017ee75/__spark_conf__6353918070191800071.zip -> hdfs://cdh1:8020/user/hdfs/.sparkStaging/application_1526975233549_0003/__spark_conf__6353918070191800071.zip
18/05/22 16:36:37 INFO spark.SecurityManager: Changing view acls to: hdfs
18/05/22 16:36:37 INFO spark.SecurityManager: Changing modify acls to: hdfs
18/05/22 16:36:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs)
18/05/22 16:36:37 INFO yarn.Client: Submitting application 3 to ResourceManager
18/05/22 16:36:38 INFO impl.YarnClientImpl: Submitted application application_1526975233549_0003
18/05/22 16:36:39 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:39 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: root.users.hdfs
     start time: 1526978197750
     final status: UNDEFINED
     tracking URL: http://cdh1:8088/proxy/application_1526975233549_0003/
     user: hdfs
18/05/22 16:36:40 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:41 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:42 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:43 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:44 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:45 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:46 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:47 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:48 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:49 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:50 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:51 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:52 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:53 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(null)
18/05/22 16:36:53 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> cdh1, PROXY_URI_BASES -> http://cdh1:8088/proxy/application_1526975233549_0003), /proxy/application_1526975233549_0003
18/05/22 16:36:53 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
18/05/22 16:36:53 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:54 INFO yarn.Client: Application report for application_1526975233549_0003 (state: ACCEPTED)
18/05/22 16:36:55 INFO yarn.Client: Application report for application_1526975233549_0003 (state: RUNNING)
18/05/22 16:36:55 INFO yarn.Client:
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: 10.158.1.122
     ApplicationMaster RPC port: 0
     queue: root.users.hdfs
     start time: 1526978197750
     final status: UNDEFINED
     tracking URL: http://cdh1:8088/proxy/application_1526975233549_0003/
     user: hdfs
18/05/22 16:36:55 INFO cluster.YarnClientSchedulerBackend: Application application_1526975233549_0003 has started running.
18/05/22 16:36:55 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41426.
18/05/22 16:36:55 INFO netty.NettyBlockTransferService: Server created on 41426
18/05/22 16:36:55 INFO storage.BlockManager: external shuffle service port = 7337
18/05/22 16:36:55 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/05/22 16:36:55 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.158.1.116:41426 with 534.5 MB RAM, BlockManagerId(driver, 10.158.1.116, 41426)
18/05/22 16:36:55 INFO storage.BlockManagerMaster: Registered BlockManager
18/05/22 16:36:55 INFO scheduler.EventLoggingListener: Logging events to hdfs://cdh1:8020/user/spark/applicationHistory/application_1526975233549_0003
18/05/22 16:36:55 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener
18/05/22 16:36:55 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/05/22 16:36:57 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 213.4 KB, free 534.3 MB)
18/05/22 16:36:57 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 25.9 KB, free 534.3 MB)
18/05/22 16:36:57 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.158.1.116:41426 (size: 25.9 KB, free: 534.5 MB)
18/05/22 16:36:57 INFO spark.SparkContext: Created broadcast 0 from textFile at SparkMe.scala:9
18/05/22 16:36:57 INFO mapred.FileInputFormat: Total input paths to process : 1
18/05/22 16:36:58 INFO spark.SparkContext: Starting job: sortBy at SparkMe.scala:9
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Registering RDD 3 (map at SparkMe.scala:9)
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Got job 0 (sortBy at SparkMe.scala:9) with 2 output partitions
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (sortBy at SparkMe.scala:9)
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0)
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0)
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkMe.scala:9), which has no missing parents
18/05/22 16:36:58 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 4.2 KB, free 534.3 MB)
18/05/22 16:36:58 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 534.3 MB)
18/05/22 16:36:58 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.158.1.116:41426 (size: 2.3 KB, free: 534.5 MB)
18/05/22 16:36:58 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1004
18/05/22 16:36:58 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[3] at map at SparkMe.scala:9) (first 15 tasks are for partitions Vector(0, 1))
18/05/22 16:36:58 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/05/22 16:36:59 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1)
18/05/22 16:37:00 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
18/05/22 16:37:11 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (cdh3:58484) with ID 1
18/05/22 16:37:11 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, cdh3, executor 1, partition 0, RACK_LOCAL, 2208 bytes)
18/05/22 16:37:11 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
18/05/22 16:37:11 INFO storage.BlockManagerMasterEndpoint: Registering block manager cdh3:40403 with 534.5 MB RAM, BlockManagerId(1, cdh3, 40403)
18/05/22 16:37:14 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on cdh3:40403 (size: 2.3 KB, free: 534.5 MB)
18/05/22 16:37:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on cdh3:40403 (size: 25.9 KB, free: 534.5 MB)
18/05/22 16:37:16 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (cdh2:39944) with ID 2
18/05/22 16:37:16 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, cdh2, executor 2, partition 1, NODE_LOCAL, 2208 bytes)
18/05/22 16:37:16 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
18/05/22 16:37:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager cdh2:34455 with 534.5 MB RAM, BlockManagerId(2, cdh2, 34455)
18/05/22 16:37:19 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on cdh2:34455 (size: 2.3 KB, free: 534.5 MB)
18/05/22 16:37:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on cdh2:34455 (size: 25.9 KB, free: 534.5 MB)
18/05/22 16:37:21 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 10022 ms on cdh3 (executor 1) (1/2)
18/05/22 16:37:26 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (map at SparkMe.scala:9) finished in 28.178 s
18/05/22 16:37:26 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/05/22 16:37:26 INFO scheduler.DAGScheduler: running: Set()
18/05/22 16:37:26 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1)
18/05/22 16:37:26 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 10319 ms on cdh2 (executor 2) (2/2)
18/05/22 16:37:26 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/05/22 16:37:26 INFO scheduler.DAGScheduler: failed: Set()
18/05/22 16:37:26 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[7] at sortBy at SparkMe.scala:9), which has no missing parents
18/05/22 16:37:26 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.6 KB, free 534.3 MB)
18/05/22 16:37:26 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.0 KB, free 534.3 MB)
18/05/22 16:37:26 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.158.1.116:41426 (size: 2.0 KB, free: 534.5 MB)
18/05/22 16:37:26 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1004
18/05/22 16:37:26 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[7] at sortBy at SparkMe.scala:9) (first 15 tasks are for partitions Vector(0, 1))
18/05/22 16:37:26 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
18/05/22 16:37:26 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, cdh2, executor 2, partition 0, NODE_LOCAL, 1960 bytes)
18/05/22 16:37:26 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, cdh3, executor 1, partition 1, NODE_LOCAL, 1960 bytes)
18/05/22 16:37:26 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on cdh2:34455 (size: 2.0 KB, free: 534.5 MB)
18/05/22 16:37:26 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on cdh3:40403 (size: 2.0 KB, free: 534.5 MB)
18/05/22 16:37:26 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to cdh2:39944
18/05/22 16:37:26 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 148 bytes
18/05/22 16:37:27 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to cdh3:58484
18/05/22 16:37:28 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 1738 ms on cdh2 (executor 2) (1/2)
18/05/22 16:37:29 INFO scheduler.DAGScheduler: ResultStage 1 (sortBy at SparkMe.scala:9) finished in 2.413 s
18/05/22 16:37:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 2411 ms on cdh3 (executor 1) (2/2)
18/05/22 16:37:29 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/05/22 16:37:29 INFO scheduler.DAGScheduler: Job 0 finished: sortBy at SparkMe.scala:9, took 31.093104 s
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://10.158.1.112:8020/spark/01/out already exists
    at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:131)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1177)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1.apply(PairRDDFunctions.scala:1154)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:1154)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply$mcV$sp(PairRDDFunctions.scala:1060)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$4.apply(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:1026)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply$mcV$sp(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopFile$1.apply(PairRDDFunctions.scala:952)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:951)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply$mcV$sp(RDD.scala:1457)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
    at org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1.apply(RDD.scala:1436)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1436)
    at com.dprototype.cn.SparkMe$.main(SparkMe.scala:9)
    at com.dprototype.cn.SparkMe.main(SparkMe.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/05/22 16:37:29 INFO spark.SparkContext: Invoking stop() from shutdown hook
18/05/22 16:37:29 INFO ui.SparkUI: Stopped Spark web UI at http://10.158.1.116:4040
18/05/22 16:37:29 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread
18/05/22 16:37:29 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors
18/05/22 16:37:29 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down
18/05/22 16:37:29 INFO cluster.YarnClientSchedulerBackend: Stopped
18/05/22 16:37:29 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/05/22 16:37:29 INFO storage.MemoryStore: MemoryStore cleared
18/05/22 16:37:29 INFO storage.BlockManager: BlockManager stopped
18/05/22 16:37:29 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
18/05/22 16:37:29 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/05/22 16:37:29 INFO spark.SparkContext: Successfully stopped SparkContext
18/05/22 16:37:29 INFO util.ShutdownHookManager: Shutdown hook called
18/05/22 16:37:29 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-3195aabe-e274-4c2c-9033-8c27f017ee75
[hdfs@cdh3 java-spark]$
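The stack trace shows the job fails inside saveAsTextFile, called from SparkMe.scala:9. The original source is not shown in this post, but based on the operations the log reports (textFile, map, sortBy, saveAsTextFile), a minimal word-count job of the same shape might look roughly like the following sketch. This is a hypothetical reconstruction for orientation only, not the actual SparkMe code:

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical reconstruction: a word count with the same pipeline shape
    // as the log (textFile -> map -> sortBy -> saveAsTextFile).
    object SparkMe {
      def main(args: Array[String]): Unit = {
        val Array(input, output) = args                 // input text file, output directory (both HDFS URIs)
        val sc = new SparkContext(new SparkConf().setAppName("SparkMe"))
        sc.textFile(input)
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .sortBy(_._2, ascending = false)
          .saveAsTextFile(output)                       // throws FileAlreadyExistsException if `output` exists
        sc.stop()
      }
    }

Whatever the real code looks like, the relevant point is the same: saveAsTextFile refuses to write into a directory that already exists.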
The cause: when the jar is run, the output directory must not already exist in HDFS, because the job creates it itself. Deleting the existing directory before re-running is enough, for example with: hdfs dfs -rm -r hdfs://10.158.1.112:8020/spark/01/out. Alternatively, the driver can remove the directory programmatically before writing, as sketched below.
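A minimal sketch of the programmatic variant, using the Hadoop FileSystem API that is already on the classpath of any Spark application. The helper name and object are mine, not part of the original code; this assumes the driver has permission to delete the path:

    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext

    // Hypothetical helper: delete the output directory (if it exists) before
    // calling saveAsTextFile, so re-running the job does not fail with
    // FileAlreadyExistsException.
    object HdfsCleanup {
      def deleteIfExists(sc: SparkContext, dir: String): Unit = {
        val fs   = FileSystem.get(new URI(dir), sc.hadoopConfiguration)
        val path = new Path(dir)
        if (fs.exists(path)) fs.delete(path, true)      // true = delete recursively
      }
    }

    // Usage in the driver, before writing:
    //   HdfsCleanup.deleteIfExists(sc, "hdfs://10.158.1.112:8020/spark/01/out")
    //   rdd.saveAsTextFile("hdfs://10.158.1.112:8020/spark/01/out")

Another common choice is to write each run to a fresh, timestamped output directory instead of deleting old results.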
Done.