HDFS datanodes crash right after startup
Date: 2021-04-26

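Problem description:

After starting HDFS with start-dfs.sh, all of the datanodes die within a few seconds, while the namenode keeps running normally.

Check the datanode log: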

2017-08-06 10:46:00,800 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /var/data/dfs/data/in_use.lock acquired by nodename 2649@slave1
2017-08-06 10:46:00,803 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/var/data/dfs/data/
java.io.IOException: Incompatible clusterIDs in /var/data/dfs/data: namenode clusterID = CID-19750292-d801-47ee-be97-528c75ae79dc; datanode clusterID = CID-ba6a6bd1-8dc5-47c4-b078-4a1e506ae304
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:775)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:300)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:416)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:395)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:573)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1362)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1327)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:317)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:223)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:802)
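        at java.lang.Thread.run(Thread.java:745)

Cause: each time the namenode is formatted, its clusterID is regenerated. After the namenode has been formatted several times, the namenode clusterID no longer matches the clusterID stored on the datanodes.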
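The two mismatched IDs can be pulled straight out of the datanode log. The following is a minimal sketch, assuming the log contains an "Incompatible clusterIDs" line in the same format as the one shown above; the function name `show_cluster_id_mismatch` is hypothetical, and the log file location depends on your installation.

```shell
# Hypothetical helper: print the namenode and datanode clusterIDs taken from
# the first "Incompatible clusterIDs" line found in a datanode log file.
show_cluster_id_mismatch() {
    grep -m 1 'Incompatible clusterIDs' "$1" |
        sed -E 's/.*namenode clusterID = ([^;]+); datanode clusterID = ([^ ]+).*/namenode=\1 datanode=\2/'
}

# Example usage (the log path is an assumption, adjust it to your setup):
# show_cluster_id_mismatch /path/to/hadoop-datanode.log
```

If the function prints nothing, the log has no such line and the crash probably has a different cause.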

Fix: copy the namenode's clusterID into the clusterID field of the datanode's VERSION file.
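The edit can be done by hand in a text editor, or scripted. Below is a minimal sketch, assuming the datanode's VERSION file contains a single `clusterID=` line as in the listings in this post. The function name `set_cluster_id` is hypothetical, and it is safest to run it while the datanode process is stopped.

```shell
# Hypothetical helper: overwrite the clusterID line in a VERSION file.
# The original file is kept next to it with a .bak suffix.
set_cluster_id() {
    version_file=$1
    new_id=$2
    cp "$version_file" "$version_file.bak"
    # Rewrite from the backup so the original content is never lost.
    sed "s/^clusterID=.*/clusterID=$new_id/" "$version_file.bak" > "$version_file"
}

# Example usage on a datanode, using the paths and the namenode ID from this post:
# set_cluster_id /var/data/dfs/data/current/VERSION CID-555d9ec2-dd1a-4f0b-a5c0-c00cdd6d2758
```

Only the clusterID line is changed; storageID, datanodeUuid and the other fields are left untouched.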

After the change, check the namenode's clusterID:

[root@master current]# cat /var/data/dfs/name/current/VERSION
#Mon Aug 07 00:42:04 EDT 2017
namespaceID=599308608
clusterID=CID-555d9ec2-dd1a-4f0b-a5c0-c00cdd6d2758
cTime=0
storageType=NAME_NODE
blockpoolID=BP-818173535-192.168.56.100-1502080924621
layoutVersion=-63

Check the datanode's clusterID:

[root@slave1 current]# cat /var/data/dfs/data/current/VERSION
#Sun Aug 06 13:20:17 EDT 2017
storageID=DS-c2a29878-83ef-4f33-b936-afda5f6c40bc
clusterID=CID-555d9ec2-dd1a-4f0b-a5c0-c00cdd6d2758
cTime=0
datanodeUuid=e2e46d95-e5a2-4e7b-9b84-91a015805ce9
storageType=DATA_NODE
layoutVersion=-56
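Comparing the two files by eye works for a single node; for several datanodes a small check may be handier. This is a rough sketch, assuming a copy of the namenode's VERSION file and the datanode's VERSION file are both readable on the same machine. The function names `get_cluster_id` and `cluster_ids_match` are hypothetical.

```shell
# Hypothetical helper: read the clusterID value from a VERSION file.
get_cluster_id() {
    grep '^clusterID=' "$1" | cut -d= -f2
}

# Hypothetical helper: succeed only if both VERSION files carry the same,
# non-empty clusterID.
cluster_ids_match() {
    nn_id=$(get_cluster_id "$1")
    dn_id=$(get_cluster_id "$2")
    [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Example usage (both paths are placeholders for local copies of the files):
# cluster_ids_match namenode.VERSION datanode.VERSION && echo "clusterIDs match"
```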

After a restart, the problem was resolved.