Scenario

Goal: use Python + Jupyter Notebook + the hdfs package to read and write files on HDFS.

The application is deployed in a Docker (18.09.0-3.el7) environment.

Code

import pandas as pd

from hdfs import InsecureClient, Client, HdfsError

host = 'http://node0:50070;http://node1:50070'

fileDir = '/data/nlp/uploads/'

fileName = '20190418170826_test01.xlsx'

client = Client(host)

print(client.list(fileDir))

['20190418170826_test01.xlsx']

# predict_cols (the list of column names) is assumed to be defined earlier in the notebook
try:
    with client.read(fileDir + fileName) as hdfs_in_fs:
        predictDF = pd.read_csv(hdfs_in_fs, names=predict_cols, index_col='Number')
except HdfsError as e:
    print(e)

NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))

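The key parts of the error are "Max retries exceeded with url" and "Name or service not known". After reaching the maximum number of retries, the client reports an unknown address or service, which means it never established a connection to the DataNode.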

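The data flow of an HDFS read explains why. Once the NameNode confirms that the requested data exists in HDFS, the client connects directly to the DataNode. In some situations the DataNode and the client are not on the same network segment, or the client's /etc/hosts has no mapping between the DataNode's IP address and its hostname, so the connection fails.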
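One way to confirm this diagnosis from inside the container is to check which hostnames the client can actually resolve. The following is a minimal sketch using only the standard library. The helper name `unresolved_hosts` is hypothetical, and the hostnames in the usage comment are simply the ones from the example above.

```python
import socket


def unresolved_hosts(hostnames):
    """Return the hostnames that cannot be resolved from this machine."""
    missing = []
    for name in hostnames:
        try:
            socket.gethostbyname(name)
        except socket.gaierror:
            # Raised on resolution failures such as
            # "[Errno -2] Name or service not known".
            missing.append(name)
    return missing


# Example usage (replace with the node names of your own cluster):
# print(unresolved_hosts(["node0", "node1"]))
```

Any name returned by this check is a candidate for a missing /etc/hosts entry on the client side.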

The Docker container has no vi or vim editor, so you can append the mapping to the hosts file with the following command:

/bin/sh -c "echo <IP address> node152 >> /etc/hosts"
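Running that command more than once appends a duplicate line each time. As an alternative, here is a small Python sketch that adds the mapping only when the hostname is not already present. The function name `add_host_entry` and the IP address in the usage comment are hypothetical placeholders, and writing to /etc/hosts normally requires root privileges inside the container.

```python
def add_host_entry(ip, hostname, hosts_path="/etc/hosts"):
    """Append 'ip hostname' to hosts_path unless hostname is already mapped.

    Returns True if a line was added, False if the hostname was already there.
    """
    with open(hosts_path, "r", encoding="utf-8") as f:
        content = f.read()

    for line in content.splitlines():
        # Drop any trailing comment, then look at the names after the IP.
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return False

    with open(hosts_path, "a", encoding="utf-8") as f:
        # Make sure the new entry starts on its own line.
        if content and not content.endswith("\n"):
            f.write("\n")
        f.write(f"{ip} {hostname}\n")
    return True


# Example usage (placeholder address, substitute the real DataNode IP):
# add_host_entry("192.0.2.10", "node152")
```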

I have tested this myself and it works. Hope it helps.

The pitfalls are endless; diligence is the boat that carries you across.