Issue
I am trying to develop a Python client that interacts with Hadoop file system 3.3 using the pyarrow package. My OS is CentOS 8 and my IDE is Eclipse PyDev. The sample code is simple:
from pyarrow import fs
hdfs = fs.HadoopFileSystem('localhost', 9000)
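For reference, once the connection works I intend to use the handle roughly like this (a minimal sketch; the paths are only illustrative examples, not from my actual setup):
import pyarrow
from pyarrow import fs

hdfs = fs.HadoopFileSystem('localhost', 9000)

# List the contents of a directory (path is only an example)
for info in hdfs.get_file_info(fs.FileSelector('/tmp', recursive=False)):
    print(info.path, info.type)

# Read a file (path is only an example)
with hdfs.open_input_stream('/tmp/example.txt') as f:
    data = f.read()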
But a prerequisite command has to be executed for the Python code to run successfully. The command is
# export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`
Whenever I run the Eclipse IDE, I have to execute the above export CLASSPATH command first. To avoid this inconvenience and to automate the IDE configuration, I made a shell script like the one below:
#! /bin/bash
export CLASSPATH=$(/usr/local/hadoop/bin/hdfs classpath --glob)
echo "$CLASSPATH"
When I execute this script, its output is correct. But when I then launch Eclipse, the above pyarrow Python code does not work. I have no idea why it fails.
When I run the export CLASSPATH command directly in the shell, the Python code in Eclipse works correctly. But when I run the shell script instead, the Eclipse Python code throws errors.
Update
I entered the [Run Configurations] menu and chose the [Environment] tab. I clicked [Add] and entered the values below into the [New Environment Variable] dialog.
Name : CLASSPATH
Value : $(/usr/local/hadoop/bin/hdfs classpath --glob)
But my configuration is not correct; the same errors are thrown again.
I have attached an image of my IDE configuration.
Solution
As mentioned in the comments, in Eclipse's "Run Configurations" menu, you need to add an environment variable entry called CLASSPATH. (A variable exported inside a shell script only exists in that script's own process; once the script exits, the variable is gone, so Eclipse never sees it. That is why the script approach does not work.)
The value must be whatever the shell returns when you run this command:
/usr/local/hadoop/bin/hdfs classpath --glob
Don't try to put $(/usr/local/hadoop/bin/hdfs classpath --glob) as the value; it will be interpreted as a literal string and won't work.
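Alternatively, if you want to avoid IDE configuration altogether, you can set CLASSPATH from within Python before constructing the filesystem. Below is a minimal sketch; it assumes the hdfs binary lives at /usr/local/hadoop/bin/hdfs as in the question:
import os
import subprocess

from pyarrow import fs

# Run `hdfs classpath --glob` and export the result into this process's
# environment before pyarrow loads libhdfs / starts the embedded JVM.
classpath = subprocess.check_output(
    ['/usr/local/hadoop/bin/hdfs', 'classpath', '--glob']
).decode().strip()
os.environ['CLASSPATH'] = classpath

hdfs = fs.HadoopFileSystem('localhost', 9000)
Setting the variable this way only affects the current Python process, so it works regardless of how Eclipse itself was launched.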
Answered By - 0x26res