Issue
I'm running Hive on Hadoop, and I successfully installed it on Google Cloud Storage using bdutil version 1.3.1. I ran the following command:
./bdutil -e platforms/hdp/ambari_env.sh deploy
As the hive user, I'm able to create / drop databases and tables without any trouble:
hive> create database db_final location 'gs://cip-hadoop-dev-data/apps/hive/warehouse/db_final';
OK
Time taken: 1.816 seconds
But I get the below error if I try to access the database as any other user:
hive> use db_final;
FAILED: SemanticException MetaException(message:java.security.AccessControlException: Permission denied: user=andy, path="gs://cip-hadoop-dev-data/apps/hive/warehouse/db_final":hive:hive:drwx------)
I can tell it's a permissions error, since the directory permissions are 700, as shown in the message above and confirmed from the command line:
[andy@hadoop-m ~]$ hdfs dfs -ls gs:///apps/hive/warehouse/
drwx------ - andy andy 0 2015-09-11 01:46 gs:///apps/hive/warehouse/db_final
I've tried changing the permissions on the file using the hdfs command, but they remain the same:
[andy@hadoop-m ~]$ sudo hdfs dfs -chmod 750 gs:///apps/hive/warehouse/db_final
[andy@hadoop-m ~]$ hdfs dfs -ls gs:///apps/hive/warehouse/
drwx------ - andy andy 0 2015-09-11 01:46 gs:///apps/hive/warehouse/db_final
I've also granted SELECT permissions on the database to the user, which succeeds, but I still get the same error when I try to use the database.
This seems similar to a previously reported issue, but since I'm using the latest version of bdutil I don't know whether it's the same problem. I also confirmed that dfs.permissions.enabled is set to false.
So everything appears to work fine if I run it as the hive user, but I don't want to hand out the hive username/password to everyone who needs to access the database.
What else should I try / look into?
Thanks for your help
Solution
Indeed, part of the problem is that the GCS connector doesn't actually implement POSIX/HDFS permissions; it only reports static permissions, and actual access is authenticated with OAuth2 credentials, which are not tied to the Linux accounts on a GCE VM.
We semi-recently added a feature that allows modifying the permissions reported by the GCS connector via fs.gs.reported.permissions: https://github.com/GoogleCloudPlatform/bigdata-interop/commit/93637a136cdb7354b1a93cc3c7a61c42b0bc78a6
It hasn't made it into an official release yet, but you can build a snapshot yourself by following the instructions here: https://github.com/GoogleCloudPlatform/bigdata-interop
mvn -P hadoop2 package
Then replace the existing GCS connector jarfile with your new build. Alternatively, for a quick test you can use a temporary snapshot build we have; just keep in mind that the provided link will stop working after its deadline, and the snapshot build hasn't been verified for production workloads. If you only want to verify a proof of concept with the snapshot build first, there should be an official release soon that will provide a clean build of the jarfile.
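For reference, a rough sketch of the build-and-swap steps might look like the following. The jar locations and filenames are assumptions (they vary by connector version and by where bdutil/Ambari placed the connector on your cluster), so adjust them to match your deployment:

git clone https://github.com/GoogleCloudPlatform/bigdata-interop.git
cd bigdata-interop
mvn -P hadoop2 package

# Locate the currently installed connector on each node (paths are assumptions):
find /usr /opt -name 'gcs-connector-*.jar' 2>/dev/null

# Back up the old jar and drop in the freshly built one
# (the shaded jar is typically produced under gcs/target/):
sudo cp /usr/lib/hadoop/lib/gcs-connector-*.jar /tmp/gcs-connector.jar.bak
sudo cp gcs/target/gcs-connector-*-shaded.jar /usr/lib/hadoop/lib/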
Once you've replaced the jarfile, you can try modifying core-site.xml to set fs.gs.reported.permissions to something like 755 or even 777. Note that making the GCS connector's reported permissions more permissive doesn't actually leak any greater access than before, since GCS access is gated only on OAuth2 credentials (likely via a service account if you're on a GCE VM). The only goal is to find a reported permission that keeps the Hadoop tooling you use happy (some tools may complain about 777 being too permissive).
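As a concrete illustration, assuming you settle on 750, the entry in core-site.xml would look roughly like the following (the file location, e.g. /etc/hadoop/conf/core-site.xml on an HDP cluster, is an assumption about your layout):

<property>
  <name>fs.gs.reported.permissions</name>
  <value>750</value>
  <!-- Cosmetic only: real access control is still enforced by GCS via
       OAuth2 credentials; this value just has to satisfy Hive's permission check. -->
</property>

After editing core-site.xml, you'll generally need to restart the affected services (at minimum the Hive metastore) for the new value to take effect.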
Answered By - Dennis Huo