Set up HDFS Permissions using Linux Credentials - Teaching Big Data Skills with Amazon EMR

For the Linux users created above to have access to their own Hadoop Distributed File System (HDFS) directories, you must create the user-specific HDFS directories with the hdfs commands. For command details, see the HDFS Permissions Guide.

  1. Create a home directory for each user.

    hdfs dfs -mkdir /user/INSERT_USER_NAME
  2. Apply ownership of the newly created home directory to the user.

    hdfs dfs -chown INSERT_USER_NAME:INSERT_USER_NAME /user/INSERT_USER_NAME

Example:

hdfs dfs -mkdir /user/student01
hdfs dfs -chown student01:student01 /user/student01
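For a full class roster, the two commands above can be scripted. The sketch below is an assumption, not part of the guide: it prints one mkdir and one chown command per student account so that you can review the output before piping it to sh on the master node. The account names are placeholders; substitute your own roster.

```shell
#!/bin/sh
# Print the HDFS provisioning commands for a list of student accounts.
# The student names below are placeholders; replace them with your roster.
for user in student01 student02 student03; do
  echo "hdfs dfs -mkdir -p /user/${user}"
  echo "hdfs dfs -chown ${user}:${user} /user/${user}"
done
# Review the output, then apply it by piping to sh:
#   sh provision_students.sh | sh
```

The -p flag also creates the parent /user directory if it does not yet exist, so the script is safe to run on a fresh cluster.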

HDFS maintains its own permission model, completely separate from the Linux permissions. As such, you want to provision separate group ownership for each user and then add separate entries for each group in each student's file access control list (FACL). Depending on the current state of permissions, or if you want to automate new cluster creation, you might want to overwrite the full FACL, for example:

HDFS command syntax to set ACL permissions:

hdfs dfs -setfacl [-R] [-b |-k |-m |-x <acl_spec> <path>] |[--set <acl_spec> <path>]

Example (student01 access rights):

hdfs dfs -chown student01:student01 /user/student01
hdfs dfs -setfacl -R --set user:student01:rwx,group:instructors:rwx,group:administrators:rwx,group:teachingassistants:rwx,group:students:---,other::--- /user/student01
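The two commands in the example can be bundled into a small helper so that every new student receives identical access rights. This is a sketch, not part of the guide; the group names (instructors, administrators, teachingassistants, students) come from the example above, and the function only prints the commands so they can be reviewed before being piped to sh:

```shell
#!/bin/sh
# Print the chown and setfacl commands for one student's HDFS home directory.
# The group names match the example ACL spec; adjust them to your cluster.
grant_student_acl() {
  user="$1"
  echo "hdfs dfs -chown ${user}:${user} /user/${user}"
  echo "hdfs dfs -setfacl -R --set user:${user}:rwx,group:instructors:rwx,group:administrators:rwx,group:teachingassistants:rwx,group:students:---,other::--- /user/${user}"
}

grant_student_acl student01
# Review the output, then apply it with:
#   grant_student_acl student01 | sh
```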

Additionally, apply similar ACL permissions to lock down the superuser and any administrator accounts or directories that you create.

Note

The /tmp HDFS directory can store information (such as queries and keys) that is readable in plain text. Make sure to lock down this directory so that only class administrators and instructors have access to it. You can do this by limiting read access to only the needed users and groups.

For example, to lock down access to only the instructors, you can use permissions such as:

hdfs dfs -setfacl --set user::rwx,group:instructors:rwx,other::-wx /tmp