Tutorial: Configure a cluster-dedicated KDC with Amazon EMR
This topic guides you through creating a cluster with a cluster-dedicated key distribution center (KDC), manually adding Linux accounts to all cluster nodes, adding Kerberos principals to the KDC on the primary node, and ensuring that client computers have a Kerberos client installed.
For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see Use Kerberos for authentication with Amazon EMR.
Step 1: Create the Kerberized cluster
- Create a security configuration that enables Kerberos. The following example demonstrates a create-security-configuration command using the AWS CLI that specifies the security configuration as an inline JSON structure. You can also reference a file saved locally (see the sketch after this list).

    aws emr create-security-configuration --name MyKerberosConfig \
    --security-configuration '{"AuthenticationConfiguration": {"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}'
- Create a cluster that references the security configuration, establishes Kerberos attributes for the cluster, and adds Linux accounts using a bootstrap action. The following example demonstrates a create-cluster command using the AWS CLI. The command references the security configuration that you created above, MyKerberosConfig. It also references a simple script, createlinuxusers.sh, as a bootstrap action, which you create and upload to Amazon S3 before creating the cluster (see the sketch after this list).

    aws emr create-cluster --name "MyKerberosCluster" \
    --release-label emr-7.6.0 \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
    --service-role EMR_DefaultRole \
    --security-configuration MyKerberosConfig \
    --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
    --kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPwd \
    --bootstrap-actions Path=s3://amzn-s3-demo-bucket/createlinuxusers.sh

  The following code demonstrates the contents of the createlinuxusers.sh script, which adds user1, user2, and user3 to each node in the cluster. In the next step, you add these users as KDC principals.

    #!/bin/bash
    sudo adduser user1
    sudo adduser user2
    sudo adduser user3
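The preparation for these commands is not shown above: the security configuration JSON can be kept in a local file instead of being passed inline, and the createlinuxusers.sh script must already exist in Amazon S3 when you run create-cluster. The following is a minimal sketch of that preparation; the local file name mykerberosconfig.json is a hypothetical placeholder, and the file-based create-security-configuration call is an alternative to the inline JSON in the first command, so run one or the other, not both.

    # Save the security configuration to a local file (hypothetical file name).
    cat > mykerberosconfig.json <<'EOF'
    {"AuthenticationConfiguration": {"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",
     "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}
    EOF

    # Alternative to the inline JSON: reference the local file with the file:// prefix.
    aws emr create-security-configuration --name MyKerberosConfig \
    --security-configuration file://mykerberosconfig.json

    # Upload the bootstrap script so that the create-cluster command can reference it.
    aws s3 cp createlinuxusers.sh s3://amzn-s3-demo-bucket/createlinuxusers.sh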
Step 2: Add principals to the KDC, create HDFS user directories, and configure SSH
The KDC running on the primary node needs a principal added for the local host and for each user that you create on the cluster. You can also create HDFS directories for each user who needs to connect to the cluster and run Hadoop jobs. In addition, configure the SSH service to enable GSSAPI authentication, which is required for Kerberos. After you enable GSSAPI, restart the SSH service.
The easiest way to accomplish these tasks is to submit a step to the cluster. The following example submits a bash script, configurekdc.sh, to the cluster you created in the previous step, referencing its cluster ID. The script is saved to Amazon S3. Alternatively, you can connect to the primary node using an EC2 key pair to run the commands, or you can submit the step during cluster creation.
    aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
    --steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://myregion.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/configurekdc.sh"]
The following code demonstrates the contents of the configurekdc.sh script.
    #!/bin/bash
    # Add a principal to the KDC for the primary node, using the primary node's returned host name
    sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/`hostname -f`"
    # Declare an associative array of user names and passwords to add
    declare -A arr
    arr=([user1]=pwd1 [user2]=pwd2 [user3]=pwd3)
    for i in ${!arr[@]}; do
      # Assign plain language variables for clarity
      name=${i}
      password=${arr[${i}]}
      # Create a principal for each user and require a new password on first logon
      sudo kadmin.local -q "addprinc -pw $password +needchange $name"
      # Add the user's HDFS directory
      hdfs dfs -mkdir /user/$name
      # Change the owner of the user's HDFS directory to the user
      hdfs dfs -chown $name:$name /user/$name
    done
    # Enable GSSAPI authentication for SSH and restart the SSH service
    sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
    sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
    sudo systemctl restart sshd
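Before you test SSH access, you can verify that the step ran to completion. The following is a minimal sketch using the AWS CLI; the step ID s-XXXXXXXXXXXXX is a placeholder for the value that the add-steps command returns.

    # List recent steps on the cluster and check the State field for COMPLETED.
    aws emr list-steps --cluster-id j-2AL4XXXXXX5T9

    # Or inspect the single step directly (placeholder step ID).
    aws emr describe-step --cluster-id j-2AL4XXXXXX5T9 --step-id s-XXXXXXXXXXXXX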
The users that you added should now be able to connect to the cluster using SSH. For more information, see Using SSH to connect to Kerberized clusters with Amazon EMR.
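As a rough sketch of that first connection, assuming a client computer with a Kerberos client installed and a krb5.conf that points to the cluster's realm (the primary node host name below is a placeholder): kinit obtains a ticket and, because the principal was created with +needchange, prompts the user to set a new password before issuing it; ssh then authenticates with that ticket over GSSAPI.

    # Obtain a Kerberos ticket for user1; the initial password is pwd1 from the
    # configurekdc.sh script, and kinit forces a password change on first use.
    kinit user1@EC2.INTERNAL

    # Connect to the primary node using the Kerberos ticket (placeholder host name).
    ssh -o GSSAPIAuthentication=yes user1@ec2-xx-xx-xx-xx.compute-1.amazonaws.com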