
Automate HDP installation using Ambari Blueprints – Part 6


 

In the previous post, we saw how to automate HDP installation with Kerberos authentication on a multi-node cluster using Ambari Blueprints.

 

In this post, we will see how to deploy a multi-node HDP cluster with ResourceManager HA via Ambari Blueprints.

 

Below are the steps to install a multi-node HDP cluster with ResourceManager HA using an internal repository via Ambari Blueprints.

 

Step 1: Install the Ambari server using the steps mentioned in the link below

http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/ch_Installing_Ambari.html

 

Step 2: Register ambari-agent manually

Install the ambari-agent package on all nodes in the cluster and set the hostname to the Ambari server host (FQDN) in /etc/ambari-agent/conf/ambari-agent.ini.
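A minimal sketch of the manual registration on a CentOS/RHEL node, assuming yum can reach your internal Ambari repository (replace <ambari-server-fqdn> with your Ambari server's FQDN):

# install the agent from your configured repository
yum install -y ambari-agent

# point the agent at the Ambari server ([server] section of ambari-agent.ini)
sed -i 's/^hostname=.*/hostname=<ambari-server-fqdn>/' /etc/ambari-agent/conf/ambari-agent.ini

# start the agent so it registers with the server
ambari-agent start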

 

Step 3: Configure blueprints

Please follow the steps below to create the blueprint files

 

3.1 Create the hostmap.json (cluster creation template) file as shown below:

Note – This file contains information about all the hosts that are part of your HDP cluster. It is also called the cluster creation template in the Apache Ambari documentation.

{
 "blueprint" : "hdptest",
 "default_password" : "hadoop",
 "host_groups" :[
{
 "name" : "blueprint1",
 "hosts" : [
 {
 "fqdn" : "blueprint1.crazyadmins.com"
 }
 ]
 },
{
 "name" : "blueprint2",
 "hosts" : [
 {
 "fqdn" : "blueprint2.crazyadmins.com"
 }
 ]
 },
{
 "name" : "blueprint3",
 "hosts" : [
 {
 "fqdn" : "blueprint3.crazyadmins.com"
 }
 ]
 }
 ]
}

 

3.2 Create the cluster_config.json (blueprint) file; it contains the mapping of host groups to HDP components.

{
 "configurations" : [
 {
 "core-site": {
 "properties" : {
 "fs.defaultFS" : "hdfs://%HOSTGROUP::blueprint1%:8020"
 }}
 },{
 "yarn-site" : {
 "properties" : {
 "hadoop.registry.rm.enabled" : "false",
 "hadoop.registry.zk.quorum" : "%HOSTGROUP::blueprint3%:2181,%HOSTGROUP::blueprint2%:2181,%HOSTGROUP::blueprint1%:2181",
 "yarn.log.server.url" : "http://%HOSTGROUP::blueprint3%:19888/jobhistory/logs",
 "yarn.resourcemanager.address" : "%HOSTGROUP::blueprint2%:8050",
 "yarn.resourcemanager.admin.address" : "%HOSTGROUP::blueprint2%:8141",
 "yarn.resourcemanager.cluster-id" : "yarn-cluster",
 "yarn.resourcemanager.ha.automatic-failover.zk-base-path" : "/yarn-leader-election",
 "yarn.resourcemanager.ha.enabled" : "true",
 "yarn.resourcemanager.ha.rm-ids" : "rm1,rm2",
 "yarn.resourcemanager.hostname" : "%HOSTGROUP::blueprint2%",
 "yarn.resourcemanager.hostname.rm1" : "%HOSTGROUP::blueprint2%",
 "yarn.resourcemanager.hostname.rm2" : "%HOSTGROUP::blueprint3%",
 "yarn.resourcemanager.webapp.address.rm1" : "%HOSTGROUP::blueprint2%:8088",
 "yarn.resourcemanager.webapp.address.rm2" : "%HOSTGROUP::blueprint3%:8088",
 "yarn.resourcemanager.recovery.enabled" : "true",
 "yarn.resourcemanager.resource-tracker.address" : "%HOSTGROUP::blueprint2%:8025",
 "yarn.resourcemanager.scheduler.address" : "%HOSTGROUP::blueprint2%:8030",
 "yarn.resourcemanager.store.class" : "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore",
 "yarn.resourcemanager.webapp.address" : "%HOSTGROUP::blueprint2%:8088",
 "yarn.resourcemanager.webapp.https.address" : "%HOSTGROUP::blueprint2%:8090",
 "yarn.timeline-service.address" : "%HOSTGROUP::blueprint3%:10200",
 "yarn.timeline-service.webapp.address" : "%HOSTGROUP::blueprint3%:8188",
 "yarn.timeline-service.webapp.https.address" : "%HOSTGROUP::blueprint3%:8190"
 }
 }
 }
],
 "host_groups" : [
{
 "name" : "blueprint1",
 "components" : [
{
 "name" : "NAMENODE"
},
{
 "name" : "NODEMANAGER"
},
{
 "name" : "DATANODE"
},
{
 "name" : "ZOOKEEPER_CLIENT"
},
{
 "name" : "HDFS_CLIENT"
},
{
 "name" : "YARN_CLIENT"
},
{
 "name" : "MAPREDUCE2_CLIENT"
},
{
 "name" : "ZOOKEEPER_SERVER"
}
 ],
 "cardinality" : 1
},
{
 "name" : "blueprint2",
 "components" : [
{
 "name" : "SECONDARY_NAMENODE"
},
{
 "name" : "RESOURCEMANAGER"
},
{
 "name" : "NODEMANAGER"
},
{
 "name" : "DATANODE"
},
{
 "name" : "ZOOKEEPER_CLIENT"
},
{
 "name" : "ZOOKEEPER_SERVER"
},
{
 "name" : "HDFS_CLIENT"
},
{
 "name" : "YARN_CLIENT"
},
{
 "name" : "MAPREDUCE2_CLIENT"
}
 ],
 "cardinality" : 1
},
{
 "name" : "blueprint3",
 "components" : [
{
 "name" : "RESOURCEMANAGER"
},
{
 "name" : "APP_TIMELINE_SERVER"
},
{
 "name" : "HISTORYSERVER"
},
{
 "name" : "NODEMANAGER"
},
{
 "name" : "DATANODE"
},
{
 "name" : "ZOOKEEPER_CLIENT"
},
{
 "name" : "ZOOKEEPER_SERVER"
},
{
 "name" : "HDFS_CLIENT"
},
{
 "name" : "YARN_CLIENT"
},
{
 "name" : "MAPREDUCE2_CLIENT"
}
 ],
 "cardinality" : 1
}
 ],
 "Blueprints" : {
 "blueprint_name" : "hdptest",
 "stack_name" : "HDP",
 "stack_version" : "2.5"
 }
}

Note – I have kept the ResourceManagers on blueprint2 and blueprint3; you can change this according to your requirements.

 

Step 4: Create an internal repository map

 

4.1: HDP repository – copy the contents below, modify base_url to point to the hostname/IP address of your internal repository server, and save it as repo.json.

{
"Repositories":{
"base_url":"http://<ip-address-of-repo-server>/hdp/centos6/HDP-2.5.3.0",
"verify_base_url":true
}
}

 

4.2: HDP-UTILS repository – copy the contents below, modify base_url to point to the hostname/IP address of your internal repository server, and save it as hdputils-repo.json.

 

{
"Repositories":{
"base_url":"http://<ip-address-of-repo-server>/hdp/centos6/HDP-UTILS-1.1.0.21",
"verify_base_url":true
}
}

 

Step 5: Register the blueprint with the Ambari server by executing the command below. The blueprint name in the URL must match the name referenced in hostmap.json ('hdptest' in this example).

curl -H "X-Requested-By: ambari"-X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/blueprints/multinode-hdp -d @cluster_config.json

Step 6: Set up the internal repositories via the REST API.

Execute the curl calls below to set up the internal repositories. Note that the stack version and repository IDs must match the stack defined in the blueprint (HDP 2.5 and HDP-UTILS-1.1.0.21 in this example).

curl -H "X-Requested-By: ambari"-X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-2.4 -d @repo.json

curl -H "X-Requested-By: ambari"-X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-UTILS-1.1.0.20 -d @hdputils-repo.json

Step 7: Pull the trigger! The command below will start the cluster installation.

curl -H "X-Requested-By: ambari"-X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/clusters/multinode-hdp -d @hostmap.json

Please feel free to comment if you need any further help on this. Happy Hadooping!!  :)

 

 

 

 


Automate HDP installation using Ambari Blueprints – Part 5


 

How to deploy an HDP cluster with Kerberos authentication using Ambari Blueprints?

You are at the correct place! :) Please follow my article below on HCC to set up a multi-node HDP cluster using Ambari Blueprints with Kerberos authentication (MIT KDC).

https://community.hortonworks.com/articles/78969/automate-hdp-installation-using-ambari-blueprints-4.html

 

 

Please refer to the next part for automated HDP installation using Ambari Blueprints with ResourceManager high availability.

 

Please feel free to comment if you need any further help on this. Happy Hadooping!! :)

 


Automate HDP installation using Ambari Blueprints – Part 4


How to deploy an HDP cluster with Kerberos authentication using Ambari Blueprints?

You are at the correct place! :) Please follow my article below on HCC to set up a single-node HDP cluster using Ambari Blueprints with Kerberos authentication (MIT KDC).

https://community.hortonworks.com/articles/70189/automate-hdp-installation-using-ambari-blueprints-3.html

 

Please refer to the next part for automated HDP installation using Ambari Blueprints with Kerberos authentication on a multi-node cluster.

 

Please feel free to comment if you need any further help on this. Happy Hadooping!! :)


Configure node labels on YARN

In this post, we will see how to configure node labels on YARN. I work for Hortonworks, so obviously we will configure it for HDP 😉

 

Before we get to the configuration part, let's understand what a node label is in YARN.

 

Node labels allow us to divide our cluster into different parts and use those parts individually as per our requirements. More specifically, we can create a group of NodeManagers using node labels (for example, a group of NodeManagers with a large amount of RAM) and use them to process only critical production jobs! This is cool, isn't it? So let's see how we can configure node labels on YARN.

 

Types of node labels:

Exclusive – only queues associated/mapped with this node label can access its resources.

Non-exclusive (sharable) – if the resources of this node label are not in use, they can be shared with other applications running in the cluster.

 

Configuring node labels:

Step 1: Create required directory structure on HDFS

Note – You can run the commands below from any HDFS client node.

 

sudo su hdfs
hadoop fs -mkdir -p /yarn/node-labels
hadoop fs -chown -R yarn:yarn /yarn
hadoop fs -chmod -R 700 /yarn

 

Step 2: Make sure that you have a user directory for the 'yarn' user on HDFS; if not, please create it using the commands below.

Note – You can run the commands below from any HDFS client node.

sudo su hdfs
hadoop fs -mkdir -p /user/yarn
hadoop fs -chown -R yarn:yarn /user/yarn
hadoop fs -chmod -R 700 /user/yarn

 

Step 3: Configure the properties below in yarn-site.xml via the Ambari UI. If you don't have Ambari, please add them manually to /etc/hadoop/conf/yarn-site.xml and restart the required services.

yarn.node-labels.enabled=true
yarn.node-labels.fs-store.root-dir=hdfs://<namenode-host>:<namenode-rpc-port>/<complete-path_to_node_label_directory>
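For example, with the directory created in Step 1 and a NameNode listening on the default RPC port 8020, the two properties could look like the following (the hostname is only an illustration; use your own NameNode host or nameservice):

yarn.node-labels.enabled=true
yarn.node-labels.fs-store.root-dir=hdfs://prodnode3.openstacklocal:8020/yarn/node-labels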

 

Note – Please restart required services after above configuration changes!

 

Step 4: Create node labels using below commands

sudo -u yarn yarn rmadmin -addToClusterNodeLabels "<node-label1>(exclusive=<true|false>),<node-label2>(exclusive=<true|false>)"

 

For example, to add 2 node labels x and y:

sudo -u yarn yarn rmadmin -addToClusterNodeLabels "x(exclusive=true),y(exclusive=false)"

 

You can verify that the node labels have been created by looking at the ResourceManager UI under the 'Node Labels' option in the left pane, or by running the command below on any YARN client.

yarn cluster --list-node-labels

 

Sample output:

[yarn@prodnode1 ~]$ yarn cluster --list-node-labels
16/12/14 15:45:56 INFO impl.TimelineClientImpl: Timeline service address: http://prodnode3.openstacklocal:8188/ws/v1/timeline/
16/12/14 15:45:56 INFO client.RMProxy: Connecting to ResourceManager at prodnode3.openstacklocal/172.26.74.211:8050
Node Labels: <x:exclusivity=true>,<y:exclusivity=false>

 

Step 5: Allocate node labels to the node managers using below command:

sudo -u yarn yarn rmadmin -replaceLabelsOnNode "<node-manager1>:<port>=<node-label1> <node-manager2>:<port>=<node-label2>"

 

Example:

sudo -u yarn yarn rmadmin -replaceLabelsOnNode "prodnode1.openstacklocal=x prodnode2.openstacklocal=y"

 

Note – Don’t worry about port if you have only one node manager running per host.

 

Step 6: Map node labels to the queues:

I have created 2 queues, 'a' and 'b', such that queue 'a' can access nodes with labels 'x' and 'y', whereas queue 'b' can only access nodes with label 'y'. By default, all queues can access nodes with the 'default' label.

Below is my capacity scheduler configuration:

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.queue-mappings-override.enable=false
yarn.scheduler.capacity.root.a.a1.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.x.capacity=30
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.a1.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.acl_administer_queue=*
yarn.scheduler.capacity.root.a.a1.acl_submit_applications=*
yarn.scheduler.capacity.root.a.a1.capacity=40
yarn.scheduler.capacity.root.a.a1.maximum-capacity=100
yarn.scheduler.capacity.root.a.a1.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.a1.ordering-policy=fifo
yarn.scheduler.capacity.root.a.a1.state=RUNNING
yarn.scheduler.capacity.root.a.a1.user-limit-factor=1
yarn.scheduler.capacity.root.a.a2.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.x.capacity=70
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.a2.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.a2.acl_administer_queue=*
yarn.scheduler.capacity.root.a.a2.acl_submit_applications=*
yarn.scheduler.capacity.root.a.a2.capacity=60
yarn.scheduler.capacity.root.a.a2.maximum-capacity=60
yarn.scheduler.capacity.root.a.a2.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.a2.ordering-policy=fifo
yarn.scheduler.capacity.root.a.a2.state=RUNNING
yarn.scheduler.capacity.root.a.a2.user-limit-factor=1
yarn.scheduler.capacity.root.a.accessible-node-labels=x,y
yarn.scheduler.capacity.root.a.accessible-node-labels.x.capacity=100
yarn.scheduler.capacity.root.a.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.a.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.a.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.a.acl_administer_queue=*
yarn.scheduler.capacity.root.a.acl_submit_applications=*
yarn.scheduler.capacity.root.a.capacity=40
yarn.scheduler.capacity.root.a.maximum-capacity=40
yarn.scheduler.capacity.root.a.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.a.ordering-policy=fifo
yarn.scheduler.capacity.root.a.queues=a1,a2
yarn.scheduler.capacity.root.a.state=RUNNING
yarn.scheduler.capacity.root.a.user-limit-factor=1
yarn.scheduler.capacity.root.accessible-node-labels=x,y
yarn.scheduler.capacity.root.accessible-node-labels.x.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.x.maximum-capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.y.capacity=100
yarn.scheduler.capacity.root.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.b.accessible-node-labels=y
yarn.scheduler.capacity.root.b.accessible-node-labels.y.capacity=50
yarn.scheduler.capacity.root.b.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.b.acl_administer_queue=*
yarn.scheduler.capacity.root.b.acl_submit_applications=*
yarn.scheduler.capacity.root.b.b1.accessible-node-labels=y
yarn.scheduler.capacity.root.b.b1.accessible-node-labels.y.capacity=100
yarn.scheduler.capacity.root.b.b1.accessible-node-labels.y.maximum-capacity=100
yarn.scheduler.capacity.root.b.b1.acl_administer_queue=*
yarn.scheduler.capacity.root.b.b1.acl_submit_applications=*
yarn.scheduler.capacity.root.b.b1.capacity=100
yarn.scheduler.capacity.root.b.b1.maximum-capacity=100
yarn.scheduler.capacity.root.b.b1.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.b.b1.ordering-policy=fifo
yarn.scheduler.capacity.root.b.b1.state=RUNNING
yarn.scheduler.capacity.root.b.b1.user-limit-factor=1
yarn.scheduler.capacity.root.b.capacity=60
yarn.scheduler.capacity.root.b.maximum-capacity=100
yarn.scheduler.capacity.root.b.minimum-user-limit-percent=100
yarn.scheduler.capacity.root.b.ordering-policy=fifo
yarn.scheduler.capacity.root.b.queues=b1
yarn.scheduler.capacity.root.b.state=RUNNING
yarn.scheduler.capacity.root.b.user-limit-factor=1
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=a,b

 

FAQs:

 

Below is the status of my cluster. I have 3 NodeManagers: one with label 'x', another with label 'y', and a third without any label.

 

(Screenshot: configure node labels on YARN – cluster node status)

 

How to remove an associated node label from a NodeManager?

Let’s try to remove label ‘x’ from prodnode1.openstacklocal

[yarn@prodnode1 ~]$ yarn rmadmin -replaceLabelsOnNode "prodnode1.openstacklocal"
16/12/14 15:48:06 INFO client.RMProxy: Connecting to ResourceManager at prodnode3.openstacklocal/172.26.74.211:8141
[yarn@prodnode1 ~]$

 

Below is the status after deleting label 'x' from prodnode1.openstacklocal

(Screenshot: node status after removing label 'x')

 

How to assign a label back to a NodeManager?

Let’s try to assign label ‘x’ to node manager prodnode1.openstacklocal

[yarn@prodnode1 ~]$ yarn rmadmin -replaceLabelsOnNode "prodnode1.openstacklocal=x"
16/12/14 15:50:38 INFO client.RMProxy: Connecting to ResourceManager at prodnode3.openstacklocal/172.26.74.211:8141
[yarn@prodnode1 ~]$

 

Status after running above command:

(Screenshot: configure node labels on YARN – status after re-assigning label 'x')

 

How to submit a job to a specific node label?

Let's try to submit a sample job to node label 'x':

[yarn@prodnode1 ~]$ hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -num_containers 4 -queue a2 -node_label_expression x

 

How to submit a job to the 'default' node label?

Don't specify the '-node_label_expression' parameter while submitting the job; it will then run in the default partition (default node label).

[yarn@prodnode1 ~]$ hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -num_containers 10 -queue b1 1>/dev/null 2>/dev/null &
[2] 18776

 

How does a non-exclusive node label work?

We have node label 'y' as non-exclusive. Let's keep the resources under node label 'y' idle, submit a job to the default node label, and see if it borrows resources from 'y'. Interesting, isn't it? Well, that's how we learn :)

 

State before job submission:

Note that memory used and running containers are 0 in the screenshot below, which shows that the resources are idle.

(Screenshot: node status before job submission – 0 containers running, 0 memory used)

 

Submit a sample job to default partition:

[yarn@prodnode1 ~]$ hadoop jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -shell_command "sleep 100" -jar /usr/hdp/current/hadoop-yarn-client/hadoop-yarn-applications-distributedshell.jar -num_containers 10 -queue b1 1>/dev/null 2>/dev/null &
[1] 17451
[yarn@prodnode1 ~]$

 

Now check the status again. You can see that resources from node label 'y' are being used for a job submitted to the default partition:

 

(Screenshot: resources of node label 'y' being used by a job running in the default partition)

 

 

 

Please feel free to comment if you need any further help on this. Happy Hadooping!! :)

 

 


Automate HDP installation using Ambari Blueprints – Part 3

In the previous post, we saw how to install a multi-node HDP cluster using Ambari Blueprints. In this post, we will see how to automate HDP installation using Ambari Blueprints with NameNode HA.

 

Below are the steps to install a multi-node HDP cluster with NameNode HA using an internal repository via Ambari Blueprints.

 

Step 1: Install the Ambari server using the steps mentioned in the link below

http://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Installing_HDP_AMB/content/_download_the_ambari_repo_lnx6.html

 

Step 2: Register ambari-agent manually

Install the ambari-agent package on all nodes in the cluster and set the hostname to the Ambari server host (FQDN) in /etc/ambari-agent/conf/ambari-agent.ini.

 

Step 3: Configure blueprints

Please follow the steps below to create the blueprint files

 

3.1 Create the hostmapping.json file as shown below:

Note – This file contains information about all the hosts that are part of your HDP cluster.

{
 "blueprint" : "prod",
 "default_password" : "hadoop",
 "host_groups" :[
{
 "name" : "prodnode1",
 "hosts" : [
 {
 "fqdn" : "prodnode1.openstacklocal"
 }
 ]
 },
{
 "name" : "prodnode2",
 "hosts" : [
 {
 "fqdn" : "prodnode2.openstacklocal"
 }
 ]
 },
{
 "name" : "prodnode3",
 "hosts" : [
 {
 "fqdn" : "prodnode3.openstacklocal"
 }
 ]
 }
 ]
}

 

3.2 Create the cluster_configuration.json file; it contains the mapping of host groups to HDP components.

{
 "configurations" : [
 { "core-site": {
 "properties" : {
 "fs.defaultFS" : "hdfs://prod",
 "ha.zookeeper.quorum" : "%HOSTGROUP::prodnode1%:2181,%HOSTGROUP::prodnode2%:2181,%HOSTGROUP::prodnode3%:2181"
 }}
 },
 { "hdfs-site": {
 "properties" : {
 "dfs.client.failover.proxy.provider.prod" : "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
 "dfs.ha.automatic-failover.enabled" : "true",
 "dfs.ha.fencing.methods" : "shell(/bin/true)",
 "dfs.ha.namenodes.prod" : "nn1,nn2",
 "dfs.namenode.http-address" : "%HOSTGROUP::prodnode1%:50070",
 "dfs.namenode.http-address.prod.nn1" : "%HOSTGROUP::prodnode1%:50070",
 "dfs.namenode.http-address.prod.nn2" : "%HOSTGROUP::prodnode3%:50070",
 "dfs.namenode.https-address" : "%HOSTGROUP::prodnode1%:50470",
 "dfs.namenode.https-address.prod.nn1" : "%HOSTGROUP::prodnode1%:50470",
 "dfs.namenode.https-address.prod.nn2" : "%HOSTGROUP::prodnode3%:50470",
 "dfs.namenode.rpc-address.prod.nn1" : "%HOSTGROUP::prodnode1%:8020",
 "dfs.namenode.rpc-address.prod.nn2" : "%HOSTGROUP::prodnode3%:8020",
 "dfs.namenode.shared.edits.dir" : "qjournal://%HOSTGROUP::prodnode1%:8485;%HOSTGROUP::prodnode2%:8485;%HOSTGROUP::prodnode3%:8485/prod",
 "dfs.nameservices" : "prod"
 }}
 }],
 "host_groups" : [
{
 "name" : "prodnode1",
 "components" : [
{
"name" : "NAMENODE"
},
{
 "name" : "JOURNALNODE"
},
{
 "name" : "ZKFC"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
 "name" : "FALCON_CLIENT"
},
{
 "name" : "OOZIE_CLIENT"
},
{
 "name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
}
],
 "cardinality" : 1
},
{
 "name" : "prodnode2",
 "components" : [
{
 "name" : "JOURNALNODE"
},
{
 "name" : "MYSQL_SERVER"
},
{
 "name" : "HIVE_SERVER"
},
{
 "name" : "HIVE_METASTORE"
},
{
 "name" : "WEBHCAT_SERVER"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
 "name" : "FALCON_SERVER"
},
{
 "name" : "OOZIE_SERVER"
},
{
 "name" : "FALCON_CLIENT"
},
{
 "name" : "OOZIE_CLIENT"
},
{
 "name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
}
],
 "cardinality" : 1
},
{
 "name" : "prodnode3",
 "components" : [
{
"name" : "RESOURCEMANAGER"
},
{
 "name" : "JOURNALNODE"
},
{
 "name" : "ZKFC"
},
{
 "name" : "NAMENODE"
},
{
"name" : "APP_TIMELINE_SERVER"
},
{
"name" : "HISTORYSERVER"
},
{
"name" : "NODEMANAGER"
},
{
"name" : "DATANODE"
},
{
"name" : "ZOOKEEPER_CLIENT"
},
{
"name" : "ZOOKEEPER_SERVER"
},
{
"name" : "HDFS_CLIENT"
},
{
"name" : "YARN_CLIENT"
},
{
 "name" : "HIVE_CLIENT"
},
{
"name" : "MAPREDUCE2_CLIENT"
}
],
 "cardinality" : 1
}
 ],
 "Blueprints" : {
 "blueprint_name" : "prod",
 "stack_name" : "HDP",
 "stack_version" : "2.4"
 }
}

Note – I have kept the NameNodes on prodnode1 and prodnode3; you can change this according to your requirements. I have also added a few more services like Hive, Falcon, and Oozie; you can remove them or add more according to your requirements.

 

Step 4: Create an internal repository map

4.1: HDP repository – copy the contents below, modify base_url to point to the hostname/IP address of your internal repository server, and save it as repo.json.

{
"Repositories":{
"base_url":"http://<ip-address-of-repo-server>/hdp/centos6/HDP-2.4.2.0",
"verify_base_url":true
}
}

 

4.2: HDP-UTILS repository – copy the contents below, modify base_url to point to the hostname/IP address of your internal repository server, and save it as hdputils-repo.json.

{
"Repositories" : {
 "base_url" : "http://<ip-address-of-repo-server>/hdp/centos6/HDP-UTILS-1.1.0.20",
 "verify_base_url" : true
}
}

 

Step 5: Register the blueprint with the Ambari server by executing the command below. The blueprint name in the URL must match the name referenced in hostmapping.json ('prod' in this example).

curl -H "X-Requested-By: ambari"-X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/blueprints/multinode-hdp -d @cluster_config.json

 

Step 6: Setup Internal repo via REST API.

Execute below curl calls to setup internal repositories.

curl -H "X-Requested-By: ambari"-X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-2.4 -d @repo.json
curl -H "X-Requested-By: ambari"-X PUT -u admin:admin http://<ambari-server-hostname>:8080/api/v1/stacks/HDP/versions/2.4/operating_systems/redhat6/repositories/HDP-UTILS-1.1.0.20 -d @hdputils-repo.json

 

Step 7: Pull the trigger! The command below will start the cluster installation.

curl -H "X-Requested-By: ambari"-X POST -u admin:admin http://<ambari-server-hostname>:8080/api/v1/clusters/multinode-hdp -d @hostmap.json

 

Please refer to Part 4 for setting up HDP with Kerberos authentication via Ambari Blueprints.

 

Please feel free to comment if you need any further help on this. Happy Hadooping!! :)

 

 


Oozie workflow failed – hive-site.xml permission denied

Oozie workflow failed – hive-site.xml permission denied – If you have configured a Hive action in your Oozie workflow and it is failing with 'hive-site.xml (Permission denied)', then you are at the correct place!

This error can occur in the scenarios mentioned below:

 

Scenarios:

1. Your Oozie workflow contains a Hive action.

2. You have included <job-xml> inside the Hive action and given the hive-site.xml path.

3. hive-site.xml is present under ${wf.application.path}/lib as well.

4. Your developers have also added hive-site.xml to the Oozie sharelib, possibly at the location(s) below:

/user/oozie/sharelib/lib_<timestamp>/oozie/hive-site.xml
/user/oozie/sharelib/lib_<timestamp>/sqoop/hive-site.xml
/user/oozie/sharelib/lib_<timestamp>/hive/hive-site.xml

5. My simple Hive workflow is failing with the error below:

Oozie Hive action configuration 
Using action configuration file /hadoop/data01/hadoop/yarn/local/usercache/root/appcache/application_1443111597609_2691/container_1443111597609_2691_01_000002/action.xml 
Setting env property for mapreduce.job.credentials.binary to: /hadoop/data01/hadoop/yarn/local/usercache/root/appcache/application_1443111597609_2691/container_1443111597609_2691_01_000002/container_tokens 
Setting env property for tez.credentials.path to: /hadoop/data01/hadoop/yarn/local/usercache/root/appcache/application_1443111597609_2691/container_1443111597609_2691_01_000002/container_tokens 
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, hive-site.xml (Permission denied) 
java.io.FileNotFoundException: hive-site.xml (Permission denied) 
at java.io.FileOutputStream.open(Native Method) 
at java.io.FileOutputStream.<init>(FileOutputStream.java:221) 
at java.io.FileOutputStream.<init>(FileOutputStream.java:110) 
at org.apache.oozie.action.hadoop.HiveMain.setUpHiveSite(HiveMain.java:166) 
at org.apache.oozie.action.hadoop.HiveMain.run(HiveMain.java:196) 
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:38) 
at org.apache.oozie.action.hadoop.HiveMain.main(HiveMain.java:66) 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) 
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
at java.lang.reflect.Method.invoke(Method.java:606) 
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:225) 
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) 
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430) 
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) 
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) 
at java.security.AccessController.doPrivileged(Native Method) 
at javax.security.auth.Subject.doAs(Subject.java:415) 
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594) 
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) 
Oozie Launcher failed, finishing Hadoop job gracefully

 

Okay, how do we fix this?

Having multiple copies of the configuration file in different locations causes a conflict: Oozie might be trying to replace the hive-site.xml copied from /etc/hive/conf/action-conf/hive/hive-site.xml into the NodeManager's local cache directory with a hive-site.xml loaded from one of the locations mentioned above.

To resolve this conflict, we need to delete the extra copies of hive-site.xml from all the locations mentioned above. Oozie uses the hive-site.xml from /etc/oozie/conf/action-conf/hive/hive-site.xml :)

 

In short, remove hive-site.xml from the locations below (example commands follow the list):

1. the Oozie sharelib (it was present at multiple locations in the sharelib)

2. the ${wf.application.path}/lib/ directory

3. workflow.xml (delete the <job-xml> entry)
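A hedged sketch of the cleanup; the sharelib paths mirror the ones listed in the scenarios above, and you will need to adjust the lib_<timestamp> directory and workflow path to your environment:

# remove the stray copies from the Oozie sharelib
sudo -u oozie hadoop fs -rm /user/oozie/sharelib/lib_<timestamp>/oozie/hive-site.xml
sudo -u oozie hadoop fs -rm /user/oozie/sharelib/lib_<timestamp>/sqoop/hive-site.xml
sudo -u oozie hadoop fs -rm /user/oozie/sharelib/lib_<timestamp>/hive/hive-site.xml

# remove the copy bundled with the workflow
hadoop fs -rm <wf.application.path>/lib/hive-site.xml

# ask Oozie to reload the sharelib so the change takes effect
# (and remove the <job-xml> entry from workflow.xml before re-submitting)
oozie admin -oozie http://<oozie-server>:11000/oozie -sharelibupdate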

 

With Oozie nothing is Easy 😉

 

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! :)

 

Please follow our Oozie tutorials on github for more information.

 

 

 


Unable to access Namenode UI after disabling Kerberos authentication

Sometimes we get a 401 authentication error after disabling Kerberos authentication; the error is shown in the screenshot below.

(Screenshot – Unable to access NameNode UI after disabling Kerberos)

 

 

This can break Ambari Metrics, as monitoring data won't be available over HTTP for the NameNode, ResourceManager, or any other Hadoop ecosystem UI.

 

How to resolve this issue?

 

Please check and verify if below property is set to true in core-site.xml

hadoop.http.authentication.simple.anonymous.allowed=true

 

Note – I have seen this issue most of the time caused by the above property being set to false.

 

Please also verify the property below and modify it if required.

hadoop.security.authentication=simple
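After correcting the properties and restarting HDFS, a quick check from any node should return HTTP 200 instead of 401. A sketch, assuming the default NameNode HTTP port 50070 (adjust if yours differs):

curl -i http://<namenode-host>:50070/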

 

Sometimes a simple issue eats up our valuable time! Posting this article to save your precious time :)

 

Please comment if you face any issues or need any further help on this.

 

Happy Hadooping!! :)

 


How to setup cross realm trust between two MIT KDC

How to set up cross-realm trust between two MIT KDCs – In this post, we will see how to set up a cross-realm trust between two MIT KDCs. With the trust set up correctly, we can access and copy data from one cluster to the other.

 

In our example, we have 2 clusters with the same HDP version (2.4.2.0) and Ambari version (2.2.2.0).

Cluster 1:

172.26.68.47 hwx-1.hwx.com  hwx-1
172.26.68.46 hwx-2.hwx.com  hwx-2
172.26.68.45 hwx-3.hwx.com  hwx-3

Cluster 2:

172.26.68.48 support-1.support.com   support-1
172.26.68.49 support-2.support.com   support-2
172.26.68.50 support-3.support.com   support-3

 

Below are the steps:

 

Step 1: Make sure both clusters are Kerberized with MIT KDC. You can use the automated script below for configuring Kerberos on HDP.

https://community.hortonworks.com/articles/29203/automated-kerberos-installation-and-configuration.html

 

Step 2: Please configure the /etc/hosts file on both clusters to have IP <-> hostname mappings for all nodes.

Example:

On both clusters /etc/hosts file should look like below:

172.26.68.47 hwx-1.hwx.com  hwx-1
172.26.68.46 hwx-2.hwx.com  hwx-2
172.26.68.45 hwx-3.hwx.com  hwx-3
172.26.68.48 support-1.support.com   support-1
172.26.68.49 support-2.support.com   support-2
172.26.68.50 support-3.support.com   support-3

 

Step 3: Configure krb5.conf:

 

3.1 Configure the [realms] section to add the other cluster's KDC server details – this is required to locate the KDC that authenticates users belonging to the other cluster.

Example on Cluster1:

[realms]
  HWX.COM = {
    admin_server = hwx-1.hwx.com
    kdc = hwx-1.hwx.com
  }
  SUPPORT.COM = {
    admin_server = support-1.support.com
    kdc = support-1.support.com
  }

3.2 Configure the [domain_realm] section to add the other cluster's domain <-> realm mapping.

[domain_realm]
  .hwx.com = HWX.COM
  hwx.com = HWX.COM
  .support.com = SUPPORT.COM
  support.com = SUPPORT.COM

3.3 Configure [capaths] to add another cluster’s realm

[capaths]
    HWX.COM = {
         SUPPORT.COM = .
    }

On Cluster 1, the krb5.conf should look like below:

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = HWX.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  #default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
  #default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log
[realms]
  HWX.COM = {
    admin_server = hwx-1.hwx.com
    kdc = hwx-1.hwx.com
  }
  SUPPORT.COM = {
    admin_server = support-1.support.com
    kdc = support-1.support.com
  }
[domain_realm]
  .hwx.com = HWX.COM
  hwx.com = HWX.COM
  .support.com = SUPPORT.COM
  support.com = SUPPORT.COM
[capaths]
    HWX.COM = {
         SUPPORT.COM = .
    }

Note – Please copy modified /etc/krb5.conf to all the nodes in Cluster 1

 

Similarly on Cluster2, the krb5.conf should look like below:

[libdefaults]
  renew_lifetime = 7d
  forwardable = true
  default_realm = SUPPORT.COM
  ticket_lifetime = 24h
  dns_lookup_realm = false
  dns_lookup_kdc = false
  #default_tgs_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
  #default_tkt_enctypes = aes des3-cbc-sha1 rc4 des-cbc-md5
[logging]
  default = FILE:/var/log/krb5kdc.log
  admin_server = FILE:/var/log/kadmind.log
  kdc = FILE:/var/log/krb5kdc.log
[realms]
  SUPPORT.COM = {
    admin_server = support-1.support.com
    kdc = support-1.support.com
  }
  HWX.COM = {
    admin_server = hwx-1.hwx.com
    kdc = hwx-1.hwx.com
  }
[domain_realm]
  .hwx.com = HWX.COM
  hwx.com = HWX.COM
  .support.com = SUPPORT.COM
  support.com = SUPPORT.COM
[capaths]
    SUPPORT.COM = {
        HWX.COM = .
    }

Note – Please copy modified /etc/krb5.conf to all the nodes in Cluster 2

 

Step 4: Modify the property below in hdfs-site.xml on the cluster from which you want to execute the distcp command (specifically, on the client side).

dfs.namenode.kerberos.principal.pattern=*

 

Step 5: Add a common trust principal in both KDCs. Please keep the same password for both principals.

On Cluster 1 and 2, execute below commands in kadmin utility:

addprinc krbtgt/HWX.COM@SUPPORT.COM
addprinc krbtgt/SUPPORT.COM@HWX.COM
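To confirm the trust principals exist in each KDC, you can list them on the KDC host itself; a quick check, assuming you run it as root on the KDC:

kadmin.local -q "listprincs krbtgt*"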

 

Step 6: Configure auth_to_local rules on both the clusters:

On Cluster 1, append the auth_to_local rules from Cluster 2.

Example on Cluster 1:

RULE:[1:$1@$0](ambari-qa-hadoop@HWX.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-hadoop@HWX.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-hadoop@HWX.COM)s/.*/spark/
RULE:[1:$1@$0](.*@HWX.COM)s/@.*//
RULE:[2:$1@$0](dn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@HWX.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@HWX.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@HWX.COM)s/.*/yarn/
DEFAULT
RULE:[1:$1@$0](ambari-qa-support@SUPPORT.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-support@SUPPORT.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-support@SUPPORT.COM)s/.*/spark/
RULE:[1:$1@$0](.*@SUPPORT.COM)s/@.*//
RULE:[2:$1@$0](dn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SUPPORT.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@SUPPORT.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@SUPPORT.COM)s/.*/yarn/

 

On Cluster 2, append the auth_to_local rules from Cluster 1.

 

Example on Cluster 2:

RULE:[1:$1@$0](ambari-qa-support@SUPPORT.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-support@SUPPORT.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-support@SUPPORT.COM)s/.*/spark/
RULE:[1:$1@$0](.*@SUPPORT.COM)s/@.*//
RULE:[2:$1@$0](dn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@SUPPORT.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@SUPPORT.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@SUPPORT.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@SUPPORT.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@SUPPORT.COM)s/.*/yarn/
DEFAULT
RULE:[1:$1@$0](ambari-qa-hadoop@HWX.COM)s/.*/ambari-qa/
RULE:[1:$1@$0](hdfs-hadoop@HWX.COM)s/.*/hdfs/
RULE:[1:$1@$0](spark-hadoop@HWX.COM)s/.*/spark/
RULE:[1:$1@$0](.*@HWX.COM)s/@.*//
RULE:[2:$1@$0](dn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](hive@HWX.COM)s/.*/hive/
RULE:[2:$1@$0](jhs@HWX.COM)s/.*/mapred/
RULE:[2:$1@$0](jn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](nm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](nn@HWX.COM)s/.*/hdfs/
RULE:[2:$1@$0](rm@HWX.COM)s/.*/yarn/
RULE:[2:$1@$0](yarn@HWX.COM)s/.*/yarn/

 

Step 7: Add a common user principal to both KDCs. Execute the commands below on both KDCs, keeping the same password for both principals.

For Cluster 1:

7.1. Login to kadmin

7.2. Execute the command below to add a user principal

addprinc kuldeepk@HWX.COM

 

For Cluster 2:

7.3. Login to kadmin

7.4. Execute the command below to add a user principal

addprinc kuldeepk@SUPPORT.COM

 

Step 8: Login to Cluster 2, do a kinit, and try to access HDFS files of Cluster 1.
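A kinit as the common user created in Step 7 might look like the example below (the password is whatever you set in kadmin; klist just confirms the ticket):

kinit kuldeepk@SUPPORT.COM
klist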

Example:

hdfs dfs -ls hdfs://hwx-2.hwx.com:8020/tmp
Found 8 items
drwx------   - ambari-qa hdfs          0 2016-07-29 23:24 hdfs://hwx-2.hwx.com:8020/tmp/ambari-qa
drwxr-xr-x   - hdfs      hdfs          0 2016-07-29 22:02 hdfs://hwx-2.hwx.com:8020/tmp/entity-file-history
drwx-wx-wx   - ambari-qa hdfs          0 2016-07-29 23:25 hdfs://hwx-2.hwx.com:8020/tmp/hive
-rwxr-xr-x   3 hdfs      hdfs       1414 2016-07-29 23:50 hdfs://hwx-2.hwx.com:8020/tmp/id1aac2d44_date502916
-rwxr-xr-x   3 ambari-qa hdfs       1414 2016-07-29 23:26 hdfs://hwx-2.hwx.com:8020/tmp/idtest.ambari-qa.1469834803.19.in
-rwxr-xr-x   3 ambari-qa hdfs        957 2016-07-29 23:26 hdfs://hwx-2.hwx.com:8020/tmp/idtest.ambari-qa.1469834803.19.pig
drwxr-xr-x   - ambari-qa hdfs          0 2016-07-29 23:53 hdfs://hwx-2.hwx.com:8020/tmp/tezsmokeinput

Note – hwx-2.hwx.com is the Active Namenode of Cluster 1.

 

You can also try copying files from Cluster 1 to Cluster 2 using distcp.

 

Example:

[kuldeepk@support-1 root]$ hadoop distcp hdfs://hwx-1.hwx.com:8020/tmp/test.txt /tmp/
16/07/30 22:03:27 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://hwx-1.hwx.com:8020/tmp/test.txt], targetPath=/tmp, targetPathExists=true, preserveRawXattrs=false}
16/07/30 22:03:27 INFO impl.TimelineClientImpl: Timeline service address: http://support-3.support.com:8188/ws/v1/timeline/
16/07/30 22:03:27 INFO client.RMProxy: Connecting to ResourceManager at support-3.support.com/172.26.68.50:8050
16/07/30 22:03:28 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 20 for kuldeepk on 172.26.68.47:8020
16/07/30 22:03:28 INFO security.TokenCache: Got dt for hdfs://hwx-1.hwx.com:8020; Kind: HDFS_DELEGATION_TOKEN, Service: 172.26.68.47:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for kuldeepk)
16/07/30 22:03:29 INFO impl.TimelineClientImpl: Timeline service address: http://support-3.support.com:8188/ws/v1/timeline/
16/07/30 22:03:29 INFO client.RMProxy: Connecting to ResourceManager at support-3.support.com/172.26.68.50:8050
16/07/30 22:03:29 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 24 for kuldeepk on ha-hdfs:support
16/07/30 22:03:29 INFO security.TokenCache: Got dt for hdfs://support; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:support, Ident: (HDFS_DELEGATION_TOKEN token 24 for kuldeepk)
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: number of splits:1
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1469916118318_0003
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: 172.26.68.47:8020, Ident: (HDFS_DELEGATION_TOKEN token 20 for kuldeepk)
16/07/30 22:03:29 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:support, Ident: (HDFS_DELEGATION_TOKEN token 24 for kuldeepk)
16/07/30 22:03:30 INFO impl.YarnClientImpl: Submitted application application_1469916118318_0003
16/07/30 22:03:31 INFO mapreduce.Job: The url to track the job: http://support-3.support.com:8088/proxy/application_1469916118318_0003/
16/07/30 22:03:31 INFO tools.DistCp: DistCp job-id: job_1469916118318_0003
16/07/30 22:03:31 INFO mapreduce.Job: Running job: job_1469916118318_0003
16/07/30 22:03:43 INFO mapreduce.Job: Job job_1469916118318_0003 running in uber mode : false
16/07/30 22:03:43 INFO mapreduce.Job:  map 0% reduce 0%
16/07/30 22:03:52 INFO mapreduce.Job:  map 100% reduce 0%
16/07/30 22:03:53 INFO mapreduce.Job: Job job_1469916118318_0003 completed successfully
16/07/30 22:03:53 INFO mapreduce.Job: Counters: 32
  File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=142927
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=346
    HDFS: Number of bytes written=45
    HDFS: Number of read operations=12
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
  Job Counters
    Launched map tasks=1
    Other local map tasks=1
    Total time spent by all maps in occupied slots (ms)=14324
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=7162
    Total vcore-seconds taken by all map tasks=7162
    Total megabyte-seconds taken by all map tasks=7333888
  Map-Reduce Framework
    Map input records=1
    Map output records=1
    Input split bytes=118
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=77
    CPU time spent (ms)=1210
    Physical memory (bytes) snapshot=169885696
    Virtual memory (bytes) snapshot=2337554432
    Total committed heap usage (bytes)=66584576
  File Input Format Counters
    Bytes Read=228
  File Output Format Counters
    Bytes Written=45
  org.apache.hadoop.tools.mapred.CopyMapper$Counter
    BYTESSKIPPED=0
    SKIP=1

Note – hwx-1.hwx.com is the Active Namenode of Cluster 1.

 

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! :)

 

 


How to configure Ambari Hive View for Kerberized cluster

How to configure Ambari Hive View for a Kerberized cluster – This tutorial has been successfully tried and tested on HDP 2.4.0.0 and Ambari 2.2.1.0.

 

My HDP cluster is Kerberized and Ambari has been configured for SSL.

 

Note – Steps are same for Ambari with or without SSL.

 

Please follow the steps below to configure the Hive View on a Kerberized HDP cluster.

 

Step 1 – Please configure your Ambari server for Kerberos using the steps mentioned in the article below (follow steps 1 to 5).

https://community.hortonworks.com/articles/40635/configure-tez-view-for-kerberized-hdp-cluster.html

 

Step 2 – Please add below properties to core-site.xml via Ambari UI and restart required services.

 

Note – If you are running Ambari Server as root user then add below properties

hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*

 

If you are running Ambari server as non-root user then please add below properties in core-site.xml

hadoop.proxyuser.<ambari-server-user>.groups=*
hadoop.proxyuser.<ambari-server-user>.hosts=*

 

Please replace <ambari-server-user> with user running Ambari Server in above example.

 

I’m assuming that your ambari server principal is ambari-server@REALM.COM, if not then please replace ‘ambari-server’ with your principal’s user part.

hadoop.proxyuser.ambari-server.groups=*
hadoop.proxyuser.ambari-server.hosts=*

 

Step 3 – Create a user directory on HDFS for the user accessing the Hive view. For example, in my case I'm using the admin user to access the Hive view.

 

sudo -u hdfs hadoop fs -mkdir /user/admin 
sudo -u hdfs hadoop fs -chown admin:hdfs /user/admin
sudo -u hdfs hadoop fs -chmod 755 /user/admin

 

Step 4 – Go to the Admin tab –> click Manage Ambari –> Views –> edit the Hive view (create a new one if it doesn't already exist) and configure the settings as given below.

 

Note – You may need to modify values as per your environment settings!

 

 

After above steps, you should be able to access your hive view without any issues. If you receive any error(s) then please check /var/log/ambari-server/ambari-server.log for more details and troubleshooting.

 

 

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! :)


How to configure Ambari File View with Namenode HA Kerberized

How to configure Ambari File View with NameNode HA on a Kerberized cluster – This tutorial has been successfully tried and tested on HDP 2.4.2.0 and Ambari 2.2.2.0.

 

I have my HDP Cluster Kerberized with Namenode HA.

 

Please follow the steps below to configure the File View on a Kerberized HDP cluster.

 

Step 1 – Please configure your Ambari server for Kerberos using the steps mentioned in the article below (follow steps 1 to 5).

https://community.hortonworks.com/articles/40635/configure-tez-view-for-kerberized-hdp-cluster.html

 

Step 2 – Please add below properties to core-site.xml via Ambari UI and restart required services.

 

Note – If you are running Ambari Server as root user then add below properties

hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*

 

If you are running Ambari server as non-root user then please add below properties in core-site.xml

hadoop.proxyuser.<ambari-server-user>.groups=*
hadoop.proxyuser.<ambari-server-user>.hosts=*

Please replace <ambari-server-user> with user running Ambari Server in above example.

 

I’m assuming that your ambari server principal is ambari-server@REALM.COM, if not then please replace ‘ambari-server’ with your principal’s user part.

hadoop.proxyuser.ambari-server.groups=*
hadoop.proxyuser.ambari-server.hosts=*

 

Step 3 – Create a user directory on HDFS for the user accessing the File view. For example, in my case I'm using the admin user to access the File view.

 

sudo -u hdfs hadoop fs -mkdir /user/admin 
sudo -u hdfs hadoop fs -chown admin:hdfs /user/admin
sudo -u hdfs hadoop fs -chmod 755 /user/admin

 

Step 4 – Go to the Admin tab –> click Manage Ambari –> Views –> edit the File view (create a new one if it doesn't already exist) and configure the settings as given below.

 

Note – You may need to modify values as per your environment settings!

 

 

 

After above steps, you should be able to access your file view without any issues. If you receive any error(s) then please check /var/log/ambari-server/ambari-server.log for more details and troubleshooting.

 

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! :)
