An HDP Certified Administrator (HDPCA), or Hortonworks Data Platform Certified Administrator, is a Hadoop system administrator who is capable of, and responsible for, installing, configuring, and supporting an HDP cluster. The certification is a hands-on, performance-based exam that requires practical competency and Big Data expertise. Tasks are performed on AWS instances and must be completed by the candidate within a limited time frame.
HDPCA Exam Overview
The objectives of the HDPCA certification fall into 5 main categories:
- Installation: Administrators must be able to install and deploy an HDP Cluster, with or without Internet Access. The installation procedures involve configuring the machine and tweaking a few operating system resources to serve as effective nodes of a cluster.
- Configuration: Proper configuration of HDP services is a must for delivering accurate and precise results. Errors that cause system downtime, user access controls, runtime information about current processes, and other relevant details can be found only when the cluster is configured as per the business requirements.
- Troubleshooting: What should you do when an unforeseen exception arises in the daily processing schedule? This section covers how to perform root cause analysis and get to the source of an error. Reading the logs, setting different logging levels, etc. fall under this category.
- High Availability: When systems are set up to serve business requirements 24 x 7 x 365, it is not prudent to assume that a single system will run for that long without any downtime. In such cases, core Hadoop services can be made Highly Available, i.e. accessible to the client at all times, to ensure data access is never disrupted. Administrators are responsible for creating such an automatic failover setup on the deployed cluster.
- Security: Security is the need of the hour for organizations across the globe. The data generated or stored on the Hadoop FileSystem needs to be secure from all attacks. Data loss in today’s world has become a big challenge and administrators need to ensure that there is no loophole in securing the data on their cluster.
The first step towards HDPCA exam preparation is to understand the Hadoop terminology.
- Aspiring candidates can take the exam at any time and location of their choosing, with 24/7 access. They must have a laptop, a good working Internet connection, and a webcam.
- The exam is based on Hortonworks Data Platform (HDP) 2.3 and managed using Ambari 2.1.0. A 4-node cluster based on chargeable AWS instances will need to be provisioned during the exam by the candidate. The exam duration is 2 hours.
- The cost of the examination is USD 250.
To register for the exam right away, head over to the Hortonworks website and navigate to the Certifications category. Thereafter, under the Registration Section, head over to examslocal.com and create an account. Choose a date as per your convenience and schedule your HDPCA certification!
Your server connection, the region of the HDP cluster, etc. are important details that will be asked for during registration, so fill them out carefully to avoid last-minute hassles. For example, you may experience latency if your chosen HDP server is in a region far from where you are taking the exam.
HDPCA candidates have the flexibility of taking the exam from any location they deem fit. All that is required is a laptop, a good Internet connection, and a webcam. (And don’t forget the exam fee.)
During the Exam
Usually, 5-6 tasks are given to candidates appearing for the certification exam. These tasks are drawn from the official list of HDPCA tasks published by Hortonworks. Candidates who have prepared thoroughly from the Official HDPCA Datasheet have no cause for worry.
The total time allotted to complete the exam is 2 hours.
You will require a webcam and a good Internet connection for the exam. In case a candidate has difficulty memorizing all the concepts, the HDP 2.3 documentation pages will be made available during the examination.
How to Prepare for HDPCA Certification Exam?
- Hortonworks has made available a list of HDPCA tasks that can fully prepare a candidate for this certification. A candidate who works through these tasks thoroughly should not face any difficulty in attaining a good score in the HDPCA certification. Most of these tasks, along with a general introduction to Apache Hadoop, are included in the Whizlabs HDPCA Online Course.
- Candidates are also encouraged to read the Linux man pages and the official Apache Hadoop documentation for a detailed explanation of the concepts. These pages also contain information on recently discovered bugs in a particular service and the threads currently open for development.
- Additionally, candidates should definitely take the HDPCA Practice Exam. This exam helps you gauge the level of difficulty to expect from the questions in this certification. Depending on your level of comfort during the practice examination, you can estimate how well prepared you are for the certification.
Best Books for HDPCA Exam Preparation
Books play an important role in passing a certification exam, so choosing good books matters if you want to pass the HDPCA certification exam. Here are some of the best Big Data Hadoop administration books that will help you learn the Hadoop curriculum and become an expert in Hadoop administration.
The most referred Hadoop Admin books for HDPCA Certification are –
1. Expert Hadoop 2 Administration – Managing Spark, YARN, and MapReduce by Sam R. Alapati
This Hadoop administrator book is a complete guide to creating, configuring, securing, managing, and optimizing Hadoop clusters in different environments. It helps you learn about the execution of Spark and MapReduce applications in a Hadoop cluster, HDFS commands, management and protection of Hadoop data, and so on.
2. Hadoop Operations and Cluster Management Cookbook by Shumin Guo
Hadoop Operations and Cluster Management Cookbook helps you learn how to design and manage a Hadoop cluster. It is intended for both newcomers and experienced administrators. Through this book, you will learn how Hadoop works and the concepts of Hadoop administration. You will also learn how to install, configure, and monitor a Hadoop cluster.
3. Hadoop Operations – A Guide for Developers and Administrators by Eric Sammer
This Hadoop administrator book is for those who are interested in learning how to maintain large and complex Hadoop clusters. In this book, the author teaches you the basics of running Hadoop, from planning, installing, and configuring to maintenance. You will also learn about Hadoop deployment, resource management in a Hadoop cluster, and Hadoop troubleshooting.
4. Hadoop: The Definitive Guide, 4th Edition, Storage and Analysis at Internet Scale by Tom White
Hadoop: The Definitive Guide is a comprehensive guide that helps you learn how to build and maintain reliable, distributed systems on a Hadoop cluster. This book is ideal for administrators who want to learn how to set up and run Hadoop clusters. It also contains a few chapters on YARN and on some Hadoop-related projects.
5. Hadoop 2.x Administration Cookbook – Administer and Maintain Large Apache Hadoop Clusters by Gurmukh Singh
This Hadoop administrator book is intended for anyone who wants a better understanding of how to maintain a Hadoop cluster at the HDFS layer using MapReduce and YARN. It covers the concepts of troubleshooting and diagnostics in Hadoop administration. You will also learn how to secure, encrypt, and configure Hadoop clusters.
Tips and Tricks for Aspiring HDPCA Administrators
We know that being an HDPCA can be a challenging and daunting task. Handling a plethora of services, all at once, to serve client requests at all times means that administrators have a lot on their plate. So, here are a few tips and tricks that might help you troubleshoot operations and failed jobs while administering a cluster.
The following operations have been performed on RHEL 6 and 7 systems operating on a 4-node cluster with HDP 2.3.
General Points Regarding SSH
While setting up SSH keys, it is extremely important to pay attention to the public keys added to the host user’s .ssh directory, under the user’s home folder at /home/<user>. This directory contains a file called authorized_keys, which contains a list of the public keys of the remote hosts that are allowed to authenticate on this machine. So, in case you receive a ‘Permission denied’, ‘Connection refused’, ‘Host refused to connect’ or any similar error, the first step should be to check this file on the affected host.
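Besides the contents of authorized_keys, file permissions are a common culprit: with sshd's default StrictModes setting, keys are ignored when the .ssh directory or the key file is too permissive. The following is a minimal sketch of such a check, not an official procedure. The function name check_ssh_perms is hypothetical, and it assumes GNU stat (as shipped on RHEL) and the conventional modes 700 and 600.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an SSH directory and its
# authorized_keys file have the modes sshd usually expects (700 and 600).
# Assumes GNU stat, which provides the -c option.
check_ssh_perms() {
    local ssh_dir="$1"
    local keys="$ssh_dir/authorized_keys"

    if [ ! -f "$keys" ]; then
        echo "missing: $keys"
        return 1
    fi

    local dir_mode file_mode
    dir_mode=$(stat -c '%a' "$ssh_dir")
    file_mode=$(stat -c '%a' "$keys")

    if [ "$dir_mode" = "700" ] && [ "$file_mode" = "600" ]; then
        echo "ok"
    else
        echo "fix: chmod 700 $ssh_dir && chmod 600 $keys"
        return 1
    fi
}

# Example (run on the host that rejects the connection):
#   check_ssh_perms /home/<user>/.ssh
```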
Additionally, instead of remembering the private IPs of the nodes in your cluster, you can use aliases. Linux systems allow naming a host based on its IP in the /etc/hosts file. Add one line per host, with the private IP first and the host name after it, separated by a space. Perform this operation on each node in your cluster for easier referencing of nodes while operating. In case you are administering a cluster of over 100 nodes, a naming convention can be used to differentiate nodes by their roles.
For example, the datanodes can be aliased as ‘dn-<machineName>’ while the namenode can be aliased as ‘nn-<machineName>’. Prefer hyphens over underscores, since underscores are not valid characters in host names.
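As a rough illustration of the /etc/hosts layout described above, the sketch below turns a list of "IP name" pairs into hosts entries. The function name render_hosts_entries, the file nodes.txt, and any addresses or node names you feed it are assumptions made for the example; hyphenated names are a safe choice because underscores are not valid in host names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "IP name" pairs read from stdin into
# /etc/hosts lines. Blank lines and lines starting with # are skipped.
render_hosts_entries() {
    local ip name _
    while read -r ip name _ || [ -n "$ip" ]; do
        case "$ip" in
            ''|'#'*) continue ;;
        esac
        # IP address first, host name second, as /etc/hosts expects.
        printf '%-15s %s\n' "$ip" "$name"
    done
}

# Example: preview the entries, then append them as root on every node.
#   render_hosts_entries < nodes.txt
#   render_hosts_entries < nodes.txt | sudo tee -a /etc/hosts
```

Generating the entries from a single list keeps the file identical on every node, which matters more as the cluster grows.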
Here are a few tips and pointers that will assist you in the entire Administration process:
- How to Copy files using SSH(scp)
To copy fileA.txt from system2 at location /home/userA/ to system1 at location /home/userB/
$> scp userA@system2:/home/userA/fileA.txt /home/userB/
- Installation of Apache Server
On RHEL 6 systems, installation of the apr, apr-util and pcre2 packages is necessary prior to the installation of your local web server, because the Apache server requires these packages to be pre-installed. However, these packages may not be present on a freshly installed system and must then be installed by the administrator.
$> sudo yum install -y apr
$> sudo yum install -y apr-util
$> sudo yum install -y pcre2
- During the Preparing Environment stage of HDP installation, when services are configured on each system, it is recommended to put all the commands in one bash script, use scp to copy it to the different hosts of your cluster, make it executable, and run it remotely using SSH.
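The copy-and-run approach from the tip above might look like the following sketch. The function name run_on_hosts, the script name prepare_env.sh, the host names, and the REMOTE_USER and DRY_RUN variables are all placeholders invented for this example; it also assumes passwordless SSH is already set up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy one preparation script to every host and run it.
# With DRY_RUN=1 the commands are only printed, not executed.
run_on_hosts() {
    local script="$1"; shift
    local user="${REMOTE_USER:-root}"
    local host remote
    remote="/tmp/$(basename "$script")"

    for host in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp $script $user@$host:$remote"
            echo "ssh $user@$host \"chmod +x $remote && $remote\""
        else
            # Copy the script, make it executable, then run it remotely.
            scp "$script" "$user@$host:$remote"
            ssh "$user@$host" "chmod +x '$remote' && '$remote'"
        fi
    done
}

# Example: check the generated commands first, then run for real.
#   DRY_RUN=1 run_on_hosts ./prepare_env.sh node1 node2 node3 node4
#   run_on_hosts ./prepare_env.sh node1 node2 node3 node4
```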
- NameNode fails to start on one or all of the nodes: what to do?
How to force the NameNode out of safe mode?
Check the Ambari logs while the ‘NameNode’ start operation is running in the Administration Status Panel in Ambari. If the logs state that safe mode is on and the NameNode is unable to leave that state, then:
- Open Terminal
- Run hadoop dfsadmin -safemode get: this should return the current status of the NameNode. If it returns ‘Safe mode is ON’, then run the following command:
$> hadoop dfsadmin -safemode leave
This operation forces the NameNode out of safe mode so that the pending start operation can complete. However, it would be imprudent to ignore the risks associated with HDFS data replication/under-replication. So, check block replication and perform a load balancing operation on HDFS once the NameNode has been forced out of safe mode.
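The safe mode check can also be scripted. The sketch below is one possible way to do it, assuming the status text contains "Safe mode is ON" when safe mode is active; the function name safemode_is_on is hypothetical. The example uses hdfs dfsadmin, the non-deprecated equivalent of hadoop dfsadmin in Hadoop 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide from the text printed by
# `hdfs dfsadmin -safemode get` whether safe mode is still on.
safemode_is_on() {
    case "$1" in
        *"Safe mode is ON"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (run as the HDFS superuser on a cluster node):
#   status=$(hdfs dfsadmin -safemode get)
#   if safemode_is_on "$status"; then
#       hdfs dfsadmin -safemode leave
#       hdfs fsck / | tail -n 20   # look for missing or under-replicated blocks
#   fi
```

Running fsck afterwards is a cautious follow-up rather than a required step: it shows whether blocks are missing or under-replicated before normal jobs resume.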
- Searching within HDFS
To search for a particular file in HDFS, list the directory tree recursively (note the uppercase -R) and filter the output:
$> hadoop fs -ls -R <HDFS Path> | grep <filename>
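- Display directory size in HDFS in human-readable units (KB, MB, GB)
Run the following command; -s prints a summary for the directory and -h formats the size in readable units:
$> hadoop fs -du -s -h <HDFS Path>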
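- Clear system memory caches (page cache, dentries and inodes) on RHEL systems
As root, execute:
$> sync; echo 3 > /proc/sys/vm/drop_caches
If you are not logged in as root, run the whole command through sudo, because a plain sudo prefix does not apply to the output redirection:
$> sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'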
- Installation of Hue: after installation, you encounter the following error:
‘WebHdfsException at /filebrowser/
SecurityException: Failed to obtain user group information: org.apache.hadoop.security.authorize.AuthorizationException: User: hue is not allowed to impersonate <your_system_user> (error 403)’
There are two things to be done in such a case. First, stop all the services of your cluster using the Ambari Administration Dashboard. Thereafter, edit the Hue configuration file, located at /etc/hue/conf/hue.ini:
- append /webhdfs/v1 to the webhdfs URL in the hadoop section; and
- change the webserver user to hduser in the desktop section.
Restart the Hue system service after editing the configuration file. Then restart the Ambari Server once, and finally navigate to your Hue Remote Path.
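To confirm the first of the two hue.ini changes before restarting anything, a quick check such as the sketch below may help. It assumes the WebHDFS setting is stored under the key webhdfs_url, as in stock hue.ini files; the function name webhdfs_url_ok is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given hue.ini contains a
# webhdfs_url setting that ends with /webhdfs/v1.
webhdfs_url_ok() {
    grep -Eq '^[[:space:]]*webhdfs_url[[:space:]]*=.*/webhdfs/v1/?[[:space:]]*$' "$1"
}

# Example:
#   webhdfs_url_ok /etc/hue/conf/hue.ini || echo "append /webhdfs/v1 to webhdfs_url"
```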
- Kafka Broker fails to start after Kerberos is disabled?
Append the path ‘/newroot’ to the zookeeper.connect property.
So, effectively, the new zookeeper.connect property has the value of the existing connection string followed by /newroot.
Ensure that the ZooKeeper connect port is 2181 and that it is not being used by any other service on your system.
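One way to see which ports already have a listener is to inspect the output of ss -ltn (from the iproute package; netstat -tln prints similar information). The sketch below is only an illustration: the function name port_in_use is hypothetical, and it simply looks for any address field ending in the given port. Use it before ZooKeeper is started, since a running ZooKeeper will itself show up as the listener on 2181.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` style output on stdin and succeed
# if any listening socket is bound to the given port.
port_in_use() {
    local port="$1"
    # Match address fields such as 0.0.0.0:2181, *:2181 or :::2181.
    awk -v p="$port" '
        { for (i = 1; i <= NF; i++) if ($i ~ (":" p "$")) found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example:
#   ss -ltn | port_in_use 2181 && echo "port 2181 already has a listener"
```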
HDPCA Certification exam validates the skills and knowledge of the Hadoop System Administrator. The above sections define the complete guide involving tricks and tips that need to be followed while administering a Hadoop cluster. These tips and tricks will help you pass the Hadoop Certified Administrator – HDPCA Certification exam in the first attempt.
So, what are you waiting for? Get started and unlock a future full of bright opportunities! Begin your preparation for the HDPCA Certification exam now!