Cloudera Certified Developer for Apache Hadoop Exam (CCDH-410)

I passed the Cloudera Certified Hadoop Developer exam last month. Though I have been working with Hadoop components for quite some time, I always get nervous about sitting formal examinations. So clearing the exam on the first attempt, and with a good score, was quite a relief 🙂

Having taken the exam, I have to say it’s not one of those where you can cram through a book or tutorials overnight and pass the next day. On the contrary, you will most likely fail if you take that approach. I found the exam quite balanced, one of the few very good ones I have taken in the recent past. It doesn’t ask silly syntax questions or often-repeated, easily found details about Hadoop; rather, it tries to gauge your understanding of the framework, the internals of how it functions, its different components and how they fit together. Here are some tips which might be helpful while preparing for the examination.

Cloudera Training: Cloudera regularly organizes instructor-led classroom training for its Apache Hadoop developer program. It’s a 4-day program, and courses are conducted all over the world at regular intervals. You can find the detailed agenda and training calendar here. I attended one such training in Amsterdam, the Netherlands, and found it quite thorough and helpful during my preparation. Though both are managed by Cloudera, the training is not directly tailored to the exam. The two have different objectives; they overlap in many ways, but going through the training material and exercises will in no way guarantee that you clear the exam.

Exam Preparation Guide: Cloudera provides a comprehensive guide to the exam on their web-site here, listing the areas covered and the percentage split between them. They also provide quite a few tutorials and links to docs, all of which are helpful.

Books: A complete reading of Hadoop: The Definitive Guide by Tom White is a must for getting a very good grasp of Hadoop internals, and it will go a long way towards making you confident about the exam. Along with this, it would be great if you can also go through Hadoop in Action by Chuck Lam. Think of Hadoop: The Definitive Guide as a complete reference, while Hadoop in Action is filled with practical examples that are quite helpful when using Hadoop in your day-to-day work.

Practice Programs: It always helps to write Map/Reduce programs, use different input/output file formats, and write Hive and Pig scripts; essentially, get your hands dirty with the different components. You are free to download the individual components from the Apache web-sites, but you can also use the Cloudera QuickStart VM, which takes away the pain of putting everything together.
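If you want something to warm up with before touching a cluster, here is a minimal sketch of the map, shuffle and reduce steps for the classic word count. It is plain Java and deliberately does not use the Hadoop API; the class and method names are my own, hypothetical ones, and a real job would use Hadoop's Mapper and Reducer classes instead.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> shuffle -> reduce flow.
// Not the Hadoop API: every name in this class is hypothetical.
class WordCountSketch {

    // "Map" step: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // "Shuffle" step: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // "Reduce" step: sum all the values collected for one key.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs the three steps in sequence over a small in-memory input.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(emitted).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {be=2, do=1, not=1, or=1, to=3}
        System.out.println(wordCount(Arrays.asList("to be or not to be", "to do")));
    }
}
```

Once this flow feels natural, porting it to a real Hadoop job is mostly a matter of learning the framework's types and job configuration.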

Practical Tips

  • Though it’s a Cloudera managed certification program, there were no Cloudera product specific questions, for example about Cloudera Enterprise Manager
  • Cluster set-up, configuration and administration are not part of the exam. They are good to know for your own understanding and work, but no questions were asked on those areas
  • Actual exam delivery is managed by Pearson VUE, a completely independent organization. They have authorised test centres across all major cities in the world. You will need to make an appointment with your local Pearson VUE centre to take the exam. Detailed instructions are available here.
  • Exam vouchers normally have a time-bound validity, so do keep the last dates in mind and prepare accordingly.

I guess that preparing this way should serve you well not only for the exam but also for your future Hadoop related work. Good luck!

For those who are interested to see what a Cloudera certificate looks like, here is a sample for your motivation 🙂


Our First CSD course in USA

This being the first post of 2013, I would like to wish everyone a year full of joy, good health and success in your endeavours.

Coming to the actual topic, I agree it’s several weeks late, but this being the time of new resolutions, I decided to post it nevertheless, thinking “Better Late than Never” 🙂

Under the umbrella of Ceezone, Srinivas Chillara and I conducted our first training in the USA: a Certified Scrum Developer course in Oklahoma City from 5th to 7th December 2012. It was quite an interesting experience for us in many respects.

.NET based course: As you might be aware, CSD is a technical training with a lot of hands-on work during the entire 3-day period. For all our courses till date, we had used Java as the programming language for code samples, exercises and project work. Though the course is not about language intricacies and the concepts taught are language agnostic, it still helps if the examples and hands-on exercises are in the same language as your day-to-day work environment. When we came to know that most of the people attending the course had a .NET background, we decided to tailor our course and materials accordingly. We also included more information about relevant tools and frameworks for the .NET development environment. It took us some effort, but based on the feedback we received, I think it was well worth it. Another positive side-effect is that from now on we can easily conduct complete .NET or Java based courses and also let participants choose their preferred language for the coding exercises.

New Partnerships: To offer the course, we partnered with two organisations: Platinum Edge and Raman Tech. Both help a large number of organisations reap the maximum benefit of Agile methodologies through consulting, coaching and training services. Having interacted and worked with people from both organisations, I have great respect for the quality of their work. In case you are looking for coaching, consulting or training for your development or management teams, I would seriously suggest getting in touch with them.

Visiting Oklahoma: Last but not least, we got to see a little bit of Oklahoma City as well. Though our schedules were a bit tight, it was really nice spending a few days in the city and interacting with all the participants of the course. Attached are a few pictures I took during the course. Looking forward to many more of these assignments in the USA.


Manipulating Google Docs and Spreadsheets programmatically

Google Docs was the first Cloud based offering of an Office suite of applications. Since then many other players have entered the market. The most notable of them are Office 365 from Microsoft and the collaboration apps from Zoho. The OpenOffice team at Apache is also working on a Cloud version of Apache OpenOffice based on HTML 5.

This post is not a comparative analysis of the features of all these products but rather looks at a different use-case: one where we would like a programmatic interface to the underlying files and their data. In this area, Google Docs is the hands-down winner, as it has a comprehensive API for accessing and manipulating files stored on Google Cloud. None of the competitors has a similar offering for their users. Google has exposed the API in the form of a web-service, so one can use it from the programming language of one’s choice. In addition, they have provided client libraries in several popular languages for easily getting started. Let’s take a couple of examples.

Document List API: It provides global functions for programmatically accessing all your files stored on Google Docs, irrespective of their types. Through the API, I could:

  • List all files stored under Google docs for a given Google account
  • Filter all files based on their types like Documents, SpreadSheets, Presentations, PDFs etc
  • Create or Delete files
  • Upload and download files
  • Search contents

SpreadSheet API: This API provides more fine-grained access, specifically targeted at Spreadsheets stored in your Google Docs account. Again through the API, I could easily:

  • Access all Spreadsheets stored under a given Google account
  • Access different worksheets of a given SpreadSheet
  • Search cell details within a given range
  • Get details of a given cell, like its data, formulae etc

I have created a project demonstrating these capabilities through sample Java programs. The code, along with the necessary Java libraries and running instructions, is available on GitHub here. You should easily be able to execute the programs in your preferred IDE. In case of any issues, please drop a line in the comments section and I will be happy to look into them. As always, comments are more than welcome.

There are of course many more things which you can achieve through these APIs. For a complete reference, please refer to the online documentation for the Document List and SpreadSheet APIs.

Quick Tip : Installing Xcode on Mac OS-X 10.6.8

Xcode is Apple’s IDE for creating applications for Mac, iPhone and iPad. Apart from the SDKs for iPhone/iPad application development, it also comes bundled with the gcc compiler and other Unix utilities which are required for the installation of many other packages. In my case, I wanted to install gnuplot, a tool for generating graphs and plots from my data. Similar to APT (Advanced Packaging Tool) on Debian based Linux distributions, we have MacPorts for Mac OS. Through MacPorts, installation of gnuplot is just one command:

sudo port install gnuplot

But for MacPorts to work properly, it requires a fully functional installation of Xcode. There are 2 ways to get Xcode on your computer:

  • Download from Apple Developer Center : If you have Mac OS-X 10.7.x (Lion), you can install Xcode 4.x for free. Otherwise, you are required to pay for the software.
  • Install from Software DVD : Xcode 3.2.3 comes along with some other optional software on the Mac OS DVD. Though it might not have all the features of the latest release of Xcode, it is sufficient for many use-cases. At least that was the case for me.

Installation from Mac OS-X DVD

Starting the installation program from the DVD was pretty straight-forward. But towards the end, the installation failed without giving any reason. I tried once more, with the same result. I looked at the Apple Developer Center and, after a lot of searching around, found a downloadable version of the Xcode installer. It’s a huge file (more than 3.5 GB), so it took quite some time on my not so fast Internet connection, but I hoped that it would work fine. Again the same result, and I went searching again for possible solutions. Finally I found a solution on Stack Overflow which worked for me.

Essentially, I had to change my system’s date to 01/01/2012 and start the installation from the DVD, and it worked like a breeze. Afterwards I set my system’s date back to the correct value and everything works quite nicely again. Quite a weird cause and solution, but it is amazing to see Xcode, MacPorts and gnuplot finally working on my computer.

Hopefully, people will find this post before spending hours and hours on this problem.

QuickTip : DateFormat intricacies in Java

Java developers have probably used the Date and DateFormat classes innumerable times in their projects. The same is the case with me. While working with the DateFormat class, I came across a behavior which was rather unexpected for me, so I thought about writing a few lines, just in case it helps someone in a similar situation.

I was working on the server side of a Web Application. The UI is expected to send a date as a String, and the server has to convert it to a Date and do some manipulations before retrieving the information from the database and returning an appropriate response. Sounds simple, and a straight-forward case for using DateFormat. Here is a very standard code snippet which we might have written several times to do the work:

    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
    Date parsedDate = dateFormat.parse(incomingDateString);
    // Do further work...

In order to be safe, we also need to take into account that the incoming string may not be in the correct format, and the application should be capable of handling that gracefully. The code would be something like:

    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
    Date parsedDate = null;
    try {
      parsedDate = dateFormat.parse(incomingDateString);
    } catch (ParseException exc) {
      // Do some Error Handling
    }
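For anyone who wants to run the snippet above as-is, here is a minimal self-contained sketch of the same try/catch wrapped in a helper. The class name DateParsingSketch, the method parseOrNull and the choice to return null on failure are my own assumptions for illustration; in a real application the error handling would do whatever suits your design.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper around the snippet above; names are illustrative only.
class DateParsingSketch {

    // Returns the parsed date, or null when the input is missing
    // or cannot be parsed with the yyyy-MM-dd pattern.
    static Date parseOrNull(String incomingDateString) {
        if (incomingDateString == null) {
            return null; // parse(null) would throw a NullPointerException
        }
        // SimpleDateFormat is not thread-safe, so a new instance is created per call.
        SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd");
        try {
            return dateFormat.parse(incomingDateString);
        } catch (ParseException exc) {
            // Error handling is reduced to "no date" for this sketch.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("2012-03-15") != null); // prints true
        System.out.println(parseOrNull("not-a-date") != null); // prints false
    }
}
```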

To ensure that the error handling is correctly implemented, I wrote a JUnit test case, which is covered in the rest of the post.

Easily adding a Security Layer over Play! Web Application

I have been working on the development of a consumer facing social web application for the past few weeks. In our attempt to get the MVP (Minimum Viable Product) out at the earliest, we concentrated on building features first. Now that the first launch is done, we have started looking at some of the infrastructure related work. One of the most important items on our list is security improvement. Being a social web-site, we expect a lot of visitors, so we should have good safeguards against malicious use and for the safety of our data.

Problem Context: We have REST APIs for communicating with the back-end. A typical example is:

/api/users/{id}/?      Users.update

The signature of the Users.update() method is:

  public static void update(Long userId, User updatedUser) {
    // Find the User with the given userId in the DB
    // Update its properties from the passed updatedUser
    // Return the response
  }

As we are passing the user-id in the URL, the API is quite susceptible to misuse: anyone who can guess or enumerate ids can try to call it on behalf of another user.

Proposed Behavior: We decided to communicate through generated, unique and short-lived SessionIDs instead. The execution steps in the new flow would be something like:

  • The UI will request a SessionID for every user, and again at regular intervals
  • The server will generate SessionIDs and keep track of the associated user
  • The SessionID will be passed with each request along with the other required parameters
  • The server will validate the authenticity of the passed SessionID before processing any request
  • Back-end calls will look up the corresponding user information through the SessionID. Thus the UI will never send UserIDs as part of a request
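To make the steps above a bit more concrete, here is a minimal plain-Java sketch of the server side bookkeeping. It is not our actual implementation and does not use any Play API: the class SessionRegistrySketch, its methods, the use of random UUIDs as SessionIDs and the in-memory map are all assumptions made for illustration. The current time is passed in explicitly so the expiry logic is easy to test.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry of short-lived SessionIDs.
class SessionRegistrySketch {

    // What the server remembers for each issued SessionID.
    private static final class Session {
        final long userId;
        final long expiresAtMillis;

        Session(long userId, long expiresAtMillis) {
            this.userId = userId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Session> sessions = new ConcurrentHashMap<>();
    private final long ttlMillis;

    SessionRegistrySketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Generate a random SessionID and keep track of the associated user.
    String issue(long userId, long nowMillis) {
        String sessionId = UUID.randomUUID().toString();
        sessions.put(sessionId, new Session(userId, nowMillis + ttlMillis));
        return sessionId;
    }

    // Validate a SessionID and return its user id,
    // or null when the id is unknown or has expired.
    Long resolveUser(String sessionId, long nowMillis) {
        if (sessionId == null) {
            return null; // ConcurrentHashMap does not accept null keys
        }
        Session session = sessions.get(sessionId);
        if (session == null) {
            return null;
        }
        if (nowMillis >= session.expiresAtMillis) {
            sessions.remove(sessionId); // drop expired entries lazily
            return null;
        }
        return session.userId;
    }

    public static void main(String[] args) {
        SessionRegistrySketch registry = new SessionRegistrySketch(60_000L);
        long now = System.currentTimeMillis();
        String sessionId = registry.issue(42L, now);
        System.out.println(registry.resolveUser(sessionId, now));           // prints 42
        System.out.println(registry.resolveUser(sessionId, now + 60_000L)); // prints null
    }
}
```

With something like this in place, a back-end method such as Users.update() would first resolve the user from the SessionID and reject the request when the result is null, instead of trusting an id taken from the URL.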

Our Technology Stack: We are using Play 1.2.4 as the Web Application framework, with Java as the programming language for our back-end. For people not familiar with Play, it’s based on a simple stateless MVC architecture. For more information, please refer to their web-site.

Drawing on Live Video in Flex

While working on a project, I came across a requirement where we had to draw over a video being captured through a connected web-cam. We are using Flex for the purpose. I consider myself no more than a novice with the Flex APIs, so I searched a bit for possible approaches. I am not sure whether it’s my search skills or my lack of Flex knowledge, but I could not find anything easy for my requirement. I started looking a bit more into Flex containers and finally came up with a solution which is quite simple. I would like to share it here.