{"id":57409,"date":"2018-02-02T13:54:16","date_gmt":"2018-02-02T08:24:16","guid":{"rendered":"https:\/\/www.whizlabs.com\/?p=57409"},"modified":"2018-02-02T13:54:16","modified_gmt":"2018-02-02T08:24:16","slug":"pig-progression-with-hadoop-versions","status":"publish","type":"post","link":"https:\/\/www.whizlabs.com\/blog\/pig-progression-with-hadoop-versions\/","title":{"rendered":"Apache Pig Progression with Hadoop\u2019s Changing Versions"},"content":{"rendered":"<p style=\"text-align: justify\"><span lang=\"EN-GB\">Hadoop has grown in many ways to become accessible to technical people at every level. Java programmers still have an edge when it comes to Hadoop development. However, if you are new to languages like Java or Jython, no worries! Apache Pig can handle all kinds of data manipulation, on both structured and unstructured data, which is why \u201cPig\u201d and \u201cHadoop\u201d are so closely linked within the Hadoop family. <\/span><span lang=\"EN-GB\">The purpose of Apache Pig is to generate MapReduce jobs over large data sets, so that you do not have to write them by hand in complex Java code.<\/span><\/p>\n<blockquote><p><a href=\"https:\/\/www.whizlabs.com\/blog\/why-java-developers-should-learn-hadoop\/\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2018\/02\/Should-Learn-Hadoop.jpg\" alt=\"Learn Hadoop\" width=\"728\" height=\"90\" class=\"aligncenter size-full wp-image-57718\" \/><\/a><\/p><\/blockquote>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Hadoop has evolved over the years, driven by growing user demand for analysing massive amounts of data. Consequently, every Hadoop component has gained new features with each release, and Apache Pig is no exception. 
We will take a closer look at what changed in the major Apache Pig releases in this article.<\/span><\/p>\n<h2 style=\"text-align: justify\"><b><span lang=\"EN-GB\">Apache Pig in a Few Words<\/span><\/b><\/h2>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Apache Pig is a high-level platform that makes a Hadoop developer&#8217;s life easier when writing complex data transformations. Its scripting language, Pig Latin, is a SQL-like procedural language that is easy to pick up for users who know other scripting languages. For real business problems, Pig&#8217;s User Defined Functions (UDF) feature is especially useful: it lets Pig scripts invoke code written in other languages such as Java and JRuby. Developers can also embed Pig scripts in other languages.<\/span><\/p>\n<figure id=\"attachment_57416\" aria-describedby=\"caption-attachment-57416\" style=\"width: 700px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2018\/01\/pig.jpg\"><img decoding=\"async\" src=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2018\/01\/pig.jpg\" alt=\"Apache Pig\" width=\"700\" height=\"379\" class=\"wp-image-57416 size-full\" \/><\/a><figcaption id=\"caption-attachment-57416\" class=\"wp-caption-text\">(Image source: https:\/\/www.safaribooksonline.com\/library\/view\/hdinsight-essentials\/9781849695367\/graphics\/5367OS_06_04.jpg)<\/figcaption><\/figure>\n<h4 style=\"text-align: justify\"><b><span lang=\"EN-GB\">Why is Apache Pig Useful When Hadoop Has MapReduce?<\/span><\/b><\/h4>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Both MapReduce and Pig process data. However, MapReduce works at a low level of abstraction, whereas Pig processes large data sets at a <strong>higher level of abstraction<\/strong>. Pig compiles its transformations into a series of MapReduce jobs. 
Beyond that, there are further framework-level differences between MapReduce processing and Pig processing.<\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">With Pig Latin, you can perform almost all the standard data-processing operations, such as group by, join, filter, union, and order by. MapReduce, by contrast, directly provides only grouping. Operations such as order by, filter, projection, and join are not built in, so the user has to write custom code for them.<\/span><\/p>\n<h2><b><span lang=\"EN-GB\">Apache Pig Hadoop Versions Over the Years<\/span><\/b><\/h2>\n<p><span lang=\"EN-GB\">Since its incubation, Apache Pig has evolved through twenty-four releases alongside successive versions of Hadoop.<\/span><\/p>\n<h4><b><span lang=\"EN-GB\">Apache Pig Evolution in the Hadoop 1.0 Series<\/span><\/b><\/h4>\n<p><span lang=\"EN-GB\">The first release of Apache Pig came with Hadoop 0.18, while Pig was still in incubation. However, it was not a stable release from a Hadoop perspective. The next Apache Pig release, a maintenance release, was the first one made as a Hadoop subproject. 
The following changes to Pig functionality and performance arrived in the subsequent releases, from Pig 0.1.1 up to 0.10.<\/span><\/p>\n<p><b><span lang=\"EN-GB\">Features included<\/span><\/b><\/p>\n<ul>\n<li><span lang=\"EN-GB\">A fivefold performance gain<\/span><\/li>\n<li><span lang=\"EN-GB\">Multi-query optimization (which allows sharing computation across multiple queries within a single Pig script)<\/span><\/li>\n<li><span lang=\"EN-GB\">Two new joins \u2013 skewed join and merge join<\/span><\/li>\n<li><span lang=\"EN-GB\">Performance and memory usage improvements<\/span><\/li>\n<li><span lang=\"EN-GB\">The Accumulator interface for UDFs<\/span><\/li>\n<li><span lang=\"EN-GB\">New LoadFunc and StoreFunc interfaces<\/span><\/li>\n<li><span lang=\"EN-GB\">Custom partitioner<\/span><\/li>\n<li><span lang=\"EN-GB\">Python UDFs<\/span><\/li>\n<li><span lang=\"EN-GB\">Control structures, query parser changes, and semantic cleanup<\/span><\/li>\n<\/ul>\n<p><em><span lang=\"EN-GB\"><strong>The most significant release of Apache Pig with Hadoop 1.0 was version 0.10.0.<\/strong> <\/span><\/em><\/p>\n<figure id=\"attachment_57709\" aria-describedby=\"caption-attachment-57709\" style=\"width: 336px\" class=\"wp-caption alignright\"><a href=\"https:\/\/www.whizlabs.com\/blog\/best-hadoop-certification-in-2018\/\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" src=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2018\/02\/Want-to-validate-your-Hadoop-knowledge_-Here-is-the-list-of-best-Hadoop-certifications.jpg\" alt=\"Hadoop certifications\" width=\"336\" height=\"280\" class=\"wp-image-57709 size-full\" \/><\/a><figcaption id=\"caption-attachment-57709\" class=\"wp-caption-text\">Best Hadoop Certifications<\/figcaption><\/figure>\n<p><b><span lang=\"EN-GB\">Features 
included<\/span><\/b><\/p>\n<ul>\n<li><span lang=\"EN-GB\">Boolean datatype<\/span><\/li>\n<li><span lang=\"EN-GB\">JRuby UDF<\/span><\/li>\n<li><span lang=\"EN-GB\">Nested cross\/foreach<\/span><\/li>\n<li><span lang=\"EN-GB\">Limit by expression<\/span><\/li>\n<li><span lang=\"EN-GB\">Split default destination<\/span><\/li>\n<li><span lang=\"EN-GB\">Map-side aggregation<\/span><\/li>\n<li><span lang=\"EN-GB\">Tuple\/bag\/map syntax support<\/span><\/li>\n<li><span lang=\"EN-GB\">Source code only distribution<\/span><\/li>\n<li><span lang=\"EN-GB\">Better support for Apache Hadoop 2 with different Maven artifacts<\/span><\/li>\n<li><span lang=\"EN-GB\">Better support for Oracle JDK 7<\/span><\/li>\n<\/ul>\n<h4><b><span lang=\"EN-GB\">Apache Pig Evolution in Hadoop 2.0 Onwards<\/span><\/b><\/h4>\n<p>Hadoop 2.x differs significantly from Hadoop 1.x in many ways. It offers:<\/p>\n<ul>\n<li><span lang=\"EN-GB\">Greater scalability with YARN<\/span><\/li>\n<li><span lang=\"EN-GB\">The ability to run non-MapReduce jobs<\/span><\/li>\n<li><span lang=\"EN-GB\">High availability for NameNodes<\/span><\/li>\n<li><span lang=\"EN-GB\">Native Windows support<\/span><\/li>\n<li><span lang=\"EN-GB\">Higher cluster utilization<\/span><\/li>\n<li><span lang=\"EN-GB\">Processing beyond the batch approach<\/span><\/li>\n<\/ul>\n<p><span lang=\"EN-GB\">Hence, it demands better performance from utility tools like Pig.<\/span><\/p>\n<p><em><strong><span lang=\"EN-GB\">The first major release of Apache Pig in the Hadoop 2.x series is 0.12.0.<\/span><\/strong><\/em><\/p>\n<p><b><span lang=\"EN-GB\">Features included<\/span><\/b><\/p>\n<ul>\n<li><span lang=\"EN-GB\">ASSERT operator \u2013 For data validation<\/span><\/li>\n<li><span lang=\"EN-GB\">Streaming UDF \u2013 For writing UDFs in languages that do not run on the JVM<\/span><\/li>\n<li><span lang=\"EN-GB\">New AvroStorage \u2013 
Works as a Pig built-in function and is faster<\/span><\/li>\n<li><span lang=\"EN-GB\">IN\/CASE operators<\/span><\/li>\n<li><span lang=\"EN-GB\">BigInteger and BigDecimal data types &#8211; Some applications need calculations with a high degree of precision, and these types make such precise calculations possible.<\/span><\/li>\n<\/ul>\n<p><span lang=\"EN-GB\">Non-MapReduce engines come into scope from Hadoop 2.x onwards. Hence, <strong>Apache Pig 0.13.0<\/strong> brought the changes needed to run on Hadoop&#8217;s non-MapReduce engines. It also included:<\/span><\/p>\n<ul>\n<li><span lang=\"EN-GB\">Auto-local mode, which runs jobs with small input data in-process<\/span><\/li>\n<li><span lang=\"EN-GB\">Fetch optimization<\/span><\/li>\n<li><span lang=\"EN-GB\">Fixed counters for local mode<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">With the introduction of the high-performance Apache Tez engine, Hadoop data processing scaled up from terabytes to petabytes. The main feature of Apache Pig 0.14.0 is Pig on Tez, and the release also introduced ORC file support. However, Pig on Tez was stabilized only in a later release, which also improved Tez auto-parallelism.<\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">The <strong>latest release of Apache Pig<\/strong>, 0.17.0, introduced Pig on Spark, an engine that is already a strong performer in the Hadoop ecosystem.<\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Hadoop is progressing, and Hadoop 3.0 is already on the market with a few enhancements. Hence, we can expect new features in the next Pig release.<\/span><\/p>\n<h4 style=\"text-align: justify\"><b><span lang=\"EN-GB\">Bottom Line<\/span><\/b><\/h4>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Working in a Hadoop environment means working with the Hadoop ecosystem and the tools it supports. 
Similarly, once you work with Pig and Hadoop together, you will get a better picture of how they fit. Hence, if your passion is to become a big data Hadoop architect or developer, you must be familiar with the entire ecosystem.<\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">However, passion does not fulfil itself unless you set a goal for it. Moreover, Hadoop is a vast area to cover, and you need the right orientation. Following the path of a renowned certification in this field is probably the best and most useful roadmap to reach that goal. Cloudera is the most sought-after platform for Hadoop, and its CCA Administrator (CCA-131) certification covers the entire Hadoop ecosystem, including tools like Apache Pig, Hive, and Impala.<\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">Whizlabs gives you an opportunity to gain broad knowledge of the subject matter through its self-study guide &#8211; <\/span><span lang=\"EN-GB\"><a href=\"https:\/\/www.whizlabs.com\/cloudera-cca-admin-certification\/\" target=\"_blank\" rel=\"noopener\">Cloudera Certified Associate Administrator (CCA-131) Certification.<\/a><\/span><\/p>\n<p style=\"text-align: justify\"><span lang=\"EN-GB\">It provides complete coverage of the certification preparation, including hands-on practice. Hence, leverage the power of knowledge with us and become a successful Hadoop professional of tomorrow!<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hadoop has scaled up in many ways to open up wings of all levels of technical people. Of course, Java programmers have the edge over others when it comes to Hadoop development. However, if you are new or unknown to high-level languages like Java or Jython, no worries! 
Apache Pig is there to do all kinds of data manipulations whether structured or unstructured data. Certainly, it makes \u201cPig Hadoop\u201d the interrelated terms in the Hadoop family. The purpose of Apache Pig is to create the MapReduce jobs on large data-sets instead of executing them by writing complex Java codes. However, [&hellip;]<\/p>\n","protected":false},"author":220,"featured_media":57612,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[6],"tags":[151,637,1058,1180,1181],"class_list":["post-57409","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","tag-apache-pig-evolution","tag-cloudera","tag-mapreduce-jobs","tag-pig-big-data","tag-pig-hadoop"],"uagb_featured_image_src":{"full":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",560,315,false],"thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop-150x150.jpg",150,150,true],"medium":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop-300x169.jpg",300,169,true],"medium_large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",560,315,false],"large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",560,315,false],"1536x1536":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",560,315,false],"2048x2048":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",560,315,false],"profile_24":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",24,14,false],"profile_48":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",48,27,false],"profile_96":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",96,54,false],"profile_150":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",150,84,false],"profile_300":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",300,169,false],"tptn_thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop-250x250.jpg",250,250,true],"web-stories-poster-portrait":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018
\/01\/Hadoop.jpg",560,315,false],"web-stories-publisher-logo":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",96,54,false],"web-stories-thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2018\/01\/Hadoop.jpg",150,84,false]},"uagb_author_info":{"display_name":"Aditi Malhotra","author_link":"https:\/\/www.whizlabs.com\/blog\/author\/aditi\/"},"uagb_comment_info":5,"uagb_excerpt":"Hadoop has scaled up in many ways to open up wings of all levels of technical people. Of course, Java programmers have the edge over others when it comes to Hadoop development. However, if you are new or unknown to high-level languages like Java or Jython, no worries! Apache Pig is there to do all&hellip;","_links":{"self":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/57409","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/users\/220"}],"replies":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/comments?post=57409"}],"version-history":[{"count":0,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/57409\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media\/57612"}],"wp:attachment":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media?parent=57409"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/categories?post=57409"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/tags?post=57409"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}