{"id":48519,"date":"2017-12-10T22:24:35","date_gmt":"2017-12-10T16:54:35","guid":{"rendered":"https:\/\/www.whizlabs.com\/?p=48519"},"modified":"2021-01-28T08:10:53","modified_gmt":"2021-01-28T08:10:53","slug":"do-you-need-hadoop-to-run-spark","status":"publish","type":"post","link":"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/","title":{"rendered":"Do You Need Hadoop to Run Spark?"},"content":{"rendered":"<p style=\"text-align: justify;\"><span lang=\"EN-GB\"><a href=\"https:\/\/www.whizlabs.com\/blog\/hadoop-administrator-responsibilities\/\" target=\"_blank\" rel=\"noopener noreferrer\">Hadoop<\/a> and <a href=\"https:\/\/www.whizlabs.com\/blog\/best-big-data-certifications\/\" target=\"_blank\" rel=\"noopener noreferrer\">Apache Spark<\/a> both are today\u2019s booming open-source Big data frameworks. Though Hadoop and Spark don\u2019t do the same thing, however, they are inter-related. The need for Hadoop is everywhere for Big data processing. However, Hadoop has a major drawback despite its many important features and benefits for data processing. MapReduce which is the native batch processing engine of Hadoop is not as fast as Spark. <\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">And that\u2019s where Spark takes an edge over Hadoop. In addition to that, most of today&#8217;s big data projects demand batch workload as well as real-time data processing. Hadoop&#8217;s MapReduce isn&#8217;t cut out for it and can process only batch data. Furthermore, when it is time to low latency processing of a large amount of data, MapReduce fails to do that. Hence, we need to run Spark on top of Hadoop. 
With Spark\u2019s hybrid framework and resilient distributed dataset (<a href=\"https:\/\/www.whizlabs.com\/blog\/spark-rdd\/\" target=\"_blank\" rel=\"noopener\">Spark RDD<\/a>), data can be stored transparently in memory while Spark runs.<\/span><\/p>\n<p><a href=\"https:\/\/www.whizlabs.com\/blog\/big-data-careers\/\" target=\"_blank\" rel=\"noopener noreferrer\"><img decoding=\"async\" class=\"wp-image-49555 size-full aligncenter\" src=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2017\/12\/Spark-Developer-or-Hadoop-Admin_.jpg\" alt=\"Spark Developer or Hadoop Admin\" width=\"728\" height=\"90\" \/><\/a><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">But does that mean Hadoop is always needed to run Spark? Let\u2019s look at the technical details.<\/span><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_76 ez-toc-wrap-left counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #ea7e02;color:#ea7e02\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #ea7e02;color:#ea7e02\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 
6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/#Need_of_Hadoop_to_Run_Spark\" >Need of Hadoop to Run Spark<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/#Different_Ways_to_Run_Spark_in_Hadoop\" >Different Ways to Run Spark in Hadoop<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/#You_can_Run_Spark_without_Hadoop_in_Standalone_Mode\" >You can Run Spark without Hadoop in Standalone Mode<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/#Why_Enterprises_Prefer_to_Run_Spark_with_Hadoop\" >Why Enterprises Prefer to Run Spark with Hadoop?<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.whizlabs.com\/blog\/do-you-need-hadoop-to-run-spark\/#How_Can_You_Run_Spark_without_HDFS\" >How Can You Run Spark without HDFS?<\/a><\/li><\/ul><\/nav><\/div>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Need_of_Hadoop_to_Run_Spark\"><\/span><span lang=\"EN-GB\">Need of Hadoop to Run Spark<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Hadoop and Spark are not mutually exclusive and can work together. 
Real-time, fast data processing in Hadoop is not possible without Spark. On the other hand, Spark doesn&#8217;t have any file system for distributed storage. However, many Big Data projects deal with multiple petabytes of data that need to be stored in distributed storage. Hence, in such a scenario, Hadoop&#8217;s distributed file system (HDFS) is used along with its resource manager, YARN. <\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Furthermore, to run Spark in a distributed mode, it is installed on top of YARN. Then Spark\u2019s advanced analytics applications are used for data processing. If you run Spark in a distributed mode using HDFS, you can achieve maximum benefit by connecting all projects in the cluster. Hence, HDFS is the main reason Hadoop is needed to run Spark in distributed mode.<\/span><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Different_Ways_to_Run_Spark_in_Hadoop\"><\/span><span lang=\"EN-GB\">Different Ways to Run Spark in Hadoop<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">There are three ways to deploy and run Spark in a Hadoop cluster.<\/span><\/p>\n<ol style=\"text-align: justify;\">\n<li><span lang=\"EN-GB\">Standalone<\/span><\/li>\n<li><span lang=\"EN-GB\">Over YARN<\/span><\/li>\n<li><span lang=\"EN-GB\">In MapReduce (SIMR)<\/span><\/li>\n<\/ol>\n<h4 style=\"text-align: justify;\"><strong><span lang=\"EN-GB\">Standalone Deployment<\/span><\/strong><\/h4>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">This is the simplest mode of deployment. In standalone mode, resources are statically allocated on all or a subset of nodes in the Hadoop cluster. However, you can run Spark in parallel with MapReduce. 
This is the preferred deployment choice for Hadoop 1.x.<\/span> <span lang=\"EN-GB\">In this mode, Spark manages its own cluster.<\/span><\/p>\n<h4 style=\"text-align: justify;\"><strong><span lang=\"EN-GB\">Over YARN Deployment<\/span><\/strong><\/h4>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">No pre-installation or admin access is required in this mode of deployment. Hence, it is an easy way to integrate Hadoop and Spark. YARN is also the only one of these cluster managers that provides built-in security, which makes it the better choice for a big Hadoop cluster in a production environment.<\/span><\/p>\n<h4 style=\"text-align: justify;\"><strong><span lang=\"EN-GB\">Spark In MapReduce (SIMR)<\/span><\/strong><\/h4>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">In this mode of deployment, there is no need for YARN. Rather, Spark jobs can be launched inside MapReduce.<\/span><\/p>\n<p><strong>Note:<\/strong> If you are preparing for a Hadoop interview, we recommend going through the top <a href=\"https:\/\/www.whizlabs.com\/blog\/top-50-hadoop-interview-questions\/\" target=\"_blank\" rel=\"noopener\">Hadoop interview questions<\/a> to get ready for the interview.<\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"You_can_Run_Spark_without_Hadoop_in_Standalone_Mode\"><\/span><span lang=\"EN-GB\">You can Run Spark without Hadoop in Standalone Mode<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Spark and Hadoop are better together, but Hadoop is not essential to run Spark. According to the Spark documentation, there is no need for Hadoop if you run Spark in standalone mode; in that case, you only need a resource manager such as YARN or Mesos. 
Moreover, you can run Spark without Hadoop, and independently of a Hadoop cluster, using Mesos, provided you don\u2019t need any library from the Hadoop ecosystem.<\/span><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"Why_Enterprises_Prefer_to_Run_Spark_with_Hadoop\"><\/span><span lang=\"EN-GB\">Why Enterprises Prefer to Run Spark with Hadoop?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><strong><span lang=\"EN-GB\">Spark has its own ecosystem, which consists of \u2013<\/span><\/strong><\/p>\n<ul style=\"text-align: justify;\">\n<li><span lang=\"EN-GB\">Spark Core \u2013 The foundation for data processing<\/span><\/li>\n<li><span lang=\"EN-GB\">Spark SQL \u2013 Based on Shark; helps in extracting, loading and transforming data<\/span><\/li>\n<li><span lang=\"EN-GB\">Spark Streaming \u2013 A lightweight API that helps in batch processing and streaming of data<\/span><\/li>\n<li><span lang=\"EN-GB\">Machine learning library (MLlib) \u2013 Helps in implementing machine learning algorithms<\/span><\/li>\n<li><span lang=\"EN-GB\">Graph Analytics (GraphX) \u2013 Helps in representing <\/span><span lang=\"EN-GB\">Resilient Distributed Graphs<\/span><\/li>\n<li><span lang=\"EN-GB\">Spark Cassandra Connector <\/span><\/li>\n<li><span lang=\"EN-GB\">Spark R integration<\/span><\/li>\n<\/ul>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Here is the layout of the Spark components in the ecosystem \u2013<\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\"><a href=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2017\/12\/Spark-Framework-Ecosystem.png\"><img decoding=\"async\" class=\"wp-image-49845 size-full aligncenter\" src=\"https:\/\/www.whizlabs.com\/wp-content\/uploads\/2017\/12\/Spark-Framework-Ecosystem.png\" alt=\"Spark Framework Ecosystem\" width=\"560\" height=\"315\" \/><\/a><\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">However, there are a few challenges in this ecosystem that still need to be addressed. These mainly concern complex data types and the streaming of such data. Addressing them requires running Spark with other components of the Hadoop ecosystem, which can also enable better analysis and processing of data for many use cases. This may be the most compelling reason why enterprises seek to run Spark on top of a Hadoop distribution.<\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Moreover, using Spark with a commercially accredited distribution strengthens its market credibility. Databricks, the company founded by Spark\u2019s creators, also offers a <a href=\"https:\/\/www.whizlabs.com\/blog\/5-best-apache-spark-certification\/\" target=\"_blank\" rel=\"noopener\">Databricks certification<\/a> to validate your Apache Spark skills. Other distributed file systems that are not compatible with Spark may create complexity during data processing. Hence, enterprises generally refrain from running Spark without Hadoop.<\/span><\/p>\n<h2 style=\"text-align: justify;\"><span class=\"ez-toc-section\" id=\"How_Can_You_Run_Spark_without_HDFS\"><\/span><span lang=\"EN-GB\">How Can You Run Spark without HDFS?<\/span><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">HDFS is just one of the file systems that Spark supports, not the final answer. So what would you do if you don\u2019t have Hadoop set up in your environment? Spark is a cluster computing system, not a data storage system; all it needs for data processing is some external storage source to read data from and write data to. That could be a local file system on your desktop. You don\u2019t need to run HDFS unless your file paths refer to HDFS. 
<\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Furthermore, as noted above, Spark needs an external storage source; it could be a NoSQL database like Apache Cassandra or HBase, or Amazon\u2019s S3. To run Spark, you just need to install it on the same nodes as Cassandra and use a cluster manager like YARN or Mesos. In this scenario too, we can run Spark without Hadoop.<\/span><\/p>\n<h4 style=\"text-align: justify;\"><b><span lang=\"EN-GB\">Conclusion<\/span><\/b><\/h4>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Hence, we can conclude that it is possible to run Spark without Hadoop. However, Spark is designed to be an effective solution for distributed computing in multi-node mode, so we achieve the maximum benefit of data processing if we run Spark with HDFS or a similar file system. Both Spark and Hadoop are open source and maintained by Apache. <\/span><\/p>\n<p style=\"text-align: justify;\"><span lang=\"EN-GB\">Hence, they are compatible with each other. Furthermore, setting Spark up with a third-party file system solution can prove complicated, so it is easier to integrate Spark with Hadoop.<\/span> <span lang=\"EN-GB\">So, to our question &#8211; do you need Hadoop to run Spark? The definite answer is &#8211; you can go either way. 
However, running Spark on top of Hadoop is the best solution due to its compatibility.<\/span><\/p>\n<p>[divider \/]<\/p>\n<p style=\"text-align: justify;\"><em><strong>Whizlabs Big Data Certification courses \u2013\u00a0<a href=\"https:\/\/www.whizlabs.com\/spark-developer-certification\/\" target=\"_blank\" rel=\"noopener noreferrer\">Spark Developer Certification (HDPCD)<\/a>\u00a0and\u00a0<a href=\"https:\/\/www.whizlabs.com\/hdpca-certification\/\" target=\"_blank\" rel=\"noopener noreferrer\">HDP Certified Administrator (HDPCA)<\/a>\u00a0<\/strong>are based on the Hortonworks Data Platform, a market giant of Big Data platforms. Whizlabs recognizes that interacting with data and increasing its comprehensibility is the need of the hour and hence, we are proud to launch our\u00a0<a href=\"https:\/\/www.whizlabs.com\/big-data-certifications\/\">Big Data Certifications<\/a>. We have created state-of-the-art content that should aid data developers and administrators to gain a competitive edge over others.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hadoop and Apache Spark both are today\u2019s booming open-source Big data frameworks. Though Hadoop and Spark don\u2019t do the same thing, however, they are inter-related. The need for Hadoop is everywhere for Big data processing. However, Hadoop has a major drawback despite its many important features and benefits for data processing. MapReduce which is the native batch processing engine of Hadoop is not as fast as Spark. And that\u2019s where Spark takes an edge over Hadoop. In addition to that, most of today&#8217;s big data projects demand batch workload as well as real-time data processing. 
Hadoop&#8217;s MapReduce isn&#8217;t cut out [&hellip;]<\/p>\n","protected":false},"author":220,"featured_media":49750,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[6],"tags":[152,422,1120,1406,1476,1481,1482],"class_list":["post-48519","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-big-data","tag-apache-spark","tag-big-data","tag-need-of-hadoop","tag-run-spark-with-hadoop","tag-spark-framework-ecosystem","tag-spark-without-hadoop","tag-spark-without-hdfs"],"uagb_featured_image_src":{"full":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1-150x150.png",150,150,true],"medium":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1-300x169.png",300,169,true],"medium_large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"1536x1536":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"2048x2048":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"profile_24":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",24,14,false],"profile_48":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",48,27,false],"profile_96":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",96,54,false],"profile_150":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uplo
ads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",150,84,false],"profile_300":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",300,169,false],"tptn_thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1-250x250.png",250,250,true],"web-stories-poster-portrait":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",560,315,false],"web-stories-publisher-logo":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",96,54,false],"web-stories-thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2017\/12\/Do-You-Need-Hadoop-to-Run-Spark_-1.png",150,84,false]},"uagb_author_info":{"display_name":"Aditi Malhotra","author_link":"https:\/\/www.whizlabs.com\/blog\/author\/aditi\/"},"uagb_comment_info":5,"uagb_excerpt":"Hadoop and Apache Spark both are today\u2019s booming open-source Big data frameworks. Though Hadoop and Spark don\u2019t do the same thing, however, they are inter-related. The need for Hadoop is everywhere for Big data processing. However, Hadoop has a major drawback despite its many important features and benefits for data processing. 
MapReduce which is the&hellip;","_links":{"self":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/48519","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/users\/220"}],"replies":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/comments?post=48519"}],"version-history":[{"count":7,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/48519\/revisions"}],"predecessor-version":[{"id":77298,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/48519\/revisions\/77298"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media\/49750"}],"wp:attachment":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media?parent=48519"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/categories?post=48519"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/tags?post=48519"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}