{"id":81518,"date":"2022-03-16T22:39:31","date_gmt":"2022-03-17T04:09:31","guid":{"rendered":"https:\/\/www.whizlabs.com\/blog\/?p=81518"},"modified":"2024-04-23T17:33:40","modified_gmt":"2024-04-23T12:03:40","slug":"google-cloud-professional-data-engineer-exam-questions","status":"publish","type":"post","link":"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/","title":{"rendered":"NEW Questions &#038; Answers on Google Cloud Certified Professional Data Engineer Exam"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Are you looking for the <a href=\"https:\/\/www.whizlabs.com\/google-cloud-certified-professional-data-engineer\/\">Google Cloud Professional Data Engineer Exam Questions<\/a>? The questions and answers provided here test and enhance your knowledge of the exam objectives.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A professional Data Engineer collects, transforms, and publishes the data, thereby enabling data-driven decision making. <\/span>Earning a Google Cloud Certified Professional Data Engineer certification may help you in pursuing a better career in the Google cloud industry. 
To pass the actual exam, you have to spend more time on learning &amp; re-learning through multiple practice tests.<\/p>\n<p>Let&#8217;s start learning!<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_76 ez-toc-wrap-left counter-hierarchy ez-toc-counter ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #ea7e02;color:#ea7e02\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #ea7e02;color:#ea7e02\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 ' ><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-2\" 
href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-2\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Build_and_Operationalize_Data_Processing_Systems\" >Domain: Build and Operationalize Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-2\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-3\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Ensure_Solution_Quality\" >Domain: Ensure Solution Quality<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-3\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-4\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Build_and_Operationalize_Data_Processing_Systems-2\" >Domain:\u00a0 Build and Operationalize Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Build_and_Operationalize_Data_Processing_Systems-3\" >Domain: Build and Operationalize Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-4\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-5\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-6\" >Domain: Operationalize Machine Learning Models\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-15\" 
href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-5\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-16\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-6\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-17\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Build_and_Operationalize_Data_Processing_Systems-4\" >Domain: Build and Operationalize Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-18\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-7\" >Domain: Operationalize Machine Learning Models<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-19\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-8\" >Domain: Operationalize Machine Learning Models<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-20\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Operationalize_Machine_Learning_Models-9\" >Domain: Operationalize Machine Learning Models<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-21\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-7\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 
ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-22\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Design_Data_Processing_Systems-8\" >Domain: Design Data Processing Systems<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-23\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Maintaining_and_Automating_Data_Workloads\" >Domain: Maintaining and Automating Data Workloads<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-24\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Maintaining_and_Automating_Data_Workloads-2\" >Domain: Maintaining and Automating Data Workloads<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-25\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Maintaining_and_Automating_Data_Workloads-3\" >Domain: Maintaining and Automating Data Workloads<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-26\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Maintaining_and_Automating_Data_Workloads-4\" >Domain: Maintaining and Automating Data Workloads<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-27\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Maintaining_and_Automating_Data_Workloads-5\" >Domain: Maintaining and Automating Data Workloads<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-28\" 
href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Preparing_and_Using_Data_for_Analysis\" >Domain: Preparing and Using Data for Analysis<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-29\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Preparing_and_Using_Data_for_Analysis-2\" >Domain: Preparing and Using Data for Analysis\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-30\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Preparing_and_using_data_for_analysis\" >Domain: Preparing and using data for analysis\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-31\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Preparing_and_using_data_for_analysis-2\" >Domain: Preparing and using data for analysis<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-32\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Domain_Preparing_and_Using_Data_for_Analysis-3\" >Domain: Preparing and Using Data for Analysis\u00a0<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-33\" href=\"https:\/\/www.whizlabs.com\/blog\/google-cloud-professional-data-engineer-exam-questions\/#Summary\" >Summary<\/a><\/li><\/ul><\/nav><\/div>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q1 : A company is migrating its current infrastructure from on-premise to Google cloud. 
It stores over 280TB of data on its on-premise HDFS servers. You are tasked with moving the data from HDFS to Google Storage in a secure and efficient manner. Which of the following approaches is best to fulfill this task?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Install the Google Storage gsutil tool on the servers and copy the data from HDFS to Google Storage.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Use Cloud Data Transfer Service to migrate the data to Google Storage.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Import the data from HDFS to BigQuery. Then, export the data to Google Storage in AVRO format.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Use Transfer Appliance Service to migrate the data to Google Storage.<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Storage Transfer Service allows you to quickly import <\/span><i><span style=\"font-weight: 400;\">ONLINE <\/span><\/i><span style=\"font-weight: 400;\">data into Cloud Storage. You can also set up a repeating schedule for transferring data, as well as transfer data within Cloud Storage, from one bucket to another.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Transfer Appliance is an <\/span><i><span style=\"font-weight: 400;\">OFFLINE <\/span><\/i><span style=\"font-weight: 400;\">secure, high-capacity storage server that you set up in your datacenter. You fill it with data and ship it to an ingest location, where the data is uploaded to Google Cloud Storage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, <strong>answer D is the correct<\/strong> one, while B is incorrect.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>Answer A is incorrect:<\/strong> The gsutil tool is intended for programmatic use by developers and is useful for copying and moving megabytes or gigabytes of data. 
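For reference, a gsutil copy of the kind answer A proposes might look like the sketch below. The bucket and path names are hypothetical, and the command is only printed rather than executed, since running it requires the Cloud SDK and credentials:

```shell
# Hypothetical paths: an HDFS export staged on local disk, and a target bucket.
SRC="/mnt/hdfs-export/events"
DST="gs://example-migration-bucket/events"

# -m parallelizes the transfer; -r recurses into directories.
GSUTIL_CMD="gsutil -m cp -r ${SRC} ${DST}"
echo "${GSUTIL_CMD}"
```

At this scenario's 280 TB scale, such a transfer would ride on the site's uplink for weeks, which is part of why the offline Transfer Appliance is the better fit.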
It is not practical for terabytes of data, and it is not a reliable transfer technique since it depends on the machine\u2019s connectivity to Google Cloud.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>Answer C is incorrect:<\/strong> To import into BigQuery, you would first need to move the data to Google Storage anyway. This approach adds nothing, since the main challenge is migrating the data from HDFS to Google Storage, and BigQuery does not help solve it.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Google Cloud Storage Transfer Service:<\/span><a href=\"https:\/\/cloud.google.com\/storage-transfer\/docs\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/storage-transfer\/docs\/<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">Google Transfer Appliance:<\/span><a href=\"https:\/\/cloud.google.com\/transfer-appliance\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/transfer-appliance\/<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">Migrating HDFS data to Google Storage:<\/span><a href=\"https:\/\/cloud.google.com\/solutions\/migration\/hadoop\/hadoop-gcp-migration-data\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/solutions\/migration\/hadoop\/hadoop-gcp-migration-data<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-2\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q2 : You have a Dataflow pipeline that processes a set of data files received from a client, transforming them and loading them into a data warehouse. 
This pipeline should run each morning so that metrics are ready when stakeholders need the latest stats based on data sent the day before. Which tool should you use?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Cloud Functions<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Compute Engine<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Kubernetes Engine<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Cloud Scheduler<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The question asks which service should be used to schedule and trigger the Dataflow pipeline.<\/span><\/p>\n<p><b>A: Cloud Functions <\/b><span style=\"font-weight: 400;\">Cloud Functions can be written in Node.js, Python, Go, Java, .NET, Ruby, and PHP, and are executed in language-specific runtimes. HTTP functions are invoked by standard HTTP requests, which wait for the response and support common HTTP methods like GET, PUT, POST, DELETE, and OPTIONS.<\/span><span style=\"font-weight: 400;\"> Cloud Functions provides no built-in scheduling, so this is not the correct solution.<\/span><br \/>\n<b>B: Compute Engine <\/b><span style=\"font-weight: 400;\">This is a virtual machine service, not a scheduling service for Dataflow pipelines, so this is not the correct solution.<\/span><br \/>\n<b>C: Kubernetes Engine <\/b><span style=\"font-weight: 400;\">This is a container orchestration service built on a master-and-nodes model, not a scheduler, so this is not the correct solution.<\/span><br \/>\n<b>D: Cloud Scheduler <\/b><span style=\"font-weight: 400;\">Cloud Scheduler is a fully managed, enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch jobs, big data jobs, cloud infrastructure operations, and more. 
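As a sketch, the daily trigger described here could be created with the gcloud CLI roughly as follows. The job name, cron schedule, and target URI are illustrative assumptions, and the command is printed rather than executed, since running it needs a project with the Cloud Scheduler API enabled:

```shell
# Hypothetical job: hit an HTTP endpoint that launches the pipeline at 06:00 daily.
JOB_NAME="daily-dataflow-trigger"
SCHEDULE="0 6 * * *"                                # standard cron syntax
TARGET_URI="https://example-launcher.invalid/run"   # hypothetical endpoint

SCHEDULER_CMD="gcloud scheduler jobs create http ${JOB_NAME} --schedule='${SCHEDULE}' --uri=${TARGET_URI}"
echo "${SCHEDULER_CMD}"
```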
You can automate everything, including retries in case of failure, to reduce manual toil and intervention. <\/span><span style=\"font-weight: 400;\">Hence, this is the correct solution.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Cloud Scheduler:<\/span><a href=\"https:\/\/cloud.google.com\/scheduler\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/scheduler\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Build_and_Operationalize_Data_Processing_Systems\"><\/span><span style=\"font-weight: 400;\">Domain: Build and Operationalize Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q3 : A pharmaceutical factory has over 100,000 different sensors, each generating JSON-format events every 10 seconds. You need to collect the event data for sensor &amp; time-series analysis.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which database is best suited to collect the event data?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Google Storage<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Cloud Spanner<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Bigtable<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Datastore<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cloud Bigtable is a petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads.<\/span><\/p>\n<p><b>Answer A is incorrect<\/b><span style=\"font-weight: 400;\">: Data stored in Google Storage needs further processing before it is usable for time-series analysis with tools such as Apache Hive or Presto.<\/span><br \/>\n<b>Answer B is incorrect:<\/b><span style=\"font-weight: 400;\"> Cloud Spanner is a 
relational database service. It is not recommended for JSON-format data that may have a changing structure.<\/span><br \/>\n<b>Answer D is incorrect:<\/b><span style=\"font-weight: 400;\"> Datastore could be a potential choice, since it\u2019s a NoSQL database. However, Datastore is not built for storing and reading data volumes as huge as in this scenario; it is designed for small-scale web applications.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Bigtable vs. Datastore:<\/span><a href=\"https:\/\/stackoverflow.com\/questions\/30085326\/google-cloud-bigtable-vs-google-cloud-datastore\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/stackoverflow.com\/questions\/30085326\/google-cloud-bigtable-vs-google-cloud-datastore<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q4 : A financial services firm providing products such as credit cards and bank loans receives thousands of online applications from clients applying for their products. 
Because it takes a lot of effort to scan all applications and check whether they meet the minimum requirements of the products applied for, the firm wants to build a machine learning model that takes application fields such as annual income, marital status, date of birth, occupation, and other attributes as input and determines whether the applicant is qualified for the product the client applies for.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which machine learning technique will help build such a model?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Regression<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Classification<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Clustering<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Reinforcement learning<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">A regression problem is one whose output variable is a continuous value. Problems that predict variables such as weight, price, or age are regression problems.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A classification problem is one in which the output variable is a category. Examples of classification problems are finding a passenger\u2019s nationality, detecting whether a patient has a disease, or deciding whether an applicant is qualified for a job interview.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Regression and classification are supervised learning problems: the machine learns from past experience by being trained on a labeled data set. A training set is a set of rows with input and output parameters. The machine learns from the training set and improves its parameters for better predictions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Clustering is an unsupervised learning method. 
Unsupervised learning finds patterns in input data without labeled outputs. The purpose is to find meaningful structure among inputs with similar features and group them. Clustering groups data points that share similarities and separates dissimilar points into other groups. Examples of clustering applications are customer segmentation (new, frequent, loyal, etc.), city land valuation, and detecting anomalies in network traffic.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Reinforcement learning is a technique in which a machine takes actions, without a training set, to reach the highest possible reward. The agent learns by trial and decides what to do to perform a given task without supervision: the environment punishes the agent for a wrong action and rewards it for achieving the task. Examples of reinforcement learning are asking an agent to play a maze game and reach the exit past traps along the way, or making an agent play a video game and win a race.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the explanation above, we can see that determining whether a client is qualified for a product is a classification problem. So, answer B is correct.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-2\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q5 : Data scientists at your company are testing a TensorFlow model on Google Cloud using four NVIDIA Tesla P100 GPUs. After experimenting with several use cases, they decide to scale up by using a different machine type for testing. As a data engineer, you are responsible for helping choose the right machine type to reach better model performance. 
What should you do?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Use a TPU machine type for testing the TensorFlow model.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Scale up the machine type by using NVIDIA Tesla V100 GPUs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Use 8 NVIDIA Tesla K80 GPUs instead of the current 4 P100 GPUs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Increase the number of Tesla P100 GPUs until test results show satisfactory performance.<\/span><\/p>\n<p><b>Correct Answer: A<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Google built the Tensor Processing Unit (TPU) to make it possible for data scientists to achieve business and research breakthroughs ranging from network security to medical diagnoses. Cloud TPU is the custom-designed machine learning ASIC that powers Google products like Translate, Photos, Search, Assistant, and Gmail.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><img decoding=\"async\" class=\"aligncenter wp-image-81519 size-full\" title=\"Google Cloud ResNet-50 Training Cost Comparison\" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5.png\" alt=\"Google Cloud ResNet-50 Training Cost Comparison\" width=\"1304\" height=\"775\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5.png 1304w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-300x178.png 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-1024x609.png 1024w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-768x456.png 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-707x420.png 707w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-640x380.png 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de5-681x405.png 681w\" sizes=\"(max-width: 1304px) 100vw, 1304px\" \/>As for cost, the chart above benchmarks TPU training cost against the GPU types.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, for this scenario, a TPU machine type is the recommended choice for building TensorFlow models on.\u00a0<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">TPU Machine Type: <\/span><a href=\"https:\/\/cloud.google.com\/tpu\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/tpu\/<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">GPU Machine Type:\u00a0<a href=\"https:\/\/cloud.google.com\/gpu\/\" target=\"_blank\" rel=\"nofollow noopener\">https:\/\/cloud.google.com\/gpu\/<\/a><\/span><br \/>\n<a href=\"https:\/\/cloud.google.com\/blog\/products\/ai-machine-learning\/what-makes-tpus-fine-tuned-for-deep-learning\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/blog\/products\/ai-machine-learning\/what-makes-tpus-fine-tuned-for-deep-learning<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">Using GPUs for training models in the cloud:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/using-gpus\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/using-gpus<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-3\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q6 : The data scientists at your company have built a machine learning neural network model using TensorFlow. After several tests on the model, the team decides the model is ready to be deployed for production use. 
Which of the following services would you use to host the model on Google Cloud?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Google Kubernetes Engine<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Google ML Deep Learning VM<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Google Container Registry<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Google Machine Learning Model<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Google Kubernetes Engine is a managed, production-ready environment for deploying containerized applications. It brings Google\u2019s latest innovations in developer productivity, resource efficiency, automated operations, and open-source flexibility to accelerate your time to market.<\/span><\/p>\n<p><b>Answer A is incorrect:<\/b><span style=\"font-weight: 400;\"> GKE is a service to deploy and scale Docker containers in the cloud. You would need to build a Docker image for your model in order to use it, which is not recommended for this scenario.<\/span><br \/>\n<b>Answer B is incorrect:<\/b><span style=\"font-weight: 400;\"> Google ML Deep Learning VM is a service that offers pre-configured virtual machines for deep learning applications. It is not used to deploy ML models to production.<\/span><br \/>\n<b>Answer C is incorrect:<\/b><span style=\"font-weight: 400;\"> Google Container Registry is a service to store, manage, and secure your Docker container images. It is not meant for deploying machine learning models. <\/span><span style=\"font-weight: 400;\">Cloud Machine Learning Engine is a managed service that lets developers and data scientists build and run superior machine learning models in production. 
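As a sketch, deploying a trained model with the gcloud CLI of that era took roughly two commands. The model name and Cloud Storage origin are hypothetical, and the commands are printed rather than executed, since they require a configured project and credentials:

```shell
MODEL="census_classifier"                 # hypothetical model name
ORIGIN="gs://example-models/census/v1/"   # hypothetical SavedModel export location

# First create the model resource, then a version pointing at the export.
CREATE_CMD="gcloud ml-engine models create ${MODEL} --regions=us-central1"
VERSION_CMD="gcloud ml-engine versions create v1 --model=${MODEL} --origin=${ORIGIN}"
echo "${CREATE_CMD}"
echo "${VERSION_CMD}"
```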
Cloud ML Engine offers training and prediction services, which can be used together or individually.<\/span><br \/>\n<b>Answer D is correct<\/b><span style=\"font-weight: 400;\">: A Cloud ML Engine model is the resource you create to host and serve your trained machine learning models.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Google Kubernetes Engine:<\/span><a href=\"https:\/\/cloud.google.com\/kubernetes-engine\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/kubernetes-engine\/<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">Google Machine Learning Engine:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/<\/span><\/a><br \/>\n<span style=\"font-weight: 400;\">Google ML Deep Learning VM:<\/span><a href=\"https:\/\/cloud.google.com\/deep-learning-vm\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/deep-learning-vm\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Ensure_Solution_Quality\"><\/span><span style=\"font-weight: 400;\">Domain: Ensure Solution Quality<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q7 : You launched a Dataproc cluster to perform some Apache Spark jobs. 
You are looking for a method to securely transfer web traffic data between your machine\u2019s web browser and Dataproc cluster.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">How can you achieve this?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> FTP connection<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> SSH tunnel<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> VPN connection<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Incognito mode<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Some of the core open source components included with Google Cloud Dataproc clusters, such as Apache Hadoop and Apache Spark, provide web interfaces. These interfaces can be used to manage and monitor cluster resources and facilities, such as the YARN resource manager, the Hadoop Distributed File System (HDFS), MapReduce, and Spark. Other components or applications that you install on your cluster may also provide web interfaces.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">It is recommended to create an SSH tunnel for a secure connection between your web browser and Dataproc\u2019s master node. SSH tunnel supports traffic proxying using the SOCKS protocol. 
To configure your browser to use the proxy, start a new browser session with proxy server parameters.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Dataproc \u2013 Cluster Web Interfaces:<\/span><a href=\"https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/accessing\/cluster-web-interfaces#connecting_to_the_web_interfaces\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/accessing\/cluster-<\/span> <span style=\"font-weight: 400;\">web-interfaces#connecting_to_the_web_interfaces<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-3\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q8 : You have deployed a Tensorflow machine learning model using Cloud Machine Learning Engine. The model should be able to handle high volume of instances in a job to run complex models. The model should also write the output to Google Storage.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following approaches is recommended?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Use online prediction when using the model. Batch prediction supports asynchronous requests.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Use batch prediction when using the model. 
Batch prediction supports asynchronous requests.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Use batch prediction when using the model to return the results as soon as possible.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Use online prediction when using the model to return the results as soon as possible.<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\"><img decoding=\"async\" class=\"aligncenter wp-image-81520 size-full\" title=\"Google Cloud Online Prediction vs Batch Prediction \" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8.jpg\" alt=\"Google Cloud Online Prediction vs Batch Prediction \" width=\"972\" height=\"634\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8.jpg 972w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8-300x196.jpg 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8-768x501.jpg 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8-644x420.jpg 644w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8-640x417.jpg 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de8-681x444.jpg 681w\" sizes=\"(max-width: 972px) 100vw, 972px\" \/>AI Platform provides two ways to get predictions from trained models: <\/span><span style=\"font-weight: 400;\">online prediction <\/span><span style=\"font-weight: 400;\">(sometimes called HTTP prediction), and <\/span><span style=\"font-weight: 400;\">batch prediction<\/span><span style=\"font-weight: 400;\">. In both cases, you pass input data to a cloud-hosted machine-learning model and get inferences for each data instance. The differences are shown in the following table:<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Batch prediction can handle a high volume of instances in a job to run complex models. 
It also writes the output to a specified Google Storage location.<\/span><\/p>\n<p><b>Answers A &amp; D are incorrect: <\/b><span style=\"font-weight: 400;\">Online prediction doesn\u2019t support handling a high volume of instances per job and doesn\u2019t write output to Google Storage.<\/span><br \/>\n<b>Answer C is incorrect<\/b><span style=\"font-weight: 400;\">: Batch prediction doesn\u2019t return the output as soon as possible; it supports asynchronous requests.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Online vs. Batch Prediction:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/online-vs-batch-prediction\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/online-vs-batch-<\/span> <span style=\"font-weight: 400;\">prediction<\/span><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-4\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q9 : A company uses Airflow to orchestrate its data pipelines and DAGs (Directed Acyclic Graphs), installed and maintained on-premises by the DevOps team. The company wants to migrate the data pipelines managed in Airflow to Google Cloud. The company is looking for a migration method that can move the DAGs without extra code modifications so the data pipelines are available as soon as they are migrated. 
Which service should you use?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> App Engine<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Cloud Function<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Dataflow<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Cloud Composer<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow. Cloud Composer is built specifically to schedule and monitor workflows and take required actions. You can use Cloud Composer to orchestrate a Dataflow pipeline and create a custom sensor that detects changes in a file\u2019s condition and then triggers the Dataflow pipeline to run again.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Cloud Composer: <\/span><a href=\"https:\/\/cloud.google.com\/composer\/\" target=\"_blank\" rel=\"nofollow noopener\">https:\/\/cloud.google.com\/composer\/<\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Build_and_Operationalize_Data_Processing_Systems-2\"><\/span><span style=\"font-weight: 400;\">Domain:\u00a0 Build and Operationalize Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q10 : An air-quality research facility monitors the quality of the air and alerts about possible high air pollution in a region. The facility receives event data from 25,000 sensors every 60 seconds. Event data is then used for time-series analysis per region. Cloud experts suggested using BigTable for storing event data.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">How will you design the row key for each event in BigTable?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. 
<\/strong>Use event\u2019s timestamp as row key.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Use combination of sensor ID with timestamp as <\/span><i><span style=\"font-weight: 400;\">sensorID-timestamp<\/span><\/i><span style=\"font-weight: 400;\">.<\/span><br \/>\n<span style=\"font-weight: 400;\"> <strong>C. <\/strong>Use combination of sensor ID with timestamp as timestamp<\/span><i><span style=\"font-weight: 400;\">-sensorID<\/span><\/i><span style=\"font-weight: 400;\">.<\/span><br \/>\n<span style=\"font-weight: 400;\"> <strong>D. <\/strong>Use sensor ID as row key.<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Storing time-series data in Cloud Bigtable is a natural fit. Cloud Bigtable stores data as unstructured columns in rows; each row has a row key, and row keys are sorted lexicographically.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For time series, you should generally use tall and narrow tables. This is for two reasons: Storing one event per row makes it easier to run queries against your data. Storing many events per row makes it more likely that the total row size will exceed the recommended maximum (see Rows can be big but are not infinite).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">When Cloud Bigtable stores rows, it sorts them by row key in lexicographic order. There is effectively a single index per table, which is the row key. Queries that access a single row, or a contiguous range of rows, execute quickly and efficiently. All other queries result in a full table scan, which will be far, far slower. 
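<\/span><\/p>
<p><span style=\"font-weight: 400;\">To make the ordering concrete, here is a minimal sketch in plain Python (the sensor IDs and timestamps are made up): sorting row keys of the form sensorID#timestamp lexicographically groups each sensor\u2019s events together and keeps them in time order, so a per-sensor time-range query hits a contiguous block of rows:<\/span><\/p>

```python
# Bigtable sorts rows lexicographically by row key. With keys of the form
# "<sensorID>#<timestamp>", all events of one sensor are adjacent and
# chronologically ordered (fixed-width, sortable timestamps are assumed).
events = [
    ("sensor42", "2022-03-16T10:05:00"),
    ("sensor07", "2022-03-16T10:04:00"),
    ("sensor42", "2022-03-16T10:04:00"),
    ("sensor07", "2022-03-16T10:05:00"),
]
row_keys = sorted(f"{sensor}#{ts}" for sensor, ts in events)
for key in row_keys:
    print(key)  # sensor07's two events first (in time order), then sensor42's
```

<p><span style=\"font-weight: 400;\">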
A full table scan is exactly what it sounds like\u2014every row of your table is examined in turn.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For Cloud Bigtable, where you could be storing many petabytes of data in a single table, the performance of a full table scan will only get worse as your system grows.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Choosing a row key that facilitates common queries is of paramount importance to the overall performance of the system. Enumerate your queries, put them in order of importance, and then design row keys that work for those queries.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the description, you need to combine both sensor ID and timestamp in order to fetch data you want fast. So, answers A &amp; D are incorrect.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">If you start the row key with timestamp, most recent data will be inserted at the bottom of the table since rows are sorted in lexicographic order. Starting the row key with sensor ID will allow writing all sensor\u2019s events together and allow distributing data among nodes.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">BigTable \u2013 Schema Design for Time Series Data:<\/span><a href=\"https:\/\/cloud.google.com\/bigtable\/docs\/schema-design-time-series\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigtable\/docs\/schema-<\/span> <span style=\"font-weight: 400;\">design-time-series<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Build_and_Operationalize_Data_Processing_Systems-3\"><\/span><span style=\"font-weight: 400;\">Domain: Build and Operationalize Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q11 : Your company hosts a gaming app which reaches over 30,000 players in a single minute. 
The app generates event data including information about players\u2019 state, score, location coordinates and other stats. You need to find a storage solution that can support high read\/write throughput with very low latency that doesn\u2019t exceed 10 milliseconds to ensure a quality performance experience for the players.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following is the best option for this scenario?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Spanner<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>BigQuery<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>BigTable<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Datastore<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation :\u00a0<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cloud BigTable is a petabyte-scale, fully managed NoSQL database service for large analytical and operational workloads. Under a typical workload, Cloud BigTable delivers highly predictable performance. 
When everything is running smoothly, a typical workload can achieve the following performance for each node in the Cloud Bigtable cluster, depending on which type of storage the cluster uses:<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><img decoding=\"async\" class=\"aligncenter wp-image-81521 size-full\" title=\"Google Cloud BigTable\" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11.jpg\" alt=\"Google Cloud BigTable\" width=\"969\" height=\"135\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11.jpg 969w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11-300x42.jpg 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11-768x107.jpg 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11-640x89.jpg 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de11-681x95.jpg 681w\" sizes=\"(max-width: 969px) 100vw, 969px\" \/>In general, a cluster&#8217;s performance increases linearly as you add nodes to the cluster. For example, if you create an SSD cluster with 10 nodes, the cluster can support up to 100,000 rows per second for a typical read-only or write-only workload, with 6 ms latency for each read or write operation.<\/span><\/p>\n<p><b>Answer A is incorrect:<\/b><span style=\"font-weight: 400;\"> Cloud Spanner does not guarantee the same performance and low latency as BigTable.<\/span><br \/>\n<b>Answer B is incorrect:<\/b><span style=\"font-weight: 400;\"> While BigQuery is a potential choice, BigQuery doesn\u2019t provide high throughput and low latency as powerful as BigTable.<\/span><br \/>\n<b>Answer D is incorrect<\/b><span style=\"font-weight: 400;\">: Datastore can be a potential choice since it\u2019s a NoSQL database. The issue is, Datastore is not built for storing and reading huge data volumes as in this scenario. 
Datastore is designed for small-scale web applications.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Understanding BigTable Performance:<\/span><a href=\"https:\/\/cloud.google.com\/bigtable\/docs\/performance\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigtable\/docs\/performance<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-4\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q12 : An online learning platform wants to generate captions for its videos. The platform offers around 2,500 courses with topics about business, finance, cooking, development &amp; science. The platform offers content in different languages such as French, German, Turkish and Thai. Thus, it can be very difficult for a single team to caption all available courses, and they are looking for an approach that helps with such<\/span> <span style=\"font-weight: 400;\">a massive job.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which product from Google Cloud will you suggest they use?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Speech-to-Text.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Cloud Natural Language.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Machine Learning Engine.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. 
<\/strong>AutoML Vision API.<\/span><\/p>\n<p><b>Correct Answer: A<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Answer A is correct<\/b><span style=\"font-weight: 400;\">: Cloud Speech-to-Text is a service to generate captions from videos by detecting the speakers\u2019 language and speech.<\/span><br \/>\n<b>Answer B is incorrect:<\/b><span style=\"font-weight: 400;\"> The Cloud Natural Language service derives insights from unstructured text, revealing the meaning of documents and categorizing articles. It won\u2019t help with extracting captions from videos.<\/span><br \/>\n<b>Answer C is incorrect:<\/b><span style=\"font-weight: 400;\"> Machine Learning Engine is a managed service letting developers and scientists build their own models and run them in production. This means you would have to build your own model to generate text from videos, which requires much effort and experience. So, it\u2019s not a practical solution for this scenario.<\/span><br \/>\n<b>Answer D is incorrect:<\/b><span style=\"font-weight: 400;\"> AutoML Vision API is a service to recognize and derive insights from images by either using pre-trained models or training a custom model based on a set of photographs.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Google NLP:<\/span><a href=\"https:\/\/cloud.google.com\/natural-language\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/natural-language\/<\/span><\/a>, <span style=\"font-weight: 400;\">Google Machine Learning Engine:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google Vision API:<\/span><a href=\"https:\/\/cloud.google.com\/vision\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 
https:">
400;\">https:\/\/cloud.google.com\/vision<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google Speech-to-Text API:<\/span><a href=\"https:\/\/cloud.google.com\/speech-to-text\/\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/speech-to-text\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-5\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q13 : You are building a machine learning model to solve a binary classification problem. The model is going to predict the likelihood that a customer is using a fraudulent credit card when purchasing online.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Since only a very small fraction of purchase transactions prove to be fraudulent, more than 99% of the purchase transactions are valid.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">You want to make sure the machine learning model is able to identify the fraudulent transactions. Which technique should you use to examine the effectiveness of the model?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Gradient Descent<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Recall<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Feature engineering<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Precision<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Precision is the formula to check how accurate the model is when most of the outputs are positive. 
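https:">
<\/span><\/p>
<p><span style=\"font-weight: 400;\">To see why the choice of metric matters for this scenario, here is a minimal sketch (the confusion-matrix counts are made up for the fraud example) computing accuracy, precision and recall:<\/span><\/p>

```python
# Made-up counts for an imbalanced fraud scenario: 10,000 transactions,
# of which only 50 are truly fraudulent ("yes").
tp = 40    # fraudulent transactions correctly flagged
fn = 10    # fraudulent transactions the model missed
fp = 60    # valid transactions wrongly flagged
tn = 9890  # valid transactions correctly passed

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 0.993, but misleading here
precision = tp / (tp + fp)                  # 40 / 100 = 0.40
recall = tp / (tp + fn)                     # 40 / 50  = 0.80

# A model that labels every transaction "valid" would still score ~99.5%
# accuracy, yet its recall would be 0.0: it catches no fraud at all.
print(accuracy, precision, recall)
```

<p><span style=\"font-weight: 400;\">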
In other words, precision measures how many of the predicted \u201cyes\u201d outputs are truly positive: TP \/ (TP + FP).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Recall is the formula to check how well the model catches the actual positives, that is, the fraction of real \u201cyes\u201d cases that the model predicts as \u201cyes\u201d: TP \/ (TP + FN).<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Gradient Descent is an optimization algorithm to find the minimal value of a function. Gradient descent is used to find the minimum of the RMSE or cost function.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Feature Engineering is the process of deciding which data is important for the model.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">From the description, answers A &amp; C are incorrect. That leaves us with B &amp; D.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Since the scenario mentions that a transaction is very unlikely to be fraudulent, there are far more \u201cno\u201d cases than \u201cyes\u201d cases, i.e. more negatives than positives. What matters is how many of the rare fraudulent transactions the model actually catches. Hence, to measure the effectiveness of the model, you should use recall.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Precision &amp; Recall:<\/span><a href=\"https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/precision-and-recall\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/developers.google.com\/machine-learning\/crash-course\/classification\/<\/span> <span style=\"font-weight: 400;\">precision-and-recall<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Gradient Descent:<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Gradient_descent\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/en.wikipedia.org\/wiki\/Gradient_descent<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Feature Engineering:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/data-prep\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 
400;\">https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/data-prep<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-6\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q14 : You want to launch a Cloud Machine Learning Engine cluster to deploy a deep neural network model built by Tensorflow by data scientists of your company. Reviewing the standard tiers available by Google ML Engine, you could not find a tier that suits the requirements data scientists need for the cluster. Google allows you to specify custom cluster specification.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following specifications you are allowed to set?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>workerCount<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>parameterServerCount<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>masterCount<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>workerMemory<\/span><\/p>\n<p><b>Correct Answers: A and B<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">The Custom tier is not a set tier, but rather enables you to use your own cluster specification. When you use this tier, set values to configure your processing cluster according to these guidelines:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">You must set TrainingInput.masterType to specify the type of machine to use for your master node. This is the only required setting. 
See the machine types described below.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">You may set TrainingInput.<\/span><i><span style=\"font-weight: 400;\">workerCount <\/span><\/i><span style=\"font-weight: 400;\">to specify the number of workers to use. If you specify one or more workers, you must also set TrainingInput.workerType to specify the type of machine to use for your worker nodes.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"2\"><span style=\"font-weight: 400;\">You may set TrainingInput.parameterServerCount to specify the number of parameter servers to use. If you specify one or more parameter servers, you must also set TrainingInput.<\/span><i><span style=\"font-weight: 400;\">parameterServerType <\/span><\/i><span style=\"font-weight: 400;\">to specify the type of machine to use for your parameter servers.<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">From the explanation, the specifications among the answers that can be set are workerCount &amp; <\/span><span style=\"font-weight: 400;\">parameterServerCount.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Specifying Machine Types or Scale Tiers:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/machine-types\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/machine-types<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-5\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q15 : Your company is using multiple Google Cloud projects. 
Since maintaining and managing bills for these projects is becoming very complicated with time, management decided to unify the company\u2019s projects into one and migrate all existing resources to a single project.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">One of the projects to be migrated contains several Google Storage buckets with the total estimated file size of 25TB. This data is required to be moved to the newly created project. You need to find a secure and efficient method to migrate data. What Google Cloud product is best for this task?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>gsutil command<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Storage Transfer Service<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Appliance Transfer Service<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Dataproc<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Storage Transfer Service allows you to quickly import <\/span><i><span style=\"font-weight: 400;\">ONLINE <\/span><\/i><span style=\"font-weight: 400;\">data into Cloud Storage. You can also set up a repeating schedule for transferring data, as well as to transfer data within Cloud Storage, from one bucket to another.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Transfer Appliance is an <\/span><i><span style=\"font-weight: 400;\">OFFLINE, <\/span><\/i><span style=\"font-weight: 400;\">secure, high capacity storage server that you set up in your data center. 
You fill it with data and ship it to an ingest location where the data is uploaded to Google Cloud Storage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">So, option B is correct, while option C is incorrect.<\/span><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">: The gsutil tool is good for programmatic usage by developers and may be useful to copy and move megabytes\/gigabytes of data. It\u2019s not so practical for terabytes of data. It\u2019s also not a reliable data transfer technique, since it depends on the machine\u2019s connectivity with Google Cloud.<\/span><br \/>\n<b>Option D is incorrect: <\/b><span style=\"font-weight: 400;\">Dataproc may help by reading from the source buckets and writing into the destination buckets, but this requires data in source buckets to be usable by Hadoop\/Apache tools (Partitioned, optimized file formats such as ORC, ..).<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Google Cloud Storage Transfer Service: <\/span><a href=\"https:\/\/cloud.google.com\/storage-transfer\/docs\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/storage-transfer\/docs\/<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google Appliance Transfer Service: <\/span><a href=\"https:\/\/cloud.google.com\/transfer-appliance\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/transfer-appliance\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-6\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q16 : Your company signed a contract with a retail chain store to handle its data processing applications and tech stack. 
One of the several applications to be implemented is an ETL pipeline that ingests the chain store\u2019s daily purchase transaction logs to be processed and stored for analysis and reporting, and visualizes the chain\u2019s purchase details for the head management.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Daily transaction logs will be available at 2 am when the day is over, and logs are exported to a Google Storage bucket partitioned by date in the format (yyyy-mm-dd). A Dataflow pipeline should run every day at 3:00 am to ingest and process the logs. Which of the following Google products would help?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Function<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Compute Engine<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Cloud Scheduler<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Kubernetes Engine<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cloud Scheduler is a fully managed enterprise-grade cron job scheduler. It allows you to schedule virtually any job, including batch jobs, big data jobs, cloud infrastructure operations, and more. You can automate everything, including retries in case of failure, to reduce manual toil and intervention. 
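<\/span><\/p>
<p><span style=\"font-weight: 400;\">As a small illustration of what the scheduled run would do, here is a sketch in Python (the bucket name is hypothetical; the (yyyy-mm-dd) partition format comes from the scenario) of computing the date-partitioned prefix the 3:00 am run should ingest, namely the previous day\u2019s logs:<\/span><\/p>

```python
from datetime import date, timedelta

def logs_prefix_for_run(run_day: date) -> str:
    # The 3:00 am run ingests the logs exported at 2 am, i.e. the logs
    # of the day that just ended (the day before the run).
    log_day = run_day - timedelta(days=1)
    return f"gs://chain-store-logs/{log_day:%Y-%m-%d}/"  # bucket name is hypothetical

print(logs_prefix_for_run(date(2022, 3, 17)))  # gs://chain-store-logs/2022-03-16/
```

<p><span style=\"font-weight: 400;\">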
Cloud Scheduler even acts as a single pane of glass, allowing you to manage all your automation tasks from one place.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Cloud Scheduler: <\/span><a href=\"https:\/\/cloud.google.com\/scheduler\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/scheduler\/<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Build_and_Operationalize_Data_Processing_Systems-4\"><\/span><span style=\"font-weight: 400;\">Domain: Build and Operationalize Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q17 : You receive payment transaction logs from e-wallet apps. Transaction logs have a dynamic structure that differs depending on the e-wallet app they are received from. Logs are required to be stored for further security analysis. Transaction logs are critical, and the data storage is expected to deliver high performance so that the required security metrics can be queried and updated in near-real time. Which of the following approaches should you use?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Use BigTable as a database with HDD storage to store system logs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Use BigTable as a database with SSD storage to store system logs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Use Datastore as a database to store system logs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. 
<\/strong>Use Firebase as a database to store system logs.<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">When you create a Cloud Bigtable instance, you choose whether its clusters store data on solid-state drives (SSD) or hard disk drives (HDD):<\/span><\/p>\n<p><span style=\"font-weight: 400;\">SSD is significantly faster and has more predictable performance than HDD.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">HDD throughput is much more limited than SSD throughput. In a cluster that uses HDD storage, it&#8217;s easy to reach the maximum throughput before CPU usage reaches 100%. To increase throughput, you must add more nodes, but the cost of the additional nodes can easily exceed your savings from using HDD storage. SSD storage does not have this limitation because it offers much more throughput per node.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Individual row reads on HDD are very slow. Because of disk seek time, HDD storage supports only 5% of the read rows per second of SSD storage.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The cost savings from HDD are minimal, relative to the cost of the nodes in your Cloud Bigtable cluster, unless you&#8217;re storing very large amounts of data.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Choosing Between SSD and HDD Storage: <\/span><a href=\"https:\/\/cloud.google.com\/bigtable\/docs\/choosing-ssd-hdd\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigtable\/docs\/choosing-ssd-hdd<\/span><\/a>, <span style=\"font-weight: 400;\">Querying Cloud Bigtable Data: <\/span><a href=\"https:\/\/cloud.google.com\/bigquery\/external-data-bigtable\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigquery\/external-data-bigtable<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span 
class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-7\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q18 :<\/span> <span style=\"font-weight: 400;\">You have over 2,000 video clips with dialog scenes and you need to transcribe the dialog to text. Since transcribing this many clips manually would be time-consuming, you want a Google Cloud product that can do it for you. Which of the following is best for this scenario?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>AutoML Vision API.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Cloud Natural Language.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Machine Learning Engine.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Cloud Speech-to-Text.<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Cloud Speech-to-Text is a service that generates captions from videos by detecting the speaker\u2019s language and speech. Google Cloud Speech-to-Text enables developers to convert audio to text by applying powerful neural network models in an easy-to-use API. The API recognizes 120 languages and variants to support your global user base. You can enable voice command-and-control, transcribe audio from call centers, and more. 
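<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As a sketch, one way to batch the work per clip is to extract the audio track, upload it, and request an asynchronous transcription; the file and bucket names below are placeholders:<\/span><\/p>\n

```shell
# Sketch only: file and bucket names are placeholders.
# Extract a mono 16 kHz audio track, upload it, then transcribe asynchronously.
ffmpeg -i scene-001.mp4 -ac 1 -ar 16000 scene-001.wav
gsutil cp scene-001.wav gs://my-clips/scene-001.wav
gcloud ml speech recognize-long-running gs://my-clips/scene-001.wav \
    --language-code=en-US --async
```

\n<p><span style=\"font-weight: 400;\">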
It can process real-time streaming or prerecorded audio, using Google\u2019s machine learning technology.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect:<\/strong> AutoML Vision API is a service to recognize and derive insights from images, either by using pre-trained models or by training a custom model on a set of photographs.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>Option B is incorrect:<\/strong> The Cloud Natural Language service is used to derive insights from unstructured text, revealing the meaning of documents and categorizing articles. It won\u2019t help in extracting captions from videos.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>Option C is incorrect:<\/strong> Machine Learning Engine is a managed service that allows developers and scientists to build their own models and run them in production. This means you would have to build your own model to generate text from videos, which requires considerable effort and experience. 
So, it\u2019s not a practical solution for this scenario.<\/span><\/p>\n<p><b>References: <\/b><span style=\"font-weight: 400;\">Google Speech-to-Text API: <\/span><a href=\"https:\/\/cloud.google.com\/speech-to-text\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/speech-to-text\/<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google NLP:<\/span><a href=\"https:\/\/cloud.google.com\/natural-language\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/natural-language\/<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google Machine Learning Engine:<\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/<\/span><\/a>,\u00a0<span style=\"font-weight: 400;\">Google Vision API:<\/span><a href=\"https:\/\/cloud.google.com\/vision\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/vision<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-8\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q19 : You are building a model using TensorFlow. During training, the model returned 73% true positives. When you tested the model with a set derived from real data, the true positive rate dropped to 65%. You need to tune the model for better prediction. What would you do? (Choose two.)<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Increase feature parameters<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. 
<\/strong>Increase regularization<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Decrease feature parameters<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Decrease regularization<\/span><\/p>\n<p><b>Correct Answers: B and C<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Overfitting happens when a model performs well on the training set, generating only a small error there, while producing poor output for the test set. This happens because the model memorizes specific patterns found in the training set instead of learning the general features of the data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To solve overfitting, the following would help in improving the model\u2019s quality:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Increase the number of examples: the more data a model is trained with, the more cases it can learn from, and the better its predictions become.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Tune hyperparameters, such as the number and size of hidden layers (for neural networks), and apply regularization, i.e. techniques that make your model simpler, such as dropout (randomly disabling neurons) or adding a \u201cpenalty\u201d term to the cost function.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Remove irrelevant features. Feature engineering is a wide subject, and feature selection is a critical part of building and training a model. 
Some algorithms have built-in feature selection, but in some cases, data scientists need to manually select or remove features to debug and find the best model output.<\/span><span style=\"font-weight: 400;\"><br \/>\n<\/span><span style=\"font-weight: 400;\">In brief, to solve the overfitting problem in this scenario, you need to:<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Increase the training set.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Decrease feature parameters.<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Increase regularization.<\/span><\/li>\n<\/ul>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Building a serverless Machine learning model: <\/span><a href=\"https:\/\/cloud.google.com\/solutions\/building-a-serverless-ml-model\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/solutions\/building-a-serverless-ml-model<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Operationalize_Machine_Learning_Models-9\"><\/span><span style=\"font-weight: 400;\">Domain: Operationalize Machine Learning Models<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q20 : You are building a machine learning model using TensorFlow. The model aims to predict the next earthquake\u2019s location, approximate time, and Richter scale based on data records since 1913. 
The model needs to be tuned over a number of training epochs for higher accuracy. Which of these variables are used for hyperparameter tuning? (Choose two.)<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Number of features<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Number of hidden layers<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Number of nodes in hidden layers<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Weight values<\/span><\/p>\n<p><b>Correct Answers: B and C<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><span style=\"font-weight: 400;\">Hyperparameters are the variables that govern the training process itself. For example, part of setting up a deep neural network is deciding <\/span><b>how many hidden layers <\/b><span style=\"font-weight: 400;\">of nodes to use between the input layer and the output layer, and <\/span><b>how many nodes each layer <\/b><span style=\"font-weight: 400;\">should use. These variables are not directly related to the training data; they are configuration variables. 
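<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The split between hyperparameters (chosen per trial) and parameters (learned during training) can be seen in a toy grid search; the scoring function below is purely illustrative, standing in for a real train-and-validate step:<\/span><\/p>\n

```python
from itertools import product

# Hyperparameters: fixed for the duration of each training trial.
HIDDEN_LAYERS = [1, 2, 3]        # candidate numbers of hidden layers
NODES_PER_LAYER = [16, 32, 64]   # candidate nodes per hidden layer

def validation_score(layers, nodes):
    """Illustrative stand-in for training a model and scoring it on held-out data."""
    return 1.0 / (1 + abs(layers - 2) + abs(nodes - 32) / 16)

# Grid search: evaluate every hyperparameter combination, keep the best one.
best = max(product(HIDDEN_LAYERS, NODES_PER_LAYER),
           key=lambda hp: validation_score(*hp))
print(best)  # the (layers, nodes) pair with the highest validation score
```

\n<p><span style=\"font-weight: 400;\">Weight values, by contrast, are updated inside each trial by the optimizer and are never part of the search grid.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">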
Note that parameters change during a training job, while hyperparameters are usually constant during a job.<\/span><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">: The number of features is set by feature engineering, not hyperparameter tuning.<\/span><br \/>\n<b>Option D is incorrect: <\/b><span style=\"font-weight: 400;\">Weight values are set while training the model.<\/span><\/p>\n<p><b>Reference: <\/b><span style=\"font-weight: 400;\">Hyperparameter Tuning: <\/span><a href=\"https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/hyperparameter-tuning-overview\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/ml-engine\/docs\/tensorflow\/hyperparameter-tuning-overview<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h4><em><span style=\"font-weight: 400;\">Q21 : Choose all statements that are correct for Cloud Pub\/Sub.<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Pub\/Sub has a default retention period of 7 days.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>The retention period for Cloud Pub\/Sub is configurable and can be set to a maximum of 28 days and a minimum of 10 minutes.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>The retention period for Cloud Pub\/Sub is configurable and can be set to a maximum of 7 days and a minimum of 10 minutes.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>The retention period for Cloud Pub\/Sub is not configurable and is set to 7 days by default.<\/span><\/p>\n<p><b>Correct Answers: A and C<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Option A is correct<\/b><span style=\"font-weight: 400;\">. 
Cloud Pub\/Sub has a default retention period of 7 days.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> The Cloud Pub\/Sub retention period is configurable and can be set to a maximum of 7 days and a minimum of 10 minutes, not 28 days.<\/span><br \/>\n<b>Option C is correct<\/b><span style=\"font-weight: 400;\">. The Cloud Pub\/Sub retention period is configurable and can be set to a maximum of 7 days and a minimum of 10 minutes.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. The Cloud Pub\/Sub retention period is configurable.<\/span><\/p>\n<p><b><img decoding=\"async\" class=\"aligncenter size-full wp-image-81522\" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21.png\" alt=\"\" width=\"1099\" height=\"369\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21.png 1099w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21-300x101.png 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21-1024x344.png 1024w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21-768x258.png 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21-640x215.png 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de21-681x229.png 681w\" sizes=\"(max-width: 1099px) 100vw, 1099px\" \/>Reference: <\/b><a href=\"https:\/\/cloud.google.com\/pubsub\/docs\/subscriber\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/pubsub\/docs\/subscriber<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h4><em><span style=\"font-weight: 400;\">Q22 : You receive bank transaction data through Cloud Pub\/Sub and need to analyze the data using Cloud Dataflow. 
The transactions are in the below format:<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">2INDEL3465,\u00a0 JACK, 34627,DOLLAR,20191205234251000,D<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">1USCHG5627, SAM, 1276, DOLLAR, 20191205234252562,C<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">The requirement is to extract the customer name from each transaction and store the results in an output PCollection. Select the operation best suited for this processing.<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Regex.find<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>ParDo<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Extract<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Transform<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. Regex.find will output a regex group containing all the lines that match the regex. In this case, we need the customer name to be extracted and placed into another PCollection for further processing.<\/span><br \/>\n<b>Option B is correct.<\/b><span style=\"font-weight: 400;\"> ParDo helps in extracting parts from elements and can also filter a dataset: it considers each element in a PCollection and either outputs a result to a new PCollection or discards it.<\/span><br \/>\n<b>Option C is incorrect. <\/b><span style=\"font-weight: 400;\">An \u201cExtract\u201d operation does not exist in Cloud Dataflow.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. A transform is a step in your pipeline that represents a data processing operation, not a specific extraction operation.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h4><em><span style=\"font-weight: 400;\">Q23 : You have some on-premises Hadoop jobs that the company\u2019s management has decided to bring to Google Cloud Dataproc. 
A few of the jobs will still run on the on-premises Hadoop cluster while others will run on Cloud Dataproc. You need to orchestrate these jobs and add the required dependencies between your on-premises and Dataproc jobs. However, the company doesn&#8217;t want any vendor lock-in and may move to AWS in the future, so the chosen orchestration framework must accommodate such changes. Select the best option provided by GCP with very little overhead.<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Composer<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Apache Airflow<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Apache Oozie<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Cloud Scheduler<\/span><\/p>\n<p><b>Correct Answer: A<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Option A is correct.<\/b><span style=\"font-weight: 400;\"> Cloud Composer allows you to pull workflows together from wherever they live, supporting a fully-functioning and connected cloud environment. Since Cloud Composer is built on<\/span><a href=\"https:\/\/cloud.google.com\/blog\/products\/data-analytics\/using-upstream-apache-airflow-hooks-and-operators-in-cloud-composer\" target=\"_blank\" rel=\"nofollow noopener\"> <span style=\"font-weight: 400;\">Apache Airflow<\/span><\/a> <span style=\"font-weight: 400;\">\u2013 an open-source technology \u2013 it provides freedom from vendor lock-in as well as integration with a wide variety of platforms. We can connect to on-premises databases from Cloud Composer. 
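<\/span><\/p>\n<p><span style=\"font-weight: 400;\">At its core, a Composer\/Airflow workflow is a directed acyclic graph of tasks, and the execution order such a graph implies can be sketched with the Python standard library; the task names here are hypothetical:<\/span><\/p>\n

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependencies: the on-premises export must finish before the
# Dataproc transform, which must finish before the warehouse load.
dag = {
    "dataproc_transform": {"onprem_export"},
    "warehouse_load": {"dataproc_transform"},
}

# Composer/Airflow runs tasks in an order consistent with the DAG,
# which for this simple chain is a topological sort.
order = list(TopologicalSorter(dag).static_order())
print(order)  # ['onprem_export', 'dataproc_transform', 'warehouse_load']
```

\n<p><span style=\"font-weight: 400;\">In Composer itself, the same dependencies would be declared as Airflow operators wired together in a DAG file.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">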
For connecting to an on-premises database, refer to the link below: <\/span><a href=\"https:\/\/www.progress.com\/tutorials\/cloud-and-hybrid\/connect-to-on-premises-databases-from-google-composer\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/www.progress.com\/tutorials\/cloud-and-hybrid\/connect-to-on-premises-databases-from-google-composer<\/span><\/a><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> The question asks for a GCP service; Cloud Composer is built on top of Apache Airflow, so Cloud Composer is the correct answer.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. Apache Oozie does not support orchestrating on-premises and Dataproc jobs at the same time. Also, Oozie can only be run with Hadoop. <\/span><span style=\"font-weight: 400;\">There is no managed service for Oozie in GCP.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. Cloud Scheduler is a fully managed cron job scheduler. In this case, we need an orchestration framework that handles dependencies between jobs, so Cloud Scheduler is not the correct answer.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-7\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q24 : A regional auto dealership is migrating its business applications to Google Cloud. 
The CTO of this company asked their data engineer to identify the possible ways to ingest data into BigQuery. Which of the following apply?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Batch Ingestion &amp; Streaming Ingestion<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Data Transfer Service<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Query Materialization<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. <\/strong>Partner Integrations<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E. <\/strong>All of the above<\/span><\/p>\n<p><b>Correct Answer: E<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Option E is CORRECT<\/b><span style=\"font-weight: 400;\"> because all of these are possible ways to load data into BigQuery. Let\u2019s understand them one by one:<\/span><br \/>\n<span style=\"font-weight: 400;\">Batch Ingestion &#8211; This involves ingesting large, bounded datasets that don\u2019t have to be processed in real-time.<\/span><br \/>\n<span style=\"font-weight: 400;\">To implement batch ingestion, one can use Google Cloud Storage, Cloud Dataflow, Cloud Dataproc, Cloud Data Fusion, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Streaming Ingestion &#8211; This involves ingesting large, unbounded data that is processed in real-time.<\/span><br \/>\n<span style=\"font-weight: 400;\">To implement streaming ingestion, one can use the Apache Kafka to BigQuery connector, Cloud Pub\/Sub, Cloud Dataflow, Cloud Dataproc, etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Data Transfer Service (DTS) &#8211; This is a fully managed service for loading data from external cloud storage providers such as Amazon S3, from Google SaaS applications such as Google Ads, and from data warehouse technologies such as Teradata 
etc.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Partner Integrations &#8211; These are the data integration alternatives from Google Cloud partners. <\/span><span style=\"font-weight: 400;\">These include Confluent, Informatica, SnapLogic, Talend, and many more.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Query Materialization &#8211; This simplifies extract, transform, and load patterns in BigQuery: using federated queries, one can persist analysis results in BigQuery to derive further insights.<\/span><br \/>\n<span style=\"font-weight: 400;\">The image below details all the possible ways to ingest data into BigQuery.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><img decoding=\"async\" class=\"aligncenter wp-image-81523 size-full\" title=\"Google Cloud Loading Data into BigQuery\" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24.png\" alt=\"Google Cloud Loading Data into BigQuery\" width=\"2048\" height=\"1118\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24.png 2048w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-300x164.png 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-1024x559.png 1024w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-768x419.png 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-1536x839.png 1536w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-769x420.png 769w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-640x350.png 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/de24-681x372.png 681w\" sizes=\"(max-width: 2048px) 100vw, 2048px\" \/>For more information on <\/span><b>BigQuery<\/b><span style=\"font-weight: 400;\">, please visit the below URL: <\/span><a 
href=\"https:\/\/cloud.google.com\/blog\/topics\/developers-practitioners\/bigquery-explained-data-ingestion\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/blog\/topics\/developers-practitioners\/bigquery-explained-data-ingestion<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Design_Data_Processing_Systems-8\"><\/span><span style=\"font-weight: 400;\">Domain: Design Data Processing Systems<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q25 : Your organization is looking for a fully managed, cloud-native data integration service. Its engineers are very familiar with CDAP (Cask Data Application Platform).<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which managed service in Google Cloud would you recommend?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A. <\/strong>Cloud Dataproc<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B. <\/strong>Cloud Composer<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C. <\/strong>Cloud Dataflow<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D. 
<\/strong>Cloud Data Fusion<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation :<\/b><\/p>\n<p><b>Option D\u00a0 is CORRECT<\/b><span style=\"font-weight: 400;\"> because Cloud Data Fusion is built with an open-source core (CDAP) for pipeline portability.<\/span><br \/>\n<span style=\"font-weight: 400;\">This also provides end-to-end data lineage for root cause analysis.<\/span><br \/>\n<b>Option A is incorrect<\/b><span style=\"font-weight: 400;\"> because Cloud Dataproc is a fully managed and highly scalable service for running Apache Flink, Apache Spark, and other applications.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\"> because Cloud Composer is a fully managed data-workflow orchestration service.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\"> because Cloud Dataflow is a fully managed streaming analytics service.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For more information on the <\/span><b>Google Cloud<\/b> <b>Data fusion<\/b><span style=\"font-weight: 400;\">, please visit the below URL: <\/span><a href=\"https:\/\/cloud.google.com\/data-fusion\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/data-fusion<\/span><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Maintaining_and_Automating_Data_Workloads\"><\/span>Domain<span style=\"font-weight: 400;\">: Maintaining and Automating Data Workloads<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q26: A company needs to process large volumes of data for business-critical processes while minimizing costs. They are considering using Dataproc for their data processing needs. 
What should they consider when deciding between persistent or job-based data clusters in Dataproc?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Persistent clusters offer fixed capacity and are suitable for continuous, long-running processes, while job-based clusters are more cost-effective for short-lived, ad-hoc tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Job-based clusters provide greater flexibility and cost efficiency for intermittent workloads, while persistent clusters are ideal for predictable, ongoing data processing tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Persistent clusters are recommended for batch processing tasks due to their ability to scale resources dynamically, while job-based clusters are better suited for interactive query jobs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Job-based clusters offer better fault tolerance and error recovery mechanisms, making them suitable for mission-critical data processes, while persistent clusters are more cost-effective for one-time data transformations.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> B<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option B is CORRECT<\/strong><span style=\"font-weight: 400;\"> as job-based clusters in Dataproc provide flexibility and cost efficiency for intermittent workloads by provisioning resources only when needed, while persistent clusters are better suited for predictable, ongoing data processing tasks where resources are continuously required.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because persistent clusters may not be the most cost-effective option for short-lived, ad-hoc tasks due to their continuous resource allocation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because persistent clusters are not specifically recommended for batch processing tasks, and job-based clusters can 
also handle batch processing effectively.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because fault tolerance and error recovery mechanisms are not distinguishing factors between persistent and job-based clusters in Dataproc. Both types of clusters offer fault tolerance capabilities.<\/span><\/p>\n<p><b>Reference Link<\/b><span style=\"font-weight: 400;\">: <\/span><a href=\"https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/compute\/clusters-datalab-flex\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/compute\/clusters-datalab-flex<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Maintaining_and_Automating_Data_Workloads-2\"><\/span>Domain:<span style=\"font-weight: 400;\"> Maintaining and Automating Data Workloads<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q27: An organization wants to automate its data processing workflows using Cloud Composer. They need to schedule jobs in a repeatable manner to ensure the timely execution of critical tasks. 
What approach should they take to achieve this goal effectively?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Utilize Cloud Functions to trigger workflows based on predefined schedules and dependencies.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Implement directed acyclic graphs (DAGs) in Cloud Composer to define workflow dependencies and schedule job execution.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Use Cloud Scheduler to define and manage job schedules, and then trigger workflow execution in Cloud Composer.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Leverage Cloud Tasks to create and manage task queues for scheduling and orchestrating data processing jobs in Cloud Composer.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> B<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is<\/strong> <\/span><b>CORRECT<\/b><span style=\"font-weight: 400;\"> as directed acyclic graphs (DAGs) in Cloud Composer allow organizations to define workflow dependencies and schedule job execution in a repeatable and reliable manner, ensuring timely execution of critical tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because while Cloud Functions can trigger workflows based on schedules and dependencies, they do not provide the same level of orchestration and scheduling capabilities as Cloud Composer DAGs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because Cloud Scheduler is primarily used for managing job schedules, but it does not offer workflow orchestration capabilities like Cloud Composer DAGs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because Cloud Tasks is more suitable for managing task queues and asynchronous task execution, rather than scheduling and orchestrating data processing 
workflows.<\/span><\/p>\n<p><b>Reference Link:<\/b> <a href=\"https:\/\/cloud.google.com\/composer\/docs\/concepts\/dags\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/composer\/docs\/concepts\/dags<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Maintaining_and_Automating_Data_Workloads-3\"><\/span>Domain:<span style=\"font-weight: 400;\"> Maintaining and Automating Data Workloads<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q28: Your organization needs to organize data workloads based on business requirements, considering factors like flexibility, capacity, and pricing models. Which pricing model offers fixed capacity and is best suited for predictable, steady workloads?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Flex slot pricing<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) On-demand pricing<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Flat-rate slot pricing<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Pay-as-you-go pricing<\/span><\/p>\n<p><b>Answer<\/b><span style=\"font-weight: 400;\">: C<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is<\/strong> <\/span><b>CORRECT<\/b><span style=\"font-weight: 400;\"> as flat-rate slot pricing offers fixed capacity for predictable workloads. 
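<\/span><\/p>\n<p><span style=\"font-weight: 400;\">The trade-off against on-demand pricing reduces to simple break-even arithmetic; the rates below are hypothetical and for illustration only, not current list prices:<\/span><\/p>\n

```python
# Hypothetical rates for illustration only -- check current BigQuery pricing.
ON_DEMAND_PER_TB = 5.00        # USD per TB scanned (on-demand)
FLAT_RATE_MONTHLY = 2000.00    # USD per month for a fixed slot commitment

# Monthly scan volume above which the fixed commitment is cheaper.
break_even_tb = FLAT_RATE_MONTHLY / ON_DEMAND_PER_TB
print(f"Flat-rate wins above {break_even_tb:.0f} TB scanned per month")
```

\n<p><span style=\"font-weight: 400;\">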
It allows users to pay a flat rate for a set number of slots, regardless of usage, providing stability in pricing for steady workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because flex slot pricing offers flexibility and dynamically allocates slots based on demand, making it suitable for fluctuating workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is incorrect<\/strong> because on-demand pricing charges users based on usage, without any fixed commitments, making it suitable for sporadic or unpredictable workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because pay-as-you-go pricing is similar to on-demand pricing and charges users based on actual usage, without fixed commitments.<\/span><\/p>\n<p><b>Reference Link:<\/b> <a href=\"https:\/\/cloud.google.com\/bigquery\/pricing#flat-rate-pricing\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigquery\/pricing#flat-rate-pricing<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Maintaining_and_Automating_Data_Workloads-4\"><\/span>Domain:<span style=\"font-weight: 400;\"> Maintaining and Automating Data Workloads<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q29: Your organization needs to ensure the observability of data processes, monitor planned usage, and troubleshoot error messages and billing issues effectively. 
Which Google Cloud service provides centralized logging and monitoring capabilities for data processes?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Google Cloud Monitoring<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Google Cloud Logging<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Google Cloud Trace<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Google Cloud Audit Logging<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> A<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option A is CORRECT<\/strong><span style=\"font-weight: 400;\"> as Google Cloud Monitoring provides centralized logging and monitoring capabilities for Google Cloud services, including data processes such as Dataproc and Dataflow. It enables users to collect, view, and analyze metrics, logs, and other monitoring data across their Google Cloud environment.<\/span><\/p>\n<p><strong>Option B<\/strong><span style=\"font-weight: 400;\"><strong> is incorrect<\/strong> because while Google Cloud Logging offers centralized logging capabilities, it does not provide comprehensive monitoring functionalities for data processes such as planned usage monitoring and error troubleshooting.<\/span><\/p>\n<p><strong>Option C <\/strong><span style=\"font-weight: 400;\"><strong>is incorrect<\/strong> because Google Cloud Trace is primarily focused on distributed application tracing, providing insights into application performance, rather than centralized logging and monitoring for data processes.<\/span><\/p>\n<p><strong>Option D<\/strong><span style=\"font-weight: 400;\"><strong> is incorrect<\/strong> because Google Cloud Audit Logging is specifically designed for tracking and logging user access and system activity within Google Cloud Platform services, not for monitoring data processes.<\/span><\/p>\n<p><b>Reference Link:<\/b> <a href=\"https:\/\/cloud.google.com\/monitoring\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 
400;\">https:\/\/cloud.google.com\/monitoring<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Maintaining_and_Automating_Data_Workloads-5\"><\/span>Domain: <span style=\"font-weight: 400;\">Maintaining and Automating Data Workloads<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q30: <span style=\"font-weight: 400;\">Your organization operates critical data processes in a Google Cloud environment. You need to decide between persistent or job-based data clusters for processing large-scale data workloads efficiently. What should you consider when making this decision?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Evaluate the cost-effectiveness of persistent clusters based on long-term utilization trends.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Implement job-based clusters to ensure scalability and resource optimization for irregular workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Analyze the availability of resources in persistent clusters to ensure uninterrupted data processing.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Use persistent clusters to avoid the overhead of cluster initialization and termination.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> B<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option B is CORRECT<\/strong><span style=\"font-weight: 400;\"> as implementing job-based clusters allows for scalability and resource optimization, especially for irregular workloads, ensuring efficient utilization of resources without incurring unnecessary costs.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because while evaluating the cost-effectiveness of persistent clusters is important, it may not address the scalability and optimization needs for sporadic workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because analyzing resource availability in persistent clusters may ensure uninterrupted processing but may not offer the flexibility needed for varying workloads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because opting for persistent clusters to avoid initialization and termination overhead may lead to underutilization of resources and increased costs for sporadic workloads.<\/span><\/p>\n<p><b>Reference Link:<\/b> <a href=\"https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/compute\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/dataproc\/docs\/concepts\/compute<\/span><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Preparing_and_Using_Data_for_Analysis\"><\/span><b>Domain:<\/b><span style=\"font-weight: 400;\"> Preparing and Using Data for Analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q31:<\/em><b> <\/b><span style=\"font-weight: 400;\">A prominent e-commerce platform is revamping its data visualization dashboard to display real-time sales analytics. The dashboard is expected to handle a substantial number of concurrent users and deliver quick visualizations with minimal latency. 
Which approach should they take to optimize the dashboard&#8217;s performance?<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">A) Employ BigQuery BI Engine with precomputed materialized views.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Implement BigQuery BI Engine with virtualized logical views.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Utilize BigQuery BI Engine with real-time streaming data integration.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Integrate BigQuery BI Engine with access-controlled authorized views.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> A<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option A is CORRECT<\/strong><span style=\"font-weight: 400;\"> because leveraging BigQuery BI Engine with precomputed materialized views allows for quick access to aggregated data, resulting in faster visualization rendering and minimal latency, which aligns with the requirements of the e-commerce platform&#8217;s data visualization dashboard.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is incorrect<\/strong> because virtualized logical views may introduce additional processing overhead and latency, which could impact the performance of the dashboard, especially under high concurrent user loads.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because while real-time streaming data integration can provide up-to-date insights, it may not offer the same level of performance optimization as precomputed materialized views for handling large volumes of concurrent user requests.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because access-controlled authorized views focus on security rather than performance optimization for data visualization dashboards.<\/span><\/p>\n<p><b>References:<\/b> <a href=\"https:\/\/cloud.google.com\/bigquery\/docs\/materialized-views-intro\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigquery\/docs\/materialized-views-intro<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Preparing_and_Using_Data_for_Analysis-2\"><\/span><b>Domain: <\/b><span style=\"font-weight: 400;\">Preparing and Using Data for Analysis\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q32<\/em><b>: <\/b><span style=\"font-weight: 400;\">Your organization aims to consolidate and analyze a vast amount of genomic sequencing data, totaling over 20 TB, from various sources. The data must be stored in new tables for further query and analytics, accessible via SQL, with a low-maintenance architecture. What is the most cost-effective solution to support data analytics for such large datasets?<\/span><\/h4>\n<p><span style=\"font-weight: 400;\">A) Implement Cloud SQL, organize the data into tables, and utilize JOIN operations in SQL queries to retrieve data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Use BigQuery as a data warehouse solution, configuring output destinations for caching large query results.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Deploy a MySQL cluster on a Compute Engine managed instance group for scalability and SQL access.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Utilize Cloud Spanner to replicate the data across regions and normalize it in a series of tables.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> B<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option B is CORRECT<\/strong><span style=\"font-weight: 400;\"> because using BigQuery as a data warehouse solution offers scalability, low maintenance, and efficient handling of large datasets for data analytics. 
Setting output destinations for caching large query results ensures quick access to previously processed data, aligning with the organization&#8217;s requirements.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because while Cloud SQL offers SQL access and table organization, it may not be as scalable and cost-effective for analyzing large datasets compared to BigQuery.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because managing a MySQL cluster on Compute Engine instances may require more maintenance and management overhead, and it may not provide the same level of scalability and cost-effectiveness as BigQuery for analyzing large datasets.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because while Cloud Spanner offers scalability and replication features, it may not be optimized for data analytics and querying large datasets compared to BigQuery.<\/span><\/p>\n<p><b>References:<\/b> <a href=\"https:\/\/cloud.google.com\/bigquery\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/bigquery<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Preparing_and_using_data_for_analysis\"><\/span><b>Domain:<\/b><span style=\"font-weight: 400;\"> Preparing and using data for analysis\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q33: <span style=\"font-weight: 400;\">Your organization utilizes a dataset in BigQuery for extensive analysis. Now, you intend to grant access to the same dataset for third-party companies while keeping data sharing costs low and ensuring data currency. 
Which solution should you choose?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Utilize Analytics Hub to manage data access and provide third-party companies with access to the dataset.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Implement Cloud Scheduler to regularly export the data to Cloud Storage and grant third-party companies access to the bucket.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Create a separate dataset in BigQuery containing the relevant data for sharing and grant access to third-party companies for the new dataset.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Develop a Dataflow job to periodically read the data and write it to the appropriate BigQuery dataset or Cloud Storage bucket for third-party usage.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> A<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option A is CORRECT<\/strong><span style=\"font-weight: 400;\"> because leveraging Analytics Hub allows centralized management of data access, ensuring security and control while providing third-party companies with access to the dataset. This solution helps maintain low data-sharing costs and ensures data currency.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is incorrect<\/strong> because while using Cloud Scheduler for data export may provide regular updates, it may not offer the same level of control and access management as Analytics Hub for third-party companies.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because creating a separate dataset could lead to redundancy and increased management overhead. 
It may also not provide the necessary control and access management features offered by Analytics Hub.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because although Dataflow can automate data movement, it may not be the most efficient solution for managing data access and ensuring currency for third-party companies.<\/span><\/p>\n<p><b>References:<\/b> <a href=\"https:\/\/cloud.google.com\/analytics-hub\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/analytics-hub<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Preparing_and_using_data_for_analysis-2\"><\/span><b>Domain:<\/b><span style=\"font-weight: 400;\"> Preparing and using data for analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q34: <span style=\"font-weight: 400;\">Your team is developing an application on Google Cloud aimed at automatically generating subject labels for users&#8217; blog posts. Due to competitive pressure and limited developer resources, you need to implement this feature quickly, with no prior experience in machine learning. 
What approach should you take?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Integrate the Cloud Natural Language API into your application and process the generated Entity Analysis results as labels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Utilize the Cloud Natural Language API within your application and process the generated Sentiment Analysis as labels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Develop and train a text classification model using TensorFlow, deploy it using Cloud Machine Learning Engine, and call the model from your application to process the results as labels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Create and train a text classification model using TensorFlow, deploy it using a Kubernetes Engine cluster, and call the model from your application to process the results as labels.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> A<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option A is CORRECT<\/strong><span style=\"font-weight: 400;\"> because leveraging the Cloud Natural Language API allows for quick implementation of subject labels without the need for machine learning expertise. 
By processing the generated Entity Analysis results, the application can efficiently generate subject labels for blog posts.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is incorrect<\/strong> because while Sentiment Analysis could provide insights into the sentiment of blog posts, it may not be suitable for generating subject labels.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because building and deploying a custom text classification model using TensorFlow and Cloud Machine Learning Engine would require significant time and expertise, which contradicts the requirement for a quick implementation.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option D is incorrect<\/strong> because deploying a TensorFlow model using a Kubernetes Engine cluster would also require considerable effort and resources, making it unsuitable for the scenario described.<\/span><\/p>\n<p><b>References:<\/b> <a href=\"https:\/\/cloud.google.com\/natural-language\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/natural-language<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Preparing_and_Using_Data_for_Analysis-3\"><\/span><b>Domain:<\/b><span style=\"font-weight: 400;\"> Preparing and Using Data for Analysis\u00a0<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q35: <span style=\"font-weight: 400;\">Your company has onboarded a new data scientist who needs to conduct complex analyses across extensive datasets stored in Google Cloud Storage and a Cassandra cluster on Google Compute Engine. The data scientist aims to create labeled datasets for machine learning projects and perform visualization tasks. However, she finds her current laptop insufficient for these tasks, causing significant slowdowns. What solution should you provide to facilitate her work?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\">A) Install a local instance of Jupyter Notebook on the data scientist&#8217;s laptop.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">B) Grant the data scientist access to Google Cloud Shell for performing analyses and visualization tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">C) Deploy a visualization tool on a virtual machine (VM) hosted on Google Compute Engine for the data scientist&#8217;s use.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">D) Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine to enable the data scientist to perform analyses and visualizations efficiently.<\/span><\/p>\n<p><b>Answer:<\/b><span style=\"font-weight: 400;\"> D<\/span><\/p>\n<p><b>Explanation:<\/b><\/p>\n<p><strong>Option D is CORRECT<\/strong><span style=\"font-weight: 400;\"> because Google Cloud Datalab provides a powerful and interactive toolset tailored for data exploration, analysis, and visualization. 
Deploying Datalab to a virtual machine (VM) on Google Compute Engine ensures that the data scientist has access to the necessary computational resources and tools to handle large datasets efficiently.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option A is incorrect<\/strong> because installing a local instance of Jupyter Notebook on the data scientist&#8217;s laptop may not address the performance limitations caused by the large datasets.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option B is incorrect<\/strong> because while Google Cloud Shell provides command-line access to Google Cloud Platform resources, it may not offer the computational power required for handling extensive datasets and performing complex analyses.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Option C is incorrect<\/strong> because hosting a visualization tool on a VM on Google Compute Engine does not address the data scientist&#8217;s need for a comprehensive data exploration and analysis environment.<\/span><\/p>\n<p><b>References:<\/b> <a href=\"https:\/\/cloud.google.com\/datalab\" target=\"_blank\" rel=\"nofollow noopener\"><span style=\"font-weight: 400;\">https:\/\/cloud.google.com\/datalab<\/span><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Summary\"><\/span>Summary<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">We hope you were able to get insights on the certification and feel a little more confident now. For more elaboration and practice, follow our exam-ready Practice Tests. Keep learning until you feel confident enough to take up the actual exam. Preparation is the key to passing the Google Cloud Professional Data Engineer Certification Exam.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Are you looking for the Google Cloud Professional Data Engineer Exam Questions? The questions and answers provided here test and enhance your knowledge of the exam objectives. 
A professional Data Engineer collects, transforms, and publishes the data, thereby enabling data-driven decision making. Earning a Google Cloud Certified Professional Data Engineer certification may help you in pursuing a better career in the Google cloud industry. To pass the actual exam, you have to spend more time on learning &amp; re-learning through multiple practice tests. Let&#8217;s start learning! Domain: Design Data Processing Systems Q1 : A company is migrating its current infrastructure [&hellip;]<\/p>\n","protected":false},"author":12,"featured_media":81541,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"default","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[10,12],"tags":[801],"class_list":["post-81518","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cloud-computing-certifications","category-google-cloud","tag-google-cloud-data-engineer-free-test"],"uagb_featured_image_src":{"full":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam-150x150.png",150,150,true],"medium":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam-300x158.png",300,158,true],"medium_large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"1536x1536":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"2048x2048":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"profile_24":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",24,13,false],"profile_48":["http
s:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",48,25,false],"profile_96":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",96,50,false],"profile_150":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",150,79,false],"profile_300":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",300,158,false],"tptn_thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam-250x250.png",250,250,true],"web-stories-poster-portrait":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",600,315,false],"web-stories-publisher-logo":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",96,50,false],"web-stories-thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-Google-Cloud-Certified-Professional-Data-Engineer-Certification-Exam.png",150,79,false]},"uagb_author_info":{"display_name":"Krishna Srinivasan","author_link":"https:\/\/www.whizlabs.com\/blog\/author\/krishna\/"},"uagb_comment_info":13,"uagb_excerpt":"Are you looking for the Google Cloud Professional Data Engineer Exam Questions? The questions and answers provided here test and enhance your knowledge of the exam objectives. A professional Data Engineer collects, transforms, and publishes the data, thereby enabling data-driven decision making. 
Earning a Google Cloud Certified Professional Data Engineer certification may help you in&hellip;","_links":{"self":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81518","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/users\/12"}],"replies":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/comments?post=81518"}],"version-history":[{"count":11,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81518\/revisions"}],"predecessor-version":[{"id":94873,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81518\/revisions\/94873"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media\/81541"}],"wp:attachment":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media?parent=81518"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/categories?post=81518"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/tags?post=81518"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}