{"id":81515,"date":"2022-03-16T22:20:14","date_gmt":"2022-03-17T03:50:14","guid":{"rendered":"https:\/\/www.whizlabs.com\/blog\/?p=81515"},"modified":"2024-04-18T17:15:28","modified_gmt":"2024-04-18T11:45:28","slug":"aws-certified-machine-learning-specialty-free-questions","status":"publish","type":"post","link":"https:\/\/www.whizlabs.com\/blog\/aws-certified-machine-learning-specialty-free-questions\/","title":{"rendered":"Free Questions on AWS Certified Machine Learning Specialty Certification Exam"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">Did you come here looking for the FREE Questions and Answers on the <\/span><a href=\"https:\/\/www.whizlabs.com\/aws-certified-machine-learning-specialty\/\"><strong>AWS Certified Machine Learning Specialty<\/strong><\/a><span style=\"font-weight: 400;\"> certification? Find them here.<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"What_to_expect_in_AWS_Machine_Learning_Certification_Exam\"><\/span>What to expect in the AWS Machine Learning Certification Exam?<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">The AWS Machine Learning Certification<\/span><span style=\"font-weight: 400;\"> attests to your expertise in building, tuning, training, and deploying Machine Learning (ML) models on AWS. It helps organizations identify and develop talent with the critical skills needed to implement cloud initiatives.<\/span><\/p>\n<p>Let&#8217;s Start Exploring!<\/p>\n<p><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Exploratory Data Analysis<\/span><\/p>\n<h4><em><span style=\"font-weight: 400;\">Q1 : You work for a financial services firm that wishes to enhance its fraud detection capabilities further. The firm has implemented fine-grained transaction logging for all transactions their customers make using their credit cards. 
The fraud prevention department would like to use this data to produce dashboards to give them insight into their customers\u2019 transaction activity and provide real-time fraud prediction.<\/span><\/em><\/h4>\n<h4><em><span style=\"font-weight: 400;\">You plan to build a fraud detection model using the transaction observation data with Amazon SageMaker. Each transaction observation has a date-time stamp. In its raw form, the date-time stamp is not very useful in your prediction model since it is unique. Can you make use of the date-time stamp in your fraud prediction model, and if so, how?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> No, you cannot use the date-time stamp since this data point will never occur again. Unique features like this will not help identify patterns in your data.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Yes, you can use the date-time stamp data point. You can just use feature selection to deselect the date-time stamp data point, thus dropping it from the learning process.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Yes, you can use the date-time stamp data point. You can transform the date-time stamp into features for the hour of the day, the day of the week, and the month.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> No, you cannot use the date-time feature since there is no way to transform it into a unique data point.<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\"> since you can use the date-time stamp if you use feature engineering to transform the data point into a useful form.<\/span><br \/>\n<b>Option B is incorrect <\/b><span style=\"font-weight: 400;\">since this option simply ignores, and thus does not use, the date-time stamp data point.<\/span><br \/>\n<b>Option C is correct. 
<\/b><span style=\"font-weight: 400;\">You can transform the data point using feature engineering and thus gain value from it for the learning process of your model. (See the AWS Machine Learning blog post: <\/span><b>Simplify machine learning with XGBoost and Amazon SageMaker:<\/b><a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/simplify-machine-learning-with-xgboost-and-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">https:\/\/aws.amazon.com\/blogs\/machine-learning\/simplify-machine-learning-with-xgboost-and-amazon-sagemaker\/<\/span><\/a><b>)<\/b><br \/>\n<b>Option D is incorrect <\/b><span style=\"font-weight: 400;\">since we can transform the data point into unique features that represent the hour of the day, the day of the week, and the month. These variables could be useful to learn if the fraudulent activity tends to happen at a particular hour, day of the week, or month.<\/span><\/p>\n<p><b>Diagram:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Here is a screenshot from the AWS Machine Learning documentation depicting a typical fraud detection machine learning solution:<img decoding=\"async\" class=\"aligncenter wp-image-81516 size-full\" title=\"AWS Typical fraud detection machine learning solution\" src=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1.png\" alt=\"AWS Typical fraud detection machine learning solution\" width=\"814\" height=\"417\" srcset=\"https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1.png 814w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1-300x154.png 300w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1-768x393.png 768w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1-640x328.png 640w, https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/ml1-681x349.png 681w\" sizes=\"(max-width: 814px) 100vw, 814px\" 
\/><\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon Machine Learning developer documentation: <\/span><a href=\"https:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/feature-processing.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/feature-processing.html<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q2\u00a0 : You work for a real estate company where you are building a machine learning model to predict the prices of houses. You are using a regression decision tree. As you train your model, you see that it is overfitted to your training data, and it doesn\u2019t generalize well to unseen data. 
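The date-time transformation described in Q1 can be sketched with Python's standard library alone (a minimal illustration; the function and field names are made up for the example):

```python
from datetime import datetime

def datetime_features(timestamp: str) -> dict:
    # Split a raw, unique date-time stamp into recurring features the
    # model can learn patterns from: hour of day, day of week, month.
    dt = datetime.fromisoformat(timestamp)
    return {"hour": dt.hour, "day_of_week": dt.weekday(), "month": dt.month}

# A transaction observed on a Wednesday evening in March:
print(datetime_features("2022-03-16T22:20:14"))
# -> {'hour': 22, 'day_of_week': 2, 'month': 3}
```

Each derived column recurs across many transactions, so the model can learn whether fraud clusters at a particular hour, weekday, or month.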
How can you improve your situation and get better training results most efficiently?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Use a random forest by building multiple randomized decision trees and averaging their outputs to get the predictions of the housing prices.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Gather additional training data that gives a more diverse representation of the housing price data.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Use the \u201cdropout\u201d technique to penalize large weights and prevent overfitting.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Use feature selection to eliminate irrelevant features and iteratively train your model until you eliminate the overfitting.<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>A<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is correct<\/b><span style=\"font-weight: 400;\"> because the random forest algorithm is well known to increase the prediction accuracy and prevent overfitting that occurs with a single decision tree. 
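The variance-reduction idea behind the random forest answer can be seen in a toy simulation (a sketch with invented numbers, not an actual SageMaker workflow): treat each overfitted tree as an unbiased but noisy predictor and average many of them.

```python
import random
import statistics

random.seed(42)

def overfit_tree(true_price: float, noise: float = 30000.0) -> float:
    # Stand-in for one deep decision tree: right on average, high variance.
    return true_price + random.gauss(0, noise)

def random_forest(true_price: float, n_trees: int = 50) -> float:
    # The random forest idea: average many de-correlated noisy trees.
    return statistics.mean(overfit_tree(true_price) for _ in range(n_trees))

true_price = 250_000.0
single_preds = [overfit_tree(true_price) for _ in range(200)]
forest_preds = [random_forest(true_price) for _ in range(200)]

# The averaged (forest) predictions cluster far more tightly around the
# true price than any single tree's predictions do.
print(statistics.stdev(forest_preds) < statistics.stdev(single_preds))
# -> True
```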
(See these articles comparing the decision tree and random forest algorithms:<\/span><a href=\"https:\/\/medium.com\/datadriveninvestor\/decision-tree-and-random-forest-e174686dd9eb\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">https:\/\/medium.com\/datadriveninvestor\/decision-tree-and-random-forest-e174686dd9eb<\/span><\/a><span style=\"font-weight: 400;\"> and<\/span><a href=\"https:\/\/towardsdatascience.com\/decision-trees-and-random-forests-df0c3123f991\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">https:\/\/towardsdatascience.com\/decision-trees-and-random-forests-df0c3123f991<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is incorrect <\/b><span style=\"font-weight: 400;\">since gathering additional data will not necessarily improve the overfitting problem, especially if the additional data has the same noise level of the original data.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\"> since while the \u201cdropout\u201d technique improves models that are overfitted, it is a technique used with neural networks, not decision trees.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\"> since it requires significantly more effort than using the random forest algorithm approach.<\/span><\/p>\n<p><strong>Reference<\/strong><b>: <\/b><span style=\"font-weight: 400;\">Please see this overview of the random forest machine learning algorithm: <\/span><a href=\"https:\/\/medium.com\/capital-one-tech\/random-forest-algorithm-for-machine-learning-c4b2c8cc9feb\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/medium.com\/capital-one-tech\/random-forest-algorithm-for-machine-learning-c4b2c8cc9feb<\/span><\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span 
class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q3 : You need to use machine learning to produce real-time analysis of streaming data from IoT devices out in the field. These devices monitor oil well rigs for malfunction. Due to the safety- and security-critical nature of these IoT events, they must be analyzed by your safety engineers in real time. You also have an audit requirement to retain your IoT device events for 7 days since you cannot fail to process any of the events. Which approach would give you the best solution for processing your streaming data?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Use Amazon Kinesis Data Streams and its Kinesis Producer Library to pass your events from your producers to your Kinesis stream.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Use Amazon Kinesis Data Streams and its Kinesis API PutRecords call to pass your events from your producers to your Kinesis stream.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Use Amazon Kinesis Data Streams and its Kinesis Client Library to pass your events from your producers to your Kinesis stream.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Use Amazon Kinesis Data Firehose to pass your events directly to your S3 bucket where you store your machine learning data.<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>B<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect.<\/b><span style=\"font-weight: 400;\"> The Amazon Kinesis Data Streams Producer Library is not meant to be used for real-time processing of event data since, according to the AWS developer documentation, \u201cit can incur an additional processing delay of up to RecordMaxBufferedTime within the library\u201d. Therefore, it is not the best choice for a real-time analytics solution. 
(See the AWS developer documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/developing-producers-with-kpl.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Developing Producers Using the Amazon Kinesis Producer Library<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is correct.<\/b><span style=\"font-weight: 400;\"> The Amazon Kinesis Data Streams API PutRecords call is the best choice for processing in real-time since it sends its data synchronously and does not have the processing delay of the Producer Library. Therefore, it is better suited to real-time applications. (See the AWS developer documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/developing-producers-with-sdk.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Developing Producers Using the Amazon Kinesis Data Streams API with the AWS SDK for Java<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option C is incorrect.<\/b><span style=\"font-weight: 400;\"> The Amazon Kinesis Data Streams Client Library interacts with the Kinesis Producer Library to process its event data. Therefore, you\u2019ll have the same processing delay problem with this option. (See the AWS developer documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/streams\/latest\/dev\/developing-consumers-with-kcl.html#kinesis-record-processor-kcl-role\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Developing Consumers Using the Kinesis Client Library 1.x<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">The Amazon Kinesis Data Firehose service directly streams your event data to your S3 bucket for use in your real-time analytics model. 
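To make the PutRecords approach concrete, here is a sketch of shaping IoT events for the synchronous call (the stream name, event fields, and the commented boto3 line are illustrative assumptions; the Data/PartitionKey record shape and the 500-record-per-request limit come from the Kinesis Data Streams API):

```python
import json

def build_put_records_batch(events, partition_key_field="rig_id"):
    # Each PutRecords entry needs a Data blob and a PartitionKey;
    # a single request accepts at most 500 records.
    return [
        {
            "Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[partition_key_field]),
        }
        for event in events[:500]
    ]

events = [
    {"rig_id": 7, "temp_c": 91.5, "status": "ok"},
    {"rig_id": 9, "temp_c": 140.2, "status": "alarm"},
]
batch = build_put_records_batch(events)

# The actual synchronous call (requires AWS credentials, so not run here):
# boto3.client("kinesis").put_records(StreamName="rig-events", Records=batch)
print(len(batch), batch[0]["PartitionKey"])
```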
However, Amazon Kinesis Data Firehose retries to send your data for a maximum of 24 hours, but you have a 7-day retention requirement. (See the<\/span><a href=\"https:\/\/aws.amazon.com\/kinesis\/data-firehose\/faqs\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Amazon Kinesis Data Firehose FAQs<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p><strong>Reference:<\/strong> <span style=\"font-weight: 400;\">Please see the<\/span><a href=\"https:\/\/aws.amazon.com\/kinesis\/data-streams\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Amazon Kinesis Data Streams documentation<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-2\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q4 : You work as a machine learning specialist at a marketing company. Your team has gathered market data about your users into an S3 bucket. You have been tasked with writing an AWS Glue job to convert the files from JSON to a format that will be used to store Hive data. Which data format is the most efficient to convert the data for use with Hive?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Ion<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> grokLog<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Xml<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Orc<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. Currently, AWS Glue does not support ion for output. 
(See the AWS developer guide documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-etl-format.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Format Options for ETL Inputs and Outputs in AWS Glue<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> Currently, AWS Glue does not support grokLog for output. (See the AWS developer guide documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-etl-format.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Format Options for ETL Inputs and Outputs in AWS Glue<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. Currently, AWS Glue does not support xml for output. (See the AWS developer guide documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-etl-format.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Format Options for ETL Inputs and Outputs in AWS Glue<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. From the Apache Hive Language Manual: \u201cThe <\/span><i><span style=\"font-weight: 400;\">Optimized Row Columnar<\/span><\/i><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/orc.apache.org\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">ORC<\/span><\/a><span style=\"font-weight: 400;\">) file format provides a highly efficient way to store Hive data. It was designed to overcome the limitations of the other Hive file formats. Using ORC files improves performance when Hive is reading, writing, and processing data.\u201d Also, AWS Glue supports orc for output. 
(See the<\/span><a href=\"https:\/\/cwiki.apache.org\/confluence\/display\/Hive\/LanguageManual+ORC\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Apache Hive Language Manual<\/span><\/a><span style=\"font-weight: 400;\"> and the AWS developer guide documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-etl-format.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Format Options for ETL Inputs and Outputs in AWS Glue<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the AWS developer guide documentation titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/aws-glue-programming-general.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">General Information about Programming AWS Glue ETL Scripts<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Exploratory_Data_Analysis\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Exploratory Data Analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q5 : You work as a machine learning specialist for a consulting firm where you analyze data about the consultants who work there in preparation for using the data in your machine learning models. The features you have in your data are things like employee id, specialty, practice, job description, billing hours, and principal. The principal attribute is represented as \u2018yes\u2019 or \u2018no\u2019, indicating whether the consultant has made principal level or not. For your initial analysis, you need to identify the distribution of consultants and their billing hours for the given period. 
What visualization best describes this relationship?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Scatter plot<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Histogram<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Line chart<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Box plot<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> Bubble chart<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. You are looking for a distribution of a single dimension: the consultants\u2019 billing hours. From the Amazon QuickSight User Guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/quicksight\/latest\/user\/working-with-visual-types.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Working with Visual Types in Amazon QuickSight<\/span><\/a><span style=\"font-weight: 400;\">, \u201cA scatter chart shows multiple distributions, i.e., two or three measures for a dimension.\u201d<\/span><br \/>\n<b>Option B is correct<\/b><span style=\"font-weight: 400;\">. You are looking for a distribution of a single dimension: the consultants\u2019 billing hours. From the<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Histogram\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Wikipedia article titled Histogram<\/span><\/a><span style=\"font-weight: 400;\">, \u201cA histogram is an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable.\u201d The continuous variable in this question is the billing hours, binned into ranges (x-axis) and plotted against the frequency: the number of consultants in each billing-hour range (y-axis).<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. 
From the Amazon QuickSight User Guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/quicksight\/latest\/user\/working-with-visual-types.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Working with Visual Types in Amazon QuickSight<\/span><\/a><span style=\"font-weight: 400;\">, \u201cUse line charts to compare changes in measured values over a period of time.\u201d You are looking for a distribution, not a comparison of changes over a period of time.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. From the Statistics How To article titled<\/span><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/types-graphs\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Types of Graphs Used in Math and Statistics<\/span><\/a><span style=\"font-weight: 400;\">, \u201cA boxplot, also called a box and whisker plot, is a way to show the spread and centers of a data set. Measures of spread include the interquartile range and the mean of the data set. Measures of the center include the mean or average and median (the middle of a data set).\u201d A box plot shows the distribution of multiple dimensions of the data. Once again, you are looking for a distribution of a single dimension, not a distribution on multiple dimensions.<\/span><br \/>\n<b>Option E is incorrect<\/b><span style=\"font-weight: 400;\">. From the<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Bubble_chart\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Wikipedia article titled Bubble Chart<\/span><\/a><span style=\"font-weight: 400;\">, \u201cA bubble chart is a type of chart that displays three dimensions of data. 
Each entity with its triplet (<\/span><i><span style=\"font-weight: 400;\">v<\/span><\/i><span style=\"font-weight: 400;\">1, <\/span><i><span style=\"font-weight: 400;\">v<\/span><\/i><span style=\"font-weight: 400;\">2, <\/span><i><span style=\"font-weight: 400;\">v<\/span><\/i><span style=\"font-weight: 400;\">3) of associated data is plotted as a disk that expresses two of the <\/span><i><span style=\"font-weight: 400;\">vi<\/span><\/i><span style=\"font-weight: 400;\"> values through the disk&#8217;s <\/span><i><span style=\"font-weight: 400;\">xy<\/span><\/i><span style=\"font-weight: 400;\"> location and the third through its size.\u201d Once again, you are looking for a distribution of a single dimension, not a distribution on three dimensions.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon QuickSight user guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/quicksight\/latest\/user\/working-with-visuals.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Working with Amazon QuickSight Visuals<\/span><\/a><span style=\"font-weight: 400;\"> and the Statistics How To article titled<\/span><a href=\"https:\/\/www.statisticshowto.datasciencecentral.com\/types-graphs\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Types of Graphs Used in Math and Statistics<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-2\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q6 : You work as a machine learning specialist for a robotics manufacturer where you are attempting to use unsupervised learning to train your robots to perform their prescribed tasks. 
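The histogram answer in Q5 needs no plotting library at all; binning one continuous variable and counting frequencies is all a histogram does. The billing-hour values below are invented for illustration:

```python
# Invented billing hours for twelve consultants.
billing_hours = [32, 45, 38, 40, 41, 52, 39, 44, 47, 36, 60, 41]

bin_width = 10
counts = {}
for hours in billing_hours:
    lower = (hours // bin_width) * bin_width  # bin start, e.g. 45 -> 40
    counts[lower] = counts.get(lower, 0) + 1

# One row per bin: the x-axis ranges and the y-axis frequencies.
for lower in sorted(counts):
    print(f"{lower}-{lower + bin_width - 1} hrs | {'#' * counts[lower]}")
```

Printed sideways, the `#` columns are exactly the bars a histogram visual would draw from this single dimension.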
You have engineered your data and produced a CSV file and placed it on S3.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following input data channel specifications are correct for your data?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Metadata Content-Type is identified as text\/csv<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Metadata Content-Type is identified as application\/x-recordio-protobuf;boundary=1<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Metadata Content-Type is identified as application\/x-recordio-protobuf;label_size=1<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Metadata Content-Type is identified as text\/csv;label_size=0<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The Content-Type of text\/csv without specifying a label_size is used when you have target data, usually in column one, since the default value for label_size is 1, meaning you have one target column. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/cdf-training.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Training<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. The boundary content type is not relevant to CSV files. It is used for multipart form data.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. For unsupervised learning, the label_size should equal 0, indicating the absence of a target. 
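To illustrate, here is a minimal sketch (the S3 URI and channel name are hypothetical; only the ContentType value is the point) of how the content type would appear in the InputDataConfig channel of a CreateTrainingJob request:

```python
# Sketch of a CreateTrainingJob input channel for a CSV dataset.
# The S3 URI and channel name are hypothetical examples; the key detail
# is the ContentType: "text/csv;label_size=0" declares no target column
# (unsupervised), while plain "text/csv" defaults label_size to 1.
def build_csv_channel(s3_uri, unsupervised=True):
    """Return an InputDataConfig channel dict for CSV training data."""
    content_type = "text/csv;label_size=0" if unsupervised else "text/csv"
    return {
        "ChannelName": "train",
        "ContentType": content_type,
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_uri,
                "S3DataDistributionType": "FullyReplicated",
            }
        },
    }

channel = build_csv_channel("s3://example-bucket/robot-data/train.csv")
print(channel["ContentType"])  # -> text/csv;label_size=0
```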
(See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/cdf-training.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Training<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. For unsupervised learning, the label_size equals 0, indicating the absence of a target. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/cdf-training.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Training<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide, specifically<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sagemaker-algo-common-data-formats.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Built-in Algorithms<\/span><\/a><span style=\"font-weight: 400;\"> and<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/cdf-training.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Training<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-3\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em>Q7 : You work as a machine learning specialist for a marketing firm. Your firm wishes to determine which customers in a dataset of its registered users will respond to a new proposed marketing campaign. You plan to use the XGBoost algorithm on the binary classification problem. 
In order to find the optimal model, you plan to run many hyperparameter tuning jobs to reach the best hyperparameter values. Which of the following hyperparameters must you use in your tuning jobs if your objective is set to multi:softprob?<\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Alpha<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Base_score<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Eta<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Num_round<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> Gamma<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>F.<\/strong> Num_class<\/span><\/p>\n<p><b>Correct Answers: D and F<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The alpha hyperparameter is used to adjust the L1 regulation term on weights. This term is optional. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. The base_score hyperparameter is used to set the initial prediction score of all instances. This term is optional. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. The eta hyperparameter is used to prevent overfitting. This term is optional. 
(See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. The num_round hyperparameter sets the number of boosting rounds to run during training. This term is required. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option E is incorrect<\/b><span style=\"font-weight: 400;\">. The gamma hyperparameter is used to set the minimum loss reduction required to make a further partition on a leaf node of the tree. This term is optional. (See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option F is correct<\/b><span style=\"font-weight: 400;\">. The num_class hyperparameter sets the number of classes. This term is required if the objective is set to multi:softmax or multi:softprob.
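As a rough sketch of the rule (the helper function and example values are hypothetical, not part of the SageMaker SDK): num_round must always be present, and num_class must also be present whenever the objective is multi:softmax or multi:softprob:

```python
# Sketch: check a SageMaker XGBoost hyperparameter dict for the required
# terms. num_round is always required; num_class is required only when the
# objective is one of the multi-class objectives.
def missing_required(hyperparameters):
    """Return the list of required XGBoost hyperparameters that are absent."""
    missing = []
    if "num_round" not in hyperparameters:
        missing.append("num_round")
    multiclass = ("multi:softmax", "multi:softprob")
    if hyperparameters.get("objective") in multiclass:
        if "num_class" not in hyperparameters:
            missing.append("num_class")
    return missing

params = {"objective": "multi:softprob", "num_round": "100"}
print(missing_required(params))  # -> ['num_class']
```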
(See the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/xgboost_hyperparameters.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">XGBoost Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/automatic-model-tuning.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Automatic Model Tuning<\/span><\/a><span style=\"font-weight: 400;\"> and the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/automatic-model-tuning-how-it-works.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">How Hyperparameter Tuning Works<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-4\"><\/span>Domain : Modeling<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q8 : You work as a machine learning specialist for a financial services company. You are building a machine learning model to perform futures price prediction. 
You have trained your model, and you now want to evaluate it to make sure it has not overfit and can generalize.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following techniques is the appropriate method to cross-validate your machine learning model?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Leave One Out Cross Validation (LOOCV)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> K-Fold Cross Validation<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Stratified Cross Validation<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Time Series Cross Validation<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect.<\/b><span style=\"font-weight: 400;\"> Since we are trying to validate a time series set of data, we need to use a method that uses a rolling origin with day n as training data and day n+1 as test data. The LOOCV approach doesn\u2019t give us this option. (See the article<\/span><a href=\"https:\/\/medium.com\/datadriveninvestor\/k-fold-and-other-cross-validation-techniques-6c03a2563f1e\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">K-Fold and Other Cross-Validation Techniques<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. The K-Fold cross validation technique randomizes the test dataset. We cannot randomize our test dataset since we are trying to validate a time series set of data. Randomized time series data loses its time-related value.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. We are trying to cross-validate time series data. We cannot randomize the test data because it will lose its time-related value.<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">.
The Time Series Cross Validation technique is the correct choice for cross-validating a time series dataset. Time series cross validation uses forward chaining, where the origin of the forecast moves forward in time.\u00a0 Day n is training data and day n+1 is test data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon Machine Learning developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/cross-validation.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Cross Validation<\/span><\/a><span style=\"font-weight: 400;\">, and the article<\/span><a href=\"https:\/\/medium.com\/datadriveninvestor\/k-fold-and-other-cross-validation-techniques-6c03a2563f1e\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">K-Fold and Other Cross-Validation Techniques<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-5\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q9 : You work as a machine learning specialist for a state highway administration department. Your department is trying to use machine learning to help determine the make and model of cars as they pass a camera on the state highways. 
You need to build a machine learning model to solve this problem.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which modeling approach best fits your problem?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Multi-Class Classification<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Simulation-based Reinforcement Learning<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Binary Classification<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Heuristic Approach<\/span><\/p>\n<p><b>Correct Answer: A<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is correct<\/b><span style=\"font-weight: 400;\">. Multi-Class Classification is used when your model needs to choose from a finite set of outcomes, such as this car make and model classification image recognition problem.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. Simulation-Based Reinforcement Learning is used in problems where your model needs to learn through trial and error. An image recognition problem with a finite set of outcomes is better suited to a multi-class classification model.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. Binary Classification is the approach you use when you are trying to predict a binary outcome. This car make and model classification problem would not fit a binary classification model since you have a finite set of more than two outcomes from which to choose.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. The Heuristic Approach is used when a machine learning approach is not necessary. An example is the rate of acceleration of a particle through space.
There are well-known formulas for speed, inertia, and friction that can solve a problem such as this.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/linear-learner.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Linear Learner Algorithm<\/span><\/a><span style=\"font-weight: 400;\">, the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/reinforcement-learning.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Reinforcement Learning with Amazon SageMaker RL<\/span><\/a><span style=\"font-weight: 400;\">,\u00a0 the Amazon Machine Learning developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/multiclass-classification.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Multiclass Classification<\/span><\/a><span style=\"font-weight: 400;\">, and the article titled<\/span><a href=\"https:\/\/www.quora.com\/What-is-the-difference-between-a-machine-learning-algorithm-and-a-heuristic-and-when-to-use-each\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">What is the difference between a machine learning algorithm and a heuristic, and when to use each?<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-6\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q10 : You work as a machine learning specialist for the highway toll collection division of the regional state area. The toll collection division uses cameras to identify car license plates as the cars pass through the various toll gates on the state highways.
You are on the team that is using SageMaker Image Classification machine learning to read and classify license plates by state and then identify the actual license plate number.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Very rarely, cars pass through the toll gates with plates from foreign countries, for example, Great Britain or Mexico. The outliers must not adversely affect your model\u2019s predictions.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which hyperparameter should you set, and to what value, to ensure these outliers do not adversely impact your model?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> feature_dim set to 5<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> feature_dim set to 1<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> sample_size set to 10<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> sample_size set to 100<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> learning_rate set to 0.1<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>F.<\/strong> learning_rate set to 0.75<\/span><\/p>\n<p><b>Correct Answer: E<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The feature_dim hyperparameter is a setting on the K-Means and K-Nearest Neighbors algorithms, not the Image Classification algorithm.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. The feature_dim hyperparameter is a setting on the K-Means and K-Nearest Neighbors algorithms, not the Image Classification algorithm.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. The sample_size hyperparameter is a setting on the K-Nearest Neighbors algorithm, not the Image Classification algorithm.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">.
The sample_size hyperparameter is a setting on the K-Nearest Neighbors algorithm, not the Image Classification algorithm.<\/span><br \/>\n<b>Option E is correct<\/b><span style=\"font-weight: 400;\">. The learning_rate hyperparameter governs how quickly the model adapts to new or changing data. Valid values range from 0.0 to 1.0. Setting this hyperparameter to a low value, such as 0.1, will make the model learn more slowly and be less sensitive to outliers. This is what you want. You want your model not to be adversely impacted by outlier data.<\/span><br \/>\n<b>Option F is incorrect<\/b><span style=\"font-weight: 400;\">. The learning_rate hyperparameter governs how quickly the model adapts to new or changing data. Valid values range from 0.0 to 1.0. Setting this hyperparameter to a high value, such as 0.75, will make the model learn more quickly but be sensitive to outliers. This is not what you want. You want your model not to be adversely impacted by outlier data.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/IC-Hyperparameter.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Image Classification Hyperparameters<\/span><\/a><span style=\"font-weight: 400;\">, and the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/algos.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Use Amazon SageMaker Built-in Algorithms<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-3\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q11 : You work as a machine 
learning specialist at a hedge fund firm. Your firm is working on a new quant algorithm to predict when to enter and exit holdings in their portfolio. You are building a machine learning model to predict these entry and exit points in time. You have cleaned your data, and you are now ready to split the data into training and test datasets.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which splitting technique is best suited to your model\u2019s requirements?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Use k-fold cross validation to split the data.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Sequentially splitting the data<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Randomly splitting the data<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Categorically splitting the data by holding<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. Using k-fold cross validation will randomly split your data. But you need to consider the time-series nature of your data when splitting. So randomizing the data would eliminate the time element of your observations, making the datasets unusable for predicting price changes over time.<\/span><br \/>\n<b>Option B is correct<\/b><span style=\"font-weight: 400;\">. By sequentially splitting the data, you preserve the time element of your observations.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. Randomly splitting the data would eliminate the time element of your observations, making the datasets unusable for predicting price changes over time.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. 
If you split the data by a category such as the holding attribute, you would create imbalanced training and test datasets since some holdings would only be in the training dataset and others would only be in the test dataset.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon Machine Learning developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/machine-learning\/latest\/dg\/splitting-types.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Splitting Your Data<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Exploratory_Data_Analysis-2\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Exploratory Data Analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q12 : You work for a major banking firm as a machine learning specialist. As part of the bank\u2019s fraud detection team, you build a machine learning model to detect fraudulent transactions. Using your training dataset, you have produced a Receiver Operating Characteristic (ROC) curve, and it shows 99.99% accuracy.\u00a0 Your transaction dataset is very large, but 99.99% of the observations in your dataset represent non-fraudulent transactions. Therefore, the fraudulent observations are a minority class. Your dataset is very imbalanced.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">You have the approval from your management team to produce the most accurate model possible, even if it means spending more time perfecting the model.
What is the most effective technique to address the imbalance in your dataset?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Synthetic Minority Oversampling Technique (SMOTE) oversampling<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Random oversampling<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Generative Adversarial Networks (GANs) oversampling<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Edited Nearest Neighbor undersampling<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The SMOTE technique creates new observations of the underrepresented class, in this case, the fraudulent observations. These synthetic observations are almost identical to the original fraudulent observations. This technique is expeditious, but the types of synthetic observations it produces are not as useful as the unique observations created by other oversampling techniques.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. Random oversampling uses copies of some of the minority class observations (randomly selected) to augment the minority class observation set. These observations are exact replicas of existing minority class observations, making them less effective than observations created by other techniques that produce unique synthetic observations.<\/span><br \/>\n<b>Option C is correct<\/b><span style=\"font-weight: 400;\">. The Generative Adversarial Networks (GANs) technique generates unique observations that more closely resemble the real minority observations without being so similar that they are almost identical. 
This results in more unique observations of your minority class that improve your model\u2019s accuracy by helping to correct the imbalance in your data.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. Using an undersampling technique would remove potentially useful majority class observations. Additionally, you would have to remove so many of your majority class observations to correct the imbalance that you would render your entire training dataset useless.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Wikipedia article titled<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Oversampling_and_undersampling_in_data_analysis\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Oversampling and undersampling in data analysis<\/span><\/a><span style=\"font-weight: 400;\">, and the article titled<\/span><a href=\"https:\/\/medium.com\/@hazy_ai\/imbalanced-data-and-credit-card-fraud-ad1c1ed011ea\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Imbalanced data and credit card fraud<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Exploratory_Data_Analysis-3\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Exploratory Data Analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q13 : You work for the security department of your firm.
As part of securing your firm\u2019s email activity from phishing attacks, you need to build a machine learning model that analyzes incoming email text to find word phrases like \u201cyou\u2019re a winner\u201d or \u201cclick here now\u201d to find potential phishing emails.\u00a0<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the following text feature engineering techniques is the best solution for this task?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Orthogonal Sparse Bigram (OSB)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Term Frequency-Inverse Document Frequency (tf-idf)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Bag-of-Words<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> N-Gram<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The Orthogonal Sparse Bigram natural language processing algorithm creates groups of words and outputs the pairs of words that include the first word. You are trying to classify an email as a phishing attack by having your model learn based on the presence of multi-word phrases in the email text, not pairs of words from the email text stream using the first word as the key.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. Term Frequency-Inverse Document Frequency determines how important a word is in a document by giving weights to words that are common and less common in the document. You are trying to classify an email as a phishing attack by having your model learn based on the presence of multi-word phrases in the email text. You are not trying to determine the importance of a word or phrase in the email text.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. 
The Bag-of-Words natural language processing algorithm creates tokens of the input document text and outputs a statistical depiction of the text. The statistical depiction, such as a histogram, shows the count of each word in the document. You are trying to classify an email as a phishing attack by having your model learn based on the presence of multi-word phrases in the email text, not individual words.<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. The N-Gram natural language processing algorithm is used to find multi-word phrases in the text, in this case, an email. This suits your phishing detection task since you are trying to classify an email as a phishing attack by having your model learn based on the presence of multi-word phrases.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the article titled<\/span><a href=\"https:\/\/towardsdatascience.com\/introduction-to-natural-language-processing-for-text-df845750fb63\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Introduction to Natural Language Processing for Text<\/span><\/a><span style=\"font-weight: 400;\">, and the article titled<\/span><a href=\"https:\/\/medium.com\/machine-learning-intuition\/document-classification-part-2-text-processing-eaa26d16c719\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Document Classification Part 2: Text Processing (N-Gram Model &amp; TF-IDF Model)<\/span><\/a><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-4\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q14 : You work as a machine learning specialist for a media sharing service. Healthcare professionals will use the media sharing service to share images of x-rays, MRIs, and other medical imagery. 
The accuracy of labelling these images is of primary importance, since the labelling will be used in auto diagnostic software. As your team builds the data repository to be used by your machine learning algorithms, you need to use human manual labellers. You have decided to use Amazon Ground Truth for this purpose. Since accuracy is of prime importance, you have decided to use the annotation consolidation feature of Ground Truth to ensure proper labelling of the medical images.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which of the Ground Truth annotation consolidation functions should you use to ensure the accuracy of your labelling tasks?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Bounding box<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Semantic segmentation<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Named entity<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Output manifest<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> Mechanical turk<\/span><\/p>\n<p><b>Correct Answers: A and B<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is correct<\/b><span style=\"font-weight: 400;\">. The bounding box function finds the most similar bounding boxes from workers and averages them, thus using the power of multiple workers to annotate your images more accurately.<\/span><br \/>\n<b>Option B is correct<\/b><span style=\"font-weight: 400;\">. The semantic segmentation feature fuses the pixel annotations of multiple workers and applies a smoothing function to the image, thus using the power of multiple workers to annotate your images more accurately.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. The named entity feature is used with text annotation work, not image annotation.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">.
The Ground Truth output manifest allows the output of a labelling job to be used as the input to a machine learning model. This feature will not help ensure the accuracy of worker annotations.<\/span><br \/>\n<b>Option E is incorrect<\/b><span style=\"font-weight: 400;\">. The Ground Truth Mechanical Turk feature gives you access to a large pool of labelling workers. While increasing the number of workers at your disposal, this feature will not help ensure the accuracy of worker annotations.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/sms-annotation-consolidation.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Annotation Consolidation<\/span><\/a><span style=\"font-weight: 400;\">, and the Amazon Machine Learning blog titled<\/span><a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/use-the-wisdom-of-crowds-with-amazon-sagemaker-ground-truth-to-annotate-data-more-accurately\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Use the wisdom of crowds with Amazon SageMaker Ground Truth to annotate data more accurately<\/span><\/a><span style=\"font-weight: 400;\">, and GitHub repository titled<\/span><a href=\"https:\/\/github.com\/awslabs\/amazon-sagemaker-examples\/tree\/master\/ground_truth_labeling_jobs\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Amazon Sagemaker Examples Introduction to Ground Truth Labeling Jobs<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Exploratory_Data_Analysis-4\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Exploratory Data Analysis<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q15 : You work for a city government 
in their shared bike program as a machine learning specialist. You need to visualize the bike share location predictions you produce on an hourly basis using the inference endpoint you created with the SageMaker built-in K-Means algorithm. Your inference endpoint takes IoT data from your shared bikes as they are used throughout the city. You also want to enrich your shared bike data with external data sources such as current weather and road conditions.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which set of Amazon services would you use to create your visualization with the least amount of effort?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> IoT Core -&gt; IoT Analytics -&gt; SageMaker -&gt; QuickSight<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> IoT Core -&gt; Kinesis Firehose -&gt; SageMaker -&gt; QuickSight<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> IoT Core -&gt; Lambda -&gt; SageMaker -&gt; QuickSight<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> IoT Core -&gt; IoT Greengrass -&gt; QuickSight<\/span><\/p>\n<p><b>Correct Answer: A<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is correct<\/b><span style=\"font-weight: 400;\">. IoT Core collects data from each shared bike; IoT Analytics retrieves the messages as the bikes stream data, enriches the streaming data with your external data sources, and sends it to your K-Means machine learning inference endpoint; QuickSight is then used to create your visualization. This approach requires the least amount of effort, mainly because of the data enrichment feature of IoT Analytics.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. 
With this option, you would have to create a lambda function to gather the data enrichment information (weather, road conditions) and enrich the data streams in your own code.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. Also, with this option, you would have to add code to your lambda function to gather the data enrichment information (weather, road conditions) and enrich the data streams in your own code.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. IoT Greengrass is a service that you use to run local machine learning inference capabilities on connected devices. This approach would not easily integrate with your QuickSight visualization.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the<\/span><a href=\"https:\/\/aws.amazon.com\/iot-analytics\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">AWS IoT Analytics overview<\/span><\/a><span style=\"font-weight: 400;\">, the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/k-means.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">K-Means Algorithm<\/span><\/a><span style=\"font-weight: 400;\">, the AWS Big Data blog titled<\/span><a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/build-a-visualization-and-monitoring-dashboard-for-iot-data-with-amazon-kinesis-analytics-and-amazon-quicksight\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Build a Visualization and Monitoring Dashboard for IoT Data with Amazon Kinesis Analytics and Amazon QuickSight<\/span><\/a><span style=\"font-weight: 400;\">, the AWS IoT Analytics User Guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/iotanalytics\/latest\/userguide\/welcome.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">What IS AWS IoT Analytics?<\/span><\/a><span 
style=\"font-weight: 400;\">, and the<\/span><a href=\"https:\/\/aws.amazon.com\/greengrass\/faqs\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">AWS IoT Greengrass FAQs<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-5\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q16 : You work for a logistics company that specializes in the storage, movement, and control of massive amounts of packages. You are on the machine learning team assigned the task of building a machine learning model to assist in the control of your company\u2019s package logistics. Specifically, your model needs to predict the routes your package movers should take for optimal delivery and resource usage. The model requires various transformations to be performed on the data. You also want to get inferences on entire datasets once you have your model in production. Additionally, you won\u2019t need a persistent endpoint for applications to call to get inferences.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">Which type of production deployment would you use to get predictions from your model in the most expeditious manner?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> SageMaker Hosting Services<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> SageMaker Batch Transform<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> SageMaker Containers<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> SageMaker Elastic Inference<\/span><\/p>\n<p><b>Correct Answer: B<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. 
SageMaker Hosting Services is used for applications to send requests to an HTTPS endpoint to get inferences. This type of deployment is used when you need a persistent endpoint for applications to call to get inferences.<\/span><br \/>\n<b>Option B is correct<\/b><span style=\"font-weight: 400;\">. SageMaker Batch Transform is used to get inferences for an entire dataset, and you don\u2019t need a persistent endpoint for applications to call to get inferences.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. SageMaker Containers is a service you can use to create your own Docker containers to deploy your models. This would not be the most expeditious option.<\/span><br \/>\n<b>Option D is incorrect<\/b><span style=\"font-weight: 400;\">. SageMaker Elastic Inference is used to accelerate deep learning inference workloads. This service alone would not give you the batch transform capabilities you need.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/how-it-works-hosting.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Deploy a Model on Amazon SageMaker Hosting Services<\/span><\/a><span style=\"font-weight: 400;\">, the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/how-it-works-batch.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Get Inferences for an Entire Dataset with Batch Transform<\/span><\/a><span style=\"font-weight: 400;\">, the Amazon Elastic Inference developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/elastic-inference\/latest\/developerguide\/what-is-ei.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">What Is Amazon Elastic Inference?<\/span><\/a><span style=\"font-weight: 400;\">, 
and the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/amazon-sagemaker-containers.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Amazon SageMaker Containers: a Library to Create Docker Containers<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-7\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q17 : You work as a machine learning specialist for an auto manufacturer that produces several car models in several product lines. Example models include an LX model, an EX model, a Sport model, etc. These models have many similarities. But of course, they also have defining differences. Each model has its own parts list entries in your company\u2019s parts database. When ordering commodity parts for these car models from auto parts manufacturers, you want to produce the most efficient orders for each parts manufacturer by combining orders for similar parts lists. This will save your company money. You have decided to use the AWS Glue FindMatches Machine Learning Transform to find your matching parts lists.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">You have created your data source file as a CSV, and you have also created the labeling file used to train your FindMatches transform. When you run your AWS Glue transform job, it fails. 
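For orientation on the labeling file just described, here is a hedged sketch in Python of what a minimal FindMatches labeling file looks like. The labeling_set_id and label columns are required by AWS Glue; the part columns and values are hypothetical:

```python
import csv
import io

# Toy FindMatches labeling file. AWS Glue requires the first two columns
# to be labeling_set_id and label, and the file to be UTF-8 encoded with
# no byte order mark; the remaining columns (part_number, description)
# are hypothetical and must match the schema of the source data.
labeling_rows = [
    ["labeling_set_id", "label", "part_number", "description"],
    ["set-1", "A", "BRK-100", "brake pad lx"],
    ["set-1", "A", "BRK-100X", "brake pad ex"],
    ["set-1", "B", "FLT-220", "oil filter sport"],
]

buf = io.StringIO()
csv.writer(buf).writerows(labeling_rows)
payload = buf.getvalue().encode("utf-8")        # "utf-8", not "utf-8-sig"

assert not payload.startswith(b"\xef\xbb\xbf")  # no byte order mark
print(payload.decode("utf-8").splitlines()[0])
```

Rows sharing a labeling_set_id and label are treated as matches; using `"utf-8-sig"` instead would prepend the BOM that makes the job fail.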
Which of the following could be the root of the problem?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> The labeling file is in the CSV format.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> The labeling file has labeling_set_id and label as its first two columns with the remaining columns matching the schema of the parts list data to be processed.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Records in the labeling file that don\u2019t have any matches have unique labels.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> The labeling file is not encoded in UTF-8 without BOM (byte order mark).<\/span><\/p>\n<p><b>Correct Answer: D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. When using the AWS Glue FindMatches ML Transform, the labeling file must be in CSV format.<\/span><br \/>\n<b>Option B is incorrect<\/b><span style=\"font-weight: 400;\">. When using the AWS Glue FindMatches ML Transform, the first two columns of the labeling file are required to be labeling_set_id and label. Also, the remaining columns must match the schema of the data to be processed.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. When using the AWS Glue FindMatches ML Transform, if a record doesn\u2019t have a match, it is assigned a unique label.<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. 
When using the AWS Glue FindMatches ML Transform, the labeling file must be encoded as UTF-8 without BOM.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the AWS Glue developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/glue\/latest\/dg\/machine-learning.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Machine Learning Transforms in AWS Glue<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-8\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q18 : You work for an Internet of Things (IoT) component manufacturer which builds servos, engines, sensors, etc. The IoT devices transmit usage and environment information back to AWS IoT Core via the MQTT protocol. You want to use a machine learning model to show how\/where the use of your products is clustered in various regions around the world. This information will help your data scientists build KPI dashboards to improve your component engineering quality and performance. You have created, trained, and deployed to Amazon SageMaker Hosting Services your model based on the XGBoost algorithm. Your model is set up to receive inference requests from a lambda function that is triggered by the receipt of an IoT Core MQTT message via your Kinesis Data Streams instance.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">What transform steps need to be done for each inference request? 
Also, which steps are handled by your code versus by the inference algorithm?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Inference request serialization (handled by the algorithm)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Inference request serialization (handled by your lambda code)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Inference request deserialization (handled by your lambda code)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Inference request deserialization (handled by the algorithm)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> Inference request post serialization (handled by the algorithm)<\/span><\/p>\n<p><b>Correct Answers: B and D<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect<\/b><span style=\"font-weight: 400;\">. The inference request serialization must be completed by your lambda code. The algorithm needs to receive the inference request in serialized form.<\/span><br \/>\n<b>Option B is correct<\/b><span style=\"font-weight: 400;\">. The inference request serialization must be completed by your lambda code.<\/span><br \/>\n<b>Option C is incorrect<\/b><span style=\"font-weight: 400;\">. The inference request is deserialized by the algorithm when it receives the request, not by your lambda code. Your lambda code is responsible for serializing the inference request.<\/span><br \/>\n<b>Option D is correct<\/b><span style=\"font-weight: 400;\">. The inference request is deserialized by the algorithm when it receives the request.<\/span><br \/>\n<b>Option E is incorrect<\/b><span style=\"font-weight: 400;\">. 
There is no inference request post serialization step in the SageMaker inference request\/response process.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>Reference:<\/strong> Please see the Amazon SageMaker developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/cdf-inference.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Common Data Formats for Inference<\/span><\/a><span style=\"font-weight: 400;\">, the<\/span><a href=\"https:\/\/aws.amazon.com\/iot-core\/\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">AWS IoT Core overview page<\/span><\/a><span style=\"font-weight: 400;\">, the AWS IoT developer guide titled<\/span><a href=\"https:\/\/docs.aws.amazon.com\/iot\/latest\/developerguide\/iot-lambda-rule.html\" target=\"_blank\" rel=\"noopener\"> <span style=\"font-weight: 400;\">Creating an AWS Lambda Rule<\/span><\/a><span style=\"font-weight: 400;\">.<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-9\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q19 : You are a Machine Learning Specialist on a team that is designing a system to help improve sales for your auto parts division. You have clickstream data gathered from your user&#8217;s activity on your product website. 
Your team has been tasked with using the large amount of clickstream information depicting user behavior and product preferences to build a recommendation engine similar to the Amazon.com feature that recommends products through the tagline of \u201cusers who bought this item also considered these items.\u201d Similarly, your team\u2019s task is to predict which products a given user may like based on the similarity between the given user and other users.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">How should you and your team architect this solution?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Create a recommendation engine based on a neural combinative filtering model using TensorFlow and run it on SageMaker.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Create a recommendation engine based on model-based filtering using TensorFlow and run it on SageMaker.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Create a recommendation engine based on a neural collaborative filtering model using TensorFlow and run it on SageMaker.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Create a recommendation engine based on content-based filtering using TensorFlow and run it on SageMaker.<\/span><\/p>\n<p><b>Correct<\/b> <b>Answer:<\/b> <b>C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect.<\/b><span style=\"font-weight: 400;\"> There is no neural combinative filtering method used in recommendation engine models.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> The term model-based filtering is too generic. We are using a model to make our recommendations, but which type of model should we use?<\/span><br \/>\n<b>Option C is correct. <\/b><span style=\"font-weight: 400;\">The famous Amazon.com recommendation engine is built using a neural collaborative filtering method. 
This method is optimized to find similarities in environments where you have large amounts of user actions that you can analyze.<\/span><br \/>\n<b>Option D is incorrect.<\/b><span style=\"font-weight: 400;\"> Content-based filtering relies on similarities between features of items, whereas collaborative-based filtering relies on preferences from other users and how they respond to similar items.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the article titled <\/span><b>BUILDING A RECOMMENDATION ENGINE WITH SPARK ML ON AMAZON EMR USING ZEPPELIN<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/noise.getoto.net\/2015\/11\/14\/building-a-recommendation-engine-with-spark-ml-on-amazon-emr-using-zeppelin-2\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/noise.getoto.net\/2015\/11\/14\/building-a-recommendation-engine-with-spark-ml-on-amazon-emr-using-zeppelin-2\/<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Wikipedia article titled<\/span><b> Collaborative filtering <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Collaborative_filtering\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/en.wikipedia.org\/wiki\/Collaborative_filtering<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The AWS Machine Learning blog titled <\/span><b>Building a customized recommender system in Amazon SageMaker <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/aws.amazon.com\/blogs\/machine-learning\/building-a-customized-recommender-system-in-amazon-sagemaker\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/aws.amazon.com\/blogs\/machine-learning\/building-a-customized-recommender-system-in-amazon-sagemaker\/<\/span><\/a><span style=\"font-weight: 
400;\">)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-10\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q20 : You are part of a machine learning team in a financial services company that builds a model that will perform time series forecasting of stock price movement using SageMaker. You and your team have finished training the model, and you are now ready to performance test your endpoint to get the best parameters for configuring auto-scaling for your model variant. How can you most efficiently review the latency, memory utilization, and CPU utilization during the load test?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Stream the SageMaker model variant CloudWatch logs to ElasticSearch. Then visualize and query the log data in a Kibana dashboard.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Create custom CloudWatch logs containing the metrics you wish to monitor, then stream the SageMaker model variant logs to ElasticSearch and visualize\/query the log data in a Kibana dashboard.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Create a CloudWatch dashboard to show a view of the latency, memory utilization, and CPU utilization metrics of the SageMaker model variant.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Query the SageMaker model variant logs on S3 using Athena and leverage QuickSight to visualize the logs.<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect.<\/b><span style=\"font-weight: 400;\"> Using ElasticSearch and Kibana unnecessarily complicates the solution. 
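)" >
As a sketch of what the dashboard in Option C looks like under the hood, the dashboard body is plain JSON that you could pass to CloudWatch's PutDashboard API. In the snippet below, the endpoint and variant names are hypothetical placeholders; the metric names are the defaults CloudWatch emits for SageMaker endpoints (ModelLatency in the AWS/SageMaker namespace, CPUUtilization and MemoryUtilization in /aws/sagemaker/Endpoints):

```python
import json

# Default metrics CloudWatch already emits for a SageMaker model variant.
# The endpoint and variant names below are hypothetical placeholders.
DIMENSIONS = ["EndpointName", "stock-forecast-endpoint", "VariantName", "variant-1"]

def widget(namespace, metric, x, y):
    """Build one metric widget for the dashboard body."""
    return {
        "type": "metric", "x": x, "y": y, "width": 8, "height": 6,
        "properties": {
            "metrics": [[namespace, metric, *DIMENSIONS]],
            "stat": "Average", "period": 60, "region": "us-east-1",
            "title": metric,
        },
    }

dashboard_body = json.dumps({"widgets": [
    widget("AWS/SageMaker", "ModelLatency", 0, 0),
    widget("/aws/sagemaker/Endpoints", "CPUUtilization", 8, 0),
    widget("/aws/sagemaker/Endpoints", "MemoryUtilization", 16, 0),
]})

# To publish: boto3.client("cloudwatch").put_dashboard(
#     DashboardName="variant-load-test", DashboardBody=dashboard_body)
print(len(json.loads(dashboard_body)["widgets"]))
```

Because these metrics exist by default, building the dashboard is pure configuration with no custom log plumbing.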
A CloudWatch dashboard can show all of the metric data you need to evaluate your model variant.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> You don\u2019t need to create custom CloudWatch logs with the metrics you wish to monitor. All of the metrics (latency, memory utilization, and CPU utilization) you wish to view are generated by CloudWatch by default. Also, using ElasticSearch and Kibana unnecessarily complicates the solution.<\/span><br \/>\n<b>Option C is correct.<\/b><span style=\"font-weight: 400;\"> The simplest approach is to leverage the CloudWatch dashboard feature since it generates all of the metrics (latency, memory utilization, and CPU utilization) you wish to view by default.<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">Using Athena and QuickSight unnecessarily complicates the solution. A CloudWatch dashboard can show all of the metric data you need to evaluate your model variant.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the Amazon SageMaker developer guide titled <\/span><b>Monitor Amazon SageMaker with Amazon CloudWatch <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/monitoring-cloudwatch.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/monitoring-cloudwatch.html<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon CloudWatch user guide titled<\/span><b> Using Amazon CloudWatch Dashboards <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/monitoring\/CloudWatch_Dashboards.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/monitoring\/CloudWatch_Dashboards.html<\/span><\/a><span 
style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon CloudWatch user guide titled <\/span><b>Creating a CloudWatch Dashboard <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/monitoring\/create_dashboard.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/AmazonCloudWatch\/latest\/monitoring\/create_dashboard.html<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_ML_Implementation_and_Operations\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">ML Implementation and Operations<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q21 : You work as a machine learning specialist for a financial services firm. Your firm contracts with market data generation services that deliver 5 TB of market activity record data every minute. To prepare this data for your machine learning models, your team queries the data using Athena. However, the queries perform poorly because they are operating on such a large data stream. You need to find a more performant option. Which file format for your market data records on S3 will give you the best performance?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> TSV files<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Compressed LZO files<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Parquet files<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> CSV files<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect. <\/b><span style=\"font-weight: 400;\">The TSV file format uses a row-based file structure that uses tabs as an attribute separator. 
When Athena reads from these types of files, it must read the entire row for every row versus reading in a column when only the attribute in that column is needed for your query. Columnar-based file processing is much more efficient for queries of large datasets. Also, the TSV file format does not support the partitioning of your data.<\/span><br \/>\n<b>Option B is incorrect. <\/b><span style=\"font-weight: 400;\">Compressed LZO Files do not support columnar processing nor partitioning. Therefore they will perform poorly when compared to columnar file formats like Parquet.<\/span><br \/>\n<b>Option C is correct. <\/b><span style=\"font-weight: 400;\">The Parquet file format is a columnar-based format, and it supports partitioning. The other columnar-based file format supported by Athena is ORC. These columnar-based file formats outperform the tabular formats such as CSV and TSV when Athena works with very large datasets.<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">The CSV file format uses a row-based file structure that uses commas as an attribute separator. When Athena reads from these types of files, it must read the entire row for every row versus reading in a column (columnar-based processing) when only the attribute in that column is needed for your query. Columnar-based file processing is much more efficient for queries of large datasets. 
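The row-based versus columnar trade-off described above can be sketched in plain Python (a toy illustration with made-up data, not Athena itself): answering "average price" from a row layout touches every field of every record, while a columnar layout touches only the one column queried.

```python
# Toy illustration of row-based (CSV/TSV-like) vs columnar (Parquet-like) scans.
rows = [
    {"symbol": "AMZN", "price": 178.2, "volume": 1200},
    {"symbol": "GOOG", "price": 141.8, "volume": 800},
    {"symbol": "MSFT", "price": 415.1, "volume": 950},
]

# Row-based: every value is visited to answer an "average price" query.
values_touched_row = sum(len(r) for r in rows)            # 9 fields

# Columnar: the same data pivoted into columns; only one column is read.
columns = {k: [r[k] for r in rows] for k in rows[0]}
prices = columns["price"]
values_touched_col = len(prices)                           # 3 values

avg_price = sum(prices) / len(prices)
print(values_touched_row, values_touched_col, round(avg_price, 2))
```

At three rows the difference is trivial; at 5 TB per minute, reading one column instead of every field is the performance gap the question is about.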
Also, the CSV file format does not support the partitioning of your data.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the <\/span><b>Amazon Athena FAQs <\/b><span style=\"font-weight: 400;\">(refer to the question \u201cHow do I improve the performance of my query?\u201d) (<\/span><a href=\"https:\/\/aws.amazon.com\/athena\/faqs\/#:~:text=Amazon%20Athena%20supports%20a%20wide,%2C%20LZO%2C%20and%20GZIP%20formats.\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/aws.amazon.com\/athena\/faqs\/#:~:text=Amazon%20Athena%20supports%20a%20wide,%2C%20LZO%2C%20and%20GZIP%20formats.<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The AWS Big Data blog titled<\/span><b> Top 10 Performance Tuning Tips for Amazon Athena (<\/b><a href=\"https:\/\/aws.amazon.com\/blogs\/big-data\/top-10-performance-tuning-tips-for-amazon-athena\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/aws.amazon.com\/blogs\/big-data\/top-10-performance-tuning-tips-for-amazon-athena\/<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon Athena user guide titled <\/span><b>Compression Formats<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/compression-formats.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/athena\/latest\/ug\/compression-formats.html<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-11\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q22 : You work for a company that performs seismic research for client firms that drill for 
petroleum. As a machine learning specialist, you have built a series of models that classify seismic waves to determine the seismic profile of a proposed drilling site. You need to select the best model to use in production. Which metric should you use to compare and evaluate your machine learning classification models against each other?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Area Under the ROC Curve (AUC)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Mean square error (MSE)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Mean Absolute Error (MAE)<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Recall<\/span><\/p>\n<p><b>Correct Answer:<\/b> <b>A<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is correct.<\/b><span style=\"font-weight: 400;\"> The area under the Receiver Operating Characteristic (ROC) curve is the most commonly used metric to compare classification models.<\/span><br \/>\n<b>Option B is incorrect. <\/b><span style=\"font-weight: 400;\">The Mean Square Error (MSE) is commonly used to measure regression error. It finds the average squared error between the predicted and actual values. It is not used to compare classification models.<\/span><br \/>\n<b>Option C is incorrect. <\/b><span style=\"font-weight: 400;\">The Mean Absolute Error (MAE) is also commonly used to measure regression error. It finds the average absolute distance between the predicted and target values. It is not used to compare classification models.<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">The recall metric is the proportion of actual positive cases that a model correctly identifies. 
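The AUC metric recommended above can be computed without any ML framework via its rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The two models' scores below are made up for illustration:

```python
def auc(labels, scores):
    """ROC AUC as the probability a random positive outranks a random
    negative (ties count as half) -- the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.3, 0.2, 0.7, 0.4]   # every positive outscores every negative
model_b = [0.6, 0.4, 0.5, 0.2, 0.3, 0.7]   # weaker separation

print(auc(y_true, model_a), auc(y_true, model_b))
```

Model A's perfect ranking yields an AUC of 1.0, while model B scores well under it, so model A would be the one promoted to production.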
This metric alone will not allow you to make a complete assessment and comparison of your models.<\/span><\/p>\n<p><span style=\"font-weight: 400;\"><strong>References:<\/strong> <\/span><span style=\"font-weight: 400;\">Please see the Towards Data Science article titled <\/span><b>Metrics For Evaluating Machine Learning Classification Models<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/towardsdatascience.com\/metrics-for-evaluating-machine-learning-classification-models-python-example-59b905e079a5\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/towardsdatascience.com\/metrics-for-evaluating-machine-learning-classification-models-python-example-59b905e079a5<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Towards Data Science article titled<\/span><b> How to Evaluate a Classification Machine Learning Model<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/towardsdatascience.com\/how-to-evaluate-a-classification-machine-learning-model-d81901d491b1\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/towardsdatascience.com\/how-to-evaluate-a-classification-machine-learning-model-d81901d491b1<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Machine Learning Mastery article titled<\/span><b> Assessing and Comparing Classifier Performance with ROC Curves<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/machinelearningmastery.com\/assessing-comparing-classifier-performance-roc-curves-2\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/machinelearningmastery.com\/assessing-comparing-classifier-performance-roc-curves-2\/<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Towards Data Science article titled<\/span><b> 20 Popular Machine Learning Metrics. 
Part 1: Classification &amp; Regression Evaluation Metrics<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/towardsdatascience.com\/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/towardsdatascience.com\/20-popular-machine-learning-metrics-part-1-classification-regression-evaluation-metrics-1ca3e282a2ce<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Data School article titled <\/span><b>Simple guide to confusion matrix terminology <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/www.dataschool.io\/simple-guide-to-confusion-matrix-terminology\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/www.dataschool.io\/simple-guide-to-confusion-matrix-terminology\/<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Medium article titled <\/span><b>Precision vs. Recall <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/medium.com\/@shrutisaxena0617\/precision-vs-recall-386cf9f89488\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/medium.com\/@shrutisaxena0617\/precision-vs-recall-386cf9f89488<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-6\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q23 : You work as a machine learning specialist for the airline traffic control agency of the federal government. Your machine learning team is responsible for producing the models that process all air traffic in-flight data to produce recommended flight paths for the aircraft currently aloft. 
The flight paths need to consider all of the prevailing conditions (weather, other flights in the path, etc.) that may affect an aircraft&#8217;s flight path.<\/span><\/em><br \/>\n<em><span style=\"font-weight: 400;\">The data that your models need to process is massive in scale and requires large-scale data processing. How should you build the data transformation and feature engineering processing jobs so that you can process all of the flight data in real-time?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> Run Glue ETL distributed data processing jobs to perform the transformation and feature engineering on the flight data in real-time and save the data to S3 for your model training.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> Use Kinesis Data Firehose to perform the transformation and feature engineering on the flight data in real-time and save the data to S3 for your model training.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> Run Apache Spark Streaming data processing jobs to perform the transformation and feature engineering on the flight data in real-time and save the data to S3 for your model training.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> Use a Kinesis Data Analytics SQL application to perform the transformation and feature engineering on the flight data in real-time and save the data to S3 for your model training.<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect.<\/b><span style=\"font-weight: 400;\"> Glue ETL is designed for batch processing, so it will not work in a real-time scenario.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> Kinesis Data Firehose is a near real-time processing service (it buffers your data as it processes it using the buffer size and buffer interval configuration settings). 
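The Firehose buffering behavior described above is what rules it out for true real-time work: records are only delivered once a buffer fills by size or by elapsed time, whichever comes first. A hedged sketch of the relevant boto3 parameters is below; the stream name, role ARN, and bucket ARN are placeholders, and the buffering values shown are the commonly documented S3-destination defaults.

```python
# Kinesis Data Firehose delivers to S3 only after a buffer fills by size
# OR by time, whichever comes first -- hence "near real-time", not real-time.
# All names and ARNs below are placeholders for illustration.
delivery_stream_config = {
    "DeliveryStreamName": "flight-data-stream",  # placeholder
    "ExtendedS3DestinationConfiguration": {
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",  # placeholder
        "BucketARN": "arn:aws:s3:::flight-data-bucket",  # placeholder
        "BufferingHints": {
            "SizeInMBs": 5,            # flush after ~5 MB of buffered data...
            "IntervalInSeconds": 300,  # ...or after 5 minutes, whichever is first
        },
    },
}

# In a real deployment this dict would be passed to:
# boto3.client("firehose").create_delivery_stream(**delivery_stream_config)
```

Even with the smallest allowed buffer settings there is always some delivery lag, which is why the explanation classifies Firehose as near real-time rather than real-time.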
It will not work in a real-time scenario.<\/span><br \/>\n<b>Option C is correct. <\/b><span style=\"font-weight: 400;\">Apache Spark Streaming is an analytics engine used for large-scale data processing that runs distributed data processing jobs. You can apply data transformations and extract features (feature engineering) using the Spark framework.<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">Kinesis Data Analytics running a SQL application can\u2019t write directly to S3. Also, Kinesis Data Analytics cannot scale to the large-scale data processing capabilities that Apache Spark jobs can.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the Amazon SageMaker developer guide titled<\/span><b> Data Processing with Apache Spark <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/use-spark-processing-container.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/use-spark-processing-container.html<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon SageMaker Examples GitHub repository titled <\/span><b>Distributed Data Processing using Apache Spark and SageMaker Processing<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/github.com\/aws\/amazon-sagemaker-examples\/blob\/master\/sagemaker_processing\/spark_distributed_data_processing\/sagemaker-spark-processing.ipynb\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/github.com\/aws\/amazon-sagemaker-examples\/blob\/master\/sagemaker_processing\/spark_distributed_data_processing\/sagemaker-spark-processing.ipynb<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon Kinesis Data Firehose developer guide titled <\/span><b>Configure Settings <\/b><span 
style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/firehose\/latest\/dev\/create-configure.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/firehose\/latest\/dev\/create-configure.html<\/span><\/a><span style=\"font-weight: 400;\">) <\/span><span style=\"font-weight: 400;\">The Amazon Kinesis Data Analytics <\/span><b>FAQs <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/aws.amazon.com\/kinesis\/data-analytics\/faqs\/\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/aws.amazon.com\/kinesis\/data-analytics\/faqs\/<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Modeling-12\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Modeling<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q24 : You work as a machine learning specialist for an alternative transportation ride-share company. Your company has scooters, electric longboards, and other electric personal transportation devices in several major cities across the US. Your machine learning team has been asked to produce a machine learning model that classifies device preference by trip duration for each of the available personal transportation devices you offer in each city. You have created a model based on the SageMaker built-in K-Means algorithm. You are now using hyperparameter tuning to get the best-performing model for your problem. Which evaluation metrics and corresponding optimization direction should you choose for your automatic model tuning (a.k.a. 
hyperparameter tuning)?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> msd, maximize<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> mse, minimize<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> ssd, minimize<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> f1, maximize<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>E.<\/strong> msd, minimize<\/span><\/p>\n<p><b>Correct Answers: C and E<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect. <\/b><span style=\"font-weight: 400;\">K-Means uses the msd (Mean Squared Distances) metric for model validation. However, you will want to minimize this metric.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> K-Means does not use the mse (Mean Squared Error) metric for model validation.<\/span><br \/>\n<b>Option C is correct.<\/b><span style=\"font-weight: 400;\"> K-Means uses the ssd (Sum of the Squared Distances) metric for model validation, and you will want to minimize this metric.<\/span><br \/>\n<b>Option D is incorrect. <\/b><span style=\"font-weight: 400;\">K-Means does not use the f1 (harmonic mean of precision and recall) metric for model validation.<\/span><br \/>\n<b>Option E is correct. 
<\/b><span style=\"font-weight: 400;\">K-Means uses the msd (Mean Squared Distances) metric for model validation, and you will want to minimize this metric.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the Amazon SageMaker developer guide titled<\/span><b> Define Metrics<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/automatic-model-tuning-define-metrics.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/automatic-model-tuning-define-metrics.html<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The Amazon SageMaker developer guide titled<\/span><b> Tune a K-Means Model<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/k-means-tuning.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/k-means-tuning.html<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Domain_Data_Engineering-7\"><\/span><span style=\"font-weight: 400;\">Domain : <\/span><span style=\"font-weight: 400;\">Data Engineering<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4><em><span style=\"font-weight: 400;\">Q25 : You work as a machine learning specialist for the infectious disease testing department of a national government agency. Your machine learning team is responsible for creating a machine learning model that analyzes the daily test datasets for your country and produces daily predictions of trends of disease contraction and death rates. These projections are used throughout national and international news agencies to report on the daily projections of infectious disease progression. 
Since your model works on huge datasets on a daily basis, which of the following statements gives an accurate description of your inference processing?<\/span><\/em><\/h4>\n<p><span style=\"font-weight: 400;\"><strong>A.<\/strong> You have set up a persistent endpoint to get predictions from your model using SageMaker batch transform.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>B.<\/strong> You have set up a persistent endpoint to get predictions from your model using SageMaker hosting services.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>C.<\/strong> You don&#8217;t need a persistent endpoint. You use SageMaker batch transform to get inferences from your large datasets.<\/span><br \/>\n<span style=\"font-weight: 400;\"><strong>D.<\/strong> You don&#8217;t need a persistent endpoint. You use SageMaker hosting services to get inferences from your large datasets.<\/span><\/p>\n<p><b>Correct Answer: C<\/b><\/p>\n<p><b>Explanation<\/b><\/p>\n<p><b>Option A is incorrect. <\/b><span style=\"font-weight: 400;\">SageMaker batch transform does not use a persistent endpoint. You use SageMaker batch transform to get inferences from large datasets. Also, your process runs once per day, so a persistent endpoint does not make sense.<\/span><br \/>\n<b>Option B is incorrect.<\/b><span style=\"font-weight: 400;\"> You are processing large datasets on a daily basis. Therefore, you should use SageMaker batch transform, not SageMaker hosting services. SageMaker hosting services are used for real-time inference requests, not daily batch requests.<\/span><br \/>\n<b>Option C is correct.<\/b><span style=\"font-weight: 400;\"> Since you only get inferences once per day from a large dataset, you don&#8217;t need a persistent endpoint. 
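To illustrate the batch transform pattern: instead of keeping an endpoint running, you submit a transform job that provisions instances, scores the whole dataset from S3, writes the predictions back to S3, and tears the instances down. Below is a hedged sketch of the parameters such a job takes; all names, model references, and S3 URIs are placeholders, not values from the scenario.

```python
# SageMaker batch transform: no persistent endpoint -- compute is provisioned
# for the job and released once the entire dataset has been scored.
# All names and S3 URIs below are placeholders for illustration.
transform_job_config = {
    "TransformJobName": "daily-disease-trend-inference",  # placeholder
    "ModelName": "disease-trend-model",                   # placeholder
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-bucket/daily-test-data/",  # placeholder
            }
        },
        "ContentType": "text/csv",
        "SplitType": "Line",  # split large input files into per-record requests
    },
    "TransformOutput": {
        "S3OutputPath": "s3://example-bucket/daily-predictions/"  # placeholder
    },
    "TransformResources": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 2,  # parallelize across instances for large datasets
    },
}

# In a real deployment this dict would be passed to:
# boto3.client("sagemaker").create_transform_job(**transform_job_config)
```

A daily scheduler (for example, an EventBridge rule) can kick off one such job per day, so you pay for compute only while the dataset is actually being scored.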
Also, SageMaker batch transform is the best deployment option when getting inferences from an entire dataset.<\/span><br \/>\n<b>Option D is incorrect.<\/b><span style=\"font-weight: 400;\"> SageMaker hosting services need a persistent endpoint. Also, since you are processing large datasets on a daily basis, you should use SageMaker batch transform, not SageMaker hosting services.<\/span><\/p>\n<p><strong>References: <\/strong><span style=\"font-weight: 400;\">Please see the AWS SageMaker developer guide titled<\/span><b> Use Batch Transform <\/b><span style=\"font-weight: 400;\">(<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/batch-transform.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/batch-transform.html<\/span><\/a><span style=\"font-weight: 400;\">), <\/span><span style=\"font-weight: 400;\">The AWS SageMaker developer guide titled<\/span><b> Deploy Models for Inference<\/b><span style=\"font-weight: 400;\"> (<\/span><a href=\"https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/deploy-model.html\" target=\"_blank\" rel=\"noopener\"><span style=\"font-weight: 400;\">https:\/\/docs.aws.amazon.com\/sagemaker\/latest\/dg\/deploy-model.html<\/span><\/a><span style=\"font-weight: 400;\">)<\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Summary\"><\/span>Summary<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><span style=\"font-weight: 400;\">We hope you enjoyed practicing for the exam with us. To learn and practice more, go through the detailed, exam-ready mock tests on our official website. Spending more time on preparation can help you pass the actual exam on your first attempt. All the best, and keep learning! 
Also, you can check <a href=\"https:\/\/www.whizlabs.com\/labs\/\" target=\"_blank\" rel=\"noopener\">AWS hands-on labs<\/a> &amp; <a href=\"https:\/\/www.whizlabs.com\/labs\/sandbox\" target=\"_blank\" rel=\"noopener\">AWS Sandbox<\/a>.\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Did you come here looking for the FREE Questions and Answers on the AWS Certified Machine Learning Specialty certification? Find them here. What to expect in AWS Machine Learning Certification Exam? The AWS Machine Learning Certification attests to your expertise in the building, tuning, training, and deployment of Machine Learning(ML) models on AWS. It aids the organizations in the identification and development of talent that possess critical skills in the implementation of Cloud Initiatives. Let&#8217;s Start Exploring! Domain : Exploratory Data Analysis Q1 : You work for a financial services firm that wishes to enhance its fraud detection capabilities further. [&hellip;]<\/p>\n","protected":false},"author":223,"featured_media":81542,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_uag_custom_page_level_css":"","site-sidebar-layout":"default","site-content-layout":"default","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","theme-transparent-header-meta":"default","adv-header-id-meta":"","stick-header-meta":"default","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image"
:"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4,10],"tags":[289],"class_list":["post-81515","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-aws-certifications","category-cloud-computing-certifications","tag-aws-machine-learning-certification-exam"],"uagb_featured_image_src":{"full":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam-150x150.png",150,150,true],"medium":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam-300x158.png",300,158,true],"medium_large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"large":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"1536x1536":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"2048x2048":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"profile_24":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learni
ng-Specialty-Certification-Exam-Certification-Exam.png",24,13,false],"profile_48":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",48,25,false],"profile_96":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",96,50,false],"profile_150":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",150,79,false],"profile_300":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",300,158,false],"tptn_thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam-250x250.png",250,250,true],"web-stories-poster-portrait":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",600,315,false],"web-stories-publisher-logo":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",96,50,false],"web-stories-thumbnail":["https:\/\/www.whizlabs.com\/blog\/wp-content\/uploads\/2022\/03\/Free-Questions-on-AWS-Certified-Machine-Learning-Specialty-Certification-Exam-Certification-Exam.png",150,79,false]},"uagb_author_info":{"display_name":"Dharmendra Digari","author_link":"https:\/\/www.whizlabs.com\/blog\/author\/dharmendrawhizlabs-com\/"},"uagb_comment_info":80,"uagb_excerpt":"Did you come here looking for the FREE Questions and Answers on the AWS Certified Machine Learning Specialty certification? 
Find them here. What to expect in AWS Machine Learning Certification Exam? The AWS Machine Learning Certification attests to your expertise in the building, tuning, training, and deployment of Machine Learning(ML) models on AWS. It aids&hellip;","_links":{"self":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/users\/223"}],"replies":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/comments?post=81515"}],"version-history":[{"count":21,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81515\/revisions"}],"predecessor-version":[{"id":94664,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/posts\/81515\/revisions\/94664"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media\/81542"}],"wp:attachment":[{"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/media?parent=81515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/categories?post=81515"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.whizlabs.com\/blog\/wp-json\/wp\/v2\/tags?post=81515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}