AWS Certified Machine Learning - Specialty Dumps December 2024
Are you tired of looking for a source that keeps you updated on the AWS Certified Machine Learning - Specialty Exam and also offers a collection of affordable, high-quality, and easy-to-use Amazon MLS-C01 Practice Questions? You are in luck, because Salesforcexamdumps.com has just updated them! Get ready to become an AWS Certified Specialty professional.
Amazon MLS-C01 is the exam you must pass to earn this certification, and the credential rewards candidates who prepare thoroughly. The AWS Certified Specialty certification validates a candidate's expertise in working with Amazon Web Services. In this fast-paced world, a certification is the quickest way to earn your employer's recognition. Take on the AWS Certified Machine Learning - Specialty Exam and become a certified professional today. Salesforcexamdumps.com is always eager to lend a helping hand by providing approved and accepted Amazon MLS-C01 Practice Questions. Passing AWS Certified Machine Learning - Specialty can be your ticket to a better future!
Pass with Amazon MLS-C01 Braindumps!
Contrary to the belief that certification exams are generally hard to get through, passing AWS Certified Machine Learning - Specialty is straightforward, provided you have access to a reliable resource such as the Salesforcexamdumps.com Amazon MLS-C01 PDF. We have been in this business long enough to understand where most resources go wrong. Passing the Amazon AWS Certified Specialty certification is all about having the right information, so we filled our Amazon MLS-C01 Dumps with all the material you need to pass. These carefully curated sets of AWS Certified Machine Learning - Specialty Practice Questions target the most frequently repeated exam questions, so you know they are essential and can help secure a passing result. Stop waiting around and order your set of Amazon MLS-C01 Braindumps now!
We aim to provide all AWS Certified Specialty certification exam candidates with the best resources at minimal rates. You can check out our free demo before downloading to make sure the Amazon MLS-C01 Practice Questions are what you want. And do not forget about the discount; we always give our customers a little extra.
Why Choose Amazon MLS-C01 PDF?
Unlike other websites, Salesforcexamdumps.com prioritizes the needs of AWS Certified Machine Learning - Specialty candidates. Not every Amazon exam candidate has full-time access to the internet, and it is hard to sit in front of a computer screen for too many hours. Are you one of them? We understand, which is why our AWS Certified Specialty solutions come in two formats: Amazon MLS-C01 Question Answers are available as a PDF and as an Online Test Engine. One is for customers who like online platforms with realistic exam simulation; the other is for those who prefer to keep their material close at hand. Moreover, you can download or print the Amazon MLS-C01 Dumps with ease.
If you still have queries, our team of experts is available 24/7 to answer your questions. Just leave us a quick message in the chat box below or email support@salesforcexamdumps.com.
Amazon MLS-C01 Sample Questions
Question # 1
A data scientist stores financial datasets in Amazon S3. The data scientist uses Amazon Athena to query the datasets by using SQL. The data scientist uses Amazon SageMaker to deploy a machine learning (ML) model. The data scientist wants to obtain inferences from the model at the SageMaker endpoint. However, when the data scientist attempts to invoke the SageMaker endpoint, the data scientist receives SQL statement failures. The data scientist's IAM user is currently unable to invoke the SageMaker endpoint. Which combination of actions will give the data scientist's IAM user the ability to invoke the SageMaker endpoint? (Select THREE.)
A. Attach the AmazonAthenaFullAccess AWS managed policy to the user identity.
B. Include a policy statement for the data scientist's IAM user that allows the IAM user to perform the sagemaker:InvokeEndpoint action.
C. Include an inline policy for the data scientist's IAM user that allows SageMaker to read S3 objects.
D. Include a policy statement for the data scientist's IAM user that allows the IAM user to perform the sagemaker:GetRecord action.
E. Include the SQL statement "USING EXTERNAL FUNCTION ml_function_name" in the Athena SQL query.
F. Perform a user remapping in SageMaker to map the IAM user to another IAM user that is on the hosted endpoint.
Answer: B, C, E
Explanation: The correct combination of actions is B, C, and E, because together they give the IAM user the permissions, access, and syntax needed to query the ML model from Athena:
B: A policy statement that allows the sagemaker:InvokeEndpoint action grants the IAM user permission to call the SageMaker Runtime InvokeEndpoint API, which is used to get inferences from the model hosted at the endpoint [1].
C: An inline policy that allows SageMaker to read S3 objects lets the IAM user access the data stored in S3, which is the source of the Athena queries [2].
E: Including the SQL statement "USING EXTERNAL FUNCTION ml_function_name" in the Athena SQL query invokes the ML model as an external function from Athena, a feature that enables querying ML models from SQL statements [3].
The other options are not correct or necessary:
A: Attaching the AmazonAthenaFullAccess AWS managed policy to the user identity is not sufficient, because it does not grant the IAM user permission to invoke the SageMaker endpoint, which is required to query the ML model [4].
D: Allowing the sagemaker:GetRecord action is not relevant, because that action retrieves a single record from a feature group, which is not the case in this scenario [5].
F: Performing a user remapping in SageMaker to map the IAM user to another IAM user on the hosted endpoint is not applicable, because that feature is only available for multi-model endpoints, which are not used in this scenario.
References:
[1] InvokeEndpoint - Amazon SageMaker
[2] Querying Data in Amazon S3 from Amazon Athena - Amazon Athena
[3] Querying machine learning models from Amazon Athena using Amazon SageMaker | AWS Machine Learning Blog
[4] AmazonAthenaFullAccess - AWS Identity and Access Management
[5] GetRecord - Amazon SageMaker Feature Store Runtime
Invoke a Multi-Model Endpoint - Amazon SageMaker
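For readers who want to see what options B and E look like in practice, here is a minimal sketch. The endpoint name, account ID, table, columns, and S3 output location are hypothetical placeholders, and the policy shown is only the single statement that grants sagemaker:InvokeEndpoint.

```python
import json
import boto3

# Hypothetical resource names; replace with your own endpoint, table, and bucket.
ENDPOINT_NAME = "fraud-scoring-endpoint"
ATHENA_OUTPUT = "s3://my-athena-results/"

# Option B: a policy statement granting sagemaker:InvokeEndpoint on the endpoint.
invoke_policy_statement = {
    "Effect": "Allow",
    "Action": "sagemaker:InvokeEndpoint",
    "Resource": f"arn:aws:sagemaker:us-east-1:123456789012:endpoint/{ENDPOINT_NAME}",
}
print(json.dumps(invoke_policy_statement, indent=2))

# Option E: call the model as an external function from Athena SQL.
query = f"""
USING EXTERNAL FUNCTION predict_risk(amount DOUBLE, balance DOUBLE)
RETURNS DOUBLE
SAGEMAKER '{ENDPOINT_NAME}'
SELECT transaction_id, predict_risk(amount, balance) AS risk_score
FROM financial_db.transactions
LIMIT 10
"""

athena = boto3.client("athena")
execution = athena.start_query_execution(
    QueryString=query,
    ResultConfiguration={"OutputLocation": ATHENA_OUTPUT},
)
print(execution["QueryExecutionId"])
```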
Question # 2
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords. Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.
B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
C. Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.
D. Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
Answer: B
Explanation: Amazon SageMaker script mode is a feature that allows users to use training scripts similar to those they would use outside SageMaker with SageMaker's prebuilt containers for various frameworks such as TensorFlow. Script mode supports reading data from Amazon S3 buckets without requiring any changes to the training script. Therefore, option B is the best method of providing training data to Amazon SageMaker that meets the business requirements with the least development overhead.
Option A is incorrect because using a local path for the data would not be scalable or reliable, as it would depend on the availability and capacity of the local storage. Moreover, using a local path would not leverage the benefits of Amazon S3, such as durability, security, and performance.
Option C is incorrect because rewriting the train.py script to convert TFRecords to protobuf would require additional development effort and complexity, as well as introduce potential errors and inconsistencies in the data format.
Option D is incorrect because preparing the data in the format accepted by Amazon SageMaker would also require additional development effort and complexity, as well as involve additional services such as AWS Glue or AWS Lambda, which would increase the cost and maintenance of the solution.
References:
Bring your own model with Amazon SageMaker script mode
GitHub - aws-samples/amazon-sagemaker-script-mode
Deep Dive on TensorFlow training with Amazon SageMaker and Amazon S3
amazon-sagemaker-script-mode/generate_cifar10_tfrecords.py at master
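A minimal sketch of option B using the SageMaker Python SDK is shown below. The role ARN, S3 path, and framework versions are placeholders; it assumes train.py already reads TFRecord files from the channel directory that SageMaker exposes through the SM_CHANNEL_TRAINING environment variable.

```python
from sagemaker.tensorflow import TensorFlow

role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

estimator = TensorFlow(
    entry_point="train.py",          # the existing script, unchanged
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="2.11",
    py_version="py39",
)

# Point training at the TFRecord data already uploaded to S3 (option B).
estimator.fit({"training": "s3://my-training-bucket/tfrecords/"})
```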
Question # 3
A credit card company wants to identify fraudulent transactions in real time. A data scientist builds a machine learning model for this purpose. The transactional data is captured and stored in Amazon S3. The historic data is already labeled with two classes: fraud (positive) and fair transactions (negative). The data scientist removes all the missing data and builds a classifier by using the XGBoost algorithm in Amazon SageMaker. The model produces the following results: • True positive rate (TPR): 0.700 • False negative rate (FNR): 0.300 • True negative rate (TNR): 0.977 • False positive rate (FPR): 0.023 • Overall accuracy: 0.949 Which solution should the data scientist use to improve the performance of the model?
A. Apply the Synthetic Minority Oversampling Technique (SMOTE) on the minority class in the training dataset. Retrain the model with the updated training data.
B. Apply the Synthetic Minority Oversampling Technique (SMOTE) on the majority class in the training dataset. Retrain the model with the updated training data.
C. Undersample the minority class.
D. Oversample the majority class.
Answer: A
Explanation: The data scientist should apply the Synthetic Minority Oversampling Technique (SMOTE) on the minority class in the training dataset and retrain the model with the updated training data. This addresses the class imbalance in the dataset, which limits the model's ability to learn from the rare but important positive class (fraud).
Class imbalance is a common issue in machine learning, especially for classification tasks. It occurs when one class (usually the positive or target class) is significantly underrepresented in the dataset compared to the other class (usually the negative or non-target class). In the credit card fraud detection problem, the positive class (fraud) is much less frequent than the negative class (fair transactions). This can cause the model to be biased towards the majority class and fail to capture the characteristics and patterns of the minority class. As a result, the model may have a high overall accuracy but a low recall or true positive rate for the minority class, which means it misses many fraudulent transactions.
SMOTE mitigates the class imbalance problem by generating synthetic samples for the minority class. SMOTE finds the k-nearest neighbors of each minority class instance and randomly creates new instances along the line segments connecting them. In this way, SMOTE increases the number and diversity of minority class instances without duplicating or losing any information. By applying SMOTE on the minority class in the training dataset, the data scientist can balance the classes and improve the model's performance on the positive class [1].
The other options are either ineffective or counterproductive. Applying SMOTE on the majority class would not balance the classes; it would increase the imbalance and the size of the dataset. Undersampling the minority class would reduce the number of instances available for the model to learn from and potentially lose important information. Oversampling the majority class would also increase the imbalance and the size of the dataset, and introduce redundancy and overfitting.
References:
[1] SMOTE for Imbalanced Classification with Python - Machine Learning Mastery
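A short illustration of the chosen approach with the imbalanced-learn library, using a synthetic dataset in place of the company's transaction data:

```python
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Simulate an imbalanced fraud dataset (about 2% positive class).
X, y = make_classification(
    n_samples=10_000, n_features=20, weights=[0.98, 0.02], random_state=42
)
print("Before SMOTE:", Counter(y))

# Oversample only the minority (fraud) class in the training data.
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
print("After SMOTE:", Counter(y_res))
# The rebalanced X_res, y_res would then be used to retrain the XGBoost model.
```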
Question # 4
A pharmaceutical company performs periodic audits of clinical trial sites to quickly resolve critical findings. The company stores audit documents in text format. Auditors have requested help from a data science team to quickly analyze the documents. The auditors need to discover the 10 main topics within the documents to prioritize and distribute the review work among the auditing team members. Documents that describe adverse events must receive the highest priority. A data scientist will use statistical modeling to discover abstract topics and to provide a list of the top words for each category to help the auditors assess the relevance of the topic. Which algorithms are best suited to this scenario? (Choose two.)
A. Latent Dirichlet allocation (LDA)
B. Random Forest classifier
C. Neural topic modeling (NTM)
D. Linear support vector machine
E. Linear regression
Answer: A, C
Explanation: The algorithms best suited to this scenario are latent Dirichlet allocation (LDA) and neural topic modeling (NTM), because both are unsupervised learning methods that can discover abstract topics from a collection of text documents. LDA and NTM can provide a list of the top words for each topic, as well as the topic distribution for each document, which helps the auditors assess the relevance and priority of each topic [1][2].
The other options are not suitable:
Option B: A random forest classifier is a supervised learning method that performs classification or regression tasks by using an ensemble of decision trees. It is not suitable for discovering abstract topics from text documents, as it requires labeled data and predefined classes [3].
Option D: A linear support vector machine is a supervised learning method that performs classification or regression tasks by using a linear function that separates the data into different classes. It is not suitable for discovering abstract topics from text documents, as it requires labeled data and predefined classes [4].
Option E: Linear regression is a supervised learning method that performs regression tasks by using a linear function that models the relationship between a dependent variable and one or more independent variables. It is not suitable for discovering abstract topics from text documents, as it requires labeled data and a continuous output variable [5].
References:
[1] Latent Dirichlet Allocation
[2] Neural Topic Modeling
[3] Random Forest Classifier
[4] Linear Support Vector Machine
[5] Linear Regression
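As a rough illustration of how topic modeling surfaces top words per topic, here is a small LDA sketch with scikit-learn; the audit documents and the choice of 10 topics mirror the scenario but are otherwise invented. (On SageMaker, the built-in LDA or NTM algorithms would be used instead.)

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

audit_docs = [
    "site reported adverse event after dosage change",
    "consent forms missing signatures at two sites",
    "temperature excursion in drug storage area",
    # ... thousands more audit documents in practice
]

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(audit_docs)

lda = LatentDirichletAllocation(n_components=10, random_state=0)
lda.fit(X)

# Print the top words per topic so auditors can judge each topic's relevance.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_words = [terms[i] for i in weights.argsort()[-5:][::-1]]
    print(f"Topic {topic_idx}: {', '.join(top_words)}")
```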
Question # 5
A media company wants to create a solution that identifies celebrities in pictures that users upload. The company also wants to identify the IP address and the timestamp details from the users so the company can prevent users from uploading pictures from unauthorized locations. Which solution will meet these requirements with LEAST development effort?
A. Use AWS Panorama to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
B. Use AWS Panorama to identify celebrities in the pictures. Make calls to the AWS Panorama Device SDK to capture IP address and timestamp details.
C. Use Amazon Rekognition to identify celebrities in the pictures. Use AWS CloudTrail to capture IP address and timestamp details.
D. Use Amazon Rekognition to identify celebrities in the pictures. Use the text detection feature to capture IP address and timestamp details.
Answer: C
Explanation: Solution C meets the requirements with the least development effort because it uses Amazon Rekognition and AWS CloudTrail, which are fully managed services that provide the desired functionality. The solution involves the following steps:
Use Amazon Rekognition to identify celebrities in the pictures. Amazon Rekognition can analyze images and videos and extract insights such as faces, objects, scenes, emotions, and more. Amazon Rekognition also provides Celebrity Recognition, which can recognize thousands of celebrities across categories such as politics, sports, entertainment, and media, returning the name, face, and confidence score of the recognized celebrities, as well as additional information such as URLs and biographies [1].
Use AWS CloudTrail to capture IP address and timestamp details. AWS CloudTrail records the API calls and events made by or on behalf of AWS accounts, providing information such as the source IP address, the user identity, the request parameters, and the response elements of the API calls. AWS CloudTrail can also deliver the event records to an Amazon S3 bucket or an Amazon CloudWatch Logs group for further analysis and auditing [2].
The other options are not suitable:
Option A: Using AWS Panorama to identify celebrities in the pictures and AWS CloudTrail to capture IP address and timestamp details will not meet the requirements effectively. AWS Panorama extends computer vision to the edge, where it runs inference on video streams from cameras and other devices. It is not designed for identifying celebrities in pictures and may not provide accurate or relevant results. Moreover, AWS Panorama requires an AWS Panorama Appliance or a compatible device, which adds cost and complexity [3].
Option B: Using AWS Panorama with the AWS Panorama Device SDK will not meet the requirements effectively, for the same reasons as option A. Additionally, making calls to the AWS Panorama Device SDK requires more development effort than using AWS CloudTrail, as it involves writing custom code and handling errors and exceptions [4].
Option D: Using Amazon Rekognition text detection to capture IP address and timestamp details will not meet the requirements effectively. The text detection feature detects and recognizes text in images and videos, such as street names, captions, product names, and license plates. It is not suitable for capturing IP address and timestamp details, because these are not part of the pictures that users upload. Moreover, text detection depends on the quality and clarity of any text in the images and videos [5].
References:
[1] Amazon Rekognition Celebrity Recognition
[2] AWS CloudTrail Overview
[3] AWS Panorama Overview
[4] AWS Panorama Device SDK
[5] Amazon Rekognition Text Detection
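A minimal boto3 sketch of the Rekognition half of solution C; the bucket and object names are placeholders:

```python
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.recognize_celebrities(
    Image={"S3Object": {"Bucket": "user-uploads-bucket", "Name": "photo.jpg"}}
)

# Print each recognized celebrity with the match confidence.
for celebrity in response["CelebrityFaces"]:
    print(celebrity["Name"], celebrity["MatchConfidence"])

# IP address and timestamp details come from the CloudTrail event records for
# the upload and API calls, not from the image itself.
```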
Question # 6
A retail company stores 100 GB of daily transactional data in Amazon S3 at periodic intervals. The company wants to identify the schema of the transactional data. The company also wants to perform transformations on the transactional data that is in Amazon S3. The company wants to use a machine learning (ML) approach to detect fraud in the transformed data. Which combination of solutions will meet these requirements with the LEAST operational overhead? (Select THREE.)
A. Use Amazon Athena to scan the data and identify the schema.
B. Use AWS Glue crawlers to scan the data and identify the schema.
C. Use Amazon Redshift stored procedures to perform data transformations.
D. Use AWS Glue workflows and AWS Glue jobs to perform data transformations.
E. Use Amazon Redshift ML to train a model to detect fraud.
F. Use Amazon Fraud Detector to train a model to detect fraud.
Answer: B, D, F
Explanation: To meet the requirements with the least operational overhead, the company should use AWS Glue crawlers, AWS Glue workflows and jobs, and Amazon Fraud Detector. AWS Glue crawlers can scan the data in Amazon S3 and identify the schema, which is then stored in the AWS Glue Data Catalog. AWS Glue workflows and jobs can perform data transformations on the data in Amazon S3 using serverless Spark or Python scripts. Amazon Fraud Detector can train a model to detect fraud using the transformed data and the company's historical fraud labels, and then generate fraud predictions with a simple API call.
Option A is incorrect because Amazon Athena is a serverless query service that can analyze data in Amazon S3 using standard SQL, but it does not perform data transformations or fraud detection.
Option C is incorrect because Amazon Redshift is a cloud data warehouse that can store and query data using SQL, but it requires provisioning and managing clusters, which adds operational overhead. Moreover, Amazon Redshift does not provide a built-in fraud detection capability.
Option E is incorrect because Amazon Redshift ML is a feature that allows users to create, train, and deploy machine learning models using SQL commands in Amazon Redshift. However, using Amazon Redshift ML would require loading the data from Amazon S3 to Amazon Redshift, which adds complexity and cost. Also, Amazon Redshift ML does not support fraud detection as a use case.
References:
AWS Glue Crawlers
AWS Glue Workflows and Jobs
Amazon Fraud Detector
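A minimal boto3 sketch of the schema-discovery step with an AWS Glue crawler; the crawler name, role ARN, database, and S3 path are placeholders:

```python
import boto3

glue = boto3.client("glue")

# Create a crawler that infers the schema of the daily transaction files in S3.
glue.create_crawler(
    Name="daily-transactions-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="transactions_db",
    Targets={"S3Targets": [{"Path": "s3://retail-transactions-bucket/daily/"}]},
    Schedule="cron(0 2 * * ? *)",  # run daily after new data lands
)
glue.start_crawler(Name="daily-transactions-crawler")

# The inferred schema appears in the Glue Data Catalog, where Glue jobs can
# transform the data and Amazon Fraud Detector can consume it downstream.
```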
Question # 7
An automotive company uses computer vision in its autonomous cars. The company trained its object detection models successfully by using transfer learning from a convolutional neural network (CNN). The company trained the models by using PyTorch through the Amazon SageMaker SDK. The vehicles have limited hardware and compute power. The company wants to optimize the model to reduce memory, battery, and hardware consumption without a significant sacrifice in accuracy. Which solution will improve the computational efficiency of the models?
A. Use Amazon CloudWatch metrics to gain visibility into the SageMaker training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set new weights based on the pruned set of filters. Run a new training job with the pruned model.
B. Use Amazon SageMaker Ground Truth to build and run data labeling workflows. Collect a larger labeled dataset with the labeling workflows. Run a new training job that uses the new labeled data with previous training data.
C. Use Amazon SageMaker Debugger to gain visibility into the training weights, gradients, biases, and activation outputs. Compute the filter ranks based on the training information. Apply pruning to remove the low-ranking filters. Set the new weights based on the pruned set of filters. Run a new training job with the pruned model.
D. Use Amazon SageMaker Model Monitor to gain visibility into the ModelLatency metric and OverheadLatency metric of the model after the company deploys the model. Increase the model learning rate. Run a new training job.
Answer: C
Explanation: Solution C improves the computational efficiency of the models because it uses Amazon SageMaker Debugger and pruning, which are techniques that reduce the size and complexity of the convolutional neural network (CNN) models. The solution involves the following steps:
Use Amazon SageMaker Debugger to gain visibility into the training weights, gradients, biases, and activation outputs. Amazon SageMaker Debugger can capture and analyze the tensors that are emitted during the training process of machine learning models. It provides insights into model performance, quality, and convergence, and can help identify and diagnose issues such as overfitting, underfitting, vanishing gradients, and exploding gradients [1].
Compute the filter ranks based on the training information. Filter ranking measures the importance of each filter in a convolutional layer based on a criterion such as the average percentage of zero activations or the L1 norm of the filter weights. Filter ranking helps identify the filters that contribute little or nothing to the model output and thus can be removed without affecting model accuracy [2].
Apply pruning to remove the low-ranking filters. Pruning reduces the size and complexity of a neural network by removing redundant or irrelevant parts of the network, such as neurons, connections, or filters. Pruning improves the computational efficiency, memory usage, and inference speed of the model, and can also help prevent overfitting and improve generalization [3].
Set the new weights based on the pruned set of filters. After pruning, the model has a smaller and simpler architecture, with fewer filters in each convolutional layer. The new weights can be set based on the pruned set of filters, either by initializing them randomly or by fine-tuning them from the original weights [4].
Run a new training job with the pruned model. The pruned model can be trained again with the same or a different dataset, framework, or algorithm. The new training job can use the same or a different SageMaker configuration, such as the instance type, the hyperparameters, or the data ingestion mode, and it can also use Amazon SageMaker Debugger to monitor and analyze the training process and model quality [5].
The other options are not suitable:
Option A: Using Amazon CloudWatch metrics to gain visibility into the SageMaker training weights, gradients, biases, and activation outputs is not as effective as using Amazon SageMaker Debugger. Amazon CloudWatch monitors the operational health and performance of AWS resources and applications, providing metrics, alarms, dashboards, and logs for various AWS services, including Amazon SageMaker. However, CloudWatch does not provide the same level of granularity and detail as SageMaker Debugger for the tensors emitted during training; its metrics focus mainly on resource utilization and training progress, not on model performance, quality, and convergence [6].
Option B: Using Amazon SageMaker Ground Truth to build and run data labeling workflows and collecting a larger labeled dataset will not improve the computational efficiency of the models. Amazon SageMaker Ground Truth creates high-quality training datasets for machine learning by using human labelers. A larger labeled dataset can improve model accuracy and generalization, but it will not reduce the memory, battery, and hardware consumption of the model. Moreover, a larger labeled dataset may increase the training time and cost [7].
Option D: Using Amazon SageMaker Model Monitor to gain visibility into the ModelLatency and OverheadLatency metrics after deployment and increasing the model learning rate will not improve the computational efficiency of the models. Amazon SageMaker Model Monitor monitors and analyzes the quality and performance of machine learning models deployed on SageMaker endpoints. The ModelLatency and OverheadLatency metrics measure the inference latency of the model and the endpoint, respectively, but they provide no information about the training weights, gradients, biases, and activation outputs needed for pruning. Moreover, increasing the model learning rate does not reduce the size and complexity of the model and may affect its convergence and accuracy.
References:
[1] Amazon SageMaker Debugger
[2] Pruning Convolutional Neural Networks for Resource Efficient Inference
[3] Pruning Neural Networks: A Survey
[4] Learning both Weights and Connections for Efficient Neural Networks
[5] Amazon SageMaker Training Jobs
[6] Amazon CloudWatch Metrics for Amazon SageMaker
[7] Amazon SageMaker Ground Truth
Amazon SageMaker Model Monitor
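The filter-ranking and pruning steps described above can be sketched in PyTorch as follows. The toy model, the L1-norm criterion, and the 30% pruning ratio are illustrative choices, not the company's actual configuration.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy CNN standing in for the company's transfer-learned model.
model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
)

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Rank output filters by their L1 norm (n=1, dim=0) and zero out the
        # lowest-ranked 30% of them.
        prune.ln_structured(module, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(module, "weight")  # make the pruning permanent

# Inspect how many output filters are now entirely zero in the first conv layer.
first_conv = model[0]
zero_filters = (first_conv.weight.abs().sum(dim=(1, 2, 3)) == 0).sum().item()
print(f"Zeroed filters in first conv layer: {zero_filters}/{first_conv.out_channels}")
# A follow-up fine-tuning (retraining) job would then restore accuracy.
```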
Question # 8
A media company is building a computer vision model to analyze images that are on social media. The model consists of CNNs that the company trained by using images that the company stores in Amazon S3. The company used an Amazon SageMaker training job in File mode with a single Amazon EC2 On-Demand Instance. Every day, the company updates the model by using about 10,000 images that the company has collected in the last 24 hours. The company configures training with only one epoch. The company wants to speed up training and lower costs without the need to make any code changes. Which solution will meet these requirements?
A. Instead of File mode, configure the SageMaker training job to use Pipe mode. Ingest the data from a pipe.
B. Instead of File mode, configure the SageMaker training job to use FastFile mode with no other changes.
C. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Make no other changes.
D. Instead of On-Demand Instances, configure the SageMaker training job to use Spot Instances. Implement model checkpoints.
Answer: C
Explanation: Solution C meets the requirements because it uses Amazon SageMaker Spot Instances, which are unused EC2 instances available at up to a 90% discount compared to On-Demand prices. SageMaker Spot Instances speed up training and lower costs by taking advantage of spare EC2 capacity. The company does not need to make any code changes to use Spot Instances; it can simply enable the managed spot training option in the SageMaker training job configuration. The company also does not need to implement model checkpoints, because it trains for only one epoch, so the model will not need to resume from a previous state [1].
The other options are not suitable:
Option A: Configuring the SageMaker training job to use Pipe mode instead of File mode will not speed up training or lower costs significantly. Pipe mode is a data ingestion mode that streams data directly from S3 to the training algorithm, without copying the data to the local storage of the training instance. Pipe mode can reduce the startup time of the training job and the disk space usage, but it does not affect the computation time or the instance price. Moreover, Pipe mode may require some code changes to handle the streaming data, depending on the training algorithm [2].
Option B: Configuring the SageMaker training job to use FastFile mode instead of File mode will not speed up training or lower costs significantly. FastFile mode is a data ingestion mode that copies data from S3 to the local storage of the training instance in parallel with the training process. FastFile mode can reduce the startup time of the training job and the disk space usage, but it does not affect the computation time or the instance price. Moreover, FastFile mode is only available for distributed training jobs that use multiple instances, which is not the case for the company [3].
Option D: Configuring the SageMaker training job to use Spot Instances and implementing model checkpoints will not meet the requirement of avoiding code changes. Model checkpoints allow the training job to save the model state periodically to S3 and resume from the latest checkpoint if the training job is interrupted. Checkpoints help avoid losing training progress, but they require code changes to implement the checkpointing and resuming logic [4].
References:
[1] Managed Spot Training - Amazon SageMaker
[2] Pipe Mode - Amazon SageMaker
[3] FastFile Mode - Amazon SageMaker
[4] Checkpoints - Amazon SageMaker
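A minimal sketch of enabling managed spot training in the SageMaker Python SDK; the training image URI, role ARN, and S3 path are placeholders:

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri="<training-image-uri>",  # placeholder: the existing training image
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,   # switch from On-Demand to Spot capacity
    max_run=3600,              # maximum training time in seconds
    max_wait=7200,             # must be >= max_run when Spot is enabled
)
estimator.fit({"training": "s3://daily-images-bucket/latest/"})
```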
Question # 9
A data scientist is building a forecasting model for a retail company by using the most recent 5 years of sales records that are stored in a data warehouse. The dataset contains sales records for each of the company's stores across five commercial regions. The data scientist creates a working dataset with StoreID, Region, Date, and Sales Amount as columns. The data scientist wants to analyze yearly average sales for each region. The scientist also wants to compare how each region performed compared to average sales across all commercial regions. Which visualization will help the data scientist better understand the data trend?
A. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each store. Create a bar plot, faceted by year, of average sales for each store. Add an extra bar in each facet to represent average sales.
B. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each store. Create a bar plot, colored by region and faceted by year, of average sales for each store. Add a horizontal line in each facet to represent average sales.
C. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each region. Create a bar plot of average sales for each region. Add an extra bar in each facet to represent average sales.
D. Create an aggregated dataset by using the Pandas GroupBy function to get average sales for each year for each region. Create a bar plot, faceted by year, of average sales for each region. Add a horizontal line in each facet to represent average sales.
Answer: D
Explanation: The best visualization for this task is to create a bar plot, faceted by year, of average sales for each region, and to add a horizontal line in each facet to represent average sales. This way, the data scientist can easily compare the yearly average sales for each region with the overall average sales and see the trends over time. The bar plot also lets the data scientist see the relative performance of each region within each year and across years. The other options are less effective because they either do not show the yearly trends, do not show the overall average sales, or do not group the data by region.
References:
pandas.DataFrame.groupby — pandas 2.1.4 documentation
pandas.DataFrame.plot.bar — pandas 2.1.4 documentation
Matplotlib - Bar Plot - Online Tutorials Library
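A small sketch of the option D visualization with pandas and matplotlib, using a tiny invented dataset with the same columns as the working dataset:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for the working dataset (StoreID, Region, Date, SalesAmount).
df = pd.DataFrame({
    "StoreID": [1, 2, 3, 4, 1, 2, 3, 4],
    "Region": ["North", "North", "South", "South"] * 2,
    "Date": ["2022-03-01", "2022-06-01", "2022-03-01", "2022-06-01",
             "2023-03-01", "2023-06-01", "2023-03-01", "2023-06-01"],
    "SalesAmount": [120.0, 95.0, 130.0, 110.0, 140.0, 105.0, 150.0, 118.0],
})

# Aggregate average sales per year per region.
df["Year"] = pd.to_datetime(df["Date"]).dt.year
yearly = df.groupby(["Year", "Region"], as_index=False)["SalesAmount"].mean()

# One facet (subplot) per year.
years = sorted(yearly["Year"].unique())
fig, axes = plt.subplots(1, len(years), figsize=(4 * len(years), 4), sharey=True)

for ax, year in zip(axes, years):
    subset = yearly[yearly["Year"] == year]
    ax.bar(subset["Region"], subset["SalesAmount"])
    # Horizontal line: that year's average sales across all regions.
    ax.axhline(subset["SalesAmount"].mean(), color="red", linestyle="--")
    ax.set_title(str(year))
    ax.set_xlabel("Region")

axes[0].set_ylabel("Average sales")
plt.tight_layout()
plt.show()
```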
Question # 10
A data scientist is training a large PyTorch model by using Amazon SageMaker. It takes 10 hours on average to train the model on GPU instances. The data scientist suspects that training is not converging and that resource utilization is not optimal. What should the data scientist do to identify and address training issues with the LEAST development effort?
A. Use CPU utilization metrics that are captured in Amazon CloudWatch. Configure a CloudWatch alarm to stop the training job early if low CPU utilization occurs.
B. Use high-resolution custom metrics that are captured in Amazon CloudWatch. Configure an AWS Lambda function to analyze the metrics and to stop the training job early if issues are detected.
C. Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
D. Use the SageMaker Debugger confusion and feature_importance_overweight built-in rules to detect issues and to launch the StopTrainingJob action if issues are detected.
Answer: C
Explanation: Solution C identifies and addresses the training issues with the least development effort. It involves the following steps:
Use the SageMaker Debugger vanishing_gradient and LowGPUUtilization built-in rules to detect issues. SageMaker Debugger is a feature of Amazon SageMaker that allows data scientists to monitor, analyze, and debug machine learning models during training. SageMaker Debugger provides a set of built-in rules that can automatically detect common issues and anomalies in model training, such as vanishing or exploding gradients, overfitting, underfitting, low GPU utilization, and more [1]. The data scientist can use the vanishing_gradient rule to check whether the gradients are becoming too small and preventing the training from converging, and the LowGPUUtilization rule to check whether the GPU resources are underutilized and making the training inefficient [2].
Launch the StopTrainingJob action if issues are detected. SageMaker Debugger can take actions based on the status of the rules. One of these actions is StopTrainingJob, which terminates the training job if a rule enters an error state. This helps the data scientist save time and money by stopping the training early when issues are detected [3].
The other options are not suitable:
Option A: Using CPU utilization metrics captured in Amazon CloudWatch and configuring a CloudWatch alarm to stop the training job early if low CPU utilization occurs will not identify and address training issues effectively. CPU utilization is not a good indicator of model training performance, especially on GPU instances. Moreover, CloudWatch alarms can only trigger actions based on simple thresholds, not complex rules or conditions [4].
Option B: Using high-resolution custom metrics captured in Amazon CloudWatch and configuring an AWS Lambda function to analyze the metrics and stop the training job early requires more development effort than using SageMaker Debugger. The data scientist would have to write code to capture, send, and analyze the custom metrics, as well as to invoke the Lambda function and stop the training job. Moreover, this solution may not detect all the issues that SageMaker Debugger can [5].
Option D: Using the SageMaker Debugger confusion and feature_importance_overweight built-in rules with the StopTrainingJob action will not identify and address the training issues effectively. The confusion rule monitors the confusion matrix of a classification model, which is not relevant to the convergence and resource utilization issues in this scenario. The feature_importance_overweight rule checks whether some features carry too much weight in the model, which is also unrelated to those issues [2].
References:
[1] Amazon SageMaker Debugger
[2] Built-in Rules for Amazon SageMaker Debugger
[3] Actions for Amazon SageMaker Debugger
[4] Amazon CloudWatch Alarms
[5] Amazon CloudWatch Custom Metrics
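A minimal sketch of attaching the two built-in rules with an automatic stop action, using the SageMaker Python SDK; the estimator arguments are placeholders, and exact helper names can vary slightly between SDK versions:

```python
from sagemaker.debugger import ProfilerRule, Rule, rule_configs
from sagemaker.pytorch import PyTorch

# Stop the job automatically when the debugger rule fires.
stop_action = rule_configs.ActionList(rule_configs.StopTraining())

rules = [
    Rule.sagemaker(rule_configs.vanishing_gradient(), actions=stop_action),
    ProfilerRule.sagemaker(rule_configs.LowGPUUtilization()),
]

estimator = PyTorch(
    entry_point="train.py",          # the existing training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    framework_version="1.13",
    py_version="py39",
    rules=rules,                     # Debugger and Profiler rules
)
estimator.fit({"training": "s3://training-data-bucket/"})
```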
Question # 11
A company builds computer-vision models that use deep learning for the autonomous vehicle industry. A machine learning (ML) specialist uses an Amazon EC2 instance that has a CPU:GPU ratio of 12:1 to train the models. The ML specialist examines the instance metric logs and notices that the GPU is idle half of the time. The ML specialist must reduce training costs without increasing the duration of the training jobs. Which solution will meet these requirements?
A. Switch to an instance type that has only CPUs.
B. Use a heterogeneous cluster that has two different instance groups.
C. Use memory-optimized EC2 Spot Instances for the training jobs.
D. Switch to an instance type that has a CPU:GPU ratio of 6:1.
Answer: D
Explanation: Switching to an instance type that has a CPU:GPU ratio of 6:1 will reduce the training costs by using fewer CPUs and GPUs, while maintaining the same level of performance. The GPU idle time indicates that the CPU is not able to feed the GPU with enough data, so reducing the CPU:GPU ratio will balance the workload and improve the GPU utilization. A lower CPU:GPU ratio also means less overhead for inter-process communication and synchronization between the CPU and GPU processes.
References:
Optimizing GPU utilization for AI/ML workloads on Amazon EC2
Analyze CPU vs. GPU Performance for AWS Machine Learning
Question # 12
An engraving company wants to automate its quality control process for plaques. The company performs the process before mailing each customized plaque to a customer. The company has created an Amazon S3 bucket that contains images of defects that should cause a plaque to be rejected. Low-confidence predictions must be sent to an internal team of reviewers who are using Amazon Augmented AI (Amazon A2I). Which solution will meet these requirements?
A. Use Amazon Textract for automatic processing. Use Amazon A2I with Amazon Mechanical Turk for manual review.
B. Use Amazon Rekognition for automatic processing. Use Amazon A2I with a private workforce option for manual review.
C. Use Amazon Transcribe for automatic processing. Use Amazon A2I with a private workforce option for manual review.
D. Use AWS Panorama for automatic processing. Use Amazon A2I with Amazon Mechanical Turk for manual review.
Answer: B
Explanation: Amazon Rekognition is a service that provides computer vision capabilities for image and video analysis, such as object, scene, and activity detection, face and text recognition, and custom label detection. Amazon Rekognition can be used to automate the quality control process for plaques by comparing the images of the plaques with the images of defects in the Amazon S3 bucket and returning a confidence score for each defect. Amazon A2I is a service that enables human review of machine learning predictions, such as low-confidence predictions from Amazon Rekognition. Amazon A2I can be integrated with a private workforce option, which allows the engraving company to use its own internal team of reviewers to manually inspect the plaques that are flagged by Amazon Rekognition. This solution meets the requirements of automating the quality control process, sending low-confidence predictions to an internal team of reviewers, and using Amazon A2I for manual review.
References:
[1] Amazon Rekognition documentation
[2] Amazon A2I documentation
[3] Amazon Rekognition Custom Labels documentation
[4] Amazon A2I Private Workforce documentation
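A minimal boto3 sketch of routing a low-confidence Rekognition Custom Labels prediction to an Amazon A2I human loop backed by the private workforce; the ARNs, bucket names, and confidence threshold are placeholders:

```python
import json
import uuid

import boto3

rekognition = boto3.client("rekognition")
a2i = boto3.client("sagemaker-a2i-runtime")

CONFIDENCE_THRESHOLD = 80.0
FLOW_DEFINITION_ARN = "arn:aws:sagemaker:us-east-1:123456789012:flow-definition/plaque-review"

# Run the custom defect-detection model on one plaque image.
response = rekognition.detect_custom_labels(
    ProjectVersionArn="arn:aws:rekognition:us-east-1:123456789012:project/plaque-defects/version/1",
    Image={"S3Object": {"Bucket": "plaque-images", "Name": "plaque-001.jpg"}},
)

labels = response["CustomLabels"]
if not labels or max(label["Confidence"] for label in labels) < CONFIDENCE_THRESHOLD:
    # Low confidence: send the image to the internal reviewers via Amazon A2I.
    a2i.start_human_loop(
        HumanLoopName=f"plaque-review-{uuid.uuid4()}",
        FlowDefinitionArn=FLOW_DEFINITION_ARN,
        HumanLoopInput={
            "InputContent": json.dumps(
                {"taskObject": "s3://plaque-images/plaque-001.jpg"}
            )
        },
    )
```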
Question # 13
An Amazon SageMaker notebook instance is launched into Amazon VPC. The SageMaker notebook references data contained in an Amazon S3 bucket in another account. The bucket is encrypted using SSE-KMS. The instance returns an access denied error when trying to access data in Amazon S3. Which of the following are required to access the bucket and avoid the access denied error? (Select THREE.)
A. An AWS KMS key policy that allows access to the customer master key (CMK)
B. A SageMaker notebook security group that allows access to Amazon S3
C. An IAM role that allows access to the specific S3 bucket
D. A permissive S3 bucket policy
E. An S3 bucket owner that matches the notebook owner
F. A SageMaker notebook subnet ACL that allows traffic to Amazon S3
Answer: A, B, C
Explanation: To access an Amazon S3 bucket in another account that is encrypted using SSE-KMS, the following are required:
A. An AWS KMS key policy that allows access to the customer master key (CMK). The CMK is the encryption key used to encrypt and decrypt the data in the S3 bucket. The KMS key policy defines who can use and manage the CMK. To allow access to the CMK from another account, the key policy must include a statement that grants the necessary permissions (such as kms:Decrypt) to the principal from the other account (such as the SageMaker notebook IAM role).
B. A SageMaker notebook security group that allows access to Amazon S3. A security group is a virtual firewall that controls the inbound and outbound traffic for the SageMaker notebook instance. To allow the notebook instance to access the S3 bucket, the security group must have a rule that allows outbound traffic to the S3 endpoint on port 443 (HTTPS).
C. An IAM role that allows access to the specific S3 bucket. An IAM role is an identity that the SageMaker notebook instance can assume to access AWS resources. The IAM role must have a policy that grants the necessary permissions (such as s3:GetObject) to access the specific S3 bucket. The policy must also include a condition that allows access to the CMK in the other account.
The following are not required or correct:
D. A permissive S3 bucket policy. A bucket policy is a resource-based policy that defines who can access the S3 bucket and what actions they can perform. A permissive bucket policy is not required and not recommended, as it can expose the bucket to unauthorized access. A bucket policy should follow the principle of least privilege and grant the minimum permissions necessary to the specific principals that need access.
E. An S3 bucket owner that matches the notebook owner. The S3 bucket owner and the notebook owner do not need to match, as long as the bucket owner grants cross-account access to the notebook owner through the KMS key policy and the bucket policy (if applicable).
F. A SageMaker notebook subnet ACL that allows traffic to Amazon S3. A subnet ACL is a network access control list that acts as an optional layer of security for the SageMaker notebook instance's subnet. A subnet ACL is not required to access the S3 bucket, as the security group is sufficient to control the traffic. However, if a subnet ACL is used, it must not block the traffic to the S3 endpoint.
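A sketch of the cross-account key policy statement from requirement A, expressed in Python for illustration; the account ID, role name, and actions shown are placeholders, and the statement would be merged into the key's existing policy rather than replacing it:

```python
import json

# Cross-account statement to merge into the CMK's key policy (requirement A).
# The account ID and role name are placeholders; do not replace the existing
# key policy outright, or the key administrators would lose access.
cross_account_statement = {
    "Sid": "AllowSageMakerNotebookRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::111122223333:role/SageMakerNotebookRole"
    },
    "Action": ["kms:Decrypt", "kms:DescribeKey"],
    "Resource": "*",
}
print(json.dumps(cross_account_statement, indent=2))

# The notebook's IAM role (requirement C) additionally needs s3:GetObject on
# the bucket and kms:Decrypt on this key, and the security group
# (requirement B) must allow outbound HTTPS (port 443) to Amazon S3.
```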
Question # 14
A machine learning (ML) engineer has created a feature repository in Amazon SageMaker Feature Store for the company. The company has AWS accounts for development, integration, and production. The company hosts a feature store in the development account. The company uses Amazon S3 buckets to store feature values offline. The company wants to share features and to allow the integration account and the production account to reuse the features that are in the feature repository. Which combination of steps will meet these requirements? (Select TWO.)
A. Create an IAM role in the development account that the integration account and production account can assume. Attach IAM policies to the role that allow access to the feature repository and the S3 buckets.
B. Share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by using AWS Resource Access Manager (AWS RAM).
C. Use AWS Security Token Service (AWS STS) from the integration account and the production account to retrieve credentials for the development account.
D. Set up S3 replication between the development S3 buckets and the integration and production S3 buckets.
E. Create an AWS PrivateLink endpoint in the development account for SageMaker.
Answer: A, B
Explanation: The combination of steps that will meet the requirements is to create an IAM role in the development account that the integration account and production account can assume, attach IAM policies to the role that allow access to the feature repository and the S3 buckets, and share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by using AWS Resource Access Manager (AWS RAM). This approach enables cross-account access and sharing of the features stored in Amazon SageMaker Feature Store and Amazon S3.
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, search, and share curated data used in training and prediction workflows. The service provides feature management capabilities such as easy feature reuse, low-latency serving, time travel, and consistency between features used in training and inference workflows. A feature group is a logical grouping of ML features whose organization and structure is defined by a feature group schema, which consists of a list of feature definitions, each specifying the name, type, and metadata of a feature. Amazon SageMaker Feature Store stores the features in both an online store and an offline store. The online store is a low-latency, high-throughput store optimized for real-time inference. The offline store is a historical store backed by an Amazon S3 bucket and optimized for batch processing and model training [1].
AWS Identity and Access Management (IAM) is a web service that helps you securely control access to AWS resources for your users. You use IAM to control who can use your AWS resources (authentication) and what resources they can use and in what ways (authorization). An IAM role is an IAM identity that you can create in your account with specific permissions, and it can be used to delegate access to users, applications, or services that don't normally have access to your AWS resources. For example, you can create an IAM role in the development account that allows the integration account and the production account to assume the role and access the resources in the development account. You can attach IAM policies to the role that specify the permissions for the feature repository and the S3 buckets, and you can use IAM conditions to restrict access based on the source account, IP address, or other factors [2].
AWS Resource Access Manager (AWS RAM) is a service that enables you to easily and securely share AWS resources with any AWS account or within your AWS Organization. You share resources that you own with other accounts by using resource shares. A resource share is an entity that defines the resources that you want to share and the principals that you want to share with. For example, you can share the feature repository that is associated with the S3 buckets from the development account to the integration account and the production account by creating a resource share in AWS RAM, specifying the feature group ARN and the S3 bucket ARN as the resources, and the integration account ID and the production account ID as the principals. You can also use IAM policies to further control access to the shared resources [3].
The other options are either incorrect or unnecessary. Using AWS Security Token Service (AWS STS) from the integration account and the production account to retrieve credentials for the development account is not required, because the IAM role in the development account can provide temporary security credentials for the cross-account access. Setting up S3 replication between the development S3 buckets and the integration and production S3 buckets would introduce redundancy and inconsistency, as the S3 buckets are already shared through AWS RAM. Creating an AWS PrivateLink endpoint in the development account for SageMaker is not relevant, because PrivateLink is used to securely connect to SageMaker services from a VPC, not from another account.
References:
[1] Amazon SageMaker Feature Store – Amazon Web Services
[2] What Is IAM? - AWS Identity and Access Management
[3] What Is AWS Resource Access Manager? - AWS Resource Access Manager
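A minimal boto3 sketch of the AWS RAM sharing step (option B); the ARNs and account IDs are placeholders, and it assumes the feature group resource type is shareable through RAM as the explanation describes:

```python
import boto3

ram = boto3.client("ram")

# Share the feature group from the development account with the integration
# and production accounts; all identifiers below are placeholders.
response = ram.create_resource_share(
    name="feature-store-share",
    resourceArns=[
        "arn:aws:sagemaker:us-east-1:111122223333:feature-group/customer-features"
    ],
    principals=[
        "444455556666",  # integration account
        "777788889999",  # production account
    ],
    allowExternalPrincipals=True,
)
print(response["resourceShare"]["resourceShareArn"])

# The cross-account IAM role in the development account (option A) still
# grants the permissions needed to read the offline-store S3 objects.
```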
Question # 15
A network security vendor needs to ingest telemetry data from thousands of endpoints that run all over the world. The data is transmitted every 30 seconds in the form of records that contain 50 fields. Each record is up to 1 KB in size. The security vendor uses Amazon Kinesis Data Streams to ingest the data. The vendor requires hourly summaries of the records that Kinesis Data Streams ingests. The vendor will use Amazon Athena to query the records and to generate the summaries. The Athena queries will target 7 to 12 of the available data fields. Which solution will meet these requirements with the LEAST amount of customization to transform and store the ingested data?
A. Use AWS Lambda to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.
B. Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using a short-lived Amazon EMR cluster.
C. Use Amazon Kinesis Data Analytics to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose.
D. Use Amazon Kinesis Data Firehose to read and aggregate the data hourly. Transform the data and store it in Amazon S3 by using AWS Lambda.
Answer: C
Explanation: The solution that meets the requirements with the least amount of customization to transform and store the ingested data is to use Amazon Kinesis Data Analytics to read and aggregate the data hourly, then transform the data and store it in Amazon S3 by using Amazon Kinesis Data Firehose. This solution leverages the built-in features of Kinesis Data Analytics to perform SQL queries on streaming data and generate hourly summaries. Kinesis Data Analytics can also output the transformed data to Kinesis Data Firehose, which can then deliver the data to S3 in a specified format and partitioning scheme. This solution does not require any custom code or additional infrastructure to process the data. The other solutions either require more customization (such as using Lambda or EMR) or do not meet the requirement of aggregating the data hourly (such as using Lambda to read the data from Kinesis Data Streams).
References:
[1] Boosting Resiliency with an ML-based Telemetry Analytics Architecture | AWS Architecture Blog
[2] AWS Cloud Data Ingestion Patterns and Practices
[3] IoT ingestion and Machine Learning analytics pipeline with AWS IoT
[4] AWS IoT Data Ingestion Simplified 101: The Complete Guide - Hevo Data
Question # 16
A data scientist is building a linear regression model. The scientist inspects the dataset and notices that the mode of the distribution is lower than the median, and the median is lower than the mean. Which data transformation will give the data scientist the ability to apply a linear regression model?
A. Exponential transformation
B. Logarithmic transformation
C. Polynomial transformation
D. Sinusoidal transformation
Answer: B
Explanation: A logarithmic transformation is a suitable data transformation for a linear regression model when the data has a skewed distribution, such as when the mode is lower than the median and the median is lower than the mean. A logarithmic transformation can reduce the skewness and make the data more symmetric and normally distributed, which are desirable properties for linear regression. A logarithmic transformation can also reduce the effect of outliers and heteroscedasticity (unequal variance) in the data. An exponential transformation would have the opposite effect of increasing the skewness and making the data more asymmetric. A polynomial transformation may not be able to capture the nonlinearity in the data and may introduce multicollinearity among the transformed variables. A sinusoidal transformation is not appropriate for data that does not have a periodic pattern.
References:
Data Transformation - Scaler Topics
Linear Regression - GeeksforGeeks
Linear Regression - Scribbr
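A quick numerical check of the effect described above, using synthetic right-skewed data:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)  # right-skewed target variable

print("Skewness before log transform:", round(skew(x), 3))
print("Skewness after log transform:", round(skew(np.log(x)), 3))
# The log-transformed values are approximately normal, which better satisfies
# the assumptions of a linear regression model.
```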
Question # 17
A car company is developing a machine learning solution to detect whether a car is present in an image. The image dataset consists of one million images. Each image in the dataset is 200 pixels in height by 200 pixels in width. Each image is labeled as either having a car or not having a car. Which architecture is MOST likely to produce a model that detects whether a car is present in an image with the highest accuracy?
A. Use a deep convolutional neural network (CNN) classifier with the images as input. Include a linear output layer that outputs the probability that an image contains a car.
B. Use a deep convolutional neural network (CNN) classifier with the images as input. Include a softmax output layer that outputs the probability that an image contains a car.
C. Use a deep multilayer perceptron (MLP) classifier with the images as input. Include a linear output layer that outputs the probability that an image contains a car.
D. Use a deep multilayer perceptron (MLP) classifier with the images as input. Include a softmax output layer that outputs the probability that an image contains a car.
Answer: A
Explanation: A deep convolutional neural network (CNN) classifier is a suitable architecture for image classification tasks, as it can learn features from the images and reduce the dimensionality of the input. A linear output layer that outputs the probability that an image contains a car is appropriate for a binary classification problem, as it can produce a single scalar value between 0 and 1. A softmax output layer is more suitable for a multiclass classification problem, as it produces a vector of probabilities that sum to 1. A deep multilayer perceptron (MLP) classifier is not as effective as a CNN for image classification, as it does not exploit the spatial structure of the images and requires a large number of parameters to process the high-dimensional input.
References:
AWS Certified Machine Learning - Specialty Exam Guide
AWS Training - Machine Learning on AWS
AWS Whitepaper - An Overview of Machine Learning on AWS
Question # 18
A university wants to develop a targeted recruitment strategy to increase new student enrollment. A data scientist gathers information about the academic performance history of students. The data scientist wants to use the data to build student profiles. The university will use the profiles to direct resources to recruit students who are likely to enroll in the university. Which combination of steps should the data scientist take to predict whether a particular student applicant is likely to enroll in the university? (Select TWO)
A. Use Amazon SageMaker Ground Truth to sort the data into two groups named "enrolled" or "not enrolled."
B. Use a forecasting algorithm to run predictions.
C. Use a regression algorithm to run predictions.
D. Use a classification algorithm to run predictions.
E. Use the built-in Amazon SageMaker k-means algorithm to cluster the data into two groups named "enrolled" or "not enrolled."
Answer: A,D
Explanation: The data scientist should use Amazon SageMaker Ground Truth to sort the data into two groups named "enrolled" or "not enrolled." This creates a labeled dataset that can be used for supervised learning. The data scientist should then use a classification algorithm to run predictions on the test data. A classification algorithm is a suitable choice for predicting a binary outcome, such as enrollment status, based on the input features, such as academic performance. A classification algorithm outputs a probability for each class label and assigns the most likely label to each observation.
References:
Use Amazon SageMaker Ground Truth to Label Data
Classification Algorithm in Machine Learning
Question # 19
An insurance company developed a new experimental machine learning (ML) model to replace an existing model that is in production. The company must validate the quality of predictions from the new experimental model in a production environment before the company uses the new experimental model to serve general user requests. Only one model can serve user requests at a time. The company must measure the performance of the new experimental model without affecting the current live traffic. Which solution will meet these requirements?
A. A/B testing B. Canary release C. Shadow deployment D. Blue/green deployment
Answer: C
Explanation: The best solution for this scenario is shadow deployment, a technique that allows the company to run the new experimental model in parallel with the existing model without exposing it to end users. In a shadow deployment, the company routes the same user requests to both models but returns only the responses from the existing model to the users. The responses from the new experimental model are logged and analyzed for quality and performance metrics, such as accuracy, latency, and resource consumption [1][2]. This way, the company can validate the new experimental model in a production environment without affecting the current live traffic or user experience.
The other solutions are not suitable because they have the following drawbacks:
A: A/B testing is a technique that involves splitting the user traffic between two or more models and comparing their outcomes based on predefined metrics. However, this technique exposes the new experimental model to a portion of the end users, which might affect their experience if the model is not reliable or consistent with the existing model [3].
B: Canary release is a technique that involves gradually rolling out the new experimental model to a small subset of users and monitoring its performance and feedback. However, this technique also exposes the new experimental model to some end users and requires careful selection and segmentation of the user groups [4].
D: Blue/green deployment is a technique that involves switching the user traffic from the existing model (blue) to the new experimental model (green) at once, after testing and verifying the new model in a separate environment. However, this technique does not allow the company to validate the new experimental model in a production environment and might cause service disruption or inconsistency if the new model is not compatible or stable [5].
References:
1: Shadow Deployment: A Safe Way to Test in Production | LaunchDarkly Blog
2: Shadow Deployment: A Safe Way to Test in Production | LaunchDarkly Blog
3: A/B Testing for Machine Learning Models | AWS Machine Learning Blog
4: Canary Releases for Machine Learning Models | AWS Machine Learning Blog
5: Blue-Green Deployments for Machine Learning Models | AWS Machine Learning Blog
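As a rough sketch of the shadow pattern at the application layer (assuming two already-deployed SageMaker endpoints whose names are placeholders), the caller below sends each request to both endpoints, returns only the production response, and logs the shadow response for offline comparison. SageMaker also offers managed shadow testing; this snippet only illustrates the routing idea.

import json
import logging
import boto3

logging.basicConfig(level=logging.INFO)
runtime = boto3.client("sagemaker-runtime")

PROD_ENDPOINT = "claims-model-prod"        # placeholder names, not from the question
SHADOW_ENDPOINT = "claims-model-candidate"

def predict(payload: dict) -> dict:
    body = json.dumps(payload)

    # Production endpoint: its answer is the only one returned to the user.
    prod = runtime.invoke_endpoint(
        EndpointName=PROD_ENDPOINT, ContentType="application/json", Body=body
    )
    prod_result = json.loads(prod["Body"].read())

    # Shadow endpoint: invoked with the same payload, result is only logged.
    try:
        shadow = runtime.invoke_endpoint(
            EndpointName=SHADOW_ENDPOINT, ContentType="application/json", Body=body
        )
        logging.info("shadow_prediction=%s", shadow["Body"].read().decode())
    except Exception:                        # shadow failures must never affect users
        logging.exception("shadow endpoint failed")

    return prod_result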
Question # 20
A company wants to detect credit card fraud. The company has observed that an average of 2% of credit card transactions are fraudulent. A data scientist trains a classifier on a year's worth of credit card transaction data. The classifier needs to identify the fraudulent transactions. The company wants to accurately capture as many fraudulent transactions as possible. Which metrics should the data scientist use to optimize the classifier? (Select TWO.)
A. Specificity B. False positive rate C. Accuracy D. F1 score E. True positive rate
Answer: D,E
Explanation: The F1 score is the harmonic mean of precision and recall, which are both important for fraud detection. Precision is the ratio of true positives to all predicted positives, and recall is the ratio of true positives to all actual positives. A high F1 score indicates that the classifier can correctly identify fraudulent transactions and avoid false negatives. The true positive rate is another name for recall, and it measures the proportion of fraudulent transactions that are correctly detected by the classifier. A high true positive rate means that the classifier can capture as many fraudulent transactions as possible.
References:
Fraud Detection Using Machine Learning | Implementations | AWS Solutions
Detect fraudulent transactions using machine learning with Amazon SageMaker | AWS Machine Learning Blog
1. Introduction — Reproducible Machine Learning for Credit Card Fraud Detection
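A small sketch with scikit-learn (on made-up labels and predictions) showing how the two selected metrics are computed; recall_score is the true positive rate.

from sklearn.metrics import f1_score, recall_score

# Made-up ground truth and predictions: 1 = fraudulent, 0 = legitimate.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

print("F1 score          :", f1_score(y_true, y_pred))      # balances precision and recall
print("True positive rate:", recall_score(y_true, y_pred))  # recall = TP / (TP + FN)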
Question # 21
A company deployed a machine learning (ML) model on the company website to predict real estate prices. Several months after deployment, an ML engineer notices that the accuracy of the model has gradually decreased. The ML engineer needs to improve the accuracy of the model. The engineer also needs to receive notifications for any future performance issues. Which solution will meet these requirements?
A. Perform incremental training to update the model. Activate Amazon SageMaker Model Monitor to detect model performance issues and to send notifications.
B. Use Amazon SageMaker Model Governance. Configure Model Governance to automatically adjust model hyperparameters. Create a performance threshold alarm in Amazon CloudWatch to send notifications.
C. Use Amazon SageMaker Debugger with appropriate thresholds. Configure Debugger to send Amazon CloudWatch alarms to alert the team. Retrain the model by using only data from the previous several months.
D. Use only data from the previous several months to perform incremental training to update the model. Use Amazon SageMaker Model Monitor to detect model performance issues and to send notifications.
Answer: A
Explanation: The best solution to improve the accuracy of the model and receive notifications for any future performance issues is to perform incremental training to update the model and activate Amazon SageMaker Model Monitor to detect model performance issues and send notifications. Incremental training is a technique that updates an existing model with new data without retraining the entire model from scratch. This can save time and resources and helps the model adapt to changing data patterns. Amazon SageMaker Model Monitor continuously monitors the quality of machine learning models in production and notifies you when there are deviations in model quality, such as data drift and anomalies. You can set up alerts that trigger actions, such as sending notifications to Amazon Simple Notification Service (Amazon SNS) topics, when certain conditions are met.
Option B is incorrect because Amazon SageMaker Model Governance is a set of tools that help you implement ML responsibly by simplifying access control and enhancing transparency. It does not provide a mechanism to automatically adjust model hyperparameters or improve model accuracy.
Option C is incorrect because Amazon SageMaker Debugger helps you debug and optimize your model training process by capturing relevant data and providing real-time analysis. However, using Debugger alone does not update the model or monitor its performance in production. Also, retraining the model by using only data from the previous several months may not capture the full range of data variability and may introduce bias or overfitting.
Option D is incorrect because using only data from the previous several months to perform incremental training may not be sufficient to improve the model accuracy, as explained above. Moreover, this option does not specify how to activate Amazon SageMaker Model Monitor or configure the alerts and notifications.
References:
Incremental training
Amazon SageMaker Model Monitor
Amazon SageMaker Model Governance
Amazon SageMaker Debugger
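For reference, a minimal sketch with the SageMaker Python SDK showing how a data-quality monitoring schedule could be attached to an existing endpoint; the endpoint name, S3 paths, and IAM role are placeholders, and the exact arguments should be checked against the SDK version in use.

from sagemaker.model_monitor import DefaultModelMonitor, CronExpressionGenerator
from sagemaker.model_monitor.dataset_format import DatasetFormat

ROLE = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"      # placeholder
BASELINE_DATA = "s3://my-bucket/model-monitor/baseline/train.csv"   # placeholder
REPORTS = "s3://my-bucket/model-monitor/reports"                    # placeholder

monitor = DefaultModelMonitor(
    role=ROLE,
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# 1) Compute baseline statistics and constraints from the training data.
monitor.suggest_baseline(
    baseline_dataset=BASELINE_DATA,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=f"{REPORTS}/baseline",
)

# 2) Schedule hourly checks of captured endpoint traffic against the baseline.
monitor.create_monitoring_schedule(
    monitor_schedule_name="real-estate-price-monitor",
    endpoint_input="real-estate-price-endpoint",    # placeholder endpoint name
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    output_s3_uri=f"{REPORTS}/results",
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)

Violations found by the schedule surface as CloudWatch metrics, which is where the notification alarm would be attached.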
Question # 22
A retail company wants to build a recommendation system for the company's website. The system needs to provide recommendations for existing users and needs to base those recommendations on each user's past browsing history. The system also must filter out any items that the user previously purchased. Which solution will meet these requirements with the LEAST development effort?
A. Train a model by using a user-based collaborative filtering algorithm on Amazon SageMaker. Host the model on a SageMaker real-time endpoint. Configure an Amazon API Gateway API and an AWS Lambda function to handle real-time inference requests that the web application sends. Exclude the items that the user previously purchased from the results before sending the results back to the web application.
B. Use an Amazon Personalize PERSONALIZED_RANKING recipe to train a model. Create a real-time filter to exclude items that the user previously purchased. Create and deploy a campaign on Amazon Personalize. Use the GetPersonalizedRanking API operation to get the real-time recommendations.
C. Use an Amazon Personalize USER_PERSONALIZATION recipe to train a model. Create a real-time filter to exclude items that the user previously purchased. Create and deploy a campaign on Amazon Personalize. Use the GetRecommendations API operation to get the real-time recommendations.
D. Train a neural collaborative filtering model on Amazon SageMaker by using GPU instances. Host the model on a SageMaker real-time endpoint. Configure an Amazon API Gateway API and an AWS Lambda function to handle real-time inference requests that the web application sends. Exclude the items that the user previously purchased from the results before sending the results back to the web application.
Answer: C
Explanation: Amazon Personalize is a fully managed machine learning service that makes it easy for developers to create personalized user experiences at scale. It uses the same recommender system technology that Amazon uses to create its own personalized recommendations. Amazon Personalize provides several pre-built recipes that can be used to train models for different use cases. The USER_PERSONALIZATION recipe is designed to provide personalized recommendations for existing users based on their past interactions with items. The PERSONALIZED_RANKING recipe is designed to re-rank a list of items for a user based on their preferences. The USER_PERSONALIZATION recipe is more suitable for this use case because it can generate recommendations for each user without requiring a list of candidate items. To filter out the items that the user previously purchased, a real-time filter can be created and applied to the campaign. A real-time filter is a dynamic filter that uses the latest interaction data to exclude items from the recommendations. By using Amazon Personalize, the development effort is minimized because it handles the data processing, model training, and deployment automatically. The web application can use the GetRecommendations API operation to get the real-time recommendations from the campaign.
References:
Amazon Personalize
What is Amazon Personalize?
USER_PERSONALIZATION recipe
PERSONALIZED_RANKING recipe
Filtering recommendations
GetRecommendations API operation
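A short boto3 sketch of the call described above; the campaign, filter, and user identifiers are placeholders. The filterArn parameter applies a previously created filter, for example one that excludes items with a purchase interaction.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

# Placeholder ARNs and user id; a real filter might be defined with an expression
# such as: EXCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("purchase")
response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:111122223333:campaign/site-recs",
    filterArn="arn:aws:personalize:us-east-1:111122223333:filter/exclude-purchased",
    userId="user-42",
    numResults=10,
)

for item in response["itemList"]:
    print(item["itemId"])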
Question # 23
A machine learning (ML) specialist is using Amazon SageMaker hyperparameter optimization (HPO) to improve a model’s accuracy. The learning rate parameter is specified in the following HPO configuration:
During the results analysis, the ML specialist determines that most of the training jobs had a learning rate between 0.01 and 0.1. The best result had a learning rate of less than 0.01. Training jobs need to run regularly over a changing dataset. The ML specialist needs to find a tuning mechanism that uses different learning rates more evenly from the provided range between MinValue and MaxValue. Which solution provides the MOST accurate result?
A. Modify the HPO configuration as follows:
Select the most accurate hyperparameter configuration from this HPO job.
B. Run three different HPO jobs that use different learning rates from the following intervals for MinValue and MaxValue while using the same number of training jobs for each HPO job: [0.01, 0.1], [0.001, 0.01], [0.0001, 0.001]. Select the most accurate hyperparameter configuration from these three HPO jobs.
C. Modify the HPO configuration as follows:
Select the most accurate hyperparameter configuration from this training job.
D. Run three different HPO jobs that use different learning rates from the following intervals for MinValue and MaxValue. Divide the number of training jobs for each HPO job by three: [0.01, 0.1], [0.001, 0.01], [0.0001, 0.001]. Select the most accurate hyperparameter configuration from these three HPO jobs.
Answer: C
Explanation: Solution C modifies the HPO configuration to use a logarithmic scale for the learning rate parameter. This means that the values of the learning rate are sampled from a log-uniform distribution, which gives more weight to smaller values. This helps explore the lower end of the range more evenly and find the optimal learning rate more efficiently. The other solutions either use a linear scale, which may not sample enough values from the lower end, or divide the range into sub-intervals, which may miss some combinations of hyperparameters.
References:
How Hyperparameter Tuning Works - Amazon SageMaker
Tuning Hyperparameters - Amazon SageMaker
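A brief sketch with the SageMaker Python SDK of how the logarithmic scaling type is expressed for a tuned hyperparameter; the container, role, S3 paths, objective metric, and range values are placeholders for illustration.

import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter

ROLE = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"    # placeholder
session = sagemaker.Session()

# Placeholder training container; any algorithm that accepts a learning_rate works.
image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1")

estimator = Estimator(
    image_uri=image_uri,
    role=ROLE,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/hpo-output",                       # placeholder
    sagemaker_session=session,
)

# scaling_type="Logarithmic" samples learning rates log-uniformly, so values near
# 0.0001 are explored as thoroughly as values near 0.1.
hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(0.0001, 0.1, scaling_type="Logarithmic"),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=4,
)
# tuner.fit({"train": "s3://my-bucket/train", "validation": "s3://my-bucket/validation"})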
Question # 24
A data engineer is preparing a dataset that a retail company will use to predict the number of visitors to stores. The data engineer created an Amazon S3 bucket. The engineer subscribed the S3 bucket to an AWS Data Exchange data product for general economic indicators. The data engineer wants to join the economic indicator data to an existing table in Amazon Athena to merge with the business data. All these transformations must finish running in 30-60 minutes. Which solution will meet these requirements MOST cost-effectively?
A. Configure the AWS Data Exchange product as a producer for an Amazon Kinesis data stream. Use an Amazon Kinesis Data Firehose delivery stream to transfer the data to Amazon S3. Run an AWS Glue job that will merge the existing business data with the Athena table. Write the result set back to Amazon S3.
B. Use an S3 event on the AWS Data Exchange S3 bucket to invoke an AWS Lambda function. Program the Lambda function to use Amazon SageMaker Data Wrangler to merge the existing business data with the Athena table. Write the result set back to Amazon S3.
C. Use an S3 event on the AWS Data Exchange S3 bucket to invoke an AWS Lambda function. Program the Lambda function to run an AWS Glue job that will merge the existing business data with the Athena table. Write the results back to Amazon S3.
D. Provision an Amazon Redshift cluster. Subscribe to the AWS Data Exchange product and use the product to create an Amazon Redshift table. Merge the data in Amazon Redshift. Write the results back to Amazon S3.
Answer: B
Explanation: The most cost-effective solution is to use an S3 event to trigger a Lambda function that uses SageMaker Data Wrangler to merge the data. This solution avoids the need to provision and manage any additional resources, such as Kinesis streams, Firehose delivery streams, Glue jobs, or Redshift clusters. SageMaker Data Wrangler provides a visual interface to import, prepare, transform, and analyze data from various sources, including AWS Data Exchange products. It can also export the data preparation workflow to a Python script that can be executed by a Lambda function. This solution can meet the time requirement of 30-60 minutes, depending on the size and complexity of the data.
References:
Using Amazon S3 Event Notifications
Prepare ML Data with Amazon SageMaker Data Wrangler
AWS Lambda Function
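For the S3-event trigger piece common to options B and C, here is a minimal Lambda handler sketch that extracts the bucket and key from the event and hands them to whatever processing step follows; the processing call itself is left as a placeholder.

import urllib.parse

def lambda_handler(event, context):
    """Triggered by an S3 ObjectCreated event on the AWS Data Exchange bucket."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Placeholder: kick off the merge step here, for example run the exported
        # Data Wrangler script or start a transformation job on (bucket, key).
        print(f"New economic-indicator object: s3://{bucket}/{key}")

    return {"status": "ok"}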
Question # 25
An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates. The engineers are hosting these models in Amazon EC2 for responding to the web client requests, with one instance for each model, but the instances have only a 5% utilization in CPU and memory. … The operations engineers want to avoid managing unnecessary resources. Which solution will enable the company to achieve its goal with the LEAST operational overhead?
A. Create an Amazon SageMaker notebook instance for pulling all the models from Amazon S3 using the boto3 library. Remove the existing instances and use the notebook to perform a SageMaker batch transform for performing inferences offline for all the possible users in all the cities. Store the results in different files in Amazon S3. Point the web client to the files.
B. Prepare an Amazon SageMaker Docker container based on the open-source multi-model server. Remove the existing instances and create a multi-model endpoint in SageMaker instead, pointing to the S3 bucket containing all the models. Invoke the endpoint from the web client at runtime, specifying the TargetModel parameter according to the city of each request.
C. Keep only a single EC2 instance for hosting all the models. Install a model server in the instance and load each model by pulling it from Amazon S3. Integrate the instance with the web client using Amazon API Gateway for responding to the requests in real time, specifying the target resource according to the city of each request.
D. Prepare a Docker container based on the prebuilt images in Amazon SageMaker. Replace the existing instances with separate SageMaker endpoints, one for each city where the company operates. Invoke the endpoints from the web client, specifying the URL and EndpointName parameter according to the city of each request.
Answer: B
Explanation: The best solution for this scenario is to use a multi-model endpoint in Amazon SageMaker, which allows hosting multiple models on the same endpoint and invoking them dynamically at runtime. This way, the company can reduce the operational overhead of managing multiple EC2 instances and model servers, and leverage the scalability, security, and performance of SageMaker hosting services. By using a multi-model endpoint, the company can also save on hosting costs by improving endpoint utilization and paying only for the models that are loaded in memory and the API calls that are made. To use a multi-model endpoint, the company needs to prepare a Docker container based on the open-source multi-model server, which is a framework-agnostic library that supports loading and serving multiple models from Amazon S3. The company can then create a multi-model endpoint in SageMaker, pointing to the S3 bucket containing all the models, and invoke the endpoint from the web client at runtime, specifying the TargetModel parameter according to the city of each request. This solution also enables the company to add or remove models from the S3 bucket without redeploying the endpoint, and to use different versions of the same model for different cities if needed.
References:
Use Docker containers to build models
Host multiple models in one container behind one endpoint
Multi-model endpoints using Scikit Learn
Multi-model endpoints using XGBoost
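A short boto3 sketch of invoking a multi-model endpoint, selecting the per-city artifact with TargetModel; the endpoint name, model artifact names, and payload format are placeholders.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def predict_fastest_courier(city: str, features: dict) -> dict:
    """Route the request to the city-specific model hosted on one multi-model endpoint."""
    response = runtime.invoke_endpoint(
        EndpointName="courier-eta-mme",             # placeholder multi-model endpoint
        TargetModel=f"{city}.tar.gz",               # model artifact under the S3 model prefix
        ContentType="application/json",
        Body=json.dumps(features),
    )
    return json.loads(response["Body"].read())

# Example call with made-up features.
print(predict_fastest_courier("madrid", {"distance_km": 3.2, "hour_of_day": 18}))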
Question # 26
A company is using Amazon Polly to translate plaintext documents to speech for automated company announcements. However, company acronyms are being mispronounced in the current documents. How should a Machine Learning Specialist address this issue for future documents?
A. Convert current documents to SSML with pronunciation tags B. Create an appropriate pronunciation lexicon. C. Output speech marks to guide in pronunciation D. Use Amazon Lex to preprocess the text files for pronunciation
Answer: B
Explanation: A pronunciation lexicon is a file that defines how words or phrases should be pronounced by Amazon Polly. A lexicon can help customize the speech output for words that are uncommon, foreign, or have multiple pronunciations. A lexicon must conform to the Pronunciation Lexicon Specification (PLS) standard and is stored in an AWS Region by using the Amazon Polly PutLexicon API operation. To apply a lexicon when synthesizing speech, pass its name in the LexiconNames parameter of the SynthesizeSpeech request. For example, the following lexicon defines how to pronounce the acronym W3C by expanding it to its alias:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>

With this lexicon applied, input text such as "The W3C is an international community that develops open standards to ensure the long-term growth of the Web." is spoken with the expanded alias instead of the mispronounced acronym.
References:
Customize pronunciation using lexicons in Amazon Polly: A blog post that explains how to use lexicons for creating custom pronunciations.
Managing Lexicons: A documentation page that describes how to store and retrieve lexicons using the Amazon Polly API.
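A compact boto3 sketch of storing the lexicon and applying it at synthesis time; the lexicon name, voice, and output file are arbitrary choices for the example.

import boto3

polly = boto3.client("polly")

PLS_LEXICON = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

# Store the lexicon in the current Region under a chosen name.
polly.put_lexicon(Name="acronyms", Content=PLS_LEXICON)

# Apply it when synthesizing speech via the LexiconNames parameter.
response = polly.synthesize_speech(
    Text="The W3C sets standards for the Web.",
    VoiceId="Joanna",
    OutputFormat="mp3",
    LexiconNames=["acronyms"],
)

with open("announcement.mp3", "wb") as f:
    f.write(response["AudioStream"].read())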
Question # 27
A company wants to predict the classification of documents that are created from an application. New documents are saved to an Amazon S3 bucket every 3 seconds. The company has developed three versions of a machine learning (ML) model within Amazon SageMaker to classify document text. The company wants to deploy these three versions to predict the classification of each document. Which approach will meet these requirements with the LEAST operational overhead?
A. Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to create three SageMaker batch transform jobs, one batch transform job for each model for each document.
B. Deploy all the models to a single SageMaker endpoint. Treat each model as a production variant. Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to call each production variant and return the results of each model.
C. Deploy each model to its own SageMaker endpoint. Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Configure the Lambda function to call each endpoint and return the results of each model.
D. Deploy each model to its own SageMaker endpoint. Create three AWS Lambda functions. Configure each Lambda function to call a different endpoint and return the results. Configure three S3 event notifications to invoke the Lambda functions when new documents are created.
Answer: B
Explanation: The approach with the least operational overhead is to deploy all the models to a single SageMaker endpoint, treat each model as a production variant, configure an S3 event notification that invokes an AWS Lambda function when new documents are created, and configure the Lambda function to call each production variant and return the results of each model. This approach involves the following steps:
Deploy all the models to a single SageMaker endpoint and treat each model as a production variant. Amazon SageMaker can deploy multiple models to a single endpoint, which is a web service that serves predictions from the models. Each model is a production variant, a version of the model that runs on one or more instances, and SageMaker distributes the traffic among the production variants according to the specified weights [1].
Configure an S3 event notification that invokes an AWS Lambda function when new documents are created. Amazon S3 can send event notifications when certain actions occur on the objects in a bucket, such as object creation, deletion, or modification, and can invoke an AWS Lambda function as a destination for those notifications. AWS Lambda runs code without provisioning or managing servers [2].
Configure the Lambda function to call each production variant and return the results of each model. The Lambda function can call the SageMaker endpoint and specify which production variant to invoke, using the AWS SDK or the SageMaker Runtime API to send requests and receive the predictions, and then return the results of each model as a response to the event notification [3].
The other options are not suitable because:
Option A: Creating three SageMaker batch transform jobs, one for each model for each document, incurs more operational overhead than using a single SageMaker endpoint. Batch transform processes large datasets in batches and stores the predictions in Amazon S3; it is not suitable for real-time inference because it introduces a delay between the request and the response, and creating three jobs per document increases the complexity and cost of the solution [4].
Option C: Deploying each model to its own SageMaker endpoint increases the number of resources and endpoints to manage and monitor, and calling each endpoint separately increases the latency and network traffic of the solution [5].
Option D: Deploying each model to its own endpoint, creating three Lambda functions, and configuring three S3 event notifications increases the number of resources, triggers, and destinations to manage and monitor, which increases the complexity and cost of the solution [6].
References:
1: Deploying Multiple Models to a Single Endpoint - Amazon SageMaker
2: Configuring Amazon S3 Event Notifications - Amazon Simple Storage Service
3: Invoke an Endpoint - Amazon SageMaker
4: Get Inferences for an Entire Dataset with Batch Transform - Amazon SageMaker
5: Deploy a Model - Amazon SageMaker
6: AWS Lambda
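A hedged boto3 sketch of the two pieces involved: an endpoint config with three production variants (one per model version) and a Lambda-style call that queries each variant by name. Model names, the endpoint name, and the payload are placeholders.

import json
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

MODELS = ["doc-classifier-v1", "doc-classifier-v2", "doc-classifier-v3"]  # placeholder model names

# One endpoint config hosting all three model versions as production variants.
sm.create_endpoint_config(
    EndpointConfigName="doc-classifier-config",
    ProductionVariants=[
        {
            "VariantName": f"variant-{i + 1}",
            "ModelName": model_name,
            "InitialInstanceCount": 1,
            "InstanceType": "ml.m5.large",
            "InitialVariantWeight": 1.0,
        }
        for i, model_name in enumerate(MODELS)
    ],
)
# sm.create_endpoint(EndpointName="doc-classifier", EndpointConfigName="doc-classifier-config")

def classify_with_all_variants(document_text: str) -> dict:
    """Call every production variant explicitly and collect the results (Lambda body sketch)."""
    results = {}
    for i in range(len(MODELS)):
        response = runtime.invoke_endpoint(
            EndpointName="doc-classifier",
            TargetVariant=f"variant-{i + 1}",       # route to a specific variant
            ContentType="application/json",
            Body=json.dumps({"text": document_text}),
        )
        results[f"variant-{i + 1}"] = json.loads(response["Body"].read())
    return results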
Question # 28
A company wants to create an artificial intelligence (AI) yoga instructor that can lead large classes of students. The company needs to create a feature that can accurately count the number of students who are in a class. The company also needs a feature that can differentiate students who are performing a yoga stretch correctly from students who are performing a stretch incorrectly. To determine whether students are performing a stretch correctly, the solution needs to measure the location and angle of each student's arms and legs. A data scientist must use Amazon SageMaker to process video footage of a yoga class by extracting image frames and applying computer vision models. Which combination of models will meet these requirements with the LEAST effort? (Select TWO.)
A. Image Classification B. Optical Character Recognition (OCR) C. Object Detection D. Pose estimation E. Image Generative Adversarial Networks (GANs)
Answer: C,D
Explanation: To count the number of students who are in a class, the solution needs to detect and locate each student in the video frame. Object detection is a computer vision model that can identify and locate multiple objects in an image. To differentiate students who are performing a stretch correctly from students who are performing a stretch incorrectly, the solution needs to measure the location and angle of each student's arms and legs. Pose estimation is a computer vision model that can estimate the pose of a person by detecting the position and orientation of key body parts. Image classification, OCR, and image GANs are not relevant for this use case.
References:
Object Detection: A computer vision technique that identifies and locates objects within an image or video.
Pose Estimation: A computer vision technique that estimates the pose of a person by detecting the position and orientation of key body parts.
Amazon SageMaker: A fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly.
Question # 29
A data scientist is working on a public sector project for an urban traffic system. While studying the traffic patterns, it is clear to the data scientist that the traffic behavior at each light is correlated, subject to a small stochastic error term. The data scientist must model the traffic behavior to analyze the traffic patterns and reduce congestion. How will the data scientist MOST effectively model the problem?
A. The data scientist should obtain a correlated equilibrium policy by formulating this problem as a multi-agent reinforcement learning problem.
B. The data scientist should obtain the optimal equilibrium policy by formulating this problem as a single-agent reinforcement learning problem.
C. Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using historical data through a supervised learning approach.
D. Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using unlabeled simulated data representing the new traffic patterns in the city and applying an unsupervised learning approach.
Answer: A
Explanation: The data scientist should obtain a correlated equilibrium policy by formulating this problem as a multi-agent reinforcement learning problem. This is because:
Multi-agent reinforcement learning (MARL) is a subfield of reinforcement learning that deals with learning and coordination of multiple agents that interact with each other and the environment [1]. MARL can be applied to problems that involve distributed decision making, such as traffic signal control, where each traffic light can be modeled as an agent that observes the traffic state and chooses an action (e.g., changing the signal phase) to optimize a reward function (e.g., minimizing the delay or congestion) [2].
A correlated equilibrium is a solution concept in game theory that generalizes the notion of Nash equilibrium. It is a probability distribution over the joint actions of the agents that satisfies the following condition: no agent can improve its expected payoff by deviating from the distribution, given that it knows the distribution and the actions of the other agents [3]. A correlated equilibrium can capture the correlation among the agents' actions, which is useful for modeling the traffic behavior at each light that is subject to a small stochastic error term.
A correlated equilibrium policy is a policy that induces a correlated equilibrium in a MARL setting. It can be obtained by using various methods, such as policy gradient, actor-critic, or Q-learning algorithms, that can learn from the feedback of the environment and the communication among the agents [4]. A correlated equilibrium policy can achieve better performance than a Nash equilibrium policy, which assumes that the agents act independently and ignores the correlation among their actions [5].
Therefore, by obtaining a correlated equilibrium policy through a MARL formulation, the data scientist can most effectively model the traffic behavior and reduce congestion.
References:
Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning for Traffic Signal Control: A Survey
Correlated Equilibrium
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Correlated Q-Learning
Question # 30
An ecommerce company wants to use machine learning (ML) to monitor fraudulent transactions on its website. The company is using Amazon SageMaker to research, train, deploy, and monitor the ML models. The historical transactions data is in a .csv file that is stored in Amazon S3. The data contains features such as the user's IP address, navigation time, average time on each page, and the number of clicks for each session. There is no label in the data to indicate if a transaction is anomalous. Which models should the company use in combination to detect anomalous transactions? (Select TWO.)
A. IP Insights B. K-nearest neighbors (k-NN) C. Linear learner with a logistic function D. Random Cut Forest (RCF) E. XGBoost
Answer: D,E
Explanation: To detect anomalous transactions, the company can use a combination of Random Cut Forest (RCF) and XGBoost models. RCF is an unsupervised algorithm that can detect outliers in the data by measuring the depth of each data point in a collection of random decision trees. XGBoost is a supervised algorithm that can learn from the labeled data points generated by RCF and classify them as normal or anomalous. RCF can also provide anomaly scores that can be used as features for XGBoost to improve the accuracy of the classification.
References:
1: Amazon SageMaker Random Cut Forest
2: Amazon SageMaker XGBoost Algorithm
3: Anomaly Detection with Amazon SageMaker Random Cut Forest and Amazon SageMaker XGBoost
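To illustrate the two-stage idea locally, the sketch below uses synthetic data and a placeholder anomaly score (standing in for the scores an RCF model would produce) to derive labels, then trains an XGBoost classifier on the features plus that score. It is a toy illustration of the workflow, not the SageMaker implementation.

import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(0)

# Synthetic session features: [navigation_time, avg_time_on_page, clicks_per_session]
normal = rng.normal(loc=[30, 12, 20], scale=[5, 3, 4], size=(2_000, 3))
odd = rng.normal(loc=[90, 2, 80], scale=[10, 1, 10], size=(40, 3))     # rare outliers
X = np.vstack([normal, odd])

# Placeholder anomaly score standing in for RCF output: here, a simple standardized
# distance from the feature means (an RCF model would supply this in practice).
anomaly_score = np.linalg.norm((X - X.mean(axis=0)) / X.std(axis=0), axis=1)

# Stage 1: threshold the unsupervised score to create pseudo-labels.
labels = (anomaly_score > np.quantile(anomaly_score, 0.98)).astype(int)

# Stage 2: train a supervised XGBoost classifier on features + anomaly score.
features = np.column_stack([X, anomaly_score])
clf = XGBClassifier(n_estimators=100, max_depth=3)
clf.fit(features, labels)

print("flagged as anomalous:", int(clf.predict(features).sum()), "of", len(features))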
Question # 31
A company wants to predict stock market price trends. The company stores stock market data each business day in Amazon S3 in Apache Parquet format. The company stores 20 GB of data each day for each stock code. A data engineer must use Apache Spark to perform batch preprocessing data transformations quickly so the company can complete prediction jobs before the stock market opens the next day. The company plans to track more stock market codes and needs a way to scale the preprocessing data transformations. Which AWS service or feature will meet these requirements with the LEAST development effort over time?
A. AWS Glue jobs B. Amazon EMR cluster C. Amazon Athena D. AWS Lambda
Answer: A
Explanation: AWS Glue jobs is the AWS service or feature that will meet the requirements with the least development effort over time. AWS Glue jobs is a fully managed service that enables data engineers to run Apache Spark applications in a serverless Spark environment. AWS Glue jobs can perform batch preprocessing data transformations on large datasets stored in Amazon S3, such as converting data formats, filtering data, joining data, and aggregating data. AWS Glue jobs can also scale the Spark environment automatically based on the data volume and processing needs, without requiring any infrastructure provisioning or management. AWS Glue jobs can reduce the development effort and time by providing a graphical interface to create and monitor Spark applications, as well as a code generation feature that can generate Scala or Python code based on the data sources and targets. AWS Glue jobs can also integrate with other AWS services, such as Amazon Athena, Amazon EMR, and Amazon SageMaker, to enable further data analysis and machine learning tasks [1].
The other options are either more complex or less scalable than AWS Glue jobs. Amazon EMR is a managed service that enables data engineers to run Apache Spark applications on a cluster of Amazon EC2 instances. However, an EMR cluster requires more development effort and time than AWS Glue jobs, as it involves setting up, configuring, and managing the cluster, as well as writing and deploying the Spark code. An EMR cluster also does not scale automatically, but requires manual or scheduled resizing based on the data volume and processing needs [2]. Amazon Athena is a serverless interactive query service that enables data engineers to analyze data stored in Amazon S3 using standard SQL. However, Amazon Athena is not suitable for performing complex data transformations, such as joining data from multiple sources, aggregating data, or applying custom logic, and it is not designed for running Spark applications, as it only supports SQL queries [3]. AWS Lambda is a serverless compute service that enables data engineers to run code without provisioning or managing servers. However, AWS Lambda is not optimized for running Spark applications, as it has limitations on the execution time, memory size, and concurrency of the functions, and it would require additional steps to read, transform, and write the large daily datasets [4].
References:
1: AWS Glue - Fully Managed ETL Service - Amazon Web Services
2: Amazon EMR - Amazon Web Services
3: Amazon Athena – Interactive SQL Queries for Data in Amazon S3
4: AWS Lambda – Serverless Compute - Amazon Web Services
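For reference, a minimal AWS Glue (PySpark) job skeleton of the kind this option implies, reading one day of Parquet data from S3, applying an aggregation, and writing results back. The bucket paths and column names are placeholders, and the awsglue modules are only available inside the Glue job runtime.

import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job boilerplate.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Placeholder S3 locations and columns for the daily stock data.
raw = spark.read.parquet("s3://stock-data-raw/ingest_date=2024-12-02/")

daily = (
    raw.groupBy("stock_code")
    .agg(
        F.avg("price").alias("avg_price"),
        F.max("price").alias("max_price"),
        F.min("price").alias("min_price"),
    )
)

daily.write.mode("overwrite").parquet("s3://stock-data-prepared/ingest_date=2024-12-02/")

job.commit()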
Question # 32
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values. A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling. Which solution will meet these requirements with the LEAST implementation effort?
A. Use Amazon EMR Serverless with PySpark. B. Use AWS Glue DataBrew. C. Use Amazon SageMaker Studio Data Wrangler. D. Use Amazon SageMaker Studio Notebook with Pandas.
Answer: C
Explanation: Amazon SageMaker Studio Data Wrangler is a visual data preparation tool that enables users to clean and normalize data without writing any code. Using Data Wrangler, the data scientist can easily import the time-series data from various sources, such as Amazon S3, Amazon Athena, or Amazon Redshift. Data Wrangler can automatically generate data insights and quality reports, which help identify and fix missing values, outliers, and anomalies in the data. Data Wrangler also provides over 250 built-in transformations, such as resampling, interpolation, aggregation, and filtering, which can be applied to the data with a point-and-click interface. Data Wrangler can also export the prepared data to different destinations, such as Amazon S3, Amazon SageMaker Feature Store, or Amazon SageMaker Pipelines, for further modeling and analysis. Data Wrangler is integrated with Amazon SageMaker Studio, a web-based IDE for machine learning, which makes it easy to access and use the tool. Data Wrangler is a serverless and fully managed capability, which means the data scientist does not need to provision, configure, or manage any infrastructure or clusters.
Option A is incorrect because Amazon EMR Serverless is a serverless option for running big data analytics applications using open-source frameworks, such as Apache Spark. However, using Amazon EMR Serverless would require the data scientist to write PySpark code to perform the data preparation tasks, such as resampling, imputation, and aggregation. This would require more implementation effort than using Data Wrangler, which provides a visual and code-free interface for data preparation.
Option B is incorrect because AWS Glue DataBrew is another visual data preparation tool that can be used to clean and normalize data without writing code. However, DataBrew does not support time-series data as a data type and does not provide built-in transformations for resampling, interpolation, or aggregation of time-series data. Therefore, using DataBrew would not meet the requirements of the use case.
Option D is incorrect because using Amazon SageMaker Studio Notebook with Pandas would also require the data scientist to write Python code to perform the data preparation tasks. Pandas is a popular Python library for data analysis and manipulation that supports time-series data and provides various methods for resampling, interpolation, and aggregation. However, using Pandas would require more implementation effort than using Data Wrangler, which provides a visual and code-free interface for data preparation.
References:
1: Amazon SageMaker Data Wrangler documentation
2: Amazon EMR Serverless documentation
3: AWS Glue DataBrew documentation
4: Pandas documentation
Question # 33
A company operates large cranes at a busy port. The company plans to use machine learning (ML) for predictive maintenance of the cranes to avoid unexpected breakdowns and to improve productivity. The company already uses sensor data from each crane to monitor the health of the cranes in real time. The sensor data includes rotation speed, tension, energy consumption, vibration, pressure, and temperature for each crane. The company contracts AWS ML experts to implement an ML solution. Which potential findings would indicate that an ML-based solution is suitable for this scenario? (Select TWO.)
A. The historical sensor data does not include a significant number of data points and attributes for certain time periods.
B. The historical sensor data shows that simple rule-based thresholds can predict crane failures.
C. The historical sensor data contains failure data for only one type of crane model that is in operation and lacks failure data for most other types of crane that are in operation.
D. The historical sensor data from the cranes is available with high granularity for the last 3 years.
E. The historical sensor data contains most common types of crane failures that the company wants to predict.
Answer: D,E
Explanation: The best indicators that an ML-based solution is suitable for this scenario are D and E, because they imply that the historical sensor data is sufficient and relevant for building a predictive maintenance model. This model can use machine learning techniques such as regression, classification, or anomaly detection to learn from past data and forecast future failures or issues [1][2]. Having high granularity and diversity of data can improve the accuracy and generalization of the model, as well as enable the detection of complex patterns and relationships that are not captured by simple rule-based thresholds [3].
The other options are not good indicators that an ML-based solution is suitable, because they suggest that the historical sensor data is incomplete, inconsistent, or inadequate for building a predictive maintenance model. These options would require additional data collection, preprocessing, or augmentation to overcome the data quality issues and ensure that the model can handle different scenarios and types of cranes [4].
References:
1: Machine Learning Techniques for Predictive Maintenance
2: A Guide to Predictive Maintenance & Machine Learning
3: Machine Learning for Predictive Maintenance: Reinventing Asset Upkeep
4: Predictive Maintenance with Machine Learning: A Complete Guide
5: Machine Learning for Predictive Maintenance - AWS Online Tech Talks
Question # 34
A company is creating an application to identify, count, and classify animal images that are uploaded to the company’s website. The company is using the Amazon SageMaker image classification algorithm with an ImageNetV2 convolutional neural network (CNN). The solution works well for most animal images but does not recognize many animal species that are less common. The company obtains 10,000 labeled images of less common animal species and stores the images in Amazon S3. A machine learning (ML) engineer needs to incorporate the images into the model by using Pipe mode in SageMaker. Which combination of steps should the ML engineer take to train the model? (Choose two.)
A. Use a ResNet model. Initiate full training mode by initializing the network with random weights.
B. Use an Inception model that is available with the SageMaker image classification algorithm.
C. Create a .lst file that contains a list of image files and corresponding class labels. Upload the .lst file to Amazon S3.
D. Initiate transfer learning. Train the model by using the images of less common species.
E. Use an augmented manifest file in JSON Lines format.
Answer: C,D
Explanation: The combination of steps the ML engineer should take is to create a .lst file that contains a list of image files and corresponding class labels, upload the .lst file to Amazon S3, and initiate transfer learning by training the model using the images of less common species. This approach allows the ML engineer to leverage the existing ImageNetV2 CNN model and fine-tune it with the new data using Pipe mode in SageMaker.
A .lst file is a text file that contains a list of image files and corresponding class labels, separated by tabs. The .lst file format is required for using the SageMaker image classification algorithm with Pipe mode. Pipe mode is a feature of SageMaker that enables streaming data directly from Amazon S3 to the training instances, without downloading the data first. Pipe mode can reduce the startup time, improve the I/O throughput, and enable training on large datasets that exceed the disk size limit. To use Pipe mode, the ML engineer needs to upload the .lst file to Amazon S3 and specify the S3 path as the input data channel for the training job [1].
Transfer learning is a technique that reuses a pre-trained model for a new task by fine-tuning the model parameters with new data. Transfer learning can save time and computational resources, as well as improve the performance of the model, especially when the new task is similar to the original task. The SageMaker image classification algorithm supports transfer learning by allowing the ML engineer to specify the number of output classes and the number of layers to be retrained. The ML engineer can use the existing ImageNetV2 CNN model, which is trained on 1,000 classes of common objects, and fine-tune it with the new data of less common animal species, which is a similar task [2].
The other options are either less effective or not supported by the SageMaker image classification algorithm. Using a ResNet model and initiating full training mode would require training the model from scratch, which would take more time and resources than transfer learning. Using an Inception model is not possible, as the SageMaker image classification algorithm only supports ResNet and ImageNetV2 models. Using an augmented manifest file in JSON Lines format is not compatible with this approach, as Pipe mode with the image classification algorithm relies on .lst files here [1].
References:
1: Using Pipe input mode for Amazon SageMaker algorithms | AWS Machine Learning Blog
2: Image Classification Algorithm - Amazon SageMaker
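As a sketch of the two selected steps, the snippet below writes a tab-separated .lst file (index, class label, relative image path) and shows the kind of hyperparameters that turn on transfer learning for the SageMaker image classification algorithm; the file names, class count, and sample count are placeholders to adapt.

import csv

# (relative_path, class_label) pairs for the newly labeled images -- placeholders.
labeled_images = [
    ("images/okapi/0001.jpg", 0),
    ("images/okapi/0002.jpg", 0),
    ("images/quokka/0001.jpg", 1),
]

# .lst format: <index> <tab> <class_label> <tab> <relative_path>, one image per line.
with open("train.lst", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    for index, (path, label) in enumerate(labeled_images):
        writer.writerow([index, label, path])

# Hyperparameters of the built-in image classification algorithm that enable
# transfer learning from the pretrained network (values here are placeholders).
hyperparameters = {
    "use_pretrained_model": "1",      # fine-tune instead of training from scratch
    "num_classes": "10",              # number of animal species to distinguish
    "num_training_samples": "10000",  # size of the new labeled dataset
    "epochs": "10",
    "learning_rate": "0.001",
}
print(hyperparameters)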
Question # 35
A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model. Which approaches will meet this requirement? (Select TWO.)
A. Replace On-Demand Instances with Spot Instances.
B. Configure model auto scaling to dynamically adjust the number of instances automatically.
C. Replace CPU-based EC2 instances with GPU-based EC2 instances.
D. Use multiple training instances.
E. Use a pre-trained version of the model. Run incremental training.
Answer: C,D
Explanation: The best approaches to decrease the training time of the model are C and D, because they improve the computational efficiency and parallelization of the training process. These approaches have the following benefits:
C: Replacing CPU-based EC2 instances with GPU-based EC2 instances can speed up the training of the DeepAR algorithm, as it can leverage the parallel processing power of GPUs to perform matrix operations and gradient computations faster than CPUs [1][2]. The DeepAR algorithm supports GPU-based EC2 instances such as ml.p2 and ml.p3 [3].
D: Using multiple training instances can also reduce the training time of the DeepAR algorithm, as it can distribute the workload across multiple nodes and perform data parallelism [4]. The DeepAR algorithm supports distributed training with multiple CPU-based or GPU-based EC2 instances [3].
The other options are not effective or relevant, because they have the following drawbacks:
A: Replacing On-Demand Instances with Spot Instances can reduce the cost of the training, but not necessarily the time, as Spot Instances are subject to interruption and availability [5]. Moreover, the DeepAR algorithm does not support checkpointing, which means that the training cannot resume from the last saved state if the Spot Instance is terminated [3].
B: Configuring model auto scaling to dynamically adjust the number of instances automatically is not applicable, as this feature is only available for inference endpoints, not for training jobs [6].
E: Using a pre-trained version of the model and running incremental training is not possible, as the DeepAR algorithm does not support incremental training or transfer learning [3]. The DeepAR algorithm requires a full retraining of the model whenever new data is added or the hyperparameters are changed [7].
References:
1: GPU vs CPU: What Matters Most for Machine Learning? | by Louis (What's AI) Bouchard | Towards Data Science
2: How GPUs Accelerate Machine Learning Training | NVIDIA Developer Blog
3: DeepAR Forecasting Algorithm - Amazon SageMaker
4: Distributed Training - Amazon SageMaker
5: Managed Spot Training - Amazon SageMaker
6: Automatic Scaling - Amazon SageMaker
7: How the DeepAR Algorithm Works - Amazon SageMaker
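A brief sketch with the SageMaker Python SDK showing how the two selected changes look in code: a GPU instance type and an instance count greater than one for the built-in DeepAR container. The role, bucket paths, and hyperparameter values are placeholders.

import sagemaker
from sagemaker.estimator import Estimator

ROLE = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"    # placeholder
session = sagemaker.Session()

# Built-in DeepAR container for the current Region.
image_uri = sagemaker.image_uris.retrieve("forecasting-deepar", session.boto_region_name)

estimator = Estimator(
    image_uri=image_uri,
    role=ROLE,
    instance_count=2,                  # D: distribute training across multiple instances
    instance_type="ml.p3.2xlarge",     # C: GPU-based instances instead of CPU-based
    output_path="s3://my-bucket/deepar-output",    # placeholder
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    time_freq="H",
    context_length=72,
    prediction_length=24,
    epochs=100,
)

# estimator.fit({"train": "s3://my-bucket/deepar/train/", "test": "s3://my-bucket/deepar/test/"})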
Question # 36
A manufacturing company has a production line with sensors that collect hundreds of quality metrics. The company has stored sensor data and manual inspection results in a data lake for several months. To automate quality control, the machine learning team must build an automated mechanism that determines whether the produced goods are good quality, replacement market quality, or scrap quality based on the manual inspection results. Which modeling approach will deliver the MOST accurate prediction of product quality?
A. Amazon SageMaker DeepAR forecasting algorithm B. Amazon SageMaker XGBoost algorithm C. Amazon SageMaker Latent Dirichlet Allocation (LDA) algorithm D. A convolutional neural network (CNN) and ResNet
Answer: D
Explanation: A convolutional neural network (CNN) is a type of deep learning model that can learn to extract features from images and perform tasks such as classification, segmentation, and detection [1]. ResNet is a popular CNN architecture that uses residual connections to overcome the problem of vanishing gradients and enable very deep networks [2]. For the task of predicting product quality based on sensor data, a CNN and ResNet approach can leverage the spatial structure of the data and learn complex patterns that distinguish different quality levels.
References:
Convolutional Neural Networks (CNNs / ConvNets)
PyTorch ResNet: The Basics and a Quick Tutorial
Question # 37
A data scientist at a financial services company used Amazon SageMaker to train and deploy a model that predicts loan defaults. The model analyzes new loan applications and predicts the risk of loan default. To train the model, the data scientist manually extracted loan data from a database. The data scientist performed the model training and deployment steps in a Jupyter notebook that is hosted on SageMaker Studio notebooks. The model's prediction accuracy is decreasing over time. Which combination of steps is the MOST operationally efficient way for the data scientist to maintain the model's accuracy? (Select TWO.)
A. Use SageMaker Pipelines to create an automated workflow that extracts fresh data, trains the model, and deploys a new version of the model.
B. Configure SageMaker Model Monitor with an accuracy threshold to check for model drift. Initiate an Amazon CloudWatch alarm when the threshold is exceeded. Connect the workflow in SageMaker Pipelines with the CloudWatch alarm to automatically initiate retraining.
C. Store the model predictions in Amazon S3. Create a daily SageMaker Processing job that reads the predictions from Amazon S3, checks for changes in model prediction accuracy, and sends an email notification if a significant change is detected.
D. Rerun the steps in the Jupyter notebook that is hosted on SageMaker Studio notebooks to retrain the model and redeploy a new version of the model.
E. Export the training and deployment code from the SageMaker Studio notebooks into a Python script. Package the script into an Amazon Elastic Container Service (Amazon ECS) task that an AWS Lambda function can initiate.
Answer: A,B
Explanation:
Option A is correct because SageMaker Pipelines is a service that enables you to create and manage automated workflows for your machine learning projects. You can use SageMaker Pipelines to orchestrate the steps of data extraction, model training, and model deployment in a repeatable and scalable way [1].
Option B is correct because SageMaker Model Monitor monitors the quality of your models in production and alerts you when there are deviations in model quality. You can use SageMaker Model Monitor to set an accuracy threshold for your model and configure a CloudWatch alarm that triggers when the threshold is exceeded. You can then connect the alarm to the workflow in SageMaker Pipelines to automatically initiate retraining and deployment of a new version of the model [2].
Option C is incorrect because it is not the most operationally efficient way to maintain the model's accuracy. Creating a daily SageMaker Processing job that reads the predictions from Amazon S3 and checks for changes in model prediction accuracy is a manual and time-consuming process. It also requires you to write custom code to perform the data analysis and send the email notification. Moreover, it does not automatically retrain and deploy the model when the accuracy drops.
Option D is incorrect because it is not the most operationally efficient way to maintain the model's accuracy. Rerunning the steps in the Jupyter notebook to retrain the model and redeploy a new version is a manual and error-prone process. It also requires you to monitor the model's performance and initiate the retraining and deployment steps yourself, and it does not leverage the benefits of SageMaker Pipelines and SageMaker Model Monitor to automate and streamline the workflow.
Option E is incorrect because it is not the most operationally efficient way to maintain the model's accuracy. Exporting the training and deployment code into a Python script and packaging the script into an Amazon ECS task that an AWS Lambda function can initiate is a complex and cumbersome process. It also requires you to manage the infrastructure and resources for the Amazon ECS task and the AWS Lambda function, and it does not leverage the benefits of SageMaker Pipelines and SageMaker Model Monitor to automate and streamline the workflow.
References:
1: SageMaker Pipelines - Amazon SageMaker
2: Monitor data and model quality - Amazon SageMaker
Question # 38
A data scientist uses Amazon SageMaker Data Wrangler to define and perform transformations and feature engineering on historical data. The data scientist saves the transformations to SageMaker Feature Store. The historical data is periodically uploaded to an Amazon S3 bucket. The data scientist needs to transform the new historical data and add it to the online feature store. The data scientist needs to prepare the historical data for training and inference by using native integrations. Which solution will meet these requirements with the LEAST development effort?
A. Use AWS Lambda to run a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
B. Run an AWS Step Functions step and a predefined SageMaker pipeline to perform the transformations on each new dataset that arrives in the S3 bucket.
C. Use Apache Airflow to orchestrate a set of predefined transformations on each new dataset that arrives in the S3 bucket.
D. Configure Amazon EventBridge to run a predefined SageMaker pipeline to perform the transformations when new data is detected in the S3 bucket.
Answer: D
Explanation:
Configuring Amazon EventBridge to run a predefined SageMaker pipeline when new data is detected in the S3 bucket requires the least development effort because it uses the native integration between EventBridge and SageMaker Pipelines. EventBridge can watch the S3 bucket for new uploads and invoke a pipeline that contains the same transformations and feature engineering steps that were defined in SageMaker Data Wrangler. The pipeline can then ingest the transformed data into the online feature store for training and inference.
The other options require more development effort and additional services. AWS Lambda or AWS Step Functions would need custom code to invoke the SageMaker pipeline and handle errors and retries. Apache Airflow would require setting up and maintaining an Airflow environment and DAGs and integrating with the SageMaker API.
References:
Amazon EventBridge and Amazon SageMaker Pipelines integration
Create a pipeline using a JSON specification
Ingest data into a feature group
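As an illustration of this native integration, the following boto3 sketch creates an EventBridge rule that starts a predefined SageMaker pipeline when a new object lands in the S3 bucket. The bucket name, pipeline ARN, and role ARN are hypothetical placeholders, and the bucket is assumed to have EventBridge notifications enabled.

```python
# Minimal sketch: wire an EventBridge rule to a SageMaker pipeline so the
# pipeline runs whenever new data arrives in the S3 bucket.
import json
import boto3

events = boto3.client("events")
rule_name = "new-historical-data-uploaded"

# Match S3 "Object Created" events for the raw-data bucket.
events.put_rule(
    Name=rule_name,
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["my-historical-data-bucket"]}},
    }),
    State="ENABLED",
)

# Point the rule at the SageMaker pipeline as a native target.
events.put_targets(
    Rule=rule_name,
    Targets=[{
        "Id": "run-data-wrangler-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/feature-prep",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeSageMakerRole",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```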
Question # 39
A financial services company wants to automate its loan approval process by building a machine learning (ML) model. Each loan data point contains credit history from a third-party data source and demographic information about the customer. Each loan approval prediction must come with a report that contains an explanation for why the customer was approved for or denied a loan. The company will use Amazon SageMaker to build the model. Which solution will meet these requirements with the LEAST development effort?
A. Use SageMaker Model Debugger to automatically debug the predictions, generate the explanation, and attach the explanation report.
B. Use AWS Lambda to provide feature importance and partial dependence plots. Use the plots to generate and attach the explanation report.
C. Use SageMaker Clarify to generate the explanation report. Attach the report to the predicted results.
D. Use custom Amazon CloudWatch metrics to generate the explanation report. Attach the report to the predicted results.
Answer: C
Explanation:
SageMaker Clarify is the best fit because it explains how machine learning models make predictions by using a model-agnostic feature attribution approach based on SHAP values, and it can also detect and measure potential bias in the data and the model. Clarify can generate explanation reports during data preparation, model training, and model deployment. The reports include metrics, graphs, and examples that help explain the model's behavior and individual predictions, and they can be attached to the predicted results through the SageMaker SDK or the SageMaker API.
The other options require more development effort and additional services. SageMaker Model Debugger would require modifying the training script to save output tensors and writing custom rules to explain the predictions. AWS Lambda would require custom code to invoke the model, compute feature importance and partial dependence plots, and generate the report. Custom Amazon CloudWatch metrics would require code to publish metrics, build dashboards, and generate the report.
References:
Bias Detection and Model Explainability - Amazon SageMaker Clarify - AWS
Amazon SageMaker Clarify Model Explainability
Amazon SageMaker Clarify: Machine Learning Bias Detection and Explainability
GitHub - aws/amazon-sagemaker-clarify: Fairness Aware Machine Learning
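The sketch below shows what generating a SHAP-based explanation report with the SageMaker Python SDK's Clarify processor could look like. The bucket, model name, column names, and baseline record are hypothetical, and the exact configuration depends on the deployed model.

```python
# Minimal sketch: produce a SHAP explainability report with SageMaker Clarify.
import sagemaker
from sagemaker import clarify

session = sagemaker.Session()
role = sagemaker.get_execution_role()  # assumes running inside SageMaker

processor = clarify.SageMakerClarifyProcessor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    sagemaker_session=session,
)

data_config = clarify.DataConfig(
    s3_data_input_path="s3://my-bucket/loan-applications/validation.csv",
    s3_output_path="s3://my-bucket/clarify-explanations/",
    label="approved",
    headers=["approved", "credit_score", "income", "age", "loan_amount"],
    dataset_type="text/csv",
)

model_config = clarify.ModelConfig(
    model_name="loan-approval-model",      # an existing SageMaker model
    instance_count=1,
    instance_type="ml.m5.xlarge",
    accept_type="text/csv",
)

shap_config = clarify.SHAPConfig(
    baseline=[[650, 50000, 35, 10000]],    # reference record for SHAP values
    num_samples=100,
    agg_method="mean_abs",
)

# Writes the explanation report (HTML/JSON) to the S3 output path.
processor.run_explainability(
    data_config=data_config,
    model_config=model_config,
    explainability_config=shap_config,
)
```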
Question # 40
A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
Answer: B
Explanation:
Using AWS Glue to catalogue the data and Amazon Athena to run queries requires the least effort. AWS Glue is a serverless data integration service that can crawl the data stored in Amazon S3, infer the schema of both structured and unstructured data in formats such as CSV, JSON, and Parquet (using built-in or custom classifiers), and register tables in the Glue Data Catalog. Amazon Athena is an interactive query engine that runs SQL directly on data in Amazon S3 and uses the Glue Data Catalog as its central metadata repository, so no data movement or infrastructure is required; partitions and compression can further optimize query performance and cost.
The other options require moving or transforming the data first. AWS Data Pipeline with Amazon RDS would require loading the S3 data into a relational database before it can be queried, which adds time and cost. AWS Batch with Amazon Aurora similarly requires an ETL step to move the data into Aurora tables. AWS Lambda with Amazon Kinesis Data Analytics is designed for transforming and analyzing streaming data from sources such as Kinesis Data Streams or Kinesis Data Firehose, not for querying data at rest in Amazon S3 with SQL.
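As a rough illustration of how little code option B needs once AWS Glue has catalogued the data, the sketch below runs a SQL query with Amazon Athena via boto3. The database, table, and result-bucket names are hypothetical.

```python
# Minimal sketch: query a Glue-catalogued table in S3 with Amazon Athena.
import time
import boto3

athena = boto3.client("athena")

query = athena.start_query_execution(
    QueryString=(
        "SELECT machine_id, AVG(temperature) AS avg_temp "
        "FROM sensor_readings GROUP BY machine_id"
    ),
    QueryExecutionContext={"Database": "manufacturing_catalog"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)

# Poll for completion, then fetch the result set.
execution_id = query["QueryExecutionId"]
while True:
    status = athena.get_query_execution(QueryExecutionId=execution_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
    print(f"returned {len(rows)} rows (including header)")
```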
Question # 41
A data scientist has been running an Amazon SageMaker notebook instance for a few weeks. During this time, a new version of Jupyter Notebook was released along with additional software updates. The security team mandates that all running SageMaker notebook instances use the latest security and software updates provided by SageMaker. How can the data scientist meet these requirements?
A. Call the CreateNotebookInstanceLifecycleConfig API operation
B. Create a new SageMaker notebook instance and mount the Amazon Elastic Block Store (Amazon EBS) volume from the original instance
C. Stop and then restart the SageMaker notebook instance
D. Call the UpdateNotebookInstanceLifecycleConfig API operation
Answer: C
Explanation:
Stopping and then restarting the SageMaker notebook instance is the correct way to apply updates. When a notebook instance is restarted, SageMaker automatically applies the latest security and software updates that it provides.
Option A creates a lifecycle configuration, which is a set of shell scripts that run when a notebook instance is created or started. A lifecycle configuration can customize the instance, for example by installing additional packages, but it does not by itself update the SageMaker-provided software.
Option B creates a new notebook instance with the latest software, but it incurs additional cost and requires manual steps to transfer data and settings from the original instance.
Option D updates an existing lifecycle configuration and, like option A, does not update the software on the notebook instance.
References:
Amazon SageMaker Notebook Instances - Amazon SageMaker
CreateNotebookInstanceLifecycleConfig - Amazon SageMaker
Create a Notebook Instance - Amazon SageMaker
UpdateNotebookInstanceLifecycleConfig - Amazon SageMaker
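The restart itself can be scripted. Here is a minimal boto3 sketch that stops the notebook instance, waits for it to stop, and starts it again so the latest SageMaker updates are applied; the instance name is a hypothetical placeholder.

```python
# Minimal sketch: restart a SageMaker notebook instance to pick up the latest
# SageMaker-provided software and security updates.
import boto3

sagemaker = boto3.client("sagemaker")
name = "my-notebook-instance"

sagemaker.stop_notebook_instance(NotebookInstanceName=name)

# Wait until the instance is fully stopped before starting it again.
waiter = sagemaker.get_waiter("notebook_instance_stopped")
waiter.wait(NotebookInstanceName=name)

sagemaker.start_notebook_instance(NotebookInstanceName=name)
```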
Question # 42
A large company has developed a BI application that generates reports and dashboards using data collected from various operational metrics. The company wants to provide executives with an enhanced experience so they can use natural language to get data from the reports. The company wants the executives to be able to ask questions using written and spoken interfaces. Which combination of services can be used to build this conversational interface? (Select THREE.)
A. Alexa for Business
B. Amazon Connect
C. Amazon Lex
D. Amazon Polly
E. Amazon Comprehend
F. Amazon Transcribe
Answer: C,E,F
Explanation:
To build a conversational interface that accepts both written and spoken questions about the report data, the company needs services that can handle speech input, understand natural language, and drive the conversation:
Amazon Lex provides the conversational interface (chatbot) that interprets the user's intent and manages the dialog for text input.
Amazon Transcribe converts the executives' spoken questions into text so that they can be processed by the conversational interface.
Amazon Comprehend extracts entities, key phrases, and sentiment from the text, helping map the natural language questions to the data in the reports.
Alexa for Business, Amazon Connect, and Amazon Polly are not required: Alexa for Business and Amazon Connect target device and contact-center use cases, and Amazon Polly converts text to speech rather than understanding user input.
References:
What Is Amazon Lex?
What Is Amazon Comprehend?
What Is Amazon Transcribe?
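To make the division of labor between the services concrete, here is a minimal Python sketch: Transcribe converts a recorded question to text, and Comprehend extracts the key phrases before the text would be passed to a Lex bot. The bucket, job name, and sample question are hypothetical.

```python
# Minimal sketch: speech-to-text with Amazon Transcribe, then key-phrase
# extraction with Amazon Comprehend on the resulting question text.
import boto3

transcribe = boto3.client("transcribe")
comprehend = boto3.client("comprehend")

transcribe.start_transcription_job(
    TranscriptionJobName="exec-question-001",
    Media={"MediaFileUri": "s3://my-bucket/audio/question.wav"},
    MediaFormat="wav",
    LanguageCode="en-US",
)

# Once the transcript text has been retrieved, Comprehend can surface the
# entities and key phrases that map the question onto report fields.
text = "What were the top five operational metrics by region last quarter?"
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
print([p["Text"] for p in phrases["KeyPhrases"]])
```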
Question # 43
A manufacturing company needs to identify returned smartphones that have been damaged by moisture. The company has an automated process that produces 2,000 diagnostic values for each phone. The database contains more than five million phone evaluations. The evaluation process is consistent, and there are no missing values in the data. A machine learning (ML) specialist has trained an Amazon SageMaker linear learner ML model to classify phones as moisture damaged or not moisture damaged by using all available features. The model's F1 score is 0.6. What changes in model training would MOST likely improve the model's F1 score? (Select TWO.)
A. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the SageMaker principal component analysis (PCA) algorithm.
B. Continue to use the SageMaker linear learner algorithm. Reduce the number of features with the scikit-learn multi-dimensional scaling (MDS) algorithm.
C. Continue to use the SageMaker linear learner algorithm. Set the predictor type to regressor.
D. Use the SageMaker k-means algorithm with k of less than 1,000 to train the model.
E. Use the SageMaker k-nearest neighbors (k-NN) algorithm. Set a dimension reduction target of less than 1,000 to train the model.
Answer: A,E
Explanation:
Option A is correct because reducing the number of features with the SageMaker principal component analysis (PCA) algorithm removes noise and redundancy from the 2,000 diagnostic values. PCA transforms the original features into a smaller set of linearly uncorrelated principal components, which the linear learner can then use for training.
Option E is correct because the SageMaker k-nearest neighbors (k-NN) algorithm classifies an input based on the majority vote of its k nearest neighbors in the feature space and supports dimension reduction as a built-in option, so setting a dimension reduction target of less than 1,000 lets the model learn from the similarity of the data points using fewer, more informative features.
Option B is incorrect because scikit-learn multi-dimensional scaling (MDS) is computationally expensive and does not scale to a dataset of more than five million evaluations.
Option C is incorrect because setting the predictor type to regressor changes the objective from classification to regression, which outputs a continuous value instead of the required binary label.
Option D is incorrect because k-means is an unsupervised clustering algorithm that groups data points without using labels, so it cannot output the binary moisture-damage label.
References:
Amazon SageMaker Linear Learner Algorithm
Amazon SageMaker K-Nearest Neighbors (k-NN) Algorithm
Principal Component Analysis - Scikit-learn
Multidimensional Scaling - Scikit-learn
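For intuition on option A's dimensionality-reduction step, the sketch below uses scikit-learn (rather than the SageMaker built-in algorithms) to compress 2,000 synthetic diagnostic values into principal components and train a linear classifier on them. The data is randomly generated, so it only illustrates the mechanics, not the expected F1 improvement.

```python
# Illustrative sketch of the PCA-then-linear-model idea behind option A,
# using scikit-learn on synthetic stand-in data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2000))      # stand-in for the 2,000 diagnostic values
y = rng.integers(0, 2, size=5000)      # stand-in for the moisture-damage labels

# Keep enough components to explain ~95% of the variance, then fit the
# linear classifier on the reduced feature set.
model = make_pipeline(PCA(n_components=0.95), LogisticRegression(max_iter=1000))
model.fit(X, y)
print("components kept:", model.named_steps["pca"].n_components_)
```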
Question # 44
A beauty supply store wants to understand some characteristics of visitors to the store. The store has security video recordings from the past several years. The store wants to generate a report of hourly visitors from the recordings. The report should group visitors by hair style and hair color. Which solution will meet these requirements with the LEAST amount of effort?
A. Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
B. Use an object detection algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
C. Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to a ResNet-50 algorithm to determine hair style and hair color.
D. Use a semantic segmentation algorithm to identify a visitor's hair in video frames. Pass the identified hair to an XGBoost algorithm to determine hair style and hair color.
Answer: C
Explanation:
The solution with the least effort is to use a semantic segmentation algorithm to identify a visitor's hair in video frames and pass the identified hair to a ResNet-50 model to determine hair style and hair color, because both steps can be done with built-in Amazon SageMaker algorithms.
Semantic segmentation assigns a class label to every pixel in an image, so it can precisely isolate a visitor's hair in a frame. The SageMaker built-in semantic segmentation algorithm supports the Fully Convolutional Network (FCN), Pyramid Scene Parsing Network (PSP), and DeepLab v3 architectures and can use a pre-trained or randomly initialized ResNet-50 or ResNet-101 backbone; it can be trained on P2/P3 instance types.
ResNet-50 is a 50-layer convolutional neural network pre-trained on more than a million ImageNet images. The SageMaker built-in image classification algorithm can use ResNet-50 and supports transfer learning, so it can be fine-tuned to classify hair style and hair color from the segmented hair regions.
The other options are less effective or more complex. An object detection algorithm only draws bounding boxes around hair regions, which can be inaccurate or incomplete when hair is occluded or irregularly shaped. XGBoost is not designed for image classification; it would require converting the segmented hair images into numerical features, which can lose information or introduce noise.
References:
Semantic Segmentation Algorithm - Amazon SageMaker
Image Classification Algorithm - Amazon SageMaker
Question # 45
Each morning, a data scientist at a rental car company creates insights about the previous day’s rental car reservation demands. The company needs to automate this process by streaming the data to Amazon S3 in near real time. The solution must detect high-demand rental cars at each of the company’s locations. The solution also must create a visualization dashboard that automatically refreshes with the most recent data. Which solution will meet these requirements with the LEAST development time?
A. Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.
B. Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using a Random Cut Forest (RCF) model trained in Amazon SageMaker. Visualize the data in Amazon QuickSight.
C. Use Amazon Kinesis Data Firehose to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using a Random Cut Forest (RCF) model trained in Amazon SageMaker. Visualize the data in Amazon QuickSight.
D. Use Amazon Kinesis Data Streams to stream the reservation data directly to Amazon S3. Detect high-demand outliers by using Amazon QuickSight ML Insights. Visualize the data in QuickSight.
Answer: A
Explanation:
Using Amazon Kinesis Data Firehose to stream the reservation data to Amazon S3, Amazon QuickSight ML Insights to detect high-demand outliers, and QuickSight to visualize the data requires the least development time. This solution needs no custom code or ML expertise: QuickSight ML Insights runs anomaly detection on the data automatically and highlights the outliers and their key drivers, and the QuickSight dashboard can refresh automatically with the most recent data. The options that use a Random Cut Forest model in SageMaker would require building, training, and hosting a model, and Kinesis Data Streams does not deliver data to Amazon S3 without an additional consumer such as Kinesis Data Firehose.
References:
Simplify and automate anomaly detection in streaming data with Amazon Lookout for Metrics | AWS Machine Learning Blog
Detecting outliers with ML-powered anomaly detection - Amazon QuickSight
Real-time Outlier Detection Over Streaming Data - IEEE Xplore
Towards a deep learning-based outlier detection … - Journal of Big Data
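For context, the producer side of option A can be as simple as the following boto3 sketch, which pushes one reservation record into a Kinesis Data Firehose delivery stream that delivers to Amazon S3. The stream name and record fields are hypothetical.

```python
# Minimal sketch: send a reservation record to a Kinesis Data Firehose
# delivery stream that lands the data in Amazon S3.
import json
import boto3

firehose = boto3.client("firehose")

record = {
    "location_id": "BOS-017",
    "vehicle_class": "SUV",
    "reservation_time": "2024-12-01T09:15:00Z",
}

firehose.put_record(
    DeliveryStreamName="rental-reservations-stream",
    Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
)
```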
Question # 46
A company wants to conduct targeted marketing to sell solar panels to homeowners. The company wants to use machine learning (ML) technologies to identify which houses already have solar panels. The company has collected 8,000 satellite images as training data and will use Amazon SageMaker Ground Truth to label the data. The company has a small internal team that is working on the project. The internal team has no ML expertise and no ML experience. Which solution will meet these requirements with the LEAST amount of effort from the internal team?
A. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
B. Set up a private workforce that consists of the internal team. Use the private workforce to label the data. Use Amazon Rekognition Custom Labels for model training and hosting.
C. Set up a private workforce that consists of the internal team. Use the private workforce and the SageMaker Ground Truth active learning feature to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
D. Set up a public workforce. Use the public workforce to label the data. Use the SageMaker Object Detection algorithm to train a model. Use SageMaker batch transform for inference.
Answer: A
Explanation:
Solution A requires the least effort from the internal team because it relies on two fully managed services.
Set up a private workforce that consists of the internal team and use it with the SageMaker Ground Truth active learning feature to label the data. SageMaker Ground Truth creates high-quality training datasets using human labelers, and a private workforce keeps the labeling in-house. Active learning reduces the labeling effort by letting a model label the easy images automatically and routing only the difficult ones to the human labelers.
Use Amazon Rekognition Custom Labels for model training and hosting. Rekognition Custom Labels can train a custom image analysis model on the labeled satellite images, host the model, and expose an API endpoint for inference, all without requiring ML expertise.
Option B requires more effort because the team must label every image manually without active learning.
Option C adds operational overhead: the team must manage the SageMaker Object Detection training job, the model artifact, and the batch transform job, and batch transform is not suited to real-time inference.
Option D uses a public workforce (Amazon Mechanical Turk), which costs money per labeling task and gives the company less control over label quality and data security, and it carries the same SageMaker training and batch transform overhead as option C.
References:
Amazon SageMaker Ground Truth
Amazon Rekognition Custom Labels
Amazon SageMaker Object Detection
Amazon Mechanical Turk
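Once the Rekognition Custom Labels model is trained and running, inference is a single API call, as the sketch below shows; the project version ARN, bucket, and image key are hypothetical placeholders.

```python
# Minimal sketch: check a new satellite image for solar panels with a trained
# Amazon Rekognition Custom Labels model.
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_custom_labels(
    ProjectVersionArn=(
        "arn:aws:rekognition:us-east-1:123456789012:project/"
        "solar-panels/version/solar-panels.2024-12-01/1234567890123"
    ),
    Image={"S3Object": {"Bucket": "satellite-images", "Name": "house-0001.png"}},
    MinConfidence=80,
)

for label in response["CustomLabels"]:
    print(label["Name"], round(label["Confidence"], 1))
```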
Question # 47
A finance company needs to forecast the price of a commodity. The company has compiled a dataset of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset and must validate the efficacy of those models on the remaining 20% of the dataset. What should the data scientist split the dataset into a training dataset and a validation dataset to compare model performance?
A. Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
B. Pick a date so that 80% of the data points occur after the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.
C. Starting from the earliest date in the dataset, pick eight data points for the training dataset and two data points for the validation dataset. Repeat this stratified sampling until no data points remain.
D. Sample data points randomly without replacement so that 80% of the data points are in the training dataset. Assign all the remaining data points to the validation dataset.
Answer: A
Explanation:
The best way to split the dataset is to pick a date such that 80% of the data points precede it, assign those points to the training dataset, and assign the remaining 20% to the validation dataset. This preserves the temporal order of the data, so the models are trained on the past and validated on the most recent prices, which mirrors how a forecasting model is used in practice. Random sampling (option D) and alternating stratified sampling (option C) leak future information into training, and training on the most recent 80% while validating on the oldest 20% (option B) reverses the time order and does not reflect how forecasts are made.
References:
Time Series Forecasting - Amazon SageMaker
Time Series Splitting - scikit-learn
Time Series Forecasting - Towards Data Science
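A chronological split like option A is straightforward to implement; the following pandas sketch (with a hypothetical CSV path and column names) picks the cutoff date so that 80% of the rows precede it.

```python
# Minimal sketch: chronological 80/20 split of a daily price series so the
# validation window always comes after the training window.
import pandas as pd

prices = pd.read_csv("daily_prices.csv", parse_dates=["date"]).sort_values("date")

split_idx = int(len(prices) * 0.8)
cutoff_date = prices.iloc[split_idx]["date"]

train = prices[prices["date"] < cutoff_date]
validation = prices[prices["date"] >= cutoff_date]

print(f"train: {train['date'].min()} .. {train['date'].max()}")
print(f"validation: {validation['date'].min()} .. {validation['date'].max()}")
```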
Question # 48
A chemical company has developed several machine learning (ML) solutions to identify chemical process abnormalities. The time series values of independent variables and the labels are available for the past 2 years and are sufficient to accurately model the problem. The regular operation label is marked as 0. The abnormal operation label is marked as 1. Process abnormalities have a significant negative effect on the company's profits. The company must avoid these abnormalities. Which metrics will indicate an ML solution that will provide the GREATEST probability of detecting an abnormality?
A. Precision = 0.91, Recall = 0.6
B. Precision = 0.61, Recall = 0.98
C. Precision = 0.7, Recall = 0.9
D. Precision = 0.98, Recall = 0.8
Answer: B
Explanation:
Precision is TP / (TP + FP) and recall is TP / (TP + FN), where TP, FP, and FN are true positives, false positives, and false negatives. High precision means few false alarms; high recall means that most of the actual abnormalities are detected. Because process abnormalities (label 1) have a significant negative effect on profits, the company needs the solution with the highest recall for the positive class so that as few abnormalities as possible are missed. Option B has the highest recall (0.98), meaning it detects 98% of the abnormalities and misses only 2%, while its precision of 0.61 (a 39% false alarm rate) may be acceptable depending on the cost-benefit trade-off. The other options have lower recall, so they miss more abnormalities, which is more costly for the company than extra false alarms.
References:
AWS Certified Machine Learning - Specialty Exam Guide
AWS Training - Machine Learning on AWS
AWS Whitepaper - An Overview of Machine Learning on AWS
Precision and recall
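The two metrics are easy to compute; the sketch below uses scikit-learn on a tiny hypothetical set of labels to show how precision and recall follow from the confusion counts.

```python
# Minimal sketch: precision and recall for the abnormality class (label 1).
from sklearn.metrics import precision_score, recall_score

y_true = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]   # actual labels (hypothetical)
y_pred = [0, 1, 1, 1, 1, 0, 1, 0, 0, 1]   # model predictions (hypothetical)

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
print(f"precision={precision:.2f} recall={recall:.2f}")
```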
Question # 49
A machine learning (ML) specialist uploads 5 TB of data to an Amazon SageMaker Studio environment. The ML specialist performs initial data cleansing. Before the ML specialist begins to train a model, the ML specialist needs to create and view an analysis report that details potential bias in the uploaded data. Which combination of actions will meet these requirements with the LEAST operational overhead? (Choose two.)
A. Use SageMaker Clarify to automatically detect data bias.
B. Turn on the bias detection option in SageMaker Ground Truth to automatically analyze data features.
C. Use SageMaker Model Monitor to generate a bias drift report.
D. Configure SageMaker Data Wrangler to generate a bias report.
E. Use SageMaker Experiments to perform a data check.
Answer: A,D
Explanation:
The combination with the least operational overhead is to use SageMaker Clarify to automatically detect data bias and to configure SageMaker Data Wrangler to generate a bias report. SageMaker Clarify can detect potential bias during data preparation, after model training, and in a deployed model, and it produces a detailed report that quantifies different types of potential bias (for example, bias related to age). SageMaker Data Wrangler can identify potential bias during data preparation without custom code: you specify input features such as gender or age, Data Wrangler runs an analysis job, and it returns a visual report describing the bias metrics and measurements so you can plan remediation. The other options either require more customization (SageMaker Model Monitor, SageMaker Experiments) or do not detect data bias at all (SageMaker Ground Truth has no bias detection option).
References:
Bias Detection and Model Explainability - Amazon Web Services
Amazon SageMaker Data Wrangler - Amazon Web Services
Question # 50
A company uses sensors on devices such as motor engines and factory machines to measure parameters such as temperature and pressure. The company wants to use the sensor data to predict equipment malfunctions and reduce service outages. A machine learning (ML) specialist needs to gather the sensor data to train a model to predict device malfunctions. The ML specialist must ensure that the data does not contain outliers before training the model. How can the ML specialist meet these requirements with the LEAST operational overhead?
A. Load the data into an Amazon SageMaker Studio notebook. Calculate the first and third quartile. Use a SageMaker Data Wrangler data flow to remove only values that are outside of those quartiles.
B. Use an Amazon SageMaker Data Wrangler bias report to find outliers in the dataset. Use a Data Wrangler data flow to remove outliers based on the bias report.
C. Use an Amazon SageMaker Data Wrangler anomaly detection visualization to find outliers in the dataset. Add a transformation to a Data Wrangler data flow to remove outliers.
D. Use Amazon Lookout for Equipment to find and remove outliers from the dataset.
Answer: C
Explanation:
Amazon SageMaker Data Wrangler helps data scientists and ML developers prepare data for ML. Its anomaly detection visualization uses an unsupervised algorithm to identify outliers in the dataset based on statistical properties, so the ML specialist can quickly explore the sensor data and find anomalous values that could hurt model performance. The specialist can then add a transformation to a Data Wrangler data flow to remove the outliers, and the flow can be exported as a script or pipeline to automate the data preparation. This requires the least operational overhead: option A requires manual quartile calculations in a notebook, option B misuses the bias report (which detects bias, not outliers), and option D uses Amazon Lookout for Equipment, which is built to detect anomalies in live equipment data rather than to clean a training dataset.
References:
Amazon SageMaker Data Wrangler - Amazon Web Services (AWS)
Anomaly Detection Visualization - Amazon SageMaker
Transform Data - Amazon SageMaker
Question # 51
A data scientist wants to use Amazon Forecast to build a forecasting model for inventory demand for a retail company. The company has provided a dataset of historic inventory demand for its products as a .csv file stored in an Amazon S3 bucket. The table below shows a sample of the dataset.
How should the data scientist transform the data?
A. Use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. Upload both datasets as .csv files to Amazon S3.
B. Use a Jupyter notebook in Amazon SageMaker to separate the dataset into a related time series dataset and an item metadata dataset. Upload both datasets as tables in Amazon Aurora.
C. Use AWS Batch jobs to separate the dataset into a target time series dataset, a related time series dataset, and an item metadata dataset. Upload them directly to Forecast from a local machine.
D. Use a Jupyter notebook in Amazon SageMaker to transform the data into the optimized protobuf recordIO format. Upload the dataset in this format to Amazon S3.
Answer: A
Explanation:
Amazon Forecast requires the input data to be in a specific format. The data scientist should use ETL jobs in AWS Glue to separate the dataset into a target time series dataset and an item metadata dataset. The target time series dataset contains the timestamp, item_id, and demand columns, while the item metadata dataset contains the item_id, category, and lead_time columns. Both datasets should be uploaded as .csv files to Amazon S3, which is where Forecast imports data from. The other options either load the data into stores that Forecast does not import from (Amazon Aurora, a local machine) or use the protobuf recordIO format, which Forecast does not accept.
References:
How Amazon Forecast Works - Amazon Forecast
Choosing Datasets - Amazon Forecast
Question # 52
The chief editor for a product catalog wants the research and development team to build a machine learning system that can be used to detect whether or not individuals in a collection of images are wearing the company's retail brand. The team has a set of training data. Which machine learning algorithm should the researchers use that BEST meets their requirements?
A. Latent Dirichlet Allocation (LDA)
B. Recurrent neural network (RNN)
C. K-means
D. Convolutional neural network (CNN)
Answer: D
Explanation:
Detecting whether individuals in a collection of images are wearing the company's retail brand is an image recognition task, and convolutional neural networks (CNNs) are the best-suited algorithm. A CNN learns to extract visual features through stacked convolution, pooling, and activation layers and can handle variation in size, shape, color, and orientation, producing a high-level representation that can be used for classification or detection.
Option A is incorrect because latent Dirichlet allocation (LDA) is a topic-modeling algorithm for text collections and does not preserve the spatial structure of image pixels. Option B is incorrect because recurrent neural networks (RNNs) are designed for sequential data such as text, speech, or time series rather than for spatial patterns in images. Option C is incorrect because k-means is an unsupervised clustering algorithm and does not classify or detect objects in images.
References:
Image Recognition Software - ML Image & Video Analysis - Amazon …
Image classification and object detection using Amazon Rekognition …
AWS Amazon Rekognition - Deep Learning Face and Image Recognition …
GitHub - awslabs/aws-ai-solution-kit: Machine Learning APIs for common …
Meet iNaturalist, an AWS-powered nature app that helps you identify …
Question # 53
A wildlife research company has a set of images of lions and cheetahs. The company created a dataset of the images. The company labeled each image with a binary label that indicates whether an image contains a lion or a cheetah. The company wants to train a model to identify whether new images contain a lion or a cheetah. Which Amazon SageMaker algorithm will meet this requirement?
A. XGBoost
B. Image Classification - TensorFlow
C. Object Detection - TensorFlow
D. Semantic segmentation - MXNet
Answer: B
Explanation:
The best algorithm for this task is Image Classification - TensorFlow, a supervised built-in SageMaker algorithm that supports transfer learning with many pretrained models from TensorFlow Hub. The company can fine-tune a pretrained model such as MobileNet, ResNet, or Inception on its labeled lion/cheetah dataset, even without a very large amount of image data. The algorithm takes an image as input and outputs a probability for each class label, and it supports distributed training, data augmentation, and hyperparameter tuning. Object detection and semantic segmentation are designed to locate or segment objects within an image rather than to label the whole image, and XGBoost is not an image algorithm.
References:
Image Classification - TensorFlow - Amazon SageMaker
Amazon SageMaker Provides New Built-in TensorFlow Image Classification Algorithm
Image Classification with ResNet :: Amazon SageMaker Workshop
Image classification on Amazon SageMaker | by Julien Simon - Medium
Question # 54
A company’s data scientist has trained a new machine learning model that performs better on test data than the company’s existing model performs in the production environment. The data scientist wants to replace the existing model that runs on an Amazon SageMaker endpoint in the production environment. However, the company is concerned that the new model might not work well on the production environment data. The data scientist needs to perform A/B testing in the production environment to evaluate whether the new model performs well on production environment data. Which combination of steps must the data scientist take to perform the A/B testing? (Choose two.)
A. Create a new endpoint configuration that includes a production variant for each of the two models.
B. Create a new endpoint configuration that includes two target variants that point to different endpoints.
C. Deploy the new model to the existing endpoint.
D. Update the existing endpoint to activate the new model.
E. Update the existing endpoint to use the new endpoint configuration.
Answer: A,E
Explanation:
To run the A/B test, the data scientist must create a new endpoint configuration that includes a production variant for each of the two models and then update the existing endpoint to use the new configuration. Both models are then deployed behind the same endpoint, and inference traffic is split between them according to the specified variant weights.
Amazon SageMaker supports A/B testing by allowing multiple production variants on a single endpoint. Each production variant specifies a name, a model, an instance type, an initial instance count, and an initial weight; the weight determines the percentage of requests the variant receives (for example, two variants with weights of 0.5 each handle 50% of the traffic). Production variants can be used to compare models trained with different datasets, algorithms, or frameworks, or to compare instance types.
The data scientist first creates the new endpoint configuration (through the SageMaker console, the AWS CLI, or the SDKs), specifying both production variants, and then updates the existing endpoint to use it. SageMaker provisions the new configuration and switches over without affecting endpoint availability or scalability.
The other options are incorrect or unnecessary. An endpoint configuration cannot contain target variants that point to different endpoints; target variants are only used to invoke a specific variant on an endpoint. Deploying the new model to the existing endpoint would replace the existing model rather than run the two side by side, and there is no "activate" operation for a model on an endpoint.
References:
A/B Testing ML models in production using Amazon SageMaker | AWS Machine Learning Blog
Create an Endpoint Configuration - Amazon SageMaker
Update an Endpoint - Amazon SageMaker
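The two steps map directly onto two SageMaker API calls, as the boto3 sketch below illustrates; the model names, endpoint name, and configuration name are hypothetical placeholders.

```python
# Minimal sketch: A/B test two models on one SageMaker endpoint by creating a
# new endpoint configuration with two production variants and updating the
# existing endpoint to use it.
import boto3

sagemaker = boto3.client("sagemaker")

sagemaker.create_endpoint_config(
    EndpointConfigName="loan-model-ab-config",
    ProductionVariants=[
        {
            "VariantName": "current-model",
            "ModelName": "loan-model-v1",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,   # ~50% of traffic
        },
        {
            "VariantName": "candidate-model",
            "ModelName": "loan-model-v2",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 0.5,   # ~50% of traffic
        },
    ],
)

# Switching the live endpoint to the new configuration deploys both variants
# side by side without downtime.
sagemaker.update_endpoint(
    EndpointName="loan-approval-endpoint",
    EndpointConfigName="loan-model-ab-config",
)
```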
Question # 55
A data science team is working with a tabular dataset that the team stores in Amazon S3. The team wants to experiment with different feature transformations such as categorical feature encoding. Then the team wants to visualize the resulting distribution of the dataset. After the team finds an appropriate set of feature transformations, the team wants to automate the workflow for feature transformations. Which solution will meet these requirements with the MOST operational efficiency?
A. Use Amazon SageMaker Data Wrangler preconfigured transformations to explore feature transformations. Use SageMaker Data Wrangler templates for visualization. Export the feature processing workflow to a SageMaker pipeline for automation.
B. Use an Amazon SageMaker notebook instance to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
C. Use AWS Glue Studio with custom code to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package the feature processing steps into an AWS Lambda function for automation.
D. Use Amazon SageMaker Data Wrangler preconfigured transformations to experiment with different feature transformations. Save the transformations to Amazon S3. Use Amazon QuickSight for visualization. Package each feature transformation step into a separate AWS Lambda function. Use AWS Step Functions for workflow automation.
Answer: A
Explanation:
Solution A is the most operationally efficient because Amazon SageMaker Data Wrangler covers all three requirements.
Explore feature transformations with Data Wrangler preconfigured transformations. Data Wrangler provides a visual interface for applying transformations to tabular data, such as encoding categorical features, scaling numerical features, and imputing missing values, and it also supports custom transformations written in Python or SQL.
Visualize the results with Data Wrangler templates. Data Wrangler can generate histograms, scatter plots, box plots, and other visualizations that show the distribution of the dataset and make it easy to compare the effects of different transformations.
Automate the workflow by exporting it to a SageMaker pipeline. Data Wrangler can export the feature processing steps as a SageMaker pipeline, which runs them as a preprocessing step ahead of training or inference, keeping the workflow consistent and reproducible with little operational overhead.
Option B requires writing code for the transformations, storage, visualization, and an AWS Lambda function, and Lambda's execution time, memory, and package size limits may not accommodate complex feature processing. Option C uses AWS Glue Studio, which provides no preconfigured feature engineering transformations or visualization templates and is not integrated with SageMaker pipelines, so custom code is still required. Option D splits each transformation into a separate Lambda function orchestrated by AWS Step Functions, which increases the number of components to manage and the complexity and cost of the solution.
References:
Amazon SageMaker Data Wrangler
Amazon SageMaker Pipelines
AWS Lambda
AWS Glue Studio
AWS Step Functions
Question # 56
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models. What should the Specialist do to initialize the model to re-train it with the custom data?
A. Initialize the model with random weights in all layers including the last fully connected layer.
B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
C. Initialize the model with random weights in all layers and replace the last fully connected layer.
D. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
Answer: B
Explanation:
Transfer learning uses a model trained for one task as the starting point for a different task. For image classification, the common practice is to take a model pre-trained on a large, general dataset such as ImageNet and customize it: replace the last fully connected layer, which performs the final classification, with a new layer that has one unit per class in the new task (the vehicle makes and models). The earlier layers keep their pre-trained weights because the generic visual features they have learned transfer well to the new task, while the new output layer is initialized with random weights and learns to map those features to the vehicle classes. This approach is often called feature extraction.
References:
Transfer learning and fine-tuning
Deep transfer learning for image classification: a survey
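A minimal Keras sketch of option B (assuming TensorFlow is installed) is shown below: the pre-trained ResNet-50 backbone is kept, its weights are frozen, and the original top layer is replaced with a new fully connected classifier sized for the vehicle classes. The class count and the commented training call are hypothetical.

```python
# Illustrative sketch of transfer learning with a frozen pre-trained backbone
# and a new fully connected output layer.
import tensorflow as tf

NUM_VEHICLE_CLASSES = 120   # hypothetical number of make/model classes

base = tf.keras.applications.ResNet50(
    weights="imagenet",      # pre-trained weights in all retained layers
    include_top=False,       # drop the original 1000-class output layer
    input_shape=(224, 224, 3),
)
base.trainable = False       # keep the pre-trained feature extractor fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_VEHICLE_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_dataset, validation_data=val_dataset, epochs=5)
```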
Question # 57
A retail company is ingesting purchasing records from its network of 20,000 stores to Amazon S3 by using Amazon Kinesis Data Firehose. The company uses a small, server-based application in each store to send the data to AWS over the internet. The company uses this data to train a machine learning model that is retrained each day. The company's data science team has identified existing attributes on these records that could be combined to create an improved model. Which change will create the required transformed records with the LEAST operational overhead?
A. Create an AWS Lambda function that can transform the incoming records. Enable data transformation on the ingestion Kinesis Data Firehose delivery stream. Use the Lambda function as the invocation target.
B. Deploy an Amazon EMR cluster that runs Apache Spark and includes the transformation logic. Use Amazon EventBridge (Amazon CloudWatch Events) to schedule an AWS Lambda function to launch the cluster each day and transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.
C. Deploy an Amazon S3 File Gateway in the stores. Update the in-store software to deliver data to the S3 File Gateway. Use a scheduled daily AWS Glue job to transform the data that the S3 File Gateway delivers to Amazon S3.
D. Launch a fleet of Amazon EC2 instances that include the transformation logic. Configure the EC2 instances with a daily cron job to transform the records that accumulate in Amazon S3. Deliver the transformed records to Amazon S3.
Answer: A
Explanation:
Solution A creates the transformed records with the least operational overhead because it uses two fully managed services.
Create an AWS Lambda function that transforms the incoming records. Lambda runs code without provisioning or managing servers, so it can apply the transformation logic and add the new combined attributes to each purchasing record.
Enable data transformation on the ingestion Kinesis Data Firehose delivery stream and set the Lambda function as the invocation target. Firehose invokes the function on the incoming records before delivering them to Amazon S3, so the transformation happens in-stream with no extra infrastructure and no daily delay.
Option B requires managing an Amazon EMR cluster, a Spark application, a Lambda function, and an EventBridge rule, and it transforms the data only once a day. Option C requires managing an S3 File Gateway, changes to the in-store software, and a daily AWS Glue job, again introducing a daily delay. Option D requires managing a fleet of EC2 instances, the transformation code, and cron jobs, and it also runs only once a day.
References:
AWS Lambda
Amazon Kinesis Data Firehose
Amazon EMR
Amazon S3 File Gateway
Amazon EC2
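The Firehose transformation Lambda follows a fixed request/response contract: it receives base64-encoded records and must return each record with its recordId, a result status, and the transformed, re-encoded data. The sketch below shows that contract with a hypothetical combined attribute and hypothetical field names.

```python
# Minimal sketch: Lambda handler used as a Kinesis Data Firehose data
# transformation target. Each record is decoded, enriched, and returned
# with result "Ok".
import base64
import json

def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        payload = json.loads(base64.b64decode(record["data"]))

        # Combine existing attributes into the new feature the data science
        # team identified (hypothetical example).
        payload["basket_value_per_item"] = (
            payload["total_amount"] / max(payload["item_count"], 1)
        )

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}
```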
Question # 58
A company wants to enhance audits for its machine learning (ML) systems. The auditing system must be able to perform metadata analysis on the features that the ML models use. The audit solution must generate a report that analyzes the metadata. The solution also must be able to set the data sensitivity and authorship of features. Which solution will meet these requirements with the LEAST development effort?
A. Use Amazon SageMaker Feature Store to select the features. Create a data flow to perform feature-level metadata analysis. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
B. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use SageMaker Studio to analyze the metadata.
C. Use Amazon SageMaker Feature Store to apply custom algorithms to analyze the feature-level metadata that the company requires. Create an Amazon DynamoDB table to store feature-level metadata. Use Amazon QuickSight to analyze the metadata.
D. Use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use. Assign the required metadata for each feature. Use Amazon QuickSight to analyze the metadata.
Answer: D

Explanation: The solution with the least development effort is to use Amazon SageMaker Feature Store to set feature groups for the current features that the ML models use, assign the required metadata for each feature, and use Amazon QuickSight to analyze the metadata. This approach relies entirely on existing AWS services for feature-level metadata analysis and reporting.

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, update, search, and share machine learning (ML) features. It provides feature management capabilities such as feature reuse, low-latency serving, time travel, and consistency between features used in training and inference workflows. A feature group is a logical grouping of ML features whose organization and structure is defined by a feature group schema, which lists the name, type, and metadata of each feature. The metadata can include information such as data sensitivity, authorship, description, and parameters, which makes features discoverable, understandable, and traceable. Feature groups and feature-level metadata can be managed with the AWS SDK for Python (Boto3), the AWS Command Line Interface (AWS CLI), or Amazon SageMaker Studio.

Amazon QuickSight is a fully managed, serverless business intelligence service for creating and publishing interactive dashboards that can include ML insights. It can connect to sources such as Amazon S3, Amazon Athena, Amazon Redshift, and Amazon SageMaker Feature Store, analyze the data with standard SQL or built-in ML-powered analytics, and produce rich visualizations and reports that can be securely shared inside or outside an organization. QuickSight can therefore analyze the feature metadata stored in Feature Store and generate the required report.

The other options are either more complex or less effective. Creating a data flow to perform feature-level metadata analysis would require additional steps and resources and may not capture all the metadata attributes that the company requires. Creating an Amazon DynamoDB table to store feature-level metadata would introduce redundancy and inconsistency, because the metadata is already stored in Amazon SageMaker Feature Store. Using SageMaker Studio alone to analyze the metadata would not generate a report that can be easily shared and accessed by the company.

References:
1: Amazon SageMaker Feature Store – Amazon Web Services
2: Amazon QuickSight – Business Intelligence Service – Amazon Web Services
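As an illustration of how feature-level metadata such as sensitivity and authorship can be assigned and read back, the following Boto3 sketch uses the UpdateFeatureMetadata and DescribeFeatureMetadata APIs. The feature group name, feature name, and parameter keys are placeholders, not values from the question.

    import boto3

    sagemaker = boto3.client("sagemaker")

    # Attach audit metadata (description plus key-value parameters such as data
    # sensitivity and authorship) to an existing feature in a feature group.
    sagemaker.update_feature_metadata(
        FeatureGroupName="customer-orders",       # placeholder feature group
        FeatureName="credit_limit",               # placeholder feature
        Description="Maximum approved credit for the customer",
        ParameterAdditions=[
            {"Key": "sensitivity", "Value": "confidential"},
            {"Key": "author", "Value": "risk-team"},
        ],
    )

    # The stored metadata can then be inspected or exported for reporting.
    response = sagemaker.describe_feature_metadata(
        FeatureGroupName="customer-orders",
        FeatureName="credit_limit",
    )
    print(response.get("Description"), response.get("Parameters"))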
Question # 59
A company's machine learning (ML) specialist is building a computer vision model to classify 10 different traffic signs. The company has stored 100 images of each class in Amazon S3, and the company has another 10,000 unlabeled images. All the images come from dash cameras and are 224 x 224 pixels in size. After several training runs, the model is overfitting on the training data. Which actions should the ML specialist take to address this problem? (Select TWO.)
A. Use Amazon SageMaker Ground Truth to label the unlabeled images.
B. Use image preprocessing to transform the images into grayscale images.
C. Use data augmentation to rotate and translate the labeled images.
D. Replace the activation of the last layer with a sigmoid.
E. Use the Amazon SageMaker k-nearest neighbors (k-NN) algorithm to label the unlabeled images.
Answer: C,E

Explanation: Data augmentation increases the size and diversity of the training data by applying random transformations such as rotation, translation, scaling, and flipping to the labeled images. This reduces overfitting and improves the generalization of the model, and it can be applied either as a preprocessing step or through the augmentation options of the Amazon SageMaker image classification algorithm.

The Amazon SageMaker k-nearest neighbors (k-NN) algorithm is a supervised learning algorithm that can label unlabeled data based on similarity to the labeled data: it assigns a label to an unlabeled instance by finding the k closest labeled instances in the feature space and taking a majority vote among their labels. Labeling the 10,000 unlabeled images this way increases the size and diversity of the training data and reduces overfitting. The k-NN algorithm can be combined with the image classification workflow by extracting features from the images with a pre-trained model and applying k-NN to the feature vectors.

Using Amazon SageMaker Ground Truth to label the unlabeled images is not a good option because it is a manual and costly process that requires human annotators, and it does not by itself address overfitting on the existing labeled data. Transforming the images into grayscale reduces the amount of information and variation in the images, which can degrade model performance, and it also does not address overfitting. Replacing the activation of the last layer with a sigmoid is not suitable for a multi-class classification problem: a sigmoid outputs a single probability between 0 and 1, whereas a 10-class problem needs a vector of probabilities that sum to 1, which a softmax activation provides.

References:
1: Image classification algorithm - Amazon SageMaker
2: k-nearest neighbors (k-NN) algorithm - Amazon SageMaker
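To make the augmentation idea concrete, here is a small Keras sketch (independent of the SageMaker built-in algorithm) that applies random rotations, translations, and flips to the labeled 224 x 224 images at training time. The directory layout and parameter values are illustrative assumptions.

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # Random rotations, shifts, and flips applied on the fly to the labeled images.
    datagen = ImageDataGenerator(
        rotation_range=15,        # rotate up to +/-15 degrees
        width_shift_range=0.1,    # translate horizontally up to 10%
        height_shift_range=0.1,   # translate vertically up to 10%
        horizontal_flip=True,
        rescale=1.0 / 255,
    )

    train_iter = datagen.flow_from_directory(
        "data/train",             # placeholder: one sub-folder per traffic-sign class
        target_size=(224, 224),
        batch_size=32,
        class_mode="categorical",
    )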
Question # 60
An online retailer collects the following data on customer orders: demographics, behaviors, location, shipment progress, and delivery time. A data scientist joins all the collected datasets. The result is a single dataset that includes 980 variables. The data scientist must develop a machine learning (ML) model to identify groups of customers who are likely to respond to a marketing campaign. Which combination of algorithms should the data scientist use to meet this requirement? (Select TWO.)
A. Latent Dirichlet Allocation (LDA)
B. K-means
C. Semantic segmentation
D. Principal component analysis (PCA)
E. Factorization machines (FM)
Answer: B,D

Explanation: The data scientist should use K-means and principal component analysis (PCA) to meet this requirement. K-means is a clustering algorithm that can group customers based on their similarity in the feature space. PCA is a dimensionality reduction technique that can transform the original 980 variables into a smaller set of uncorrelated variables that capture most of the variance in the data. This reduces the computational cost and noise in the data and improves the performance of the clustering algorithm.

References:
Clustering - Amazon SageMaker
Dimensionality Reduction - Amazon SageMaker
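A minimal scikit-learn sketch of this two-step approach is shown below; the synthetic array simply stands in for the joined 980-variable customer dataset, and the component and cluster counts are illustrative.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # Placeholder data standing in for the joined customer dataset (980 columns).
    rng = np.random.default_rng(42)
    X = rng.normal(size=(2000, 980))

    X_scaled = StandardScaler().fit_transform(X)

    # Keep enough components to explain ~95% of the variance.
    pca = PCA(n_components=0.95, random_state=42)
    X_reduced = pca.fit_transform(X_scaled)

    # Cluster customers into candidate marketing segments.
    kmeans = KMeans(n_clusters=8, n_init=10, random_state=42)
    segments = kmeans.fit_predict(X_reduced)
    print(X_reduced.shape, np.bincount(segments))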
Question # 61
A data engineer needs to provide a team of data scientists with the appropriate dataset to run machine learning training jobs. The data will be stored in Amazon S3. The data engineer is obtaining the data from an Amazon Redshift database and is using join queries to extract a single tabular dataset. A portion of the schema is as follows:

TransactionTimestamp (Timestamp)
CardName (Varchar)
CardNo (Varchar)

The data engineer must provide the data so that any row with a CardNo value of NULL is removed. Also, the TransactionTimestamp column must be separated into a TransactionDate column and a TransactionTime column. Finally, the CardName column must be renamed to NameOnCard. The data will be extracted on a monthly basis and will be loaded into an S3 bucket. The solution must minimize the effort that is needed to set up infrastructure for the ingestion and transformation. The solution must be automated and must minimize the load on the Amazon Redshift cluster. Which solution meets these requirements?
A. Set up an Amazon EMR cluster. Create an Apache Spark job to read the data from the Amazon Redshift cluster and transform the data. Load the data into the S3 bucket. Schedule the job to run monthly.
B. Set up an Amazon EC2 instance with a SQL client tool, such as SQL Workbench/J, to query the data from the Amazon Redshift cluster directly. Export the resulting dataset into a CSV file. Upload the file into the S3 bucket. Perform these tasks monthly.
C. Set up an AWS Glue job that has the Amazon Redshift cluster as the source and the S3 bucket as the destination. Use the built-in transforms Filter, Map, and RenameField to perform the required transformations. Schedule the job to run monthly.
D. Use Amazon Redshift Spectrum to run a query that writes the data directly to the S3 bucket. Create an AWS Lambda function to run the query monthly.
Answer: C

Explanation: The best solution is to set up an AWS Glue job with the Amazon Redshift cluster as the source and the S3 bucket as the destination, and to use the built-in transforms Filter, Map, and RenameField to perform the required transformations. This solution has the following advantages:

- It minimizes the effort needed to set up infrastructure, because AWS Glue is a fully managed service that provides a serverless Apache Spark environment, a graphical interface to define data sources and targets, and code generation to create and edit scripts.
- It automates the extraction and transformation process: AWS Glue can schedule the job to run monthly and handles the connection, authentication, and configuration of the Amazon Redshift cluster and the S3 bucket.
- It minimizes the load on the Amazon Redshift cluster, because AWS Glue reads the data from the cluster in parallel over a JDBC connection that supports SSL encryption.
- It performs the required transformations: Filter removes the rows with NULL CardNo values, Map splits the timestamp column into date and time columns, and RenameField renames the card name column.

The other solutions have drawbacks:

- Option A: Setting up an Amazon EMR cluster and writing an Apache Spark job requires more effort and resources to provision, configure, and manage the cluster and to write and maintain the Spark code.
- Option B: Querying the data from an EC2 instance with a SQL client tool and manually exporting and uploading a CSV file is neither scalable nor reliable, because it depends on the availability and performance of the EC2 instance and on manual execution of the queries and uploads.
- Option D: Amazon Redshift Spectrum is designed for reading data from external tables in Amazon S3, not for writing transformed extracts to an S3 bucket, so it does not fit this workflow.

References:
1: What Is AWS Glue? - AWS Glue
2: Populating the Data Catalog - AWS Glue
3: Best Practices When Using AWS Glue with Amazon Redshift - AWS Glue
4: Built-In Transforms - AWS Glue
5: What Is Amazon EMR? - Amazon EMR
6: Amazon EC2 - Amazon Web Services (AWS)
7: Using Amazon Redshift Spectrum to Query External Data - Amazon Redshift
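The following AWS Glue PySpark sketch shows how the three built-in transforms map to the stated requirements. The Glue database, table, and S3 path names are placeholders, and the timestamp-splitting logic is a simplifying assumption about the stored format.

    import sys
    from awsglue.transforms import Filter, Map, RenameField
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME", "TempDir"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read the joined table cataloged from Amazon Redshift (placeholder names).
    orders = glue_context.create_dynamic_frame.from_catalog(
        database="finance_db",
        table_name="redshift_transactions",
        redshift_tmp_dir=args["TempDir"],
    )

    # Filter: drop rows where CardNo is NULL.
    orders = Filter.apply(frame=orders, f=lambda row: row["CardNo"] is not None)

    # Map: split TransactionTimestamp into TransactionDate and TransactionTime.
    def split_timestamp(record):
        ts = str(record["TransactionTimestamp"])
        record["TransactionDate"], _, record["TransactionTime"] = ts.partition(" ")
        return record

    orders = Map.apply(frame=orders, f=split_timestamp)

    # RenameField: CardName becomes NameOnCard.
    orders = RenameField.apply(frame=orders, old_name="CardName", new_name="NameOnCard")

    glue_context.write_dynamic_frame.from_options(
        frame=orders,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/monthly-extract/"},
        format="parquet",
    )
    job.commit()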
Question # 62
A manufacturing company wants to create a machine learning (ML) model to predict when equipment is likely to fail. A data science team already constructed a deep learning model by using TensorFlow and a custom Python script in a local environment. The company wants to use Amazon SageMaker to train the model. Which TensorFlow estimator configuration will train the model MOST cost-effectively?
A. Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
B. Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Turn on managed spot training by setting the use_spot_instances parameter to True. Pass the script to the estimator in the call to the TensorFlow fit() method.
C. Adjust the training script to use distributed data parallelism. Specify appropriate values for the distribution parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
D. Turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter. Set the MaxWaitTimeInSeconds parameter to be equal to the MaxRuntimeInSeconds parameter. Pass the script to the estimator in the call to the TensorFlow fit() method.
Answer: B

Explanation: The most cost-effective configuration is to turn on SageMaker Training Compiler by adding compiler_config=TrainingCompilerConfig() as a parameter, turn on managed spot training by setting the use_spot_instances parameter to True, and pass the script to the estimator in the call to the TensorFlow fit() method. This accelerates the training job, reduces the training cost by using Amazon EC2 Spot Instances, and reuses the custom Python script without modification.

SageMaker Training Compiler is a feature of Amazon SageMaker that accelerates the training of deep learning models such as TensorFlow and PyTorch models by compiling the training job with graph-level and operator-level optimizations so that it uses the underlying GPU instances more efficiently. Faster training on the same instances translates directly into lower training cost. It is enabled by adding compiler_config=TrainingCompilerConfig() to the estimator constructor.

Managed spot training is another feature of Amazon SageMaker that runs training jobs on Amazon EC2 Spot Instances, which use unused EC2 capacity in the AWS Cloud and are available at up to a 90% discount compared to On-Demand prices. It is enabled by setting the use_spot_instances parameter to True and specifying the max_wait and max_run parameters in the estimator constructor.

The TensorFlow estimator in the SageMaker Python SDK trains and deploys TensorFlow models on SageMaker and can run an existing Python script unchanged. The script is passed as the entry point, and the fit() method starts a SageMaker training job with the location of the input data.

The other options are either less cost-effective or more complex. Adjusting the training script for distributed data parallelism would require modifying the script and specifying appropriate values for the distribution parameter, which increases development time and complexity. Setting MaxWaitTimeInSeconds equal to MaxRuntimeInSeconds only bounds the duration of the training job and does not by itself reduce cost.

References:
1: Optimize TensorFlow, PyTorch, and MXNet models for deployment using Amazon SageMaker Training Compiler | AWS Machine Learning Blog
2: Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs | AWS Machine Learning Blog
3: sagemaker.tensorflow — sagemaker 2.66.0 documentation
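A hedged sketch of such an estimator, assuming a recent version of the SageMaker Python SDK and using placeholder role, framework, and S3 values, might look like this:

    from sagemaker.tensorflow import TensorFlow, TrainingCompilerConfig

    estimator = TensorFlow(
        entry_point="train.py",               # existing custom training script, unchanged
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="2.11",             # assumed supported framework/Python pair
        py_version="py39",
        compiler_config=TrainingCompilerConfig(),  # turn on SageMaker Training Compiler
        use_spot_instances=True,              # managed spot training
        max_run=3600,                         # max training time in seconds
        max_wait=7200,                        # must be >= max_run when using spot
    )

    estimator.fit({"training": "s3://example-bucket/sensor-data/train/"})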
Question # 63
A data scientist obtains a tabular dataset that contains 150 correlated features with different ranges to build a regression model. The data scientist needs to achieve more efficient model training by implementing a solution that minimizes impact on the model's performance. The data scientist decides to perform a principal component analysis (PCA) preprocessing step to reduce the number of features to a smaller set of independent features before the data scientist uses the new features in the regression model. Which preprocessing step will meet these requirements?
A. Use the Amazon SageMaker built-in algorithm for PCA on the dataset to transform the data.
B. Load the data into Amazon SageMaker Data Wrangler. Scale the data with a Min Max Scaler transformation step. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.
C. Reduce the dimensionality of the dataset by removing the features that have the highest correlation. Load the data into Amazon SageMaker Data Wrangler. Perform a Standard Scaler transformation step to scale the data. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.
D. Reduce the dimensionality of the dataset by removing the features that have the lowest correlation. Load the data into Amazon SageMaker Data Wrangler. Perform a Min Max Scaler transformation step to scale the data. Use the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data.
Answer: B

Explanation: Principal component analysis (PCA) is a technique for reducing the dimensionality of datasets, increasing interpretability while minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance. PCA is useful when dealing with datasets that have a large number of correlated features. However, PCA is sensitive to the scale of the features, so it is important to standardize or normalize the data before applying PCA. Amazon SageMaker provides a built-in algorithm for PCA that can be used to transform the data into a lower-dimensional representation. Amazon SageMaker Data Wrangler is a tool that allows data scientists to visually explore, clean, and prepare data for machine learning. Data Wrangler provides various transformation steps that can be applied to the data, such as scaling, encoding, and imputing, and it integrates with SageMaker built-in algorithms, such as PCA, to enable feature engineering and dimensionality reduction.

Therefore, option B is the correct answer: it scales the data with a Min Max Scaler transformation step, which rescales the data to a range of [0, 1], and then uses the SageMaker built-in algorithm for PCA on the scaled dataset to transform the data. Option A is incorrect because it does not scale the data before applying PCA, which can affect the results of the dimensionality reduction. Option C is incorrect because removing the features that have the highest correlation can lead to information loss and reduce the performance of the regression model. Option D is incorrect because removing the features that have the lowest correlation can also lead to information loss and reduce the performance of the regression model.

References:
Principal Component Analysis (PCA) - Amazon SageMaker
Scale data with a Min Max Scaler - Amazon SageMaker Data Wrangler
Use Amazon SageMaker built-in algorithms - Amazon SageMaker Data Wrangler
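The same scale-then-reduce principle can be sketched with scikit-learn for illustration (the exam scenario itself uses Data Wrangler and the SageMaker built-in PCA algorithm). The synthetic data and the component threshold are assumptions.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.decomposition import PCA

    # Synthetic stand-in for 150 correlated features with different ranges.
    rng = np.random.default_rng(0)
    base = rng.normal(size=(1000, 30))
    X = np.hstack([base + 0.05 * rng.normal(size=(1000, 30)) for _ in range(5)])

    # Scale each feature to [0, 1] first, then project onto uncorrelated components.
    pipeline = make_pipeline(MinMaxScaler(), PCA(n_components=0.99, random_state=0))
    X_new = pipeline.fit_transform(X)

    print(X.shape, "->", X_new.shape)   # far fewer, uncorrelated features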
Question # 64
A manufacturing company has structured and unstructured data stored in an Amazon S3 bucket. A Machine Learning Specialist wants to use SQL to run queries on this data. Which solution requires the LEAST effort to be able to query this data?
A. Use AWS Data Pipeline to transform the data and Amazon RDS to run queries.
B. Use AWS Glue to catalogue the data and Amazon Athena to run queries.
C. Use AWS Batch to run ETL on the data and Amazon Aurora to run the queries.
D. Use AWS Lambda to transform the data and Amazon Kinesis Data Analytics to run queries.
Answer: B

Explanation: AWS Glue is a serverless data integration service that can catalogue, clean, enrich, and move data between various data stores. Amazon Athena is an interactive query service that can run SQL queries on data stored in Amazon S3. By using AWS Glue to catalogue the data and Amazon Athena to run queries, the Machine Learning Specialist can leverage the existing data in Amazon S3 without any additional data transformation or loading. This solution requires the least effort compared to the other options, which involve more complex and costly data processing and storage services.

References: AWS Glue, Amazon Athena
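For reference, a query against the cataloged data could be issued from Python roughly as follows; the database, table, query, and result location are placeholders.

    import time
    import boto3

    athena = boto3.client("athena")

    # Start a SQL query against a table created by a Glue crawler (placeholder names).
    query_id = athena.start_query_execution(
        QueryString="SELECT product_id, COUNT(*) AS orders FROM sales GROUP BY product_id",
        QueryExecutionContext={"Database": "manufacturing_catalog"},
        ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    )["QueryExecutionId"]

    # Poll until the query finishes, then read the results.
    while True:
        state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state == "SUCCEEDED":
        rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
        print(rows[:5])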
Question # 65
A Machine Learning Specialist is using Amazon SageMaker to host a model for a highly available customer-facing application. The Specialist has trained a new version of the model, validated it with historical data, and now wants to deploy it to production. To limit any risk of a negative customer experience, the Specialist wants to be able to monitor the model and roll it back, if needed. What is the SIMPLEST approach with the LEAST risk to deploy the model and roll it back, if needed?
A. Create a SageMaker endpoint and configuration for the new model version. Redirect production traffic to the new endpoint by updating the client configuration. Revert traffic to the last version if the model does not perform as expected.
B. Create a SageMaker endpoint and configuration for the new model version. Redirect production traffic to the new endpoint by using a load balancer. Revert traffic to the last version if the model does not perform as expected.
C. Update the existing SageMaker endpoint to use a new configuration that is weighted to send 5% of the traffic to the new variant. Revert traffic to the last version by resetting the weights if the model does not perform as expected.
D. Update the existing SageMaker endpoint to use a new configuration that is weighted to send 100% of the traffic to the new variant. Revert traffic to the last version by resetting the weights if the model does not perform as expected.
Answer: C

Explanation: Updating the existing SageMaker endpoint to use a new configuration that is weighted to send 5% of the traffic to the new variant is the simplest approach with the least risk to deploy the model and roll it back, if needed. SageMaker supports A/B testing with production variants, which lets the Specialist compare the performance of different model variants by sending a portion of the traffic to each variant. The Specialist can monitor the metrics of each variant and adjust the weights accordingly. If the new variant does not perform as expected, the Specialist can revert traffic to the last version by resetting the weights to 100% for the old variant and 0% for the new variant. This way, the Specialist can deploy the model without affecting the customer experience and roll it back easily if needed.

References:
Amazon SageMaker
Deploying models to Amazon SageMaker hosting services
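Shifting and resetting variant weights can be done without redeploying the endpoint, for example with the UpdateEndpointWeightsAndCapacities API. The endpoint and variant names below are placeholders and assume an endpoint config that already defines both variants.

    import boto3

    sm = boto3.client("sagemaker")

    # Send 5% of traffic to the new variant for canary testing (placeholder names).
    sm.update_endpoint_weights_and_capacities(
        EndpointName="prod-endpoint",
        DesiredWeightsAndCapacities=[
            {"VariantName": "model-v1", "DesiredWeight": 95.0},
            {"VariantName": "model-v2", "DesiredWeight": 5.0},
        ],
    )

    # Roll back by resetting the weights if the new variant underperforms.
    sm.update_endpoint_weights_and_capacities(
        EndpointName="prod-endpoint",
        DesiredWeightsAndCapacities=[
            {"VariantName": "model-v1", "DesiredWeight": 100.0},
            {"VariantName": "model-v2", "DesiredWeight": 0.0},
        ],
    )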
Question # 66
A company is building a demand forecasting model based on machine learning (ML). In the development stage, an ML specialist uses an Amazon SageMaker notebook to perform feature engineering during work hours that consumes low amounts of CPU and memory resources. A data engineer uses the same notebook to perform data preprocessing once a day on average that requires very high memory and completes in only 2 hours. The data preprocessing is not configured to use GPU. All the processes are running well on an ml.m5.4xlarge notebook instance. The company receives an AWS Budgets alert that the billing for this month exceeds the allocated budget. Which solution will result in the MOST cost savings?
A. Change the notebook instance type to a memory optimized instance with the same vCPU number as the ml.m5.4xlarge instance has. Stop the notebook when it is not in use. Run both data preprocessing and feature engineering development on that instance.
B. Keep the notebook instance type and size the same. Stop the notebook when it is not in use. Run data preprocessing on a P3 instance type with the same memory as the ml.m5.4xlarge instance by using Amazon SageMaker Processing.
C. Change the notebook instance type to a smaller general purpose instance. Stop the notebook when it is not in use. Run data preprocessing on an ml.r5 instance with the same memory size as the ml.m5.4xlarge instance by using Amazon SageMaker Processing.
D. Change the notebook instance type to a smaller general purpose instance. Stop the notebook when it is not in use. Run data preprocessing on an R5 instance with the same memory size as the ml.m5.4xlarge instance by using the Reserved Instance option.
Answer: C

Explanation: Moving development to a smaller general purpose notebook instance that is stopped when it is not in use, and running the high-memory daily preprocessing as an Amazon SageMaker Processing job on a memory optimized ml.r5 instance, pays for the large instance only during the 2-hour job. A P3 instance (option B) adds GPU capacity that the preprocessing job does not use, so it is less cost-effective.
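A sketch of running the daily preprocessing as a SageMaker Processing job on a memory optimized instance, with placeholder role, script, and S3 paths, could look like this:

    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.processing import ProcessingInput, ProcessingOutput

    processor = SKLearnProcessor(
        framework_version="1.2-1",
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
        instance_type="ml.r5.4xlarge",   # memory optimized, no GPU, billed only while the job runs
        instance_count=1,
    )

    processor.run(
        code="preprocess.py",            # placeholder preprocessing script
        inputs=[ProcessingInput(source="s3://example-bucket/raw/",
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination="s3://example-bucket/processed/")],
    )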
Question # 67
A manufacturing company wants to use machine learning (ML) to automate quality control in its facilities. The facilities are in remote locations and have limited internet connectivity. The company has 20 TB of training data that consists of labeled images of defective product parts. The training data is in the corporate on-premises data center. The company will use this data to train a model for real-time defect detection in new parts as the parts move on a conveyor belt in the facilities. The company needs a solution that minimizes costs for compute infrastructure and that maximizes the scalability of resources for training. The solution also must facilitate the company's use of an ML model in the low-connectivity environments. Which solution will meet these requirements?
A. Move the training data to an Amazon S3 bucket. Train and evaluate the model by using Amazon SageMaker. Optimize the model by using SageMaker Neo. Deploy the model on a SageMaker hosting services endpoint.
B. Train and evaluate the model on premises. Upload the model to an Amazon S3 bucket. Deploy the model on an Amazon SageMaker hosting services endpoint.
C. Move the training data to an Amazon S3 bucket. Train and evaluate the model by using Amazon SageMaker. Optimize the model by using SageMaker Neo. Set up an edge device in the manufacturing facilities with AWS IoT Greengrass. Deploy the model on the edge device.
D. Train the model on premises. Upload the model to an Amazon S3 bucket. Set up an edge device in the manufacturing facilities with AWS IoT Greengrass. Deploy the model on the edge device.
Answer: C

Question # 68

A company is building a predictive maintenance model based on machine learning (ML). The data is stored in a fully private Amazon S3 bucket that is encrypted at rest with AWS Key Management Service (AWS KMS) CMKs. An ML specialist must run data preprocessing by using an Amazon SageMaker Processing job that is triggered from code in an Amazon SageMaker notebook. The job should read data from Amazon S3, process it, and upload it back to the same S3 bucket. The preprocessing code is stored in a container image in Amazon Elastic Container Registry (Amazon ECR). The ML specialist needs to grant permissions to ensure a smooth data preprocessing workflow. Which set of actions should the ML specialist take to meet these requirements?
A. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs, S3 read and write access to the relevant S3 bucket, and appropriate KMS and ECR permissions. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job from the notebook.
B. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Create an Amazon SageMaker Processing job with an IAM role that has read and write permissions to the relevant S3 bucket, and appropriate KMS and ECR permissions.
C. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs and to access Amazon ECR. Attach the role to the SageMaker notebook instance. Set up both an S3 endpoint and a KMS endpoint in the default VPC. Create Amazon SageMaker Processing jobs from the notebook.
D. Create an IAM role that has permissions to create Amazon SageMaker Processing jobs. Attach the role to the SageMaker notebook instance. Set up an S3 endpoint in the default VPC. Create Amazon SageMaker Processing jobs with the access key and secret key of the IAM user with appropriate KMS and ECR permissions.
Answer: A

Explanation: Attaching a single IAM role to the notebook instance that can create SageMaker Processing jobs and that has S3 read/write access to the bucket plus the required KMS and ECR permissions lets the Processing job launched from the notebook read, process, and write the data without managing long-term credentials. Option D relies on an IAM user's access key and secret key, which is not a recommended practice.
Question # 69
A machine learning specialist is developing a proof of concept for government users whose primary concern is security. The specialist is using Amazon SageMaker to train a convolutional neural network (CNN) model for a photo classifier application. The specialist wants to protect the data so that it cannot be accessed and transferred to a remote host by malicious code accidentally installed on the training container. Which action will provide the MOST secure protection?
A. Remove Amazon S3 access permissions from the SageMaker execution role.
B. Encrypt the weights of the CNN model.
C. Encrypt the training and validation dataset.
D. Enable network isolation for training jobs.
Answer: D
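Network isolation is a flag on the training job; a hedged SageMaker Python SDK sketch with placeholder image, role, and S3 values follows. With isolation enabled, the training container has no outbound network access, while SageMaker still stages the input data and model artifacts on the container's behalf.

    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/cnn-classifier:latest",  # placeholder
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",                    # placeholder
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        enable_network_isolation=True,   # training container gets no outbound network access
        output_path="s3://example-bucket/model-artifacts/",
    )

    estimator.fit({"training": "s3://example-bucket/photos/train/"})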
Question # 70
A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company's data currently resides on premises and is 40 TB in size. The company wants a solution that can transfer and automatically update data between the on-premises object storage and Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation. Which solution meets these requirements?
A. Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.
B. Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.
C. Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.
D. Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.
Answer: C

Explanation: Configure DataSync to make an initial copy of your entire dataset, and schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.

Reference: https://aws.amazon.com/datasync/faqs/
Question # 71
A machine learning (ML) specialist must develop a classification model for a financial services company. A domain expert provides the dataset, which is tabular with 10,000 rows and 1,020 features. During exploratory data analysis, the specialist finds no missing values and a small percentage of duplicate rows. There are correlation scores of > 0.9 for 200 feature pairs. The mean value of each feature is similar to its 50th percentile. Which feature engineering strategy should the ML specialist use with Amazon SageMaker?
A. Apply dimensionality reduction by using the principal component analysis (PCA) algorithm.
B. Drop the features with low correlation scores by using a Jupyter notebook.
C. Apply anomaly detection by using the Random Cut Forest (RCF) algorithm.
D. Concatenate the features with high correlation scores by using a Jupyter notebook.
Answer: A

Explanation: With 1,020 features and 200 feature pairs showing correlation scores above 0.9, principal component analysis (PCA) reduces the correlated features to a smaller set of uncorrelated components while preserving most of the variance. Random Cut Forest is an anomaly detection algorithm and does not address redundant, highly correlated features.
Question # 72
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords. Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.
B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
C. Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.
D. Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
Answer: B

Explanation: https://github.com/aws-samples/amazon-sagemaker-script-mode/blob/master/tf-horovod-inference-pipeline/train.p
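A minimal sketch of this invocation, with placeholder role, versions, and S3 path, is shown below; inside the container the unchanged train.py can read the TFRecord files from the directory given by the SM_CHANNEL_TRAINING environment variable.

    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        entry_point="train.py",      # existing script, used unchanged
        role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
        instance_count=1,
        instance_type="ml.p3.2xlarge",
        framework_version="2.11",    # assumed supported framework/Python pair
        py_version="py39",
    )

    # Point the training invocation at the TFRecord files in S3, with no reformatting.
    estimator.fit({"training": "s3://example-bucket/tfrecords/"})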
Question # 73
A data scientist is using the Amazon SageMaker Neural Topic Model (NTM) algorithm to build a model that recommends tags from blog posts. The raw blog post data is stored in an Amazon S3 bucket in JSON format. During model evaluation, the data scientist discovered that the model recommends certain stopwords such as "a," "an,” and "the" as tags to certain blog posts, along with a few rare words that are present only in certain blog entries. After a few iterations of tag review with the content team, the data scientist notices that the rare words are unusual but feasible. The data scientist also must ensure that the tag recommendations of the generated model do not include the stopwords. What should the data scientist do to meet these requirements?
A. Use the Amazon Comprehend entity recognition API operations. Remove the detected words from the blog post data. Replace the blog post data source in the S3 bucket.
B. Run the SageMaker built-in principal component analysis (PCA) algorithm with the blog post data from the S3 bucket as the data source. Replace the blog post data in the S3 bucket with the results of the training job.
C. Use the SageMaker built-in Object Detection algorithm instead of the NTM algorithm for the training job to process the blog post data.
D. Remove the stopwords from the blog post data by using the Count Vectorizer function in the scikit-learn library. Replace the blog post data in the S3 bucket with the results of the vectorizer.
Answer: D

Reference: https://towardsdatascience.com/natural-language-processing-count-vectorization-with-scikit-learn-e7804269bb5e
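A small scikit-learn sketch of the stopword removal step follows; the sample posts are illustrative.

    from sklearn.feature_extraction.text import CountVectorizer

    posts = [
        "A quick guide to the Neural Topic Model",
        "An overview of serverless inference and rare deployment patterns",
    ]

    # stop_words='english' drops common words such as "a", "an", and "the"
    # before the token counts are produced for NTM training.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(posts)

    print(vectorizer.get_feature_names_out())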
Question # 74
A Data Scientist received a set of insurance records, each consisting of a record ID, the final outcome among 200 categories, and the date of the final outcome. Some partial information on claim contents is also provided, but only for a few of the 200 categories. For each outcome category, there are hundreds of records distributed over the past 3 years. The Data Scientist wants to predict how many claims to expect in each category from month to month, a few months in advance. What type of machine learning model should be used?
A. Classification month-to-month using supervised learning of the 200 categories based on claim contents.
B. Reinforcement learning using claim IDs and timestamps where the agent will identify how many claims in each category to expect from month to month.
C. Forecasting using claim IDs and timestamps to identify how many claims in each category to expect from month to month.
D. Classification with supervised learning of the categories for which partial information on claim contents is provided, and forecasting using claim IDs and timestamps for all other categories.
Answer: C
Question # 75
A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket protected with server-side encryption using AWS KMS. How should the ML Specialist define the Amazon SageMaker notebook instance so it can read the same dataset from Amazon S3?
A. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon SageMaker notebook instance.
B. Configure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the KMS key policy to the notebook's KMS role.
C. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the KMS key policy to that role.
D. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance.
Answer: C

Explanation: The notebook instance needs an IAM role that has read access to the dataset in Amazon S3, and that role must also be granted permission in the KMS key policy so that it can decrypt objects protected with server-side encryption using AWS KMS. Assigning the KMS key to the notebook instance itself only encrypts the notebook's storage volume; it does not authorize reads of the encrypted S3 objects.

Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/encryption-at-rest.html
Question # 76
A company provisions Amazon SageMaker notebook instances for its data science team and creates Amazon VPC interface endpoints to ensure communication between the VPC and the notebook instances. All connections to the Amazon SageMaker API are contained entirely and securely using the AWS network. However, the data science team realizes that individuals outside the VPC can still connect to the notebook instances across the internet. Which set of actions should the data science team take to fix the issue?
A. Modify the notebook instances' security group to allow traffic only from the CIDR ranges of the VPC. Apply this security group to all of the notebook instances' VPC interfaces.
B. Create an IAM policy that allows the sagemaker:CreatePresignedNotebookInstanceUrl and sagemaker:DescribeNotebookInstance actions from only the VPC endpoints. Apply this policy to all IAM users, groups, and roles used to access the notebook instances.
C. Add a NAT gateway to the VPC. Convert all of the subnets where the Amazon SageMaker notebook instances are hosted to private subnets. Stop and start all of the notebook instances to reassign only private IP addresses.
D. Change the network ACL of the subnet the notebook is hosted in to restrict access to anyone outside the VPC.
Answer: B Reference: https://gmoein.github.io/files/Amazon%20SageMaker.pdf
Question # 77
A data scientist is working on a public sector project for an urban traffic system. While studying the traffic patterns, it is clear to the data scientist that the traffic behavior at each light is correlated, subject to a small stochastic error term. The data scientist must model the traffic behavior to analyze the traffic patterns and reduce congestion. How will the data scientist MOST effectively model the problem?
A. The data scientist should obtain a correlated equilibrium policy by formulating this problem as a multi-agent reinforcement learning problem.
B. The data scientist should obtain the optimal equilibrium policy by formulating this problem as a single-agent reinforcement learning problem.
C. Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using historical data through a supervised learning approach.
D. Rather than finding an equilibrium policy, the data scientist should obtain accurate predictors of traffic flow by using unlabeled simulated data representing the new traffic patterns in the city and applying an unsupervised learning approach.
Answer: A

Explanation: The traffic lights influence one another, so the system is naturally modeled as multiple interacting agents. Formulating it as a multi-agent reinforcement learning problem and seeking a correlated equilibrium policy captures the correlated behavior at each light; a single-agent formulation or a pure prediction approach does not produce a control policy for reducing congestion.

Reference: https://www.hindawi.com/journals/jat/2021/8878011/
Question # 78
A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset. Which solution for text extraction and entity detection will require the LEAST amount of effort?
A. Extract text from receipt images by using Amazon Textract. Use the Amazon SageMaker BlazingText algorithm to train on the text for entities and custom entities.
B. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use the NER deep learning model to extract entities.
C. Extract text from receipt images by using Amazon Textract. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.
D. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace. Use Amazon Comprehend for entity detection, and use Amazon Comprehend custom entity recognition for custom entity detection.
Answer: C

Reference: https://aws.amazon.com/blogs/machine-learning/building-an-nlp-powered-search-index-with-amazon-textract-and-amazon-comprehend
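A rough Boto3 sketch of this pipeline is shown below. The bucket, object key, and custom entity recognizer endpoint ARN are placeholders, and the custom-entity call assumes a real-time Comprehend endpoint has already been created for the trained recognizer.

    import boto3

    textract = boto3.client("textract")
    comprehend = boto3.client("comprehend")

    # OCR: extract lines of text from a receipt image stored in S3 (placeholder object).
    ocr = textract.detect_document_text(
        Document={"S3Object": {"Bucket": "example-bucket", "Name": "receipts/0001.png"}}
    )
    text = " ".join(b["Text"] for b in ocr["Blocks"] if b["BlockType"] == "LINE")

    # Built-in entity types such as DATE and LOCATION.
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")

    # Custom entities (for example, receipt numbers) via a trained custom
    # entity recognizer exposed on a real-time endpoint (placeholder ARN).
    custom = comprehend.detect_entities(
        Text=text,
        EndpointArn="arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/receipts",
    )
    print(entities["Entities"], custom["Entities"])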
Question # 79
A machine learning specialist is developing a regression model to predict rental rates from rental listings. A variable named Wall_Color represents the most prominent exterior wall color of the property. The following is the sample data, excluding all other variables:
The specialist chose a model that needs numerical input data. Which feature engineering approaches should the specialist use to allow the regression model to learn from the Wall_Color data? (Choose two.)
A. Apply integer transformation and set Red = 1, White = 5, and Green = 10.
B. Add new columns that store one-hot representation of colors.
C. Replace the color name string by its length.
D. Create three columns to encode the color in RGB format.
E. Replace each color name by its training set frequency.
Answer: B,D

Explanation: One-hot encoding (option B) and an RGB encoding (option D) both turn the categorical Wall_Color values into numerical inputs without imposing an artificial ordering. Mapping the colors to arbitrary integers such as 1, 5, and 10 implies an ordinal relationship and magnitude that do not exist in the data.
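A small pandas sketch of the two encodings follows; the RGB values chosen for each color are illustrative assumptions.

    import pandas as pd

    df = pd.DataFrame({"Wall_Color": ["Red", "White", "Green", "White"]})

    # One-hot representation: one binary column per color (option B).
    one_hot = pd.get_dummies(df["Wall_Color"], prefix="Wall_Color")

    # RGB encoding: three numeric columns per row (option D); the RGB values are illustrative.
    rgb_map = {"Red": (255, 0, 0), "White": (255, 255, 255), "Green": (0, 128, 0)}
    rgb = pd.DataFrame(df["Wall_Color"].map(rgb_map).tolist(), columns=["R", "G", "B"])

    print(pd.concat([df, one_hot, rgb], axis=1))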
Question # 80
A company has set up and deployed its machine learning (ML) model into production with an endpoint using Amazon SageMaker hosting services. The ML team has configured automatic scaling for its SageMaker instances to support workload changes. During testing, the team notices that additional instances are being launched before the new instances are ready. This behavior needs to change as soon as possible. How can the ML team solve this issue?
A. Decrease the cooldown period for the scale-in activity. Increase the configured maximum capacity of instances.
B. Replace the current endpoint with a multi-model endpoint using SageMaker.
C. Set up Amazon API Gateway and AWS Lambda to trigger the SageMaker inference endpoint.
D. Increase the cooldown period for the scale-out activity.
Answer: D

Explanation: New instances are being launched before the previously launched instances are in service, which means the scale-out activity is firing again too soon. Increasing the cooldown period for the scale-out activity gives the new instances time to become ready before the policy adds more capacity.

Reference: https://aws.amazon.com/blogs/machine-learning/configuring-autoscaling-inference-endpoints-in-amazon-sagemaker/
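For illustration, the scale-out cooldown is part of the target-tracking policy registered through Application Auto Scaling; the endpoint name, variant name, and numeric values below are placeholders.

    import boto3

    autoscaling = boto3.client("application-autoscaling")

    resource_id = "endpoint/prod-endpoint/variant/AllTraffic"   # placeholder endpoint/variant

    autoscaling.register_scalable_target(
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        MinCapacity=1,
        MaxCapacity=4,
    )

    # A longer ScaleOutCooldown keeps the policy from launching more instances
    # before the previously launched ones are in service.
    autoscaling.put_scaling_policy(
        PolicyName="invocations-target-tracking",
        ServiceNamespace="sagemaker",
        ResourceId=resource_id,
        ScalableDimension="sagemaker:variant:DesiredInstanceCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 1000.0,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 600,
            "ScaleInCooldown": 300,
        },
    )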
Question # 81
A power company wants to forecast future energy consumption for its customers in residential properties and commercial business properties. Historical power consumption data for the last 10 years is available. A team of data scientists who performed the initial data analysis and feature selection will include the historical power consumption data and data such as weather, number of individuals on the property, and public holidays. The data scientists are using Amazon Forecast to generate the forecasts. Which algorithm in Forecast should the data scientists use to meet these requirements?
A. Autoregressive Integrated Moving Average (ARIMA)
B. Exponential Smoothing (ETS)
C. Convolutional Neural Network - Quantile Regression (CNN-QR)
D. Prophet
Answer: C

Explanation: The forecast must incorporate related data such as weather, the number of individuals on the property, and public holidays in addition to the historical consumption. Among the listed Amazon Forecast algorithms, CNN-QR supports related time series and item metadata, whereas ARIMA, ETS, and Prophet forecast from the target time series alone.

Reference: https://jesit.springeropen.com/articles/10.1186/s43067-020-00021-8
Question # 82
A company ingests machine learning (ML) data from web advertising clicks into an Amazon S3 data lake. Click data is added to an Amazon Kinesis data stream by using the Kinesis Producer Library (KPL). The data is loaded into the S3 data lake from the data stream by using an Amazon Kinesis Data Firehose delivery stream. As the data volume increases, an ML specialist notices that the rate of data ingested into Amazon S3 is relatively constant. There also is an increasing backlog of data for Kinesis Data Streams and Kinesis Data Firehose to ingest. Which next step is MOST likely to improve the data ingestion rate into Amazon S3?
A. Increase the number of S3 prefixes for the delivery stream to write to.
B. Decrease the retention period for the data stream.
C. Increase the number of shards for the data stream.
D. Add more consumers using the Kinesis Client Library (KCL).
Answer: C
Question # 83
A machine learning specialist is running an Amazon SageMaker endpoint using the built-in object detection algorithm on a P3 instance for real-time predictions in a company's production application. When evaluating the model's resource utilization, the specialist notices that the model is using only a fraction of the GPU. Which architecture changes would ensure that provisioned resources are being utilized effectively?
A. Redeploy the model as a batch transform job on an M5 instance.
B. Redeploy the model on an M5 instance. Attach Amazon Elastic Inference to the instance.
C. Redeploy the model on a P3dn instance.
D. Deploy the model onto an Amazon Elastic Container Service (Amazon ECS) cluster using a P3 instance.
Answer: B Explanation: https://aws.amazon.com/machine-learning/elastic-inference/
Question # 84
A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company’s dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices. Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model’s complexity?
A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.
Answer: D
Customers Feedback
What our clients say about MLS-C01 Learning Materials
Emma
Dec 22, 2024
I am happy to inform you that I have passed the MLS-C01 exam and can confirm that the dump is valid.
Jameson Singh
Dec 21, 2024
I was recommended these dumps by a friend and they turned out to be fantastic. I passed the AWS Certified Machine Learning - Specialty exam thanks to salesforcexamdumps.com
Oliver Walker
Dec 21, 2024
I successfully utilized the "2 for discount" offer and also shared the exam with a friend as I only needed to pass one exam. I am pleased to share that the strategy worked out well for both of us, as we both passed. I would like to express my gratitude to the team. Thank you!
Khadija
Dec 20, 2024
The MLS-C01 dumps are excellent! They helped me prepare for the exam in a short amount of time, and I passed with flying colors.
William Chen
Dec 20, 2024
The MLS-C01 exam dumps have made the preparation process incredibly easy. I passed with a 94% score.
Penelope Martinez
Dec 19, 2024
If you want to pass the AWS Machine Learning Specialty exam on the first try, then the MLS-C01 dumps are the way to go. They are easy to follow and provide everything you need to succeed.
Nathanial Wright
Dec 19, 2024
The MLS-C01 dumps are a game-changer. They helped me identify my weaknesses and focus my study efforts. I highly recommend them.
Roma
Dec 18, 2024
I tried other study materials, but the MLS-C01 dumps were the most effective. They covered all the important topics, and the explanations were clear and concise. Thanks, Salesforcexamdumps.com!
Xander Reyes
Dec 18, 2024
I was skeptical at first, but the MLS-C01 dumps exceeded my expectations. They are a must-have for anyone taking the AWS Machine Learning Specialty exam. I got 910/1000. Thanks!
Mason Rodriguez
Dec 17, 2024
Salesforcexamdumps.com is a fantastic website. The questions and explanations provided are top-notch, and the MLS-C01 practice questions are a great way to test your readiness. Highly recommended!