Microsoft Azure DP-100 Sample Questions
Which Azure service provides automated machine learning capabilities?
- A. Azure Databricks
- B. Azure Machine Learning
- C. Azure Stream Analytics
- D. Azure Event Hubs
Answer: B. Azure Machine Learning
Explanation: Azure Machine Learning is a cloud-based service that provides automated machine learning capabilities. It allows data scientists and developers to build, train, and deploy machine learning models at scale.
Which type of machine learning model can be used for predicting continuous values, such as stock prices or temperature?
- A. Regression
- B. Classification
- C. Clustering
- D. Reinforcement
Answer: A. Regression
Explanation: Regression is a type of machine learning model that can be used for predicting continuous values. It works by finding a mathematical relationship between input features and output values.
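As a concrete illustration of the idea, here is a minimal plain-Python sketch (data values are made up) that fits a least-squares line to one input feature and predicts a continuous value:

```python
# Simple least-squares linear regression on illustrative data:
# fit y = a*x + b, then predict a continuous value for a new x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
# Slope: covariance(x, y) / variance(x)
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x

predicted = a * 6.0 + b  # predict for an unseen input
```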
Which Azure service can be used to preprocess data before feeding it into a machine learning model?
- A. Azure Stream Analytics
- B. Azure Data Factory
- C. Azure Databricks
- D. Azure Functions
Answer: C. Azure Databricks
Explanation: Azure Databricks is a cloud-based service that provides a collaborative workspace for data engineering, machine learning, and analytics. It can be used to preprocess data before feeding it into a machine learning model.
Which type of machine learning model can be used for grouping similar data points together?
- A. Regression
- B. Classification
- C. Clustering
- D. Reinforcement
Answer: C. Clustering
Explanation: Clustering is a type of machine learning model that groups similar data points together. It works by finding patterns in the data and forming groups of points that share those patterns.
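A minimal sketch of the idea in plain Python (illustrative 1-D data, k=2), alternating between assigning each point to the nearest center and moving each center to the mean of its assigned points:

```python
# Minimal 1-D k-means sketch (k=2) on illustrative data.
points = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
centers = [0.0, 10.0]  # initial guesses

for _ in range(10):  # a few iterations are enough here
    clusters = [[], []]
    for p in points:
        # Assign the point to its nearest center
        nearest = min(range(2), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    # Move each center to the mean of its cluster
    centers = [sum(c) / len(c) for c in clusters]
```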
Which Azure service can be used for real-time data processing and analysis?
- A. Azure Event Hubs
- B. Azure Stream Analytics
- C. Azure Databricks
- D. Azure Machine Learning
Answer: B. Azure Stream Analytics
Explanation: Azure Stream Analytics is a cloud-based service that can be used for real-time data processing and analysis. It allows you to process streaming data from various sources, such as Azure Event Hubs or IoT devices, and generate real-time insights and alerts.
Which Azure service provides the ability to train machine learning models on GPUs?
- A. Azure Machine Learning
- B. Azure Databricks
- C. Azure Stream Analytics
- D. Azure Functions
Answer: A. Azure Machine Learning
Explanation: Azure Machine Learning provides the ability to train machine learning models on GPUs (graphics processing units) for faster performance. This is useful for training large, complex models that require a lot of computational power.
Which type of machine learning model can be used for predicting categorical values, such as the color of a car or the outcome of a sports game?
- A. Regression
- B. Classification
- C. Clustering
- D. Reinforcement
Answer: B. Classification
Explanation: Classification is a type of machine learning model that can be used for predicting categorical values. It works by assigning input data to a predefined category based on its characteristics.
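A minimal nearest-centroid sketch in plain Python (labels and data are illustrative) shows the idea of assigning an input to a predefined category based on its characteristics:

```python
# Nearest-centroid classification sketch on illustrative 1-D data:
# predict the category whose training mean is closest to the input.
training = {"small": [1.0, 1.5, 0.5], "large": [9.0, 10.0, 11.0]}
centroids = {label: sum(v) / len(v) for label, v in training.items()}

def classify(x):
    # Return the categorical label with the nearest centroid.
    return min(centroids, key=lambda label: abs(x - centroids[label]))

label = classify(8.5)
```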
Which Azure service provides a serverless environment for running data processing workflows?
- A. Azure Stream Analytics
- B. Azure Databricks
- C. Azure Data Factory
- D. Azure Functions
Answer: D. Azure Functions
Explanation: Azure Functions provides a serverless environment for running data processing workflows. It allows you to execute code in response to events and triggers, such as new data arriving in a storage account or a message being sent to a service bus.
Which type of machine learning model can be used for making sequential decisions, such as in a game or an autonomous vehicle?
- A. Regression
- B. Classification
- C. Clustering
- D. Reinforcement
Answer: D. Reinforcement
Explanation: Reinforcement learning is a type of machine learning that can be used for making sequential decisions. It works by learning to take actions in an environment so as to maximize a cumulative reward signal over time.
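The idea can be sketched with a tiny multi-armed bandit in plain Python (actions and rewards are illustrative, and deterministic for simplicity): the agent updates value estimates from observed rewards and increasingly exploits the best action:

```python
import random

# Minimal bandit-style reinforcement learning sketch: choose actions,
# observe rewards, and update running value estimates to maximize reward.
rewards = {"a": 0.2, "b": 1.0}   # hidden mean reward per action (illustrative)
values = {"a": 0.0, "b": 0.0}    # the agent's running estimates
counts = {"a": 0, "b": 0}
random.seed(0)

for action in ("a", "b"):        # try each action once to initialize
    counts[action] += 1
    values[action] += (rewards[action] - values[action]) / counts[action]

for step in range(200):
    if random.random() < 0.1:    # explore occasionally
        action = random.choice(["a", "b"])
    else:                        # otherwise exploit the best current estimate
        action = max(values, key=values.get)
    reward = rewards[action]
    counts[action] += 1
    # Incremental mean update of the action-value estimate
    values[action] += (reward - values[action]) / counts[action]

best = max(values, key=values.get)
```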
Which Azure service can be used to monitor and manage machine learning models in production?
- A. Azure Machine Learning
- B. Azure Databricks
- C. Azure Stream Analytics
- D. Azure Monitor
Answer: A. Azure Machine Learning
Explanation: Azure Machine Learning provides tools for monitoring and managing machine learning models in production. It allows you to track model performance and make updates to models as needed. Azure Monitor can also be used to monitor the health of the system and diagnose issues.
Question 1. You are responsible for building a team data science environment. Over 20 GB of data will be required to train models in machine learning pipelines.
Following are the requirements:
- Caffe2 or Chainer frameworks must be used to build the models.
- To build machine learning pipelines and train models on their personal devices, data scientists must be able to use a data science environment that works across connected and disconnected networks.
- When connected to a network, personal devices must be able to update machine learning pipelines.
You must choose a data science environment that meets these requirements.
Which of the following environments should you use?
- A. Azure Machine Learning Service
- B. Azure Machine Learning Studio
- C. Azure Databricks
- D. Azure Kubernetes Service (AKS)
Correct Answer: A
Explanation: Microsoft’s Data Science Virtual Machine (DSVM) is a customized image on Azure that’s pre-configured to make working with data science easier. DSVM supports both Caffe2 and Chainer and integrates with Azure Machine Learning.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview
Question 2. As part of your research, you propose to implement a machine learning algorithm for predicting stock prices using a PostgreSQL database and GPU processing. To do this, you must create a virtual machine pre-configured with the required tools. How would you proceed?
- A. Creating a Data Science Virtual Machine (DSVM) Windows edition.
- B. Creating a Geo AI Data Science Virtual Machine (Geo-DSVM) Windows edition.
- C. Deep Learning Virtual Machine (DLVM) Linux edition.
- D. Creating a Deep Learning Virtual Machine (DLVM) Windows edition.
Correct Answer: A
Explanation: With DSVM, you can train your models using deep learning algorithms that are based on graphics processing units (GPUs).
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview
Question 3. To support Azure Machine Learning, data must be stored in Azure Blob Storage, so the data must be transferred into Azure Blob Storage.
How can this goal be achieved?
- A. Bulk Insert SQL Query
- B. AzCopy
- C. Python script
- D. Azure Storage Explorer
- E. Bulk Copy Program (BCP)
Correct Answer: BCD
Explanation: Data can be moved to and from Azure Blob storage using different technologies:
- Azure Storage Explorer
- AzCopy
- Python
- SSIS
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/move-azure-blob
Question 4. A large dataset from Azure Machine Learning Studio must be formatted for Weka before it can be imported.
Which module should be implemented?
- A. Convert to CSV
- B. Convert to Dataset
- C. Convert to ARFF
- D. Convert to SVMLight
Correct Answer: C
Explanation: The module Convert to ARFF in Azure Machine Learning Studio is used to transform datasets and results in Azure Machine Learning to attribute-relation files capable of being read by Weka toolsets. This format is called ARFF.
The ARFF data specification for Weka supports data preprocessing, classification, and feature selection. In this format, data is kept in a single text file and is organized by entities and their attributes.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/convert-to-arff
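For reference, an ARFF file is a plain-text header of attribute declarations followed by the data rows. A small sketch that builds one (the relation and attribute names are illustrative):

```python
# Build a minimal ARFF document: @relation, @attribute lines, then @data rows.
rows = [(5.1, 3.5, "setosa"), (6.2, 2.9, "versicolor")]

lines = ["@relation iris_sample", ""]
lines.append("@attribute sepal_length numeric")
lines.append("@attribute sepal_width numeric")
lines.append("@attribute species {setosa, versicolor}")  # nominal attribute
lines.append("")
lines.append("@data")
for sl, sw, sp in rows:
    lines.append(f"{sl},{sw},{sp}")

arff_text = "\n".join(lines)
```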
Question 5. When using Data Science Virtual Machines (DSVM) with open-source deep learning frameworks such as Caffe2 and PyTorch, you need to choose a pre-configured DSVM that can support these frameworks.
What should you create?
- A. Data Science Virtual Machine (Windows 2012)
- B. Data Science Virtual Machine (Linux (CentOS))
- C. Geo AI Data Science Virtual Machine using ArcGIS
- D. Data Science Virtual Machine (Windows 2016)
- E. Data Science Virtual Machine (Linux (Ubuntu))
Correct Answer: E
Explanation: The Data Science Virtual Machine for Linux supports Caffe2 and PyTorch. Microsoft offers Linux editions of the DSVM on Ubuntu 16.04 LTS and CentOS 7.4; Caffe2 and PyTorch come preconfigured on the Ubuntu edition.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/overview
Question 6. When you create a machine learning experiment using Azure Machine Learning Studio, you need to divide data into two distinct datasets.
Which module should be implemented?
- A. Assign Data to Clusters
- B. Load Trained Model
- C. Partition and Sample
- D. Tune Model Hyperparameters
Correct Answer: C
Explanation: The Partition and Sample module with a stratified split produces multiple partitions of the dataset, divided according to the rules you specify.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/partition-and-sample
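The effect of a stratified split can be sketched in plain Python (an illustration, not the module's implementation): rows are grouped by label and each group is split with the same ratio, so both partitions keep the original class proportions:

```python
from collections import defaultdict

# Stratified two-way split sketch on illustrative labeled rows.
rows = [("x1", "A"), ("x2", "A"), ("x3", "A"), ("x4", "A"),
        ("x5", "B"), ("x6", "B")]

by_label = defaultdict(list)
for row in rows:
    by_label[row[1]].append(row)

train, test = [], []
for label, group in by_label.items():
    cut = int(len(group) * 0.5)   # 50/50 split within each class
    train.extend(group[:cut])
    test.extend(group[cut:])
```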
Question 7. In Azure Machine Learning Studio, the Clean Missing Data module allows you to identify and resolve null or missing data in your dataset so that you can create a machine learning model.
Which parameter should be implemented?
- A. Replace with mean
- B. Remove the entire column
- C. Remove the entire row
- D. Hot Deck
- E. Custom substitution value
- F. Replace with mode
Correct Answer: C
Explanation: Remove entire row: completely removes any row in the dataset that has one or more missing values. This option is appropriate when the missing values are unknown and cannot be reliably estimated.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
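The "remove entire row" behavior can be sketched in plain Python (column names and values are illustrative):

```python
# Drop any row containing a missing value (represented here as None).
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},   # missing age -> row removed
    {"age": 29, "income": None},      # missing income -> row removed
    {"age": 41, "income": 61000},
]

clean = [r for r in rows if all(v is not None for v in r.values())]
```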
Question 8. You intend to conduct a hands-on class on creating data visualizations using Python with several students. Each student will use a device with internet access to complete the workshop. Student devices are not set up for Python development, students cannot get administrator access to install software on their devices, and students do not have Azure subscriptions. You must make sure that students can execute Python-based data visualization code.
Which Azure tool should be implemented?
- A. Anaconda Data Science Platform
- B. Azure Batch AI
- C. Azure Notebooks
- D. Azure Machine Learning Service
Correct Answer: C
Reference: https://notebooks.azure.com/
Question 9. In Azure Machine Learning Studio, you are creating a new experiment and have a dataset with many missing values. You plan to use the Clean Missing Data module, and the cleaning method must not require the application of predictors for each column. You need to pick a data cleaning method.
Which method should be implemented?
- A. Replace using Probabilistic PCA
- B. Normalization
- C. Synthetic Minority Oversampling Technique (SMOTE)
- D. Replace using MICE
Correct Answer: A
Explanation: Replace using Probabilistic PCA: unlike Multiple Imputation using Chained Equations (MICE), this method does not require the application of predictors for each column. Rather than predicting each missing value in a column individually, it approximates the covariance matrix of the full dataset, so it may perform better on datasets that have missing values in many columns.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/clean-missing-data
Question 10. You have a time series dataset in Azure Machine Learning Studio and want to split it into training and testing subsets by using the Split Data module.
Which splitting mode will you use?
- A. Recommender Split
- B. Regular Expression Split
- C. Relative Expression Split
- D. Set the Randomized Split parameter to true and split rows randomly
Correct Answer: C
Explanation: Relative Expression Split: use this option when you want to divide a dataset based on a condition applied to a number or date column, for example putting all rows before a cutoff date into the training set and the rest into the test set. For a time series, this preserves the temporal order of the data, which a randomized split would destroy.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/split-data
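For a time-ordered dataset, a split on a date condition can be sketched in plain Python (the column layout and threshold date are illustrative):

```python
from datetime import date

# Split a time-ordered dataset at a threshold date: rows before the
# threshold become training data, the rest become test data.
rows = [
    (date(2021, 1, 5), 10.0),
    (date(2021, 2, 1), 12.5),
    (date(2021, 3, 15), 11.8),
    (date(2021, 4, 2), 13.1),
]
threshold = date(2021, 3, 1)

train = [r for r in rows if r[0] < threshold]
test = [r for r in rows if r[0] >= threshold]
```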
Question 11. You have to run a script as an experiment using a Script Run Configuration, but the script relies on scipy libraries and Python packages that are not typically available on a Conda setup. For small datasets, you plan to run the experiment on your own workstation, but for larger datasets, you will run it on a remote compute cluster. You must ensure that the experiment runs successfully on local and remote computers with minimum administrative effort.
What would you do?
- A. Run the experiment using the default environment instead of specifying an environment in the run configuration.
- B. Ensure that a virtual machine (VM) with the required Python configuration is attached as a compute target and used for all experiment runs.
- C. Register and use an Environment containing the required packages for all experiments.
- D. Save the config.YAML file in the experiment folder defining the Conda packages needed.
- E. Use the default packages when running the experiment with an Estimator.
Correct Answer: C
Explanation: If there is an existing Conda environment on your local computer, you can use the service to build an Environment object from it. This strategy lets your local interactive environment be reused on remote runs.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments
Question 12. Your goal is to create a machine learning model. Outliers in the data are to be identified.
Which two visualizations will you use?
- A. Venn diagram
- B. Box plot
- C. ROC curve
- D. Random forest diagram
- E. Scatter plot
Correct Answer: BE
Explanation: A box plot displays the distribution of a variable and marks points beyond the whiskers as outliers. Scatter plots are another quick way to identify outliers visually.
Reference: https://blogs.msdn.microsoft.com/azuredev/2017/05/27/data-cleansing-tools-in-azure-machine-learning/
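Box-plot whiskers are conventionally placed at 1.5 times the interquartile range (IQR); the same rule can flag outliers numerically, as in this plain-Python sketch (data values are made up, and the quantile interpolation is one of several conventions):

```python
# Flag outliers with the 1.5*IQR rule used by box-plot whiskers.
values = sorted([2, 3, 3, 4, 4, 5, 5, 6, 40])  # 40 is an obvious outlier

def quartile(data, q):
    # Simple linear-interpolation quantile on sorted data.
    pos = (len(data) - 1) * q
    lo, hi = int(pos), min(int(pos) + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

q1, q3 = quartile(values, 0.25), quartile(values, 0.75)
iqr = q3 - q1
outliers = [v for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]
```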
Question 13. As part of your evaluation, you will use precision as the metric for evaluating a binary classification machine learning model.
Which visualization will you use?
- A. Violin plot
- B. Gradient descent
- C. Box plot
- D. Binary classification confusion matrix
Correct Answer: D
Reference: https://machinelearningknowledge.ai/confusion-matrix-and-performance-metrics-machine-learning/
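Precision is read directly off the binary confusion matrix as TP / (TP + FP); a quick sketch with illustrative counts:

```python
# Confusion-matrix counts for a binary classifier (illustrative values).
tp, fp = 80, 20   # predicted positive: correctly / incorrectly
fn, tn = 10, 90   # predicted negative: incorrectly / correctly

precision = tp / (tp + fp)   # of predicted positives, how many are right
recall = tp / (tp + fn)      # of actual positives, how many were found
```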
Question 14. You are using Azure Machine Learning Studio to perform filter-based feature selection on a dataset for a multi-class classifier. The dataset has categorical features that are highly correlated with the output labels. You must select the appropriate feature scoring statistical method to determine the key predictors.
Which method can be implemented?
- A. Kendall correlation
- B. Spearman correlation
- C. Chi-squared
- D. Pearson correlation
Correct Answer: C
Explanation: The chi-squared statistic measures how strongly two categorical variables are associated, which makes it the appropriate scoring method when both the features and the output labels are categorical. Pearson's correlation coefficient (the r-value), by contrast, measures the strength of a linear relationship between continuous variables.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/filter-based-feature-selection
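For comparison, the chi-squared score is computed from a contingency table of feature values against class labels; a plain-Python sketch with illustrative counts:

```python
# Chi-squared statistic sketch for a 2x2 contingency table of a
# categorical feature vs. a class label (counts are illustrative).
observed = [[30, 10],   # feature = yes: class 0 / class 1
            [10, 30]]   # feature = no:  class 0 / class 1

total = sum(sum(row) for row in observed)
row_sums = [sum(row) for row in observed]
col_sums = [sum(observed[r][c] for r in range(2)) for c in range(2)]

chi2 = 0.0
for r in range(2):
    for c in range(2):
        expected = row_sums[r] * col_sums[c] / total   # independence model
        chi2 += (observed[r][c] - expected) ** 2 / expected
```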
Question 15. As the evaluation metric, you need to use precision for evaluating a binary classification machine learning model. Which visualization will you use?
- A. violin plot
- B. Gradient descent
- C. Scatter plot
- D. Receiver Operating Characteristic (ROC) curve
Correct Answer: D
Explanation: A receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate at different classification thresholds, showing how well the classifier separates the two classes.
Question 16. By using k-fold cross-validation, you must evaluate your classification model on a limited sample of data. You need to configure the k parameter, which sets the number of splits.
Which value should you use?
- A. k=1
- B. k=10
- C. k=0.5
- D. k=0.9
Correct Answer: B
Explanation:
Leave-one-out cross-validation (LOO) is a special case of the K-fold approach when K = n (the number of observations).
The LOO cross-validation procedure is often used but typically does not provide much diversity between folds: the estimates from each fold are highly correlated, so their average can have high variance.
The usual choice for K is 5 or 10 because it provides a good compromise between bias and variance.
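The mechanics of k-fold splitting can be sketched in a few lines of plain Python (sequential folds without shuffling, assuming n is divisible by k):

```python
# Generate k-fold cross-validation splits: each of the k folds serves
# once as the test set while the remaining folds form the training set.
def k_fold_splits(n, k):
    indices = list(range(n))
    fold_size = n // k               # assumes n divisible by k for brevity
    for i in range(k):
        test = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, test

splits = list(k_fold_splits(20, 10))  # 10 folds of 2 test samples each
```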
Question 17. Using the designer, you have to create a new Azure Machine Learning pipeline. A comma-separated values (CSV) file on a website must be used to train a model, but no dataset has been created for it. You must ingest the data from the CSV file into the designer pipeline with the least possible administrative effort.
Which module should be added to the pipeline in Designer?
- A. Convert to CSV
- B. Enter Data Manually
- C. Import Data
- D. Dataset
Correct Answer: D
Explanation: You can use the Dataset class to provide data to an Azure Machine Learning pipeline. A Dataset can point to data that lives in, or is accessible from, a datastore or a web URL. The Dataset class itself is abstract: for one or more files with delimited columns of data you create a TabularDataset instance, and for collections of arbitrary files you create a FileDataset instance.
Example:
from azureml.core import Dataset
iris_tabular_dataset = Dataset.Tabular.from_delimited_files([(def_blob_store, 'train-dataset/iris.csv')])
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-your-first-pipeline
Question 18. To train your model, you intend to use Azure Machine Learning’s Hyperdrive feature.
Hyperdrive must be used to experiment with the following hyperparameter values:
- learning_rate: any value between 0.001 and 0.1
- batch_size: 16, 32, or 64
Hyperdrive experiments require a search space configuration.
Which two parameter expressions should you use?
- A. a choice expression for learning_rate
- B. a uniform expression for learning_rate
- C. a normal expression for batch_size
- D. a choice expression for batch_size
- E. a uniform expression for batch_size
Correct Answer: BD
B: Continuous hyperparameters are defined over a range of continuous values. Supported distributions include:
- uniform(low, high) – returns a value distributed evenly between low and high
D: Discrete hyperparameters are specified as a choice among discrete values, which can be given as:
- one or more comma-separated values
- a range object
- any arbitrary list object
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters
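The two expressions amount to drawing learning_rate from a continuous uniform range and batch_size from a discrete set. Random sampling over that search space can be sketched in plain Python (this mimics, but is not, the Hyperdrive API):

```python
import random

# Sketch of random sampling over the search space described above:
# a continuous uniform range for learning_rate, a discrete choice
# for batch_size.
random.seed(42)

def sample_config():
    return {
        "learning_rate": random.uniform(0.001, 0.1),  # continuous range
        "batch_size": random.choice([16, 32, 64]),    # discrete values
    }

trials = [sample_config() for _ in range(5)]
```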
Question 19. You want to train a classification model from comma-separated values (CSV) data, and you’re using Azure Machine Learning studio’s Automated Machine Learning interface. The task type is set to Classification. During the Automated Machine Learning process, only linear models should be evaluated.
What should you do?
- A. Adding all algorithms other than linear ones to the blocked algorithms list.
- B. Setting the Exit criterion option to a metric score threshold.
- C. Clearing the option to perform automatic featurization.
- D. Clearing the option to enable deep learning.
- E. Setting the task type to Regression.
Correct Answer: A
Explanation: Automated ML lets you exclude specific algorithms from a run. Adding every algorithm other than the linear ones to the blocked algorithms list ensures that only linear models are evaluated during the Automated Machine Learning process.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-automated-ml-for-ml-models
Question 20. While performing feature engineering, you must insert a feature named CityName into a dataset and populate its column values with the text London. The new feature must be added to the dataset.
Which of the following Azure Machine Learning Studio modules will you implement?
- A. Edit Metadata
- B. Filter-Based Feature Selection
- C. Execute Python Script
- D. Latent Dirichlet Allocation
Correct Answer: C
Explanation: Edit Metadata changes the metadata of existing columns, such as marking them as features, but cannot add a new column. The Execute Python Script module lets you add a CityName column and populate every row with the constant value London.
Reference: https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/execute-python-script
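Adding a constant-valued column in Python can be sketched as follows (row contents are illustrative):

```python
# Add a new feature column with a constant value to every row,
# as described in the question.
rows = [
    {"id": 1, "sales": 250},
    {"id": 2, "sales": 410},
]

for row in rows:
    row["CityName"] = "London"   # populate the new feature
```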