DP 203

NewDumps
http://www.newdumps.tk
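NewDumps provides free demos of IT certification question banks; download the latest questions and answers.
The latest free DP-203 exam question bank: download it for free and try the DP-203 exam questions.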
IT Certification Guaranteed, The Easy Way!
Exam : DP-203
Title : Data Engineering on Microsoft Azure
Vendor : Microsoft
Version : DEMO
NO.1 You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The
solution must meet the customer sentiment analytics requirements.
Which three Transact-SQL DDL commands should you run in sequence? To answer, move the
appropriate commands from the list of commands to the answer area and arrange them in the
correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct
orders you select.
Answer:
Explanation:
Scenario: Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to
query the content of the data records that host the Twitter feeds. Data must be protected by using
row-level security (RLS). The users must be authenticated by using their own Azure AD credentials.
Box 1: CREATE EXTERNAL DATA SOURCE
External data sources are used to connect to storage accounts.
Box 2: CREATE EXTERNAL FILE FORMAT
CREATE EXTERNAL FILE FORMAT creates an external file format object that defines external data
stored in Azure Blob Storage or Azure Data Lake Storage. Creating an external file format is a
prerequisite for creating an external table.
Box 3: CREATE EXTERNAL TABLE AS SELECT
When used in conjunction with the CREATE TABLE AS SELECT statement, selecting from an external
table imports data into a table within the SQL pool. In addition to the COPY statement, external
tables are useful for loading data.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
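The note that more than one order is correct follows from the dependencies between the three objects. The sketch below is only an illustration in Python: it assumes, as the explanation states, that the external table needs both the data source and the file format, and that those two do not depend on each other. `DEPENDS_ON` and `valid_orders` are hypothetical names, not part of any Azure tool.

```python
from itertools import permutations

# Assumed dependencies: the external table needs the other two objects first.
DEPENDS_ON = {
    "CREATE EXTERNAL DATA SOURCE": set(),
    "CREATE EXTERNAL FILE FORMAT": set(),
    "CREATE EXTERNAL TABLE AS SELECT": {
        "CREATE EXTERNAL DATA SOURCE",
        "CREATE EXTERNAL FILE FORMAT",
    },
}


def valid_orders(depends_on):
    """Return every ordering in which each command runs after its prerequisites."""
    orders = []
    for order in permutations(depends_on):
        seen = set()
        ok = True
        for command in order:
            # A command is allowed only if all of its prerequisites already ran.
            if not depends_on[command] <= seen:
                ok = False
                break
            seen.add(command)
        if ok:
            orders.append(order)
    return orders


if __name__ == "__main__":
    for order in valid_orders(DEPENDS_ON):
        print(" -> ".join(order))
```

Under these assumptions exactly two orderings survive, and both end with the external table statement.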
Topic 1, Contoso Case Study
Transactional Data
Contoso has three years of customer, transactional, operation, sourcing, and supplier data comprised
of 10 billion records stored across multiple on-premises Microsoft SQL Server servers. The SQL Server
instances contain data from various operational systems. The data is loaded into the instances by
using SQL Server Integration Services (SSIS) packages.
You estimate that combining all product sales transactions into a company-wide sales transactions
dataset will result in a single table that contains 5 billion rows, with one row per transaction.
Most queries targeting the sales transactions data will be used to identify which products were sold
in retail stores and which products were sold online during different time periods. Sales transaction
data that is older than three years will be removed monthly.
You plan to create a retail store table that will contain the address of each retail store. The table will
be approximately 2 MB. Queries for retail store sales will include the retail store addresses.
You plan to create a promotional table that will contain a promotion ID. The promotion ID will be
associated to a specific product. The product will be identified by a product ID. The table will be
approximately 5 GB.
Streaming Twitter Data
The ecommerce department at Contoso develops an Azure logic app that captures trending Twitter
feeds referencing the company's products and pushes the products to Azure Event Hubs.
Planned Changes
Contoso plans to implement the following changes:
* Load the sales transaction dataset to Azure Synapse Analytics.
* Integrate on-premises data stores with Azure Synapse Analytics by using SSIS packages.
* Use Azure Synapse Analytics to analyze Twitter feeds to assess customer sentiments about
products.
Sales Transaction Dataset Requirements
Contoso identifies the following requirements for the sales transaction dataset:
* Partition data that contains sales transaction records. Partitions must be designed to provide
efficient loads by month. Boundary values must belong to the partition on the right.
* Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
* Implement a surrogate key to account for changes to the retail store addresses.
* Ensure that data storage costs and performance are predictable.
* Minimize how long it takes to remove old records.
Customer Sentiment Analytics Requirement
Contoso identifies the following requirements for customer sentiment analytics:
* Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to query the
content of the data records that host the Twitter feeds. Data must be protected by using row-level
security (RLS). The users must be authenticated by using their own Azure AD credentials.
* Maximize the throughput of ingesting Twitter feeds from Event Hubs to Azure Storage without
purchasing additional throughput or capacity units.
* Store Twitter feeds in Azure Storage by using Event Hubs Capture. The feeds will be converted into
Parquet files.

* Ensure that the data store supports Azure AD-based access control down to the object level.
* Minimize administrative effort to maintain the Twitter feed data records.
* Purge Twitter feed data records that are older than two years.
Data Integration Requirements
Contoso identifies the following requirements for data integration:
* Use an Azure service that leverages the existing SSIS packages to ingest on-premises data into
datasets stored in a dedicated SQL pool of Azure Synapse Analytics and transform the data.
* Identify a process to ensure that changes to the ingestion and transformation activities can be
version controlled and developed independently by multiple data engineers.
NO.2 You need to design a data storage structure for the product sales transactions. The solution
must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Hash
Scenario:
Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
A hash distributed table can deliver the highest query performance for joins and aggregations on
large tables.
Box 2: Set the distribution column to the sales date.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month. Boundary values must belong to the partition on the right.
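How a hash-distributed table places rows can be sketched in Python. A dedicated SQL pool spreads every table across 60 distributions. In the sketch, `zlib.crc32` is only a stand-in for the service's internal hash function, and the `product_id` column and sample rows are hypothetical, chosen purely for illustration.

```python
import zlib
from collections import defaultdict

DISTRIBUTION_COUNT = 60  # a dedicated SQL pool uses 60 distributions


def distribution_for(key: str) -> int:
    """Map a distribution-column value to a distribution (stand-in hash, not the real one)."""
    return zlib.crc32(key.encode("utf-8")) % DISTRIBUTION_COUNT


def distribute(rows, column):
    """Group rows by the distribution that their distribution-column value maps to."""
    placement = defaultdict(list)
    for row in rows:
        placement[distribution_for(str(row[column]))].append(row)
    return placement


if __name__ == "__main__":
    sales = [
        {"product_id": "P-100", "amount": 10},
        {"product_id": "P-200", "amount": 25},
        {"product_id": "P-100", "amount": 7},
    ]
    for dist, rows in sorted(distribute(sales, "product_id").items()):
        print(dist, rows)
```

Rows that share a distribution-column value always land in the same distribution, so joins and aggregations on that column avoid moving data between distributions.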
Reference:
https://rajanieshkaushikk.com/2020/09/09/how-to-choose-right-data-distribution-strategy-for-azure-synapse/
NO.3 You need to implement the surrogate key for the retail store table. The solution must meet the
sales transaction dataset requirements.
What should you create?
A. a table that has an IDENTITY property
B. a system-versioned temporal table
C. a user-defined SEQUENCE object
D. a table that has a FOREIGN KEY constraint
Answer: A
Explanation:
Scenario: Implement a surrogate key to account for changes to the retail store addresses.
A surrogate key on a table is a column with a unique identifier for each row. The key is not generated
from the table data. Data modelers like to create surrogate keys on their tables when they design
data warehouse models. You can use the IDENTITY property to achieve this goal simply and
effectively without affecting load performance.
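The idea of a key that is generated independently of the table data can be sketched as follows. This is a toy Python model, not the IDENTITY implementation; `RetailStoreTable`, `store_code`, and the sample addresses are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class RetailStoreTable:
    """Toy table whose surrogate key is generated independently of the row data."""

    _next_key: count = field(default_factory=lambda: count(1))
    rows: list = field(default_factory=list)

    def insert(self, store_code: str, address: str) -> int:
        """Add a row version and return its newly generated surrogate key."""
        key = next(self._next_key)
        self.rows.append(
            {"store_key": key, "store_code": store_code, "address": address}
        )
        return key


if __name__ == "__main__":
    table = RetailStoreTable()
    first = table.insert("S001", "1 Main Street")
    # The same store moves: a new row version gets a new surrogate key.
    second = table.insert("S001", "9 Oak Avenue")
    print(first, second, table.rows)
```

The sequential counter is a simplification: IDENTITY values in a dedicated SQL pool are unique but are not guaranteed to be allocated in order.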
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
NO.4 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. change feed
B. soft delete
C. time-based retention
D. lifecycle management
Answer: D
Explanation:
Scenario: Purge Twitter feed data records that are older than two years.
Data sets have unique lifecycles. Early in the lifecycle, people access some data often. But the need
for access often drops drastically as the data ages. Some data remains idle in the cloud and is rarely
accessed once stored.
Some data sets expire days or months after creation, while other data sets are actively read and
modified throughout their lifetimes. Azure Storage lifecycle management offers a rule-based policy
that you can use to transition blob data to the appropriate access tiers or to expire data at the end of
the data lifecycle.
Reference:

https://docs.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview
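A rule-based policy of the kind the explanation describes can be sketched as a JSON document built in Python. The rule name and the `twitterfeeds/` prefix are hypothetical, and two years is approximated as 730 days; treat this as a sketch of the policy shape rather than a ready-to-deploy configuration.

```python
import json


def build_purge_policy(rule_name: str, prefix: str, days: int) -> dict:
    """Build a lifecycle management policy that deletes block blobs after `days` days."""
    return {
        "rules": [
            {
                "enabled": True,
                "name": rule_name,
                "type": "Lifecycle",
                "definition": {
                    # Limit the rule to block blobs under an (assumed) path prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "delete": {"daysAfterModificationGreaterThan": days}
                        }
                    },
                },
            }
        ]
    }


if __name__ == "__main__":
    policy = build_purge_policy("purge-twitter-feeds", "twitterfeeds/", 2 * 365)
    print(json.dumps(policy, indent=2))
```

Once such a policy is attached to the storage account, the service expires matching blobs on its own, which fits the requirement to minimize administrative effort.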
NO.5 You need to design an analytical storage solution for the transactional data. The solution must
meet the sales transaction dataset requirements.
To answer, select the appropriate options in the answer area.
Answer:
Explanation:
Box 1: Round-robin
Round-robin tables are useful for improving loading speed.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month.
Box 2: Hash
Hash-distributed tables improve query performance on large fact tables.
Reference:
tables-distribu
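Round-robin placement ignores row content, which is why it loads quickly. The following Python sketch illustrates the idea only; `round_robin` is a hypothetical helper and does not reproduce the service's actual assignment logic.

```python
from itertools import cycle

DISTRIBUTION_COUNT = 60  # a dedicated SQL pool uses 60 distributions


def round_robin(rows, distribution_count=DISTRIBUTION_COUNT):
    """Assign rows to distributions in turn, without looking at any column value."""
    placement = [[] for _ in range(distribution_count)]
    for slot, row in zip(cycle(range(distribution_count)), rows):
        placement[slot].append(row)
    return placement


if __name__ == "__main__":
    buckets = round_robin(range(120))
    print([len(bucket) for bucket in buckets])
```

Because no hash is computed, rows spread evenly, but rows with the same key may sit in different distributions, so joins may require data movement at query time.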
NO.6 You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution
must meet the data integration requirements.
Which type of integration runtime should you use?
A. Azure-SSIS integration runtime
B. self-hosted integration runtime
C. Azure integration runtime
Answer: C
NO.7 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. time-based retention
B. change feed
C. soft delete
D. lifecycle management
Answer: D
NO.8 You need to implement an Azure Synapse Analytics database object for storing the sales
transactions data. The solution must meet the sales transaction dataset requirements.
What should you do? To answer, select the appropriate options in the answer area.
Answer:
Explanation:
Box 1: Create table
Scenario: Load the sales transaction dataset to Azure Synapse Analytics.
Box 2: RANGE RIGHT FOR VALUES
Scenario: Partition data that contains sales transaction records. Partitions must be designed
to provide efficient loads by month. Boundary values must belong to the partition on the right.
RANGE RIGHT: Specifies the boundary value belongs to the partition on the right (higher values).
FOR VALUES ( boundary_value [,...n] ): Specifies the boundary values for the partition.

Reference:
https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-azure-sql-data-warehouse
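The effect of RANGE RIGHT on a boundary value can be checked with a short Python sketch. It models partition assignment with the standard `bisect` module; the first-of-month boundaries and the helper names are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def monthly_boundaries(start: date, months: int) -> list:
    """Return the first day of each month, beginning at `start`, as boundary values."""
    boundaries = []
    year, month = start.year, start.month
    for _ in range(months):
        boundaries.append(date(year, month, 1))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return boundaries


def partition_number(value: date, boundaries: list, range_right: bool = True) -> int:
    """Return the 1-based partition that holds `value`.

    RANGE RIGHT puts a boundary value in the partition to its right;
    RANGE LEFT puts it in the partition to its left.
    """
    if range_right:
        return bisect_right(boundaries, value) + 1
    return bisect_left(boundaries, value) + 1


if __name__ == "__main__":
    bounds = monthly_boundaries(date(2021, 1, 1), 3)
    print(partition_number(date(2021, 2, 1), bounds))         # RANGE RIGHT
    print(partition_number(date(2021, 2, 1), bounds, False))  # RANGE LEFT
```

With RANGE RIGHT and first-of-month boundaries, each calendar month maps to exactly one partition, so a monthly load or a monthly purge touches a single partition.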
NO.9 You need to implement versioned changes to the integration pipelines. The solution must
meet the data integration requirements.
In which order should you perform the actions? To answer, move all actions from the list of actions to
the answer area and arrange them in the correct order.
Answer:
Explanation:
Scenario: Identify a process to ensure that changes to the ingestion and transformation activities can
be version-controlled and developed independently by multiple data engineers.
Step 1: Create a repository and a main branch
You need a Git repository in Azure Pipelines, TFS, or GitHub with your app.
Step 2: Create a feature branch
Step 3: Create a pull request
Step 4: Merge changes
Merge feature branches into the main branch using pull requests.
Step 5: Publish changes
Reference:
https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/pipeline-options-for-git
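NO.10 You need to design a data ingestion and storage solution for the Twitter feeds. The solution
must meet the customer sentiment analytics requirements.
To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.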
Answer:
Explanation:
Box 1: Configure Event Hubs partitions
Scenario: Maximize the throughput of ingesting Twitter feeds from Event Hubs to Azure Storage
without purchasing additional throughput or capacity units.
Event Hubs is designed to help with processing of large volumes of events. Event Hubs throughput is
scaled by using partitions and throughput-unit allocations.
Event Hubs traffic is controlled by TUs (standard tier). Auto-inflate enables you to start small with the
minimum required TUs you choose. The feature then scales automatically to the maximum limit of
TUs you need, depending on the increase in your traffic.
Box 2: An Azure Data Lake Storage Gen2 account
Scenario: Ensure that the data store supports Azure AD-based access control down to the object
level.
Azure Data Lake Storage Gen2 implements an access control model that supports both Azure role-
based access control (Azure RBAC) and POSIX-like access control lists (ACLs).
Reference:
https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-features
https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control
NO.11 You need to design the partitions for the product sales transactions. The solution must meet
the sales transaction dataset requirements.
To answer, select the appropriate options in the answer area.
Answer:
Explanation:

Box 1: Sales date

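Scenario: Contoso requirements for the sales transaction dataset include:
Partition data that contains sales transaction records. Partitions must be designed to provide
efficient loads by month. Boundary values must belong to the partition on the right.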
Box 2: An Azure Synapse Analytics Dedicated SQL pool
Scenario: Contoso requirements for the sales transaction dataset include:
Ensure that data storage costs and performance are predictable.
The size of a dedicated SQL pool (formerly SQL DW) is determined by Data Warehousing Units
(DWU).
Dedicated SQL pool (formerly SQL DW) stores data in relational tables with columnar storage. This
format significantly reduces the data storage costs, and improves query performance.
Reference:
overview-wha
Topic 2, Litware, Inc. Case Study
Case study
This is a case study. Case studies are not timed separately. You can use as much exam time as you
would like to complete each case. However, there may be additional case studies and sections on this
exam. You must manage your time to ensure that you are able to complete all questions included on
this exam in the time provided.
To answer the questions included in a case study, you will need to reference information that is
provided in the case study. Case studies might contain exhibits and other resources that provide
more information about the scenario that is described in the case study. Each question is
independent of the other questions in this case study.
At the end of this case study, a review screen will appear. This screen allows you to review your
answers and to make changes before you move to the next section of the exam. After you begin a
new section, you cannot return to this section.
To start the case study
To display the first question in this case study, click the Next button. Use the buttons in the left pane
to explore the content of the case study before you answer the questions. Clicking these buttons
displays information such as business requirements, existing environment, and problem statements.
If the case study has an All Information tab, note that the information displayed is identical to the
information displayed on the subsequent tabs. When you are ready to answer a question, click the
Question button to return to the question.
Overview
Litware, Inc. owns and operates 300 convenience stores across the US. The company sells a variety of
packaged foods and drinks, as well as a variety of prepared foods, such as sandwiches and pizzas.
Litware has a loyalty club whereby members can get daily discounts on specific items by providing
their membership number at checkout.
Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data
scientists who prefer analyzing data in Azure Databricks notebooks.
Requirements
Business Goals
Litware wants to create a new analytics environment in Azure to meet the following requirements:
* See inventory levels across the stores. Data must be updated as close to real time as possible.
* Execute ad hoc analytical queries on historical data to identify whether the loyalty club discounts
increase sales of the discounted products.
* Every four hours, notify store employees about how many prepared food items to produce based
on historical demand from the sales data.
Technical Requirements
Litware identifies the following technical requirements:
* Minimize the number of different Azure services needed to achieve the business goals.
* Use platform as a service (PaaS) offerings whenever possible and avoid having to provision virtual
machines that must be managed by Litware.
* Ensure that the analytical data store is accessible only to the company's on-premises network and
Azure services.
* Use Azure Active Directory (Azure AD) authentication whenever possible.
* Use the principle of least privilege when designing security.
* Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical
data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer
in use. Files that have a modified date that is older than 14 days must be removed.
* Limit the business analysts' access to customer contact information, such as phone numbers,
because this type of data is not analytically relevant.
* Ensure that you can quickly restore a copy of the analytical data store within one hour in the event
of corruption or accidental deletion.
Planned Environment
Litware plans to implement the following environment:
* The application development team will create an Azure event hub to receive real-time sales data,
including store number, date, time, product ID, customer loyalty number, price, and discount
amount, from the point of sale (POS) system and output the data to data storage in Azure.
* Customer data, including name, contact information, and loyalty number, comes from Salesforce, a
SaaS application, and can be imported into Azure once every eight hours. Row modified dates are not
trusted in the source table.
* Product data, including product ID, name, and category, comes from Salesforce and can be
imported into Azure once every eight hours. Row modified dates are not trusted in the source table.
* Daily inventory data comes from a Microsoft SQL Server instance located on a private network.
* Litware currently has 5 TB of historical sales data and 100 GB of customer data. The company
expects approximately 100 GB of new data per month for the next year.
* Litware will build a custom application named FoodPrep to provide store employees with the
calculation results of how many prepared food items to produce every four hours.
* Litware does not plan to implement Azure ExpressRoute or a VPN between the on-premises
network and Azure.
NO.12 You are designing an application that will store petabytes of medical imaging data. When the
data is first created, the data will be accessed frequently during the first week. After one month, the
data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the
data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in
the answer area.
Answer:

Explanation:

First week: Hot

Hot - Optimized for storing data that is accessed frequently.
After one month: Cool
Cool - Optimized for storing data that is infrequently accessed and stored for at least 30 days.
After one year: Cool
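The tier choice above can be written as a small decision function. This Python sketch assumes a 30-day cut-over from Hot to Cool, which the question implies but does not state exactly; `tier_for_age` is a hypothetical helper.

```python
def tier_for_age(age_days: int) -> str:
    """Pick a blob access tier from the data's age, following the answer above."""
    if age_days < 30:
        # Newly created data is accessed frequently.
        return "Hot"
    # After one month and after one year alike, access is infrequent but the data
    # must be readable within seconds or minutes. The Archive tier is ruled out
    # because rehydrating an archived blob can take hours.
    return "Cool"


if __name__ == "__main__":
    for age in (3, 45, 400):
        print(age, tier_for_age(age))
```

The one-year requirement of five minutes is the deciding detail: it rules out the cheapest tier, so Cool remains the lowest-cost option that still meets the access time.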
NO.13 You have the following Azure Data Factory pipelines:
* Ingest Data from System1
* Ingest Data from System2
* Populate Dimensions
* Populate Facts
Ingest Data from System1 and Ingest Data from System2 have no dependencies. Populate
Dimensions must execute after Ingest Data from System1 and Ingest Data from System2. Populate
Facts must execute after the Populate Dimensions pipeline. All the pipelines must execute every eight
hours.
What should you do to schedule the pipelines for execution?
A. Add an event trigger to all four pipelines.
B. Create a parent pipeline that contains the four pipelines and use an event trigger.
C. Create a parent pipeline that contains the four pipelines and use a schedule trigger.
D. Add a schedule trigger to all four pipelines.
Answer: C
Explanation:

Schedule trigger: A trigger that invokes a pipeline on a wall-clock schedule.

Reference:
https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers
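The execution order that a parent pipeline has to enforce can be derived from the stated dependencies. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) to group the four pipelines into waves; it models the ordering only and does not call Azure Data Factory. `execution_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Each pipeline maps to the set of pipelines that must finish before it starts.
DEPENDENCIES = {
    "Ingest Data from System1": set(),
    "Ingest Data from System2": set(),
    "Populate Dimensions": {"Ingest Data from System1", "Ingest Data from System2"},
    "Populate Facts": {"Populate Dimensions"},
}


def execution_waves(dependencies):
    """Group pipelines into waves; pipelines in one wave can run in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(number, wave)
```

A single schedule trigger on the parent pipeline then starts the whole chain every eight hours, instead of four triggers that know nothing about each other.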
NO.14 You have an Azure data factory named ADF1.

You currently publish all pipeline authoring changes directly to ADF1.
You need to implement version control for the changes made to pipeline artifacts. The solution must
ensure that you can apply version control to the resources currently defined in the UX Authoring
canvas for ADF1.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. Create an Azure Data Factory trigger
B. From the UX Authoring canvas, select Set up code repository
C. Create a GitHub action
D. From the Azure Data Factory Studio, run Publish All.
E. Create a Git repository
F. From the UX Authoring canvas, select Publish
Answer: B E
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/source-control
NO.15 You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast
columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics
dedicated SQL pool. The CSV file contains three columns named username, comment, and date.
The data flow already contains the following:
* A source transformation.
* A Derived Column transformation to set the appropriate types of data.
* A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
* All valid rows must be written to the destination table.
* Truncation errors in the comment column must be avoided proactively.
* Any rows containing comment values that will cause truncation errors upon insert must be written
to a file in blob storage.
Which two actions should you perform? Each correct answer presents part of the solution.
A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause
truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Answer: A B
Explanation:
B: Example:
1. This conditional split transformation defines the maximum length of "title" to be five. Any row
whose "title" is five characters or shorter will go into the GoodRows stream. Any row whose "title" is
longer than five characters will go into the BadRows stream.
A:
2. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for
logging.
Here, we'll "auto-map" all of the fields so that we have logging of the complete transaction record.
This is a text-delimited CSV file output to a single file in Blob Storage. We'll call the log file
"badrows.csv".
3. With the completed data flow, we are now able to split off error rows to avoid the SQL
truncation errors and put those entries into a log file. Meanwhile, successful rows can continue to
write to our target database.
Reference:

https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
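The split-then-log pattern can be mimicked outside a data flow to see what each stream receives. This Python sketch is an analogy, not data flow expression syntax: the 100-character limit is an assumed width for the destination comment column, and an in-memory stream stands in for the blob storage file.

```python
import csv
import io

FIELDS = ["username", "comment", "date"]


def conditional_split(rows, column="comment", max_length=100):
    """Split rows into (good, bad); bad rows would be truncated on insert."""
    good, bad = [], []
    for row in rows:
        target = good if len(row[column]) <= max_length else bad
        target.append(row)
    return good, bad


def write_bad_rows(bad_rows, stream):
    """Write the rejected rows as delimited text, keeping every column for the log."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(bad_rows)


if __name__ == "__main__":
    rows = [
        {"username": "ana", "comment": "ok", "date": "2021-01-01"},
        {"username": "bo", "comment": "x" * 150, "date": "2021-01-02"},
    ]
    good, bad = conditional_split(rows)
    log = io.StringIO()  # stands in for the file in blob storage
    write_bad_rows(bad, log)
    print(len(good), len(bad))
    print(log.getvalue())
```

A filter alone would drop the long rows; the split keeps both streams, so valid rows reach the table while the rest are still recorded.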
NO.16 Note: The question is part of a series of questions that present the same scenario. Each
question in the series contains a unique solution that might meet the stated goals. Some question
sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen. You have an Azure Data Lake Storage account that
contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the
data by executing an R script, and then insert the transformed data into a data warehouse in Azure
Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes a
mapping data flow, and then inserts the data into the data warehouse.
Does this meet the goal?
A. Yes
B. No
Answer: B
NO.17 You plan to create an Azure Synapse Analytics dedicated SQL pool.
You need to minimize the time it takes to identify queries that return confidential information as
defined by the company's data privacy regulations and the users who executed the queries.
Which two components should you include in the solution? Each correct answer presents part of the
solution.
A. sensitivity-classification labels applied to columns that contain confidential information
B. resource tags for databases that contain confidential information
C. audit logs sent to a Log Analytics workspace
D. dynamic data masking for columns that contain confidential information
Answer: A C
Explanation:
A: You can classify columns manually, as an alternative or in addition to the recommendation-based
classification:

* Select Add classification in the top menu of the pane.

* In the context window that opens, select the schema, table, and column that you want to classify,
and the information type and sensitivity label.
* Select Add classification at the bottom of the context window.
C: An important aspect of the information-protection paradigm is the ability to monitor access to
sensitive data. Azure SQL Auditing has been enhanced to include a new field in the audit log called
data_sensitivity_information. This field logs the sensitivity classifications (labels) of the data that was
returned by a query.
Reference:
https://docs.microsoft.com/en-us/azure/azure-sql/database/data-discovery-and-classification-overview
NO.18 You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The
table contains 50 columns and 5 billion rows and is a heap.
Most queries against the table aggregate values from approximately 100 million rows and return only
two columns.
You discover that the queries against the fact table are very slow.
Which type of index should you add to provide the fastest query times?
A. nonclustered columnstore
B. clustered columnstore

C. nonclustered
D. clustered
Answer: B
Explanation:
Clustered columnstore indexes are one of the most efficient ways you can store your data in
dedicated SQL pool.
Columnstore tables won't benefit a query unless the table has more than 60 million rows.
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
NO.19 You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table
named table1. You load 5 TB of data into table1.
You need to ensure that columnstore compression is maximized for table1.
Which statement should you execute?
A. ALTER INDEX ALL ON table1 REBUILD
B. DBCC DBREINDEX (table1)
C. DBCC INDEXDEFRAG (pool1, table1)
D. ALTER INDEX ALL ON table1 REORGANIZE
Answer: A
NO.20 You have an Azure Synapse Analytics dedicated SQL pool.
You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use
the returned result set as the input for a downstream activity. The solution must minimize
development effort.
Which type of activity should you use in the pipeline?
A. Notebook
B. U-SQL
C. Script
D. Stored Procedure
Answer: C
NO.21 You are designing an Azure Databricks interactive cluster. The cluster will be used
infrequently and will be configured for auto-termination.
You need to ensure that the cluster configuration is retained indefinitely after the cluster is
terminated. The solution must minimize costs.
What should you do?
A. Clone the cluster after it is terminated.
B. Terminate the cluster manually when processing completes.
C. Create an Azure runbook that starts the cluster every 90 days.
D. Pin the cluster.
Answer: D
Explanation:
To keep an interactive cluster configuration even after it has been terminated for more than 30 days,
an administrator can pin a cluster to the cluster list.

References:
https://docs.azuredatabricks.net/clusters/clusters-manage.html#automatic-termination
NO.22 You have an Azure subscription that contains the resources shown in the following table.
You need to ensure that you can run Spark notebooks in ws1. The solution must ensure that secrets
can be retrieved from kv1 by using UAMI1.
What should you do? To answer, select the appropriate options in the answer area.
Answer:
Explanation:
NO.23 You use Azure Data Lake Storage Gen2 to store data that data scientists and data engineers
will query by using Azure Databricks interactive notebooks. Users will have access only to the Data
Lake Storage folders that relate to the projects on which they work.
You need to recommend which authentication methods to use for Databricks and Data Lake Storage
to provide the users with the appropriate access. The solution must minimize administrative effort
and development effort.
Which authentication method should you recommend for each Azure service? To answer, select the
appropriate options in the answer area.
Answer:
Explanation:
Box 1: Personal access tokens

You can use storage shared access signatures (SAS) to access an Azure Data Lake Storage Gen2
storage account directly. With SAS, you can restrict access to a storage account using temporary
tokens with fine-grained access control.
You can add multiple storage accounts and configure respective SAS token providers in the same
Spark session.
Box 2: Azure Active Directory credential passthrough
You can authenticate automatically to Azure Data Lake Storage Gen1 (ADLS Gen1) and Azure Data
Lake Storage Gen2 (ADLS Gen2) from Azure Databricks clusters using the same Azure Active Directory
(Azure AD) identity that you use to log into Azure Databricks. When you enable your cluster for Azure
Data Lake Storage credential passthrough, commands that you run on that cluster can read and write
data in Azure Data Lake Storage without requiring you to configure service principal credentials for
access to storage.
After configuring Azure Data Lake Storage credential passthrough and creating storage containers,
you can access data directly in Azure Data Lake Storage Gen1 using an adl:// path and Azure Data
Lake Storage Gen2 using an abfss:// path:
Reference:
https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/adls-gen2/azure-datalake-gen2-sas-acc
https://docs.microsoft.com/en-us/azure/databricks/security/credential-passthrough/adls-passthrough
NO.24 You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1.
You plan to insert data from the files into Table1 and transform the data. Each row of data in the files
will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored
as an additional column in Table1.
Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime
column.
Does this meet the goal?
A. Yes
B. No
Answer: A
NO.25 You have an enterprise data warehouse in Azure Synapse Analytics.

You need to monitor the data warehouse to identify whether you must scale up to a higher service
level to accommodate the current workloads.
Which is the best metric to monitor?
More than one answer choice may achieve the goal. Select the BEST answer.
A. Data IO percentage
B. CPU percentage
C. DWU used
D. DWU percentage
Answer: C
NO.26 You are implementing an Azure Stream Analytics solution to process event data from devices.
The devices output events when there is a fault and emit a repeat of the event every five seconds
until the fault is resolved. The devices output a heartbeat event every five seconds after a previous
event if there are no faults present.
A sample of the events is shown in the following table.
You need to calculate the uptime between the faults.

How should you complete the Stream Analytics SQL query? To answer, select the appropriate options
in the answer area.

Answer:
Explanation:

Box 1: WHERE EventType='HeartBeat'

Box 2: ,TumblingWindow(Second, 5)
Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.
(Diagram not reproduced: a stream with a series of events mapped into 10-second tumbling windows.)

Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/session-window-azure-stream-analytics
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
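Note: The tumbling-window definition above can be illustrated with a minimal Python sketch. It is
not Stream Analytics query language, only a model of how events fall into fixed-size,
non-overlapping, contiguous windows. Event times are given as whole seconds, the sample values are
made up, and the sketch assumes each window excludes its start and includes its end.

```python
from collections import Counter


def window_end(event_second: int, size: int) -> int:
    """Return the end of the tumbling window containing the event (start < t <= end)."""
    # Ceiling division: an event exactly on a boundary belongs to the window ending there.
    return -(-event_second // size) * size


def count_per_window(event_seconds, size: int) -> Counter:
    """Count events per tumbling window, keyed by window end time in seconds."""
    return Counter(window_end(t, size) for t in event_seconds)


# Hypothetical event times in seconds, grouped into 5-second tumbling windows.
events = [1, 3, 5, 6, 12, 15]
counts = count_per_window(events, 5)
for end in sorted(counts):
    print(f"window ({end - 5}, {end}] holds {counts[end]} event(s)")
```

Each event lands in exactly one window, which is what distinguishes tumbling windows from
overlapping window types.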
NO.27 You have a table named SalesFact in an enterprise data warehouse in Azure Synapse
Analytics. SalesFact contains sales data from the past 36 months and has the following
characteristics:
* Is partitioned by month
* Contains one billion rows
* Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36
months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Answer:
Explanation:
Step 1: Create an empty table named SalesFact_Work that has the same schema as SalesFact.

Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.
SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions
between two tables, you must ensure that the partitions align on their respective boundaries and
that the table definitions match.
Loading data into partitions with partition switching is a convenient way to stage new data in a table
that is not visible to users, and then switch in the new data.
Step 3: Drop the SalesFact_Work table.
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
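Note: The three steps above can be pictured with a toy in-memory model in Python. This is not
T-SQL and does not call any Synapse API. Tables are modeled as dictionaries keyed by partition, and
the function name and sample data are hypothetical. The point is that the switch moves a whole
partition by reference without touching individual rows, which is why it is faster than deleting rows.

```python
from typing import Dict, List


def purge_partition(table: Dict[str, List[tuple]], stale_key: str) -> int:
    """Remove one partition from `table` using the create / switch / drop pattern."""
    # Step 1: create an empty work table with the same partition layout.
    work: Dict[str, List[tuple]] = {key: [] for key in table}
    # Step 2: switch the stale partition into the work table (no rows are copied).
    table[stale_key], work[stale_key] = work[stale_key], table[stale_key]
    removed = len(work[stale_key])
    # Step 3: drop the work table together with the stale rows.
    del work
    return removed


# Hypothetical sample data: two monthly partitions of (order id, amount) rows.
sales_fact = {
    "2019-01": [(1, 10.0), (2, 12.5)],
    "2019-02": [(3, 7.0)],
}
removed_rows = purge_partition(sales_fact, "2019-01")
print(removed_rows, sales_fact)
```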
NO.28 You have an Azure Synapse Analytics dedicated SQL pool.

You need to create a table named FactInternetSales that will be a large fact table in a dimensional
model.
FactInternetSales will contain 100 million rows and two columns named SalesAmount and
OrderQuantity.
Queries executed on FactInternetSales will aggregate the values in SalesAmount and OrderQuantity
from the last year for a specific product. The solution must minimize the data size and query
execution time.
How should you complete the code? To answer, select the appropriate options in the answer area.
Answer:

Explanation:
Box 1: (CLUSTERED COLUMNSTORE INDEX
CLUSTERED COLUMNSTORE INDEX
Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
This index uses column-based data storage and query processing to achieve gains up to 10 times the
query performance in your data warehouse over traditional row-oriented storage. You can also
achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with
SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run
performant real-time analytics on a transactional workload.
Note: Clustered columnstore index
A clustered columnstore index is the physical storage for the entire table.

To reduce fragmentation of the column segments and improve performance, the columnstore index
might store some data temporarily into a clustered index called a deltastore and a B-tree list of IDs
for deleted rows. The deltastore operations are handled behind the scenes. To return the correct
query results, the clustered columnstore index combines query results from both the columnstore
and the deltastore.
Box 2: HASH([ProductKey])
A hash distributed table distributes rows based on the value in the distribution column. A hash
distributed table is designed to achieve high performance for queries on large tables.
Choose a distribution column with data that distributes evenly.
Reference: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview
tables-overview
tables-distribu
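Note: The advice above to choose a distribution column that distributes evenly can be illustrated
with a minimal Python sketch. A dedicated SQL pool spreads each table over 60 distributions. The
hash used here (zlib.crc32) is only a stand-in and is an assumption, not the function the service
actually uses, and the key values are made up.

```python
import zlib
from collections import Counter

DISTRIBUTIONS = 60  # number of distributions in a dedicated SQL pool


def distribution_for(key) -> int:
    """Map a distribution-column value to a distribution (stand-in hash, not the real one)."""
    return zlib.crc32(str(key).encode("utf-8")) % DISTRIBUTIONS


def rows_per_distribution(keys) -> Counter:
    """Count how many rows land in each distribution for the given key values."""
    return Counter(distribution_for(k) for k in keys)


# Many distinct keys spread the rows out; one dominant key piles them into one distribution.
even = rows_per_distribution(range(6000))
skewed = rows_per_distribution([1] * 6000)
print("distinct keys -> distributions used:", len(even), "largest:", max(even.values()))
print("single key    -> distributions used:", len(skewed), "largest:", max(skewed.values()))
```

Rows with the same key always land in the same distribution, so a column with few distinct or
heavily repeated values concentrates the data and the query work on a few distributions.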
NO.29 You are designing a data mart for the human resources (HR) department at your company.
The data mart will contain employee information and employee transactions. From a source system you have a
flat extract that has the following fields:
* EmployeeID
* FirstName
* LastName
* Recipient
* GrossAmount
* TransactionID
* GovernmentID
* NetAmountPaid
* TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for
the data mart.
Which two tables should you create? Each correct answer presents part of the solution.
A. a dimension table for employee
B. a fact table for Employee
C. a dimension table for EmployeeTransaction
D. a dimension table for Transaction
E. a fact table for Transaction
Answer: A E
Reference:
tables-overview
NO.30 You have an Azure Synapse Analytics dedicated SQL pool that hosts a database named DB1.
You need to ensure that DB1 meets the following security requirements:
* When credit card numbers show in applications, only the last four digits must be visible.
* Tax numbers must be visible only to specific users.
What should you use for each requirement? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:

NO.31 You have an Azure subscription that contains an Azure Data Lake Storage account named
myaccount1. The myaccount1 account contains two containers named container1 and container2.
The subscription is linked to an Azure Active Directory (Azure AD) tenant that contains a security
group named Group1.
You need to grant Group1 read access to container1. The solution must use the principle of least
privilege.
Which role should you assign to Group1?
A. Storage Blob Data Reader for container1
B. Storage Table Data Reader for container1
C. Storage Blob Data Reader for myaccount1
D. Storage Table Data Reader for myaccount1
Answer: A
NO.32 A company purchases IoT devices to monitor manufacturing machinery. The company uses an
Azure IoT Hub to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
A. Azure Stream Analytics Edge application using Microsoft Visual Studio
B. Azure Analysis Services using Azure portal
C. Azure Analysis Services using Microsoft Visual Studio
D. Azure Data Factory instance using Microsoft Visual Studio
Answer: A
NO.33 You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files.
The size of the files will vary based on the number of events that occur per hour.
File sizes range from 4 KB to 5 GB.
You need to ensure that the files stored in the container are optimized for batch processing.
What should you do?
A. Compress the files.
B. Merge the files.
C. Convert the files to JSON
D. Convert the files to Avro.
Answer: D
Explanation:
Avro supports batch and is very relevant for streaming.
Note: Avro is a framework developed within Apache's Hadoop project. It is a row-based storage format
that is widely used for serialization. Avro stores its schema in JSON format, making it easy
to read and interpret by any program. The data itself is stored in binary format, making it compact
and efficient.
Reference:
https://www.adaltas.com/en/2020/07/23/benchmark-study-of-different-file-format/

