DP 203
http://www.newdumps.tk
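NewDumps provides free IT certification question bank demos; download the latest questions and answers.
Latest free DP-203 exam question bank: download a free trial of the DP-203 exam questions.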
IT Certification Guaranteed, The Easy Way!
Exam : DP-203
Vendor : Microsoft
Version : DEMO
NO.1 You need to ensure that the Twitter feed data can be analyzed in the dedicated SQL pool. The
solution must meet the customer sentiment analytics requirements.
Which three Transact-SQL DDL commands should you run in sequence? To answer, move the
appropriate commands from the list of commands to the answer area and arrange them in the
correct order.
NOTE: More than one order of answer choices is correct. You will receive credit for any of the correct
orders you select.
Answer:
Explanation:
Scenario: Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to
query the content of the data records that host the Twitter feeds. Data must be protected by using
row-level security (RLS). The users must be authenticated by using their own Azure AD credentials.
Box 1: CREATE EXTERNAL DATA SOURCE
External data sources are used to connect to storage accounts.
Box 2: CREATE EXTERNAL FILE FORMAT
CREATE EXTERNAL FILE FORMAT creates an external file format object that defines external data
stored in Azure Blob Storage or Azure Data Lake Storage. Creating an external file format is a
prerequisite for creating an external table.
Box 3: CREATE EXTERNAL TABLE AS SELECT
When used in conjunction with the CREATE TABLE AS SELECT statement, selecting from an external
table imports data into a table within the SQL pool. In addition to the COPY statement, external
tables are useful for loading data.
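The NOTE in the question says that more than one order earns credit. The Python sketch below makes the reason explicit. It assumes the only ordering constraint is that the external table references both the external data source and the external file format, so those two objects can be created in either order:

```python
from itertools import permutations

# Assumed dependency model: each DDL command and the objects it needs to exist first.
DEPENDS_ON = {
    "CREATE EXTERNAL DATA SOURCE": set(),
    "CREATE EXTERNAL FILE FORMAT": set(),
    "CREATE EXTERNAL TABLE AS SELECT": {
        "CREATE EXTERNAL DATA SOURCE",
        "CREATE EXTERNAL FILE FORMAT",
    },
}


def valid_orders(deps):
    """Return every ordering in which each command comes after all of its dependencies."""
    result = []
    for order in permutations(deps):
        position = {cmd: i for i, cmd in enumerate(order)}
        if all(position[d] < position[cmd] for cmd in order for d in deps[cmd]):
            result.append(order)
    return result


for order in valid_orders(DEPENDS_ON):
    print(" -> ".join(order))
```

Both accepted orders end with the external table, which is the only object that depends on the others.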
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables
Topic 1, Contoso Case Study
Transactional Data
Contoso has three years of customer, transactional, operational, sourcing, and supplier data comprised
of 10 billion records stored across multiple on-premises Microsoft SQL Server servers. The SQL Server
instances contain data from various operational systems. The data is loaded into the instances by
using SQL Server Integration Services (SSIS) packages.
You estimate that combining all product sales transactions into a company-wide sales transactions
dataset will result in a single table that contains 5 billion rows, with one row per transaction.
Most queries targeting the sales transactions data will be used to identify which products were sold
in retail stores and which products were sold online during different time periods. Sales transaction
data that is older than three years will be removed monthly.
You plan to create a retail store table that will contain the address of each retail store. The table will
be approximately 2 MB. Queries for retail store sales will include the retail store addresses.
You plan to create a promotional table that will contain a promotion ID. The promotion ID will be
associated with a specific product. The product will be identified by a product ID. The table will be
approximately 5 GB.
Streaming Twitter Data
The ecommerce department at Contoso develops an Azure logic app that captures trending Twitter
feeds referencing the company's products and pushes the feeds to Azure Event Hubs.
Planned Changes
Contoso plans to implement the following changes:
* Load the sales transaction dataset to Azure Synapse Analytics.
* Integrate on-premises data stores with Azure Synapse Analytics by using SSIS packages.
* Use Azure Synapse Analytics to analyze Twitter feeds to assess customer sentiments about
products.
Sales Transaction Dataset Requirements
Contoso identifies the following requirements for the sales transaction dataset:
* Partition data that contains sales transaction records. Partitions must be designed to provide
efficient loads by month. Boundary values must belong to the partition on the right.
* Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
* Implement a surrogate key to account for changes to the retail store addresses.
* Ensure that data storage costs and performance are predictable.
* Minimize how long it takes to remove old records.
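The "partition on the right" rule in the first requirement is RANGE RIGHT semantics. A short Python sketch with hypothetical monthly boundary dates illustrates it: `bisect_right` places a value that equals a boundary after that boundary, which is exactly where RANGE RIGHT puts it:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical monthly boundary values for a RANGE RIGHT partition function
# on the sales date column. N boundaries produce N + 1 partitions.
BOUNDARIES = [date(2021, 1, 1), date(2021, 2, 1), date(2021, 3, 1)]


def partition_number(value, boundaries=BOUNDARIES):
    """Return the 1-based partition number for value under RANGE RIGHT.

    A value equal to a boundary goes to the partition on its right, so the
    first day of each month starts a new partition.
    """
    return bisect_right(boundaries, value) + 1


for d in [date(2020, 12, 31), date(2021, 1, 1), date(2021, 1, 31), date(2021, 2, 1)]:
    print(d, "-> partition", partition_number(d))
```

Because each month then occupies exactly one partition, a whole month can be loaded or removed as a unit.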
Customer Sentiment Analytics Requirements
Contoso identifies the following requirements for customer sentiment analytics:
* Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to query the
content of the data records that host the Twitter feeds. Data must be protected by using row-level
security (RLS). The users must be authenticated by using their own Azure AD credentials.
* Maximize the throughput of ingesting Twitter feeds from Event Hubs to Azure Storage without
purchasing additional throughput or capacity units.
* Store Twitter feeds in Azure Storage by using Event Hubs Capture. The feeds will be converted into
Parquet files.
* Ensure that the data store supports Azure AD-based access control down to the object level.
* Minimize administrative effort to maintain the Twitter feed data records.
* Purge Twitter feed data records that are older than two years.
Data Integration Requirements
Contoso identifies the following requirements for data integration:
* Use an Azure service that leverages the existing SSIS packages to ingest on-premises data into
datasets stored in a dedicated SQL pool of Azure Synapse Analytics and transform the data.
* Identify a process to ensure that changes to the ingestion and transformation activities can be
version controlled and developed independently by multiple data engineers.
NO.2 You need to design a data storage structure for the product sales transactions. The solution
must meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Hash
Scenario:
Ensure that queries joining and filtering sales transaction records based on product ID complete as
quickly as possible.
A hash distributed table can deliver the highest query performance for joins and aggregations on
large tables.
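A minimal Python sketch of why hashing on product ID helps: every row with the same product ID lands in the same distribution, so joins and filters on that column can be resolved within a distribution. A dedicated SQL pool spreads each table across 60 distributions; the CRC32 hash used here is only a deterministic stand-in, not the hash function Synapse actually uses, and the sales rows are hypothetical:

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool splits each table into 60 distributions


def distribution_for(product_id: int) -> int:
    """Map a product ID to a distribution with a deterministic stand-in hash."""
    return zlib.crc32(str(product_id).encode("utf-8")) % DISTRIBUTIONS


# Hypothetical sales rows: (transaction_id, product_id)
sales = [(1, 501), (2, 777), (3, 501), (4, 42), (5, 777)]

placement = {}
for txn_id, product_id in sales:
    placement.setdefault(distribution_for(product_id), []).append(txn_id)

print(placement)
```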
Box 2: Set the partition column to the sales date.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month. Boundary values must belong to the partition on the right.
Reference:
https://rajanieshkaushikk.com/2020/09/09/how-to-choose-right-data-distribution-strategy-for-azure-synapse/
NO.3 You need to implement the surrogate key for the retail store table. The solution must meet the
sales transaction dataset requirements.
What should you create?
A. a table that has an IDENTITY property
B. a system-versioned temporal table
C. a user-defined SEQUENCE object
D. a table that has a FOREIGN KEY constraint
Answer: A
Explanation:
Scenario: Implement a surrogate key to account for changes to the retail store addresses.
A surrogate key on a table is a column with a unique identifier for each row. The key is not generated
from the table data. Data modelers like to create surrogate keys on their tables when they design
data warehouse models. You can use the IDENTITY property to achieve this goal simply and
effectively without affecting load performance.
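The idea behind the surrogate key can be sketched in Python. The sketch assumes a Type 2 style history (an assumption about how address changes are tracked): each new address version of a store gets a new surrogate key while the business key, a hypothetical store number, stays the same. The counter plays the role of the IDENTITY property; note that in a dedicated SQL pool, IDENTITY values are unique but are not guaranteed to be contiguous or allocated in insert order:

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class StoreRow:
    store_key: int     # surrogate key: generated, never derived from the data
    store_number: str  # hypothetical business key from the source system
    address: str


class RetailStoreDimension:
    def __init__(self):
        self._next_key = count(start=1)  # stands in for IDENTITY(1, 1)
        self.rows = []

    def upsert(self, store_number: str, address: str) -> StoreRow:
        """Add a new row only when the store is new or its address changed."""
        current = [r for r in self.rows if r.store_number == store_number]
        if current and current[-1].address == address:
            return current[-1]
        row = StoreRow(next(self._next_key), store_number, address)
        self.rows.append(row)
        return row


dim = RetailStoreDimension()
dim.upsert("S001", "1 Main St")
dim.upsert("S001", "1 Main St")  # unchanged address: no new row
dim.upsert("S001", "9 Oak Ave")  # changed address: new row, new surrogate key
print([(r.store_key, r.store_number, r.address) for r in dim.rows])
```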
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-identity
NO.4 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. change feed
B. soft delete
C. time-based retention
D. lifecycle management
Answer: D
Explanation:
Scenario: Purge Twitter feed data records that are older than two years.
Data sets have unique lifecycles. Early in the lifecycle, people access some data often. But the need
for access often drops drastically as the data ages. Some data remains idle in the cloud and is rarely
accessed once stored.
Some data sets expire days or months after creation, while other data sets are actively read and
modified throughout their lifetimes. Azure Storage lifecycle management offers a rule-based policy
that you can use to transition blob data to the appropriate access tiers or to expire data at the end of
the data lifecycle.
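As a sketch of the lifecycle management approach, the Python below builds a policy document that deletes blobs more than two years after their last modification. The rule name and the `twitter-feeds/` prefix are hypothetical, and 730 days is used as an approximation of two years:

```python
import json

TWO_YEARS_IN_DAYS = 730  # approximation that ignores leap days

policy = {
    "rules": [
        {
            "enabled": True,
            "name": "purge-twitter-feeds",  # hypothetical rule name
            "type": "Lifecycle",
            "definition": {
                "filters": {
                    "blobTypes": ["blockBlob"],
                    "prefixMatch": ["twitter-feeds/"],  # hypothetical container prefix
                },
                "actions": {
                    "baseBlob": {
                        "delete": {"daysAfterModificationGreaterThan": TWO_YEARS_IN_DAYS}
                    }
                },
            },
        }
    ]
}

print(json.dumps(policy, indent=2))
```

Once a policy like this is applied to the storage account, Azure Storage evaluates the rule itself, which also keeps administrative effort low.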
Reference:
https://docs.microsoft.com/en-us/azure/storage/blobs/lifecycle-management-overview
NO.5 You need to design an analytical storage solution for the transactional data. The solution must
meet the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Box 1: Round-robin
Round-robin tables are useful for improving loading speed.
Scenario: Partition data that contains sales transaction records. Partitions must be designed to
provide efficient loads by month.
Box 2: Hash
Hash-distributed tables improve query performance on large fact tables.
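For contrast with hash distribution, a round-robin table spreads rows evenly across distributions without looking at any column value. In Synapse the assignment is effectively random; the Python sketch below uses a strict rotation over an assumed 60 distributions only to show the even spread:

```python
from collections import Counter
from itertools import cycle

DISTRIBUTIONS = 60


def round_robin(rows, n=DISTRIBUTIONS):
    """Assign each row to the next distribution in turn, ignoring its values."""
    targets = cycle(range(n))
    return [(next(targets), row) for row in rows]


rows = [{"product_id": 501}] * 600  # every row has the same product ID
counts = Counter(dist for dist, _ in round_robin(rows))
print(min(counts.values()), max(counts.values()))
```

The even spread makes loading fast, but because placement ignores product ID, joins on that column generally require data movement, which is why hash distribution is preferred for large fact tables queried by that column.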
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribu
NO.6 You need to integrate the on-premises data sources and Azure Synapse Analytics. The solution
must meet the data integration requirements.
Which type of integration runtime should you use?
A. Azure-SSIS integration runtime
B. self-hosted integration runtime
C. Azure integration runtime
Answer: A
NO.7 You need to design a data retention solution for the Twitter feed data records. The solution
must meet the customer sentiment analytics requirements.
Which Azure Storage functionality should you include in the solution?
A. time-based retention
B. change feed
C. soft delete
D. lifecycle management
Answer: D
NO.8 You need to implement an Azure Synapse Analytics database object for storing the sales
transactions data. The solution must meet the sales transaction dataset requirements.
What should you do? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.9 You need to implement versioned changes to the integration pipelines. The solution must
meet the data integration requirements.
In which order should you perform the actions? To answer, move all actions from the list of actions to
the answer area and arrange them in the correct order.
Answer:
Explanation:
Scenario: Identify a process to ensure that changes to the ingestion and transformation activities can
be version-controlled and developed independently by multiple data engineers.
Step 1: Create a repository and a main branch
You need a Git repository in Azure Repos, TFS, or GitHub with your app.
Step 2: Create a feature branch
Step 3: Create a pull request
Step 4: Merge changes
Merge feature branches into the main branch using pull requests.
Step 5: Publish changes
Reference:
https://docs.microsoft.com/en-us/azure/devops/pipelines/repos/pipeline-options-for-git
NO.10 You need to design a data ingestion and storage solution for the Twitter feeds. The solution
must meet the customer sentiment analytics requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.11 You need to design the partitions for the product sales transactions. The solution must meet
the sales transaction dataset requirements.
What should you include in the solution? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
Overview
Litware, Inc. owns and operates 300 convenience stores across the US. The company sells a variety of
packaged foods and drinks, as well as a variety of prepared foods, such as sandwiches and pizzas.
Litware has a loyalty club whereby members can get daily discounts on specific items by providing
their membership number at checkout.
Litware employs business analysts who prefer to analyze data by using Microsoft Power BI, and data
scientists who prefer analyzing data in Azure Databricks notebooks.
Requirements
Business Goals
Litware wants to create a new analytics environment in Azure to meet the following requirements:
* See inventory levels across the stores. Data must be updated as close to real time as possible.
* Execute ad hoc analytical queries on historical data to identify whether the loyalty club discounts
increase sales of the discounted products.
* Every four hours, notify store employees about how many prepared food items to produce based
on historical demand from the sales data.
Technical Requirements
Litware identifies the following technical requirements:
* Minimize the number of different Azure services needed to achieve the business goals.
* Use platform as a service (PaaS) offerings whenever possible and avoid having to provision virtual
machines that must be managed by Litware.
* Ensure that the analytical data store is accessible only to the company's on-premises network and
Azure services.
* Use Azure Active Directory (Azure AD) authentication whenever possible.
* Use the principle of least privilege when designing security.
* Stage inventory data in Azure Data Lake Storage Gen2 before loading the data into the analytical
data store. Litware wants to remove transient data from Data Lake Storage once the data is no longer
in use.
Files that have a modified date that is older than 14 days must be removed.
* Limit the business analysts' access to customer contact information, such as phone numbers,
because this type of data is not analytically relevant.
* Ensure that you can quickly restore a copy of the analytical data store within one hour in the event
of corruption or accidental deletion.
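The 14-day staging rule reduces to a date comparison on each file's last-modified time. The Python sketch below uses hypothetical file names and timestamps; in Azure the same rule would more likely be expressed as a lifecycle management policy than as custom code:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)


def files_to_remove(files, now):
    """Return the names of files whose last-modified time is older than 14 days."""
    return [name for name, modified in files.items() if now - modified > RETENTION]


now = datetime(2021, 6, 30, tzinfo=timezone.utc)
staged = {  # hypothetical staged files and their last-modified times
    "inventory/2021-06-01.csv": datetime(2021, 6, 1, tzinfo=timezone.utc),
    "inventory/2021-06-16.csv": datetime(2021, 6, 16, tzinfo=timezone.utc),
    "inventory/2021-06-29.csv": datetime(2021, 6, 29, tzinfo=timezone.utc),
}
print(files_to_remove(staged, now))
```

A file modified exactly 14 days ago is kept, because only files strictly older than 14 days qualify.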
Planned Environment
Litware plans to implement the following environment:
* The application development team will create an Azure event hub to receive real-time sales data,
including store number, date, time, product ID, customer loyalty number, price, and discount
amount, from the point of sale (POS) system and output the data to data storage in Azure.
* Customer data, including name, contact information, and loyalty number, comes from Salesforce, a
SaaS application, and can be imported into Azure once every eight hours. Row modified dates are not
trusted in the source table.
* Product data, including product ID, name, and category, comes from Salesforce and can be
imported into Azure once every eight hours. Row modified dates are not trusted in the source table.
* Daily inventory data comes from a Microsoft SQL Server located on a private network.
* Litware currently has 5 TB of historical sales data and 100 GB of customer data. The company
expects approximately 100 GB of new data per month for the next year.
* Litware will build a custom application named FoodPrep to provide store employees with the
calculation results of how many prepared food items to produce every four hours.
* Litware does not plan to implement Azure ExpressRoute or a VPN between the on-premises
network and Azure.
NO.12 You are designing an application that will store petabytes of medical imaging data. When the
data is first created, the data will be accessed frequently during the first week. After one month, the
data must be accessible within 30 seconds, but files will be accessed infrequently. After one year, the
data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.
Which storage tier should you use for each time frame? To answer, select the appropriate options in
the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
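One way to reason about this question is to pick the cheapest tier that still meets the retrieval requirement for each period. The Python sketch below assumes only the Hot, Cool, and Archive tiers are considered and uses rough illustrative latencies: Hot and Cool return data in milliseconds, while an archived blob must first be rehydrated, which typically takes hours, so one hour is used as an illustrative figure:

```python
# Tiers ordered from highest to lowest storage cost (assumption: Hot, Cool, Archive only).
# Latencies are rough illustrative figures in seconds, not service guarantees.
TIERS = [
    ("Hot", 0.01),
    ("Cool", 0.01),
    ("Archive", 3600.0),  # rehydration from Archive typically takes hours
]


def choose_tier(max_latency_s: float, frequent_access: bool) -> str:
    """Return the cheapest tier whose retrieval latency meets the requirement."""
    if frequent_access:
        return "Hot"  # frequent reads make Hot cheapest overall despite its storage cost
    candidates = [name for name, latency in TIERS if latency <= max_latency_s]
    return candidates[-1]  # last candidate has the lowest storage cost


periods = {
    "first week": choose_tier(max_latency_s=1, frequent_access=True),
    "after one month": choose_tier(max_latency_s=30, frequent_access=False),
    "after one year": choose_tier(max_latency_s=300, frequent_access=False),
}
print(periods)
```

Under these assumptions, the five-minute requirement rules out Archive even after one year.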
NO.15 You are creating an Azure Data Factory data flow that will ingest data from a CSV file, cast
columns to specified types of data, and insert the data into a table in an Azure Synapse Analytics
dedicated SQL pool. The CSV file contains three columns named username, comment, and date.
The data flow already contains the following:
* A source transformation.
* A Derived Column transformation to set the appropriate types of data.
* A sink transformation to land the data in the pool.
You need to ensure that the data flow meets the following requirements:
* All valid rows must be written to the destination table.
* Truncation errors in the comment column must be avoided proactively.
* Any rows containing comment values that will cause truncation errors upon insert must be written
to a file in blob storage.
Which two actions should you perform? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
A. To the data flow, add a sink transformation to write the rows to a file in blob storage.
B. To the data flow, add a Conditional Split transformation to separate the rows that will cause
truncation errors.
C. To the data flow, add a filter transformation to filter out rows that will cause truncation errors.
D. Add a select transformation to select only the rows that will cause truncation errors.
Answer: A B
Explanation:
B: Example:
1. This conditional split transformation defines the maximum length of "title" to be five. Any row that
is less than or equal to five will go into the GoodRows stream. Any row that is larger than five will go
into the BadRows stream.
A:
2. Now we need to log the rows that failed. Add a sink transformation to the BadRows stream for
logging.
Here, we'll "auto-map" all of the fields so that we have logging of the complete transaction record.
This is a text-delimited CSV file output to a single file in Blob Storage. We'll call the log file
"badrows.csv".
3. The completed data flow splits off error rows to avoid the SQL truncation errors and puts those
entries into a log file. Meanwhile, successful rows continue to be written to the target database.
Reference:
https://docs.microsoft.com/en-us/azure/data-factory/how-to-data-flow-error-rows
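The split-and-log pattern can be sketched in plain Python. The 128-character limit on `comment`, the sample rows, and the in-memory CSV standing in for the blob storage file are all hypothetical:

```python
import csv
import io

MAX_COMMENT_LENGTH = 128  # hypothetical width of the comment column in the SQL table


def conditional_split(rows, max_len=MAX_COMMENT_LENGTH):
    """Route rows into GoodRows and BadRows streams by comment length."""
    good, bad = [], []
    for row in rows:
        (good if len(row["comment"]) <= max_len else bad).append(row)
    return good, bad


def write_bad_rows(bad_rows):
    """Write every field of the rejected rows as CSV text (stand-in for badrows.csv)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["username", "comment", "date"])
    writer.writeheader()
    writer.writerows(bad_rows)
    return buffer.getvalue()


rows = [
    {"username": "amy", "comment": "Great product", "date": "2021-05-01"},
    {"username": "bob", "comment": "x" * 200, "date": "2021-05-02"},
]
good_rows, bad_rows = conditional_split(rows)
print(len(good_rows), "row(s) go to the SQL sink")
print(write_bad_rows(bad_rows))
```

Splitting before the sink avoids truncation errors proactively instead of relying on the insert to fail.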
NO.16 Note: The question is part of a series of questions that present the same scenario. Each
question in the series contains a unique solution that might meet the stated goals. Some question
sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these
questions will not appear in the review screen.
You have an Azure Data Lake Storage account that contains a staging zone.
You need to design a daily process to ingest incremental data from the staging zone, transform the
data by executing an R script, and then insert the transformed data into a data warehouse in Azure
Synapse Analytics.
Solution: You use an Azure Data Factory schedule trigger to execute a pipeline that executes a
mapping data flow, and then inserts the data into the data warehouse.
Does this meet the goal?
A. Yes
B. No
Answer: B
NO.17 You plan to create an Azure Synapse Analytics dedicated SQL pool.
You need to minimize the time it takes to identify queries that return confidential information as
defined by the company's data privacy regulations and the users who executed the queries.
Which two components should you include in the solution? Each correct answer presents part of the
solution.
NOTE: Each correct selection is worth one point.
A. sensitivity-classification labels applied to columns that contain confidential information
B. resource tags for databases that contain confidential information
C. audit logs sent to a Log Analytics workspace
D. dynamic data masking for columns that contain confidential information
Answer: A C
Explanation:
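A: You can classify columns manually, as an alternative or in addition to the recommendation-based
classification. Once columns are classified, audit records for queries that return classified data also
include the sensitivity information.
C: Auditing records the queries run against the database and the users who ran them. Sending the
audit logs to a Log Analytics workspace lets you search them quickly.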
Reference:
https://docs.microsoft.com/en-us/azure/azure-sql/database/data-discovery-and-classification-overview
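To show how classification labels and audit logs work together, here is a Python sketch over a few audit records. The record fields, including the sensitivity labels attached to each statement, are hypothetical stand-ins for what the audit log would contain once columns are classified:

```python
from dataclasses import dataclass


@dataclass
class AuditRecord:
    user: str
    statement: str
    sensitivity_labels: tuple  # labels of classified columns the query returned (hypothetical)


records = [
    AuditRecord("analyst1", "SELECT Name FROM dbo.Customer", ()),
    AuditRecord("analyst2", "SELECT CardNumber FROM dbo.Payment", ("Confidential",)),
    AuditRecord("analyst1", "SELECT TaxId FROM dbo.Employee", ("Highly Confidential",)),
]


def confidential_queries(audit_records, labels=("Confidential", "Highly Confidential")):
    """Return (user, statement) pairs for queries that returned labeled columns."""
    return [
        (r.user, r.statement)
        for r in audit_records
        if any(label in labels for label in r.sensitivity_labels)
    ]


for user, statement in confidential_queries(records):
    print(user, "->", statement)
```

In the actual service, this filtering would be a query against the Log Analytics workspace rather than Python code.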
NO.18 You have an Azure Synapse Analytics dedicated SQL pool that contains a large fact table. The
table contains 50 columns and 5 billion rows and is a heap.
Most queries against the table aggregate values from approximately 100 million rows and return only
two columns.
You discover that the queries against the fact table are very slow.
Which type of index should you add to provide the fastest query times?
A. nonclustered columnstore
B. clustered columnstore
C. nonclustered
D. clustered
Answer: B
Explanation:
Clustered columnstore indexes are one of the most efficient ways you can store your data in
dedicated SQL pool.
Columnstore tables won't benefit a query unless the table has more than 60 million rows.
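The 60-million-row guideline follows from how a dedicated SQL pool stores data: each table is split across 60 distributions, and a columnstore rowgroup holds at most 1,048,576 rows, so a table needs roughly one million rows per distribution before its rowgroups can be well filled. The Python sketch applies that arithmetic to the 5-billion-row table in this question, assuming rows are spread evenly across distributions:

```python
import math

DISTRIBUTIONS = 60
MAX_ROWS_PER_ROWGROUP = 1_048_576


def rows_per_distribution(total_rows: int) -> int:
    """Rows each distribution holds if the table is spread evenly."""
    return total_rows // DISTRIBUTIONS


def rowgroups_per_distribution(total_rows: int) -> int:
    """Rowgroups needed per distribution, rounded up."""
    return math.ceil(rows_per_distribution(total_rows) / MAX_ROWS_PER_ROWGROUP)


fact_rows = 5_000_000_000
print(rows_per_distribution(fact_rows))       # about 83.3 million rows per distribution
print(rowgroups_per_distribution(fact_rows))  # about 80 rowgroups per distribution
print(rows_per_distribution(60_000_000))      # the rule-of-thumb threshold per distribution
```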
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool
NO.19 You have an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 contains a table
named table1. You load 5 TB of data into table1.
You need to ensure that columnstore compression is maximized for table1.
Which statement should you execute?
A. ALTER INDEX ALL on table1 REBUILD
B. DBCC DBREINDEX (table1)
C. DBCC INDEXDEFRAG (pool1, table1)
D. ALTER INDEX ALL on table1 REORGANIZE
Answer: A
NO.21 You are designing an Azure Databricks interactive cluster. The cluster will be used
infrequently and will be configured for auto-termination.
You need to ensure that the cluster configuration is retained indefinitely after the cluster is
terminated. The solution must minimize costs.
What should you do?
A. Clone the cluster after it is terminated.
B. Terminate the cluster manually when processing completes.
C. Create an Azure runbook that starts the cluster every 90 days.
D. Pin the cluster.
Answer: D
Explanation:
To keep an interactive cluster configuration even after it has been terminated for more than 30 days,
an administrator can pin a cluster to the cluster list.
References:
https://docs.azuredatabricks.net/clusters/clusters-manage.html#automatic-termination
NO.22 You have an Azure subscription that contains the resources shown in the following table.
You need to ensure that you can run Spark notebooks in ws1. The solution must ensure that you can
retrieve secrets from kv1 by using UAMI1. What should you do? To answer, select the appropriate
options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.23 You use Azure Data Lake Storage Gen2 to store data that data scientists and data engineers
will query by using Azure Databricks interactive notebooks. Users will have access only to the Data
Lake Storage folders that relate to the projects on which they work.
You need to recommend which authentication methods to use for Databricks and Data Lake Storage
to provide the users with the appropriate access. The solution must minimize administrative effort
and development effort.
Which authentication method should you recommend for each Azure service? To answer, select the
appropriate options in the answer area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
NO.24 You have an Azure Synapse Analytics dedicated SQL pool that contains a table named Table1.
You have files that are ingested and loaded into an Azure Data Lake Storage Gen2 container named
container1.
You plan to insert data from the files into Table1 and transform the data. Each row of data in the files
will produce one row in the serving layer of Table1.
You need to ensure that when the source data files are loaded to container1, the DateTime is stored
as an additional column in Table1.
Solution: You use a dedicated SQL pool to create an external table that has an additional DateTime
column.
Does this meet the goal?
A. Yes
B. No
Answer: A
NO.26 You are implementing an Azure Stream Analytics solution to process event data from devices.
The devices output events when there is a fault and emit a repeat of the event every five seconds
until the fault is resolved. The devices output a heartbeat event every five seconds after a previous
event if there are no faults present.
A sample of the events is shown in the following table.
Answer:
Explanation:
Reference:
https://docs.microsoft.com/en-us/stream-analytics-query/session-window-azure-stream-analytics
https://docs.microsoft.com/en-us/stream-analytics-query/tumbling-window-azure-stream-analytics
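The two referenced window types group the five-second event stream differently. A Python sketch, assuming a hypothetical 10-second session timeout: a new session starts whenever the gap since the previous event exceeds the timeout, so a fault that repeats every five seconds stays in one session until it stops. A tumbling window, by contrast, cuts time into fixed, non-overlapping slots. Stream Analytics session windows also take a maximum duration, which this sketch ignores:

```python
def session_windows(timestamps, timeout):
    """Group event times (seconds) into sessions separated by gaps larger than timeout."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def tumbling_windows(timestamps, size):
    """Group event times into fixed, non-overlapping windows keyed by window start."""
    windows = {}
    for ts in sorted(timestamps):
        windows.setdefault(ts // size * size, []).append(ts)
    return windows


fault_events = [0, 5, 10, 15, 40, 45]  # hypothetical: two separate faults
print(session_windows(fault_events, timeout=10))  # [[0, 5, 10, 15], [40, 45]]
print(tumbling_windows(fault_events, size=30))    # {0: [0, 5, 10, 15], 30: [40, 45]}
```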
NO.27 You have a table named SalesFact in an enterprise data warehouse in Azure Synapse
Analytics. SalesFact contains sales data from the past 36 months and has the following
characteristics:
* Is partitioned by month
* Contains one billion rows
* Has clustered columnstore indexes
At the beginning of each month, you need to remove data from SalesFact that is older than 36
months as quickly as possible.
Which three actions should you perform in sequence in a stored procedure? To answer, move the
appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Answer:
Explanation:
Step 1: Create an empty table named SalesFact_Work that has the same schema as SalesFact.
Step 2: Switch the partition containing the stale data from SalesFact to SalesFact_Work.
SQL Data Warehouse supports partition splitting, merging, and switching. To switch partitions
between two tables, you must ensure that the partitions align on their respective boundaries and
that the table definitions match.
Loading data into partitions with partition switching is a convenient way to stage new data in a table
that is not visible to users and then switch in the new data.
Step 3: Drop the SalesFact_Work table.
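A Python model of the three steps, assuming partitions keyed by the first day of each month and hypothetical row lists. It shows that a switch moves a whole partition by reassigning it rather than deleting rows one at a time, and that the stale rows disappear when the work table is dropped:

```python
from datetime import date

# Hypothetical partitioned table: partition key (first day of month) -> rows.
sales_fact = {
    date(2018, 1, 1): ["r1", "r2"],
    date(2018, 2, 1): ["r3"],
    date(2021, 1, 1): ["r4", "r5"],
}


def months_between(older: date, newer: date) -> int:
    return (newer.year - older.year) * 12 + (newer.month - older.month)


def purge_old_partitions(table, today, keep_months=36):
    # Step 1: create an empty work table (the schema is implicit in this model).
    sales_fact_work = {}
    # Step 2: switch each stale partition out; the row list is moved, not copied.
    for month in [m for m in table if months_between(m, today) > keep_months]:
        sales_fact_work[month] = table.pop(month)
    switched = sorted(sales_fact_work)
    # Step 3: drop the work table, discarding the switched-out rows.
    del sales_fact_work
    return switched


print(purge_old_partitions(sales_fact, today=date(2021, 2, 1)))
print(sorted(sales_fact))
```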
Reference:
https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-partition
Answer:
Explanation:
Box 1: CLUSTERED COLUMNSTORE INDEX
Columnstore indexes are the standard for storing and querying large data warehousing fact tables.
This index uses column-based data storage and query processing to achieve gains up to 10 times the
query performance in your data warehouse over traditional row-oriented storage. You can also
achieve gains up to 10 times the data compression over the uncompressed data size. Beginning with
SQL Server 2016 (13.x) SP1, columnstore indexes enable operational analytics: the ability to run
performant real-time analytics on a transactional workload.
Note: Clustered columnstore index
A clustered columnstore index is the physical storage for the entire table.
To reduce fragmentation of the column segments and improve performance, the columnstore index
might store some data temporarily into a clustered index called a deltastore and a B-tree list of IDs
for deleted rows. The deltastore operations are handled behind the scenes. To return the correct
query results, the clustered columnstore index combines query results from both the columnstore
and the deltastore.
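Part of the compression gain comes from storing each column separately, so that runs of repeated values can be encoded compactly. Columnstore indexes combine several encodings, including dictionary and run-length encoding; the Python sketch below shows only run-length encoding, on a hypothetical, sorted, low-cardinality column:

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical ProductKey column stored on its own, as a column segment would be.
product_keys = [101] * 4 + [102] * 3 + [205] * 5
encoded = run_length_encode(product_keys)
print(encoded)  # [(101, 4), (102, 3), (205, 5)]
print(len(product_keys), "values stored as", len(encoded), "pairs")
```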
Box 2: HASH([ProductKey])
A hash distributed table distributes rows based on the value in the distribution column. A hash
distributed table is designed to achieve high performance for queries on large tables.
Choose a distribution column with data that distributes evenly.
Reference: https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribu
NO.29 You are designing a data mart for the human resources (HR) department at your company.
The data mart will contain employee information and employee transactions. From a source system, you
have a flat extract that has the following fields:
* EmployeeID
* FirstName
* LastName
* Recipient
* GrossAmount
* TransactionID
* GovernmentID
* NetAmountPaid
* TransactionDate
You need to design a star schema data model in an Azure Synapse Analytics dedicated SQL pool for
the data mart.
Which two tables should you create? Each correct answer presents part of the solution.
A. a dimension table for employee
B. a fact table for Employee
C. a dimension table for EmployeeTransaction
D. a dimension table for Transaction
E. a fact table for Transaction
Answer: A E
Reference:
https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-overview
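To see how the flat extract maps onto the two tables, here is a Python sketch that splits hypothetical extract rows into an employee dimension and a transaction fact table. Which fields go into each table (for example, treating Recipient as a transaction attribute) and the surrogate key name `EmployeeKey` are illustrative assumptions:

```python
EMPLOYEE_FIELDS = ["EmployeeID", "FirstName", "LastName", "GovernmentID"]
TRANSACTION_FIELDS = ["TransactionID", "Recipient", "GrossAmount", "NetAmountPaid", "TransactionDate"]


def make_row(emp, first, last, txn, gross, net, day):
    """Build one hypothetical row of the flat extract."""
    return {
        "EmployeeID": emp, "FirstName": first, "LastName": last, "GovernmentID": "G-" + emp,
        "TransactionID": txn, "Recipient": first + " " + last,
        "GrossAmount": gross, "NetAmountPaid": net, "TransactionDate": day,
    }


def split_extract(rows):
    """Split flat extract rows into an employee dimension and a transaction fact table."""
    dim_employee = {}
    fact_transaction = []
    for row in rows:
        emp_id = row["EmployeeID"]
        if emp_id not in dim_employee:
            # One dimension row per employee, with a hypothetical surrogate key.
            dim_employee[emp_id] = {"EmployeeKey": len(dim_employee) + 1,
                                    **{f: row[f] for f in EMPLOYEE_FIELDS}}
        # One fact row per transaction, referencing the employee dimension row.
        fact_transaction.append({"EmployeeKey": dim_employee[emp_id]["EmployeeKey"],
                                 **{f: row[f] for f in TRANSACTION_FIELDS}})
    return list(dim_employee.values()), fact_transaction


extract = [
    make_row("E1", "Ana", "Diaz", "T1", 1000.0, 800.0, "2021-01-31"),
    make_row("E1", "Ana", "Diaz", "T2", 1000.0, 800.0, "2021-02-28"),
    make_row("E2", "Raj", "Patel", "T3", 1200.0, 950.0, "2021-02-28"),
]
dim_employee, fact_transaction = split_extract(extract)
print(len(dim_employee), "employee rows,", len(fact_transaction), "transaction rows")
```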
NO.30 You have an Azure Synapse Analytics dedicated SQL pool that hosts a database named DB1.
You need to ensure that DB1 meets the following security requirements:
* When credit card numbers show in applications, only the last four digits must be visible.
* Tax numbers must be visible only to specific users.
What should you use for each requirement? To answer, select the appropriate options in the answer
area.
NOTE: Each correct selection is worth one point.
Answer:
Explanation:
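The credit card requirement matches the behavior of a partial mask, which exposes a fixed number of leading and trailing characters and replaces the middle with a padding string. The Python below emulates that behavior, assuming dynamic data masking with a mask such as `partial(0,"XXXX-XXXX-XXXX-",4)`; the card number is fictitious:

```python
def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Emulate a partial() mask: keep `prefix` leading and `suffix` trailing characters."""
    head = value[:prefix]
    tail = value[len(value) - suffix:] if suffix > 0 else ""
    return head + padding + tail


card = "4111-1111-1111-1234"  # fictitious card number
print(partial_mask(card, 0, "XXXX-XXXX-XXXX-", 4))  # XXXX-XXXX-XXXX-1234
```

Masking changes only what non-privileged users see in query results; the stored value is unchanged.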
NO.31 You have an Azure subscription that contains an Azure Data Lake Storage account named
myaccount1. The myaccount1 account contains two containers named container1 and container2.
The subscription is linked to an Azure Active Directory (Azure AD) tenant that contains a security
group named Group1.
You need to grant Group1 read access to container1. The solution must use the principle of least
privilege.
Which role should you assign to Group1?
A. Storage Blob Data Reader for container1
B. Storage Table Data Reader for container1
C. Storage Blob Data Reader for myaccount1
D. Storage Table Data Reader for myaccount1
Answer: A
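Least privilege here comes down to the scope of the role assignment: an Azure RBAC assignment applies to its scope and everything beneath it, so an assignment on the container covers container1 only, while one on the storage account would also cover the other container. A Python sketch of that scope check, with a hypothetical subscription ID and resource group, following the usual `.../blobServices/default/containers/<name>` resource ID layout:

```python
SUB = "00000000-0000-0000-0000-000000000000"  # hypothetical subscription ID
ACCOUNT_SCOPE = (
    f"/subscriptions/{SUB}/resourceGroups/rg1"  # hypothetical resource group
    "/providers/Microsoft.Storage/storageAccounts/myaccount1"
)


def container_scope(name: str) -> str:
    return f"{ACCOUNT_SCOPE}/blobServices/default/containers/{name}"


def assignment_covers(scope: str, resource_id: str) -> bool:
    """An assignment applies to its own scope and everything nested beneath it."""
    scope = scope.lower().rstrip("/")
    resource_id = resource_id.lower()
    return resource_id == scope or resource_id.startswith(scope + "/")


container_level = container_scope("container1")
for target in ("container1", "container2"):
    print(target,
          "container-level:", assignment_covers(container_level, container_scope(target)),
          "account-level:", assignment_covers(ACCOUNT_SCOPE, container_scope(target)))
```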
NO.32 A company purchases IoT devices to monitor manufacturing machinery. The company uses an
Azure IoT Hub to communicate with the IoT devices.
The company must be able to monitor the devices in real-time.
You need to design the solution.
What should you recommend?
A. Azure Stream Analytics Edge application using Microsoft Visual Studio.
B. Azure Analysis Services using Azure portal
C. Azure Analysis Services using Microsoft Visual Studio
D. Azure Data Factory instance using Microsoft Visual Studio
Answer: A
NO.33 You plan to implement an Azure Data Lake Storage Gen2 container that will contain CSV files.
The size of the files will vary based on the number of events that occur per hour.
File sizes range from 4 KB to 5 GB.
You need to ensure that the files stored in the container are optimized for batch processing.
What should you do?
A. Compress the files.
B. Merge the files.
C. Convert the files to JSON
D. Convert the files to Avro.
Answer: D
Explanation:
Avro supports batch processing and is also well suited to streaming.
Note: Avro is a framework developed within Apache's Hadoop project. It is a row-based storage format
that is widely used for serialization. Avro stores its schema in JSON format, making it easy to read
and interpret by any program. The data itself is stored in binary format, making it compact and
efficient.
Reference:
https://www.adaltas.com/en/2020/07/23/benchmark-study-of-different-file-format/