Final DBMS Unit 7
Final DBMS Unit 7
Final DBMS Unit 7
• Query Processing
• Layers of Query Processing
• Measures of Query Cost
• File Scan(Linear & Binary Search)
• Materialized View & Pipelining.
• Query Optimization
• Equivalence Rules
• Cost-Based Query Optimization
What is
What is relational Model? ?
Query Processing
1. Interpreting the Query: Understanding and converting the query from a high-level
language (like SQL) into a format that the database can process.
2. Searching Through Data: Accessing and retrieving the relevant data from the
database storage that satisfies the query conditions.
Note: Overall, query processing involves transforming a user's request into actions that
efficiently retrieve the desired information from a database.
Step in Query Processing:
1. Parsing and translation
2. Optimization
3. Evaluation
Process of SQL
Parsing: The parser takes the SQL query as input and checks it for syntax and
semantic correctness.
• Components:
i. Lexical Analyzer: Breaks the query into tokens.
ii. Syntax Analyzer: Verifies the query structure.
iii. Semantic Analyzer: Checks the validity of the query against the database
schema (e.g., table names, column names).
Optimization:
➢ Query optimization is a crucial phase in query processing where the database
management system (DBMS) seeks to determine the most efficient way to execute a
given query.
➢ This process involves generating, evaluating, and selecting the best possible execution
plan from many alternatives.
Key Objectives of Query Optimization:
Advantages of SQL
• Minimize Response Time: Reduce the time it takes to return the query
results.
4. Network Latency: Time taken to transfer data across the network, particularly in
distributed systems.
2.Check Each Record: Each record is examined one by one to see if it matches the search criteria (e.g., a specific ID,
name, or other attributes).
3.Compare the Record: The search key (the value you are looking for) is compared with the corresponding attribute in
the current record.
4.Continue Sequentially: If the current record does not match the search key, the search moves to the next record in
the sequence.
5.Repeat Until Found or End: This process is repeated until either:
->The desired record is found (a match is found).
->The end of the file is reached without finding a match.
6.Return Result: If a match is found, the record is returned.
->If no match is found after scanning all records, the search concludes that
the record does not exist in the file.
➢ Suppose tS is the seek time (number of Seek is usually one – to reach the
beginning of the file) , tT is the number of traversal time for one block, and B is
the number of blocks to be transferred, then the cost is calculated as:
tS + (B*tT) =>This is the cost required to fetch the record based on non-key
attribute.
➢ Suppose the search is based on the key value. Then the average cost of query
is tS + (B*tT)/2 and at worst case it would be tS + (B*tT).
2. Binary search: Binary search is a more efficient search method that requires
the data to be sorted. It works by repeatedly dividing the search interval in half.
1. Initialize Search Interval:Start with the entire dataset as the initial search
interval. This means you consider the first record to the last record.
2. Find Middle Record: Calculate the middle index of the current search interval.
3. Compare and Narrow Down: Compare the search key with the key of the
middle record.
• If the search key matches the middle record's key, the search is successful, and
the record is returned.
• If the search key is less than the middle record's key, narrow the search interval
to the left half (all records before the middle record).
• If the search key is greater than the middle record's key, narrow the search
interval to the right half (all records after the middle record).
4. Repeat: Continue dividing the interval and comparing until the search key is
found or the interval is empty.
5. Return Result: If the search key is found, return the corresponding record.
->If the search interval becomes empty (no more records to check), conclude that
the record is not present in the dataset.
➢ Suppose blocks of records are stored continuously in the memory and the cost of
the query to fetch first record is calculated as log (B)* (ts+ tT).
➢ Formula Components
->B: Number of blocks.
->log(B): Logarithmic function indicating how many levels or steps are needed to
reach the desired block.
->ts: Time to seek to the correct block in memory (seek time).
->tT: Time to transfer the data from the block to the CPU (transfer time).
Materialized View
➢ A materialized view in a database management system (DBMS) is a database
object that contains the results of a query.
➢ Unlike a regular view, which is a virtual table computed dynamically upon access,
a materialized view stores the query result physically on disk.
➢ This allows for quicker access to complex query results, as the data is pre
computed and stored, but it also requires management to keep the data up to
date.
NOTE:
➢ Materialized views are a powerful tool in DBMS for improving
query performance, especially for complex and frequently
executed queries.
➢ Properly managing refresh strategies is crucial to ensure the data
in materialized views remains accurate and up to date.
Pipelining
➢ Pipelining in the query treatment means the method is based on the
approach of splitting the query processor into multiple mini-processes, which
help to perform parallel tasks and, as a result, increase the efficiency of the
queries.
2. Reduced I/O Wait Times: By overlapping operations, pipelining can reduce the
time spent waiting for disk I/O operations.
3.Improved Resource Utilization: Pipelining can make better use of CPU, memory,
and disk resources by keeping all parts of the system busy.
4.Examples in SQL Queries: In an SQL query, pipelining might be used when
combining multiple operations such as joins, selections, and projections. For
instance, if a query involves joining two tables and then filtering the results,
pipelining allows the filtering to start as soon as the first rows of the join are
produced.
2.Cost Model: The optimizer uses a cost model to estimate the resource
consumption (CPU, memory, I/O) of different query plans. The plan with the
lowest estimated cost is chosen.
Equivalence Rules
➢ Equivalence rules in a Database Management System (DBMS) are fundamental
principles used during query optimization to transform queries into different
but equivalent forms.