Tejas 22-10-24
Codd’s Rules for relational databases, proposed by Dr. Edgar F. Codd in 1985, are a set of thirteen rules
(numbered 0 to 12) that define what is required for a database management system (DBMS) to be
considered relational. These rules provide a framework for relational database design, ensuring that
databases are structured in a consistent, flexible, and efficient way.
Rule 0: Foundation Rule
The system must qualify as a relational database management system (RDBMS) by using relational
techniques for data storage, retrieval, and manipulation. This rule ensures that all other rules are applied.
Rule 1: Information Rule
All data must be stored in table format, with rows and columns. The data should be accessible by table
name, column name, and primary key values. This rule formalizes the table structure.
Rule 2: Guaranteed Access Rule
Every piece of data (atomic value) in a relational database must be accessible without ambiguity by using a
combination of the table name, primary key, and column name. This guarantees direct access to any data.
Rule 3: Systematic Treatment of Null Values
The database must support null values (missing or inapplicable information) in a systematic way. Nulls
should be treated independently from other data and should not affect the database's integrity.
Rule 4: Dynamic Online Catalog
The database must have an online, accessible catalog (metadata) that describes the database's structure. The
catalog should be queryable using the same query language as the database itself.
Rule 5: Comprehensive Data Sublanguage Rule
The database must support a comprehensive language that allows for data definition, manipulation,
querying, and transactions. SQL (Structured Query Language) is a widely used example of such a language.
Rule 6: View Updating Rule
Any view (virtual table) that is theoretically updatable must be updatable through the system. Views should not be merely read-only; users should be able to make changes through them that are reflected in the underlying tables.
Rule 7: High-Level Insert, Update, and Delete
The database must support insert, update, and delete operations at the level of a set of records, rather than
requiring operations on one record at a time. This allows for bulk data manipulation.
Rule 8: Physical Data Independence
Changes to the physical storage of data (e.g., how files are organized on disk) should not affect how data is
accessed or structured at the logical level. This ensures that users and programs are insulated from physical
changes.
Rule 9: Logical Data Independence
Changes to the logical structure of the database (such as adding or removing fields or tables) should not
require rewriting application programs. This enables the database structure to evolve without affecting user
interaction.
Rule 10: Integrity Independence
The database must have integrity constraints (such as primary keys, foreign keys, and uniqueness
constraints) that are independent of the application. These rules should be stored within the database and
enforced by the DBMS.
Rule 11: Distribution Independence
The database should work in a distributed environment (across multiple physical locations), but users should
not notice any differences whether the data is stored on one machine or distributed across many.
Rule 12: Nonsubversion Rule
If a system provides a lower-level language (like record-at-a-time access), it should not be able to bypass the
integrity rules or security constraints defined at the relational level. This ensures that all operations go
through the relational interface.
3. What are the steps involved in converting ER diagrams into relational tables?
1. Convert Entities into Tables
Each entity in the ER diagram is converted into a table (relation) in the relational schema.
The attributes of the entity become the columns of the table.
The primary key of the entity becomes the primary key of the table.
2. Convert Attributes
Each attribute of an entity in the ER diagram becomes a column in the corresponding table.
Ensure that any composite attributes are broken down into their component attributes and added as
separate columns.
3. Handle Derived Attributes
Derived attributes (attributes that can be calculated from other attributes) are generally not stored as
columns in the table; they are calculated as needed.
4. Convert Multivalued Attributes
Multivalued attributes (those that can have multiple values for a single entity, like a person having
multiple phone numbers) are typically represented by creating a separate table.
This new table will have a foreign key that references the primary key of the original table and a
column for the multivalued attribute.
5. Convert Weak Entities
Weak entities (entities that depend on another entity for identification) are represented by creating a
table, and the table must include a foreign key that references the parent entity.
The primary key of a weak entity table is usually a combination of the primary key of the related
strong entity and some partial key of the weak entity.
6. Apply Constraints
Primary Key Constraints: Ensure that each table has a primary key to uniquely identify records.
Foreign Key Constraints: Define foreign keys to maintain referential integrity between related
tables.
Unique Constraints: Apply unique constraints where applicable (e.g., Email should be unique for
Customer).
Not Null Constraints: Specify attributes that cannot be null, such as the primary key.
7. Normalization (Optional)
After mapping entities and relationships into tables, you may need to normalize the database to reduce
redundancy and ensure that the tables are in the appropriate normal forms (e.g., 1NF, 2NF, 3NF).
Normalization eliminates data anomalies by breaking down tables into smaller, more structured tables.
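The mapping steps above can be sketched as SQLite DDL; the table and column names (Customer, Customer_Phone, Order_Item) are illustrative, not from a specific schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce FK constraints in SQLite

# An entity becomes a table; its attributes become columns, with
# primary-key, unique, and not-null constraints declared in the DDL.
conn.execute("""
    CREATE TABLE Customer (
        Customer_ID INTEGER PRIMARY KEY,
        Name        TEXT NOT NULL,
        Email       TEXT UNIQUE
    )""")

# A multivalued attribute (e.g. phone numbers) becomes a separate table
# with a foreign key referencing the owning entity's primary key.
conn.execute("""
    CREATE TABLE Customer_Phone (
        Customer_ID INTEGER NOT NULL REFERENCES Customer(Customer_ID),
        Phone       TEXT NOT NULL,
        PRIMARY KEY (Customer_ID, Phone)
    )""")

# A weak entity's primary key combines the strong entity's key with
# the weak entity's partial key (here, a line number within an order).
conn.execute("""
    CREATE TABLE Order_Item (
        Order_ID INTEGER NOT NULL,
        Line_No  INTEGER NOT NULL,
        Quantity INTEGER,
        PRIMARY KEY (Order_ID, Line_No)
    )""")

conn.execute("INSERT INTO Customer VALUES (1, 'Alice', 'alice@example.com')")
conn.execute("INSERT INTO Customer_Phone VALUES (1, '555-0100')")
conn.execute("INSERT INTO Customer_Phone VALUES (1, '555-0101')")

# One customer with two phone numbers yields two rows in the phone table.
rows = conn.execute(
    "SELECT Phone FROM Customer_Phone WHERE Customer_ID = 1").fetchall()
print(len(rows))  # 2
```

Note how the multivalued attribute never appears as a repeated column in Customer itself; it lives entirely in its own table, keyed by the customer it belongs to.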
The following join examples use two sample tables: Customers (Customer_ID, Name) and Orders (Order_ID, Customer_ID, Total_Amount).
1. INNER JOIN:
The INNER JOIN returns rows that have matching values in both tables. If there’s no match, the row is
excluded from the result.
Code:
SELECT Customers.Name, Orders.Order_ID, Orders.Total_Amount
FROM Customers
INNER JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;
Result:
Name Order_ID Total_Amount
2. LEFT JOIN:
The LEFT JOIN returns all rows from the left table (Customers), and the matched rows from the right table
(Orders). If there is no match, the result is NULL on the right side.
Code:
SELECT Customers.Name, Orders.Order_ID, Orders.Total_Amount
FROM Customers
LEFT JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;
Result:
Name Order_ID Total_Amount
3. RIGHT JOIN:
The RIGHT JOIN returns all rows from the right table (Orders), and the matched rows from the left table
(Customers). If there’s no match, NULL values are returned for the left table.
Code:
SELECT Customers.Name, Orders.Order_ID, Orders.Total_Amount
FROM Customers
RIGHT JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;
Result:
Name Order_ID Total_Amount
4. FULL JOIN:
The FULL JOIN returns all rows when there is a match in either left or right table. If there’s no match,
NULL values are returned for the columns that don’t have a match.
Code:
SELECT Customers.Name, Orders.Order_ID, Orders.Total_Amount
FROM Customers
FULL JOIN Orders
ON Customers.Customer_ID = Orders.Customer_ID;
Result:
Name Order_ID Total_Amount
5. CROSS JOIN:
The CROSS JOIN returns the Cartesian product of the two tables, i.e., every row in the first table is
combined with every row in the second table. This results in a large number of rows.
Code:
SELECT Customers.Name, Orders.Order_ID
FROM Customers
CROSS JOIN Orders;
Result:
Name Order_ID
Alice 101
Alice 102
Alice 103
Alice 104
Bob 101
Bob 102
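The join behaviors described above can be verified with an in-memory SQLite database; the sample rows below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (Customer_ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Orders (Order_ID INTEGER PRIMARY KEY,
                         Customer_ID INTEGER, Total_Amount REAL);
    INSERT INTO Customers VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO Orders VALUES (101, 1, 250.0), (102, 1, 90.0), (103, 2, 40.0);
""")

# INNER JOIN: only customers that have at least one matching order.
inner = conn.execute("""
    SELECT c.Name, o.Order_ID, o.Total_Amount
    FROM Customers c INNER JOIN Orders o
      ON c.Customer_ID = o.Customer_ID""").fetchall()

# LEFT JOIN: every customer; Carol appears with NULL order columns.
left = conn.execute("""
    SELECT c.Name, o.Order_ID
    FROM Customers c LEFT JOIN Orders o
      ON c.Customer_ID = o.Customer_ID""").fetchall()

# CROSS JOIN: Cartesian product, 3 customers x 3 orders = 9 rows.
cross = conn.execute("""
    SELECT c.Name, o.Order_ID
    FROM Customers c CROSS JOIN Orders o""").fetchall()

print(len(inner), len(left), len(cross))  # 3 4 9
```

RIGHT JOIN and FULL JOIN work the same way in reverse and in combination, but SQLite only supports them from version 3.39, so they are omitted from this sketch.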
1. Atomicity: All-or-Nothing
Atomicity ensures that a transaction is treated as a single, indivisible unit of work. Either all operations
within the transaction are executed successfully, or none are. If any part of the transaction fails, the entire
transaction is rolled back, leaving the database in its previous state.
Imagine you are transferring $100 from your checking account to your savings account. This transaction involves two steps:
1. Deduct $100 from the checking account.
2. Add $100 to the savings account.
If the system crashes after deducting the $100 but before adding it to the savings account, the transaction is incomplete and the accounts are inconsistent. Atomicity requires that the deduction be rolled back, restoring the original balances.
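A minimal sketch of atomicity using SQLite, with a simulated crash between the two steps (the account IDs and amounts are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (Account_ID INTEGER PRIMARY KEY, Balance REAL)")
conn.execute("INSERT INTO Accounts VALUES (1, 500.0), (2, 100.0)")
conn.commit()

def transfer(amount, fail_midway=False):
    """Move `amount` from account 1 to account 2 as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE Accounts SET Balance = Balance - ? "
                         "WHERE Account_ID = 1", (amount,))
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute("UPDATE Accounts SET Balance = Balance + ? "
                         "WHERE Account_ID = 2", (amount,))
    except RuntimeError:
        pass  # the deduction was rolled back along with everything else

transfer(100.0)                   # succeeds: balances become 400 / 200
transfer(50.0, fail_midway=True)  # "crashes" mid-transfer: fully undone

balances = [row[0] for row in
            conn.execute("SELECT Balance FROM Accounts ORDER BY Account_ID")]
print(balances)  # [400.0, 200.0]
```

The failed transfer leaves no trace: the deduction from account 1 is undone because the transaction never reached COMMIT.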
2. Consistency: Valid State to Valid State
Consistency ensures that a transaction brings the database from one valid state to another, maintaining the integrity and predefined rules of the database.
Real-Life Example: E-Commerce Purchase
When a customer purchases an item from an online store, several consistency rules must be enforced: the item's stock count must decrease by exactly the quantity purchased, the payment amount must match the order total, and no stock or balance value may become negative. If any rule would be violated, the transaction is rejected and the database remains in its previous valid state.
3. Isolation: Concurrent Transactions Don't Interfere
Isolation ensures that multiple transactions occurring at the same time do not affect each other. Intermediate states during a transaction are invisible to other transactions. This prevents conflicts caused by concurrent transactions trying to access or modify the same data.
Imagine two people booking the last ticket for a concert at the same time. Isolation ensures the two bookings cannot interleave: one transaction completes first and gets the ticket, and the other then sees the ticket as already sold and fails, rather than both reading "one ticket available" and both appearing to succeed.
4. Durability: Committed Changes Are Permanent
Durability ensures that once a transaction is successfully committed, the changes are permanent and will survive system crashes or power failures. The data is written to non-volatile storage, ensuring it remains intact even if the system restarts.
After you book a flight and the transaction is completed (e.g., you get a confirmation email and a ticket),
durability ensures that even if the airline’s booking system crashes or undergoes maintenance, your
booking information is not lost.
A serializable schedule in database systems refers to a schedule (or sequence) of transactions that produces the same result as if the transactions were executed in some serial order (one after the other), even though they are executed concurrently. In other words, a serializable schedule produces the same outcome as a serial schedule, where transactions are run sequentially without any overlap.
Concurrency control in databases allows multiple transactions to execute simultaneously, which improves performance and resource utilization. However, this concurrency can lead to issues such as lost updates, dirty reads, or inconsistent data. Serializability ensures that even with concurrent execution, the outcome is equivalent to one that would be produced if the transactions were executed one by one in some order.
Let's say we have two transactions, T1 and T2, operating on the same data item, X: T1 reads X and adds an amount to it, while T2 does the same with a different amount. A schedule that interleaves their reads and writes is serializable only if the final value of X matches running T1 then T2, or T2 then T1.
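The classic lost-update anomaly can be simulated in plain Python to show why only some interleavings are serializable (the amounts 10 and 20 are arbitrary):

```python
X = 100  # initial value of the shared data item

# Serial schedule: T1 runs completely before T2.
x = X
t1_local = x + 10       # T1: read X, compute
x = t1_local            # T1: write X
t2_local = x + 20       # T2: read X, compute
x = t2_local            # T2: write X
serial_result = x       # 130 (running T2 then T1 also gives 130)

# Non-serializable interleaving: both transactions read X before
# either writes, so T2's write silently discards T1's update.
x = X
t1_local = x + 10       # T1: read X
t2_local = x + 20       # T2: read X (sees the same old value)
x = t1_local            # T1: write X
x = t2_local            # T2: write X -- T1's update is lost
interleaved_result = x  # 120, matching neither serial order

print(serial_result, interleaved_result)  # 130 120
```

Because 120 matches neither serial outcome (both orders give 130), this interleaving is not serializable, which is exactly the situation concurrency control must prevent.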
Timestamp-based protocols are a type of concurrency control mechanism used in database systems to
ensure that transactions execute in a way that guarantees serializability, preserving data consistency and
preventing conflicts between concurrent transactions. These protocols assign a unique timestamp to each
transaction and use this information to manage the order in which transactions can access and modify shared
data items.
Timestamps:
Each transaction is assigned a unique timestamp when it starts. This timestamp is used to determine
the order in which transactions should be executed.
Timestamps can be based on the system clock or generated as sequential numbers to ensure
uniqueness.
Two important timestamps are maintained for each data item:
- Read Timestamp (RTS): the timestamp of the most recent transaction that successfully read the data item.
- Write Timestamp (WTS): the timestamp of the most recent transaction that successfully wrote to the data item.
Basic Idea:
Transactions are ordered by their timestamps. A transaction with an earlier timestamp is considered
"older" and should be executed before a transaction with a later timestamp.
The protocol ensures that transactions execute in a way that respects this logical ordering, preventing
any "younger" transaction from affecting the operations of an "older" one.
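A minimal sketch of the basic timestamp-ordering checks; the class and function names are hypothetical, not a real DBMS API:

```python
class DataItem:
    """A data item with the two timestamps the protocol maintains."""
    def __init__(self):
        self.rts = 0  # read timestamp: youngest successful reader so far
        self.wts = 0  # write timestamp: youngest successful writer so far

def read(item, ts):
    """Allow the read only if no younger transaction has already written."""
    if ts < item.wts:
        return "abort"  # too old: it would have to read a "future" value
    item.rts = max(item.rts, ts)
    return "ok"

def write(item, ts):
    """Allow the write only if no younger transaction has read or written."""
    if ts < item.rts or ts < item.wts:
        return "abort"  # a younger transaction already depends on the item
    item.wts = ts
    return "ok"

x = DataItem()
print(read(x, ts=5))   # ok: RTS(x) becomes 5
print(write(x, ts=7))  # ok: WTS(x) becomes 7
print(write(x, ts=3))  # abort: transaction 3 is older than reader 5
print(read(x, ts=4))   # abort: transaction 4 is older than writer 7
```

An aborted transaction is typically restarted with a new (younger) timestamp, after which its operations can succeed.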
Commit:
BEGIN TRANSACTION;
UPDATE Accounts
SET Balance = Balance - 200
WHERE Account_ID = 1;
UPDATE Accounts
SET Balance = Balance + 200
WHERE Account_ID = 2;
COMMIT;
Rollback:
BEGIN TRANSACTION;
UPDATE Accounts
SET Balance = Balance - 300
WHERE Account_ID = 1;
UPDATE Accounts
SET Balance = Balance + 300
WHERE Account_ID = 2;
ROLLBACK;
9. Explain how deadlocks occur in databases. List two methods to avoid deadlocks.
A deadlock in a database occurs when two or more transactions are waiting for each other to release
locks on resources, but none of them can proceed. This creates a circular waiting situation where
each transaction is waiting for a resource that the other holds, leading to a state where none of the
transactions can complete. In other words, the transactions are stuck because each one is holding a
lock on a resource that the other needs.
Deadlocks occur in databases primarily due to a combination of conditions: mutual exclusion (locks are held exclusively), hold-and-wait (a transaction holds some locks while waiting for others), no preemption (locks cannot be forcibly taken away), and circular wait (a cycle of transactions each waiting for a lock the next one holds).
Several strategies can be employed to prevent or avoid deadlocks in database systems. Below are two
common methods:
1. Timeout-Based Schemes
This method avoids deadlocks by setting a timeout period for transactions. If a transaction is unable to acquire the necessary locks within the specified timeout period, it is assumed to be in a deadlock situation and is automatically aborted. The system then rolls back the transaction, releases the locks it holds, and allows other transactions to proceed.
2. Timestamp-Based Prevention
This method avoids deadlocks by using transaction timestamps to impose a total order on transactions, so a circular wait can never form. The Wait-Die and Wound-Wait schemes are two variations of this approach, both relying on transaction timestamps to break circular wait conditions.
Wait-Die Scheme:
Older transactions can wait for younger transactions to release locks, but younger transactions are aborted if they need to wait for an older transaction.
Wound-Wait Scheme:
Older transactions can "wound" (abort) younger transactions to acquire the locks they need, but younger transactions are forced to wait if they need to acquire resources held by older transactions.
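The two schemes can be sketched as decision functions (assuming a smaller timestamp means an older transaction; the function names are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester may wait; a younger requester 'dies'.
    A smaller timestamp means the transaction started earlier (is older)."""
    return "wait" if requester_ts < holder_ts else "abort"

def wound_wait(requester_ts, holder_ts):
    """Wound-Wait: an older requester 'wounds' (aborts) the younger holder;
    a younger requester waits."""
    return "wound holder" if requester_ts < holder_ts else "wait"

# T10 started before T20, so T10 is the older transaction.
print(wait_die(10, 20))    # wait: older T10 waits for younger T20
print(wait_die(20, 10))    # abort: younger T20 dies rather than wait
print(wound_wait(10, 20))  # wound holder: older T10 aborts younger T20
print(wound_wait(20, 10))  # wait: younger T20 waits for older T10
```

In both schemes only one direction of waiting is ever allowed (older-waits-for-younger or younger-waits-for-older), which is why a cycle of waiting transactions cannot form.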
1. Active State
Description: This is the initial state of a transaction, where it is executing its operations (such as
reading or writing data). During this stage, the transaction is working with the database and acquiring
necessary resources like locks.
2. Partially Committed State
Description: After the transaction has executed its final statement but before the changes have been
made permanent, the transaction enters the partially committed state. This means the system has
performed all operations but hasn’t yet committed them to the database.
3. Committed State
Description: Once a transaction successfully completes and the changes made to the database are
permanently saved, it enters the committed state. At this point, the transaction has been
successfully completed, and its results are now visible to other transactions.
4. Failed State
Description: A transaction enters the failed state if it encounters an error during its execution (e.g., a
constraint violation, system failure, or insufficient resources). In this state, the transaction can no
longer proceed, and it needs to be aborted or rolled back.
5. Aborted State
Description: When a transaction cannot proceed due to a failure or if the transaction is rolled back, it
enters the aborted state. At this point, all the changes made by the transaction are undone, and the
database is restored to its previous consistent state.
6. Terminated State
Description: A transaction enters the terminated state after it has either committed or aborted. At
this point, the transaction is finished, and no further operations will be performed.
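The states above form a small state machine; the transition table below is an assumption based on the descriptions (notably, a committed transaction can never be aborted):

```python
# Allowed transitions between transaction states, assumed from the
# state descriptions above.
TRANSITIONS = {
    "active":              {"partially committed", "failed"},
    "partially committed": {"committed", "failed"},
    "committed":           {"terminated"},
    "failed":              {"aborted"},
    "aborted":             {"terminated"},
    "terminated":          set(),
}

def can_transition(src, dst):
    """Return True if a transaction may move from state src to state dst."""
    return dst in TRANSITIONS[src]

print(can_transition("active", "partially committed"))  # True
print(can_transition("committed", "aborted"))           # False: durability
```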
4. Business Statistics (MT11)
Unit 1: Introduction to Business Statistics
1. Define statistics and explain its role in business decision-making.
Statistics is a branch of mathematics that involves the collection, analysis, interpretation,
presentation, and organization of data. It provides tools and methodologies to make sense of
numerical data, identify patterns, and draw meaningful conclusions. In essence, statistics is used to
understand variability in data and make informed decisions based on that understanding.
Role of Statistics in Business Decision-Making
1. Data-Driven Insights
Statistics help businesses collect and analyze data from various sources (e.g., customer surveys, sales
records, market research) to derive valuable insights. These insights guide businesses in understanding
customer behavior, identifying growth opportunities, and improving products or services.
2. Forecasting and Prediction
Statistical techniques such as regression analysis, time series analysis, and trend analysis are used to forecast
future business conditions like sales, demand, revenue, or market trends. This helps businesses prepare for
the future and align their strategies accordingly.
3. Quality Control
Statistics play a critical role in maintaining and improving the quality of products and services. Techniques
like Statistical Process Control (SPC) and Six Sigma use statistical tools to monitor and control
production processes, ensuring that they meet specified standards and reduce defects.
4. Risk Management
By analyzing historical data, businesses can identify potential risks and their impact. Statistical models such
as probability distributions and Monte Carlo simulations help businesses estimate the likelihood of risks and
develop strategies to mitigate them.
5. Marketing Effectiveness and Customer Segmentation
Statistics allow businesses to measure the effectiveness of their marketing campaigns by tracking metrics
such as conversion rates, customer retention, and return on investment (ROI). Additionally, statistical
analysis of customer data enables businesses to segment their markets and target specific customer groups
more effectively.
6. Operational Efficiency
Statistical analysis can optimize business operations by identifying areas for improvement. Techniques like
data mining and process analysis help businesses streamline their operations, reduce costs, and enhance
productivity.
7. Financial Analysis and Investment Decisions
In finance, statistics are used to analyze market trends, portfolio performance, and economic indicators.
Businesses rely on statistical models to assess investment risks, calculate the expected returns, and make
informed financial decisions.
8. Performance Measurement
Statistics provide businesses with tools to measure their performance against industry benchmarks or
competitors. Key performance indicators (KPIs) such as sales growth, profit margins, or customer
satisfaction scores are often analyzed statistically to assess how well a company is doing.
1. Qualitative Data
Qualitative data is non-numerical and typically represents characteristics, descriptions, or qualities that
can’t easily be measured with numbers. It deals with descriptions, observations, and words rather than
measurements or amounts.
Examples:
Customer feedback: "The product is easy to use," or "I find the customer service to be very helpful."
Gender: Male, female, non-binary.
Colors: Red, blue, green.
Types of cuisine: Italian, Mexican, Chinese.
2. Quantitative Data
Quantitative data is numerical and represents measurable quantities or amounts. It can be counted or
expressed in numbers, and it allows for mathematical operations and statistical analysis.
Examples:
Age: 25 years.
Monthly revenue: $50,000.
Number of units sold: 340.
Temperature: 21°C.
1. Nominal Scale
Definition: The nominal scale is the simplest form of measurement, used to categorize data without
any quantitative value or order.
2. Ordinal Scale
Definition: The ordinal scale categorizes data into ordered categories, where the order matters but
the differences between categories are not quantifiable.
3. Interval Scale
Definition: The interval scale is a numerical scale where the distance between values is meaningful,
but there is no true zero point.
4. Ratio Scale
Definition: The ratio scale is the most informative scale, with all the properties of the interval scale,
plus a true zero point that indicates the absence of the quantity being measured.
Descriptive Statistics
Definition: Descriptive statistics involve summarizing and organizing data to describe its main
features. It provides a way to present and understand the basic characteristics of a dataset.
Focuses on describing the data at hand.
Utilizes measures such as mean, median, mode, standard deviation, and range.
Often includes graphical representations like charts, graphs, and tables.
Inferential Statistics
Definition: Inferential statistics involves making predictions or inferences about a population based
on a sample of data. It allows researchers to draw conclusions beyond the immediate data.
Focuses on drawing conclusions about a population from a sample.
Utilizes techniques such as hypothesis testing, confidence intervals, and regression analysis.
Often involves estimating population parameters and quantifying the uncertainty of those estimates.
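The contrast can be illustrated with Python's statistics module; the sales figures below are made up, and the confidence interval uses a simple normal approximation:

```python
import math
import statistics

# Hypothetical monthly sales figures (in thousands), a sample of n = 10.
sample = [42, 38, 51, 45, 39, 47, 44, 50, 41, 46]

# Descriptive statistics: summarize the data at hand.
mean = statistics.mean(sample)      # 44.3
median = statistics.median(sample)  # 44.5
stdev = statistics.stdev(sample)    # sample standard deviation

# Inferential statistics: infer something about the population behind
# the sample -- here, a 95% confidence interval for the population mean
# (normal approximation with z = 1.96, for brevity).
margin = 1.96 * stdev / math.sqrt(len(sample))
ci = (mean - margin, mean + margin)

print(f"mean={mean} median={median}")
print(f"95% CI for the population mean: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The first three numbers only describe this sample; the confidence interval makes a claim that reaches beyond it, which is the defining feature of inference.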