
Homework 3 – Homework 2 revisited

Points: 2

Deadline: 21.11.2024 00:00 (System time in Blackboard)

In the prior task, you saw that creating deadlocks is rather easy. Resolving them such that all
transactions eventually commit, however, creates overhead, e.g., for restarting the
transactions. That is, the overall system performance degrades. In this task, we thus want to
better understand what is happening and optimize the infrastructure's throughput.

1) Extend the benchmark (1 Point)


If you examine the infrastructure of the prior task in detail, you will quickly notice that the
transactions were designed to provoke many deadlocks. This ensures that you can observe them
independently of the hardware and software you are using. There are quite long write-only
transactions and only a few data objects (i.e., tuples). From a practical point of view, this is an
implausible setting. To this end, we first change the transaction generation process. Then, we
benchmark how your three existing solutions perform. By existing solutions, I refer to: the
original one that does not restart transactions, the one that restarts transactions, and the one
avoiding live-locks.

Specifically, the task is as follows:

1. Change the transaction generation process as follows to mimic a more realistic
workload (0.5 Points); a Java sketch of the resulting dispatch follows after this list:
a. For each transaction t (i.e., Blocker class instance), draw a value p in [0,1]
uniformly at random (i.i.d.).
b. If p < 0.7 holds, transaction t becomes a read-only multi-point query. Such a
transaction reads the data_value attribute of three (i.e.,
num_queries_per_task) random (distinct) tuples, and outputs the sum at
completion to the console. Use one SQL statement per tuple, e.g., the
following prepared statement: Select data_value from
blocking_data where data_id = ?.
c. If p >= 0.7 and p < 0.8 holds, transaction t becomes a read-only full-table scan.
Such a transaction performs the query Select sum(data_value) from
blocking_data. Then, it outputs the result to the console.
d. If p >= 0.8, transaction t becomes a write query. Such a query executes 3
update statements as before. However, each update statement shall only modify
a single tuple.
2. Benchmark your solutions for reasonable values of num_tasks. Also, play with the
number of hardware threads to see whether you observe a linear speed-up
when increasing the number of threads. (0.25 P)
3. Prepare a live-demo such that the restarting solution finishes in – say – 30 seconds.
(0.25 P)
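To make step 1 concrete, here is a minimal Java/JDBC sketch of the dispatch. Only the table
blocking_data, its columns data_id and data_value, the two SELECT statements, and the
probabilities come from this task sheet; the class and method names, the table size of 1000
tuples, and the concrete SET clause of the update are placeholder assumptions. The connection
is assumed to run with autocommit disabled.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.LinkedHashSet;
    import java.util.Set;
    import java.util.concurrent.ThreadLocalRandom;

    // Sketch of the per-transaction dispatch inside a Blocker task.
    public class TransactionGenerator {

        private static final int NUM_QUERIES_PER_TASK = 3; // num_queries_per_task
        private static final int NUM_TUPLES = 1000;        // assumed table size

        public static void runTransaction(Connection conn) throws Exception {
            double p = ThreadLocalRandom.current().nextDouble(); // i.i.d. in [0,1)
            if (p < 0.7) {
                runMultiPointRead(conn);   // case b
            } else if (p < 0.8) {
                runFullScan(conn);         // case c
            } else {
                runWrites(conn);           // case d
            }
        }

        // Case b: read three distinct random tuples, one statement per tuple.
        private static void runMultiPointRead(Connection conn) throws Exception {
            long sum = 0;
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT data_value FROM blocking_data WHERE data_id = ?")) {
                for (int id : distinctRandomIds(NUM_QUERIES_PER_TASK)) {
                    ps.setInt(1, id);
                    try (ResultSet rs = ps.executeQuery()) {
                        if (rs.next()) sum += rs.getLong(1);
                    }
                }
            }
            conn.commit();
            System.out.println("multi-point sum = " + sum);
        }

        // Case c: full-table scan, result printed to the console.
        private static void runFullScan(Connection conn) throws Exception {
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery(
                         "SELECT sum(data_value) FROM blocking_data")) {
                if (rs.next()) System.out.println("full scan sum = " + rs.getLong(1));
            }
            conn.commit();
        }

        // Case d: three updates, each modifying exactly one tuple. The SET
        // clause is a placeholder for whatever the Homework 2 update does.
        private static void runWrites(Connection conn) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "UPDATE blocking_data SET data_value = data_value + 1 "
                            + "WHERE data_id = ?")) {
                for (int id : distinctRandomIds(3)) {
                    ps.setInt(1, id);
                    ps.executeUpdate();
                }
            }
            conn.commit();
        }

        // Draws n distinct ids uniformly from [0, NUM_TUPLES).
        private static Set<Integer> distinctRandomIds(int n) {
            Set<Integer> ids = new LinkedHashSet<>();
            while (ids.size() < n) {
                ids.add(ThreadLocalRandom.current().nextInt(NUM_TUPLES));
            }
            return ids;
        }
    }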

Hints

• There is no need to optimize loading the data into the database (unless it takes too long
for you).
• Do not increase the number of tuples, e.g., in order to distribute the load.

2) Find and eliminate existing bottlenecks (1 Point)


Finally, we aim at optimizing the system's throughput. The objective is to minimize the overall run
time for a given set of transactions from 1). All transactions must commit eventually. The core
idea is that we study how much improvement your solution gains (for reasonable numbers of
Blocking tasks) compared to a trivial serial execution of all transactions.

Specifically, your submission shall include

1. An implementation of a serial execution approach and the respective benchmark results
for reasonable values of num_tasks (a sketch follows after this list). (0.25P)
2. An analysis of existing bottlenecks. It is recommended to rely - apart from code reviews -
on (Java) profilers, and to consider the logging features Postgres offers (see the logging
sketch after the hints below). (0.25P)
3. Resolve some of the bottlenecks you identified. (0.25P)
4. Compare your improved solution(s) to the results of the serial execution. Do you observe
unexpected results? Prepare a talk and submit the slides in Blackboard. The slides
shall also explain the general idea, report the most important issues you faced while
solving this task, and state the individual contribution per team member (in %).
(0.25P)
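The serial baseline from item 1 can be as simple as the following sketch: a single connection
and a single thread execute all transactions back to back, reusing the hypothetical
TransactionGenerator from above. The JDBC URL and credentials are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;

    // Hypothetical serial baseline: one connection, one thread.
    public class SerialBenchmark {

        public static void main(String[] args) throws Exception {
            int numTasks = Integer.parseInt(args[0]); // num_tasks
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/blocking", "user", "secret")) {
                conn.setAutoCommit(false);

                long start = System.nanoTime(); // data loading is NOT measured
                for (int i = 0; i < numTasks; i++) {
                    // With a single worker, no two transactions ever overlap.
                    TransactionGenerator.runTransaction(conn);
                }
                long elapsedMs = (System.nanoTime() - start) / 1_000_000;
                System.out.println(numTasks + " tasks, serial: " + elapsedMs + " ms");
            }
        }
    }

Because only one transaction is ever active, this baseline can neither deadlock nor trigger
restarts, which makes it a clean reference point for measuring the overhead of concurrency.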

Hints

• Do not change the order of write operations in a transaction, e.g., to enforce a canonical
locking order.
• Don't be disappointed if the most relevant bottlenecks are trivial.
• If you do some pre-processing, e.g., to create batches of non-conflicting transactions: add
this time to the run time.
• I recommend that client and database server run on the same machine to avoid nasty
network issues.
• Just in case you find that there is a major issue in the infrastructure, you can change
it almost freely. The minimal requirement is that there needs to be a client connected via
JDBC/ODBC to a Postgres database. In addition, the table schema and queries should remain
the same.
• We are only interested in the run time of the transactions. The time for loading the data is
not relevant.
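For the bottleneck analysis in item 2, Postgres can log lock waits and slow statements.
The parameters log_lock_waits, deadlock_timeout, and log_min_duration_statement are
standard Postgres settings; the connection details below are placeholders, and changing
these settings requires a superuser connection. A sketch:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    // Enable Postgres lock and statement logging for the benchmark runs.
    public class EnablePostgresLogging {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:5432/blocking", "postgres", "secret");
                 Statement st = conn.createStatement()) {
                // Log every lock wait that exceeds deadlock_timeout.
                st.execute("ALTER SYSTEM SET log_lock_waits = 'on'");
                // Lower the detection interval so waits show up quickly.
                st.execute("ALTER SYSTEM SET deadlock_timeout = '200ms'");
                // Log all statements slower than 100 ms.
                st.execute("ALTER SYSTEM SET log_min_duration_statement = '100ms'");
                // Apply the changes without restarting the server.
                st.execute("SELECT pg_reload_conf()");
            }
        }
    }

The server log then shows which transactions block each other and which statements
dominate the run time, which complements what a Java profiler reveals on the client side.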

FAQ

• Q: May I use some idea or code found on the internet? A: Yes, if you can explain the
idea and respect the corresponding license agreements. Cite and mention the sources
in the talk.
• Q: May I use ChatGPT or similar AI? A: Same answer as before.
• Q: May I optimize the client code because Java does [insert your complaint here]? A:
Yes, if the requirements mentioned in the tasks are fulfilled. Mention your modifications
in the talk.
• Q: May I change internals of PostgreSQL (such as locking granularity, lock escalation
strategy, etc.), or set the locks manually? A: Sure.
• Q: May I execute all the read-only queries first and then serially execute all writing
tasks? Or perform some other kind of grouping? A: Yes, but the time for building the
groups needs to be included in the measurement; a sketch follows below. Additionally,
the clock stops only after all Blocking tasks have finished successfully.
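Regarding the last question, here is a minimal sketch of such a grouping with the
group-building time inside the measured region. The Task and Kind types are placeholders
for however your infrastructure represents Blocker tasks, and each task body is assumed
to manage its own connection.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical grouped execution: reads in parallel, writes serially.
    public class GroupedExecution {

        enum Kind { READ_ONLY, WRITE }

        record Task(Kind kind, Runnable body) {}

        public static void runGrouped(List<Task> tasks) {
            long start = System.nanoTime(); // clock starts BEFORE grouping

            List<Task> reads = new ArrayList<>();
            List<Task> writes = new ArrayList<>();
            for (Task t : tasks) {
                (t.kind() == Kind.READ_ONLY ? reads : writes).add(t);
            }

            // Read-only transactions never conflict with each other.
            reads.parallelStream().forEach(t -> t.body().run());
            // Writes run serially, so no deadlocks can arise among them.
            writes.forEach(t -> t.body().run());

            // Clock stops only after ALL tasks finished successfully.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("grouped run: " + elapsedMs + " ms");
        }
    }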
