Measuring peak disk use of a process

Question

I am trying to benchmark a tool I'm developing in terms of time, memory, and disk use. I know /usr/bin/time gives me basically what I want for the first two, but for disk use I concluded I would have to roll my own bash script that periodically reads the write_bytes field from /proc/<my_pid>/io. Based on this script, here's what I came up with:

"$@" &
pid=$!
status=$(ps -o rss -o vsz -o pid | grep $pid)
maxdisk=0
while [ "${#status}" -gt "0" ];
do
    sleep 0.05
    delta=false
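    # write_bytes is a cumulative byte count; the read is empty once the process has exited
    disk=$(awk '/^write_bytes:/ {print $2}' "/proc/$pid/io" 2>/dev/null)
    disk=$((disk / 1024))  # bytes -> KB (an empty value evaluates to 0)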
    if [ "0$disk" -gt "0$maxdisk" ] 2>/dev/null; then
        maxdisk=$disk
        delta=true
    fi
    if $delta; then
        echo disk: $disk
    fi
    status=$(ps -o rss -o vsz -o pid | grep $pid)
done
wait $pid
ret=$?
echo "maximal disk used: $maxdisk KB"

Unfortunately, I am running into two problems:

The first is that I am piping the output of this script along with that of the tool I would like to benchmark to a file, and it seems occasionally these streams interfere, leading me to see 0 or too low disk use reported at the bottom of this file.
The second problem is that I don't know what to do about processes that delete temporary files while they run. In this case I think the fair benchmark would be to record the maximum net disk use (i.e., the peak of bytes written minus bytes erased), but I don't know where the second part of this difference can be found.

How can I resolve these problems?

Community · Accepted Answer · 2020-06-20 09:12:55Z

1

You may like to have a look at filetop from BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more:

tools/filetop: File reads and writes by filename and process. Top for files.

This script works by tracing the vfs_read() and vfs_write() functions using kernel dynamic tracing, which instruments explicit read and write calls. If files are read or written using another means (eg, via mmap()), then they will not be visible using this tool.
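As a rough illustration of how filetop might be pointed at a single process, here is a minimal sketch. The helper name watch_writes and the FILETOP variable are hypothetical, and the install path is an assumption that varies by distribution. The options used (-C to avoid clearing the screen, -p to filter by PID, -s wbytes to sort by bytes written, then an interval and a count) are taken from filetop's usage text as I understand it, so check filetop -h on your system.

```shell
#!/usr/bin/env bash
# Sketch: watch the file writes of one process with BCC's filetop.
# FILETOP is a hypothetical override for the tool's location, which varies
# by distribution (the default below is an assumption).
FILETOP=${FILETOP:-/usr/share/bcc/tools/filetop}

# watch_writes PID [INTERVAL] [COUNT]: print COUNT summaries, one every
# INTERVAL seconds, sorted by bytes written, without clearing the screen.
watch_writes() {
    local pid=$1 interval=${2:-1} count=${3:-5}
    "$FILETOP" -C -p "$pid" -s wbytes "$interval" "$count"
}

# Usage (typically needs root): watch_writes "$pid" 1 10
```

As the quoted description says, writes made through mmap() would not show up in this output.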

Brendan Gregg gives good talks and demos about Linux performance tools; they are quite instructive.

edited Jun 20, 2020 at 9:12

answered Jan 10, 2017 at 16:06

Maxim Egorushkin

Community · Accepted Answer · 2017-05-23 11:52:56Z

0

I eventually found this similar question: How do I measure net used disk space change due to activity by a given process in Linux?.

Based on the answers there, this seems to be a thorny problem, because it is hard to track all the different kinds of changes that a given process may initiate.
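One coarse workaround, which is not taken from those answers and only applies if the tool is known to write under a single directory, is to sample that directory's size while the command runs and keep the largest growth seen. The sketch below assumes exactly that. The function names dir_kb and peak_dir_growth and the variable PEAK_KB are made up for illustration, and files that are created and deleted between two samples will be missed.

```shell
#!/usr/bin/env bash
# Sketch: track the peak net growth (in KB) of a directory while a command runs.
# Assumption: the command writes only under the given directory.

# dir_kb DIR: print the current disk usage of DIR in KB (0 if du fails).
dir_kb() {
    local kb
    kb=$(du -sk "$1" 2>/dev/null | awk '{print $1}')
    echo "${kb:-0}"
}

# peak_dir_growth DIR CMD [ARGS...]: run CMD in the background, sample DIR
# every 0.05 s, and leave the peak growth over the starting size in PEAK_KB.
# Returns the exit status of CMD.
peak_dir_growth() {
    local dir=$1; shift
    local base now
    base=$(dir_kb "$dir")
    PEAK_KB=0
    "$@" &
    local pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        now=$(( $(dir_kb "$dir") - base ))
        if [ "$now" -gt "$PEAK_KB" ]; then PEAK_KB=$now; fi
        sleep 0.05
    done
    wait "$pid"
}

# Usage: peak_dir_growth /path/to/workdir mytool --args; echo "peak: $PEAK_KB KB"
```

Because the measurement is net growth, temporary files that are later deleted still count toward the peak as long as they exist during at least one sample.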

DTrace is also mentioned there, but as I understand it, it is proprietary to Sun (or I guess Oracle now?) and thus available only on Solaris by default. Eventually I found this GitHub repo, which aims to close that gap for Linux users.

edited May 23, 2017 at 11:52

answered Jan 10, 2017 at 13:26

roro

You may like to watch Give me 15 minutes and I'll change your view of Linux tracing.
– Maxim Egorushkin
Commented Jan 10, 2017 at 16:07

Tomachi · Accepted Answer · 2018-04-17 05:47:20Z

You could think of it differently and not worry about deleted files at all: record multiple timestamped snapshots, which gives you:

The write delta over time, e.g. 8 GB/day. It doesn't matter if all of it goes to /tmp. Each time the script runs, it saves a new average to disk, along with a counter, to keep a rolling average. So if your errant process writes 2 GB, then 1 GB, then 0 GB in successive hours, that's 1 GB/hour (for that time period).
The peak. For each snapshot, you pick the highest value and record that; in this case, 2 GB for the first hour of operation. If you run the script each hour and every later reading is 0 GB, it will still report the 2 GB from the first hour. Then if in the small hours the process kicks up and writes 5 GB, your "peak" will show that at, say, 3am, with an average of 333 MB/hour.
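The bookkeeping described above can be sketched roughly as follows. The helper name summarize_writes is hypothetical, and it assumes you have already collected one write amount per interval (for example, hourly deltas in GB).

```shell
#!/usr/bin/env bash
# Sketch: summarize per-interval write amounts as an average and a peak.

# summarize_writes N1 [N2 ...]: each argument is the amount written in one
# interval; prints the mean, the largest amount, and its 1-based interval.
summarize_writes() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '
        { sum += $1; if (NR == 1 || $1 > peak) { peak = $1; at = NR } }
        END { printf "avg=%.2f peak=%s peak_at=%d\n", sum / NR, peak, at }'
}

# Usage, with the numbers from the example above:
#   summarize_writes 2 1 0   # prints avg=1.00 peak=2 peak_at=1
```

Note that this reports how much was written per interval, not how much disk space was in use, which is the trade-off this answer accepts.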
