I am trying to read from a file, perform an operation on its contents, and then write the result to another file as fast as possible (for a competition). To do this, I mmap both the input and output files and read from and write to the mappings. However, I noticed that mmapping an existing output file was significantly slower than mmapping a nonexistent one (2x to 3x slower total runtime). Specifically, munmap was much slower for the output file. So now I check whether the output file already exists and delete it if it does, which gives significantly faster run times. However, the deletion takes about 10 ms, which is quite significant given that my algorithm takes 100 ms for a 100 MB input file.

if(access(argv[2], F_OK) == 0) {
    if(remove(argv[2]) != 0) {
        fprintf(stderr, "%s\n", "Error removing output file");
        exit(0);
    }
}

The access call is fast; it is the remove call that takes the time.
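
A minimal sketch of how the two calls can be timed separately, assuming POSIX clock_gettime with CLOCK_MONOTONIC. The helper names timed_remove and elapsed_ms are hypothetical and only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Returns the elapsed time between two timespec values in milliseconds. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1e3 +
           (end.tv_nsec - start.tv_nsec) / 1e6;
}

/* Hypothetical helper: removes `path` if it exists and reports the time
 * spent in access() and in remove() separately.
 * Returns 0 on success (including "file did not exist"), -1 on failure. */
static int timed_remove(const char *path, double *access_ms, double *remove_ms)
{
    struct timespec t0, t1, t2;
    int rc = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    int exists = (access(path, F_OK) == 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (exists)
        rc = remove(path);
    clock_gettime(CLOCK_MONOTONIC, &t2);

    *access_ms = elapsed_ms(t0, t1);
    *remove_ms = elapsed_ms(t1, t2);
    return rc == 0 ? 0 : -1;
}
```

Calling timed_remove(argv[2], &a, &r) and printing a and r to stderr gives the per-call breakdown. The numbers will depend heavily on the filesystem and on the size of the file being removed.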

Things that didn't work:

  • Opening the output file with O_TRUNC before mmapping (the mmap path stays slow)
  • Writing to a temporary empty file, then renaming it to the output file (as slow as remove)
  • Calling unlink instead (as slow as remove)
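
For context, the output-side setup being compared can be sketched roughly as below. This is an illustration under assumptions, not the exact competition code: map_output is a hypothetical helper, and extra_flags lets the same function cover both the plain case (0) and the O_TRUNC variant from the list above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: opens `path` read/write (creating it if needed),
 * resizes it to `size` bytes, and maps it MAP_SHARED so that stores to the
 * mapping reach the file. `extra_flags` is OR-ed into the open() flags,
 * e.g. O_TRUNC or 0. Returns the mapping, or MAP_FAILED on error. */
static void *map_output(const char *path, size_t size, int extra_flags)
{
    int fd = open(path, O_RDWR | O_CREAT | extra_flags, 0644);
    if (fd < 0)
        return MAP_FAILED;

    /* The file must be at least `size` bytes long before it is written
     * through the mapping. */
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p;
}
```

After writing the results into the returned buffer, munmap(p, size) releases the mapping. That is the call reported above as slow when the output file already existed.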

Is there a faster way to achieve the same effect as deleting?

  • Perhaps the OS operation to remove the file needs that time. If so, you cannot do anything. Commented Nov 7, 2022 at 16:38
  • Likely the OS checking that there are no more open references to that file. {shrug} I don't think highly of "competitive coding" anyway. In the world of actual software, this is a pointless micro-optimization.
    – DevSolar
    Commented Nov 7, 2022 at 16:40
  • "Unlink (as slow as remove)" -- they're the same thing.
    – Barmar
    Commented Nov 7, 2022 at 18:01
  • In Linux, a simple copy_file_range() call will beat memory-mapping anyway. Commented Nov 9, 2022 at 1:09

1 Answer

I'm not sure whether this will help, but branches take time and branchless code can run faster, which may matter since this is for a competition:

if(access(argv[2], F_OK) == 0) {
    if(remove(argv[2]) != 0) {
        fprintf(stderr, "%s\n", "Error removing output file");
        exit(0);
    }
}
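// Because exit() returns void, it cannot appear in an && chain, so one branch remains.
// Note that && short-circuits, so the compiler may still emit conditional jumps here.
if ((access(argv[2], F_OK) == 0) && (remove(argv[2]) != 0) && fprintf(stderr, "%s\n", "Error removing output file"))
    exit(0);
// Also test this variant to see whether it does any better.
// my_exit() must be declared before this line; it wraps exit() so the call yields an int.
(access(argv[2], F_OK) == 0) && (remove(argv[2]) != 0) && fprintf(stderr, "%s\n", "Error removing output file") && my_exit(0);
// with
int my_exit(int status)
{
    exit(status);
    return 0; // never reached; keeps the int return type well-formed
}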

This article discusses it: https://dev.to/jobinrjohnson/branchless-programming-does-it-really-matter-20j4. This video gives a good overview if you want to understand more: https://www.youtube.com/watch?v=S4mKJxbrkT4&t (the part about branching starts at 07:43).

Also, for the best performance, there are compiler flags you can build your code with, if that is allowed in the competition, of course.

I don't know about removing the file, but if this helped, let me know.
