
I'm in an organization where Hadoop/Spark is available, but I can't alter its configuration and don't have administrative access to it. So, from client code, I was wondering whether I can ask Spark to retry failing tasks on a different executor, if that isn't already the case.

Indeed, I try to access a file on HDFS from a given executor server, let's call it executorserver42, and that particular server keeps raising "Filesystem closed" I/O exceptions (it goes without saying, but the file exists, is not corrupted in any way, and so on). Spark reports the failure after having tried 4 times. My first question was: did it try 4 times from that same server, or did it relocate the task on each attempt?

To sum up: are task retries relocated to a different executor on each retry? If not, how can I configure, from client code, that retried tasks must be relocated? Also, are temporary HDFS filesystem unavailability failures common? (I understand that you don't know my organization and may answer that it depends on how things are laid out there. That's fine; I'm just fishing with that last question.)

Thanks.

  • Besides the original question, I found out why the code we're working with threw that FileSystem closed exception: it was written without an understanding of how FileSystem.get() works. When I started working on it, I didn't have one either. The FileSystem object that's returned is cached by default and should not be closed in that case. So, in a multithreaded context, closing it is just planting a time bomb that will cause a random disaster somewhere in the process (see the sketch below this comment): stackoverflow.com/questions/23779186/…
    – tomoyo255
    Commented Aug 11, 2022 at 12:06
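
To illustrate what the comment describes, here is a minimal sketch using the Hadoop client API (Scala). The object name and the /tmp path are placeholders, not anything from the question; it only demonstrates that FileSystem.get() hands back a shared cached instance, so closing it breaks every other user of that instance.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object CachedFsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()

        // FileSystem.get() returns a cached instance (keyed by scheme, authority
        // and user): fs1 and fs2 below are the SAME object.
        val fs1 = FileSystem.get(conf)
        val fs2 = FileSystem.get(conf)
        println(fs1 eq fs2) // true, unless caching is disabled in the configuration

        // Closing fs1 also closes fs2, so any other thread still using the cached
        // FileSystem will get "java.io.IOException: Filesystem closed".
        fs1.close()
        // fs2.exists(new Path("/tmp")) // would now throw "Filesystem closed"

        // Safer options: don't close the cached instance at all, or call
        // FileSystem.newInstance(conf) to get a private, non-cached instance
        // that you own and may close when done.
        val privateFs = FileSystem.newInstance(conf)
        try {
          println(privateFs.exists(new Path("/tmp")))
        } finally {
          privateFs.close()
        }
      }
    }
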

1 Answer


The "Filesystem closed" issue covered in the comment isn't what you asked about, but it is the problem you describe.

To answer the question you asked:

You should be able to set the Spark configuration from your own code, without touching the cluster-wide configuration. You can ask Spark to stop scheduling retries on a failing executor by setting spark.excludeOnFailure.enabled to true (a sketch follows). There are more settings regarding failures and the scheduling of tasks in the Spark documentation.
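
For example, here is a hedged sketch in Scala of setting this from client code. The property names come from the Spark configuration documentation (Spark 3.1+; older versions use the spark.blacklist.* equivalents); the application name and the numeric values are illustrative, not taken from your setup.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hdfs-read-with-executor-exclusion") // placeholder name
      // Exclude executors/nodes on which tasks keep failing.
      .config("spark.excludeOnFailure.enabled", "true")
      // Max attempts of one task on a given executor before that executor
      // is excluded for the task (default 1).
      .config("spark.excludeOnFailure.task.maxTaskAttemptsPerExecutor", "1")
      // Max attempts of one task on a given node before the whole node
      // is excluded for the task (default 2).
      .config("spark.excludeOnFailure.task.maxTaskAttemptsPerNode", "2")
      // Total attempts per task before the job is failed (default 4).
      .config("spark.task.maxFailures", "4")
      .getOrCreate()

The same properties can also be passed on the command line with spark-submit --conf spark.excludeOnFailure.enabled=true, which again only affects your application, not the cluster configuration.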
