I work in an organization where a Hadoop/Spark cluster is available, but I can neither alter its configuration nor access the cluster machines directly. So, from client code only, I was wondering whether I can ask Spark to retry failing tasks on a different executor, if that isn't already the default behavior.
Concretely, I read a file on HDFS from a given executor server, let's call it executorserver42, and this particular server keeps throwing "Filesystem closed" IOExceptions (to be clear: the file exists, is not corrupted, and so on). Spark reports the failure after 4 attempts. My first question was: did it try 4 times from that same server, or did it relocate the task on each retry?
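For what it's worth, I suspect the "4 attempts" comes from `spark.task.maxFailures`, which defaults to 4 and, as far as I know, can be set purely from the client side when building the session. A minimal sketch of what I mean (the app name is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Client-side only: raise the per-task retry budget before aborting the stage.
// spark.task.maxFailures defaults to 4, which matches the
// "task failed 4 times" behavior I'm seeing.
val spark = SparkSession.builder()
  .appName("hdfs-read-retry-test")
  .config("spark.task.maxFailures", "8")
  .getOrCreate()
```

But of course more retries don't help if every retry lands on executorserver42 again, hence the questions below.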
To sum up: are task retries relocated to a different executor on each attempt? If not, how can I configure, from client code only, that retries must be relocated? Finally, are temporary HDFS filesystem unavailability failures like this common? (I understand you don't know my organization and may well answer that it depends on how things are set up there; that's fine, I'm just fishing with that last question.)
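To frame the second question: the kind of knob I was hoping for is something like Spark's blacklisting feature (available since Spark 2.1, renamed `spark.excludeOnFailure.*` in 3.1+), which, as I understand it, can also be set from the client side. A sketch of what I would try, assuming a Spark 2.x/3.0 cluster; I'm not certain these settings are honored in my environment:

```scala
import org.apache.spark.sql.SparkSession

// Blacklisting: once a task fails on an executor, don't schedule
// its retries there again (and eventually avoid the whole node).
val spark = SparkSession.builder()
  .appName("hdfs-read-blacklist-test")
  .config("spark.blacklist.enabled", "true")
  // Attempts of one task on one executor before that executor is
  // blacklisted for the task (default 1):
  .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // Attempts of one task on one node before the whole node is
  // blacklisted for the task (default 2):
  .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .getOrCreate()
```

Is this the right mechanism for my situation, or is retry relocation already the default?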
Thanks.