I work in an organization where a Hadoop/Spark cluster is available, but I can neither alter its configuration nor access the cluster machines directly. So, from client code only, I was wondering whether I can ask Spark to retry failing tasks on a different executor, if that isn't already the default behavior.
Concretely, I read a file on HDFS from a given executor server, let's call it executorserver42, and this particular server keeps throwing "Filesystem closed" IOExceptions (to be clear: the file exists, is not corrupted, and so on). Spark reports the failure after 4 attempts. My first question was: did it try 4 times from that same server, or did it relocate the task on each retry?
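For what it's worth, I suspect the "4 attempts" comes from `spark.task.maxFailures`, which defaults to 4 and, as far as I know, can be set purely from the client side when building the session. A minimal sketch of what I mean (the app name is just a placeholder):

```scala
import org.apache.spark.sql.SparkSession

// Client-side only: raise the per-task retry budget before aborting the stage.
// spark.task.maxFailures defaults to 4, which matches the
// "task failed 4 times" behavior I'm seeing.
val spark = SparkSession.builder()
  .appName("hdfs-read-retry-test")
  .config("spark.task.maxFailures", "8")
  .getOrCreate()
```

But of course more retries don't help if every retry lands on executorserver42 again, hence the questions below.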
To sum up: are task retries relocated to a different executor on each attempt? If not, how can I configure, from client code only, that retries must be relocated? Finally, are temporary HDFS filesystem unavailability failures like this common? (I understand you don't know my organization and may well answer that it depends on how things are set up there; that's fine, I'm just fishing with that last question.)
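To frame the second question: the kind of knob I was hoping for is something like Spark's blacklisting feature (available since Spark 2.1, renamed `spark.excludeOnFailure.*` in 3.1+), which, as I understand it, can also be set from the client side. A sketch of what I would try, assuming a Spark 2.x/3.0 cluster; I'm not certain these settings are honored in my environment:

```scala
import org.apache.spark.sql.SparkSession

// Blacklisting: once a task fails on an executor, don't schedule
// its retries there again (and eventually avoid the whole node).
val spark = SparkSession.builder()
  .appName("hdfs-read-blacklist-test")
  .config("spark.blacklist.enabled", "true")
  // Attempts of one task on one executor before that executor is
  // blacklisted for the task (default 1):
  .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
  // Attempts of one task on one node before the whole node is
  // blacklisted for the task (default 2):
  .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
  .getOrCreate()
```

Is this the right mechanism for my situation, or is retry relocation already the default?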
Thanks.