
Retries #27

Closed
mnot opened this issue Jun 20, 2017 · 13 comments · Fixed by #222

Comments

@mnot
Member

mnot commented Jun 20, 2017

As discussed quite a bit, retries in HTTP need to be better defined. In particular, HTTP does not offer a guarantee that any particular request will not be automatically retried.

@mnot
Member Author

mnot commented Oct 16, 2018

Messaging - 9.4.1. Retrying Requests contains some material that probably belongs in Semantics, somewhere up in the architecture section.

Beyond that, I think the action here is to clarify a bit more that POST etc. requests are actively retried by many UAs (e.g., most Web browsers) because of H1's connection management issues. @mcmanus can you confirm that that's H1-specific?

@mcmanus

mcmanus commented Oct 16, 2018

generally goaway would determine this in h2
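To unpack that a little: in HTTP/2 a GOAWAY frame carries the highest stream ID the server may have acted on, so a client can tell which in-flight requests were definitely never processed and are therefore safe to retry. A minimal sketch of that check (hypothetical function name, not any real client's code):

```python
def h2_safe_to_retry(request_stream_id: int, goaway_last_stream_id: int) -> bool:
    """An HTTP/2 GOAWAY frame carries the highest stream ID the server
    might have processed. Requests on higher-numbered streams were never
    acted on, so they can be retried without risk of duplicate processing."""
    return request_stream_id > goaway_last_stream_id
```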

@mnot
Member Author

mnot commented Jan 8, 2019

Semantics - 7.2.2 Idempotent Methods talks about retries some.

I'm inclined to add a very short reference to it in architecture -- e.g., "Some requests can be retried in the event of failure; see ref" -- around the sentence starting "A client sends an HTTP request..." in 2.1 Client Server Messaging.

In h1-messaging, I'd like to add the behaviour that UAs actually implement to the list of examples of how they'd know it's OK to retry. AIUI, that would be something like:

For example, a user agent whose underlying connection is closed by its peer before the complete request headers have been sent can retry that request, because the server has not had an opportunity to begin processing it.

@mcmanus is that accurate?

@martinthomson
Contributor

Well, the server could have had an opportunity to process the request: servers routinely start processing based on only a small amount of the header.

The fact is that some clients retry based on very weak signals, risking duplicate processing in favour of greater robustness. The retry logic as I understand it is "have I seen any hint of a response? no? retry."
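That weak-signal heuristic could be sketched as follows (purely illustrative, not any particular client's implementation):

```python
def client_should_retry(response_bytes_seen: int) -> bool:
    # The aggressive heuristic described above: on connection failure,
    # retry if and only if no hint of a response has been seen,
    # regardless of the request method. This favours robustness at the
    # risk of duplicate processing on the server.
    return response_bytes_seen == 0
```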

@mnot
Member Author

mnot commented Jan 8, 2019

It seems like either clients or the spec should change, then. I suspect clients will not...

@martinthomson
Contributor

There are two things you might consider here:

  • noting that clients might retry based on their own determination of when that is appropriate (for example, in the extreme, retrying if the connection breaks before any byte of a response is seen).

  • recommendations about good practice, which probably doesn't want to point at what the most aggressive clients (i.e., browsers) might choose to do

Trying to walk the line without making that distinction is probably too hard.

@mnot
Member Author

mnot commented Jan 8, 2019

That seems like a good direction, except that the spec currently contains a MUST NOT; "good practice" isn't that strong...

MUST NOT (BUT WE KNOW YOU WILL)?

@martinthomson
Contributor

Yep. Part of this work is in aligning the spec with reality. A SHOULD NOT is probably appropriate. No need to resort to RFC 6919 language.

@mnot
Member Author

mnot commented Jan 8, 2019

Proposal:

A user agent SHOULD NOT automatically retry a request with a non-idempotent method unless it has some means to know that the request semantics are actually idempotent, regardless of the method, or some means to detect that the original request was never applied.

For example, a user agent that knows (through design or configuration) that a POST request to a given resource is safe can repeat that request automatically. Likewise, a user agent designed specifically to operate on a version control repository might be able to recover from partial failure conditions by checking the target resource revision(s) after a failed connection, reverting or fixing any changes that were partially applied, and then automatically retrying the requests that failed.

Some user agents use weaker signals to initiate automatic retries; for example, when a POST request is sent, but the underlying transport connection is closed before any part of the response is received. Although this is commonly implemented, it is NOT RECOMMENDED.
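The proposed rule could be expressed as a small decision function (a sketch with hypothetical names, assuming the idempotent-method set from Semantics):

```python
# Methods defined as idempotent in HTTP Semantics.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def may_auto_retry(method: str,
                   known_idempotent_semantics: bool = False,
                   request_never_applied: bool = False) -> bool:
    # A user agent SHOULD NOT automatically retry a non-idempotent
    # request unless it knows (through design or configuration) that the
    # request semantics are actually idempotent, or it can detect that
    # the original request was never applied.
    return (method.upper() in IDEMPOTENT_METHODS
            or known_idempotent_semantics
            or request_never_applied)
```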

@martinthomson
Contributor

LGTM. s/; f/. F/

@wtarreau

wtarreau commented Jan 8, 2019

While I'm fine with this, I'm still having a big problem with retries. Some of our (haproxy) users are pressuring us to implement the so-called "L7 retries", by which, if the intermediary faces a problem with the server after the request was delivered, it happily retries it, even for a POST! Not only is it a technical problem (it requires all requests to be buffered for the time it takes a server to respond), but it's in total violation of all the rules.

The problem here is that there's an ongoing trend for doing this, and users find it normal since it allows them to hide their constantly crashing servers... And usually they consider that the application deals with replay protection by itself, so obviously the infrastructure must retry. Thus my big concern here is to figure out to what extent an intermediary could reasonably retry, or how we could have signals between the UA and server to indicate whether a request is retry-safe.
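One way an intermediary could walk that line is to gate retries on both constraints raised above: the full request must still be buffered (so it can be replayed byte-for-byte), and a non-safe request must carry an explicit retry-safe signal from the client. A purely illustrative sketch; the retry-safe signal is hypothetical and no such mechanism is defined in this thread:

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def intermediary_may_retry(method: str,
                           body_fully_buffered: bool,
                           marked_retry_safe: bool) -> bool:
    # Can't replay what we no longer have: the whole request must still
    # be buffered before a retry is even possible.
    if not body_fully_buffered:
        return False
    # Safe methods are retryable; anything else needs an explicit
    # (hypothetical) retry-safe signal from the user agent.
    return method.upper() in SAFE_METHODS or marked_retry_safe
```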

@mnot
Member Author

mnot commented Jan 9, 2019

A protocol extension seems reasonable for that case; I think it could be interesting to clients -- both intermediary and otherwise.

@mnot
Member Author

mnot commented Feb 14, 2019

Just leaving a note to remember to go through this draft to make sure we got everything.
