After an outage caused by expensive gallery tag expansions that did not time out and were retried, we introduced a PHP API timeout of 290s. Subsequent changes lowered this to 60s. However, there is evidence in this task, T152074, and T149421 that these timeouts do not actually work in an FCGI context. Expensive requests can and do pile up in HHVM, causing outages such as T151702.
Further tightening timeouts per API end point or request
Since the normal cost of different API end points differs by several orders of magnitude, a single global upper bound like 60 seconds is unlikely to be useful for many clients. For example, in many situations users are likely to give up well before 60s have passed. More critically, avoiding retry amplification requires coordinating timeouts across our infrastructure (as described in https://www.mediawiki.org/wiki/Rules_of_thumb_for_robust_service_infrastructure), which means that the time budget for the lowest-level services is very limited.
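To make the coordination argument concrete, here is a minimal Python sketch that models a request chain as a list of layers, each with a per-attempt timeout and a retry count. The layer names and numbers are hypothetical, and the consistency rule (a caller's timeout should cover every attempt its callee may make) is a simplified assumption that ignores each layer's own processing time. It computes the worst-case retry amplification and flags caller/callee pairs where the caller can give up while the callee keeps working.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    """One hop in a hypothetical request chain, ordered from client down to the lowest-level service."""
    name: str
    timeout_s: float  # how long this layer waits for a single call to the layer below
    retries: int      # extra attempts it makes after a failed or timed-out call


def worst_case_amplification(layers: List[Layer]) -> int:
    """Worst-case number of calls one top-level request can trigger below the last layer."""
    factor = 1
    for layer in layers:
        factor *= layer.retries + 1
    return factor


def worst_case_duration(layer: Layer) -> float:
    """Longest time a layer can keep working on one request if every attempt times out
    (simplification: the layer's own processing time is ignored)."""
    return layer.timeout_s * (layer.retries + 1)


def inconsistent_pairs(layers: List[Layer]) -> List[Tuple[str, str]]:
    """(caller, callee) pairs where the caller may give up while the callee is still busy."""
    bad = []
    for caller, callee in zip(layers, layers[1:]):
        if caller.timeout_s < worst_case_duration(callee):
            bad.append((caller.name, callee.name))
    return bad


# Hypothetical chain: all names and values are illustrative only.
chain = [
    Layer("client", timeout_s=60, retries=2),
    Layer("api", timeout_s=30, retries=2),
    Layer("backend", timeout_s=10, retries=1),
]
print(worst_case_amplification(chain))  # 18
print(inconsistent_pairs(chain))        # [('client', 'api')]
```

In this example the API layer can spend up to 90s (3 attempts of 30s) on a request the client abandons after 60s, and each top-level request can turn into 18 calls at the bottom. Shrinking timeouts as you go down the stack is what leaves the lowest-level services with only a small budget.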
It would be much better for overall system stability to support tighter timeouts per API end point.
One option would be to set up timeouts based on the expected execution time of each end point, and to document these clearly so that clients can set their own timeouts slightly higher. Another option would be to allow clients to pass in a lower timeout.
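Both options can be combined, as the following minimal Python sketch illustrates. It assumes a hypothetical table of documented per-end-point limits (the end point names and values are illustrative, not actual MediaWiki settings), falls back to the existing 60s global bound, and lets a client lower but never raise the limit. The `Deadline` helper is likewise hypothetical; it tracks wall-clock time with `time.monotonic()` so handling code can check it between expensive steps.

```python
import time
from typing import Optional

# Hypothetical documented per-end-point limits, for illustration only.
ENDPOINT_TIMEOUTS_S = {"query": 10.0, "parse": 60.0}
GLOBAL_MAX_S = 60.0  # existing global upper bound, used for undocumented end points


def effective_timeout(endpoint: str, client_timeout_s: Optional[float] = None) -> float:
    """Server-side limit: the end point's documented value, optionally lowered by the client."""
    limit = ENDPOINT_TIMEOUTS_S.get(endpoint, GLOBAL_MAX_S)
    if client_timeout_s is None:
        return limit
    if client_timeout_s <= 0:
        raise ValueError("client timeout must be positive")
    # Clients may only tighten the limit, never extend it.
    return min(limit, client_timeout_s)


def suggested_client_timeout(endpoint: str, margin_s: float = 1.0) -> float:
    """A client-side timeout slightly above the documented server-side limit."""
    return ENDPOINT_TIMEOUTS_S.get(endpoint, GLOBAL_MAX_S) + margin_s


class Deadline:
    """Wall-clock deadline that request handling code can poll between expensive steps."""

    def __init__(self, timeout_s: float):
        self._expires_at = time.monotonic() + timeout_s

    def remaining(self) -> float:
        return max(0.0, self._expires_at - time.monotonic())

    def expired(self) -> bool:
        return self.remaining() == 0.0


print(effective_timeout("query"))         # 10.0
print(effective_timeout("query", 5.0))    # 5.0
print(effective_timeout("query", 30.0))   # 10.0
print(suggested_client_timeout("parse"))  # 61.0
```

Clamping with `min()` keeps the documented value as a hard ceiling, so a client can shrink its own budget to fit its caller's timeout without being able to request more server time than the end point allows.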