Troubleshooting errors using an nginx reverse proxy. Gateway errors: 502, 504

This article can help you troubleshoot 502 and 504 gateway errors. When troubleshooting, it helps to understand how requests are processed between nginx and PHP, a combination often used for high-performance hosting of web applications like WordPress.

Common error messages associated with this configuration:

  • 504 Gateway Timeout
  • Gateway Timeout (504)
  • HTTP Error 504 – Gateway Timeout
  • Gateway Timeout Error
  • 502 Bad Gateway
  • Bad Gateway

Background to gateway errors

Traditional web hosting setups use Apache to respond to HTTP requests. If there’s a PHP app running (like WordPress), Apache interacts with PHP directly, like this:

Apache -> php-fastcgi

But over the years Apache became bloated, eating up lots of RAM and disk I/O for every single visitor to the site, so web developers built replacement HTTP servers, one of which is nginx. We use it to ensure the websites we host respond as quickly as possible.

Adding nginx to the mix results in the above changing to look like this:

nginx -> apache -> php-fpm

Since nginx is lightweight, it can handle many simple requests really fast, so requests for static files like HTML and JavaScript are served very quickly by nginx.

nginx acts as a gateway, passing requests through to Apache and PHP. If there’s a problem ‘downstream’ (with Apache or PHP, for example), nginx often responds with a gateway error like 502 Bad Gateway or 504 Gateway Timeout.

Troubleshooting Gateway Errors

The primary way to narrow things down is to check the logs to see what’s being reported. Sometimes the logs will simply report a 502 or 504 error, just like you’re seeing in the browser, which won’t be helpful. But sometimes you will see an error or warning logged just before the gateway error that pinpoints which plugin or theme is causing it.
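As a rough illustration of that log check, here is a minimal Python sketch that counts which request URIs are returning 502 or 504. It assumes the access log uses the common/combined log format; the sample lines, IP addresses and the `slow-report.php` name are made up for the example, so adjust the pattern to whatever your own log actually looks like.

```python
import re
from collections import Counter

# Matches the request and status fields of a common/combined format access log line.
# Assumption: your log uses that format; adjust the pattern if it does not.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) ')

def gateway_error_counts(lines):
    """Count how often each request URI returned a 502 or 504 status."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match and match.group("status") in ("502", "504"):
            counts[match.group("uri")] += 1
    return counts

# Hypothetical sample lines; in practice, read them from your access log file.
sample = [
    '203.0.113.5 - - [14/Apr/2017:05:04:00 +0000] "GET /slow-report.php HTTP/1.1" 504 183 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [14/Apr/2017:05:04:02 +0000] "GET /style.css HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [14/Apr/2017:05:04:09 +0000] "POST /slow-report.php HTTP/1.1" 502 173 "-" "Mozilla/5.0"',
]

for uri, count in gateway_error_counts(sample).most_common():
    print(count, uri)
```

A URI that dominates this list is a good candidate for the request that is tying up PHP.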

You can learn how to use Plesk to view logs here. If you can use those errors to get to the bottom of the issue, then definitely do so! Otherwise read on for additional ways to narrow things down.

Sometimes constant gateway errors leave you unable to access your website’s admin area (like WordPress’s wp-admin), yet you need that access to troubleshoot the issue. The only way to regain access is to temporarily block the traffic that’s causing the problem until you can get the issue resolved. If you are not having problems accessing your website admin, skip down to the first possible cause below.

To block access, you can proceed in one of two ways: 1) block access to the particular part of the site that’s eating up resources, or 2) temporarily block access from all IPs except your own.

To block access to all IPs except your own (easier)

Log in to Plesk and visit the Apache & nginx Settings page. Under “Deny access to the site” enter a custom value of * (a single asterisk). Then under “Excluding” enter your own IP address and save the changes. You may need to wait up to 3 minutes after saving your changes for the server to restart (~1m) and for any PHP processes that were running to reach their timeout (~2m).

Be sure to remove this block when you’re done troubleshooting.

To block access to the resource-intensive request (more complex)

If there’s only one page or resource on the site that’s creating this problem (like a single page that loads slowly), use the logs to identify that particular request URI (the part that comes after the domain). Then add the following code snippet to the top of your .htaccess file to block the problematic resource, even just temporarily, so you can access the admin panel. Because FilesMatch compares its pattern against the file name rather than the full URI, replace filename\.php with the file name at the end of the request URI shown in the logs. If there are dots in the file name, be sure to put a backslash in front of each one, as seen in the example.

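# Deny every request for files whose name matches the pattern.
# These are Apache 2.2-style directives; Apache 2.4 needs mod_access_compat for them.
<FilesMatch filename\.php>
Order Allow,Deny
Deny from all
</FilesMatch>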
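If you would rather not escape the dots by hand, the following minimal Python sketch builds the same block for a given file name. The function name `files_match_block` and the `slow-report.php` example are hypothetical, and only dots are escaped, as described above; a name containing other regular expression characters would need further escaping.

```python
def files_match_block(filename):
    """Build the .htaccess block shown above for one file name.

    Only dots are escaped; other regex metacharacters are left untouched.
    """
    pattern = filename.replace(".", r"\.")
    return "\n".join([
        f"<FilesMatch {pattern}>",
        "Order Allow,Deny",
        "Deny from all",
        "</FilesMatch>",
    ])

# Hypothetical file name taken from the end of a request URI in the logs.
print(files_match_block("slow-report.php"))
```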

Cause: a process run amok (one-time)

If the issue was just a one-time thing (like you ran a script and it wouldn’t stop running), the solution is to directly restart the underlying services, which forces that script to end. If you don’t have root access to the server, then you can often trigger such a restart by making changes to your website configuration in Plesk.

For example, try clicking the “PHP Settings” button for your domain, making any small change there (like increasing the memory limit from 32MB to 48MB), then saving your changes. This update triggers a restart of Apache and PHP processing and could resolve your error!

It may take up to 3 minutes (or longer depending on your provider) for the changes to take effect.

Cause: external resource requests

Some code is trying to communicate with an external resource (hosted elsewhere) that’s taking too long to respond or not responding at all. The process running that code reaches its timeout, then returns to nginx saying “sorry, couldn’t do it”, and nginx responds with a gateway error, like Bad Gateway (502) or Gateway Timeout (504).

To solve issues with external resources, you’ll need to look into which parts of your website’s PHP code (custom code or plugin/theme code) are trying to retrieve external resources. This is most commonly code that connects to a third-party service, like a payment processor or a weather data provider, typically via its API. Similarly, any service that requires connectivity on a non-standard port (standard ports are 80 and 443) might hang like this because our firewall blocks the connection.

Here are a few examples:

  • Some payment gateways or XML API services, though this is very uncommon these days as standard ports are used far more frequently now
  • Older Twitter widgets that retrieve tweets using back-end code, where a failure to connect to the Twitter servers causes hang-ups like this. This does not apply to Twitter’s own feed embeds, which operate asynchronously via JavaScript.
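To check whether an outbound connection like the ones above is being blocked or left hanging, a small test with an explicit timeout can help. This is a minimal Python sketch, not part of the site’s PHP code; the host name and port in the usage comment are placeholders, so substitute the service your plugin actually talks to.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False

# Example usage (placeholder host and port, run it from the web server itself):
#   can_connect("api.example.com", 8443)
```

If this returns False from the server but True from your own computer, a firewall rule on the server side is a likely suspect.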

If you’re using WordPress or any other CMS with plugins, be sure to try disabling any plugin that might communicate externally to see if it solves the problem. If you can narrow down which part of the software could be doing this, then we can work on a solution together, like using an alternate plugin, or perhaps opening a port in the firewall if that’s what’s required.

If you’re not able to determine which plugin might be at fault intuitively, then your remaining option is exhaustive troubleshooting. We have a guide to exhaustive troubleshooting with WordPress here.

If you’re quite certain that your site does not make requests to external systems to retrieve data of any kind, your issue is probably one of the remaining options below.

Cause: Too many PHP processes needed to serve requests

If your site needs a lot of PHP processes because heavy traffic is generating many dynamic processing requests, then your primary goal should be reducing the amount of dynamic processing that needs to occur on each page load.

Cause: PHP Processes running too long

You’ll know this is the issue if pages on the site take longer than about 10 seconds to even begin loading, with your browser showing nothing until that period has passed.

With this issue you’ve got some code running on the site that is not optimized for performance. An example of this would be an infinite loop in the code, or something that tries to pull millions of database records in one query (rather than retrieving them in batches).
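To illustrate the batching idea, here is a minimal sketch in Python using an in-memory SQLite database. The `orders` table and its contents are invented for the example, and the same pattern applies in PHP with whichever database library your site uses: fetch a limited number of rows, process them, then fetch the next batch.

```python
import sqlite3

def iter_batches(conn, query, batch_size=1000):
    """Yield the query's rows in lists of at most `batch_size` rows."""
    cursor = conn.execute(query)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows

# Hypothetical table with 2,500 rows, standing in for a much larger one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany(
    "INSERT INTO orders (total) VALUES (?)",
    [(float(i),) for i in range(2500)],
)

grand_total = 0.0
batches = 0
for rows in iter_batches(conn, "SELECT id, total FROM orders ORDER BY id"):
    # Only one batch of rows is held in memory at a time.
    grand_total += sum(total for _id, total in rows)
    batches += 1

print(batches, grand_total)
```

Each pass works on a bounded amount of data, so memory use and per-step run time stay predictable no matter how large the table grows.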

Therefore the first thing to check is any code that has been recently added to the site, like plugins, a new theme, custom code, etc. You’ll probably want to get your developer involved to help narrow this down.

If you’re not able to determine the cause intuitively (i.e. from recent changes), then your remaining option is exhaustive troubleshooting. We have a guide to exhaustive troubleshooting with WordPress here.

Cause: Limited Buffers

nginx may be configured without enough wiggle room for PHP, for example with buffers that are too small. Here’s how to configure nginx’s buffers if you have your own server. If you’re on a shared server, we already provide large enough buffers for all scenarios.

Cause: PHP-FPM Misconfigured or Needs Tuning (VPS/Dedicated Server Only)

If you’re getting constant Gateway Timeout errors and the only thing that clears them up is restarting PHP (as described above), or if you’re on a VPS and you see PHP processes running at high CPU and lasting far longer than they should (or server techs have told you that’s what is happening), you will need to take some steps to rein them in.

Note: if you’re on our shared hosting, our PHP-FPM configuration is already optimized for a shared hosting environment and so if you’re experiencing gateway errors, this solution will not apply.

Here are some options that have worked for us:

  1. PHP-FPM (the FastCGI Process Manager for PHP) can be tuned by making some changes in Plesk. If you’re using the “ondemand” process manager mode, you may want to reduce the number of requests each process handles before it gracefully exits and a new process takes over. This setting is labelled pm.max_requests. Try 100 or 150 if you’re having performance issues.
  2. If your PHP processes aren’t dying off, you may need to *make* them die. There are a couple of good ways to do this. The first is to set an idle timeout for the PHP-FPM processes with pm.process_idle_timeout; try setting it to 10s (you can do this in the field below the FPM settings).
  3. If your PHP processes still aren’t dying off, you may need to get even more aggressive with them. Try setting request_terminate_timeout to a few seconds higher than your max_execution_time setting. If max_execution_time doesn’t kill off the process, request_terminate_timeout certainly will.
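The three suggestions above can be put together as a set of pool directives. The following is a minimal Python sketch that prints them for a given max_execution_time. The function name and the 5-second margin are assumptions made for illustration (the guide only says “a few seconds higher”); the other defaults are the values suggested above.

```python
def fpm_timeout_settings(max_execution_time_s, margin_s=5,
                         max_requests=100, idle_timeout_s=10):
    """Return PHP-FPM pool directives following the three suggestions above.

    margin_s is an assumed value for "a few seconds higher" than
    max_execution_time; tune it to suit your site.
    """
    return [
        f"pm.max_requests = {max_requests}",
        f"pm.process_idle_timeout = {idle_timeout_s}s",
        f"request_terminate_timeout = {max_execution_time_s + margin_s}s",
    ]

# With max_execution_time set to 30 seconds:
print("\n".join(fpm_timeout_settings(30)))
```

With a max_execution_time of 30 seconds this gives a request_terminate_timeout of 35s, so PHP’s own limit gets the first chance to stop a script and PHP-FPM’s hard limit only steps in if that fails.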

If you found this guide helpful, check out the other guides and posts available. If you’re in need of a high-performance Canadian web host or VPS hosting partner, check out our services!

About Jordan Schelew

Jordan has been working with computers, security, and network systems since the 90s and is a managing partner at Websavers Inc. As a founder of the company, he's been in the web tech space for over 15 years.
