Redirect broken 404 pages with .htaccess and regular expressions

I have been unpublishing old pages on my website. To avoid 404 pages, I'd like to redirect these pages to a generic page.

So for example this page:

Should redirect to:

I'm not very skilled with .htaccess or regular expressions. I've been trying to redirect the pages with:

RewriteRule ^artigos/(.*)$ /artigos/$1 [R=301,L]

But something isn't working. Can anyone help?

1 answer

  • answered 2019-02-10 17:06 MrWhite

    To redirect requests for physical files that no longer exist you need to actually check that the file no longer exists, otherwise it will indeed "redirect everything" (as mentioned in comments).

    For example, to redirect any requests of the form /artigos/<something>, that do not map to physical files, to /artigos/ you can do the following:

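    RewriteEngine On
    # Only act on direct requests, not on internally rewritten ones
    RewriteCond %{ENV:REDIRECT_STATUS} ^$
    # Only act when the request does not map to an existing file
    RewriteCond %{REQUEST_FILENAME} !-f
    # /artigos/<something> is redirected to /artigos/
    RewriteRule ^(artigos/). /$1 [R=302,L]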

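    The $1 backreference simply saves you from repeating the directory name. The trailing . in the pattern requires at least one character after artigos/, so a request for /artigos/ itself is not redirected.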

    The first condition, which checks the REDIRECT_STATUS environment variable, ensures that only direct requests are redirected (the variable is empty on the initial request and set once the request has been internally rewritten). This is probably only required if you are still on Apache 2.2 (as opposed to 2.4), since there mod_dir executes first, rewriting the redirected request to index.php (if it exists) and causing a rewrite loop. On Apache 2.4, mod_dir executes later.

    Test with a 302 and only change to a 301 when you are sure it's working OK, because browsers cache 301 (permanent) redirects and a mistake is then hard to undo.

    You will need to clear your browser cache before testing.

    However, a 404 would generally be a better response. The search engines will likely see the redirect to a common root as a soft-404 and users are more likely to be "confused" when they don't see the information they requested.
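
    If you go the 404 route for static files, a minimal sketch might look like the following. The paths /artigos/not-found.html and artigos/old-page.html are hypothetical names used only for illustration; substitute whatever exists on your site.

    ```apache
    # Serve a helpful page for missing files while keeping the 404 status.
    # Use a local path here: a full URL would turn the response into a redirect.
    # (/artigos/not-found.html is an assumed, hypothetical page.)
    ErrorDocument 404 /artigos/not-found.html

    # Optionally mark a page you removed on purpose as 410 Gone.
    # (artigos/old-page.html is a hypothetical example URL.)
    RewriteEngine On
    RewriteRule ^artigos/old-page\.html$ - [G]
    ```

    The visitor still lands on something useful, but search engines receive an honest "not found" or "gone" status rather than a redirect to an unrelated page.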

    Regarding the rule in your question:

    RewriteRule ^artigos/(.*)$ /artigos/$1 [R=301,L]

    By itself, this would result in a redirect loop, as it simply redirects to itself.

    UPDATE: it's not a file, it is an article in a Joomla CMS

    If valid URLs do not map to physical files then you can't do this in .htaccess. In your case, a valid URL is determined by the Joomla CMS (as stored in the Joomla database). .htaccess is processed at the very start of the request, before control passes to PHP/Joomla. Directives in .htaccess can only look at the HTTP request and the physical filesystem.

    Joomla uses a front-controller pattern. All URLs that do not map to physical files (which excludes static resources like CSS, JS and images) are internally rewritten to index.php (the "front-controller"). That script then "routes" the URL and decides what content should be returned.
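
    To make that concrete, here is a simplified sketch of a typical front-controller block, in the spirit of what CMSs such as Joomla place in .htaccess. It is an illustration rather than a verbatim copy, and the exact conditions in your install may differ.

    ```apache
    RewriteEngine On
    # Skip requests that are already aimed at the front controller
    RewriteCond %{REQUEST_URI} !^/index\.php
    # Skip requests that map to a real file or directory (CSS, JS, images, ...)
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    # Hand everything else to index.php, which decides whether the URL is valid
    RewriteRule .* index.php [L]
    ```

    Note that a published article and an unpublished one look identical at this stage: neither has a file on disk, so the !-f check cannot tell them apart.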

    What you are asking could only be done on a static website where URLs map to physical files on the file system.

    You need to perform this redirect in Joomla itself, when Joomla has determined that the requested URL does not exist. (This is actually more efficient anyway as you only need to execute your code after a 404 has been determined, rather than on every single request, as it would be if you used .htaccess.)