WordPress Media Cleanup and Sitemap Validation – A Practical Recovery Walkthrough

This document describes a practical maintenance workflow performed on a WordPress + WooCommerce website after replacing legacy .jpg images with modern .webp images. The goal was to remove broken media references, eliminate server warnings, and confirm that search engines and sitemaps continued working correctly.

The process below explains what happened, why it happened, and how the situation was resolved.


1. Initial Situation: Server Logs Showing Errors

Server logs started showing repeated messages such as:

File not found [/home/.../public_html/403.shtml]

Additionally, PHP warnings appeared from the LiteSpeed Cache plugin:

getimagesize(): HTTP/1.1 404 Not Found
wp-content/plugins/litespeed-cache/src/media.cls.php

At the same time, firewall and access logs contained entries like:

GET /wp-content/uploads/... 404 Allowed
GET /products/... 404 Allowed
GET /.well-known/... 404 Allowed
GET /xmlrpc.php 404 Allowed

These requests came from various sources including:

  • Googlebot
  • Googlebot-Image
  • Bing crawler
  • Facebook / Meta crawler
  • generic web scanners
  • automated bots

This type of traffic is normal for publicly accessible websites.


2. Root Cause: JPG Images Were Replaced With WebP

The website migrated all product and content images from .jpg format to the modern .webp format.

After replacing the images, the original JPG files were removed from the server. However, WordPress can still retain references to deleted files inside the database.

These references may exist in:

  • post content
  • post metadata
  • theme options
  • WooCommerce product metadata
  • attachment records

If these references remain, plugins such as LiteSpeed Cache may attempt to process missing images, resulting in warnings in the server logs.


3. Advantages of Replacing JPG Images With WebP

The migration from JPG to WebP provides several important benefits for website performance and SEO.

Smaller File Size

WebP images are often 25–35% smaller than equivalent JPG images at comparable visual quality (Google's published comparison reports 25–34%); the actual saving depends on the image content and encoder settings.

Faster Page Loading

Smaller image sizes reduce the amount of data transferred to visitors, which improves page load times and reduces server bandwidth usage.

Improved Core Web Vitals

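Faster image loading mainly helps Largest Contentful Paint (LCP), the Core Web Vital that is most often determined by an image. It can also improve related metrics that are not Core Web Vitals themselves:

  • First Contentful Paint (FCP)
  • Overall PageSpeed scores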

Better Mobile Performance

WebP images help mobile users load pages faster, especially on slower connections.

SEO Benefits

Page speed is one of many ranking signals, so faster pages can help rankings modestly, and lighter responses let crawlers fetch more URLs within the same crawl budget.

Reduced Server Load

Serving smaller images reduces disk usage and server bandwidth consumption.


4. Cleaning WordPress Database References

The first step was identifying posts still referencing deleted JPG files.

SQL Query Used


SELECT 
  ID,
  post_type,
  post_status,
  post_title
FROM wpdatabase_posts
WHERE post_content LIKE '%.jpg%'
  AND post_status IN ('publish','draft','private');

This query returned two posts that still contained JPG references. These references were removed manually.
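
With only two posts, manual editing was practical. For a larger result set, the same fix could be scripted. The sketch below is one possible approach, not the method used here: it finds upload paths ending in .jpg or .jpeg (the SQL pattern '%.jpg%' does not match .jpeg) and swaps the extension. It assumes every WebP file kept the same path and base name as the JPG it replaced, which the migration described here does not guarantee. The sample markup is made up.

```python
import re

# Matches upload paths ending in .jpg or .jpeg (case-insensitive).
# The /wp-content/uploads/ prefix keeps external image links untouched.
JPG_REF = re.compile(
    r'(/wp-content/uploads/[^\s"\'<>)]+?)\.jpe?g\b',
    re.IGNORECASE,
)

def find_jpg_refs(content: str) -> list[str]:
    """Return every upload path in `content` that still ends in .jpg/.jpeg."""
    return [m.group(0) for m in JPG_REF.finditer(content)]

def rewrite_to_webp(content: str) -> str:
    """Swap the extension to .webp, assuming a same-named WebP file exists."""
    return JPG_REF.sub(r"\1.webp", content)

# Made-up post_content fragment for illustration.
sample = '<img src="/wp-content/uploads/2023/05/board.jpg" alt="PCB">'
print(find_jpg_refs(sample))
print(rewrite_to_webp(sample))
```

The rewritten content would still need to be saved back through WordPress or an UPDATE statement, after a backup.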


Cleaning Post Metadata

The next step was checking the postmeta table.


SELECT 
  post_id,
  meta_key,
  meta_value
FROM wpdatabase_postmeta
WHERE meta_value LIKE '%.jpg%';

Result:

161 rows found

These rows represented leftover metadata such as:

  • featured image associations
  • gallery metadata
  • builder image references
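
Before removing anything, it can help to see which meta keys the matching rows belong to, since some keys hold a single file path while others hold a whole settings structure. A minimal sketch, assuming the SELECT result has been exported as (post_id, meta_key, meta_value) tuples; the rows below are invented for illustration and are not the 161 rows from this site.

```python
from collections import Counter

def summarize_meta_keys(rows):
    """Count matching postmeta rows per meta_key, most frequent first."""
    return Counter(meta_key for _post_id, meta_key, _value in rows).most_common()

# Hypothetical rows shaped like the SELECT output above.
rows = [
    (101, "_wp_attached_file", "2023/05/board.jpg"),
    (102, "_wp_attached_file", "2023/05/panel.jpg"),
    (205, "_elementor_data",
     '[{"settings":{"image":{"url":"/wp-content/uploads/2023/05/board.jpg"}}}]'),
]

for key, count in summarize_meta_keys(rows):
    print(f"{key}: {count}")
```

Keys that store only a path are usually safe to drop with the file; keys that store layout or settings data deserve a closer look first.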

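Cleanup Query

A database backup is advisable before running this statement: it deletes each matching row in full, so a row whose meta_value merely mentions a .jpg file (for example, serialized page-builder settings) loses all of its data, not just the image reference.


DELETE FROM wpdatabase_postmeta
WHERE meta_value LIKE '%.jpg%';

All 161 rows were removed.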
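
An alternative to deleting rows is rewriting the reference in place, but a plain search-and-replace is risky here. WordPress stores arrays in PHP's serialized format, where every string carries a byte-length prefix, and changing .jpg to .webp makes the string one byte longer. The sketch below illustrates the problem and a common regex-based workaround. It is an assumption-laden illustration, not what was done on this site, and it breaks on values that themselves contain the sequence `";`. The sample value is made up.

```python
import re

# Matches one serialized PHP string token: s:<byte length>:"<value>";
STRING_TOKEN = re.compile(r's:(\d+):"(.*?)";', re.DOTALL)

def replace_in_serialized(value: str, old: str, new: str) -> str:
    """Replace `old` with `new` inside serialized strings and fix each length."""
    def fix(match):
        text = match.group(2).replace(old, new)
        # PHP counts bytes, not characters, so measure the UTF-8 encoding.
        return 's:%d:"%s";' % (len(text.encode("utf-8")), text)
    return STRING_TOKEN.sub(fix, value)

# Made-up meta_value: a one-element PHP array holding an image path.
before = 'a:1:{s:3:"img";s:13:"photo/cat.jpg";}'
print(replace_in_serialized(before, ".jpg", ".webp"))
```

Without the length fix, PHP would fail to unserialize the value and the setting would appear empty.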


Cleaning Theme and Plugin Options


SELECT 
  option_name,
  option_value
FROM wpdatabase_options
WHERE option_value LIKE '%.jpg%';

Result:

9 rows found

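These were cached theme or plugin values. Because the statement below deletes each matching option in full, it is only safe when every matching row is disposable, as cached values are.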

Cleanup Query


DELETE FROM wpdatabase_options
WHERE option_value LIKE '%.jpg%';


5. Cleaning Orphan Media Attachments

WordPress can still contain attachment records even after the files themselves are deleted.


SELECT 
  ID,
  post_title,
  guid
FROM wpdatabase_posts
WHERE post_type = 'attachment'
  AND guid LIKE '%.jpg%';

One orphan attachment was found.
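
The query above matches by file extension. A stricter definition of "orphan" is an attachment whose file no longer exists on disk. A minimal sketch of that check, assuming the attachment IDs and their paths relative to the uploads folder (WordPress keeps these in the `_wp_attached_file` meta key) have been exported to a list; the IDs, paths, and function name are hypothetical.

```python
from pathlib import Path

def find_orphans(attachments, uploads_dir):
    """Return IDs of attachments whose file is missing under uploads_dir.

    `attachments` is an iterable of (attachment_id, relative_path) pairs.
    """
    root = Path(uploads_dir)
    return [att_id for att_id, rel_path in attachments
            if not (root / rel_path).is_file()]

if __name__ == "__main__":
    import tempfile
    # Build a throwaway uploads folder containing only the WebP file.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "2023").mkdir()
        (Path(tmp) / "2023" / "board.webp").write_bytes(b"")
        records = [(11, "2023/board.webp"), (12, "2023/board.jpg")]  # made-up IDs
        print(find_orphans(records, tmp))
```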

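Removal

Deleting the row directly bypasses WordPress's own cleanup routine, wp_delete_attachment(), so any postmeta rows still attached to that ID have to be removed separately.


DELETE FROM wpdatabase_posts
WHERE post_type = 'attachment'
  AND guid LIKE '%.jpg%';


6. Fixing Missing Error Pages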

The server configuration expected custom error pages that did not exist.

File not found /403.shtml
File not found /404.shtml

Two minimal files were created.

403.shtml


403 Forbidden


404.shtml


404 Not Found


After creating these files, the log warnings stopped.
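
For illustration, the two files could be generated as follows. This is a sketch only: the HTML skeleton is an assumption about what a minimal page might contain, and only the file names and the two status texts come from the walkthrough. `docroot` stands for the site's public_html directory.

```python
from pathlib import Path

# File name -> text shown on the page.
PAGES = {
    "403.shtml": "403 Forbidden",
    "404.shtml": "404 Not Found",
}

def build_page(title: str) -> str:
    """Return a minimal HTML document that shows `title` as its heading."""
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f'<head><meta charset="utf-8"><title>{title}</title></head>\n'
        f"<body><h1>{title}</h1></body>\n"
        "</html>\n"
    )

def write_error_pages(docroot: str) -> list[Path]:
    """Create both files under `docroot` and return the paths written."""
    written = []
    for name, title in PAGES.items():
        path = Path(docroot) / name
        path.write_text(build_page(title), encoding="utf-8")
        written.append(path)
    return written
```

Calling `write_error_pages("/path/to/public_html")` would overwrite any existing files with those names, so it is worth checking the directory first.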


7. Understanding the Remaining 404 Requests

Even after cleanup, logs continued to show requests returning 404.

Examples included:

  • /wp-content/uploads/…
  • /products/…
  • /detail/…
  • /xmlrpc.php
  • /.well-known/…

These requests originate from:

  • search engine crawlers
  • cached search results
  • external links
  • automated scanning bots

Returning 404 responses is correct behavior. Over time, search engines update their index and stop requesting removed URLs.
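
To judge whether the remaining 404s are expected noise or a sign of leftover references, it helps to group them by the first path segment. A minimal sketch, assuming log lines in the short "METHOD path status action" shape shown in the excerpts; real access logs usually carry more fields and would need a different pattern. The paths below are invented.

```python
import re
from collections import Counter

# Matches lines shaped like "GET /xmlrpc.php 404 Allowed".
LINE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})\b")

def tally_404s(lines):
    """Count 404 responses per first path segment (e.g. '/products')."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line.strip())
        if m and m.group("status") == "404":
            first = m.group("path").lstrip("/").split("/", 1)[0]
            counts["/" + first] += 1
    return counts

# Made-up entries in the same format as the log excerpts.
log = [
    "GET /wp-content/uploads/2023/05/board.jpg 404 Allowed",
    "GET /products/old-item/ 404 Allowed",
    "GET /products/another-item/ 404 Allowed",
    "GET /xmlrpc.php 404 Allowed",
    "GET /shop/ 200 Allowed",
]
for prefix, n in tally_404s(log).most_common():
    print(prefix, n)
```

A steady decline in the /wp-content/uploads count over several weeks would be consistent with crawlers dropping the removed images.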


8. Verifying Sitemap Integrity

The sitemap was checked manually:

https://mypcb.in/sitemap_index.xml

The sitemap loaded correctly and listed 238 URLs.

This confirmed that the site structure remained intact.
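
The manual check can also be reproduced with a short script. The sketch below parses a sitemap index and a child sitemap that have already been downloaded as text; fetching is left out so the example runs offline. The example.com URLs and the XML samples are placeholders, not this site's sitemap.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def child_sitemaps(index_xml: str) -> list[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.iterfind(f"{SM}sitemap/{SM}loc")]

def page_urls(urlset_xml: str) -> list[str]:
    """Return the <loc> of every <url> entry in a single sitemap."""
    root = ET.fromstring(urlset_xml)
    return [loc.text.strip() for loc in root.iterfind(f"{SM}url/{SM}loc")]

# Placeholder documents for illustration.
index_xml = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/post-sitemap.xml</loc></sitemap>
  <sitemap><loc>https://example.com/product-sitemap.xml</loc></sitemap>
</sitemapindex>"""
urlset_xml = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/products/sample/</loc></url>
</urlset>"""

print(len(child_sitemaps(index_xml)), "child sitemaps")
print(len(page_urls(urlset_xml)), "URLs")
```

Summing `len(page_urls(...))` over every child sitemap gives the total URL count to compare against the expected figure.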


9. Understanding the “Images” Column in the Sitemap

The sitemap interface displayed a column labeled Images with many entries showing 1.

This does not mean the sitemap exposes image URLs.

Yoast SEO, the plugin behind this sitemap view, counts image associations based on metadata such as featured images. It does not verify whether the file exists.

The XML sitemap itself did not contain:

  • .jpg URLs
  • /wp-content/uploads paths
  • <image:image> blocks

This confirms that no image URLs are exposed to search engines through the sitemap.
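
The three conditions listed above can be checked against the raw XML, not the rendered sitemap table. A minimal sketch, assuming a child sitemap has been saved as text; the function name and sample documents are hypothetical, and the namespace is the one Google defines for image sitemap entries.

```python
import xml.etree.ElementTree as ET

# Namespace of <image:image> entries in image sitemaps.
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

def image_findings(sitemap_xml: str) -> dict:
    """Report image-related content found in one sitemap document."""
    root = ET.fromstring(sitemap_xml)
    image_blocks = root.findall(f".//{{{IMAGE_NS}}}image")
    # Collect the non-empty text of every element (URLs live in <loc> tags).
    texts = [el.text.strip() for el in root.iter() if el.text and el.text.strip()]
    return {
        "image_blocks": len(image_blocks),
        "jpg_urls": [t for t in texts if t.lower().endswith((".jpg", ".jpeg"))],
        "upload_paths": [t for t in texts if "/wp-content/uploads" in t],
    }

# Placeholder documents: one without image entries, one with.
clean = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/products/sample/</loc></url>
</urlset>"""
with_image = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/products/sample/</loc>
    <image:image><image:loc>https://example.com/wp-content/uploads/2023/05/board.jpg</image:loc></image:image>
  </url>
</urlset>"""

print(image_findings(clean))
print(image_findings(with_image))
```

A sitemap that passes has zero image blocks and two empty lists; anything else points to an image reference that is still being published.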


10. Final State After Cleanup

Database

  • posts table: clean
  • postmeta table: clean
  • options table: clean
  • attachment records (stored in the posts table): clean

Server Configuration

  • 403.shtml present
  • 404.shtml present

Media

  • all JPG images replaced with WebP
  • smaller file sizes
  • faster-loading pages

Sitemap

  • loads correctly
  • contains only valid URLs
  • no image URLs exposed


11. Key Takeaways

  1. When replacing image formats, database references must be cleaned.
  2. Orphan media attachments should be removed.
  3. Custom error pages must exist if configured.
  4. 404 responses after content removal are normal.
  5. Search engines gradually update their index.

After cleanup and WebP migration, the website operates with cleaner logs, faster loading images, and improved overall performance.