SmartCrawl – Sitemaps

3.8 Sitemaps

Automatically generate detailed sitemaps to tell search engines what content you want them to crawl and index.

Google XML Sitemaps vs SmartCrawl Sitemaps

Curious to see how SmartCrawl’s Sitemap feature stacks up against the competition? Check out our blog post on Google XML Sitemaps vs SmartCrawl Sitemaps.

3.8.1 General Sitemap

Here you can see basic information and any issues SmartCrawl found during the sitemap scan. You can also turn off the automated sitemap feature, run a new crawl, or view your sitemap by clicking its link. Your sitemap is stored in public_html/wp-content/uploads or, on a Multisite, in public_html/wp-content/uploads/sites/number for each subsite, where number is the subsite’s ID.

If you would prefer to customize the native WordPress core sitemap, you can do so by clicking the Switch button. Note that switching to the WordPress core sitemap disables the SmartCrawl sitemap. Keep in mind that the SmartCrawl sitemap comes with several advantages:

  • It is cached for performance.
  • It includes images from post content.
  • It can be styled for easier reading.
  • It can be updated automatically.
  • It can auto-notify search engines.

switch to wp core button

You will be asked to confirm. Click Switch to change to the WordPress core sitemap, or click Cancel to exit without switching.

confirm switch to wp core

You can switch back to SmartCrawl sitemap at any point by clicking the Switch button again.

switch to smartcrawl sitemap button

Below, you can choose what your sitemap should include (everything is enabled by default). You can also enter extra URLs in the Inclusions field to manually add any that are missing from your sitemap.

include groups to sitemap or individual urls

Pro Tip

If you disable all content types here and don’t manually include any URLs, your sitemap will effectively be empty and will simply return a 404 page not found error. See Additional Troubleshooting Options (Sitemap) for details.

You can also explicitly exclude URLs or post IDs from your Sitemap in the Exclusions section.

Exclude URLs or post IDs from SmartCrawl sitemap

Don’t forget to click Save Settings once you are done 🙂

3.8.2 News Sitemap

If your site is publishing newsworthy content, you may want to enable and generate a Google News Sitemap to ensure that your news articles and posts published in the last 48 hours show up in Google News.

Why use a News Sitemap?

For more insight into News Sitemaps and how they can help boost your content, see how to Make Headlines on Google News with SmartCrawl’s Free Google News Sitemap on the blog.

To enable the Google News Sitemap, click Enable.

Enable Google news sitemap

Enter your Google News publication name, matching the name as it appears on news.google.com.

Set Google news publication name

Select the Post Types that should be included in your Google News Sitemap. To include specific items or groups, expand the post type first.

Select post types to include in Google news sitemap

After creating your News Sitemap, submit it through Google’s Search Console to tell Google where to find it on your site.

Automatic NewsArticle Schema

Any posts that you add to your News Sitemap that do not already have any specific schema types mapped to them will be automatically switched to the NewsArticle schema type, with the corresponding schema added to the post.

However, if you have already assigned a schema type to certain posts that you include in your News Sitemap, or have excluded certain categories from the News Sitemap, those posts will not be switched to the NewsArticle type.

NOTE

Your News Sitemap will only include content published in the last 48 hours. If no content has been published within the last 48 hours, then any request for your sitemap will return an HTTP 404 error.
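The 48-hour rule in the note above can be sketched roughly as follows. This is an illustration only, not SmartCrawl’s actual code: the function names, the sample posts, and the inclusive cutoff are all assumptions.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(hours=48)  # window stated in the note above


def news_sitemap_entries(posts, now):
    """Return the URLs of posts published within the last 48 hours.

    `posts` is a list of (url, published_datetime) pairs (hypothetical shape).
    """
    cutoff = now - NEWS_WINDOW
    return [url for url, published in posts if cutoff <= published <= now]


def news_sitemap_status(posts, now):
    """Return 200 when at least one post qualifies, otherwise 404."""
    return 200 if news_sitemap_entries(posts, now) else 404


# Hypothetical sample data: one fresh post and one older than 48 hours.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
posts = [
    ("https://example.com/fresh-story/", now - timedelta(hours=5)),
    ("https://example.com/old-story/", now - timedelta(hours=72)),
]
print(news_sitemap_entries(posts, now))
print(news_sitemap_status(posts[1:], now))
```

If only the older post exists, the entry list is empty, which corresponds to the 404 case described in the note.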

3.8.3 Multilingual Sitemaps with WPML

The WPML plugin offers different formatting options for your language URLs. These options are:

  • Different languages in directories
  • The language name as a parameter
  • A different domain per language

When using either the different languages in directories option or the language name as a parameter option, SmartCrawl creates a single sitemap that includes the pages for both your default and secondary languages.

On the other hand, when using the different domain per language option, SmartCrawl creates a sitemap for each language (domain) you have on your site. You can view the secondary languages’ sitemaps by going to languagedomain.com/sitemap.xml or languagedomain.com/?wds_sitemap=1&wds_sitemap_type=index if you are using the Plain permalink structure on your site.
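As a small sketch, the two URL forms mentioned above can be built like this. The helper name and the https:// scheme are assumptions; only the path and query parameters come from this guide.

```python
from urllib.parse import urlencode


def sitemap_index_url(domain, plain_permalinks=False):
    """Build the sitemap index URL for a domain (hypothetical helper).

    Assumes the site is served over https.
    """
    base = f"https://{domain}"
    if plain_permalinks:
        # Query-string form used with the Plain permalink structure.
        query = urlencode({"wds_sitemap": 1, "wds_sitemap_type": "index"})
        return f"{base}/?{query}"
    return f"{base}/sitemap.xml"


print(sitemap_index_url("languagedomain.com"))
print(sitemap_index_url("languagedomain.com", plain_permalinks=True))
```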

3.8.4 Crawler

The Crawler will detect issues and URLs that are not on your sitemap. It will go through your site, starting at the homepage, and follow any and all links it can find, up to a maximum of 500 distinct URLs. Note that this is simply a reporting feature; it does not affect sitemap creation in any way.
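To make the idea concrete, here is a minimal sketch of a link-following crawl with a cap on distinct URLs. The real Crawler is an external service whose implementation is not documented here; the breadth-first order, the function names, and the in-memory link map are all assumptions for illustration.

```python
from collections import deque

MAX_URLS = 500  # limit stated above


def crawl(homepage, links, max_urls=MAX_URLS):
    """Walk the link graph from the homepage, collecting at most max_urls distinct URLs.

    `links` maps each URL to the URLs it links to (a stand-in for fetching pages).
    """
    seen = {homepage}
    queue = deque([homepage])
    while queue:
        url = queue.popleft()
        for target in links.get(url, []):
            if len(seen) >= max_urls:
                return seen
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def missing_from_sitemap(crawled, sitemap_urls):
    """Return crawled URLs that the sitemap does not list."""
    return sorted(crawled - set(sitemap_urls))


# Hypothetical site: one blog post is linked but not listed in the sitemap.
links = {"/": ["/about/", "/blog/"], "/blog/": ["/blog/post-1/", "/"]}
sitemap = ["/", "/about/", "/blog/"]
print(missing_from_sitemap(crawl("/", links), sitemap))
```

Because the sketch only compares the crawl result against the sitemap, it mirrors the point above that crawling is a report and does not change the sitemap itself.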

Click New Crawl in the top right-hand corner to start a new sitemap crawl.

start new crawl

A progress bar is visible during the crawl so you can see how far along it is.


Once the crawl has finished, it will display any issues and URLs that are not on your sitemap. We recommend fixing them to ensure you aren’t penalized by search engines – but if you want to ignore any of the warnings you can.

Url-crawler-missing-from-sitemap

You can Add or Ignore presented URLs here or bulk Ignore/Add them all by pressing the corresponding buttons.

Note that once a scan has completed, whether initiated from SmartCrawl or from the site’s SEO tab in The Hub, you must wait at least 1 hour before running a new scan.

3.8.5 Reporting (Sitemap)

Enable the Run regular URL crawls option to have SmartCrawl automatically crawl your URLs daily, weekly or monthly and send an email report to your inbox. You can add as many email recipients as you need by clicking the Add Recipient button and filling in the popup form for each one.

sitemap-reporting

3.8.6 Settings (Sitemap)

Sitemap Structure

Your sitemap will be automatically split into multiple sitemaps, one per post type. Each one contains up to 2000 URLs by default, but you can change the maximum number of items per sitemap here.

Set the number of links per post-type sitemap

If the total number of URLs of any post type exceeds the number you set here, additional sitemap pages will be created and numbered automatically, following this format:
https://example.com/post-sitemap1.xml
https://example.com/post-sitemap2.xml
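
The arithmetic behind the split can be sketched as below. The helper is hypothetical, and it assumes numbering always starts at 1, which this guide only shows for the multi-page case.

```python
import math


def sitemap_page_urls(site, post_type, total_urls, per_sitemap=2000):
    """List the numbered sitemap pages needed for one post type (hypothetical helper)."""
    pages = math.ceil(total_urls / per_sitemap)
    return [f"{site}/{post_type}-sitemap{n}.xml" for n in range(1, pages + 1)]


# 4500 posts at 2000 URLs per sitemap need three pages.
for url in sitemap_page_urls("https://example.com", "post", 4500):
    print(url)
```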

Include images

Here you can include images within your sitemap. For this to work properly, make sure to add titles and captions that clearly describe your images.

Include image items in SmartCrawl sitemap

Note that plugin memory consumption will increase considerably if you enable this. How much depends on how much image content you have, as well as your server configuration and capabilities.

Auto-Notify Search Engines

Auto Notify does exactly that – auto notifies search engines (specifically Google and Bing) that your sitemap has changed. You can choose between the following two modes:

  • Default – You don’t need to do anything; search engines will automatically be notified when your sitemap changes.
  • Manual – This allows you to manually trigger notifying search engines. Click Notify Search Engines whenever you want them to be notified that your sitemap has changed.

auto-notify search engines

Style Sitemap

Enabling the Style sitemap option will make your sitemap easier to read (for human eyes).

Add stylesheet to SmartCrawl sitemap

For example, this is a sitemap before enabling it. (You can access your sitemap by going to yourdomain.com/sitemap.xml, or yourdomain.com/?wds_sitemap=1&wds_sitemap_type=index if you are using the Plain permalink structure on your site.)

SmartCrawl sitemap without stylesheet

And the same sitemap after:

Automatic Sitemap Updates

You can choose whether you want to automatically update your sitemap when you publish new pages, posts, post types, or taxonomies. The three available modes are:

  • Default – You don’t need to do anything; your sitemap will automatically be updated.
  • Manual – This allows you to manually trigger a sitemap update. Click Update Sitemap whenever you want to update your sitemap after publishing new pages, posts, post types, or taxonomies.
  • Scheduled – This allows you to schedule your sitemap updates. Sitemap updates can be scheduled on an hourly, daily, or weekly basis. For daily updates, you can set the time of day, and for weekly updates, you can set the day of the week on which the update should be triggered.

Troubleshoot Sitemap

This tool automatically detects and resolves common sitemap problems. For issues that must be fixed manually, such as sitemap conflicts, it suggests how to resolve them. Click the Troubleshoot button to open the Troubleshoot Sitemap modal window.

Troubleshoot Sitemap button

Click the Start button.

Troubleshoot sitemap Start button

A progress bar will be visible during the scan for sitemap issues. This process usually takes a few seconds.

When any discovered issues have been fixed automatically, or when no issues were detected, a green checkmark appears at the top of the modal window.

Green checkmark appears when issues are resolved automatically or no issues found

A red cross mark is displayed at the top of the modal window if an issue that requires a manual fix is detected. You will see one of the following issues:

    • Plugin Conflict – This issue indicates that you are using another plugin that generates sitemaps, causing conflicts with SmartCrawl sitemaps. Click the Go To The Plugins Screen button to deactivate the plugin causing the issue.

Troubleshoot sitemap plugin conflict

    • File Conflict – This error is displayed when you have a physical file on your server conflicting with SmartCrawl sitemaps. To fix the issue, delete the file causing the error.

Troubleshoot sitemap file conflict

    • Permalink Problem – This error shows when pretty permalinks don’t work for your sitemaps. The troubleshooter first tries to fetch the sitemap through the pretty permalink. If it does not get valid XML, it then checks the plain version of the sitemap URL. If that works, it assumes that pretty permalinks are not working. To fix this issue, you need to manually add some rewrite rules to your server’s configuration files or exclude the sitemap URL from caching. For instructions, see the Additional Troubleshooting Options (Sitemap) section.

Troubleshoot sitemap permalink problem

    • Incorrect Permalink Settings – This error indicates that you are using a permalink structure that is preventing SmartCrawl sitemaps from working properly. To fix this issue, click the Go To Permalink Settings button and change the permalink structure.

Troubleshoot sitemap incorrect permalink settings

Once you have resolved any of the sitemap issues and conflicts that may occur, click the Check Again button to ensure that the issue is resolved and there are no other pending issues.
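
The Permalink Problem check described in the list above follows a simple two-step logic, sketched here. This is not the troubleshooter’s real code: the function names and return labels are hypothetical, and `fetch` is a stand-in for an HTTP request so the sketch runs without network access.

```python
import xml.etree.ElementTree as ET


def is_valid_xml(body):
    """Return True when the response body parses as XML."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False


def diagnose_permalinks(fetch, site):
    """Try the pretty sitemap URL first, then fall back to the plain one."""
    pretty = f"{site}/sitemap.xml"
    plain = f"{site}/?wds_sitemap=1&wds_sitemap_type=index"
    if is_valid_xml(fetch(pretty)):
        return "ok"
    if is_valid_xml(fetch(plain)):
        # Plain works but pretty does not: pretty permalinks are assumed broken.
        return "permalink problem"
    return "other issue"


# Hypothetical responses: the pretty URL returns an error page, the plain URL returns XML.
site = "https://example.com"
responses = {
    f"{site}/sitemap.xml": "Not Found",
    f"{site}/?wds_sitemap=1&wds_sitemap_type=index": "<sitemapindex></sitemapindex>",
}
print(diagnose_permalinks(lambda url: responses.get(url, ""), site))
```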

3.8.7 Additional Troubleshooting Options (Sitemap)

Sitemap 404

No Content

If your site only has a homepage with no other content, the sitemap.xml will return a 404 error as it is effectively empty.

To resolve that, simply add a / character in the Inclusions box and the sitemap should load as expected.

If your site does have content other than just the homepage, check to ensure that at least one content type is active in the Include section.

NGINX

If your site is running on an NGINX server and your sitemap URLs are returning 404, try adding the following rewrite rules to your nginx.conf file:

rewrite ^/sitemap.xml$ /index.php?wds_sitemap=1&wds_sitemap_type=index last;
rewrite ^/([^/]+?)-sitemap([0-9]+)?.xml$ /index.php?wds_sitemap=1&wds_sitemap_type=$1&wds_sitemap_page=$2 last;

Remember to resave permalinks after making this change.
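
To see what the two rewrite rules above do, the following sketch applies the same patterns in Python. It only imitates the mapping and assumes that NGINX’s regex engine and Python’s `re` module treat these particular patterns the same way; the `rewrite` function is hypothetical.

```python
import re

# Patterns copied from the rewrite rules above.
INDEX_RULE = re.compile(r"^/sitemap.xml$")
TYPE_RULE = re.compile(r"^/([^/]+?)-sitemap([0-9]+)?.xml$")


def rewrite(path):
    """Return the internal URL a sitemap path maps to, or None if no rule applies."""
    if INDEX_RULE.match(path):
        return "/index.php?wds_sitemap=1&wds_sitemap_type=index"
    match = TYPE_RULE.match(path)
    if match:
        page = match.group(2) or ""  # the page number is optional
        return (
            "/index.php?wds_sitemap=1"
            f"&wds_sitemap_type={match.group(1)}&wds_sitemap_page={page}"
        )
    return None


for path in ["/sitemap.xml", "/post-sitemap2.xml", "/about/"]:
    print(path, "->", rewrite(path))
```

The first capture group becomes the sitemap type and the optional second group becomes the page number, so a paginated post sitemap and the index are both routed to index.php.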

Sitemap Cached

If you are using Page Caching in Hummingbird, ensure that the following is present in the URL Strings section of the Page Caching Exclusions:

sitemap[^\/.]*\.xml
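
As a quick way to see which URLs this exclusion pattern covers, the sketch below tests a few sample paths. It assumes the pattern is matched anywhere within the URL; how Hummingbird applies it internally is not documented here, and the helper name and sample paths are hypothetical.

```python
import re

# Pattern copied from the exclusion rule above.
EXCLUSION = re.compile(r"sitemap[^\/.]*\.xml")


def skips_cache(url):
    """Return True when the URL matches the exclusion pattern."""
    return EXCLUSION.search(url) is not None


for url in ["/sitemap.xml", "/post-sitemap1.xml", "/blog/hello-world/"]:
    print(url, skips_cache(url))
```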

Crawler User Agent

If you ever need to allowlist the user agent used by SmartCrawl’s Crawler feature in your robots.txt file or a firewall, it is called: SmartCrawl SEO Audit 1.0

No Meaningful Results

Make sure that your site is not blocked somehow. For example, the robots.txt or a firewall could be blocking SmartCrawl.

As noted above, the Crawler user agent is: SmartCrawl SEO Audit 1.0

Also make sure WPMU DEV IP addresses are allowlisted. You’ll find our IPs at WPMU DEV IP Addresses.

URLs Can’t Be Found

Recrawling a site doesn’t have any effect on the sitemap itself. The crawler is an external service that just checks your sitemap by looking at it from the outside as Google would.

However, if you get a message like “xxx URLs can’t be found”, try recrawling the site as there may have been some connection hiccup the first time around.

Crawler Timeout or Crawl Stuck

This can happen when the external servers responsible for crawling your website are facing issues.

Please contact support so they can try resetting the crawler using our backend tools.

3.8.8 Crawling Your Site in Google Search Console

Getting your site or specific URLs crawled and indexed in Google Search Console is a relatively simple process, but it can take anywhere from a few days to weeks according to Google’s own help documentation.

This chapter aims to make getting that done a little easier for you.

First, get your property created and verified

The first thing you need to do is add your site as a new property if you haven’t already done that.

Log into Google Search Console and, in the Search property box, click the + Add Property option.

Add a property in Google Search Console

In the modal window that pops open, you’ll be prompted to select one of two ways to verify that you own the domain you want to use:

  • Verify the domain using a DNS record
  • Verify the domain using alternate methods like a meta tag

As you are using SmartCrawl, let’s do it the easy way and use the second option.

Enter the full URL to your site’s homepage, including the https:// part, in the URL prefix box. Then click Continue.

Add a domain in Google Search Console

In the next modal window that pops open, click the HTML tag option to expand it, and click the Copy button to copy the meta tag you need to verify ownership of your site.

Copy meta tag from Google Search Console

Now, log into your site admin in a new tab, and go to SmartCrawl > Settings. Scroll down to the Search Engines section, and paste the meta tag you just copied into the Google Verification field there. Then click Save Settings.

Paste meta tag from Google Search Console
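
If you want to confirm that the tag is present in your homepage’s HTML before clicking Verify, a check along these lines can help. It is a sketch only: the class and function names and the token value are hypothetical, and the sample HTML stands in for your real homepage source.

```python
from html.parser import HTMLParser


class VerificationTagFinder(HTMLParser):
    """Collect the content of any google-site-verification meta tags."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "google-site-verification":
            self.tokens.append(attributes.get("content"))


def find_verification_tokens(html):
    """Return the verification tokens found in the given HTML."""
    finder = VerificationTagFinder()
    finder.feed(html)
    return finder.tokens


# Hypothetical homepage source with a placeholder token.
sample = (
    "<html><head>"
    '<meta name="google-site-verification" content="HYPOTHETICAL_TOKEN" />'
    "</head><body></body></html>"
)
print(find_verification_tokens(sample))
```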

Next, back in Google Search Console, click the Verify button where you copied the HTML tag to verify your site.

Verify meta tag in Google Search Console

You should see a nice Ownership Verified success message. Click the Go To Property link and proceed to the next step.

Property verified in Google Search Console

Add your sitemap URL to get your site crawled

Click on the Sitemaps option in the left navigation.

Add a sitemap in Google Search Console

In the Add a new sitemap field, enter the slug of your sitemap’s index page, and click the Submit button.

  • If you are using SmartCrawl’s sitemap, enter sitemap.xml
  • If you have switched to the WordPress core sitemap, enter wp-sitemap.xml

Add a sitemap URL in Google Search Console

Once the sitemap URL has been processed, you can click on it to view the individual sitemaps included in it.

New sitemap processed in Google Search Console

Request indexing for new or changed pages

Occasionally, there may be URLs of pages that are in your sitemap, but that are not showing up in search results. Or perhaps content has been updated, but those changes aren’t reflected in search results.

Fortunately, there’s a handy tool in Google Search Console to get that fixed.

Enter any site URL in the Inspect any URL box at the top of the screen to get a detailed report of how Google sees that URL.

If the page has already been indexed, you’ll see a green checkmark and confirmation that the URL is on Google. Expand any sections in the URL Inspection report to get as much detail as you need about how Google sees your page.

URL confirmed indexed by Google

If the URL has not been indexed, click on the Coverage section to gain insight into why it’s not.

URL confirmed not indexed by Google

If the data of an existing URL appears outdated, perhaps after you’ve edited your content, or if you need to index a missing URL, you can do that by clicking the Request Indexing link.

Request indexing of a page in Google Search Console

You’ll then see a confirmation that Indexing has been requested. Note once again that this process can take some days or weeks to complete.

Indexing of a page requested in Google Search Console

For more information about how these things work in Google Search Console, see Google’s help article here.