Google Crawler Documentation Updated with Additional IP Addresses

Google's crawler documentation now includes an additional set of IP address ranges for user-triggered fetchers. Here's what changed and why it matters for publishers who manage crawler access to their sites.

Google Updates IP Addresses for Bots

Google recently updated its Googlebot and crawler documentation to add a new range of IP addresses for bots that are triggered by users of Google products. The update is significant for publishers who rely on whitelisting Google-controlled IP addresses to manage their website traffic, because the new list helps them block scrapers that run on Google's cloud services and other crawlers that are not directly controlled by Google.

Google says that the list contains IP ranges that have long been in use, so they’re not new IP address ranges.
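If a publisher wants to consume these lists programmatically, the sketch below shows one way to test whether a given IP address falls inside the published ranges. It assumes the file follows the same schema as Google's other ipranges files (a prefixes array whose entries carry ipv4Prefix or ipv6Prefix); the function names are illustrative, not part of any Google API.

```python
import ipaddress
import json
import urllib.request

# The Google-controlled fetcher list discussed in this article.
FETCHERS_URL = (
    "https://developers.google.com/static/search/apis/ipranges/"
    "user-triggered-fetchers-google.json"
)

def load_ranges(url):
    """Download an ipranges JSON file and parse its CIDR prefixes.

    Assumes the schema used by Google's other ipranges files:
    a "prefixes" array whose entries carry "ipv4Prefix" or "ipv6Prefix".
    """
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    networks = []
    for entry in data.get("prefixes", []):
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix, strict=False))
    return networks

def ip_in_ranges(ip, networks):
    """True if the IP falls inside any of the published ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

networks = load_ranges(FETCHERS_URL)
print(ip_in_ranges("203.0.113.9", networks))  # documentation-only IP, expect False
```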

There are two kinds of IP address ranges:

IP ranges that are initiated by users but controlled by Google, which resolve to a google.com hostname.

These belong to tools like Google Site Verifier and presumably the Rich Results Test tool.

IP ranges that are initiated by users but not controlled by Google, which resolve to a gae.googleusercontent.com hostname. These could be apps hosted on Google Cloud or Apps Script code accessed from Google Sheets.
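Because the two categories resolve to different hostnames, a reverse DNS lookup on the requesting IP is a quick way to tell them apart in server logs. A minimal sketch using Python's standard library; the suffix strings mirror the hostnames described above, and the labels are illustrative:

```python
import socket

def classify_fetcher_ip(ip):
    """Classify a user-triggered fetch by the hostname its IP resolves to."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse (PTR) lookup
    except OSError:
        return "no reverse DNS entry"
    if hostname == "google.com" or hostname.endswith(".google.com"):
        return "user-triggered, controlled by Google"
    if hostname.endswith(".gae.googleusercontent.com"):
        return "user-triggered, not controlled by Google (GCP / Apps Script)"
    return "not a user-triggered Google fetcher"

print(classify_fetcher_ip("8.8.8.8"))  # resolves to dns.google, so the last label
```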

The lists for each category have been updated.

Previously, the list for these IP addresses was special-crawlers.json (which resolves to gae.googleusercontent.com). The "special crawlers" list includes crawlers that are not managed by Google.

IPs found in the user-triggered-fetchers.json object are linked to gae.googleusercontent.com hostnames. These IPs are utilized when a website hosted on Google Cloud (GCP) needs to fetch external RSS feeds at the user's request.

The new list that corresponds to Google-controlled crawlers is:

user-triggered-fetchers-google.json

When a user triggers a fetch, certain tools and product functions come into play; Google Site Verifier, which acts on a user's request, is one example. Because the fetch was initiated by a user, these fetchers do not adhere to robots.txt rules.

Fetchers that are under Google's control come from IPs listed in the user-triggered-fetchers-google.json object and resolve to a google.com hostname.

The list of IPs for the Google Cloud and app crawlers that Google doesn't control can be found here:

https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers.json

The list of IPs for user-triggered fetchers that Google does control is here:

https://developers.google.com/static/search/apis/ipranges/user-triggered-fetchers-google.json
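Taken together, the two files let a publisher separate traffic it may want to allow from traffic it may want to review or block. The sketch below loads both lists and labels an IP accordingly; it relies on the same schema assumption as the earlier sketch, and the policy labels are purely illustrative:

```python
import ipaddress
import json
import urllib.request

BASE = "https://developers.google.com/static/search/apis/ipranges/"

def load_networks(filename):
    """Fetch an ipranges file and return its CIDR blocks (schema assumed
    to match Google's other ipranges files)."""
    with urllib.request.urlopen(BASE + filename) as resp:
        data = json.load(resp)
    return [
        ipaddress.ip_network(p.get("ipv4Prefix") or p.get("ipv6Prefix"), strict=False)
        for p in data.get("prefixes", [])
        if p.get("ipv4Prefix") or p.get("ipv6Prefix")
    ]

google_controlled = load_networks("user-triggered-fetchers-google.json")
user_controlled = load_networks("user-triggered-fetchers.json")

def label(ip):
    """Hypothetical policy: trust Google-controlled fetchers, flag the rest."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in google_controlled):
        return "allow: Google-controlled fetcher"
    if any(addr in net for net in user_controlled):
        return "review: on Google infrastructure, but not controlled by Google"
    return "unknown: not in either list"
```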

New Section of Content

There is a new section of content that explains what the new list is about.

Google Fetchers

Fetchers controlled by Google come from IPs listed in the user-triggered-fetchers-google.json object and resolve to a google.com hostname. IPs in the user-triggered-fetchers.json object, by contrast, resolve to gae.googleusercontent.com hostnames and are used when, for example, a website hosted on Google Cloud (GCP) needs to fetch external RSS feeds at the user's request. The hostnames for these fetchers can take the format ---.gae.googleusercontent.com or google-proxy----.google.com, as specified in the user-triggered-fetchers.json and user-triggered-fetchers-google.json files.
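One caveat: a PTR record is set by whoever controls the IP address, so a reverse lookup alone can be spoofed. Google's verification documentation (linked below) recommends confirming the result in both directions: resolve the IP to a hostname, check the domain, then resolve that hostname forward and make sure the original IP is among the answers. A minimal sketch of that forward-confirmed check; the suffix tuple and function name are illustrative:

```python
import ipaddress
import socket

GOOGLE_SUFFIXES = (".google.com", ".gae.googleusercontent.com")

def is_verified_google_fetcher(ip):
    """Forward-confirmed reverse DNS: PTR lookup, hostname suffix check,
    then a forward lookup that must lead back to the original IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        infos = socket.getaddrinfo(hostname, None)  # forward lookup
    except OSError:
        return False
    original = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(info[4][0]) == original for info in infos)
```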

Google’s changelog explained the changes like this:

“Exporting an additional range of Google fetcher IP addresses

An extra list of IP addresses for fetchers controlled by Google products has been included. This list, named user-triggered-fetchers-google.json, consists of IP ranges that have been in use for a significant period of time. The ranges were added because it is now technically possible to export them.”

Read the updated documentation:

Verifying Googlebot and other Google crawlers

Read the old documentation:

Archive.org – Verifying Googlebot and other Google crawlers

Featured Image by Shutterstock/JHVEPhoto

Editor's P/S:

Google's recent update to its published IP address ranges for bots is significant for publishers. By adding a new range of IP addresses for fetchers triggered by users of Google products, Google has made it easier for publishers to block scrapers and other unwanted crawlers. This is especially important for publishers who rely on whitelisting Google-controlled IP addresses to manage their website traffic.

The new list of IP addresses is divided into two categories: IP ranges that are initiated by users but controlled by Google, and IP ranges that are not under Google's control. The first category includes tools like Google Site Verifier and the Rich Results Test tool. The second category includes apps hosted on Google Cloud or Apps Script code accessed from Google Sheets.