Google Introduces Two Innovative Web Scrapers

Google has unveiled two new web crawlers designed specifically for fetching images and videos for research and development purposes.

Google recently introduced two new crawlers designed specifically for gathering image and video content for "research and development" purposes. Although the documentation does not say so explicitly, blocking these new crawlers is unlikely to affect a website's search rankings.

The data collected by these crawlers is not intended for AI training. For that purpose, Google provides a separate robots.txt product token, Google-Extended, which publishers can use to control whether their content helps improve Google's generative AI products.

GoogleOther Crawlers

Google introduced its GoogleOther crawler in April 2023 and has now added two new versions of it. The original GoogleOther crawler was initially used by Google product teams for research and development purposes, specifically for one-off crawls. This gives us an idea of how the new GoogleOther variants will be used.

GoogleOther is a generic crawler that different product teams can use to fetch publicly available content from websites. It can be used for tasks like one-time crawls for internal research and development.

Two GoogleOther Variants

There are two new GoogleOther crawlers:

GoogleOther-Image

GoogleOther-Video

The new variants are designed for crawling binary data, meaning data that is not stored as text. Text files, such as HTML, plain ASCII, or Unicode files, contain characters that can be displayed and read in a text editor. Binary files, such as images, audio, and videos, cannot be meaningfully displayed in a text viewer app.
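The text/binary distinction above can be approximated in code. The Python function below (`looks_binary`, a hypothetical helper, not anything Google has described using) treats a byte sample as binary if it contains NUL bytes or is not valid UTF-8. It is only a rough heuristic: text saved in other encodings, such as UTF-16, would also be flagged as binary.

```python
import codecs


def looks_binary(data: bytes, sample_size: int = 8192) -> bool:
    """Rough heuristic: True if the sample looks like binary data (for example
    an image or video), False if it looks like UTF-8 text (for example HTML)."""
    sample = data[:sample_size]
    # NUL bytes are common in binary formats and rare in text files.
    if b"\x00" in sample:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the sample edge.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


if __name__ == "__main__":
    html = b"<html><body><p>Hello</p></body></html>"
    png_header = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
    print(looks_binary(html))        # False
    print(looks_binary(png_header))  # True
```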

The new Google variants are specifically tailored for image and video content. Google provides user agent tokens for both new crawlers, which can be used in a robots.txt file to block them.

1. GoogleOther-Image

User agent tokens:

GoogleOther-Image

GoogleOther

Full user agent string:

GoogleOther-Image/1.0

2. GoogleOther-Video

User agent tokens:

GoogleOther-Video

GoogleOther

Full user agent string:

GoogleOther-Video/1.0
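As an illustration of how these tokens work in robots.txt, the Python sketch below checks two hypothetical robots.txt files with the standard library's `urllib.robotparser`. File A names each new variant by its own token; file B has a single group for the shared GoogleOther token. The file contents, the `can_crawl` helper, and the example.com URL are assumptions for illustration. Because each variant also lists GoogleOther as a token, our reading is that a GoogleOther group applies to the variants when no more specific group exists, which is what file B shows. Python's parser is not Google's: it uses the first group whose token appears in the crawler name, while Google documents picking the most specific matching group, so the two can disagree on more complex files.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt A: block only the two new variants by their own tokens.
BLOCK_VARIANTS = """\
User-agent: GoogleOther-Image
Disallow: /

User-agent: GoogleOther-Video
Disallow: /
"""

# Hypothetical robots.txt B: one group for the shared GoogleOther token.
BLOCK_ALL_GOOGLEOTHER = """\
User-agent: GoogleOther
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/media/photo.jpg"
    agents = ("GoogleOther-Image", "GoogleOther-Video", "GoogleOther", "Googlebot")
    for label, robots_txt in (("A", BLOCK_VARIANTS), ("B", BLOCK_ALL_GOOGLEOTHER)):
        for agent in agents:
            print(label, agent, can_crawl(robots_txt, agent, url))
```

With file A, only GoogleOther-Image and GoogleOther-Video are blocked. With file B, all three GoogleOther crawlers are blocked, while Googlebot is still allowed.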

Updated GoogleOther User Agent Strings

Google has also updated the user agent strings for the regular GoogleOther crawler. You can still block this crawler with the same user agent token as before (GoogleOther). A user agent string is the identifier a client sends to a server in the User-Agent header of each request, describing the client and the technology it is built on. In this case, the strings identify Chrome, and the version number is updated periodically to match the version of Chrome being used (W.X.Y.Z is a placeholder for the Chrome version number).

The full GoogleOther user agent strings are:

Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; GoogleOther)

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) Chrome/W.X.Y.Z Safari/537.36
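As a sketch of how these strings could be recognized in practice, the Python function below extracts which GoogleOther crawler a user agent string names and, where present, its Chrome version. The function name and regular expressions are our own assumptions, based only on the strings listed in this article, not a format Google guarantees. Since W.X.Y.Z is a placeholder, the demo uses a made-up version number.

```python
import re
from typing import Optional, Tuple

# Our own patterns, derived only from the GoogleOther user agent strings listed above.
_CRAWLER = re.compile(r"\bGoogleOther(?:-(Image|Video))?\b")
_CHROME = re.compile(r"\bChrome/(\d+(?:\.\d+)*)")


def parse_googleother_ua(user_agent: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (crawler_name, chrome_version) for a GoogleOther user agent
    string, or None if the string does not mention GoogleOther."""
    crawler = _CRAWLER.search(user_agent)
    if crawler is None:
        return None
    variant = crawler.group(1)
    name = f"GoogleOther-{variant}" if variant else "GoogleOther"
    chrome = _CHROME.search(user_agent)
    version = chrome.group(1) if chrome else None
    return name, version


if __name__ == "__main__":
    # Hypothetical version number in place of the W.X.Y.Z placeholder.
    desktop = (
        "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GoogleOther) "
        "Chrome/120.0.6099.129 Safari/537.36"
    )
    print(parse_googleother_ua(desktop))                  # ('GoogleOther', '120.0.6099.129')
    print(parse_googleother_ua("GoogleOther-Image/1.0"))  # ('GoogleOther-Image', None)
```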

GoogleOther Family Of Bots

These new bots may occasionally appear in your server logs. Knowing their user agent tokens and strings can help you recognize them as Google crawlers. It will also benefit publishers who want to prevent their images and videos from being crawled for research and development purposes.
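A user agent string can be spoofed, so a match in your logs only suggests that a request came from Google. Google's documentation describes confirming a crawler with a reverse DNS lookup on the requesting IP address, a check that the hostname belongs to googlebot.com, google.com, or googleusercontent.com, and a forward DNS lookup that must resolve back to the same IP. The Python sketch below follows that procedure with the standard `socket` and `ipaddress` modules. The function names are hypothetical, the domain list reflects our reading of Google's general verification guidance (check the current documentation), and the resolvers are parameters so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Tuple

# Domains from Google's general crawler-verification guidance (our reading;
# check the current documentation before relying on this list).
GOOGLE_CRAWLER_DOMAINS: Tuple[str, ...] = ("googlebot.com", "google.com", "googleusercontent.com")


def _reverse_dns(ip: str) -> str:
    """Return the hostname for ip (raises OSError if there is none)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_dns(hostname: str) -> Iterable[str]:
    """Return every IPv4/IPv6 address that hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_google_crawler(
    ip: str,
    reverse_dns: Callable[[str], str] = _reverse_dns,
    forward_dns: Callable[[str], Iterable[str]] = _forward_dns,
) -> bool:
    """Reverse DNS, domain check, then forward DNS back to the same IP."""
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IP address
    try:
        hostname = reverse_dns(ip).rstrip(".").lower()
    except OSError:
        return False  # no reverse DNS record
    in_google_domain = any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in GOOGLE_CRAWLER_DOMAINS
    )
    if not in_google_domain:
        return False
    try:
        resolved = {ipaddress.ip_address(addr) for addr in forward_dns(hostname)}
    except (OSError, ValueError):
        return False
    # The hostname must resolve back to the IP that made the request.
    return address in resolved
```

Calling `is_verified_google_crawler(ip)` with only an address from your server logs performs real DNS lookups; the injectable resolvers exist only to make the logic testable offline.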

Check out the latest Google crawler documentation for GoogleOther-Image and GoogleOther-Video.

Featured Image by Shutterstock/ColorMaker

Editor's P/S:

The introduction of the GoogleOther-Image and GoogleOther-Video crawlers is a notable development in the world of SEO and web analytics. These crawlers are designed specifically to collect image and video content, and they are separate from Googlebot, the crawler Google uses for Search. While blocking these new crawlers is unlikely to affect a website's ranking, it can prevent Google from accessing and using image and video content for research and development purposes. Publishers who are concerned about the use of their images and videos for such purposes may consider blocking these crawlers.

Google's regular GoogleOther crawler has also received updates to its user agent strings. These changes are primarily related to the version of Chrome being used by the crawler. Publishers can still block this crawler using the GoogleOther user agent token. The article provides a comprehensive overview of the new Google crawlers and their potential implications for website owners and publishers. It is important for website owners to stay informed about the different types of crawlers that access their websites and to make informed decisions about how to handle them.