Data scraping used to be seen as a complex task that required technical expertise. Many people, including myself, found the idea of extracting data from webpages using code or scripts to be daunting.
However, data scraping plays a crucial role in various SEO tasks such as auditing, competitor analysis, and evaluating website and data structure.
Google sheets provides easy-to-use solutions, one of which is the IMPORTXML function. This function enables users to extract data from webpages with minimal parameters. It opens up data scraping to more people, including those with no programming background.
The real game-changer happened when generative AI was incorporated into the mix, taking this function to a whole new level.
In this guide, we will demonstrate how you can utilize Google Sheets and AI, specifically ChatGPT, to conduct web scraping tasks without requiring extensive coding knowledge.
The Tools: AI And Chatbots
We are now all familiar with AI, ChatGPT, and similar chatbots.
Many of us rely on tools like ChatGPT to create our own code, scripts, and programs even if we have little to no programming experience.
By giving specific prompts and collaborating with the chatbot, we are able to develop tools that we once thought were too complex for us to handle.
But most importantly, these tools are revolutionizing how we tackle our daily tasks.
For instance, if we inquire ChatGPT about the IMPORTXML function and its application in Google Sheets for extracting the title of an HTML webpage, it provides a precise answer. Within seconds, we receive the code needed to implement the formula in Google Sheets.
But to be honest, that was a very basic and simple task that we could have easily completed without ChatGPT.
The Task
So, how does it work if we need to extract data that is not as standard as a page title or description?
For instance, how do we extract the following data from the PPC front page of Search Engine Journal?
Sure, ChatGPT can assist with listing all featured articles, their authors, the link URLs, and the article descriptions for the columns listed on https://www.searchenginejournal.com/category/paid-media/pay-per-click/.
Using ChatGPT Effectively
When working with ChatGPT, it may require a few tries to craft prompts that are clear and specific enough for the chatbot to grasp the task at hand and deliver accurate responses.
In many cases, it felt like the AI was under pressure to return quick results despite their accuracy.
But let me explain.
The task was pretty straightforward - I had to look through the page and make a list of all the featured articles. This meant noting down the names of the authors, the URLs of the articles, and a short description for each of the 30 articles that were on the page. Once I had all the information, I put it into a table and then saved it as a CSV file. Easy peasy!
ChatGPT initially provided only seven article samples with titles and URLs. After adjusting the prompt, it successfully displayed and exported all 30 articles and their corresponding links.
It was a positive outcome. To finish the task, we simply had to include the authors' names and article descriptions.
However, the bot encountered a stumbling block as it struggled to accurately describe each article, even though we had given it examples of the specific page elements to locate and duplicate.
Despite our repeated attempts to guide ChatGPT by providing clear instructions, it continued to disregard our input and offer its own versions of the article descriptions.
ChatGPT even failed when we tried with a different approach and downloaded and uploaded a copy of the page HTML.
ChatGPT extract
Screenshot from ChatGPT, February 2024
This time, it was able to provide accurate data for seven articles but couldn’t go past that. The issue reported:
Extracting data from this page poses a challenge due to its structure and content, making it difficult to do so in one session. The page is extensive and complex, and extracting all 30 articles in its current format is not practical.
ChatGPT extracting from 30 articles
Screenshot from ChatGPT, February 2024
ChatGPT + Google Sheets
So, going back to IMPORTXML and Google Sheets.
This time, getting ChatGPT to provide the formulas for each field was like a breeze.
ChatGPT extracting instructions
Screenshot from ChatGPT, February 2024
Here are some of the formulas, as suggested by the chatbot, that you can easily try yourself in Google Sheets to extract:
Title
=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a")
Author Name
=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[1]/a")
URL Link
=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/h2/a/@href")
Description
=IMPORTXML("https://www.searchenginejournal.com/category/paid-media/pay-per-click/", "//*[@id='archives-wrapper']/article/div/div[2]/p[2]")
In no time, we were able to extract the data into the spreadsheet.
Google Sheets
Screenshot from Google Sheets, February 2024
Additionally, by using simply built nested formulas, we can quickly pull the data from multiple pages at the same time.
In this example, I managed to gather information such as the title, author, URL link, and description for every article on the first 10 pages of the PPC section.
As a result, I was able to scrape a total of 300 articles in under a minute!
Google Sheets extract results
Screenshot from Google Sheets, February 2024
Comparing The Two
So, how do ChatGPT vs. ChatGPT + Google Sheets IMPORTXML compare?
In my experience, I struggled to find a simple and fast method to scrape the data I needed using ChatGPT. This doesn't mean it's impossible, as there might be various ways to achieve this, but I personally didn't come across any.
What ended up working for me was using a mix of different tools, which proved to be very effective for the specific purpose I had in mind.
ChatGPT was really helpful for creating the IMPORTXML formulas that I needed for Google Sheets. The formulas took care of the rest.
Another great thing about using ChatGPT with Google Sheets is that you can simply use the free 3.5 version of ChatGPT to build your IMPORTXML formulas. You don't need version 4 to scan the page and extract the data.
Key Takeaway
This highlights a critical aspect of how AI has transformed how we think and work.
The key to success is not just relying on one tool like AI, Google Sheets, or any software by itself. It is actually the combination of various tools and skills that leads to the best results.
By integrating different tools and skills, we can create workflows that are both efficient and effective. This ultimately helps us boost our productivity and achieve better outcomes.
More resources:
Try These Tools & Methods For Exporting Google Search Results To Excel
SEO For Beginners: An Introduction To SEO Basics
PPC Trends 2024
Featured Image: Visual Generation/Shutterstock
Editor's P/S:
The article discusses the use of generative AI like ChatGPT to enhance data scraping tasks using Google Sheets' IMPORTXML function. While ChatGPT initially struggled to provide accurate descriptions of extracted articles, it excelled at generating formulas for Google Sheets, enabling efficient data extraction. By combining both tools, it became possible to extract large amounts of data quickly and accurately, with ChatGPT handling formula creation and Google Sheets performing the actual extraction. This exemplifies the value of combining different tools and skills for optimal results. competitor analyses, and website evaluations.
Moreover, the article emphasizes the importance of combining tools and skills to maximize productivity and achieve optimal results. It encourages readers to explore the potential of AI-assisted web scraping and embrace the versatility of combining different technologies for efficient workflows. This approach enables users to overcome limitations and explore innovative solutions tailored to their specific needs.