Published 2023-01-04
The search_analytics.py Python script downloads Google Search Console data via the web API. The data are augmented with rows representing the anonymised queries omitted by Google, and secondary search results are identified. The augmented data are stored in a SQLite database file so you can perform offline analysis, e.g. with R. You can also archive data over time: Google only keeps the last 16 months for you.
Note: you are responsible for ensuring that storing data for longer periods is in line with your organisation's data privacy policy.
The Google Search Console Search Analytics API is a web service provided by Google. It gives you programmatic access to the search performance data that the Google search engine gathers for your website.
This data includes daily impressions, clicks and search engine result page (SERP) position statistics. The data are aggregated by various dimensions including country, search device and search type.
The API is available as a web service and can be called from many programming languages. Below we show how to use the Google Search Console API from Python.
From Python we suggest you use the google-api-python-client library, which can be installed with pip / pip3.
The script below provides an example of how to use the Google Search Console API; Google also provides other samples.
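As a rough sketch of what such a call looks like, the Search Analytics query endpoint takes a JSON body of the shape below. The property URL and dates are placeholders, and the real script adds authentication, paging and error handling:

```python
# Sketch of a Search Analytics API request body. Field names come from the
# public API; the dates and dimensions here are illustrative placeholders.
def build_request(start_date, end_date, dimensions=("date", "country", "device")):
    """Return the JSON body for a searchanalytics().query() call."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": list(dimensions),
        "rowLimit": 25000,  # the API caps each response at 25,000 rows
    }

body = build_request("2023-01-01", "2023-01-31")

# With an authorised service object the call would then look like:
# response = service.searchanalytics().query(
#     siteUrl="https://www.example.com/", body=body).execute()
```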
If you are involved in Search Engine optimisation (SEO) you most likely will have looked into the tools provided by Google to analyse the search performance of your site.
Google offers the Google Search Console web app as a way to navigate the metrics they capture for your site. You can query data for specific time periods, by url, country, device and so on.
However, performing more bespoke queries and data mining of your search performance requires downloading your data. That is the process the script performs.
search_analytics.py has the following features:
Making an offline copy of the data provides the following benefits:
search_analytics.py is free to download and runs under Python 3 on Windows, macOS or Linux. It is part of the tool chest on GitHub. You can clone the source git repository using:
Alternatively, you can download a zipped snapshot of the repository. Once you have the files locally on your device, install the dependencies using:
The final stage is to set up OAuth access to the Google Search Console API as follows:
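The details depend on your Google Cloud project, but with the google-auth-oauthlib package the consent flow is roughly the sketch below. The client_secret.json path is an assumption (it is the file you download from the Google Cloud Console), and the read-only webmasters scope is the conventional choice for Search Console data:

```python
# Sketch only: assumes google-auth-oauthlib is installed and that you have
# downloaded an OAuth client_secret.json from the Google Cloud Console.
SCOPES = ["https://www.googleapis.com/auth/webmasters.readonly"]

def get_credentials(client_secret_path="client_secret.json"):
    """Run the local-server OAuth consent flow and return user credentials."""
    from google_auth_oauthlib.flow import InstalledAppFlow
    flow = InstalledAppFlow.from_client_secrets_file(client_secret_path, SCOPES)
    return flow.run_local_server(port=0)
```

The returned credentials can then be passed to `googleapiclient.discovery.build("searchconsole", "v1", credentials=...)` to obtain the service object.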
You can run the script directly from the command line. Later you can automate the process by scheduling the script with Windows Task Scheduler, cron on Linux, or launchd on macOS.
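On Linux, for example, a crontab entry along these lines would refresh the database weekly. The schedule, interpreter path and install location below are all hypothetical; supply the script's required arguments as described later:

```
# Hypothetical crontab entry: run at 02:30 every Monday.
# Adjust paths and add the script's required arguments for your setup.
30 2 * * 1 /usr/bin/python3 /home/user/toolchest/search_analytics.py
```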
The script has the following required arguments:
A typical invocation might be:
This will create a file called search_analytics.sqlite3 containing all search data between the given dates.
The optional --engine argument specifies the SQLAlchemy database engine to use during the download. By default this is sqlite:///./search_analytics.sqlite3. See the SQLAlchemy database URLs documentation for further options you might like to supply.
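For illustration, the URL takes the usual SQLAlchemy dialect://path form. The non-default targets below are hypothetical examples, not tested configurations:

```
sqlite:///./search_analytics.sqlite3          # the default: a SQLite file in the working directory
sqlite:////var/data/search_analytics.sqlite3  # an absolute path on Linux/macOS
postgresql://user:password@localhost/search   # a hypothetical PostgreSQL target
```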
The optional -v parameter enables verbose output from SQLAlchemy and is useful when debugging SQL issues.
The data downloaded by the script are held in the following database tables.
The aggregate table contains verbatim data from Google and will match values you see in the graphs shown by the Google Search Console web app.
Column Name | Description |
---|---|
search_type | image / video / web |
date | The date of the search analytic |
device | DESKTOP / MOBILE / TABLET |
country | The ISO country code of the search analytic |
clicks | The sum of clicks on breakdown rows included in the aggregate analytic |
impressions | The sum of impressions from the breakdown rows included in the aggregate analytic |
ctr | The click through rate of the breakdown rows included in the aggregate analytic |
average_position | The average SERP position of the breakdown rows included in the aggregate analytic |
The breakdown table includes both verbatim data from Google Search Console and additional rows used to ensure the breakdown totals match the values in the aggregate table. In addition each row is augmented to include a flag indicating if it represents a secondary search result or not.
The additional rows can be identified by the text "*OMITTED*" in the url and query columns.
Column Name | Description |
---|---|
search_type | image / video / web |
date | The date of the query |
device | DESKTOP / MOBILE / TABLET |
country | The country code where the query originated |
url | Resultant (canonical) URL presented by Google for the query |
query | The keywords of the query entered by the Google Search user |
secondary_result | true if this query was not the first query to appear on the SERP |
clicks | The number of clicks on this query result |
impressions | The number of impressions for this query result |
ctr | The click through rate of this query result |
average_position | The average SERP position of this query result |
The search_appearance table contains verbatim data from Google and will contain the values you see in the search appearance table shown by the Google Search Console web app.
Column Name | Description |
---|---|
search_type | image / video / web |
date | The date of the search analytic |
appearance | WEBLITE |
clicks | The sum of clicks on queries included in this analytic |
impressions | The sum of impressions of queries included in this analytic |
ctr | The click through rate = clicks / impressions |
average_position | The average SERP position of queries included in this analytic |
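In all three tables the ctr column is simply the ratio of the two count columns. With made-up figures:

```python
# CTR = clicks / impressions (the figures below are invented for illustration).
clicks = 42
impressions = 1000
ctr = clicks / impressions
print(ctr)  # 0.042
```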
Show urls that have been returned as secondary search results:
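In SQLite this can be expressed roughly as follows. The table and column names come from the breakdown schema above; the sample rows are invented so the example is self-contained:

```python
import sqlite3

# Tiny in-memory stand-in for the breakdown table (sample rows invented).
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE breakdown (
    search_type TEXT, date TEXT, device TEXT, country TEXT,
    url TEXT, query TEXT, secondary_result INTEGER,
    clicks INTEGER, impressions INTEGER, ctr REAL, average_position REAL)""")
con.executemany(
    "INSERT INTO breakdown VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [("web", "2023-01-01", "DESKTOP", "gbr",
      "https://example.com/a", "widgets", 0, 3, 50, 0.06, 4.0),
     ("web", "2023-01-01", "DESKTOP", "gbr",
      "https://example.com/b", "widgets", 1, 0, 50, 0.0, 9.0)])

# Distinct urls that have appeared as secondary results.
rows = con.execute(
    "SELECT DISTINCT url FROM breakdown WHERE secondary_result = 1").fetchall()
print(rows)  # [('https://example.com/b',)]
```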
Show top 5 countries by total clicks across all devices, search types and dates:
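A GROUP BY over the breakdown table gives this directly. Again the sample rows are invented; note that the "*OMITTED*" filler rows are deliberately left in such sums so totals match the aggregate table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE breakdown (
    search_type TEXT, date TEXT, device TEXT, country TEXT,
    url TEXT, query TEXT, secondary_result INTEGER,
    clicks INTEGER, impressions INTEGER, ctr REAL, average_position REAL)""")
con.executemany(
    "INSERT INTO breakdown VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [("web", "2023-01-01", "DESKTOP", "gbr", "u", "q", 0, 7, 10, 0.7, 1.0),
     ("web", "2023-01-01", "MOBILE", "usa", "u", "q", 0, 3, 10, 0.3, 2.0),
     ("image", "2023-01-02", "DESKTOP", "gbr", "u", "q", 0, 2, 10, 0.2, 3.0)])

# Top 5 countries by total clicks across all devices, search types and dates.
rows = con.execute("""
    SELECT country, SUM(clicks) AS total_clicks
    FROM breakdown
    GROUP BY country
    ORDER BY total_clicks DESC
    LIMIT 5""").fetchall()
print(rows)  # [('gbr', 9), ('usa', 3)]
```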
Show all keywords, including secondary results, surfaced by Google Search users on a given day, from a desktop device looking for web results:
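A query along these lines does the job. Secondary results are kept (no filter on secondary_result), while the synthetic "*OMITTED*" rows are excluded because they are not keywords a user typed; the date and sample rows are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE breakdown (
    search_type TEXT, date TEXT, device TEXT, country TEXT,
    url TEXT, query TEXT, secondary_result INTEGER,
    clicks INTEGER, impressions INTEGER, ctr REAL, average_position REAL)""")
con.executemany(
    "INSERT INTO breakdown VALUES (?,?,?,?,?,?,?,?,?,?,?)",
    [("web", "2023-01-01", "DESKTOP", "gbr", "u1", "widgets", 0, 1, 5, 0.2, 1.0),
     ("web", "2023-01-01", "DESKTOP", "gbr", "u2", "blue widgets", 1, 0, 5, 0.0, 8.0),
     ("web", "2023-01-01", "DESKTOP", "gbr", "*OMITTED*", "*OMITTED*", 0, 0, 3, 0.0, 20.0),
     ("web", "2023-01-01", "MOBILE", "gbr", "u1", "mobile widgets", 0, 1, 5, 0.2, 1.0)])

# All user-entered keywords for one day, desktop device, web results.
rows = con.execute("""
    SELECT DISTINCT query FROM breakdown
    WHERE date = '2023-01-01'
      AND device = 'DESKTOP'
      AND search_type = 'web'
      AND query <> '*OMITTED*'
    ORDER BY query""").fetchall()
print([q for (q,) in rows])  # ['blue widgets', 'widgets']
```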