If you’ve ever done SEO research or tried to come up with content ideas, you know how useful Google Trends can be for determining what people in your niche are interested in. I was making SEO trend reports at work, and to say it was hell to do manually would be an understatement, so I decided to automate the process and add some cool new features while I’m at it, including a simple algorithm that calculates the actual search volume instead of the limited 0-100 scale on Google Trends. Now let’s get into this.

Enter PyTrends…

I built this program on top of the PyTrends package, which is basically an unofficial Google Trends API. I cover the necessary info in this article, but if you want to dig deeper into some of its other capabilities, you can find more in the docs here. All of the required imports are below.

from pytrends.request import TrendReq
from scipy.stats import linregress
import time
import openpyxl
from openpyxl import load_workbook
from datetime import date
import math
import pandas as pd
from string import ascii_uppercase

First, we need to create the PyTrends object that we will use later to make the API requests. In short, it sets the language and timezone parameters, along with a timeout that helps prevent us from overdoing it with the requests and upsetting Google.

Importing Your Keyword List

Next, I read the Excel file that I’ve added to my directory. It contains the list of keywords in one column called “Terms” and an optional second column called “Volume” containing the actual monthly search volume of each keyword. I got the search volumes from SEMrush, but like I mentioned, you can definitely just leave that column blank; you just won’t get the actual search volume. Side note: if anyone knows a free source for monthly search volume, definitely let me know!
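If you don’t already have a file in that shape, here is one way to generate a starter version. The terms and volumes below are made up, purely for illustration:

#Hypothetical starter file: 'Terms' is required, 'Volume' is optional
pd.DataFrame({
    'Terms': ['data science', 'machine learning'],
    'Volume': [49500, 110000],   #made-up monthly volumes (mine came from SEMrush)
}).to_excel('GoogleTrendTerms.xlsx', index=False)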

#initiate pytrends. docs: https://pypi.org/project/pytrends/
pytrends = TrendReq(hl='en-US', tz=360, timeout=(10, 25))

#Import the excel doc of keywords
terms_df = pd.read_excel('GoogleTrendTerms.xlsx', index_col=None)

#Get the terms, and the monthly search volumes if that column exists
terms = list(terms_df['Terms'])

try:
    volume = list(terms_df['Volume'])
except KeyError:
    #no Volume column: fall back to zeros so the rest of the script still runs
    volume = [0] * len(terms)

Google Trends API Arguments

We now need to create a function that takes a search term as a parameter and scrapes its trend. You have several options as far as settings go (there’s a sample call with non-default settings after the list):

  • cat: the category your term is related to. This is relevant if, for instance, you want the trend for ‘interest’ but only in the financial sense; you can look up the finance code and add it here. 0, as you might imagine, means all categories.
  • timeframe: the date range and interval you want to look at. I built this program around a two-year interval, so you might want to stick with that to avoid errors. However, feel free to play around with this: you can go as far back as 2004, and you can even get minute-by-minute metrics from the past hour.
  • geo: the location of the searches. You can filter by country code, and also by US states and some provinces.
  • gprop: the source of the search. Blank is regular web search, but you can segment it by ‘youtube’, ‘images’, ‘news’, or even ‘froogle’ for shopping.
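For example, here is what a call with every setting changed might look like. The keyword is a placeholder, and I’m trusting the category list (linked in the next code block) that 7 is Finance:

#Hypothetical payload: 'interest rates' on YouTube in Great Britain,
#Finance category, over the past five years
pytrends.build_payload(
    ['interest rates'],    #placeholder keyword
    cat=7,                 #Finance, per the category list
    timeframe='today 5-y', #presets like this work alongside explicit date ranges
    geo='GB',              #country code; 'US-NY' would target a single state
    gprop='youtube'        #'' is web search
)
youtube_trend = pytrends.interest_over_time()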

Function to Scrape Google Trends

#All categories: https://github.com/pat310/google-trends-api/wiki/Google-Trends-Categories

def scrape_google(term):
    #term is a one-element list here; build_payload expects a list of keywords
    pytrends.build_payload(term, cat=0, timeframe='2018-01-01 2020-01-01', geo='US', gprop='')
    trends = pytrends.interest_over_time()
    try:
        trends = trends.drop(columns=['isPartial'])
    except KeyError:
        #no data came back for this term, so there is no isPartial column to drop
        pass
    
    return trends

When comparing multiple trends together on Google, the 0-100 values are relative to each other. PyTrends is similar in that it can get trends for a list of up to 5 keywords. We, however, are more interested in the trend of each individual term here, so this next function loops through our list of terms from the Excel file, runs each one through our scrape_google function above, and appends the trend data to a pandas DataFrame called trends.
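For contrast, here is what the multi-keyword mode looks like; ‘coffee’ and ‘tea’ are placeholder terms:

#Hypothetical comparison payload: up to 5 terms share a single 0-100 scale,
#so a 50 for 'tea' is relative to the peak of whichever term is bigger
pytrends.build_payload(['coffee', 'tea'], cat=0, timeframe='2018-01-01 2020-01-01', geo='US', gprop='')
relative = pytrends.interest_over_time()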

Random thoughts on API speeds

The speed of this API is about average; it probably takes 60 seconds to scrape the trends for 100 terms. That’s slower than some APIs but much, much faster than others, like the YouTube API, which is hauntingly slow. Like several-hours-to-scrape-one-channel slow… I digress.
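If Google does get upset and starts refusing requests, a pause-and-retry wrapper is an easy fix. This is just a sketch around the scrape_google function above; the retry count and delay are guesses on my part, not values from the PyTrends docs:

#Hypothetical wrapper: retry a scrape with a growing pause if a request fails
def scrape_with_backoff(term, retries=3, delay=5):
    for attempt in range(retries):
        try:
            return scrape_google(term)
        except Exception:
            #wait a little longer after each failed attempt
            time.sleep(delay * (attempt + 1))
    return pd.DataFrame()   #give up and return an empty frame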

At the end of the code block we call the function we just coded up and voilà! We have a DataFrame of search trends.

#Loops through the list of keywords and creates pandas df out of the scrape_google function output

def get_trends(terms):
    for i in range(0,len(terms)):
        if i == 0:
            trends = scrape_google(terms[i:i+1])
        else:
            trends = pd.concat([trends, scrape_google(terms[i:i+1])], axis=1)
    return trends
    
    
#Scrape the list of keywords and build the trends DataFrame
trends = get_trends(terms)

Before we get to some calculations, let’s add a week number column that will help with this.

#Create a week number column and add it to the DataFrame
week_number = list(range(1, len(trends) + 1))

trends.insert(0, 'week number', week_number)

#make the indexes strings
trends.index = trends.index.astype(str) 

Calculations…

As I mentioned earlier, I wrote this program to deal mostly with two-year periods. This is because I was interested in getting the percent change from year one to year two, along with the average change over that period. The function in this next block of code calculates the average search volume for each year and returns the percent change between them.

#Calc the percent change from year one to year two
def get_year_avg(col):
    #dividing by 12 approximates a monthly average; the divisor cancels out of
    #the percent change anyway, so any constant would give the same result
    year_avgs = [sum(list(trends[col][0:52]))/12, sum(list(trends[col][52:105]))/12]
    if year_avgs[0] == 0:
        #avoid dividing by zero; treat growth from nothing as 100%
        change = 100
    else:
        change = ((year_avgs[1] - year_avgs[0])/year_avgs[0])*100
    return change

Now, let’s create a DataFrame of various metrics that indicate the change in search volume from year one to year two. These include percent change, which is the average slope over that time (think rise over run from high school math), yearly change, which is the output of our function above, and the standard error, which basically tells us how much variance there is, i.e. how reliable these calculations are.

#Create list of search volume metrics to create trend_stats df

week = list(trends['week number'])
percent_change = []
yearly_change = []
standard_error = []

for col in list(trends.columns)[1:]:
    try:
        #one regression per term: slope is index 0, standard error is index 4
        regression = linregress(week, list(trends[col]))
        percent_change.append(regression[0] * 100)
        standard_error.append(regression[4])
        yearly_change.append(get_year_avg(col))
    except Exception:
        percent_change.append(0)
        standard_error.append(0)
        yearly_change.append(0)
    
#Construct trend_stats df
data = {
    "Search Term":list(trends.columns)[1:],
    "Average Change %":percent_change,
    "Yearly Change %": yearly_change,
    "Standard Error":standard_error
}

trend_stats = pd.DataFrame(data) 
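Before moving on, a quick sanity check can be worth it; sorting by average change surfaces the fastest-growing terms. This line is just for inspection and isn’t part of the export:

#Optional peek at the fastest-growing terms before we export anything
print(trend_stats.sort_values('Average Change %', ascending=False).head(10))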

Let’s make sure our data still lines up

Next, we construct a list of search volumes that we will add to our DataFrame of trends. The reason we aren’t just using the list of search volumes from earlier is that there is a minimum volume a term needs before Google Trends will report metrics for it, and if a trend doesn’t exist, the output simply leaves that term out. So, what we are doing here is making sure that the lists of volumes and terms line up.

trends = trends.drop(columns=['week number'])

#Transpose the trends and make a copy. Makes calculations more intuitive
sv_trends = trends.copy().transpose()
trends = trends.transpose()


#get the list of volumes from the terms that had trends (required because if a term doesn't have a trend it is removed)
volumes_col = []
for i in range(0, len(terms)):
    if terms[i] in list(trends.index):
        volumes_col.append(volume[i])
        
        
#add the search volume column to the trends dataframe
try:
    trends.insert(0, "avg monthly volume", volumes_col)
except ValueError:
    #the lengths didn't line up (or the column already exists), so skip the insert
    pass

Actual Weekly Google Search Volume

This next block is my favorite in this entire program because it gives us the actual search volume for each week of the time period, just like the glory days of Google Trends. The math goes something like this:

  • Divide average monthly search volume by 4 to get average weekly search volume.
  • Divide average weekly search volume by 50 to get a multiplier for that particular term.
  • Calculate the real weekly search volume for each week of the period by multiplying the 0-100 index for that week by the multiplier for that term. (Notice how a 50 on the 0-100 scale is actually just the average weekly search volume. There’s a worked example right below.)
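For instance, with made-up numbers:

#Worked example for a hypothetical term that gets 8,000 searches a month
monthly_volume = 8000
weekly_avg = monthly_volume / 4   #2,000 searches in an average week
multiplier = weekly_avg / 50      #40 real searches per index point
print(75 * multiplier)            #a week indexed at 75 is about 3,000 real searches

And here it is applied to the whole DataFrame: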
#calculate the real weekly search volume for each week based on the google trends index and the monthly search volume
for i in range(0,len(sv_trends.index)):
    #use volumes_col (not the original volume list) so the volumes stay aligned
    #with the terms that actually had trends
    sv_trends[i:i+1] = sv_trends[i:i+1].mul((volumes_col[i]/4)/50)

#add an empty 'Trend Lines' column to the df where we will add sparklines
sv_trends.insert(loc=0, column='Trend Lines', value=['' for i in range(sv_trends.shape[0])])

Formatting Excel in Python

Python and Excel don’t always get along when you have more than 26 columns. Excel doesn’t accept a simple column “28” but rather demands the language of column “AB”. We solve this language barrier below with a function that calculates the letter combination that corresponds with a column number. This was actually kind of fun and reminded me of something you’d find on LeetCode.

#mini algorithm that calculates the letter combination that corresponds with a column number (eg 28 == 'AB')
def get_col(col_number):
    cols = ascii_uppercase
    if col_number <= 26:
        return cols[col_number-1]
    else:
        quotient = col_number//26
        remainder = col_number%26
        if remainder != 0:
            return cols[quotient-1] + cols[remainder-1]
        else:
            #exact multiples of 26 end in 'Z' (eg 52 == 'AZ'); cols[-1] wraps to 'Z'
            return cols[quotient-2] + cols[remainder-1]
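Worth knowing: XlsxWriter ships a helper that does the same conversion, so you could skip the LeetCode fun entirely. Note that it takes a zero-based column index, while get_col above is one-based:

#XlsxWriter's built-in equivalent (zero-indexed, unlike get_col above)
from xlsxwriter.utility import xl_col_to_name
print(xl_col_to_name(27))   #'AB' (zero-based 27 is the same column as our 28)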

Exporting to Excel in Python

In this last block we export our three DataFrames to Excel and add extra features and formatting like sparklines, conditional formatting, and adjusted column/row widths. There is a ton you can do with XlsxWriter, so feel free to check out more in their docs.

#exports the three dataframes into excel and formats them. For more on this: https://xlsxwriter.readthedocs.io/

today = date.today()

writer = pd.ExcelWriter("GoogleTrends({}).xlsx".format(today), engine='xlsxwriter')
trends.to_excel(writer, sheet_name='Trends', index=True)
sv_trends.to_excel(writer, sheet_name='Volume Trends', index=True)
trend_stats.to_excel(writer, sheet_name='Trend Stats', index=False)


wb = writer.book
stats_sheet = writer.sheets['Trend Stats']
trends_sheet = writer.sheets['Trends']
vol_sheet = writer.sheets['Volume Trends']

#trends sheet formatting
trends_sheet.set_column(2, len(trends.columns), 10)
trends_sheet.set_column(0, 1, 25)
trends_sheet.set_column(1, 2, 18)

# add conditional formatting
for row in range(2,len(trends)+2):
    trends_sheet.conditional_format('C{}:{}{}'.format(row, get_col(len(trends.columns)+1), row), {'type': '3_color_scale'})


#volume sheet formatting    
vol_sheet.set_column(2, len(sv_trends.columns), 10)
vol_sheet.set_column(0, 1, 25)

#add sparklines that show the trend over the period
for row in range(0,len(sv_trends)):
    vol_sheet.add_sparkline('B{}'.format(row+2), {'range': 'Trends!{}{}:{}{}'.format(get_col(3), row+2, get_col(len(trends.columns)+1), row+2)})
    #set_row is zero-indexed, so row+1 is the same row as the A1-style cell B(row+2)
    vol_sheet.set_row(row+1, 25)


#start at row 2: row 1 is the header row, and 'B0' isn't a valid cell
for row in range(2,len(sv_trends)+2):
    vol_sheet.conditional_format('B{}:{}{}'.format(row, get_col(len(trends.columns)+1), row), {'type': '3_color_scale'})

    
#stat sheet formatting
stats_sheet.set_column('A:A', 30)
stats_sheet.set_column('E:E', 100)
stats_sheet.set_column(1, 4, 14)

for row in range(0,len(trend_stats)):
    stats_sheet.add_sparkline('E{}'.format(row+2), {'range': 'Trends!{}{}:{}{}'.format(get_col(3), row+2, get_col(len(trends.columns)+1), row+2)})
    #same zero-indexing note as above: row+1 lines up with cell E(row+2)
    stats_sheet.set_row(row+1, 25)
    
writer.save()   #on newer pandas versions, this is writer.close() instead

You can get the complete code on my GitHub.