This is my process for creating content clusters, topical maps—whatever you want to call them. Ultimately, this is how I plan content.
Check out more of my work at CashCowLabs.io
A few months back, I found myself wrestling with the complexities of SEO, particularly with the concept of topical authority. As search engines evolved, it became clear that organizing content around coherent topics was becoming crucial. I needed a reliable system for creating topical maps or clustering keywords—a pressing need in my day-to-day work.
I pondered over various approaches, diving deep into algorithms and existing tools. But nothing seemed to fit just right. Then, a thought struck me: "The answer is always on the first page of Google." I can't recall whether it was Kyle Roof or Matt Diggity who said that, but it resonated with me.
I realized that Google itself was already grouping keywords for us. If a single page ranks for multiple keywords, then those keywords are related in Google's eyes. So, why not use Google's own SERPs to inform our keyword clustering?
With this hypothesis, I set out to build a tool that could use Google's understanding of keyword relationships. The idea was simple:
- Gather a comprehensive list of keywords around a primary topic (using standard tools like Ahrefs or Semrush).
- Scrape the SERPs for each keyword to find the top-ranking URLs.
- Group keywords by overlapping URLs, effectively letting Google show us which keywords belong together.
I started by casting a wide net. Let's say the primary topic is coffee. I used keyword research tools (Ahrefs, in my case) to pull in as many related keywords as possible—digging into subtopics like brewing methods, bean types, roasting techniques, and more.
To keep things manageable, I applied a few filters:
- Monthly Search Volume > 0
- Language: English
- Long-tail keywords
I wasn't too picky at this stage. The goal was to collect a broad dataset to work with.
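To give a feel for this filtering step, here's a rough sketch using pandas. The export column names (`Keyword`, `Volume`) and the three-word long-tail cutoff are my own assumptions; adjust them to whatever your keyword tool actually produces.

```python
import pandas as pd

# Load a raw keyword export (column names assumed; adjust to your tool's CSV).
df = pd.read_csv("coffee_keywords_export.csv")

# Keep only keywords with at least some monthly search volume.
df = df[df["Volume"] > 0]

# A crude long-tail filter: keep phrases of three or more words.
df = df[df["Keyword"].str.split().str.len() >= 3]

# Normalize to the two columns the rest of the pipeline expects.
keywords = df.rename(columns={"Keyword": "keyword", "Volume": "volume"})[["keyword", "volume"]]
keywords.to_csv("keywords.csv", index=False)
```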
Next came the scraping part. I needed to retrieve the top search results for each keyword. While the official route is to use an API like the Google Custom Search API, it can get pricey and comes with limitations.
Instead, I opted for Serper.dev's API, which offers a more affordable solution—around $1 per 1,000 keywords, compared to other tools that charge upwards of $5 for the same amount. (You also get 2,500 free queries and multi-batch processing, which was a nice bonus.)
Note: Scraping Google's search results without permission is against their TOS.
![image](https://private-user-images.githubusercontent.com/73607864/371813521-7ba7653b-b92b-458e-87cd-af582832bff8.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4OTYyMDEsIm5iZiI6MTczODg5NTkwMSwicGF0aCI6Ii83MzYwNzg2NC8zNzE4MTM1MjEtN2JhNzY1M2ItYjkyYi00NThlLTg3Y2QtYWY1ODI4MzJiZmY4LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDAyMzgyMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTljYjNhYWI0YTQ0ZmEzOTM5ZmJiNjY2YzZjNmZiN2M2ZjhjYTljMGEwZWEwODZlYmUxYTY2NzNiN2U2ZmJhNDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.qbkFQYYUoG4oTLl0lXFkZnz42IpHjZ68f75sbV_kD54)
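To make this step concrete, here's a minimal sketch of a scraping call against Serper.dev's `/search` endpoint. It keeps only the top organic URLs for each keyword; batching, retries, and rate limiting are left out, and the API key is a placeholder.

```python
import requests

SERPER_URL = "https://google.serper.dev/search"
API_KEY = "your-serper-api-key"  # placeholder; use your own Serper.dev key

def top_urls(keyword, num_results=10):
    """Return the top organic result URLs for one keyword via Serper.dev."""
    response = requests.post(
        SERPER_URL,
        headers={"X-API-KEY": API_KEY, "Content-Type": "application/json"},
        json={"q": keyword},
    )
    response.raise_for_status()
    organic = response.json().get("organic", [])
    return [result["link"] for result in organic[:num_results]]

# keyword -> list of top-ranking URLs, for every keyword in the filtered list
serp_data = {kw: top_urls(kw) for kw in ["pour over coffee ratio", "best coffee beans"]}
```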
With the SERP data in hand, I began grouping keywords based on overlapping URLs. If two keywords had several common top-ranking URLs, it indicated that Google considered them closely related.
I utilized an agglomerative clustering algorithm for this. It's a hierarchical method that starts by treating each keyword as its own cluster and then merges them based on similarity—in this case, the number of overlapping URLs.
![image](https://private-user-images.githubusercontent.com/73607864/371813672-355509dd-9004-4ebb-b690-83f6c0a67c92.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg4OTYyMDEsIm5iZiI6MTczODg5NTkwMSwicGF0aCI6Ii83MzYwNzg2NC8zNzE4MTM2NzItMzU1NTA5ZGQtOTAwNC00ZWJiLWI2OTAtODNmNmMwYTY3YzkyLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDAyMzgyMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTc4NWMyZmNmN2Q1MzgxY2Q0YTVkODc5ZWM4NTU5NjhkZTEyN2JjOGM0OWEzYWMzNmRlNDIyYmY0MDdjZThmMDcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.YDEL6IyCQ_KIkOWvkO_i81Vu_IwA9_YHsDErlZhvXWE)
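To illustrate the overlap idea, here's a simplified sketch using NetworkX: it connects two keywords whenever their top results share at least `min_overlap` URLs and treats connected components as clusters. It's a stand-in for the full agglomerative merge, not the tool's exact implementation, and the sample SERP data is made up.

```python
from itertools import combinations
import networkx as nx

# Example SERP data from the scraping step: keyword -> top-ranking URLs.
serp_data = {
    "pour over coffee ratio": ["https://a.com", "https://b.com", "https://c.com"],
    "coffee to water ratio":  ["https://a.com", "https://b.com", "https://d.com"],
    "best espresso machine":  ["https://x.com", "https://y.com", "https://z.com"],
}

def cluster_keywords(serp_data, min_overlap=2):
    """Group keywords whose top-ranking URLs overlap by at least `min_overlap`."""
    graph = nx.Graph()
    graph.add_nodes_from(serp_data)

    # Connect any two keywords that share enough top URLs.
    for kw_a, kw_b in combinations(serp_data, 2):
        overlap = len(set(serp_data[kw_a]) & set(serp_data[kw_b]))
        if overlap >= min_overlap:
            graph.add_edge(kw_a, kw_b, weight=overlap)

    # Each connected component becomes one keyword cluster.
    return [sorted(component) for component in nx.connected_components(graph)]

print(cluster_keywords(serp_data, min_overlap=2))
# e.g. [['coffee to water ratio', 'pour over coffee ratio'], ['best espresso machine']]
```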
To enrich the clusters further, I incorporated intent classification using Sentence Transformers. This step involved analyzing the titles of the top-ranking pages to determine whether the user's intent was informational, commercial, or navigational.
While this added depth to the clusters, it also increased processing time. The Sentence Transformer model is powerful but resource-intensive. I found it worth the wait, though, as understanding intent is crucial for crafting content that truly meets user needs.
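The tool itself leans on Sentence Transformers for this, but as one concrete way to classify intent from page titles, here's a zero-shot sketch with the `facebook/bart-large-mnli` model listed in the library rundown below. Treat it as an illustration, not the exact implementation.

```python
from transformers import pipeline

# Zero-shot classifier; the model download takes a while on first run.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

INTENT_LABELS = ["informational", "commercial", "navigational"]

def classify_intent(title):
    """Return the most likely search intent for a page title."""
    result = classifier(title, candidate_labels=INTENT_LABELS)
    return result["labels"][0]  # labels come back sorted by score

print(classify_intent("How to Brew Pour Over Coffee: A Step-by-Step Guide"))
# -> most likely "informational"
```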
One of the biggest hurdles was balancing the thoroughness of the clustering with the time and resources it required. Running intent classification on large datasets can be time-consuming.
To mitigate this, I:
- Optimized the code for better performance.
- Allowed users to adjust parameters like the minimum number of overlapping URLs required for clustering.
- Made the intent classification optional, so users could choose based on their priorities.
Scraping data always comes with ethical considerations. I was cautious to use APIs that respect the terms of service and privacy policies. While tools exist to scrape data directly from SERPs, I do not recommend or endorse unauthorized scraping.
While I won't dive deep into the code here, I used a combination of powerful Python libraries:
- Streamlit for the user interface, making the tool interactive and accessible.
- SQLite for efficient data storage and retrieval.
- NetworkX for building and analyzing graphs based on URL overlaps.
- Hugging Face Transformers for intent classification with models like `facebook/bart-large-mnli`.
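To make the storage side concrete, here's a rough sketch of how the scraped data could be laid out in SQLite. The table and column names are my own illustration, not necessarily the schema the tool actually uses.

```python
import sqlite3

conn = sqlite3.connect("clusters.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS keywords (
    id      INTEGER PRIMARY KEY,
    keyword TEXT UNIQUE NOT NULL,
    volume  INTEGER
);

-- One row per (keyword, SERP position) from the scraping step.
CREATE TABLE IF NOT EXISTS serp_results (
    keyword_id INTEGER REFERENCES keywords(id),
    position   INTEGER,
    url        TEXT,
    title      TEXT
);
""")
conn.commit()
conn.close()
```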
I've made the tool available on GitHub for anyone interested.
To run it, you'll need:

- Python 3.7 or higher
- An API key from Serper.dev

Here's how to use it:

- Clone the Repository

  ```bash
  git clone https://github.com/yourusername/seo-keyword-clustering-tool.git
  cd seo-keyword-clustering-tool
  ```

- Install the Dependencies and Launch the App

  ```bash
  pip install -r requirements.txt
  streamlit run app.py
  ```

- Create or Select a Project
  - Launch the app and either select an existing project or create a new one.

- Upload Keywords
  - Upload a CSV file containing your keywords and their search volumes.
  - Make sure the file has `keyword` and `volume` columns (an example file is shown after this list).

- Set Parameters and Scrape SERPs
  - Enter your Serper.dev API key.
  - Set the minimum number of overlapping URLs for clustering.
  - Click "Save Data and Start Scraping" to begin.

- Clustering
  - Navigate to the Clustering page.
  - Adjust the overlap threshold if needed.
  - Start the clustering process.
  - Optionally, download the clusters as a CSV file.

- Intent Classification
  - On the Results page, start the intent classification.
  - Be patient; this step can take some time due to the Sentence Transformer model.

- Explore and Export Results
  - View your clusters along with their dominant intent.
  - Download the final results as a CSV file.
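For reference, a keyword CSV in the expected format looks something like this (made-up keywords and volumes):

```csv
keyword,volume
how to brew pour over coffee,1300
pour over coffee ratio,880
best coffee beans for cold brew,590
```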
Creating this tool was both challenging and rewarding. It allowed me to deepen my understanding of SEO and data analysis. More importantly, it gave me a practical solution that saves time and enhances the quality of my team's content planning.
At CashCowLabs.io, my goal is to drive rapid SEO growth for clients. This tool helps streamline the content planning process for my team and create more effective content clusters for my clients.
By leveraging Google's own data and sophisticated algorithms, we can stay ahead in the ever-evolving SEO landscape.
Feel free to reach out if you have any projects, questions, or insights. Happy clustering, bois!