SEO Crawler

A bare-bones SEO crawler built with Python and Scrapy

Using Scrapy, get the main SEO elements of a website for exploratory analysis. You supply a list of known URLs, and the crawler visits each one and returns structured results.

The main elements include:

url: the actual URL
slug: the path part of the URL (the part after the domain)
directories: the path split on slashes, giving the different folders (directories) in each URL
title: the <title> tag
h1, h2, h3, h4: header tags
description: the meta description
link_urls: disabled by default; it needs site-specific configuration so that you collect only the links you care about (for example, links to certain sites)
link_text: the anchor text of each link; depends on link_urls being configured
link_count: the number of links on the page that match your criteria
load_time: page load time in seconds
status_code: the HTTP response status code of the page (200, 301, 404, etc.)
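To make the slug and directories fields concrete, the sketch below derives both from a URL using only the standard library. It assumes the slug is the URL path and that empty segments are dropped; the crawler's own code may compute them differently, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def slug_and_directories(url):
    """Return the URL path (slug) and its non-empty path segments."""
    slug = urlsplit(url).path
    # Splitting on "/" leaves empty strings at the edges; drop them.
    directories = [part for part in slug.split("/") if part]
    return slug, directories


slug, directories = slug_and_directories("https://example.com/blog/2020/seo-tips/")
print(slug)         # /blog/2020/seo-tips/
print(directories)  # ['blog', '2020', 'seo-tips']
```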

Many other elements are worth adding to the list, but they differ from site to site. Some examples:

publishing date
product price
content category
tags of an article
whether or not a certain keyword is in a certain location
type of content (inferred from a URL directory, or from certain content on page)
etc.
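Two of these site-specific elements can be sketched without any crawling at all: inferring the type of content from a URL directory, and checking whether a keyword appears in a given location such as the title. The directory-to-type mapping and both function names below are hypothetical and would need to match the site being analyzed.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from a top-level directory to a content type.
CONTENT_TYPES = {"blog": "article", "products": "product", "category": "listing"}


def content_type_from_url(url):
    """Infer the content type from the first directory of the URL path."""
    parts = [part for part in urlsplit(url).path.split("/") if part]
    if not parts:
        return "home"
    return CONTENT_TYPES.get(parts[0], "other")


def keyword_in(text, keyword):
    """Case-insensitive check for a keyword in a piece of text, e.g. a title."""
    return keyword.lower() in (text or "").lower()


print(content_type_from_url("https://example.com/products/red-shoes"))  # product
print(keyword_in("Red Shoes for Running", "shoes"))                     # True
```

Helpers like these could be called from the spider's parse method to add extra fields to each record.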

butchewing/seo_crawler

Folders and files

Latest commit

History

Repository files navigation

SEO Crawler

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages