searchBot is a command-line tool that saves searches on e-commerce sites, online thrift shops, or any other webpage offering product searches, and then emails you every time a new product matches one of your searches.
Some e-commerce sites, like eBay, provide the concept of "saved searches": you save a search and receive alerts every time a new product matches it. Many others don't. With searchBot you can use one of the implemented webpage scrapers, register your search, and receive emails whenever there are new items matching it.
To configure searchBot, edit the settings.js file in the root of the project.
You can configure a lot of things, but the only one you really need to worry about is the nodemailer configuration: set it up with a valid email account that searchBot will use to send you emails. See https://github.com/andris9/Nodemailer for documentation on how to configure nodemailer for your email account.
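As a rough sketch, assuming settings.js exports a plain object whose nodemailer section holds the transport options (the exact keys searchBot expects may differ, so start from the settings.js that ships with the project), a Gmail setup could look like this:

```js
// Hypothetical shape: check the settings.js shipped with the project
// for the keys searchBot actually reads.
module.exports = {
    nodemailer: {
        service: 'Gmail',                    // any service supported by nodemailer
        auth: {
            user: 'your.account@gmail.com',  // the account searchBot will send from
            pass: 'your-password-or-app-password'
        }
    }
};
```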
You can register new searches by simply editing the searches.js config file you will find in the root of the project:
var searches = [{
    name: 'left-handed guitars',
    where: [
        { page: 'milanuncios', searchUrl: 'http://www.milanuncios.com/instrumentos-musicales/guitarra-zurdo.htm?desde=400&hasta=2500&dias=1' },
        { page: 'wallapop', searchUrl: 'http://es.wallapop.com/search?kws=guitarra+zurdo&lat=41.387245&lng=2.191056' }
    ],
    notifyTo: 'somemail@mail.com'
}];
You just need to put an object (a POJO) inside the searches array with these keys:
- name: A name for this search (it will appear in the subject of the emails you receive about new ads).
- where: An array of objects declaring the target pages, each with these keys:
  - page: The name of one of the implemented scrapers. Be sure to write it exactly as the prefix of the scraper file inside the scrapers directory (./lib/scrapers).
  - searchUrl: The URL of a valid, working search on the chosen page.
- notifyTo: The email address where searchBot will send notifications about new ads.
Of course you can add as many searches as you want to the searches array to receive email notifications for multiple concepts, for example:
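This is purely illustrative (the second search and its placeholder URL are made up):

```js
var searches = [{
    name: 'left-handed guitars',
    where: [ /* ...the entries shown above... */ ],
    notifyTo: 'somemail@mail.com'
}, {
    // A hypothetical second search; replace the placeholder with a real search URL.
    name: 'vintage amplifiers',
    where: [
        { page: 'wallapop', searchUrl: '<A_VALID_WALLAPOP_SEARCH_URL>' }
    ],
    notifyTo: 'somemail@mail.com'
}];
```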
searchBot requires MongoDB, Node, npm and a job scheduler to run it periodically. Instead of installing all of this on your machine, you can use the searchbot Docker image to install and run it on any platform in a much easier way.
First you will need to install Docker on your system: https://docs.docker.com/installation/
Be sure to check that the Docker daemon is running.
Next, create a settings file and a searches file as described above in "Configure searchBot" and "Register your searches".
Then you can use the "docker run" command to pull the searchbot Docker image and run it in a container:
docker run -d -v <PATH_TO_YOUR_SEARCHES_FILE>/searches.js:/var/searchBot/searches.js -v <PATH_TO_YOUR_SETTINGS_FILE>/settings.js:/var/searchBot/settings.js ferca/searchbot
And voilà! Now you have searchBot installed and running in an isolated container! It's configured to check for new ads every day at 2:30 AM.
Since your searches file and your settings file are mounted inside the Docker container, you can add new searches by simply editing the searches.js file on your local machine; the changes will take effect on the next searchBot run.
With Docker you can see the container logs, open an interactive bash session inside the container, or get a new version of searchBot by downloading the latest image. I encourage you to read the Docker documentation and get familiar with its concepts and tools. It's an awesome tool!
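For example, a few standard Docker commands you might use (replace <CONTAINER_ID> with the ID shown by docker ps):

```
docker ps                              # list running containers and find searchBot's ID
docker logs <CONTAINER_ID>             # see the container logs
docker exec -it <CONTAINER_ID> bash    # open an interactive bash session inside it
docker pull ferca/searchbot            # download the latest searchBot image
```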
Implementing a new scraper is really easy. You just need to create a new scraper class inside the scrapers directory (./lib/scrapers).
Your class should expose a method called "extractAds". This method will receive a string with the HTML code of the configured search and should return an array of ad objects (lib/ad.js).
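As a rough sketch of what such a class could look like (the selectors, the Ad constructor arguments and the use of cheerio are assumptions, so mirror an existing scraper in ./lib/scrapers and lib/ad.js for the real conventions):

```js
// Hypothetical new scraper: ./lib/scrapers/examplepage.js
var cheerio = require('cheerio');
var Ad = require('../ad');

function ExamplepageScraper() {}

// Receives the HTML of a search results page, returns an array of ads.
ExamplepageScraper.prototype.extractAds = function (html) {
    var $ = cheerio.load(html);
    var ads = [];
    $('.result-item').each(function () {        // hypothetical selector
        var $item = $(this);
        ads.push(new Ad(                        // hypothetical Ad fields
            $item.attr('data-id'),
            $item.find('.title').text().trim(),
            $item.find('.price').text().trim(),
            $item.find('a').attr('href')
        ));
    });
    return ads;
};

module.exports = ExamplepageScraper;
```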
And that's all!
You can also create an integration test inside the "integration_test" folder. It will help you while developing your scraper, and afterwards it will make sure your scraper keeps working and let you notice when the scraped page changes.
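A sketch of what such a test might look like, assuming mocha and a saved HTML fixture (adapt it to the conventions used by the existing tests in the folder):

```js
// Hypothetical test: ./integration_test/examplepage_test.js
var assert = require('assert');
var fs = require('fs');
var ExamplepageScraper = require('../lib/scrapers/examplepage');

describe('examplepage scraper', function () {
    it('extracts ads from a saved search results page', function () {
        // A fixture you would save from a real search on the target page.
        var html = fs.readFileSync(__dirname + '/fixtures/examplepage_search.html', 'utf8');
        var ads = new ExamplepageScraper().extractAds(html);
        assert(ads.length > 0);   // at least one ad was extracted
    });
});
```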