winb1g/ism-scores

 
 

Description

This script scrapes the ISM web page to extract quantitative information from the text-based smi (services) and pmi (manufacturing) reports.

⚠️Disclaimer

Read the code carefully before you run it to make sure you understand the implications of doing so. In the case of ismworld, running it may violate its Terms of Service, specifically the clause:

exploit the Service or collect any data incorporated in the Service in any automated manner through the use of bots, metaspiders, crawlers or any other automated means;

This code is intended for educational purposes only, and the creator is not liable for any consequences resulting from its use.

Dependencies

No external programs need to be installed. Python 3.7+ and an internet connection should be enough to install and run the script.

Installing

You can install the necessary Python packages by running

pip install -r requirements.txt

To make sure everything is installed properly, run

python ism.py --version

Running program

Once required packages have been installed, one can safely run the script by typing:

python ism.py -y --pmi march

The command above should generate a text file named march_pmi.txt. Keep in mind that each report is released the following month, so the March report becomes available in mid-April. The ISM web page usually keeps only the two most recent reports, so in the example above, if the March report is available, the February report is available too, but the January one is not.

The output <month>_<report>.csv will contain a summary of the report broken down by industry. Each industry will have a score based on the report analyzed. The more positive (negative) the score, the more bullish (bearish) the industry.
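The "two most recent reports" rule can be sketched in a few lines. This is only an illustration of the month arithmetic described above, not code from ism.py; the function name `stored_report_months` and the `keep` parameter are hypothetical.

```python
from typing import List

MONTHS = [
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
]

def stored_report_months(latest_month: str, keep: int = 2) -> List[str]:
    """Return the report months presumably still on the site, newest first.

    Assumes the site keeps the `keep` most recent reports (the README says
    it is usually two). Wraps around the year, so January is preceded by
    December.
    """
    idx = MONTHS.index(latest_month.lower())
    return [MONTHS[(idx - i) % 12] for i in range(keep)]

print(stored_report_months("march"))    # ['march', 'february']
print(stored_report_months("january"))  # ['january', 'december']
```

Any month not in the returned list (January in the first call) would have to come from the historical reports instead.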

How scores are generated

There are some .csv files in the scoring-tables folder accompanying the main script.

  • servicesindustries.csv: The list of industries that ISM uses in the smi report. Changing this list will break the matching between what the report says about a specific industry and the score assigned to it.

  • manufacturingindustries.csv: The list of industries that ISM uses in the pmi report. Changing this list will break the matching between what the report says about a specific industry and the score assigned to it.

  • servicestags.csv: Defines the categories in the smi report. To find which category a piece of text refers to, the script needs an HTML tag, defined here, and an offset. The multiplier column is the score assigned to that category: a positive number means positive sentiment and a negative number means negative sentiment. Reports usually express sentiment in sentences such as "backlog of orders have increased for the manufacturing industry", so we need to know whether an increase in backlog of orders is positive (a +1.5 multiplier, for example) or negative (a -1.5 multiplier). All the categories for a given industry are averaged, weighted by this multiplier column.

  • manufacturingtags.csv: Same as servicestags.csv but for the pmi report.
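As a rough illustration of how multipliers could turn into an industry score, here is a minimal sketch. It assumes the score is the plain mean of `direction * multiplier` over the categories in which the industry is mentioned, where direction is +1 for a reported increase and -1 for a decrease. The actual averaging in ism.py may differ, and the function name, the category names and the -0.5 value are illustrative only.

```python
from typing import Dict

def industry_score(multipliers: Dict[str, float], directions: Dict[str, int]) -> float:
    """Average the signed contributions of every category mentioning the industry.

    multipliers: category -> multiplier (sign says whether an increase is good).
    directions:  category -> +1 if the report says "increase", -1 if "decrease".
    Categories missing from the multipliers table are ignored.
    """
    terms = [multipliers[c] * d for c, d in directions.items() if c in multipliers]
    return sum(terms) / len(terms) if terms else 0.0

# Illustrative multipliers, not the contents of servicestags.csv.
multipliers = {"backlog of orders": 1.5, "inventories": -0.5}

# An industry reported as increasing in both categories.
mixed = industry_score(multipliers, {"backlog of orders": +1, "inventories": +1})
# An industry reported as decreasing in backlog of orders only.
bearish = industry_score(multipliers, {"backlog of orders": -1})

print(mixed)    # 0.5
print(bearish)  # -1.5
```

The second call shows why the sign of the multiplier matters: a decrease in a category with a positive multiplier yields a negative (bearish) score.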

Regression tests

In order to check that the code works as expected (especially after modifying it), we can run regression tests. The command is:

python -m unittest ism_test.TestRegression

If we make a breaking change in the code, or the structure of the ISM web page changes, we should update the goldens that the test suite checks against. To do so, run:

python -m unittest ism_test.GenerateGoldens
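For readers unfamiliar with golden tests, the idea can be sketched as follows: one step stores the current output as the reference ("golden"), and the regression check compares fresh output against it. This is not the code in ism_test; `summarize`, `generate_golden` and `matches_golden` are hypothetical names, and `summarize` is a trivial stand-in for the real parsing and scoring step.

```python
import json
import tempfile
from pathlib import Path

def summarize(report_text):
    # Hypothetical stand-in for the real parsing and scoring step.
    return {"word_count": len(report_text.split())}

def generate_golden(report_text, golden_path):
    # Store the current output as the reference the tests compare against.
    golden_path.write_text(json.dumps(summarize(report_text), sort_keys=True))

def matches_golden(report_text, golden_path):
    # Regression check: fresh output must equal the stored golden.
    return summarize(report_text) == json.loads(golden_path.read_text())

with tempfile.TemporaryDirectory() as tmp:
    golden = Path(tmp) / "golden.json"
    generate_golden("Backlog of orders increased", golden)
    unchanged = matches_golden("Backlog of orders increased", golden)
    changed = matches_golden("Backlog of orders increased sharply", golden)

print(unchanged, changed)  # True False
```

When the second kind of mismatch is expected (an intentional code change or a new page layout), regenerating the golden is the right fix; otherwise the mismatch points to a regression.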

Historical reports

With the help of archive.org we can retrieve historical ISM reports. To store them on disk, run the following test:

python -m unittest ism_test.GetHistory

We use proxies to avoid hitting rate limits. Please be patient as the test can take up to 5 minutes to complete.
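One way to locate an archived copy of a page is the Wayback Machine availability endpoint on archive.org. The sketch below only builds the query URL and parses a response of the shape that endpoint returns; it makes no network request. Whether ism_test.GetHistory uses this endpoint is not stated here, so treat this as an assumption. The helper names, the `example.com/report` address and the sample payload are hypothetical.

```python
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp):
    """Build the query URL; timestamp is YYYYMMDD (or longer, up to seconds)."""
    return WAYBACK_API + "?" + urlencode({"url": page_url, "timestamp": timestamp})

def closest_snapshot(payload):
    """Return the URL of the closest archived snapshot, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

query = availability_url("example.com/report", "20200401")
print(query)

# Hand-written sample mimicking the response shape, not real data.
sample = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200401000000/http://example.com/report",
            "timestamp": "20200401000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample))
print(closest_snapshot({"archived_snapshots": {}}))  # None
```

Fetching `query` with any HTTP client and passing the decoded JSON to `closest_snapshot` would complete the lookup; spacing out requests helps stay within rate limits.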

Plot reports

Once the data has been dumped to disk, we can generate a historical report by executing:

python ism.py -d historical/

After that, open a Python console and enter:

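import matplotlib.pyplot as plt
import pandas as pd

# The CSV files use ';' as the separator; the first column becomes the index.
manufacturing_df = pd.read_csv('manufacturing.csv', sep=";", index_col=0)
services_df = pd.read_csv('services.csv', sep=";", index_col=0)

# Plot the cumulative sum of every column, one figure per report type.
manufacturing_df.cumsum().plot()
plt.show()
services_df.cumsum().plot()
plt.show()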
