This is a Starter Kit (SK) designed to get you up and running quickly with a common industry pattern, and to provide information and best practices around Watson services. This application demonstrates how the services can be used to detect sentiment and customer satisfaction based on product reviews. The demo for this SK uses reviews of electronics products on Amazon.
IMPORTANT NOTES:
- You must sign up to use the Watson Knowledge Studio tool. A 30-day free trial is available. Go to the WKS link to learn more.
- The application requires an AlchemyAPI key with high transaction limits. The free AlchemyAPI key that you request has a limit of 1000 transactions per day, which is insufficient for significant use of this sample application. You can upgrade to the Standard or Advanced plan of the AlchemyAPI service to obtain a key that supports more than 1000 transactions per day.
- The Natural Language Classifier service requires training prior to running the application. Refer to the `Training` notebook in `/notebooks`.
- How this app works
- Getting Started
- Installation
- Running locally
- Running the notebooks
- Adapting/Extending the Starter Kit
- Deploying the application to Bluemix
- Best Practices
- Troubleshooting
- Reference information
This starter kit uses Jupyter Notebook, a web application that enables you to create and share documents that contain code, visualizations, and explanatory text. (Jupyter Notebook was formerly known as iPython Notebook.) Jupyter Notebook automatically executes specific sections of Python code that are embedded in a notebook, displaying the results of those commands in a highlighted section below each code block. The Jupyter notebooks in this SK show you how to create an entity model and how to classify and cluster the data.
This SK has two primary Jupyter notebooks:
- `Training`, which shows how to take a data set, import it into Cloudant, create Ground Truth, use WKS to create an entity model, and then train a classifier.
- `WKS`, which runs on all of the review data after all of the models are trained and validated.
The application is written in Python. Instructions for downloading and installing it are included in the documentation.
You need the following to use this SK:
- A UNIX-based OS (or Cygwin)
- Git
- Python 2.7.x (Note: Python 3.x is not supported for this SK)
- Anaconda—installing this package also installs the Jupyter notebook package, which includes iPython (now referred to as Jupyter)
- A Bluemix account
- Log into GitHub and fork the project repository. Clone your fork to a folder on your local system and change to that folder.
- Create a Bluemix account. Sign up in Bluemix, or use an existing account. Watson Beta and Experimental services are free to use.
- If it is not already installed on your system, download and install the Cloud Foundry CLI tool.
- Edit the `manifest.yml` file in the folder that contains your fork and, under `applications:`, replace the value of `- name:` with a unique name for your copy of the application. The name that you specify determines the application's URL, such as `application-name.mybluemix.net`. The relevant portion of the `manifest.yml` file looks like the following:

```yaml
declared-services:
  alchemy-language-service:
    label: alchemy
    plan: free
  natural-language-classifier-service:
    label: natural_language_classifier
    plan: standard
  cloudantNoSQLDB-service:
    label: cloudantNoSQLDB
    plan: Free
applications:
  - services:
      - alchemy-service
      - natural-language-classifier-service
      - cloudantNoSQLDB-service
    name: product-intel-demo
    command: python server.py
    path: .
    memory: 512M
```
- Install the Python dependencies by using `pip`:

```
pip install -r requirements.txt
```
- Connect to Bluemix by running the following commands in a terminal window:

```
cf api https://api.ng.bluemix.net
cf login -u <your-Bluemix-ID> -p <your-Bluemix-password>
```
- Create instances of the services that are used by the application. Create and retrieve service keys to access the Natural Language Classifier service by running the following commands:

```
cf create-service natural_language_classifier standard natural-language-classifier-service
cf create-service-key natural-language-classifier-service <your-NLC-key>
cf service-key natural-language-classifier-service <your-NLC-key>
```
In these commands, `<your-NLC-key>` is the credentials file found on the `natural-language-classifier-service` tile on your Bluemix Dashboard. Unless you have credentials for other services already defined, the default name for `<your-NLC-key>` is `Credentials-1`.
Note: The commands return a message that states "Attention: The plan standard of service `natural_language_classifier` is not free. The instance classifier-service will incur a cost. Contact your administrator if you think this is in error." The first Natural Language Classifier instance that you create is free under the standard plan, so there is no charge if you create only a single classifier instance for use by this application.
- Create and retrieve service keys for the Alchemy Language service. If you already have an instance of the Alchemy Language service, you can use that instance and its API key.

```
cf create-service alchemy-language standard alchemy-language-service
cf create-service-key alchemy-language-service <your-AlchemyAPI-key>
cf service-key alchemy-language-service <your-AlchemyAPI-key>
```
- Create and retrieve service keys for the Cloudant service. If you are using an existing Cloudant service, use those credentials instead.

```
cf create-service cloudantNoSQLDB standard cloudantNoSQLDB-service
cf create-service-key cloudantNoSQLDB-service <your-Cloudant-key>
cf service-key cloudantNoSQLDB-service <your-Cloudant-key>
```

Note: The commands return a message that warns you that the Shared plan for the Cloudant NoSQLDB service is not free.
- A file named `.env` is used to provide the service keys for your service instances to the application. Create a `.env` file in the root directory of your clone of the project repository by copying the sample `.env.example` file with the following command:

```
cp .env.example .env
```

Edit the `.env` file to add values for the listed environment variables:

```
[CLOUDANT]
CLOUDANT_USERNAME=
CLOUDANT_PASSWORD=
CLOUDANT_URL=
CLOUDANT_DB=voc_ask_db

[NLC]
NLC_URL=https://gateway.watsonplatform.net/natural-language-classifier/api
NLC_USERNAME=
NLC_PASSWORD=
NLC_CLASSIFIER=

[ALCHEMY]
ALCHEMY_API_KEY=

[WKS]
WKS_MODEL_ID=
```
Note: You must perform the procedure in the `WKS` Jupyter notebook to generate the value for the `WKS_MODEL_ID` environment variable. Add the value to the `.env` file after you have created the WKS model as described in Training an entity detection model.
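The `.env` file above uses INI-style sections, so its settings can be read with Python's standard `ConfigParser` module. The snippet below is a minimal, illustrative sketch of loading these values; the section and key names match the example `.env`, but how the SK's actual `server.py` loads its configuration may differ.

```python
# Minimal sketch: load the INI-style .env file with the stdlib config parser.
# Assumes the sections/keys shown in the example .env above.
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2.7

def load_settings(path=".env"):
    """Return a flat dict like {'CLOUDANT_USERNAME': ..., 'ALCHEMY_API_KEY': ...}."""
    parser = ConfigParser()
    read_ok = parser.read(path)
    if not read_ok:
        raise IOError("Could not read settings file: %s" % path)
    settings = {}
    for section in parser.sections():
        # ConfigParser lowercases option names by default; restore upper case
        # so keys match the environment-variable style used in .env.
        for key, value in parser.items(section):
            settings[key.upper()] = value
    return settings
```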
The Jupyter notebooks show you step-by-step instructions, automatically executing specified sections of Python code. We used Jupyter notebooks because they encourage experimentation, which is an important part of developing any machine learning system.
To start the notebooks, make sure you are in the root directory of your git checkout of the SK repository, and run the command `jupyter notebook`. This starts the Jupyter notebook server and opens a browser window. In the browser window, click notebooks, and then open the notebook labeled `Training`. Follow the instructions.
The training phase is responsible for creating a customized model that detects entities related to the topic of the reviews. This model is created by using Watson Knowledge Studio (WKS) to annotate the data (product reviews) with entities and their relationships.
The WKS tool exports a customized Alchemy model that can extract entities and relationships from unseen data. The steps to preprocess the data and create the models are detailed in the Jupyter notebooks under the `notebooks` folder of this repo.
To create your WKS model and export it to your Alchemy API key, follow the instructions in the `WKS` notebook.
After you have created your customized model, follow the instructions to train your classifier in the `Training` notebook.
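The Natural Language Classifier trains on a CSV file of example texts paired with class labels. The helper below is an illustrative sketch (not part of the SK) of rendering labeled review snippets in that two-column format; the sample reviews and the satisfaction labels are made up for the example.

```python
import csv
import io

# Hypothetical labeled training examples: (review snippet, satisfaction class).
LABELED_REVIEWS = [
    ("The battery died after two weeks", "dissatisfied"),
    ("Great sound quality for the price", "satisfied"),
    ("Stopped charging and support never replied", "dissatisfied"),
    ("Exactly as described, works perfectly", "satisfied"),
]

def to_nlc_csv(examples):
    """Render (text, class) pairs as the two-column CSV that NLC trains on.

    NLC limits each training text to 1024 characters, so longer snippets
    are truncated here.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)
    for text, label in examples:
        writer.writerow([text[:1024], label])
    return buf.getvalue()
```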
This step uses the models trained in the previous step by the `WKS` and `Training` notebooks; follow the `Processing` notebook to complete it. This step is optional: run it only if you want to deploy the application locally, using the UI provided in the repo with Cloudant as the persistence layer.
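The processing step combines the classifier's satisfaction label with the entities extracted by the custom Alchemy model and stores one document per review in Cloudant. The function below is a hypothetical sketch of how such a document might be assembled; the field names are illustrative and are not taken from the SK's actual schema.

```python
def build_review_doc(review_id, review_text, satisfaction, entities):
    """Assemble a Cloudant-ready dict for one processed review.

    satisfaction: top class returned by the Natural Language Classifier.
    entities: list of (type, text) pairs from the custom Alchemy model.
    All field names here are illustrative, not the SK's real schema.
    """
    return {
        "_id": review_id,
        "text": review_text,
        "satisfaction": satisfaction,
        "entities": [{"type": t, "text": s} for t, s in entities],
    }
```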
To deploy the application locally, run the command:

```
python server.py
```

The app listens on port 3000. Open a web browser and go to:

```
localhost:3000
```
This Starter Kit works with product review data gathered from Amazon (http://jmcauley.ucsd.edu/data/amazon/). However, the concepts used here are platform independent and can be applied to use cases other than electronics product reviews. Just define your use case and make sure that you train your Natural Language Classifier accordingly by using the tool provided on the service page. Additionally, you can create your own customized models for entity extraction by using Watson Knowledge Studio and Alchemy.
Push the updated application live by running the following command:

```
cf push
```

or by pressing the "Deploy to Bluemix" button below.
- When defining intents, follow naming conventions to create consistent intents.
- Use "`-`" to separate multiple levels (example: `location-weather-forecast`).
- Use "`_`" to separate multiple-word intents (example: `business_center`).
- Provide many variations of input via examples for each intent. The more variations, the better.
- Avoid overlapping intents across examples (example: `benefits_eligibility` and `benefits_eligibility_employee`). To avoid this, group examples into a single intent and use entities to deal with subtle variations.
- Examples for intents should be representative of end-user input.
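As a quick illustration of these naming conventions (this checker is not part of the SK), an intent name can be validated in Python: hyphens separate hierarchy levels, and underscores separate words within a level.

```python
import re

# One level: lowercase words joined by underscores, e.g. "weather_forecast".
LEVEL = re.compile(r"^[a-z0-9]+(_[a-z0-9]+)*$")

def is_valid_intent(name):
    """Check an intent name against the conventions above:
    '-' separates levels, '_' separates words within a level."""
    levels = name.split("-")
    return all(LEVEL.match(level) for level in levels)
```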
- Entity and relation types can NOT have spaces. It is best to stick with alphanumeric characters and the underscore ("`_`") character.
- At least 2 entity types and at least 2 relation types, with 2 example mentions of each in the ground truth, are required to perform a successful training run of the machine learning annotator.
- Rule of thumb: 50 mentions for a given type (entity type, relation type) in the training data. It is recommended to have training data distributed across all possible subtypes and roles for entities, to help train the system better.
- When defining the type system and document size, make sure that the type system is not so complex and the documents not so large that human annotators cannot efficiently follow the guidelines. Keep the entity types to fewer than 50, and keep document size to no more than a few paragraphs.
- Name entity and relation types in a way that is not ambiguous. If any names for entity or relation types are similar, it is more difficult to remember when to use which type.
- For ground truth, it is recommended to use representative documents that include the entities and relations most relevant for your application. Representative means that the mentions and relations appear in a similar context (other words around them) to what your application expects.
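The mention-count guidelines above can be checked mechanically. The sketch below (illustrative, not part of the SK) counts mentions per type in a hypothetical annotation export and flags any types that fall short of the 50-mention rule of thumb.

```python
from collections import Counter

def under_mentioned_types(annotations, threshold=50):
    """Given (type_name, mention_text) pairs from a hypothetical ground-truth
    export, return a dict of the types with fewer than `threshold` mentions."""
    counts = Counter(type_name for type_name, _ in annotations)
    return {t: n for t, n in counts.items() if n < threshold}
```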
To troubleshoot your Bluemix application, use the logs. To see the logs, run:

```
cf logs <application-name> --recent
```
The following links provide more information about the Natural Language Classifier, Cloudant, and Alchemy Language services.
- API documentation: Get an in-depth knowledge of the Natural Language Classifier service
- API reference: SDK code examples and reference
- API Explorer: Try out the API
- Creating your own classifier: How to use the API to create and use your own classifier
- API documentation: Get an in-depth understanding of the Cloudant services
- API reference: Code examples and reference
- API documentation: Get an in-depth understanding of the AlchemyAPI services
- AlchemyData News reference: API and query gallery
This sample code is licensed under Apache 2.0. Full license text is available in LICENSE.
See CONTRIBUTING.
Find more open source projects on the IBM Github Page
This sample web application includes code to track deployments to Bluemix and other Cloud Foundry platforms. The following information is sent to a Deployment Tracker service on each deployment:
- Application Name (`application_name`)
- Space ID (`space_id`)
- Application Version (`application_version`)
- Application URIs (`application_uris`)
This data is collected from the `VCAP_APPLICATION` environment variable in IBM Bluemix and other Cloud Foundry platforms. This data is used by IBM to track metrics around deployments of sample applications to IBM Bluemix. Only deployments of sample applications that include code to ping the Deployment Tracker service will be tracked.
Deployment tracking can be disabled by removing the `cf-deployment-tracker-client` tracking call from the beginning of the `server.py` file at the root of this repo.
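For reference, `VCAP_APPLICATION` holds a JSON document, and the fields listed above can be pulled out of it in Python as sketched below (the sample payload in the usage is illustrative):

```python
import json
import os

def deployment_metadata(env=os.environ):
    """Extract the tracked fields from the VCAP_APPLICATION JSON, if present."""
    raw = env.get("VCAP_APPLICATION")
    if raw is None:
        return None  # not running on Cloud Foundry
    app = json.loads(raw)
    return {key: app.get(key)
            for key in ("application_name", "space_id",
                        "application_version", "application_uris")}
```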