This is the official repository for *Chat, a scalable conversational engine for B2B applications.
To contribute to *Chat, please send us a pull request from your fork of this repository!
Our concise contribution guideline contains the bare minimum requirements for code contributions.
Before contributing (or opening issues), you might want to send us an email at [email protected].
The easiest way is to install *Chat using two Docker images. You only need:
This way, you will put all the indices in the Elasticsearch (V 5.2) image, and *Chat itself in the Java (8) image.
If you do not use Docker, you therefore need on your machine:
Generate a package distribution:
sbt dist
Enter the directory docker-starchat:
cd docker-starchat
Extract the package into the docker-starchat folder:
unzip ../target/universal/starchat-master.zip
Review the configuration file starchat-master/config/application.conf and configure the language if needed (by default you have index_language = "english").
(If you are re-installing *Chat and want to start from scratch, see start from scratch.)
Start both starchat and elasticsearch:
docker-compose up -d
(Problems like elastisearch exited with code 78? Have a look at troubleshooting!)
Run from a terminal:
# create the indices in Elasticsearch
curl -v -H "Content-Type: application/json" -X POST "http://localhost:8888/index_management"
Now you have to load the configuration file for the actual chat. We have provided an example csv in English, therefore:
cd scripts/indexing/
./index_documents_dt.py ../../doc/sample_state_machine_specification.csv 1
Every time you load the configuration file you need to index the analyzer:
curl -v -H "Content-Type: application/json" -X POST "http://localhost:8888/decisiontable_analyzer"
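The two initialization calls can also be scripted. What follows is a minimal Python sketch, assuming the service listens on localhost:8888 as in the curl examples. The helper names `build_post` and `send` are hypothetical and not part of *Chat. Building a request is separate from sending it, so nothing touches the network until `send` is called.

```python
import urllib.request

BASE_URL = "http://localhost:8888"  # default address used in this README


def build_post(path: str) -> urllib.request.Request:
    """Build an empty-body JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=b"",
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status (needs a running service)."""
    with urllib.request.urlopen(request) as response:
        return response.status


# Step 1: create the indices in Elasticsearch.
create_indices = build_post("index_management")
# Step 2 (after loading the configuration file): index the analyzers.
load_analyzers = build_post("decisiontable_analyzer")
```

Calling `send(create_indices)`, then running the indexing script, then `send(load_analyzers)` would mirror the order of the steps above.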
Note: we do not support this installation.
- Clone the repository and enter the starchat directory.
- Initialize the Elasticsearch instance (see above for Docker)
- Run the service:
sbt compile run
The service binds to port 8888 by default.
Is the service working?
curl -X GET localhost:8888 | python -mjson.tool
Get the test_state
curl -H "Content-Type: application/json" -X POST http://localhost:8888/get_next_response -d '{
"conversation_id": "1234",
"user_input": { "text": "Please send me the test state" },
"values": {
"return_value": "",
"data": {}
}
}'
You should get:
{
"action": "",
"action_input": {},
"analyzer": "and(keyword(\"test\"), or(keyword(\"send\"), keyword(\"get\")))",
"bubble": "This is the test state",
"conversation_id": "1234",
"data": {},
"failure_value": "",
"max_state_count": 0,
"score": 1.0,
"state": "test_state",
"state_data": {},
"success_value": ""
}
If you look at the "analyzer" field, you'll see that this state is triggered when the user types "test" and either "get" or "send". Try with "text": "Please dont send me the test state" and *Chat will send an empty message.
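For clients that do not go through curl, the request body used above could be assembled as in the following Python sketch. The function name `next_response_payload` is hypothetical; the field names are the ones shown in the curl call.

```python
import json
from typing import Any, Dict, Optional


def next_response_payload(conversation_id: str, text: str = "",
                          return_value: str = "",
                          data: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON body shaped like the get_next_response examples."""
    body = {
        "conversation_id": conversation_id,
        "user_input": {"text": text},
        "values": {"return_value": return_value, "data": data or {}},
    }
    return json.dumps(body)


payload = next_response_payload("1234", "Please send me the test state")
```

The resulting string can be passed as the `-d` argument of curl or as the body of any HTTP client.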
With *Chat you can easily implement workflow-based chatbots. After the installation (see above) you only have to configure a conversation flow and, if needed, a front-end client.
In practice, *Chat:
- analyzes the user's query and identifies the state the user should be sent to
- creates dynamic content using variables inferred from the conversation (e.g. "Please write your email so that I can send you a message")
Work in progress
- Elasticsearch and the "queries" field
- The analyzer: atomic expressions and operators
*Chat was designed with the following goals in mind:
- easy deployment
- horizontal scalability without any service interruption
- modularity
- statelessness
*Chat uses Elasticsearch as a NoSQL database and, as said above, as an NLP preprocessor for indexing, sentence cleansing, and tokenization.
*Chat consists of two different services: the "KnowledgeBase" and the "DecisionTable".
For quick setup based on real Q&A logs. It stores question-and-answer pairs. Given a text as input, it proposes the pair with the closest match on the question field. At the moment the KnowledgeBase supports only analyzers implemented on Elasticsearch.
The conversational engine itself. For the usage, see below.
You configure the DecisionTable through a CSV file. Please have a look at the one provided in doc/:
state | max_state_count | analyzer | queries | bubble | action | action_input | state_data | success_value | failure_value |
---|---|---|---|---|---|---|---|---|---|
start | 0 | | | "How may I help you?" | | | | | |
further_details_access_question | 0 | ((forgot).*(password)) | "[""cannot access account"", ""problem access account""]" | | show_buttons | "{""Forgot Password"": ""forgot_password"", ""Account locked"": ""account_locked"", ""None of the above"": ""start""}" | | eval(show_buttons) | """dont_understand""" |
forgot_password | 0 | | "[""Forgot password""]" | "I will send you a new password generation link, enter your email." | input_form | "{""email"": ""email""}" | | """send_password_generation_link""" | """dont_understand""" |
send_password_generation_link | 0 | | | "Sending message to %email% with instructions." | send_password_generation_link | "{ ""template"": ""If you requested a password reset, follow this link: %link%"", ""email"": ""%email%"" }" | | """any_further""" | call_operator |
Fields in the configuration file are of three types:
- (R): Return value: the field is returned by the API
- (T): Triggers to the state: when should we enter this state?
- (I): Internal: a field not exposed to the API
And the fields are:
- state: a unique name of the state (e.g. forgot_password)
- max_state_count: defines how many times *Chat can repropose the state during a conversation
- analyzer: specifies an analyzer expression which triggers the state
- queries (T,I): list of sentences whose meaning identifies the state
- bubble (R): content, if any, to be shown to the user. It may contain variables like %email% or %link%.
- action (R): a function to be called on the client side. The *Chat developer must provide the types of input and output (like an abstract method), and the GUI developer is responsible for the actual implementation (e.g. show_buttons)
- action_input (R): input passed to the action's function (e.g., for show_buttons it can be a list of pairs ("text to be shown on button", state_to_go_when_clicked))
- state_data (R): a dictionary of strings with arbitrary data to pass along
- success_value (R): output to return in case of success
- failure_value (R): output to return in case of failure
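Several of these fields hold JSON encoded inside a CSV cell, with inner quotes doubled. The following Python sketch shows one way such a row could be decoded. It assumes a comma-separated file with the ten columns listed above; the sample row is shortened for illustration, and `parse_states` and `JSON_FIELDS` are hypothetical names, not part of the provided indexing script.

```python
import csv
import io
import json

# Header plus one shortened row, written the way a CSV file would store it.
SAMPLE = (
    "state,max_state_count,analyzer,queries,bubble,action,action_input,"
    "state_data,success_value,failure_value\n"
    'forgot_password,0,,"[""Forgot password""]","Enter your email.",input_form,'
    '"{""email"": ""email""}",,"""send_password_generation_link""","""dont_understand"""\n'
)

JSON_FIELDS = ("queries", "action_input", "state_data")  # assumed JSON-valued


def parse_states(text: str) -> list:
    """Read CSV rows and decode the JSON-valued cells; empty cells become None."""
    states = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in JSON_FIELDS:
            row[name] = json.loads(row[name]) if row[name] else None
        row["max_state_count"] = int(row["max_state_count"])
        states.append(row)
    return states


states = parse_states(SAMPLE)
```

After parsing, `states[0]["queries"]` is a Python list and `states[0]["action_input"]` a dictionary, while success_value and failure_value keep their literal quotes.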
For the CSV in the example above, the client will have to implement the following set of functions:
- show_buttons: tell the client to render a multiple choice button
  - input: a key/value pair with the key indicating the text to be shown in the button, and the value indicating the state to follow, e.g.: {"Forgot Password": "forgot_password", "Account locked": "account_locked", "Specify your problem": "specify_problem", "I want to call an operator": "call_operator", "None of the above": "start"}
  - output: the choice related to the button clicked by the user, e.g.: "account_locked"
- input_form: render an input form or collect the input following a specific format
  - input: a dictionary with the list of fields and the type of fields; at least "email" must be supported, e.g.: { "email": "email" } where the key is the name and the value is the type
  - output: a dictionary with the input values, e.g.: { "email": "[email protected]" }
- send_password_generation_link: send an email with instructions to regenerate the password
  - input: a valid email address, e.g.: "[email protected]"
  - output: a dictionary with the response fields, e.g.: { "user_id": "123", "current_state": "forgot_password", "status": "true" }
Other application-specific functions can be implemented by the client. These functions must be called with the prefix "priv_", e.g. "priv_retrieve_user_transactions" (@angleto to clarify).
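On the client side, the functions listed above could be wired up with a small dispatch table keyed by the value of the "action" field. This is only a sketch: `register`, `run_action` and the stub bodies are hypothetical, and a real GUI would render widgets instead of returning fixed values.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Any], Any]
HANDLERS: Dict[str, Handler] = {}


def register(name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a client-side function under an action name."""
    def wrap(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return wrap


@register("show_buttons")
def show_buttons(action_input: Dict[str, str]) -> str:
    # A real GUI would render one button per key; this stub "clicks" the first.
    return next(iter(action_input.values()))


@register("input_form")
def input_form(action_input: Dict[str, str]) -> Dict[str, str]:
    # A real GUI would ask the user; this stub returns placeholder values.
    return {name: f"<{kind}>" for name, kind in action_input.items()}


def run_action(action: str, action_input: Any) -> Any:
    """Call the handler named in the "action" field of a response."""
    if action not in HANDLERS:
        raise KeyError(f"no client function registered for action {action!r}")
    return HANDLERS[action](action_input)
```

For example, `run_action("input_form", {"email": "email"})` returns a dictionary with one entry per requested field, which the client would then send back in the "data" field.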
Ref: sample_state_machine_specification.csv.
- The client implements the functions which appear in the action field of the spreadsheet. We will provide interfaces.
- The client calls the REST API "decisiontable" endpoint, communicating a state (if any), the user input data and other state variables
- The client receives a response with guidance on what to return to the user and what the possible next steps are
- The client renders the message to the user and, if needed, collects the input, then calls the system again to get instructions on what to do next
- When the "decisiontable" function does not return any result, the client can call the "knowledgebase" endpoint, which contains all the conversations.
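The steps above can be sketched as a simple loop. In this Python sketch the HTTP call is replaced by a hypothetical `post` function passed in by the caller, so the flow can be tried offline; `converse` and `fake_post` are illustrative names, and the request body follows the get_next_response examples in this README.

```python
from typing import Any, Callable, Dict, List

Response = Dict[str, Any]


def converse(post: Callable[[Dict[str, Any]], Response],
             conversation_id: str, user_texts: List[str]) -> List[str]:
    """Send each user text to the service and collect the bubbles to display.

    `post` takes the request body and returns the decoded JSON response,
    or an empty dict when no state was found.
    """
    bubbles = []
    for text in user_texts:
        body = {
            "conversation_id": conversation_id,
            "user_input": {"text": text},
            "values": {"return_value": "", "data": {}},
        }
        response = post(body)
        if not response:
            # Here a client could fall back to the "knowledgebase" endpoint.
            bubbles.append("")
            continue
        bubbles.append(response["bubble"])
    return bubbles


def fake_post(body: Dict[str, Any]) -> Response:
    """Stand-in for the HTTP call, returning canned responses."""
    if "password" in body["user_input"]["text"]:
        return {"state": "forgot_password", "bubble": "Enter your email."}
    return {}


bubbles = converse(fake_post, "1234", ["I forgot my password", "hello"])
```

Because the conversation state travels in the request and response, the loop itself keeps no session with the server.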
*Chat consists of two different services: *Chat itself and an Elasticsearch cluster.
*Chat can scale horizontally by simple replication. Because *Chat is stateless, instances looking at the same Elasticsearch index will behave identically. New instances can then be added together with a load balancing service.
In the diagram below, a load balancer forwards requests coming from the front-end to *Chat instances 1, 2 or 3. These instances, as said, behave identically because they all refer to Index 0 in the Elasticsearch cluster.
Similarly, Elasticsearch can easily scale horizontally by adding new nodes to the cluster, as explained in the Elasticsearch documentation.
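Since the instances are interchangeable, even the simplest balancing policy works. The Python sketch below illustrates round-robin selection; the instance addresses are hypothetical, and in production a dedicated load balancer would normally play this role.

```python
import itertools
from typing import Iterator, List

# Hypothetical addresses of three identical *Chat instances.
INSTANCES = [
    "http://starchat-1:8888",
    "http://starchat-2:8888",
    "http://starchat-3:8888",
]


def round_robin(instances: List[str]) -> Iterator[str]:
    """Return an endless iterator that hands out instance URLs in turn."""
    return itertools.cycle(instances)


balancer = round_robin(INSTANCES)
# Four consecutive requests: the fourth wraps around to the first instance.
targets = [next(balancer) for _ in range(4)]
```

Consecutive requests of the same conversation may land on different instances; this is safe only because no conversation state is held in the instance itself.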
Tells *Chat about the user's actions (wrote something, clicked a button, etc.) and receives instructions about the next state.
Data to post:
{
"conversation_id": "1234",
"user_input": {
"text": "the text typed by the user (Optional)",
"img": "(e.g.) image attached by the user (Optional)"
},
"values": {
"return_value": "the value either in success_value or in failure_value (Optional)",
"data": "all the variables, e.g. for the STRING TEMPLATEs (Optional)"
}
}
The "user_input" and "values" objects are themselves optional.
Response code 200
A similar JSON; see the examples below.
User input is "I forgot my password":
curl -H "Content-Type: application/json" -X POST http://localhost:8888/get_next_response -d '{
"conversation_id": "1234",
"user_input": { "text": "I forgot my password" },
"values": {
"return_value": "",
"data": {}
}
}'
returns:
{
"action": "input_form",
"action_input": {
"email": "email"
},
"bubble": "We can reset your password by sending you a message to your registered e-mail address. Please tell me your address so I may send you the new password generation link.",
"conversation_id": "1234",
"data": {},
"failure_value": "\"dont_understand\"",
"max_state_count": 0,
"analyzer": "",
"state": "forgot_password",
"state_data": {
"verification": "did you mean you forgot the password?"
},
"success_value": "\"send_password_generation_link\""
}
User inserts their email after having been in forgot_password.
The client sends:
curl -H "Content-Type: application/json" -X POST http://localhost:8888/get_next_response -d '
{
"conversation_id": "1234",
"user_input": { "text": "" },
"values": {
"return_value": "send_password_generation_link",
"data": { "email": "[email protected]" }
}
}'
and gets:
{
"action": "send_password_generation_link",
"action_input": {
"email": "[email protected]",
"template": "somebody requested to reset your password, if you requested the password reset follow the link: %link%"
},
"bubble": "Thank you. An e-mail will be sent to this address: [email protected] with your account details and the necessary steps for you to reset your password.",
"conversation_id": "1234",
"data": {
"email": "[email protected]"
},
"failure_value": "call_operator",
"max_state_count": 0,
"analyzer": "",
"state": "send_password_generation_link",
"state_data": {},
"success_value": "\"any_further\""
}
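In the example above, the %email% variable in the bubble was filled in from the "data" field. How *Chat performs this substitution internally is not shown here; the following Python sketch only illustrates the idea, under the assumption that a variable is a name between two percent signs and that a missing value is an error. `render` is a hypothetical helper and the address is a placeholder.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"%(\w+)%")


def render(template: str, data: Dict[str, str]) -> str:
    """Replace each %name% with data["name"]; fail on unknown names."""
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in data:
            raise KeyError(f"no value for template variable {name!r}")
        return data[name]
    return PLACEHOLDER.sub(substitute, template)


bubble = render("Sending message to %email% with instructions.",
                {"email": "user@example.com"})
```

A failed substitution of this kind is presumably what the "error evaluating the template strings, bad values" message listed below refers to.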
No response was found
Internal server error
Bad request:
* meaning: the input data structure is not valid
* output data: no data returned
* meaning: bad request data, the input data is formally valid but there is some issue with data interpretation
* output data: the output data structure is a JSON dictionary with two fields: code and message. The following codes are supported:
* code: 100
* message: "error evaluating the template strings, bad values"
* meaning: not found
* output data: no data returned
Get a document by ID
Output JSON
Sample call
# retrieve one or more entries with given ids; ids can be specified multiple times
curl -v -H "Content-Type: application/json" "http://localhost:8888/decisiontable?ids=further_details_access_question"
Sample output
{
"total": 1,
"max_score": 0,
"hits": [
{
"score": 0,
"document": {
"analyzer": "((forgot).*(password))",
"queries": [
"cannot access account",
"problem access account"
],
"state": "further_details_access_question",
"state_data": {
"verification": "did you mean you can't access to your account?"
},
"success_value": "eval(show_buttons)",
"failure_value": "\"dont_understand\"",
"bubble": "Hello and welcome to our customer service chat. Please note that while I am not a human operator, I will do my very best to assist You today. How may I help you?",
"action_input": {
"Specify your problem": "specify_problem",
"I want to call an operator": "call_operator",
"None of the above": "start",
"Forgot Password": "forgot_password",
"Account locked": "account_locked"
},
"max_state_count": 0,
"action": "show_buttons"
}
}
]
}
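The GET call above accepts the ids parameter several times. In Python the query string for such a call could be built as in the sketch below, which relies on `urllib.parse.urlencode` with `doseq=True` to repeat the parameter; `decisiontable_url` is a hypothetical helper and the base URL is the default one used in this README.

```python
from typing import List
from urllib.parse import urlencode

BASE_URL = "http://localhost:8888"


def decisiontable_url(ids: List[str]) -> str:
    """Build the GET URL, repeating the `ids` parameter once per state."""
    query = urlencode({"ids": ids}, doseq=True)
    return f"{BASE_URL}/decisiontable?{query}"


url = decisiontable_url(["further_details_access_question", "forgot_password"])
```

The same approach would apply to the knowledgebase endpoint, which takes ids in the same way.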
Output JSON
Sample call
# update the "further_details_access_question" entry in the DT
curl -v -H "Content-Type: application/json" -X PUT http://localhost:8888/decisiontable/further_details_access_question -d '{
"queries": ["cannot access account", "problem access account", "unable to access to my account"]
}'
Sample output
{
"created": false,
"dtype": "state",
"id": "further_details_access_question",
"index": "jenny-en-0",
"version": 2
}
Insert a new document.
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/decisiontable -d '{
"state": "further_details_access_question",
"max_state_count": 0,
"analyzer": "",
"queries": ["cannot access account", "problem access account"],
"bubble": "What seems to be the problem exactly?",
"action": "show_buttons",
"action_input": {"Forgot Password": "forgot_password", "Account locked": "account_locked", "Payment problem": "payment_problem", "Specify your problem": "specify_problem", "I want to call an operator": "call_operator", "None of the above": "start"},
"success_value": "eval(show_buttons)",
"failure_value": "dont_understand"
}'
Sample output
{
"created": true,
"dtype": "state",
"id": "further_details_access_question",
"index": "jenny-en-0",
"version": 1
}
Delete a document by ID
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X DELETE http://localhost:8888/decisiontable/further_details_access_question
Sample output
{
"dtype": "state",
"found": true,
"id": "further_details_access_question",
"index": "jenny-en-0",
"version": 3
}
Search for documents
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/decisiontable_search -d '{
"queries": "cannot access my account",
"min_score": 0.1,
"boost_exact_match_factor": 2.0
}'
(WORK IN PROGRESS, PARTIALLY IMPLEMENTED)
Get and return the map of analyzers for each state
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X GET "http://localhost:8888/decisiontable_analyzer"
Sample response
{
"analyzer_map": {
"further_details_access_question": "((forgot).*(password))"
}
}
Load/reload the map of analyzers from Elasticsearch (ES)
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST "http://localhost:8888/decisiontable_analyzer"
Sample response
{"num_of_entries":1}
Return a document by ID
Output JSON
Sample call
# retrieve one or more entries with given ids; ids can be specified multiple times
curl -v -H "Content-Type: application/json" "http://localhost:8888/knowledgebase?ids=0"
Sample response
{
"hits": [
{
"document": {
"answer": "you are welcome!",
"conversation": "832",
"doctype": "normal",
"id": "0",
"index_in_conversation": 11,
"question": "thank you",
"state": "",
"status": 0,
"topics": "",
"verified": false
},
"score": 0.0
}
],
"max_score": 0.0,
"total": 1
}
Insert a new document
Sample call
Output JSON
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/starchat-en/knowledgebase -d '{
"answer": "you are welcome!",
"conversation": "832",
"doctype": "normal",
"id": "0",
"index_in_conversation": 11,
"question": "thank you",
"state": "",
"status": 0,
"topics": "",
"verified": true
}'
Sample response
{
"hits": [
{
"document": {
"answer": "you are welcome!",
"conversation": "832",
"doctype": "normal",
"id": "0",
"index_in_conversation": 11,
"question": "thank you",
"state": "",
"status": 0,
"topics": "",
"verified": true
},
"score": 0.0
}
],
"max_score": 0.0,
"total": 1
}
Delete a document by ID
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X DELETE http://localhost:8888/knowledgebase/0
Sample output
{
"dtype": "question",
"found": false,
"id": "0",
"index": "jenny-en-0",
"version": 5
}
Update an existing document
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X PUT http://localhost:8888/starchat-en/knowledgebase/e9d7c04d0c539415620884f8c885fef93e9fd0b49bbea23a7f2d08426e4d185119068365a0c1c4a506c5c43079e1e8da4ef7558a7f74756a8d850cb2d14e5297 -d '{
"answer": "you are welcome!",
"conversation": "832",
"doctype": "normal",
"index_in_conversation": 11,
"question": "thank yoy",
"state": "",
"status": 0,
"topics": "",
"verified": false
}'
Sample response
{
"created": false,
"dtype": "question",
"id": "e9d7c04d0c539415620884f8c885fef93e9fd0b49bbea23a7f2d08426e4d185119068365a0c1c4a506c5c43079e1e8da4ef7558a7f74756a8d850cb2d14e5297",
"index": "jenny-en-0",
"version": 3
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/knowledgebase_search -d '{
"question": "thank you",
"verified": true,
"doctype": "normal"
}'
Sample output
{
"hits": [
{
"document": {
"answer": "you are welcome",
"conversation": "4346",
"doctype": "normal",
"id": "10",
"index_in_conversation": 6,
"question": "thank you",
"state": "",
"status": 0,
"topics": "",
"verified": true
},
"score": 3.5618982315063477
}
],
"max_score": 3.5618982315063477,
"total": 1
}
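A client typically wants only the best-matching answer from a search response like the one above. The Python sketch below picks it from the decoded JSON; `best_answer` and its `min_score` threshold are hypothetical client-side helpers, and the sample data is a trimmed copy of the output shown above.

```python
import json
from typing import Any, Dict, Optional

SAMPLE_RESPONSE = json.loads("""
{
  "hits": [
    {"document": {"answer": "you are welcome", "question": "thank you"},
     "score": 3.5618982315063477}
  ],
  "max_score": 3.5618982315063477,
  "total": 1
}
""")


def best_answer(response: Dict[str, Any], min_score: float = 0.0) -> Optional[str]:
    """Return the answer of the highest-scoring hit, or None below min_score."""
    hits = response.get("hits", [])
    if not hits:
        return None
    top = max(hits, key=lambda hit: hit["score"])
    return top["document"]["answer"] if top["score"] >= min_score else None
```

A suitable threshold depends on the data; the scores are relative to the query, so it is best tuned on real conversations.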
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST "http://localhost:8888/language_guesser" -d "
{
\"input_text\": \"good morning, may I ask you a question?\"
}
"
Sample output
{
"enhough_text" : false,
"language" : "en",
"confidence" : "MEDIUM",
"score" : 0.571426689624786
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X GET "http://localhost:8888/language_guesser/en"
Sample output
{"message":"updated index: jenny-en-0 dt_type_ack(true) kb_type_ack(true) kb_type_ack(true)"}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST "http://localhost:8888/index_management"
Sample output
{"message":"create index: jenny-en-0 create_index_ack(true)"}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X GET "http://localhost:8888/index_management"
Sample output
{"message":"settings index: jenny-en-0 dt_type_check(state:true) kb_type_check(question:true) term_type_name(term:true)"}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X PUT "http://localhost:8888/index_management"
Sample output
{"message":"updated index: jenny-en-0 dt_type_ack(true) kb_type_ack(true) kb_type_ack(true)"}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X DELETE "http://localhost:8888/index_management"
Sample output
{"message":"removed index: jenny-en-0 index_ack(true)"}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/term/index -d '{
"terms": [
{
"term": "मराठी",
"frequency": 1.0,
"vector": [1.0, 2.0, 3.0],
"synonyms":
{
"bla1": 0.1,
"bla2": 0.2
},
"antonyms":
{
"bla3": 0.1,
"bla4": 0.2
},
"tags": "tag1 tag2",
"features":
{
"NUM": "S",
"GEN": "M"
}
},
{
"term": "term2",
"frequency": 1.0,
"vector": [1.0, 2.0, 3.0],
"synonyms":
{
"bla1": 0.1,
"bla2": 0.2
},
"antonyms":
{
"bla3": 0.1,
"bla4": 0.2
},
"tags": "tag1 tag2",
"features":
{
"NUM": "P",
"GEN": "F"
}
}
]
}'
Sample output
{
"data" : [
{
"version" : 1,
"created" : true,
"dtype" : "term",
"index" : "jenny-en-0",
"id" : "मराठी"
},
{
"dtype" : "term",
"created" : true,
"version" : 1,
"id" : "term2",
"index" : "jenny-en-0"
}
]
}
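When terms come from another program rather than a hand-written file, the request body above could be generated. The following Python sketch models one term with a dataclass whose fields mirror the JSON keys of the sample call; the class `Term`, the helper `terms_payload` and the default values are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Term:
    term: str
    frequency: float = 1.0
    vector: List[float] = field(default_factory=list)
    synonyms: Dict[str, float] = field(default_factory=dict)
    antonyms: Dict[str, float] = field(default_factory=dict)
    tags: str = ""
    features: Dict[str, str] = field(default_factory=dict)


def terms_payload(terms: List[Term]) -> str:
    """Serialize terms into a {"terms": [...]} body, keeping non-ASCII text."""
    return json.dumps({"terms": [asdict(t) for t in terms]}, ensure_ascii=False)


payload = terms_payload([
    Term("term2", vector=[1.0, 2.0, 3.0], features={"NUM": "P", "GEN": "F"}),
])
```

Passing `ensure_ascii=False` keeps terms such as "मराठी" readable in the body; the string must then be sent encoded as UTF-8.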
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X POST http://localhost:8888/term/get -d '{
"ids": ["मराठी", "term2"]
}'
Sample output
{
"terms" : [
{
"vector" : [
1,
2,
3
],
"frequency" : 1,
"term" : "मराठी",
"antonyms" : {
"bla4" : 0.2,
"bla3" : 0.1
},
"features" : {
"NUM" : "S",
"GEN" : "M"
},
"synonyms" : {
"bla2" : 0.2,
"bla1" : 0.1
},
"tags" : "tag1 tag2"
},
{
"antonyms" : {
"bla3" : 0.1,
"bla4" : 0.2
},
"features" : {
"NUM" : "P",
"GEN" : "F"
},
"term" : "term2",
"frequency" : 1,
"vector" : [
1,
2,
3
],
"synonyms" : {
"bla1" : 0.1,
"bla2" : 0.2
},
"tags" : "tag1 tag2"
}
]
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X DELETE http://localhost:8888/term -d '{
"ids": ["मराठी", "term2"]
}'
Sample output
{
"data" : [
{
"dtype" : "term",
"version" : 2,
"id" : "मराठी",
"index" : "jenny-en-0",
"found" : true
},
{
"dtype" : "term",
"id" : "term2",
"version" : 2,
"found" : true,
"index" : "jenny-en-0"
}
]
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X PUT http://localhost:8888/term -d '{
"terms": [
{
"term": "मराठी",
"frequency": 1.0,
"vector": [1.0, 2.0, 3.0, 4.0],
"synonyms":
{
"bla1": 0.1,
"bla2": 0.2
},
"antonyms":
{
"term2": 0.1,
"bla4": 0.2
},
"tags": "tag1 tag2",
"features":
{
"FEATURE_NEW1": "V",
"GEN": "M"
}
},
{
"term": "term2",
"frequency": 1.0,
"vector": [1.0, 2.0, 3.0, 5.0],
"synonyms":
{
"bla1": 0.1,
"bla2": 0.2
},
"antonyms":
{
"bla3": 0.1,
"bla4": 0.2
},
"tags": "tag1 tag2",
"features":
{
"FEATURE_NEW1": "N",
"GEN": "F"
}
}
]
}'
Sample output
{
"data" : [
{
"version" : 2,
"id" : "मराठी",
"index" : "jenny-en-0",
"created" : false,
"dtype" : "term"
},
{
"index" : "jenny-en-0",
"id" : "term2",
"version" : 2,
"dtype" : "term",
"created" : false
}
]
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X GET http://localhost:8888/term/term -d '{
"term": "मराठी"
}'
Sample output
{
"hits" : {
"terms" : [
{
"vector" : [
1.2,
2.3,
3.4,
4.5
],
"antonyms" : {
"bla4" : 0.2,
"term2" : 0.1
},
"frequency" : 1,
"features" : {
"FEATURE_NEW1" : "V",
"GEN" : "M"
},
"score" : 0.6931471824646,
"tags" : "tag1 tag2",
"term" : "मराठी",
"synonyms" : {
"bla2" : 0.2,
"bla1" : 0.1
}
}
]
},
"total" : 1,
"max_score" : 0.6931471824646
}
Output JSON
Sample call
curl -v -H "Content-Type: application/json" -X GET http://localhost:8888/term/text -d 'term2 मराठी'
Sample output
{
"max_score" : 0.6931471824646,
"hits" : {
"terms" : [
{
"term" : "मराठी",
"score" : 0.6931471824646,
"tags" : "tag1 tag2",
"vector" : [
1.2,
2.3,
3.4,
4.5
],
"features" : {
"GEN" : "M",
"FEATURE_NEW1" : "V"
},
"antonyms" : {
"bla4" : 0.2,
"term2" : 0.1
},
"synonyms" : {
"bla2" : 0.2,
"bla1" : 0.1
},
"frequency" : 1
},
{
"tags" : "tag1 tag2",
"score" : 0.6931471824646,
"term" : "term2",
"features" : {
"FEATURE_NEW1" : "N",
"GEN" : "F"
},
"vector" : [
1.6,
2.7,
3.8,
5.9
],
"antonyms" : {
"bla3" : 0.1,
"bla4" : 0.2
},
"frequency" : 1,
"synonyms" : {
"bla1" : 0.1,
"bla2" : 0.2
}
}
]
},
"total" : 2
}
- Unit tests are available with the sbt test command
- A set of test scripts is present inside scripts/api_test
You might want to start from scratch, and delete all docker images. If you do so (docker images and then docker rmi -f <java/elasticsearch ids>) remember that all data for the Elasticsearch docker are local, and mounted only when the container is up. Therefore you need to:
cd docker-starchat
rm -rf elasticsearch/data/nodes/
If elasticsearch complains about the size of the virtual memory:
max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
elastisearch exited with code 78
run:
sysctl -w vm.max_map_count=262144
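Before starting the containers, the current setting can be checked so that the error is avoided rather than diagnosed afterwards. The Python sketch below assumes a Linux host, where the value is exposed under /proc/sys/vm/max_map_count; both function names are hypothetical.

```python
REQUIRED_MAP_COUNT = 262144  # minimum quoted in the Elasticsearch error above


def needs_increase(raw_value: str, required: int = REQUIRED_MAP_COUNT) -> bool:
    """Return True when the given vm.max_map_count is below the required one."""
    return int(raw_value.strip()) < required


def current_map_count(path: str = "/proc/sys/vm/max_map_count") -> str:
    """Read the current setting on Linux; other systems expose it differently."""
    with open(path) as handle:
        return handle.read()
```

On a Linux host, `needs_increase(current_map_count())` tells whether the sysctl command above has to be run.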