Textual OOD Detection for Intent Classification in the Banking Industry

Source code for the article Textual OOD Detection for Intent Classification in the Banking Industry.

Objective

With the growing number of online banks and digitalized services offered by traditional banks, the need for human contact in customer support is dropping drastically. Deep learning and natural language processing methods now make it possible to answer customers' questions efficiently and precisely (via chatbots, for example) and to come closer to the way a human would answer. However, it is important that these methods neither give wrong information nor attempt to answer questions they are not capable of answering.

The project aims to evaluate several out-of-distribution (OOD) detection methods that address these problems, focusing on the task of intent classification in the banking domain. OOD detection refers to the ability of a model to identify input data that falls outside the distribution of the data it was trained on, and to flag it as potentially unsafe or unreliable. Developing effective OOD detection techniques is critical for ensuring the safety and trustworthiness of large language models.
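To make the idea of flagging inputs concrete, here is a minimal sketch of threshold-based OOD detection. It is not taken from the Todd library: the function names `fit_threshold` and `is_ood`, the quantile rule, and the convention that a higher score means "more likely OOD" are all assumptions made for illustration.

```python
def fit_threshold(in_scores, quantile=0.95):
    """Pick a cutoff from scores of in-distribution examples.

    Assumption: higher score = more OOD-like, so the threshold is set at
    a high quantile of the in-distribution scores.
    """
    ordered = sorted(in_scores)
    # Clamp the index so that quantile=1.0 still points at the last element.
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def is_ood(score, threshold):
    """Flag an input as OOD when its score exceeds the fitted threshold."""
    return score > threshold


# Hypothetical scores for four in-distribution queries.
threshold = fit_threshold([3.0, 1.0, 0.0, 2.0], quantile=0.5)
flagged = is_ood(2.5, threshold)
```

In practice the quantile trades off false alarms on in-distribution queries against missed OOD queries, so it would typically be tuned on held-out data.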

The methodology is based on the Todd library and ToddBenchmark framework.

Getting started

Install the required libraries using

pip install -r requirements.txt

Open the notebook (notebook.ipynb)

Check the performance of different OOD detectors by loading the saved results from the backup folder (see the Results section)

records = {
    model.name: {
        dataset: load_records(model, dataset)
        for dataset in (
            "in_train",
            "out_test",
            "out_atis",
            "out_bitext",
            "out_clinc",
        )
    }
    for model in (Model.BERT, Model.DistilBERT)
}

Fit your own detectors as shown in the Detectors section

Results

The following plots depict the distribution of the scores produced by different OOD scoring methods. Both MahalanobisScorer and CosineProjectionScorer were computed using the BERT-based classifier. The plots for CosineProjectionScorer show only the OUT-DS datasets for readability: its scores for IN-DS entries are so close to -1 that they would obscure the distribution of the other instances. The msp and energy scorers are derived from the classifier's logits.
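As a rough illustration of how scores can be derived from a classifier's logits, the sketch below computes a maximum softmax probability (MSP) score and an energy score for a single logit vector. It is not the Todd implementation: the function names, the `temperature` parameter, and the sign convention (higher score = more OOD-like) are assumptions chosen for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Negative maximum softmax probability.

    Assumed convention: a confident prediction gives a score near -1,
    an uncertain one gives a higher (less negative) score.
    """
    return -max(softmax(logits))


def energy_score(logits, temperature=1.0):
    """Energy: -T * logsumexp(logits / T), computed stably.

    Assumed convention: large logits give a low energy,
    small or flat logits give a higher energy.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * log_sum_exp


# Hypothetical logits for a three-intent classifier.
confident = [10.0, 0.0, 0.0]
uncertain = [1.0, 1.0, 1.0]
```

With these example logits, both scores rank the flat `uncertain` vector as more OOD-like than the peaked `confident` one.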
