An implementation of "Subspace Representations for Soft Set Operations and Sentence Similarities" (NAACL 2024)

yoichi1484/subspace

Subspace Representations for Soft Set Operations and Sentence Similarities

Yoichi Ishibashi, Sho Yokoi, Katsuhito Sudoh, Satoshi Nakamura: Subspace Representations for Soft Set Operations and Sentence Similarities (NAACL, 2024)

Setup

Clone the repository, then install the required packages from its root directory.

cd subspace
pip install -r requirements.txt

Set similarity

Our subspace-based similarity between sentences (each treated as a set of words) can be computed as follows.

Usage

from subspace.tool import SubspaceBERTScore

scorer = SubspaceBERTScore(device='cpu', model_name_or_path='bert-base-uncased')

sentences_a = ["A man with a hard hat is dancing.", "A young child is riding a horse."]
sentences_b = ["A man wearing a hard hat is dancing.", "A child is riding a horse."]

scorer(sentences_a, sentences_b)
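
To give a feel for what a subspace-based set similarity can look like, here is a minimal sketch that uses plain torch and random vectors in place of BERT token embeddings. It assumes a BERTScore-style recipe: average the soft membership of each vector of one set in the subspace spanned by the other set, in both directions, and take the harmonic mean. The helper names `soft_membership_sketch` and `subspace_f_score_sketch` are hypothetical, and this is not the code behind `SubspaceBERTScore`, which may weight or normalize tokens differently.

```python
import torch

def soft_membership_sketch(X, v, tol=1e-5):
    # Orthonormal basis of the row space of X via a reduced SVD (assumed approach).
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    Q = Vh[: int((S > tol * S.max()).sum())]
    # Cosine of the angle between v and the subspace spanned by the rows of X.
    return torch.linalg.norm(Q @ v) / torch.linalg.norm(v)

def subspace_f_score_sketch(X, Y):
    # Average membership of each row of X in span(Y), and of each row of Y in span(X).
    x_in_y = torch.stack([soft_membership_sketch(Y, x) for x in X]).mean()
    y_in_x = torch.stack([soft_membership_sketch(X, y) for y in Y]).mean()
    # Harmonic mean of the two directions, as in an F-score.
    return 2 * x_in_y * y_in_x / (x_in_y + y_in_x)

torch.manual_seed(0)
X = torch.randn((5, 32), dtype=torch.float64)  # stand-in for 5 token vectors
Y = torch.randn((7, 32), dtype=torch.float64)  # stand-in for 7 token vectors

same = subspace_f_score_sketch(X, X)       # 1.0 up to rounding: identical sets
different = subspace_f_score_sketch(X, Y)  # strictly between 0 and 1
```

Identical sets score 1 because every vector lies entirely inside its own span; unrelated sets score lower because only part of each vector falls inside the other subspace.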

STS task

Evaluation experiments on the STS task can be run with SentEval. First, download the evaluation data.

cd SentEval/data/downstream/
bash download_dataset.sh

The evaluation scripts and the computation of correlation coefficients are based on the code of Gao & Yao. To run the evaluation, return to the repository root and run the script:

cd ../../../
bash run_sts.sh

Other set operations

Other subspace-based set operations such as union, intersection, orthogonal complement, and soft membership can be computed as follows using torch.

import torch
from subspace.operations import (
    subspace, orthogonal_complement, intersection, sum_space, soft_membership,
)

torch.manual_seed(0)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
A = torch.rand((50, 300), device=device) # 50 stacked 300-dimensional word vectors
B = torch.rand((80, 300), device=device) # 80 stacked 300-dimensional word vectors

Compute bases of the subspace

SA = subspace(A)
SA.shape # torch.Size([50, 300])
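
For intuition, an orthonormal basis of the subspace spanned by stacked word vectors can be obtained from a singular value decomposition. The sketch below is an illustration in plain torch, not the implementation of `subspace`; the helper name `orthonormal_basis` and the tolerance `tol` are assumptions.

```python
import torch

def orthonormal_basis(X, tol=1e-5):
    # Reduced SVD: X = U @ diag(S) @ Vh, where the rows of Vh are orthonormal.
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    # Keep only directions whose singular value is non-negligible.
    rank = int((S > tol * S.max()).sum())
    return Vh[:rank]

torch.manual_seed(0)
A = torch.rand((50, 300), dtype=torch.float64)  # 50 stacked 300-dimensional vectors
basis = orthonormal_basis(A)
print(basis.shape)  # torch.Size([50, 300])
```

The rows of `basis` are mutually orthogonal unit vectors that span the same subspace as the rows of `A`.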

Compute bases of the orthogonal complement

A_NOT = orthogonal_complement(A)
A_NOT.shape # torch.Size([250, 300])
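
The shape above follows from dimension counting: 50 independent vectors in a 300-dimensional space leave a 250-dimensional complement. A minimal sketch of one way to get such a basis, assuming a full SVD is used (the helper name `complement_basis` is hypothetical and this is not necessarily how `orthogonal_complement` works):

```python
import torch

def complement_basis(X, tol=1e-5):
    # Full SVD: the rows of Vh form an orthonormal basis of the whole space.
    _, S, Vh = torch.linalg.svd(X, full_matrices=True)
    rank = int((S > tol * S.max()).sum())
    # The first `rank` rows span the row space of X; the rest span its complement.
    return Vh[rank:]

torch.manual_seed(0)
A = torch.rand((50, 300), dtype=torch.float64)
A_perp = complement_basis(A)
print(A_perp.shape)  # torch.Size([250, 300])
```

Every row of `A_perp` is orthogonal to every row of `A`, so `A @ A_perp.T` is zero up to rounding.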

Compute bases of the intersection

A_AND_B = intersection(A, B)
A_AND_B.shape # torch.Size([1, 300])
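
One common way to find directions shared by two subspaces is through principal angles: the singular values of the product of two orthonormal bases are the cosines of those angles, and a cosine close to 1 marks a shared direction. The sketch below illustrates that idea on a small hand-built example. The helper names and the `threshold` value are assumptions, and the package's `intersection` may use a different criterion.

```python
import torch

def orthonormal_basis(X, tol=1e-5):
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    return Vh[: int((S > tol * S.max()).sum())]

def intersection_basis(X, Y, threshold=0.99):
    QX, QY = orthonormal_basis(X), orthonormal_basis(Y)
    # Singular values of QX @ QY.T are the cosines of the principal angles.
    U, S, _ = torch.linalg.svd(QX @ QY.T)
    k = int((S >= threshold).sum())
    # Map the leading left singular vectors back to the ambient space.
    return U[:, :k].T @ QX

e = torch.eye(6, dtype=torch.float64)
A = torch.stack([e[0] + e[2], e[1], e[2]])  # spans e0, e1, e2
B = torch.stack([e[2] + e[3], e[3], e[4]])  # spans e2, e3, e4
common = intersection_basis(A, B)
print(common.shape)  # torch.Size([1, 6])
```

Here the only shared direction is `e[2]`, so the returned basis has a single row equal to `e[2]` up to sign.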

Compute bases of the sum space

A_OR_B = sum_space(A, B)
A_OR_B.shape # torch.Size([130, 300])
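
The sum space is the span of both sets of vectors together, so a basis can be sketched by stacking the two matrices and orthonormalizing the result. This is an illustration under that assumption, with hypothetical helper names, and not the code of `sum_space`.

```python
import torch

def orthonormal_basis(X, tol=1e-5):
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    return Vh[: int((S > tol * S.max()).sum())]

def sum_space_basis(X, Y):
    # The span of the stacked rows is the sum of the two subspaces.
    return orthonormal_basis(torch.cat([X, Y], dim=0))

torch.manual_seed(0)
A = torch.rand((50, 300), dtype=torch.float64)
B = torch.rand((80, 300), dtype=torch.float64)
S_sum = sum_space_basis(A, B)
print(S_sum.shape)  # torch.Size([130, 300])
```

With 50 and 80 random vectors in 300 dimensions, the stacked rows are linearly independent, which is why the basis has 50 + 80 = 130 rows.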

Compute soft membership degree

v = torch.rand(300, device=device)
soft_membership(A, v) # tensor(0.89)
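
A natural reading of a soft membership degree is the cosine of the angle between a vector and a subspace: 1 when the vector lies inside the subspace, 0 when it is orthogonal to it. The sketch below computes that quantity as the length of the projection of `v` onto the subspace divided by the length of `v`. It assumes this definition; `soft_membership_sketch` is a hypothetical helper, not the package function.

```python
import torch

def orthonormal_basis(X, tol=1e-5):
    _, S, Vh = torch.linalg.svd(X, full_matrices=False)
    return Vh[: int((S > tol * S.max()).sum())]

def soft_membership_sketch(X, v):
    Q = orthonormal_basis(X)
    # Q @ v holds the coordinates of the projection of v in the orthonormal basis,
    # so its norm is the length of the projection.
    return torch.linalg.norm(Q @ v) / torch.linalg.norm(v)

e = torch.eye(4, dtype=torch.float64)
A = torch.stack([e[0], e[0] + e[1]])  # spans the e0-e1 plane

inside = soft_membership_sketch(A, e[1])          # 1.0: fully inside the plane
outside = soft_membership_sketch(A, e[2])         # 0.0: orthogonal to the plane
halfway = soft_membership_sketch(A, e[1] + e[2])  # cos(45 degrees), about 0.7071
```

Unlike a hard set membership test, the value varies continuously with the angle between the vector and the subspace.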

Note

The previous numpy-based operations have been moved to the subspace/legacy_operations folder, where they remain available if you still need them.
