Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Really high false negative rate in ScalableCuckooFilter #50

Open
duongcongtoai opened this issue Nov 21, 2024 · 4 comments
Open

Really high false negative rate in ScalableCuckooFilter #50

duongcongtoai opened this issue Nov 21, 2024 · 4 comments

Comments

@duongcongtoai
Copy link

The feature of having a scalable cuckooFilter is really convenient but for my usecase the application cannot tolerate false negative.
To reproduce

func Test_ScalableCuckooFilter_FalseNegative(t *testing.T) {
	filter := NewScalableCuckooFilter()

	// filter := NewFilter(100000)
	exist := []string{}
	removed := []string{}
	for i := 0; i < 15000; i++ {
		id := fmt.Sprintf("%d-", i) + uuid.NewString()
		exist = append(exist, id)
		removed = append(removed, id+"to-removed")
	}
	for i := 0; i < len(exist); i++ {
		filter.Insert([]byte(exist[i]))
	}
	for i := 0; i < len(exist); i++ {
		filter.Insert([]byte(removed[i]))
	}

	for i := 0; i < len(exist); i++ {
		filter.Delete([]byte(removed[i]))
	}
	for i := 0; i < len(exist); i++ {
		filter.Insert([]byte(removed[i]))
	}
	falseNegatives := []int{}
	for i := 0; i < len(exist); i++ {
		exist := filter.Lookup([]byte(exist[i]))
		if !exist {
			falseNegatives = append(falseNegatives, i)
		}
	}
	assert.Equal(t, 0, len(falseNegatives))
}

And the result

    scalable_cuckoofilter_test.go:43: 
        	Error Trace:	scalable_cuckoofilter_test.go:43
        	Error:      	Not equal: 
        	            	expected: 0
        	            	actual  : 160
        	Test:       	Test_ScalableCuckooFilter_FalseNegative
--- FAIL: Test_ScalableCuckooFilter_FalseNegative (0.02s)
@duongcongtoai
Copy link
Author

duongcongtoai commented Nov 21, 2024

I think this happens because during Delete, it tries to remove the element from all the children, and accidentally remove the footprint of other elements in these children

	for _, filter := range sf.filters {
		if filter.Delete(data) {
			return true
		}
	}

@duongcongtoai
Copy link
Author

https://cis.temple.edu/~wu/research/publications/Publication_files/Chen_ICNP_2017.pdf

According to the section "ReliableDelete" the fingerprint computation across children filter should be the same, but this is not the case

func getIndexAndFingerprint(data []byte, bucketPow uint) (uint, fingerprint) {

Each time a new filter is created, bucketPow changes

@duongcongtoai
Copy link
Author

func configure(sfilter *ScalableCuckooFilter) {
	if sfilter.loadFactor == 0 {
		sfilter.loadFactor = DefaultLoadFactor
	}
	if sfilter.scaleFactor == nil {
		sfilter.scaleFactor = func(currentSize uint) uint {
			return currentSize * bucketSize * 2
		}
	}
	if sfilter.filters == nil {
		initFilter := NewFilter(DefaultCapacity)
		sfilter.filters = []*Filter{initFilter}
	}
}

If i make the scaleFactor to remain the same size as the previous filter, the test mentioned in this issue passes

@duongcongtoai
Copy link
Author

duongcongtoai commented Nov 24, 2024

But after adding this, the false positive rate also increases, which is also mentioned by the paper (false positive correlates with the count of the children bloom filter and fingerprint size)

I'm also opening a PR to this fork: https://github.com/panmari/cuckoofilter, which has large fingerprint size and provide more reliable FP rate.
Example is at: https://github.com/duongcongtoai/cuckoofilter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant