I hope someone can share some knowledge here.
I've been optimising my BSGS implementation.
iceland uses a bloom filter for his BSGS; in keyhunt we could implement a
cuckoo filter too.
Bloom filters: O(k) for adding an element to the set and O(k) for testing membership, where k is the number of hash functions used.
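For reference, here is a minimal Bloom filter sketch showing those O(k) operations. This is an illustration only, not iceland's or keyhunt's actual code; the class name, sizing, and SHA-1-based hashing are my own choices:

```python
import hashlib

class BloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _hash(self, element, i):
        # Derive the ith hash function by salting SHA-1 with the index
        m = hashlib.sha1()
        m.update(element.encode())
        m.update(str(i).encode())
        return int(m.hexdigest(), 16) % self.size

    def add(self, element):
        # O(k): set one bit per hash function
        for i in range(self.num_hashes):
            self.bits[self._hash(element, i)] = True

    def contains(self, element):
        # O(k): all k bits must be set; can report false positives,
        # never false negatives
        return all(self.bits[self._hash(element, i)]
                   for i in range(self.num_hashes))

bf = BloomFilter(size=1024, num_hashes=3)
bf.add("apple")
print(bf.contains("apple"))  # True
print(bf.contains("pear"))   # False with high probability
```

Note there is no delete method: clearing a bit could erase other elements that share it, which is exactly the limitation the cuckoo filter addresses below.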
A cuckoo filter is a probabilistic data structure used for membership testing. It is similar to a Bloom filter, but it offers improved space efficiency and the ability to delete elements from the filter.
Like a Bloom filter, a cuckoo filter uses hash functions to map elements to positions in a table, but instead of setting single bits it stores a short fingerprint of each element. Every element has two candidate buckets, and each bucket can hold more than one fingerprint. Because fingerprints carry more information per stored entry than scattered bits, a cuckoo filter can be more space-efficient than a Bloom filter at low false-positive rates.
In addition to membership testing, cuckoo filters support adding and deleting elements. When an element is added, its fingerprint is placed in one of its two candidate buckets. If both buckets are full, an existing fingerprint is "kicked out" and relocated to its own alternate bucket, and this eviction chain continues until every fingerprint finds a free slot or a retry limit is reached. Deletion simply removes the element's fingerprint from whichever bucket holds it.
Cuckoo filters have a time complexity of O(1) for membership tests and a space complexity of O(n), where n is the number of elements stored in the filter. They are a good choice for applications that need fast membership testing and the ability to add and delete elements, but have limited space available for storing the filter.
import hashlib
import math

class CuckooFilter:
    def __init__(self, capacity, false_positive_rate):
        # Size the table and pick the number of hash functions from the
        # desired capacity and false positive rate
        self.size = int(-1 * capacity * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.num_hashes = int(round(math.log(2) * self.size / capacity))
        # Each position holds two bucket slots
        self.buckets = [[None, None] for _ in range(self.size)]

    def add(self, element):
        # Try each candidate position in turn
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] is None:
                self.buckets[h][0] = element
                return
            elif self.buckets[h][1] is None:
                self.buckets[h][1] = element
                return
            else:
                # Both slots occupied: "kick out" one element and carry it
                # on to the next candidate position (simplified eviction)
                evicted = self.buckets[h][i % 2]
                self.buckets[h][i % 2] = element
                element = evicted
        # No free slot found for the last evicted element
        raise Exception("Filter is full")

    def contains(self, element):
        # Check both bucket slots at every candidate position
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] == element or self.buckets[h][1] == element:
                return True
        return False

    def delete(self, element):
        # Clear the slot holding the element, if present
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] == element:
                self.buckets[h][0] = None
                return
            elif self.buckets[h][1] == element:
                self.buckets[h][1] = None
                return

    def _hash(self, element, i):
        # ith hash function: SHA-1 salted with the index i
        m = hashlib.sha1()
        m.update(element.encode())
        m.update(str(i).encode())
        return int(m.hexdigest(), 16) % self.size
# Create a cuckoo filter with a capacity of 100 elements and a false positive rate of 0.1
cf = CuckooFilter(capacity=100, false_positive_rate=0.1)

# Add some elements to the filter
cf.add("apple")
cf.add("banana")
cf.add("cherry")

# Test for membership
print("is apple in filter:", cf.contains("apple"))  # True
print("is pear in filter:", cf.contains("pear"))    # False

# Delete an element from the filter
cf.delete("cherry")

# Test for membership again
print("is cherry in filter after delete:", cf.contains("cherry"))  # False
Both cuckoo filters and Bloom filters run in constant time for their core operations, independent of the number of stored elements, which makes both very fast data structures. There are some differences, though:
Insertion: inserting into a cuckoo filter is O(1) on average, but as the filter fills up the eviction chains grow longer, and an insert can eventually fail, forcing a rebuild with a larger table. Inserting into a Bloom filter is always O(k) bit-sets, with no failure mode.
Lookup: both lookups are O(1). A cuckoo filter only has to inspect two candidate buckets (typically two cache lines), while a Bloom filter probes k scattered bit positions, so cuckoo lookups are often somewhat faster in practice.
Deletion: cuckoo filters support deleting items in O(1) by removing the element's fingerprint. Plain Bloom filters cannot delete at all, because clearing shared bits would also erase other elements.
Overall, both structures give O(1) membership tests. Cuckoo filters tend to be a bit faster on lookups and support deletion, at the cost of a more complicated insert path that can fail when the filter gets full.
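As an aside on the deletion point: if deletion is the only thing missing, it can also be retrofitted to a Bloom filter by replacing each bit with a small counter (a "counting Bloom filter"). A minimal sketch, with illustrative names, trading roughly 4-8x the space for delete support:

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.counts = [0] * size  # counters instead of bits

    def _hash(self, element, i):
        # ith hash function: SHA-1 salted with the index i
        m = hashlib.sha1()
        m.update(element.encode())
        m.update(str(i).encode())
        return int(m.hexdigest(), 16) % self.size

    def add(self, element):
        for i in range(self.num_hashes):
            self.counts[self._hash(element, i)] += 1

    def delete(self, element):
        # Decrement instead of clearing, so elements sharing a position
        # survive; deleting an element that was never added corrupts the filter
        for i in range(self.num_hashes):
            self.counts[self._hash(element, i)] -= 1

    def contains(self, element):
        return all(self.counts[self._hash(element, i)] > 0
                   for i in range(self.num_hashes))

cbf = CountingBloomFilter(size=1024, num_hashes=3)
cbf.add("apple")
print(cbf.contains("apple"))  # True
cbf.delete("apple")
print(cbf.contains("apple"))  # False
```

Whether that space overhead beats a cuckoo filter for BSGS tables is exactly the kind of trade-off worth benchmarking.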
What do you think? What is your opinion?