Bitcoin Forum
Author Topic: what is best algorithm similiar Bloom filter  (Read 111 times)
ecdsa123 (OP)
Full Member

Activity: 211
Merit: 105

January 07, 2023, 12:22:02 PM
Merited by LeGaulois (3), ABCbits (1)
 #1


I hope that someone can share their knowledge.

I'm using an optimised BSGS.

iceland uses a Bloom filter for his BSGS; in keyhunt we could implement a cuckoo filter too.

Bloom filters: O(k) for adding an element to the set and O(k) for testing membership, where k is the number of hash functions used.
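The O(k) behaviour is easy to see in a minimal sketch (the SHA-256-with-salt hash construction and the sizes here are illustrative choices, not any particular library's implementation):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k salted hashes over a plain bit list."""

    def __init__(self, size, num_hashes):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _hash(self, element, i):
        # ith hash = SHA-256 of the element salted with the hash index
        m = hashlib.sha256()
        m.update(element.encode())
        m.update(str(i).encode())
        return int(m.hexdigest(), 16) % self.size

    def add(self, element):
        # O(k): set one bit per hash function
        for i in range(self.num_hashes):
            self.bits[self._hash(element, i)] = True

    def contains(self, element):
        # O(k): all k bits must be set; false positives are possible,
        # false negatives are not
        return all(self.bits[self._hash(element, i)]
                   for i in range(self.num_hashes))

bf = BloomFilter(size=1024, num_hashes=3)
bf.add("apple")
print(bf.contains("apple"))  # True
print(bf.contains("pear"))   # almost certainly False at this low load
```

Note there is no delete: clearing a bit could also clear it for some other element that hashes to the same position, which is exactly the limitation cuckoo filters address.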

A cuckoo filter is a probabilistic data structure used for membership testing. It is similar to a Bloom filter, but it offers better space efficiency at low false positive rates and the ability to delete elements from the filter.

Like a Bloom filter, a cuckoo filter uses hash functions to map elements to positions in the filter, and it stores a small value to indicate the presence of an element. In a cuckoo filter that value is a short "fingerprint" of the element, and every element has two candidate buckets it may live in, each of which can hold a few fingerprints. Because only fingerprints are stored rather than whole elements, the filter can be more space-efficient than a Bloom filter at the same false positive rate.

In addition to membership testing, cuckoo filters support adding and deleting elements. When an element is added, its fingerprint is stored in one of its two candidate buckets. If both buckets are already full, an existing fingerprint is "kicked out" and relocated to its own alternate bucket, and this can cascade until an empty slot is found or a maximum number of kicks is reached (at which point the filter is considered full). Deletion works by removing the element's fingerprint from whichever candidate bucket holds it.

Cuckoo filters have an expected time complexity of O(1) for membership tests and a space complexity of O(n), where n is the number of elements stored. They are a good choice for applications that need fast membership testing plus the ability to add and delete elements, with limited space available for the filter.

Code:
import hashlib
import math

class CuckooFilter:
    def __init__(self, capacity, false_positive_rate):
        # Calculate the size of the filter and the number of hash functions needed
        # based on the desired capacity and false positive rate
        self.size = int(-1 * capacity * math.log(false_positive_rate) / (math.log(2) ** 2))
        self.num_hashes = int(round(math.log(2) * self.size / capacity))
        self.buckets = [[None, None] for _ in range(self.size)]
        #print(self.size,self.num_hashes)
    def add(self, element):
        # Hash the element to one of the two buckets at each position
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] is None:
                self.buckets[h][0] = element
                return
            elif self.buckets[h][1] is None:
                self.buckets[h][1] = element
                return
            # If both buckets are occupied, "kick out" one of the elements
            # and rehash it to a new position
            else:
                e = self.buckets[h][i % 2]
                self.buckets[h][i % 2] = element
                element = e
        # If the element could not be added to the filter, raise an exception
        raise Exception("Filter is full")
        
    def contains(self, element):
        # Check if the element is present in either of the two buckets at each position
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] == element or self.buckets[h][1] == element:
                return True
        return False
    
    def delete(self, element):
        # Set the value in the occupied bucket to None to indicate that the element is deleted
        for i in range(self.num_hashes):
            h = self._hash(element, i)
            if self.buckets[h][0] == element:
                self.buckets[h][0] = None
                return
            elif self.buckets[h][1] == element:
                self.buckets[h][1] = None
                return
        
    def _hash(self, element, i):
        # Hash the element to a position in the filter using the ith hash function
        m = hashlib.sha1()
        m.update(element.encode())
        m.update(str(i).encode())
        return int(m.hexdigest(), 16) % self.size

 

# Create a cuckoo filter with a capacity of 100 elements and a false positive rate of 0.1
cf = CuckooFilter(capacity=100, false_positive_rate=0.1)

# Add some elements to the filter
cf.add("apple")
cf.add("banana")
cf.add("cherry")

# Test for membership
print("is apple in filter", cf.contains("apple"))  # True
print("is pear in filter", cf.contains("pear"))    # False (whole elements are compared, so no false positive here)

# Delete an element from the filter
cf.delete("cherry")

# Test for membership again
print("is cherry in filter after delete", cf.contains("cherry"))  # False


Both cuckoo filters and Bloom filters have a time complexity of O(1) for their core operations, independent of the number of stored elements, which makes both very fast data structures.

However, there are some differences between the two in practice:

    Insertion: Inserting an item into a cuckoo filter is O(1) on average, but as the filter fills up an insert can trigger a long chain of evictions and may eventually fail, at which point the filter must be resized or rebuilt. Inserting into a Bloom filter is always O(k) with no failure mode; the false positive rate simply degrades as it fills.

    Lookup: A cuckoo filter lookup probes exactly two buckets. A Bloom filter lookup evaluates up to k hash functions and touches up to k scattered bits, so for small k the two are comparable; which is faster in practice depends on cache behaviour and on the mix of positive and negative queries.

    Deletion: Cuckoo filters support deleting items, while standard Bloom filters do not. Deleting from a cuckoo filter is O(1): remove the fingerprint from one of the two candidate buckets.

Overall, both are constant-time structures. Cuckoo filters additionally support deletion and often use less space at low false positive rates, while Bloom filters have simpler inserts that never fail.
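To make the two-bucket lookup and the eviction chain concrete, here is a sketch of a cuckoo filter in the usual fingerprint / partial-key form. The bucket size, 1-byte fingerprint, and `max_kicks` limit are arbitrary illustration choices; the XOR trick for the alternate bucket requires a power-of-two table size.

```python
import hashlib
import random

def _h(data: bytes) -> int:
    # 64-bit hash derived from SHA-256 (illustrative choice)
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # power-of-two table so that i ^ h(fp) stays a valid involution mod m
        assert num_buckets & (num_buckets - 1) == 0
        self.m = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    def _fingerprint(self, item: str) -> int:
        # 1-byte fingerprint, never zero
        return (_h(b"fp" + item.encode()) % 255) + 1

    def _indices(self, item: str):
        fp = self._fingerprint(item)
        i1 = _h(item.encode()) % self.m
        # partial-key cuckoo hashing: i2 is derivable from (i1, fp) alone
        i2 = (i1 ^ _h(bytes([fp]))) % self.m
        return fp, i1, i2

    def add(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # both buckets full: evict a random fingerprint and relocate it,
        # cascading for at most max_kicks steps
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = random.randrange(len(self.buckets[i]))
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = (i ^ _h(bytes([fp]))) % self.m  # evicted fp's alternate bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # table is effectively full

    def contains(self, item: str) -> bool:
        # exactly two bucket probes, positive or negative
        fp, i1, i2 = self._indices(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._indices(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False

cf = CuckooFilter()
cf.add("apple")
cf.add("banana")
print(cf.contains("apple"))   # True
cf.delete("banana")
print(cf.contains("banana"))  # almost certainly False (fingerprint removed)
```

Unlike the element-storing version above, this variant keeps only fingerprints, so a lookup can return a false positive when two elements share a fingerprint and a candidate bucket; that is the price paid for the space savings.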


What do you think? What is your opinion?

Donate: bc1q0sezldfgm7rf2r78p5scasrrcfkpzxnrfcvdc6

Subscribe : http://www.youtube.com/@Ecdsa_Solutions
gmaxwell
Moderator
Legendary
expert

Activity: 4158
Merit: 8382

January 10, 2023, 03:54:03 AM
Merited by ABCbits (4), LeGaulois (3)
 #2

When the true positive rate is very low, Bloom filters can be much faster than expected, because most queries are rejected by the first couple of bits read. In that case a blocked Bloom filter can easily outperform a cuckoo filter, since a negative result always requires two random memory lookups in the cuckoo filter. In exchange for memory usage, Bloom filters can also be made faster for mostly-negative query loads by running them under-full, so that the probability of any bit being falsely set is extremely low.
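The blocked-Bloom-filter idea mentioned here can be sketched as follows: a single hash selects one cache-line-sized block, and all k bit positions fall inside that block, so any query (positive or negative) touches one 64-byte region of memory. The parameters and the digest-splitting scheme are illustrative assumptions:

```python
import hashlib

class BlockedBloomFilter:
    # Each element's k bits all land in one 512-bit block (one 64-byte
    # cache line), so a lookup touches a single region of memory.
    BLOCK_BITS = 512

    def __init__(self, num_blocks=1024, num_hashes=4):
        self.num_blocks = num_blocks
        self.num_hashes = num_hashes
        self.blocks = [0] * num_blocks  # one Python int per 512-bit block

    def _hashes(self, element):
        d = hashlib.sha256(element.encode()).digest()
        # first 8 digest bytes pick the block ...
        block = int.from_bytes(d[:8], "big") % self.num_blocks
        # ... and the next bytes give k bit positions within that block
        bits = [int.from_bytes(d[8 + 2 * i: 10 + 2 * i], "big") % self.BLOCK_BITS
                for i in range(self.num_hashes)]
        return block, bits

    def add(self, element):
        block, bits = self._hashes(element)
        for b in bits:
            self.blocks[block] |= 1 << b

    def contains(self, element):
        block, bits = self._hashes(element)
        word = self.blocks[block]
        # a negative query typically fails on the first unset bit tested
        return all(word >> b & 1 for b in bits)

bbf = BlockedBloomFilter()
bbf.add("apple")
print(bbf.contains("apple"))  # True
```

The trade-off is a slightly worse false positive rate than a plain Bloom filter of the same size, because all of an element's bits are concentrated in one block.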

The fine details of the implementation can easily matter more than the overall data structure.