In one of my previous posts on Bloom Filters, I stated the expressions for both the False Positive rate, i.e. $$p \approx \left( 1 - e^{\frac{-kn}{m}} \right)^{k}$$ and the optimal number of hash functions, i.e. $$k = \frac{m}{n} \ln 2$$ In this post I will detail the derivations of both expressions. Derivation of the False…
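
As a quick numerical check of the two expressions quoted in the excerpt above, here is a minimal Python sketch. The function names and the example values of `m` (bits in the filter) and `n` (inserted items) are illustrative assumptions, not taken from the original post.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate p = (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_num_hashes(m: int, n: int) -> float:
    """Real-valued optimum k = (m / n) * ln 2 (round it to use in practice)."""
    return (m / n) * math.log(2)


# Hypothetical sizing: 10,000 bits holding 1,000 items.
m, n = 10_000, 1_000
k_opt = optimal_num_hashes(m, n)   # real-valued optimum
k = round(k_opt)                   # nearest usable integer
p = false_positive_rate(m, n, k)

print(f"optimal k = {k_opt:.3f}, rounded k = {k}, p = {p:.5f}")
```

With these assumed values the rounded optimum also gives a lower rate than its integer neighbours, which is the behaviour the second expression predicts.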

# Category: Learning

## Bridging the Graph

I’ve been lucky enough to have worked on many interesting and challenging projects throughout my career, especially in the space of detecting “bad” people. As one can imagine, much focus is placed on detecting persons of interest in the government sector. After all, you can’t ensure national security, for instance, without being able to detect…

## In Bloom – Nirvana?

I recently discussed how Confidential Computing allows us to analyse and use data without actually seeing it. A key component of Confidential Computing is Privacy-Preserving Record Linkage (PPRL), and one of the most widely used techniques within PPRL is the Bloom Filter (BF), which is the focus of this post. The aim is to give an overview of…

## Matrix Calculus: The Mathematics of ‘Learning’

Over the years, I’ve worked with talented Data Scientists whose backgrounds weren’t in typical quantitative disciplines, such as mathematics or statistics. I’ve had the privilege of helping some of them better understand the underlying mathematics behind many commonly used Machine Learning and Deep Learning algorithms. This, along with the current growth in the popularity of…