Bloom filters


Today's xkcd:

According to Wikipedia,

A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not – in other words, a query returns either "possibly in set" or "definitely not in set". […]

This is an all-too-common situation in forensic applications, though the reason has nothing to do with the Bloom filter hash-function method. To take a simple example, suppose that a video recording shows that someone is 6'1", give or take an inch. If a suspect is 6'1", they're "possibly in set", though that's not strong evidence of guilt, since there are lots of people that size. But if they're 5'4", then they're "definitely not in set", at least if the measurements are accurate.
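For readers who haven't met the data structure itself, here is a minimal sketch in Python of how the two answers in the Wikipedia definition arise. It is a toy, not a reference implementation: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, the example names, and the choice of salted SHA-256 to stand in for several independent hash functions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical names): a bit array plus k hash functions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # Assumption: derive k bit positions by salting one SHA-256 hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Adding an item sets all of its bits; bits are never cleared.
        for pos in self._positions(item):
            self.bits[pos] = True

    def query(self, item):
        # If any of the item's bits is unset, the item was never added.
        # If all are set, other items may have set them, so we cannot be sure.
        if all(self.bits[pos] for pos in self._positions(item)):
            return "possibly in set"
        return "definitely not in set"


suspects = BloomFilter()
for name in ["alice", "bob", "carol"]:
    suspects.add(name)

print(suspects.query("alice"))    # an added item always gets "possibly in set"
print(suspects.query("mallory"))  # usually "definitely not in set", but a false positive can occur
```

The asymmetry comes from the fact that bits are only ever switched on. An item that was added can never look absent, but an item that was never added can look present if other items happen to have set all of its bits, much as a suspect can happen to share the height seen in the video.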

In my opinion, a more complicated version of the same thing applies to forensic speaker identification.

The "beyond a reasonable doubt" standard of proof introduces a further asymmetry in criminal cases.





  1. bks said,

    May 21, 2024 @ 7:16 am

    Speaking of forensics, we're hoping you investigate the claim that OpenAI faked Scarlett Johansson's voice for ChatGPT:

  2. Michael W said,

    May 21, 2024 @ 8:44 am

    Looks like the feet & inches markers are transposed there. Though that raises an interesting linguistic point, maybe, since transposition of small symbols could make a lot of difference (unless that is the actual point and I missed it — the 6 inch + 1 foot person is well outside the set, and the 5 inch + 4 foot person is closer, though I would think 5"6' to be a better example if it were).
    One could also imagine phrases nearly matching but not in the same set, like "it's 40 degrees out, you should wear a warm coat" and "It's 40 degrees out! You shouldn't wear a warm coat".

  3. David Marjanović said,

    May 21, 2024 @ 5:00 pm

    Alt-text: "Sometimes, you can tell Bloom filters are the wrong tool for the job, but when they're the right one you can never be sure."

  4. Mark Liberman said,

    May 23, 2024 @ 5:08 pm

    @Michael W: "Looks like the feet & inches markers are transposed there."

    Indeed — thanks for catching the mistake, which is now fixed…
