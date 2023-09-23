

The Large Language Model DistilBert is "a distilled version of BERT: smaller, faster, cheaper and lighter".

A trained DistilBert model is available from Hugging Face, and recommended applications include "text classification", with the featured application being "sentiment analysis":

And as with many similar applications, it's been noted that this version of "sentiment analysis" has picked up lots of (sometimes unexpected?) biases from its training material, like strong preferences among types of ethnic food.

This led Zack Ives to create an example for his AI course, based on the widely reported Gen-Z shift in iPhone-vs-Android attitudes:

Since Hugging Face's DistilBert model page includes an interactive sentiment-analysis web app, you can try it out without bothering to go through the (rather simple) process of installing and running the system as a program. The results look slightly different from what Zack found, for various reasons, but the biases are still the same:

The ethnic-food biases come out weaker than in the previously-linked slide, presumably because of changes in the model since 2017 when the slide was created:

But there are still some very strong (and controversial) food-sentiment biases:

And also these:

This "DistilBERT base uncased finetuned SST-2 model", downloaded 3,727,871 times last month, seems to date from 2022:

@misc {hf_canonical_model_maintainers_2022,
  author    = { {HF Canonical Model Maintainers} },
  title     = { distilbert-base-uncased-finetuned-sst-2-english (Revision bfdd146) },
  year      = 2022,
  url       = { https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english },
  doi       = { 10.57967/hf/0181 },
  publisher = { Hugging Face }
}
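
For readers who would rather run the model cited above as a program than use the web app, here is a minimal sketch in Python. It assumes the `transformers` library (plus a backend such as PyTorch) is installed and uses its `pipeline("sentiment-analysis", ...)` call with the model name from the citation. The helper names (`signed_score`, `compare_sentiment`, `load_classifier`) and the template sentence are hypothetical, made up for illustration; they are not Zack's examples and not the ones tried in this post.

```python
from typing import Callable, Dict, List

# Model name taken from the citation above.
MODEL_ID = "distilbert-base-uncased-finetuned-sst-2-english"


def signed_score(result: Dict) -> float:
    """Fold a {'label', 'score'} result into one number in [-1, 1].

    POSITIVE keeps its score; any other label (here NEGATIVE) is negated.
    """
    sign = 1.0 if result["label"].upper() == "POSITIVE" else -1.0
    return sign * float(result["score"])


def compare_sentiment(
    classifier: Callable[[List[str]], List[Dict]],
    template: str,
    fillers: List[str],
) -> Dict[str, float]:
    """Score one template sentence with each filler substituted in.

    Only the filler changes between sentences, so differences in the
    scores reflect what the model associates with the filler itself.
    """
    sentences = [template.format(filler) for filler in fillers]
    results = classifier(sentences)
    return {f: signed_score(r) for f, r in zip(fillers, results)}


def load_classifier():
    """Load the real model; downloads the weights on first use."""
    from transformers import pipeline  # assumed installed: pip install transformers torch

    return pipeline("sentiment-analysis", model=MODEL_ID)


def main() -> None:
    classifier = load_classifier()
    # Hypothetical probe sentences, chosen only for illustration.
    scores = compare_sentiment(
        classifier, "My friend uses {}.", ["an iPhone", "an Android phone"]
    )
    for filler, score in scores.items():
        print(f"{filler:>20}  {score:+.3f}")
```

Calling `main()` prints one signed score per filler. Swapping in other templates and fillers (food types, for example) is a quick way to probe the same kind of bias, though the exact numbers may differ from those shown by the web app.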

But the sentiment-bias problem doesn't seem to have improved much since Robyn Speer wrote "How to make a racist AI without really trying" in July of 2017.

See also "Stochastic parrots", 6/10/2021.

[Note that "sentiment analysis" is supposed to tell us about the emotional valence intended by the writer of a given analyzed text, not a summary of popular opinion overall. The examples above don't seem to give a good estimate of either definition of "sentiment", but they're particularly unlikely to tell us anything worthwhile about the attitudes of the author of such texts.]

