Leena Rao at TechCrunch points out a case where semantic search turned into anti-semitic search.

This morning I wrote about NetBase Solutions’ healthBase, a semantic search engine that aggregates medical content from millions of authoritative health sites including WebMD, Wikipedia, and PubMed. But is it a semantic engine or an anti-semitic search engine?

Several of our readers tested out the site and found that healthBase’s semantic search engine has some major glitches (see the comments). One of the most unfortunate examples is when you type in a search for “AIDS,” one of the listed causes of the disease is “Jew.” Really.

The ridiculousness continues. When you click on Jew, you can see proper “Treatments” for Jews, “Drugs And Medications” for Jews and “Complications” for Jews. Apparently, “alcohol” and “coarse salt” are treatments to get rid of Jews, as is Dr. Pepper! Who knew?

Apparently this was not the result of amalgamating medical advice from Hamas, but rather a consequence of some artificial stupidity applied to Wikipedia, as a company representative explained:

This is an unfortunate example of homonymy, i.e. words that have different meanings.
The showcase was not configured to distinguish between the disease “AIDS” and the verb “aids” (as in aiding someone). If you click on the result “Jew” you see a sentence from a Wikipedia page about 7th Century history: “Hispano-Visigothic king Egica accuses the Jews of aiding the Muslims, and sentences all Jews to slavery. ” Although Wikipedia contains a lot of great health information it also contains non-health related information (like this one) that is hard to filter out.
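To make the failure mode concrete, here is a minimal sketch of how a matcher that lowercases and stems everything would conflate the disease with the verb, and how respecting capitalization in an all-caps query would avoid it. This is not NetBase's code: the `crude_stem`, `naive_match`, and `case_aware_match` functions are hypothetical, and the second example sentence is invented for illustration.

```python
import re

# The Wikipedia sentence quoted by the company, plus an invented medical sentence.
HISTORY = ("Hispano-Visigothic king Egica accuses the Jews of aiding "
           "the Muslims, and sentences all Jews to slavery.")
MEDICAL = "Antiretroviral therapy slows the progression of AIDS."


def tokens(text):
    """Split text into alphabetic tokens, preserving case."""
    return re.findall(r"[A-Za-z]+", text)


def crude_stem(token):
    """Hypothetical stemmer: lowercase, then strip a trailing 'ing' or 's'."""
    token = token.lower()
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def naive_match(query, text):
    """Case-insensitive, stemmed match: 'AIDS' and 'aiding' both become 'aid'."""
    return crude_stem(query) in {crude_stem(t) for t in tokens(text)}


def case_aware_match(query, text):
    """Treat an all-caps query as an acronym that must match exactly."""
    if query.isupper():
        return query in tokens(text)
    return naive_match(query, text)


print(naive_match("AIDS", HISTORY))       # the verb "aiding" is mistaken for the disease
print(case_aware_match("AIDS", HISTORY))  # the acronym does not occur in this sentence
print(case_aware_match("AIDS", MEDICAL))  # the acronym occurs here
```

Respecting case is only one possible fix, of course; it would do nothing about non-medical pages that happen to mention the disease by name.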

And that's not the end of the fun and games:

If you look at the pros of AIDS (yes, it thinks there are pros to having AIDS), it comically lists the “Spanish Civil War.” One of the causes of hemorrhoids is “Bronco” (I don’t even want to know).

It only took a few clicks for me to get here:

Or here:


  1. Vincent said,

    September 3, 2009 @ 10:20 pm

    I've been trying to get rid of these hookers forever. Indications: one bourbon, one scotch, one beer.

    [(myl) In case the link fails:

    And let's not forget the causes:

    And the complications:

    The complications are kind of poetic: "Attack flank… Cheer on Crash … Disgust couch … Scant hint". ]

  2. joeplus said,

    September 3, 2009 @ 10:25 pm

    Did Eric Cartman design this search engine?

    Interesting to consider case-sensitivity: ever since web addresses came along it seemed anachronistic to talk about capital letters on the internet. Now that the web is better at rich information, it becomes important again.

  3. Dan Lufkin said,

    September 3, 2009 @ 10:49 pm

    I'm glad to see that the pros of linguistics outweigh the cons 4:1.

    [(myl) But those cons are fierce:


  4. Vincent said,

    September 3, 2009 @ 11:09 pm

    Cons are pretty accurate!

  5. Ken said,

    September 3, 2009 @ 11:11 pm

    I blogged it here.

    Note: even when you use it absolutely properly and as intended, it returns bizarre results. Note, for instance, when I entered "Prozac" to see the pros and cons of that drug as a treatment, one of the returns I got:

    Mr. Hankey, the Christmas Poo
    A doctor prescribes Prozac, a real life antidepressant, to Kyle for his apparent love for feces, which he describes as “fecalphilia”, a fictional condition.

    In other words, it returned a South Park episode summary.

    Here's the thing. It's supposed to be worthwhile to use HealthBase, rather than (for instance) Google, because it consults "authoritative health sources” rather than the entire internet.

    What kind of authoritative health sources carry South Park summaries?

    [(myl) I can't resist quoting your blog post:

    For example, by using HealthBase, you can determine that:

    1. “Obama” should be treated with jellyfish collagen and elections, is caused by Adolf, and has ruined NASCAR;
    2. “Jews” should be treated with Dr. Pepper, though this treatment may cause them to kill Christians and attack Jesus;
    3. The benefits of treating “crazy” are defeating evil mistresses, but the cons are becoming a horse thief;
    4. “Democracy” is caused by Vladimir Putin;
    5. My depression may be caused by prolonged IV infusion, can be treated by female salmon at the risk of economic collapse, and
    6. The benefits of treating your wife are bringing Asian blood and providing an interactive brain, but the cons are STDs and sperm. I think I knew that one already.

    Was it who prescribed the jellyfish collagen? ]

  6. dr pepper said,

    September 3, 2009 @ 11:40 pm

    Perhaps we can find the key to a couple of old chronic complaints:

  7. TB said,

    September 3, 2009 @ 11:44 pm

    I have never laughed so hard at a Language Log post/comment thread in my life. Moar!

  8. John Lawler said,

    September 3, 2009 @ 11:52 pm

    @TB: ¡Ditto!

    Who knew CL/NLP would provide so many good laughs?
    Has Comedy Central got a horse in the race?

  9. Rubrick said,

    September 4, 2009 @ 12:26 am

    You have to admit, "developing sense" is a pretty good anti-war prescription.

  10. Vincent II said,

    September 4, 2009 @ 1:44 am

    One of the causes of hemorrhoids is “Bronco” (I don’t even want to know).

    Yes, but your readers may be curious. I am old enough to remember a brand of toilet paper called Bronco, whose strong point was its hard shiny surface.

  11. Vincent II said,

    September 4, 2009 @ 2:34 am

    For reference to the above, see

  12. Chris Hunt said,

    September 4, 2009 @ 5:02 am

    Wikipedia as an "authoritative health resource"? Maybe they should have read If Surgery was like Wikipedia.

  13. Kenny Easwaran said,

    September 4, 2009 @ 5:32 am

    I note that they claim to search "millions" of sources – are they counting individual pages and articles from each source I guess?

    Also, I note that "Obama" and "Jews" no longer return any results, but "Hookers" still does.

  14. Dierk said,

    September 4, 2009 @ 6:03 am

    In short: They created a syntactical search engine with a relatively large database of synonyms. Nothing semantic about it.

  15. Faldone said,

    September 4, 2009 @ 7:12 am

    One of the "pros" of linguistics is "Abnormal language development"?! Me abtrocious indignificated!!

  16. greg said,

    September 4, 2009 @ 8:24 am

    causes of 'george bush':
    Ralph Nader
    Hose paper
    Midget statement

    treatments of 'george bush':
    Template system (Template)
    American Eagle

    complications of 'george bush':
    Invade Iraq
    Authorize nsa wiretap program
    Defame website
    Financial issue
    Financial meltdown
    Kill uncle
    Lose family member
    Return haitian refugee
    Telephone number

  17. mgh said,

    September 4, 2009 @ 8:24 am

    I like Mark's coinage of "artificial stupidity" for this kind of AI
    I guess the software that runs it would be ASS

  18. greg said,

    September 4, 2009 @ 8:51 am

    To be fair, the semantic search itself is quite effective; the problem is that the databases it is searching aren't limited to strictly medical issues. Searching for something fairly specific like 'lymphoma' produces legitimate results.

  19. be_slayed said,

    September 4, 2009 @ 9:15 am

    Kenny Easwaran said: Also, I note that "Obama" and "Jews" no longer return any results, but "Hookers" still does.

    It looks like they've "fixed" some problems by filtering out certain words (like "Jews"). So while Jews are no longer considered a cause of AIDS, "strong magnetic fields", "Ivorian air strikes", and "explanations" are:

    [(myl) For the record:


  20. greg said,

    September 4, 2009 @ 11:04 am

    re: aids causes – not any more :(

  21. Emily said,

    September 4, 2009 @ 11:14 am

    Drugs & Medications for teabag:
    Slow-release fertilizer pellet

    Treatments for teabag:
    Teabag ex-hoofer
    Small reactor
    Band Teabag
    Buddy Lackey
    Deeper tea strainer

  22. Emily said,

    September 4, 2009 @ 11:18 am

    Also, "Drugs and medications for Alaska" include "satellite system", "Alaska Gov. Sarah Palin", and "Illicit drug".

    Under "food and plants" one finds the mysterious "High-velocity grain":
    "Three mammoth tusks found in Alaska and Siberia, which were carbon-dated to be about 34,000 years old, are pitted with slightly radioactive, iron-rich impact sites caused by high-velocity grains…"

    Which has no cons, but only one pro: "iron-rich impact site".

  23. zak said,

    September 4, 2009 @ 11:42 am

    The results for "copyright infringement" are interesting.

  24. Marcus said,

    September 4, 2009 @ 11:59 am

    Causes of pregnancy:


  25. greg said,

    September 4, 2009 @ 12:25 pm

    One of the complications for 'curiosity' is 'kill cat'

    And this one is a mixed bag:

    Pros of intercourse (20)

    HIV transmission (transmit HIV)
    Achieve ejaculation
    Achieve pregnancy
    Acquire HPV
    Adopt similar sociopolitical institution
    Assess efficacy
    Blood-to-blood contact
    Boost endorphin
    Bring bad luck
    Burn calorie

    Cons of intercourse (20)

    Worsen Neurologic symptom
    HIV infection risk
    Anogenital abrasion

  26. greg said,

    September 4, 2009 @ 12:27 pm

    Sorry, I'm done posting things. I could play with this all day.

  27. codeman38 said,

    September 4, 2009 @ 3:16 pm

    Apparently the treatments for ADHD include rats, and the causes include efficacy, ingredients, headline stories and campus. Who knew?

  28. dr pepper said,

    September 4, 2009 @ 5:45 pm

    Hmmm. I can see someone writing a post apocalyptic story in which parts of the internet are still functioning, thanks to backup servers in deep military bunkers. A few extra tough laptops with solar rechargers have survived and those who own them become the new shamans. By the third generation, they no longer know enough to determine the context of their search results, and offer only oracular responses.

    — except it looks as if we're there already.

  29. Jerry Friedman said,

    September 4, 2009 @ 6:17 pm

    @mgh: An early hit for "artificial stupidity" in an AI context is from Artificial intelligence: a combined preprint of papers presented at Artificial Intelligence Sessions of the winter general meeting, New York, N.Y., January 27-February 1, 1963, according to a Google Books search (which provided no preview). Your acronym is new to me, though, and I like it.

  30. Emily said,

    September 4, 2009 @ 6:19 pm

    Drugs and medication: gold
    Treatments: Song, BRSM, Earl of Bute, Image, Movement
    Food and plants: Sergio Leone's 1972 film Duck
    Causes: Jew (you can still access "Treatments for Jew" through this), pacifist, political adversary, political opponent
    Complications: belated attack, delay wider theatrical run, grow more burning.

    Treatments: surgery, conservative treatment, resection, radiotherapy, party, chemotherapy, exercise, conservative Judaism, election, biopsy
    Food and plants: ginger ale, ginger product
    Causes: Bloomberg, Fox News, dominate moderate wing, costume, Invasive ademocarcinoma
    Complications: activity loss, Attack Michael Steele, Damage uterine structural integrity, disappointment

    Drugs: marijuana, liberal salt diet
    Treatments: Definition, party, college, conservative, liberal education, antibiotics
    Food and plants: water intake, melted butter
    Causes: choice, 95-seat loss, America, Cyprus dispute, Vatican
    Complications: Waste resource, abortion, anger prime minister, Attack Truth About Hillary, Avoid medical therapy, be greatest enemy

  31. möngke said,

    September 4, 2009 @ 7:08 pm

    I couldn't resist.

    Pros of Arab:

    Bring Islam
    Fastest growing economy
    Provide detail
    Achieve knowledge
    Achieve lasting peace
    Allergic disease
    Bring art

    Cons of Arab:

    Arab attack
    Invade Sicily
    Lose home
    Kill Jew
    Lose mean
    Attack Jew
    Diabetes Mellitus
    Invade Nubia

    and via an on-site suggestion, treatments for Arab Attack:
    Haganah convoy (pretty accurate IMO)
    Citroen 2CV
    Deadly force

    … priceless.

  32. Nick Lamb said,

    September 4, 2009 @ 9:28 pm

    They explicitly list Wikipedia as an "authoritative health source" even though Wikipedia makes clear that it isn't and doesn't want to be authoritative and that you should consult primary sources.

    That's a step beyond mere incompetence. Anyone thinking of dealing with NetBase needs to keep in mind that not only does their technology output useless gibberish, but the company itself is happy to make false claims about other people's information.

    [(myl) But it could be, and is, worse — HealthBase also features information from, which tells us that vaccines cause autism, that cow's milk is a deadly poison, that herbal remedies will cure breast cancer by cleansing the lymphatic system, that homeopathy is an effective treatment for malaria and AIDS, while HIV is not the cause of AIDS, etc., etc. ]

  33. Kenny Easwaran said,

    September 5, 2009 @ 2:11 am

    On a hunch, I tried "homosexuality" and it's been blocked too. But "heterosexuality" gives some interesting results.

    Nineteen eighty four
    Qualitative research design


    HIV Infection

    Through the "treatment" link I managed to get Pros and Cons of Homosexuality:

    Advance political agendum [sic]
    Control overpopulation
    Describe adult sexuality
    Encompass entire social identity
    Examine four-dimensional concept
    Invisible status
    Offer hope
    Present reality
    Shape social psychological adjustment

    HIV infection
    Lower reproductive success
    Morally wrong
    Affect Communist Party

    The Pros and Cons of Heterosexuality aren't as interesting (and are a proper subset of earlier lists).

  34. Emily said,

    September 5, 2009 @ 11:07 am

    Treatments [for heterosexuality]:
    Nineteen eighty four

    Puts a whole new spin on "He loved Big Brother".

  35. Kenny Easwaran said,

    September 5, 2009 @ 12:39 pm

    I should also mention that the Pro for "language log" is "good article" while the Con is "sustain consistent criticism". Sounds like you're doing a good job!

  36. Jan said,

    September 9, 2009 @ 2:26 am

    After doing some searches, it seems that they have removed Wikipedia as a resource. That was probably the best thing to do. In any case, results for 'aids' and 'depression' look a lot saner, now.

  37. Dennis Brennan said,

    September 9, 2009 @ 5:46 pm

    Cures for "Catholicism" include "stem cell research".

  38. John Rehling said,

    September 10, 2009 @ 9:41 am

    A launch like this creates bad buzz for the entire field of NLP, but all of these problems are fixable, and a stronger showing is possible if they're put into place:

  39. Dan Parvaz said,

    September 12, 2009 @ 8:48 pm

    Note that the system was smart enough not to recommend Dr. Brown.

