I'm spending the week at LREC 2012 in Istanbul, and the presentation that I just listened to was Maria Eskevich, Gareth J.F. Jones, Martha Larson and Roeland Ordelman, "Creating a Data Collection for Evaluating Rich Speech Retrieval":
We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of the Internet video from Internet video sharing platform (blip.tv), and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing the act and to generate search queries designed to refind this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, design of the crowdsourcing task and the specification of queries and relevant items.
The paper was interesting, and it's worth learning more about the MediaEval Benchmarking Initiative, and the 2011 MediaEval Rich Speech Retrieval Task in particular. But the thing that caught my attention in this case was the reference to blip.tv, since I happened to notice (following a link yesterday in a New York Times story) that blip.tv is banned in Turkey.
The Wikipedia article mentions that "As of April 2011, Blip.tv was blocked by Turkey", but doesn't explain why.
Here's the message that appears in place of the banned videos:
Does anyone know the history of this ban?