Morse code in straight vs. curly quotes


Chris Smith, "Genius hid a Morse code message in song lyrics to prove Google was copying them", BGR 6/17/2019:

Did you ever notice how you tend to Google the lyrics of a song and then you don't bother clicking through to Genius's website because Google displays them right on the search results page? Well, Genius alleges that Google has been copying its lyrics for years and posting them directly on Google Search, thus preventing visitors from going to its own site. And here's the best part: Genius says it hid a Morse code message within some lyrics on its site to prove Google has been stealing them and reposting them word for word. […]

To catch Google, Genius watermarked lyrics with the help of apostrophes, alternating between straight and curly single-quote marks in exactly the same sequence for every song. When turned into dots and dashes, the apostrophes spell the words Red Handed, which is a smart trick.
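The scheme described in the quoted passage can be sketched in a few lines of Python. This is only an illustration, not Genius's actual implementation. The mapping (straight apostrophe for a dot, curly apostrophe for a dash), the dropping of gaps between letters, and the function names `watermark`, `extract`, and `carries_watermark` are all assumptions made for the example.

```python
# A hypothetical apostrophe watermark. Assumed mapping (not confirmed by
# the articles): straight apostrophe = dot, curly apostrophe = dash.
STRAIGHT = "'"        # U+0027 APOSTROPHE
CURLY = "\u2019"      # U+2019 RIGHT SINGLE QUOTATION MARK

# Morse code for just the letters needed to spell "RED HANDED".
MORSE = {
    "R": ".-.", "E": ".", "D": "-..",
    "H": "....", "A": ".-", "N": "-.",
}


def message_to_symbols(message):
    """Flatten a message into one run of dots and dashes.

    Spaces and letter gaps are dropped (an assumption of this sketch).
    """
    return "".join(MORSE[ch] for ch in message.upper() if ch in MORSE)


def watermark(lyrics, message):
    """Rewrite the first apostrophes in `lyrics` so they spell `message`."""
    symbols = message_to_symbols(message)
    out = []
    used = 0
    for ch in lyrics:
        if ch in (STRAIGHT, CURLY) and used < len(symbols):
            out.append(STRAIGHT if symbols[used] == "." else CURLY)
            used += 1
        else:
            out.append(ch)
    return "".join(out)


def extract(lyrics):
    """Read the apostrophes in `lyrics` back as dots and dashes."""
    return "".join(
        "." if ch == STRAIGHT else "-"
        for ch in lyrics
        if ch in (STRAIGHT, CURLY)
    )


def carries_watermark(lyrics, message):
    """Check whether the apostrophe pattern starts with the message."""
    return extract(lyrics).startswith(message_to_symbols(message))
```

Because the letter gaps are dropped here, the dot-and-dash run cannot be decoded unambiguously on its own; the check simply compares it against the pattern expected for a known message. A text needs at least 22 apostrophes to carry all of "RED HANDED" under this scheme.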

See also Richard Priday & Henry T. Casey, "Google Allegedly Caught Stealing Song Lyrics … Because of Punctuation", Tom's Guide 6/17/2019.

Hard-to-see Unicode variations in things like quote curlytude are the source of infinitely annoying text-processing bugs, so it's nice to see someone getting some use out of them.
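For readers who have not been bitten by this kind of bug, here is a minimal Python sketch of the problem and one common workaround. The helper names `normalize_quotes` and `find_curly_quotes` are made up for the example, and the mapping covers only the four most common curly quote characters.

```python
import unicodedata

# Map the four common curly quotes to their ASCII look-alikes.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",   # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',   # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',   # RIGHT DOUBLE QUOTATION MARK
})


def normalize_quotes(text):
    """Replace curly quotes with straight ones so comparisons behave."""
    return text.translate(QUOTE_MAP)


def find_curly_quotes(text):
    """List (index, Unicode name) for each curly quote in `text`."""
    return [
        (i, unicodedata.name(ch))
        for i, ch in enumerate(text)
        if ch in "\u2018\u2019\u201C\u201D"
    ]


curly = "don\u2019t"
straight = "don't"

# The two strings look alike on screen but do not compare equal.
print(curly == straight)
print(normalize_quotes(curly) == straight)
print(find_curly_quotes(curly))
```

Standard Unicode normalization does not help here: NFKC leaves U+2019 unchanged, so the quotes have to be mapped explicitly.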


30 Comments

  1. Ray said,

    June 17, 2019 @ 4:38 pm

    I love this post with all my might!

    (and the word "curlytude")

  2. Chandra said,

    June 17, 2019 @ 5:04 pm

    "Curlitude", surely? (Expressing my surelitude)

    I believe e-book publishers do something similar to identify illegal pirating (in fact I may even have read about it first in a post here).

  3. Karl Weber said,

    June 17, 2019 @ 5:24 pm

    Related, of course, to the occasional practice of mapmakers inserting fictitious towns or other geographic features into their maps to trap unauthorized copiers.

    https://en.wikipedia.org/wiki/Fictitious_entry

  4. J.W. Brewer said,

    June 17, 2019 @ 5:39 pm

    My own default assumption (which I suppose I would have done more due diligence on if I were a major corporate enterprise like Google …) is that most websites putting up song lyrics, including but not limited to Genius, are "informal," in the sense that they don't typically or consistently get licenses from the holders of the relevant copyrights. They are thus technically infringing to the extent the rightsholders care enough to pursue the issue, which for various reasons it would be unsurprising that most of them don't. And many people might agree that there is nothing particularly problematic in "stealing" from someone who has "stolen" the content in turn. But maybe Genius is more formal than I might have supposed, although I'm not sure to what extent adding coded messages via punctuation is the sort of "original" composition the copyright laws are intended to protect; indeed, if done w/o the express authorization of the holder of the copyright in the underlying lyrics, it might be an additional act of infringement.

  5. Russell Borogove said,

    June 17, 2019 @ 6:22 pm

    J.W. Brewer — Genius's Wikipedia page says they got licenses from music publishing companies in 2014 after operating illegally for 5 years or so. The lyrics are crowdsourced rather than provided by the licensors, but AFAIK they are currently not infringing copyright.

    It would require significant chutzpah to charge another party with stealing one's material otherwise!

  6. There's No Graphy Like Stega-No-Graphy | Libertarian Party of Alabama Unofficial said,

    June 17, 2019 @ 7:01 pm

    […] would have worked regardless of how the curly quote substitution was arranged. Thanks to Prof. Mark Liberman (Language Log) for the […]

  7. Andrew Usher said,

    June 17, 2019 @ 9:11 pm

    The previous comment is spam.

    J.W. Brewer appeared to be saying that the coded messages themselves are what was illegally copied. Not so (or, at least, not relevantly) – they are merely evidence of the actual infringement alleged. For the code to work, it had to remain secret, which is contrary to what copyright is meant to protect.

    Google putting song lyrics directly on their results pages is problematic to start with – even if they are completely OK with copyright, they'd still be using monopoly power to harm _all_ sites with lyrics, regardless of status.

    k_over_hbarc at yahoo.com

  8. Philip Taylor said,

    June 18, 2019 @ 12:01 am

    I am not convinced by the last remark. (a) Google does not have monopoly power; it is just one of several public search engines. (b) Does obviating the need to visit a site necessarily harm that site ? I would argue not. For example, if Google puts a restaurant's menu directly on its results page, eliminating the need for a Google user to visit the restaurant's web page, this does not of itself harm the restaurant — the Google user may well still end up dining there. Whilst the analogy with sites offering music lyrics is not a perfect one, I think that it nonetheless demonstrates that just because Google puts a site's content directly on its results page, that does not, of itself, necessarily harm that site.

  9. Twill said,

    June 18, 2019 @ 3:30 am

    @Philip Taylor Not a very helpful analogy, as a restaurant's revenue doesn't come from people visiting their website. It's more akin to Google transcribing news articles and depriving papers of ad revenue, which I seem to recall happening too.

  10. Terror Incognita said,

    June 18, 2019 @ 5:12 am

    Philip – yes, it harms them, because websites like Genius generate advertising revenue from click-throughs. If you deny them click-throughs (which Google is), you're denying them revenue.

  11. /df said,

    June 18, 2019 @ 5:34 am

    On the other hand the search engine saves the searched website the cost of serving HTTP requests from lyric-searchers who may only be interested in one line of the lyric and may have ad-blocking enabled.

    It would be shady practice to post an excerpt without attributing the source, but Google may not have known about the source because it is no longer offering a plain web search service but also a lyric website, an airfare website, etc. From TLA: "… Google partnered with LyricFind in 2016, but the company's chief executive Darryl Ballantyne told The Journal that it doesn't source its lyrics from Genius, relying on its own content team for the lyrics." So we can guess how his "content team" sources some of the lyrics! Perhaps they should have paid more attention to the source's name.

  12. Philip Taylor said,

    June 18, 2019 @ 6:56 am

    Well, is Google "denying them click-throughs" ? I see the situation as being no different to an altruist standing outside a butcher's shop offering free sausages, identical to those sold inside. The altruist is not denying customers the right to enter the shop and purchase sausages if they wish; he is simply offering them an alternative, which they are free to accept or reject as they choose. Those who wish to support Genius will continue to the web site; those who just wish to read the lyrics may well stop at Google.

    None of this is intended to defend Google if they have stolen the lyrics from Genius in the first place, but if they have obtained the lyrics legitimately (and free of licence) then I would argue that they are free to give them away if they so wish.

  13. Terror Incognita said,

    June 18, 2019 @ 8:18 am

    Philip, the whole of the article is about the fact that Google is stealing the lyrics from Genius…!

  14. Twill said,

    June 18, 2019 @ 8:22 am

    @Philip Taylor If the articles being discussed are to be believed, Genius has substantiated with overwhelming certainty that Google reproduced their lyrics (their IP, as odd as it sounds) litteratim, and it can be reasonably inferred from Genius feeling the need to demonstrate and subsequently publicise this that Google was not granted permission to do so. In your analogy, the butcher has security camera footage of the supposed altruist taking his sausages from his coldroom the night before.

  15. Twill said,

    June 18, 2019 @ 8:39 am

    I would also point out that accessing the lyrics via Google isn't "freer" in any meaningful sense than via Genius— you're paying for the service (gratis in both cases) by being advertised to and having your data collected either way.

  16. J.W. Brewer said,

    June 18, 2019 @ 10:40 am

    It's only "stealing" if the copied content "belongs" to Genius in some legally or morally relevant sense. What's the basis for thinking that? Genius certainly didn't create the content and apparently relies on crowdsourcing rather than even paying employees to type it up. There's an old conflict in copyright law about what's usually called the "sweat of the brow" doctrine (https://en.wikipedia.org/wiki/Sweat_of_the_brow), where different court systems at different points in time have taken different views, and this might be an instance of that.

    Old-economy parallel: if publisher A has put out a cheap paperback edition of a Shakespeare play or some other such public-domain work, is publisher B who wants to put out their own edition of the same underlying work required (either legally or morally) to get access to a copy of the First Folio and incur the cost of having it transcribed, or can they just copy publisher A's version — which might even include coded punctuation variation aimed at making such copying easy to prove — via OCR or what have you (assuming there's no copying of any "new" original content in publisher A's version like explanatory footnotes)? Is there an intermediate solution where it's okay for publisher B to copy directly from a published edition that's 100 years old (or 50? or 25?) but not from one that just came out last year? What assumption is reasonable to make about whether publisher A itself did or didn't engage in similar shortcuts before putting out its own edition?

  17. Narmitaj said,

    June 18, 2019 @ 12:51 pm

    Publishers usually have copyright in their layout and typesetting and so on, so if Penguin produced a nice new clear re-typeset edition of a Shakespeare play last year, I can't just scan it as images and reproduce it this year, even if I strip out all new footnote and introductory material.

    Using OCR to copy their text to create a new load of text characters, which I can then lay out in a different font in a different format, might well be equally frowned on but rather harder to prove, unless Penguin had introduced the odd word of non-Shakespeare in the middle of the text. The London A-to-Z map introduced a tiny non-existent dead-end street for this purpose.

    For much older printed texts that have fallen out of copyright both in the content and the typesetting form, it is possible for someone to just scan the text and reprint it using some online self-publishing system. I am not sure where the acceptable use line falls and presumably it depends on different copyright conventions. It is a bit of an extreme case but I recently bought Mercier's Astraea's Return, or The Halcyon Days of France in the Year 2440: A Dream, which was published in French in 1771, in a new printing that fimply fcanned and copied the printed 1797 London tranflation.

  18. Chandra said,

    June 18, 2019 @ 12:59 pm

    People don't click through to places like lyrics websites because they "wish to support them". They click through because they want to find lyrics. The sites gather revenue because these lyrics happen to appear beside paid advertisements. If Google takes their content, their main (or in many cases only) draw, and displays it wholesale in the search results, people no longer have any reason to click through to see it, and the site gets no ad revenue. So to further Twill's extended analogy above, the stolen sausages are not even enticing people into the shop because the sausages are what they came for in the first place.

    So yes, Google is denying them click-throughs. Whether that's ethical or not (in light of J.W. Brewer's points, for example) is a different matter.

  19. J.W. Brewer said,

    June 18, 2019 @ 2:53 pm

    To Narmitaj's point, current UK copyright law provides special protection for a more limited period (25 years from date of publication as opposed to life-of-the-author-plus-seventy-years for most copyrightable things) for "Typographical arrangement of published editions," considered as a distinct form of copyrightable expression separable from the author's words. As of the last time I had occasion to be looking for patterns in slight-but-interesting differences among the ancestrally-related copyright statutes of the various Anglophone nations, which was probably 1992, US law provided no such protection at all. Other countries may do some third thing, and once you leave physical book publishing and go online it becomes harder to figure out which national copyright regime(s) to be guided by. But my overall point here is that you get a lot of variation in this area precisely because it is a marginal area where people turn out to have differing senses of what should or shouldn't be allowed in a way you don't get for more "core" issues of copyright, and you thus get different resolutions of those conflicts in different places due to ad hoc political/historical contingencies.

  20. Thomas Rees said,

    June 18, 2019 @ 3:56 pm

    @Narmitaj: Don't you mean "ſimply ſcanned" or did the new printing render the long esses as effs?

  21. Peter Ludemann said,

    June 18, 2019 @ 6:10 pm

    Google's statement (which says, amongst other things, that it licenses things properly):
    https://www.blog.google/products/search/how-we-help-you-find-lyrics-google-search/

    Various other sites report this, e.g.: https://www.androidpolice.com/2019/06/18/google-stole-song-texts-lyrics-from-genius-com-red-handed-apostrophe-trick/

    And for your amusement:
    https://www.jwz.org/blog/2019/06/the-only-good-use-for-smart-quotes-ever/

  22. Ray said,

    June 18, 2019 @ 6:51 pm

    google searches also display content from sites like wikipedia (not dependent on ad rev) and imdb (dependent on ad rev). so there's that.

  23. PeterL said,

    June 18, 2019 @ 8:52 pm

    Google's response (tl;dr: we license things but there seems to be an issue with one 3rd-party provider): https://www.blog.google/products/search/how-we-help-you-find-lyrics-google-search/

    And for your amusement, a rant about the "music mafia trying to quadruple-dip what they've already sold you three times": https://www.jwz.org/blog/2019/06/the-only-good-use-for-smart-quotes-ever/

  24. zafrom said,

    June 18, 2019 @ 8:57 pm

    Speaking of "straight and curly single-quote marks in exactly the same sequence for every song", Project Gutenberg may be doing something similar. Per its "General Terms of Use and Redistributing…" info, "Do not copy, display, perform, distribute or redistribute this electronic work, or any part of this electronic work, without prominently displaying the sentence set forth in paragraph 1.E.1 with active links or immediate access to the full terms of the Project Gutenberg-tm License."

    I bought an out-of-copyright paperback book last year from a major online book retailer. Wondering about the unexpected formatting of its text (weird line breaks), I found a copy of the book, with the same weird formatting, online in Project Gutenberg. No mention in the paperback of Gutenberg.

    (My bad. I learned, for $11 US, that "Publisher: CreateSpace Publishing" can mean "Hello sucker". The retailer never offered any alternative or refunded my money, never responded to the issue and, to this day, still sells such books online. As a token gesture, I stopped buying from them.)

  25. B.Ma said,

    June 19, 2019 @ 8:38 am

    @zafrom, I am not aware that Project Gutenberg prints its books and mails them to you for free, so not sure how you are a "sucker" for buying the book, even if the publisher misappropriated the efforts of PG volunteers.

  26. Andy Averill said,

    June 19, 2019 @ 3:45 pm

    Huh. I just googled several different song lyrics. Google displayed the full lyrics for each, and at the bottom it said:
    Source: lyricfind

    Clicking the link, I learned that Lyricfind is "the world's largest lyric licensing service" and that one of their clients is Google. So presumably Google IS paying, just not paying Genius.

    I'm on Safari on the iPad, if that matters.

  27. Andy Averill said,

    June 19, 2019 @ 3:51 pm

    UPDATE I should have added that at the very VERY bottom it says:

    Desolation Row lyrics © Audiam, Inc

    where

    "audiam is a Digital Reproduction Collection Agency
    We work for music publishers & self published songwriters

    We License & Collect:
    • U.S. Streaming Mechanical Royalties
    • All Canadian Digital Mechanical Royalties
    • YouTube Royalties"

  28. zafrom said,

    June 19, 2019 @ 10:29 pm

    @B.Ma, It's knowing before I buy what I'm paying for, from a site that does sell well-printed books that are also out of copyright. I trust that some readers have bought such books online. If the seller advertises its editions as copy-and-pastes from Gutenberg, then the potential buyers could make a more-informed decision.

  29. jaap said,

    June 20, 2019 @ 7:11 am

    Ray: "google searches also display content from sites like wikipedia (not dependent on ad rev)"

    Recently there was some pushback from Wikipedia. Wikipedia relies on donations, and companies extensively use Wikipedia (not only in web search results, but also through Google Home/Siri/Alexa etc). However, those companies hardly donate to Wikipedia, and if fewer people actually visit Wikipedia's own web pages then the occasional calls for donations there will be less effective.

  30. Ray said,

    June 22, 2019 @ 7:45 am

    jaap: true, dat.

    maybe the solution in all this is for google to only display part of a song's lyrics (the way it only displays part of a wikipedia article), with a link to the lyrics' source for users to see the complete lyrics (the way it provides a link to wikipedia for the full article)…
