AI Hyperauthorship


This paper's content is interesting — Mirzadeh, Iman, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." arXiv preprint arXiv:2410.05229 (2024). In short, the authors found that small changes in Grade-School Mathematics benchmark questions, like substituting different numerical values or adding irrelevant clauses, caused all the tested LLMs to do worse. You should read the whole thing for the details, to which I'll return another time.
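The perturbation idea is easy to sketch. As a purely hypothetical illustration (not the authors' actual code or templates), a GSM8K-style word problem can be turned into a template whose names and numerical values are re-sampled, so that the surface form stays constant while each instance differs:

```python
import random

# A GSM8K-style word problem turned into a template: the wording is fixed,
# but the name and the numbers are placeholders to be re-sampled.
TEMPLATE = (
    "{name} picked {x} apples on Monday and {y} apples on Tuesday. "
    "How many apples did {name} pick in total?"
)

def make_variant(seed):
    """Generate one perturbed instance of the template, with its answer."""
    rng = random.Random(seed)  # seeded, so each variant is reproducible
    name = rng.choice(["Sophie", "Liam", "Mia", "Noah"])
    x, y = rng.randint(2, 50), rng.randint(2, 50)
    question = TEMPLATE.format(name=name, x=x, y=y)
    return question, x + y  # the ground-truth answer changes with the values

question, answer = make_variant(0)
```

A model with robust arithmetic reasoning should score the same across such variants; the paper's finding is that tested LLMs do not.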

But what inspired this post is a feature of that paper's bibliography, in which many items have a large number of authors. For example, this reference lists 65 authors before "and et al." [sic]:

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurélien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Rozière, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Grégoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, and et al. The llama 3 herd of models. CoRR, abs/2407.21783, 2024. doi: 10.48550/ARXIV.2407.21783. URL https://doi.org/10.48550/arXiv.2407.21783.

Drilling down, the referenced paper ("The Llama 3 Herd of Models") supplies its contributor list as an appendix, which splits the list into two parts:

Llama 3 is the result of the work of a large number of people at Meta. Below, we list all core contributors (people who worked on Llama 3 for at least 2/3rd of the runtime of the project) and contributors (people who worked on Llama 3 for at least 1/5th of the runtime of the project). We list all contributors in alphabetical order of first name.

They then list 222 “Core Contributors” and 311 “Contributors”, for a total of 533 authors.

That's an order of magnitude smaller than (what I think is) the hyperauthorship record: the 5,154 authors of "Combined Measurement of the Higgs Boson Mass in pp Collisions at √s = 7 and 8 TeV with the ATLAS and CMS Experiments", Physical Review Letters, 2015.

3 Comments

  1. AntC said,

    October 12, 2024 @ 9:10 pm

    Do academic rules make a distinction between "contributor" vs "author"?

    This mass-attribution business seems way beyond 'head' tenured academic + PostDoc "assistants" who typically do all the actual work.

    For "author" I'd expect some identifiable text that they'd written to appear in the final publication.

  2. AntC said,

    October 12, 2024 @ 9:16 pm

    There are some spoofs of the Oscar acceptance speeches for The Lord of the Rings movies where winners run out of people to thank and start listing everybody in New Zealand.

    And there's Spike Milligan who thanked nobody "because I did it all myself".

  3. David Marjanović said,

    October 14, 2024 @ 2:01 pm

    Do academic rules make a distinction between "contributor" vs "author"?

    In some disciplines and some journals, it's normal to list everyone who touched any piece of data or work that went into the paper as an author. That's the most common way to get to dozens or hundreds of coauthors.
