AI Hyperauthorship


This paper's content is interesting — Mirzadeh, Iman, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models." arXiv preprint arXiv:2410.05229 (2024). In short, the authors found that small changes in Grade-School Mathematics benchmark questions, like substituting different numerical values or adding irrelevant clauses, caused all the tested LLMs to do worse. You should read the whole thing for the details, to which I'll return another time.

But what inspired this post is a feature of that paper's bibliography, in which many items have a large number of authors. For example, this reference lists 65 authors before "and et al." [sic]:

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurélien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Rozière, Bethany Biron, Binh Tang, Bobbie Chern, Charlotte Caucheteux, Chaya Nayak, Chloe Bi, Chris Marra, Chris McConnell, Christian Keller, Christophe Touret, Chunyang Wu, Corinne Wong, Cristian Canton Ferrer, Cyrus Nikolaidis, Damien Allonsius, Daniel Song, Danielle Pintz, Danny Livshits, David Esiobu, Dhruv Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Grégoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, and et al. The llama 3 herd of models. CoRR, abs/2407.21783, 2024. doi: 10.48550/ARXIV.2407.21783. URL https://doi.org/10.48550/arXiv.2407.21783.

Drilling down, that reference itself ("The llama 3 herd of models") supplies its "contributor list" as an appendix.

And the appendix splits the contributor list into two parts:

Llama 3 is the result of the work of a large number of people at Meta. Below, we list all core contributors (people who worked on Llama 3 for at least 2/3rd of the runtime of the project) and contributors (people who worked on Llama 3 for at least 1/5th of the runtime of the project). We list all contributors in alphabetical order of first name.

They then list 222 “Core Contributors” and 311 “Contributors”, for a total of 533 authors.

That's an order of magnitude smaller than (what I think is) the hyperauthorship record, the 5,154 authors for "Combined Measurement of the Higgs Boson Mass in pp Collisions at √s = 7 and 8 TeV with the ATLAS and CMS Experiments", Physical Review Letters 2015.

 


