Mark Ziemann, Yotam Eren and Assam El-Osta, "Gene name errors are widespread in the scientific literature", Genome Biology 2016:
The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.
This was already a problem a dozen years ago, when I worked on information extraction from biomedical literature, and it's amazing to me that it still goes on. The authors note:
Automatic conversion of gene symbols to dates and floating-point numbers is a problematic feature of Excel software. The description of this problem and workarounds were first highlighted over a decade ago —nevertheless, we find that these errors continue to pervade supplementary files in the scientific literature. To date, there is no way to permanently deactivate automatic conversion to dates in MS Excel and other spreadsheet software such as LibreOffice Calc or Apache OpenOffice Calc. We note, however, that the spreadsheet program Google Sheets did not convert any gene names to dates or numbers when typed or pasted; notably, when these sheets were later reopened with Excel, LibreOffice Calc or OpenOffice Calc, gene symbols such as SEPT1 and MARCH1 were protected from date conversion.
It's shocking that biologists ever relied on Excel as a database system, and even more shocking that they're still doing it.
If you're unavoidably connected to people who do this, the authors have provided some scripts for assessing the damage:
Bash scripts, URLs and output data supporting the conclusions of this article are available in the SourceForge repository (https://sourceforge.net/projects/genenameerrorsscreen/).
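For a quick check that doesn't involve the authors' Bash scripts, here is a minimal Python sketch of the same idea: flag cells in a gene-symbol column that look like the residue of a date or floating-point conversion. This is not the authors' method. The function names (`looks_converted`, `suspicious_cells`) are hypothetical, and the two patterns assume that mangled symbols show up as day-month strings such as "1-Sep" or "Mar-01", or as scientific notation such as "2.31E+13". Real spreadsheets may use other display formats, so treat a clean result as weak evidence at best.

```python
import re

# Month abbreviations assumed to appear when a symbol such as SEPT1 or
# MARCH1 has been turned into a date.
_MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Assumed date displays: "1-Sep" / "01/Mar" (day first) or "Sep-01" (month first).
_DATE_RE = re.compile(
    rf"^(?:\d{{1,2}}[-/](?:{_MONTHS})|(?:{_MONTHS})[-/]\d{{1,2}})$",
    re.IGNORECASE,
)

# Assumed float display: scientific notation with an explicit sign, e.g. "2.31E+13".
# Requiring the sign avoids flagging an intact identifier that merely contains an "E".
_FLOAT_RE = re.compile(r"^\d+(?:\.\d+)?E[+-]\d+$", re.IGNORECASE)


def looks_converted(cell: str) -> bool:
    """Return True if the cell text matches one of the assumed conversion patterns."""
    text = cell.strip()
    return bool(_DATE_RE.match(text) or _FLOAT_RE.match(text))


def suspicious_cells(column):
    """Return the cells of a gene-symbol column that look converted, in order."""
    return [cell for cell in column if looks_converted(cell)]


if __name__ == "__main__":
    # Hypothetical column: intact symbols mixed with date-like and float-like cells.
    column = ["SEPT1", "1-Sep", "2-Mar", "TP53", "2.31E+13", "MARCH1"]
    print(suspicious_cells(column))
```

The check works on plain strings, so it assumes the column has been exported as text (CSV, say) first. A flagged cell only tells you that something was lost; the original symbol usually has to be recovered from an unconverted copy of the data.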