Excel converts gene names to other data formats

Aug 24, 2016 03:30 GMT  ·  By

Three scientists from the Baker IDI Heart & Diabetes Institute have discovered that 19.6 percent of all scientific papers ever written on gene research contain gene conversion errors caused by Microsoft Excel.

Most scientific papers that deal with gene research come with supplementary files containing in-depth details about the referenced genes.

Since there are tens of thousands of known genes, when the research is put together, this data is often handled via data processing software, such as Microsoft Excel.

Excel automatically converts gene names to dates or other formats

The three researchers say they identified a series of common mistakes in multiple scientific research papers which they tracked down to the researchers using Excel to handle the data.

The three say that Excel, when used in default settings, will convert gene names into other data formats.

For example, a researcher writing SEPT2 (short name for Septin 2) will see his text automatically converted to 2-Sep, as in the date, September 2. The same happens with MARCH1 (short name for Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase).

Besides dates, Excel also mangles gene names and rewrites them in various number formats as well.

Copy-pasting large data sets hid the error for years

But if researchers would enter this data one Excel cell at a time, they would surely notice. But they don't, mainly because most of this data is copy-pasted from tables or other sources inside Excel files, hundreds or thousands of values at a time.

The conversion takes place without the researcher noticing and culminates in research papers with errors in their supplementary files, sometimes contributing to unverifiable data or errors in subsequent calculations.

But there's a way to avoid such mistakes in the future, as the three scientists explain in their research paper:

  To date, there is no way to permanently deactivate automatic conversion to dates in MS Excel and other spreadsheet software such as LibreOffice Calc or Apache OpenOffice Calc. We note, however, that the spreadsheet program Google Sheets did not convert any gene names to dates or numbers when typed or pasted; notably, when these sheets were later reopened with Excel, LibreOffice Calc or OpenOffice Calc, gene symbols such as SEPT1 and MARCH1 were protected from date conversion.