Dealing with a Byte Order Mark (BOM)
I have just been trying to import some data into R. The data were exported from a SQL Server client in tab-separated value (TSV) format. However, reading the data into R the “usual” way produced unexpected results: a few stray characters appeared at the start of the first record.
Those weird characters in the first record… where did they come from? They don’t show up in a text editor, so they’re not easy to edit out.
Googling ensued and revealed that those weird characters were in fact the byte order mark (BOM), a short sequence of bytes at the very start of a file which identifies its Unicode encoding and, for UTF-16 and UTF-32, its endianness. This was quickly confirmed using Cygwin. (Yes, shamefully, I am working under Windows at the moment!)
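The same check can also be done from within R by looking at the first few bytes of the file. What follows is only a minimal sketch: it assumes the file is UTF-8 (whose BOM is the three bytes EF BB BF), and the temporary file and its contents are made up for illustration, not taken from the original export.

```r
# Hypothetical sample file: a UTF-8 BOM followed by two tab-separated rows.
path <- tempfile(fileext = ".tsv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)           # the UTF-8 BOM
writeBin(charToRaw("1\talpha\n2\tbeta\n"), con)      # made-up data
close(con)

# Read the first three bytes as raw values and compare them with the BOM.
first_bytes <- readBin(path, what = "raw", n = 3)
has_utf8_bom <- identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))

print(first_bytes)
print(has_utf8_bom)
```

A file exported as UTF-16 would start with FE FF or FF FE instead, so the comparison would need adjusting for that case.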
The solution is remarkably simple: just specify the correct character encoding.
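As a rough illustration, and assuming the file is UTF-8 with a BOM, the encoding can be passed through the `fileEncoding` argument of `read.delim()`. The value `"UTF-8-BOM"` is accepted by R (3.0.0 and later) for reading and drops the mark if it is present. The file below is a made-up stand-in for the real export.

```r
# Hypothetical sample file: a UTF-8 BOM followed by two tab-separated rows.
path <- tempfile(fileext = ".tsv")
con <- file(path, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)           # the UTF-8 BOM
writeBin(charToRaw("1\talpha\n2\tbeta\n"), con)      # made-up data
close(con)

# Declaring the encoding lets R strip the BOM before parsing the fields.
data <- read.delim(path, header = FALSE,
                   fileEncoding = "UTF-8-BOM",
                   stringsAsFactors = FALSE)

print(data)
```

If the export turns out to be UTF-16 rather than UTF-8, a different `fileEncoding` value would be needed.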