Question
I need to generate a file for Excel, some of the values in this file contain multiple lines.
there's also non-English text in there, so the file has to be Unicode.
The file I'm generating now looks like this: (in UTF8, with non English text mixed in and with a lot of lines)
Header1,Header2,Header3
Value1,Value2,"Value3 Line1
Value3 Line2"
Note the multi-line value is enclosed in double quotes, with a normal everyday newline in it.
According to what I found on the web this supposed to work, but it doesn't, at least not win Excel 2007 and UTF8 files, Excel treats the 3rd line as the second row of data not as the second line of the first data row.
This has to run on my customer's machines and I have no control over their version of Excel, so I need a solution that will work with Excel 2000 and later.
Thanks
EDIT: I "solved" my problem by having two CSV options, one for Excel (Unicode, tab separated, no newlines in fields) and one for the rest of the world (UTF8, standard CSV).
Not what I was looking for but at least it works (so far)
Answer
You should have space characters at the start of fields ONLY where the space
characters are part of the data. Excel will not strip off leading spaces. You
will get unwanted spaces in your headings and data fields. Worse, the "
that
should be "protecting" that line-break in the third column will be ignored
because it is not at the start of the field.
If you have non-ASCII characters (encoded in UTF-8) in the file, you should
have a UTF-8 BOM (3 bytes, hex EF BB BF
) at the start of the file. Otherwise
Excel will interpret the data according to your locale's default encoding
(e.g. cp1252) instead of utf-8, and your non-ASCII characters will be trashed.
Following comments apply to Excel 2003, 2007 and 2013; not tested on Excel 2000
If you open the file by double-clicking on its name in Windows Explorer, everything works OK.
If you open it from within Excel, the results vary:
- You have only ASCII characters in the file (and no BOM): works.
- You have non-ASCII characters (encoded in UTF-8) in the file, with a UTF-8 BOM at the start: it recognises that your data is encoded in UTF-8 but it ignores the csv extension and drops you into the Text Import not-a-Wizard , unfortunately with the result that you get the line-break problem.
Options include:
- Train the users not to open the files from within Excel :-(
- Consider writing an XLS file directly ... there are packages/libraries available for doing that in Python/Perl/PHP/.NET/etc