Importing and exporting an Excel CSV
In this article we want to clear up some of the myths and issues surrounding Excel CSV data import / export and how the Excel CSV processing can be done easier.
- What a CSV file is and why CSV files have nothing to do with Excel (even though Microsoft seems to think so)
- Problems that can occur when using Excel and CSV files, in particular details about formatting data such as numbers and dates
- How to perform an Excel CSV import/export with the TextFile Adapter and Excel Adapter in FlowHeater
- How a subset of Excel data can be filtered in a CSV export
- How rows/records in an Excel workbook can be updated in a CSV data import
What is a CSV file?
Firstly, a CSV file is simply a text file without any additional metadata for formatting like fonts, data types, field lengths, and so on. This way, a distinct record of data is represented by each line and the individual fields are divided by a separator character. It is sometimes asserted that a CSV text file must be encoded as US ASCII (Codepage 437), but that is a fallacy. You can use any encoding you like. Commonly UTF-8 is now used, which allows a wealth of special and national characters. The file extension is also not necessarily ".csv", what matters is the content. For an exchange of data, it is only necessary for both sides to agree on what is used.
The CSV separator
A comma was originally used as the field separator. Hence the name CSV (Comma Separated Values). As in some countries the comma is used as a decimal separator, an alternative character is used. For example, in Germany the semicolon is popular. In principle, you can choose any character you want to use, but as already mentioned, both sides must agree for the data exchange to work.
Note: You can set the default list separator in Windows in the Control Panel. FlowHeater uses the list separator specified in the Control Panel as the default setting when creating a new data import/export Definition. The CSV list separator to be used can, of course, be changed at any time in the FlowHeater Adapter format settings.
What happens when the CSV separator occurs in the field content?
Of course, this can occur and would result in the displacement of fields if not addressed. This problem can be solved by enclosing field contents with a text delimiter. Quotation marks are typically used for this, but any other character could be used as a text delimiter. As already mentioned, both sides of the data exchange have to agree.
e.g. "Text with a , comma in the content","…"
In this case only two columns are detected in the import, not three.
What if the text delimiter also occurs in the field content?
In this case, the text delimiter must be additionally masked or quoted. To do this, the text delimiter is simply doubled in the text.
e.g. "Text with a "","" comma in the content","…"
When reading, the duplicate field delimiter is detected and removed from the field content once. Reading continues until the actual CSV column delimiter is found after a final text delimiter that is not quoted. The field content of the first field thus becomes "Text with a "," comma in the content"
Note: By enclosing it in text delimiters, a block of field content can also extend over several lines. However, some programs get confused when reading fields of a CSV file that consist of more than one line.
Problems with CSV import/export using Excel
One problem is that when Microsoft Excel is installed on the computer, the file extension ".csv" becomes associated with the Excel program. The Excel application icon is also used as the CSV file icon, which suggests to many that CSV is the same thing as Excel, which of course it is not.
If you just want to inspect the content of a CSV file quickly, Excel can be used for this purpose to a limited extent. However, it soon becomes clear that Excel tries to format certain CSV fields by applying automatic recognition of data types. This can result in a numerical value in a CSV file being displayed as a date in Excel, for example. In other cases, decimal places may be truncated, etc.
It gets even worse when you try to change the contents of a CSV file in Excel and save it again. In most cases, you destroy the original format of the CSV file, making it no longer possible to process it cleanly with other programs.
FlowHeater, with its TextFile Adapter, offers some powerful mechanisms in this respect, so that a matched format of a CSV file is always interpreted the same way. See more about this below.
Convenient Excel CSV import/export with FlowHeater
This is an original feature of FlowHeater: FlowHeater was developed precisely to solve the problems mentioned above that occur during Excel CSV processing. The objective was to only have to worry about the formatting of data once. Thereafter, FlowHeater ensures that the defined format is maintained. Since we now live in a globalized world, a stored FlowHeater Definition also transforms data between countries with locales and regions that have different numeric and/or date formatting. FlowHeater offers the possibility of using all the locales available on a system. After selection, the locale can be further adapted to the required conditions. A FlowHeater Definition (.fhd) created in this way now works the same everywhere.
In order to import a CSV file into an Excel workbook or to create a CSV file from an Excel table, create a new FlowHeater Definition and select the required Adapters on the READ and WRITE sides.
READ TextFile Adapter and WRITE Excel Adapter to import a CSV file into an Excel workbook
as well as vice versa with
READ Excel Adapter and WRITE TextFile Adapter to export (create) a CSV file from an Excel sheet
The procedure is the same in both cases. First you select the required Adapters, then you configure the Adapters and adjust the desired format. As Excel can handle data types, it is only necessary to define a suitable format for the TextFile Adapter. In the format settings of the TextFile Adapter you only need specify how numbers, dates and time information should appear or how they are given in the CSV file. If you now correctly assign the corresponding data types (string, integer, number with decimal places, date, time, etc.) for the individual columns in the Excel Adapter, the field contents are automatically converted and set according to the specifications made.
You will find two examples of this in the FlowHeater Download Archive : Excel-CSV-export.fhd and Excel-CSV-import.fhd
How can only a certain subset be exported to a CSV file?
For this purpose, FlowHeater offers the possibility to define IF-THEN-ELSE conditions. Together with a filter, the amount of data to be exported can be individually limited during the transfer. This way only the desired data records, based on the defined filter condition, end up in the CSV file. For example, to restrict a CSV export based on a date that falls within a specific date range, proceed as follows.
First create a new definition and select the Excel Adapter on the READ side and the TextFile Adapter on the WRITE side. Both Adapters should be configured as if you want to export the complete set of data to a CSV file. Once that has been done, you need an IF-THEN-ELSE Heater for the filter condition as well as a Filter Heater that dynamically filters the data out of the Excel data source during processing based on the true/false result. Drag the Excel column that contains the date to be filtered to the IF-THEN-ELSE Heater. Within this Heater we now define a condition. See the illustration.
<01/01/2021 OR >12/31/2021
Warning: The format of the dates entered here must correspond to the Adapter date format settings. This is the only way to ensure that the value is correctly interpreted as a date.
Now connect the output of the IF-THEN-ELSE with the Filter Heater. With this filter condition and the connected Filter Heater, Excel rows that contain dates before or after 2021 are thereby excluded from processing. In the CSV file, only the filtered set of data will result, based on the filter condition.
In the FlowHeater Download Archive you will find the example: Excel-CSV-export-subset.fhd
How can Excel rows be updated during a CSV import?
For this the FlowHeater Excel Adapter offers the possibility to define key fields. These key fields are used to identify matching or existing rows/records in an Excel workbook. If a matching record is found based on these key fields, that row is updated. If multiple rows/records are found in the Excel workbook, all matching rows/records will be updated, of course. Thus, similar to a SQL database UPDATE, data records in an Excel workbook can be updated via CSV import. If desired, the Excel Adapter can support both UPDATE and INSERT at the same time: If no matching record is found, a new row is added to the Excel workbook. The Excel CSV import behavior can be customized to your needs. As mentioned above, the TextFile Adapter ensures that the CSV fields are always interpreted according to the format settings defined.
In the FlowHeater Download Archive you will find the example: Excel-CSV-import-with-update.fhd
CSV import/export remains a challenge in Excel. With the functions offered by Microsoft Excel, it is difficult to process CSV files cleanly and without undertaking manual format changes. In contrast, FlowHeater offers powerful mechanisms to perform Excel CSV processing with minimal effort. If you have further questions, please do not hesitate to contact us via the free support forum.
- Advanced MS Excel CSV export example with grouping
- FlowHeater Support Forum (Excel Adapter)
- FlowHeater Support Forum (TextFile Adapter)
Microsoft®, Windows®, Excel® are registered trademarks of Microsoft Corporation