logo

CSV to TSV to CSV

It often happens that data to be cleaned arrives as a CSV file, which then needs to be converted to TSV format before the recipes described on this website will work. There are free online services for doing the conversion, but the services I've seen won't accept very large files and won't guarantee problem-free results. Luckily, there a few other possibilities.

The easiest way to convert a CSV file to a TSV file is to open the CSV in a spreadsheet (LibreOffice Calc, Gnumeric, Excel), select and copy the table's cells to the clipboard, then paste them into a text editor and save the paste as a text file. By default, spreadsheets use tabs as field separators for this job.

copy/paste

Alternatively, open the CSV in a spreadsheet, then save it as a tab-separated text file. The 'save as' options may include quoting of data items; choose 'No'. 

If the CSV file is very large (say, 50+ Mb), a spreadsheet may struggle (or fail) to open it. Luckily, you can convert a CSV file of any size to TSV on the command line.

Before doing the conversions shown below, however, it's a good idea to make sure there are no tabs within data items in the CSV, and if there are, replace them with spaces:

$ cat file.csv
"aaa","bbb","ccc"
"ddd","ee   e","fff"
"ggg","hhh","iii"
 
$ cat -T file.csv | grep -c "\^I"
1
 
$ cat -T file.csv | grep -n "\^I"
2:"ddd","ee^Ie","fff"
 
$ sed '2s/e\te/e e/' file.csv > file1
 
$ cat file1
"aaa","bbb","ccc"
"ddd","ee e","fff"
"ggg","hhh","iii"

That particular sed substitution was addressed to line 2 and the contents of its second field. If there were more tabs in line 2 you could replace them all with

$ sed '2s/\t/ /g' file.csv > file1

or replace all tabs in the file with

$ sed 's/\t/ /g' file.csv > file1

If your grep supports Perl-type regular expressions, you can also find tabs like this:

$ grep -cP "\t" file.csv
1
 
$ grep -nP --color=always "\t" file.csv | cat -T
2:"ddd","ee^Ie","fff"

No items quoted

If double quotes have not been used around any of the field-separating commas, you can use tr to convert all the commas to tabs:

$ cat csv_file
red,green,yellow,blue
 
$ tr ',' '\t' < csv_file > tsv_file
 
$ cat tsv_file
red   green   yellow   blue

All items quoted, no embedded double quotes

If the CSV is an 'all-fields-double-quoted' one with no double quotes embedded within data items, you can delete the leading and trailing quote on each line, and replace quote-comma-quote (",") with tab. I usually do this with sed:

$ cat csv_file
"red","green","yellow, light","","blue"
 
$ sed 's/^"//;s/","/\t/g;s/"$//' csv_file > tsv_file
 
$ cat tsv_file
red   green   yellow, light        blue

Only some items quoted,
but without embedded "doubled" double quotes

Some CSV-generating applications produce files in which data items containing commas or spaces are double-quoted, while the other data items are left unquoted:

$ cat csv_file
red,green,"yellow, light",,blue

This CSV style can be converted with an AWK command so long as there no double quotes embedded within quoted data items:

$ cat csv_file
red,green,"yellow, light",,blue
 
$ awk -v FPAT='[^,]*|"[^"]*"' -v OFS="\t" \
'{$1=$1; gsub(/"/,""); print}' csv_file > tsv_file
 
red   green   yellow, light        blue

All or only some items quoted,
some quoted data items with embedded "doubled" double quotes

This is the trickiest CSV style to convert:

$ cat csv_file
red,green,"sort of ""Prussian Blue"" colour, dark",,yellow
"red","green","sort of ""Prussian Blue"" colour, dark","","yellow"

You can detect doubled double quotes with grep, although the command will also detect empty quoted fields:

$ grep '""' csv_file
red,green,"sort of ""Prussian Blue"" colour, dark",,yellow
"red","green","sort of ""Prussian Blue"" colour, dark","","yellow"

A command to convert this ugly CSV uses sed to deal with the doubled double quotes before and after processing with AWK. The '@@@' string is a dummy placeholder for the doubled double quote:

$ cat csv_file
red,green,"sort of ""Prussian Blue"" colour, dark",,yellow
"red","green","sort of ""Prussian Blue"" colour, dark","","yellow"
 
$ sed 's/""/@@@/g' csv_file \
| awk -v FPAT='[^,]*|"[^"]*"' -v OFS="\t" \
'{$1=$1; gsub(/"/,""); print}' \
|sed 's/\t@@@\t/\t\t/g;s/@@@/"/g' > tsv_file
 
$ cat tsv_file
red   green   sort of "Prussian Blue" colour, dark      yellow
red   green   sort of "Prussian Blue" colour, dark      yellow

Any (?) CSV

The sed/AWK combo command will convert almost any CSV to a TSV, so I've saved it as a function, c2t, and use it for command-line conversions of "file" to "tsv_file":

c2t()
{
sed 's/""/@@@/g' "$1" \
| awk -v FPAT='[^,]*|"[^"]*"' -v OFS="\t" \
'{$1=$1; gsub(/"/,""); print}' \
| sed 's/\t@@@\t/\t\t/g;s/@@@/"/g' > tsv_"$1"
}

TSV > CSV

To convert a TSV file to an 'all-fields-double-quoted' CSV file, use sed substitutions:

$ cat tsv_file
red   green   yellow, light        blue
 
$ sed 's/\t/","/g;s/^/"/;s/$/"/' tsv_file > csv_file
 
$ cat csv_file
"red","green","yellow, light","","blue"

If there are double-quoted strings within data items, double them before using this command.