
Data-cleaning rule no. 1

Back up a data table before cleaning it. One way to do this is to copy the uncleaned data table to a separate 'Cleaning' directory as a working file. Move to the 'Cleaning' directory before you start any cleaning of the working file, and the original data table will be safe from unintended edits.

$ mkdir -p ~/Cleaning
 
$ cp original_data.txt ~/Cleaning/table
 
$ cd ~/Cleaning
 
$ ls
table
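
The backup step can be made a little safer by checking that the copy matches the original and by making the original read-only. The following is only a sketch under assumptions: it runs in a throwaway directory created with mktemp, and the two-line stand-in for original_data.txt is invented for the demonstration. Substitute your own paths and file.

```shell
# Demo in a temporary directory so nothing in your home directory is touched
workdir=$(mktemp -d)
cd "$workdir"
printf 'id\tname\n1\tAcacia\n' > original_data.txt   # invented stand-in data

# Create the 'Cleaning' directory if needed, then copy the table as a working file
mkdir -p Cleaning
cp original_data.txt Cleaning/table

# Confirm the working file is byte-for-byte identical to the original
cmp original_data.txt Cleaning/table && echo "copy verified"

# Optional extra guard: remove write permission from the original
chmod a-w original_data.txt

# Do all further work inside 'Cleaning'
cd Cleaning
ls
```

cmp prints nothing when the two files are identical, so the "copy verified" message appears only if the copy succeeded.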

I often keep progressive backups as I make changes to a data table. I usually number these backups serially (table1, table2, etc.):

$ [do something] table > table1
 
$ [do something else] table1 > table2
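
As a concrete illustration of the numbering scheme, here is a minimal sketch with two cleaning steps of my own choosing (stripping trailing spaces, then dropping exact duplicate lines). The stand-in table and both steps are hypothetical; your own "do something" commands will differ.

```shell
# Demo in a temporary directory with an invented three-line working file
workdir=$(mktemp -d)
cd "$workdir"
printf 'a\tb  \na\tb  \nc\td\n' > table

# Step 1 (hypothetical): strip trailing spaces, save the result as table1
sed 's/ *$//' table > table1

# Step 2 (hypothetical): drop exact duplicate lines, keeping the first of each, save as table2
awk '!seen[$0]++' table1 > table2

# Compare two stages to see exactly what a step changed
# (diff exits non-zero when the files differ, hence the '|| true')
diff table1 table2 || true
```

Because every step writes to a new file, any earlier stage can be inspected or reused if a later step goes wrong.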



Data-cleaning rule no. 2

Keep a log of every step in data cleaning. A plain text file will do. Paste into this file a copy of every command and its output, in the order in which you used the commands. If problems arise during data cleaning, the log file will help you understand what went wrong. If the data you're cleaning aren't your own, the log file can be the basis of a report to the data owner.
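
Pasting by hand works fine, but the logging can also be partly automated. The sketch below assumes a helper function, logrun, and a log file name, cleaning_log.txt. Both are hypothetical names of my own, not standard tools, and the three-line table is invented for the demonstration.

```shell
# Demo in a temporary directory with an invented working file
workdir=$(mktemp -d)
cd "$workdir"
printf 'x\ny\nx\n' > table
log=cleaning_log.txt   # hypothetical log file name

# logrun (hypothetical helper): append the command and its output to the log,
# while still showing the output in the terminal
logrun() {
    printf '$ %s\n' "$*" >> "$log"
    "$@" 2>&1 | tee -a "$log"
    printf '\n' >> "$log"
}

logrun wc -l table
logrun sort -u table

# Review the log so far
cat "$log"
```

This helper only handles simple commands. Pipelines and redirections are not recorded as typed, so those are still best pasted into the log by hand. On many Unix-like systems the script utility is another option, since it records a whole terminal session to a file.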