Reporting changes

The two BASH scripts on this page will generate a report on changes made to a table.
 
Note: These scripts should only be used after all records have been added or deleted, and after the table has been checked for broken records and any breaks have been fixed. Start with a tab-separated data table with a single header line, and save this original table as a backup file. Next, edit a working version of the table, correcting any errors in the formatting or content of data items and making sure that data items are in their correct fields. When the data cleaning is done, you're ready to use the scripts.
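
Before comparing the two tables it can help to confirm that they still line up. The following is a minimal sketch of such a check, not part of the original scripts. The function name check_tables is hypothetical, and the check assumes that both files are tab-separated and that every line should have the same number of fields.

```bash
#!/bin/bash
# Hypothetical helper (not one of the scripts on this page): confirm that
# the backup and the edited table have the same number of lines and that
# every line in both files has the same number of tab-separated fields.
check_tables() {
    local old="$1" new="$2"
    local n_old n_new widths
    n_old=$(awk 'END {print NR}' "$old")
    n_new=$(awk 'END {print NR}' "$new")
    if [ "$n_old" -ne "$n_new" ]; then
        echo "line counts differ: $n_old vs $n_new"
        return 1
    fi
    # Count how many distinct field counts occur across both files
    widths=$(awk -F "\t" '{seen[NF]} END {n = 0; for (k in seen) n++; print n}' "$old" "$new")
    if [ "$widths" -ne 1 ]; then
        echo "field counts are not uniform"
        return 1
    fi
    echo "tables are comparable"
}

# Run the check only when two filenames are given on the command line
if [ "$#" -eq 2 ]; then
    check_tables "$1" "$2"
fi
```

Saved as a script, this could be run with the backup table as the first argument and the edited table as the second, in the same order the scripts below expect.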


Reporter

The reporter script is a slightly improved version of one I described in 2015. The script detects the changes made when a data table is edited, and reports the changes as a plain-text logfile. That part of the script is based on the diff command. The second part of the script uses AWK and BASH to transform the logfile into an easy-to-read webpage.
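
To give a feel for the raw material a diff-based script has to work with, here is a small illustration. It is not the reporter script, and the two tiny tables in it are made-up data. It only shows how plain diff reports a single edited line in a tab-separated table.

```bash
#!/bin/bash
# Illustration only (not the reporter script): how plain diff reports
# one edited line. The table contents are hypothetical.
tmp=$(mktemp -d)
printf 'ID\tMethod\n1\tGPS\n2\tmap\n' > "$tmp/old"
printf 'ID\tMethod\n1\tGPS\n2\tGoogle Earth\n' > "$tmp/new"

# diff exits with status 1 when the files differ, so that status is
# deliberately ignored here
log=$(diff "$tmp/old" "$tmp/new") || true
printf '%s\n' "$log"

rm -r "$tmp"
```

diff marks the change as 3c3 (line 3 changed), followed by the original line prefixed with < and the edited line prefixed with >. It reports whole lines, so a script built on it still has to work out which fields within the line were edited.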

As an example, here's a table called demo with 15 lines plus a header, with tab-separated data items in six fields:

[Image: original table]

Note that the 'Uncertainty', 'Method' and 'Elevation' items have got mucked up in several of the records. I'd also like to edit a couple of the 'Location' items, and one of the 'LatDD' items contains an error. I first save a backup of demo as demo_old, then clean the data in demo:

[Image: cleaned table]

Navigate to a directory containing both demo_old and demo, and launch reporter using the backup table as the first argument and the edited table as the second:

$ reporter demo_old demo

reporter generates a logfile in the same directory called demo_edits_[current date and time].txt and a webpage called demo_edits_[current date and time].html. The webpage then opens in your default browser. Here's the logfile, first as raw text and then with the file opened in (or copy-pasted into) a spreadsheet:

[Image: raw text]
[Image: spreadsheet]

And here's a screenshot of the webpage, showing changed items and their originals in bold:

[Image: webpage]

Download the reporter script here.


Reportlist

reportlist generates a simple list of changes. For each change there's a line number (counting the header as line 1), a field number, the original data item and the edited data item. The list is printed to screen and also to a file called [original filename]-changes-[current date and time]. The data items are in square brackets so that if leading or trailing spaces are deleted, the deletions will be apparent, as in [item   ] edited to [item]. Like reporter, reportlist takes the backup table as its first argument and the edited table as its second.

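#!/bin/bash
# Usage: reportlist original_table edited_table
stamp=$(date +%F_%H:%M)
# Put each original line side by side with its edited counterpart
paste "$1" "$2" > merged
# Number of fields in the original table, taken from its header line
totf=$(head -n1 "$1" | awk -F "\t" '{print NF}')
# Compare field i with field i+totf. Appending "" forces a string comparison,
# so that an edit such as 5.0 to 5 is not treated as numerically equal.
awk -F "\t" -v f="$totf" '{for (i=1;i<=f;i++) \
if (($i "") != ($(i+f) "")) \
print "line "NR", field "i": ["$i"] > ["$(i+f)"]"}' merged \
| tee "$1"-changes-"$stamp"
rm merged
exit 0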

[Image: reportlist]
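
As a worked example, the same paste-and-compare approach can be tried on two tiny tables. This is a sketch with made-up data, not a replacement for the script. It differs from the script in two ways that are assumptions of the sketch: it uses a temporary directory instead of a file called merged, and it appends an empty string to each field so that AWK compares the items as text rather than as numbers.

```bash
#!/bin/bash
# Worked example with hypothetical data: paste the original and edited
# tables side by side, then compare field i with field i + (number of fields).
tmp=$(mktemp -d)
printf 'ID\tLocation\tElev\n1\tHobart \t120.0\n2\tSydney\t35\n' > "$tmp/old"
printf 'ID\tLocation\tElev\n1\tHobart\t120\n2\tSidney\t35\n' > "$tmp/new"

# Number of fields, taken from the header line of the original table
totf=$(head -n1 "$tmp/old" | awk -F "\t" '{print NF}')

# Appending "" forces a string comparison, so 120.0 and 120 count as different
changes=$(paste "$tmp/old" "$tmp/new" | awk -F "\t" -v f="$totf" '{
    for (i = 1; i <= f; i++)
        if (($i "") != ($(i+f) ""))
            print "line " NR ", field " i ": [" $i "] > [" $(i+f) "]"
}')
printf '%s\n' "$changes"

rm -r "$tmp"
```

This prints three changes: the trailing space removed from [Hobart ] on line 2, the elevation on the same line edited from [120.0] to [120], and [Sydney] edited to [Sidney] on line 3. Line numbers count the header as line 1.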