banner

For a full list of BASHing data blog posts, see the index page.     RSS


Changing the month format: a fairly general solution

I sometimes need to change the month format in a dataset, for instance from "Jan" to "01", or "3" to "March". There are various clever ways to do this on the command line, but I'm not good at remembering clever. To save time I wrote a table with the 6 different month formats I see most often. It's the table you see below, and if you highlight and copy it, you should be able to paste it into a text editor as a tab-separated table. Save the file as "months".

101JanJanuaryiI
202FebFebruaryiiII
303MarMarchiiiIII
404AprAprilivIV
505MayMayvV
606JunJuneviVI
707JulJulyviiVII
808AugAugustviiiVIII
909SepSeptemberixIX
1010OctOctoberxX
1111NovNovemberxiXI
1212DecDecemberxiiXII

My general strategy is to use AWK to change formats, and to load "months" into an appropriate array in the AWK command. Below are a few examples.

"months" is always in the working directory, and my data files are always tab-separated.


Expand abbreviated month

first

awk 'FNR==NR {a[$3]=$4; next} {split($0,b,"-")} /-/ {print a[b[1]]"-"a[b[2]]; next} {print a[b[1]]}' months file


Split "full" date into ISO 8601 components

second

awk -F"\t" 'FNR==NR {a[$4]=$2; next} FNR==1 {print $0 FS "Year" FS "Month" FS "Day"} FNR>1 {split($2,b," "); printf("%s\t%s\t%s\t%s\t%02d\n",$1,$2,b[3],a[b[2]],b[1])}' months file


Convert "full American" date into Roman month-numeral date

third

awk -F"\t" 'FNR==NR {a[$4]=$5; next} FNR==1 {print} FNR>1 {split($2,b," "); print $1 FS gensub(",","",1,b[2])"."a[b[1]]"."b[3]}' months file


Convert date range into ISO 8601 interval date

fourth

awk 'FNR==NR {a[$3]=$2; next} /[0-9]-[0-9]/ {split($0,b," "); split(b[1],c,"-"); printf("%s-%s-%02d/%s-%s-%02d\n", b[3],a[b[2]],c[1],b[3],a[b[2]],c[2])} / - / && length($0)<21 {split($0,b," - "); split(b[1],c," "); split(b[2],d," "); printf("%s-%s-%02d/%s-%s-%02d\n",d[3],a[c[2]],c[1],d[3],a[d[2]],d[1])} / - / && length($0)>21 {split($0,b," - "); split(b[1],c," "); split(b[2],d," "); printf("%s-%s-%02d/%s-%s-%02d\n",c[3],a[c[2]],c[1],d[3],a[d[2]],d[1])}' months file

For more on splitting again the split-out parts of a string, see this BASHing data post.


To remind myself which fields are which in "months", I just cat the file before building the AWK command:

months

Last update: 2018-12-30