I am trying to sort a file by a date field. I realize this has been done before, however, I cannot find an example that has the following date format.
Canada Goose 1x03 For the Triumph of Evil Sep/30/2013
Rucksack 10x03 Everybody's Crying Mercy Oct/03/13
Test 4x01 Season 4, Episode 1 Jun/01/14
New Family 3x03 Double Date Oct/01/2013
I tried this command but it doesn't work
sort -t ' ' -k 4.8,4.11 -k 4.4M -k 4.1,4.2 -b Test.txt
CodePudding user response:
If you have a GNU awk installed, you may want to try this approach.
sort.awk
#!/bin/gawk -f
function convertToSeconds(date, fields) {
split(date, fields, /\//)
fields[1]=months[tolower(fields[1])]
fields[2]=sprintf("d", fields[2])
fields[3]=(length(fields[3]) == 2) ? sprintf("2d", fields[3]) : fields[3]
return mktime(sprintf("%s %s %s 00 00 00", fields[3], fields[1], fields[2]))
}
BEGIN {
FS="( \\ )"
months["jan"]="01"; months["feb"]="02"; months["mar"]="03"; months["apr"]="04"
months["may"]="05"; months["jun"]="06"; months["jul"]="07"; months["aug"]="08"
months["sep"]="09"; months["oct"]="10"; months["nov"]="11"; months["dec"]="12"
}
{
arr[convertToSeconds($4)]=$0
}
END {
asorti(arr, dst)
for(i=1; i<=FNR; i) {
print arr[dst[i]]
}
}
Give it an execute permission, then run it:
$ chmod x ./sort.awk
$ ./sort.awk Test.txt
To save the changes into a new file, append this > operator.
$ ./sort.awk Test.txt > SortedTest.txt
CodePudding user response:
** UPDATE 1 **
revised sort key to explicitly list 4 digit year as prefix to circumvent year-end crossover issues
since OP only wants to sort date field, the exact epochs mapping isn't needed at all ::
mawk '$ NF = 366 * ( (_=($3) % 100) 1900 100 * (_<50) ) \ int(_ * 10^8) ($2) (31) * \ (index(" JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC", toupper($2)) / 3 - 1)'
23284 SEP 30 2013 201300737036
23285 OCT 1 2013 201300737038
23287 OCT 3 2013 201300737040
23541 JUN 14 2014 201400737293
1st column is original date generation order (the correct rank ordering), and the last column is the calculated sort index value - i tested every date from jan 1st 1950 to dec 31 2025, and this simplistic approach ranks order just fine, even though it doesn't bother to calculate exact julian dates, or exact leap years,
since the objective is merely finding a rank ordering method that yields the same sorting output as exact epoch seconds
