Bash: splitting a list of strings each containing space-separated words in different variables for e

Time:01-05

I'm trying to parse the Apache error log to grep the lines that correspond to the "offending" IPs found in the fail2ban log.

I'm using a script in bash.

First I extract the offending IPs:

offenders=$(grep -F "[apache-errors] Found" /var/log/fail2ban.log | awk '{print $8}' | sort | uniq)

Then, for each IP, I get the entries from fail2ban.log; there may be several entries, because the IP may have made requests at different times:

for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
 
    declare _count_entries=$(echo "${entries[@]}" | wc -l)
    echo "Found $_count_entries error entries for IP $ip"

    for entry in "${entries[@]}"; do
        echo "$entry"
    done
done

This is what I get so far (IPs have been anonymized):

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55

Now, for each line, I want to extract the ip, date and time portions. I tried something like this, but IT DOES NOT WORK: it prints the (ip,date,time) only for the first entry:

for ip in $offenders; do
    entries=$(grep -F "[apache-errors] Found $ip" /var/log/fail2ban.log | awk '{print $8" "$10" "$11}' | sort | uniq)
    
    for entry in "${entries[@]}"; do

        echo "$entry"

        _ip=($(echo "$entry" | cut -d ' ' -f1))
        _date=($(echo "$entry" | cut -d ' ' -f2))
        _time=($(echo "$entry" | cut -d ' ' -f3))
        echo "ip=$_ip , date=$_date , time=$_time"

    done
done

Output: for each IP, only the (ip,date,time) portions of the first entry are echoed:

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49

The desired output would be:

[INFO] Found 1 error entries for IP 10.10.0.29
10.10.0.29 2021-12-20 06:33:12
ip=10.10.0.29 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
10.30.0.186 2022-01-02 05:20:49
10.30.0.186 2022-01-02 05:40:24
10.30.0.186 2022-01-02 07:38:55
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
ip=10.30.0.186 , date=2022-01-02 , time=07:38:55

So how can I do that in bash?

The final goal is to use the ip, date and time portions to build a regex like this, because I want to grep the lines from the error logs that correspond exactly to the findings in the fail2ban log:

grep -P "^(\[$_date $_time)(.+\[client )($_ip).+$" /var/log/apache2/error.log

CodePudding user response:

You could go with something like this:

#!/bin/bash
  
print_errors() {
  local ip=$1
  [ -n "$ip" ] || return
  shift
  echo "[INFO] Found ${#@} error entries for IP $ip"
  printf '%s\n' "$@"
}

prev_ip=
errors=()
while read -r ip date time
do
    if [ "$prev_ip" != "$ip" ]
    then
        print_errors "$prev_ip" "${errors[@]}"
        prev_ip=$ip
        errors=()
    fi
    errors+=("ip=$ip , date=$date , time=$time")
done < <(
    grep -F "[apache-errors] Found" /var/log/fail2ban.log |
    awk '{print $8" "$10" "$11}' |
    sort
)

print_errors "$prev_ip" "${errors[@]}"
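
As for why the loop in the question prints only the first entry: entries=$(...) stores a single string, not an array, so "${entries[@]}" expands to one word containing every line and the loop body runs exactly once. cut then returns one field per line, _ip=($(...)) turns those into an array, and $_ip shows only its first element. A minimal sketch of a fix that stays closer to the original script, assuming the entries are newline-separated (split_entries and the sample data are hypothetical names and values, for illustration only):

```shell
#!/bin/bash
# Sample data standing in for the output of the grep | awk | sort pipeline
# (made-up values; in the real script this comes from fail2ban.log).
entries='10.20.0.242 2021-12-21 10:51:44
10.20.0.242 2021-12-30 12:03:55'

# split_entries (hypothetical helper): feed the string line by line to
# `read`, which splits each line on whitespace into three variables.
split_entries() {
    printf '%s\n' "$1" | while read -r _ip _date _time; do
        [ -n "$_ip" ] || continue   # skip blank lines
        echo "ip=$_ip , date=$_date , time=$_time"
    done
}

split_entries "$entries"
```

In bash the pipe can be replaced by a here-string (done <<< "$1"), which keeps the loop in the current shell so the variables remain usable after it.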

But bash is not really meant for this kind of text processing. It's better to write the same logic in awk (the sorting is done outside of awk here):

grep -F "[apache-errors] Found" /var/log/fail2ban.log | sort -k 8,11 |
awk '
    function print_errors(ip, arr) {
        if (ip == "") return
        print "[INFO] Found "length(arr)" error entries for IP "ip
        # "for (i in arr)" visits keys in an unspecified order, so loop by index
        for (i = 1; i <= length(arr); i++) print arr[i]
    }
    BEGIN { ip = "" }
    {
        if ($8 != ip) {
            print_errors(ip, arr)
            delete arr
            ip = $8
        }
        arr[length(arr)+1] = "ip="$8" , date="$10" , time="$11
    }
    END{ print_errors(ip, arr) }
'

Or even better, write the whole thing in a language that has multidimensional associative arrays and text processing facilities:

Example with Ruby:

#!/usr/bin/env ruby
  
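ARGF.each_line.with_object(Hash.new{|h,k| h[k] = []}) do |line,hash|
  # Skip lines that are not "[apache-errors] Found" entries
  next unless line.include?("[apache-errors] Found")
  ip,date,time = line.split.values_at(7,9,10)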
  hash[ip] << "ip=#{ip} , date=#{date} , time=#{time}"
end.each do |ip,arr|
  puts "[INFO] Found #{arr.count} error entries for IP #{ip}"
  puts arr.join("\n")
end

Example output of the three programs above:

[INFO] Found 1 error entries for IP 10.10.0.129
ip=10.10.0.129 , date=2021-12-20 , time=06:33:12
[INFO] Found 2 error entries for IP 10.20.0.242
ip=10.20.0.242 , date=2021-12-21 , time=10:51:44
ip=10.20.0.242 , date=2021-12-30 , time=12:03:55
[INFO] Found 3 error entries for IP 10.30.0.186
ip=10.30.0.186 , date=2022-01-02 , time=05:20:49
ip=10.30.0.186 , date=2022-01-02 , time=05:40:24
ip=10.30.0.186 , date=2022-01-02 , time=07:38:55
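
For the final goal in the question, the three values can then be turned into a pattern for the Apache error log. This is only a sketch: escape_dots and build_pattern are hypothetical helper names, the sample log line is made up, and it assumes that error-log lines start with "[date time", as the regex in the question implies. It uses grep -E rather than grep -P, since the pattern needs no Perl-specific syntax, and it escapes the dots in the IP so that they match literally instead of matching any character:

```shell
#!/bin/bash
# escape_dots (hypothetical helper): make the dots in an IP match literally.
escape_dots() {
    printf '%s\n' "$1" | sed 's/\./\\./g'
}

# build_pattern (hypothetical helper): date, time, ip -> extended regex
# modelled on the regex from the question.
build_pattern() {
    printf '^(\\[%s %s)(.+\\[client )(%s).+$\n' "$1" "$2" "$(escape_dots "$3")"
}

# Made-up log line in the format that the question's regex assumes.
sample='[2021-12-20 06:33:12.123] [error] [client 10.10.0.29:1234] example message'

pattern=$(build_pattern 2021-12-20 06:33:12 10.10.0.29)
printf '%s\n' "$sample" | grep -E "$pattern"
```

Inside the while read loop of the bash version, the same call would look like grep -E "$(build_pattern "$date" "$time" "$ip")" /var/log/apache2/error.log.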