I have the following lines in a file, and I would like to get the average of the third column broken down by hour.
2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 12:03:43: 4.4692638 secs
2010-10-28 12:04:51: 3.3770310 secs
2010-10-28 12:05:58: 4.6227063 secs
2010-10-28 12:07:08: 5.1650404 secs
2010-10-28 12:08:16: 3.2819025 secs
2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
2010-10-28 13:03:51: 4.3770310 secs
2010-10-28 13:04:58: 3.6227063 secs
2010-10-28 13:05:08: 3.1650404 secs
2010-10-28 13:06:16: 4.2819025 secs
2010-10-28 14:12:36: 7.1721851 secs
2010-10-28 14:23:43: 7.4692638 secs
2010-10-28 14:24:51: 7.3770310 secs
2010-10-28 14:25:58: 9.6227063 secs
2010-10-28 14:37:08: 7.1650404 secs
2010-10-28 14:48:16: 7.2819025 secs
So far I have:
cat filename | awk '{sum+=$3} END {print "Average = ",sum/NR}'
with the output
Average = 4.49154
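That gives me the average for the whole file, but I would like to break the average down by hour. I could slip in a grep for each hour before piping to awk, but I am hoping to do it with a one-liner.
Ideally, the output would look something like
Average 12:00 = _computed_avg_
Average 13:00 = _computed_avg_
Average 14:00 = _computed_avg_
and so on.
I am not necessarily looking for the full answer, but I hope someone can point me in the right direction.
Many thanks!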
KM
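I would set the field separator to a colon, so that the first field holds the date plus the hour (for example "2010-10-28 12") and the fourth field holds the value. Then accumulate a sum and a count in associative arrays keyed on that first field, and compute the averages at the end: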
gawk -F: 'NF == 4 { sum[$1] += $4; N[$1]++ } END { for (key in sum) { avg = sum[key] / N[key]; printf "%s %f\n", key, avg; } }' filename | sort
On your test data, this gives:
2010-10-28 12 4.348022
2010-10-28 13 3.514688
2010-10-28 14 7.681355
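This should produce the correct answer even if the data is not in chronological order (say you concatenated two log files out of order). Note that gawk sums values such as "3.123 secs" numerically: when a string is used in arithmetic, it takes the leading numeric part and ignores the rest. The final sort puts the averages in chronological order, because awk makes no guarantee about the order in which for (key in sum) visits the keys.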
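If you want the output closer to the "Average 12:00 = ..." layout from the question, only the printf needs to change. What follows is a minimal sketch, not the only way to do it. The function name hourly_avg is made up for illustration, and it calls plain awk rather than gawk on the assumption that your awk behaves the same for these features (-F, associative arrays, printf). The date stays in the label so that the same hour on different days is not merged, and so that sort still orders the lines chronologically.

```shell
#!/bin/sh
# hourly_avg is a hypothetical helper name, not part of any tool.
# It reads the named file(s), or standard input when no file is given.
hourly_avg() {
    awk -F: 'NF == 4 {
        sum[$1] += $4      # $4 is " 5.1721851 secs"; awk uses the leading number
        n[$1]++            # count of samples for this "date hour" key
    }
    END {
        for (key in sum)   # iteration order is unspecified, hence the sort below
            printf "Average %s:00 = %f\n", key, sum[key] / n[key]
    }' "$@" | sort
}

# Sample data copied from the question.
result=$(hourly_avg <<'EOF'
2010-10-28 12:02:36: 5.1721851 secs
2010-10-28 12:03:43: 4.4692638 secs
2010-10-28 12:04:51: 3.3770310 secs
2010-10-28 12:05:58: 4.6227063 secs
2010-10-28 12:07:08: 5.1650404 secs
2010-10-28 12:08:16: 3.2819025 secs
2010-10-28 13:01:36: 2.1721851 secs
2010-10-28 13:02:43: 3.4692638 secs
2010-10-28 13:03:51: 4.3770310 secs
2010-10-28 13:04:58: 3.6227063 secs
2010-10-28 13:05:08: 3.1650404 secs
2010-10-28 13:06:16: 4.2819025 secs
2010-10-28 14:12:36: 7.1721851 secs
2010-10-28 14:23:43: 7.4692638 secs
2010-10-28 14:24:51: 7.3770310 secs
2010-10-28 14:25:58: 9.6227063 secs
2010-10-28 14:37:08: 7.1650404 secs
2010-10-28 14:48:16: 7.2819025 secs
EOF
)

printf '%s\n' "$result"
# Average 2010-10-28 12:00 = 4.348022
# Average 2010-10-28 13:00 = 3.514688
# Average 2010-10-28 14:00 = 7.681355
```

On a real log you would call it as hourly_avg filename. If the file only ever covers one day and you want the bare "Average 12:00" label, you could split the key on the space inside the END block and print only the hour part.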