I have a log file that, simplified, looks like this (it has so many columns that addressing each column directly is not feasible):
id,time,host,ip,user_uuid
foo1,2022-05-10T00:01.001Z,,,
foo1,2022-05-10T00:01.002Z,foo_host,,
foo1,2022-05-10T00:01.003Z,,192.168.0.1,
foo1,2022-05-10T00:01.004Z,,,foo_user
bar1,2022-05-10T00:02.005Z,,,
bar1,2022-05-10T00:03.006Z,bar_host,,
bar1,2022-05-10T00:04.007Z,,192.168.0.13,
bar1,2022-05-10T00:05.008Z,,,bar_user
Most fields appear only once per id, but not all of them (see time, for example).
What I want to achieve is one line per id that combines the columns of all records with that id:
id,time,host,ip,user_uuid
foo1,2022-05-10T00:01.001Z,foo_host,192.168.0.1,foo_user
bar1,2022-05-10T00:03.006Z,bar_host,192.168.0.13,bar_user
For columns that have more than one value per id, I don't care which one is returned, as long as it comes from a record with that id.
I would exploit GNU AWK's 2D arrays (arrays of arrays) in the following way. Let the content of file.txt be
id,time,host,ip,user_uuid
foo1,2022-05-10T00:01.001Z,,,
foo1,2022-05-10T00:01.002Z,foo_host,,
foo1,2022-05-10T00:01.003Z,,192.168.0.1,
foo1,2022-05-10T00:01.004Z,,,foo_user
bar1,2022-05-10T00:02.005Z,,,
bar1,2022-05-10T00:03.006Z,bar_host,,
bar1,2022-05-10T00:04.007Z,,192.168.0.13,
bar1,2022-05-10T00:05.008Z,,,bar_user
then
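awk 'BEGIN { FS = OFS = ","; cols = 5 }
NR == 1 { print; next }
{ for (i = 1; i <= cols; i++) arr[$1][i] = (arr[$1][i] != "") ? arr[$1][i] : $i }  # keep the first non-empty value per id and column
END { for (id in arr) { for (j in arr[id]) { $j = arr[id][j] }; print } }' file.txt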
output
id,time,host,ip,user_uuid
bar1,2022-05-10T00:02.005Z,bar_host,192.168.0.13,bar_user
foo1,2022-05-10T00:01.001Z,foo_host,192.168.0.1,foo_user
Explanation: First I tell GNU AWK that both the field separator (FS) and the output field separator (OFS) are a comma, and I use the cols variable to hold the number of columns you want. The first row (the header) I simply print. For every following row, for each column I use the so-called ternary operator to check whether arr[id][field number] already holds a value: if it does, I keep it; otherwise I set it to the current field. In END I use nested for loops: for each id I assign each stored value to the corresponding field of the current line, so GNU AWK rebuilds the line from these fields, and then I print it. Disclaimer: this solution assumes that all lines have the same number of columns, that this number is known a priori, and that any order of output lines is acceptable. If this does not hold, you will need to develop a more general solution.
(tested in gawk 4.2.1)
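For comparison, here is a minimal sketch of the same per-id, per-column idea in Ruby (the other language used in this thread). merge_rows is a hypothetical helper, not part of any library; it assumes plain comma-separated lines without quoted fields containing commas, and it treats whitespace-only fields as empty. Because Ruby hashes keep insertion order, the ids come out in the order they first appear, which the awk for (i in arr) loop does not guarantee.

```ruby
# Hypothetical helper: merge all rows that share an id (first column),
# keeping the first non-blank value seen for each column.
# Assumes simple comma-separated lines (no quoted fields containing commas).
def merge_rows(lines)
  header, *rows = lines
  merged = {} # id => array of column values; Hash keeps first-seen id order
  rows.each do |line|
    fields = line.chomp.split(",", -1) # -1 keeps trailing empty fields
    slots = (merged[fields[0]] ||= Array.new(fields.size))
    fields.each_with_index do |value, i|
      # fill a column only once, and only with a non-blank value
      slots[i] = value if slots[i].nil? && !value.strip.empty?
    end
  end
  [header.chomp] + merged.values.map { |vals| vals.map(&:to_s).join(",") }
end

sample = [
  "id,time,host,ip,user_uuid",
  "foo1,2022-05-10T00:01.001Z,,,",
  "foo1,2022-05-10T00:01.002Z,foo_host,,",
  "foo1,2022-05-10T00:01.003Z,,192.168.0.1,",
  "foo1,2022-05-10T00:01.004Z,,,foo_user",
  "bar1,2022-05-10T00:02.005Z,,,",
  "bar1,2022-05-10T00:03.006Z,bar_host,,",
  "bar1,2022-05-10T00:04.007Z,,192.168.0.13,",
  "bar1,2022-05-10T00:05.008Z,,,bar_user"
]
result = merge_rows(sample)
puts result
```

To run it on the real file, pass File.readlines("file.txt") instead of the sample array. Unlike the truthiness test, the emptiness test keeps a field whose value is 0.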
You can use Ruby's CSV parser to group the rows by id and then reduce the repeated entries:
ruby -r csv -e '
data=CSV.parse($<.read, col_sep: ",")
puts data[0].to_csv
data[1..].group_by { |row| row[0] }.
each{ |k, arr|
puts arr.transpose().map{ |ta| ta.find { |x| !x.nil? }}.to_csv
}
' file
Prints:
id,time,host,ip,user_uuid
foo1,2022-05-10T00:01.001Z,foo_host,192.168.0.1,foo_user
bar1,2022-05-10T00:02.005Z,bar_host,192.168.0.13,bar_user
This assumes the valid data is the first non-nil value encountered for that particular column (the CSV parser returns nil for unquoted empty fields).
I would like to simply merge multiple pipe-delimited files using awk. Every example I have found here is several times more complicated than what I am trying to do. I have several text files formatted identically and just want to merge them together, like a UNION ALL in SQL. I don't need to join on a column, and I don't care about duplicate rows.
Concatenating the files should work for you then:
cat file1.txt file2.txt file3.txt > finalFile.txt
No need for awk.
That is a job for cat (see @mjuarez's answer), but if you really want to use awk for it:
$ awk 1 files* > another_file
(g)awk '{print}' file1 file2 file* >> outputfile
This assumes you want to use awk only.
James' answer is way better, but I still want to show what I came up with: a very basic use of awk. :)
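The same UNION ALL-style concatenation can be sketched in Ruby, the other scripting language used on this page. union_all and the file names are hypothetical; the helper copies every line of every input file, in order, into one output file. Writing each line with IO#puts adds a newline only when a line lacks one, so a file without a trailing newline is not glued to the next file (awk 1 behaves the same way, whereas cat copies the bytes as-is).

```ruby
require "tmpdir"

# Hypothetical helper: append every line of every input file, in order,
# to out_path (no joining, no de-duplication).
def union_all(paths, out_path)
  File.open(out_path, "w") do |out|
    paths.each do |path|
      # puts adds "\n" only if the line does not already end with one
      File.foreach(path) { |line| out.puts(line) }
    end
  end
end

# Demo with throwaway pipe-delimited files in a temporary directory.
merged = nil
Dir.mktmpdir do |dir|
  first = File.join(dir, "file1.txt")
  second = File.join(dir, "file2.txt")
  File.write(first, "a|1\nb|2\n")
  File.write(second, "a|1") # duplicate row, no trailing newline
  out = File.join(dir, "finalFile.txt")
  union_all([first, second], out)
  merged = File.read(out)
end
puts merged
```

Duplicate rows are kept, as in a UNION ALL.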
I have the following file format:
AAA-12345~TRAX~~AAAAAAAAAAAA111111ETC
AAA-12345~RCV~~BBBBBBBBBBBB222222ETC
BBB-78900~TRAX~~CCCCCCCCCCCC444444ETC
BBB-78900~RCV~~DDDDDDDDDDDD555555ETC
CCC-65432~TRAX~~HHHHHHHHHHHH888888ETC
All lines come in pairs, and the two lines of each pair are identical up to the first ~.
Sometimes there are orphans, like the last record, which has a TRAX line but no RCV line.
Question: using Bash utilities such as sed or awk, or commands such as grep or cut, how do I find and display only the orphans?
Using awk:
awk -F~ '{a[$1]+=1} END{for(key in a) if(a[key]==1){print key}}' file
This just loads the first field (split on the tilde) as the key of an array and increments the value for that key each time it is found. When the file is finished, it iterates over the array and prints the keys whose count is exactly 1 (in no particular order).
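If you want the whole orphan lines rather than just their keys, and in input order (awk's for (key in a) loop visits keys in an unspecified order), here is a minimal Ruby sketch of the same counting idea. orphans is a hypothetical helper; it assumes the key is everything before the first ~ and that the lines fit in memory, since it makes two passes over them.

```ruby
# Hypothetical helper: return the lines whose key (text before the first "~")
# occurs exactly once, preserving input order.
def orphans(lines)
  key = ->(line) { line.split("~", 2).first }
  counts = Hash.new(0) # key => number of lines with that key
  lines.each { |line| counts[key.call(line)] += 1 }
  lines.select { |line| counts[key.call(line)] == 1 }
end

records = [
  "AAA-12345~TRAX~~AAAAAAAAAAAA111111ETC",
  "AAA-12345~RCV~~BBBBBBBBBBBB222222ETC",
  "BBB-78900~TRAX~~CCCCCCCCCCCC444444ETC",
  "BBB-78900~RCV~~DDDDDDDDDDDD555555ETC",
  "CCC-65432~TRAX~~HHHHHHHHHHHH888888ETC"
]
puts orphans(records)
```

For a real file, orphans(File.readlines("file", chomp: true)) reads the lines without their trailing newlines.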
Here's the line of code I'm trying to wrap my head around:
Category.all.map(&:id).each { |id| Category.reset_counters(id, :products) }
Hoping someone can help me understand what (&:id) is doing and how it impacts the rest of the line? I believe it turns the symbol :id into a proc that'll respond to id?!? But then it gets confusing...
Thanks in advance!
Category.all.map(&:id)
is shorthand for
Category.all.map { |a| a.id }
As for how it affects the rest of the line: the & turns the symbol :id into a block (via Symbol#to_proc), so the expression above returns all the id values as a single Array. each is then called on that Array and passes every id in turn to reset_counters.
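A small self-contained sketch that checks the equivalence without Rails. Item is a hypothetical Struct standing in for the Category model, and the reset_counters call is replaced by recording its arguments in an array, since ActiveRecord is not loaded here.

```ruby
# Hypothetical stand-in for a Category record: anything that responds to #id.
Item = Struct.new(:id)
items = [Item.new(1), Item.new(2), Item.new(3)]

with_block   = items.map { |a| a.id }  # explicit block
with_symbol  = items.map(&:id)         # & calls :id.to_proc and passes it as the block
with_to_proc = items.map(&:id.to_proc) # the same proc, created explicitly

# Stand-in for Category.reset_counters(id, :products): just record the arguments.
calls = []
items.map(&:id).each { |id| calls << [id, :products] }

p with_block, with_symbol, with_to_proc
p calls
```

All three map calls produce the same Array of ids, and each then receives those ids one at a time.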
I need to include the word "Table" at the beginning of each line in my List of Tables. That is, instead of:
LIST OF TABLES
1 The first table ........... 10
2 The second table ........... 20
I need it to say:
LIST OF TABLES
Table 1 The first table ........... 10
Table 2 The second table ........... 20
Yes, I know that's ugly, but those are the rules.
I also need the table of contents to say:
Table of Contents
1 The first Chapter ...... 1
Appendices
Appendix A The A appendix ........ 10
Any idea how to do this in a simple and consistent manner?
To answer your three questions:
1: To prefix each entry in the list of tables with "Table", put the following in your preamble:
\usepackage{tocloft}
\newlength\tablelen
\settowidth\tablelen{Table}
\addtolength\cfttabnumwidth{\tablelen}
\renewcommand\cfttabpresnum{Table }
2: To have "Appendices" appear in your table of contents, put the following just after your call to \appendix:
\addcontentsline{toc}{chapter}{Appendices}
3: To have "Appendix" as a prefix for each appendix in the table of contents, see:
http://for.mat.bham.ac.uk/pgweb/thesisfiles/bhamthesisman.pdf
http://for.mat.bham.ac.uk/pgweb/thesisfiles/bhamthesis.dtx
In particular, search for his \renewcommand{\appendix}, which changes how the appendices are added to the table of contents.
An easier way is to replace the \listoftables command with
{%
\let\oldnumberline\numberline%
\renewcommand{\numberline}{\tablename~\oldnumberline}%
\listoftables%
}