Calculating the most frequent pairs in a dataset - google-sheets

Is it possible to calculate the most frequent pairs from all combinations of pairs in a dataset with five columns?
I can do this with a macro in Excel; I'd be curious to see if there's a simple solution for this in Google Sheets.
Here is some sample data and the expected results:
Data:
B1 B2 B3 B4 B5
6 22 28 32 36
7 10 17 31 35
8 33 38 40 42
10 17 36 40 41
8 10 17 36 54
9 30 32 51 55
1 4 16 26 35
12 28 30 40 43
42 45 47 49 52
10 17 30 31 47
10 17 33 51 58
4 10 17 30 32
2 35 36 37 43
6 10 17 38 55
3 10 17 25 32
Results would be like:
Value1 Value2 Frequency
10 17 8
10 31 2
17 31 2
10 36 2
17 36 2
30 32 2
10 30 2
17 30 2
10 32 2
17 32 2
etc
Each row represents a data set. The pairs don't have to be adjoining. There can be numbers between them.

Create every combination of pairs for each row using the method mentioned here. Then use REDUCE to stack all the pairs into a virtual 2D array, and finally use QUERY to group the pairs and count them:
=QUERY(
  REDUCE(
    {"",""},
    A2:A16,
    LAMBDA(acc,cur,
      {
        acc;
        QUERY(
          LAMBDA(mrg,
            REDUCE(
              {"",""},
              SEQUENCE(COLUMNS(mrg)-1,1,0),
              LAMBDA(a_,c_,
                {
                  a_;
                  LAMBDA(rg,
                    REDUCE(
                      {"",""},
                      OFFSET(rg,0,1,1,COLUMNS(rg)-1),
                      LAMBDA(a,c,{a;{INDEX(rg,1),c}})
                    )
                  )(OFFSET(mrg,0,c_,1,COLUMNS(mrg)-c_))
                }
              )
            )
          )(OFFSET(cur,0,0,1,5)),
          "where Col1 is not null",0
        )
      }
    )
  ),
  "Select Col1,Col2, count(Col1) group by Col1,Col2 order by count(Col1) desc "
)
Input:
B1(A1) B2 B3 B4 B5
6 22 28 32 36
7 10 17 31 35
8 33 38 40 42
10 17 36 40 41
8 10 17 36 54
9 30 32 51 55
1 4 16 26 35
12 28 30 40 43
42 45 47 49 52
10 17 30 31 47
10 17 33 51 58
4 10 17 30 32
2 35 36 37 43
6 10 17 38 55
3 10 17 25 32
Output (partial; only the third column, count, has a header):
10 17 8
10 30 2
10 31 2
10 32 2
10 36 2
17 30 2
17 31 2
17 32 2
17 36 2
30 32 2
1 4 1
1 16 1
1 26 1
1 35 1
2 35 1
2 36 1
2 37 1
2 43 1
3 10 1
3 17 1
3 25 1
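For comparison, the same pair counting can be sketched outside the spreadsheet. The Python below is only an illustration, not part of the Sheets solution: it assumes pairs are unordered (so each row is sorted first) and that a value appears at most once per row. The names `rows` and `pair_counts` are made up for this example.

```python
from collections import Counter
from itertools import combinations

# Sample data from the question: one list per row (columns B1..B5).
rows = [
    [6, 22, 28, 32, 36], [7, 10, 17, 31, 35], [8, 33, 38, 40, 42],
    [10, 17, 36, 40, 41], [8, 10, 17, 36, 54], [9, 30, 32, 51, 55],
    [1, 4, 16, 26, 35], [12, 28, 30, 40, 43], [42, 45, 47, 49, 52],
    [10, 17, 30, 31, 47], [10, 17, 33, 51, 58], [4, 10, 17, 30, 32],
    [2, 35, 36, 37, 43], [6, 10, 17, 38, 55], [3, 10, 17, 25, 32],
]

pair_counts = Counter()
for row in rows:
    # Every 2-element combination of the row; the values need not be adjacent.
    pair_counts.update(combinations(sorted(set(row)), 2))

# Most frequent pairs first, like "order by count(Col1) desc" in the QUERY.
for (value1, value2), frequency in pair_counts.most_common(10):
    print(value1, value2, frequency)
```

The first line printed is `10 17 8`, matching the expected result; the order among pairs with equal counts may differ from the sheet.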

Related

How to Convert String into bytearray(EUC-KR)?

Hello, I am using Swift 5.0 and developing a BLE app.
Our Android app uses the built-in function below:
byte[] nByte = Name.getBytes("EUC-KR");
Output on Android:
Value[0] = 32 Value[1] = 30 Value[2] = 32 Value[3] = 32 Value[4] = 31 Value[5] = 31 Value[6] = 32 Value[7] = 37
Value[8] = 2d Value[9] = c3 Value[10] = e6 Value[11] = ba Value[12] = cf Value[13] = 38 Value[14] = 30 Value[15] = c0
Value[16] = da Value[17] = 39 Value[18] = 30 Value[19] = 31 Value[20] = 35 Value[21] = 2d Value[22] = 58 Value[23] = 2d
Value[24] = 30 Value[25] = 32 Value[26] = 2d Value[27] = 31 Value[28] = 32 Value[29] = 31 Value[30] = 32 Value[31] = 31
Value[32] = 32 Value[33] = 31 Value[34] = 2e Value[35] = 54 Value[36] = 58 Value[37] = 54
On iOS we tried different ways of converting the string, as shown below.
Code 1
let rawEncoding = CFStringConvertEncodingToNSStringEncoding(CFStringEncoding(CFStringEncodings.EUC_KR.rawValue))
let encoding = String.Encoding(rawValue: rawEncoding)
let strEUCData = "20221127-충북80자9015-X-02-1212121.TXT".data(using: encoding) ?? Data()
Output of iOS
bytes : 38 elements
0 : 50
1 : 48
2 : 50
3 : 50
4 : 49
5 : 49
6 : 50
7 : 55
8 : 45
9 : 195
10 : 230
11 : 186
12 : 207
13 : 56
14 : 48
15 : 192
16 : 218
17 : 57
18 : 48
19 : 49
20 : 53
21 : 45
22 : 88
23 : 45
24 : 48
25 : 50
26 : 45
27 : 49
28 : 50
29 : 49
30 : 50
31 : 49
32 : 50
33 : 49
34 : 46
35 : 84
36 : 88
37 : 84
Code 2
let strEUCData1 = "20221127-충북80자9015-X-02-1212121.TXT".data(using: String.Encoding(rawValue: 0x80000940)) ?? Data()
Both approaches above seem to give the wrong byte array.
Any help will be appreciated.
Thank you.
They are the same data. The Android output is printed in hexadecimal (base 16), while the iOS output is printed in decimal (base 10):
Android (Hex) iOS (Decimal)
32 50
30 48
32 50
32 50
31 49
31 49
32 50
37 55
2d 45
c3 195
e6 230
ba 186
cf 207
38 56
30 48
c0 192
da 218
39 57
30 48
31 49
35 53
2d 45
58 88
2d 45
30 48
32 50
2d 45
31 49
32 50
31 49
32 50
31 49
32 50
31 49
2e 46
54 84
58 88
54 84
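A quick way to convince yourself is to print the same bytes in both bases. The sketch below uses Python rather than Swift or Java, and assumes that Python's built-in `euc_kr` codec produces the same bytes as the two platforms for this string; `hex_view` and `dec_view` are names made up for the example.

```python
# The file name from the question, encoded with Python's built-in "euc_kr" codec.
name = "20221127-충북80자9015-X-02-1212121.TXT"
data = name.encode("euc_kr")

# Same bytes, two notations.
hex_view = [format(b, "02x") for b in data]  # base 16, like the Android log
dec_view = [str(b) for b in data]            # base 10, like the iOS debugger

print(len(data), "bytes")
print(" ".join(hex_view))
print(" ".join(dec_view))
```

Each hexadecimal entry converts to the decimal entry at the same position (for example `c3` is 195), so there is nothing to fix in the Swift code.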

Export data from MongoDB using mongoexport for a specific time range with a query on the timestamp

I'm trying to export a collection from mongodb using mongoexport. This works so far:
mongoexport.exe --db dataloggin --collection p1 --out myRecords.json
The problem is that the file is huge and I cannot open it anymore (around 20 GB: 30 days of data, with one document every half a second).
I just need the data between 3 March and 5 March, so I tried to select this date range with the query option as follows:
mongoexport.exe --db dataloggin --collection p1 -q='{"timestamp":{"$gte":{"$timestamp":"2016-03-3T00:00:00.000Z"}:},"timestamp":{"$lt":{"$timestamp":"2016-03-05T00:00:00.000Z"}}}' --out myRecords.json
But I get an error:
error validating settings: query '[39 123 116 105 109 101 115 116 97 109 112 58 123 36 103 116 101 58 123 36 116 105 109 101 115 116 97 109 112 58 50 48 49 54 45 48 49 45 48 49 84 48 48 58 48 48 58 48 48 46 48 48 48 90 125 58 125 44 116 105 109 101 115 116 97 109 112 58 123 36 108 116 58 123 36 116 105 109 101 115 116 97 109 112 58 50 48 49 54 45 48 49 45 48 49 84 48 48 58 48 48 58 48 48 46 48 48 48 90 125 125 125 39]' is not valid JSON: json: cannot unmarshal string into Go value of type map[string]interface {}
Does anyone have an idea?
Many thanks and regards

How to overcome this error when using NetworkX's kernighan_lin_bisection

I want to use kernighan_lin_bisection from NetworkX to partition a network.
But the error below showed up and I'm stuck.
I would really appreciate it if you could help me overcome this error.
QT-------------------------------------------------------------------------
IndexError Traceback (most recent call last)
in ()
     17 for c in init_partition:
     18     for n in c:
---> 19         color_map_i[n]=colors[counter]
     20     counter=counter+1
     21
IndexError: list assignment index out of range
UNQT---------------------------------------------------------------------------
The code I used and the data source "200224_04_act.prn" are below.
QT---------------------------------------------------
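# Imports implied by the nx, np and plt names used below
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
G=nx.read_edgelist("200224_04_act.prn",nodetype=int)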
colors=["red","blue","green"]
pos=nx.spring_layout(G)
init_nodes=np.array_split(G.nodes(),2)
init_partition=[set(init_nodes[0]),set(init_nodes[1])]
print(init_partition)
from networkx.algorithms.community import kernighan_lin_bisection
color_map_i=["black"]*nx.number_of_nodes(G)
print(color_map_i)
counter=0
for c in init_partition:
    for n in c:
        color_map_i[n]=colors[counter]
    counter=counter+1
print(color_map_i)
nx.draw_networkx_edges(G,pos)
nx.draw_networkx_nodes(G,pos,node_color=color_map_i)
nx.draw_networkx_labels(G,pos)
plt.axis("off")
plt.show()
lst_b=kernighan_lin_bisection(G,partition=init_partition)
color_map_b=["black"]*nx.number_of_nodes(G)
counter=0
for c in lst_b:
    for n in c:
        color_map_b[n]=colors[counter]
    counter=counter+1
nx.draw_networkx_edges(G,pos)
nx.draw_networkx_nodes(G,pos,node_color=color_map_b)
nx.draw_networkx_labels(G,pos)
plt.axis("off")
plt.show()
UNQT--------------------------------------------------------------
The contents of "200224_04_act.prn" are below. (The full file has around 2000 nodes, but I shortened it
because of the character limit.)
1 415
2 415
3 415
3 1350
4 1351
5 1352
6 383
7 993
8 1353
9 887
10 887
11 887
12 887
13 887
14 1185
15 1185
16 1185
17 1185
18 1185
19 1146
20 1146
21 1146
22 1146
21 776
23 776
24 707
25 707
26 707
27 707
28 707
29 754
21 754
30 754
31 754
32 754
33 778
34 778
35 778
36 778
37 778
38 859
39 859
40 1354
41 563
42 563
43 563
44 563
45 563
46 1209
47 1209
48 1209
49 1209
50 1209
51 715
52 715
53 715
54 715
55 715
56 1048
57 1048
58 1047
59 1047
60 1047
61 1047
62 1047
63 718
64 718
65 718
66 718
67 718
68 947
17 947
69 947
70 889
71 744
72 744
73 744
74 744
75 744
76 1137
77 1137
78 1137
79 1137
80 612
81 612
82 612
83 612
17 612
84 790
85 790
86 790
87 790
88 790
89 922
90 922
91 922
92 922
93 922
21 738
94 738
95 738
96 738
97 738
98 1355
81 807
99 807
17 807
100 725
101 725
17 725
102 725
103 725
23 1046
104 661
105 661
106 661
107 661
108 661
109 907
110 907
111 907
112 907
113 907
114 840
115 840
116 840
117 840
17 840
118 759
23 759
119 759
23 761
120 761
121 761
122 761
123 1356
124 1265
125 1265
126 1265
127 1265
128 1265
129 894
29 894
130 894
131 894
132 667
133 667
124 758
134 758
135 758
122 758
136 758
137 471
138 471
You've got
for c in init_partition:
    for n in c:
        color_map_i[n]=colors[counter]
    counter=counter+1
It looks to me like n loops over all of the nodes of the graph and is used directly as an index into color_map_i. I do not see any node labelled 0 in the data, so the nodes are probably numbered from 1 to N, while color_map_i is indexed from 0 to N-1. The assignment would then fail when n=N (or sooner if the labels are not contiguous, since any label larger than N-1 is out of range).
A good general way to hunt for bugs like this is to print n right before the line that raises the error. That gives a hint as to what the problem is.
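One way around the indexing problem is to key the colors by node label instead of by list position. This is a minimal sketch in plain Python; `color_list` is a hypothetical helper, and the node labels are a few taken from the sample data. It assumes the resulting list is passed as `node_color` while the nodes are drawn in the same order as `nodes`.

```python
def color_list(nodes, partition, colors, default="black"):
    """Return one color per node, in the same order as `nodes`."""
    color_of = {}
    # One color per community; labels are dictionary keys, so they can be
    # arbitrarily large or non-contiguous without going out of range.
    for color, community in zip(colors, partition):
        for n in community:
            color_of[n] = color
    return [color_of.get(n, default) for n in nodes]


# Toy example with node labels taken from the sample edge list.
nodes = [1, 415, 2, 3, 1350]
partition = [{1, 415, 2}, {3, 1350}]
node_colors = color_list(nodes, partition, ["red", "blue", "green"])
print(node_colors)  # ['red', 'red', 'red', 'blue', 'blue']
```

With the graph from the question this would be called as `color_list(list(G.nodes()), init_partition, colors)`, and the result passed as `node_color` to `nx.draw_networkx_nodes`.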

Convert 8bit Color image to Gray For VGA

I have an 8-bit color image. What is the method to convert it into a grayscale image?
For a normal 24-bit true-color RGB image, we can either take the plain average (R + G + B) / 3
or use weighted averaging, where we calculate 0.21 R + 0.72 G + 0.07 B.
However, these formulas work for a 24-bit image (correct me if I'm wrong), where 8 bits each are used for the R, G and B content. So when we apply the averaging methods above, we get an 8-bit grayscale image from a 24-bit true-color image.
So how do I calculate a grayscale image from an 8-bit color image?
Please note:
The structure of an 8-bit color image is as follows:
Refer to this link
Bit 7 6 5 4 3 2 1 0
Data R R R G G G B B
As we can see,
Bits 7,6,5 denote Red content
Bits 4,3,2 denote Green content
Bits 1,0 denote Blue content
So the above image will actually have 4 shades in total
(because, in grayscale, a white pixel is obtained when there is a 100% contribution from each of the R, G, B components, and since the blue component has only 2 bits, there are effectively 2^2 combinations, i.e. 4 levels).
Therefore, if I consider 2 bits each of R, G and B, I obtain gray levels as follows:
R G B GrayLevel
00 00 00 Black
01 01 01 Gray 1
10 10 10 Gray 2
11 11 11 White
Which bits should I consider from the red and green components, and which should I ignore?
How do I quantify the gray levels for bit values other than the ones listed above?
EDIT
I want to implement this system on an FPGA, so memory is a key concern. The quality of the image doesn't matter much. Is it somehow possible to map all the values of the 8-bit color image onto the respective gray shades?
This approach gives a gray output in the range 0..255 (not all gray levels are used):
b = rgb8 & 3;
g = (rgb8 >> 2) & 7;
r = rgb8 >> 5;
gray255 = 8 * b + 11 * r + 22 * g;
If you have 256 bytes available, you can fill a LUT (look-up table) once and use it instead of doing the calculation per pixel:
grayimage[i] = LUT[rgb8image[i]];
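As an illustration, the formula and the table can be put together as follows. This is a Python sketch of the idea rather than FPGA code; the names `gray_from_rgb332`, `to_gray` and `sample` are made up for the example, and the bit layout is the RRRGGGBB layout described in the question.

```python
def gray_from_rgb332(rgb8: int) -> int:
    """Gray level 0..255 for one RRRGGGBB pixel, using the weights above."""
    b = rgb8 & 0b11
    g = (rgb8 >> 2) & 0b111
    r = (rgb8 >> 5) & 0b111
    return 8 * b + 11 * r + 22 * g


# Fill the 256-entry look-up table once.
LUT = [gray_from_rgb332(value) for value in range(256)]


def to_gray(rgb8_pixels: bytes) -> bytes:
    """Convert a buffer of RGB332 pixels with one table read per pixel."""
    return bytes(LUT[p] for p in rgb8_pixels)


# Black, pure red, pure green, pure blue, white.
sample = bytes([0x00, 0xE0, 0x1C, 0x03, 0xFF])
print(list(to_gray(sample)))  # [0, 77, 154, 24, 255]
```

The weights add up to exactly 255 for a white pixel (8*3 + 11*7 + 22*7), which is why the output spans the full 8-bit range.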
If you really want to stick to 2 bits per gray pixel and you can afford simple multipliers, you can use the formula
Gray = 5 x R + 9 x G + 4 x B
where R and G are taken with 3 bits and B with just 2 (its coefficient has been adapted accordingly). This yields a 7-bit value in the range [0,110], of which you keep the 2 most significant bits.
You may want to adapt the coefficients so that the four levels are occupied more evenly.
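A small sketch of that 2-bit variant, assuming the RRRGGGBB layout from the question. `gray2_from_rgb332` and `histogram` are hypothetical names; the histogram is only there to show how unevenly the four levels are used with these coefficients.

```python
from collections import Counter


def gray2_from_rgb332(rgb8: int) -> int:
    """2-bit gray level (0..3) for one RRRGGGBB pixel."""
    r = (rgb8 >> 5) & 0b111
    g = (rgb8 >> 2) & 0b111
    b = rgb8 & 0b11
    weighted = 5 * r + 9 * g + 4 * b  # 0..110, fits in 7 bits
    return weighted >> 5              # keep the 2 most significant of 7 bits


# How many of the 256 input colours land on each of the four gray levels.
histogram = Counter(gray2_from_rgb332(value) for value in range(256))
for level in range(4):
    print(level, histogram[level])
```

Because the maximum is 110 rather than 127, the top level covers only the sums 96..110, so it is used less often than the others. That is the imbalance the remark about adapting the coefficients refers to.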
You essentially have a Rubik's cube of colours, which measures 8 x 8 x 4 if you can take a moment to imagine that. One side has 8 squares going from black to red, one side has 8 squares going from black to green and one side has 4 squares going from black to blue.
In essence, you can divide it up how you like since you don't care too much for quality. So, if you want 4 output grey levels, you can essentially make any two cuts you like and lump together everything inside each of the resulting shapes as a single grey level. Normally, you would aim to make the volumes of each lump the same - so you could cut the red side in half and the green side in half and ignore any differences in the blue channel as one option.
One way to do it might be to make equi-volumed lumps according to the distance from the origin, i.e. from black. I don't have an 8x8x4 cube available, but imagine the Earth was 8x8x4, then we would be making all pixels in the inner core black, those in the outer core dark grey, those in the mantle light grey and the crust white - such that the number of your original pixels in each lump was the same. It sounds complicated but isn't!
Let's run through all your possible red, green and blue values and calculate the squared distance of each one from black, using
d^2 = R^2 + G^2 + B^2
(the square root is not needed, because it does not change the ordering), then sort the values by that distance and number the lines:
#!/bin/bash
for r in 0 1 2 3 4 5 6 7; do
  for g in 0 1 2 3 4 5 6 7; do
    for b in 0 1 2 3; do
      # Calculate distance from black corner (r=g=b=0) - actually squared but it doesn't matter
      ((d2=(r*r)+(g*g)+(b*b)))
      echo $d2 $r $g $b
    done
  done
done | sort -n | nl
# sort numerically by distance from black, then number output lines sequentially
That gives this where the first column is the line number, the second column is the distance from black (and the values are sorted by this column), and then there follows R, G and B:
1 0 0 0 0 # From here onwards, pixels map to black
2 1 0 0 1
3 1 0 1 0
4 1 1 0 0
5 2 0 1 1
6 2 1 0 1
7 2 1 1 0
8 3 1 1 1
9 4 0 0 2
10 4 0 2 0
11 4 2 0 0
12 5 0 1 2
13 5 0 2 1
14 5 1 0 2
15 5 1 2 0
16 5 2 0 1
17 5 2 1 0
18 6 1 1 2
19 6 1 2 1
20 6 2 1 1
21 8 0 2 2
22 8 2 0 2
23 8 2 2 0
24 9 0 0 3
25 9 0 3 0
26 9 1 2 2
27 9 2 1 2
28 9 2 2 1
29 9 3 0 0
30 10 0 1 3
31 10 0 3 1
32 10 1 0 3
33 10 1 3 0
34 10 3 0 1
35 10 3 1 0
36 11 1 1 3
37 11 1 3 1
38 11 3 1 1
39 12 2 2 2
40 13 0 2 3
41 13 0 3 2
42 13 2 0 3
43 13 2 3 0
44 13 3 0 2
45 13 3 2 0
46 14 1 2 3
47 14 1 3 2
48 14 2 1 3
49 14 2 3 1
50 14 3 1 2
51 14 3 2 1
52 16 0 4 0
53 16 4 0 0
54 17 0 4 1
55 17 1 4 0
56 17 2 2 3
57 17 2 3 2
58 17 3 2 2
59 17 4 0 1
60 17 4 1 0
61 18 0 3 3
62 18 1 4 1
63 18 3 0 3
64 18 3 3 0 # From here onwards pixels map to dark grey
65 18 4 1 1
66 19 1 3 3
67 19 3 1 3
68 19 3 3 1
69 20 0 4 2
70 20 2 4 0
71 20 4 0 2
72 20 4 2 0
73 21 1 4 2
74 21 2 4 1
75 21 4 1 2
76 21 4 2 1
77 22 2 3 3
78 22 3 2 3
79 22 3 3 2
80 24 2 4 2
81 24 4 2 2
82 25 0 4 3
83 25 0 5 0
84 25 3 4 0
85 25 4 0 3
86 25 4 3 0
87 25 5 0 0
88 26 0 5 1
89 26 1 4 3
90 26 1 5 0
91 26 3 4 1
92 26 4 1 3
93 26 4 3 1
94 26 5 0 1
95 26 5 1 0
96 27 1 5 1
97 27 3 3 3
98 27 5 1 1
99 29 0 5 2
100 29 2 4 3
101 29 2 5 0
102 29 3 4 2
103 29 4 2 3
104 29 4 3 2
105 29 5 0 2
106 29 5 2 0
107 30 1 5 2
108 30 2 5 1
109 30 5 1 2
110 30 5 2 1
111 32 4 4 0
112 33 2 5 2
113 33 4 4 1
114 33 5 2 2
115 34 0 5 3
116 34 3 4 3
117 34 3 5 0
118 34 4 3 3
119 34 5 0 3
120 34 5 3 0
121 35 1 5 3
122 35 3 5 1
123 35 5 1 3
124 35 5 3 1
125 36 0 6 0
126 36 4 4 2
127 36 6 0 0
128 37 0 6 1
129 37 1 6 0 # From here onwards pixels map to light grey
130 37 6 0 1
131 37 6 1 0
132 38 1 6 1
133 38 2 5 3
134 38 3 5 2
135 38 5 2 3
136 38 5 3 2
137 38 6 1 1
138 40 0 6 2
139 40 2 6 0
140 40 6 0 2
141 40 6 2 0
142 41 1 6 2
143 41 2 6 1
144 41 4 4 3
145 41 4 5 0
146 41 5 4 0
147 41 6 1 2
148 41 6 2 1
149 42 4 5 1
150 42 5 4 1
151 43 3 5 3
152 43 5 3 3
153 44 2 6 2
154 44 6 2 2
155 45 0 6 3
156 45 3 6 0
157 45 4 5 2
158 45 5 4 2
159 45 6 0 3
160 45 6 3 0
161 46 1 6 3
162 46 3 6 1
163 46 6 1 3
164 46 6 3 1
165 49 0 7 0
166 49 2 6 3
167 49 3 6 2
168 49 6 2 3
169 49 6 3 2
170 49 7 0 0
171 50 0 7 1
172 50 1 7 0
173 50 4 5 3
174 50 5 4 3
175 50 5 5 0
176 50 7 0 1
177 50 7 1 0
178 51 1 7 1
179 51 5 5 1
180 51 7 1 1
181 52 4 6 0
182 52 6 4 0
183 53 0 7 2
184 53 2 7 0
185 53 4 6 1
186 53 6 4 1
187 53 7 0 2
188 53 7 2 0
189 54 1 7 2
190 54 2 7 1
191 54 3 6 3
192 54 5 5 2
193 54 6 3 3 # From here onwards pixels map to white
194 54 7 1 2
195 54 7 2 1
196 56 4 6 2
197 56 6 4 2
198 57 2 7 2
199 57 7 2 2
200 58 0 7 3
201 58 3 7 0
202 58 7 0 3
203 58 7 3 0
204 59 1 7 3
205 59 3 7 1
206 59 5 5 3
207 59 7 1 3
208 59 7 3 1
209 61 4 6 3
210 61 5 6 0
211 61 6 4 3
212 61 6 5 0
213 62 2 7 3
214 62 3 7 2
215 62 5 6 1
216 62 6 5 1
217 62 7 2 3
218 62 7 3 2
219 65 4 7 0
220 65 5 6 2
221 65 6 5 2
222 65 7 4 0
223 66 4 7 1
224 66 7 4 1
225 67 3 7 3
226 67 7 3 3
227 69 4 7 2
228 69 7 4 2
229 70 5 6 3
230 70 6 5 3
231 72 6 6 0
232 73 6 6 1
233 74 4 7 3
234 74 5 7 0
235 74 7 4 3
236 74 7 5 0
237 75 5 7 1
238 75 7 5 1
239 76 6 6 2
240 78 5 7 2
241 78 7 5 2
242 81 6 6 3
243 83 5 7 3
244 83 7 5 3
245 85 6 7 0
246 85 7 6 0
247 86 6 7 1
248 86 7 6 1
249 89 6 7 2
250 89 7 6 2
251 94 6 7 3
252 94 7 6 3
253 98 7 7 0
254 99 7 7 1
255 102 7 7 2
256 107 7 7 3
Obviously, the best way to do that is with a lookup table, which is exactly what this is.
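The table above can also be generated directly as a 256-entry LUT. This Python sketch assumes the RRRGGGBB packing from the question and simply cuts the sorted list into four runs of 64 entries. How ties in distance are split at the cut points is arbitrary, so the exact boundaries can differ by an entry or so from the comments in the listing. The names `entries` and `lut` are made up for the example.

```python
# All 8 x 8 x 4 colours, sorted by squared distance from black
# (ties are broken by r, then g, then b, as in the listing above).
entries = sorted(
    (r * r + g * g + b * b, r, g, b)
    for r in range(8)
    for g in range(8)
    for b in range(4)
)

# lut[pixel] is a gray level 0..3 (black, dark grey, light grey, white).
lut = [0] * 256
for rank, (_d2, r, g, b) in enumerate(entries):
    pixel = (r << 5) | (g << 2) | b  # pack as RRRGGGBB
    lut[pixel] = rank // 64          # four equal lumps of 64 colours each

print(lut[0x00], lut[0xFF])  # 0 3
```

On an FPGA the resulting 256 x 2-bit table would simply be stored in a small ROM.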
Just for kicks, we can look at how it performs if we make some sample images with ImageMagick and process them with this lookup table:
# Make a sample
convert -size 100x100 xc: -sparse-color Bilinear '30,10 red 10,80 blue 70,60 lime 80,20 yellow' -resize 400x400! gradient.png
# Process with suggested LUT
convert gradient.png -fx "@lut.fx" result.png
lut.fx implements the LUT and looks like this:
dd=(49*r*r)+(49*g*g)+(16*b*b);
(dd < 19) ? 0.0 : ((dd < 38) ? 0.25 : ((dd < 54) ? 0.75 : 1.0))
By comparison, if you implement my initial suggestion from the start of my answer, by doing:
R < 0.5 && G < 0.5 => black result
R < 0.5 && G >= 0.5 => dark grey result
R >= 0.5 && G < 0.5 => light grey result
R >= 0.5 && G >= 0.5 => white result
You will get this output, which, as you can see, is better at differentiating red from green but worse at reflecting the brightness of the original.

parsing using awk

How can I parse a file based on data from another file using awk?
I wrote this script:
BEGIN{ FS="\t" ; OFS="\t"
    while((getline<"headfpkm")>0) {
        ++a
        id[a]=$1
        fpkm[a]=$2
        print id[a],fpkm[a]
    }
    lastid=id[a]
    print lastid
    close("headfpkm")
}
/$lastid/{
    print $2,$3,$5,$7,$8,$14,fpkm[a]
    a--
    lastid=id[a]
}
END{ print "total lines=",FNR,"\n\nfile 1 index: ",a}
When I run it:
/$ awk -f testawk.awk file2
it runs the BEGIN section properly, but the matching section doesn't give any output:
NM_000014 5.04503
NM_000015 0.586677
NM_000016 1.138332278
NM_000017 0.64386
NM_000018 3.61746
NM_000019 2.8793
NM_000020 10.846
NM_000021 0.685098
NM_000022 46388.6
NM_000026 0.257471
NM_000026
total lines= 10
file 1 index: 10
Is anything wrong with the searching section?
File 2 looks like this:
34 ACADM NM_000016 9606 hsa-miR-3148 3 80 87 0.003 -0.016 -0.094 0.082 0.112 -0.160 97
34 ACADM NM_000016 9606 hsa-miR-3163 1 623 629 0.001 -0.022 -0.020 0.065 0.125 -0.01 57
35 ACADS NM_000017 9606 hsa-miR-3921 3 68 75 0.013 0.192 -0.097 0.031 -0.039 -0.147 82
35 ACADS NM_000017 9606 hsa-miR-4303 2 67 73 0.012 0.150 -0.052 0.013 -0.039 -0.036 31
35 ACADS NM_000017 9606 hsa-miR-4653-5p 3 68 75 0.003 0.192 -0.097 0.031 -0.039 -0.157 84
37 ACADVL NM_000018 9606 hsa-miR-124 2 31 37 0.003 0.023 -0.057 0.012 -0.032 -0.171 76
37 ACADVL NM_000018 9606 hsa-miR-1827 2 135 141 -0.007 -0.043 -0.058 0.039 -0.069 -0.258 91
37 ACADVL NM_000018 9606 hsa-miR-2682 2 134 140 0.003 -0.014 -0.058 0.004 -0.047 -0.232 87
37 ACADVL NM_000018 9606 hsa-miR-449c 2 134 140 -0.035 -0.014 -0.058 0.004 -0.047 -0.270 92
37 ACADVL NM_000018 9606 hsa-miR-506 2 31 37 -0.016 0.023 -0.057 0.012 -0.032 -0.190 80
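This is going to be a bit of a guess, because I'm not 100% sure what you're trying to accomplish. Note that in your script /$lastid/ is a regex literal: awk does not expand variables inside /.../, so it never matches the value stored in lastid (a dynamic match would be written $0 ~ lastid). A better way to solve your problem would be something like this: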
BEGIN {
    FS=OFS="\t"
}
FNR==NR {
    c++
    a[$1]=$2
    next
}
$3 in a {
    print $2,$3,$5,$7,$8,$14,a[$3]
}
END {
    printf "total lines=%s\n\nfile 1 index: %s\n", FNR, c
}
Run like:
awk -f script.awk headfpkm file2
Results:
ACADM NM_000016 hsa-miR-3148 80 87 -0.160 1.138332278
ACADM NM_000016 hsa-miR-3163 623 629 -0.01 1.138332278
ACADS NM_000017 hsa-miR-3921 68 75 -0.147 0.64386
ACADS NM_000017 hsa-miR-4303 67 73 -0.036 0.64386
ACADS NM_000017 hsa-miR-4653-5p 68 75 -0.157 0.64386
ACADVL NM_000018 hsa-miR-124 31 37 -0.171 3.61746
ACADVL NM_000018 hsa-miR-1827 135 141 -0.258 3.61746
ACADVL NM_000018 hsa-miR-2682 134 140 -0.232 3.61746
ACADVL NM_000018 hsa-miR-449c 134 140 -0.270 3.61746
ACADVL NM_000018 hsa-miR-506 31 37 -0.190 3.61746
total lines=10
file 1 index: 10
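The awk script is a two-pass lookup join: the first file fills a table keyed by ID, and the second file is filtered and extended using that table. The same logic is sketched below in Python purely for illustration; `join_fpkm` is a hypothetical helper, and the two input lists hold a couple of tab-separated lines copied from the question.

```python
def join_fpkm(head_lines, target_lines):
    """Append the FPKM value from the first file to matching rows of the second."""
    fpkm = {}
    for line in head_lines:
        # First file: ID in column 1, FPKM in column 2 (like a[$1]=$2).
        key, value = line.rstrip("\n").split("\t")[:2]
        fpkm[key] = value

    joined = []
    for line in target_lines:
        f = line.rstrip("\n").split("\t")
        if f[2] in fpkm:  # like "$3 in a"
            # awk fields $2,$3,$5,$7,$8,$14 are indexes 1,2,4,6,7,13 here.
            joined.append("\t".join([f[1], f[2], f[4], f[6], f[7], f[13], fpkm[f[2]]]))
    return joined


head = ["NM_000016\t1.138332278", "NM_000017\t0.64386"]
file2 = [
    "34\tACADM\tNM_000016\t9606\thsa-miR-3148\t3\t80\t87\t0.003\t-0.016\t-0.094\t0.082\t0.112\t-0.160\t97",
    "99\tOTHER\tNM_999999\t9606\thsa-miR-0000\t1\t1\t2\t0\t0\t0\t0\t0\t0\t1",
]
result = join_fpkm(head, file2)
print("\n".join(result))
```

The second sample row is invented to show that rows whose ID is missing from the first file are skipped, just as with the `$3 in a` pattern.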
