I want to combine two very different files side by side, without matching rows on any key:
File 1 (1000+ rows):
M03558 203 5 23464 CTTGTA
M03559 205 3 1096 CTTGTQ
M03560 209 12 1956 CTTGTW
M03561 304 5 2347 CTTGTK
...
File 2 (a table of 3 rows):
A 12 34 78 0.3
B 13 35 79 0.3
C 14 36 80 0.5
Desired outcome:
M03558 203 5 23464 CTTGTA A 12 34 78 0.3
M03559 205 3 1096 CTTGTQ B 13 35 79 0.3
M03560 209 12 1956 CTTGTW C 14 36 80 0.5
M03561 304 5 2347 CTTGTK
...
Is there any way to achieve that in bash, perl, python or R, please?
On Linux you can use the paste command:
paste -d " " file1 file2 > outfile
If you want a tab character instead of a space separating the two merged records (tab is also paste's default delimiter):
paste -d "\t" file1 file2 > outfile
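Since the question also mentions Python, here is a minimal sketch of the same side-by-side join. The function name paste_lines and the inline sample data are illustrative assumptions; in practice the lists would come from reading file1 and file2. Unlike paste, this sketch does not emit a trailing delimiter once the shorter file runs out.

```python
from itertools import zip_longest


def paste_lines(left, right, sep=" "):
    """Join two sequences of lines side by side, row by row."""
    merged = []
    # fillvalue=None marks the rows where one input has already ended
    for a, b in zip_longest(left, right, fillvalue=None):
        parts = [p for p in (a, b) if p is not None]
        merged.append(sep.join(parts))
    return merged


# Sample rows taken from the question
file1 = [
    "M03558 203 5 23464 CTTGTA",
    "M03559 205 3 1096 CTTGTQ",
    "M03560 209 12 1956 CTTGTW",
    "M03561 304 5 2347 CTTGTK",
]
file2 = [
    "A 12 34 78 0.3",
    "B 13 35 79 0.3",
    "C 14 36 80 0.5",
]

for line in paste_lines(file1, file2):
    print(line)
```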
I use grep -E '^[ 0-9]{6}$' to grab lines of exactly 6 characters (digits or spaces) from files.
It returns:
71 051
17 293
017299
862610
But is it possible to extract only the first 2 occurrences?
Ideally formatted like this for this example: "71051-17293"?
Two options to limit grep to at most two matching lines:
$ grep -Em2 '^[ 0-9]{6}$'
71 051
17 293
$ grep -E '^[ 0-9]{6}$' | head -n2
71 051
17 293
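Neither option produces the "71051-17293" form asked for in the question. A minimal Python sketch of that step, assuming the spaces should simply be dropped and the matches joined with a hyphen; the function name and the sample lines are illustrative, not part of the original answer:

```python
import re

# Same pattern as the grep call: exactly six characters, digits or spaces
PATTERN = re.compile(r"[ 0-9]{6}")


def first_matches_joined(lines, limit=2, joiner="-"):
    """Return the first `limit` matching lines, spaces removed, joined together."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        if PATTERN.fullmatch(text):
            hits.append(text.replace(" ", ""))
            if len(hits) == limit:
                break  # stop early, like grep -m2
    return joiner.join(hits)


sample = ["some header", "71 051", "17 293", "017299", "862610"]
print(first_matches_joined(sample))  # 71051-17293
```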
I have a process ID on a Windows machine, and I need to write a PowerShell script to check whether this process is running inside a Docker container or not.
Being a newbie, I have not been able to find a straightforward way to check this.
I tried this by expanding on the suggestion to use docker inspect.
Here is the whole sequence:
PS C:\Users\Microsoft> docker inspect -f '{{.State.Pid}}' 8b2f6493d26e
4492
The command above returned the process ID (PID) of the container's main process as seen on the host.
PS C:\Users\Microsoft> Get-Process -Id 4492 | select si
SI
--
6
Now I can use that to query the SI (session ID) of the process ID returned previously. The SI for that process ID is 6, so all processes in this container will be running in that session. Now I can run:
PS C:\Users\Microsoft> Get-Process | Where-Object {$_.si -eq 6}
Handles  NPM(K)    PM(K)    WS(K)  CPU(s)     Id  SI ProcessName
-------  ------    -----    -----  ------     --  -- -----------
     83       6      976     4776    0.00   8380   6 CExecSvc
    251      13     2040     6308    0.16   7308   6 csrss
     38       6      792     3176    0.00   3772   6 fontdrvhost
    793      20     3900    13688    0.44   8912   6 lsass
    232      13     2624    10384    0.11   7348   6 msdtc
     75       6      928     4872    0.02   4492   6 ServiceMonitor
    213      10     2372     7008    0.27   8308   6 services
    137       8     1496     6952    0.05    864   6 svchost
    172      12     2656     9292    0.06   2352   6 svchost
    110       7     1188     6084    0.03   2572   6 svchost
    241      14     4616    12508    0.19   5460   6 svchost
    817      30    12388    30824    9.73   6056   6 svchost
    172      12     3984    11528    0.14   6420   6 svchost
    405      16     7284    14284    0.25   6524   6 svchost
    494      22    13480    29568    1.45   7060   6 svchost
    509      38     5636    19432    0.30   7936   6 svchost
    334      13     2776    10912    0.13   8604   6 svchost
    122       8     3048     9180    0.19   8816   6 svchost
    383      14     2392     8624    0.22   9080   6 svchost
    232      19     5060    14284    0.13   9744   6 w3wp
    155      11     1380     7276    0.05   5008   6 wininit
The above is the output of all processes running on my container host that match the SI 6. You can even see the w3wp process which is the IIS process running inside the container.
One note here is that this is only possible with Process isolation on Windows containers. Hyper-V containers won't have their processes shown on the host.
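The manual steps above boil down to comparing session IDs. Here is a minimal Python sketch of that comparison, assuming the PID-to-SI pairs and each container's main PID have already been collected (for example from Get-Process and docker inspect). The function name, the dictionary layout, and PID 1234 in session 1 are hypothetical, for illustration only.

```python
def containers_for_pid(target_pid, session_by_pid, container_main_pids):
    """Return the IDs of containers whose main process shares a session with target_pid.

    session_by_pid:      {pid: session id}, e.g. the Id and SI columns of Get-Process
    container_main_pids: {container id: main pid}, e.g. the State.Pid value from docker inspect
    """
    target_session = session_by_pid.get(target_pid)
    if target_session is None:
        return []  # the PID is not in the process list at all
    matches = []
    for container_id, main_pid in container_main_pids.items():
        if session_by_pid.get(main_pid) == target_session:
            matches.append(container_id)
    return matches


# A few rows from the output above, plus one hypothetical host process (1234)
sessions = {4492: 6, 8380: 6, 9744: 6, 1234: 1}
containers = {"8b2f6493d26e": 4492}

print(containers_for_pid(9744, sessions, containers))  # ['8b2f6493d26e']
print(containers_for_pid(1234, sessions, containers))  # []
```

An empty result means no process-isolated container shares the session; as noted above, Hyper-V containers cannot be detected this way.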
I have two data files in .sav format (EMR.sav and APP.sav).
What I want to do is merge the EMR and APP data in order to do a "comparison of steps by sex".
The data of EMR is as follows:
pid sex
306 1
866 1
896 1
921 2
The APP data looks something like this (A_id corresponds to pid in EMR):
A_id A_calorie A_distance
866 124 14
866 24 24
866 13 35
866 12 23
866 23 0
921 101 23
921 12 13
921 19 24
921 200 235
921 232 241
The result I want is the two data files merged into:
pid sex A_calorie A_distance
866 1 124 14
866 1 24 24
866 1 13 35
866 1 12 23
866 1 23 0
921 2 101 23
921 2 12 13
921 2 19 24
921 2 200 235
921 2 232 241
But what I keep getting is:
pid sex A_calorie A_distance
866 1 124 14
866 . 24 24
866 . 13 35
866 . 12 23
866 . 23 0
921 2 101 23
921 . 12 13
921 . 19 24
921 . 200 235
921 . 232 241
How can I get all rows with the same pid to have the same sex value?
By the way, in R one would use something like merge(EMR, APP, by.x = "pid", by.y = "A_id").
You can sort both files and use match files, with EMR as a lookup table (/table) so that its sex value is copied to every APP case with the same pid:
get file=" ...... EMR ...... ".
sort cases by pid.
dataset name EMR.
get file=" ...... APP ...... ".
dataset name APP.
sort cases by A_id.
match files /file=* /rename A_id=pid /table=EMR /by pid.
exe.
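For readers more familiar with general-purpose languages, the table lookup that match files performs with /table can be sketched in plain Python. This only illustrates the idea and is not how SPSS implements it; the function name and the list-of-dicts layout are assumptions, and the sample rows are taken from the question.

```python
def table_lookup_merge(cases, table, case_key, table_key):
    """Attach the matching table row to every case, like a one-to-many lookup."""
    # Index the lookup table once by its key (pid in EMR)
    lookup = {row[table_key]: row for row in table}
    merged = []
    for case in cases:
        key = case[case_key]
        row = {table_key: key}
        # Copy the table's other variables (sex) onto this case, if the key exists
        extra = lookup.get(key, {})
        row.update({k: v for k, v in extra.items() if k != table_key})
        # Then the case's own variables (A_calorie, A_distance)
        row.update({k: v for k, v in case.items() if k != case_key})
        merged.append(row)
    return merged


emr = [
    {"pid": 306, "sex": 1},
    {"pid": 866, "sex": 1},
    {"pid": 896, "sex": 1},
    {"pid": 921, "sex": 2},
]
app = [
    {"A_id": 866, "A_calorie": 124, "A_distance": 14},
    {"A_id": 866, "A_calorie": 24, "A_distance": 24},
    {"A_id": 921, "A_calorie": 101, "A_distance": 23},
]

for row in table_lookup_merge(app, emr, case_key="A_id", table_key="pid"):
    print(row)
```

Every APP row receives the sex value of its pid, which is the behavior the question was missing.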
I am trying to export summary statistics that are saved as variables in the main dataset from Stata to LaTeX using the community-contributed command esttab. Here is the code:
sysuse auto, clear
collapse (sum) price mpg, by(make)
estpost tabstat price mpg, by(make)
esttab
The estpost tabstat command generates exactly the table I want to create in LaTeX but esttab only generates an empty table.
I also posted this question on Statalist.
The following works for me:
sysuse auto, clear
collapse (sum) price mpg, by(make)
estpost tabstat price mpg, by(make)
matrix A = e(price)', e(mpg)'
esttab matrix(A), title("Summary statistics: mean") nomtitle
Summary statistics: mean
--------------------------------------
                    price          mpg
--------------------------------------
1                    4099           22
2                    4749           17
3                    3799           22
4                    9690           17
5                    6295           23
6                    9735           25
7                    4816           20
8                    7827           15
9                    5788           18
10                   4453           26
11                   5189           20
12                  10372           16
13                   4082           19
14                  11385           14
15                  14500           14
16                  15906           21
17                   3299           29
18                   5705           16
19                   4504           22
20                   5104           22
21                   3667           24
22                   3955           19
23                   6229           23
24                   4589           35
25                   5079           24
26                   8129           21
27                   3984           30
28                   4010           18
29                   5886           16
30                   6342           17
31                   4296           21
32                   4389           28
33                   4187           21
34                   5799           25
35                   4499           28
36                  11497           12
37                  13594           12
38                  13466           14
39                   3995           30
40                   3829           22
41                   5379           14
42                   6165           15
43                   4516           18
44                   6303           14
45                   3291           20
46                   8814           21
47                   5172           19
48                   4733           19
49                   4890           18
50                   4181           19
51                   4195           24
52                  10371           16
53                  12990           14
54                   4647           28
55                   4425           34
56                   4482           25
57                   6486           26
58                   4060           18
59                   5798           18
60                   4934           18
61                   5222           19
62                   4723           19
63                   4424           19
64                   4172           24
65                   3895           26
66                   3798           35
67                   5899           18
68                   3748           31
69                   5719           18
70                   7140           23
71                   5397           41
72                   4697           25
73                   6850           25
74                  11995           17
Total            6165.257      21.2973
--------------------------------------
I am following Matt's tutorial at:
http://jhipster.github.io/video-tutorial/
When I run cloc . I see far more files than I would expect:
$ cloc .
66717 text files.
20401 unique files.
24466 files ignored.
http://cloc.sourceforge.net v 1.60 T=128.46 s (115.7 files/s, 15523.0 lines/s)
--------------------------------------------------------------------------------
Language                         files         blank       comment          code
--------------------------------------------------------------------------------
Javascript                       13322        222956        357190       1266221
HTML                               676          6984          1047         44885
CSS                                 76          1883           932         22029
Java                               262          3548          1854         15641
XML                                 53          3383          1395         11307
LESS                                79          1388          1546          7269
C/C++ Header                        18          1032           300          5109
YAML                               190           221           346          3466
CoffeeScript                        47           783           699          2467
make                                58           417           523          1271
Bourne Shell                        31           234           202          1097
Maven                                1            12            34           824
Perl                                 2            87           170           584
DTD                                  1           179           177           514
SASS                                 5            42            25           273
C++                                  4            43            26           260
IDL                                  6            38             0           167
Bourne Again Shell                   3            28            36           140
D                                    6             0             0           118
Scala                                1            16             7           118
JavaServer Faces                     3             3             0           109
Smarty                               6            17            30            91
DOS Batch                            1            24             2            64
Python                               1             7             7            36
XSLT                                 1             5             0            32
C#                                   2             3             1            27
ASP.Net                              2             5             0            23
C                                    1             7             4            23
OCaml                                1             5            15             6
Lisp                                 1             0             0             6
PowerShell                           1             2             2             4
Lua                                  1             0             0             2
--------------------------------------------------------------------------------
SUM:                             14862        243352        366570       1384183
--------------------------------------------------------------------------------
Why is that?
In total it is 610 MB!
It seems there are a lot of node modules:
$ du -h -d1
584M ./node_modules
24K ./gulp
26M ./src
64K ./.mvn
610M .
Is this correct?
And what do I need to add to source control?
Thanks
This is normal. Most of those files are NPM dependencies, as you mentioned.
The generated .gitignore should already be configured properly and will ignore node_modules.