Parsing system log file with pig

I have the log below and am trying to parse it by the indicated column numbers: 1 as Date, 2 as Time, 3 as Task, 4 as Error_Line, and 5 (all the remaining columns) as Error_Message.
|1 | |2 | |3 | |4 | |5 |
09-15-16 05:23:45 B:VVBN 09064 Port 22 Device 10400 Remote 44 13331 Link Up RP2016
09-15-16 05:23:44 A:QAWE 09064 Port 22 Device 10400 Remote 44 13331 Link Up RP2016
09-15-16 05:23:44 B:VVBN 13425 Port 22 Device 10400 Remote 44 13331 Receive Time Error: 24666 23270 1396 69
09-15-16 05:23:43 B:QAWE 13372 Port 22 Device 10400 Remote 44 13331 Send Time Error: 444 1888 1444 69
09-15-16 05:23:43 A:VVBN 13425 Port 22 Device 10400 Remote 44 13331 Receive Time Error: 24666 23270 1396 69
09-15-16 05:23:43 A:CCBE 13372 Port 22 Device 10400 Remote 44 13331 Send Time Error: 444 1888 1444 69
09-15-16 05:21:56 B:VVBN 07270 Port 22 Device 10400 Remote 44 13331 AT Timer Expired
09-15-16 05:21:56 A:CCBE 07270 Port 22 Device 10400 Remote 44 13331 AT Timer Expired
Here is my script:
logs = LOAD '/data/test_log.txt' USING PigStorage(' ') AS (date: chararray, time: chararray, task: chararray, line_error: int, error_message: chararray);
date = GROUP logs BY date;
counts = FOREACH date GENERATE COUNT($4) as count;
DUMP counts;
Notice that there is only one space between columns, except between columns 3 and 4, where there are five spaces.
I tried the script above, but it works only for the date, not for the last column, Error_Message.
I am trying to get this output bag:
(09-15-16,05:23:45,B:VVBN,09064,Port 22 Device 10400 Remote 44 13331 Link Up RP2016)
(09-15-16,05:23:44,A:QAWE,09064,Port 22 Device 10400 Remote 44 13331 Link Up RP2016)
:
:
I only need the first four columns as separate fields; all other columns in the log file should be merged into a single column 5.
Any suggestions on how to get the desired output?

You need to use MyRegExLoader, provided by piggybank, to process custom log files:
logs = LOAD '/data/test_log.txt' USING org.apache.pig.piggybank.storage.MyRegExLoader('provide the regex');
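As a rough illustration of the kind of pattern the loader would need, here is a minimal Python sketch that splits one log line into the five requested fields. The pattern and the helper name `split_log_line` are assumptions made for this example, not part of piggybank, and the escaping would likely need adjusting before the pattern is pasted into a Pig string literal.

```python
import re

# One capture group per target column; the fifth group takes the rest of the line.
# \s+ tolerates both the single spaces and the wider gap between columns 3 and 4.
LOG_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(.*)$")


def split_log_line(line):
    """Return (date, time, task, error_line, error_message), or None if no match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groups() if match else None


# Sample line from the question, with five spaces between columns 3 and 4.
sample = "09-15-16 05:23:45 B:VVBN     09064 Port 22 Device 10400 Remote 44 13331 Link Up RP2016"
print(split_log_line(sample))
```

Keeping the fourth field as text rather than an integer preserves the leading zero in values such as 09064, which matches the desired output in the question.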

Related

Powershell command to identify whether a process running in windows machine is a docker container process?

I have a process ID on a Windows machine, and I need to write a PowerShell script to check whether this process is running in a Docker container or not.
Being a newbie, I am not able to find a straightforward way to check this.

I have tried this by expanding on the suggestion to use docker inspect.
Here are the full steps:
PS C:\Users\Microsoft> docker inspect -f '{{.State.Pid}}' 8b2f6493d26e
4492
The command above returned the ID of the process under which the container is instantiated.
PS C:\Users\Microsoft> Get-Process -Id 4492 | select si
SI
--
6
Now I can use that process ID to query its SI (session ID). You can see that the SI for that process ID is 6, so all processes in this container will be running in that session. Now I can run:
PS C:\Users\Microsoft> Get-Process | Where-Object {$_.si -eq 6}
Handles  NPM(K)    PM(K)      WS(K)     CPU(s)     Id  SI ProcessName
-------  ------    -----      -----     ------     --  -- -----------
     83       6      976       4776       0.00   8380   6 CExecSvc
    251      13     2040       6308       0.16   7308   6 csrss
     38       6      792       3176       0.00   3772   6 fontdrvhost
    793      20     3900      13688       0.44   8912   6 lsass
    232      13     2624      10384       0.11   7348   6 msdtc
     75       6      928       4872       0.02   4492   6 ServiceMonitor
    213      10     2372       7008       0.27   8308   6 services
    137       8     1496       6952       0.05    864   6 svchost
    172      12     2656       9292       0.06   2352   6 svchost
    110       7     1188       6084       0.03   2572   6 svchost
    241      14     4616      12508       0.19   5460   6 svchost
    817      30    12388      30824       9.73   6056   6 svchost
    172      12     3984      11528       0.14   6420   6 svchost
    405      16     7284      14284       0.25   6524   6 svchost
    494      22    13480      29568       1.45   7060   6 svchost
    509      38     5636      19432       0.30   7936   6 svchost
    334      13     2776      10912       0.13   8604   6 svchost
    122       8     3048       9180       0.19   8816   6 svchost
    383      14     2392       8624       0.22   9080   6 svchost
    232      19     5060      14284       0.13   9744   6 w3wp
    155      11     1380       7276       0.05   5008   6 wininit
The above is the output of all processes running on my container host that match the SI 6. You can even see the w3wp process which is the IIS process running inside the container.
One note here is that this is only possible with Process isolation on Windows containers. Hyper-V containers won't have their processes shown on the host.
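The matching logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `container_for_pid` and both input dictionaries are hypothetical, and the sketch assumes the PID-to-session mapping has already been collected (for example from Get-Process) along with each container's .State.Pid (for example from docker inspect). It does not call Docker or PowerShell itself.

```python
def container_for_pid(pid, session_by_pid, container_root_pids):
    """Return the ID of the container whose session contains pid, or None.

    session_by_pid: {pid: session_id}, e.g. the Id and SI columns of Get-Process.
    container_root_pids: {container_id: pid}, e.g. .State.Pid from docker inspect.
    """
    target_session = session_by_pid.get(pid)
    if target_session is None:
        return None  # PID not present in the collected process list
    for container_id, root_pid in container_root_pids.items():
        # A process belongs to the container if it shares the root process's session.
        if session_by_pid.get(root_pid) == target_session:
            return container_id
    return None


# PIDs 4492, 9744 and 8380 come from the output above; PID 1234 in session 1
# is a made-up host process added for contrast.
sessions = {4492: 6, 9744: 6, 8380: 6, 1234: 1}
containers = {"8b2f6493d26e": 4492}

print(container_for_pid(9744, sessions, containers))
print(container_for_pid(1234, sessions, containers))
```

As with the manual steps, this only applies to process-isolated containers, since Hyper-V container processes are not visible on the host.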

ISSUE IN CONNECTING py2neo v4 to my neo4j server

I want to connect my Neo4j project server to py2neo in Jupyter.
I actually have two problems:
My Neo4j browser is connected with bolt://localhost:11004, username: neo4j, password: password.
But I am not able to connect to this server through py2neo in a Jupyter notebook.
The Python code is the following:
graphdb = Graph("bolt://localhost:11004", secure=True, auth=('neo4j', 'password'))
I am getting the following error:
KeyError Traceback (most recent call last)
~/conda3/lib/python3.6/site-packages/py2neo/database.py in __new__(cls, uri, **settings)
87 try:
---> 88 inst = cls._instances[key]
89 except KeyError:
KeyError: '0611fb007d1a660e26e66e58777225de'
During handling of the above exception, another exception occurred:
ServiceUnavailable Traceback (most recent call last)
<ipython-input-41-2d6567e9c5ba> in <module>()
3 # default uri for local Neo4j instance
4 dict_params=dict(secure=True)
----> 5 graphdb = Graph(**dict_params)
~/conda3/lib/python3.6/site-packages/py2neo/database.py in __new__(cls, uri, **settings)
303 def __new__(cls, uri=None, **settings):
304 name = settings.pop("name", "data")
--> 305 database = Database(uri, **settings)
306 if name in database:
307 inst = database[name]
~/conda3/lib/python3.6/site-packages/py2neo/database.py in __new__(cls, uri, **settings)
95 auth=connection_data["auth"],
96 encrypted=connection_data["secure"],
---> 97 user_agent=connection_data["user_agent"])
98 inst._graphs = {}
99 cls._instances[key] = inst
~/conda3/lib/python3.6/site-packages/neo4j/v1/api.py in __new__(cls, uri, **config)
131 for subclass in Driver.__subclasses__():
132 if parsed.scheme == subclass.uri_scheme:
--> 133 return subclass(uri, **config)
134 raise ValueError("URI scheme %r not supported" % parsed.scheme)
135
~/conda3/lib/python3.6/site-packages/neo4j/v1/direct.py in __new__(cls, uri, **config)
71
72 pool = DirectConnectionPool(connector, instance.address, **config)
---> 73 pool.release(pool.acquire())
74 instance._pool = pool
75 instance._max_retry_time = config.get("max_retry_time", default_config["max_retry_time"])
~/conda3/lib/python3.6/site-packages/neo4j/v1/direct.py in acquire(self, access_mode)
42
43 def acquire(self, access_mode=None):
---> 44 return self.acquire_direct(self.address)
45
46
~/conda3/lib/python3.6/site-packages/neo4j/bolt/connection.py in acquire_direct(self, address)
448 if can_create_new_connection:
449 try:
--> 450 connection = self.connector(address, self.connection_error_handler)
451 except ServiceUnavailable:
452 self.remove(address)
~/conda3/lib/python3.6/site-packages/neo4j/v1/direct.py in connector(address, error_handler)
68
69 def connector(address, error_handler):
---> 70 return connect(address, security_plan.ssl_context, error_handler, **config)
71
72 pool = DirectConnectionPool(connector, instance.address, **config)
~/conda3/lib/python3.6/site-packages/neo4j/bolt/connection.py in connect(address, ssl_context, error_handler, **config)
702 raise ServiceUnavailable("Failed to resolve addresses for %s" % address)
703 else:
--> 704 raise last_error
~/conda3/lib/python3.6/site-packages/neo4j/bolt/connection.py in connect(address, ssl_context, error_handler, **config)
692 log_debug("~~ [RESOLVED] %s -> %s", address, resolved_address)
693 try:
--> 694 s = _connect(resolved_address, **config)
695 s, der_encoded_server_certificate = _secure(s, address[0], ssl_context, **config)
696 connection = _handshake(s, resolved_address, der_encoded_server_certificate, error_handler, **config)
~/conda3/lib/python3.6/site-packages/neo4j/bolt/connection.py in _connect(resolved_address, **config)
582 _force_close(s)
583 if error.errno in (61, 99, 111, 10061):
--> 584 raise ServiceUnavailable("Failed to establish connection to {!r} (reason {})".format(resolved_address, error.errno))
585 else:
586 raise
ServiceUnavailable: Failed to establish connection to ('127.0.0.1', 7687) (reason 111)
What I want to know is:
1) How exactly is the connection between Neo4j and py2neo made in py2neo v4?
2) Do I always have to make a local connection, or can I connect to the Neo4j server?
3) If I can connect to my Neo4j server, will the py2neo queries I run in my Jupyter notebook be reflected in the Neo4j database too?

From the last line of the error, it looks like it's trying to connect on the default Bolt port (i.e. 7687).
I would suggest you use this format instead of the full URI:
graphdb = Graph(scheme="bolt", host="localhost", port=11004,
                secure=True, auth=('neo4j', 'password'))
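If the address is only available as a URI string, a small standard-library helper can split it into those keyword settings. This is a minimal sketch: the helper name `connection_settings` is hypothetical, and it assumes the resulting dictionary is passed on to Graph exactly as in the call above.

```python
from urllib.parse import urlsplit


def connection_settings(uri, user, password, secure=True):
    """Split a Bolt URI into the keyword settings used in the call above."""
    parts = urlsplit(uri)
    if parts.port is None:
        # Refuse to fall back silently to a default port.
        raise ValueError("URI has no explicit port: %r" % uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "secure": secure,
        "auth": (user, password),
    }


settings = connection_settings("bolt://localhost:11004", "neo4j", "password")
print(settings["host"], settings["port"])
# With py2neo installed, these could then be passed on as Graph(**settings).
```

Raising an error when the port is missing makes a wrong or incomplete address visible immediately, instead of surfacing later as a connection attempt to 7687.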

Informix - Locked DB due to lock created by cancelled session?

I attempted to run a script to generate a table in my Informix database, but the script was missing a newline at EOF, so I think Informix had problems reading it and the script hung, doing nothing. I had to kill the script and add the newline to the file. The script now runs, except that it does not create the table, due to a lock created when I killed the script abruptly.
I am new to this, so sorry for the dumb question. The IBM page does not have a clear and simple explanation of how to clean this up.
So, my question is: how do I release the locks so I can continue working on my script?
admin_proyecto#li1106-217 # onstat -k
IBM Informix Dynamic Server Version 12.10.FC9DE -- On-Line (CKPT REQ) -- Up 9 ds
Blocked:CKPT
Locks
address wtlist owner lklist type tbz
44199028 0 44ca6830 0 HDR+S
44199138 0 44cac0a0 0 HDR+S
441991c0 0 44cac0a0 4419b6f0 HDR+IX
44199358 0 44ca44d0 0 S
441993e0 0 44ca44d0 44199358 HDR+S
4419ac50 0 44cac0a0 441991c0 HDR+X
4419aef8 0 44ca44d0 441993e0 HDR+IX
4419b2b0 0 44ca79e0 0 S
4419b3c0 0 44ca82b8 0 S
4419b6f0 0 44cac0a0 44199138 HDR+X
4419b998 0 44ca8b90 0 S
4419bdd8 0 44ca44d0 4419aef8 HDR+X
12 active, 20000 total, 16384 hash buckets, 0 lock table overflows

On my "toy" systems I usually point LTAPEDEV to a directory:
LTAPEDEV /usr/informix/dumps/motor_003/backups
Then, when Informix blocks because all of its logical logs are full, I manually run ontape -a to back up the used logical logs to files and free them for reuse.
For example, here I have an Informix instance blocked because no more logical logs are available:
$ onstat -l
IBM Informix Dynamic Server Version 12.10.FC8DE -- On-Line (CKPT REQ) -- Up 00:18:58 -- 213588 Kbytes
Blocked:CKPT
Physical Logging
Buffer bufused bufsize numpages numwrits pages/io
P-1 0 64 1043 21 49.67
phybegin physize phypos phyused %used
2:53 51147 28085 240 0.47
Logical Logging
Buffer bufused bufsize numrecs numpages numwrits recs/pages pages/io
L-1 13 64 191473 12472 6933 15.4 1.8
Subsystem numrecs Log Space used
OLDRSAM 191470 15247376
HA 3 132
Buffer Waiting
Buffer ioproc flags
L-1 0 0x21 0
address number flags uniqid begin size used %used
44d75f88 1 U------ 47 3:15053 5000 5 0.10
44b6df68 2 U---C-L 48 3:20053 5000 4986 99.72
44c28f38 3 U------ 41 3:25053 5000 5000 100.00
44c28fa0 4 U------ 42 3:53 5000 2843 56.86
44d59850 5 U------ 43 3:5053 5000 5 0.10
44d598b8 6 U------ 44 3:10053 5000 5 0.10
44d59920 7 U------ 45 3:30053 5000 5 0.10
44d59988 8 U------ 46 3:35053 5000 5 0.10
8 active, 8 total
In the online log I have:
$ onstat -m
04/23/18 18:20:42 Logical Log Files are Full -- Backup is Needed
So I manually issue the command:
$ ontape -a
Performing automatic backup of logical logs.
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000041
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000042
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000043
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000044
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000045
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000046
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000047
File created: /usr/informix/dumps/motor_003/backups/informix003.ifx.marqueslocal_3_Log0000000048
Do you want to back up the current logical log? (y/n) n
Program over.
If I check the status of the logical logs again:
$ onstat -l
IBM Informix Dynamic Server Version 12.10.FC8DE -- On-Line -- Up 00:23:42 -- 213588 Kbytes
Physical Logging
Buffer bufused bufsize numpages numwrits pages/io
P-2 33 64 1090 24 45.42
phybegin physize phypos phyused %used
2:53 51147 28091 36 0.07
Logical Logging
Buffer bufused bufsize numrecs numpages numwrits recs/pages pages/io
L-1 0 64 291335 15878 7023 18.3 2.3
Subsystem numrecs Log Space used
OLDRSAM 291331 22046456
HA 4 176
address number flags uniqid begin size used %used
44d75f88 1 U-B---- 47 3:15053 5000 5 0.10
44b6df68 2 U-B---- 48 3:20053 5000 5000 100.00
44c28f38 3 U---C-L 49 3:25053 5000 3392 67.84
44c28fa0 4 U-B---- 42 3:53 5000 2843 56.86
44d59850 5 U-B---- 43 3:5053 5000 5 0.10
44d598b8 6 U-B---- 44 3:10053 5000 5 0.10
44d59920 7 U-B---- 45 3:30053 5000 5 0.10
44d59988 8 U-B---- 46 3:35053 5000 5 0.10
8 active, 8 total
The logical logs are now marked as "Backed Up" (the B flag) and can be reused, and the Informix instance is no longer blocked on Blocked:CKPT.
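The flag check above can also be done programmatically. The following Python sketch scans onstat -l output for logical-log rows that are used (U) but not yet backed up (B). The function name `logs_needing_backup` is hypothetical, the parsing assumes the eight-column layout shown above, and the current log (C) is deliberately skipped because ontape -a asks about it separately.

```python
def logs_needing_backup(onstat_lines):
    """Return the numbers of logical logs that are used but not backed up."""
    pending = []
    for line in onstat_lines:
        fields = line.split()
        # Log rows have 8 fields: address number flags uniqid begin size used %used
        if len(fields) != 8 or len(fields[2]) != 7:
            continue
        number, flags = fields[1], fields[2]
        if not number.isdigit():
            continue  # header rows and other tables
        # Assumption: skip the current log (C); it is still being written.
        if "U" in flags and "B" not in flags and "C" not in flags:
            pending.append(int(number))
    return pending


# Rows copied from the blocked instance above.
blocked = [
    "address number flags uniqid begin size used %used",
    "44d75f88 1 U------ 47 3:15053 5000 5 0.10",
    "44b6df68 2 U---C-L 48 3:20053 5000 4986 99.72",
    "44c28f38 3 U------ 41 3:25053 5000 5000 100.00",
]
print(logs_needing_backup(blocked))
```

An empty result after running ontape -a would correspond to the second onstat -l listing, where every non-current log carries the B flag.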

Rails server command bug

I started a new Ruby on Rails project called "myrubyblog", changed into the project directory, and ran the rails server command. After an enormous amount of output that I don't understand, the terminal ends with this:
-- Other runtime information -----------------------------------------------
* Loaded script: bin/rails
* Loaded features:
0 enumerator.so
1 thread.rb
2 rational.so
3 complex.so
4 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/armv7l-linux-eabihf/enc/encdb.so
5 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/armv7l-linux-eabihf/enc/trans/transdb.so
6 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/armv7l-linux-eabihf/rbconfig.rb
7 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/compatibility.rb
8 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/defaults.rb
9 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/deprecate.rb
10 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/errors.rb
11 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/version.rb
12 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/requirement.rb
13 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/platform.rb
14 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/basic_specification.rb
15 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/stub_specification.rb
16 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/util/list.rb
17 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/armv7l-linux-eabihf/stringio.so
18 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/rfc2396_parser.rb
19 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb
20 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/common.rb
21 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/generic.rb
22 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/ftp.rb
23 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/http.rb
24 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/https.rb
25 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/ldap.rb
26 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/ldaps.rb
27 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri/mailto.rb
28 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/uri.rb
29 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/specification.rb
30 /home/pi/.rvm/rubies/ruby-2.5.1/lib/ruby/2.5.0/rubygems/exceptions.rb
... (up to 320 lines)
[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.
For details: http://www.ruby-lang.org/bugreport.html
Abandon
What is that supposed to mean?

Vagrant slowing down http requests to rails server

I have a project (Rails 4.0.2) that I'm currently running inside Vagrant (1.3.5) with VirtualBox (4.3.4). The guest OS is Debian 6.0. When I run the application on the host OS, or start the VirtualBox VM manually, I see a dramatic improvement in responsiveness. As soon as I use 'vagrant up', performance becomes really poor. Here are the relevant Apache Bench results (times in ms):
Apache Bench Command
ab -n 10 -c 1 http://127.0.0.1:3000/application.js
Host OS
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:    27   44  28.3     33     118
Waiting:       25   41  28.6     31     116
Total:         27   44  28.3     33     118
Virtualbox
              min  mean[+/-sd] median   max
Connect:        0    0   0.4      0       1
Processing:    57   71  19.1     67     119
Waiting:       46   59  19.3     57     110
Total:         57   71  19.1     68     119
Vagrant
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:   849  916  76.2    901    1115
Waiting:      831  892  72.6    883    1081
Total:        849  916  76.2    901    1115
I would expect a slowdown running the application in Virtualbox, but not an order of magnitude. I'm also not doing anything fancy with my Vagrantfile:
Vagrantfile
VAGRANTFILE_API_VERSION = "2"
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "squeeze"
  config.vm.network :forwarded_port, guest: 3000, host: 3000
end
I've tried the fixes specified in this github issue and this HackerNews comment but to no avail.

Make sure you don't place your project in a synced folder (by default it uses vboxsf, which has known performance issues with large numbers of files and directories).
This may also be related to the "Webrick Reverse DNS Lookup" problem; take a look at https://stackoverflow.com/a/19284483/1801697
Hope it helps.
