how to change the output file format in pSQL from default to csv? - psql

I am using Linux and has connected to a pSQL DB server. After using the command \o to export file, the output file is separated by "|" horizontally and "_" (and '+') vertically. Please see below:
abc | cde | fgh | xyz
----+-----+-----+-----
123 | 321 | 123 | 123
123 | 321 | 222 | 111
923 | 238 | 928 | 192
ect.
This format might be a default but not very useful for data analysis.
Can I change the output file format into ".csv" by some additional optional command in pSQL?
Thanks,

You can export CSV in psql. There's a dedicated command to do so. I've written a longer article on this.
The gist is this:
\copy (SELECT ...) TO 'locale_file.csv' WITH (FORMAT csv, HEADER)
This will copy the data as CSV to your local drive (i.e. where psql is running from).

Related

Alternative of grep in lua script

I have the following text file output.txt that I created (it has 15 colums including symbol | ):
[66] | alert:n | 3.0 | 10/22/2020-14:45:50.066928 | local_ip | 123.123.123.123 | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
[67] | alert:n | 3.0 | 10/22/2020-14:45:51.096955 | local_ip | 12.12.12.11 | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
[68] | alert:n | 3.0 | 10/22/2020-14:45:53.144942 | 123.123.123.123 | local_ip | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
[69] | alert:n | 3.0 | 10/22/2020-14:45:57.176956 | local_ip | 68.73.203.109 | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
[70] | alert:n | 3.0 | 10/22/2020-14:46:05.240953 | 123.123.123.123 | local_ip | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
[71] | alert:n | 3.0 | 10/22/2020-14:46:21.624979 | local_ip | 68.73.203.109 | United States of America | SURICATA STREAM ESTABLISHED SYNACK resend with different ACK
I'm familiar with the bash script, let say if I want to count total specific ip of 123.123.123.123 that can be found in the 9th column, I can implement like this:
#!/bin/bash
ip = "123.123.123.123"
report = output.txt
src_ip_count=$(grep "${ip}" "${report}" | awk '{ print $9 }' | grep -v "local_ip" | uniq -c | awk '{ print $1 }')
and the output is:
[root#me lua-output]# ./test.sh
2
How do I implement the same code above in lua ? I know there is popen function can be used.. but is there a native way to do this in lua ? Also if I use popen, I also need to pass variable $ip and $report inside that command which I'm not sure if it's possible.
There's a bunch of ways to go about this, really. Assuming you read your data from stdin (though the same works for any file you manually open), you can do something like this:
local c = 0
for line in io.lines() do -- or or file:lines() if you have a different file
if line:find("123.123.123.123") -- Only lines containing the IP we care about
if (true) -- Whatever other conditions you want to apply
c = c + 1
end
end
end
print(c)
Lua doesn't have a concept of what a "column" is, so you have to build that yourself as well. Either use a pattern to count spaces, or split the string into a table and index it.
You mentioned that if it is possible to use variable inside popen in lua. It is possible, and you can use grep command in lua.
So in lua you can do this:
-- lua script using grep example
ip = "123.123.123.123"
report = output.txt
local cmd = "grep -F " .. ip .. " " .. report .. " | awk '{ print $9 }' | grep -v 'local_ip' | uniq -c | awk '{ print $1 }'"
local handle = io.popen(cmd)
local src_ip_count = handle:read("*a")
print(src_ip_count)
handle:close()
output:
2

grep invert match on two files

I have two text files containing one column each, for example -
File_A File_B
1 1
2 2
3 8
If I do grep -f File_A File_B > File_C, I get File_C containing 1 and 2. I want to know how to use grep -v on two files so that I can get the non-matching values, 3 and 8 in the above example.
Thanks.
You can also use comm if it allows empty output delimiter
$ # -3 means suppress lines common to both input files
$ # by default, tab character appears before lines from second file
$ comm -3 f1 f2
3
8
$ # change it to empty string
$ comm -3 --output-delimiter='' f1 f2
3
8
Note: comm requires sorted input, so use comm -3 --output-delimiter='' <(sort f1) <(sort f2) if they are not already sorted
You can also pass common lines got from grep as input to grep -v. Tested with GNU grep, some version might not support all these options
$ grep -Fxf f1 f2 | grep -hxvFf- f1 f2
3
8
-F option to match strings literally, not as regex
-x option to match whole lines only
-h to suppress file name prefix
f- to accept stdin instead of file input
awk 'NR==FNR{a[$0]=$0;next} !($0 in a) {print a[(FNR)], $0}' f1 f2
3 8
To Understand the meaning of NR and FNR check below output of their print.
awk '{print NR,FNR}' f1 f2
1 1
2 2
3 3
4 4
5 1
6 2
7 3
8 4
Condition NR==FNR is used to extract the data from first file as both NR and FNR would be same for first file only.
With GNU diff command (to compare files line by line):
diff --suppress-common-lines -y f1 f2 | column -t
The output (left column contain lines from f1, right column - from f2):
3 | 8
-y, --side-by-side - output in two columns

Cypher load CSV eager and long action duration

im loading a file with 85K lines - 19M,
server has 2 cores, 14GB RAM, running centos 7.1 and oracle JDK 8
and it can take 5-10 minutes with the following server config:
dbms.pagecache.memory=8g
cypher_parser_version=2.0
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096
disk mounted in /etc/fstab:
UUID=fc21456b-afab-4ff0-9ead-fdb31c14151a /mnt/neodata
ext4 defaults,noatime,barrier=0 1 2
added this to /etc/security/limits.conf:
* soft memlock unlimited
* hard memlock unlimited
* soft nofile 40000
* hard nofile 40000
added this to /etc/pam.d/su
session required pam_limits.so
added this to /etc/sysctl.conf:
vm.dirty_background_ratio = 50
vm.dirty_ratio = 80
disabled journal by running:
sudo e2fsck /dev/sdc1
sudo tune2fs /dev/sdc1
sudo tune2fs -o journal_data_writeback /dev/sdc1
sudo tune2fs -O ^has_journal /dev/sdc1
sudo e2fsck -f /dev/sdc1
sudo dumpe2fs /dev/sdc1
besides that,
when running a profiler, i get lots of "Eagers", and i really cant understand why:
PROFILE LOAD CSV WITH HEADERS FROM 'file:///home/csv10.csv' AS line
FIELDTERMINATOR '|'
WITH line limit 0
MERGE (session :Session { wz_session:line.wz_session })
MERGE (page :Page { page_key:line.domain+line.page })
ON CREATE SET page.name=line.page, page.domain=line.domain,
page.protocol=line.protocol,page.file=line.file
Compiler CYPHER 2.3
Planner RULE
Runtime INTERPRETED
+---------------+------+---------+---------------------+--------------------------------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+---------------+------+---------+---------------------+--------------------------------------------------------+
| +EmptyResult | 0 | 0 | | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, page, session | MergeNode; Add(line.domain,line.page); :Page(page_key) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Eager | 9 | 0 | line, session | |
| | +------+---------+---------------------+--------------------------------------------------------+
| +UpdateGraph | 9 | 9 | line, session | MergeNode; line.wz_session; :Session(wz_session) |
| | +------+---------+---------------------+--------------------------------------------------------+
| +ColumnFilter | 9 | 0 | line | keep columns line |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Filter | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +Extract | 9 | 0 | anon[181], line | anon[181] |
| | +------+---------+---------------------+--------------------------------------------------------+
| +LoadCSV | 9 | 0 | line | |
+---------------+------+---------+---------------------+--------------------------------------------------------+
all the labels and properties have indices / constrains
thanks for the help
Lior
He Lior,
we tried to explain the Eager Loading here:
And Marks original blog post is here: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
Rik tried to explain it in easier terms:
http://blog.bruggen.com/2015/07/loading-belgian-corporate-registry-into_20.html
Trying to understand the "Eager Operation"
I had read about this before, but did not really understand it until Andres explained it to me again: in all normal operations, Cypher loads data lazily. See for example this page in the manual - it basically just loads as little as possible into memory when doing an operation. This laziness is usually a really good thing. But it can get you into a lot of trouble as well - as Michael explained it to me:
"Cypher tries to honor the contract that the different operations
within a statement are not affecting each other. Otherwise you might
up with non-deterministic behavior or endless loops. Imagine a
statement like this:
MATCH (n:Foo) WHERE n.value > 100 CREATE (m:Foo {m.value = n.value + 100});
If the two statements would not be
isolated, then each node the CREATE generates would cause the MATCH to
match again etc. an endless loop. That's why in such cases, Cypher
eagerly runs all MATCH statements to exhaustion so that all the
intermediate results are accumulated and kept (in memory).
Usually
with most operations that's not an issue as we mostly match only a few
hundred thousand elements max.
With data imports using LOAD CSV,
however, this operation will pull in ALL the rows of the CSV (which
might be millions), execute all operations eagerly (which might be
millions of creates/merges/matches) and also keeps the intermediate
results in memory to feed the next operations in line.
This also
disables PERIODIC COMMIT effectively because when we get to the end of
the statement execution all create operations will already have
happened and the gigantic tx-state has accumulated."
So that's what's going on my load csv queries. MATCH/MERGE/CREATE caused an eager pipe to be added to the execution plan, and it effectively disables the batching of my operations "using periodic commit". Apparently quite a few users run into this issue even with seemingly simple LOAD CSV statements. Very often you can avoid it, but sometimes you can't."

grep specific pattern from a log file

I am passing all my svn commit log messages to a file and want to grep only the JIRA issue numbers from that.
Some lines might have more than 1 issue number, but I want to grab only the first occurrence.
The pattern is XXXX-999 (number of alpha and numeric char is not constant)
Also, I don't want the entire line to be displayed, just the JIRA number, without duplicates. I use the following command but it didn't work.
Could someone help please?
cat /tmp/jira.txt | grep '^[A-Z]+[-]+[0-9]'
Log file sample
------------------------------------------------------------------------
r62086 | userx | 2015-05-12 11:12:52 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/trunk/gradle.properties
ABC-1000 This is a sample commit message
------------------------------------------------------------------------
r62084 | usery | 2015-05-12 11:12:12 -0600 (Tue, 12 May 2015) | 1 line
Changed paths:
M /projects/training/package.jar
EFG-1001 Test commit
Output expected:
ABC-1000
EFG-1001
First of all, it seems like you have the second + in the wrong place, it should be at the end of [0-9] expression.
Second, I think all you need to do this is use the -o option to grep (to display only the matching portion of the line), then pipe the grep output through sort -u, like this:
cat /tmp/jira.txt | grep -oE '^[A-Z]+-[0-9]+' | sort -u
Although if it were me, I'd skip the cat step and just give the filename to grep, as so:
grep -oE '^[A-Z]+-[0-9]+' /tmp/jira.txt | sort -u
Six of one, half a dozen of the other, really.

Removal of Role PostgreSQL Failed - cache lookup failed for database

This is my first time using PostgreSQL for production.
I made a database blog_production with username blog_production and generated password from gemfile capistrano-postgresql. Once it is generated, I tried to delete database blog_production with this command from terminal:
$ sudo -u postgres dropdb blog_production
After that I tried to delete user blog_production with this command:
$ sudo -u postgres droprole blog_production
And it returned dropuser: removal of role "blog_production" failed: ERROR: cache lookup failed for database 16417
1.) Why is this happening?
2.) I also tried to delete from psql using DELETE FROM pg_roles WHERE rolname='blog_production' but it returned the same error (cache lookup failed)
3.) How do I solve this problem?
Thank you.
Additional Information
PostgreSQL Version
PostgreSQL 9.1.15 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
(1 row)
select * from pg_shdepend where dbid = 16417;
dbid | classid | objid | objsubid | refclassid | refobjid | deptype
-------+---------+-------+----------+------------+----------+---------
16417 | 1259 | 16419 | 0 | 1260 | 16418 | o
16417 | 1259 | 16426 | 0 | 1260 | 16418 | o
16417 | 1259 | 16428 | 0 | 1260 | 16418 | o
(3 rows)
select * from pg_database where oid = 16417;
datname | datdba | encoding | datcollate | datctype | datistemplate | datallowconn | datconnlimit | datlastsysoid | datfrozenxid | dattablespace | datacl
---------+--------+----------+------------+----------+---------------+--------------+--------------+---------------+--------------+---------------+--------
(0 rows)
select * from pg_authid where rolname = 'blog_production'
rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcatupdate | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvaliduntil
-----------------+----------+------------+---------------+-------------+--------------+-------------+----------------+--------------+-------------------------------------+---------------
blog_production | f | t | f | f | f | t | f | -1 | md5d4d2f8789ab11ba2bd019bab8be627e6 |
(1 row)
Somehow the DROP database; didn't drop the shared dependencies correctly. PostgreSQL still thinks that your user owns three tables in the database you dropped.
Needless to say this should not happen; it's almost certainly a bug, though I don't know how we'd even begin tracking it down unless you know exactly what commands you ran etc to get to this point, right from creating the DB.
If the PostgreSQL install's data isn't very big and if you can share the contents, can I get you to stop the database server and make a tarball of the whole database directory, then send it to me? I'd like to see if I can tell what happened to get you to this point.
Send a dropbox link to craig#2ndquadrant.com . Just:
sudo service postgresql stop
sudo tar cpjf ~abrahamks/abrahamks-postgres.tar.gz \
/var/lib/postgresql/9.1/main \
/etc/postgresql/9.1/main \
/var/log/postgresql/postgresql-9.1-main-*.
/usr/lib/postgresql/9.1
sudo chown abrahamks ~abrahamks/abrahamks-postgres.tar.gz
and upload abrahamks-postgres.tar.gz from your home folder.
Replace abrahamks with your username on your system. You might need to adjust the paths above if I'm misremembering where the PostgreSQL data lives on Debian-derived systems.
Note that this contains all your databases not just the one that was an issue, and it also contains your PostgreSQL user accounts.
(If you're going to send me a copy, do so before continuing):
Anyway, since the database is dropped, it is safe to manually remove the dependencies that should've been removed by DROP DATABASE:
DELETE FROM pg_shdepend WHERE dbid = 16417
It should then be possible to DROP USER blog_production;

Resources