I am facing a weird issue with PostgreSQL 14.6 running on Ubuntu 20.
The application uses the Ruby on Rails framework and connects through pgbouncer. I don't understand why this is happening.
Our application needs a pool of at least 250 connections because we sometimes run more than 300 threads for several hours or a whole day. While this type of job is running, the database sometimes goes into recovery mode and terminates all connections. After 1 or 2 seconds everything goes back to normal, but the background services fail. I don't know why the database sometimes goes into recovery mode.
The system has 48GB of RAM. I increased max connections based on "How to increase the max connections in postgres?", so currently shared_buffers = 8192MB and kernel.shmmax=1342177280.
I also tried tuning the PostgreSQL settings with postgresqltuner.
I am running background jobs in Ruby on Rails that involve over 100 concurrent threads; the database goes into recovery mode and the background job stops.
Any suggestions to avoid the database recovery mode issue?
# pgbouncer.ini
[databases]
app_production = host=localhost dbname=app_production port=5432
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 16432
auth_file = userlist.txt
logfile = pgbouncer.log
pidfile = pgbouncer.pid
admin_users = appuser
max_client_conn = 1000
default_pool_size = 1000
# database.yml
production:
  adapter: postgresql
  host: 127.0.0.1
  database: app_production
  username: appuser
  password: defaultpassword
  encoding: unicode
  pool: 1000
  port: 16432
# OS Type: linux
# DB Type: web
# Total Memory (RAM): 48 GB
# CPUs num: 12
# Connections num: 1000
# Data Storage: hdd
max_connections = 1000
shared_buffers = 12GB
effective_cache_size = 36GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
random_page_cost = 4
effective_io_concurrency = 2
work_mem = 3145kB
min_wal_size = 1GB
max_wal_size = 4GB
max_worker_processes = 12
max_parallel_workers_per_gather = 4
max_parallel_workers = 12
I have a dask.DataFrame which consumes around 100GB in memory:
ddf_c = client.persist(ddf_c)
len(ddf_c.index)
# 246652596 rows
## Running some other code like groupby/aggregate etc
Now I want to filter out the data by using .loc operator, but after running the following, the RAM consumption is 165GB:
ddf_c = ddf_c.loc[ddf_c.is_in_valid_set_of_combis == True]
ddf_c = client.persist(ddf_c) # Now we have 165GB RAM consumption
How can I check for open/pending/waiting futures/tasks/datasets which are preventing Dask from really overwriting the ddf_c dask.DataFrame?
This is what the info page looks like:
('loc-series-b0f23c725a607fed56584d9e41e57de8', 77) 227.41 MB
[... around 50 entries ...]
You can track dependencies in the "info" pages of the dashboard (click the info tab at the top).
In your situation I would probably not persist until the latter step.
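As a rough illustration of why the old data can stay in memory: Dask keeps a persisted result for as long as something still holds a reference to it. The sketch below is plain Python, not Dask. `Partition` and `is_alive` are hypothetical names used only for the analogy; it shows that rebinding one name does not release an object that another name still refers to.

```python
import gc
import weakref


class Partition:
    """Hypothetical stand-in for one chunk of persisted data (not a Dask class)."""

    def __init__(self, name):
        self.name = name


def is_alive(ref):
    """Return True while the object behind a weak reference still exists."""
    return ref() is not None


original = Partition("persisted-frame")
probe = weakref.ref(original)

# A second name, e.g. an intermediate result that still points at the data.
intermediate = original

# Rebinding the first name does not free the object: `intermediate` holds it.
original = Partition("filtered-frame")
gc.collect()
still_held = is_alive(probe)

# Only once the last reference is gone can the memory be reclaimed.
del intermediate
gc.collect()
released = not is_alive(probe)
```

In the same spirit, any variable or intermediate result that still refers to the first persisted frame would keep it on the workers next to the filtered copy.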
FYI, this question has also been posted on the PG-General mailing list.
We have had a problem since we migrated from 8.4 to 9.1.
When we run:
ANALYSE VERBOSE;
(statistics on all databases, with 500 tables and 1 TB of data across all tables)
we now get this message:
org.postgresql.util.PSQLException: ERROR: out of shared memory Hint: You might need to increase max_locks_per_transaction.
When we were on 8.4 there was no error. Here is our specific postgresql.conf configuration on the server:
default_statistics_target = 200
maintenance_work_mem = 1GB
constraint_exclusion = on
checkpoint_completion_target = 0.9
effective_cache_size = 7GB
work_mem = 48MB
wal_buffers = 32MB
checkpoint_segments = 64
shared_buffers = 2304MB
max_connections = 150
random_page_cost = 2.0
max_locks_per_transaction = 128
max_locks_per_transaction was at its default value before (64?); we already tried increasing it as the error hint suggests.
We already tried to increase linux shared memory too. Have you any suggestions?
I'd try lowering maintenance_work_mem (try 256MB) and setting max_locks_per_transaction to a much higher value, e.g. 1024.
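To get a feel for what that change means: the PostgreSQL documentation describes the shared lock table as tracking locks on max_locks_per_transaction * (max_connections + max_prepared_transactions) objects. The sketch below only does that arithmetic for the posted settings. The helper name `lock_table_slots` is made up for illustration, and max_prepared_transactions = 0 is an assumption because the question does not show it.

```python
def lock_table_slots(max_locks_per_transaction, max_connections,
                     max_prepared_transactions=0):
    """Number of lockable objects the shared lock table is sized for."""
    return max_locks_per_transaction * (max_connections + max_prepared_transactions)


# Values from the posted postgresql.conf (max_prepared_transactions assumed to be 0).
current = lock_table_slots(128, 150)      # setting already tried in the question
suggested = lock_table_slots(1024, 150)   # setting suggested in the answer

print(current, suggested)
```

With these numbers the table grows from 19200 to 153600 slots. Whether that is enough depends on how many objects a single ANALYSE ends up locking.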
We copied a 150 MB CSV file into Flume's spool directory. When it was loaded into HDFS, the file was split into much smaller files of around 80 KB each. Is there a way to load the file with Flume without it being split into smaller files? The smaller files generate more metadata in the NameNode, so we need to avoid them.
My flume-ng configuration looks like this:
# Initialize agent's source, channel and sink
agent.sources = TwitterExampleDir
agent.channels = memoryChannel
agent.sinks = flumeHDFS
# Setting the source to spool directory where the file exists
agent.sources.TwitterExampleDir.type = spooldir
agent.sources.TwitterExampleDir.spoolDir = /usr/local/flume/live
# Setting the channel to memory
agent.channels.memoryChannel.type = memory
# Max number of events stored in the memory channel
agent.channels.memoryChannel.capacity = 10000
# agent.channels.memoryChannel.batchSize = 15000
agent.channels.memoryChannel.transactioncapacity = 1000000
# Setting the sink to HDFS
agent.sinks.flumeHDFS.type = hdfs
agent.sinks.flumeHDFS.hdfs.path = hdfs://info3s7:54310/spool5
agent.sinks.flumeHDFS.hdfs.fileType = DataStream
# Write format can be text or writable
agent.sinks.flumeHDFS.hdfs.writeFormat = Text
# use a single csv file at a time
agent.sinks.flumeHDFS.hdfs.maxOpenFiles = 1
# rollover file based on maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollCount=0
agent.sinks.flumeHDFS.hdfs.rollInterval=2000
agent.sinks.flumeHDFS.hdfs.rollSize = 0
agent.sinks.flumeHDFS.hdfs.batchSize =1000000
# never rollover based on the number of events
agent.sinks.flumeHDFS.hdfs.rollCount = 0
# rollover file based on max time of 1 min
#agent.sinks.flumeHDFS.hdfs.rollInterval = 0
# agent.sinks.flumeHDFS.hdfs.idleTimeout = 600
# Connect source and sink with channel
agent.sources.TwitterExampleDir.channels = memoryChannel
agent.sinks.flumeHDFS.channel = memoryChannel
What you want is this:
# rollover file based on maximum size of 10 MB
agent.sinks.flumeHDFS.hdfs.rollCount = 0
agent.sinks.flumeHDFS.hdfs.rollInterval = 0
agent.sinks.flumeHDFS.hdfs.rollSize = 10000000
agent.sinks.flumeHDFS.hdfs.batchSize = 10000
From the Flume documentation:
hdfs.rollSize: File size to trigger roll, in bytes (0: never roll based on file size)
In your example you use a rollInterval of 2000, which will roll over the file after 2000 seconds, resulting in small files.
Also note that batchSize reflects the number of events before the file is flushed to HDFS, not necessarily the number of events before the file is closed and a new one created. You'll want to set that to some value small enough to not time out writing a large file but large enough to avoid overhead of many requests to HDFS.
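As a back-of-the-envelope check on the rollSize setting, here is a small sketch that estimates how many HDFS files a given rollSize produces when size is the only roll trigger. The function name `expected_file_count` is made up for illustration and 1 MB is taken as 10**6 bytes. The result is approximate, because the sink decides when to roll as events are written, not at exact byte boundaries.

```python
import math


def expected_file_count(input_bytes, roll_size_bytes):
    """Rough number of output files when rolling is driven by size alone."""
    if roll_size_bytes <= 0:
        # rollSize = 0 means "never roll based on file size".
        return 1
    return max(1, math.ceil(input_bytes / roll_size_bytes))


csv_bytes = 150 * 1000 * 1000   # the 150 MB input file (assuming 1 MB = 10**6 bytes)

with_10mb_roll = expected_file_count(csv_bytes, 10_000_000)    # rollSize from the answer
with_200mb_roll = expected_file_count(csv_bytes, 200_000_000)  # rollSize above the file size

print(with_10mb_roll, with_200mb_roll)
```

So a rollSize of 10000000 still yields roughly 15 files for this input. To keep the 150 MB file in one piece, the rollSize would need to be at least as large as the file.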
I have 30000 files to process; each file has 80000 rows x 5 columns. I need to read all the files and process them to find the average of each line. I have written the code to read and extract all the data from the files. My code is in Fortran. The full array would be (30000 x 80000), but my program cannot go beyond (3300 x 80000). I need to add the 4th column of each file in steps of 300 files, i.e. the 4th column of the 1st file with the 4th column of the 301st file, the 4th column of the 2nd file with the 4th column of the 302nd file, and so on. Do you think this is because of a limit on the size of array that Fortran can handle? If so, is there any way to increase the size of array that Fortran can handle? What about the number of files? My code looks like this:
This program, limited to 3300 files, runs well.
      implicit double precision (a-h,o-z),integer(i-n)
      dimension x(78805,5),y(78805,5),den(78805,5)
      dimension b(3300,78805),bb(78805)
      character*70 fn
      nf = 3300 ! NUMBER OF FILES
      nj = 78804 ! Number of rows in file.
      ns = 300 ! No. of steps for files.
      ncores = 11 ! No of Cores
c--------------------------------------------------------------------
c--------------------------------------------------------------------
!Initialization
      do i = 1,nf ! start at 1: b has no row 0
         do j = 1, nj
            x(j,1) = 0.0
            y(j,2) = 0.0
            den(j,4) = 0.0
c           a(i,j) = 0.0
            b(i,j) = 0.0
c           aa(j) = 0.0
            bb(j) = 0.0
         end do
      end do
c-------!Body program-----------------------------------------------
      iout = 6 ! Output Files upto "ns" no.
      DO i= 1,nf ! LOOP FOR THE NUMBER OF FILES
         write(fn,10)i
         open(1,file=fn)
         do j=1,nj ! Loop for the no of rows in the domain
            read(1,*)x(j,1),y(j,2),den(j,4)
            if(i.le.ns) then
c              a(i,j) = prob(j,3)
               b(i,j) = den(j,4)
            else
c              a(i,j) = prob(j,3) + a(i-ns,j)
               b(i,j) = den(j,4) + b(i-ns,j)
            end if
         end do
         close(1)
c ----------------------------------------------------------
c -----Write Out put [Probability and density matrix]-------
c ----------------------------------------------------------
         if(i.ge.(nf-ns)) then
            do j = 1, nj
c              aa(j) = a(i,j)/(ncores*1.0)
               bb(j) = b(i,j)/(ncores*1.0)
               write(iout,*) int(x(j,1)),int(y(j,2)),bb(j)
            end do
            close(iout)
            iout = iout + 1
         end if
      END DO
 10   format(i0,'.txt')
      END
It's hard to say for sure because you haven't given all the details yet, but your problem is quite possibly that you are using a 32 bit compiler producing 32 bit executables and you are simply running out of address space.
Although your operating system supports 64 bit address space, your 32 bit process is still limited to 32 bit addresses.
You have found a limit at 3300*78805*8 which is just under 2GB and this supports my theory.
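To make that arithmetic concrete, here is a small sketch comparing the size of the array that works with the size of the one you are aiming for. It assumes 8-byte double precision elements, as implied by the implicit double precision declaration. The helper name `array_bytes` is made up for illustration.

```python
BYTES_PER_DOUBLE = 8  # double precision element size


def array_bytes(rows, cols, itemsize=BYTES_PER_DOUBLE):
    """Memory needed for a dense rows x cols array of fixed-size elements."""
    return rows * cols * itemsize


two_gib = 2 ** 31                      # 2 GiB boundary

working = array_bytes(3300, 78805)     # b(3300,78805), the size that still runs
target = array_bytes(30000, 78805)     # the array needed for all 30000 files

print(working, working < two_gib)      # 2080452000 True
print(target, target / 2 ** 30)        # 18913200000, about 17.6 GiB
```

The working array sits just below the 2 GiB boundary, while the full array would need roughly 17.6 GiB.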
No matter what is the cause of your immediate problem, your fundamental problem is that you appear to be loading everything into memory at once. I've not closely studied your algorithm but on first inspection it seems likely that you could re-arrange it to avoid having everything in memory at once.
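One possible re-arrangement, sketched in Python rather than Fortran and based only on my reading of the loop in the question: since b(i,j) depends only on the current file and on b(i-ns,j), it is enough to keep ns running sums, one per position in the 300-file cycle, and add each file to its slot as it is read. The function name `accumulate_by_step` and the tiny input data are made up for illustration.

```python
from typing import Iterable, List, Sequence


def accumulate_by_step(columns: Iterable[Sequence[float]], ns: int) -> List[List[float]]:
    """Sum column vectors whose file indices differ by a multiple of ns.

    Only ns accumulators are kept in memory, regardless of the number of files.
    """
    sums: List[List[float]] = []
    for index, column in enumerate(columns):
        if index < ns:
            # First cycle: start a new accumulator for this slot.
            sums.append(list(column))
        else:
            # Later cycles: add into the slot this file belongs to.
            acc = sums[index % ns]
            for j, value in enumerate(column):
                acc[j] += value
    return sums


# Tiny example: 4 "files" with 2 rows each, cycle length ns = 2.
files = [[1.0, 2.0], [10.0, 20.0], [3.0, 4.0], [30.0, 40.0]]
totals = accumulate_by_step(files, ns=2)
print(totals)
```

In the Fortran program this would correspond to an accumulator of shape (300, 78805), about 189 MB of doubles, instead of (30000, 78805).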