I have two PostgreSQL clusters, 9.6.15 and 9.6.20, both using streaming replication.
On the 9.6.15 master node,
pg_last_xact_replay_timestamp();
pg_last_xlog_receive_location();
pg_last_xlog_replay_location();
all return empty. On the 9.6.20 master node:
pg_last_xlog_receive_location() - is empty
pg_last_xact_replay_timestamp() - returns the time replication started
pg_last_xlog_replay_location() - has a value, but it differs from SELECT sent_location FROM pg_stat_replication s;
Replication itself works:
SELECT client_addr, pg_xlog_location_diff(s.sent_location, s.replay_location) byte_lag FROM pg_stat_replication s;
returns byte_lag = 0.
Why do
pg_last_xact_replay_timestamp();
pg_last_xlog_receive_location();
pg_last_xlog_replay_location();
return empty on a working replication setup?
How do I monitor PostgreSQL cluster replication?
http://eulerto.blogspot.com/2011/11/understanding-wal-nomenclature.html
pg_last_xlog_receive_location() and pg_last_xlog_replay_location() need to be monitored on the replica. Only a server in recovery receives and replays WAL, so on a master these functions return NULL; on a master that was promoted from a standby they remain static at the values from the end of recovery, which would explain why your 9.6.20 master still shows a replay location and an old replay timestamp.
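A minimal monitoring sketch using the 9.6 names from the question - run the first query on the standby and the second on the master:

-- on the standby (meaningful only while the server is in recovery):
SELECT pg_last_xlog_receive_location() AS receive_location,
       pg_last_xlog_replay_location()  AS replay_location,
       pg_last_xact_replay_timestamp() AS last_replay,
       now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- on the master: per-standby lag in bytes, as in your query:
SELECT client_addr,
       pg_xlog_location_diff(sent_location, replay_location) AS byte_lag
FROM pg_stat_replication;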
We are using Prometheus and Grafana for our monitoring, and we have a panel for response time. I noticed that after a while the metrics go missing and there are lots of gaps in the panel (only in the response time panel); they come back as soon as I restart the app (redeploy it in OpenShift). The service is written in Go and the logic for gathering the response time is quite simple.
We declare the metric:
var (
    responseTime = promauto.NewSummaryVec(prometheus.SummaryOpts{
        Namespace: "app",
        Subsystem: "rest",
        Name:      "response_time",
    }, []string{
        "path",
        "code",
        "method",
    })
)
and fill it in our handler:
func handler(w http.ResponseWriter, r *http.Request) {
    start := time.Now()
    // do stuff
    ....
    code := "200"
    path := r.URL.Path
    method := r.Method
    elapsed := float64(time.Since(start)) / float64(time.Second)
    responseTime.WithLabelValues(path, code, method).Observe(elapsed)
}
and the query in the Grafana panel is:
sum(rate(app_rest_response_time_sum{path='/v4/content'}[5m]) /
    rate(app_rest_response_time_count{path='/v4/content'}[5m])) by (path)
but the resulting graph is full of gaps.
Can anyone explain what we are doing wrong, or how to fix this issue? Is it possible that we are facing some kind of overflow (the average RPS is about 250)? I suspect this because it happens more often on the routes with higher RPS and response times.
Normally Prometheus records metrics continuously, and when you query it, it returns all the metrics it collected for the time range you queried.
If a metric is missing when you query, there are typically three reasons:
- The metric was not there. This happens when the instance restarts and you have a dynamic set of labels: until the first request arrives for the label value you query (in your case, no request yet for path='/v4/content'), the series simply does not exist. In that case you should still see other metrics of the same job (at least up). See the sketch after this list for a workaround.
- Prometheus had problems storing the metrics (check the Prometheus log files for that timeframe).
- Prometheus was down for that timeframe and therefore did not collect any metrics. (In that case you should have no metrics at all for that timeframe.)
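For the first case, a common workaround is to pre-initialize the label combinations your panels rely on, so the series exist (with zero observations) right after a restart and the query returns 0 instead of nothing. A minimal sketch using the responseTime SummaryVec from the question (the concrete path/method values are assumptions, not from the question):

func init() {
    // Assumption: these are the label combinations your dashboards query.
    for _, path := range []string{"/v4/content"} {
        for _, method := range []string{"GET", "POST"} {
            // WithLabelValues creates the child series immediately, so
            // app_rest_response_time_count is exported as 0 from startup.
            responseTime.WithLabelValues(path, "200", method)
        }
    }
}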
I have a master-detail setup:
CachedUpdates for the master: True
CachedUpdates for the detail: True
DetailCascade for the detail: True
The master deals with one record:
select * from orders where order_id=:order
First I pass -1 as a dummy parameter to get an empty master record:
orders.Close;
orders.Params[0].AsInteger := -1;
orders.Open;
Then I set the order id to -1 to build the relationship between the master and the detail:
orders.Append;
orders.Fields[0].AsInteger := -1;
orders.Post;
I insert into the detail successfully with Append and Post.
The problem is that in the Firebird database the master has this line in a BEFORE INSERT trigger:
new.order_id = coalesce((select max(order_id) from orders) + 1, 1);
Then I ApplyUpdates:
Orders.ApplyUpdates(-1);
dOrder.ApplyUpdates(-1);
So when I ApplyUpdates for the master, the detail won't apply, because the master id has been altered by the server.
How to solve such scenario?
I have an application deployed on PCF with a New Relic service bound to it. In New Relic I want to get an alert when my application is stopped. I don't know whether that is possible; if it is, can someone tell me how?
Edit: I don't have access to New Relic Infrastructure.
Although an 'app not reporting' alert condition is not built into New Relic Alerts, it's possible to rig one using NRQL alerts. Here are the steps:
Go to New Relic Alerts and begin creating a NRQL alert condition:
NRQL alert conditions
Query your app with:
SELECT count(*) FROM Transaction WHERE appName = 'foo'
Set your threshold to:
Static
sum of query results is below x
at least once in y minutes
The query runs once per minute. If the app stops reporting, count returns null values that are treated as 0, and then we sum them; when the number goes below your threshold, you get a notification. I recommend using the preview graph to determine how low you want your transaction count to get before receiving a notification. Here's some good information:
Relic Solution: NRQL alerting with “sum of the query results”
Basically you need to create a New Relic alert with a condition that checks whether the application is available. Specifically, you can use the Host not reporting alert condition:
The Host not reporting event triggers when data from the Infrastructure agent does not reach the New Relic collector within the time frame you specify.
You could do something like this:
// ...
aggregation_method = "cadence" // use cadence for process monitoring, otherwise it might not alert
// ...
nrql {
  // Limitation: only works for processes with ONE instance; otherwise use just uniqueCount() and set a LoS (loss of signal)
  query = "SELECT filter(uniqueCount(hostname), WHERE processDisplayName LIKE 'cdpmgr') OR -1 FROM ProcessSample WHERE GENERIC_CONDITIONS FACET hostname, entityGuid as 'entity.guid'"
}
critical {
  operator              = "below"
  threshold             = 0
  threshold_duration    = 5 * 60
  threshold_occurrences = "ALL"
}
Previous solution (it turned out not to be that robust):
// ...
critical {
  operator              = "below"
  threshold             = 0.0001
  threshold_duration    = 600
  threshold_occurrences = "ALL"
}
nrql {
  query = "SELECT percentage(uniqueCount(entityAndPid), WHERE commandLine LIKE 'yourExecutable.exe') FROM ProcessSample FACET hostname"
}
This calculates the fraction your process represents of all processes on the host.
If the process is not running, the percentage drops to 0. On a system running a vast number of processes it could fall below 0.0001, but that is very improbable.
The advantage here is that the alert can stay active even if the process slips out of the alert's time window after it stopped; this prevents the alert from auto-recovering (compared to just filtering with WHERE).
I want to use the database's scalar NOW() value to set the date-time field of a table component at post time:
tb.Append;
tb.DateTimeField.AsDateTime := ???; // <-- database NOW() value
tb.Post;
The connection is a remote connection and the server is on the same LAN as the client machine.
I'm using free tables, not a data dictionary.
You can use the TAdsConnection.GetServerTime method, even with free tables (presuming, of course, that you have a TAdsConnection in use for your tables):
tb.Insert;
tb.DateTimeField.AsDateTime := myConnection.GetServerTime;
tb.Post;
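If you'd rather fetch the time with SQL instead, here is a sketch under the assumption that qry is a TAdsQuery on the same connection (and that system.iota, Advantage's one-row system table, analogous to Oracle's DUAL, is available):

qry.SQL.Text := 'SELECT NOW() FROM system.iota'; { assumption: ADS scalar NOW() via the one-row system table }
qry.Open;
try
  tb.Insert;
  tb.DateTimeField.AsDateTime := qry.Fields[0].AsDateTime;
  tb.Post;
finally
  qry.Close;
end;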
I'm supporting a Grails web application which shows different visuals for clients using AmCharts. On one of the tabs there are three charts, each returning the top ten rows from the database based on a different measure. Loading takes 4-5, sometimes even more, to finish, yet the query runs on the DB in under 10 seconds.
The following service method is called to return results:
List fetchTopPages(params, Map querySettings, String orderClause) {
    if (!((params['country'] && params['country'].size() > 0)
            || (params['brand'] && params['brand'].size() > 0)
            || (params['url'] && params['url'].size() > 0))) {
        throw new RuntimeException('Filters country or brand or url not selected.')
    }
    Sql sql = new Sql(dataSource)
    sql.withStatement { stmt -> stmt.fetchSize = 100 }
    Map filterParams = acquisitionService.getDateFilters(params, querySettings)
    ParamUtils.addWhereArgs(params, filterParams)
    String query = "This is where the query is"
    ParamUtils.saveQueryInRequest(ParamUtils.prettyPrintQuery(query, filterParams))
    log.debug("engagement pageviews-by-source query: " + ParamUtils.prettyPrintQuery(query, filterParams))
    List rows = sql.rows(query, filterParams)
    rows
}
After some investigation it was clear that the List rows = sql.rows(query, filterParams) line is the one that takes up this load time.
Has anyone experienced this issue before? Why is sql.rows() taking so long when it only returns 10 rows worth of results and the query runs super fast on the DB side?
Additional info:
DB: FSL1D
Running the following command on the DB side: java -jar ojdbc5.jar -getversion returns:
"Oracle 11.2.0.3.0 JDBC 3.0 compiled with JDK5 on Thu_Jul_11_15:41:55_PDT_2013
Default Connection Properties Resource
Wed Dec 16 08:18:32 EST 2015"
Groovy Version: 2.3.7
Grails Version: 2.4.41
JDK: 1.7.0
My setup: Groovy 2.3.6, JVM 1.8.0_11, Oracle 12.1.0.2.0, using driver ojdbc7.jar.
Note the activation of the 10046 trace before the run, to allow diagnostics:
import oracle.jdbc.pool.OracleDataSource
def ods = new OracleDataSource();
ods.setURL('url')
ods.setUser('usr')
ods.setPassword('pwd')
def con = ods.getConnection()
def sql = new groovy.sql.Sql(con)
sql.withStatement { stmt -> stmt.fetchSize = 100 }
def SQL_QUERY = """select id, col1 from table1 order by id"""
def offset = 150
def maxRows = 20
// activate trace 10046
con.createStatement().execute "alter session set events '10046 trace name context forever, level 12'"
def t = System.currentTimeMillis()
def rows = sql.rows(SQL_QUERY, offset, maxRows)
println "time1 : ${System.currentTimeMillis()-t} with offset ${offset} and maxRows ${maxRows}"
Examination of the trace shows that the statement is parsed and executed; this means that if there is an ORDER BY clause, all the data is sorted.
The fetch size is used correctly and no more records than required are fetched - here 170 = 150 + 20.
With a fetch size of 100 this is done in two steps (note the r parameter - the number of fetched rows):
FETCH #627590664:c=0,e=155,p=0,cr=5,cu=0,mis=0,r=100,dep=0,og=1,plh=1169613780,tim=3898349818398
FETCH #627590664:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=70,dep=0,og=1,plh=1169613780,tim=3898349851458
So basically the only problem I see is that the "skipped" rows are passed over the network to the client (to be ignored there).
With a very high offset this could produce a lot of overhead (and take more time than the same query run interactively, producing only the first page).
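If that offset overhead turns out to be the bottleneck, one option (a sketch, not part of the measurement above; it reuses sql, offset, and maxRows from the script, with the classic pre-12c ROWNUM wrapper) is to push the pagination into the SQL itself, so only the requested page crosses the network:

def PAGED_QUERY = """
    select id, col1 from (
        select q.*, rownum rnum from (
            select id, col1 from table1 order by id
        ) q
        where rownum <= ${offset + maxRows}
    )
    where rnum > ${offset}
"""
// groovy.sql.Sql turns the GString expressions into bind variables
def pagedRows = sql.rows(PAGED_QUERY)
println "fetched ${pagedRows.size()} rows"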
But the best way to identify your problem is simply to enable the 10046 trace and see what is going on. I'm using level 12, which means you also get information about the waits in the DB and the bind variables.