False high values in Grafana causing false alerts - InfluxDB

I configured alerting in Grafana yesterday and am now getting alerts from two servers. It's always the same two servers reporting high IO, high CPU, or the like.
The thing is, they don't actually have such high values. In fact, they're almost idle. All servers are configured exactly the same via Ansible, so the Telegraf config is identical on all of them.
Also, if I filter the stats in Grafana to the corresponding server, the data displayed in the graph is correct, as you can see in the screenshot below. Still, the rule test results in a false positive.
I checked vmstat, which also displays correct information:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 47100 151152 20948 454556 2 2 16 38 2 1 2 1 96 0 1
0 0 47100 151136 20948 454592 0 0 0 0 125 135 0 1 96 0 2
0 0 47100 150408 20956 454584 0 0 0 84 222 282 1 3 93 0 4
0 0 47100 150424 20956 454592 0 0 0 0 151 225 0 0 97 0 2
0 0 47100 150424 20956 454592 0 0 0 0 115 140 0 0 96 0 4
0 0 47100 150424 20956 454592 0 0 0 0 109 125 0 0 97 0 2
0 0 47100 150424 20956 454592 0 0 0 0 121 131 0 0 98 0 2
0 0 47100 150412 20972 454576 0 0 0 92 139 208 0 1 96 0 3
0 0 47100 150456 20972 454592 0 0 0 0 65 117 0 0 99 0 1
0 0 47100 150876 20972 454592 0 0 0 16 692 705 2 4 88 0 5
And here's the telegraf.log, in case something's wrong there:
2017-07-07T09:22:04Z I! Starting Telegraf (version 1.3.3)
2017-07-07T09:22:04Z I! Loaded outputs: influxdb
2017-07-07T09:22:04Z I! Loaded inputs: inputs.diskio inputs.processes inputs.swap inputs.system inputs.redis inputs.disk inputs.kernel inputs.mem inputs.net inputs.nginx inputs.postgresql inputs.cpu
2017-07-07T09:22:04Z I! Tags enabled: environment=production host=om-1-prod rails_env=production role=telegraf
2017-07-07T09:22:04Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"om-1-prod", Flush Interval:10s
Any ideas what's wrong here?

I kept monitoring the servers manually and found short-lived high peaks.
So the issue is that these peaks aren't visible in the selected time range within Grafana. The data gets aggregated to a smaller average, which then looks like there were only about 40 IOPS. If I zoom into the corresponding time range, I can see the peaks.
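The effect is easy to reproduce (a minimal NumPy sketch with made-up numbers, not my actual data): a single short spike inside a wide aggregation window survives max() but almost vanishes under mean(), which is what the zoomed-out graph showed.
import numpy as np

samples = np.full(300, 40.0)  # 5 minutes of ~40 IOPS baseline at 1 s resolution
samples[150] = 5000.0         # one short spike, visible only when zoomed in

print("mean:", samples.mean())  # ~56.5: the spike is averaged away
print("max :", samples.max())   # 5000.0: what the alert rule reacted to
Aggregating with max() (or a high percentile) instead of mean() in the alert query avoids being fooled by this kind of downsampling.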
Long story short: there's no issue with Grafana, Telegraf, or InfluxDB. The problem existed between keyboard and chair.

Related

Why does my YOLO model have one extra output?

I have trained a YOLO model to detect 24 different classes, and now, when I try to extract its outputs, it returns 29 numbers for each prediction. Here they are:
0.605734 0.0720678 0.0147335 0.0434446 0.999661 0 0 0 0.999577 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I suppose the last 24 numbers are the scores for each class and the first 4 are the bounding-box parameters, but what is the 5th? It is always bigger than 0.9. I'm confused. Please help me.
It's the objectness score: the probability that the specific box contains an object.
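So, assuming the layout described in the question (4 bbox values, then objectness, then 24 class scores), the vector can be unpacked like this (a minimal sketch using the numbers from the question):
import numpy as np

pred = np.array([0.605734, 0.0720678, 0.0147335, 0.0434446, 0.999661,
                 0, 0, 0, 0.999577] + [0] * 20)   # the 29 numbers above

bbox = pred[:4]          # box parameters (x, y, w, h)
objectness = pred[4]     # probability that the box contains any object
class_scores = pred[5:]  # one score per class, 24 in total

# the final confidence for a detection is typically objectness * class score
print(class_scores.argmax(), objectness * class_scores.max())  # class 3, ~0.999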

When using the default 'randomForest' algorithm for classification, why doesn't the number of terminal nodes match the number of cases?

According to https://cran.r-project.org/web/packages/randomForest/randomForest.pdf, classification trees are fully grown, meaning node size = 1.
However, if trees are really grown to a maximum, then shouldn't each terminal node contain a single case (data point, species, etc)?
If I run:
library(randomForest)
data(iris) #150 cases
set.seed(352)
rf <- randomForest(Species ~ ., iris)
hist(treesize(rf), main = "number of nodes")
I can see that most "fully grown" trees only have about 10 nodes, meaning node size can't be equal to 1... right?
For example, (-1) below represents a terminal node in the 134th tree of the forest. Only 8 terminal nodes!?
> getTree(rf,134)
left daughter right daughter split var split point status prediction
1 2 3 3 2.50 1 0
2 0 0 0 0.00 -1 1
3 4 5 4 1.75 1 0
4 6 7 3 4.95 1 0
5 8 9 3 4.85 1 0
6 10 11 4 1.60 1 0
7 12 13 1 6.50 1 0
8 14 15 1 5.95 1 0
9 0 0 0 0.00 -1 3
10 0 0 0 0.00 -1 2
11 0 0 0 0.00 -1 3
12 0 0 0 0.00 -1 3
13 0 0 0 0.00 -1 2
14 0 0 0 0.00 -1 2
15 0 0 0 0.00 -1 3
I would be grateful if someone could explain.
"Fully grown" means "nothing left to split": a node of a decision tree is fully grown if all the data records assigned to it yield the same prediction.
In the case of the iris dataset, once you reach a node containing 50 setosa records, it doesn't make sense to split it into two child nodes of 25 setosas each.
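The same behaviour is easy to reproduce outside of randomForest (a minimal scikit-learn sketch, offered purely as an illustration): a single tree grown with no depth or size limits stops as soon as every leaf is pure, so it ends up with far fewer leaves than cases.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)                # 150 cases
tree = DecisionTreeClassifier(random_state=352)  # defaults grow until leaves are pure
tree.fit(X, y)
print(tree.get_n_leaves())                       # far fewer than 150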

Where are class names stored in a machine learning dataset in Python?

I'm learning machine learning using the iris dataset in Python 3.6 with sklearn, and I don't understand where the class names being retrieved are stored. In iris there are 3 classes, and each class contains 50 observations. You can use the following commands to print the classes and their associated numerical values:
print(iris.target)
print(iris.target_names)
This will result in the output:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
['setosa' 'versicolor' 'virginica']
So as can be seen, the classes are setosa, versicolor, and virginica. What I don't understand is where these class names are stored, or how they're called upon within the model. If you use the shape command on the data and the target, the results are (150, 4) and (150,), meaning there are 150 observations with 4 features in the data, and 150 rows in the target. I'm just not able to bridge the gap in my mind as to where the names come from.
What I don't understand is where the class names are supposed to be stored. If I made a brand-new dataset for pokemon types with ice, fire, water, and flying, where could I store these type names? Would they be required to be numerical as well, like iris, with 0, 1, 2, 3?
Sklearn uses a custom type of object to store its datasets, exactly so that they can store metadata along with the raw data.
If you load the iris dataset
In [2]: from sklearn import datasets
In [3]: iris = datasets.load_iris()
You can inspect the type of object with type:
In [4]: type(iris)
Out[4]: sklearn.utils.Bunch
You can look at the attributes inside the object with dir:
In [5]: dir(iris)
Out[5]: ['DESCR', 'data', 'feature_names', 'target', 'target_names']
And then use . notation to take a look at the attributes themselves:
In [6]: type(iris.data)
Out[6]: numpy.ndarray
In [7]: type(iris.target)
Out[7]: numpy.ndarray
In [8]: type(iris.feature_names)
Out[8]: list
If you want to mimic this for your own datasets, you can define your own object type with the same structure, i.e. write your own class.
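Alternatively, you can reuse sklearn's own Bunch class directly (a minimal sketch; the pokemon features and values below are made up purely for illustration):
import numpy as np
from sklearn.utils import Bunch

pokemon = Bunch(
    data=np.array([[0.4, 9.0], [0.6, 8.5], [0.5, 9.0], [1.1, 22.0]]),  # made-up features
    feature_names=['height_m', 'weight_kg'],
    target=np.array([0, 1, 2, 3]),                                     # numeric codes, as in iris
    target_names=np.array(['ice', 'fire', 'water', 'flying']),         # names stored alongside
)

print(pokemon.target_names[pokemon.target])  # ['ice' 'fire' 'water' 'flying']
The names themselves live in target_names; target only stores the numeric codes that index into it, which is exactly how load_iris() maps 0, 1, 2 back to the species names.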

ERROR while implementing Cox PH model for recurrent event survival analysis using counting process

I have been trying to run a Cox PH model on a sample data set of 10k customers (randomly drawn from a 32-million-customer base) to predict the probability of survival at time t (which is a month, in my case). I am using recurrent-event survival analysis with the counting-process formulation for e-commerce. For this:
1. Observation starting point: right after a customer makes first purchase
2. Start/Stop times: Months of two consecutive purchases (as in the data)
I have a few independent variables as in the sample data below:
id start stop status tenure orders revenue Quantity
A 0 20 0 0 1 $89.0 1
B 0 17 0 0 1 $556.0 2
C 0 17 0 0 1 $900.0 2
D 32 33 0 1679 9 $357.8 9
D 26 32 1 1497 7 $326.8 7
D 23 26 1 1405 4 $142.9 4
D 17 23 1 1219 3 $63.9 3
D 9 17 1 978 2 $50.0 2
D 0 9 1 694 1 $35.0 1
E 0 15 0 28 2 $156.0 2
F 0 15 0 0 1 $348.0 1
F 12 14 0 375 2 $216.8 3
F 0 12 1 0 1 $67.8 2
G 9 15 0 277 2 $419.0 2
G 0 9 1 0 1 $359.0 1
While running Cox PH using the following code:
fit10 <- coxph(Surv(start, stop, status) ~ orders + tenure + Quantity + revenue, data = test)
I keep getting the following error:
Warning: X matrix deemed to be singular; variable 1 2 3 4
I tried searching for the same error online, but the answers I found said it could be caused by interrelated independent variables, whereas my variables are individual and continuous.
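For what it's worth, one quick thing to check (a sketch with the sample rows above transcribed into Python; purely illustrative, not a full diagnosis) is whether any covariates are near-collinear, since that is a common trigger for this singularity warning, and in the sample data orders and Quantity track each other almost exactly:
import pandas as pd

# The 15 sample rows from the question (tenure, orders, revenue, Quantity)
test = pd.DataFrame({
    "tenure":   [0, 0, 0, 1679, 1497, 1405, 1219, 978, 694, 28, 0, 375, 0, 277, 0],
    "orders":   [1, 1, 1, 9, 7, 4, 3, 2, 1, 2, 1, 2, 1, 2, 1],
    "revenue":  [89.0, 556.0, 900.0, 357.8, 326.8, 142.9, 63.9, 50.0, 35.0,
                 156.0, 348.0, 216.8, 67.8, 419.0, 359.0],
    "Quantity": [1, 2, 2, 9, 7, 4, 3, 2, 2, 2, 1, 3, 2, 2, 1],
})

print(test.corr())  # orders vs. Quantity correlation is ~0.99, i.e. near-collinear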

Which of the IDS 11.70 onconfig parameters can be changed to maximize performance for a DSS app?

Informix 11.70.TC5DE,
Windows Vista with Dual Core Processor, 8GB RAM, 1TB HDD:
During the installation of this server, I specified it was going to be used for a data warehousing application. These are the onconfig parameters the install script generated.
Can any of these parameters be changed to maximize the performance of the server?
#(onconfig.ol_informix1170) - for data warehousing app.
ROOTNAME rootdbs
ROOTPATH C:\PROGRA~1\IBM\Informix\11.70\OL_INF~2\dbspaces\rootdbs.000
ROOTOFFSET 0
ROOTSIZE 312992
MIRROR 0
MIRRORPATH
MIRROROFFSET 0
PHYSFILE 49152
PLOG_OVERFLOW_PATH
PHYSBUFF 512
LOGFILES 6
LOGSIZE 10000
DYNAMIC_LOGS 2
LOGBUFF 256
LTXHWM 70
LTXEHWM 80
MSGPATH C:\PROGRA~1\IBM\Informix\11.70\ol_informix1170_1.log
CONSOLE C:\PROGRA~1\IBM\Informix\11.70\ol_informix1170_1.con
TBLTBLFIRST 0
TBLTBLNEXT 0
TBLSPACE_STATS 1
DBSPACETEMP tempdbs
SBSPACETEMP
SBSPACENAME sbspace
SYSSBSPACENAME
ONDBSPACEDOWN 2
SERVERNUM 6
DBSERVERNAME ol_informix1170_1
DBSERVERALIASES dr_informix1170_1
NETTYPE olsoctcp,1,150,NET
LISTEN_TIMEOUT 60
MAX_INCOMPLETE_CONNECTIONS 1024
FASTPOLL 1
NS_CACHE host=900,service=900,user=900,group=900
MULTIPROCESSOR 0
VPCLASS cpu,num=1,noage
VP_MEMORY_CACHE_KB 0
SINGLE_CPU_VP 1
#VPCLASS aio,num=1
CLEANERS 2
AUTO_AIOVPS 1
DIRECT_IO 0
LOCKS 2000
DEF_TABLE_LOCKMODE page
RESIDENT 0
SHMBASE 0xc000000L
SHMVIRTSIZE 209920
SHMADD 6560
EXTSHMADD 8192
SHMTOTAL 0
SHMVIRT_ALLOCSEG 0,3
#SHMNOACCESS 0x70000000-0x7FFFFFFF
CKPTINTVL 300
AUTO_CKPTS 1
RTO_SERVER_RESTART 60
BLOCKTIMEOUT 3600
CONVERSION_GUARD 2
RESTORE_POINT_DIR $INFORMIXDIR\tmp
TXTIMEOUT 300
DEADLOCK_TIMEOUT 60
HETERO_COMMIT 0
TAPEDEV \\.\TAPE0
TAPEBLK 16
TAPESIZE 0
LTAPEDEV
LTAPEBLK 16
LTAPESIZE 0
BAR_ACT_LOG $INFORMIXDIR\tmp\bar_act.log
BAR_DEBUG_LOG $INFORMIXDIR\tmp\bar_dbug.log
BAR_DEBUG 0
BAR_MAX_BACKUP 0
BAR_RETRY 1
BAR_NB_XPORT_COUNT 20
BAR_XFER_BUF_SIZE 15
RESTARTABLE_RESTORE ON
BAR_PROGRESS_FREQ 0
BAR_BSALIB_PATH
BACKUP_FILTER
RESTORE_FILTER
BAR_PERFORMANCE 0
BAR_CKPTSEC_TIMEOUT 15
ISM_DATA_POOL ISMData
ISM_LOG_POOL ISMLogs
DD_HASHSIZE 31
DD_HASHMAX 10
DS_HASHSIZE 31
DS_POOLSIZE 127
PC_HASHSIZE 31
PC_POOLSIZE 127
PRELOAD_DLL_FILE
STMT_CACHE 0
STMT_CACHE_HITS 0
STMT_CACHE_SIZE 512
STMT_CACHE_NOLIMIT 0
STMT_CACHE_NUMPOOL 1
USEOSTIME 0
STACKSIZE 64
ALLOW_NEWLINE 0
USELASTCOMMITTED NONE
FILLFACTOR 90
MAX_FILL_DATA_PAGES 0
BTSCANNER num=1,threshold=5000,rangesize=-1,alice=6,compression=default
ONLIDX_MAXMEM 188928
MAX_PDQPRIORITY 100
DS_MAX_QUERIES 1
DS_TOTAL_MEMORY 188928
DS_MAX_SCANS 1
DS_NONPDQ_QUERY_MEM 188928
DATASKIP
OPTCOMPIND 2
DIRECTIVES 1
EXT_DIRECTIVES 0
OPT_GOAL -1
IFX_FOLDVIEW 0
AUTO_REPREPARE 1
USTLOW_SAMPLE 0
RA_PAGES 64
RA_THRESHOLD 16
BATCHEDREAD_TABLE 1
BATCHEDREAD_INDEX 1
BATCHEDREAD_KEYONLY 0
EXPLAIN_STAT 1
#SQLTRACE level=low,ntraces=1000,size=2,mode=global
#DBCREATE_PERMISSION informix
#DB_LIBRARY_PATH
IFX_EXTEND_ROLE 1
SECURITY_LOCALCONNECTION
UNSECURE_ONSTAT
ADMIN_USER_MODE_WITH_DBSA
ADMIN_MODE_USERS
PLCY_POOLSIZE 127
PLCY_HASHSIZE 31
USRC_POOLSIZE 127
USRC_HASHSIZE 31
STAGEBLOB
OPCACHEMAX 0
SQL_LOGICAL_CHAR OFF
SEQ_CACHE_SIZE 10
ENCRYPT_HDR
ENCRYPT_SMX
ENCRYPT_CDR 0
ENCRYPT_CIPHERS
ENCRYPT_MAC
ENCRYPT_MACFILE
ENCRYPT_SWITCH
CDR_EVALTHREADS 1,2
CDR_DSLOCKWAIT 5
CDR_QUEUEMEM 4096
CDR_NIFCOMPRESS 0
CDR_SERIAL 0
CDR_DBSPACE
CDR_QHDR_DBSPACE
CDR_QDATA_SBSPACE
CDR_SUPPRESS_ATSRISWARN
CDR_DELAY_PURGE_DTC 0
CDR_LOG_LAG_ACTION ddrblock
CDR_LOG_STAGING_MAXSIZE 0
CDR_MAX_DYNAMIC_LOGS 0
DRAUTO 0
DRINTERVAL 30
DRTIMEOUT 30
HA_ALIAS
DRLOSTFOUND $INFORMIXDIR\etc\dr.lostfound
DRIDXAUTO 0
LOG_INDEX_BUILDS
SDS_ENABLE
SDS_TIMEOUT 20
SDS_TEMPDBS
SDS_PAGING
SDS_LOGCHECK 0
UPDATABLE_SECONDARY 0
FAILOVER_CALLBACK
FAILOVER_TX_TIMEOUT 0
TEMPTAB_NOLOG 0
DELAY_APPLY 0
STOP_APPLY 0
LOG_STAGING_DIR
RSS_FLOW_CONTROL 0
ENABLE_SNAPSHOT_COPY 0
SMX_COMPRESS 0
ON_RECVRY_THREADS 2
OFF_RECVRY_THREADS 5
DUMPDIR $INFORMIXDIR\tmp
DUMPSHMEM 1
DUMPGCORE 0
DUMPCORE 0
DUMPCNT 1
ALARMPROGRAM $INFORMIXDIR\etc\alarmprogram.bat
ALRM_ALL_EVENTS 0
#SYSALARMPROGRAM $INFORMIXDIR\etc\evidence.bat
STORAGE_FULL_ALARM 600,3
RAS_PLOG_SPEED 10982
RAS_LLOG_SPEED 0
EILSEQ_COMPAT_MODE 0
QSTATS 0
WSTATS 0
#VPCLASS MQ,noyield
MQSERVER
MQCHLLIB
MQCHLTAB
#VPCLASS jvp,num=1
#JVPJAVAHOME $INFORMIXDIR\extend\krakatoa\jre
#JVPHOME $INFORMIXDIR\extend\krakatoa
JVPPROPFILE $INFORMIXDIR\extend\krakatoa\.jvpprops
JVPLOGFILE $INFORMIXDIR\jvp.log
#JDKVERSION 1.5
#JVPJAVALIB \bin
#JVPJAVAVM jvm
#JVPARGS -verbose:jni
#JVPCLASSPATH $INFORMIXDIR\extend\krakatoa\krakatoa_g.jar;$INFORMIXDIR\extend\krakatoa\jdbc_g.jar
JVPARGS -Dcom.ibm.tools.attach.enable=no
JVPCLASSPATH $INFORMIXDIR\extend\krakatoa\krakatoa.jar;$INFORMIXDIR\extend\krakatoa\jdbc.jar
BUFFERPOOL default,buffers=10000,lrus=8,lru_min_dirty=50.00,lru_max_dirty=60.50
BUFFERPOOL size=4K,buffers=13108,lrus=16,lru_min_dirty=70.00,lru_max_dirty=80.00
AUTO_LRU_TUNING 1
USERMAPPING OFF
SP_AUTOEXPAND 1
SP_THRESHOLD 0
SP_WAITTIME 30
DEFAULTESCCHAR \
LOW_MEMORY_RESERVE 0
LOW_MEMORY_MGR 0
REMOTE_SERVER_CFG
REMOTE_USERS_CFG
S6_USE_REMOTE_SERVER_CFG 0
GSKIT_VERSION
NETTYPE drsoctcp,1,150,NET
If it is a multiprocessor machine, definitely consider turning on MULTIPROCESSOR by setting it to a non-zero value.
The ONCONFIG parameters of greatest interest to you for DSS are those related to Parallel Data Query (PDQ): the block that starts with MAX_PDQPRIORITY. It is worth perusing the fine manual on these specifically, because the inter-relationship between them and some other parameters is too complex to go into here.
But in essence, DS_MAX_QUERIES is the maximum number of parallel queries permitted at any time, and DS_MAX_SCANS determines the number of IO threads for scanning your tables. DS_TOTAL_MEMORY determines the amount of memory allocated for PDQ processing, and there is an algorithm in the manual that shows how these variables and the user's PDQPRIORITY setting combine.
You might also want to consider raising the RA_PAGES and RA_THRESHOLD values; these determine how many pages are read into memory as 'blocks' before grabbing the next batch. If you want to favour table scans (which you generally do in DSS), then increasing these to something like 256 and 128 might improve performance.
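Putting those suggestions together, the deltas against the generated onconfig might look something like this (illustrative values only; check each against the 11.70 manual before applying):
# Illustrative DSS deltas only; verify against the 11.70 manual:
MULTIPROCESSOR 1
RA_PAGES 256
RA_THRESHOLD 128
# MAX_PDQPRIORITY, DS_MAX_QUERIES, DS_MAX_SCANS and DS_TOTAL_MEMORY are
# interdependent; size them per the PDQ chapter of the manual.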
My experience is with SMP and MPP unix boxes, rather than Windows, so I'm not sure how much you can wring out of your architecture, but this is where you want to start.
I would recommend identifying a good DSS query that runs for a decent length of time, and changing one parameter at a time to see the effect. SET EXPLAIN ON is your friend here, too.
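A single test cycle might look like the following sketch (Informix SQL; the query itself is whatever representative DSS query you chose):
SET EXPLAIN ON;       -- write the query plan to sqexplain.out
SET PDQPRIORITY 100;  -- request the maximum PDQ share for this session
-- ...run your representative DSS query here, then compare timings and plans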
One last thing - 11.7 supports table compression, and the tests I've seen show dramatic improvements in a DSS environment with large reads and irregular writes.
