Zabbix - Trigger dependencies (multiple % CPU load) - monitoring

I have 3 triggers on one item. (The item reads the percent CPU utilization of a Windows server from WMI via a bash script.)
My question: how do I make the 3 triggers depend on each other? For example, when CPU load is above 90%, it makes no sense to also activate the 70% or 50% trigger.
I only want to see the highest alert. How do I build such a cascade? If the load is above 90%, I don't need the 70% and 50% alerts; if it is above 70%, I don't need the 50% alert (otherwise I get a lot of pointless alerts).
Triggers for one server:
{Server_windows_reading_from_WMIroc_percUtility.sh[{HOST.HOST},"_Total"].avg(5m)}>90
{Server_windows_reading_from_WMIroc_percUtility.sh[{HOST.HOST},"_Total"].avg(5m)}>70
{Server_windows_reading_from_WMIroc_percUtility.sh[{HOST.HOST},"_Total"].avg(5m)}>50

You need to configure trigger dependencies at the host or template level.
The 50% threshold trigger will depend on the 70% one, which in turn will depend on the 90% trigger.
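In the frontend this is done on each trigger's Dependencies tab. As a rough sketch, the same chain could also be built through the Zabbix API with trigger.update (the trigger IDs and auth token below are placeholders):
{
  "jsonrpc": "2.0",
  "method": "trigger.update",
  "params": {
    "triggerid": "<id of the >50 trigger>",
    "dependencies": [{ "triggerid": "<id of the >70 trigger>" }]
  },
  "auth": "<api token>",
  "id": 1
}
Repeat the same call to make the >70 trigger depend on the >90 one.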

Related

Ramp up time in Jenkins not reducing tps

I have a JMeter script which I run from my local environment, where I achieved 20 TPS (transactions per second).
I moved the same script to Jenkins and ran it from there. It worked as expected.
My next step is to reduce the TPS from 20 to 2.
So I introduced a ramp-up time of 30 seconds, and it worked as expected in the local environment.
However, when I moved the script to Jenkins and ran it from there, it still gave me 20 TPS.
Can someone tell me why this is happening and what I need to do to fix it?
I have tried several approaches, such as hard-coding the ramp-up time and creating a new Jenkins project with a new script.
Thanks in advance
It's hard to say what exactly is wrong without seeing your full Thread Group configuration. Normally people use Timers for throttling JMeter's throughput to a given number of requests/transactions per second.
Depending on what you're trying to achieve, you can consider using:
Constant Throughput Timer (however, it is only precise at the "minute" level, so for the first minute of the test you will need to shape the load with the ramp-up approach; see the example below)
Precise Throughput Timer, which is more powerful and "precise"; however, its configuration is more complex and you need to provide test duration, units, etc.
Throughput Shaping Timer, which is kind of a balance between precision and simplicity; however, it is a custom plugin, so you will have to install it on the Jenkins master/slaves
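As an example of the first option, 2 TPS is 120 samples per minute; you could set the Constant Throughput Timer's target throughput to a JMeter property so the Jenkins job can override it (the property name and test plan file here are just illustrative):
Target throughput (samples per minute) in the timer: ${__P(throughput,120)}
Command line used by the Jenkins job: jmeter -n -t test.jmx -Jthroughput=120 -l results.jtl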

Call REST API from Azure Queue

I know this question is weird, but I am not 100% sure whether it is possible or not, so I need expert advice.
I am using the architecture shown in Fig 1: an MVC Web API puts data in an Azure Queue, and the queue then triggers an Azure Function to perform tasks that are individually small but very large in number, e.g. the queue sends 5k-10k requests to the Azure Function per minute.
Fig 1
We want to remove the Azure Function because it costs us a lot, and we want to find an alternative.
For this, someone shared the idea of replacing the Azure Function with another MVC Web API (see Fig 2).
Fig 2
Is the above architecture possible? If yes, how? If not, can anyone suggest something else?
When using Azure Functions with a Storage Queue trigger, Azure Functions will scale out based on the load on the queue. By default, batchSize is set to 16. The setting can be configured via host.json:
The number of queue messages that the Functions runtime retrieves simultaneously and processes in parallel. When the number being processed gets down to the newBatchThreshold, the runtime gets another batch and starts processing those messages. So the maximum number of concurrent messages being processed per function is batchSize plus newBatchThreshold. This limit applies separately to each queue-triggered function.
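For reference, a minimal host.json sketch with these two settings (shown here in the Functions v2 layout; in Functions v1 the queues block sits at the root of host.json):
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 16,
      "newBatchThreshold": 8
    }
  }
}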
This setting alone might not be sufficient when the number of messages is substantial. In that case, you want to restrict the scale-out behaviour, i.e. the number of VMs used to execute the Function App. This is controlled by the app setting WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT. Setting it to 1 would prevent any scale-out to new VMs, but according to the documentation:
This setting is a preview feature - and only reliable if set to a value <= 5
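If you want to try it, one way to apply the setting is via the Azure CLI (the app name and resource group are placeholders):
az functionapp config appsettings set --name <function-app-name> --resource-group <resource-group> --settings WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT=1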
While your focus is on the cost of processing, take time into consideration as well. If it's acceptable for messages to wait a long time before being processed, you likely have alternatives to Functions; but the trade-off between cost and time to process will always be there.

Dataflow job takes too long to start

I'm running a job which reads about 70GB of compressed data.
In order to speed up processing, I tried to start the job with a large number of instances (500), but after 20 minutes of waiting it doesn't seem to start processing the data (I have a counter for the number of records read). The reason for the large number of instances is that one of the steps needs to produce an output similar to an inner join, which results in a much bigger intermediate dataset for later steps.
What is the average delay between when a job is submitted and when it starts executing? Does it depend on the number of machines?
While I might have a bug that causes this behavior, I still wonder what that number/logic is.
Thanks,
G
The time necessary to start VMs on GCE grows with the number of VMs you start, and in general VM startup/shutdown performance can have high variance. 20 minutes would definitely be much higher than normal, but it is somewhere in the tail of the distribution we have been observing for similar sizes. This is a known pain point :(
To verify whether VM startup is actually at fault this time, you can look at Cloud Logs for your job ID, and see if there's any logging going on: if there is, then some VMs definitely started up. Additionally you can enable finer-grained logging by adding an argument to your main program:
--workerLogLevelOverrides=com.google.cloud.dataflow#DEBUG
This will cause workers to log detailed information, such as receiving and processing work items.
Meanwhile, I suggest enabling autoscaling instead of specifying a large number of instances manually; it should gradually scale to the appropriate number of VMs at the appropriate moment in the job's lifetime.
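For instance, with the Java SDK the job could be submitted with autoscaling enabled and an upper bound instead of a fixed worker count (flag names may vary slightly between SDK versions):
--autoscalingAlgorithm=THROUGHPUT_BASED --maxNumWorkers=500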
Another possible (and probably more likely) explanation is that you are reading a compressed file that needs to be decompressed before it is processed. It is impossible to seek in the compressed file (since gzip doesn't support it directly), so even though you specify a large number of instances, only one instance is being used to read from the file.
The best way to solve this problem would be to split the single compressed file into many files that are compressed separately.
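A rough shell sketch of that pre-processing step (file names and the chunk size are illustrative):
zcat big_input.gz | split -l 5000000 - chunk_
gzip chunk_*
Each resulting chunk_*.gz file can then be read by a separate worker in parallel.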
The best way to debug this problem would be to try it with a smaller compressed input and take a look at the logs.

Nagios: Make sure x out of y services are running

I'm introducing 24/7 monitoring for our systems. To avoid unnecessary pages in the middle of the night, I want Nagios NOT to page me if only one or two of the service checks fail, as this won't have any impact on users: the other servers run the same service, so the impact on users is almost zero and fixing the problem can wait until the next day.
But I do want to get paged if too many of the checks fail.
For example: 50 servers run the same service and 2 fail -> I can still sleep.
The service fails on 15 servers -> I get paged because the impact is getting too high.
What I could do is add a lot (!) of notification dependencies that only trigger if numerous hosts are down. The problem: even though I can specify that I get paged if 15 hosts are down, I still have to define exactly which hosts need to be down for that alert to be sent. I'd rather specify that a page is sent if ANY 15 hosts are down.
I'd be glad if somebody could help me with that.
Personally I'm using Shinken, which has business rules for exactly that. Shinken is backward compatible with Nagios, so it's easy to drop your Nagios configuration into Shinken.
It seems there is a similar addon for Nagios, the Nagios Business Process Intelligence Addon, but I have no experience with it.
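To give a feel for it, a Shinken business rule is just a regular service whose check_command is a bp_rule expression. A rule along the following lines would stay OK while at least 3 of the 4 listed services are OK and go critical once 2 or more of them fail (host and service names are placeholders, and the exact expression syntax should be checked against the Shinken documentation):
define service{
    use                  generic-service
    host_name            bp-aggregator
    service_description  MyService_Cluster
    check_command        bp_rule!3 of: srv1,MyService & srv2,MyService & srv3,MyService & srv4,MyService
}
You would then attach your notification contacts to this aggregate service instead of the individual ones.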

Is there a way to emulate 2 CPU Cores?

My app is ASP.NET MVC.
I am using a lot of parallel processing on my local machine (8 cores) and things are running very smoothly.
But when I roll to an Azure Medium instance (2 cores), during testing I get weird slowdowns and the program sometimes stops working.
Is there a way to emulate 1, 2, or another number of cores to match what will happen in the production environment?
I guess you could try setting the processor affinity for the development server. To do that (assuming Windows 7), open Task Manager, right-click the server process, select Set Affinity..., and select the cores you want it to run on.
Just managed to find a way around @Dai's answer above, but it means you'll have to start the development server yourself. This .bat file runs notepad.exe using two cores (you can verify that by checking its affinity within Task Manager):
start /affinity 0x03 notepad.exe
The 0x03 is a hexadecimal bitmask of the cores to use: bit 0 is the first core and bit 1 is the second, so 0x03 (binary 11) selects both. If you need a different set of cores, build the mask accordingly.
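For instance, keeping the same format as above, pinning the process to the third and fourth cores instead would use the mask 0x0C (binary 1100):
start /affinity 0x0C notepad.exe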
@JohnH's method seems the best, but you'd have to do it every time w3wp.exe runs on your machine. An alternative is to restrict your operation to 2 threads (or 4 if hyper-threading is enabled). We need more information about how you're processing information in parallel.
A parallelisable problem should just run about 4 times as fast on an 8-core machine as opposed to the 2-core Azure VM; getting 'program stops working' situations means you've got a bug there.
I'm not sure if this will fully help the situation, but you can set MaxDegreeOfParallelism so your parallel loops don't use more concurrency than the target machine's core count. This way you can limit the threads that run.
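A minimal C# sketch of that idea, capping a parallel loop at 2 concurrent operations to mimic the 2-core Azure instance (the loop body is a placeholder for your real work):
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Allow at most 2 items to be processed concurrently, matching a 2-core VM.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };

        Parallel.ForEach(Enumerable.Range(0, 100), options, i =>
        {
            // Placeholder for the real per-item work.
            Console.WriteLine($"Processed item {i} on thread {Environment.CurrentManagedThreadId}");
        });
    }
}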
