I have a service that uses a Docker image. About half a dozen people use it. However, the containers occasionally produce big core.xxxx dump files. How do I disable these for my Docker images? My base image is Debian 9.
To disable core dumps, set a ulimit value in the /etc/security/limits.conf file, which defines shell-specific resource restrictions.
A hard limit can only be raised by root, while a soft limit may be adjusted by the user up to the hard limit. If you want to ensure that no process can create a core dump, set both to zero. Although it may look like a boolean (0 = False, 1 = True), the value actually indicates the maximum allowed size.
* soft core 0
* hard core 0
The asterisk means the line applies to all users. The second column states whether it is a hard or soft limit, followed by the columns stating the item (core) and the value.
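Note that /etc/security/limits.conf is applied by PAM at login, so it may not take effect inside a container that never goes through PAM. For Docker specifically, the same limit can be passed when the container is started (the image name is a placeholder):
docker run --ulimit core=0 myimage
To apply it to every container, the dockerd daemon also accepts a default, e.g. dockerd --default-ulimit core=0.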
I'm using Splunk to monitor my applications.
I also store resource statistics in Splunk.
Goal: I want to find the optimum CPU limit for each container.
How do I write a query that finds an optimum CPU limit? Or the other question: should I?
Concern 1: Say I customize my query and use the MAX(CPU) command. The container won't be running at that level most of the time, so I might set an unnecessarily high limit for my containers.
Let me explain: if I find a CPU limit value of 10 via MAX(CPU), that peak might have happened because of a single bulk operation, while my container's expected usage is around 1.2 the rest of the time. So using the MAX value won't work.
Concern 2: Say I use the AVG(CPU) value instead, and it is 2. How many of my operations will end up waiting, and for how long, after this change? How many of them are going to time out? It could create a lot of side effects. How will I decide the real average value? What parameters should be used?
Is it possible to include such conditions in the query? Or do I need an AI to decide it? :)
Here are my given parameters:
path=statistics.cpus_system_time_secs
path=statistics.cpus_user_time_secs
path=statistics.cpus_nr_periods
path=statistics.cpus_nr_throttled
path=statistics.cpus_throttled_time_secs
path=statistics.cpus_limit
I bet you can ask better questions than me. Let's discuss.
"Optimum" is going to depend greatly on your own environment (resources available, application priority, etc)
You probably want to look at a combination of the following factors:
avg(CPU)
max(CPU) (and time spent there)
min(CPU) (and time spent there)
I suspect your "optimum" limit is going to be a % below your max... but only if you're spending 'a lot' of time maxed out.
And, of course, being "maxed" may not matter, if other containers are running acceptably
Keep in mind, once you set that limit, your max will drop (as, likely, will your avg)
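As a starting point, a search along these lines surfaces those figures per container (the index, value field, and grouping field are assumptions; substitute whatever your telegraf setup writes to Splunk):
index=container_metrics path=statistics.cpus_user_time_secs
| stats avg(value) AS avg_cpu max(value) AS max_cpu min(value) AS min_cpu perc95(value) AS p95_cpu BY container
| eval candidate_limit = p95_cpu * 1.2
A high percentile such as perc95 is often a more useful anchor than max, since it ignores one-off spikes like the bulk operation described in the question.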
I have a LeafSystem (controller) with two output ports, each of which depends on the solution to the same MathematicalProgram. My initial idea was to solve the program and store the solution as a discrete state which the output port callbacks can access and copy appropriately.
My interpretation of the documentation (https://drake.mit.edu/doxygen_cxx/group__discrete__systems.html), and what I see when implementing this, however, is that the output callbacks use the discrete state from before the PerStepDiscreteUpdateEvent.
Now for my questions -
Is this behavior that I've described above consistent with how the Simulator handles update events or am I missing something there?
Is there a way to update the discrete state before the output calculation and have the updated state be used in the output?
Is there a different design that would be more appropriate here?
The simple solution to your problem is a cache entry.
Declare a cache entry that does your mathematical program work and stores the results. When each output port is evaluated, both "Eval" the cache entry and draw whatever data they need from the stored result. Then, no matter which port is evaluated first, the second one always benefits from the pre-computation.
You can look at the cache entry notes for more detail.
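A minimal sketch of that pattern, written in Python/pydrake for brevity (the C++ API is analogous); the solve step here is a stand-in for your MathematicalProgram:
import numpy as np
from pydrake.common.value import AbstractValue
from pydrake.systems.framework import BasicVector, LeafSystem, ValueProducer

class Controller(LeafSystem):
    def __init__(self):
        super().__init__()
        # The cache entry owns the solve; it is recomputed lazily, at most
        # once per context change, no matter how many ports Eval it.
        self._solution = self.DeclareCacheEntry(
            description="mathematical program solution",
            value_producer=ValueProducer(
                allocate=lambda: AbstractValue.Make(np.zeros(2)),
                calc=self._solve_program))
        self.DeclareVectorOutputPort("port_a", BasicVector(1), self._calc_a)
        self.DeclareVectorOutputPort("port_b", BasicVector(1), self._calc_b)

    def _solve_program(self, context, abstract_value):
        # Stand-in for solving the MathematicalProgram and storing the result.
        abstract_value.set_value(np.array([1.0, 2.0]))

    def _calc_a(self, context, output):
        solution = self._solution.Eval(context)  # cached after the first call
        output.SetFromVector(solution[:1])

    def _calc_b(self, context, output):
        solution = self._solution.Eval(context)
        output.SetFromVector(solution[1:])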
Folks,
With regard to Docker Compose v3's 'cpus' parameter (under 'deploy' > 'resources' > 'limits'), which limits the CPUs available to a service: is it an absolute number that specifies a count of CPUs, or is it a more useful percentage-of-available-CPUs setting?
From what I read, it appears to be an absolute number. Say a host has 4 CPUs and one sets two services in the compose file to 0.5 each; then both services combined can only use a max of 1 CPU (0.5 each) while leaving the 3 remaining CPUs idle.
But thinking out loud, it seems to me it would be nicer if this were a percentage-of-available-cores setting, in which case, for the same example, each service could use up to 2 CPUs and the two combined could use all 4 when needed. That way, when I increase or decrease the available cores, the relative setting would save me from having to modify this value again.
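For reference, the setting being discussed looks like this in a v3 compose file (service and image names are placeholders):
version: "3.8"
services:
  service-a:
    image: example/service-a
    deploy:
      resources:
        limits:
          cpus: "0.5"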
EDIT(09/10/21):
On reading this, it appears that the above can be achieved with the 'cpu-shares' setting instead of 'cpus'. Is my understanding correct?
The doc for 'cpu-shares', however, mentions the following cautionary note:
"It does not guarantee or reserve any specific CPU access."
If the above is achieved with this setting, then what does it mean (what is there to lose) not to have a guarantee or reservation?
EDIT(09/13/21):
Just to summarize,
The 'cpus' parameter setting is an absolute number that refers to the number of CPUs a service has reserved for it to use at all times. Correct?
The 'cpu-shares' parameter setting is a relative weight number the value of which is used to compute/determine the percentage of total available CPU that a service can use only when there is contention. Correct?
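As a concrete illustration of the shares semantics at the plain docker run level (image names and values are arbitrary): when the two containers below compete for saturated CPUs, the first gets roughly twice the CPU time of the second; when there is no contention, either one may use as much CPU as it likes.
docker run -d --cpu-shares=1024 example/worker-a
docker run -d --cpu-shares=512 example/worker-b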
I'm trying to figure out the best or a reasonable approach to defining alerts in InfluxDB. For example, I might use the CPU batch tickscript that comes with telegraf. This could be setup as a global monitor/alert for all hosts being monitored by telegraf.
What is the approach when you want to deviate from the above setup for a host, i.e. instead of X% for a specific server we want to alert at Y%?
I'm happy that a distinct tickscript could be created for the custom values, but how do I go about excluding the host from the original 'global' one?
This is a simple scenario, but the solution needs to meet the needs of 10,000 hosts, of which there will be hundreds of exceptions, and it will also encompass tens or hundreds of global alert definitions.
I'm struggling to see how you could use the platform as the primary source of monitoring/alerting.
As said in the comments, you can use the sideload node to achieve that.
Say you want to ensure that your InfluxDB servers are not overloaded and allow 100 measurements by default. Only on one server, which happens to receive a massive number of datapoints, do you want to limit it to 10 (a value easily exceeded by the _internal database alone, but good for our example).
Given the following excerpt from a TICKscript,
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "numMeasurements")
        .as('value')

var customized = data
    |sideload()
        .source('file:///etc/kapacitor/customizations/demo/')
        .order('hosts/host-{{.hostname}}.yaml')
        .field('maxNumMeasurements', 100)
    |log()

var trigger = customized
    |alert()
        .crit(lambda: "value" > "maxNumMeasurements")
and the name of the server with the exception being influxdb, with the file /etc/kapacitor/customizations/demo/hosts/host-influxdb.yaml looking as follows:
maxNumMeasurements: 10
a critical alert will be triggered when value (and hence numMeasurements) exceeds 10 on the host whose hostname tag equals influxdb, or when it exceeds the default of 100 on any other host.
There is an example in the documentation that handles scheduled downtimes using sideload.
Furthermore, I have created an example, available on GitHub, using docker-compose.
Note that there is a caveat with the example: the alert flaps because of a second, dynamically generated database. But it should be sufficient to show how to approach the problem.
What is the cost of using sideload nodes in terms of performance and computation if you have over 10 thousand servers?
Managing alerts manually, directly in Chronograf/Kapacitor, is not feasible for a big number of custom alerts.
At AMMP Technologies we need to manage alerts per database, customer, and customer object; the numbers can go into the thousands. We've opted for a custom solution in which we keep a standard set of template tickscripts (not to be confused with Kapacitor templates) and provide an interface to the user that exposes only the relevant variables. A service (written in Python) then combines the values for those variables with a tickscript and, using the Kapacitor API, deploys (updates, or deletes) the task on the Kapacitor server. This is automated, so that data for new customers/objects is combined with the templates and deployed to Kapacitor without manual intervention.
You obviously need to design your tasks to be specific enough that they don't overlap, and generic enough that it's not too much work to create tasks for every little thing.
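A rough sketch of what that deployment step can look like, written in Python (the template, names, and threshold are hypothetical; the /kapacitor/v1/tasks endpoint is Kapacitor's regular HTTP API):
import requests

KAPACITOR_TASKS = "http://localhost:9092/kapacitor/v1/tasks"

# A template tickscript in the loose sense (not a Kapacitor template);
# variables are filled in with plain string formatting.
TEMPLATE = """
stream
    |from()
        .database('{database}')
        .measurement('{measurement}')
    |alert()
        .crit(lambda: "value" > {threshold})
"""

def deploy_task(task_id, database, measurement, threshold):
    """Create or update a Kapacitor task rendered from the template."""
    payload = {
        "id": task_id,
        "type": "stream",
        "dbrps": [{"db": database, "rp": "autogen"}],
        "script": TEMPLATE.format(database=database,
                                  measurement=measurement,
                                  threshold=threshold),
        "status": "enabled",
    }
    # Try updating an existing task first; create it if it does not exist.
    r = requests.patch(f"{KAPACITOR_TASKS}/{task_id}", json=payload)
    if r.status_code == 404:
        r = requests.post(KAPACITOR_TASKS, json=payload)
    r.raise_for_status()

deploy_task("cpu-high-customer-42", "telegraf", "cpu", 90)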
I have a Redis master and 2 slaves, all three currently on the same Unix server. The memory used by the three instances is approximately 3.5 GB, 3 GB, and 3 GB. There are about 275,000 keys in the Redis DB, of which about 4,000 are hashes. One set has 100,000 values. One list has 275,000 entries in it; it's a list of hashes and sets. The server has 16 GB of memory in total, of which 9.5 GB is currently used. Persistence is currently off; the RDB file is written once a day by a forced background save. Please provide any suggestions for optimization. The max-ziplist configuration is currently at its default.
Optimizing Hashes
First, let's look at the hashes. Two important questions - how many elements in each hash, and what is the largest value in those hashes? A hash uses the memory efficient ziplist representation if the following condition is met:
len(hash) < hash-max-ziplist-entries && length-of-largest-field(hash) < hash-max-ziplist-value
You should increase the two settings in redis.conf based on your data, but don't increase them to more than 3-4 times the defaults.
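For example, since the defaults are hash-max-ziplist-entries 128 and hash-max-ziplist-value 64, a conservative increase along those lines would look like this in redis.conf:
hash-max-ziplist-entries 512
hash-max-ziplist-value 256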
Optimizing Sets
A set with 100,000 elements cannot be optimized as such, unless you provide additional details on your use case. Some general strategies though -
Maybe use HyperLogLog - Are you using the set to count unique elements? If the only commands you run are sadd and scard, maybe you should switch to a HyperLogLog (see the example after this list).
Maybe use a Bloom filter - Are you using the set to check for the existence of a member? If the only commands you run are sadd and sismember, maybe you should implement a Bloom filter and use it instead of the set.
How big is each element? - Set members should be small. If you are storing big objects, you are perhaps doing something incorrect.
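For the HyperLogLog route, the switch is mechanical (key and member names are made up): replace sadd with PFADD and scard with PFCOUNT, at the cost of the count becoming approximate (standard error around 0.81%):
PFADD daily:visitors user:1001 user:1002
PFCOUNT daily:visitors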
Optimizing Lists
A single list with 275,000 elements seems wrong. It is going to be slow to access elements in the middle of the list. Are you sure a list is the right data structure for your use case?
Change list-compress-depth to 1 or higher. Read about this setting in redis.conf - there are tradeoffs. But for a list of 275000 elements, you certainly want to enable compression.
Tools
Use the open source redis-rdb-tools to analyze your data set (disclaimer: I am the author of this tool). It will tell you how much memory each key is taking. It will help you to decide where to concentrate your efforts on.
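A typical invocation (the dump path is a placeholder) writes a per-key memory report to CSV:
rdb -c memory /var/redis/dump.rdb -f memory_report.csv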
You can also refer to this memory optimization cheat sheet.
What else?
You have provided very few details on your use case. The best savings come from picking the right data structure. I'd encourage you to update your question with more details on what you are storing within the hash / list / set.
We applied the following configuration, and it helped reduce the memory footprint by 40%:
list-max-ziplist-entries 2048
list-max-ziplist-value 10000
list-compress-depth 1
set-max-intset-entries 2048
hash-max-ziplist-entries 2048
hash-max-ziplist-value 10000
Also, we increased the RAM on the Linux server, and that helped us with the Redis memory issues.