How to add some OS-level metrics to JMX for a Java JVM - jmx

Using JConsole, one can access the OS metrics that are gathered by default, such as memory and CPU load, in addition to process-specific metrics. My question is: can we add custom OS metrics, such as the disk usage of a directory (using the Java Files API) or whether a specific port is responsive?
I currently gather these metrics over remote SSH with commands like du -sh /directory, which is very slow, and I would like to get them through JMX so that it runs faster.
This question talked about adding Spring metrics.

As the linked question shows it is easy to expose a Java class as an MBean, so you could certainly write a class that collects the metrics you need. Implementing du in Java is not difficult. However, I'm not sure that it will solve your problem. The example of du -sh /directory is probably slow because it needs to recursively measure the size of a directory hierarchy. That will be just as slow (probably slower!) in Java.
As a side note I would normally use collectd or Telegraf for that kind of thing, but again the I/O cost for finding disk usage would be the same.
I would suggest adding some logs with times to your current script so that you can see where it spends time. If it takes less than a second to connect with SSH and 15 seconds to determine the directory size, for example, moving from SSH to JMX won't help.
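To make the timing suggestion concrete, here is a minimal sketch in Python (not the asker's actual script) that times a recursive size calculation on its own. The helper names directory_size_bytes and timed are made up for illustration, and /directory is just the path from the question. Note that it sums apparent file sizes, which can differ from the block usage that du reports.

```python
import os
import time


def directory_size_bytes(root):
    """Sum the sizes of all files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total += os.lstat(path).st_size
            except OSError:
                # The file vanished or is unreadable; skip it rather than abort.
                continue
    return total


def timed(label, func, *args):
    """Run func(*args), print how long it took, and return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f}s")
    return result, elapsed


if __name__ == "__main__":
    # "/directory" is the example path from the question; adjust as needed.
    size, seconds = timed("directory walk", directory_size_bytes, "/directory")
    print(f"{size} bytes")
```

If the walk alone accounts for most of the elapsed time, the transport (SSH or JMX) is not the bottleneck, and caching the result or measuring less often is likely to help more.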

Related

Influxdb high CPU usage jumping to 80 %?

I am relatively new to the time series database world. I am running InfluxDB 1.8.x as a Docker container with the default influxdb.conf. I am currently facing high CPU usage by InfluxDB: the CPU jumps to 80-90%, which causes problems for other processes running on the same machine.
I have tried the solution given here ->> Influx high CPU issue, but unfortunately it did not work. I am unable to understand the reason behind the issue and am also struggling to find help in the documentation or from the community.
What I have tried so far:
Updated the monitor section of the influxdb.conf file like this ->> monitor DB
Checked the series cardinality with SHOW SERIES CARDINALITY; at 9400 it looks well within limits (although I am not sure what number counts as a high-cardinality red flag)
I am looking for an approach that will help me understand the root cause of this problem.
Please let me know if you need any further information on same.
After reading about the InfluxDB debug and CPU profiling HTTP API, I was able to pin down the issue. The problem was in the way I was making the query: it involved complex functions and also a GROUP BY on a tag. I also analysed the query with the EXPLAIN ANALYZE (query) command to check how long it takes to execute. After fixing the query I noticed a huge improvement in CPU load.
Basically I can suggest the following:
Run a CPU profile through the InfluxDB HTTP API with the command curl -o <file name> http://localhost:8086/debug/pprof/all?cpu=true and collect the result.
Visualize the result with a tool such as pprof and find the problem.
Run basic commands such as SHOW SERIES CARDINALITY and EXPLAIN ANALYZE <query> to understand how the query executes.
Before designing any schema or InfluxDB client, check the hardware recommendations ->> Hardware sizing guidelines
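The first and third steps can also be scripted. Below is a rough Python sketch, assuming an InfluxDB 1.x instance reachable at http://localhost:8086. The function names, the database name "mydb" and the sample query are placeholders for illustration, not part of the original setup.

```python
import urllib.request
from urllib.parse import urlencode


def cpu_profile_url(base_url):
    """URL of the profile archive used in the curl command above."""
    return f"{base_url}/debug/pprof/all?cpu=true"


def explain_analyze_url(base_url, database, query):
    """URL that runs EXPLAIN ANALYZE <query> through the 1.x /query endpoint."""
    params = urlencode({"db": database, "q": "EXPLAIN ANALYZE " + query})
    return f"{base_url}/query?{params}"


def fetch(url, timeout=120):
    """Download the response body; collecting a CPU profile can take a while."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def save_cpu_profile(base_url, filename):
    """Write the profile archive to filename, like curl -o <file name>."""
    with open(filename, "wb") as fh:
        fh.write(fetch(cpu_profile_url(base_url)))
```

For example, fetch(explain_analyze_url("http://localhost:8086", "mydb", "SELECT mean(value) FROM cpu")) would return the JSON-encoded execution plan for that (hypothetical) query.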

Can I write a file to a specific cluster location?

When an application opens a file and writes to it, the system chooses which clusters it will be stored in. I want to choose them myself! Let me explain what I really want to do. In fact, I don't necessarily want to write anything. I have an HDD with a range of bad clusters in the middle, and I want to mark that space as occupied by a file and eventually make it a hidden, unmovable system file (like the page file in Windows) so that it won't be accessed anymore. Any ideas on how to do that?
Later Edit:
I think THIS is my last hope. I just found it, but I need to investigate... Maybe a file could be created anywhere and then relocated to the desired cluster. But that requires writing, and the function may fail if that cluster is bad.
I believe the answer to your specific question: "Can I write a file to a specific cluster location" is, in general, "No".
The reason for that is that the architecture of modern operating systems is layered so that the underlying disk store is accessed at a lower level than you can access, and of course disks can be formatted in different ways so there will be different kernel mode drivers that support different formats. Even so, an intelligent disk controller can remap the addresses used by the kernel mode driver anyway. In short there are too many levels of possible redirection for you to be sure that your intervention is happening at the correct level.
If you are talking about Windows - which you haven't stated but which appears to be assumed - then you need to be looking at storage drivers in the kernel (see https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/). I think the closest you could reasonably come would be to write your own Installable File System driver (see https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/_ifsk/). This is really a 'filter' as it sits in the IO request chain and can intercept and change IO Request Packets (IRPs). Of course this would run in the kernel, not in userspace, and normally this would be written in C, whereas I note your question is tagged for Delphi.
Your IFS driver can sit at different levels in the request chain. I have used this technique to intercept calls to specific file system locations (paths / file names) and alter the IRP so as to virtualise the request - even calling back to user space from the kernel to resolve how the request should be handled. Using the provided examples, implementing basic functionality with an IFS driver is not too involved because it's a filter and not a complete storage system.
However the very nature of this approach means that another filter can also alter what you are doing in your driver.
You could look at replacing the file system driver that interfaces to the hardware, but I think that's likely to be an excessive task under the circumstances ... and as pointed out already by @fpiette the disk controller hardware can remap your request anyway.
In the days of MSDOS the access to the hardware was simpler and provided by the BIOS which could be hooked to allow the requests to be intercepted. Modern environments aren't that simple anymore. The IFS approach does allow IO to be hooked, but it does not provide the level of control you need.
EDIT regarding suggestion by the OP of using FSCTL_MOVE_FILE
For a simple environment this may well do what you want, as it is designed to support a defragmentation process.
However I still think there's no guarantee that this actually will do what you want.
You will note that the page you have linked to states that it moves one or more virtual clusters of a file from one logical cluster to another within the same volume.
This is a code that's passed to the underlying storage drivers which I have referred to above. What the storage layer does is up to the storage layer and will depend on the underlying technology. With more advanced storage there's no guarantee this actually addresses the physical locations which I believe your question is asking about.
However that's entirely dependent on the underlying storage system. For some types of storage relocation by the OS may not be honoured in the same way. As an example consider an enterprise storage array that has a built in data-tiering function. Without the awareness of the OS data will be relocated within the storage based on the tiering algorithms. Also consider that there are technologies which allow data to be directly accessed (like NVMe) and that you are working with 'virtual' and 'logical' clusters, not physical locations.
However, you may well find that in a simple case, with support in the underlying drivers and no remapping done outside the OS and kernel, this does what you need.
Since your problem is to mark bad clusters, you don't need to write any program. Use the command line utility CHKDSK that Windows provides.
In an elevated command prompt (Run as administrator), run the command:
chkdsk /r c:
The check will be done on the next reboot.
Don't forget to read the documentation.

Configuration Dask Distributed

I'm setting up an environment for our data scientists to work on. Currently we have a single node running Jupyterhub with Anaconda and Dask installed. (2 sockets with 6 cores and 2 threads per core with 140 gb ram). When users create a LocalCluster, currently the default settings are to take all the available cores and memory (as far as I can tell). This is okay when done explicitly, but I want the standard LocalCluster to use less than this. Because almost everything we do is
Now when looking into the config I see no configuration dealing with n_workers, n_threads_per_worker, n_cores etc. For memory, in dask.config.get('distributed.worker') I see two memory related options (memory and memory-limit) both specifying the behaviour listed here: https://distributed.dask.org/en/latest/worker.html.
I've also looked at the jupyterlab dask extension, which lets me do all this. However, I can't force people to use jupyterlab.
TL;DR I want to be able to set the following standard configuration when creating a cluster:
n_workers
processes = False (I think?)
threads_per_worker
memory_limit either per worker, or for the cluster. I know this can only be a soft limit.
Any suggestions for configuration are also very welcome.
As of 2019-09-20 this isn't implemented. I recommend raising a feature request at https://github.com/dask/distributed/issues/new , or even a pull request.
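Until something like that exists in the configuration, one possible workaround is a small site-wide helper that users call instead of LocalCluster directly. This is only a sketch: the function names and the default values below are made up for illustration and should be tuned to the machine. The keyword arguments themselves (n_workers, threads_per_worker, processes, memory_limit) are the ones LocalCluster already accepts, with memory_limit applying per worker.

```python
def default_cluster_kwargs(**overrides):
    """Return site-wide LocalCluster defaults, with any user overrides applied."""
    defaults = {
        "n_workers": 2,            # illustrative value, not a recommendation
        "threads_per_worker": 2,   # illustrative value
        "processes": False,        # run workers as threads in one process
        "memory_limit": "8GB",     # per worker; a soft limit
    }
    defaults.update(overrides)
    return defaults


def make_local_cluster(**overrides):
    """Create a LocalCluster using the site defaults unless overridden."""
    # Imported lazily so the defaults can be inspected without Dask installed.
    from dask.distributed import LocalCluster

    return LocalCluster(**default_cluster_kwargs(**overrides))
```

A user who wants more resources can still ask for them explicitly, for example make_local_cluster(n_workers=6), which keeps the "explicit is fine, default is small" behaviour described in the question.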

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for monitoring tool for the following use cases:
Collect basic metrics about virtual machine (cpu usage, memory usage, i/o, available space)
Extract metrics from SQL Server (probably running some queries)
Extract information about processing from an external service, i.e. how many processes are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Have the ability to plot charts and manage alerts; it would also be nice to be able to send not only emails but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, sql exporter, and Alertmanager with the possibility to send notifications to multiple destinations, but I don't know what to do about this external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has e.g. information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there exists a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short lived batch jobs.
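To give an idea of what such an agent/exporter looks like, here is a bare-bones sketch that uses only the Python standard library. In practice the Python client library mentioned above would normally handle the formatting and serving. collect_job_metrics is a hypothetical stand-in for the call to your external service, and the metric names, values and port 8000 are invented for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def collect_job_metrics():
    """Hypothetical stand-in for querying the external service."""
    return {"running_jobs": 3, "oldest_job_age_seconds": 125.0}


def render_metrics(metrics):
    """Render a name -> number mapping as gauges in the Prometheus text format."""
    lines = []
    for name, value in sorted(metrics.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect_job_metrics()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; point a Prometheus scrape job at it."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and adding the host and port as a scrape target would make these gauges available for graphs and alert rules like any other metric.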
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.

Best tool to record CPU and memory usage with Grinder?

I am using grinder in order to generate reports for the performance tests for my application. But I noticed that it does not generate any report on CPU and memory usage. On further investigation, I found that Grinder does not provide this information. Now, my question is, is there any tool that can be hooked up with grinder, to record the CPU and memory usage details?
As you have discovered, this is not supported directly in The Grinder itself. You will need to use a collection of tools to accomplish this.
I use a combination of Quickstatd, Graphite, and Grinder to Graphite to get all my results in the same place where I can see them. If you need to support Windows, you can probably use collectd (with ssc-serv and the Graphite plugin) instead of Quickstatd, which is based on bash scripts.
You can also pull in server side metrics (like DB lookups per second, etc.) with tools like jmxtrans, statsd, and metrics.
Having all that information in the same place is really powerful, and can give you some good insights.
If you grind a Java server, you can get data via JMX from OperatingSystemMXBean and MemoryMXBean.
Then add the data to a Grinder user statistic, and it will end up in the -data.log file:
grinder.statistics.registerDataLogExpression("Load", "userDouble0")
..
grinder.statistics.forCurrentTest.setDouble("userDouble0", systemLoadAverage)
The -data.log file can be fed directly into Gnuplot:
gnuplot> plot 'client-0-data.log' using 2:7 title "System Load"

Resources