CockroachDB storage increasing without any reason

I am using CockroachDB to store users' login credentials. I have only one table, with no records in it, but I noticed that the cluster storage is slowly increasing even though there is no activity in the database. Why is this happening, and how do I stop it?
(Screenshots: yesterday's and today's storage usage.)

A CockroachDB cluster will regularly store information about its own health and status in system tables, so that it is available to you should you need it (e.g. in your dashboard). Once these tables hit their retention limit, storage will level off. Overall, however, the amount of stored system data should be very small in relation to overall limits. For example, in this case, 3MB is only 0.06% of the free storage limit.
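If you want to see where that space is going, you can inspect the system tables yourself. A minimal sketch, assuming a Python client with psycopg2 and a placeholder connection string (CockroachDB speaks the PostgreSQL wire protocol):

    import psycopg2

    # Placeholder connection string; substitute your cluster's host and credentials.
    conn = psycopg2.connect(
        "postgresql://user:password@your-cluster-host:26257/defaultdb?sslmode=require"
    )

    with conn, conn.cursor() as cur:
        # List the internal tables where CockroachDB keeps cluster health/status data.
        cur.execute("SHOW TABLES FROM system;")
        for row in cur.fetchall():
            print(row)

    conn.close()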

Related

Is querying data stored in remote storage by Prometheus efficient? Do I need Grafana Mimir to scale?

We know that Prometheus has three phases of data storage:
In-memory: this is where recent data is stored. It allows fast queries using PromQL because it is in RAM. [Am I wrong?]
After a few hours, the in-memory data is persisted to disk in the form of blocks.
After the data retention period is over, data is stored in remote storage.
I wanted to ask whether it is efficient to query the data stored in remote storage. If I need a lot of metrics to monitor for my org, do I need Grafana Mimir, which handles up to 1 billion active series?
Also, as a side question, how many MB/GB of metrics can Prometheus store before the retention period is over?
Sparingly, yes. Prometheus won't like it if you try to query over a few years, for example, since it will go to remote storage for everything, but getting an hour of metrics from remote storage is easy and won't be a problem.
How many MB/GB of metrics can Prometheus store? It's irrelevant. The retention period is independent of the amount of data stored. You can store 100 MB in a day or 100 GB in a day; it doesn't matter. What will matter is cardinality.
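If you want a quick read on that cardinality, here is a minimal sketch (assuming a Prometheus server reachable at localhost:9090; adjust the URL for your setup) that asks the HTTP API for prometheus_tsdb_head_series, the number of series currently active in the in-memory head block:

    import requests

    PROM_URL = "http://localhost:9090"  # assumed address of your Prometheus server

    # prometheus_tsdb_head_series = number of active series in the head block,
    # i.e. the cardinality that actually drives memory and storage cost.
    resp = requests.get(
        f"{PROM_URL}/api/v1/query",
        params={"query": "prometheus_tsdb_head_series"},
        timeout=10,
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    if result:
        print("Active series:", result[0]["value"][1])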

PostgreSQL 12: Is there a way to enable FIFO to limit database table storage capacity?

I have a ThingsBoard PE setup on an AWS EC2 instance, with PostgreSQL 12 as the database.
There is a table ts_kv_2020_10 which stores all telemetry data for the month of October.
Is there a way I can enable FIFO on this ts_kv_2020_10 table to keep storage at a fixed capacity of, for example, 1 GB? (i.e., when the limit is reached, the data that was stored first onto the table is automatically replaced by the latest incoming data.)
No, there is no built-in feature in Postgres for that.
You will either need to roll your own (e.g. using triggers) or use partitioning to get rid of an entire month once it's not needed any longer.
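For illustration, here is a minimal sketch of the "roll your own" route, assuming the table has a ts column (ThingsBoard stores epoch-millis timestamps there) and that you run it periodically, e.g. from cron. Note that deleted rows only free space for reuse after (auto)vacuum and the data file itself does not shrink, which is exactly why dropping whole monthly partitions is the cleaner option:

    import psycopg2

    LIMIT_BYTES = 1 * 1024**3   # 1 GB cap
    BATCH = 10_000              # oldest rows trimmed per run

    # Placeholder connection details.
    conn = psycopg2.connect("dbname=thingsboard user=postgres password=secret host=localhost")
    conn.autocommit = True

    with conn.cursor() as cur:
        cur.execute("SELECT pg_total_relation_size('ts_kv_2020_10')")
        size_bytes = cur.fetchone()[0]
        if size_bytes > LIMIT_BYTES:
            # ctid identifies physical rows, so this deletes exactly BATCH rows
            # even though ts is not unique.
            cur.execute(
                """
                DELETE FROM ts_kv_2020_10
                WHERE ctid IN (
                    SELECT ctid FROM ts_kv_2020_10 ORDER BY ts ASC LIMIT %s
                )
                """,
                (BATCH,),
            )
            print(f"Table at {size_bytes} bytes, deleted {cur.rowcount} oldest rows")

    conn.close()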

Is there a way to limit the performance data being recorded by AKS clusters?

I am using Azure Log Analytics to store monitoring data from AKS clusters. 72% of the data stored is performance data. Is there a way to limit how often AKS reports performance data?
At this point we do not provide a mechanism to change performance metric collection frequency. It is set to 1 minute and cannot be changed.
We were actually thinking about adding an option to allow more frequent collection, as requested by some customers.
Given the number of objects (pods, containers, etc.) running in the cluster, collecting even a few perf metrics may generate a noticeable amount of data... You need that data in order to figure out what is going on in case of a problem.
Curious: you say your perf data is 72% of the total - how much is it in terms of GB/day, do you know? Do you have any active applications running on the cluster generating tracing? What we see is that once you stand up a new cluster, perf data is "the king" of volume, but once you start adding active apps that trace, logs become more and more of a factor in telemetry data volume...
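To answer the "how much is it in GB/day" question for your own workspace, a minimal sketch using the azure-monitor-query package (the workspace ID is a placeholder) runs a KQL query against the Usage table, which records ingested volume in MB per data type:

    from datetime import timedelta

    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

    # Daily billable volume per data type; the Perf rows are the AKS performance data.
    KQL = """
    Usage
    | where TimeGenerated > ago(7d) and IsBillable == true
    | summarize GB = sum(Quantity) / 1024.0 by DataType, bin(TimeGenerated, 1d)
    | order by TimeGenerated asc, GB desc
    """

    client = LogsQueryClient(DefaultAzureCredential())
    response = client.query_workspace(WORKSPACE_ID, KQL, timespan=timedelta(days=7))

    for table in response.tables:
        for row in table.rows:
            print(list(row))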

Azure app service availability loss. The memory counter Page Reads/sec was at a dangerous level

Environment:
ASP.NET MVC app (.NET Framework 4.5.1) hosted on Azure App Service with two instances.
The app uses an Azure SQL database.
The app also uses MemoryCache (System.Runtime.Caching) for caching purposes.
Recently, I have noticed availability loss for the app. It happens almost every day.
Observations:
The memory counter Page Reads/sec was at a dangerous level (242) on instance RD0003FF1F6B1B. Any value over 200 can cause delays or failures for any app on that instance.
What does the memory counter 'Page Reads/sec' mean?
How do I fix this issue?
What does the memory counter 'Page Reads/sec' mean?
We can get the answer from this blog. The recommended Page Reads/sec value is under 90; higher values indicate insufficient memory and indexing issues.
“Page reads/sec indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost, either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.”
How do I fix this issue?
Based on my experience, you could try enabling Local Cache in App Service.
You enable Local Cache on a per-web-app basis by using this app setting: WEBSITE_LOCAL_CACHE_OPTION = Always
By default, the local cache size is 300 MB. This includes the /site and /siteextensions folders that are copied from the content store, as well as any locally created logs and data folders. To increase this limit, use the app setting WEBSITE_LOCAL_CACHE_SIZEINMB. You can increase the size up to 2 GB (2000 MB) per web app.
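If you prefer to set this programmatically rather than in the portal, here is a sketch using the azure-mgmt-web SDK; the subscription ID, resource group, and app name are placeholders, and the same settings can also be applied in the portal or with the Azure CLI:

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.web import WebSiteManagementClient

    SUBSCRIPTION_ID = "<subscription-id>"   # placeholder
    RESOURCE_GROUP = "<resource-group>"     # placeholder
    APP_NAME = "<web-app-name>"             # placeholder

    client = WebSiteManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    # Read the current app settings, add the Local Cache ones, and write them back.
    settings = client.web_apps.list_application_settings(RESOURCE_GROUP, APP_NAME)
    settings.properties["WEBSITE_LOCAL_CACHE_OPTION"] = "Always"
    settings.properties["WEBSITE_LOCAL_CACHE_SIZEINMB"] = "1000"  # up to 2000 MB
    client.web_apps.update_application_settings(RESOURCE_GROUP, APP_NAME, settings)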
Some memory performance problems that can be listed are:
excessive paging,
memory shortages,
memory leaks.
Memory counter values can be used to detect the presence of these performance problems. Tracking counter values both system-wide and per-process helps you pinpoint the cause in Azure, just as in other systems. Even if there is no change in the process, a change in the system as a whole can cause memory problems.
Researching in Azure:
Shared resources plans (Free and Basic) have memory limits as seen here: https://learn.microsoft.com/en-us/azure/azure-subscription-service-limits#app-service-limits.
Quotas: https://learn.microsoft.com/en-us/azure/app-service-web/web-sites-monitor
Also, you can check in the portal under your web app's settings: search for "quotas", and also check out "Diagnose and solve problems" and hit "metrics per instance (app service plan)", which will show you the memory used by the plan.
A MemoryCache bug in .NET 4 can also cause this type of behavior:
https://stackoverflow.com/a/15715990/914284

What happens to Local SSD if the entire zone were to lose power?

What happens to data on a local SSD if the entire Google data center were to suffer a cataclysmic loss of power? When the Compute Engine instance eventually comes back online, will it still have the data on the local SSD? It seems like it handles planned downtime just fine:
No planned downtime: Local SSD data will not be lost when Google does datacenter maintenance, even without replication or redundancy. We will use our live migration technology to move your VMs along with their local SSD to a new machine in advance of any planned maintenance, so your applications are not disrupted and your data is not lost.
But I'm concerned about unplanned downtime. Disk failure is an ever-present risk, but if you combine Local SSD with replication, you can protect against that. However, I'm trying to guard against correlated failure, where e.g. the whole region goes dark. Then the in-memory replicated data is lost, but does the data fsynced to the local SSD likely survive when the instances come back up? If you lose it, then fsyncing data to local SSD really doesn't buy you any more safety than RAM - e.g. for running a local database instance or something.
As an aside, please note that Google data center power supplies are redundant and have backup power generators in case of correlated power supply failures:
Powering our data centers
To keep things running 24/7 and ensure uninterrupted services, Google’s data centers feature redundant power systems and environmental controls. Every critical component has a primary and alternate power source, each with equal power. Diesel engine backup generators can provide enough emergency electrical power to run each data center at full capacity. Cooling systems maintain a constant operating temperature for servers and other hardware, reducing the risk of service outages. Fire detection and suppression equipment helps prevent damage to hardware. Heat, fire, and smoke detectors trigger audible and visible alarms in the affected zone, at security operations consoles, and at remote monitoring desks.
Back to your questions. You asked:
Then the in-memory replicated data is lost, but does the data fsynced to the local SSD likely survive when the instances come back up?
Per the local SSD documentation (emphasis in the original):
[...] local SSD storage is not automatically replicated and all data can be lost in the event of an instance reset, host error, or user configuration error that makes the disk unreachable. Users must take extra precautions to back up their data.
If all of the above protections fail, a power outage would be equivalent to an instance reset, which may render local SSD volumes inaccessible; a VM is very likely to restart on a different physical machine, and if it does, the data would be essentially "lost" as it would be inaccessible and wiped.
Thus, you should consider local SSD data as transient as you consider RAM to be.
You also asked:
However, I'm trying to guard against correlated failure, where e.g. the whole region goes dark.
If you want to protect against a zone outage, replicate across multiple zones in a region. If you want to protect against an entire region outage, replicate to other regions. If you want to protect against correlated region failures, replicate to even more regions.
You can also store snapshots of your data in Google Cloud Storage which provides a high level of durability:
Google Cloud Storage is designed for 99.999999999% durability; multiple copies, multiple locations with checksums and cross region striping of data.
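For example, a minimal sketch of copying a snapshot of local SSD data into a GCS bucket with the google-cloud-storage client (bucket name and paths are placeholders):

    from google.cloud import storage

    BUCKET_NAME = "my-backup-bucket"                        # placeholder
    LOCAL_PATH = "/mnt/disks/local-ssd/db-snapshot.tar.gz"  # placeholder archive

    client = storage.Client()
    bucket = client.bucket(BUCKET_NAME)
    blob = bucket.blob("snapshots/db-snapshot.tar.gz")
    blob.upload_from_filename(LOCAL_PATH)
    print(f"Uploaded {LOCAL_PATH} to gs://{BUCKET_NAME}/{blob.name}")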
