All I want is simply to know how much space my InfluxDB database takes on my HDD. The stats() command gives me dozens of numbers but I don't know which one shows what I want.
The stats output does not contain that information. The size of the database's directory structure on disk will give you that:
du -sh /var/lib/influxdb/data/<db name>
Where /var/lib/influxdb/data is the data directory defined in influxdb.conf.
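If you are not sure which directories your instance actually uses, they are listed in the config file; a quick check (assuming the default package location of the config):
grep -E '^[[:space:]]*(dir|wal-dir)[[:space:]]*=' /etc/influxdb/influxdb.conf
Add the reported wal directory to the du command above if you also want to count the write-ahead log.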
In InfluxDB v1.x, you can use the following command to find out the disk usage of each database, measurement, and even individual shards:
influx_inspect report-disk -detailed /var/lib/influxdb/data/
In InfluxDB v2.x, you can take advantage of the internal stats as follows:
from(bucket: "yourBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "storage_shard_disk_size")
|> filter(fn: (r) => r["_field"] == "gauge")
|> last()
It will show you the size of each database.
For InfluxDB 2.0 on macOS (for me at least), the location is ~/.influxdbv2/engine.
Running "du -sh *" in that directory will show you the disk usage.
Also check .influxdb in your user's home directory.
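For example, to check both locations in one go (a minimal sketch; paths as above):
du -sh ~/.influxdbv2/engine ~/.influxdb 2>/dev/null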
If you already run influxd, you can check which file descriptors it keeps open:
$ pgrep -a influxd
<influxd PID> <full command path>
$ ls -l /proc/<influxd PID>/fd
For example, I have an influxd from a prebuilt package influxdb-1.8.6_linux_amd64.tar.gz. It is simply unpacked in /home/me/bin/ and runs as a user command. There is neither /var/lib/influxdb/ nor /etc/influxdb/influxdb.conf. There is ~/bin/influxdb-1.8.6-1/etc/influxdb/influxdb.conf, but it is not actually used. However, the list of file descriptors in /proc/<PID>/fd shows that it keeps several files open under:
/home/me/.influxdb/data
/home/me/.influxdb/data/<my_db_name>/_series
/home/me/.influxdb/wal/<my_db_name>/
But do not take this for granted; I am not an InfluxDB expert. Note that 1.8 is an old version, and other versions may behave differently.
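Putting the above together, a minimal sketch (assuming a single influxd process and the per-user layout described above):
pid=$(pgrep -x influxd)                                    # PID of the running influxd
readlink /proc/"$pid"/fd/* | grep '\.influxdb' | sort -u   # open files under ~/.influxdb
du -sh ~/.influxdb/data ~/.influxdb/wal                    # on-disk size of data and WAL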
Related
I have a SQLite file that contains a table with 9 million or so rows and 30+ columns. Up until a few days ago, the following code worked fine:
path <- file.path(basepath, "Rmark_web", "SQL_Data", "2020Q3_Enterprise_Exposure_Wind.sqlite")
cn <- DBI::dbConnect(RSQLite::SQLite(), path)
df <- DBI::dbGetQuery(cn, "select Longitude, Latitude, City, State, Total_Value, GridID, new_LOB from [2020Q3_Enterprise_Exposure_Wind] where State in ('GA')")
DBI::dbDisconnect(cn)
When I run the code that contains this query on my local machine, it takes some time but it does finish. I am currently trying to run it in a Docker container started with the following command:
docker run --memory=10g --rm -d -p 8787:8787 -v /efs:/home/rstudio/efs -e ROOT=TRUE -e DISABLE_AUTH=true myrstudio
Is there a way to debug the RSQLite package? Is there another way to perform this query without using this package? The rest of the code runs fine, but it gets held up on this specific step and usually does not finish (especially if it is the 2nd or 3rd time that this piece of code runs in the docker image).
The number of states to include in the query changes from run to run.
If you have this issue, be sure to remove any columns you are not using from the SQLite file. In the end, I loaded the data into a hosted Postgres database and that seems to have fixed the issue I was experiencing. Here is the new query for anyone's benefit.
library(RPostgres)
library(RPostgreSQL)
library(DBI)
db <- 'airflow_db' #provide the name of your db
host_db <- [omitted for privacy]
db_port <- [omitted for privacy] # or any other port specified by the DBA
db_user <- [omitted for privacy]
db_password <- Sys.getenv("DB_PASS")
con <- dbConnect(RPostgres::Postgres(), dbname = db, host=host_db, port=db_port, user=db_user, password=db_password)
# in_states_clause is the quoted, comma-separated list of state codes built per run, e.g. "'GA','FL'"
query <- paste('select * from weather_report.ent_expo_data where "State" in (', in_states_clause,')', sep='')
print(query)
df <- dbGetQuery(con, query)
InfluxDB version: 1.6.3
I've created a backup of a database called 'test.mydb' using the legacy backup format:
influxd backup -database <mydatabase> <path-to-backup>
The backup went fine but when I tried to restore:
sudo influxd restore -db "test.mydb" -newdb "test.mydb" -datadir /var/lib/influxdb/data /home/ubuntu/influxdb/test.mydb/
I got the error: backup tarfile name incorrect format.
After searching I think it is because of this code in influxdb/cmd/influxd/restore/restore.go:
// should get us ["db","rp", "00001", "00"]
pathParts := strings.Split(filepath.Base(tarFile), ".")
if len(pathParts) != 4 {
return fmt.Errorf("backup tarfile name incorrect format")
}
It checks how many dot-separated parts the backup file name has. The count needs to be 4, but because my database name itself contains a dot, the file names split into 5 parts.
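A quick illustration (the retention policy name here is assumed to be the default autogen):
echo "test.mydb.autogen.00001.00" | awk -F. '{print NF}'   # prints 5; the restore code expects 4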
Are there any workarounds?
I did not find an optimal solution to this problem so I manually copied and pasted the data to InfluxDB.
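If you want to script that manual copy rather than paste by hand, one possible approach (a sketch only, not what I did; paths are examples) is to export the database to line protocol with influx_inspect and re-import it with the influx CLI:
influx_inspect export -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal -database "test.mydb" -out /tmp/test_mydb.lp
influx -import -path=/tmp/test_mydb.lp -precision=ns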
How can we generate a Fortify report using the command line on Linux?
In the command, how can we include only some folders or files for analysis, and how can we specify the location to store the report, etc.?
Thanks,
Karthik
1. Step#1 (clean cache)
You need to plan the scan structure before starting:
scanid = 9999 (can be anything you like)
ProjectRoot = /local/proj/9999/
WorkingDirectory = /local/proj/9999/working
(this dir gets huge; you need to "rm -rf ./working && mkdir ./working" before every scan, or byte code piles up underneath it and quickly consumes your hard disk)
log = /local/proj/9999/working/sca.log
source='/local/proj/9999/source/src/**.*'
classpath='/local/proj/9999/source/WEB-INF/lib/*.jar; /local/proj/9999/source/jars/**.*; /local/proj/9999/source/classes/**.*'
./sourceanalyzer -b 9999 -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/working/sca.log -clean
It is important to specify ProjectRoot; if you do not override the system default, everything is put under /home/<user>/.fortify.
The sca.log location is very important: if Fortify does not find this file, it cannot find the byte code to scan.
You can alter the ProjectRoot and WorkingDirectory once and for all if you are the only user, in FORTIFY_HOME/Core/config/fortify_sca.properties.
In that case, your command line would simply be ./sourceanalyzer -b 9999 -clean
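For reference, the relevant entries in fortify_sca.properties would look something like this (property names taken from the -D flags used in these commands; check your version's defaults):
com.fortify.sca.ProjectRoot=/local/proj/9999/
com.fortify.WorkingDirectory=/local/proj/9999/working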
2. Step#2 (translate source code to byte code)
nohup ./sourceanalyzer -b 9999 -verbose -64 -Xmx8000M -Xss24M -XX:MaxPermSize=128M -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+UseParallelGC -Dcom.fortify.sca.ProjectRoot=/local/proj/9999/ -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/sca.log -source 1.5 -classpath '/local/proj/9999/source/WEB-INF/lib/*.jar:/local/proj/9999/source/jars/**/*.jar:/local/proj/9999/source/classes/**/*.class' -extdirs '/local/proj/9999/source/wars/*.war' '/local/proj/9999/source/src/**/*' &
Always run it as a Unix background job (&) so that it keeps working even if your session to the server times out.
-cp (classpath): put all your known classpath entries here for Fortify to resolve the function calls. If a function is not found, Fortify skips the source code translation for it, so that part will not be scanned later. You will get a poor-quality scan but the FPR looks good (few issues reported). It is important to have all dependency jars in place.
-extdirs: put all directories/files you don't want to be scanned here.
The last section, the files between the quotes, is your source.
-64 uses 64-bit Java; if not specified, 32-bit is used and the max heap should be < 1.3 GB (-Xmx1200M is safe).
The -XX: options have the same meaning as when launching an application server; use them only to control the class heap and garbage collection, i.e. to tweak performance.
-source is the Java version (1.5 to 1.8).
3. Step#3 (scan with rulepack, custom rules, filters, etc)
nohup ./sourceanalyzer -b 9999 -64 -Xmx8000M -Dcom.fortify.sca.ProjectRoot=/local/proj/9999 -Dcom.fortify.WorkingDirectory=/local/proj/9999/working -logfile /local/proj/9999/working/sca.log -scan -filter '/local/other/filter.txt' -rules '/local/other/custom/*.xml' -f '/local/proj/9999.fpr' &
-filter: the file name must be filter.txt; any RuleGUID listed in this file will not be reported.
-rules: these are the custom rules you wrote. The HP rulepacks are in the FORTIFY_HOME/Core/config/rules directory.
-scan: keyword to tell the Fortify engine to scan the existing scan ID. You can skip step#2 and only do step#3 if you did not change the code and just want to play with different filters/custom rules.
4. Step#4 Generate PDF from the FPR file (if required)
./ReportGenerator -format pdf -f '/local/proj/9999.pdf' -source '/local/proj/9999.fpr'
I know the following command gives the size of a particular folder, but it takes a long time when I run it on HPC. Is there any faster command to get the size of a folder?
du -sh
It's only slow because it's recursive. The deeper and larger the subfolder structure, the longer it will take to traverse.
You tagged both iOS and high performance computing -- I am assuming iOS was a mistake. Depending on your filesystem you may have other options.
For example, on OS X you can use the index created by spotlight in a command-line fashion:
Usage: mdfind [-live] [-count] [-onlyin directory] [-name fileName | -s smartFolderName | query]
list the files matching the query
query can be an expression or a sequence of words
-attr <attr> Fetches the value of the specified attribute
-count Query only reports matching items count
-onlyin <dir> Search only within given directory
-live Query should stay active
-name <name> Search on file name only
-reprint Reprint results on live update
-s <name> Show contents of smart folder <name>
-0 Use NUL (``\0'') as a path separator, for use with xargs -0.
example: mdfind image
example: mdfind -onlyin ~ image
example: mdfind -name stdlib.h
example: mdfind "kMDItemAuthor == '*MyFavoriteAuthor*'"
example: mdfind -live MyFavoriteAuthor
The details of the MDItem attributes used by mdfind are in Apple's developer documentation.
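For the folder-size question specifically, one option (a rough sketch; only files Spotlight has indexed are counted) is to let mdfind list the files and sum their sizes yourself:
mdfind -0 -onlyin /path/to/folder "kMDItemFSSize > 0" \
  | xargs -0 mdls -name kMDItemFSSize \
  | awk '{ total += $3 } END { printf "%.1f MiB\n", total/1048576 }'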
For the most part though, this is something you could calculate yourself, then check periodically. Store the result in a cache like memcached or ehcache. Depending on which journaled file system you use, you could get notifications when a folder is changed and recalculate that folder only.
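On Linux, a minimal sketch of that idea using inotify (assumes inotify-tools is installed; the folder and cache paths are placeholders):
DIR=/path/to/folder              # folder to track (placeholder)
CACHE=/tmp/folder_size.cache     # cached result (placeholder)
du -sh "$DIR" > "$CACHE"         # initial calculation
while inotifywait -r -e create,delete,modify,move "$DIR" >/dev/null 2>&1; do
    du -sh "$DIR" > "$CACHE"     # recalculate only when something changed
done
Note that inotify may not see changes on network or parallel filesystems, which are common on HPC, so periodic recalculation is the safer fallback there.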
I've worked extensively with ROOT, which has its own format for data files, but for various reasons we would like to switch to HDF5 files. Unfortunately we'd still require some way of translating files between formats. Does anyone know of any existing libraries which do this?
You might check out rootpy, which has a facility for converting ROOT files into HDF5 via PyTables: http://www.rootpy.org/commands/root2hdf5.html
If this issue is still of interest to you, recently there have been large improvements to rootpy's root2hdf5 script and the root_numpy package (which root2hdf5 uses to convert TTrees into NumPy structured arrays):
root2hdf5 -h
usage: root2hdf5 [-h] [-n ENTRIES] [-f] [--ext EXT] [-c {0,1,2,3,4,5,6,7,8,9}]
[-l {zlib,lzo,bzip2,blosc}] [--script SCRIPT] [-q]
files [files ...]
positional arguments:
files
optional arguments:
-h, --help show this help message and exit
-n ENTRIES, --entries ENTRIES
number of entries to read at once (default: 100000.0)
-f, --force overwrite existing output files (default: False)
--ext EXT output file extension (default: h5)
-c {0,1,2,3,4,5,6,7,8,9}, --complevel {0,1,2,3,4,5,6,7,8,9}
compression level (default: 5)
-l {zlib,lzo,bzip2,blosc}, --complib {zlib,lzo,bzip2,blosc}
compression algorithm (default: zlib)
--script SCRIPT Python script containing a function with the same name
that will be called on each tree and must return a tree or
list of trees that will be converted instead of the
original tree (default: None)
-q, --quiet suppress all warnings (default: False)
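A typical invocation might look like this (the file name is just an example; the options are from the help above):
root2hdf5 -f --complib blosc -c 5 data.root   # should produce data.h5 alongside data.root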
When I last checked (a few months ago), root2hdf5 had a limitation that it could not handle TBranches which were arrays. For this reason I wrote a bash script: root2hdf (sorry for the uncreative name).
It takes a ROOT file and the path to the TTree in the file as input arguments, then generates source code and compiles it into an executable which can be run on ROOT files, converting them into HDF5 datasets.
It also has a limitation that it cannot handle compound TBranch types, but I don't know that root2hdf5 does either.