I just now started with influx, Need help to get the data sorted by timestamp and the latest data by it.
select DB,AREA,sptotal,spfree,pctfree from ORA_SIZE GROUP BY DB order by time ;
name: ORA_SIZE
tags: DB=DB43B
time DB AREA sptotal spfree pctfree
---- -- ---- ------- ------ -------
1587919100011225116 DB43B DATA 442 303 68
1587919100011225116 DB43B SYSTEM 40 35 87
1587919088732608975 DB43B DATA 442 303 68
1587919088732608975 DB43B SYSTEM 40 35 87
Here, I want to retrieve only
1587919088732608975 DB43B DATA 442 303 68
1587919088732608975 DB43B SYSTEM 40 35 87
as they are the latest data based on time,
is there any query for this? Please advise.
Thanks,
You can use the Last Function to get the latest value of any field.
Example :
SELECT LAST(DB,AREA,sptotal,spfree,pctfree) FROM ORA_SIZE [WHERE_clause] [GROUP_BY_clause]
This query should return your desired data.
You can find the documentation here
Related
I'm querying data from different shards and used EXPLAIN to check how many series are being fetched for that particular date range.
> SHOW SHARDS
.
.
658 mydb autogen 658 2019-07-22T00:00:00Z 2019-07-29T00:00:00Z 2020-07-27T00:00:00Z
676 mydb autogen 676 2019-07-29T00:00:00Z 2019-08-05T00:00:00Z 2020-08-03T00:00:00Z
.
.
Executing EXPLAIN for data from shard 658 and it's giving expected result in terms of number of series. SensorId is only tag key and as date range fall into only shard it's giving NUMBER OF SERIES: 1
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-27 00:00:00' AND time <= '2019-07-28 00:00:00' limit 10;
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 1
CACHED VALUES: 0
NUMBER OF FILES: 2
NUMBER OF BLOCKS: 4
SIZE OF BLOCKS: 32482
But when I run the same query on date range that falls into shard 676, number of series is 13140 instead of just one.
> EXPLAIN select "kWh" from Reading where (SensorId =~ /^1186$/) AND time >= '2019-07-29 00:00:00' AND time < '2019-07-30 00:00:00';
QUERY PLAN
----------
EXPRESSION: <nil>
AUXILIARY FIELDS: "kWh"::float
NUMBER OF SHARDS: 1
NUMBER OF SERIES: 13140
CACHED VALUES: 0
NUMBER OF FILES: 11426
NUMBER OF BLOCKS: 23561
SIZE OF BLOCKS: 108031642
Environment info:
System info: Linux 4.4.0-1087-aws x86_64
InfluxDB version: InfluxDB v1.7.6 (git: 1.7 01c8dd4)
Update - 1
On checking field cardinality, I observed a spike in RAM.
> SHOW FIELD KEY CARDINALITY
Update - 2
I've rebuilt the indexes, but the cardinality is still high.
Update - 3
I found out that shard has "SensorId" as tag as well as field that causing high cardinality when querying with the "SensorId" filter.
> SELECT COUNT("SensorId") from Reading GROUP BY "SensorId";
name: Reading
tags: SensorId=
time count
---- -----
1970-01-01T00:00:00Z 40
But when I'm checking tag values with key 'SensorId', it's not showing empty string that present in the above query.
> show tag values with key = "SensorId"
name: Reading
key value
--- -----
SensorId 10034
SensorId 10037
SensorId 10038
SensorId 10039
SensorId 10040
SensorId 10041
.
.
.
SensorId 9938
SensorId 9939
SensorId 9941
SensorId 9942
SensorId 9944
SensorId 9949
Update - 4
Inspected data using influx_inspect dumptsm and re-validated that null tag values are present
$ influx_inspect dumptsm -index -filter-key "" /var/lib/influxdb/data/mydb/autogen/235/000008442-000000013.tsm
Index:
Pos Min Time Max Time Ofs Size Key Field
1 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 5 103 Reading 1001
2 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 108 275 Reading 2001
3 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 383 248 Reading 2002
4 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 631 278 Reading 2003
5 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 909 278 Reading 2004
6 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1187 184 Reading 2005
7 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1371 103 Reading 2006
8 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1474 250 Reading 2007
9 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1724 103 Reading 2008
10 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 1827 275 Reading 2012
11 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2102 416 Reading 2101
12 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2518 103 Reading 2692
13 2019-08-01T01:46:31Z 2019-08-01T17:42:03Z 2621 101 Reading SensorId
14 2019-07-29T00:00:05Z 2019-07-29T05:31:07Z 2722 1569 Reading,SensorId=10034 2005
15 2019-07-29T05:31:26Z 2019-07-29T11:03:54Z 4291 1467 Reading,SensorId=10034 2005
16 2019-07-29T11:04:14Z 2019-07-29T17:10:16Z 5758 1785 Reading,SensorId=10034 2005
I am new in using Cognos and I have data for the overall project, and I need to create some kind of table or cross tab may be to count the overall project of each year and how many of them are active, canceled and inactive
I have tried using a cross tab but no success.
ProjectId Status Date
1589 Active 8/29/2018
1566 Inactive 4/17/2018
1042 Cancelled 1/6/2014
1374 Completed 1/20/2015
1543 Completed 8/4/2014
1065 Cancelled 7/15/2014
1397 Completed 10/1/2012
1520 Inactive 4/13/2017
1420 Completed 1/1/2015
1443 Completed 1/1/2015
1048 Cancelled 10/16/2014
1002 Active 2/6/2017
1357 Completed 1/19/2017
1606 Active 11/6/2018
Output should look like this
New Projects Active Cancelled/Terminated/Inactive Carried Forward
2013 32 45 4 11 30
2014 45 75 17 14 44
2015 46 90 25 21 44
2016 30 74 27 10 37
2017 82 119 11 26 82
2018 86 168 29 24 115
2019 23 138 9 4 125
Going with -- project Id, status, Date
The ideal scenario is we have a data item for year. If not, to get the year
extract(year, Date)
Calculation data items: for each count
For example, this is for active
if (status = 'Active')Then(1)Else(0)
For properties
Make sure the aggregation is set to total
Adding the column should give you the count
I have a simple weather station DB with example content:
time humi1 humi2 light pressure station-id temp1 temp2
---- ----- ----- ----- -------- ---------- ----- -----
1530635257289147315 66 66 1834 1006 bee1 18.6 18.6
1530635317385229860 66 66 1832 1006 bee1 18.6 18.6
1530635377466534866 66 66 1829 1006 bee1 18.6 18.6
Station writes data every minute. I want to get SELECT not with all series, but just series written every hour (or every 60th series, simply said). How can I achieve it?
I tried to experiment with ...WHERE time % 60 = 0, but it didn`t work. It seems, that time column doesnt permit any math operations (/, %, etc).
Group by along with a one of the functions can do what you want:
SELECT FIRST("humi1"), FIRST("humi2"), ... GROUP BY time(1h)
I would imagine for most climate data you'd want the MEAN or MEDIAN rather than a single data point every hour
basic example, and more complex example
I'm newbie in machine learning, so I need your advice.
Imagine, we have two data sets (df1 and df2).
First data set include about 5000 observations and some features, to simplify:
name age company degree_of_skill average_working_time alma_mater
1 John 39 A 89 38 Harvard
2 Steve 35 B 56 46 UCB
3 Ivan 27 C 88 42 MIT
4 Jack 26 A 87 37 MIT
5 Oliver 23 B 76 36 MIT
6 Daniel 45 C 79 39 Harvard
7 James 34 A 60 40 MIT
8 Thomas 28 B 89 39 Stanford
9 Charlie 29 C 83 43 Oxford
The learning problem - to predict productivity of companies from second data set (df2) for next period of time (june-2016), based on data from the first data set (df1).
df2:
company productivity date
1 A 1240 april-2016
2 B 1389 april-2016
3 C 1388 april-2016
4 A 1350 may-2016
5 B 1647 may-2016
6 C 1272 may-2016
So as we can see both data sets include feature "company". But I don't understand how I can create a link between these two features. What shoud I do with two data sets to solve the learning problem? Is it possible?
Here is a sample of my CSV
10820 0 0 0 0
10900 2 4 4 4
11000 21 50 54 58
11100 23 54 59 63
11200 25 59 63 68
11300 27 63 68 73
11400 29 68 73 78
11500 31 72 78 83
11600 32 76 82 88
11700 34 81 87 93
I'm looking to create to use xcode to retreive one line of code from this very lengthy CSV based on the first line.
For example:
if the user enters "10900", the second line columns will be returned.
If the user returns 11650, the 11600 line columns will be returned...always taking the lower line when the input value is less then the following line.
Any help would be appreciated. I've seen code to parse an entire CSV file, but I'm thinking this may be a big memory drain, right now my CSV has 2000 lines of values, which are all in ascending order based on the first column.
You have to load a file into memory anyways to find correct value.
With such a big CSV file I would recommend to turn CSV file into binary file (plist file for example) and put it as binary into your application - instead of parsing it each time in RunTime. It has much better performance and it's easier to work with that since you are working directly with NSDictonaries an NSArrays.
If you don't want to do it for some reason, the next solution is to use something like CHCSVParser:
https://github.com/davedelong/CHCSVParser
It provides optimization for loading only part of file at a time - which is the optimization you might be looking for.