Storing numbers above 2e19 - influxdb

I am using InfluxDB and need to store very large numbers (uint256) with full precision (no floating point).
I am currently using strings to achieve this, but I then lose the ability to perform arithmetic operations on these numbers, which is something I need.
For example:
{ _measurement=transfer, _field=amount, _value=90000000000000000000001 }
{ _measurement=transfer, _field=amount, _value=12000000000000000000000 }
from(bucket: "xxx")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "transfer" and r._field == "amount")
|> toUint()
|> sum()
I get the following error:
runtime error #4:6-4:13: toInt: failed to evaluate map function: cannot convert string "90000000000000000000001" to int due to invalid syntax
Is there a built-in solution like ClickHouse's uint256, or a third-party package to achieve this?

Unfortunately, InfluxDB doesn't support big numbers yet. It stores all integers as signed 64-bit values (int64), whose minimum and maximum valid values are -9223372036854775808 and 9223372036854775807. See the InfluxDB data types documentation for more info.
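Since Flux cannot convert these strings to a native numeric type, one workaround is to keep the strings in InfluxDB and do the arithmetic client-side with a big-integer type. A minimal sketch using the influxdb-client Python package (URL, token, org, and bucket are placeholders), relying on Python's arbitrary-precision integers:

from influxdb_client import InfluxDBClient

# Placeholders: adjust to your instance.
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")

# Fetch the raw string values; no toUint()/sum() on the server side.
query = '''
from(bucket: "xxx")
  |> range(start: 0)
  |> filter(fn: (r) => r._measurement == "transfer" and r._field == "amount")
'''

# Python ints have arbitrary precision, so uint256 values are summed exactly.
total = 0
for table in client.query_api().query(query):
    for record in table.records:
        total += int(record.get_value())  # _value is the stored string

print(total)

This trades server-side aggregation for exactness, so for large series it is worth filtering down the result set as much as possible in the Flux query itself.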

Related

Combining two different sources with different timestamps in influxdb/flux

I have 2 measurements as follows:
metrics,app=app1 cpu=10 1654150510
metrics,app=app1 cpu=12 1654150512
metrics,app=app1 cpu=13 1654150514
metrics,app=app1 cpu=14 1654150516
metrics,app=app1 cpu=15 1654150519
The frequency of the "metrics" measurement is about one point every 2-3 seconds.
And the second one is:
http_requests,app=app1 response_time=12 1654150509
http_requests,app=app1 response_time=11 1654150510
http_requests,app=app1 response_time=15 1654150511
http_requests,app=app1 response_time=14 1654150512
http_requests,app=app1 response_time=13 1654150513
http_requests,app=app1 response_time=10 1654150514
http_requests,app=app1 response_time=12 1654150515
http_requests,app=app1 response_time=11 1654150516
http_requests,app=app1 response_time=13 1654150517
http_requests,app=app1 response_time=12 1654150518
The frequency for http_requests is about one point per second.
I want to combine the 2 metrics into a single table.
_time,value_cpu,value_response_time
1654150509,10,12
1654150510,10,11
1654150511,12,15
As the timestamps may be different, is there a way to combine them in Flux? Is fill() the way? I'm not sure if timeShift() will help here, although I didn't understand it completely. I assume some sort of downsampling is needed (not sure how to do that in Flux either). Is there a way to match the measurements based on the closest time difference?
That is, if the response measurements came in at these time instants:
1654150510,app=app1 response_time=10
1654150513,app=app1 response_time=12
1654150514,app=app1 response_time=11
1654150516,app=app1 response_time=13
and CPU came in at
1654150512,app=app1 cpu=20
1654150515,app=app1 cpu=30
Then the resulting table is
_time,response_time,cpu
1654150510,10,
1654150513,12,20
1654150514,11,
1654150516,13,30
Each CPU value is joined to the point with the closest timestamp (+/- some difference).
How can this be achieved with Flux in InfluxDB?
I guess downsampling with aggregateWindow() and fill() could work; a sketch of that approach follows.
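Something along these lines (an untested sketch; bucket name and range copied from the pivot variant below) would align both series on a common one-second grid:

from(bucket: "stackx")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
    |> drop(columns: ["_measurement"])
    |> aggregateWindow(every: 1s, fn: last, createEmpty: true)
    |> fill(usePrevious: true)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

Note that createEmpty: true produces a row for every second in the range, even where neither measurement has data.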
An alternative way is to pivot and then fill missing values using the previous value. The advantage, at least from a performance point of view, is that when neither measurement has a record at a given time, no new row filled with previous values is created.
With
from(bucket: "stackx")
|> range(start: -1d)
|> filter(fn: (r) => r._measurement == "metrics" or r._measurement == "http_requests")
|> drop(columns: ["_measurement"]) // or remove from group other way
|> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
|> sort(columns: ["_time"], desc: false)
|> fill(column: "cpu", usePrevious: true)
|> fill(column: "response_time", usePrevious: true)
the result would be
_time,app,cpu,response_time
2022-06-02T06:15:09Z,app1,,12
2022-06-02T06:15:10Z,app1,10,11
2022-06-02T06:15:11Z,app1,10,15
2022-06-02T06:15:12Z,app1,12,14
2022-06-02T06:15:13Z,app1,12,13
2022-06-02T06:15:14Z,app1,13,10
2022-06-02T06:15:15Z,app1,13,12
2022-06-02T06:15:16Z,app1,14,11
2022-06-02T06:15:17Z,app1,14,13
2022-06-02T06:15:18Z,app1,14,12
2022-06-02T06:15:19Z,app1,15,12

How can I query an aggregation of values grouped by hour of day from InfluxDB?

Question
I have an InfluxDB v1.8 running on a Raspberry Pi 2.
I have a measurement "gridElectricityMeter" that has (among other fields not relevant here) a field "import", which contains the total reading of the energy meter in watt-hours. The measurement receives a new value every 10 seconds.
I would like to create a Bar chart that shows the imported amount of energy by hour of day in some time frame to be specified (I am using Grafana to create the chart, so the time frame is set there).
E.g. if the raw data in InfluxDB looks like this:
time                   import
2021-01-01T00:00:00Z   0
2021-01-01T01:00:00Z   2
2021-01-01T01:20:00Z   8
2021-01-01T02:00:00Z   10
2021-01-02T00:00:00Z   10
2021-01-02T01:00:00Z   20
(In reality there are of course far more values, one per 10 seconds. A slight loss of precision because the timestamps are likely a few seconds off the full hour is acceptable.)
Then I want the following result:
hour   sum   explanation
0      12    2 (first day) + 10 (second day)
1      8     8 (first day)
2      0     0
3      0     0
...    ...
Hours with no data should be filled with zero (e.g. when querying "today", there is of course no data for the time after "now").
What I have tried so far
As far as I understand, I won't be able to do this with InfluxQL, so here is what I tried using Flux. (Also, I believe InfluxDB >= 2.0 is 64-bit only and thus won't run on a Raspberry Pi 2.)
Here is what I came up with:
import "date"
import "generate"
// generate a table with 24 entries (hours 0-23) and "sum=0":
initial = generate.from(
count: 24,
fn: (n) => n,
// start and stop are actually irrelevant, but they are required args
start: 2021-01-01T00:00:00Z,
stop: 2021-01-02T00:00:00Z,
)
|> map(fn: (r) => ({hour: r._value, sum: 0}))
// First group data by day and hour
data = from(bucket: "myDatabase")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
|> map(fn: (r) => ({r with hour: date.hour(t: r._time), group: date.truncate(t: r._time, unit: 1h)}))
|> group(columns: ["group"])
// In each group we get the first&last point so that we get the first and last reading from the energy meter. The difference between those values is the energy imported.
first = data
|> first()
last = data
|> last()
// Calculate energy used in each group and then regroup by hour (and not day):
byhour = join(tables: {first: first, last: last}, on: ["group"])
|> map(fn: (r) => ({r with hour: r.hour_first, _value: r._value_last - r._value_first}))
|> group(columns: ["hour"])
// manual summing because of https://github.com/influxdata/flux/issues/2505
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r._value}), identity: {sum: 0})
// Fill hours we have no data for with zero:
union(tables: [byhour, initial])
|> group(columns: ["hour"])
|> reduce(fn: (r, accumulator) => ({sum: accumulator.sum + r.sum}), identity: {sum: 0})
|> group()
|> sort(columns: ["hour"])
It does seem to work and do what I want. However, it seems ridiculously complicated, and it is also slow: querying data for a single day takes about 7 seconds. A single day contains 6 × 60 × 24 = 8640 data points in that measurement, which really does not sound like much to me, and it should not be that slow.
Is there a better way of doing it?
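For what it's worth, a possibly simpler variant (a sketch, assuming the meter reading only ever increases): aggregateWindow() with spread computes max - min per window, which for a monotonically increasing counter equals last - first, so the first/last join can be dropped entirely:

import "date"

from(bucket: "myDatabase")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r._measurement == "gridElectricityMeter" and r._field == "import")
    |> aggregateWindow(every: 1h, fn: spread, timeSrc: "_start", createEmpty: false)
    |> map(fn: (r) => ({r with hour: date.hour(t: r._time)}))
    |> group(columns: ["hour"])
    |> sum()
    |> group()
    |> sort(columns: ["hour"])

The union with the zero-filled initial table would still be needed to make hours without data show up as zero.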

Which way to compare maps in Elixir

Given two large and different maps defined as follows
Interactive Elixir (1.9.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> m1 = Map.new(1..1_000_000 |> Enum.map(&{&1, &1})); :ok
:ok
iex(2)> m2 = Map.new(2..1_000_000 |> Enum.map(&{&1, &1})); :ok
:ok
there is a significant time difference when comparing them using ==/2 and Map.equal?/2:
iex(3)> :timer.tc(fn -> m1 == m2 end)
{21, false}
iex(4)> :timer.tc(fn -> Map.equal?(m1, m2) end)
{20487, false}
What is the reason for this time difference between ==/2 and Map.equal?/2, and which one should be used?
Equivalently, which should be used between ==/2 and ===/2? (Map.equal?/2 delegates to ===/2, see here.)
Thanks
Indeed, Map.equal?/2 simply delegates to Kernel.===/2.
Kernel.===/2 delegates to :erlang."=:="/2 and Kernel.==/2 delegates to :erlang."=="/2. The latter compares numbers across types (so 1 and 1.0 are equal), while the former also compares the types (strict equality).
Consider the following example.
%{1 => 1} == %{1 => 1.0}
#⇒ true
%{1 => 1} === %{1 => 1.0}
#⇒ false
That said, Kernel.===/2 should compare all the values. On the other hand, the Erlang/OTP reference explicitly states that
Maps are ordered by size, two maps with the same size are compared by keys in ascending term order and then by values in key order. In maps, key order integers types are considered less than floats types.
If that were indeed true, =:= on two maps of different sizes, as in your example, should have returned in nearly the same (short) time as ==.
Summing up, I would consider this time difference a bug in the :erlang."=:="/2 implementation, worth reporting to the Erlang team.
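One way to probe that claim (a sketch; absolute timings will vary) is to time ===/2 on two maps of the same size, where a key-by-key comparison is unavoidable, and compare it with the different-size case above:

# Same size as m1 but different values, so the comparison
# cannot bail out early on size alone:
m3 = Map.new(1..1_000_000 |> Enum.map(&{&1, &1 + 1})); :ok
:timer.tc(fn -> m1 === m3 end)

If maps were really ordered by size first, the different-size comparison m1 === m2 should be much closer to the fast ==/2 timing than to this worst case.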

Modifying the series data using flux language in influx

I have a function which takes the first value from a measurement every 5 minutes.
ever5Mins1st = from(bucket: "Historian/oneday")
    |> range(start: dashboardTime)
    |> filter(fn: (r) =>
        r._measurement == "InventoryStock" and
        r._field == "Value" and
        r.Location == "NYC"
    )
    |> window(every: 5m)
    |> first()
    |> window(every: inf)
Now if I want the difference of two consecutive points, I can use the difference() function:
ever5Mins1st
    |> difference()
But what if I want a sum of consecutive points, e.g. of every 5 points?
I know I can write custom functions which receive piped data, but do I need to write a for loop? Are there any examples of for loops and conditional statements?
// Function definition
add = (tables=<-) =>
    tables
        |> map(fn: (r) => ...) // What should I do here? A for loop?
I don't believe this is possible at the moment. It would require a chunk function, to split a list into chunks of lists.
I'll open an issue on Flux and start implementing this functionality :+1:
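As a workaround, under the assumption that ever5Mins1st yields exactly one point per 5-minute slot, "every 5 points" can be approximated with a wider time window (a sketch, not a general chunk-by-count):

ever5Mins1st
    |> window(every: 25m) // five 5-minute points per window
    |> sum()
    |> window(every: inf)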

Option type benchmark using F#

I need to use Some/None options in heavy numerical simulations. The following micro-benchmark gives me Fast = 485 and Slow = 5890.
I do not like nulls, and even if I liked them I could not use null here, because "The type 'float' does not have 'null' as a proper value".
Ideally there would be a compiler option that compiles Some/None into value/null so there is no runtime penalty. Is that possible? Or how else can I make Some/None efficient?
let s = System.Diagnostics.Stopwatch()
s.Start()
for h in 0 .. 1000 do
    Array.init 100000 (fun i -> (float i + 1.)) |> ignore
printfn "Fast = %d" s.ElapsedMilliseconds

s.Restart()
for h in 0 .. 1000 do
    Array.init 100000 (fun i -> Some (float i + 1.)) |> ignore
printfn "Slow = %d" s.ElapsedMilliseconds
None is actually already represented as null. But since option<_> is a reference type (which is necessary for null to be a valid value in the .NET type system), creating Some instances will necessarily require heap allocations. One alternative is the .NET System.Nullable<_> type, which is similar to option<_>, except that:
- it's a value type, so no heap allocation is needed;
- it only supports value types as elements, so you can create an option<string> but not a Nullable<string> (for your use case this seems like an unimportant factor);
- it has runtime support, so that boxing a nullable without a value results in a null reference, which would be impossible otherwise.
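For instance, the slow half of the benchmark could be adapted to Nullable like this (a sketch; absolute timings will of course differ per machine):

s.Restart()
for h in 0 .. 1000 do
    // Nullable<float> is a struct, so the only heap allocation is the array itself.
    Array.init 100000 (fun i -> System.Nullable(float i + 1.)) |> ignore
printfn "Nullable = %d" s.ElapsedMilliseconds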
Keep in mind that your benchmark does very little work, so the results are probably not typical of what you'd see with your real workload. Try to use a more meaningful benchmark based on your actual scenario if at all possible.
As a side note, you get more meaningful diagnostics (including garbage collection statistics) if you use the #time directive in F# Interactive rather than bothering with the Stopwatch.
