Segment Tree - Finding all subarray sums

Suppose we have an array like [1, 2, 3, 4]. If I created a segment tree for that array, we'd get something like [null, 10, 3, 7, 1, 2, 3, 4], so all of the subarray sums would be exactly what we have in the segment tree.
However, if our input array is [1, 2, 3], our segment tree would be something like [null, 6, 3, 3, 1, 2, 3, 0], with a trailing 0 as padding, since 3 (the array's length) is not a power of 2 and we therefore don't get a complete binary tree.
Unlike in the first example, because the tree had to be padded, we run into duplicate ranges. In our tree [null, 6, 3, 3, 1, 2, 3, 0], the second-to-last 3 and the last 3 represent the same range, since the right child of that internal node is the padding 0.
Is there any way to distinguish between these duplicate ranges? Or should I be using another data structure for a problem that is susceptible to the kind of duplicate-range issue I'm running into in my second example?
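To make the construction concrete, here is a rough sketch of the padded, array-based build described above (a Python sketch; the build_segment_tree name and the ranges bookkeeping are just illustrative, but tracking the range each node covers is one way to tell apart nodes that happen to hold the same sum):
def build_segment_tree(arr):
    """Iterative segment-tree build; the input is padded with zeros
    so the leaf level has a power-of-two length."""
    n = 1
    while n < len(arr):
        n *= 2
    padded = arr + [0] * (n - len(arr))

    tree = [0] * (2 * n)       # index 0 is unused ("null"), root at index 1
    ranges = [None] * (2 * n)  # which slice of `padded` each node covers

    for i, value in enumerate(padded):   # leaves live at indices n .. 2n-1
        tree[n + i] = value
        ranges[n + i] = (i, i)
    for i in range(n - 1, 0, -1):        # internal nodes, bottom-up
        tree[i] = tree[2 * i] + tree[2 * i + 1]
        ranges[i] = (ranges[2 * i][0], ranges[2 * i + 1][1])
    return tree, ranges

tree, ranges = build_segment_tree([1, 2, 3])
print(tree)    # [0, 6, 3, 3, 1, 2, 3, 0]
print(ranges)  # node 3 covers (2, 3), node 6 covers (2, 2): the two 3s differ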

Related

What is meant by time-based splitting in cross-validation techniques?

I have a timestamp for every record in the data set.
I have heard about time-based splitting but don't know anything about it.
Normal cross-validation
You have a set of data points:
data_points = [2, 4, 5, 8, 6, 9]
Then, if you do a 2-fold split, your data points will get randomly assigned to 2 different groups.
For example:
split_1 = [2, 5, 9]
split_2 = [4, 8, 6]
However, this assumes that there is no need to keep the order of your data points.
You can train your model with split_1 and test it with split_2.
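A quick sketch of this plain random split using scikit-learn's KFold (the random_state is just an arbitrary choice to make the shuffle reproducible; the exact assignment of points to folds depends on it):
import numpy as np
from sklearn.model_selection import KFold

data_points = np.array([2, 4, 5, 8, 6, 9])

# shuffle=True assigns points to folds without regard for their order
kf = KFold(n_splits=2, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(data_points):
    print("train:", data_points[train_idx], "test:", data_points[test_idx])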
Time-based splitting
However, this assumption isn't always correct for time series prediction.
For example, given the same data points:
data_points = [2, 4, 5, 8, 6, 9]
It can be that they are arranged by time.
You could then have a model that looks back 3 time steps to predict the next number (e.g. to predict the number after 9, it takes [8, 6, 9] as input). That means the order in which the data points appear is important. Because of that, you cannot randomly split your data points to test your model; the order in which they appear needs to be kept.
So if you do a 2-fold split, you could get the following splits:
split_1 = [2, 4, 5, 8]
split_2 = [5, 8, 6, 9]
Implementation
There is an implementation of time-based cross-validation in scikit-learn: TimeSeriesSplit.
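A minimal usage sketch with the same data points (the folds TimeSeriesSplit produces differ slightly from the hand-made split above, but the idea is the same: every training window ends before its test window starts):
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

data_points = np.array([2, 4, 5, 8, 6, 9])

tscv = TimeSeriesSplit(n_splits=2)
for train_idx, test_idx in tscv.split(data_points):
    # training indices always precede the test indices, preserving order
    print("train:", data_points[train_idx], "test:", data_points[test_idx])

# train: [2 4]      test: [5 8]
# train: [2 4 5 8]  test: [6 9]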

Influxdb: Query for distinct values

First: I am aware of the distinct() function, but that's not what I want.
My problem: imagine a series of sensor readings that barely change, e.g.:
[2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 3, 3, 5, 5, 2]
In my application this series is very long (thousands of entries) and I would like to visualize it in a chart (on Android, but that doesn't matter).
What I'd like to achieve:
I would like to get only the values where the series changes, e.g.:
[2, 3, 4, 3, 5, 2]
of course with their respective timestamps and tags.
With the distinct() function the result would look like this:
[2, 3, 4, 5]
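In other words, what I'm after is collapsing consecutive duplicates rather than deduplicating globally; a rough client-side sketch of the intended result (Python, ignoring timestamps and tags, just to illustrate the output I want):
readings = [2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 3, 3, 5, 5, 2]

# keep only the points where the value differs from the previous one
changes = [v for i, v in enumerate(readings)
           if i == 0 or v != readings[i - 1]]

print(changes)  # [2, 3, 4, 3, 5, 2]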
Thanks!

Rails: Query to sort record by number except 0

I'm arranging the data based on a priority (ascending order), where '0' is ignored when prioritising.
Below is the Rails Query:
Profile.where(active: true).order(:priority).pluck(:priority)
This query returns an ordered list of priorities that starts with '0':
[0, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 7]
Could you help me figure out how to order the data so that the records with priority "0" are moved to the end, as in the example below?
Example: [1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 7, 0]
You can pass a string to #order to use raw SQL so you could say:
Profile.where(active: true)
       .order('case priority when 0 then 1 else -1 end, priority')
       .pluck(:priority)
to force the priority-zero entries to the end. You don't have to use 1 and -1, of course; you could use anything that is readable to you and sorts in the right order. You could even use strings (assuming they sort properly, of course):
.order("case priority when 0 then 'last' else 'first' end, priority")

Quantiles function in BigQuery Standard SQL

BigQuery with Legacy SQL has a pretty convenient QUANTILES function to quickly get a histogram of values in a table without specifying the buckets by hand.
I can't find a nice equivalent in aggregation functions available in Standard SQL. Did I miss something obvious, or otherwise, what's the standard way of emulating it?
You're looking for the APPROX_QUANTILES function :) One of the examples from the docs is:
#standardSQL
SELECT APPROX_QUANTILES(x, 2) AS approx_quantiles
FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x;
+------------------+
| approx_quantiles |
+------------------+
| [1, 5, 10]       |
+------------------+
Note that it returns an array, but if you want the elements of the array as individual rows, you can unnest the result:
#standardSQL
SELECT
  quant, offset
FROM UNNEST((
  SELECT APPROX_QUANTILES(x, 2) AS quants
  FROM UNNEST([NULL, NULL, 1, 1, 1, 4, 5, 6, 7, 8, 9, 10]) AS x
)) AS quant WITH OFFSET
ORDER BY offset;

How do I write an algorithm for this?

I have a number of fruit baskets; each basket has a random number of apples, and the baskets have different properties.
arrayOfBaskets = [
    ["basketId": 1, "typeOfPesticidesUsed": 1, "fromCountry": 1, "numberOfApples": 5],
    ["basketId": 2, "typeOfPesticidesUsed": 1, "fromCountry": 1, "numberOfApples": 6],
    ["basketId": 3, "typeOfPesticidesUsed": 2, "fromCountry": 1, "numberOfApples": 3],
    ["basketId": 4, "typeOfPesticidesUsed": 2, "fromCountry": 1, "numberOfApples": 7],
    ["basketId": 5, "typeOfPesticidesUsed": 1, "fromCountry": 2, "numberOfApples": 8],
    ["basketId": 6, "typeOfPesticidesUsed": 1, "fromCountry": 2, "numberOfApples": 4],
    ["basketId": 7, "typeOfPesticidesUsed": 2, "fromCountry": 2, "numberOfApples": 9],
    ["basketId": 8, "typeOfPesticidesUsed": 2, "fromCountry": 2, "numberOfApples": 5]
]
In this case, how do I formulate an algorithm that outputs an array like this:
uniquePairingOfBasketProperties = [
    ["typeOfPesticidesUsed": 1, "fromCountry": 1],
    ["typeOfPesticidesUsed": 2, "fromCountry": 1],
    ["typeOfPesticidesUsed": 1, "fromCountry": 2],
    ["typeOfPesticidesUsed": 2, "fromCountry": 2]
]
My main goal is for my UITableView to know how many rows it should have, which in this case is 4 instead of the total number of baskets.
Huh? You have an array of dictionaries. You want to divide those dictionaries into "buckets" where each bucket has a unique combination of pesticide type and country of origin?
Assuming that's the case, how about this:
let kNumberOfCountries = 2
// assuming each basket is a [String: Int] dictionary, as in the question
let uniqueValue = basket["typeOfPesticidesUsed"]! * kNumberOfCountries +
    basket["fromCountry"]!
uniqueValue will jump in large steps based on the type of pesticide, and then change by 1 based on the country of origin. (Think of a rectangular grid where the country number starts at 1 on the left and increases to the right, and the pesticide number starts at 1 at the top and increases as you go down. The unique value counts up by 1 as you move right along a row, then jumps to the next row and keeps counting up.)
You can then group your table view based on uniqueValue.
If you want to know how many unique pairings you have, create an empty set of integers, loop through your array of baskets, calculate the uniqueValue for each basket, and add it to the set (sets only keep one entry per value). Once you are done looping, the number of entries in the set is the number of unique pairings you have. If you use an NSCountedSet, you can even get the count of the number of baskets with each pairing. (I don't know if Swift has a native counted set collection. It didn't the last time I checked.)
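As a language-agnostic sketch of that loop (Python here purely for illustration; collections.Counter plays the role of the counted set, and only the two relevant properties are kept):
from collections import Counter

baskets = [
    {"typeOfPesticidesUsed": 1, "fromCountry": 1},
    {"typeOfPesticidesUsed": 1, "fromCountry": 1},
    {"typeOfPesticidesUsed": 2, "fromCountry": 1},
    {"typeOfPesticidesUsed": 2, "fromCountry": 1},
    {"typeOfPesticidesUsed": 1, "fromCountry": 2},
    {"typeOfPesticidesUsed": 1, "fromCountry": 2},
    {"typeOfPesticidesUsed": 2, "fromCountry": 2},
    {"typeOfPesticidesUsed": 2, "fromCountry": 2},
]

k_number_of_countries = 2
counts = Counter()                       # counted set: value -> how many baskets
for basket in baskets:
    unique_value = (basket["typeOfPesticidesUsed"] * k_number_of_countries
                    + basket["fromCountry"])
    counts[unique_value] += 1

print(len(counts))  # 4 -> number of rows for the table view
print(counts)       # baskets per unique pairing, e.g. Counter({3: 2, 5: 2, 4: 2, 6: 2})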
EDIT:
It looks like Swift does NOT have a native counted set collection (at least not yet). There is, however, at least one open-source Swift counted set (a.k.a. a bag) on GitHub.
