I want to use an InfluxDB variable backed by a query. For the query I will be setting three variables:
start=100
end=200
step=25
The query should return the list of values 100, 125, 150, 175, 200 and make it available to that variable. How should I write the query to achieve this result? I will not be reading from any buckets or databases; this is purely mathematical generation of a list of numbers.
I know I could use the CSV or Map variable types, but then I would have to calculate the list of numbers manually and keep updating it, whereas with a query I can just change the start/end/step values and the new list gets generated automatically.
I'm stuck because Flux doesn't provide a do-while or loop statement. Is there any other approach that achieves this result?
Found a not-so-clean solution, but it gets the job done for now:
import "generate"
start = 100
end = 200
step = 25
num = ((end-start)/step)+1
generate.from(
count: num,
fn: (n) => (start + step*n),
start: 2021-01-01T00:00:00Z,
stop: 2021-01-02T00:00:00Z,
)
|> toString()
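If the variable dropdown picks up extra columns, it may also help to trim the output to just the value column (my addition, assuming an InfluxDB 2.x query variable, which reads the _value column):

    |> keep(columns: ["_value"])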
I am using Dataflow and Apache Beam to process a dataset and store the result in a headerless csv file with two columns, something like this:
A1,a
A2,a
A3,b
A4,a
A5,c
...
I want to filter out certain entries based on the following two conditions:
1- In the second column, if the number of occurrences of a certain value is less than N, then remove all such rows. For instance if N=10 and c only appears 7 times, then I want all those rows to be filtered out.
2- In the second column, if the number of occurrences of a certain value is more than M, then only keep M many of such rows and filter out the rest. For instance if M=1000 and a appears 1200 times, then I want 200 of such entries to be filtered out, and the other 1000 cases to be stored in the csv file.
In other words, I want to make sure every value in the second column appears at least N and at most M times.
My question is whether this is possible using some filter in Beam, or whether it should be done as a post-processing step once the csv file is created and saved.
You can use beam.Filter to keep only the second-column values that satisfy your lower-bound condition, collecting them into a PCollection.
Then correlate that PCollection (as a side input) with your original PCollection to filter out all the lines that need to be excluded.
As for the upper bound, since you want to keep up to M elements rather than excluding a value completely, you need some post-processing or a combine transform; see the sketch at the end of this answer.
An example with the Python SDK, using word count:
import re

import apache_beam as beam


class ReadWordsFromText(beam.PTransform):

    def __init__(self, file_pattern):
        self._file_pattern = file_pattern

    def expand(self, pcoll):
        return (pcoll.pipeline
                | beam.io.ReadFromText(self._file_pattern)
                | beam.FlatMap(lambda line: re.findall(r'[\w\']+', line.strip(), re.UNICODE)))
p = beam.Pipeline()
words = (p
| 'read' >> ReadWordsFromText('gs://apache-beam-samples/shakespeare/kinglear.txt')
| "lower" >> beam.Map(lambda word: word.lower()))
import random
# Assume this is the data PCollection you want to filter.
data = words | beam.Map(lambda word: (word, random.randint(1, 101)))
counts = (words
    | 'count' >> beam.combiners.Count.PerElement())
words_with_counts_bigger_than_100 = (counts
    | beam.Filter(lambda count: count[1] > 100)
    | beam.Map(lambda count: count[0]))
Now you have a PCollection of the words whose counts exceed the lower bound. Correlate it with the data PCollection as a side input:
def cross_join(left, rights):
    # Keep a (word, count) element only if the word appears in the side input.
    for x in rights:
        if left[0] == x:
            yield (left, x)
data_with_word_counts_bigger_than_100 = data | beam.FlatMap(
    cross_join, rights=beam.pvalue.AsIter(words_with_counts_bigger_than_100))
Now the elements below the lower bound have been filtered out of the data set.
Note that the 66 in ('king', 66) is the fake random data I put in.
To debug with such visualizations, you can use Interactive Beam. You can set up your own notebook runtime following the instructions, or you can use the hosted solutions provided by Google Dataflow Notebooks.
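For the upper bound mentioned at the start of this answer, one option is Beam's built-in Sample.FixedSizePerKey combiner, which keeps at most a fixed number of randomly sampled elements per key. A minimal sketch, assuming the rows have already been parsed into (first_column, second_column) tuples; rows, M and capped are illustrative names, not part of the original pipeline:

M = 1000

capped = (rows
    | 'key by second column' >> beam.Map(lambda row: (row[1], row[0]))
    | 'keep at most M per key' >> beam.combiners.Sample.FixedSizePerKey(M)
    | 'flatten back to rows' >> beam.FlatMap(
        lambda kv: [(first, kv[0]) for first in kv[1]]))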
Let me know how I can solve this or write the query in Influx for this scenario (a Flux or InfluxQL query will work for me).
I have two fields, x and m. There is a function in Influx called difference, which takes the difference between each field value and the next one.
I would like to take the difference between x and the next x value, and I would also like to have the next m value in the same row,
so this is what I require as a single row:
(x_next - x), m_next
.....
How can I do that with InfluxQL or Flux queries? I can get x_next - x using difference, but how do I get m_next?
select difference(x), moving_average(m, 2) + 0.5*difference(m) from mydata
is the closest solution you can use without GROUP BY time. Note that moving_average needs the window size as a second argument: moving_average(m, 2) gives (m + m_next)/2 and 0.5*difference(m) gives (m_next - m)/2, so their sum is exactly m_next.
I am trying to get the total number of db-hits from my Cypher query. For some reason I always get 0 when calling this:
String query = "PROFILE MATCH (a)-[r]-(b)-[p]-(c)-[q]-(a) RETURN a,b,c";
Result result = database.execute(query);
// Exhaust the result so the query fully executes and the profiler statistics are populated.
while (result.hasNext()) {
    result.next();
}
System.out.println(result.getExecutionPlanDescription().getProfilerStatistics().getDbHits());
The database seems to be fine. Is there something wrong with the way I am trying to reach this value?
ExecutionPlanDescription is a tree-like structure. Most likely the top element does not directly hit the database by itself, e.g. a projection.
So you need to write a recursive function using ExecutionPlanDescription.getChildren() to drill down to the individual parts of the query plan. E.g. if one of the children (or nested sub-children) is a plan of type Expand, you can use plan.getProfilerStatistics().getDbHits(), as in the sketch below.
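For completeness, a minimal sketch of such a recursive function (totalDbHits is an illustrative helper name of mine, not part of the Neo4j API; the accessors follow the embedded Java API):

long totalDbHits(ExecutionPlanDescription plan) {
    // Count this operator's own db-hits, if this part of the plan was profiled.
    long hits = plan.hasProfilerStatistics()
            ? plan.getProfilerStatistics().getDbHits()
            : 0;
    // Recurse into the children, e.g. an Expand sitting under a projection.
    for (ExecutionPlanDescription child : plan.getChildren()) {
        hits += totalDbHits(child);
    }
    return hits;
}

Call it as totalDbHits(result.getExecutionPlanDescription()) after exhausting the result.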
I am brand new to F#, and I am having trouble with a simple first query. I have a data set, and I want to group the dollar amounts by code (the codes repeat in the data). Then, for each group, I want the average (and eventually the standard deviation) of the dollar amounts. Also, I only want to look at ONE provider ID, hence the 'where' clause. From my research, I have gotten this far:
let dc = new TypedDataContext()
let query2 = query { for x in dc.MyData do
                     groupBy x.Code into g
                     where (x.ProviderId = "some number of type string")
                     let average = query { for n in g do
                                           averageBy n.DollarAmt }
                     select (g.Key, average) }
System.Console.WriteLine(query2)
With this I get a compile error that says, "The namespace or module 'x' is not defined."
I do not understand this, because when I ran a query that only collected the data with the specified provider ID, it did not complain about 'x', and I followed the same format with 'x' in this larger query.
Any ideas? Thank you in advance.
From #kvb's comment: After the groupBy you can only access the group g, not the individual items x. Try putting the where before the groupBy.
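So the query becomes something like this (a minimal rearrangement of the original code, untested):

let query2 = query { for x in dc.MyData do
                     where (x.ProviderId = "some number of type string")
                     groupBy x.Code into g
                     let average = query { for n in g do
                                           averageBy n.DollarAmt }
                     select (g.Key, average) }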
What's the syntax to get random records from a specific node_auto_index using Cypher?
I suppose there is this example:
START x=node:node_auto_index("uname:*") RETURN x SKIP somerandomNumber LIMIT 10;
Is there a better way that won't return a contiguous set?
There is no feature similar to SQL's RANDOM() in Neo4j.
You must either compute the random number for the SKIP section before you issue the Cypher query (in case you are not querying directly from the console and you call Neo4j from a host language)
- this will give you a contiguous run of nodes starting at a random offset -
or you must retrieve all the nodes and then do your own random selection in the host language across those nodes - this will give you a truly random set of nodes.
Or, to make a pseudorandom function in Cypher, we can try something like this:
START x=node:node_auto_index("uname:*")
WITH x, length(x.uname) as len
WHERE (Id(x) + len) % 3 = 0
RETURN x LIMIT 10
Or make a more sophisticated WHERE part in this query, based upon the total number of uname nodes or on the ASCII value of the uname property, for example.