Kylin - Group by Rollup and Cube - kylin

I am trying to use the ROLLUP and CUBE operators to summarise data using the sample cube. If I execute the query the following error message is returned:
Internal error: Error while applying rule OLAPAggregateRule, args [rel#1246:LogicalAggregate.NONE.[](input=rel#1245:Subset#1.NONE.[],group={0, 1},groups=[{0, 1}, {0}, {}],indicator=true,SUMOFPRICE=SUM($2))].
I am using the documentation from http://calcite.apache.org/docs/reference.html to build the query.
Test queries:
Simple query not using a cube
select a, b, sum(c)
from (values (1, 2, 3, 4)) as t(a, b, c, d)
group by rollup(a, b)
This query returns the expected results
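For reference, GROUP BY ROLLUP(a, b) is shorthand for the grouping sets (a, b), (a) and (), which is what groups=[{0, 1}, {0}, {}] in the error message describes. A minimal Python sketch of that expansion over the single test row follows. The helper name rollup_sum is hypothetical, and this only illustrates the semantics, not how Calcite or Kylin evaluate the query.

```python
from collections import defaultdict

def rollup_sum(rows, keys, measure):
    """Sum `measure` for every ROLLUP grouping set of `keys` (hypothetical helper)."""
    results = []
    # ROLLUP(a, b) expands to the key prefixes (a, b), (a), ().
    for n in range(len(keys), -1, -1):
        kept = keys[:n]
        totals = defaultdict(int)
        for row in rows:
            # Columns that are rolled away are reported as None (SQL NULL).
            group = tuple(row[k] if k in kept else None for k in keys)
            totals[group] += row[measure]
        results.extend((*group, total) for group, total in totals.items())
    return results

# Same single row as the VALUES clause in the test query.
rows = [{"a": 1, "b": 2, "c": 3, "d": 4}]
print(rollup_sum(rows, ["a", "b"], "c"))
```

With one input row, each of the three grouping sets yields exactly one output row, so the query above should return three rows.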
Query using KYLIN_SALES table using ROLLUP operator
select seller_id, leaf_categ_id, sum(price) as SumOfPrice
from kylin_sales
group by rollup(seller_id, leaf_categ_id)
The error mentioned at the start of the question is returned when executing this query.
The Kylin logs show the following when the error occurs:

I see you asked the same question on the Kylin dev list. Since it's probably a bug, you'll get your answer there.

From the Kylin development mailing list: http://mail-archives.apache.org/mod_mbox/kylin-dev/201609.mbox/browser
KYLIN-1732 https://issues.apache.org/jira/browse/KYLIN-1732 is what you want, and it is to be released in v1.5.4 soon. Please try again on the coming v1.5.4.

Related

Passing QUERY() as a parameter to a Google Sheets LAMBDA function

Google Sheets offers passing in parameters to lambdas as such:
=LAMBDA(x, y, x + y)(100, 200)
I was thinking of taking 2 columns from another Sheet, filter it with QUERY and then pass those 2 columns into the LAMBDA. Basically the 2 columns were a key and a CSV text that I wanted to split in one go.
=lambda(a, b, split(b, ","))(query('Alias Key Raw'!A1:B, "select * where A starts with 'X'"))
This gives the following error: "Wrong number of arguments to call following LAMBDA function. Expected 2 arguments, but got 1 arguments." Given that QUERY provides 2 columns of actual values, I thought this would be possible.
=byrow(query('Alias Key Raw'!A1:B, "select * where A starts with 'X'"), lambda(row, split(row, ",")))
This gives me only column A, with no error. All of column B appears to be ignored.
I've tried using BYCOL, BYROW, etc., and most attempts fail with the same error: "Wrong number of arguments to call following LAMBDA function. Expected 2 arguments, but got 1 arguments."
Data
Input into the lambda
Key   Lineages
CU    B.1.1.529.5.1.26
CV    B.1.1.529.2.75.3.1.1.3
XA    B.1.1.7,B.1.177
XB    B.1.634,B.1.631
XC    AY.29,B.1.1.7
XAZ   BA.2.5,BA.5,BA.2.5
XBC   BA.2*,B.1.617.2*,BA.2*,B.1.617.2*
Expected
Output from the lambda
Key   Lineages
XA    B.1.1.7   B.1.177
XB    B.1.634   B.1.631
XC    AY.29     B.1.1.7
XAZ   BA.2.5    BA.5         BA.2.5
XBC   BA.2*     B.1.617.2*   BA.2*    B.1.617.2*
Note: There can be any number of lineages in the CSV cell
Updated
=ArrayFormula(
LAMBDA(a, {QUERY({a},"Select Col1"),SPLIT(QUERY({a},"Select Col2"),",")})
(QUERY('Alias Key Raw'!A1:B, "select * where A starts with 'X'",1)))
Explanation:
The LAMBDA receives the whole QUERY result, QUERY('Alias Key Raw'!A1:B, "select * where A starts with 'X'",1), as its single argument a.
An array literal {} then rebuilds the output from a:
Col1: {QUERY({a},"Select Col1"),...}
Col2: {...,SPLIT(QUERY({a},"Select Col2"),",")}
Used formulas help
ARRAYFORMULA - LAMBDA - QUERY - SPLIT
Perhaps worth pointing out that the expected data can be returned with a more compact formula that uses neither QUERY nor LAMBDA:
=filter({A1:A,split(B1:B,",")},regexmatch(A1:A,"^X"))
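The logic of that formula (keep rows whose key starts with X, then split the CSV cell into separate columns) can be sketched outside of Sheets as well. The Python below is only an illustration of the transformation on a few of the sample rows; the function name split_x_lineages is made up for this example.

```python
def split_x_lineages(rows):
    """Keep rows whose key starts with 'X' and split the CSV cell into columns."""
    return [
        [key, *lineages.split(",")]
        for key, lineages in rows
        if key.startswith("X")
    ]

# A subset of the sample data from the question.
rows = [
    ("CU", "B.1.1.529.5.1.26"),
    ("XA", "B.1.1.7,B.1.177"),
    ("XAZ", "BA.2.5,BA.5,BA.2.5"),
]
for out in split_x_lineages(rows):
    print(out)
```

Each output row can have a different length, which mirrors the note that a cell may hold any number of lineages.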

Return top % of results in Neo4J

I've a graph of students and the various books that they've read. I want to find out the top 10% of students who've read the most books. How can I do that? I've tried the following cypher syntax:
MATCH (s:Student)-[:READ]->(b:Book)
WITH s, COUNT(b) AS no_of_books
WHERE no_of_books > percentileCont(no_of_books, 0.9)
RETURN s.Name, no_of_books
The error 'invalid use of aggregating function' is returned. It seems that trying to use two aggregating functions on top of each other is an issue here. How can I tweak my syntax to make it work?
I'll be happy to use the LIMIT function instead if it can work with percentages as well.
Answering my own question (again), in case someone else is looking up the same issue
MATCH (s:Student)-[:READ]->(b:Book)
WITH s, COUNT(b) AS no_of_books
ORDER BY no_of_books DESC
WITH COLLECT ({Student_Name: s.Name, No_of_Books: no_of_books}) AS books_per_stu
WITH books_per_stu, toInteger(size(books_per_stu) * 0.1) AS percentile
UNWIND books_per_stu[0..percentile] AS top_stu
RETURN top_stu
Seems like, surprisingly, there's no straightforward way to do this, like my pseudo-code in my first post. The above syntax will return the results as a list of dictionaries rather than in a tabular format. I still welcome any answer that's simpler than mine.
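The approach (count per student, sort descending, keep the first slice of the ranked list) can be sketched in plain Python to make the cutoff arithmetic explicit. This is an illustration only: the function name top_percent_readers and the sample data are invented, and the cutoff rounds down, so fewer than 10 students yields an empty result.

```python
from collections import Counter

def top_percent_readers(read_pairs, percent=10):
    """Return (student, book_count) for the top `percent` of students by books read."""
    counts = Counter(student for student, _book in read_pairs)
    ranked = counts.most_common()           # sorted by count, descending
    cutoff = len(ranked) * percent // 100   # integer arithmetic, rounds down
    return ranked[:cutoff]

# Hypothetical data: student s1 read 1 book, s2 read 2, ..., s10 read 10.
pairs = [(f"s{i}", f"b{j}") for i in range(1, 11) for j in range(i)]
print(top_percent_readers(pairs))
```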

Query with difference returns no data

I've a query that uses difference function and I can't understand why it returns no data.
The query is:
SELECT
difference(FIRST(grid_power_counter)) as grid_power_consumed
FROM homesolar.origin.main GROUP BY time(15m)
If I remove the difference function it returns data:
SELECT
FIRST(grid_power_counter) as grid_power_consumed
FROM homesolar.origin.main GROUP BY time(15m)
Also, I can get results if I add a where time > now()-24h to the select with difference function.
I really can't understand that behavior. Can someone help me?
Q: My query would only work if I add the where filter to it. Why is that so?
Quoted from influxdb's Groupby time doc:
Basic GROUP BY time() queries require an InfluxQL function in the
SELECT clause and a time range in the WHERE clause.
I suspect your first DIFFERENCE query didn't work because it was missing the mandatory WHERE time filter for the GROUP BY time(...) clause.
Without a time range, the GROUP BY time() clause could be returning no rows, and hence DIFFERENCE would not return any data.
This could potentially be a GitHub issue for the Influx team, as I think their query parser should complain about the missing WHERE filter for GROUP BY time().
References:
https://docs.influxdata.com/influxdb/v1.5/query_language/data_exploration/#the-group-by-clause
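To make clear what the query is meant to compute, here is a rough Python sketch of DIFFERENCE(FIRST(x)) grouped into 15-minute buckets: take the earliest value in each bucket, then subtract consecutive bucket values. It is an assumption-laden illustration, not InfluxDB's implementation: the function name is hypothetical, timestamps are plain Unix seconds, and empty buckets are simply skipped.

```python
def difference_of_first(points, bucket_seconds=15 * 60):
    """points: iterable of (unix_seconds, value). Returns [(bucket_start, diff)]."""
    firsts = {}
    for ts, value in sorted(points):
        bucket = ts - ts % bucket_seconds
        firsts.setdefault(bucket, value)  # keep only the earliest value per bucket
    buckets = sorted(firsts)
    # Each difference is reported against the later of the two buckets.
    return [(cur, firsts[cur] - firsts[prev]) for prev, cur in zip(buckets, buckets[1:])]

# Hypothetical counter readings.
points = [(0, 100), (300, 101), (900, 105), (1800, 112), (2000, 113)]
print(difference_of_first(points))
```

Note that a single bucket produces no difference at all, so at least two non-empty buckets are needed before any row appears.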

Create graph panel with multiple query

I have the following monitoring stack:
collecting data with telegraf-0.12
storing in influxdb-0.12
visualisation in grafana (3beta)
I am collecting "system" data from several hosts and I want to create a graph showing the "system.load1" of several hosts, NOT merged. I thought I could simply add multiple queries to the graph panel.
When creating my graph panel, I create the first series and see the result, but when I add the second query, I get an error.
Here is the panel creation with 2 queries
Here is the query generated by the panel:
SELECT mean("load1") FROM "system" WHERE "host" = 'xxx' AND time > now() - 24h GROUP BY time(1m) fill(null) SELECT mean("load1") FROM "system" WHERE "host" = 'yyy' AND time > now() - 24h GROUP BY time(1m) fill(null)
And the error:
{
"error": "error parsing query: found SELECT, expected ; at line 2, char 1",
"message": "error parsing query: found SELECT, expected ; at line 2, char 1"
}
So I can see that the generated query is malformed (2 select in one line without even a ';') but I don't know how to use Grafana to achieve what I want.
When I show or hide each query individually I see the corresponding graph.
I have created a similar graph (with multiple series) with Chronograf, but I would rather use Grafana as it gives me much more control and many more plugins...
Is there something I am doing wrong here?
After reading a couple of threads in the GitHub issues, here is a quick fix.
As mentioned by #schup, the problem and its solution are described here:
https://github.com/grafana/grafana/issues/4533
The binaries are currently not fixed in grafana-3beta (it might be in the coming weeks). So there are 2 options: fix the source and compile, or patch an existing install.
I actually had to patch my current install:
/usr/share/grafana/public/app/app.<number_might_differ_here>.js
sed --in-place=backup 's/join("\\n");return k=k.replace/join(";\\n");return k=k.replace/;s/.replace(\/%3B\/gi,";").replace/.replace/' app.<number_might_differ_here>.js
Hope this might help (and that it will soon be fixed)
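In effect, the sed patch changes the separator used when the panel's queries are combined from a bare newline to ";\n", which is what the parser error asks for. A small Python sketch of the resulting statement, using the two queries from the question; build_multi_query is a hypothetical helper, not Grafana code.

```python
def build_multi_query(queries):
    """Join several InfluxQL statements with ';' so they parse as separate queries."""
    return ";\n".join(q.strip().rstrip(";") for q in queries)

queries = [
    "SELECT mean(\"load1\") FROM \"system\" WHERE \"host\" = 'xxx' "
    "AND time > now() - 24h GROUP BY time(1m) fill(null)",
    "SELECT mean(\"load1\") FROM \"system\" WHERE \"host\" = 'yyy' "
    "AND time > now() - 24h GROUP BY time(1m) fill(null)",
]
print(build_multi_query(queries))
```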
Seems to be an API change in influxdb 0.11
https://github.com/grafana/grafana/issues/4533

How do i check for a label in neo4j 2.1.2 when using a legacy index?

I just upgraded to Neo4j 2.1.2 from 2.0.1 and some of my cypher-queries stopped working.
I am using a self-defined Lucene index to find the start nodes, then navigate via a typed relationship (Partner_PartnerMeta) to a typed node (PartnerTyp). After that I just return a subset of these nodes.
My query previously used to check for the type of startnode (PartnerMeta). Since 2.1.2 the query
START partnermeta = node:PartnerTyp_Meta("Namen:wilhelm*")
MATCH (partner:PartnerTyp)-[:Partner_PartnerMeta]->(partnermeta:PartnerMeta)
RETURN DISTINCT partner SKIP 0 LIMIT 10
results in
Cannot add labels or properties on a node which is already bound (line 2, column 52)
"MATCH (partner:PartnerTyp)-[:Partner_PartnerMeta]->(partnermeta:PartnerMeta)"
^
This error can be suppressed by omitting the ":PartnerMeta" part of the query. As the type of the node returned from the index hasn't been checked yet, I would like to verify that it is of the type "PartnerMeta" (maybe I am too paranoid that way).
My question is:
Is there a possibility to check for the type of node after the usage of START in combination with a legacy index?
This is a regression in Cypher 2.1.2 which will be fixed. It was an attempt to avoid invalid combinations of label checks.
For now, can you try:
START partnermeta = node:PartnerTyp_Meta("Namen:wilhelm*")
MATCH (partner:PartnerTyp)-[:Partner_PartnerMeta]->(partnermeta)
WHERE partnermeta:PartnerMeta
RETURN DISTINCT partner SKIP 0 LIMIT 10
