Fusion Table API bug, not able to handle WHERE clauses with equality on numeric fields? - google-fusion-tables

I'm getting strange results from the FusionTable API. Specifically, it seems unable to handle a simple select statement with equality constraints on numeric values. Any query I try of the following form:
SELECT COUNT() FROM 1Nynh5pPrj1q8JqbalppAm-qzAsgKvL0ZRala7VI WHERE AGE=41
yields zero records:
{
  "kind": "fusiontables#sqlresponse",
  "columns": [
    "count()"
  ],
  "rows": [
    [ "0" ]
  ]
}
By contrast, a range constraint works fine:
SELECT COUNT() FROM 1Nynh5pPrj1q8JqbalppAm-qzAsgKvL0ZRala7VI WHERE AGE>40.99 AND AGE<41.01
{
  "kind": "fusiontables#sqlresponse",
  "columns": [
    "count()"
  ],
  "rows": [
    [ "362" ]
  ]
}
Maybe the numbers underneath aren't integers?
SELECT AGE FROM 1Nynh5pPrj1q8JqbalppAm-qzAsgKvL0ZRala7VI WHERE AGE>40.99 AND AGE<41.01
returns
{
  "kind": "fusiontables#sqlresponse",
  "columns": [
    "AGE"
  ],
  "rows": [
    [ "41" ],
    [ "41" ],
    [ "41" ],
    ...359 more...
  ]
}
Now, maybe there's some floating point representation error going on? I thought that small integers can be represented exactly as floats (even if some decimal fractions, e.g. 0.1, are repeating decimals in binary).
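That is easy to sanity-check. A quick Python snippet (independent of Fusion Tables) confirms that small integers round-trip through IEEE 754 doubles exactly:
# Small integers are exactly representable as IEEE 754 doubles,
# so 41 stored as a float still compares equal to the integer 41.
x = float(41)
print(x == 41)           # True: no representation error for small integers
print(x.hex())           # 0x1.4800000000000p+5, an exact binary value

# By contrast, many decimal fractions are repeating in binary:
print(0.1 + 0.2 == 0.3)  # False: the classic floating point surprise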
It seems unlikely that a bug in Fusion Table SQL would get by without being discovered by others, so perhaps there's something unique about how this particular FusionTable is loaded?
UPDATE:
While the query appears to fail using the new Fusion Table API above, it succeeds using the old Fusion Table SQL API (recently deprecated):
www.google.com/fusiontables/api/query?sql=SELECT%20COUNT()%20FROM%204579147%20WHERE%20AGE%20LIKE%2041
which returns this JSON response:
count()
362
Also, the new FusionTable API appears confused by numeric values:
SELECT COUNT() FROM 4579147 WHERE AGE = 41 yields 0 (incorrect)
SELECT COUNT() FROM 4579147 WHERE AGE = "41" yields 0 (incorrect)
SELECT COUNT() FROM 4579147 WHERE AGE MATCHES 41 yields 362
SELECT COUNT() FROM 4579147 WHERE AGE LIKE 41 yields 362
SELECT COUNT() FROM 4579147 WHERE AGE LIKE "41" yields 362
SELECT COUNT() FROM 4579147 WHERE AGE LIKE "%41%" yields 362
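Until the fix, a workaround is to issue the equality as MATCHES or LIKE through the API. A minimal sketch in Python (assuming the Fusion Tables v1 query endpoint; YOUR_API_KEY is a placeholder you would supply):
# Workaround sketch: use MATCHES/LIKE instead of = for numeric equality
# until the aggregation bug is fixed. Assumes the Fusion Tables v1 REST
# endpoint and a valid API key (YOUR_API_KEY is a placeholder).
import requests

TABLE_ID = '1Nynh5pPrj1q8JqbalppAm-qzAsgKvL0ZRala7VI'
sql = "SELECT COUNT() FROM {0} WHERE AGE MATCHES 41".format(TABLE_ID)
resp = requests.get(
    'https://www.googleapis.com/fusiontables/v1/query',
    params={'sql': sql, 'key': 'YOUR_API_KEY'},
)
print(resp.json())  # per the comparison above, expect rows [["362"]]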

This is a recently introduced bug that will be fixed shortly. As described, it only affects numeric equality queries with aggregation. Sorry for the inconvenience!

There is nothing wrong with AGE = 41 in that table:
https://www.google.com/fusiontables/DataSource?snapid=S580613IY6U
Something about the count() is causing the query to fail.

Related

How to sort column of table by value of another table row?

There are two tables:
columns
  id | name
  ---+-----
  1  | col1
  2  | col2

user_settings
  id | name  | params
  ---+-------+---------------------------------------------------------------------------------------------
  1  | user1 | { "columns": [ {"col_id": 1, "place": 2}, {"col_id": 2, "place": 1} ], "anotherParam": "" }
  2  | user2 | { "columns": [ {"col_id": 1, "place": 2}, {"col_id": 2, "place": 1} ], "anotherParam": "" }
I want to get all columns like:
@columns = Column.all
And then sort them by the place value in params for the current user from the user_settings table, matching "col_id" to columns.id. @columns must still be a Column::ActiveRecord_Relation at the end.
How to do this?
For me, the best strategy in such cases is to forget about ActiveRecord for the moment, build a proper solution in raw SQL, and once everything is in place, "migrate" it to the AR query API.
Your params column in the user_settings table looks like JSON data to me, which means the solution depends on the database you use (different databases, and sometimes even different versions of the same one, provide different interfaces to JSON data).
But in general the solution is as follows: "unpack" the JSON array into a recordset, join it to the other data you need, and then everything else is kinda straightforward...
For example, in PostgreSQL it might look like this:
select c.*
from
  columns c,
  user_settings s,
  jsonb_to_recordset(s.params->'columns') as params(col_id integer, place integer)
where c.id = params.col_id and s.id = 1
order by params.place;
Here jsonb_to_recordset transforms the JSON array into a recordset; this recordset is then joined with the columns table based on your criteria, and the ordering is applied.
Here is the whole solution with your sample data: http://sqlfiddle.com/#!17/ffaa19/1
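If you want to prototype the raw-SQL step outside the app before porting it to AR, a minimal sketch with Python's psycopg2 (connection details are placeholders; the table layout is the one from the question):
# Prototype of the raw query outside the app; assumes a local Postgres
# with the columns/user_settings tables from the question. Connection
# parameters are placeholders.
import psycopg2

conn = psycopg2.connect(dbname='mydb', user='me', password='secret')
with conn, conn.cursor() as cur:
    cur.execute("""
        select c.*
        from columns c,
             user_settings s,
             jsonb_to_recordset(s.params->'columns')
               as params(col_id integer, place integer)
        where c.id = params.col_id and s.id = %s
        order by params.place
    """, (1,))  # 1 = id of the current user
    for row in cur.fetchall():
        print(row)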
UPD. Oops. You mentioned AR_Relation as the desired output. It might be a bit trickier (find_by_sql returns an array). Could you please add wider context: why is getting the AR relation crucial in this case?
UPD2. Updated the answer based on the changed data sample.

Partitions not in metastore ERROR on Athena

I'm trying to partition data by a column. However, when I run the query MSCK REPAIR TABLE mytable, it returns the error
Partitions not in metastore: city:countrycode=AFG city:countrycode=AGO city:countrycode=AIA city:countrycode=ALB city:countrycode=AND city:countrycode=ANT city:countrycode=ARE
I created the table from Avro with this query:
CREATE EXTERNAL TABLE city (
  ID int,
  Name string,
  District string,
  Population int
)
PARTITIONED BY (CountryCode string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES ('avro.schema.literal'='
{
  "fields": [
    { "name": "ID", "type": "int" },
    { "name": "Name", "type": "string" },
    { "name": "countrycode", "type": "string" },
    { "name": "District", "type": "string" },
    { "name": "Population", "type": "int" }
  ],
  "name": "value",
  "namespace": "world.city",
  "type": "record"
}
')
STORED AS AVRO
LOCATION "s3://mybucket/city"
My partitions look like s3://mybucket/city/countrycode=ABC
This is an old question, and Athena seems to have added a warning message for this, but in case anybody else misses it the first several times they try something similar...
Here is the message Athena gives when you create the table:
Query successful. If your table has partitions, you need to load these partitions to be able to query data. You can either load all partitions or load them individually. If you use the load all partitions (MSCK REPAIR TABLE) command, partitions must be in a format understood by Hive. Learn more.
It seems that the codes you are using to partition don't work with Hive (I was doing something similar, partitioning by a grouping code). So, instead of MSCK REPAIR TABLE, you need to run an ALTER TABLE for each partition (see: https://docs.aws.amazon.com/athena/latest/ug/partitions.html)
ALTER TABLE city ADD PARTITION (CountryCode='ABC') location 's3://mybucket/city/ABC/' ;
...and you'll have to run that each time you add a new country code bucket.
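If you have many codes, that is easy to script. A hedged sketch with boto3 (bucket, database, and output location are placeholders; the list of codes would come from listing your S3 prefixes):
# Sketch: add one partition per country code via the Athena API.
# Bucket, database, and result location are placeholders.
import boto3

athena = boto3.client('athena', region_name='us-east-1')
codes = ['AFG', 'AGO', 'AIA', 'ALB']  # e.g. gathered by listing S3 prefixes

for code in codes:
    ddl = ("ALTER TABLE city ADD IF NOT EXISTS PARTITION "
           "(CountryCode='{0}') LOCATION 's3://mybucket/city/{0}/'").format(code)
    athena.start_query_execution(
        QueryString=ddl,
        QueryExecutionContext={'Database': 'mydatabase'},
        ResultConfiguration={'OutputLocation': 's3://mybucket/athena-results/'},
    )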
You definitely need a trailing slash in your location:
https://docs.aws.amazon.com/athena/latest/ug/create-table.html
Maybe also try lowercase for the partition column PARTITIONED by (countrycode string).
Did you try to add the partitions manually in Glue Catalog or via Crawler? Did this work?

Predix sum query on non-negative numbers only

I have a timeseries dataset which has both negative and non-negative numbers. There is a value (-999) which indicates NaN values in the cloud. What I want is a sum query that does not take the negative numbers into consideration. Is there a way to omit negative numbers while querying?
If I understand your question correctly, you are looking for a Predix Time Series query that will return the sum of all tag readings but exclude any -999 values from the result.
If so, the query body might look like this:
{"start": "1w-ago",
"tags": [{
"name": "STACK",
"filters": {"measurements": {"values": -999, "condition": "gt"}},
"aggregations": [{"type": "sum", "sampling": {"datapoints": 1}}]
}]
}
I wrote a small test script with the PredixPy SDK to demonstrate the scenario and result if that's helpful for you.
# Run this in a new space to create services
import predix.admin.app
app = predix.admin.app.Manifest()
app.create_uaa('stack-admin-secret')
app.create_client('stack-client', 'stack-client-secret')
app.create_timeseries()

# Populate some test data into time series
tag = 'STACK'
values = [-999, -5, 10, 20, 30]
ts = app.get_timeseries()
for val in values:
    ts.send(tag, val)

# Query and compare against expected result
expected = sum(values[1:])
response = ts.get_datapoints(tag, measurement=('gt', -999), aggregations='sum')
result = response['tags'][0]['results'][0]['values'][0][1]
print(expected, result)
You may also want to consider, in the future, using the quality attribute when data is ingested, so that instead of filtering on values greater than -999 you could query for quality GOOD or UNCERTAIN.
{"start": "1w-ago",
"tags": [{"name": "STACK",
"filters": {"qualities": {"values": ["3"]}},
"aggregations": [{"type": "sum", "sampling": {"datapoints": 1}}]
}]
}
Hope that helps.

The graph section of the Cypher response remains blank

I noticed for some queries the response populates the "graph" section as follows
    }
  ],
  "graph": {
    "nodes": [
      {
        "id": "68",
        "labels": [
          "ROOM"
        ],
        "properties": {
          "id": 15,
          "name": "Sun and Snow",
but for other queries, this "graph" section does not return nodes/relationships and associated labels/properties, even though the "data" section returns valid output.
Does it convey anything about the quality of the Cypher query?
It depends on what you return from your query. If you return nodes and relationships, you'll get a graph. If you return scalars such as n.name or r.weight, you don't get a graph.
Are you talking about the HTTP requests from the web UI or requests that you are making yourself?
The graph key is controlled via the resultDataContents option when making a request. You can see the documentation for that here:
http://neo4j.com/docs/stable/rest-api-transactional.html#rest-api-return-results-in-graph-format
You can request multiple formats for the result ("row" and "REST" are other examples).
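For example, a raw request against the transactional endpoint documented above might look like this sketch (host, credentials, and the query are placeholders; the ROOM label is taken from your excerpt):
# Sketch: ask the transactional endpoint for both "row" and "graph"
# result formats. Host and credentials are placeholders.
import requests

payload = {
    'statements': [{
        'statement': 'MATCH (n:ROOM)-[r]->(m) RETURN n, r, m LIMIT 5',
        'resultDataContents': ['row', 'graph'],
    }]
}
resp = requests.post(
    'http://localhost:7474/db/data/transaction/commit',
    json=payload,
    auth=('neo4j', 'password'),
)
# Returning nodes/relationships (not scalars) populates the graph section.
print(resp.json()['results'][0]['data'][0]['graph'])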

Using Cypher to return nested, hierarchical JSON from a tree

I'm currently using the example data on console.neo4j.org to write a query that outputs hierarchical JSON.
The example data is created with
create (Neo:Crew {name:'Neo'}), (Morpheus:Crew {name: 'Morpheus'}), (Trinity:Crew {name: 'Trinity'}), (Cypher:Crew:Matrix {name: 'Cypher'}), (Smith:Matrix {name: 'Agent Smith'}), (Architect:Matrix {name:'The Architect'}),
(Neo)-[:KNOWS]->(Morpheus), (Neo)-[:LOVES]->(Trinity), (Morpheus)-[:KNOWS]->(Trinity),
(Morpheus)-[:KNOWS]->(Cypher), (Cypher)-[:KNOWS]->(Smith), (Smith)-[:CODED_BY]->(Architect)
The ideal output is as follows
name:"Neo"
children: [
{
name: "Morpheus",
children: [
{name: "Trinity", children: []}
{name: "Cypher", children: [
{name: "Agent Smith", children: []}
]}
]
}
]
}
Right now, I'm using the following query
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-m
RETURN extract(n IN nodes(p)| n)
and getting this
[(0:Crew {name:"Neo"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (2:Crew {name:"Trinity"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"})]
[(0:Crew {name:"Neo"}), (1:Crew {name:"Morpheus"}), (3:Crew:Matrix {name:"Cypher"}), (4:Matrix {name:"Agent Smith"})]
Any tips to figure this out? Thanks
In neo4j 3.x, after you install the APOC plugin on the neo4j server, you can call the apoc.convert.toTree procedure to generate similar results.
For example:
MATCH p=(n:Crew {name:'Neo'})-[:KNOWS*]->(m)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) yield value
RETURN value;
... would return a result row that looks like this:
{
  "_id": 127,
  "_type": "Crew",
  "name": "Neo",
  "knows": [
    {
      "_id": 128,
      "_type": "Crew",
      "name": "Morpheus",
      "knows": [
        {
          "_id": 129,
          "_type": "Crew",
          "name": "Trinity"
        },
        {
          "_id": 130,
          "_type": "Crew:Matrix",
          "name": "Cypher",
          "knows": [
            {
              "_id": 131,
              "_type": "Matrix",
              "name": "Agent Smith"
            }
          ]
        }
      ]
    }
  ]
}
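If you then want to consume that tree from application code, one option is the official Neo4j Python driver; a minimal sketch (bolt URI and credentials are placeholders):
# Sketch: run the apoc.convert.toTree query through the official
# Python driver. The bolt URI and credentials are placeholders.
from neo4j import GraphDatabase

query = """
MATCH p=(n:Crew {name:'Neo'})-[:KNOWS*]->(m)
WITH COLLECT(p) AS ps
CALL apoc.convert.toTree(ps) YIELD value
RETURN value
"""

driver = GraphDatabase.driver('bolt://localhost:7687', auth=('neo4j', 'password'))
with driver.session() as session:
    for record in session.run(query):
        print(record['value'])  # the nested structure shown above
driver.close()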
This was such a useful thread on an important topic that I thought I'd add a few thoughts after digging into this a bit further.
First off, using the APOC "toTree" proc has some limits, or better said, dependencies. It really matters how "tree-like" your architecture is. For example, the LOVES relation is missing in the APOC call above, and I understand why: that relationship is hard to include when using "toTree". That simple addition is a bit like adding an attribute to a hierarchy, but as a relationship. Not bad to do, but it confounds the simple KNOWS tree. The point being, a good question to ask is "how do I handle such challenges?" This reply is about that.
I do recommend upping one's JSON skills, as this gives you much more granular control. Personally, I found my initial exploration somewhat painful. That might be because I'm an XML person :) but once you figure out all the [, {, and ('s, it is really a powerful way to efficiently pull what's best described as a report on your data. And given the JSON is something that can easily become a class, it allows for a nice way to push that back to your app.
I have found performance to also be a challenge with "toTree" vs. just asking for the JSON. I've added below a very simplistic look at what your RETURN could look like. It follows the BNF-style format below. I'd love to see this more maturely developed, as the possibilities are quite varied, but this was something I'd have found useful, so I'll post this immature version for now. As they say, "a deeper dive is left up to the readers" 😊
I've obfuscated the values, but this is an actual query on what I'll term a very poor example of a graph architecture, whose many design "mistakes" cause some significant performance headaches when trying to pull a holistic report from the graph. The initial report query I inherited took many minutes on a server and could not run on my laptop; using this strategy, the updated query now runs in about 5 seconds on my rather wimpy laptop, against a db of about 200K nodes and 0.5M relationships. I added the "persons" grouping alias as a reminder that "persons" will be different in each array element, while the parent construct is repeated over and over again. Where you put that in your hand-grown tree matters, but having the ability to do so is powerful.
Bottom line: mature use of JSON in the RETURN statement gives you powerful control over the results of a Cypher query.
RETURN STATEMENT CONTENT:
<cypher_alias>
{.<cypher_alias_attribute>,
...,
<grouping_alias>:
(<cypher_alias>
{.<cypher_alias_attribute,
...
}
)
...
}
MATCH (j:J{uuid:'abcdef'})-[:J_S]->(s:S)<-[:N_S]-(n:N)-[:N_I]->(i:I), (j)-[:J_A]->(a:P)
WHERE i.title IN ['title1', 'title2']
WITH a,j, s, i, collect(n.description) as desc
RETURN j{.title,persons:(a{.email,.name}), s_i_note:
(s{.title, i_notes:(i{.title,desc})})}
If you know how deep your tree is, you can write something like this:
MATCH p =(:Crew { name: "Neo" })-[q:KNOWS*0..]-(m)
WITH nodes(p)[0] AS a, nodes(p)[1] AS b, nodes(p)[2] AS c, nodes(p)[3] AS d, nodes(p)[4] AS e
WITH (a{.name}) AS ab, (b{.name}) AS bb, (c{.name}) AS cb, (d{.name}) AS db, (e{.name}) AS eb
WITH ab, bb, cb, db{.*,children:COLLECT(eb)} AS ra
WITH ab, bb, cb{.*,children:COLLECT(ra)} AS rb
WITH ab, bb{.*,children:COLLECT(rb)} AS rc
WITH ab{.*,children:COLLECT(rc)} AS rd
RETURN rd
Line 1 is your query. You save all paths from Neo to m in p.
In line 2 p is split into a, b, c, d and e.
Line 3 takes just the names of the nodes. If you want all properties you can write (a{.*}) AS ab. This step is optional; you can also work with nodes if you want to.
In line 4 you replace db and eb with a map containing all properties of db and the new property children containing all entries of eb for the same db.
Lines 5, 6 and 7 are basically the same. You reduce the result list by grouping.
Finally you return the tree. It looks like this:
{
  "name": "Neo",
  "children": [
    {
      "name": "Morpheus",
      "children": [
        {"name": "Trinity", "children": []},
        {"name": "Cypher", "children": [
          {"name": "Agent Smith", "children": []}
        ]}
      ]
    }
  ]
}
Unfortunately this solution only works when you know how deep your tree is and you have to add a row if your tree is one step deeper.
If someone has an idea how to solve this with dynamic tree depth, please comment.
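One idea: keep the Cypher trivial and assemble the tree client-side from the returned paths, which removes the fixed-depth restriction. A sketch of that in Python (the input mimics the name lists the original query returns; this is not a pure-Cypher answer):
# Sketch: build the nested children structure in application code from
# paths of names, so the tree depth no longer has to be known up front.
def build_tree(paths):
    root = {}
    for path in paths:
        node = root
        for name in path:
            node = node.setdefault(name, {})
    def to_children(mapping):
        return [{'name': name, 'children': to_children(kids)}
                for name, kids in mapping.items()]
    return to_children(root)

# Paths as lists of names, mimicking the rows the original query returns.
paths = [
    ['Neo'],
    ['Neo', 'Morpheus'],
    ['Neo', 'Morpheus', 'Trinity'],
    ['Neo', 'Morpheus', 'Cypher'],
    ['Neo', 'Morpheus', 'Cypher', 'Agent Smith'],
]
print(build_tree(paths)[0])
# {'name': 'Neo', 'children': [{'name': 'Morpheus', 'children': [...]}]}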
