I keep getting "No matching signature for operator = for argument types: STRING, INT64. Supported signature: ANY = ANY at [18:54]" in BigQuery

SELECT
station_id,
name,
number_of_rides AS number_of_rides_starting_at_station
FROM (
SELECT
start_station_id,
COUNT(*) number_of_rides
FROM bigquery-public-data.new_york.citibike_trips AS trips
GROUP BY start_station_id
) AS station_num_trips
INNER JOIN bigquery-public-data.new_york.citibike_stations
ON station_id = start_station_id
ORDER BY number_of_rides DESC
I keep getting
No matching signature for operator = for argument types: STRING, INT64. Supported signature: ANY = ANY at [18:54] in Big Query
I tried CAST to change the station_id to a string but it already is a string.
What am I doing wrong?

The error message tells you that the two operands have different types: station_id is a STRING and start_station_id is an INT64. BigQuery does not implicitly convert between STRING and INT64 columns when comparing them, so you have to cast explicitly in your query:
SELECT
SAFE_CAST(station_id as INT64) as station_id,
name,
number_of_rides AS number_of_rides_starting_at_station
FROM (
SELECT
start_station_id,
COUNT(*) number_of_rides
FROM bigquery-public-data.new_york.citibike_trips AS trips
GROUP BY start_station_id
) AS station_num_trips
INNER JOIN bigquery-public-data.new_york.citibike_stations
ON SAFE_CAST(station_id AS INT64) = SAFE_CAST(start_station_id AS INT64)
ORDER BY number_of_rides DESC
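
As a rough illustration in Python rather than BigQuery SQL, the sketch below mimics what the SAFE_CAST join above does: keys that cannot be converted become None (NULL) and simply fail to match, instead of raising an error. The station names, ride counts and the helper safe_cast_int are made up for the example and are not part of the public dataset or of any BigQuery API.

```python
from typing import Optional


def safe_cast_int(value: object) -> Optional[int]:
    """Hypothetical analogue of SAFE_CAST(x AS INT64): None instead of an error."""
    try:
        return int(str(value))
    except ValueError:
        return None


# Made-up rows: station_id is stored as a string, like the STRING column.
stations = [
    {"station_id": "72", "name": "Station A"},
    {"station_id": "n/a", "name": "Station B"},
]

# Made-up aggregate: start_station_id (an int, like INT64) -> number_of_rides.
trip_counts = {72: 1500, 79: 900}

joined = []
for station in stations:
    key = safe_cast_int(station["station_id"])
    # An inner join keeps only keys that converted and exist on both sides.
    if key is not None and key in trip_counts:
        joined.append((key, station["name"], trip_counts[key]))

# ORDER BY number_of_rides DESC
joined.sort(key=lambda row: row[2], reverse=True)
```

Comparing the raw values ("72" and 72) would never match in Python either, which is the same mismatch BigQuery reports up front as a type error.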

Related

Pig Join by using OR conditional operator throws error

child = load 'file_name' using PigStorage('\t') as (child_code : chararray, child_id : int, child_precode_id : int);
parents = load 'file_name' using PigStorage('\t') as (child_id : int, child_internal_id : chararray, mother_id : int, father_id : int);
joined = JOIN child by child_id, parents by child_id;
mainparent = FOREACH joined GENERATE child_id as child_id_source, child_precode_id, child_code;
store parent into '(location of file)' using PigStorage('\t');
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id OR father_id);
firstgen = FOREACH childfirst GENERATE child_id, child_precode_id, child_code;
store firstgen into 'file_location' using PigStorage('\t');
Getting the following error when I use the OR condition:
ERROR org.apache.pig.PigServer - exception during parsing: Error
during parsing. Pig script failed to parse:
NoViableAltException(91#[]) Failed to parse: Pig script failed to
parse: NoViableAltException(91#[])

The syntax below is incorrect; there is no conditional join in Pig:
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id OR father_id);
If you want to join a relation on one key against another relation on either of two keys, create two joins and UNION the results. Note that you may have to apply DISTINCT to the resulting relation, because a row that matches on both keys appears in both joins.
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id);
childfirst1 = JOIN mainparent by (child_id_source), parents by (father_id);
childfirst2 = UNION childfirst,childfirst1;
childfirst3 = DISTINCT childfirst2;
firstgen = FOREACH childfirst3 GENERATE child_id, child_precode_id, child_code;
store firstgen into 'file_location' using PigStorage('\t');
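
To see why the DISTINCT step matters, here is a small Python sketch of the same idea: join once on mother_id, once on father_id, concatenate, then deduplicate. The tuples are invented sample data, and join_on is a hypothetical helper, not a Pig function.

```python
# (child_id_source, child_precode_id, child_code)
mainparent = [(1, 100, "A"), (2, 200, "B")]

# (child_id, child_internal_id, mother_id, father_id)
parents = [(10, "x", 1, 2), (11, "y", 1, 1), (12, "z", 3, 4)]


def join_on(parent_field_index):
    """Inner join of mainparent.child_id_source against one field of parents."""
    return [
        m + p
        for m in mainparent
        for p in parents
        if m[0] == p[parent_field_index]
    ]


by_mother = join_on(2)            # JOIN ... parents by (mother_id)
by_father = join_on(3)            # JOIN ... parents by (father_id)
unioned = by_mother + by_father   # UNION keeps duplicates
distinct = sorted(set(unioned))   # DISTINCT removes them

# FOREACH ... GENERATE child_id, child_precode_id, child_code
firstgen = [(row[3], row[1], row[2]) for row in distinct]
```

The parents row (11, "y", 1, 1) matches child 1 through both keys, so it shows up in both joins and would be counted twice without the final DISTINCT.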

Psql - How to skip row when error in address during Geocoding

I am geocoding hundreds of thousands of records. While this query is running, if the address in a particular row does not produce a Lat and Long value, the query fails with the error "invalid input syntax for integer: "J199"". So if this call
geocode_intersection(crashroad,crashreferenceroad,state,city,'',1)
produces a value like "J199", that row should be skipped. How can I do this?
update nj.condition_3
set (rating,new_address,points) = ( COALESCE((g.geo).rating,-1),pprint_addy((g.geo).addy),st_astext(ST_SnapToGrid((g.geo).geomout, 0.000001)))
-- Replace in limit value if error occurs
FROM (SELECT addid FROM nj.condition_3 WHERE rating IS NULL ORDER BY addid LIMIT 3) As a
LEFT JOIN (SELECT addid, (geocode_intersection(crashroad,crashreferenceroad,state,city,'',1)) As geo
-- Replace in limit value if error occurs
FROM nj.condition_3 As ag WHERE ag.rating IS NULL ORDER BY addid LIMIT 3) As g ON a.addid = g.addid
WHERE a.addid = nj.condition_3.addid;

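I wrote a function to work around this error, and it now works fine. It geocodes one row at a time inside its own BEGIN ... EXCEPTION block, so a row that raises an error is skipped instead of aborting the whole update.
CREATE OR REPLACE FUNCTION geocode_all_values() RETURNS VOID AS
$$
DECLARE
    r record;
    g record;
BEGIN
    -- Visit every row that has not been geocoded yet.
    FOR r IN SELECT * FROM TableName WHERE rating IS NULL ORDER BY Sno
    LOOP
        -- The nested block traps errors for this row only.
        BEGIN
            FOR g IN SELECT * FROM geocode_intersection(r.Street1, r.Street2, r.state, r.city, '', 1)
            LOOP
                UPDATE TableName
                SET new_address = pprint_addy(g.addy),
                    rating = g.rating,
                    points = ST_AsText(g.geomout)
                WHERE sno = r.sno;
            END LOOP;
        EXCEPTION WHEN OTHERS THEN
            NULL; -- geocoding failed for this row: skip it and continue
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;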
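
The control flow of that function may be easier to see outside PL/pgSQL. The Python sketch below follows the same pattern: loop over the pending rows, try to geocode each one, and on failure record the row and move on. fake_geocode is a hypothetical stand-in that fails on non-numeric input; it is not the PostGIS geocode_intersection function, and the rows are invented.

```python
# Invented rows; rating is None for rows that still need geocoding.
rows = [
    {"sno": 1, "route": "199", "rating": None},
    {"sno": 2, "route": "J199", "rating": None},
    {"sno": 3, "route": "42", "rating": None},
]


def fake_geocode(route):
    """Hypothetical geocoder: raises ValueError for input such as "J199"."""
    number = int(route)
    return {"rating": 0, "point": f"POINT({number} 0)"}


skipped = []
for row in rows:
    if row["rating"] is not None:
        continue  # already geocoded
    try:
        result = fake_geocode(row["route"])
        row["rating"] = result["rating"]
        row["points"] = result["point"]
    except ValueError:
        # Same role as EXCEPTION WHEN OTHERS: skip this row, keep going.
        skipped.append(row["sno"])
```

Unlike the PL/pgSQL version, which swallows the error silently, this sketch keeps a list of skipped row numbers, which can be handy for reviewing the bad addresses afterwards.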

Distinct with order by for different columns and hstore rails postgres

I am trying to build this query.
I have the models User, Profile and Integration.
Profile has an hstore column, meta_data, with a key twitter_followers.
This is my current query, which I want to add an ordering to:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).page(params[:page]).per_page(50)
Ideally:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).reorder("CAST(meta_data -> '#{params[:sort_by]}' AS INT) DESC NULLS LAST").page(params[:page]).per_page(50)
But get this error:
PG::InvalidColumnReference: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
I tried this:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).reorder("profiles.id, CAST(meta_data -> '#{params[:sort_by]}' AS INT) DESC NULLS LAST").page(params[:page]).per_page(50)
.to_sql
=> "SELECT DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.* FROM \"profiles\" INNER JOIN \"integration_profiles\" ON \"profiles\".\"id\" = \"integration_profiles\".\"profile_id\" INNER JOIN \"integrations\" ON \"integration_profiles\".\"integration_id\" = \"integrations\".\"id\" WHERE \"integrations\".\"user_id\" = $1 AND \"profiles\".\"found\" = 't' ORDER BY profiles.id, CAST(meta_data -> 'twitter_followers' AS INT) DESC NULLS LAST"
No dice. That removed the error, but the results were not ordered by follower count at all.
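
A likely explanation: with ORDER BY profiles.id, CAST(...) the rows are sorted by id first, and the follower count only decides which duplicate survives within each id. One common way around this in PostgreSQL is to run the DISTINCT ON in a subquery and apply the real ordering in the outer query. The Python sketch below shows that two-step logic on invented rows; it is an illustration of the semantics, not ActiveRecord code.

```python
# Invented joined rows: profile 1 appears twice (one row per integration).
rows = [
    {"id": 1, "twitter_followers": "50", "provider": "twitter"},
    {"id": 1, "twitter_followers": "50", "provider": "github"},
    {"id": 2, "twitter_followers": "900", "provider": "twitter"},
    {"id": 3, "twitter_followers": None, "provider": "twitter"},
]

# Step 1 (inner query): DISTINCT ON (id) ... ORDER BY id keeps the first row per id.
first_per_id = {}
for row in sorted(rows, key=lambda r: r["id"]):
    first_per_id.setdefault(row["id"], row)
deduped = list(first_per_id.values())


# Step 2 (outer query): ORDER BY CAST(followers AS INT) DESC NULLS LAST.
def sort_key(row):
    value = row["twitter_followers"]
    if value is None:
        return (True, 0)        # NULLs sort after everything else
    return (False, -int(value)) # larger counts first


ordered = sorted(deduped, key=sort_key)
ordered_ids = [row["id"] for row in ordered]
```

After step 1 alone the ids come out as 1, 2, 3, which matches what the query in the question produces; only the second sort puts the profile with 900 followers first.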

Strange execution time for summary query

I am giving here part of the query I am executing:
SELECT SUM(ParentTable.Field1),
    (SELECT SUM(ChildrenTable.Field1)
     FROM ChildrenTable INNER JOIN
          GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId INNER JOIN
          AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
     WHERE ChildrenTable.ParentTableId = ParentTable.Id
       AND AnotherTable.Type = 1),
    ----
FROM ParentTable
WHERE some_conditions
Relationships:
ParentTable -> ChildrenTable = 1-to-many
ChildrenTable -> GrandChildrenTable = 1-to-many
GrandChildrenTable -> AnotherTable = 1-to-1
I am executing this query three times, while changing only the Type condition, and here are the results:
Number of records that are returned:
Condition    Total execution time (ms)
Type = 1     973
Type = 2     78810
Type = 3     648318
If I execute just the inner join query, here is the count of joined records:
SELECT p.Type, COUNT(*)
FROM CycleActivities ca INNER JOIN
CycleActivityProducts cap ON ca.Id = CAP.CycleActivityId INNER JOIN
Products p ON cap.ProductId = p.Id
GROUP BY p.Type
Type  COUNT(*)
----  -----------
1     55152
2     13401
4     102730
So why would the query with the Type = 1 condition execute so much faster than the query with Type = 2, although it is querying a 4x larger result set (Type is tinyint)?

The way your query is written instructs SQL Server to execute the sub-query with JOIN for every row of the output.
This way it should be faster, if I understand what you want correctly (UPDATED):
WITH cte_parent AS (
    SELECT
        Id,
        SUM(ParentTable.Field1) AS Parent_Sum
    FROM ParentTable
    GROUP BY Id
),
cte_child AS (
    -- Aggregate once per parent instead of once per output row.
    SELECT
        ChildrenTable.ParentTableId AS Id,
        SUM(ChildrenTable.Field1) AS Child_Sum
    FROM ChildrenTable
    INNER JOIN GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId
    INNER JOIN AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
    WHERE AnotherTable.Type = 1
        AND some_conditions
    GROUP BY ChildrenTable.ParentTableId
)
SELECT cte_parent.Id, Parent_Sum, Child_Sum
FROM cte_parent
JOIN cte_child ON cte_parent.Id = cte_child.Id
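
To check that the rewrite returns the same numbers as the correlated subquery, here is a small, self-contained sketch using Python's sqlite3 module on invented data. It assumes the child table's foreign key is called ParentTableId, and it uses a LEFT JOIN so that parents without matching children still appear with a NULL sum, as they do with the subquery. SQLite's planner is not SQL Server's, so this only demonstrates equivalence of results, not the timing difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentTable (Id INTEGER PRIMARY KEY, Field1 INTEGER);
CREATE TABLE ChildrenTable (Id INTEGER PRIMARY KEY, ParentTableId INTEGER, Field1 INTEGER);
CREATE TABLE AnotherTable (Id INTEGER PRIMARY KEY, Type INTEGER);
CREATE TABLE GrandChildrenTable (Id INTEGER PRIMARY KEY, ChildrenTableId INTEGER, AnotherTableId INTEGER);
INSERT INTO ParentTable VALUES (1, 10), (2, 20);
INSERT INTO ChildrenTable VALUES (1, 1, 5), (2, 1, 7), (3, 2, 9);
INSERT INTO AnotherTable VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO GrandChildrenTable VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
""")

# Original shape: the subquery is evaluated per parent row.
correlated = conn.execute("""
SELECT p.Id, p.Field1,
       (SELECT SUM(c.Field1)
        FROM ChildrenTable c
        JOIN GrandChildrenTable g ON c.Id = g.ChildrenTableId
        JOIN AnotherTable a ON g.AnotherTableId = a.Id
        WHERE c.ParentTableId = p.Id AND a.Type = 1) AS Child_Sum
FROM ParentTable p
ORDER BY p.Id
""").fetchall()

# Rewritten shape: aggregate the children once, then join.
preaggregated = conn.execute("""
WITH cte_child AS (
    SELECT c.ParentTableId AS Id, SUM(c.Field1) AS Child_Sum
    FROM ChildrenTable c
    JOIN GrandChildrenTable g ON c.Id = g.ChildrenTableId
    JOIN AnotherTable a ON g.AnotherTableId = a.Id
    WHERE a.Type = 1
    GROUP BY c.ParentTableId
)
SELECT p.Id, p.Field1, cte_child.Child_Sum
FROM ParentTable p
LEFT JOIN cte_child ON cte_child.Id = p.Id
ORDER BY p.Id
""").fetchall()

conn.close()
```

Both queries return one row per parent with the same child sum; parent 2 has no Type = 1 children, so its sum is NULL in both.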

Linq to Entities compare DateTime in Sub Query

I'm trying to do this subquery:
var query =
from cjto in oContext.t_table_1
join cav in oContext.t_table_2 on cjto.cd_code equals cav.cd_code
where cav.dt_time >=
(from tu in oContext.t_table3
where tu.vl_code == "ABCD"
select tu.dt_check_time)
select cav;
However, I get the error:
Operator '>=' cannot be applied to operands of type 'System.DateTime' and 'System.Linq.IQueryable<System.DateTime?>'
How can I implement such query?
Tks

OK, I got it. The subquery returns an IQueryable<DateTime?>, which cannot be compared with a DateTime, so I needed to add FirstOrDefault() to get the first element:
var query =
from cjto in oContext.t_table_1
join cav in oContext.t_table_2 on cjto.cd_code equals cav.cd_code
where cav.dt_time >=
(from tu in oContext.t_table3
where tu.vl_code == "ABCD"
select tu.dt_check_time).FirstOrDefault()
select cav;
Tks
