I keep getting "No matching signature for operator = for argument types: STRING, INT64. Supported signature: ANY = ANY at [18:54]" in BigQuery - join

SELECT
station_id,
name,
number_of_rides AS number_of_rides_starting_at_station
FROM (
SELECT
start_station_id,
COUNT(*) number_of_rides
FROM bigquery-public-data.new_york.citibike_trips AS trips
GROUP BY start_station_id
) AS station_num_trips
INNER JOIN bigquery-public-data.new_york.citibike_stations
ON station_id = start_station_id
ORDER BY number_of_rides DESC
I keep getting
No matching signature for operator = for argument types: STRING, INT64. Supported signature: ANY = ANY at [18:54] in Big Query
I tried CAST to change the station_id to a string but it already is a string.
What am I doing wrong?

It looks like one of the two join columns is a STRING while the other is an INT64. BigQuery does not implicitly coerce between these types when comparing values, so you have to cast them explicitly in your query:
SELECT
SAFE_CAST(station_id as INT64) as station_id,
name,
number_of_rides AS number_of_rides_starting_at_station
FROM (
SELECT
start_station_id,
COUNT(*) number_of_rides
FROM bigquery-public-data.new_york.citibike_trips AS trips
GROUP BY start_station_id
) AS station_num_trips
INNER JOIN bigquery-public-data.new_york.citibike_stations
ON SAFE_CAST(station_id AS INT64) = SAFE_CAST(start_station_id AS INT64)
ORDER BY number_of_rides DESC

Related

Pig Join by using OR conditional operator throws error

child = load 'file_name' using PigStorage('\t') as (child_code : chararray, child_id : int, child_precode_id : int);
parents = load 'file_name' using PigStorage('\t') as (child_id : int, child_internal_id : chararray, mother_id : int, father_id : int);
joined = JOIN child by child_id, parents by child_id;
mainparent = FOREACH joined GENERATE child_id as child_id_source, child_precode_id, child_code;
store mainparent into '(location of file)' using PigStorage('\t');
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id OR father_id);
firstgen = FOREACH childfirst GENERATE child_id, child_precode_id, child_code;
store firstgen into 'file_location' using PigStorage('\t');
Getting the following error when I use the OR condition:
ERROR org.apache.pig.PigServer - exception during parsing: Error
during parsing. Pig script failed to parse:
NoViableAltException(91#[]) Failed to parse: Pig script failed to
parse: NoViableAltException(91#[])
The syntax below is incorrect; there is no conditional join in Pig:
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id OR father_id);
If you would like to join a relation on one key with another relation on either of two keys, create two joins and union the results. Note that you might have to apply DISTINCT to the resulting relation, because a row that matches on both keys appears in both joins.
childfirst = JOIN mainparent by (child_id_source), parents by (mother_id);
childfirst1 = JOIN mainparent by (child_id_source), parents by (father_id);
childfirst2 = UNION childfirst,childfirst1;
childfirst3 = DISTINCT childfirst2;
firstgen = FOREACH childfirst3 GENERATE child_id, child_precode_id, child_code;
store firstgen into 'file_location' using PigStorage('\t');
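The reasoning behind the two-joins-plus-union approach can be checked outside Pig. The following is a minimal Python sketch, not Pig code: the sample rows and the `equi_join` helper are hypothetical, made up only to show that two equijoins followed by UNION and DISTINCT give the same rows as a join on `mother_id OR father_id`.

```python
# Hypothetical sample rows standing in for the Pig relations.
# mainparent: (child_id_source, child_precode_id, child_code)
mainparent = [(1, 10, "A"), (2, 20, "B")]
# parents: (child_id, child_internal_id, mother_id, father_id)
parents = [(7, "x", 1, 2), (8, "y", 1, 1), (9, "z", 3, 4)]


def equi_join(left, right, left_idx, right_idx):
    """Inner join of two lists of tuples on one column from each side."""
    index = {}
    for row in right:
        index.setdefault(row[right_idx], []).append(row)
    return [l + r for l in left for r in index.get(l[left_idx], [])]


by_mother = equi_join(mainparent, parents, 0, 2)  # join on mother_id
by_father = equi_join(mainparent, parents, 0, 3)  # join on father_id

# UNION keeps duplicates: a row matching on both keys appears twice.
unioned = by_mother + by_father
# DISTINCT removes those duplicates.
distinct = sorted(set(unioned))

# Reference result: the OR condition evaluated directly.
or_join = sorted(
    l + r for l in mainparent for r in parents if l[0] in (r[2], r[3])
)
assert distinct == or_join
```

In this sample the parent row with `mother_id` and `father_id` both equal to 1 is the one that shows up in both joins, which is why the DISTINCT step matters.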

Psql - How to skip row when error in address during Geocoding

I am geocoding hundreds of thousands of records. While this query is running, if the address in a particular row does not produce a lat/long value, the query fails with the error invalid input syntax for integer: "J199". So if this call
geocode_intersection(crashroad,crashreferenceroad,state,city,'',1)
produces a value like "J199", that row should be skipped. How can I do this?
update nj.condition_3
set (rating,new_address,points) = ( COALESCE((g.geo).rating,-1),pprint_addy((g.geo).addy),st_astext(ST_SnapToGrid((g.geo).geomout, 0.000001)))
-- Replace in limit value if error occurs
FROM (SELECT addid FROM nj.condition_3 WHERE rating IS NULL ORDER BY addid LIMIT 3) As a
LEFT JOIN (SELECT addid, (geocode_intersection(crashroad,crashreferenceroad,state,city,'',1)) As geo
-- Replace in limit value if error occurs
FROM nj.condition_3 As ag WHERE ag.rating IS NULL ORDER BY addid LIMIT 3) As g ON a.addid = g.addid
WHERE a.addid = nj.condition_3.addid;
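I have written a function to work around this error, and it now works fine. Each row is geocoded inside its own BEGIN ... EXCEPTION block, so a row that raises an error is skipped and the loop continues with the next one: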
CREATE OR REPLACE FUNCTION geocode_all_values() RETURNS VOID AS
$$
DECLARE
r record;
g record;
BEGIN
FOR r IN select * from TableName where rating is null order by Sno
LOOP
BEGIN
FOR g IN select * from geocode_intersection(r.Street1,r.Street2,r.state,r.city,'',1)
LOOP
update TableName
set new_address = pprint_addy(g.addy),
rating = g.rating,
points = ST_AsTEXT(g.geomout)
where sno = r.sno;
END LOOP;
EXCEPTION WHEN OTHERS THEN
-- geocoding failed for this row: skip it and continue with the next one
NULL;
END;
END LOOP;
END;
$$
LANGUAGE plpgsql;
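The pattern in the function above (one guarded block per row, so a failure skips only that row) can be sketched in Python as follows. This is an illustration only: `fake_geocode` and the sample rows are hypothetical stand-ins, not the PostGIS `geocode_intersection` function.

```python
def fake_geocode(route):
    """Hypothetical stand-in for the geocoder: fails on non-numeric input."""
    # int("J199") raises ValueError, mimicking the
    # 'invalid input syntax for integer' failure.
    return {"rating": 1, "route": int(route)}


# Hypothetical rows that still need geocoding.
rows = [
    {"sno": 1, "route": "199"},
    {"sno": 2, "route": "J199"},
    {"sno": 3, "route": "42"},
]

results = {}  # sno -> geocoding result
skipped = []  # sno values whose geocoding failed

for row in rows:
    try:
        results[row["sno"]] = fake_geocode(row["route"])
    except ValueError:
        # Only this row is skipped; the loop carries on with the next one.
        skipped.append(row["sno"])
```

The sketch catches only the specific error it expects and records the skipped rows. The PL/pgSQL function catches OTHERS, which also hides unrelated failures; if the error being skipped is SQLSTATE 22P02, a narrower handler such as WHEN invalid_text_representation may be enough, though that is an assumption to verify against the actual error.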

Distinct with order by for different columns and hstore rails postgres

I'm trying to write this query.
I have the models User, Profile, and Integration.
Profile has an hstore column meta_data, which has a key twitter_followers.
This is my current query, which I want to add an ordering to:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).page(params[:page]).per_page(50)
Ideally:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).reorder("CAST(meta_data -> '#{params[:sort_by]}' AS INT) DESC NULLS LAST").page(params[:page]).per_page(50)
But get this error:
PG::InvalidColumnReference: ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
I tried this:
current_user.profiles.where(found: true).select("DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.*").includes(:integrations).reorder("profiles.id, CAST(meta_data -> '#{params[:sort_by]}' AS INT) DESC NULLS LAST").page(params[:page]).per_page(50)
.to_sql
=> "SELECT DISTINCT ON(profiles.id) profiles.id, *, integration_profiles.data as integration_profiles_data, integrations.provider as integration_providers, profiles.* FROM \"profiles\" INNER JOIN \"integration_profiles\" ON \"profiles\".\"id\" = \"integration_profiles\".\"profile_id\" INNER JOIN \"integrations\" ON \"integration_profiles\".\"integration_id\" = \"integrations\".\"id\" WHERE \"integrations\".\"user_id\" = $1 AND \"profiles\".\"found\" = 't' ORDER BY profiles.id, CAST(meta_data -> 'twitter_followers' AS INT) DESC NULLS LAST"
And no dice. It removed the error but didn't order in any form!

Strange execution time for summary query

I am giving here part of the query I am executing:
SELECT SUM(ParentTable.Field1),
(SELECT SUM(ChildrenTable.Field1)
FROM ChildrenTable INNER JOIN
GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId INNER JOIN
AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
WHERE ChildrenTable.ParentTableId = ParentTable.Id
AND AnotherTable.Type=1),
----
FROM ParentTable
WHERE some_conditions
Relationships:
ParentTable -> ChildrenTable = 1-to-many
ChildrenTable -> GrandChildrenTable = 1-to-many
GrandChildrenTable -> AnotherTable = 1-to-1
I am executing this query three times, while changing only the Type condition, and here are the results:
Number of records that are returned:
Condition Total execution time (ms)
Type = 1 : 973
Type = 2 : 78810
Type = 3 : 648318
If I execute just the inner join query, here is the count of joined records:
SELECT p.Type, COUNT(*)
FROM CycleActivities ca INNER JOIN
CycleActivityProducts cap ON ca.Id = CAP.CycleActivityId INNER JOIN
Products p ON cap.ProductId = p.Id
GROUP BY p.Type
Type
---- -----------
1 55152
2 13401
4 102730
So why would the query with the Type = 1 condition execute so much faster than the query with Type = 2, even though it reads a 4x larger result set? (Type is a tinyint.)
As written, your query instructs SQL Server to execute the sub-query, with its JOINs, for every row of the output.
If I understand correctly what you want, this form should be faster (UPDATED):
with cte_parent as (
select
Id,
SUM (ParentTable.Field1) as Parent_Sum
from ParentTable
group by Id
),
cte_child as (
SELECT
-- group by the parent key so the result can be joined back to cte_parent
ChildrenTable.ParentTableId as Id,
SUM (ChildrenTable.Field1) as Child_Sum
FROM ChildrenTable
INNER JOIN
GrandChildrenTable ON ChildrenTable.Id = GrandChildrenTable.ChildrenTableId
INNER JOIN
AnotherTable ON GrandChildrenTable.AnotherTableId = AnotherTable.Id
WHERE
AnotherTable.Type=1
AND
some_conditions
GROUP BY ChildrenTable.ParentTableId
)
select cte_parent.Id, Parent_Sum, Child_Sum
from cte_parent
join cte_child on cte_parent.Id = cte_child.Id
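To see that the correlated sub-query and the pre-aggregated form compute the same sums, here is a minimal sketch using Python's built-in sqlite3 module instead of SQL Server. The table contents are hypothetical, the `ParentTableId` column name is assumed from the question, and the sketch uses a LEFT JOIN so that parents without matching children are kept, as they are in the correlated form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ParentTable (Id INTEGER, Field1 INTEGER);
CREATE TABLE ChildrenTable (Id INTEGER, ParentTableId INTEGER, Field1 INTEGER);
CREATE TABLE GrandChildrenTable (Id INTEGER, ChildrenTableId INTEGER, AnotherTableId INTEGER);
CREATE TABLE AnotherTable (Id INTEGER, Type INTEGER);
-- Hypothetical sample data.
INSERT INTO ParentTable VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO ChildrenTable VALUES (10, 1, 5), (11, 1, 7), (12, 2, 9);
INSERT INTO GrandChildrenTable VALUES (100, 10, 1000), (101, 10, 1001),
                                      (102, 11, 1000), (103, 12, 1001);
INSERT INTO AnotherTable VALUES (1000, 1), (1001, 2);
""")

# Form 1: correlated sub-query, evaluated per parent row.
correlated = conn.execute("""
SELECT p.Id, p.Field1,
       (SELECT SUM(c.Field1)
        FROM ChildrenTable c
        JOIN GrandChildrenTable g ON c.Id = g.ChildrenTableId
        JOIN AnotherTable a ON g.AnotherTableId = a.Id
        WHERE c.ParentTableId = p.Id AND a.Type = 1) AS Child_Sum
FROM ParentTable p
ORDER BY p.Id
""").fetchall()

# Form 2: aggregate the children once, then join the result to the parents.
pre_aggregated = conn.execute("""
WITH cte_child AS (
    SELECT c.ParentTableId AS Id, SUM(c.Field1) AS Child_Sum
    FROM ChildrenTable c
    JOIN GrandChildrenTable g ON c.Id = g.ChildrenTableId
    JOIN AnotherTable a ON g.AnotherTableId = a.Id
    WHERE a.Type = 1
    GROUP BY c.ParentTableId
)
SELECT p.Id, p.Field1, cc.Child_Sum
FROM ParentTable p
LEFT JOIN cte_child cc ON cc.Id = p.Id
ORDER BY p.Id
""").fetchall()

assert correlated == pre_aggregated
```

With an inner join instead of the LEFT JOIN, parents that have no children of the requested Type would drop out of the result, which may or may not be what the original query intends.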

Linq to Entities compare DateTime in Sub Query

I'm trying to do this subquery:
var query =
from cjto in oContext.t_table_1
join cav in oContext.t_table_2 on cjto.cd_code equals cav.cd_code
where cav.dt_time >=
(from tu in oContext.t_table3
where tu.vl_code == "ABCD"
select tu.dt_check_time)
select cav;
However, I get the error:
Operator '>=' cannot be applied to operands of type 'System.DateTime' and 'System.Linq.IQueryable<System.DateTime?>'
How can I implement such query?
Tks
OK, I got it. The subquery returns an IQueryable<DateTime?> rather than a single value, so I needed to add FirstOrDefault() to get the first element:
var query =
from cjto in oContext.t_table_1
join cav in oContext.t_table_2 on cjto.cd_code equals cav.cd_code
where cav.dt_time >=
(from tu in oContext.t_table3
where tu.vl_code == "ABCD"
select tu.dt_check_time).FirstOrDefault()
select cav;
Tks

Resources