Kylin error: Column 'STATE_NAME' not found in any table

I followed the Kylin tutorial and was able to create a Kylin model and a Kylin cube successfully. The cube build also completed successfully.
I created one fact table and one lookup table as follows:
create table sales_fact(product_id int,state_id int,location_id string,sales_count int)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;
create table state_details(state_id int,state_name string)
row format delimited
fields terminated by ','
lines terminated by '\n'
stored as textfile;
I loaded these tables with the following data:
fact_table
1000,1,AP1,50
1000,2,KA1,100
1001,2,KA1,50
1002,1,AP1,50
1003,3,TL1,100
state_details
1,AP
2,Karnataka
3,Telangana
4,kerala
But when I run a simple query such as
select sales_count from sales_fact where state_name="Karnataka";
it fails with this error:
Error while executing SQL "select sales_count from sales_fact where state_name="Karnataka" LIMIT 50000": From line 1, column 42 to line 1, column 51: Column 'STATE_NAME' not found in any table
I am not able to find the cause. If anybody has an idea, please tell me.

The column state_name is not in the table sales_fact; it belongs to state_details. Join the two tables on state_id instead:
select sales_count from sales_fact as f inner join state_details as d on f.state_id = d.state_id where d.state_name='Karnataka';
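The failure and the fix can be reproduced outside Kylin. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Kylin (an assumption, so the error text differs from Kylin's), loads the sample rows from the question, and shows that the unjoined query fails while the joined query returns the Karnataka rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Kylin/Hive (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (product_id INT, state_id INT, location_id TEXT, sales_count INT);
CREATE TABLE state_details (state_id INT, state_name TEXT);
""")

# Sample data copied from the question.
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(1000, 1, "AP1", 50), (1000, 2, "KA1", 100), (1001, 2, "KA1", 50),
     (1002, 1, "AP1", 50), (1003, 3, "TL1", 100)],
)
conn.executemany(
    "INSERT INTO state_details VALUES (?, ?)",
    [(1, "AP"), (2, "Karnataka"), (3, "Telangana"), (4, "kerala")],
)

# The original query filters on a column that sales_fact does not have.
try:
    conn.execute("SELECT sales_count FROM sales_fact WHERE state_name = 'Karnataka'")
    unjoined_error = None
except sqlite3.OperationalError as exc:
    unjoined_error = str(exc)

# Joining to state_details makes state_name available.
joined_rows = conn.execute(
    "SELECT f.sales_count FROM sales_fact AS f "
    "INNER JOIN state_details AS d ON f.state_id = d.state_id "
    "WHERE d.state_name = 'Karnataka' "
    "ORDER BY f.product_id"
).fetchall()

print(unjoined_error)
print(joined_rows)
```

Note that in Kylin the join also has to be one that the cube's model defines, otherwise the query may not match any cube.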

BigQuery SQL table function with string interpolation

I am trying to write a BigQuery SQL function / stored procedure / table function that accepts as input:
an INT64 filter for the WHERE clause,
a table name (STRING type) as fully qualified name e.g. project_id.dataset_name.table_name
The idea is to dynamically figure out the table name and provide a filter to slice the data to return as a table.
However, if I try to write a table function (TVF) and use SET to start building the SQL to execute dynamically, I see this error:
Syntax error: Expected "(" or keyword SELECT or keyword WITH but got keyword SET at [4:5]
If I try to write a stored procedure, then it expects BEGIN and END and throws this error:
Syntax error: Expected keyword BEGIN or keyword LANGUAGE but got keyword AS at [3:1]
If I add those, I get various validation errors, basically because I would need to remove the WITH clauses for the CTEs (common table expressions), the semicolons, and so on.
But what I am really trying to do is using a table function:
to combine some CTEs dynamically with those inputs above (e.g. the input table name),
to PIVOT that data,
to then eventually return a table as a result of a SELECT.
A bit like producing a View that could be used in other SQL queries, but without creating the view (because the slice of data can be decided dynamically with the other INT64 input filter).
Once I dynamically build the SQL string I would like to EXECUTE IMMEDIATE that SQL and provide a SELECT as a final step of the table function to return the "dynamic table".
The thing is that:
I don't know before runtime the name of this table.
But I have all these tables with the same structure, so the SQL should apply to all of them.
Is this possible at all?
This is the not-so-working SQL I am trying to work around. See what I am trying to inject with %s and num_days:
CREATE OR REPLACE TABLE FUNCTION `my_dataset.my_table_func_name`(num_days INT64, fqn_org_table STRING)
AS (
-- this SET breaks !!!
SET f_query = """
WITH report_cst_t AS (
SELECT
DATE(start) as day,
entity_id,
conn_sub_type,
FROM `%s` AS oa
CROSS JOIN UNNEST(oa.connection_sub_type) AS conn_sub_type
WHERE
DATE(start) > DATE_SUB(CURRENT_DATE(), INTERVAL num_days DAY)
AND oa.entity_id IN ('my-very-long-id')
ORDER BY 1, 2 ASC
),
cst AS (
SELECT * FROM
(SELECT day, entity_id, report_cst_t FROM report_cst_t)
PIVOT (COUNT(*) AS connection_sub_type FOR report_cst_t.conn_sub_type IN ('cat1', 'cat2','cat3' ))
)
""";
-- here I would like to EXECUTE IMMEDIATE !!!
SELECT
cst.day,
cst.entity_id,
cst.connection_sub_type_cat1 AS cst_cat1,
cst.connection_sub_type_cat2 AS cst_cat2,
cst.connection_sub_type_cat3 AS cst_cat3,
FROM cst
ORDER BY 1, 2 ASC
);
This might not be satisfying, but since procedural language statements and DDL are currently not allowed inside table functions, one possible workaround is simply to use a PROCEDURE, as below.
CREATE OR REPLACE PROCEDURE my_dataset.temp_procedure(filter_value INT64, table_name STRING)
BEGIN
EXECUTE IMMEDIATE FORMAT(CONCAT(
"SELECT year, COUNT(1) as record_count, ",
"FROM %s ",
"WHERE year = %d ",
"GROUP BY year ",
"; "
), table_name, filter_value);
END;
CALL my_dataset.temp_procedure(2002, 'bigquery-public-data.usa_names.usa_1910_current');
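Because the table name is spliced into the SQL text, it may be worth validating it before it reaches the procedure. The following is a minimal client-side sketch in Python, not part of BigQuery: the helper name build_query and the deliberately conservative name pattern are assumptions for illustration, and the pattern rejects some table names that BigQuery itself would accept.

```python
import re

# Conservative pattern for project.dataset.table (assumption: real BigQuery
# naming rules are broader; this only accepts a safe subset).
_TABLE_FQN = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_]+\.[A-Za-z0-9_]+")


def build_query(table_name: str, filter_value: int) -> str:
    """Return the SQL text that the FORMAT call in the procedure would produce."""
    if not _TABLE_FQN.fullmatch(table_name):
        raise ValueError(f"unexpected table name: {table_name!r}")
    return (
        "SELECT year, COUNT(1) AS record_count "
        # Backticks keep names containing '-' valid.
        f"FROM `{table_name}` "
        f"WHERE year = {int(filter_value)} "
        "GROUP BY year"
    )


sql = build_query("bigquery-public-data.usa_names.usa_1910_current", 2002)
print(sql)
```

The filter value is coerced with int() so that only a number can end up in the WHERE clause.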

neo4j - Exporting data to SQL/MS Access via apoc.load.jdbcUpdate

I am struggling to dynamically build a CREATE or ALTER statement from a list of keys(n) results to pass to the apoc.load.jdbcUpdate procedure. The procedure works very well for simple CREATE and INSERT statements.
The challenge I have is the need to export numerous lists of keys(n) in order to build a table in SQL/MS Access. Is there a way to "loop" through this list or collection to dynamically build this statement? This would be followed by a similar dynamically built INSERT statement to insert the records themselves.
My actual data consists of several neo4j queries to get keys(n) of various nodes and combines them into a long list. I'll likely have 50-200 columns for each export and every column will be unique or different with only about 10 columns common to every export. At present, I can make every column TEXT 12 but next steps will be getting the associated Type(n) for each Keys(n).
Another option I tried was to ALTER an existing "template" table; however, the same issue with "looping" through the columns persists.
Many thanks in advance!
:param exportUrl => 'jdbc:ucanaccess:///artifacts/export/sample.accdb'
WITH ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG', 'colH', 'colI', 'colJ'] AS listCOLS
UNWIND listCOLS as columnNAMES
WITH columnNAMES,
'CREATE Table TBL_ATTR ([colA] TEXT(12), [colB] Double, [colC] TEXT(36), [colD] TEXT(38), [colE] Integer, [colF] TEXT(24), ' +
'[colG] TEXT(20), [colH] TEXT(2), [colI] TEXT(4), [colJ] TEXT(50))' as statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
return COUNT(row);
The code above seems to run correctly and CREATE the table; however, I do receive the following exception:
Failed to invoke procedure `apoc.load.jdbcUpdate`: Caused by: org.hsqldb.HsqlException: object name already exists: TBL_NEW
It appears I now have something that works: a CREATE statement followed by an ALTER TABLE statement for each element in the rows (after UNWIND).
Effectively, I unwind the list into rows, have the apoc procedure create the table with a single default column, and then run an ALTER TABLE command for each column, which also lets me set the data type per column (see the example below).
It does seem redundant to send each row to apoc.load.jdbcUpdate(), but it gives me some control and doesn't really impact speed (it creates ~100 columns in 54 ms for a larger data set, which doesn't seem too bad).
:param exportUrl => 'jdbc:ucanaccess:///artifacts/export/sample.accdb'
WITH ['colA', 'colB', 'colC', 'colD', 'colE', 'colF', 'colG', 'colH', 'colI', 'colJ'] AS listCOLS
UNWIND listCOLS as columnNAMES
WITH columnNAMES,
'CREATE Table TBL_ATTR ([DEFAULT] TEXT(12))' as statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
WITH columnNAMES,
CASE columnNAMES //modify data types for various data
WHEN "colB" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' DOUBLE'
WHEN "colC" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' INTEGER'
WHEN "colH" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' DATE'
WHEN "colJ" THEN 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' TEXT(12)'
ELSE 'ALTER TABLE TBL_ATTR ADD COLUMN ' + columnNAMES + ' TEXT(25)'
END as statement
WITH columnNAMES, statement
CALL apoc.load.jdbcUpdate($exportUrl, statement) YIELD row
return count(row);
While the query executes and I can see the columns correctly reflected in the export file, I do receive an error, and I am not sure what is throwing the exception:
Failed to invoke procedure `apoc.load.jdbcUpdate`: Caused by: org.hsqldb.HsqlException: object name already exists: TBL_ATTR
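One plausible cause of the exception: in Cypher, a CALL placed after UNWIND runs once per row, so the same CREATE statement is submitted once per column name, and the second submission finds the table already there. An alternative is to assemble the whole CREATE TABLE statement once from the column list. The sketch below shows the idea in Python; the helper build_create_table and the type mapping are hypothetical, and SQLite is used only as a stand-in for the Access/HSQLDB target to show the "already exists" failure.

```python
import sqlite3

# Hypothetical per-column type overrides, taken from the CASE expression above.
DEFAULT_TYPE = "TEXT(25)"
TYPE_OVERRIDES = {"colB": "DOUBLE", "colC": "INTEGER", "colH": "DATE", "colJ": "TEXT(12)"}


def build_create_table(table, columns, overrides=TYPE_OVERRIDES, default=DEFAULT_TYPE):
    """Build one CREATE TABLE statement covering every column in the list."""
    column_defs = ", ".join(
        f"[{name}] {overrides.get(name, default)}" for name in columns
    )
    return f"CREATE TABLE {table} ({column_defs})"


columns = ["colA", "colB", "colC"]
statement = build_create_table("TBL_ATTR", columns)
print(statement)

# Running the same CREATE twice fails, which mirrors what a per-row CALL does.
conn = sqlite3.connect(":memory:")
conn.execute(statement)
try:
    conn.execute(statement)
    second_error = None
except sqlite3.OperationalError as exc:
    second_error = str(exc)
print(second_error)
```

In Cypher, the same effect could be obtained by building the statement string from the list before any UNWIND, so that the CALL receives a single row.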

Link Sheets and Query with timestamps

I'm trying to link a custom query, but since I'm new to this, I'm getting stuck when Sheets tries to read timestamps.
I'm getting the following error:
Something's wrong. Please try again later: Error while parsing the query: Syntax error: Table name contains '-' character. It needs to be quoted: 'xxxxxxxxxx' [at 3:10]
SELECT COUNT(DISTINCT sequence) as aaaaaaaa,bbbbbbb,
EXTRACT (date from CreationDateBR) as data,
FROM xxxxxxxxxxxx
WHERE CreationDateBR BETWEEN '2021-01-01' AND '2022-01-01'
AND loja IN ('Marketplace C')
AND parceiros NOT IN ('partner A')
GROUP BY parceiros,data
ORDER BY data,parceiros asc
The error message states that the issue is in the table name. You can try to put backticks around it (`):
FROM `xxxxxxxxxxxx`
A good trick to ensure the proper formatting is to go to your table in the BigQuery UI and click on "query this table": the proper formatting will open in a new query editor.

Ambiguous column error creating table in Aster Studio 6.0

I am new to databases and am posting a problem from work. I am creating a table in Aster Studio 6.0, but got an error about an ambiguous column. I ran the same query in Teradata SQL Assistant and did not get an error.
I have six tables with millions of rows named EDW.SWIFTIQ_TRANS_DTL, EDW.SWIFTIQ_STORE, EDW.SWIFTIQ_PROD, EDW.STORE_XREF, EDW.TDLNX_STR_OUTLT, and EDW.SURV_CWC.
EDW represents the original database, but the columns were labeled with aliases.
I did a trim() on the VARCHAR columns for saving spool space. For the error about TDLNX_RTL_OUTLT_NBR, I performed an INNER JOIN on similar columns from two different tables. Doing a preview in SQL Assistant, there was a temporary table with only one column called TDLNX_RTL_OUTLT_NBR.
Here’s the SQL query:
CREATE TABLE public.table_name
DISTRIBUTE BY HASH (SRC_SYS_PROD_ID) AS (
SELECT * FROM load_from_teradata(
ON public.load_from_teradata_dummy
TDPID('database_name')
USERNAME('user_name')
PASSWORD('ss')
QUERY ('SELECT e.TDLNX_RTL_OUTLT_NBR, e.OUTLT_ST_ADDR_TXT, e.STORE_OUTLT_ZIP_CD, d.TRANS_ID, d.TRANS_DT,
d.TRANS_TM, d.UNIT_QTY, d.SRC_SYS_STORE_ID, d.SRC_SYS_PROD_ID, d.SRC_SYS_NM, a.SRC_SYS_STORE_ID, a.SRC_SYS_NM, a.STORE_NM,
a.CITY_NM, a.ZIP_CD, a.ST_cd, p.SRC_SYS_PROD_ID, p.SRC_SYS_NM, p.UPC_CD, p.PROD_ID, f.SRC_SYS_STORE_ID, f.SRC_SYS_NM,
f.TDLNX_RTL_OUTLT_NBR, g.SURV_CWC_WSLR_CUST_PARTY_ID, g.AGE_CD, g.HIGH_END_ACCT_FLG, g.RACE_ETHNC_CD, g.OCCPN_CD
FROM EDW.SWIFTIQ_TRANS_DTL d
INNER JOIN EDW.SWIFTIQ_STORE a
ON trim( a.SRC_SYS_STORE_ID) = trim(d.SRC_SYS_STORE_ID)
INNER JOIN EDW.SWIFTIQ_PROD p
ON trim(p.SRC_SYS_PROD_ID) = trim(d.SRC_SYS_PROD_ID)
and p.SRC_SYS_NM = d.SRC_SYS_NM
INNER JOIN EDW.STORE_XREF f
ON trim(f.SRC_SYS_STORE_ID) = trim(a.SRC_SYS_STORE_ID)
INNER JOIN EDW.TDLNX_STR_OUTLT e
ON trim(e.TDLNX_RTL_OUTLT_NBR)= trim(f.TDLNX_RTL_OUTLT_NBR)
INNER JOIN EDW.SURV_CWC g
ON g.SURV_CWC_WSLR_CUST_PARTY_ID = e.WSLR_CUST_PARTY_ID
WHERE TRANS_DT between ''2015-01-01'' and ''2015-03-31''')
num_instances('4') ) );
ERROR: column reference 'TDLNX_RTL_OUTLT_NBR' is ambiguous.
EDIT: Forgot to include a description about the table aliases. a stands for EDW.SWIFTIQ_STORE, p for EDW.SWIFTIQ_PROD, f for EDW.STORE_XREF, e for EDW.TDLNX_STR_OUTLT, g for EDW.SURV_CWC, and d for EDW.SWIFTIQ_TRANS_DTL.
You will get the same error when you try CREATE TABLE AS SELECT in Teradata. Several column names (TDLNX_RTL_OUTLT_NBR, SRC_SYS_NM, SRC_SYS_PROD_ID, and SRC_SYS_STORE_ID) are used multiple times, with different table aliases, within the SELECT.
Add column aliases to make those names unique, e.g. trans_SRC_SYS_NM instead of d.SRC_SYS_NM.
Additionally, the TRIMs in the joins are a very bad idea. You will probably not save much spool, and you force the optimizer to redistribute all spools during join preparation.
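With this many columns, the aliases can be generated rather than typed by hand. The following is a small sketch in Python, assuming the select list is available as (table alias, column name) pairs; the helper alias_select_list and the alias_column naming scheme are hypothetical choices, not anything Aster or Teradata prescribes.

```python
from collections import Counter


def alias_select_list(columns):
    """Return a select list in which every repeated column name gets a unique alias.

    columns: list of (table_alias, column_name) pairs.
    """
    # Count names case-insensitively, since SQL identifiers usually are.
    counts = Counter(name.upper() for _, name in columns)
    parts = []
    for table_alias, name in columns:
        ref = f"{table_alias}.{name}"
        if counts[name.upper()] > 1:
            # Prefix with the table alias so the output name is unique.
            parts.append(f"{ref} AS {table_alias}_{name}")
        else:
            parts.append(ref)
    return ", ".join(parts)


select_list = alias_select_list([
    ("e", "TDLNX_RTL_OUTLT_NBR"),
    ("d", "TRANS_ID"),
    ("f", "TDLNX_RTL_OUTLT_NBR"),
])
print(select_list)
```

Columns that appear only once keep their original name, so existing downstream references to them are unaffected.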

Hive Join returning zero records

I have two Hive tables and I am trying to join both of them. The tables are not clustered or partitioned by any field. Though the tables contain records for common key fields, the join query always returns 0 records. All the data types are 'string' data types.
The join query is simple and looks something like this:
select count(*) cnt
from
fsr.xref_1 A join
fsr.ipfile_1 B
on
(
A.co_no = B.co_no
)
;
Any idea what could be going wrong? I have just one record (same value) in both the tables.
Below are my table definitions
CREATE TABLE xref_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
CREATE TABLE ipfile_1
(
co_no string
)
clustered by (co_no) sorted by (co_no asc) into 10 buckets
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Hi. You are using a star schema join. Please write your query like this:
SELECT COUNT(*) cnt FROM A a JOIN B b ON (a.key1 = b.key1);
If you still have the issue, then use a map join:
set hive.auto.convert.join=true;
select count(*) from A a join B b on (a.key1 = b.key1);
