I want to query data from multiple models, group it, and order it. How do I do that with Realm?
In SQLite/MySQL I can use UNION to combine the queries and GROUP BY to group rows on a common field's value.
I'm switching to Realm and I'm now stuck on how to perform this.
Here is an example of the query in SQLite:
SELECT w1,abc('\(word)', kd) as lscore,freq FROM ng1 WHERE kd LIKE '\(beginchar)%\(lastchar)'
UNION
SELECT w2,abc('\(word)', kd) as lscore,freq FROM ng2 WHERE w1='\(lastword)' AND kd LIKE '\(beginchar)%\(lastchar)' ORDER BY lscore ASC,tp DESC,freq DESC LIMIT 0,4
Realm doesn't currently support UNIONs, so the usual workaround is to run each query separately and then combine, sort, and limit the results in application code.
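A sketch of that in-memory approach, assuming Realm model classes NG1 and NG2 mirroring the ng1/ng2 tables (all model and field names are assumptions), and the SQL abc() scoring function reimplemented as a Swift function:
let r1 = realm.objects(NG1.self)
    .filter("kd BEGINSWITH %@ AND kd ENDSWITH %@", beginchar, lastchar)
let r2 = realm.objects(NG2.self)
    .filter("w1 == %@ AND kd BEGINSWITH %@ AND kd ENDSWITH %@",
            lastword, beginchar, lastchar)

// Map both result sets to a common shape, then sort and limit in memory;
// this stands in for the UNION / ORDER BY / LIMIT of the SQL version
// (tie-breakers like tp DESC, freq DESC omitted for brevity).
let scored = Array(r1).map { (w: $0.w1, lscore: abc(word, $0.kd), freq: $0.freq) }
           + Array(r2).map { (w: $0.w2, lscore: abc(word, $0.kd), freq: $0.freq) }
let top4 = scored.sorted { $0.lscore < $1.lscore }.prefix(4)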
Let's say I have 10,000 stored procedures in my database, each with a thousand LOC. A few of these stored procedures use dynamic temp tables. I want to find out all those stored procedures which use dynamic temp tables.
Can someone help me with the most optimized query for this task?
You can start with something like this:
SELECT s.name As schema_name, p.name as procedure_name
FROM sys.procedures p
INNER JOIN sys.schemas s
ON p.schema_id = s.schema_id
WHERE object_definition(object_id) like '%create table #%'
Though I'm not sure what you mean by "dynamic temp tables", and I'm not sure about performance either, this should give you a list of all stored procedures that contain the text create table # - that is, the creation of a temporary table.
Update (again)
Now that you have clarified what you mean by "dynamic temp tables", all you need to do is change the condition in the where clause:
SELECT s.name As schema_name, p.name as procedure_name
FROM sys.procedures p
INNER JOIN sys.schemas s
ON p.schema_id = s.schema_id
WHERE object_definition(object_id) like '%[^insert] into #%'
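Here [^insert] is a single-character LIKE wildcard class matching any character that is not i, n, s, e, r, or t, so the pattern keeps SELECT ... INTO #... (a temp table whose structure is derived at run time from the SELECT list) while filtering out INSERT INTO #..., since insert ends in t. It is only a heuristic: it will also skip a SELECT ... INTO whose preceding token happens to end in one of those letters. A minimal, hypothetical procedure that would match (all names are assumptions):
CREATE PROCEDURE dbo.usp_example AS
BEGIN
    -- the temp table's columns and types are derived from the SELECT list
    SELECT col1, col2
    INTO #work
    FROM dbo.some_table;
END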
I am writing a Hive query to join two tables, table1 and table2. In the result I just need all columns from table1 and no columns from table2.
I know the solution where I select all the columns manually by specifying table1.column1, table1.column2, and so on in the SELECT statement. But table1 has about 22 columns. Also, I have to do the same for multiple other tables, and it's a painful process.
I tried using "SELECT table1.*", but I get a parse exception.
Is there a better way to do it?
From Hive 0.13 onwards, the following query syntax works:
SELECT a.* FROM a JOIN b ON (a.id = b.id)
This query will select all columns from a. So instead of typing all the column names (making the query cumbersome), it is a better idea to use tablealias.*
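If you later need a column or two from the other table as well, you can mix the wildcard with explicit columns (the extra column name here is an assumption):
SELECT a.*, b.some_col FROM a JOIN b ON (a.id = b.id)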
I have two huge tables in Hive, 'table 1' and 'table 2'. Both tables have a common column, 'key'.
I have queried 'table 1' with the desired conditions and created a DataFrame 'df1'.
Now, I want to query 'table 2' and want to use a column from 'df1' in the where clause.
Here is the code sample:
val df1 = hiveContext.sql("select * from table1 limit 100")
Can I do something like
val df2 = hiveContext.sql("select * from table2 where key = df1.key")
Note: I don't want to make a single query joining both tables.
Any help will be appreciated.
Since you have explicitly written that you do NOT want to join the tables, the short answer is "No, you cannot do such a query".
I'm not sure why you don't want to do the join, but it is definitely needed if you want to do this query. If you are worried about joining two "huge tables", then don't be. Spark was built for this kind of thing :)
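For reference, a minimal sketch of that join against the same hiveContext (the temp table name is an assumption; registerTempTable is the Spark 1.x API):
// expose the already-filtered df1 to SQL, then join on key
df1.registerTempTable("df1_filtered")
val df2 = hiveContext.sql(
  "select t2.* from table2 t2 join df1_filtered t1 on t2.key = t1.key")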
The solution that I found is the following. Let me first give the dataset sizes:
Dataset1 - pretty small (10 GB)
Dataset2 - big (500 GB+)
There are two solutions for DataFrame joins.
Solution 1
If you are using Spark 1.6+, repartition both DataFrames by the column on which the join has to be done. When I did this, the join completed in less than 2 minutes.
df.repartition(df("key"))
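Applied to both sides before the join, it looks something like this (variable names are assumptions):
// co-partition both DataFrames on the join key, then join
val df1p = df1.repartition(df1("key"))
val df2p = df2.repartition(df2("key"))
val joined = df1p.join(df2p, "key")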
Solution 2
Whether or not you are using Spark 1.6+, if one dataset is small, cache it and use it in a broadcast join:
import org.apache.spark.sql.functions.broadcast

df_small.cache()
df_big.join(broadcast(df_small), "key")
This was done in less than a minute.
I have two datasets, DS1 and DS2. DS1 is 100,000 rows x 40 cols, DS2 is 20,000 rows x 20 cols. I actually need to pull COL1 from DS1 if some fields match DS2.
Since I am very, very new to SAS, I am trying to stick to SQL logic.
So basically I did (short version):
proc sql;
...
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
on DS1.COL2=DS2.COL3
OR DS1.COL3=DS2.COL3
OR DS1.COL4=DS2.COL2
...
After an hour or so it was still running, and I was getting emails from SAS saying I was using 700 GB or so. Is there a better and faster SAS way of doing this operation?
I would use 3 separate queries and a UNION:
proc sql;
...
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
on DS1.COL2=DS2.COL3
UNION
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
On DS1.COL3=DS2.COL3
UNION
SELECT DS1.col1
FROM DS1 INNER JOIN DS2
ON DS1.COL4=DS2.COL2
...
You may have null or blank values in the columns you are joining on. Your query is probably matching all the null/blank columns together resulting in a very large result set.
I suggest adding additional clauses to exclude null results.
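For example, a sketch using the first join condition only; extend it the same way for the other column pairs:
proc sql;
    select DS1.col1
    from DS1 inner join DS2
        on DS1.COL2 = DS2.COL3
    /* in SAS, missing values compare equal, so filter them out */
    where DS1.COL2 is not missing
      and DS2.COL3 is not missing;
quit;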
Also - if the same row happens to exist in both tables, then you should also prevent the row from joining to itself.
Either of these could effectively result in a cartesian product join (or something close to a cartesian product join).
EDIT: By the way, a good way of debugging this type of problem is to limit both datasets to a certain number of rows - say 100 in each - then run the query and check that the output is what you expect. You can do this using the PROC SQL options inobs=, outobs=, and loops=.
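For example, to read at most 100 rows from each input table (the limit is illustrative):
proc sql inobs=100;  /* SAS prints a warning when the limit truncates input */
    select DS1.col1
    from DS1 inner join DS2
        on DS1.COL2 = DS2.COL3;
quit;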
First sort the datasets that you are trying to merge using PROC SORT, then merge them by ID.
Here is how you can do it. I have assumed your match field is ID.
proc sort data=DS1;
    by ID;
run;

proc sort data=DS2;
    by ID;
run;

data out;
    merge DS1 DS2;
    by ID;
run;
You can use PROC SORT on DS3 and DS4 and then include them in the MERGE statement if you need to join them as well, as sketched below.
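Assuming DS3 and DS4 also carry the ID column:
proc sort data=DS3; by ID; run;
proc sort data=DS4; by ID; run;

data out;
    merge DS1 DS2 DS3 DS4;
    by ID;
run;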
I have two tables - tool_downloads and tool_configurations. I am trying to retrieve the most recent build date for each tool in my database. The layout of the DB is simple. One table called tool_downloads keeps track of when a tool is downloaded. Another table is called tool_configurations and stores the actual data about the tool. They are linked together by the tool_conf_id.
If I run the following query which omits dates, I get back 200 records.
SELECT DISTINCT a.tool_conf_id, b.tool_conf_id
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
When I try to add in date information I get back hundreds of thousands of records! Here is the query that fails horribly.
SELECT DISTINCT a.tool_conf_id, max(a.configured_date) as config_date, b.configuration_name
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
ORDER BY a.tool_conf_id
I know the problem has something to do with group-bys/aggregate data and joins, but I can't really search Google since I don't know the name of the problem I'm encountering. Any help would be appreciated.
The problem is that max() is an aggregate function, so every non-aggregated column in the SELECT list must appear in a GROUP BY clause. The solution is:
SELECT b.tool_conf_id, b.configuration_name, max(a.configured_date) as config_date
FROM tool_downloads a
JOIN tool_configurations b
ON a.tool_conf_id = b.tool_conf_id
GROUP BY b.tool_conf_id, b.configuration_name