Is there any way I can achieve the following query with operator functions instead of QueryBuilder?
SELECT * FROM subscriptions WHERE year = 1980 OR year = 1990 OR year > 2000
The current solution doesn't support the last part, as the filter has been defined like this:
import { In } from 'typeorm';
filters['year'] = In([1980, 1990]);
There is a constraint that I should use operator functions listed here https://orkhan.gitbook.io/typeorm/docs/find-options#combining-advanced-options to achieve it.
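For reference, a minimal sketch of how such an OR can be expressed with the documented find-option operators (the repository name subscriptionRepository is an assumption, standing in for however the repository of the entity behind the subscriptions table is obtained): passing an array as where combines its elements with OR, so In can cover the two equality checks and MoreThan the range check.
import { In, MoreThan } from 'typeorm';
// Equivalent to: year IN (1980, 1990) OR year > 2000
// subscriptionRepository is assumed to be the repository for the subscription entity.
const subscriptions = await subscriptionRepository.find({
  where: [
    { year: In([1980, 1990]) },
    { year: MoreThan(2000) },
  ],
});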
In AngularDart I have a list of transactions with a date and an ID (integer). I want to sort the list by date and, within each date, sub-sort by ID. I was able to do this by sorting by ID first and then sorting the list again by date. This seems like a common type of sort. Can it be done in one sort statement instead of two?
transactions.sort((a, b) => (a.id.compareTo(b.id)));
transactions.sort((a, b) => (a.transdate.compareTo(b.transdate)));
Yes, it's possible.
You just need to compare the dates first and, if they are equal, compare the IDs:
transactions.sort((a, b) {
  final diff = a.transdate.compareTo(b.transdate);
  if (diff == 0) {
    return a.id.compareTo(b.id);
  }
  return diff;
});
I have two dataframes and I would like to join them based on one column, with the caveat that this column is a timestamp, and the timestamps have to be within a certain offset (5 seconds) of each other in order to join records. More specifically, a record in dates_df with date=1/3/2015:00:00:00 should be joined with the record in events_df with time=1/3/2015:00:00:01 because both timestamps are within 5 seconds of each other.
I'm trying to get this logic working with Python Spark, and it is extremely painful. How do people do joins like this in Spark?
My approach is to add two extra columns to dates_df that determine the lower_timestamp and upper_timestamp bounds with a 5-second offset, and perform a conditional join. And this is where it fails; more specifically:
joined_df = dates_df.join(events_df,
dates_df.lower_timestamp < events_df.time < dates_df.upper_timestamp)
joined_df.explain()
This captures only the last part of the condition:
Filter (time#6 < upper_timestamp#4)
CartesianProduct
....
and it gives me a wrong result.
Do I really have to do a full-blown Cartesian join for each inequality, removing duplicates as I go along?
Here is the full code:
from datetime import datetime, timedelta
from pyspark import SparkContext, SparkConf
from pyspark.sql import SQLContext
from pyspark.sql.types import *
from pyspark.sql.functions import udf
master = 'local[*]'
app_name = 'stackoverflow_join'
conf = SparkConf().setAppName(app_name).setMaster(master)
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
def lower_range_func(x, offset=5):
    return x - timedelta(seconds=offset)

def upper_range_func(x, offset=5):
    return x + timedelta(seconds=offset)

lower_range = udf(lower_range_func, TimestampType())
upper_range = udf(upper_range_func, TimestampType())
dates_fields = [StructField("name", StringType(), True), StructField("date", TimestampType(), True)]
dates_schema = StructType(dates_fields)
dates = [('day_%s' % x, datetime(year=2015, day=x, month=1)) for x in range(1,5)]
dates_df = sqlContext.createDataFrame(dates, dates_schema)
dates_df.show()
# extend dates_df with time ranges
dates_df = dates_df.withColumn('lower_timestamp', lower_range(dates_df['date'])).\
withColumn('upper_timestamp', upper_range(dates_df['date']))
event_fields = [StructField("time", TimestampType(), True), StructField("event", StringType(), True)]
event_schema = StructType(event_fields)
events = [(datetime(year=2015, day=3, month=1, second=3), 'meeting')]
events_df = sqlContext.createDataFrame(events, event_schema)
events_df.show()
# finally, join the data
joined_df = dates_df.join(events_df,
dates_df.lower_timestamp < events_df.time < dates_df.upper_timestamp)
joined_df.show()
I get the following output:
+-----+--------------------+
| name| date|
+-----+--------------------+
|day_1|2015-01-01 00:00:...|
|day_2|2015-01-02 00:00:...|
|day_3|2015-01-03 00:00:...|
|day_4|2015-01-04 00:00:...|
+-----+--------------------+
+--------------------+-------+
| time| event|
+--------------------+-------+
|2015-01-03 00:00:...|meeting|
+--------------------+-------+
+-----+--------------------+--------------------+--------------------+--------------------+-------+
| name| date| lower_timestamp| upper_timestamp| time| event|
+-----+--------------------+--------------------+--------------------+--------------------+-------+
|day_3|2015-01-03 00:00:...|2015-01-02 23:59:...|2015-01-03 00:00:...|2015-01-03 00:00:...|meeting|
|day_4|2015-01-04 00:00:...|2015-01-03 23:59:...|2015-01-04 00:00:...|2015-01-03 00:00:...|meeting|
+-----+--------------------+--------------------+--------------------+--------------------+-------+
I ran a Spark SQL query with explain() to see how it is done, and replicated the same behavior in Python. First, here is how to do the same with Spark SQL:
dates_df.registerTempTable("dates")
events_df.registerTempTable("events")
results = sqlContext.sql("SELECT * FROM dates INNER JOIN events ON dates.lower_timestamp < events.time and events.time < dates.upper_timestamp")
results.explain()
This works, but the question was about how to do it in Python, so the solution seems to be just a plain join followed by two filters:
joined_df = dates_df.join(events_df) \
    .filter(dates_df.lower_timestamp < events_df.time) \
    .filter(events_df.time < dates_df.upper_timestamp)
joined_df.explain() yields the same plan as the Spark SQL results.explain(), so I assume this is how things are done.
Although this is a year later, it might help others.
As you said, a full Cartesian product is insane in your case. Your matching records will be close in time (within 5 seconds in your case), so you can take advantage of that and save a lot of time: first group records into buckets based on their timestamp, then join the two dataframes on that bucket, and only then apply the filter. Using this method causes Spark to use a SortMergeJoin instead of a CartesianProduct and greatly boosts performance.
There is a small caveat here: you must match against both the bucket and the next one.
It's explained in more detail in my blog post, with working code examples (Scala + Spark 2.0, but you can implement the same in Python too):
http://zachmoshe.com/2016/09/26/efficient-range-joins-with-spark.html
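For completeness, here is a minimal PySpark sketch of that bucketing idea under the question's 5-second window (the bucket column and the 10-second bucket size are assumptions, not the blog's code; it reuses dates_df and events_df from the question):
from pyspark.sql import functions as F

# Bucket size of 10 seconds: two timestamps within 5 seconds of each other fall
# either into the same bucket or into adjacent buckets.
bucket_size = 10
dates_b = dates_df.withColumn('bucket', (F.unix_timestamp('date') / bucket_size).cast('long'))
events_b = events_df.withColumn('bucket', (F.unix_timestamp('time') / bucket_size).cast('long'))

# Explode each date row into its own bucket and its two neighbours so that
# matches falling into an adjacent bucket are not missed.
dates_b = dates_b.withColumn(
    'bucket',
    F.explode(F.array(F.col('bucket') - 1, F.col('bucket'), F.col('bucket') + 1)))

# Equi-join on the bucket (a SortMergeJoin instead of a CartesianProduct),
# then apply the exact 5-second range filter.
joined_df = dates_b.join(events_b, 'bucket') \
    .filter(F.abs(F.unix_timestamp('time') - F.unix_timestamp('date')) <= 5) \
    .drop('bucket')
joined_df.show()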
I wish to retrieve a list of UserProfile registrations per day.
The domain object UserProfile stores a Date creationDate property.
I've tried
def results = UserProfile.executeQuery('select u.creationDate, count(u) from UserProfile as u group by u.creationDate')
println results
which obviously is not what I need, because the data is stored with the complete time in it.
Any resource-savvy solution will fit: projections, HQL, ...
Thanks
I use the HQL cast function:
def results = UserProfile.executeQuery("""
select cast(u.creationDate as date), count(u)
from UserProfile as u
group by cast(u.creationDate as date)
""")
The underlying database must support the ANSI cast(... as ...) syntax for this to work, which is the case for PostgreSQL, MySQL, Oracle, SQL Server and many other DBMSs.
Break the date down into day, month and year, then ignore the time part.
This should give you what you need.
def query =
"""
select new map(day(u.creationDate) as day,
month(u.creationDate) as month,
year(u.creationDate) as year,
count(u) as count)
from UserProfile as u
group by day(u.creationDate),
month(u.creationDate),
year(u.creationDate)
"""
// If you do not care about actual dates any more, then this should be enough
def results = UserProfile.executeQuery( query )
//Or create date string which can be parsed later
def refinedresults =
results.collect { [ "$it.year-$it.month-$it.day" : it.count ] }
//Or parse it right here
def refinedresults =
results.collect {
[ Date.parse( 'yyyy-MM-dd', "$it.year-$it.month-$it.day" ) : it.count ]
}
You could define a "derived" property mapped as a formula to extract the date part of the date-and-time. The exact formula will differ depending on what DB you're using; for MySQL you could use something like
Date creationDay // not sure exactly what type this needs to be, it may need
                 // to be java.sql.Date instead of java.util.Date

static mapping = {
    creationDay formula: 'DATE(creation_date)'
}
(the formula uses DB column names rather than GORM property names). Now you can group by creationDay instead of by creationDate and it should do what you need.
Or instead of a "date" you could use separate fields for year, month and day as suggested in the other answer, and I think those functions are valid in H2 as well as MySQL.
I use Entity Framework and also Dapper. I have one form with two date fields, named before and after, so users can search between those dates. The Entity Framework query works perfectly, but the Dapper one does not work for some reason. What could possibly be wrong?
Here is the Entity Framework one:
var article = (from x in db.Articles
where x.created >= before && x.created <= after
select x);
and here is the Dapper one:
var article = sqlConnection.Query<Article>("Select * from articles where created>=@befor AND created<=@afte", new { befor = before, afte = after });
And yes, I have all the connections for Dapper working, as it does go to the database, but for some reason it's not picking up records between those two dates. Any suggestions?
Dapper is just a wrapper around raw TSQL (with a slight caveat around in, where Dapper adds some magic to make variadic "in" queries simpler). So, if it works in TSQL it should work fine in Dapper, as long as your inputs make sense. For example, I am assuming that before and after in this example are typed as non-nullable DateTime, i.e.
DateTime before = ..., after = ...;
var article = sqlConnection.Query<Article>(
"Select * from articles where created>=#befor AND created<=#afte ",
new { befor = before, afte = after});
as a side note, it would perhaps be more obvious to just use:
DateTime before = ..., after = ...;
var article = sqlConnection.Query<Article>(
"Select * from articles where created>=#before AND created<=#after",
new { before, after });
but fundamentally, as long as those parameters have non-null values that are DateTimes, it should work fine.
The problem here could be that the before and after values are sent as strings and the current locale may be causing the dates to be interpreted incorrectly.
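A minimal sketch of that idea, assuming the values really do arrive as strings from the form (the "yyyy-MM-dd" format and the variable names are assumptions): parse them with an explicit format and the invariant culture so the locale cannot affect the result, then pass real DateTime values to Dapper.
using System;
using System.Globalization;
using Dapper;

// Hypothetical string inputs coming from the form fields.
string beforeText = "2015-01-01";
string afterText = "2015-12-31";

// Parse with an explicit format and the invariant culture so the current
// locale cannot change how the dates are interpreted.
DateTime before = DateTime.ParseExact(beforeText, "yyyy-MM-dd", CultureInfo.InvariantCulture);
DateTime after = DateTime.ParseExact(afterText, "yyyy-MM-dd", CultureInfo.InvariantCulture);

// Reusing sqlConnection and Article from the question; the parameters are now
// typed DateTime values rather than strings.
var article = sqlConnection.Query<Article>(
    "Select * from articles where created >= @before AND created <= @after",
    new { before, after });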
I have a need in which I have to return a count related to a day, week or year.
Example:
Assume I have orders which are placed on a certain date:
class Order {
    Date orderDate
}
How would I get an overview of the number of orders per day / per week / per year, etc.?
It's untested but you need something like this:
class Order {
    Date orderDate
    BigDecimal amount

    static namedQueries = {
        summaryProjectionByDate {
            projections {
                count("id", "orderCount")
                sum("amount", "amountSum")
                groupProperty("orderDate", "orderDate")
            }
        }
    }
}
This groups by a specific date. I think it would be hard to group by weeks in HQL.
I just stumbled upon this blog that suggests using the new sqlProjection for Grails 2.0 and also has alternative solutions for Grails 1.x.
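If the criteria projections get awkward, a minimal sketch of one alternative (not the blog's sqlProjection code) is to drop down to plain SQL via groovy.sql.Sql; this assumes MySQL (for its year() and week() functions), an injected dataSource as in a Grails service, and the default order table and order_date column names.
import groovy.sql.Sql

// dataSource is assumed to be injected (e.g. "def dataSource" in a Grails service).
def sql = new Sql(dataSource)
def rows = sql.rows('''
    select year(order_date) as yr, week(order_date) as wk, count(*) as orderCount
    from `order`
    group by year(order_date), week(order_date)
''')
rows.each { println "${it.yr}-W${it.wk}: ${it.orderCount}" }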