Joining 2 large relations in Rascal

I'm trying to join two relations in Rascal, much like a SQL join, with the following code:
rel[loc,loc,loc] methodInvocationsWithClass = {around 40000 tuples};
rel[loc,loc] declaredClassHierarchy = {around 20000 tuples};
{ <from,to,class,super> | <from,to,class> <- methodInvocationsWithClass, <sub,super> <- declaredClassHierarchy, class == sub };
While this does exactly what I need, it appears to work well only on small relations and doesn't scale.
Is there perhaps a more efficient alternative way to accomplish this?

Indeed, we have the join keyword for this. Lots of other useful relational operations are also supported, either by keywords or by functions in the Relation module.
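For readers wondering why the comprehension in the question slows down: it compares every invocation tuple with every hierarchy tuple, so the work grows with the product of the two relation sizes. A minimal sketch of the usual alternative, indexing one relation by the join key and probing it, is shown here in Python rather than Rascal. The function name hash_join and the sample tuples are made up for illustration and are not part of Rascal or its Relation module.

```python
from collections import defaultdict

def hash_join(invocations, hierarchy):
    """Join (from, to, cls) tuples with (sub, super) tuples on cls == sub."""
    # Build an index on the join key once: one pass over hierarchy.
    supers_by_sub = defaultdict(set)
    for sub, sup in hierarchy:
        supers_by_sub[sub].add(sup)
    # Probe the index once per invocation instead of scanning hierarchy each time.
    return {
        (frm, to, cls, sup)
        for frm, to, cls in invocations
        for sup in supers_by_sub.get(cls, ())
    }

# Hypothetical sample data standing in for the two relations.
invocations = {("m1", "m2", "A"), ("m3", "m4", "B"), ("m5", "m6", "C")}
hierarchy = {("A", "Base"), ("B", "Base"), ("B", "Mixin")}
result = hash_join(invocations, hierarchy)
```

With roughly 40000 and 20000 tuples, this replaces about 800 million comparisons with two linear passes plus the size of the output.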

Related

Is there any difference between includes(:associations).references(:associations) and eager_load(:associations) in Ruby on Rails 5?

It seems includes(:associations).references(:associations) and eager_load(:associations) execute exactly the same SQL (LEFT OUTER JOIN) in Rails 5. So when do I need to use includes(:associations).references(:associations) syntax?
For example,
Parent.includes(:children1, :children2).references(:children1).where('(some conditions of children1)')
can be converted to
Parent.eager_load(:children1).preload(:children2).where('(some conditions of children1)')
I think the latter (query using eager_load and preload) is simpler and looks better.
UPDATE
I found a strange behavior in my environment (rails 5.2.4.3).
Even when I pass several associations to includes and only one of them to references, all the included associations are LEFT OUTER JOINed.
For example,
Parent.includes(:c1, :c2, :c3).references(:c1).to_sql
generates SQL that LEFT OUTER JOINs all of c1, c2, and c3.
I expected it to join only c1.
Indeed, includes + references ends up being the same as eager_load. Like many things in Rails, you have a few ways of accomplishing the same result, and here you are witnessing that first hand. If I were writing them in a single statement, I would always prefer eager_load because it is more explicit and it is a single function call.
I would also prefer eager_load because I consider references a kind of hack. It says to the SQL generator "Hey, I am referring to this object in a way that you would not otherwise detect, so include it in the JOIN statement" and is generally used when you use a String to pass a SQL fragment as part of the query.
The only time I would use the includes(:associations).references(:associations) syntax is when it was an artifact needed to make the query work and not a statement of intent. The Rails Guide gives this good example:
Article.includes(:comments).where("comments.visible = true").references(:comments)
As for why referencing 1 association causes 3 to be JOIN'ed, I do not know for sure. The code for includes uses heuristics to decide when it is faster to use a JOIN and when it is faster to use 2 separate queries, the first to retrieve the parent and the second to retrieve the associated objects. I was surprised to find how often it is faster to use 2 separate queries. It may be that since the query has to use 1 join anyway, the algorithm figures it will be faster to use 1 big join rather than 3 queries, or it may in general think 1 join is faster than 4 queries.
I would not in general use preload unless I had a strong reason to believe that it was faster than join. I would just use includes alone and let the algorithm decide.
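To make the two loading strategies concrete, here is a minimal sketch in Python with sqlite3, not Rails code. It loads the same parent/child data once with a single LEFT OUTER JOIN (the shape of query that eager_load issues) and once with two separate queries (the shape that preload issues). The table names, column names, and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parents  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE children (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO parents  VALUES (1, 'p1'), (2, 'p2');
    INSERT INTO children VALUES (1, 1, 'c1'), (2, 1, 'c2');
""")

def load_with_join(conn):
    """One query: parents LEFT OUTER JOIN children, regrouped in memory."""
    loaded = {}
    rows = conn.execute(
        "SELECT p.name, c.name FROM parents p "
        "LEFT OUTER JOIN children c ON c.parent_id = p.id "
        "ORDER BY p.id, c.id"
    )
    for parent, child in rows:
        children = loaded.setdefault(parent, [])
        if child is not None:  # a childless parent yields one row with NULL child
            children.append(child)
    return loaded

def load_with_two_queries(conn):
    """Two queries: fetch the parents, then fetch their children by parent id."""
    parents = conn.execute("SELECT id, name FROM parents ORDER BY id").fetchall()
    names_by_id = dict(parents)
    loaded = {name: [] for _, name in parents}
    placeholders = ",".join("?" * len(parents))
    rows = conn.execute(
        "SELECT parent_id, name FROM children "
        f"WHERE parent_id IN ({placeholders}) ORDER BY id",
        [parent_id for parent_id, _ in parents],
    )
    for parent_id, child in rows:
        loaded[names_by_id[parent_id]].append(child)
    return loaded

joined = load_with_join(conn)
preloaded = load_with_two_queries(conn)
```

Both functions return the same structure. The difference is that a condition on the children table can be added to the WHERE clause of the joined query, while the first query of the two-query version never sees that table.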

self-join on documentdb syntax error

I'm having trouble running an otherwise valid SQL self-join query on DocumentDB.
So the following query works:
SELECT * FROM c AS c1 WHERE c1.obj="car"
But this simple self join query does not: SELECT c1.url FROM c AS c1 JOIN c AS c2 WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url, with the error, Identifier 'c' could not be resolved.
It seems that DocumentDB supports self-joins within a document, but I'm asking about joins at the collection level.
I looked at the official syntax doc and understand that the collection name is basically inferred; I tried changing c to explicitly my collection name and root but neither worked.
Am I missing something obvious? Thanks!
A few things to clarify:
1.) Regarding Identifier 'c' could not be resolved
Queries are scoped to a single collection; and in the example above, c is an implicit alias for the collection which is being re-aliased to c1 with the AS keyword.
You can fix the example query by changing the JOIN to reference c1:
SELECT c1.url
FROM c AS c1
JOIN c1 AS c2
WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url
This is also equivalent to:
SELECT c1.url
FROM c1
JOIN c1 AS c2
WHERE c1.obj="car" AND c2.obj="person" AND c1.url = c2.url
2.) Understanding JOINs and examining your data model
With that said, I don't think fixing the query syntax issue above will produce the behavior you are expecting. The JOIN keyword in DocumentDB SQL is designed for forming a cross product with a denormalized array of elements within a document (as opposed to forming cross products across other documents in the same collection). If you run into trouble here, it may be worth taking a step back and revisiting how to model your data for Azure Cosmos DB.
In an RDBMS, you are trained to think entity-first and normalize your data model based on entities. You rely heavily on a query engine to optimize queries to fit your workload (which typically does a good, but not always optimal, job of retrieving data). The challenges here are that many relational benefits are lost as scale increases, and scaling out to multiple shards/partitions becomes a requirement.
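A minimal sketch of what such an intra-document JOIN amounts to, written in Python purely as an illustration: each document is paired only with the elements of its own array, never with other documents. The function name intra_document_join, the children field, and the sample documents are hypothetical.

```python
def intra_document_join(documents, array_field):
    """Pair each document with each element of one of its own arrays."""
    for doc in documents:
        # A missing or empty array contributes no rows for that document.
        for element in doc.get(array_field, []):
            yield doc, element

# Hypothetical documents: only the first one has a non-empty array.
families = [
    {"id": "Liu", "children": [{"name": "Ann"}, {"name": "Bo"}]},
    {"id": "Smith", "children": []},
    {"id": "Jones"},
]
pairs = [
    (doc["id"], child["name"])
    for doc, child in intra_document_join(families, "children")
]
```

Here pairs contains one row per child of the Liu document and nothing for the other two, which is why this kind of JOIN cannot match a "car" document against a separate "person" document.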
For a scale-out distributed database like Cosmos DB, you will want to start with understanding the workload first and optimize your data model to fit the workload (as opposed to thinking entity first). You'll want to keep in mind that collections are merely a logical abstraction composed of many replicas that live within partition sets. They do not enforce schema and are the boundary for queries.
When designing your model, you will want to incorporate the following questions into your thought process:
What is the scale, in terms of size and throughput, for the broader solution (an estimate of order of magnitude is sufficient)?
What is the ratio of reads vs writes?
For writes - what is the pattern for writes? Is it mostly inserts, or are there a lot of updates?
For reads - what do top N queries look like?
The above should influence your choice of partition key as well as what your data / object model should look like. For example:
The ratio of requests will help guide how you make tradeoffs (use Pareto principle and optimize for the bulk of your workload).
For read-heavy workloads, commonly filtered properties become candidates for choice of partition key.
Properties that tend to be updated together frequently should be abstracted together in the data model, and away from properties that get updated with a slower cadence (to lower the RU charge for updates).
Don't be afraid to duplicate properties to enrich queryability, and annotate types, across different record types. For example, we have two types of documents: cat and person.
{
   "id": "Andrew",
   "type": "Person",
   "familyId": "Liu",
   "employer": "Microsoft"
}
 
{
   "id": "Ralph",
   "type": "Cat",
   "familyId": "Liu",
   "fur": {
         "length": "short",
         "color": "brown"
   }
}
 
We can query both types of documents without needing a JOIN simply by running a query without a filter on type:
SELECT * FROM c WHERE c.familyId = "Liu"
And if we wanted to filter on type = "Person", we can simply add a filter on type to our query:
SELECT * FROM c WHERE c.familyId = "Liu" AND c.type = "Person"
The queries in the answer above by Andrew Liu will resolve your error, but Azure Cosmos DB does not support cross-item or cross-container joins. To read more about joins, see https://learn.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-join
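Since joins across items are not available, one possible workaround for the original question is to fetch the candidate documents with an ordinary single-collection query and combine them on the client. The sketch below, in Python, assumes the documents have already been fetched as dictionaries with the url and obj properties from the question. The function name urls_with_all and the sample data are hypothetical, and no Cosmos DB SDK calls are shown.

```python
from collections import defaultdict

def urls_with_all(documents, required_objs):
    """Return the urls that have a document for every obj in required_objs."""
    objs_by_url = defaultdict(set)
    for doc in documents:
        objs_by_url[doc["url"]].add(doc["obj"])
    # Keep a url only if its set of objs covers everything required.
    return sorted(url for url, objs in objs_by_url.items() if required_objs <= objs)

# Hypothetical documents, as if returned by a query filtering on obj.
docs = [
    {"url": "a.jpg", "obj": "car"},
    {"url": "a.jpg", "obj": "person"},
    {"url": "b.jpg", "obj": "car"},
    {"url": "c.jpg", "obj": "person"},
]
matching = urls_with_all(docs, {"car", "person"})
```

Whether this is acceptable depends on how many documents the filter returns. For large result sets, remodeling the data so that the objects for one url live in a single document may be the better fit.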

RoR/Squeel - How do I use Squeel::Nodes::Join/Predicates?

I just recently inherited a project where the previous developer used Squeel.
I've been studying Squeel for the past day now and know a bit about how to use it from what I could find online. The basic use of it is simple enough.
What I haven't been able to find online (except for on ruby-doc.org, which didn't give me much), is how to use Squeel::Nodes::Join and Squeel::Nodes::Predicate.
The only thing I've been able to find out is that they are nodes representing join associations / predicate expressions, which I had already figured. What I still don't know is how to use them.
Can someone help me out or point me toward a good tutorial/guide?
I might as well answer this since I was able to figure out quite a bit through trial and error and by using ruby-doc as a guide. Nothing I say here is a definitive description of these classes. It's just what I know, and it may help someone in the future who is stuck making dynamic queries with Squeel.
Squeel::Nodes::Stub
Let's actually start with Squeel::Nodes::Stub. This is a Squeel object that can take either a symbol or a string and can convert it into the name of a table or column. So you can create a new Squeel::Nodes::Stub.new("value") or Squeel::Nodes::Stub.new(:value) and use this stub in other Squeel nodes. You'll see examples of it being used below.
Squeel::Nodes::Join
Squeel::Nodes::Join acts just like you might suspect. It is essentially a variable you can pass in to a Squeel joins{} that will then perform the join you want. You give it a stub (with a table name), and you can also give it another variable to change the type of join (I only know how to change it to outer join at the moment). You create one like so:
Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields), Arel::OuterJoin)
The stub is used to let the Join know we want to join the custom_fields table, and the Arel::OuterJoin is just to let the Join know we want to do an outer join. Again, you don't have to put a second parameter into Squeel::Nodes::Join.new(), and I think it will default to performing an inner join. You can then join this to a model:
Person.joins{Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields), Arel::OuterJoin)}
Squeel::Nodes::Predicate
Squeel::Nodes::Predicate may seem pretty obvious at this point. It's just a comparison. You give it a stub (with a column name), a method of comparison (you can find them all in the Predicates section on Squeel's github) and a value to compare with, like so:
Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value), :eq, 5)
You can even AND or OR two of them together pretty easily.
AND: Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) & Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)
OR: Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) | Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)
These will return either a Squeel::Nodes::And or a Squeel::Nodes::Or with the nested Squeel::Nodes::Predicates.
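The way two predicates combine into an And or Or node can be pictured as a small expression tree built through operator overloading. The following is a Python analogy only, not Squeel's implementation: the class names mirror the Squeel node names, but the to_sql method and the operator table are assumptions made for illustration.

```python
from dataclasses import dataclass

class Node:
    """Base class: & and | wrap two nodes in a new parent node."""
    def __and__(self, other):
        return And(self, other)

    def __or__(self, other):
        return Or(self, other)

@dataclass
class Predicate(Node):
    column: str
    method: str
    value: object

    def to_sql(self):
        operators = {"eq": "=", "gt": ">", "lt": "<"}  # illustrative subset
        return f"{self.column} {operators[self.method]} {self.value!r}"

@dataclass
class And(Node):
    left: Node
    right: Node

    def to_sql(self):
        return f"({self.left.to_sql()} AND {self.right.to_sql()})"

@dataclass
class Or(Node):
    left: Node
    right: Node

    def to_sql(self):
        return f"({self.left.to_sql()} OR {self.right.to_sql()})"

condition = Predicate("value1", "eq", 5) | Predicate("value2", "eq", 10)
sql = condition.to_sql()
```

Because And and Or are themselves nodes, they can be combined further, which is what makes this style convenient for building conditions dynamically.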
Then you can put it all together like this (of course you'd probably have the joins in a variable, a, and the predicates in a variable b, because you are doing this dynamically, otherwise you should probably be using regular Squeel instead of Squeel nodes):
Person.joins{Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields), Arel::OuterJoin)}
  .where{Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) | Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)}
I unfortunately could not figure out how to do subqueries though :(

Propel query with nested statements and empty field value

I have quite a complex SQL query which I would like to transform into Propel but I am not sure about the best approach.
The query I need looks like this:
SELECT id_loan
FROM loan loanA
JOIN loan_funding ON fk_loan = loanA.id_loan
JOIN `user` userA ON loan_funding.fk_user = userA.id_user
WHERE userA.`acc_internal_account_id` IS NOT NULL
  AND loanA.`state` = 'payment_origination'
  AND loanA.id_loan IN (
    SELECT id_loan
    FROM loan loanB
    JOIN loan_funding ON fk_loan = id_loan
    JOIN `user` userB ON loan_funding.fk_user = userB.id_user
    WHERE userB.`acc_internal_account_id` IS NULL
      AND loanB.`state` = 'payment_origination'
    GROUP BY loanB.id_loan
  )
GROUP BY loanA.id_loan
LIMIT 1;
What I would like to have is something completely based on the Generated Query Methods but I do not quite get how to do it.
Performance is not an issue, but as of now it is unclear where and how those queries will be called from. However, it is important to get back an object, as we need to use the getters and setters.
I found this website: http://propelorm.org/blog/2011/02/02/how-can-i-write-this-query-using-an-orm-.html which looks really cool and helpful, however, I am not sure what option fits best here.
I do not expect a complete solution but maybe some thoughts how to narrow down the problem...
What confuses me most is the part where it compares id_loan and fk_loan before it goes to the user table. How would this relationship be represented by Propel? Might it be better to split the whole thing into multiple queries?
Any hints appreciated!
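One way to narrow the problem down is to check on toy data what the query selects. Reading the SQL, loan_funding appears to act as a link table between loan and user, and the result is every loan in state payment_origination that has at least one funder with an internal account id and at least one funder without. The sketch below uses Python with sqlite3, not Propel, and the rows are made up. It suggests the nested query can be split into two simple queries whose results are intersected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (id_loan INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE "user" (id_user INTEGER PRIMARY KEY, acc_internal_account_id TEXT);
    CREATE TABLE loan_funding (fk_loan INTEGER, fk_user INTEGER);
    INSERT INTO "user" VALUES (1, 'ACC-1'), (2, NULL), (3, 'ACC-3');
    INSERT INTO loan VALUES (10, 'payment_origination'), (11, 'payment_origination'),
                            (12, 'payment_origination'), (13, 'closed');
    INSERT INTO loan_funding VALUES (10, 1), (10, 2), (11, 1), (11, 3),
                                    (12, 2), (13, 1), (13, 2);
""")

# Simple query: loans in the wanted state with a funder whose account id
# passes the given NULL test ("NULL" or "NOT NULL").
FUNDED_LOANS = """
    SELECT DISTINCT l.id_loan
    FROM loan l
    JOIN loan_funding lf ON lf.fk_loan = l.id_loan
    JOIN "user" u ON lf.fk_user = u.id_user
    WHERE l.state = 'payment_origination'
      AND u.acc_internal_account_id IS {null_test}
"""

def loan_ids(null_test):
    return {row[0] for row in conn.execute(FUNDED_LOANS.format(null_test=null_test))}

with_account = loan_ids("NOT NULL")
without_account = loan_ids("NULL")
both = with_account & without_account

# The nested form from the question, without LIMIT, for comparison.
NESTED = """
    SELECT la.id_loan
    FROM loan la
    JOIN loan_funding lfa ON lfa.fk_loan = la.id_loan
    JOIN "user" ua ON lfa.fk_user = ua.id_user
    WHERE ua.acc_internal_account_id IS NOT NULL
      AND la.state = 'payment_origination'
      AND la.id_loan IN (
        SELECT lb.id_loan
        FROM loan lb
        JOIN loan_funding lfb ON lfb.fk_loan = lb.id_loan
        JOIN "user" ub ON lfb.fk_user = ub.id_user
        WHERE ub.acc_internal_account_id IS NULL
          AND lb.state = 'payment_origination'
        GROUP BY lb.id_loan
      )
    GROUP BY la.id_loan
"""
nested = {row[0] for row in conn.execute(NESTED)}
```

If the two simple queries are easier to express with generated query methods, running them separately and intersecting the ids is one option, given that performance is not a concern here.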

Joining two tables in WQL/SCCM

I think I'm being really stupid here.
I'm using VBScript. I've connected to an SCCM server:
Set locator = CreateObject("WbemScripting.SWbemLocator")
Set connection = locator.ConnectServer("SERVERNAME", "Root\SMS\SITENAME")
I then have a WMI WQL query:
Set Collections = connection.ExecQuery( _
    "SELECT LastStatusTime, AdvertisementID, LastStateName, AdvertisementName " & _
    "FROM SMS_ClientAdvertisementStatus " & _
    "INNER JOIN SMS_Advertisement " & _
    "ON SMS_Advertisement.AdvertisementID = SMS_ClientAdvertisementStatus.AdvertisementID " & _
    "WHERE LastStateName = 'Succeeded' " & _
    "AND LastStatusTime > '2012-09-25'")
For Each Collection In Collections
    WScript.Echo Collection.LastStatusTime
    WScript.Echo Collection.AdvertisementID
Next
I think there's a gap in my understanding of WQL. I seem to be able to join these two WQL "tables" in this query, but I can only return values from SMS_ClientAdvertisementStatus.
If I try to return anything from SMS_Advertisement, the table I've joined, I just get an error.
Can you join "tables" in WQL - if they even are tables? Or do I have to have a nested query? Or is there another way of returning data from two tables?
WQL doesn't support JOINs, but you can use MOF to define WMI classes that contain data from multiple classes. See here:
Creating a New Instance from Old Properties
The WQL language is just a subset of SQL and doesn't support the JOIN statement. Instead, you can use the ASSOCIATORS OF statement in some cases.
WQL does support joins. Here is a sample working query, which lists the names of devices which match with collection names. Works in SCCM 2012.
select SMS_R_SYSTEM.Name from SMS_R_System inner join SMS_Collection as Systems on Systems.Name = SMS_R_System.Name
I had a similar issue when trying to use JOIN statements in my PowerShell SCCM / ConfigManager queries and found this to be a great solution:
https://gallery.technet.microsoft.com/scriptcenter/SCCM-2012-WMI-query-with-0daea30c#content
I believe the methods could translate to other languages too.
