Combining table and web service data in Grails

I'm trying to figure out the best approach to display combined tables based on matching logic and input search criteria.
Here is the situation:
We have a table of customers stored locally. The fields of interest are ssn, first name, last name and date of birth.
We also have a web service which provides the same information. Some of the customers from the web service are the same as the local file, some different.
SSN is not required in either.
I need to combine this data to be viewed on a Grails display.
The criteria for combination are 1) match on SSN. 2) For any remaining records, exact match on first name, last name and date of birth.
There's no need at this point for soundex or approximate logic.
It looks like what I should do is extract all the records from both inputs into a single collection, somehow making it a set on SSN. Then remove the blank ssn.
This will handle the SSN matching (once I figure out how to make that a set).
Then, I need to go back to the original two input sources (cached in a collection to prevent a re-read) and remove any records that exist in the SSN set derived previously.
Then, create another set based on first name, last name and date of birth - again if I can figure out how to make a set.
Then combine the two derived collections into a single collection. The collection should be sorted for display purposes.
Does this make sense? I think the search criteria will limit the number of records pulled in, so I can do this in memory.
Essentially, I'm looking for some ideas on how the Grails code would look for achieving the above logic (assuming this is a good approach). The local customer table is a domain object, while what I'm getting from the WS is an array list of objects.
Also, I'm not entirely clear on how the maxResults, firstResult, and order settings used for the display would be affected. I think I need to read in all the records which match the search criteria first, do the combining, and then display from the derived collection.

The traditional Java way of doing this would be to copy both the local and remote objects into TreeSet containers with a custom comparator, first for SSN, second for name/birthdate.
This might look something like:
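import org.codehaus.groovy.runtime.ClosureComparator

def localCustomers = Customer.list()
def remoteCustomers = RemoteService.get()
def all = localCustomers + remoteCustomers
// Pass 1: de-duplicate on SSN, using only records that have one (blank SSNs would otherwise collapse into a single entry)
TreeSet ssnFilter = new TreeSet(new ClosureComparator({ c1, c2 -> c1.ssn <=> c2.ssn }))
ssnFilter.addAll(all.findAll { it.ssn })
// Pass 2: de-duplicate on first name, last name and date of birth, field by field
TreeSet nameDobFilter = new TreeSet(new ClosureComparator({ c1, c2 ->
    (c1.firstName <=> c2.firstName) ?: (c1.lastName <=> c2.lastName) ?: (c1.dob <=> c2.dob) }))
nameDobFilter.addAll(ssnFilter)
nameDobFilter.addAll(all.findAll { !it.ssn })
def filteredCustomers = nameDobFilter as List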
At this point, filteredCustomers has all the records, except those that are duplicates by your two criteria.
Another approach is to sort the list and then fold over it (Groovy's inject, which is a left fold), combining adjacent elements when they match. This way, you have an opportunity to combine the data from both sources.
For example:
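def combineByNameAndDob(customers) {
    // Sort so that records with the same name and date of birth end up adjacent
    customers.sort { c1, c2 ->
        (c1.firstName + c1.lastName + c1.dob) <=> (c2.firstName + c2.lastName + c2.dob)
    }.inject([]) { cs, c ->
        // equalsByNameAndDob and combine are methods you would write on the customer class
        if (cs && c.equalsByNameAndDob(cs[-1])) {
            cs[-1].combine(c) // combine the attributes of both records
            cs
        } else {
            cs << c
        }
    }
}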
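The two-pass matching described in the question, followed by sorting and paging from the combined collection, can be sketched as below. This is a minimal illustration in Python rather than Grails code; the Customer class, its field names, and the combine and page helpers are all hypothetical, and it assumes the local record should win when both sources hold the same customer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Customer:
    # Hypothetical record shape shared by the local table and the web service
    ssn: Optional[str]
    first_name: str
    last_name: str
    dob: date
    source: str  # "local" or "remote", kept only for illustration


def combine(local, remote):
    """Pass 1 keys on SSN; pass 2 keys the SSN-less records on name + DOB."""
    records = list(local) + list(remote)  # local first, so local wins ties
    by_ssn = {}
    by_name_dob = {}
    for c in records:
        if c.ssn:  # None or "" means the SSN is missing
            by_ssn.setdefault(c.ssn, c)
        else:
            key = (c.first_name, c.last_name, c.dob)
            by_name_dob.setdefault(key, c)
    merged = list(by_ssn.values()) + list(by_name_dob.values())
    # Sort once, after combining, so paging sees a stable order
    return sorted(merged, key=lambda c: (c.last_name, c.first_name, c.dob))


def page(items, offset, max_results):
    """In-memory equivalent of an offset/max window over the combined list."""
    return items[offset:offset + max_results]
```

Because the combination happens in memory, any offset/max window has to be applied to the combined, sorted list rather than passed to the database query, which matches the asker's suspicion that all matching records need to be read first.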

Related

Correct order of operations in neo4j - LOAD, MERGE, MATCH, WITH, SET

I am loading simple csv data into neo4j. The data is simple as follows :-
uniqueId compound value category
ACT12_M_609 mesulfen 21 carbon
ACT12_M_609 MNAF 23 carbon
ACT12_M_609 nifluridide 20 suphate
ACT12_M_609 sulfur 23 carbon
I am loading the data from the URL using the following query -
LOAD CSV WITH HEADERS
FROM "url"
AS row
MERGE( t: Transaction { transactionId: row.uniqueId })
MERGE(c:Compound {name: row.compound})
MERGE (t)-[r:CONTAINS]->(c)
ON CREATE SET c.category= row.category
ON CREATE SET r.price =row.value
Next I do the aggregation to count total orders for a compound and create property for a node in the following way -
MATCH (c:Compound)<-[:CONTAINS]-(t:Transaction)
WITH c, count(DISTINCT t.transactionId) AS ord
SET c.orders = ord
So far so good. I can accomplish what I want but I have the following 2 questions -
1. How can I create the orders property for the compound node in the first step itself? I.e., when I am loading the data I would like to perform the aggregation straight away.
2. For a compound node I am also setting the category as a property. Theoretically, it can also be modelled as category -contains-> compound by creating a Category node. But what advantage will I have if I do that? I can already execute the queries and get the expected output without creating this additional node.
Thank you for your answer.
For the first question: I don't think that's possible. LOAD CSV processes one row at a time, so at row 1 it doesn't know how many more rows will follow.
I guess you could create virtual nodes and relationships, aggregate those, and then use the result to create the real nodes, but that would be far more complicated (see "Virtual Nodes/Rels" in the APOC documentation).
For the second question: that depends on the questions/queries you want to ask.
A graph database is optimised for following relationships, so if you often run a query where the category is a criterion (e.g. MATCH (c:Category {category_id: 12})-[r]-(:Compound)), it might be more performant to create a separate Category node for it.
If you just want to get the category in the results (e.g. RETURN compound.category), then it's fine as a property.
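To illustrate why the order count has to be a second step, here is a small Python sketch (not Cypher) that mirrors the aggregation: the number of distinct transactions per compound is only known once every row has been seen. The function name orders_per_compound is made up for this example; the row keys follow the CSV headers from the question.

```python
from collections import defaultdict


def orders_per_compound(rows):
    """Count distinct transaction ids per compound after reading all rows."""
    seen = defaultdict(set)
    for row in rows:
        # A set ignores repeated (compound, transaction) pairs,
        # like count(DISTINCT ...) in the second query
        seen[row["compound"]].add(row["uniqueId"])
    return {name: len(ids) for name, ids in seen.items()}


# The four sample rows from the question
rows = [
    {"uniqueId": "ACT12_M_609", "compound": "mesulfen", "value": 21},
    {"uniqueId": "ACT12_M_609", "compound": "MNAF", "value": 23},
    {"uniqueId": "ACT12_M_609", "compound": "nifluridide", "value": 20},
    {"uniqueId": "ACT12_M_609", "compound": "sulfur", "value": 23},
]
orders = orders_per_compound(rows)
```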

Merging a set of nodes onto one (only in query)

I am currently investigating how to model a bitemporal graph in neo4j. Unfortunately, no one seems to have publicly undertaken this before.
One particular thing I am looking at is whether I can store in a new node only those values that have changed and then express a query that would merge all those values ordered by a given timestamp:
This creates the data I am playing with:
CREATE (:P1 {id: '1'})<-[:EXPANDS {date:5200, recorded:5100}]-(:P1Data {name:'Joe', wage: 3000})
// New data, recorded 2014-10-1 for 2015-1-1
MATCH (p:P1 {id: '1'}) CREATE (:P1Data { wage:3100 })-[:EXPANDS { date:5479, recorded: 5387}]->(p)
Now, I can get a history for a given point in time so far, e.g. like
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WHERE x.recorded < 6000
WITH {date: x.date, data:d} as data
RETURN data
ORDER BY data.date DESC
What I would like to achieve is to merge the name and wage values such that I get a whole view of the data at a given point in time. The answer may also be that this is not really possible.
(PS: I say only in query, because I found a refactor function in apoc which does merge nodes, but that procedure actually merges and persists the node, while I would just want to query it).
As with most things, you can do it using REDUCE like so:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
WITH REDUCE(s = {}, y IN datas|
{name: COALESCE(y.name, s.name),
wage: COALESCE(y.wage, s.wage)})
AS most_recent_fields
RETURN most_recent_fields.name AS name, most_recent_fields.wage AS wage
You can do it in descending order instead (swap s and y inside the COALESCE statements if so), but there isn't really a way to shortcut processing the entire set of results from your queried time back to the start.
UPDATE: This will, of course, generate a Map and not a Node, but if you only want the properties and don't want to create a permanent record, a Map is actually better suited to your needs.
EXTENDED: If you don't want to specify which keys to use, you can do it without REDUCE like this instead:
MATCH (:P1 { id: '1' })<-[x:EXPANDS]-(d:P1Data)
WITH x.date AS date, d AS data
ORDER BY date
WITH COLLECT(data) AS datas
CREATE (t:Temp)
FOREACH(data IN datas|
SET t += data)
DELETE t
RETURN t
This approach does create a node, but if you DELETE it right before you RETURN it, it won't persist at all. += ensures that pre-existing properties aren't removed, only overwritten if the data node has existing values.
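The fold over date-ordered versions can also be pictured outside Cypher. The Python sketch below is only an illustration under assumed names: each version is a plain dict with date, recorded and data keys (mirroring the EXPANDS relationship properties and the P1Data node), and merge_snapshot is a hypothetical helper. Like the += variant, it does not need the property keys listed up front.

```python
from functools import reduce


def merge_snapshot(versions, as_of):
    """Merge partial versions recorded before `as_of`, oldest first,
    so that later values overwrite earlier ones."""
    visible = sorted(
        (v for v in versions if v["recorded"] < as_of),
        key=lambda v: v["date"],
    )

    def fold(state, version):
        merged = dict(state)
        # Only overwrite with values the newer version actually carries
        merged.update({k: val for k, val in version["data"].items()
                       if val is not None})
        return merged

    return reduce(fold, visible, {})


# The two versions created in the question
versions = [
    {"date": 5200, "recorded": 5100, "data": {"name": "Joe", "wage": 3000}},
    {"date": 5479, "recorded": 5387, "data": {"wage": 3100}},
]
```

The result is a plain mapping rather than a node, which is the same trade-off as the Map produced by the REDUCE query.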

Grails 3 - return list in query result from HQL query

I have a domain object:
class Business {
String name
List subUnits
static hasMany = [
subUnits : SubUnit,
]
}
I want to get name and subUnits using HQL, but I get an error
Exception: org.springframework.orm.hibernate4.HibernateQueryException: not an entity
when using:
List businesses = Business.executeQuery("select business.name, business.subUnits from Business as business")
Is there a way I can get subUnits returned in the result query result as a List using HQL? When I use a left join, the query result is a flattened List that duplicates name. The actual query is more complicated - this is a simplified version, so I can't just use Business.list().
I thought I should add this as an answer, since I have been doing this sort of thing for a while and can share what I have learned.
As per the suggestion from Yariash above: that approach walks forward through a domain object, as opposed to grabbing the information as a flat list of maps. There is a cost to loading an entire object and then asking it to loop through and return its many relations, compared with having everything in one self-contained list.
What you describe with the left join sounds correct. You could add 'group by name' to the end of your query. Alternatively, once you have all the results, you can use businesses.groupBy { it.name } (a handy Groovy feature); take a look at the output of groupBy to see what it does.
But if you are attempting to grab the entire object and map it back, the cost is still very hefty: probably as costly as Yariash's suggestion, and possibly worse.
List businesses = Business.executeQuery("select new map(b.name as name, su.field1 as field1, su.field2 as field2) from Business b left join b.subUnits su")
The above is really what you should be trying to do: left join, then select each of the fields you need from the hasMany elements as part of the map returned for each row.
Then, when you have your results:
def groupedBusinesses = businesses.groupBy { it.name }
where name is the property of the main class that owns the hasMany relation.
If you then look at groupedBusinesses, you will see that each name has its own list:
groupedBusinesses: [name1: [[field1, field2, field3], [field1, field2, field3]]]
You can now call
groupedBusinesses.get(name)
to get the entire list for that hasMany relation.
Enable SQL logging for the above HQL query, then compare it to:
List businesses = Business.executeQuery("select new map(b.name as name, su as subUnits) from Business b left join b.subUnits su ")
What you will see is that the second query generates huge SQL queries to get the data, since it attempts to map an entire entity per row.
I have tested this, and the second form tends to produce around a full page of SQL, sometimes several pages, compared with the few lines of SQL generated by the first example.
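For readers less familiar with groupBy, the regrouping of the flattened left-join rows can be sketched in Python as follows. The row keys name, field1 and field2 follow the aliases in the first query; group_sub_units is a hypothetical helper, and the sketch assumes that a business without sub-units comes back from the left join as a single row with empty sub-unit fields.

```python
from collections import defaultdict


def group_sub_units(rows):
    """Turn flattened (name, field1, field2) rows into name -> list of sub-units."""
    grouped = defaultdict(list)
    for row in rows:
        sub_units = grouped[row["name"]]  # creates the key even with no sub-units
        if row["field1"] is None and row["field2"] is None:
            continue  # left join produced no matching sub-unit for this business
        sub_units.append({"field1": row["field1"], "field2": row["field2"]})
    return dict(grouped)
```

Each business name then maps to one list, so the duplication of name across rows disappears without loading whole entities.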

SSRS: Adding a filter that returns information from entire group

I am trying to create a report in SSRS. Below is a small example of what my dataset looks like.
Example Data Set
So, there are three different stores (A,B,C) and each has a landlord (a,b,c). Landlords can pay via three different methods (1,2,3) and the amounts paid per method are shown.
Right now, I have two filters set up. The first is by Store and the second is by Landlord.
What I am having trouble with is:
How can I set up a filter by the Amount that will return information from an entire Store/Landlord?
So for example, if I wanted to filter Amount by 150, I would like to return all the "payment" information for the store(s) that have a payment of 150. Such as the following:
Desired Result
Is it possible to add a filter to return information from the entire group? (Store and Landlord are the group in this case)
I am new to SSRS so any help/insight would be greatly appreciated!
You can use LookUpSet to locate the matching groups, JOIN to put the results in a string and the INSTR function to filter your results.
=IIF(ISNOTHING(Parameters!AMOUNT.Value) OR INSTR(
    Join(LOOKUPSET(Parameters!AMOUNT.Value, Fields!Amount.Value, Fields!Store.Value, "DataSet1"), ", "),
    Fields!Store.Value
) > 0, 1, 0)
This translates to: return 1 if no Amount was supplied, or if the current row's Store is found (INSTR > 0) in the joined list (JOIN) of Stores that have a payment equal to the Amount parameter (LOOKUPSET); otherwise return 0.
In your filter, put the above expression in the Expression field, change the type to Integer and set the Value to 1.
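The logic of that filter, stripped of SSRS syntax, is a two-step lookup: first find the stores that have a payment of the requested amount, then keep every row belonging to those stores. A minimal Python sketch of the idea follows; the row keys and the function name filter_by_amount are invented for illustration and do not come from the original dataset.

```python
def filter_by_amount(rows, amount=None):
    """Keep all rows of any store that has at least one payment of `amount`."""
    if amount is None:  # no parameter supplied: nothing is filtered out
        return list(rows)
    # Step 1: stores with a matching payment (the LookupSet part)
    stores = {row["store"] for row in rows if row["amount"] == amount}
    # Step 2: every row of those stores (the INSTR > 0 part)
    return [row for row in rows if row["store"] in stores]
```

Using a set for the membership test avoids one weakness of a substring search: a store name that happens to be contained in another store's name would not match by accident.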

Search records having comma separated values that contain any element from the given list

I have a domain class Schedule with a property 'days' holding comma separated values like '2,5,6,8,9'.
class Schedule {
String days
...
}
Schedule schedule1 = new Schedule(days :'2,5,6,8,9')
schedule1.save()
Schedule schedule2 = new Schedule(days :'1,5,9,13')
schedule2.save()
I need to get the list of the schedules having any day from the given list say [2,8,11].
Output: [schedule1]
How do I write the criteria query or HQL for this? We can prefix and suffix the days with a comma, like ',2,5,6,8,9,', if that helps.
Thanks,
I hope you have a good reason for this denormalization; otherwise it would be better to save the list to a child table.
As it stands, querying is complicated. Something like:
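def days = [2, 8, 11]
// note: check for an empty days list first, since it would leave the or block empty
Schedule.withCriteria {
    or { // a single or block, so that a match on any day is enough
        days.each { day ->
            eq('days', day.toString())  // "$day" is the only value
            like('days', "$day,%")      // starts with "$day"
            like('days', "%,$day,%")    // "$day" is in the middle
            like('days', "%,$day")      // ends with "$day"
        }
    }
}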
In MySQL there is a SET datatype and FIND_IN_SET function, but I've never used that with Grails. Some databases have support for standard SQL2003 ARRAY datatype for storing arrays in a field. It's possible to map them using hibernate usertypes (which are supported in Grails).
If you are using MySQL, FIND_IN_SET query should work with the Criteria API sqlRestriction:
http://grails.org/doc/latest/api/grails/orm/HibernateCriteriaBuilder.html#sqlRestriction(java.lang.String)
Using SET+FIND_IN_SET makes the queries a bit more efficient than like queries if you care about performance and have a real requirement to do denormalization.
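The comma-wrapping idea from the question is what makes a plain substring test safe: once both the stored value and the search term are delimited by commas, day 2 can no longer match inside 12 or 25. A small Python sketch of that check follows; has_any_day is a hypothetical helper, not part of Grails or Hibernate.

```python
def has_any_day(days_csv, wanted):
    """True if the comma separated string contains any of the wanted days."""
    wrapped = "," + days_csv + ","  # e.g. ",2,5,6,8,9,"
    # Searching for ",2," cannot match inside ",12," or ",25,"
    return any(f",{day}," in wrapped for day in wanted)
```

If the column is stored with the leading and trailing comma already in place, the three like patterns per day collapse into the single "%,$day,%" pattern.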
