Why is the Groovy sql.rows method so slow? - grails

I tried to fetch some data with the Groovy sql.rows() method and it took a very long time to return the values.
So I tried the "standard" way and it's much, much faster (150 times faster).
What am I missing?
Look at the code below: the first method returns results in about 2500 ms and the second in 15 ms!
import groovy.sql.Sql

import java.sql.Connection
import java.sql.ResultSet
import java.sql.Statement

class MyService {

    javax.sql.DataSource dataSource

    def SQL_QUERY = "select M_FIRSTNAME as firstname, M_LASTNAME as lastname, M_NATIONALITY as country from CT_PLAYER order by M_ID asc"

    def getPlayers1(int offset, int maxRows) {
        def t = System.currentTimeMillis()
        def sql = new Sql(dataSource)
        def rows = sql.rows(SQL_QUERY, offset, maxRows)
        println "time1 : ${System.currentTimeMillis() - t}"
        return rows
    }

    def getPlayers2(int offset, int maxRows) {
        def t = System.currentTimeMillis()
        Connection connection = dataSource.getConnection()
        Statement statement = connection.createStatement()
        statement.setMaxRows(offset + maxRows - 1)
        ResultSet resultSet = statement.executeQuery(SQL_QUERY)
        def l_list = []
        if (resultSet.absolute(offset)) {
            while (true) {
                l_list << [
                    'firstname': resultSet.getString('firstname'),
                    'lastname' : resultSet.getString('lastname'),
                    'country'  : resultSet.getString('country')
                ]
                if (!resultSet.next()) break
            }
        }
        resultSet.close()
        statement.close()
        connection.close()
        println "time2 : ${System.currentTimeMillis() - t}"
        return l_list
    }
}

When you call sql.rows, Groovy eventually calls SqlGroovyMethods.toRowResult for each row returned by the ResultSet.
This method interrogates the ResultSetMetaData of the ResultSet each time to find the column names, and then fetches the data for each of those columns from the ResultSet into a Map, which it adds to the returned List.
In your second example, you get the required columns directly by name (as you know what they are), and avoid having to do this lookup for every row.
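For illustration, the per-row work sql.rows() does is roughly equivalent to this simplified sketch (not the actual Groovy source, just an approximation of the cost involved):

import java.sql.ResultSet
import java.sql.ResultSetMetaData

// Build a column-name -> value map for the current row,
// re-reading the metadata every time, as sql.rows() effectively does.
Map toRowMap(ResultSet rs) {
    ResultSetMetaData md = rs.metaData
    def row = [:]
    for (int i = 1; i <= md.columnCount; i++) {
        row[md.getColumnLabel(i)] = rs.getObject(i)
    }
    return row
}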

I think I found the reason this method is so slow: statement.setMaxRows() is never called!
That means a lot of useless data is sent back by the database (for example, when you only want to display the first pages of a large datagrid).

I wonder how your tests would turn out if you tried setFetchSize instead of setMaxRows. A lot of this has to do with the underlying JDBC driver's default behavior.
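If you want to keep using sql.rows(), one option is to configure the underlying Statement yourself before the query runs. A minimal sketch, assuming your Groovy version provides the Sql.withStatement hook (inside getPlayers1, say):

def sql = new Sql(dataSource)
// Configure every Statement this Sql instance creates before it executes.
// The values here are illustrative; the defaults are driver-specific.
sql.withStatement { stmt ->
    stmt.maxRows = offset + maxRows - 1   // cap the rows the database sends back
    stmt.fetchSize = maxRows              // hint for the driver's row batching
}
def rows = sql.rows(SQL_QUERY, offset, maxRows)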

Related

Insert 10,000,000+ rows in grails

I've read a lot of articles recently about populating a Grails table from a huge data set, but I seem to have hit a ceiling. My code is as follows:
import groovy.sql.Sql

class LoadingService {

    def sessionFactory
    def dataSource
    def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP

    def insertFile(fileName) {
        InputStream inputFile = getClass().classLoader.getResourceAsStream(fileName)
        def pCounter = 1
        def mCounter = 1
        Sql sql = new Sql(dataSource)
        inputFile.splitEachLine(/\n|\r|,/) { line ->
            line.each { value ->
                if (value.equalsIgnoreCase('0')) {
                    pCounter++
                    return
                }
                sql.executeInsert("insert into Patient_MRNA (patient_id, mrna_id, value) values (${pCounter}, ${mCounter}, ${value.toFloat()})")
                pCounter++
            }
            if (mCounter % 100 == 0) {
                cleanUpGorm()
            }
            pCounter = 1
            mCounter++
        }
    }

    def cleanUpGorm() {
        sessionFactory.currentSession.clear()
        propertyInstanceMap.get().clear()
    }
}
I have disabled the second-level cache, I'm using assigned ids, and I am explicitly handling this many-to-many relationship through a domain class, not hasMany and belongsTo.
My speed increased monumentally after applying these methods, but after a while the inserts slow down to the point of almost stopping, compared to about 623,000 per minute at the beginning.
Is there some other memory leak that I should be aware of, or have I just hit the ceiling for batch inserts in Grails?
To be clear, it takes about 2 minutes to insert 1.2 million rows, but then they start to slow down.
Try doing batch inserts; it's much more efficient:
def updateCounts = sql.withBatch { stmt ->
    stmt.addBatch("insert into TABLENAME ...")
    stmt.addBatch("insert into TABLENAME ...")
    stmt.addBatch("insert into TABLENAME ...")
    ...
}
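For the insert in the question, a sketch of how that could look, assuming Groovy's Sql.withBatch(int batchSize, String sql, Closure) overload; the batch size and the eachParsedValue iteration are illustrative placeholders:

Sql sql = new Sql(dataSource)
// One prepared statement, parameters flushed to the database in batches of 500.
sql.withBatch(500, "insert into Patient_MRNA (patient_id, mrna_id, value) values (?, ?, ?)") { ps ->
    eachParsedValue { patientId, mrnaId, value ->   // hypothetical loop over your parsed file
        ps.addBatch([patientId, mrnaId, value])
    }
}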
I have fought with this in earlier versions of Grails. Back then I resorted to either simply running the batch manually in proper chunks, or using another tool for the batch import, such as Pentaho Data Integration (or another ETL tool, or DIY).

Concurrency Issue Grails

I am using this code for updating a row.
SequenceNumber.withNewSession {
    def hibSession = sessionFactory.getCurrentSession()
    Sql sql = new Sql(hibSession.connection())
    def rows = sql.rows("select for update query")
}
In this query I am updating the number. Initially the sequence number is 1200, and every time this code runs it should be incremented by 1. I am running this code 5 times in a loop, but the Hibernate session is not being flushed, so every time I get the same number, 1201.
I have also used
hibSession.clear()
hibSession.flush()
but with no success.
If I use the following code then it works fine:
SequenceNumber.withNewSession {
    Sql sql = new Sql(dataSource)
    def rows = sql.rows("select for update query")
}
Every time I get a new number.
Can anybody tell me what's wrong with the above code?
Try it inside a transaction, and flush at the end, like:
SequenceNumber.withTransaction { TransactionStatus status ->
    ...
    status.flush()
}
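Put together with the Sql call from the question, that could look roughly like this (the query string and the increment logic are placeholders for whatever your select-for-update actually does):

SequenceNumber.withTransaction { TransactionStatus status ->
    def hibSession = sessionFactory.currentSession
    Sql sql = new Sql(hibSession.connection())
    // placeholder: lock the row, read the current value, write value + 1
    def rows = sql.rows("select for update query")
    status.flush()   // push pending changes before the transaction commits
}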

XmlSlurper parsing a query result

I have a query that brings back a cell in my table that has XML in it. I can already spit out what is in the cell without any delimiters. Now I need to actually take each individual element and link it to my object. Is there an easy way to do this?
def sql
def dataSource

static transactional = true

def pullLogs(String username, String id) {
    if (username != null && id != null) {
        sql = new Sql(dataSource)
        println "Data source is: " + dataSource.toString()
        def schema = dataSource.properties.defaultSchema
        sql.query('select USERID, AUDIT_DETAILS from DEV.AUDIT_LOG T WHERE XMLEXISTS(\'\$s/*/user[id=\"' + id + '\" or username=\"' + username + '\"]\' passing T.AUDIT_DETAILS as \"s\") ORDER BY AUDIT_EVENT', []) { ResultSet rs ->
            while (rs.next()) {
                def auditDetails = new XmlSlurper().parseText(rs.getString('AUDIT_DETAILS'))
                println auditDetails.toString()
            }
        }
        sql.close()
    }
}
Now this will give me that cell with those audit details in it. The bad thing is that it just puts all the information from the field into one giant string without the element tags. How would I go through and assign the values to an object? I have been trying to work from this example http://gallemore.blogspot.com/2008/04/groovy-xmlslurper.html with no luck, since that works with a file.
I have to be missing something. I tried running another parseText(auditDetails) but haven't had any luck with that either.
Any suggestions?
EDIT:
The XML in that field looks like
<user><username>scottsmith</username><timestamp>tues 5th 2009</timestamp></user>
(similar to how it is, except mine is a lot longer). It comes out as "scottsmithtues 5th 2009" and so on. I need to actually take those tags and link them to my object instead of just printing them as one conjoined string.
Just do
auditDetails.username
or
auditDetails.timestamp
to access the properties you require.
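For example, to turn the parsed XML into an object, something along these lines should work (AuditUser is a hypothetical class standing in for your own object):

// Hypothetical target class; substitute your own domain/command object.
class AuditUser {
    String username
    String timestamp
}

def xml = '<user><username>scottsmith</username><timestamp>tues 5th 2009</timestamp></user>'
def auditDetails = new XmlSlurper().parseText(xml)
def user = new AuditUser(
    username : auditDetails.username.text(),   // 'scottsmith'
    timestamp: auditDetails.timestamp.text()   // 'tues 5th 2009'
)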

gorm projection and loss of metainformation

When using projections on properties, the result is returned as a list with the elements in the same order as defined in the projections block. At the same time, the property names are missing from the list, which is a real disadvantage to the developer: the result gets passed along, and the caller needs to know which value belongs to which property. Is there a way to return a map from the criteria query, with the property name as the key for each value?
So, the following code:
def c = Trade.createCriteria()
def remicTrades = c.list {
    projections {
        property('title', 'title')
        property('author.name', 'author')
    }
    def now = new Date()
    between('publishedDate', now - 365, now)
}
This returns:
[['book1', 'author1'], ['book2', 'author2']]
Instead I would like it to return:
[[book:'book1', author:'author1'], [book:'book2', author:'author2']]
I know I can rearrange it this way after getting the result, but I earnestly feel that the criteria should use the property aliases to return a list of maps that mimics the result of the SQL query, rather than a bland list.
Duplicate: Grails queries with criteria: how to get back a map with column?
And the corresponding answer (and solution): https://stackoverflow.com/a/16409512/1263227
Use resultTransformer.
import org.hibernate.criterion.CriteriaSpecification

Trade.withCriteria {
    resultTransformer(CriteriaSpecification.ALIAS_TO_ENTITY_MAP)
    projections {
        property('title', 'title')
        property('author.name', 'author')
    }
    def now = new Date()
    between('publishedDate', now - 365, now)
}
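With the ALIAS_TO_ENTITY_MAP transformer each row comes back as a map keyed by the projection aliases, so with the data from the question the result should look like:

[[title:'book1', author:'author1'], [title:'book2', author:'author2']]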
I agree with your reasoning; this really should be part of the core GORM solution. That said, here's my workaround:
def props = ['name', 'phone']
def query = Person.where {}.projections {
    props.each {
        property(it)
    }
}
def people = query.list().collect { row ->
    def cols = [:]
    row.eachWithIndex { colVal, ind ->
        cols[props[ind]] = colVal
    }
    cols
}
println people // shows [['name':'John','phone':'5551212'],['name':'Magdalena','phone':'5552423']]

list.find(closure) and executing against that value

Really, my question is: "Can the code sample below be even smaller?" Basically, the code sample is designed to first look through a list of objects, find the most granular one (in this case a branch), and then query backwards depending on what object it finds:
1 - If it finds a branch, return the findAllBy against the branch
2 - If it finds a department, return the findAllBy against the department
3 - If it finds an organization, return the findAllBy against the organization
The goal is to find the most granular object (which is why order is important), but do I need to have two separate blocks (one to define the objects, the other to check if they exist)? Or can those two executions be made into one command...
def resp
def srt = [sort:"name", order:"asc"]
def branch = listObjects.find{it instanceof Branch}
def department = listObjects.find{it instanceof Department}
def organization = listObjects.find{it instanceof Organization}
resp = !resp && branch ? Employees.findAllByBranch(branch,srt) : resp
resp = !resp && department ? Employees.findAllByDepartment(department,srt) : resp
resp = !resp && organization ? Employees.findAllByOrganization(organization,srt) : resp
return resp
What I'm thinking is something along the lines of this:
def resp
resp = Employees.findAllByBranch(listObjects.find{it instanceof Branch})
resp = !resp ? Employees.findAllByDepartment(listObjects.find{it instanceof Department}) : resp
resp = !resp ? Employees.findAllByOrganization(listObjects.find{it instanceof Organization}) : resp
But I believe that will throw an exception since those objects might be null
You can shorten it up a bit more with findResult instead of a for in loop with a variable you need to def outside:
def listObjects // = some predetermined list that you've apparently created
def srt = [sort:"name", order:"asc"]
def result = [Branch, Department, Organization].findResult { clazz ->
    listObjects?.find { clazz.isInstance(it) }?.with { foundObj ->
        Employees."findAllBy${clazz.simpleName}"(foundObj, srt)
    }
}
findResult is similar to find, but it returns the first non-null result of the closure rather than the matching item itself. It avoids the need for a separate collection variable outside of the loop.
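A quick illustration of the difference, in plain Groovy:

def list = [1, 2, 3]
assert list.find { it > 1 } == 2                           // the first matching item itself
assert list.findResult { it > 1 ? it * 10 : null } == 20   // the first non-null closure result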
Edit: what I had previously didn't quite match the behavior that I think you were looking for (I don't think the other answers do either, but I could be misunderstanding). You have to ensure that there's something found in the list before doing the findAllBy or else you could pull back null items which is not what you're looking for.
In real, production code, I'd actually do things a bit differently though. I'd leverage the JVM type system to spin through listObjects only once and short-circuit when it finds the first Branch/Department/Organization, like this:
def listObjects
def sort = [sort:"name", order:"asc"]
def result = listObjects?.findResult { findEmployeesFor(it, sort) }
... // then have these methods to actually exercise the type specific findEmployeesFor
def findEmployeesFor(Branch branch, sort) { Employees.findAllByBranch(branch, sort) }
def findEmployeesFor(Department department, sort) { Employees.findAllByDepartment(department, sort) }
def findEmployeesFor(Organization organization, sort) { Employees.findAllByOrganization(organization, sort) }
def findEmployeesFor(Object obj, sort) { return null } // if listObjects can hold non-branch/department/organization objects
I think that this code is actually clearer and it reduces the number of times we iterate over the list and the number of reflection calls we need to make.
Edit:
A for-in loop is more efficient, since you want to stop processing on the first non-null result (i.e. in Groovy we cannot break out of a closure iteration with "return" or "break").
def resp
for (clazz in [Branch, Department, Organization]) {
    resp = Employees."findAllBy${clazz.simpleName}"(listObjects?.find { clazz.isInstance(it) })
    if (resp) break
}
if (resp) // do something...
Original:
List results = [Branch, Department, Organization].collect { clazz ->
    Employees."findAllBy${clazz.simpleName}"(listObjects?.find { clazz.isInstance(it) })
}
Enjoy Groovy ;--)
I think #virtualeyes nearly had it, but instead of a collect (which, as he says, you can't break out of), you want to use findResult, as that stops at the first non-null result it gets:
List results = [Branch, Department, Organization].findResult { clazz ->
    Employees."findAllBy${clazz.simpleName}"(listObjects?.find { clazz.isInstance(it) })
}
