I'm supporting a Grails web application that shows various visuals for clients using AmCharts. On one of the tabs there are three charts, each returning the top ten rows from the database based on a different measure. It takes 4-5, or sometimes even more, to finish, while the query itself runs on the DB in under 10 seconds.
The following service method is called to return results:
List fetchTopPages(params, Map querySettings, String orderClause) {
    if (!((params['country'] && params['country'].size() > 0) ||
          (params['brand'] && params['brand'].size() > 0) ||
          (params['url'] && params['url'].size() > 0))) {
        throw new RuntimeException('Filters country or brand or url not selected.')
    }
    Sql sql = new Sql(dataSource)
    sql.withStatement { stmt -> stmt.fetchSize = 100 }
    Map filterParams = acquisitionService.getDateFilters(params, querySettings)
    ParamUtils.addWhereArgs(params, filterParams)
    String query = "This is where the query is"
    ParamUtils.saveQueryInRequest(ParamUtils.prettyPrintQuery(query, filterParams))
    log.debug("engagement pageviews-by-source query: " + ParamUtils.prettyPrintQuery(query, filterParams))
    List rows = sql.rows(query, filterParams)
    rows
}
After some investigation it was clear that the List rows = sql.rows(query, filterParams) line is the one taking up this load time.
Has anyone experienced this issue before? Why is sql.rows() taking so long when it's only returning 10 rows worth of results, and the query runs super fast on the DB side?
Additional info:
DB: FSL1D
Running the following command on the DB side: java -jar ojdbc5.jar -getversion returns:
"Oracle 11.2.0.3.0 JDBC 3.0 compiled with JDK5 on Thu_Jul_11_15:41:55_PDT_2013
Default Connection Properties Resource
Wed Dec 16 08:18:32 EST 2015"
Groovy Version: 2.3.7
Grails Version: 2.4.41
JDK: 1.7.0
My setup: Groovy 2.3.6, JVM 1.8.0_11, and Oracle 12.1.0.2.0 using the ojdbc7.jar driver.
Note the activation of the 10046 trace before the run, to allow diagnostics.
import oracle.jdbc.pool.OracleDataSource
def ods = new OracleDataSource();
ods.setURL('url')
ods.setUser('usr')
ods.setPassword('pwd')
def con = ods.getConnection()
def sql = new groovy.sql.Sql(con)
sql.withStatement { stmt -> stmt.fetchSize = 100 }
def SQL_QUERY = """select id, col1 from table1 order by id"""
def offset = 150
def maxRows = 20
// activate trace 10046
con.createStatement().execute "alter session set events '10046 trace name context forever, level 12'"
def t = System.currentTimeMillis()
def rows = sql.rows(SQL_QUERY, offset, maxRows)
println "time1 : ${System.currentTimeMillis()-t} with offset ${offset} and maxRows ${maxRows}"
Examination of the trace shows that the statement is parsed and executed; this means that if there is an ORDER BY clause, all data is sorted.
The fetch size is used correctly and no more than the required number of records is fetched - here 170 = 150 + 20.
With a fetch size of 100 this is done in two steps (note the r parameter - the number of fetched rows).
FETCH #627590664:c=0,e=155,p=0,cr=5,cu=0,mis=0,r=100,dep=0,og=1,plh=1169613780,tim=3898349818398
FETCH #627590664:c=0,e=46,p=0,cr=0,cu=0,mis=0,r=70,dep=0,og=1,plh=1169613780,tim=3898349851458
So basically the only problem I see is that the "skipped" data is passed over the network to the client (to be ignored there).
With a very high offset this could produce a lot of overhead (and take more time than the same query run interactively, producing only the first page).
But the best way to identify your problem is simply to enable the 10046 trace and see what is going on. I'm using level 12, which means you also get information about waits in the DB and bind variables.
I basically want to create an update query in OpenOffice Base (the same way as in MS Access).
Base does not typically use update queries (but see below). Instead, the easiest way to do an update command is to go to Tools -> SQL. Enter something similar to the following, then press Execute:
UPDATE "Table1" SET "Value" = 'BBB' WHERE ID = 0
The other way is to run the command with a macro. Here is an example using Basic:
Sub UpdateSQL
    REM Run an SQL command on a table in LibreOffice Base
    Context = CreateUnoService("com.sun.star.sdb.DatabaseContext")
    databaseURLOrRegisteredName = "file:///C:/Users/JimStandard/Desktop/New Database.odb"
    Db = Context.getByName(databaseURLOrRegisteredName)
    Conn = Db.getConnection("","") 'username & password pair - HSQL default blank
    Stmt = Conn.createStatement()
    'strSQL = "INSERT INTO ""Table1"" (ID,""Value"") VALUES (3,'DDD')"
    strSQL = "UPDATE ""Table1"" SET ""Value"" = 'CCC' WHERE ID = 0"
    Stmt.executeUpdate(strSQL)
    Conn.close()
End Sub
Note that the data can also be modified with a form or by editing the table directly.
Under some circumstances it is possible to create an update query. I couldn't get this to work with the default built-in HSQLDB 1.8 engine, but it worked with MySQL.
In the Queries section, Create Query in SQL View
Click the toolbar button to Run SQL Command directly.
Enter a command like the following:
update mytable set mycolumn = 'This is some text.' where ID = 59;
Hit F5 to run the query.
It gives an error that "The data content could not be loaded," but it still performs the update and changes the data. To get rid of the error, the command needs to return a value. For example, I created this stored procedure in MySQL:
DELIMITER $$
CREATE PROCEDURE update_val
(
IN id_in INT,
IN newval_in VARCHAR(100)
)
BEGIN
UPDATE test_table SET value = newval_in WHERE id = id_in;
SELECT id, value FROM test_table WHERE id = id_in;
END
$$
DELIMITER ;
Then this query in LibreOffice Base modifies the data without giving any errors:
CALL update_val(2,'HHH')
See also:
https://forum.openoffice.org/en/forum/viewtopic.php?f=5&t=75763
https://forum.openoffice.org/en/forum/viewtopic.php?f=61&t=6655
https://ask.libreoffice.org/en/question/32700/how-to-create-an-update-query-in-base-sql/
Modifying table entries from LibreOffice Base, possible?
I am importing a large amount of data from a CSV file (the file size is over 100MB).
The code I'm using looks like this:
def errorLignes = []
def index = 1
csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
    if (index % 100 == 0) cleanUpGorm()
    index++
    def order = Orders.findByReferenceAndOrganization(tokens[0], organization)
    if (!order) {
        order = new Orders()
    }
    if (tokens[1]) {
        def user = User.findByReferenceAndOrganization(tokens[1], organization)
        if (user) {
            order.user = user
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[2]) {
        def customer = Customer.findByCustomCodeAndOrganization(tokens[2], organization)
        if (customer) {
            order.customer = customer
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[3]) {
        order.orderType = Integer.parseInt(tokens[3])
    }
    // etc.....................
    order.save()
}
and I'm using the cleanUpGorm method to clear the session after every 100 entries:
def cleanUpGorm() {
    println "clean up gorm"
    def session = sessionFactory.currentSession
    session.flush()
    session.clear()
    propertyInstanceMap.get().clear()
}
I also turned the 2nd-level cache off:
hibernate {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}
The Grails version of the project is 2.0.4 and the database is MySQL.
For every entry, I am doing 3 find calls:
to check if the order already exists
to check if the user is correct
to check if the customer is correct
and finally I'm saving the order instance.
The import process is too slow, and I am wondering how I can speed up and optimise this code.
EDIT:
I found that the searchable plugin is also making it slower.
So, to get around this, I used the command:
searchableService.stopMirroring()
But it's still not fast enough, so I am finally changing the code to use Groovy SQL instead.
I found this blog entry very useful:
http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
You are already cleaning up GORM, but try cleaning every 100 entries:
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP
propertyInstanceMap.get().clear()
Creating database indexes might help as well, and use default-storage-engine=innodb instead of MyISAM.
I'm also in the process of writing a number of services that will load some very large datasets (multiple files of up to ~17 million rows each). I initially tried the cleanUpGorm method you use, but found that, whilst it did improve things, the loading was still slow. Here's what I did to make it much faster:
Investigate what it is that is causing the app to actually be slow. I installed the Grails Melody plugin, did a run-app, and then opened a browser at /monitoring. I could then see which routines took time to execute and what the worst-performing queries actually were.
Many of the Grails GORM methods map to a SQL ... where ... clause. You need to ensure that you have an index for each item used in a where clause for each query that you want to make faster, otherwise the method will become considerably slower the bigger your dataset is. This includes putting indexes on the id and version columns that are injected into each of your domain classes.
Ensure you have indexes set up for all of your hasMany and belongsTo relationships.
If the performance is still too slow, use Spring Batch. Even if you've never used it before, it should take you no time at all to set up a batch job that parses a CSV file into Grails domain objects. I suggest you use the grails-spring-batch plugin to do this and use the examples here to get a working implementation going quickly. It's extremely fast, very configurable, and you don't have to mess around with cleaning up the session.
I used batch inserts while inserting records; this is much faster than the GORM cleanup method. The example below describes how to implement it.
import groovy.time.TimeCategory
import org.hibernate.Session
import org.hibernate.Transaction

Date startTime = new Date()
Session session = sessionFactory.openSession()
Transaction tx = session.beginTransaction()
(1..50000).each { counter ->
    Person person = new Person()
    person.firstName = "abc"
    person.middleName = "abc"
    person.lastName = "abc"
    person.address = "abc"
    person.favouriteGame = "abc"
    person.favouriteActor = "abc"
    session.save(person)
    if (counter.mod(100) == 0) {
        session.flush()
        session.clear()
    }
    if (counter.mod(10000) == 0) {
        Date endTime = new Date()
        println "Record inserted Counter => " + counter + " Time => " + TimeCategory.minus(endTime, startTime)
    }
}
tx.commit()
session.close()
I am not sure why I am getting the error below, but I suppose it's something that I am doing wrong.
First, you can grab my dataset by downloading the file dataset.r from this link and loading it into your session with dget("dataset.r").
In my case, I would do dat = dget("dataset.r").
The code below is what I am using to load the data into Neo4j.
library(RNeo4j)
graph = startGraph("http://localhost:7474/db/data/")
graph$version
# sure that the graph is clean -- you should backup first!!!
clear(graph, input = FALSE)
## ensure the constraints
addConstraint(graph, "School", "unitid")
addConstraint(graph, "Topic", "topic_id")
## create the query
## BE CAREFUL OF WHITESPACE between KEY:VALUE pairs for parameters!!!
query = "
MERGE (s:School {unitid:{unitid},
instnm:{instnm},
obereg:{obereg},
carnegie:{carnegie},
applefeeu:{applfeeu},
enrlft:{enrlft},
applcn:{applcn},
admssn:{admssn},
admit_rate:{admit_rate},
ape:{ape},
sat25:{sat25},
sat75:{sat75} })
MERGE (t:Topic {topic_id:{topic_id},
topic:{topic} })
MERGE (s)-[:HAS_TOPIC {score:{score} }]->(t)
"
for (i in 1:nrow(dat)) {
## status
cat("starting row ", i, "\n")
## run the query
cypher(graph,
query,
unitid = dat$unitid[i],
instnm = dat$instnm[i],
obereg = dat$obereg[i],
carnegie = dat$carnegie[i],
applfeeu = dat$applfeeu[i],
enrlft = dat$enrlt[i],
applcn = dat$applcn[i],
admssn = dat$admssn[i],
admit_rate = dat$admit_rate[i],
ape = dat$apps_per_enroll[i],
sat25 = dat$sat25[i],
sat75 = dat$sat75[i],
topic_id = dat$topic_id[i],
topic = dat$topic[i],
score = dat$score[i] )
} #endfor
I can successfully load the first 49 records of my dataframe dat, but it errors out on the 50th row.
This is the error that I receive:
starting row 50
Error: 400 Bad Request
{"message":"Node 1477 already exists with label School and property \"unitid\"=[110680]","exception":"CypherExecutionException","fullname":"org.neo4j.cypher.CypherExecutionException","stacktrace":["org.neo4j.cypher.internal.compiler.v2_1.spi.ExceptionTranslatingQueryContext.org$neo4j$cypher$internal$compiler$v2_1$spi$ExceptionTranslatingQueryContext$$translateException(ExceptionTranslatingQueryContext.scala:154)","org.neo4j.cypher.internal.compiler.v2_1.spi.ExceptionTranslatingQueryContext$ExceptionTranslatingOperations.setProperty(ExceptionTranslatingQueryContext.scala:121)","org.neo4j.cypher.internal.compiler.v2_1.spi.UpdateCountingQueryContext$CountingOps.setProperty(UpdateCountingQueryContext.scala:130)","org.neo4j.cypher.internal.compiler.v2_1.mutation.PropertySetAction.exec(PropertySetAction.scala:51)","org.neo4j.cypher.internal.compiler.v2_1.mutation.MergeNodeAction$$anonfun$exec$1.apply(MergeNodeAction.scala:80)","org.neo4j.cypher.internal.compiler.v2_1
Here is my session info:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-apple-darwin13.1.0 (64-bit)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RNeo4j_1.2.0
loaded via a namespace (and not attached):
[1] RCurl_1.95-4.1 RJSONIO_1.2-0.2 tools_3.1.0
And it's worth noting that I am using Neo4j 2.1.3.
Thanks for any help in advance.
This is an issue with how MERGE works. By setting the score property within the MERGE clause itself here...
MERGE (s)-[:HAS_TOPIC {score:{score} }]->(t)
...MERGE tries to create the entire pattern, and thus your uniqueness constraint is violated. Instead, do this:
MERGE (s)-[r:HAS_TOPIC]->(t)
SET r.score = {score}
I was able to import all of your data after making this change.
I have an app using Breeze to query the data. I want to first check the local cache and then hit the server if no results are returned (I followed John Papa's SPA Jumpstart course). However, I have found a flaw in my logic which I am not sure how to fix. Assume I have 10 items that match my query.
Situation 1 (which works): I go to the list page (Page A) displaying all 10. It hits the server as the cache is empty and adds all 10 to the cache. Then I go to the page displaying 1 result (Page B), which is found in the cache. So all good.
Situation 2 (the problem): I go to the page displaying 1 record first (Page B). Then I go to my list page (Page A), which checks the cache, finds 1 record, and because of this line (if (recordsInCache.length > 0)) it exits and only shows that 1 record.
I somehow need to know that there are more records on the server (9) that are NOT in the cache, i.e. the total record count for this query is actually 10, I have 1, and therefore I have to hit the server for the other 9.
Here is my query for Page A:
function getDaresToUser(daresObservable, criteria, forceServerCall)
{
    var query = EntityQuery.from('Dares')
        .where('statusId', '!=', enums.dareStatus.Deleted)
        .where('toUserId', '==', criteria.userId)
        .expand("fromUser, toUser")
        .orderBy('deadlineDate, changedDate');
    return dataServiceHelper.executeQuery(query, daresObservable, false, forceServerCall);
}
And here is my query for Page B (single item):
function getDare(dareObservable, criteria, forceServerCall)
{
    var query = EntityQuery.from('Dares')
        .expand("fromUser, toUser")
        .where('dareId', '==', criteria.dareId);
    return dataServiceHelper.executeQuery(query, dareObservable, true, forceServerCall);
}
function executeQuery(query, itemsObservable, singleEntity, forceServerCall)
{
    //check local cache first
    if (!manager.metadataStore.isEmpty() && !forceServerCall)
    {
        var recordsInCache = executeLocalQuery(query, itemsObservable, singleEntity);
        if (recordsInCache.length > 0)
        {
            callCompleted();
            return Q.resolve();
        }
    }
    return manager.executeQuery(query)
        .then(querySucceeded)
        .fail(queryFailed);
}

function executeLocalQuery(query, itemsObservable, singleEntity)
{
    var recordsInCache = manager.executeQueryLocally(query);
    if (recordsInCache.length > 0)
    {
        processQueryResults(recordsInCache, itemsObservable, singleEntity, true);
    }
    return recordsInCache;
}
Any advice appreciated...
If you want to just hit the server for comparison purposes, then at some point (either when loading up your app or when you hit the list page) call inlineCount to compare the total on the server vs. what you already have, as shown in this answer: stackoverflow.com/questions/16390897/counts-in-breeze-js/…
A way you can use this creatively while you are querying for the single record would be like this:
Set some variable in your view model (or somewhere) equal to the total count:
var totalCount = 0;
When you query the single record, get the inline count:
var query = EntityQuery.from('Dares')
    .expand("fromUser, toUser")
    .where('dareId', '==', criteria.dareId)
    .inlineCount(true);
Then set totalCount = data.inlineCount. Do the same thing when you get the full items list, setting totalCount to the inlineCount then too, so you always know whether you have all of the entities.
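For illustration, here is a minimal sketch of capturing that count in the query's success handler (the function name getDareWithCount is hypothetical; it assumes a Breeze EntityManager named manager and the totalCount variable suggested above):

var totalCount = 0; // total matching rows on the server, as suggested above

function getDareWithCount(criteria) {
    var query = EntityQuery.from('Dares')
        .expand("fromUser, toUser")
        .where('dareId', '==', criteria.dareId)
        .inlineCount(true);

    return manager.executeQuery(query)
        .then(function (data) {
            // data.inlineCount is populated because of .inlineCount(true)
            totalCount = data.inlineCount;
            return data.results;
        });
}

You can then compare totalCount against the number of records found in the local cache to decide whether another server call is needed.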
I've been thinking about this problem more in the last year (and have since moved from Durandal + Breeze to Angular + Breeze). In Angular you can cache the service call easily using:
return $resource(xyz + params, {'query': { method:'GET', cache: true, isArray:true }}).query(successArrayDataLoad, errorDataLoad);
I guess Angular caches the params of this query and knows when it already has the data. So when I switch this method to use Breeze, I lose this functionality and all my list calls hit the server every time.
So the real problem here is list data. Single entities can always check the local cache, and if nothing is returned then check the server (because you expect exactly 1).
However, list data varies by params. For example, if I have a GetGames call which takes in a CreatedByUserId, every time I supply a new CreatedByUserId I have to go back to the server.
So I think what I really need to do here to cache my list calls is to cache the key for each call, which is a combination of the query name and the params.
For example, GetGames1 for UserId 1 and then GetGames2 for UserId 2.
The logic would be: check the Angular cache to see if this call has been made before in this session. If it has, then check the local cache first; if nothing is returned, check the server.
If it has not, check the server, as the local cache MIGHT have some data in it for this query but it's not guaranteed to be the full set (see the sketch below).
The only other way around it would be to hit the server each time first to get a count for that Query + Params and then hit the local cache and compare the count, but that is more inefficient.
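Here is a minimal sketch of that query-key idea (the names queryCache and executeListQuery are hypothetical; it assumes a Breeze EntityManager named manager and the Q promise library already used above):

// Tracks which query keys (query name + params) have already hit the server this session.
var queryCache = {}; // e.g. { 'GetGames|createdByUserId=2': true }

function executeListQuery(queryKey, query) {
    if (queryCache[queryKey]) {
        // This exact query has been to the server before, so the local cache holds the full set.
        return Q.resolve(manager.executeQueryLocally(query));
    }
    // First time for this key: the local cache might be partial, so go to the server.
    return manager.executeQuery(query).then(function (data) {
        queryCache[queryKey] = true;
        return data.results;
    });
}

The key itself would just be built from the query name plus its parameters, e.g. 'GetGames|createdByUserId=' + userId.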
Thoughts?
I want to use Amazon DynamoDB with Rails, but I have not found a way to implement pagination.
I will use AWS::Record::HashModel as the ORM.
This ORM supports limits like this:
People.limit(10).each {|person| ... }
But I could not figure out how to implement the following MySQL query in DynamoDB.
SELECT *
FROM `People`
LIMIT 1 , 30
You issue queries using LIMIT. If the subset returned does not contain the full table, a LastEvaluatedKey value is returned. You use this value as the ExclusiveStartKey in the next query. And so on...
From the DynamoDB Developer Guide.
You can provide 'page-size' in your query to set the result set size.
The response from DynamoDB contains 'LastEvaluatedKey', which indicates the last key as per the page size. If the response doesn't contain 'LastEvaluatedKey', it means there are no results left to fetch.
Use the 'LastEvaluatedKey' as the 'ExclusiveStartKey' when fetching the next page.
I hope this helps.
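For illustration, here is a minimal sketch of that loop using the AWS SDK for JavaScript DocumentClient (the table name 'People' and the page size of 30 are assumptions, not part of the original question):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Fetches one page; pass the previous page's LastEvaluatedKey to get the next one.
async function fetchPage(lastEvaluatedKey) {
    const params = { TableName: 'People', Limit: 30 }; // hypothetical table and page size
    if (lastEvaluatedKey) {
        params.ExclusiveStartKey = lastEvaluatedKey; // resume where the last page ended
    }
    const res = await docClient.scan(params).promise();
    // res.LastEvaluatedKey is undefined once the last page has been read
    return { items: res.Items, next: res.LastEvaluatedKey };
}

Each response carries the key needed to request the following page, which is exactly the LastEvaluatedKey / ExclusiveStartKey handshake described above.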
DynamoDB Pagination
Here's a simple copy-paste-and-run proof of concept (Node.js) for stateless forward/reverse navigation with DynamoDB. In summary: each response includes the navigation history, allowing the user to explicitly and consistently request either the next or previous page (while the next/prev params exist):
GET /accounts -> first page
GET /accounts?next=A3r0ijKJ8 -> next page
GET /accounts?prev=R4tY69kUI -> previous page
Considerations:
If your ids are large and/or users might do a lot of navigation, then the potential size of the next/prev params might become too large.
Yes, you do have to store the entire reverse path - if you only store the previous page marker (per some other answers) you will only be able to go back one page.
It won't handle changing pageSize midway; consider baking pageSize into the next/prev value.
Base64-encode the next/prev values, and you could also encrypt them.
Scans are inefficient; while this suited my current requirement, it won't suit all!
// demo.js
const mockTable = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]

const getPagedItems = (pageSize = 5, cursor = {}) => {
    // Parse cursor
    const keys = cursor.next || cursor.prev || [] // fwd first
    let key = keys[keys.length-1] || null // eg ddb's PK

    // Mock query (mimic dynamodb response)
    const Items = mockTable.slice(parseInt(key) || 0, pageSize+key)
    const LastEvaluatedKey = Items[Items.length-1] < mockTable.length
        ? Items[Items.length-1] : null

    // Build response
    const res = {items:Items}
    if (keys.length > 0) // add reverse nav keys (if any)
        res.prev = keys.slice(0, keys.length-1)
    if (LastEvaluatedKey) // add forward nav keys (if any)
        res.next = [...keys, LastEvaluatedKey]
    return res
}

// Run test ------------------------------------
const runTest = () => {
    const PAGE_SIZE = 6
    let x = {}, i = 0

    // Page to end
    while (i == 0 || x.next) {
        x = getPagedItems(PAGE_SIZE, {next:x.next})
        console.log(`Page ${++i}: `, x.items)
    }

    // Page back to start
    while (x.prev) {
        x = getPagedItems(PAGE_SIZE, {prev:x.prev})
        console.log(`Page ${--i}: `, x.items)
    }
}

runTest()
I faced a similar problem.
The generic pagination approach is to use a "start index" (or "start page") and a "page length".
The "ExclusiveStartKey" and "LastEvaluatedKey" based approach is very DynamoDB-specific.
I feel this DynamoDB-specific implementation of pagination should be hidden from the API client/UI.
Also, in case the application is serverless, using a service like Lambda, it will not be possible to maintain the state on the server. On the other hand, the client implementation would become very complex.
I came up with a different approach, which I think is generic (and not specific to DynamoDB):
When the API client specifies the start index, fetch all the keys from the table and store them in an array.
Find the key for the start index specified by the client in that array.
Make use of the ExclusiveStartKey and fetch the number of records specified by the page length.
If the start index parameter is not present, the above steps are not needed; we don't need to specify the ExclusiveStartKey in the scan operation.
This solution has some drawbacks:
We will need to fetch all the keys when the user needs pagination with a start index.
We will need additional memory to store the ids and the indexes.
Additional database scan operations (one or more, to fetch the keys) are required.
But I feel this will be a very easy approach for the clients using our APIs. Backward paging will work seamlessly, and if the user wants to see the "nth" page, that will be possible. A sketch of this approach follows below.
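Here is a minimal sketch of that start-index approach using the AWS SDK for JavaScript (the table name 'People' and its partition key 'id' are assumptions for illustration only):

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// Returns one page of items, addressed by a client-supplied start index.
async function getPage(startIndex, pageLength) {
    let exclusiveStartKey;
    if (startIndex > 0) {
        // Fetch only the keys so we can locate the item just before the start index.
        const keys = [];
        const params = { TableName: 'People', ProjectionExpression: 'id' };
        do {
            const res = await docClient.scan(params).promise();
            keys.push(...res.Items);
            params.ExclusiveStartKey = res.LastEvaluatedKey;
        } while (params.ExclusiveStartKey);
        exclusiveStartKey = keys[startIndex - 1]; // key of the item just before the page
    }
    // Now scan for just one page of full records.
    const pageParams = { TableName: 'People', Limit: pageLength };
    if (exclusiveStartKey) {
        pageParams.ExclusiveStartKey = exclusiveStartKey;
    }
    const page = await docClient.scan(pageParams).promise();
    return page.Items;
}

As noted in the drawbacks above, the key-gathering scan is the expensive part; it only runs when a start index is supplied.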
In fact, I faced the same problem, and I noticed that LastEvaluatedKey and ExclusiveStartKey were not working well, especially when using Scan, so I solved it like this:
GET /?page_no=1&page_size=10 =====> first page
The response will contain the count of records and the first 10 records.
Retry and increase the page number until all records have come back.
The code is below.
PS: I am using Python.
import boto3

table = boto3.resource('dynamodb').Table('People')  # hypothetical table name


def get_page(page_no, page_size):
    # Assumption for illustration: scan the whole table, then slice out the requested page.
    response = table.scan()
    first_index = (page_no - 1) * page_size
    second_index = page_no * page_size
    if second_index > len(response['Items']):
        second_index = len(response['Items'])
    return {
        'statusCode': 200,
        'count': response['Count'],
        'response': response['Items'][first_index:second_index]
    }