DSE graph loader configuration - datastax-enterprise

Messing with the configuration, such as tuning the number of reader threads for vertices and edges, causes a lot of unexplained exceptions, and there is also an issue with setting the batch size.
It seems to work only with the default settings produced by the executable.
I've got a lot of exceptions while trying to play with those values;
some of them are:
[1] A Cassandra CAS exception, something about the inability to create more partition keys.
[2] Cassandra timeout during write query at consistency ONE
and more.
As there is no reference in the documentation about how to solve these issues, I don't know how to continue. Everything there seems very delicate and shaky, so any little change causes tons of exceptions.
This is with graph-loader-6.0.1 and DSE 6.0.0 or 6.0.1.
For example:
com.datastax.dsegraphloader.exception.LoadingException: com.datastax.driver.core.exceptions.InvalidQueryException: Resource limited exceeded on added vertices, properties and edges. Maximum of 100000 allowed. Please split transaction into multiple smaller ones and retry.
This is what I get when I try to use a custom config.
This is the Groovy configuration file:
// config
config preparation: false
config create_schema: false
config load_new: true
config load_edge_threads: 5
config load_vertex_threads: 5
config batch_size: 5000
// orders
inputfiledir = '/home/dseuser/'
profileInput = File.text(inputfiledir + "soc-pokec-profiles.txt").
delimiter("\t").header('user_id','public','completion_percentage','gender','region','last_login','registration','age',
'body','I_am_working_in_field','spoken_languages','hobbies','I_most_enjoy_good_food','pets','body_type',
'my_eyesight','eye_color','hair_color','hair_type','completed_level_of_education','favourite_color',
'relation_to_smoking','relation_to_alcohol','sign_in_zodiac','on_pokec_i_am_looking_for','love_is_for_me',
'relation_to_casual_sex','my_partner_should_be','marital_status','children','relation_to_children','I_like_movies',
'I_like_watching_movie','I_like_music','I_mostly_like_listening_to_music','the_idea_of_good_evening',
'I_like_specialties_from_kitchen','fun','I_am_going_to_concerts','my_active_sports','my_passive_sports','profession',
'I_like_books','life_style','music','cars','politics','relationships','art_culture','hobbies_interests',
'science_technologies','computers_internet','education','sport','movies','travelling','health','companies_brands',
'holder1','holder2')
relationInput = File.text(inputfiledir + "soc-pokec-relationships.txt").
delimiter("\t").header('auser','buser')
profileInput = profileInput.transform {
    if (it['completion_percentage'] == 'null') { it.remove('completion_percentage') }
    if (it['gender'] == 'null') { it.remove('gender') }
    if (it['last_login'] == 'null') { it.remove('last_login') }
    if (it['registration'] == 'null') { it.remove('registration') }
    if (it['age'] == 'null') { it.remove('age') }
    it
}
load(profileInput).asVertices {
    label "user"
    key "user_id"
}
load(relationInput).asEdges {
    label "relation"
    outV "auser", {
        label "user"
        key "user_id"
    }
    inV "buser", {
        label "user"
        key "user_id"
    }
}
I am using the soc-pokec social network dataset from Stanford (available on the web).
I had to drop most of the config to solve the issue.
Note that there is no correlation at all between the numbers in the exception and the settings I made in the config.
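For reference, the stripped-down config section I ended up with looks roughly like this (the exact set of lines kept may differ slightly; the point is that the thread and batch-size tuning is gone, so the loader falls back to its built-in defaults, while the input, transform and load sections stay the same as above):
// config
config create_schema: false
config load_new: true
// load_edge_threads, load_vertex_threads and batch_size removed so the
// loader uses its default values; everything below this point is unchanged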

Related

Comparing strings in Apache Velocity templates (AWS AppSync)

I want to compare strings and am unable to. Consider this example:
#set($foo = "a")
#set($bar = "a")
#if($foo == $bar) ## Not the same
#if($foo == $bar.toString()) ## The same
#if($foo.toString() == $bar) ## The same
If I cast just one of them, then it matches?
The example on the Apache site shows similar usage (though there is a typo saying neither branch will ever match):
#set ($foo = "deoxyribonucleic acid")
#set ($bar = "ribonucleic acid")
#if ($foo == $bar)
In this case it's clear they aren't equivalent. So...
#else
They are not equivalent and this will be the output.
#end
They do mention casting to string when the items are of different classes, of course, but that is not my case.
What's going on? I am doing this on the Serverless Framework using the AppSync Local plugin. Could the issue be there?
Update: potentially a bug in the awsutils offline AppSync package. Bug report filed. Stay tuned.

Neo4j+PopotoJS: filter graph based on predefined constraints

I have a question about queries based on predefined constraints in PopotoJS. In this example, the graph can be filtered based on the constraints defined in the search boxes. In the sample file in this example's visualizations folder, a constraint is only defined for the "Person" node. It is specified in the sample HTML file like the following:
"Person": {
"returnAttributes": ["name", "born"],
"constraintAttribute": "name",
// Return a predefined constraint that can be edited in the page.
"getPredefinedConstraints": function (node) {
return personPredefinedConstraints;
},
....
In my graph I would like to apply that query function to more than one node. For example, I have two nodes: Contact (which has a "name" attribute) and Delivery (which has an "address" attribute).
I managed it by defining two functions, one for each node. However, I also had to add two search box forms with different input ids (like constraint1 and constraint2), and I had to make the queries in the associated search boxes.
Is there a way to define queries for multiple nodes in one search box? For example, searching Contact name and/or Delivery address in the same search box?
Thanks
First, I'd like to point out that the predefined constraints feature is still experimental (but fully functional) and doesn't have any documentation yet.
It is intended to be used in the configuration to filter the data displayed in nodes; in the example, the search boxes are only there to show dynamically how it works.
A common use of this feature would be to add the list of predefined constraints you want to the configuration for every node type.
Let's take an example:
With the following configuration example, the graph will be filtered to show only Person nodes having a "born" attribute and only Movie nodes with a title in the provided list:
"Person": {
"getPredefinedConstraints": function (node) {
return ["has($identifier.born)"];
},
...
}
"Movie": {
"getPredefinedConstraints": function (node) {
return ["$identifier.title IN [\"The Matrix\", \"The Matrix Reloaded\", \"The Matrix Revolutions\"]"];
},
...
}
The $identifier variable is then replaced during query generation with the corresponding node identifier. In this case the generated query would look like this:
MATCH (person:`Person`) WHERE has(person.born) RETURN person
In your case, if I understood your question correctly, you are trying to use this feature to implement a search box to filter the data. I'm still working on that feature, but it won't be available soon :(
As a workaround, maybe this could work in your use case: you could keep the search box value in a variable:
var value = d3.select("#constraint")[0][0].value;
inputValue = value;
Then use it in the predefined constraints of all the node types you want.
In this example, Person will be filtered based on the name attribute and Movie on the title:
"Person": {
"getPredefinedConstraints": function (node) {
if (inputValue) {
return ["$identifier.name =~ '(?i).*" + inputValue + ".*'"];
} else {
return [];
}
},
...
}
"Movie": {
"getPredefinedConstraints": function (node) {
if (inputValue) {
return ["$identifier.title =~ '(?i).*" + inputValue + ".*'"];
} else {
return [];
}
},
...
}
Everything is in the HTML page of this example so you can view the full source directly on the page.
@Popoto, thanks for the descriptive reply. I tried your suggestion and it worked pretty well. With the actual code, when I make a query, it shows only the queried node and sets the count of the other nodes to zero. I wanted a query that filters only the related node while the counts of the other nodes stay the same.
I tried a temporary solution for my problem. What I did is:
Export all the node data to JSON files, search for my query constraint in the exported JSON, and if the value exists in the JSON, run the query on the related node; if not, do nothing.
That way, of course, I needed to define many functions with different variable names (as many as there are nodes). Anyhow, it is not a proper way to do it, but it works for now.

Grails bulk insert/update optimization

I am importing a large amount of data from a CSV file (the file size is over 100 MB).
The code I'm using looks like this:
def errorLignes = []
def index = 1
csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
    if (index % 100 == 0) cleanUpGorm()
    index++
    def order = Orders.findByReferenceAndOrganization(tokens[0], organization)
    if (!order) {
        order = new Orders()
    }
    if (tokens[1]) {
        def user = User.findByReferenceAndOrganization(tokens[1], organization)
        if (user) {
            order.user = user
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[2]) {
        def customer = Customer.findByCustomCodeAndOrganization(tokens[2], organization)
        if (customer) {
            order.customer = customer
        } else {
            errorLignes.add(tokens)
        }
    }
    if (tokens[3]) {
        order.orderType = Integer.parseInt(tokens[3])
    }
    // etc.....................
    order.save()
}
And I'm using the cleanUpGorm method to clean the session after every 100 entries:
def cleanUpGorm() {
    println "clean up gorm"
    def session = sessionFactory.currentSession
    session.flush()
    session.clear()
    propertyInstanceMap.get().clear()
}
I also turned the second-level cache off:
hibernate {
    cache.use_second_level_cache = false
    cache.use_query_cache = false
    cache.provider_class = 'net.sf.ehcache.hibernate.EhCacheProvider'
}
The Grails version of the project is 2.0.4, and I am using MySQL as the database.
For every entry, I am doing three find calls:
- to check if the order already exists
- to check if the user is correct
- to check if the customer is correct
and finally I'm saving the order instance.
The import process is too slow; I am wondering how I can speed up and optimise this code.
EDIT:
I found that the searchable plugin is also making it slower.
So, to get around this, I used the command:
searchableService.stopMirroring()
But it is still not fast enough, so I am finally changing the code to use Groovy SQL instead. Roughly, what I'm moving to looks like the sketch below.
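This is only a sketch of that direction: the table and column names are made up, lookupUserId/lookupCustomerId are hypothetical helpers that resolve the foreign keys (for example from pre-loaded maps), and dataSource is the data source Grails injects into the service:
import groovy.sql.Sql

def sql = new Sql(dataSource)
// Insert rows in JDBC batches of 100; the statement text is illustrative only
sql.withBatch(100, 'insert into orders (reference, user_id, customer_id, order_type) values (?, ?, ?, ?)') { stmt ->
    csvFile.toCsvReader(['charset': 'UTF-8']).eachLine { tokens ->
        stmt.addBatch([tokens[0], lookupUserId(tokens[1]), lookupCustomerId(tokens[2]), tokens[3] as Integer])
    }
}
sql.close()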
I found this blog entry very useful:
http://naleid.com/blog/2009/10/01/batch-import-performance-with-grails-and-mysql/
You are already cleaning up GORM, but try cleaning every 100 entries:
def propertyInstanceMap = org.codehaus.groovy.grails.plugins.DomainClassGrailsPlugin.PROPERTY_INSTANCE_MAP
propertyInstanceMap.get().clear()
Creating database indexes might help as well, and use default-storage-engine=innodb instead of MyISAM. A sketch of what that could look like in the GORM mapping follows.
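For illustration only (the domain class is modelled on the question above and the index names are made up), indexes can be declared in the mapping block so that the columns hit by the dynamic finders are indexed:
class Orders {
    String reference
    Integer orderType
    Organization organization
    User user
    Customer customer

    static mapping = {
        // Orders.findByReferenceAndOrganization(...) filters on these two
        // columns, so give them indexes (index names are illustrative)
        reference index: 'orders_reference_idx'
        organization index: 'orders_organization_idx'
    }
}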
I'm also in the process of writing a number of services that will load very large datasets (multiple files of up to ~17 million rows each). I initially tried the cleanUpGorm method you use, but found that, whilst it did improve things, the loading was still slow. Here's what I did to make it much faster:
Investigate what it is that is actually causing the app to be slow. I installed the Grails Melody plugin, did a run-app and opened a browser at /monitoring. I could then see which routines took time to execute and what the worst-performing queries actually were.
Many of the Grails GORM methods map to a SQL ... where ... clause. You need to ensure that you have an index for each item used in a where clause for each query that you want to make faster, otherwise the method will become considerably slower the bigger your dataset is. This includes putting indexes on the id and version columns that are injected into each of your domain classes.
Ensure you have indexes set up for all of your hasMany and belongsTo relationships.
If the performance is still too slow, use Spring Batch. Even if you've never used it before, it should take you no time at all to set up a batch job that parses a CSV file into Grails domain objects. I suggest you use the grails-spring-batch plugin to do this and use the examples here to get a working implementation going quickly. It's extremely fast, very configurable, and you don't have to mess around with cleaning up the session.
I used batch inserts while inserting records; this is much faster than the GORM cleanup method. The example below describes how to implement it.
import groovy.time.TimeCategory
import org.hibernate.Session
import org.hibernate.Transaction

Date startTime = new Date()
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
(1..50000).each { counter ->
    Person person = new Person()
    person.firstName = "abc"
    person.middleName = "abc"
    person.lastName = "abc"
    person.address = "abc"
    person.favouriteGame = "abc"
    person.favouriteActor = "abc"
    session.save(person)
    if (counter.mod(100) == 0) {
        session.flush();
        session.clear();
    }
    if (counter.mod(10000) == 0) {
        Date endTime = new Date()
        println "Record inserted Counter =>" + counter + " Time =>" + TimeCategory.minus(endTime, startTime)
    }
}
tx.commit();
session.close();

How to use must_not with an empty JSON attribute with ElasticSearch + Grails?

I'm using the Grails plugin to work with ElasticSearch over MySQL. I have a column mapped in my domain class as follows:
String updateHistoryJSON
(...)
static mapping = {
    updateHistoryJSON type: 'text', column: 'update_history'
}
In MySQL, this basically maps to a TEXT column, whose purpose is to store JSON content.
So, in both the DB and the ElasticSearch index, I have two instances:
- instance 1 has updateHistoryJSON = '{"zip":null,"street":null,"name":null,"categories":[],"city":null}'
- instance 2 has updateHistoryJSON = '{}'
Now, what I need is an ElasticSearch query that returns only instance 2.
I've been using a closure like this, written with the Groovy DSL:
{
    bool {
        must_not = term(updateHistoryJSON: "{}")
        minimum_should_match = 1
    }
}
And ElasticSearch seems to ignore it; it keeps bringing back both instances.
On the other hand, if I use a filter like "missing":{"field":"updateHistoryJSON"}, it gives back no documents. The same goes for "exists": {"field":"updateHistoryJSON"}.
Any idea what I am doing wrong here?
I'm still not sure what the problem was, but at least I found a workaround.
Since the search based on the updateHistoryJSON contents was not working, I decided to use a script that searches based on the updateHistoryJSON content size. Meaning, instead of looking for documents that have non-empty JSON, I just look for documents whose updateHistoryJSON size is greater than 2 ({} == size 2).
The closure I used is like this:
{
    script = {
        script = "doc['updateHistoryJSON'].size() > 2"
    }
}

How to setup service method caching in grails

My application has a couple of services that make external calls via httpClient (GET and POST) whose results are unlikely to change for months, but they are slow, making my application even slower.
Clarification: this is NOT about caching GORM/hibernate/queries to my db.
How can I cache these methods (persistence to disk gets bonus points...) in Grails 2.1.0?
I have installed the grails-cache-plugin, but it doesn't seem to be working, or I configured it wrong (hard to do since there are only 2-5 lines to add, but I've managed it in the past).
I also tried setting up an nginx proxy cache in front of my app, but when I submit one of my forms with slight changes, I get the first submission back as the result.
Any suggestions/ideas will be greatly appreciated.
EDIT: Current solution (based on Marcin's answer)
My Config.groovy (the caching part only):
// caching
grails.cache.enabled = true
grails.cache.clearAtStartup = false
grails.cache.config = {
    defaults {
        timeToIdleSeconds 3600
        timeToLiveSeconds 2629740
        maxElementsInMemory 1
        eternal false
        overflowToDisk true
        memoryStoreEvictionPolicy 'LRU'
    }
    diskStore {
        path 'cache'
    }
    cache {
        name 'scoring'
    }
    cache {
        name 'query'
    }
}
The important parts are:
do not clear at startup (grails.cache.clearAtStartup = false)
overflowToDisk=true persists all results over maxElementsInMemory
maxElementsInMemory=1 reduced number of elements in memory
'diskStore' should be writable by the user running the app.
The Grails Cache Plugin works quite well for me under Grails 2.3.11. The documentation is pretty neat, but just to show you a draft...
I use the following settings in Config.groovy:
grails.cache.enabled = true
grails.cache.clearAtStartup = true
grails.cache.config = {
    defaults {
        maxElementsInMemory 10000
        overflowToDisk false
        maxElementsOnDisk 0
        eternal true
        timeToLiveSeconds 0
    }
    cache {
        name 'somecache'
    }
}
Then, in the service I use something like:
@Cacheable(value = 'somecache', key = '#p0.id.toString().concat(#p1)')
def serviceMethod(Domain d, String s) {
    // ...
}
Notice that the somecache part is reused. Also, it was important to use a String as the key in my case. That's why I used toString() on the id.
The plugin can also be set up to use disk storage, but I don't use it.
If it doesn't help, please provide more details on your issue.
This may not help, but if you upgrade the application to Grails 2.4.x you can use the @Memoize annotation. This will automagically cache the results of each method call based upon the arguments passed into it.
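If the annotation meant here is Groovy's groovy.transform.Memoized (available since Groovy 2.2), a minimal sketch of caching an expensive call per argument list might look like this (the service and method names are hypothetical):
import groovy.transform.Memoized

class ExternalApiService {

    // The result is cached in memory per unique argument list
    // for the lifetime of this service instance.
    @Memoized
    String fetchRemoteData(String url) {
        new URL(url).text
    }
}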
In order to store this "almost static" information, you could use Memcached or Redis as a cache system. (There are many others.)
These two cache systems allow you to store key-value data (in your case, something like "key_GET": JSON, XML, map, or String).
Here is a related post: Memcached vs. Redis?
Regards.
