What is the default cursor location? - grails

My book collection is getting really big. When I get a list like def bookList = Book.findAllBy...() for example, is GORM returning a client-side cursor or a server-side cursor? I am using MariaDB.

GORM doesn't use cursors at all, or at least not by default. (I don't think they're supported, but I'm not 100% sure on that; they seem to be only barely supported in Hibernate.)
What you'll get is essentially all queried objects in memory. You can control which fields are lazily loaded, though, for references to other objects.
Depending on what you mean by "really big", it may be wise for you to handle pagination and offsets yourself.
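For example, a minimal sketch of that, assuming a Book domain class with an author property (the finder and property names are illustrative); GORM dynamic finders and list() accept max and offset in a trailing parameter map:
// fetch the third page of 20 books at a time
def page = Book.findAllByAuthor('Tolkien', [max: 20, offset: 40, sort: 'title'])
// or, without a finder condition:
def books = Book.list(max: 20, offset: 40)
Walking the table a page at a time keeps at most max objects in memory per request instead of the whole result set.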

Related

Tinkerpop Blueprints Vertex Query

I've been researching the Tinkerpop stack for quite a while. I think I have a good idea of what it can do and what databases it works well with. I've got a couple of different databases in mind right now, but haven't made a definite decision. So I've decided to write my code purely to the interfaces, and not take any implementation into account for now. The databases I'm looking at all implement TransactionalGraph and KeyIndexableGraph. I think that's good enough for what I need, but I have just one question.
I have different 'classes' of vertices. Using Blueprints, I believe that's best represented by having a field in each vertex containing the class name. Doing that, I can do something like graph.getVertices("classname", "User") and it would give me all of the user vertices. And since the getVertices function specifies that an implementation should make use of indexes, I'm guaranteed a fast lookup (if I index that field).
But let's say that I wanted to retrieve a vertex based on two properties. The vertex must have className=Users and username=admin. What's the best way to go about finding that single vertex? And is it possible to index over both of those properties, even though not all vertices will have a username field?
FYI - The databases I'm currently thinking of are OrientDB, Neo4j and Titan, but I haven't decided for sure yet. I'm also currently planning to use Gremlin if that helps at all.
Using a "class" or a "type" for vertices is a good way to segment them. Doing:
graph.createKeyIndex("classname",Vertex.class);
graph.getVertices("classname", "User");
is a pretty common pattern and should generally yield a fast lookup, though iterating an index of tens of millions of users might not be so great (if you intend to grow a particular classname to a very large size). I think that leads to the second part of your question, regarding the two-property lookup.
Taking your example on the surface, the two element lookup would be something like (using Gremlin):
g.V('classname', 'User').has('username', 'admin')
So, you narrow the vertices to just "User" vertices with a key index and then filter those for "admin". But, I'd model this differently. It would be even less expensive to simply do:
graph.createKeyIndex("username",Vertex.class);
graph.getVertices("username", "admin");
or in Gremlin:
g.V('username','admin')
If you know the username you want, there's no better/faster way to model this. You really only need the classname if you want to iterate over all "User" vertices. If you just want to find one (or a set of vertices with that username) then key indexing on that property is the better way.
Even if I don't create a key index on it, I still include a type or classname property on all vertices. I find it helpful in global operations where I may or may not care about speed, but just need an answer.
graph.getVertices() will iterate through all vertices looking for ones with that property if you do not have auto-indexing turned on in your graph implementation. If you already have data and cannot just turn on the auto-indexer, you should use index = indexableGraph.getIndex(...) and then index.get('classname', 'User').
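A rough sketch of that manual-index route in the Gremlin console (Groovy), assuming Blueprints 2.x; TinkerGraph stands in for whatever IndexableGraph implementation you end up choosing:
// manual indices are created once and then maintained by your own code
graph = new TinkerGraph()
index = graph.createIndex('classname', Vertex.class)
v = graph.addVertex(null)
v.setProperty('classname', 'User')
index.put('classname', 'User', v)    // you must put elements in yourself
// fast lookup without scanning all vertices
index.get('classname', 'User').each { println it }
Unlike key indices, a manual index is not updated automatically when properties change, which is why turning on key/auto indexing is preferable when you can.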
It's possible to perform a query over multiple objects, but without specifics, it's hard to say. For Neo4j they use Lucene, which means that query() will take a lucene query, such as className:Users AND username:admin, but I cannot speak for the others.
Yeah, any of those DBs is good for playing with. I personally found Neo4j to be the easiest, and as long as you understand their licensing structure, you shouldn't have any problems using it.

Trimming BOLD_CLOCKLOG table

I am doing some maintenance on a database for an application that uses the Bold for Delphi object persistence framework. This database has been in production for several years, and several of the tables have grown quite large. One of them is BOLD_CLOCKLOG, which has something to do with Bold's transaction management.
I want to trim this table (it is up to 1.2GB, with entries from Jan 2006).
Can anyone confirm the system does not need this old information?
From the Bold documentation:
BOLD_CLOCKLOG
To be able to map the transaction numbers used in the TimeStamp columns to the corresponding physical time (such as 2001-01-01 12:34) the persistence mapper will store a log with timestamps and times. Normally, this log is written for each database operation, but if the traffic to the database is very intensive, it is possible to restrict how often this log is written by setting the property ClockLogGranularity. The event OnGetCurrentTime should also be implemented to ensure that all clients have the same time. The usage of this table can be controlled with the tagged value: Model.UseClockLog
So I believe this is used for versioning Bold objects; see the Object Versioning Extension in Bold's documentation. If your application doesn't need this, you can drop the table from the database.
In our Bold application we don't use that feature. Why not simply turn off Bold_ClockLog in the model, drop that big table, and try your application? I'm pretty sure that if something is wrong it will say so at once.
I can also mention that we have our own custom object history. It is simply a big string (as TStringList.DelimitedText) in an ObjectHistory class that holds the time, the user, and a note about the action. This suits our needs better than Bold's built-in object history. The disadvantage, of course, is that we need to add calls in the code wherever logging to the history is done.
Bold_ClockLog is an optional table; its purpose is to store the mapping between integer timestamps and the corresponding DateTime values.
This allows you to find out datetime of the last modification to any object.
If you don't need this feature feel free to empty the table, it won't cause any problems.
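A minimal sketch of that, assuming plain SQL access to the database (per the above, the whole table can safely be emptied):
DELETE FROM BOLD_CLOCKLOG;
-- or, where your DBMS supports it, the usually faster:
-- TRUNCATE TABLE BOLD_CLOCKLOG;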
In addition to Bold_ClockLog, Bold_XFiles is another optional table that tends to grow large. But unlike Bold_ClockLog, the Bold_XFiles table cannot be emptied.
Both of these tables can be turned on/off in the model tag values.

How to change model lazyloadness at runtime in Symfony?

I use sfPropelORMPlugin.
Lazyload is ok if I operate on one object per web page. But if there are hundreds I get hundreds of separate DB queries. I'd like to completely disable lazyload or disable it for needed columns on those particularly heavy pages but couldn't find a way so far.
You should join all your relations when you build your query; that way you'll get all the data in a single query. Note that you have to use joinWithRelation(), where Relation is the name of a related table.
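A minimal sketch of that, assuming Propel 1.6-style query classes as exposed by sfPropelORMPlugin (the Book/Author model names are placeholders for your own schema):
<?php
// one query hydrates the books and their related authors together,
// so getAuthor() below issues no extra SQL per row
$books = BookQuery::create()
  ->joinWith('Author')
  ->find();
foreach ($books as $book) {
  echo $book->getTitle() . ' by ' . $book->getAuthor()->getName() . "\n";
}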
Elaborating on William Durand's answer, perhaps you should also look at the Propel peer method doSelectJoinAll(), which should pre-load all of the objects related to your relations. Just keep in mind this can be expensive in terms of memory.
Another technique is to create a custom criteria with your needed joins, then use a manual hydrate technique to add on to your base object. I do this often when the data I need is using aggregates or other columns that are not exactly mapped to objects. There are plenty of hydrate() examples around.
I added a utility method to the peer to be able to set which columns I want to load, using "pseudo columns" for this type of DB query. I also overrode hydrate() to understand this "markup". All was good until I found out that even though the data is hydrated, symfony won't understand it and won't let you use it as intended.
PS: Joins were never considered an option because the site is under fairly heavy load.

Custom Listbox: Limit Maximum Item Count

I have a Silverlight 3.0 project that has a ListBox databound to a list of items. What I want to do is limit the number of items displayed in the ListBox to <= 10. I originally accomplished this by limiting the data bound to the list to 10 items, doing a .Take(10) on my original data and databinding the result.
The problem with the .Take(10) approach is that the original data source may change, and since .Take() returns a reference to (or copy of, I'm not sure) the original data, I sometimes do not see changes in the data reflected in my UI.
I'm trying to figure out a better way of handling this than the .Take() approach. It seems you shouldn't 'filter' your data using LINQ functions if you have more than one UI element bound to the same data. My only thought on how to do this better is to make a custom container that limits the count, but that seems like it might be a mountain of work to make a custom StackPanel or equivalent.
Take(10) does not make a copy; it just appends another step to the LINQ query, and all execution is still deferred until someone pulls items from the query.
If you were setting the items statically, a copy would indeed be created, by running the query once. But since you set the constructed query as the ItemsSource property of the list box, it can run and update it any time, so it is the right approach.
The real reason why you sometimes do not see changes in the data reflected in the UI is that the list box has no way to determine why the data returned by the query have changed and it surely doesn't want to keep constantly trying to refetch the data and maybe update itself. You need to let it know.
How can you let it know? The documentation for ItemsSource says that "you should set the ItemsSource to an object that implements the INotifyCollectionChanged interface so that changes in the collection will be reflected (...).". Apparently the default way of doing things by .Net itself does not work in your case.
So there are some examples how to implement that yourself e.g. in this SO answer. If even the top-level source data collection (over which you are doing the LINQ query) does not support these notifications (which you would just forward), you might need to update the list box manually from your other code which changes the underlying data.
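As a rough sketch of the manual route, assuming the underlying data lives in an ObservableCollection<T> (the CappedView name and the rebuild-on-every-change strategy are just one simple way to do it):
using System.Collections.ObjectModel;
using System.Linq;

public class CappedView<T>
{
    private readonly ObservableCollection<T> source;
    private readonly int limit;

    // bind the ListBox's ItemsSource to this collection
    public ObservableCollection<T> Items { get; private set; }

    public CappedView(ObservableCollection<T> source, int limit)
    {
        this.source = source;
        this.limit = limit;
        this.Items = new ObservableCollection<T>();
        Rebuild();
        source.CollectionChanged += (s, e) => Rebuild();
    }

    private void Rebuild()
    {
        // naive but safe: rebuild the capped list on every source change
        Items.Clear();
        foreach (T item in source.Take(limit))
            Items.Add(item);
    }
}
Because Items is itself an ObservableCollection, the ListBox picks up every rebuild automatically, and the cap is enforced in one place: listBox.ItemsSource = new CappedView<Book>(books, 10).Items;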

Why is ActiveRecord not smart enough to know that the object_id of the father should be equal to the object_id of the parent of its children?

@father = Hierarchy.find(:first, :conditions => ['label = ?', 'father'])
@father.children.each do |child|
  puts @father.object_id == child.parent.object_id
end
I would have thought the results here would be all true.
Instead they are all false.
Why does ActiveRecord work this way instead of recognizing that these are the same Ruby objects?
To return existing objects when possible instead of creating new ones, ActiveRecord would have to keep track of which objects were created and which entries in the database they correspond to, which would be some overhead. Even then it would still have to look up child.parent in the database to know that it's the same entry that @father represents, so there wouldn't be any considerable performance gain from this caching. (On the Ruby side it saves allocating multiple objects, but at the cost of the bookkeeping overhead; on the database side it should be basically the same.)
Given that, the AR people probably decided that preventing different objects from corresponding to the same database entry would either be detrimental or at least not worth the effort, so they chose not to do it.
For anyone who stumbles across this years after it was asked (like me!), check out https://stackoverflow.com/a/4116397/1000655.
It suggests using :inverse_of to set up bi-directional associations. I wish I hadn't skimmed over that bit of the documentation so many times in the past!
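A minimal sketch of what that looks like, assuming a self-referential Hierarchy model with a parent_id column (the association and column names are guesses based on the question):
class Hierarchy < ActiveRecord::Base
  belongs_to :parent, :class_name => 'Hierarchy', :inverse_of => :children
  has_many :children, :class_name => 'Hierarchy',
           :foreign_key => 'parent_id', :inverse_of => :parent
end

# with :inverse_of set, children loaded through the association
# get their parent pointed back at the same in-memory object:
father = Hierarchy.find(:first, :conditions => ['label = ?', 'father'])
father.children.all? { |child| child.parent.equal?(father) }  # => true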
Object ID is (sort of) a pointer to the object. Loading Rails objects doesn't "share" memory space, so you're getting a copy of the parent object when you do child.parent. To see this, set parent.something = foo and then compare child.parent.something; you'll see they are different. You'd have to re-load the child from the database before it reflects changes to the parent object.
However, odds are you're using the wrong ID value. If you want the ActiveRecord ID (i.e. the value of the id column in your SQL DBMS), use @father.id == child.parent.id
Reusing objects is problematic. What if you make changes to a model instance, but before you save it, you do another find that brings back that same object? The new one should reflect the database, but the old one should still have the changed data.
So yes, it would be a lot of overhead to keep track of each instance and the dirty fields in each one. The memory savings would not be worth the effort.
As far as I'm concerned, AR should at least support an Identity Map plugin. This is the biggest issue I deal with when working with AR. I realize it adds some overhead, but short-lived memory overhead is better than the undue tax on the database of having it pull the same records again and again. It's like GC: you can do it yourself, or you can use a high-level language to abstract it away so that you can concentrate on business logic. You can really fuss over optimizing your AR, or you can take advantage of an Identity Map.