RoR API - when I switch db from sqlite to postgres, any updated objects in API array are moved to end of array - ruby-on-rails

When I update an object in my sqlite API with ajax, it keeps the order of my object array - so the front end looks the same. When I update an object in the API after switching the db to postgres, it changes the order of the array - mostly placing the updated objects at the end of the array. Any ideas what's going on here?
I've tried deleting and remaking the database, no luck. I switched back to sqlite and is working normally again.

In SQL order is not guaranteed. If you desire a particular order, the safest thing to do is to add a sort key to your records, and make sure you're doing an ORDER BY on your select statement.
The fact that SQLite is preserving your ordering is kind of a "mistake" of implementation. You should not rely on the engine to do anything outside the specification.

Quote from the Postgres docs:
After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.
That said: without an explicit ORDER clause the order of the returned records is kind of random.

Related

How can I disable lazy loading of active record queries?

I want to query some objects from the database using a WHERE clause similar to the following:
#monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like #monuments.first, then I loop through #monuments, then I display #monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times, first with a limit of 1 (for #monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before you your call to first because while first is a method on Array, it's also a “finder method” on ActiveRecord objects that'll fetch the first record.
You can prompt this with any method that requires data to work with. I prefer using to_a since it's clear that we'll be dealing with an array after:
#moments = Moment.where(foo: true).to_a
# SQL Query Executed
#moments.first #=> (Array#first) <Moment #foo=true>
#moments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
AFAIK, #monuments.first should not hit the db, I confirmed it on my console, maybe you have multiple instance with same variable or you are doing something else(which you haven't shared here), share the exact code and query and we might debug.
Since, ActiveRecord Collections acts as array, you can use array analogies to avoid querying the db.
Regarding first you can do,
#monuments[0]
Regarding the count, yes, it is a different query which hits the db, to avoid it you can use length as..
#monuments.length

Sorting Realm records that are inserted quickly

Sometimes my app will add many Realm records at once.I need to be able to consistently keep them in the same order.
The documentation recommends that I use NSDate:
Another common motivation for auto-incrementing properties is to preserve order of insertion. In some situations, this can be accomplished by appending objects to a List or by using a createdAt property with a default value of NSDate().
However, since records are added so quickly sometimes, the dates are not always unique, especially considering Realm stores NSDate only to the second accuracy.
Is there something I'm missing about the suggestion in the documentation?Maybe the documentation wasn't considering records added in quick succession? If so, would it be recommended to keep an Int position property and to always query for the last record at the moment when adding a new record, so as to ensure sequential positions?However, querying for the last record in such a case won't return the previous record unless you've also added and finalized a write, which is wasteful if you need to add a lot of records.Then, it would require batch create logic, which is unfortunate.
However, since records are added so quickly sometimes, the dates are not always unique, especially considering Realm stores NSDate only to the second accuracy.
The limitation on date precision was addressed back in Realm v0.101. Realm can now represent dates with greater precision than NSDate.
However, querying for the last record in such a case won't return the previous record unless you've also added and finalized a write, which is wasteful if you need to add a lot of records.
It's not necessary to commit a write transaction for queries on the same thread to see data that you've added during the write transaction.
Is there something I'm missing about the suggestion in the documentation?
You skipped over the first suggestion: appending objects to a List. Lists in Realm are inherently ordered, so you do not need to find a way to create unique, ordered values. Simply append the new object to the list, and rely on the list's order to determine the order in which the objects were added. This also has the advantage of being safe when using Realm Mobile Platform's synchronization features, as incrementing fields can generate duplicates on different devices and timestamps may not be reliable.

Is is possible in ruby to set a specific active record call to read dirty

I am looking at a rather large database.. Lets say I have an exported flag on the product records.
If I want an estimate of how many products I have with the flag set to false, I can do a call something like this
Product.where(:exported => false).count.. .
The problem I have is even the count takes a long time, because the table of 1 million products is being written to. More specifically exports are happening, and the value I'm interested in counting is ever changing.
So I'd like to do a dirty read on the table... Not a dirty read always. And I 100% don't want all subsequent calls to the database on this connection to be dirty.
But for this one call, dirty is what I'd like.
Oh.. I should mention ruby 1.9.3 heroku and postgresql.
Now.. if I'm missing another way to get the count, I'd be excited to try that.
OH SNOT one last thing.. this example is contrived.
PostgreSQL doesn't support dirty reads.
You might want to use triggers to maintain a materialized view of the count - but doing so will mean that only one transaction at a time can insert a product, because they'll contend for the lock on the product count in the summary table.
Alternately, use system statistics to get a fast approximation.
Or, on PostgreSQL 9.2 and above, ensure there's a primary key (and thus a unique index) and make sure vacuum runs regularly. Then you should be able to do quite a fast count, as PostgreSQL should choose an index-only scan on the primary key.
Note that even if Pg did support dirty reads, the read would still not return perfectly up to date results because rows would sometimes inserted behind the read pointer in a sequential scan. The only way to get a perfectly up to date count is to prevent concurrent inserts: LOCK TABLE thetable IN EXCLUSIVE MODE.
As soon as a query begins to execute it's against a frozen read-only state because that's what MVCC is all about. The values are not changing in that snapshot, only in subsequent amendments to that state. It doesn't matter if your query takes an hour to run, it is operating on data that's locked in time.
If your queries are taking a very long time it sounds like you need an index on your exported column, or whatever values you use in your conditions, as a COUNT against an indexed an column is usually very fast.

How can you avoid inserting duplicate records?

I have a web service call that returns XML which I convert into domain objects, I then want to insert these domain objects into my Core Data store.
However, I really want to make sure that I dont insert duplicates (the objects have a date stamp which makes them unique which I would hope to use for the uniqueness check). I really dont want to loop over each object, do a fetch, then insert if nothing is found as that would be really poor on performance...
I am wondering if there is an easier way of doing it? Perhaps a "group by" on my objects in memory???? Is that possible?
Your question already has the answer. You need to loop over them, look for them, if they exist update; otherwise insert. There is no other way.
Since you are uniquing off of a single value you can fetch all of the relevant objects at once by setting the predicate:
[myFetchRequest setPredicate:[NSPredicate predicateWithFormat:#"timestamp in %#", myArrayOfIncomingTimestamps]];
This will give you all of the objects that already exist in a faulted state. You can then run an in memory predicate against that array to retrieve the existing objects to update them.
Also, a word of advice. A timestamp is a terribly uniqueID. I would highly recommend that you reconsider that.
Timestamps are not unique. However, we'll assume that you have unique IDs (e.g. a UUID/GUID/whatever).
In normal SQL-land, you'd add an index on the GUID and search, or add a uniqueness constraint and then just attempt the insert (and if the insert fails, do an update), or do an update (and if the update fails, do an insert, and if the insert fails, do another update...). Note that the default transactions in many databases won't work here — they lock rows, but you can't lock rows that don't exist yet.
How do you know a record would be a duplicate? Do you have a primary key or some other unique key? (You should.) Check for that key -- if it already exists in an Entity in the store, then update it, else, insert it.

Can one rely on the auto-incrementing primary key in your database?

In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?
The generated identification numbers will be unique.
Regardless of whether you use Sequences, like in PostgreSQL and Oracle or if you use another mechanism like auto-increment of MySQL.
However, Sequences are most often acquired in bulks of, for example 20 numbers.
So with PostgreSQL you can not determine which field was inserted first. There might even be gaps in the id's of inserted records.
Therefore you shouldn't use a generated id field for a task like that in order to not rely on database implementation details.
Generating a created or updated field during command execution is much better for sorting by creation-, or update-time later on.
For example:
INSERT INTO A (data, created) VALUES (smething, DATE())
UPDATE A SET data=something, updated=DATE()
That depends on your database vendor.
MySQL I believe absolutely orders auto increment keys. SQL Server I don't know for sure that it does or not but I believe that it does.
Where you'll run into problems is with databases that don't support this functionality, most notably Oracle that uses sequences, which are roughly but not absolutely ordered.
An alternative might be to go for created time and then ID.
I believe the answer to your question is yes...if I read between the lines, I think you are concerned that the system may re-use ID's numbers that are 'missing' in the sequence, and therefore if you had used 1,2,3,5,6,7 as ID numbers, in all the implementations I know of, the next ID number will always be 8 (or possibly higher), but I don't know of any DB that would try and figure out that record Id #4 is missing, so attempt to re-use that ID number.
Though I am most familiar with SQL Server, I don't know why any vendor who try and fill the gaps in a sequence - think of the overhead of keeping that list of unused ID's, as opposed to just always keeping track of the last I number used, and adding 1.
I'd say you could safely rely on the next ID assigned number always being higher than the last - not just unique.
Yes the id will be unique and no, you can not and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this
CREATE TABLE foo (
id INTEGER UNSIGNED AUTO_INCREMENT NOT NULL;
created TIMESTAMP NOT NULL DEFAULT NOW();
updated TIMESTAMP;
PRIMARY KEY(id);
) engine=InnoDB; ## whatever :P
Now, that takes care of creation time. with update time I would suggest an AFTER UPDATE trigger like this one (of course you can do it in a separate query, but the trigger, in my opinion, is a better solution - more transparent):
DELIMITER $$
CREATE TRIGGER foo_a_upd AFTER UPDATE ON foo
FOR EACH ROW BEGIN
SET NEW.updated = NOW();
END;
$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly I've not specified, that this is for mysql, there might be some differences in the function names (namely, 'NOW') and other subtle itty-bitty.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an order by column. E.g. if you delete some early rows, then insert 'em, the new ones may end up living in the same place in the db the old ones did (albeit with new IDs), and that's what it may use as its default sort.
FWIW, I typically use order by ID as an effective version of order by created_at. It's cheaper in that it doesn't require adding an index to a datetime field (which is bigger and therefore slower than a simple integer primary key index), guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in some slightly different order.
This is probably DB engine depended. I would check how your DB implements sequences and if there are no documented problems then I would decide to rely on ID.
E.g. Postgresql sequence is OK unless you play with the sequence cache parameters.
There is a possibility that other programmer will manually create or copy records from different DB with wrong ID column. However I would simplify the problem. Do not bother with low probability cases where someone will manually destroy data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory yes the highest id number is the last created. Remember though that databases do have the ability to temporaily turn off the insert of the autogenerated value , insert some records manaully and then turn it back on. These inserts are no typically used on a production system but can happen occasionally when moving a large chunk of data from another system.

Resources