Updating Core Data performance - iOS

I'm creating an app that uses Core Data to store information from a web server. When there's an internet connection, the app checks whether any entries have changed and updates them. Now, I'm wondering which is the best way to go about it. Each entry in my database has a last-updated timestamp. Which of these two will be more efficient:
1. Go through all entries and check the timestamp to see which entries need to be updated.
2. Delete the whole entity's data and re-download everything again.
Sorry if this seems like an obvious question and thanks!

I'd say option 1 would be more efficient, as there is rarely a case where downloading everything (especially in a large database with large amounts of data) is more efficient than downloading only the parts that you need.

I recently did something similar.
I solved the problem by assigning a unique ID and a global 'updated' timestamp to each entity, and by thinking in terms of 'delta' changes.
To explain better: I keep a global 'latest update' variable stored in user preferences, with a default value of 01/01/2010.
This is roughly my JSON service:
response: {
  metadata: { latestUpdate: 2013...etc }
  entities: { .... }
}
Then, this is what's going on (sketched in code below):
1. Pass the 'latest update' value to the web service and retrieve the list of changed entities.
2. Update the Core Data store.
3. If everything went fine with Core Data, the 'latestUpdate' from the service metadata becomes the new 'latest update' variable stored in user preferences.
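A minimal sketch of that flow in Swift (the endpoint URL, the SyncResponse type, and the updateCoreDataStore helper are assumptions for illustration, not the original code):

import Foundation

// Hypothetical shape of the JSON service response described above.
struct SyncResponse: Decodable {
    struct Metadata: Decodable { let latestUpdate: String }
    let metadata: Metadata
    let entities: [String: String]  // app-specific; a dictionary stands in here
}

// Placeholder for the app-specific Core Data import; throws on failure.
func updateCoreDataStore(with entities: [String: String]) throws {
    // ... insert/update the managed objects here ...
}

func syncChanges() async throws {
    // 1. Read the stored 'latest update', falling back to the default.
    let defaults = UserDefaults.standard
    let latestUpdate = defaults.string(forKey: "latestUpdate") ?? "2010-01-01"

    // Pass it to the web service and retrieve only the changed entities.
    var components = URLComponents(string: "https://example.com/sync")!  // hypothetical endpoint
    components.queryItems = [URLQueryItem(name: "since", value: latestUpdate)]
    let (data, _) = try await URLSession.shared.data(from: components.url!)
    let response = try JSONDecoder().decode(SyncResponse.self, from: data)

    // 2. Update the Core Data store.
    try updateCoreDataStore(with: response.entities)

    // 3. Only if the save succeeded does the service's latestUpdate
    //    become the new stored 'latest update' value.
    defaults.set(response.metadata.latestUpdate, forKey: "latestUpdate")
}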
That's it. I retrieve only the needed changes, and of course the web service is structured to deliver a proper list. Which is to say: a web service backed by a database can deal with this matter quite well, leaving the iPhone to be a 'simple client' only.
But I have to say that for small amounts of data, it is still quite performant (and less bug-prone) to download the whole list on each request.

As per our discussion in the comments above, you can model your Core Data entries with version control like this:
CoreDataEntityPerson:
  name : String
  name_version : int
  image : BinaryData
  image_version : int
You can now model the server XML in the following way:
<person>
  <name>michael</name>
  <name_version>1</name_version>
  <image>string_converted_imageData</image>
  <image_version>1</image_version>
</person>
Now, follow these steps (a code sketch of the version check follows the list):
1. When the first response arrives and you parse it, create a new object for each entity and fill in the data directly.
2. The next time a field is updated on the server, its version count is increased by 1 and stored. E.g. let's say the name michael is changed to abraham; the version count of name_version on the server will then be 2.
3. This updated version count will come back in the response data.
4. Now, while storing the data in the same object: if the version count is unchanged, you can skip updating that field; if the version count has changed, update that field.
This way you can efficiently check each entry and perform updates only on the changed fields.
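A minimal sketch of that version check in Swift (ServerPerson and the simplified Person class are hypothetical stand-ins for the parsed XML and the CoreDataEntityPerson entity above):

import Foundation

// Parsed representation of the <person> element above (hypothetical).
struct ServerPerson {
    let name: String
    let nameVersion: Int
    let imageData: Data
    let imageVersion: Int
}

// Simplified local model mirroring CoreDataEntityPerson; in practice this
// would be an NSManagedObject subclass.
final class Person {
    var name = ""
    var nameVersion = 0
    var image = Data()
    var imageVersion = 0
}

// Update only the fields whose version count changed on the server.
func apply(_ server: ServerPerson, to person: Person) {
    if server.nameVersion != person.nameVersion {
        person.name = server.name
        person.nameVersion = server.nameVersion
    }
    if server.imageVersion != person.imageVersion {
        // Skipping unchanged binary data is where the real savings are.
        person.image = server.imageData
        person.imageVersion = server.imageVersion
    }
}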
Advice:
The above approach works best when you're dealing with large amounts of data being updated.
For simple text entries on an object, simply overwriting the data on all entries is efficient enough, and it also keeps the response model simple.

Related

Delphi FireDAC: how to refresh data in cache

I need to refresh data in a TFDQuery which is in cached updates mode.
To simplify my problem, let's suppose my MS Access database is composed of two tables that I have to join:
LABTEST(id_test, dat_test, id_client, sample_typ)
SAMPLEType(id, SampleName)
In the Delphi application, I am using a TFDConnection and one TFDQuery (in cached updates mode) in which I join the two tables. Its SQL is:
"SELECT T.id_test, T.dat_test, T.id_client, T.sample_typ, S.SampleName
FROM LABTEST T
left JOIN SAMPLEType S ON T.sample_typ = S.id"
In my application, I also use a DBGrid to show the result of the query,
and a button to edit the field 'sample_typ', like this:
qr.Edit;
qr.FieldByName('sample_typ').AsString := ce2.Text;
qr.Post;
Editing the 'sample_typ' field works fine, but the corresponding 'SampleName' field does not change in the grid after an update;
in fact, it is not refreshed!
The problem is here: if I refresh the query, an exception is raised: "Cannot refresh dataset. Cached updates must be committed or canceled
and batch mode terminated before refreshing."
If I commit the updates, the data will be sent to the database, and I don't want that; I need to keep the data in the cache until the end of the operation.
Also, if I leave cached updates mode, the data will be refreshed in the grid but will be sent to the database after qr.Post, and I don't want that either.
I need to refresh the data in the cache. What is the solution?
Thanks in advance.
The issue comes down to the fact that you haven't told your UI that there is any dependency between the two fields; it clearly can't redo the join itself without resubmitting the query, so if you don't want to send the updates and reload, you will have a problem.
It's not clear exactly what you are trying to do, but these two ideas may help you.
If you are not going to edit the fields in the SAMPLEType table (S), then load the values from that table into a lookup table. You can load this into a TFDMemTable, using an adapter which loads from a query. Your UI controls can then show the value based on the values looked up in your local TFDMemTable. Depending on the UI control, this might be a lookup field or some such.
You may also be able to store your main data in a TFDMemTable with an adapter; you can specify different TFDCommands to read the whole recordset, refresh a record, and update, insert, or delete a record. The TFDCommands can act on multiple tables for joined recordsets like this. That would automatically refresh the individual record for you when you post it.

Apache Solr: Merging documents from two sources before indexing

I need to index data from a custom application in Solr. The custom app stores metadata in an Oracle RDBMS and documents (PDF, MS Word, etc.) in a file store. The two are linked in the sense that the metadata in the database refers to a physical document (PDF) in the file store.
I am able to index the metadata from the RDBMS without issues. Now I would like to update the indexed documents with an additional field in which I can store the parsed content from the PDFs.
I have considered and tried the following:
1. Using the Update RequestHandler to try to update the indexed document with the parsed content. This didn't work, and the original document indexed from the RDBMS was overwritten.
2. Using SolrJ to do atomic updates, but I am not sure whether this is a good approach for something like this.
Has anyone come across this issue before and what would be the recommended approach?
You can update the document, but it requires that you know the id of the existing document. For example:
{
  "id": "5",
  "parsed_content": {"set": "long text field with parsed content"}
}
Instead of just saying "parsed_content":"something", you have to wrap the value in "parsed_content":{"set":"something"} to trigger adding it to the existing document.
See https://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22 for documentation on how to work with multivalued fields etc.
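For reference, the wiki page linked above describes the XML form of the same atomic update, which uses an update attribute on the field element; a minimal equivalent of the JSON example:

<add>
  <doc>
    <field name="id">5</field>
    <field name="parsed_content" update="set">long text field with parsed content</field>
  </doc>
</add>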

Not able to import EntityState Added

I'm having an issue with importing cached data. I have a very simple page right now to use as a proof of concept. There is a single table in the back end. I have all CRUD functionality working. When the user makes a change to the local data, I update a record in local storage:
var bundle = em.exportEntities(em.getChanges());
window.localStorage.setItem("waterLevelChanges", bundle);
After the page is loaded, I import the entities:
var bundle = window.localStorage.getItem("waterLevelChanges");
if (bundle)
    em.importEntities(bundle);
This works perfectly if I'm editing an existing record. However, any records that I have added but not saved to the database won't populate. In the bundle they have an EntityState of "Added". I read that there is an issue if you don't use the temp keys, but I'm letting Breeze use the temp keys and manage them as it likes. I have verified that the data is stored in the local cache by looking in the developer tools. I can also see it in the bundle when I debug.
Our tests show that your stated scenario should work fine.
See this DocCode:export/importTests.js test, where a new Order is exported, saved to browser storage, and reimported; its "Added" state is confirmed. The Order type has server-generated, temporary keys.
I think you'll have to create a repro of your failing scenario to convince me that it doesn't work. Perhaps you might start by forking this Todos plunker and modifying it to make your point; a TodoItem also has server-generated, temporary keys.

Resolving conflicts between automated updates and manual overrides

This is a bit of a complex, abstract question, so forgive me if it's not specific enough.
I've encountered a specific type of problem numerous times: on the one hand, a data source is used to update a certain data structure automatically at regular intervals; on the other hand, stakeholders want to be able to manually override the automated entries.
Example:
You have a list of products, which are kept up-to-date (title, description, etc.) by some automated script which uses external data sources (product databases, etc.).
Let's say that in your data source you have a toaster "Freshtoast XYZ 300"; if its name changes to "FreshToast! XYZ-300", you want to propagate that update into your own (differently structured) product model.
At the same time, if a co-worker doesn't like the name "Freshtoast XYZ 300" and manually changes it to "Toaster XYZ 300 by Freshtoast", you don't want to override that change automatically (he would get angry). But you also don't want to simply ignore the updated name: if the co-worker knew about the change, he'd adjust the name to "Toaster XYZ-300 by FreshToast!".
What's the best method to "consider" updated data sources - even for overridden data - while still allowing manual override?
PS: I'm using mostly Ruby / Rails, but I guess the question is very general. Also, to be clear, automated updates are the rule, while manual overrides are the exception in this scenario. So let's say 200,000 products get updated every single day, only 20 of which have manually overridden titles. So, for example, having to approve every single update is not an option.
Here goes nothing...
Hands-off approach (sketched in code below): add a string column to the products table that contains a serialized list of user-touched columns. Any time a user touches a column in the products table, put it in the serialized list. When the automatic updater hits that record, it checks the list for columns it should ignore.
Hand-wringing micro-manager approach: use a versioning library (e.g. the vestal_versions gem) and add a user_id column to the products table. Any time a user-touched record is automatically updated, send the user a notification and let them view a before/after diff which they can approve or reject.
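A minimal sketch of the hands-off approach (in Swift for illustration; the question is Rails, but the pattern is language-agnostic, and the Product type with its overriddenFields column is hypothetical):

// Hypothetical product record; in Rails this would be an ActiveRecord model
// with a serialized column listing user-touched fields.
struct Product {
    var title: String
    var description: String
    var overriddenFields: Set<String> = []  // names of manually overridden fields
}

// Apply an automated update, skipping any field a human has touched.
func applyAutomatedUpdate(to product: inout Product, title: String, description: String) {
    if !product.overriddenFields.contains("title") {
        product.title = title
    }
    if !product.overriddenFields.contains("description") {
        product.description = description
    }
}

// When a user edits a field manually, record the override so the
// automatic updater leaves it alone from then on.
func applyManualEdit(to product: inout Product, title: String) {
    product.title = title
    product.overriddenFields.insert("title")
}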

Parsing a CSV for Database Insertion when Formatted Incorrectly

I recently wrote a mailing platform for one of our employees to use. The system runs great, scales great, and is fun to use. However, it is currently inoperable due to a bug that I can't figure out how to fix (I'm a fairly inexperienced developer).
The process goes something like this...
Upload a CSV file to a specific FTP directory.
Go to the import_mailing_list page.
Choose a CSV file within the FTP directory.
Name and describe what the list contains.
Associate file headings with database columns.
Then, the back-end loops over each line of the file, associating the values with a heading, and importing these values into a database.
This all works wonderfully, except in one specific case: when a raw CSV is not correctly formatted. For example...
fname, lname, email
Bob, Schlumberger, bob@bob.com
Bobbette, Schlumberger
Another, Record, goeshere@email.com
As you can see, there is a missing comma (and value) on the second data row. This causes an error when attempting to pull valArray[3] (or valArray[2], in the case of every language but mine).
I am looking for the most efficient solution to keep this error from happening. Perhaps I should check the array length and compare it to the index we're about to pull before pulling it. But doing this for each and every value seems inefficient. Does anybody have another idea?
Our stack is ColdFusion 8/9 and MySQL 5.1. This is why I refer to the array index as [3].
There's ArrayIsDefined(array, elementIndex), or ArrayLen(array).
"Seems inefficient"?
You gotta code what you need to code; forget about inefficiency. Get it right before you get it fast (when needed).
I suppose if you are looking for another way of doing this (instead of checking the array length each time, although that really doesn't sound that bad to me), you could wrap each line's insert attempt in a try/catch block. If it fails, stuff the failed row into a buffer (including the line number and error message) that you can display to the user after the batch has completed, so they can see each failed line and why it failed. This has two advantages: 1) you don't have to explicitly check the array length each time, and 2) you catch other errors that you might not have anticipated beforehand (maybe a value is too long for your field, for example). A sketch of this pattern follows.
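A minimal sketch of the try/catch-and-buffer pattern (in Swift for illustration, since the original stack is ColdFusion; insertRow is a hypothetical stand-in for the real database insert):

import Foundation

struct RowError: Error { let message: String }

// Stand-in for the real database insert; throws if the row is malformed.
func insertRow(_ values: [String], expectedColumns: Int) throws {
    guard values.count == expectedColumns else {
        throw RowError(message: "expected \(expectedColumns) values, got \(values.count)")
    }
    // ... perform the actual INSERT here ...
}

let csv = """
fname, lname, email
Bob, Schlumberger, bob@bob.com
Bobbette, Schlumberger
Another, Record, goeshere@email.com
"""

var failures: [(line: Int, message: String)] = []
let lines = csv.split(separator: "\n").map(String.init)
let expectedColumns = lines[0].split(separator: ",").count

for (index, line) in lines.dropFirst().enumerated() {
    let values = line.split(separator: ",").map { $0.trimmingCharacters(in: .whitespaces) }
    do {
        try insertRow(values, expectedColumns: expectedColumns)
    } catch {
        // Buffer the failure instead of aborting the whole batch.
        failures.append((line: index + 2, message: "\(error)"))
    }
}

// After the batch completes, report every failed line to the user.
for failure in failures {
    print("Line \(failure.line) failed: \(failure.message)")
}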
