Delete multiple documents in CouchDB - iOS

I've a "best practice" question on CouchDB (actually I'm using TouchDB a CouchDB port to iOS), when using CouchCocoa framework.
I need to delete a bunch of documents that I get via a query.
I know 3 ways to do this:
1) put all the documents into an NSArray, then use [CouchDatabase deleteDocuments:]
2) for each query row, call the delete method, like:
for (CouchQueryRow* row in query.rows)
[row.document DELETE];
3) create a query that emits the _id and _rev properties, add the _deleted property, then use the bulk update, like:
[couchDatabase putChanges:]
Which one is better performance-wise? Is there a better way to do it?

At the HTTP API level, the fastest way to achieve this is to run a single batch request that provides the _id and current _rev of all documents to be removed.
Your job is to make sure that CouchCocoa actually does this. CouchCocoa will try to cache the _rev of documents it reads, so if you are deleting documents that have just been read, [CouchDatabase deleteDocuments:] should be enough; otherwise you will have to call [CouchDatabase getDocumentsWithIDs:] first.
If your documents are very large, it may be better to get the _rev values through a view instead of a bulk fetch. That forces you to use [CouchDatabase putChanges:] to perform the bulk deletion. I don't know where the document-size threshold lies, so you will have to benchmark this one.
Of course, you also need to decide what should happen when a conflict occurs.
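For illustration, a minimal Objective-C sketch of the first route, assuming database is your CouchDatabase*, query is the query from the question, the rows have just been read (so the _rev values are already cached), and deleteDocuments: returns a RESTOperation you can wait on, as CouchCocoa's other write calls do:
NSMutableArray* docsToDelete = [NSMutableArray array];
for (CouchQueryRow* row in query.rows) {
    [docsToDelete addObject: row.document];    // _rev is already cached from the query read
}
RESTOperation* op = [database deleteDocuments: docsToDelete];    // one bulk request
[op wait];    // or observe completion asynchronously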

Related

How to create large CSV file and send it to front end

I'm building a project where the front end is React and the back end is Ruby on Rails with a Postgres DB. A required piece of functionality is letting users export a large dataset: they get a table view, click "export", and that sends a request to the backend, which should create a CSV file and send it to the front end.
This is the query that displays the data in the table and how it's executed (using find_by_sql):
query = <<-SQL
SELECT * FROM ORDERS WHERE ORDERS.STORE_ID = ? OFFSET ? LIMIT ?
SQL
query_result = Order.find_by_sql([query, store_id.to_i, offset.to_i, 50])
Now whenever the user clicks export, the front end makes a request to the same endpoint, except it sets a flag telling the backend that it wants a CSV file, and the limit will be much greater than 50... it could be hundreds of thousands to millions of records.
What is the best way to create a CSV to send to the front end, taking into account that the number of records will be large?
You have a couple of options:
Create a temporary file, use the standard CSV library to populate it, and then use send_file to dispatch that file to the user (a rough sketch follows below).
Depending on the size of the data and/or your server's ability to host large temporary files, spooling to a tempfile might take too long or be otherwise impractical. In that case, you might want to stream the CSV data as it's generated, which is more complicated to set up but lessens the impact on your server.
This article has some well-thought-out steps for setting up an interface for streaming data. As a bonus, it also delegates generating the CSV to PostgreSQL itself. That gives you the best possible performance, but at the expense of code readability. It should set you on your way, though.
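A rough sketch of the first option; the controller action and column names are made up for illustration, while Tempfile, the CSV library, find_each, and send_file are standard:
# hypothetical controller action
require 'csv'
require 'tempfile'

def export
  orders = Order.where(store_id: params[:store_id])
  file = Tempfile.new(['orders', '.csv'])
  CSV.open(file.path, 'w') do |csv|
    csv << %w[id store_id total created_at]           # header row (illustrative columns)
    orders.find_each(batch_size: 1000) do |order|     # reads rows in batches, never all at once
      csv << [order.id, order.store_id, order.total, order.created_at]
    end
  end
  send_file file.path, filename: 'orders.csv', type: 'text/csv'
end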

Getting more than 1000 documents using folder() in Appian

I am writing an Appian web API, to retrieve documents from our Appian system which will be used to integrate with our other systems.
To this end, I am using the folder() method to get information about the contents of a folder in Appian.
folder(
theCaseFolder,
"documentChildren"
)
The problem I am having is that, while this code works most of the time, we have some cases where more than 1000 documents are stored against the case. I note that the Appian documentation states that:
The documentChildren and folderChildren properties return up to the first 1000 documents or folders, respectively, that are direct children of the selected folder.
My problem is that we have a few cases with more than 3000 documents attached. Is there a way to get a list of those child documents, or am I plain out of luck?
In the long term I would suggest storing some information about each document in a separate table in the database. That way you can query the table however you wish, from Appian or directly in SQL (see the sketch below).
In the short term you can get the first 1000 documents, as described in the documentation, and then move them to a subfolder or a different folder, or delete them. Repeat this until you have worked through every file in the folder.
Move Document Appian Function
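A sketch of that long-term suggestion, with made-up table and column names, so a case's full document list can be pulled without the 1000-item cap:
-- hypothetical metadata table, populated whenever a document is attached to a case
CREATE TABLE case_document (
  id            INT PRIMARY KEY,
  case_id       INT NOT NULL,
  appian_doc_id INT NOT NULL,
  doc_name      VARCHAR(255),
  created_on    TIMESTAMP
);

-- all documents for one case, no 1000-row limit
SELECT appian_doc_id, doc_name FROM case_document WHERE case_id = 123;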

Delete rows in CloudBoost using conditions

Is there a quick way to delete rows in a CloudBoost database without sending an ID as parameter?
For example, imagine that I have a list of Dogs and would like to delete those whose color is white.
Looking at the documentation, I could create a CloudQuery to retrieve all Dogs that match this condition and then call CloudObject.deleteAll to remove them. The problem with this solution is that I have to retrieve all the data just to be able to remove it.
Is there any straightforward solution for this problem to avoid making unnecessary requests to the server?
Currently, CloudBoost has no option like the one you are looking for. To delete objects you first have to fetch the CloudObjects and then call deleteAll(). You can contact the CloudBoost team and request this feature; I am sure they will help you out.
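For illustration, a minimal JavaScript sketch of that fetch-then-delete flow; the equalTo filter and the success/error callback style are assumptions about the CloudBoost SDK, so check it against the docs:
var query = new CB.CloudQuery('Dog');
query.equalTo('color', 'white');                 // only white dogs
query.find({
  success: function(dogs) {
    // dogs had to be fetched first; now delete them in one call
    CB.CloudObject.deleteAll(dogs, {
      success: function() { console.log('deleted'); },
      error: function(error) { console.log(error); }
    });
  },
  error: function(error) { console.log(error); }
});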

Differences in Umbraco cache structure?

OK, so I have just spent the last 6-8 weeks in the weeds of Umbraco and have made some fixes/improvements to our site and environments. I spent a lot of that time trying to correct lower-level Umbraco caching issues. Reflecting on that experience, I still don't have a clue what the conceptual differences are between the following:
Examine indexes
umbraco.config
cached xml file in memory (supposedly similar to umbraco.config)
CMSContentXML Table
Thanks Again,
Devin
Examine indexes are indexes of Umbraco content.
Whenever you create/update/delete content, the current content information is indexed.
These indexes are used for searching; under the hood they are Lucene indexes.
The Umbraco back office uses these indexes for searching.
You can create your own index if you want.
For more info, check out Overview & Explanation - "Examining Examine" by Peter Gregory.
umbraco.config and the cached XML in memory are really the same thing.
The front-end UmbracoHelper API gets content from the cache, not the database; the cache is loaded from umbraco.config.
CMSContentXML contains each content item's information as XML,
so essentially this XML represents all the information of a content node.
So in a nutshell they represent three things:
Examine is used for searching.
umbraco.config holds cached data and saves round trips to the DB.
CMSContentXML stores the full information of a content item.
Edit: to include a better clarification from Robert Foster's comment, plus the UmbracoHelper vs ExamineManager comparison.
For umbraco.config and the CMSContentXML table, #robert-foster commented:
umbraco.config stores the most recent version of all published content only; the in-memory cache is a cached version of this file; and the cmscontentxml table stores a representation of all content and is used primarily for preview mode - it is updated every time a content item is saved. IIRC it also stores a representation of other content types
Regarding UmbracoHelper vs ExamineManager:
The UmbracoHelper API mostly gets its content from the in-memory cache. IMO it works best when locating specific content: when you know the id of the content you want, you just call Umbraco.TypedContent(id).
But where do you get that id in the first place? Put another way, if you want to find all content whose Title property contains the word "Test", you would use Examine to search for it. Because Examine is really a Lucene wrapper, it is going to be fast and efficient.
You can also traverse the tree with methods such as Umbraco.TypedContent(id).Children and then use LINQ to filter the result, but I think this is done in memory using LINQ to Objects, so it is not as efficient and performant as Lucene.
So personally I think:
use Examine when you are searching for (locating) content, because you get the capability of a proper search engine, Lucene
once you have the ids from the search result, use UmbracoHelper to get the full published-content representation of each id as a strongly typed model and work with the data (see the sketch below)
one thing #robert-foster mentioned in the comments, which I did not know, is that UmbracoHelper provides a Search method which is a wrapper around Examine, so use that if you are more familiar with that API
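A hypothetical Umbraco 7-era snippet illustrating that two-step flow; the property alias and id are made up, and Umbraco.TypedSearch is the UmbracoHelper wrapper around Examine mentioned above:
// search first (Lucene via Examine), then work with strongly typed content
var results = Umbraco.TypedSearch("Test");
foreach (var item in results)
{
    var title = item.GetPropertyValue<string>("title");   // hypothetical property alias
    // ... use the published content ...
}

// direct lookup once an id is already known
var node = Umbraco.TypedContent(1234);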
Lastly, if any of the above statements are wrong or not quite correct, please help me clarify so that anyone looking at this later does not get it wrong. Thanks all.

Updating core data performance

I'm creating an app that uses Core Data to store information from a web server. When there's an internet connection, the app will check whether any entries have changed and update them. Now I'm wondering which is the best way to go about it. Each entry in my database has a last-updated timestamp. Which of these two approaches will be more efficient:
Go through all entries and check the timestamp to see which entry needs to be updated.
Delete the whole entity and re-download everything again.
Sorry if this seems like an obvious question and thanks!
I'd say option 1 would be the most efficient, as there is rarely a case where downloading everything (especially in a large database with large amounts of data) is more efficient than downloading only the parts that you need.
I recently did something similar.
I solved the problem by assigning a unique ID and a global 'updated' timestamp, and thinking in terms of 'delta' changes.
To explain better: I have a global 'latest update' value stored in the user preferences, with a default value of 01/01/2010.
This is roughly my JSON service:
response: {
metadata: {latestUpdate: 2013...etc}
entities: {....}
}
Then, this is what's going on:
pass the 'latest update' to the web service and retrieve a list of entities
update the Core Data store
if everything went fine with Core Data, the 'latestUpdate' from the service metadata becomes my new 'latest update' value stored in user preferences
That's it. I only retrieve the needed changes, and of course the web service is structured to deliver a proper list. In other words: a web service backed by a database can deal with this quite well and leaves the iPhone as a 'simple client' only.
That said, for small amounts of data it is still quite performant (and less bug-prone) to simply download the whole list on each request.
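A compressed Objective-C sketch of that delta flow; context and responseLatestUpdate are placeholders for your NSManagedObjectContext and the parsed metadata value, and the preferences key name is made up:
NSString* latestUpdate = [[NSUserDefaults standardUserDefaults] stringForKey:@"latestUpdate"] ?: @"2010-01-01";
// ... call the web service with latestUpdate, parse response.entities into the context ...
NSError* error = nil;
if ([context save:&error]) {
    // only advance the marker once the delta is safely in Core Data
    [[NSUserDefaults standardUserDefaults] setObject:responseLatestUpdate forKey:@"latestUpdate"];
} else {
    NSLog(@"Core Data save failed: %@", error);
}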
As per our discussion in the comments above, you can model your Core Data entries with version control like this:
CoreDataEntityPerson:
name : String
name_version : int
image : BinaryData
image_version : int
You can now model the server XML in the following way:
<person>
<name>michael</name>
<name_version>1</name_version>
<image>string_converted_imageData</image>
<image_version>1</image_version>
</person>
Now you can follow these steps:
When the response arrives and you parse it, you initially create a new object for the entity and fill in the data directly.
The next time you perform an update on the server, you increase the version count of the changed entry by 1 and store it.
E.g. let's say the name michael is changed to abraham; the name_version count on the server will then be 2.
This updated version count will come back in the response data.
Now, while storing the data into the same object, if the version count is the same, the update of that entry can be skipped; if the version count has changed, that entry needs to be updated.
This way you can efficiently check each entry and update only the entries that changed, as sketched below.
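A small Objective-C sketch of that per-field check, assuming person is the existing managed object, the *_version attributes are scalar integers, and the server* values are placeholders for what you parsed from the response:
// update only the fields whose server-side version moved forward
if (serverNameVersion > person.name_version) {
    person.name = serverName;
    person.name_version = serverNameVersion;
}
if (serverImageVersion > person.image_version) {
    person.image = serverImageData;
    person.image_version = serverImageVersion;
}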
Advice:
The above approach works best when you're dealing with a large amount of data to update.
For simple text entries on an object, simply overwriting the data on all entries is efficient enough, and it also keeps the response model simple.
