How do I get a continuation token for a bulk INSERT on Azure Cosmos DB?

I want to upload a CSV file that represents 10k documents to be added to my Cosmos DB collection in a manner that's fast and atomic. I have a stored procedure like the following pseudo-code:
function createDocsFromCSV(csv_text) {
    function parse(txt) { /* ... parsing code here ... */ }

    var collection = getContext().getCollection();
    var response = getContext().getResponse();

    var docs_to_create = parse(csv_text);
    for (var ii = 0; ii < docs_to_create.length; ii++) {
        var accepted = collection.createDocument(collection.getSelfLink(),
            docs_to_create[ii],
            function(err, doc_created) {
                if (err) throw new Error('Error: ' + err.message);
            });
        if (!accepted) {
            throw new Error('Timed out creating document ' + ii);
        }
    }
}
When I run it, the stored procedure creates about 1200 documents before timing out (and therefore rolling back and not creating any documents).
Previously I had success updating (instead of creating) thousands of documents in a stored procedure using continuation tokens and this answer as guidance: https://stackoverflow.com/a/34761098/277504. But after searching documentation (e.g. https://azure.github.io/azure-documentdb-js-server/Collection.html) I don't see a way to get continuation tokens from creating documents like I do for querying documents.
Is there a way to take advantage of stored procedures for bulk document creation?

It's important to note that stored procedures have bounded execution: all operations must complete within the server-specified request timeout. If an operation does not complete within that time limit, the transaction is automatically rolled back.
To simplify handling of this time limit, all CRUD (Create, Read, Update, and Delete) operations return a Boolean value indicating whether the operation will complete. That Boolean can be used as a signal to wrap up execution and to implement a continuation-based model for resuming execution (this is illustrated in the code sample below). For more details, please refer to the docs.
The bulk-insert stored procedure below implements the continuation model by returning the number of documents successfully created.
pseudo-code:
function createDocsFromCSV(csv_text, count) {
    function parse(txt) { /* ... parsing code here ... */ }

    var collection = getContext().getCollection();
    var response = getContext().getResponse();

    var docs_to_create = parse(csv_text);
    for (var ii = count; ii < docs_to_create.length; ii++) {
        var accepted = collection.createDocument(collection.getSelfLink(),
            docs_to_create[ii],
            function(err, doc_created) {
                if (err) throw new Error('Error: ' + err.message);
            });
        if (!accepted) {
            // Out of time: report how many documents have been created so
            // the client can resume from this index on the next invocation.
            response.setBody(ii);
            return;
        }
    }
    // All documents were queued successfully.
    response.setBody(docs_to_create.length);
}
Then, on the client side, check the count returned in the response body and re-run the stored procedure with that count as the parameter, repeating until the count equals the number of documents parsed from csv_text.
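For illustration, here is a minimal client-side sketch of that loop using the @azure/cosmos SDK; the endpoint, key, database/container ids, and the totalDocs bookkeeping are placeholders and assumptions, not part of the original answer, and the undefined partition key assumes a non-partitioned collection:

const { CosmosClient } = require("@azure/cosmos");

// Placeholder connection details.
const client = new CosmosClient({ endpoint: "https://ACCOUNT.documents.azure.com", key: "KEY" });
const container = client.database("db").container("docs");

async function uploadCsv(csvText, totalDocs) {
    let count = 0;
    while (count < totalDocs) {
        // Re-run the sproc from where the previous invocation stopped.
        const { resource } = await container.scripts
            .storedProcedure("createDocsFromCSV")
            .execute(undefined, [csvText, count]); // undefined = no partition key
        count = resource; // number created so far; equals totalDocs when finished
    }
}

Keep in mind that each invocation is its own transaction, so a re-run resumes after the documents created by earlier runs rather than rolling them back; the upload as a whole is no longer atomic across runs.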
Hope it helps.

Related

How to apply deduplication for an array returned in Zapier Code output

I have a Zapier Code block that fetches a JSON array and then preprocesses the data. I cannot use a Zapier Webhook with polling, because I need to process the data a bit.
Zapier Webhook offers a deduplication feature, by having an id parameter associated with the items returned in an array from the URL endpoint. How can I achieve the same for Zapier Code? Currently, my zap is trying to process and trigger on the same data twice. This leads to Zapier trying to send out the same tweet twice, every time the Code is triggered.
Here is mock data returned by my Code:
output = [{id: 1, name: "foo"}, {id: 2, name: "bar"}]
Currently, without deduplication, I am getting this email and having my zap disabled:
Your Zap named XXX was just stopped. This happened because our systems detected this Zap posted a duplicate tweet, which is against Twitter's Terms Of Service.
You can use Storage by Zapier to achieve this. The ideal flow would be:
Trigger
Storage by Zapier [Get Value (use storage key = lastItemId)]
Code by Zapier (filter the array, returning only those records with an id greater than lastItemId)
Storage by Zapier [Set Value: update lastItemId with the last item processed by the Code by Zapier step]
You can also use StoreClient in place of Storage by Zapier, but always update the existing key lastItemId, compare each record's id with lastItemId, and at the end update the StoreClient key (lastItemId), as in the sketch below.
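A minimal sketch of that StoreClient flow in a Code by Zapier (Node) step; the store secret and the input data are placeholders, and it assumes ids only ever increase:

// Keep only records newer than the last one processed (assumes growing ids).
const store = StoreClient('your-store-secret');

// In a real zap this would come from the previous step, e.g. inputData.
const items = [{id: 1, name: "foo"}, {id: 2, name: "bar"}];

const lastItemId = (await store.get('lastItemId')) || 0;
const fresh = items.filter(item => item.id > lastItemId);

// Remember the highest id handled so far.
if (fresh.length > 0) {
    await store.set('lastItemId', Math.max(...fresh.map(item => item.id)));
}

output = fresh;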
Based on the answer from Kishor Patidar, here is my code. I am adding the example code here, since it took me some time to figure it out. Specifically, in my case, the items cannot be processed in their order of appearance (there are no running-counter primary keys), and there are also limits on how far in the future Zapier can schedule actions (you can delay only up to one month).
The store also has a limitation of 500 keys.
// We need the store for deduplication
// https://zapier.com/help/create/code-webhooks/store-data-from-code-steps-with-storeclient
// https://www.uuidgenerator.net/
var store = StoreClient('xxx');

// Where we get our JSON data
const url = "https://xxx";

// Call the FB public backend to get scheduled battles
const resp = await fetch(url);
const data = await resp.json();

let processed = [];

for(let entry of data.entries) {
    console.log("Processing entry", entry.id);

    // Filter out events with a non-wanted prize type
    if(entry.prizes[0].type != "mytype") {
        continue;
    }

    // Zapier can only delay tweets for one month.
    // As the code fires every 30 minutes,
    // we are only interested in scheduling tweets that happen very soon.
    const when = Date.parse(entry.startDate);
    const now = Date.now();
    if(!when) {
        throw new Error("startDate missing");
    }
    if(when > now + 24 * 3600 * 1000) {
        // More than 24h away (in milliseconds) - skip for now
        console.log("Too far ahead to schedule", entry.id, entry.startDate, now);
        continue;
    } else {
        console.log("Starting to schedule", entry.id, entry.startDate, now);
    }

    const key = "myprefix_" + entry.id;

    // Do manual deduplication
    // https://stackoverflow.com/questions/64893057/how-to-apply-deduplication-for-an-array-returned-in-zapier-code-output
    const existing = await store.get(key);
    if(existing) {
        // Already processed
        console.log("Entry already processed", entry.id);
        continue;
    }

    // Calculate some parameters on the entry based on nested arrays and such
    entry.startTimeFormat = "HH:mm";

    // Generate URL for the tweet
    entry.signUpURL = `https://xxx/${entry.id}`;

    processed.push(entry);

    // Do not tweet this entry twice:
    // set a marker flag for it in the store
    await store.set(key, "deduplicated");
}

output = processed;

Firestore batch.commit adding more than 500 documents at a time

I am new to Firestore, trying to figure out a fast way to add some documents in Firestore using Dart.
I used the code below. I had about 3000 strings in a list, and it ended up adding all 3000 documents, but it took a long time, about 10 minutes, and I also got an error message after batch.commit saying that the 500 limit was exceeded, even though it added all 3000 documents.
I know I should break it down into 500 at a time, but the fact that it added all the documents in the list does not make sense to me. I checked in Firestore Console, and all the 3000 documents were there.
I need to create a document ID every time I add a document. What am I doing wrong? Is it OK to use add() to get a document reference, and then batch.setData()?
Future<void> addBulk(List<String> stringList) async {
  WriteBatch batch = Firestore.instance.batch();
  for (String str in stringList) {
    // Check if str already exists; if it does not, add it to Firestore
    if (str.isNotEmpty) {
      QuerySnapshot querySnapshot = await myCollection
          .where('name', isEqualTo: str)
          .getDocuments(source: Source.cache);
      if (querySnapshot.documents.length == 0) {
        UserObject obj = UserObject(name: str);
        DocumentReference ref = await myCollection.add(obj.toMap());
        batch.setData(ref, obj.toMap(), merge: true);
      }
    }
  }
  batch.commit();
}
Your code is actually just adding each document separately, regardless of what you're doing with the batch. This line of code does it:
DocumentReference ref = await myCollection.add(obj.toMap());
If you remove that line (which, again, is not touching your batch), I'm sure you will just see a failure due to the batch size.
If you are just trying to generate a unique ID for the document to be added in the batch, use document() with no parameters instead:
DocumentReference ref = myCollection.document();
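If it helps, here is an untested Dart sketch that combines both fixes: IDs generated locally with document(), and commits chunked to stay under the 500-write limit. UserObject and myCollection are carried over from the question, and note that each chunk commits as its own atomic batch, so the overall upload is not atomic:

Future<void> addBulkChunked(List<String> stringList) async {
  const int maxWritesPerBatch = 500; // Firestore's hard limit per batch
  WriteBatch batch = Firestore.instance.batch();
  int pending = 0;
  for (String str in stringList) {
    if (str.isEmpty) continue;
    // document() generates a unique ID locally; no network round-trip needed.
    DocumentReference ref = myCollection.document();
    batch.setData(ref, UserObject(name: str).toMap(), merge: true);
    pending++;
    if (pending == maxWritesPerBatch) {
      await batch.commit(); // flush this chunk
      batch = Firestore.instance.batch();
      pending = 0;
    }
  }
  if (pending > 0) {
    await batch.commit(); // flush the final partial chunk
  }
}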

HTTP POST to Google Forms or Alternative

I have a Google Form set up that emails me upon a manual submission when somebody fills it out (new lead) and transfers the information to a Google spreadsheet. Easy enough to figure that out.
However, now I'm trying to figure out how to send the same information contained within a URL string and automatically POST that information to the form. Or find a company that offers that ability, via an API or other means. So far I've tested out JotForm and a few others. The information passed along fine, but it doesn't auto-populate the fields. I assume it's because it doesn't know that x=y due to the fields being named differently. I've found a ton of documentation about pre-populating forms, but not much about filling out a form every time a new POST URL is generated.
The URL looks like the following:
https://script.google.com/macros/s/XXXXXXXXXXXXXXXX/exec?/--A--first_name--B--/--A--last_name--B--/--A--address1--B--/--A--city--B--/--A--state--B--/--A--postal_code--B--/--A--phone_number--B--/--A--date_of_birth--B--/--A--email--B--
The information passed is as follows:
https://websitehere.com/Pamela/Urne/123+Test+Street/Henderson/TX/75652/281XXXXXX/1974-01-01/test0101cw@test.com
The script I'm testing out
// original from: http://mashe.hawksey.info/2014/07/google-sheets-as-a-database-insert-with-apps-script-using-postget-methods-with-ajax-example/
// original gist: https://gist.github.com/willpatera/ee41ae374d3c9839c2d6
function doGet(e){
  return handleResponse(e);
}

// Enter sheet name where data is to be written below
var SHEET_NAME = "Sheet1";

var SCRIPT_PROP = PropertiesService.getScriptProperties(); // new property service

function handleResponse(e) {
  // shortly after my original solution Google announced the LockService[1]
  // this prevents concurrent access overwriting data
  // [1] http://googleappsdeveloper.blogspot.co.uk/2011/10/concurrency-and-google-apps-script.html
  // we want a public lock, one that locks for all invocations
  var lock = LockService.getPublicLock();
  lock.waitLock(30000); // wait 30 seconds before conceding defeat.

  try {
    // next set where we write the data - you could write to multiple/alternate destinations
    var doc = SpreadsheetApp.openById(SCRIPT_PROP.getProperty("key"));
    var sheet = doc.getSheetByName(SHEET_NAME);

    // we'll assume header is in row 1 but you can override with header_row in GET/POST data
    var headRow = e.parameter.header_row || 1;
    var headers = sheet.getRange(1, 1, 1, sheet.getLastColumn()).getValues()[0];
    var nextRow = sheet.getLastRow() + 1; // get next row
    var row = [];
    // loop through the header columns
    for (i in headers){
      if (headers[i] == "Timestamp"){ // special case if you include a 'Timestamp' column
        row.push(new Date());
      } else { // else use header name to get data
        row.push(e.parameter[headers[i]]);
      }
    }
    // more efficient to set values as [][] array than individually
    sheet.getRange(nextRow, 1, 1, row.length).setValues([row]);
    // return json success results
    return ContentService
      .createTextOutput(JSON.stringify({"result":"success", "row": nextRow}))
      .setMimeType(ContentService.MimeType.JSON);
  } catch(e){
    // if error return this
    return ContentService
      .createTextOutput(JSON.stringify({"result":"error", "error": e}))
      .setMimeType(ContentService.MimeType.JSON);
  } finally { // release lock
    lock.releaseLock();
  }
}

function setup() {
  var doc = SpreadsheetApp.getActiveSpreadsheet();
  SCRIPT_PROP.setProperty("key", doc.getId());
}
I get a success message after accessing the URL, but all the information listed in the spreadsheet is "Undefined."
That's as far as I've gotten. If somebody knows an easier solution or can point me in the right direction, I'd appreciate it.
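One observation, offered as a guess from the script above: handleResponse() reads each value with e.parameter[headers[i]], so data has to arrive as named query parameters (or POST fields) whose names exactly match the sheet's header row. The slash-delimited string after ? in the example URL carries no parameter names, so every lookup comes back undefined, which would explain the "Undefined" cells. A sketch of a matching request, assuming the header row contains first_name, last_name, and so on (values taken from the question):

// Hypothetical client-side call; parameter names must match the header row.
const params = new URLSearchParams({
  first_name: "Pamela",
  last_name: "Urne",
  address1: "123 Test Street",
  city: "Henderson",
  state: "TX",
  postal_code: "75652",
  phone_number: "281XXXXXX",
  date_of_birth: "1974-01-01",
  email: "test0101cw@test.com",
});

fetch("https://script.google.com/macros/s/XXXXXXXXXXXXXXXX/exec?" + params)
  .then(resp => resp.json())
  .then(json => console.log(json)); // expect {"result":"success","row":...}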

Unable to figure out how to use post method, for a suitescript written in Netsuite

I am trying to use the POST method for a simple SuiteScript program; I am very new to this.
In NetSuite I have written a SuiteScript as follows.
function restPost()
{
    var i = nlapiLoadRecord('department', 115);
    var memo = nlapisetfieldvalue('custrecord225', ' '); // this is a custom field; I want to populate the memo field using the REST client in Firefox
    var recordId = nlapiSubmitRecord(i);
}
I have created a script record, uploaded this SuiteScript, and copied the external URL to paste into RESTClient.
In RESTClient (a Firefox plugin), I pasted the external URL, set the method to POST, provided the Authorization header, set Content-Type: application/json, and in the body I put {"memo":"mynamehere"}.
The error I get is:
"message": "missing ) after argument list"
I even tried writing other SuiteScript programs; the errors I get are as follows:
Unexpected token in object literal (null$lib#3) Empty JSON string
Invalid data format. You should return TEXT.
I am kinda new to the programming world, so any help would be really good.
I think you are trying to create a RESTlet for the POST method. The following is sample code for a POST handler:
function createRecord(datain)
{
    var err = new Object();
    // Validate whether the mandatory record type is set in the request
    if (!datain.recordtype)
    {
        err.status = "failed";
        err.message = "missing recordtype";
        return err;
    }
    var record = nlapiCreateRecord(datain.recordtype);
    for (var fieldname in datain)
    {
        if (datain.hasOwnProperty(fieldname))
        {
            if (fieldname != 'recordtype' && fieldname != 'id')
            {
                var value = datain[fieldname];
                if (value && typeof value != 'object') // ignore other types of parameters
                {
                    record.setFieldValue(fieldname, value);
                }
            }
        }
    }
    var recordId = nlapiSubmitRecord(record);
    nlapiLogExecution('DEBUG', 'id=' + recordId);
    var nlobj = nlapiLoadRecord(datain.recordtype, recordId);
    return nlobj;
}
So after deploying this RESTlet you can call the POST method by passing the following sample JSON payload:
{"recordtype":"customer","entityid":"John Doe","companyname":"ABCTools Inc","subsidiary":"1","email":"jdoe@email.com"}
For authorization, you have to pass request headers as follows:
var headers = {
    "Authorization": "NLAuth nlauth_account=" + cred.account + ", nlauth_email=" + cred.email +
        ", nlauth_signature=" + cred.password + ", nlauth_role=" + cred.role,
    "Content-Type": "application/json"
};
I can understand your requirement, and the answer posted by Parsun & NetSuite-Expert is good. You can follow that code. It is generic code that can accept any master record without child records, for example Customer without Contact or Addressbook.
I would like to suggest a small change to the code, which I have given in my solution below.
Changes below:
var isExistRec = isExistingRecord(objDataIn);
var record = (isExistRec) ? nlapiLoadRecord(objDataIn.recordtype, objDataIn.internalid, {
    recordmode: 'dynamic'
}) : nlapiCreateRecord(objDataIn.recordtype);

// Check whether the record already exists in NetSuite, using a custom function
function isExistingRecord(objDataIn) {
    if (objDataIn.internalid != null && objDataIn.internalid != '' && objDataIn.internalid.trim().length > 0)
        return true;
    else
        return false;
}
So whenever you pass JSON data to the RESTlet, keep in mind that you have to pass the internalid and recordtype as mandatory values.
Thanks
Frederick
I believe you will want to return something from your function. An empty object should do fine, or something like {success : true}.
Welcome to NetSuite SuiteScripting, @Vin :)
I strongly recommend going through the SuiteScript API Overview & SuiteScript API - Alphabetized Index in the NS Help Center, which is the most obvious place to learn the basics of SuiteScripting.
nlapiLoadRecord(type, id, initializeValues)
Loads an existing record from the system and returns an nlobjRecord object containing all the field data for that record. You can then extract the desired information from the loaded record using the methods available on the returned record object. This API is a core API. It is available in both client and server contexts.
function restPost(dataIn) {
    var record = nlapiLoadRecord('department', 115); // returns nlobjRecord
    record.setFieldValue('custrecord225', dataIn.memo); // set the value in the custom field
    var recordId = nlapiSubmitRecord(record);
    return recordId;
}
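To tie it back to the original question, here is a hedged sketch of calling that RESTlet from outside NetSuite; the script/deploy ids, account, and credentials are placeholders, and the NLAuth header format follows the earlier answer:

// Hypothetical external call to the restPost RESTlet above.
var xhr = new XMLHttpRequest();
xhr.open("POST", "https://rest.netsuite.com/app/site/hosting/restlet.nl?script=XXX&deploy=1");
xhr.setRequestHeader("Authorization",
    "NLAuth nlauth_account=ACCOUNT, nlauth_email=you@example.com, " +
    "nlauth_signature=PASSWORD, nlauth_role=3");
xhr.setRequestHeader("Content-Type", "application/json");
xhr.onload = function() {
    console.log(xhr.responseText); // the record id returned by restPost
};
xhr.send(JSON.stringify({ memo: "mynamehere" }));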

Reading / writing file on local machine

I pretty much copied this code right out of the MDN File I/O page, except I added an if statement to check whether the file already exists, and if it does, read it instead.
Components.utils.import("resource://gre/modules/NetUtil.jsm");
Components.utils.import("resource://gre/modules/FileUtils.jsm");

var file = Components.classes["@mozilla.org/file/directory_service;1"].
    getService(Components.interfaces.nsIProperties).
    get("Desk", Components.interfaces.nsIFile);
file.append("test.txt");

if (!file.exists()) {
    this.user_id = Math.floor(Math.random()*10001) +'-'+ Math.floor(Math.random()*10001) +'-'+ Math.floor(Math.random()*10001) +'-'+ Math.floor(Math.random()*10001);

    var ostream = FileUtils.openSafeFileOutputStream(file);
    var converter = Components.classes["@mozilla.org/intl/scriptableunicodeconverter"].
        createInstance(Components.interfaces.nsIScriptableUnicodeConverter);
    converter.charset = "UTF-8";
    var istream = converter.convertToInputStream(this.user_id);

    // The last argument (the callback) is optional.
    NetUtil.asyncCopy(istream, ostream, function(status) {
        if (!Components.isSuccessCode(status)) {
            alert('Error ' + status);
            return;
        }
        alert('File created');
    });
} else {
    NetUtil.asyncFetch(file, function(inputStream, status) {
        if (!Components.isSuccessCode(status)) {
            alert('error ' + status);
            return;
        }
        // The file data is contained within inputStream.
        // You can read it into a string with
        this.user_id = NetUtil.readInputStreamToString(inputStream, inputStream.available());
    });
    alert('File exists already, do not create');
}
alert(this.user_id);
It creates the file just fine, and I can open it and read it. If the file already exists, however, it does not populate this.user_id; it just equals null. So my issue is specifically with reading the file.
File reading in your code works asynchronously, meaning that your code runs to completion (including the alert() call, which is why it shows that this.user_id is null), and only at some later point does the callback from NetUtil.asyncFetch() get called with the data. Until that happens, this.user_id won't be set. If you move alert(this.user_id) into the callback function, it should show the correct value.
Note that it is highly recommended to keep file I/O operations asynchronous because they might take significant time depending on the current state of the file system. But you have to structure your code in such a way that it doesn't assume that file operations happen immediately.
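As a sketch of that restructuring, based on the question's own code, with nothing outside the callback touching the result:

NetUtil.asyncFetch(file, function(inputStream, status) {
    if (!Components.isSuccessCode(status)) {
        alert('error ' + status);
        return;
    }
    // The data only exists once this callback runs.
    var user_id = NetUtil.readInputStreamToString(inputStream, inputStream.available());
    alert(user_id); // moved inside the callback, so it shows the real value
});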
