Using the Data Movement SDK to remove a collection from documents in real time in MarkLogic

I am new to the Data Movement SDK. How can I use it to remove a collection from documents that match a specific condition, in real time, in MarkLogic?

Yes, DMSDK can reprocess documents in the database, including modifying the collections on the documents.
The most efficient way to change document collections on the server might be to take an approach similar to the out-of-the-box ApplyTransformListener (as summarized at https://docs.marklogic.com/guide/java/data-movement#id_51555) but to execute a custom module instead of a transform.
Summarizing the main points:
Write an SJS (Server-Side JavaScript) module that declares a variable (using the JavaScript var statement) to receive the document URIs sent by the client and modifies the collections on those documents using a function such as https://docs.marklogic.com/xdmp.documentSetCollections
Install the SJS module in the modules database as described here: https://docs.marklogic.com/guide/java/resourceservices#id_13008
Create a QueryBatcher to get the document URIs either from a query on the database or from a client iterator as described here: https://docs.marklogic.com/guide/java/data-movement#id_46947
Supply a lambda function for the QueryBatcher.onUrisReady() method, see: https://docs.marklogic.com/javadoc/client/com/marklogic/client/datamovement/QueryBatcher.html#onUrisReady-com.marklogic.client.datamovement.QueryBatchListener-
In the lambda function, construct and execute a ServerEvaluationCall to the SJS module, assigning the variable to the URIs passed to the lambda function, see: https://docs.marklogic.com/guide/java/resourceservices#id_84134
Be sure to register failure listeners using the QueryBatcher.onQueryFailure() and ApplyTransformListener.onFailure() methods to log errors or otherwise respond to the unexpected.
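As a rough illustration of the first step, the server-side module might look something like the sketch below. It calls xdmp.documentRemoveCollections rather than the xdmp.documentSetCollections function linked above, since the goal here is to drop one collection and leave the others alone. The variable names uris and collection, the comma-separated URI format, and the removeCollection helper are all assumptions made for this sketch, not something DMSDK prescribes. The typeof guard is only there so the file can also be loaded outside MarkLogic.

```javascript
'use strict';

// External variables assigned by the client's ServerEvaluationCall.
// Assumption: uris is a comma-separated string of document URIs and
// collection is the name of the collection to remove.
var uris;
var collection;

// Hypothetical helper: calls removeFn for every URI in the list and
// returns how many URIs were processed.
function removeCollection(uriList, collectionName, removeFn) {
  const docUris = String(uriList || '')
    .split(',')
    .map((uri) => uri.trim())
    .filter((uri) => uri.length > 0);
  for (const uri of docUris) {
    removeFn(uri, [collectionName]);
  }
  return docUris.length;
}

// Inside MarkLogic: declare the request as an update, then do the work.
if (typeof declareUpdate === 'function' && typeof xdmp !== 'undefined') {
  declareUpdate();
  removeCollection(uris, collection, (uri, names) =>
    xdmp.documentRemoveCollections(uri, names));
}
```

On the Java side, the lambda passed to onUrisReady() would then join the URIs of the batch into one string and assign it to uris with addVariable() on the ServerEvaluationCall. If document URIs can contain commas, a different separator would be needed.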
Hoping that helps,

Related

Integromat Scenario: One-shot module after iterating through a loop

I have created a scenario where I iterate through multiple modules with an array of data. This works fine.
After this completes, I want to run a module once before the scenario completes.
How do I add a module that won't get called in the loop?
There are a few ways to achieve this:
Use a Router to create a new route that will be triggered after the first route is complete.
Trigger a new scenario via Webhooks after you are done with the current scenario.
If you are working with an array, use the Array Aggregator or another aggregator, which lets you complete the iteration first and then trigger the module you want to use.
I am not sure exactly what you want to do after the iteration is complete, but setting the scenarios as displayed in the screenshot below should help you get started on this,
Using Router
For this you can create a router. The upper route of the router is always executed first, so the iterator and other operations go there. The next route is executed afterwards, and that is where you put the module you want to trigger last.
However, if you want to pass some values from the first route to the last one, you will need to set a variable and fetch it on the second route. See details here: https://www.integromat.com/en/help/converger
Using Aggregator Module
You can use the Array, Text or Numeric Aggregator to aggregate all the iteration operations and then trigger the module that you want to run last.
As far as I know, there are no default Integromat modules that can be configured to run just before the scenario ends. The Integromat API, which is currently in development, may make this possible in the future.
I found a filter to be the easiest way of doing this: essentially checking whether this bundle's position is equal to the total number of bundles.
If you're interested in doing something on the last iteration only, you can use a filter to check whether the current bundle position is equal to the total number of bundles.
last bundle filter
They won't let me paste pics sigh

Storing dask collection to files/CSV asynchronously

I'm implementing various kinds of data processing pipelines using dask.distributed. Usually the original data is read from S3 and in the end processed (large) collection would be written to CSV on S3 as well.
I can run the processing asynchronously and monitor progress, but I've noticed that all to_xxx() methods that store collections to file(s) seem to be synchronous calls. One downside is that the call blocks, potentially for a very long time. Another is that I cannot easily construct a complete graph to be executed later.
Is there a way to run e.g. to_csv() asynchronously and get a future object instead of blocking?
PS: I'm pretty sure that I can implement async storage myself, e.g. by converting the collection to delayed() and storing each partition. But it seems like a common case, so unless I missed an already existing feature, it would be nice to have something like this included in the framework.
Most to_* functions have a compute=True keyword argument that can be replaced with compute=False. In these cases the call returns a sequence of delayed values that you can then compute asynchronously:
values = df.to_csv('s3://...', compute=False)
futures = client.compute(values)

Split datetime value received from external API in Rails app

I have a datetime value which comes from the API in this format: 2015-07-07T17:30:00+00:00. I simply want to split it up between the date and time values at this point. I am not using an Active Record model and I prefer not to use an sql database if I can.
The way I have set up the app means that the value is "stored" like this in my view: #search.dining_date_and_time
I have tried two approaches to solving this problem:
Manually based on this previous stackoverflow question from 2012: Using multiple input fields for one attribute - but the error I get is the attribute is "nil" even though I put a "try"
Using this gem, https://github.com/ccallebs/split_date_time, which is a bit more recent and seems to be a more elegant solution. After closely following the doc, though, I get an error saying my Search model is not initialized and the method does not exist: undefined method `dining_date' for #<Search not initialized>
This happens when I instead put #search.dining_date in the view, which seems to be the equivalent of the doc's example (it's not that clear). The doc also says the method will be automatically generated.
Do I need to alter my model so I receive the data from the API in another way? ie. not get the variable back as #search.dining_date_and_time from the Search model for any of this to work?
Do I need an Active Record model so that before_filter or before_save logic works, so I can (re)concatenate after splitting and the data is sent back to the API in a format it understands? Can I avoid this? It seems a bit of overkill to restructure the whole app and put in a full database just so I can split and join date/time as needed.
Happy to provide further details, code snippets if required.
As I am not using a conventional Rails DB like MySql Lite or Postgresql, I found that the best solution to the problem was to use this jQuery date format plugin: https://github.com/phstc/jquery-dateFormat to split the date and time values for display when I get the data back from the API.
The GitHub docs were not too expansive, but once I simply put the library file in my Rails javascript assets folder, I just had to write a few lines of jQuery to get the result and format I wanted:
$(function() {
  var rawDateTime = $('#searchDiningDateTime').html();
  // console.log(rawDateTime);
  var cleanDate = $.format.date(rawDateTime, "ddd, dd/MM/yyyy");
  // console.log(cleanDate);
  $('#searchDiningDateTime').html(cleanDate);
  var cleanTime = $.format.date(rawDateTime, "HH:mm");
  // console.log(cleanTime);
  $('#searchTime').html(cleanTime);
});
Next challenge: rejoin the values on submit, so the API can read the data by sending/receiving a valid request/response. (The values can't be split like this when sent to the remote service).
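For the rejoin step, one possible approach is to keep the raw parts around instead of the display strings. The sketch below is plain JavaScript with no plugin. The function names splitIsoDateTime and joinIsoDateTime are made up for this example, and it assumes the API always sends the exact format shown in the question (date, T, time with seconds, then Z or a numeric offset).

```javascript
// Hypothetical helper: split "2015-07-07T17:30:00+00:00" into its parts.
// Assumes the API always uses this exact layout.
function splitIsoDateTime(iso) {
  const match = /^(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})(Z|[+-]\d{2}:\d{2})$/.exec(iso);
  if (!match) {
    throw new Error('Unexpected datetime format: ' + iso);
  }
  return { date: match[1], time: match[2], offset: match[3] };
}

// Hypothetical helper: rebuild the string the API expects from the parts.
function joinIsoDateTime(parts) {
  return parts.date + 'T' + parts.time + parts.offset;
}

const parts = splitIsoDateTime('2015-07-07T17:30:00+00:00');
const rejoined = joinIsoDateTime(parts);
```

The date and time parts could be shown in separate form fields, and joinIsoDateTime would run in the submit handler before the request goes back to the API.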

Where is 's' object cached when using AppMeasurement in DTM

Omniture's basic page tracking function, s.t(), was not crafted for AJAX implementation. Unlike the onclick s.tl() function which has some gating instructions with s.linkTrackVars and s.linkTrackEvents, the s.t() function just perpetuates every cached property through to the next call and beyond.
I used to be able to use a ClearVars function to empty out all of the s object's attributes, but now that I am using AppMeasurement and letting DTM manage my implementation with the most updated version of that library (which I want to keep doing), I can't call the s object. I get the same "ReferenceError: s is not defined" that another person asked about here.
I tried following Crayon Violent's instructions within that post, but I can't seem to find where DTM is stashing the cached values in between Adobe calls. This code:
window.s = new AppMeasurement();
lets me change/clear the attributes of s, but it's not the s I'm looking for. When I call the next AJAX s.t() function, all of the cached values are still there.
In my experience working with DTM and AA, there has been no end to bugs and caveats and workarounds with DTM's "native integration" of AA. This is why I have more or less decided that the best thing I can do is to either manage the lib myself or else treat AA as a 3rd party script (100% implement it through rules, just ignore that it's available as a tool).
As mentioned in my answer you linked, that line of code only works to expose the AA object in the window namespace if you are managing the library yourself. When you configure DTM to manage the library, it will instantiate AA object itself, and it will be buried within its own code (Honestly, I don't know why DTM did this, considering AA puts a number of other variables in the global namespace that DTM does nothing about).
AFAIK there is no documented way to reference it, but one thing I have found that seems to work for me - which as a disclaimer to cover my own arse I do NOT officially endorse: use at your own risk - is to use the following to get a reference of it:
var s = _satellite.getToolsByType('sc')[0].getS();
This uses the getToolsByType method to get an array of the SiteCatalyst (Adobe Analytics) objects set up as tools in DTM. It does this by looping through _satellite.tools and comparing _satellite.tools[n].settings.engine to what you passed to getToolsByType.
Then I use [0] to get the first one in the array, under the assumption that there's only one configured (most people only do one). Then the getS() method pulls together the s object based on the settings in DTM. So from there, you can do things with it, including making use of s.clearVars().
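Put together, an AJAX page view could be wrapped roughly as follows. This is only a sketch built on the undocumented calls above, so the same use-at-your-own-risk disclaimer applies. The trackAjaxPageView name and the pageVars argument are made up for the example, and the DTM object is passed in as a parameter rather than read from the global scope.

```javascript
// Hypothetical helper: clear the cached variables, apply the new ones,
// then send a page view. `satellite` is expected to be DTM's _satellite.
function trackAjaxPageView(satellite, pageVars) {
  const tools = satellite.getToolsByType('sc');
  if (!tools || tools.length === 0) {
    return false; // no Adobe Analytics tool configured in DTM
  }
  const s = tools[0].getS();
  s.clearVars();              // drop values left over from the previous call
  Object.assign(s, pageVars); // set only what this view needs
  s.t();
  return true;
}
```

On the page it would be called as trackAjaxPageView(_satellite, { pageName: 'some page name' }) after each AJAX content load.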

Access specific extension object data from page code

I'm trying to build an addon that will observe and collect XHR and image responses received on a page and make them available to page script (on that page) for further inspection.
In my 'http-on-examine-response' observer code, I push URLs I'm interested in, into an array for their associated window, into an object, something like this -
myWindowId = resp.outerWindowID+'-'+resp.currentInnerWindowID;
storedResponses[myWindowId].push(subject.URI.spec);
(I thought that approach may be better than using tab references to identify unique source windows)
The relevant arrays are updated automatically as any page makes a request.
I'd like to be able to query the relevant array from page script or a bookmarklet at any time.
Should I set up port.on..., or postMessage() communication between the page/bookmarklet, content script and extension, or use a pageMod to write the appropriate array directly to an unsafeWindow global object on the relevant page?
I couldn't figure out how to make a pageMod write a specific array to a specific page as soon as the new responses were observed.
Full source is here -
https://builder.addons.mozilla.org/addon/1064905/latest/
I think it's all working, apart from getting the data back on to the page.
With help from Wladimir Palant, I found that XPCNativeWrapper.unwrap() is defined and does what I needed from the SDK module context. It allowed me to set variables directly in a window from my addon.
More info about wrappers here -
https://developer.mozilla.org/en/XPCNativeWrapper

Resources