Argo: Variable number of output artifacts - docker

In one of my Argo workflow steps a Docker container splits up a large file into a number of smaller files.
The tutorials show how one can save a small and pre-determined number of outputs (e.g., 2 or 3) as artifacts in an S3 bucket by going through each output one at a time.
In my use case, I do not know in advance how many smaller files will be created; it can be upwards of hundreds. The large number of output files makes it hard, if not impossible, to follow the tutorials and specify each one individually, even if I knew in advance how many smaller files would be created.
Is there a way to save all the outputs to an S3 bucket?

This sounds like standard output artifacts. You can put all your files in a single directory, and then have that directory be the output artifact.
Here are some examples to help you:
https://argoproj.github.io/argo-workflows/examples/#artifacts
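As a minimal sketch of that idea (the workflow name, image, split command, and S3 key below are placeholders, not from the question): the step writes every chunk into one directory, and the output artifact's `path` points at that directory, so Argo archives and uploads the whole thing in one go.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: split-file-
spec:
  entrypoint: split
  templates:
  - name: split
    container:
      image: alpine:3.18
      command: [sh, -c]
      # Write all chunks into a single directory
      args: ["mkdir -p /tmp/parts && split -b 1m /tmp/large.bin /tmp/parts/chunk-"]
    outputs:
      artifacts:
      - name: parts
        path: /tmp/parts   # the directory itself is the artifact
        s3:
          key: parts.tgz   # Argo tars the directory and uploads it as one object
```

This assumes a default artifact repository (or explicit S3 credentials) is already configured for the cluster; with that in place, the number of files produced no longer matters.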

Related

In Babylonjs is it best to put multiple assets in a .glb or just one?

I have several building models that I am loading into a scene that I am going to clone into a town. I want to clone the buildings individually so I can mix them up so it isn't the same 4 buildings next to each other over and over.
If I were doing a 2D app I would probably put a bunch of the assets into a single file as a sprite sheet. Is this the approach to use in Babylonjs with glb files or is it better to do a single asset per file? Looking at the API it seems more suited to one asset per file but I am still very new to it so I may be missing something.
I am also not attached to using glb if there is a better approach.

AWS Sagemaker BlazingText Multiple Training Files

Trying to find out if you can use multiple files for your dataset in Amazon Sagemaker BlazingText.
I am trying to use it in Text Classification mode.
It appears that it's not possible, certainly not in File mode, but wondering about whether Pipe mode supports it. I don't want to have all my training data in 1 file, because if it's generated by an EMR cluster I would need to combine it afterwards which is clunky.
Thanks!
You are right in that File mode doesn't support multiple files (https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html).
Pipe mode would in theory work but there are a few caveats:
The format expected is Augmented Manifest (https://docs.aws.amazon.com/sagemaker/latest/dg/augmented-manifest.html). This is essentially JSON Lines, for instance:
{"source":"linux ready for prime time ", "label":1}
{"source":"bowled by the slower one ", "label":2}
and then you have to pass the AttributeNames argument to the CreateTrainingJob SageMaker API (it is all explained in the link above).
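As a sketch, the train channel you would pass to CreateTrainingJob could look like the dict below (the bucket and manifest names are placeholders; the AttributeNames must match the keys in each JSON line of the manifest):

```python
# Sketch of an input channel for Pipe mode with an Augmented Manifest.
# "my-bucket" and "train.manifest" are placeholders.
train_channel = {
    "ChannelName": "train",
    "InputMode": "Pipe",
    "DataSource": {
        "S3DataSource": {
            "S3DataType": "AugmentedManifestFile",
            "S3Uri": "s3://my-bucket/train.manifest",
            "S3DataDistributionType": "FullyReplicated",
            # Order matters: these must match the keys in each JSON line
            "AttributeNames": ["source", "label"],
        }
    },
}
```

You would then pass this as one element of `InputDataConfig` in the `create_training_job` call (e.g., via the boto3 SageMaker client).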
With Augmented Manifest, currently only one label is supported.
In order to use Pipe mode, you would need to modify your EMR job to generate Augmented Manifest format, and you could only use one label per sentence.
At this stage, concatenating the files generated by your EMR job into a single file seems like the best option.

How to load images faster from Azure Blob?

I've been trying to upload some images to azure blob and then using ImageReader in Azure ML studio to read them from the blob. The problem is that ImageReader takes a lot of time to load images and I need it in real time.
I also tried making a csv of 4 images (four rows) containing 800x600 pixels as columns (500,000 cols. approx) and tried simple Reader. Reader took 31 mins to read the file from the blob.
I want to know about alternative methods of loading and reading images in Azure ML Studio. If anyone knows another method, or can share a helpful and relevant link, please do.
Please also share if I can speed up ImageReader by any means.
Thanks
Look at Azure CDN (http://azure.microsoft.com/en-us/services/cdn/); once it is enabled, the blobs get an alternative URL. My blob downloads became about 4 times faster after switching.

Matlab Parse Binary File

I am looking to speed up the reading of a data file which has been converted from binary (it is my understanding that "binary" can mean a lot of different things; I do not know what type of binary file I have, just that it is a binary file) to plaintext. I looked into reading files quickly a while ago, and was informed that reading/parsing a binary file is faster than text. So, I would like to parse/read the binary file (the one that was converted to plaintext) in an effort to speed up the program.
I'm using Matlab for this project (I have a Matlab "program" that needs the data in the file). I guess I need some information on the different "types" of binary, but I really want information on how to read/parse said binary file (I know what I'm looking for in plaintext, so I imagine I'll need to convert that to binary, search the file, then pull the result out into plaintext). The file is a logfile, if that helps in any way.
Thanks.
There are several issues in what you are asking, but above all you need to know the format of the file you are reading. If you can say "at position xx, I can expect to find data yy", that's what you need to know. In your question/comments you talk about searching for strings. You can also work much like you would with a text file: "when I find xxxx in the file, give me the following data up to the nth character, or up to the next yyyy".
You want to look at the documentation for fread. In the documentation there are snippets of code that will get you started, but as I (and others) said you need to know the format of your binary files. You can use a hex editor to ascertain some information if you are desperate, but what should be quicker is the documentation for the program that outputs these files.
Regarding different "binary files": there is least-significant-byte-first (little-endian) versus most-significant-byte-first (big-endian) ordering. You really don't need to know about that for this work. There are also other platform-dependent issues which I am almost certain you don't need to know about (unless you are moving the binary files between Mac, PC and Unix machines). If you read almost to the bottom of the fread documentation, there is a section entitled "Reading Files Created on Other Systems" which talks about the issues and how to deal with them.
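As a minimal sketch of the fread pattern (the file name, record layout, and byte order below are invented; substitute your logfile's actual format):

```matlab
% Sketch only: assumes a 4-byte integer header followed by 100 doubles.
fid = fopen('mylog.bin', 'r', 'ieee-le');  % third argument fixes the byte order
count  = fread(fid, 1,   'int32');         % e.g. a record-count header
values = fread(fid, 100, 'double');        % then a block of doubles
fclose(fid);
```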
Another comment I have to make: you say that "reading/parsing a binary file is faster than text". This is not true (or even if it is, odds are you won't notice the performance gain). In terms of development time, however, reading/parsing a text file will save you huge amounts of time.
The simple way to store data in a binary file is to use the 'save' command.
If you load from a saved variable it should be significantly faster than if you load from a text file.

How to programmatically manipulate an EPS file

I am looking for libraries that would help in programmatically manipulating EPS (Encapsulated PostScript) files. Basically, what I want to do is the following:
Show / Hide preexisting layers in the EPS file (toggle them on and off)
Fill (color) named shapes in the EPS file
Retrieve coordinates of named points in the EPS file
Draw shapes on a new layer in the EPS file
All of this on a server, without user interaction (scripting Adobe Illustrator won't work)
I am aware of how the EPS file format is based on the PostScript language and must therefore be interpreted - for creating simple drawings from scratch this is rather easy. But for actually modifying existing files, I guess you need a library that interprets the file and provides some kind of "DOM" for manipulation.
Can I even have named shapes and points inside an EPS file?
EDIT: Assuming I had the layers saved in separate EPS files. Or better still: Just the "data" part of the layers. Could I then concatenate this stuff to create a new EPS file? And append drawing commands? Fill existing named objects?
This is extremely difficult, and here is why: a PS file is a program whose execution results in pixels put on a page. Instructions in a PS program are at the level of "draw a line using the current pen and color" or "rotate the coordinate system by 90 degrees", but there is no notion of layers or complex objects as you would see them in a vector drawing application.
There are very few conventions in the structure of PS files that allow external programs to modify them: pages are marked separately, and font resources and media dimensions are spelled out in special comments. This is especially true for Encapsulated PostScript (EPS), which must follow these guidelines because EPS files are meant to be read by applications, unlike general PS as it is sent to a printer. A PS program is at a much lower level of abstraction than what you need, and there is no way to reconstruct it for arbitrary PS code. In principle, a PS file could produce different output every time it is printed, because it may query its execution environment and branch based on random decisions.
Applications like Adobe Illustrator emit PS code that follows a rigid structure. There is a chance that such files could be parsed and manipulated without interpreting the code. I would still suggest rethinking the current architecture: you are at too low a level of abstraction for what you need.
PDF is not manipulable either, since in general it is not possible to change existing parts of a PDF, only to add to it. EPS is the same as PostScript except that it has a bounding-box header.
The problem with doing what you want is that PS is a programming language whose output (mostly) is some kind of image. So the question could be restated as "how can I draw shapes on a new layer in the Java file?". You probably need to generate the complete PS on the fly, or use another image format altogether.
I am not aware of any available libraries for this, but you may be able to build something to meet your needs based on epstool from Ghostscript/GSview.
I think your best bet is to generate a PDF from the EPS, manipulate the PDF, and then convert back to EPS. PDF is much more "manipulable" than EPS.
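A rough sketch of that round trip using Ghostscript and Poppler command-line tools (assuming both are installed; the file names are placeholders):

```shell
# EPS -> PDF with Ghostscript, cropping to the EPS bounding box
ps2pdf -dEPSCrop input.eps working.pdf

# ...manipulate working.pdf with your PDF library of choice...

# PDF -> EPS again with Poppler's pdftops
pdftops -eps working.pdf output.eps
```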
