Kubeflow: How to supply a file as pipeline input (param) - kubeflow

From what I understand, a Kubeflow pipeline only takes string parameters, but for the pipeline I need, the user should be able to supply a file as input. How can I do that?

The best way is to upload the file to some remote storage (HTTP Web server, Google Cloud Storage, Amazon S3, Git, etc) and then "import" the data into the pipeline using a component like "Download from GCS".
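As a minimal sketch of this approach, the pipeline can accept the file's location as an ordinary string parameter and fetch the file in its first step. The function below uses only the Python standard library. `fetch_input` and its arguments are hypothetical names for illustration, not part of the Kubeflow SDK, and for gs:// or s3:// locations a storage-specific download component would take its place.

```python
import shutil
import urllib.request
from pathlib import Path


def fetch_input(url: str, dest: str) -> Path:
    """Download the file at `url` (a plain string parameter) to local path `dest`."""
    dest_path = Path(dest)
    dest_path.parent.mkdir(parents=True, exist_ok=True)
    # urlopen handles http://, https:// and file:// URLs; it does not
    # understand gs:// or s3://, which need their own client or component.
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Later steps would then read from the returned path, so the only value that crosses the pipeline boundary is the URL string.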

Related

Get the internal URI storage location (gs://) after uploading data [duplicate]

When I attempt to load data into BigQuery from Google Cloud Storage, it asks for the Google Cloud Storage URI (gs://). I have reviewed all of your online support as well as Stack Overflow and cannot find a way to identify the URL for my uploaded data via the browser-based Google Developers Console. The only way I see to find the URL is via gsutil, and I have not been able to get gsutil to work on my machine.
Is there a way to determine the URL via the browser based Google Developers Console?
The path should be gs://<bucket_name>/<file_path_inside_bucket>.
To answer this question more information is needed. Did you already load your data into GCS?
If not, the easiest would be to go to the project console, click on project, and Storage -> Cloud Storage -> Storage browser.
You can create buckets there and upload files to the bucket.
Then the files will be found at gs://<bucket_name>/<file_path_inside_bucket>, as @nmore says.
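For illustration, the gs://<bucket_name>/<file_path_inside_bucket> pattern can be assembled in a few lines of Python. This is only a sketch: `gcs_uri` is a hypothetical helper, the bucket and file names are made-up examples, and dropping a leading slash is an assumption about how the path was typed rather than a Cloud Storage rule.

```python
def gcs_uri(bucket_name: str, object_path: str) -> str:
    """Build gs://<bucket_name>/<file_path_inside_bucket> from its two parts."""
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    # Assumption: a leading slash is a stray path separator, so drop it
    # to avoid producing gs://bucket//file.
    return f"gs://{bucket_name}/{object_path.lstrip('/')}"


print(gcs_uri("mybucket", "exports/myfile.csv"))
```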
I couldn't find a direct way to get the URL, but I found an indirect way. The steps are:
1. Go to GCS
2. Go into the folder in which the file has been uploaded
3. Click on the three dots at the right end of your file's row
4. Click Rename
5. Click on the "gsutil equivalent" link
6. Copy only the URL
Follow these steps:
1. Go to GCS
2. Go into the folder in which the file has been uploaded
3. At the top you will see the Overview option
4. There you will find the Link URL and the link for gsutil
Retrieving the Google Cloud Storage URI
To create an external table using a Google Cloud Storage data source, you must provide the Cloud Storage URI.
The Cloud Storage URI comprises your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the bucket URI would be gs://mybucket/myfile.csv. If your data is separated into multiple files you can use a wildcard in the URI. For more information, see Cloud Storage Request URIs.
BigQuery does not support source URIs that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following source URI, though valid in Cloud Storage, does not work in BigQuery: gs://[BUCKET]/my//object//name.
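A URI can be checked for this problem before it is handed to BigQuery. The following is a minimal client-side sketch, not something BigQuery itself provides; `has_consecutive_slashes` is a hypothetical helper name.

```python
def has_consecutive_slashes(uri: str) -> bool:
    """Return True if the part after the initial 'gs://' contains '//'."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a Cloud Storage URI: {uri!r}")
    # Only the text after the initial double slash is inspected.
    return "//" in uri[len(prefix):]
```

With this helper, gs://[BUCKET]/my//object//name would be flagged, while a URI with single slashes would pass.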
To retrieve the Cloud Storage URI:
1. Open the Cloud Storage web UI.
2. Browse to the location of the object (file) that contains the source data.
3. At the top of the Cloud Storage web UI, note the path to the object. To compose the URI, replace gs://[BUCKET]/[FILE] with the appropriate path, for example, gs://mybucket/myfile.json. [BUCKET] is the Cloud Storage bucket name and [FILE] is the name of the object (file) containing the data.
If you need help with subdirectories, see https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork
To see how gsutil provides a hierarchical view of objects in a bucket, see https://cloud.google.com/storage/images/gsutil-subdirectories-thumb.png

Does SavedModelBundle loader support GCS path as export directory

Currently I am using a saved_model file stored on my local disk to read an inference graph and use it in servers. Unfortunately, giving a GCS path doesn't work with the SavedModelBundle.load API.
I tried providing a GCS path for the file, but it did not work.
Is this even supported? If not, how can I achieve this using the SavedModelBundle API? I have some production servers running on Google Cloud on which I want to serve some TensorFlow graphs.
A recent commit inadvertently broke the ability to load files from GCS. This has been fixed, and the fix is available on GitHub.

Read/Write to local without using DirectPipelineRunner in Google Cloud Dataflow

Is it possible to read and write local data without using DirectPipelineRunner?
Suppose I create a Dataflow template in the cloud and I want it to read some local data. Is this possible?
Thanks.
You will want to stage your input files to Google Cloud Storage first and read from there. Your code will look something like this:
p.apply(TextIO.read().from("gs://bucket/folder"));
where gs://bucket/folder is the path to your folder in GCS, and assuming you are using the latest Beam release (2.0.0). Afterwards, you can download the output from GCS to your local computer.

How to protect sensitive data on CircleCI?

Where can I store sensitive information such as signing configs, API keys, etc. on CircleCI without adding it to Git?
Normally I don't upload such files to Git repositories, but I don't see how I can use CircleCI without them.
You would use private environment variables. These are loaded into CircleCI via the web app or API and then injected into a running build. This way, sensitive information doesn't have to be stored in your repository.
Here's a doc on Environment Variables in CircleCI 1.0 and CircleCI 2.0.
Use an environment variable. If you want to use a file, encode it as a Base64 string: https://support.circleci.com/hc/en-us/articles/360003540393-How-to-insert-files-as-environment-variables-with-Base64
If you need to insert sensitive text-based documents or even small binary files into your project in secret it is possible to insert them as an environment variable by leveraging base64 encoding.
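The Base64 round trip can be sketched in Python as follows. This is a hedged illustration only: `encode_file`, `restore_file`, and the variable name `SIGNING_KEY_B64` in the usage note are hypothetical, and setting the variable itself still happens in the CircleCI project settings rather than in code.

```python
import base64
import os
from pathlib import Path


def encode_file(path: str) -> str:
    """Return the file's bytes as a Base64 string to paste into an environment variable."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


def restore_file(var_name: str, dest: str) -> None:
    """Decode the Base64 value of environment variable `var_name` into file `dest`."""
    encoded = os.environ[var_name]  # raises KeyError if the variable is not set
    Path(dest).write_bytes(base64.b64decode(encoded))
```

Run `encode_file` once on a local machine to produce the value, store it as, say, SIGNING_KEY_B64, and call `restore_file("SIGNING_KEY_B64", ...)` early in the build to recreate the file.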
It seems CircleCI does not support sensitive files yet.

Upload file directly to S3 without need to use forms in Rails

For my Rails application, I download a bunch of files from a remote URL to my application. I would like to directly upload them to Amazon S3, without needing a form to do the upload, since I will temporarily cache the file I downloaded on the EC2 instance.
I would also like to retain the links to the files I uploaded so I can download them later.
I am essentially reposting the files I downloaded.
I looked around, but most of the solutions seem to involve a user uploading to S3 through a form.
Is there a direct upload solution?
You can upload directly to S3 using the AWS SDK for Ruby. The easiest way is:
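require 'aws-sdk'
# Assumes AWS credentials are available through the SDK's default credential chain.
s3 = Aws::S3::Resource.new(region: 'us-west-2')
# Reference the target object by bucket name and key; nothing is uploaded yet.
obj = s3.bucket('bucket-name').object('key')
# Upload the local file to that object.
obj.upload_file('/path/to/source/file')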
Or you can find a couple other options here.
You can use EvaporateJS to achieve this. You can also send an Ajax request after each file upload to save the file name to the database. Although the JavaScript exposes a few details, your bucket is not vulnerable to attack, because the S3 service provides a bucket policy.
Just change <AllowedOrigin>*</AllowedOrigin> to <AllowedOrigin>specificwebsite.com</AllowedOrigin> in production.
