How to add an Amazon S3 data source via REST API? - dremio

I have CSV files in a directory of an S3 bucket. I would like to use all of the files as a single table in Dremio; I think this is possible as long as each file has the same header/columns as the others.
Do I need to first add an Amazon S3 data source using the UI, or can I somehow add one as a Source using the Catalog API? (I'd prefer the latter.) The REST API documentation doesn't provide a clear example of how to do this (or I just didn't get it), and I have been unable to find the "New Amazon S3 Source" configuration screen shown in the documentation, perhaps because I'm not logged in as an administrator.
For example, let's say I have a dataset split over two CSV files in an S3 bucket named examplebucket within a directory named datadir:
s3://examplebucket/datadir/part_0.csv
s3://examplebucket/datadir/part_1.csv
Do I somehow set the S3 bucket/path s3://examplebucket/datadir as a data source and then promote each of the files contained therein (part_0.csv and part_1.csv) as a Dataset? Is that sufficient to allow all the files to be used as a single table?

It turns out that this is only possible for admin users; normal users can't add a source. To do what I proposed above, put the files into an S3 bucket that has already been configured as a Dremio source by an admin user, then promote the file or folder to a dataset using the Dremio Catalog API.
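For illustration, here is a minimal sketch (Python with the requests library) of what that promotion might look like against the Catalog API. The source name, bucket, credentials, and URL below are placeholders, and the exact payload fields should be verified against the Dremio REST API docs for your version:

    # Hypothetical sketch: promote the datadir folder of an S3 source that an admin
    # has already configured (called "my-s3" here) so its CSV files form one table.
    import urllib.parse
    import requests

    DREMIO = "http://localhost:9047"  # placeholder Dremio URL
    token = requests.post(f"{DREMIO}/apiv2/login",
                          json={"userName": "user", "password": "pass"}).json()["token"]
    headers = {"Authorization": f"_dremio{token}"}

    # Look up the catalog entry for the folder inside the existing S3 source.
    folder = requests.get(
        f"{DREMIO}/api/v3/catalog/by-path/my-s3/examplebucket/datadir",
        headers=headers).json()

    # Promote the folder to a physical dataset; every CSV inside is read as one table.
    promotion = {
        "entityType": "dataset",
        "id": folder["id"],
        "path": folder["path"],
        "type": "PHYSICAL_DATASET",
        "format": {"type": "Text", "fieldDelimiter": ",", "extractHeader": True},
    }
    resp = requests.post(
        f"{DREMIO}/api/v3/catalog/{urllib.parse.quote(folder['id'], safe='')}",
        headers=headers, json=promotion)
    resp.raise_for_status()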

Related

Get the internal URI storage location (gs://) after uploading data [duplicate]

When I attempt to load data into BigQuery from Google Cloud Storage, it asks for the Google Cloud Storage URI (gs://). I have reviewed the online support as well as Stack Overflow and cannot find a way to identify the URL for my uploaded data via the browser-based Google Developers Console. The only way I see to find the URL is via gsutil, and I have not been able to get gsutil to work on my machine.
Is there a way to determine the URL via the browser-based Google Developers Console?
The path should be gs://<bucket_name>/<file_path_inside_bucket>.
To answer this question, more information is needed: did you already load your data into GCS?
If not, the easiest way is to go to the project console, click on the project, and then Storage -> Cloud Storage -> Storage browser.
You can create buckets there and upload files to the bucket.
Then the files will be found at gs://<bucket_name>/<file_path_inside_bucket>, as noted in the answer above.
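If you would rather confirm the URIs programmatically than through the console, here is a minimal sketch using the google-cloud-storage Python client; the bucket name is a placeholder and application default credentials are assumed to be configured:

    # Sketch: print the gs:// URI of every object in a bucket ("mybucket" is a placeholder).
    from google.cloud import storage

    client = storage.Client()
    for blob in client.list_blobs("mybucket"):
        print(f"gs://mybucket/{blob.name}")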
I couldn't find a direct way to get the URL, but here is an indirect way:
1. Go to GCS
2. Go into the folder in which the file has been uploaded
3. Click on the three dots at the right end of your file's row
4. Click Rename
5. Click on the gsutil equivalent link
6. Copy just the URL
Follow these steps:
1. Go to GCS
2. Go into the folder in which the file has been uploaded
3. At the top you can see the Overview option
4. There you will see the Link URL and the gsutil link
Retrieving the Google Cloud Storage URI
To create an external table using a Google Cloud Storage data source, you must provide the Cloud Storage URI.
The Cloud Storage URI comprises your bucket name and your object (filename). For example, if the Cloud Storage bucket is named mybucket and the data file is named myfile.csv, the URI would be gs://mybucket/myfile.csv. If your data is separated into multiple files, you can use a wildcard in the URI. For more information, see Cloud Storage Request URIs.
BigQuery does not support source URIs that include multiple consecutive slashes after the initial double slash. Cloud Storage object names can contain multiple consecutive slash ("/") characters. However, BigQuery converts multiple consecutive slashes into a single slash. For example, the following source URI, though valid in Cloud Storage, does not work in BigQuery: gs://[BUCKET]/my//object//name.
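As an illustration (not part of the quoted documentation), here is a minimal sketch of pointing BigQuery at such a URI, wildcard included, with the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are placeholders:

    # Sketch: define an external BigQuery table over CSV files in Cloud Storage.
    # All names below are placeholders.
    from google.cloud import bigquery

    client = bigquery.Client()

    external_config = bigquery.ExternalConfig("CSV")
    external_config.source_uris = ["gs://mybucket/data/part_*.csv"]  # wildcard over many files
    external_config.autodetect = True                                # let BigQuery infer the schema

    table = bigquery.Table("myproject.mydataset.mytable")
    table.external_data_configuration = external_config
    client.create_table(table)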
To retrieve the Cloud Storage URI:
Open the Cloud Storage web UI.
Browse to the location of the object (file) that contains the source data.
At the top of the Cloud Storage web UI, note the path to the object. To compose the URI, replace gs://[BUCKET]/[FILE] with the appropriate path, for example, gs://mybucket/myfile.json. [BUCKET] is the Cloud Storage bucket name and [FILE] is the name of the object (file) containing the data.
If you need help with subdirectories, see https://cloud.google.com/storage/docs/gsutil/addlhelp/HowSubdirectoriesWork
And see https://cloud.google.com/storage/images/gsutil-subdirectories-thumb.png if you want to see how gsutil provides a hierarchical view of objects in a bucket.

Move Images From Parse To S3 AWS

I need help moving the images I have from Parse to S3 on AWS. I have viewed numerous supposed guides and GitHub projects, but everything stops short of giving you all the information. One even says you need a GCS bucket set up, but gives no details on how to set one up. Someone please help me with this. I have the S3 file adapter in my index.js all set up for the app, but none of the images are there; they are still hosted on Parse.
If you are referring to old images that were hosted with parse.com and that you want to move across to your own environment, it can be done with the parse-files-utils tool:
Get all files across all classes in a Parse database. Print file URLs to console OR transfer to S3, GCS, or filesystem. Rename files so that Parse Server no longer detects that they are hosted by Parse. Update MongoDB with new file names.
https://github.com/parse-server-modules/parse-files-utils
Moving forward, if you have set up your S3 bucket correctly, all new images from your app will be stored there.
https://github.com/ParsePlatform/parse-server/wiki/Configuring-File-Adapters
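At its core, the migration that parse-files-utils performs boils down to copying each Parse-hosted file into your own bucket. A deliberately simplified sketch of that one step in Python with boto3 (the URL, bucket, and key are placeholders; the real tool also renames files and updates MongoDB, which this sketch does not do):

    # Simplified sketch of the copy step only: download one Parse-hosted file and
    # upload it to S3. The URL, bucket, and key below are placeholders.
    import boto3
    import requests

    parse_url = "https://files.example.com/tfss-abc123-photo.jpg"  # placeholder source URL
    data = requests.get(parse_url).content

    s3 = boto3.client("s3")  # uses your configured AWS credentials
    s3.put_object(Bucket="my-app-files", Key="photo.jpg", Body=data)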

Storing assets in cloud and read them securely

I am developing an iOS app that uses a large number of images needed for animations in short videos. I want to store my application assets as static files in the cloud and, when they are needed, download them using a secure API call (JSON, XML, or any other alternative for that matter).
What is the best option for that? I have checked Parse, Dropbox, iCloud, and Google Drive, but I am puzzled, since I only see instructions for dynamic data that lets users access content they have created, not static assets.
If you just want an easy way to serve static files I would take a look at Amazon S3. You can just upload files through the online console and then get the public URL to those files to use in your app. You can also use the S3 API to upload files through your web service or iOS app.
Hope this helps!
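To make the S3 route concrete, here is a minimal sketch of uploading an asset and printing its URL with boto3; the bucket name, region, and file paths are placeholders, and the plain URL only works if the object is publicly readable:

    # Sketch: upload a static asset and print its URL (all names are placeholders;
    # the URL form assumes the object is publicly readable).
    import boto3

    s3 = boto3.client("s3", region_name="us-east-1")
    s3.upload_file("animations/frame_001.png", "my-assets-bucket", "animations/frame_001.png")
    print("https://my-assets-bucket.s3.us-east-1.amazonaws.com/animations/frame_001.png")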
I'd go for Parse (basically because it is fast to learn and develop with). You can create a table with the images and change the write permissions if you are afraid somebody could modify the table.
Another option you can check is the special Config table, which lets you upload custom files (e.g. zip files) and download them on demand.

How to reference and update a file on S3 from Rails 4

I have a Rails 4 application that needs to use a number of Excel files representing rosters (20 or so, grouped by their own individual committee) that have to be read in and editable by the user. Pre-deploy, I had the system working perfectly: these files lived in public/rosters and could be referenced and edited by any authenticated user. Unfortunately, when I deployed to Heroku, I could no longer do this.
I have been using an S3 bucket to host the other files necessary for this and other related apps, and it's been working wonderfully for what I've been using it for, so I decided to try it as a solution to this problem. Unfortunately, it would appear that I can only access the files the way I had been by making them publicly accessible, which is not something I want to do.
So my question is this: what would be the best way to reference these files (ideally authenticating with my access_key_id and secret_access_key) and allow a user to push changes that overwrite the file on the S3 bucket?
You have to use aws-sdk-ruby to write files to S3; it authenticates with your access_key_id and secret_access_key. Check the aws-sdk-ruby documentation. Hope this helps. Thanks!
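For illustration only, here is roughly what the private read-and-overwrite cycle looks like against the S3 API. This sketch uses the Python SDK (boto3) rather than aws-sdk-ruby, which exposes equivalent calls, and the credentials, bucket, and key are placeholders:

    # Sketch of reading a private object and overwriting it (Python SDK for
    # illustration; aws-sdk-ruby has equivalent calls). All names are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="YOUR_ACCESS_KEY_ID",        # placeholder credentials
        aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    )

    # Read the private roster without making it public.
    obj = s3.get_object(Bucket="my-app-files", Key="rosters/committee_a.xlsx")
    contents = obj["Body"].read()

    # ... let the user edit the contents, then overwrite the same object.
    s3.put_object(Bucket="my-app-files", Key="rosters/committee_a.xlsx", Body=contents)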

Rails synchronise with S3

Does anyone know of a way to synchronise an S3 bucket with Rails?
Basically, what I would like is a tool that recognises when a file on S3 has been added, renamed, or moved (modified) and can relay data about the changes to my web application, so that I can update my database with the new changes.
If not a tool to do this directly, what would be the best thing to use to interface with S3?
S3 being an object store, operations such as rename and modify are not directly possible. For example, a rename is a combination of
copy A to B + delete A (where A is the old name and B is the new name)
Modify is similar: a new copy of A replaces the old A (where A is both the old and the new name; however, S3 can preserve older versions of A if versioning is enabled).
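A minimal sketch of the rename case with boto3 (bucket and key names are placeholders):

    # Sketch: a "rename" on S3 is really copy-then-delete (names are placeholders).
    import boto3

    s3 = boto3.client("s3")
    s3.copy_object(Bucket="my-bucket", Key="B",
                   CopySource={"Bucket": "my-bucket", "Key": "A"})
    s3.delete_object(Bucket="my-bucket", Key="A")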
You can enable logging on your S3 bucket, then download the logs periodically, parse them for the items you are interested in, and update your local metadata.
See Documentation: S3 Access Logging
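A rough sketch of that polling approach with boto3 (bucket names and the log prefix are placeholders, and the parsing is deliberately naive; real access-log lines contain quoted fields that a production parser should handle properly):

    # Naive sketch: list recent access-log objects and print the operation and key
    # from each line. "my-log-bucket" and the prefix are placeholders.
    import boto3

    s3 = boto3.client("s3")
    logs = s3.list_objects_v2(Bucket="my-log-bucket", Prefix="logs/")

    for entry in logs.get("Contents", []):
        body = s3.get_object(Bucket="my-log-bucket", Key=entry["Key"])["Body"].read().decode()
        for line in body.splitlines():
            fields = line.split()        # naive split; quoted fields are not handled
            if len(fields) > 8:
                operation, key = fields[7], fields[8]
                print(operation, key)    # e.g. REST.PUT.OBJECT datadir/part_0.csv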
