I am trying to upload custom-formatted data files from the UK climate site, e.g. this file. There are 5 lines of metadata and 1 header line.
1) Can CKAN preprocess the file according to a format I give it, so that only the data rows are picked up? Possibly saving the metadata in the description?
I would prefer a frontend option because I want users to be able to do this themselves.
2) Is it possible to have a dataset uploaded automatically once the URL is entered? I currently have to go to the Manage -> DataStore page and click "Upload to DataStore" to have the data populated.
3) Can the dataset be updated at a regular interval?
Thanks
Not currently. Doing ETL on incoming data is something that has been discussed a lot recently, so it may happen soon.
You shouldn't have to manually trigger a load into the DataStore. Is this when creating a new resource, or when editing an existing resource? When editing a resource, I believe the load is only triggered if the URL changes.
You can use https://github.com/ckan/ckanext-harvest to have data pulled into CKAN on a regular schedule - there are harvesters for various data stores, so it depends on where the data is updated from.
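For illustration, a minimal sketch of registering a scheduled harvest source through the action API, using the ckanapi Python client - the portal URL, API key, source URL, and harvester type here are all placeholder assumptions:

# Hypothetical sketch: register a harvest source via the CKAN action API.
# Requires ckanext-harvest on the server and `pip install ckanapi` locally.
from ckanapi import RemoteCKAN

ckan = RemoteCKAN('https://your-ckan-portal.example', apikey='YOUR-API-KEY')

source = ckan.action.harvest_source_create(
    name='uk-climate-data',                      # placeholder source name
    title='UK climate data',
    url='https://example.gov.uk/climate/feed',   # placeholder remote URL
    source_type='ckan',       # pick the harvester that matches the remote store
    frequency='DAILY',        # MANUAL, DAILY, WEEKLY, MONTHLY, ...
)
print(source['id'])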
We are using an Azure Function to process documents uploaded to an Azure Storage blob container. We need to send a status email after all documents (i.e. 50 files at a time) have been uploaded successfully to the blob container.
How can we find out that all the files have been uploaded successfully to our blob container?
Azure Functions has a neat extension that allows you to do exactly that: Azure Durable Functions. What you're looking for here is the monitor pattern, polling until certain conditions are met. Here's an example that checks a weather status and then sends an SMS using Twilio: https://learn.microsoft.com/en-gb/azure/azure-functions/durable/durable-functions-monitor.
The code is up on GitHub at https://github.com/Azure/azure-functions-durable-extension/tree/master/samples/csx - check out the examples starting with E3.
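The linked samples are C# script, but the same monitor pattern is available in other languages. Here is a minimal, hedged sketch in Python, where check_upload_status and send_email are hypothetical activity functions you would implement yourself:

# Sketch of the Durable Functions monitor pattern (Python programming model).
from datetime import timedelta
import azure.durable_functions as df

def orchestrator_function(context: df.DurableOrchestrationContext):
    # Poll for up to one hour, checking once per minute.
    expiry = context.current_utc_datetime + timedelta(hours=1)
    while context.current_utc_datetime < expiry:
        done = yield context.call_activity("check_upload_status", "my-container")
        if done:
            yield context.call_activity("send_email", "All 50 files uploaded")
            break
        # Durable timers are replay-safe; never use time.sleep in orchestrators.
        yield context.create_timer(context.current_utc_datetime + timedelta(minutes=1))

main = df.Orchestrator.create(orchestrator_function)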
I don't believe there is a built-in feature in Azure that would provide you with the status programmatically or raise an event. However, a possible solution would be as follows:
1. Before uploading all 50 files as a batch, create a JSON file which contains the names of all the files that will be uploaded. Let's call this JSON file the Batch List.
2. Upload the Batch List file first, and then upload all the files you need to upload.
3. Through a polling process, determine whether all the files in the Batch List exist in the blob storage. If they don't yet, do nothing until the next poll. Once you determine that all the files in the Batch List exist in the blob storage, send the email as per your requirement and delete the Batch List file.
This is a basic concept. It obviously can be more sophisticated but I hope you get the point.
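To make the polling step concrete, here is a hedged sketch using the Python azure-storage-blob (v12) SDK - the container name, blob names, and the JSON shape of the Batch List are assumptions:

# Sketch: check whether every file named in the Batch List has arrived.
import json
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("YOUR-CONNECTION-STRING")
container = service.get_container_client("uploads")          # placeholder name

# The Batch List was uploaded first and names every expected file.
batch = json.loads(container.download_blob("batch-list.json").readall())

missing = [name for name in batch["files"]
           if not container.get_blob_client(name).exists()]
if not missing:
    # All files are present: send the status email here (SendGrid, Graph, ...),
    # then delete the Batch List so you don't notify twice.
    container.delete_blob("batch-list.json")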
More details would be required before a proper solution/recommendation can be provided.
How are you processing these files?
Are you reacting to an event as outlined below?
Are you processing these files off a queue, which is why you don't have visibility into all the files?
You can use one of the following approaches depending on your requirements:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-grid
The Blob storage trigger starts a function when a new or updated blob is detected. The blob contents are provided as input to the function.
The Event Grid trigger has built-in support for blob events and can also be used to start a function when a new or updated blob is detected. For an example, see the Image resize with Event Grid tutorial.
Use Event Grid instead of the Blob storage trigger for the following scenarios:
Blob storage accounts
High scale
Minimizing latency
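As an illustration of the first option, here is a minimal blob trigger sketch in the Python programming model - the container name and connection setting are placeholders:

# Sketch: run a function whenever a new or updated blob appears.
import logging
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob",
                  path="uploads/{name}",           # container/path to watch
                  connection="AzureWebJobsStorage")
def process_upload(blob: func.InputStream):
    logging.info("New or updated blob: %s (%d bytes)", blob.name, blob.length)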
I have a directory that contains a CSV file and avatar images.
The contents of the CSV file are as follows:
Id  Name   Avatar     Dept      School
1   Mark   01019.jpg  Market    None
2   John   21122.jpg  Business  None
3   Sam    33311.jpg  IT        None
....
....
50  James  9823.jpg   IT        USA
The avatar images are placed in the same folder as the CSV file.
What I want is that when a user uploads the CSV file, the info in the file is converted into business objects, say Person. I can upload and parse the CSV to get Id, Name, Dept, and School, but of course I can't make it upload the avatar images (referenced in the CSV file) to the server in the same web request.
What are the possible ways to achieve this? Assume that I want to avoid zipping all the images plus the CSV into a single .zip file and then uploading it to the server.
Thanks.
I just love when people end their question by excluding the only possible solution.
The server (where your web application is running) has no direct access to the client (where the files are). The only thing the server can work with is what the client chooses to give it. So, your options are to have the user upload each image file individually, along with the CSV, or to zip it all up so they can send everything in a single upload. That's it. Period. At least with a standard web page.
You can of course create a Java applet or a Flash application that the user would authorize to access their filesystem to retrieve the necessary files. Essentially, the process is still the same, it's just the Java/Flash app would automatically do the file uploads instead of requiring the user to manually do them. However, both Java (on the web) and Flash are all but dead technologies at this point, so by using either of those, you're creating a dependency on something that is constantly exploited and not guaranteed to continue to receive security patches for the life of your application. Flash, in particular, has already been end-of-lifed, so Adobe will abandon support entirely within the next few years, max.
Long and short, tell your user to zip it up and upload a zip file.
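If you do go the zip route, the server-side handling is straightforward. Here is a minimal sketch using only the Python standard library - the archive layout and the file name people.csv are assumptions based on the question:

# Sketch: unpack the uploaded zip, parse the CSV, and attach each avatar.
import csv, io, zipfile
from dataclasses import dataclass

@dataclass
class Person:
    id: int
    name: str
    avatar: bytes     # raw image bytes read from the archive
    dept: str
    school: str

def parse_upload(zip_bytes: bytes) -> list[Person]:
    people = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open("people.csv") as f:    # hypothetical CSV name
            for row in csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")):
                avatar = archive.read(row["Avatar"])  # image sits next to the CSV
                people.append(Person(int(row["Id"]), row["Name"],
                                     avatar, row["Dept"], row["School"]))
    return people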
I need some help with a CouchDB iOS project.
I'm using the Apache CouchDB server and the Couchbase Lite iOS framework.
On my CouchDB I have a template document.
- CouchDB Server
- database
- template
- document 1
- document 2
- ...
My goal is to synchronise only this template document to my iPad, to get the latest data my application needs.
But when I enter some data on my iPad, I want this data to be pushed only to the Couchbase server.
How can I "tell" my application to synchronise only one document, and not the entire database, with my server? And how can I "tell" my application to push only the data that was input on the user's side?
More importantly, do I need two databases on my server? One for the template and a second one for the user input data?
If YES, then I just need to know how I can only push my data.
Guidance needed. Thanks.
This is how I solve this:
I tend to add a 'last update' date to all my documents, stored in a format that sorts in time order (epoch or yyyymmddhhmmss both do).
Create a view keyed on the update time.
On your client, store the time you last updated.
When you update, query the view with a startkey parameter set to the last update date.
You can then use include_docs=true to get the documents as you query the view.
I tend to use include_docs=false though, as it means that when a lot of documents have been updated I transfer less data in a single query. I then just directly access each document id that the view returns.
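Putting those steps together, here is a hedged sketch against CouchDB's HTTP API using Python and requests - the database URL, design document name, and timestamp format are assumptions:

# Sketch: a by-update-time view, then an incremental query since the last sync.
import requests

DB = "http://localhost:5984/mydb"   # placeholder database URL

# One-off setup: a design document whose map emits last_update as the key.
design = {"views": {"by_update_time": {
    "map": "function (doc) { if (doc.last_update) emit(doc.last_update, null); }"
}}}
requests.put(DB + "/_design/sync", json=design)

# On each sync: fetch only the ids of documents changed since the stored time.
last_sync = "20240101120000"        # yyyymmddhhmmss, kept on the client
resp = requests.get(DB + "/_design/sync/_view/by_update_time",
                    params={"startkey": '"%s"' % last_sync,
                            "include_docs": "false"})
for row in resp.json()["rows"]:
    doc = requests.get(DB + "/" + row["id"]).json()   # fetch each doc directly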
In my application, I have a textarea input where users can type a note.
When they click Save, there is an AJAX call to Web Api that saves the note to the database.
I would like users to be able to attach multiple files to this note (Gmail style) before saving the note. It would be nice if the upload could start as soon as a file is attached, before the note is saved.
What is the best strategy for this?
P.S. I can't use the jQuery Fine Uploader plugin or anything like that because I need to give the files unique names on the server before uploading them to Azure.
Is what I'm trying to do possible, or do I have to make the whole 'Note' a normal form post instead of an API call?
Thanks!
This approach is file-based, but you can apply the same logic to Azure Blob Storage containers if you wish.
What I normally do is give the user a unique GUID when they GET the AddNote page. I create a folder called:
C:\TemporaryUploads\UNIQUE-USER-GUID\
Then any files the user uploads at this stage get assigned to this folder:
C:\TemporaryUploads\UNIQUE-USER-GUID\file1.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file2.txt
C:\TemporaryUploads\UNIQUE-USER-GUID\file3.txt
When the user does a POST and I have confirmed that all validation has passed, I simply copy the files to the completed folder, with the newly generated note ID:
C:\NodeUploads\Note-100001\file1.txt
Then delete the C:\TemporaryUploads\UNIQUE-USER-GUID folder
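That promote-on-save step is language-agnostic; for illustration, here is a small Python sketch whose paths mirror the examples above:

# Sketch: copy the user's temporary uploads to the saved note's folder.
import shutil
from pathlib import Path

def promote_uploads(user_guid: str, note_id: int) -> None:
    temp = Path(r"C:\TemporaryUploads") / user_guid
    final = Path(r"C:\NodeUploads") / ("Note-%d" % note_id)
    final.mkdir(parents=True, exist_ok=True)
    for f in temp.iterdir():
        shutil.copy2(f, final / f.name)
    shutil.rmtree(temp)    # drop the temporary GUID folder afterwards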
Cleaning Up
Now. That's all well and good for users who actually go ahead and save a note, but what about the ones who uploaded a file and closed the browser? There are two options at this stage:
Have a background service clean up these files on a scheduled basis - daily, weekly, etc. This is a good job for Azure WebJobs.
Clean up the old files via the web app each time a new note is saved. Not a great approach, as you're doing file I/O when there are potentially no files to delete.
Building on RGraham's answer, here's another approach you could take:
Create a blob container for storing note attachments. Let's call it note-attachments.
When the user comes to the note-creation screen, assign a GUID to the note.
When the user uploads a file, you just prefix the file name with this note id. So if a user uploads a file, say file1.txt, it gets saved into blob storage as note-attachments/{note id}/file1.txt.
Depending on your requirements, once you save the note you may move this blob to another blob container or keep it where it is. Since the blob has the note id in its name, searching for the attachments of a note is easy.
For uploading files, I would recommend doing it directly from the browser to blob storage, making use of AJAX, CORS, and a Shared Access Signature. This way you avoid the data going through your servers; a sketch of the server-side SAS generation follows the links below. You may find these blog posts useful:
Revisiting Windows Azure Shared Access Signature
Windows Azure Storage and Cross-Origin Resource Sharing (CORS) – Lets Have Some Fun
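For illustration, a hedged sketch of the server-side half - minting a short-lived, write-only SAS with the Python azure-storage-blob (v12) SDK; the browser then PUTs the file straight to the returned URL. The account, key, and naming scheme are placeholders:

# Sketch: generate a write-only SAS URL for one attachment blob.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions

def make_upload_url(account: str, key: str, note_id: str, filename: str) -> str:
    blob_name = "%s/%s" % (note_id, filename)
    sas = generate_blob_sas(
        account_name=account,
        container_name="note-attachments",
        blob_name=blob_name,
        account_key=key,
        permission=BlobSasPermissions(create=True, write=True),
        expiry=datetime.now(timezone.utc) + timedelta(minutes=15),
    )
    return "https://%s.blob.core.windows.net/note-attachments/%s?%s" % (
        account, blob_name, sas)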
I am currently trying to implement XLS file read/write operations in an iOS application.
The requirement is basically this: a big XLS file, containing many dropdowns, data, and empty cells, sits on a server. The very first time users open the app, they download that XLS file, and a form is created in the app based on it; users can then perform read and write operations on that form even while the network is unavailable. Once the network is available, all users sync their changes back to the server.
Now I have 2 options:
Option 1:
Create a CSV file from the XLS sheet on the server side and send it to the user. The user performs read and write operations, saving all data in an SQLite DB, and on network availability syncs it back to the server.
Option 2:
Create a web service based on that XLS file and send XML to the device. The app builds the form from the XML and performs offline read/write operations, and on network availability the app creates a new XML file and syncs it back to the server.
So between options 1 and 2, which one is better, and why?
Is any web service available to do such an operation?
It depends on the type of data you are getting and on how many columns your CSV file has; if the number of columns is small, i.e. 2-5, the first option would be better.
But if you have data with many columns, you should use XML, which is also very good at storing hierarchical data.