Machine-readability: guidelines to follow so that data can be previewed nicely on CKAN

What guidelines should be followed so that data can be previewed nicely in the CKAN Data Preview tool? I am working on CKAN and have been uploading data or linking it to external websites. Some datasets could be previewed nicely, some could not. I have been researching machine-readability online and could not find any CKAN-specific resources that state the correct way to structure data so that it previews nicely on CKAN. I hope to gather responses from all of you on the do's and don'ts, so that they will be useful to CKAN publishers and developers in future.
For example: data has to be in a tabular format with labelled rows and columns; data has to be stored on the first tab of the spreadsheet, as the other tabs cannot be previewed; spreadsheets cannot contain formulas or macros; and data has to be stored in a supported file format (refer to another topic of mine: Which file formats can be previewed on CKAN Data Preview tool?).
Thanks!

Since CKAN is an open-source data management system, it does not have specific guidelines on the machine readability of data. Instead, you might want to take a look at the current standard for data openness and machine readability here: http://5stardata.info
The UK's implementation of CKAN also includes a set of plugins which help rate the openness of data based on the 5-star open data scheme: https://github.com/ckan/ckanext-qa

Check Data Pusher Logs - When you host files in the CKAN DataStore, the tool that loads the data provides logs; these will reveal problems with the format of the data.
Store Data Locally - Where possible, store the data locally: data stored elsewhere has to go through the proxy process (https://github.com/okfn/dataproxy), which is slower and depends on the external site remaining available.
Consider File Size and Connectivity - Keep the file small enough for your installation and connectivity that it doesn't time out when loading into the CKAN Data Explorer. If the file is externally hosted, large, and slow to access (poor connectivity or heavy load), you will get timeouts, since the proxy must read the entire file before it can be previewed. Again, hosting data locally gives you better control over the load on compute resources and helps the Data Explorer work consistently.
Use Open File Formats - If you are using CKAN to publish open data, the community generally holds that it is best to publish data in open formats (e.g. CSV, TXT) rather than proprietary ones (e.g. XLS). Beyond widening access to the data and reducing the chance that it is badly structured for preview, this has other advantages: for example, it is harder to accidentally publish information you didn't mean to.
Validate Your Data - Use tools like csvkit to check that your data is in good shape.
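csvkit itself is a Python command-line suite; purely to illustrate the kind of structural check it performs (e.g. flagging rows whose column count differs from the header, which will break a tabular preview), here is a minimal Ruby sketch. The helper name is made up for this example:

```ruby
require "csv"

# Structural check in the spirit of csvkit's csvclean: every row should
# have the same number of columns as the header, or a preview tool may
# mis-parse the table. (Hypothetical helper, for illustration only.)
def consistent_columns?(csv_text)
  rows = CSV.parse(csv_text)
  return true if rows.empty?
  width = rows.first.length
  rows.all? { |row| row.length == width }
end

puts consistent_columns?("date,count\n2014-01-01,42\n")  # true
puts consistent_columns?("date,count\n2014-01-01\n")     # false
```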

The best way to get a good previewing experience is to start using the DataStore. When viewing remote data, CKAN has to use the DataProxy to do its best to guess data types and convert the data into a form it can preview. If you put the data into the DataStore, that isn't necessary: the data will already be well structured and the types will have been set (e.g. you'll know a column is a date rather than a number).
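As a sketch of what pushing typed data into the DataStore looks like, here is a hedged Ruby example against CKAN's datastore_create action. The resource id, field names, records, and API key are all placeholders; only the general payload shape follows the CKAN action API:

```ruby
require "json"
require "net/http"
require "uri"

# Build the JSON body for CKAN's datastore_create action. Declaring
# typed fields up front means the Data Explorer can preview the
# resource directly, with no DataProxy type-guessing.
def datastore_create_payload(resource_id, fields, records)
  {
    "resource_id" => resource_id,  # placeholder id of an existing resource
    "fields"      => fields,       # e.g. [{"id" => "date", "type" => "timestamp"}]
    "records"     => records,
    "force"       => true
  }
end

payload = datastore_create_payload(
  "my-resource-id",
  [{ "id" => "date", "type" => "timestamp" },
   { "id" => "count", "type" => "int" }],
  [{ "date" => "2014-01-01", "count" => 42 }]
)

# To actually push the data (assumes a reachable CKAN and a valid API key):
# uri = URI("http://localhost/api/3/action/datastore_create")
# req = Net::HTTP::Post.new(uri, "Content-Type" => "application/json",
#                                "Authorization" => "YOUR-API-KEY")
# req.body = payload.to_json
# Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
```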

Related

Difference between three Firebase Storage download methods

I couldn't find resources discussing the differences between the three download methods in the Firebase Storage documentation, or the pros and cons of each. I would like some clarification about the Firebase Storage documentation.
My App
Displays 100 images ranging from 10 KB-500 KB in size on a table view
Will be used in a location where internet connection and/or phone service could be very weak
Could be used by many users
3 methods for downloading from Firebase storage
Download to NSData in memory
This is the easiest way to quickly download a file, but it must load entire contents of your file into memory. If you request a file larger than your app's available memory, your app will crash. To protect against memory issues, make sure to set the max size to something you know your app can handle, or use another download method.
Question: I tried this method to display 100 images of 10 KB-500 KB in size in my table view cells. Although my app didn't crash, as I scrolled through my table, my memory usage increased to 268 MB. Would this method not be recommended for displaying a lot of images?
Download to an NSURL representing a file on device
The writeToFile:completion: method downloads a file directly to a local device. Use this if your users want to have access to the file while offline or to share in a different app.
Question: Does that mean all images from firebase storage will be downloaded on user's phone? Does that mean that the app will be taking up a large percentage of the available storage on the phone?
Generate an NSURL representing the file online
If you already have download infrastructure based around URLs, or just want a URL to share, you can get the download URL for a file by calling the downloadURLWithCompletion: method on a storage reference.
Question: Does this method require a strong internet connection and/or phone service connection to work?
Generally, your memory usage should not be affected by the method of retrieval. As long as you're displaying all 100 images, their data will be held in memory, and it will be about the same size if they're identically formatted/compressed.
Whichever way you go, I suggest you implement pagination (for your convenience, this question's answer might serve as a good implementation reference/guide) to reduce memory and network usage.
Now, down to comparing the methods:
Method 1
...but it must load entire contents of your file into memory.
This line might throw some people off, making it sound like a memory-inefficient solution, when all it really means is that you cannot retrieve parts of the data: you can only download the entire file. In the case of images, you would want the whole file anyway for the data to make sense.

If your application needs to download the images every time the user accesses it (i.e. if your images are regularly updated), then this method will probably suit you best. The images will be downloaded every time the application starts, then discarded when you kill it.

You stated that part of your user base might have a weak internet connection, so the next method might be more efficient and user-friendly.
Method 2
First off, the answers to your questions:
Yes. The images downloaded using this method will be stored on the users' devices.
The images should take up about the same space on the device as they do in Firebase Storage.
Secondly, if you plan to use this method, I suggest you store a timestamp (or any sort of marker) in your database recording when the images last changed. Then, every time the app opens, follow this flow:
If no images are downloaded -> download the images and store the database timestamp locally
If the local timestamp does not match the timestamp in the database -> download the images and store the new timestamp locally
Else -> use the images you already have; they should be identical to the ones in Firebase Storage
That is the best way to go if network usage is a higher priority for you than local storage.
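The decision in that flow boils down to a tiny piece of logic. The app in question is iOS, so this would really live in Swift/Objective-C; the Ruby sketch below (with a made-up function name) is only to make the flow concrete:

```ruby
# Decide whether the app needs to re-download its image set, given the
# locally stored timestamp and the "last changed" timestamp kept in the
# remote database. Hypothetical helper, for illustration only.
def needs_download?(local_timestamp, remote_timestamp)
  return true if local_timestamp.nil?  # nothing cached yet
  local_timestamp != remote_timestamp  # cache is stale if markers differ
end

puts needs_download?(nil, "2016-05-01")          # true  (first launch)
puts needs_download?("2016-05-01", "2016-05-01") # false (cache is current)
puts needs_download?("2016-04-30", "2016-05-01") # true  (images changed)
```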
And finally...
Method 3 (not really)
This is not a data download method; it simply generates a download URL given a reference to the child. You can then use that URL to download the data in your app or elsewhere, as long as the app or API using it is authorized to access your Firebase Storage.
Update:
The URL is generated from a Firebase Storage reference (FIRStorage.storage().reference().child("exampleReference")) and would look like this (note: this is a fake link that will not actually work, used purely for illustration):
https://firebasestorage.googleapis.com/v0/b/projectName.appspot.com/o/somePathHere%2FchildName%2FsomeOtherChildName%2FimageName.jpg?alt=media&token=1a8f83a7-95xf-4d3s-nf9b-99a274927bcb
If you simply access the generated link through any regular web browser (assuming no Firebase rule in your project conflicts with that), you can download the image directly from anywhere, not just through your app.
So in conclusion, this "Method" does not download data from Firebase storage, it just returns a download URL for your data in case you want a direct link.

How to cache images in Meteor?

I'm building a mobile app using Meteor. To allow for offline usage, I want the app to be able to download a large-ish JSON file while online, then access the data from that file, written to MongoDB, while offline.
This works fine. However, the downloaded JSON file contains plenty of references to online images that won't display once the app is offline.
So, I want to be able to download (a selection of) the images referenced in the JSON file to the app, so that the app can access them even when offline.
(Downloading images could happen in the background for as long as a connection is available.)
There's an implementation of imgCache.js available on Atmosphere, which fails to initialize for me.
I suppose it's theoretically possible to individually load each image to a canvas, save the canvas content to MongoDB, then load the content when needed. Info on some of this is here. But, this feels rather convoluted and, if really feasible, I would expect someone to have done this before with success.
How can I achieve caching of images for offline use in Meteor?
So, you've probably already read this article about application cache.
If the images are static, you can just include them in the manifest. Be sure you understand the manifest and cache expirations (see the article).
If the images are dynamic, you'll find techniques for storing images in local storage; if that's the case, this may be what you want.

Ship iOS app with pre-populated Parse datastore

Given the recent addition of local datastore for iOS to Parse, it should be possible to rely exclusively on Parse to manage app's database, thus totally avoiding Core Data. Does this sound like a good idea? What would be the pros and cons of such an approach?
In particular, I am wondering whether it will be possible to pre-populate Parse local datastore with some data, and include this database as a part of the app when submitting to appstore.
UPDATE
From the comments that were posted, it seems that people misunderstood my intended use case. Sorry guys, I should have made my question more clear from the beginning. Let me clarify it now, anyway.
So, there is some amount of data in the Parse database on the web, the same for every user, e.g. a catalogue of books. It will be updated every now and then. What I want is to publish an app on the App Store that is pre-populated with the Parse datastore as it stands at the moment the app is published. For that to happen, I'd like to pin all available data when building my app and ship that datastore along with the app. The problem is that the pinned data will be stored on the device's (or simulator's) file system; it won't be part of the project. That's why, if I build the app and submit it to the App Store, the data won't be included.
Any suggestions how to attach the local data store to the app?
The local data store is stored in the sandboxed part of the filesystem in iOS. When you package the store with the app, it'll live in the signed application folder, not in the location Parse expects it to be.
So, if you were looking to do this, you'd need to include your default local data store in the application on building/submission, and copy it into the location Parse expects it to be in (which is Library/Private Documents/Parse and the file is called ParseOfflineStore) when your application starts up. This must happen before you call enableLocalDatastore, or an empty one will be initialized.
It should be possible!
Read this in the docs. Parse has a thorough, well-documented guide for its backend.
https://parse.com/docs/ios_guide#localdatastore
Per my comment above concerning didFinishLaunchingWithOptions: it has long been a place for you to create objects on launch, and I have been doing that for a long time, especially with channels. However, by enabling the local datastore you can access the objects you pinned or created with a simple query, with none of the reachability concerns you mention. Either way, they are both created on disk. Core Data has a lot more overhead, even with the flexibility NSFetchedResultsController offers. It's all up to you what you want to do with your app. PFQueryTableViewController isn't bad, but if your direction and vision for your app is to be exclusively Parse, then why not? It's a great feature. However, I didn't see anything in the docs about local queries counting against your request limit, so I would suggest looking into that if you have a large audience performing numerous queries per second.
Take advantage of their docs. They do a great job at keeping us informed.

Rails: How to stream data to and from a binary column (blob)

I have a question about how to efficiently store and retrieve large amounts of data to and from a blob column (data_type :binary). Most examples and code out there show simple assignments but that cannot be efficient for large amounts of data. For instance storing data from a file may be something like this:
# assume a model MyFileStore has a column blob_content :binary
my_db_rec = MyFileStore.new
File.open("#{Rails.root}/test/fixtures/alargefile.txt", "rb") do |f|
my_db_rec.blob_content = f.read
end
my_db_rec.save
Clearly this reads the entire file content into memory before saving it to the database. That cannot be the only way to save blobs. For instance, in Java and in .NET there are ways to stream to and from a blob column so you are not pulling everything into memory (see Similar Questions to the right). Is there something similar in Rails? Or are we limited to storing only small chunks of data in blobs when it comes to Rails applications?
If this is Rails 4, you can use streamed rendering. Here's an example: Rails 4 Streaming
I would ask, though, what database you're using, and whether it might be better to store the files in a file system or object store (Amazon S3, Google Cloud Storage, etc.), as this can greatly affect your ability to manage blobs. Microsoft, for example, has this recommendation: To Blob or Not to Blob
Uploading is generally done through forms, all at once or multi-part. Multi-part uploads chunk the data so you can upload larger files with more confidence. The chunks are reassembled and stored in whatever database field (and type) you have defined in your model.
Downloads can be streamed. There is a strong tendency to hand off uploads and streaming to third-party cloud storage systems like Amazon S3, which drastically reduces the burden on Rails. You can also hand off upload duties to your web server. All modern web servers have a way to stream files from a user; doing this avoids memory issues, as only the currently uploading chunk is in memory at any given time. The web server should also be able to notify your app once the upload is completed.
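To make the chunking idea concrete, here is a small self-contained Ruby sketch that reads a file in fixed-size chunks, so only one chunk is ever in memory. The chunk size and the handling block are illustrative; a real app would hand each chunk to the database driver or an upload endpoint rather than just counting bytes:

```ruby
require "tempfile"

CHUNK_SIZE = 64 * 1024  # illustrative chunk size

# Yield a large file in fixed-size chunks instead of one big f.read,
# so memory use stays bounded by the chunk size.
def each_chunk(path, chunk_size = CHUNK_SIZE)
  File.open(path, "rb") do |f|
    while (chunk = f.read(chunk_size))
      yield chunk
    end
  end
end

# Usage: tally a 150 KB file chunk by chunk.
Tempfile.create("alargefile") do |tmp|
  tmp.binmode
  tmp.write("x" * 150_000)
  tmp.flush
  total = 0
  each_chunk(tmp.path) { |chunk| total += chunk.bytesize }
  puts total  # 150000
end
```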
For general streaming of output:
To stream a template you need to pass the :stream option from within your controller, like this: render stream: true. You also need to explicitly close the stream with response.stream.close. Since streaming changes how templates and layouts are rendered, pay attention to attributes like the title: they need to be set with content_for, not yield. You can explicitly open and close streams using the Live API; for this you need the puma gem. Also be aware that you need a web server that supports streaming; you can configure Unicorn to support it.

iOS data base architectural decision

Newbie question.
I will need to have a database of about 200 UIImages (each less than 500 KB) for an iPad app. The customer wants the ability to change this set of images from time to time without releasing a new version of the app on the App Store, and the app must work without a web connection (a local database on the device). I don't see how this can be done simultaneously; I see only one common option here:
The image database would be stored on a server, which the customer can change at any time. The user will need a web connection, and every time he starts the application the existing database will be loaded into the app.
Main questions here:
Is it possible to update the database on the user's device without releasing a new version of the app, and which database management system is best suited to this situation (SQLite, MySQL, etc.)?
Q: Is it possible to update the database on the user's device without releasing a new version of the app?
A: Yes, it is possible.
SQLite will be perfect for you.
The photographs reside on the web server.
A number of starter photographs may reside within the bundle so that the app is not empty at first launch.
However, when downloading the app, the user must be online. In most cases he will still be online directly afterwards, when he launches the app for the first time.
The server provides two services:
A quick one that just provides a version number of the photo-database content and/or the date of the last change to the photographs on the server.
The app frequently checks (not more than daily, I would say) whether there are new images on the server.
If there are, then the user is asked whether he wants to download them.
If the user says YES, then the app sends the version number and/or last date and/or IDs of all local photographs to the server, and the server responds with which photographs have been added (and where to download each one) and which have to be deleted.
Then you add, delete, or update the photographs from the download source given by the server. (That may well be a URL to the very same server, of course.)
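The add/delete bookkeeping in those last steps amounts to a set difference between the IDs on the device and the IDs on the server. A minimal Ruby sketch of that diff (the IDs and helper name are made up; in the app this would be Objective-C/Swift):

```ruby
# Given the photo ids already on the device and the ids the server
# currently has, work out what to download and what to delete.
# Hypothetical helper, for illustration only.
def sync_plan(local_ids, server_ids)
  {
    download: server_ids - local_ids,  # new on the server
    delete:   local_ids - server_ids   # removed from the server
  }
end

plan = sync_plan([101, 102, 103], [102, 103, 104])
puts plan[:download].inspect  # [104]
puts plan[:delete].inspect    # [101]
```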
For 200 data sets I would strongly suggest Core Data with SQLite - the standard stuff.
You may then consider holding the image data in the file system or in NSData properties within the database.
