Is there a better library that would allow reading file metadata?
So far I have tried LuaFileSystem and LuaCOM (Scripting.FileSystemObject), but neither was able to extract all the data. By "all the data" I mean more than the usual standard fields such as date accessed, date created, and date modified. For a PDF, for example, I also want data such as author and title, and for an image, data such as bit depth and resolution.
You seem to be missing the difference between filesystem metadata and document metadata. Filesystem metadata is the metadata the filesystem stores about a file. Every file has this stuff, because every file is stored on the filesystem. This metadata is not actually stored within the file; if you loaded the file, that wouldn't give you access to the filesystem metadata. You have to talk to the filesystem to get it.
Document metadata is some bit of information within the file that serves as metadata. To get this data you have to read the file, know what the file's format is, and parse that metadata out.
I don't know of any library, Lua or otherwise, that is designed to extract arbitrary metadata from arbitrary file types.
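To make the distinction concrete, here is a minimal sketch in Python (not Lua, and not any particular library's API). The function names are hypothetical. The first function only asks the filesystem; the second has to open the file, know that it is a PNG, and parse the IHDR chunk to get at the bit depth.

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def filesystem_metadata(path):
    """Ask the filesystem about the file; the file's contents are never read."""
    st = os.stat(path)
    return {"size": st.st_size, "modified": st.st_mtime, "accessed": st.st_atime}

def png_document_metadata(path):
    """Read metadata stored inside the file, which requires knowing the format."""
    with open(path, "rb") as f:
        # 8-byte signature + 4-byte chunk length + 4-byte chunk type
        # + the first 10 bytes of the IHDR chunk data.
        header = f.read(26)
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR data starts with: width (4), height (4), bit depth (1), color type (1),
    # all big-endian.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    return {"width": width, "height": height,
            "bit_depth": bit_depth, "color_type": color_type}
```

Every other format (PDF, JPEG, and so on) would need its own parser along the lines of the second function, which is why a single "read everything" call is hard to come by.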
I just got started with FileManager and wanted to know if there was a way to query files against custom attributes without having to retrieve the contents of the files.
Similar to creationDate or modificationDate, I want to have an attribute called topicsCovered, which is an array of strings containing things such as "technology", "politics", etc. (for news articles). Since news articles may contain many images, I don't want to retrieve ALL the news articles, convert them into concrete types, and then filter out the ones I don't want (especially since I'll be storing a maximum of 500 articles, which would mean retrieving ~1 GB worth of files from disk).
So if there is a way to query files before retrieving them using FileManager please let me know since I feel retrieving ~1gb worth of files from disk may be a very expensive/heavy operation.
Thanks in advance!
You have a bunch of files in your system. I suggest that you create an extra file, "metadata", that contains the metadata about all the other files (e.g. the topics-covered information) along with the URLs/filenames of the files themselves. You could also store this metadata in Core Data or in a database table.
Then when you are looking for all the "politics" articles for example, you open the metadata file and find all the entries that cover politics which will give you the filenames of the files you want to load.
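As a rough illustration of the idea (sketched in Python rather than Swift, with a JSON file standing in for the "metadata" file), something like the following would do. The function names and the "file" key are hypothetical; topicsCovered is the attribute name from the question.

```python
import json

def save_index(index_path, entries):
    """Write the small metadata file that describes all the article files."""
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(entries, f)

def files_covering(index_path, topic):
    """Return the filenames of articles tagged with `topic`.

    Only the small index is read; the article files themselves are not opened.
    """
    with open(index_path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e["file"] for e in entries if topic in e["topicsCovered"]]
```

The index has to be updated whenever an article is added or removed, otherwise it drifts out of sync with the files on disk.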
Is there any way I can download a JSON file from a web server and store it in a local folder, so that users with a poor internet connection only have to download the data once and don't have to suffer every time?
I found similar questions here and here, but they were asked for Objective-C and I am looking for something in Swift. Thanks
Yes, you can certainly do this. After you've read the remote JSON, it will be a Data object.1
Build a URL to a path in your app's caches directory and then use the Data method write(to:options:) to write that data into your file.
On read, check to see if the file exists in the caches directory before triggering a network read. Note that you need to be sure that the filenames you use are consistent and unique. (The same filename must always fetch the same unique data.)
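The check-the-cache-first logic looks roughly like this. This is a sketch in Python rather than Swift, and `fetch` is a hypothetical callable standing in for whatever does the network read. Hashing the URL is just one way to get a filename that is consistent and unique per URL.

```python
import hashlib
from pathlib import Path

def cache_path(cache_dir, url):
    """Derive a stable filename from the URL so the same URL maps to the same file."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return Path(cache_dir) / (digest + ".json")

def load_json_bytes(url, cache_dir, fetch):
    """Return the raw JSON bytes, using the network only on a cache miss."""
    path = cache_path(cache_dir, url)
    if path.exists():
        return path.read_bytes()
    data = fetch(url)  # hypothetical: returns the response body as bytes
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Note that this never refreshes a cached file; if the remote data can change, you would need some expiry rule on top.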
1 Note that Mohammad has a good point. There are better ways of persisting your data than saving the raw JSON. Core Data is a pretty complex framework with a steep learning curve, but there are other options as well. You might look at conforming to the Codable protocol, which would let you serialize/deserialize your data objects in a variety of formats including JSON, property lists, and (shudder) XML.
Yes, you can create a .json file and store it in the Documents folder. First see how to create a .json file, and then see how to store a file in the Documents folder.
Is it possible with Tika to get the MIME Type or other meta data without loading the whole file?
I could code a script to get the first 1MB. I am thinking of doing this to take off some of the load on Tika and my server.
For container-based formats, Apache Tika needs the whole file to be sure of the type. Container formats include pretty much everything based on a zip file (Word .docx, OpenDocumentFormat .odf, iWork etc), anything based on the OLE2 format (Excel .xls, Hangul, MSI etc), and pretty much all multimedia formats. You can often take a good guess based on the filename and the container type, but to be sure you need to process the whole file to identify the contents and hence the file type.
For everything else, if Tika can detect the file type, then only the first few tens of KBs are needed, often even only the first few hundred bytes. (Depends on the format in question - different ones have their predictable signatures in different places)
If you don't need Tika's very best detection guess, but can make do with slightly lower certainty (especially on container-based formats), then simply give Tika the start of the file. Or tell Tika to only use the mime magic detector without any of the container-specific detectors.
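To show why the start of the file is enough for some formats but not for containers, here is a small magic-byte sniffer in Python. It is an illustration only, not Tika's detector or its API; the signature table is a tiny hand-picked sample and the labels are descriptive strings, not Tika's MIME types.

```python
# (signature at offset 0, human-readable label)
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "zip container (could be .docx, .odt, .jar, plain .zip, ...)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (could be .doc, .xls, .msi, ...)"),
]

def sniff(prefix):
    """Guess a type from the first bytes of a file; returns None if unknown."""
    for magic, label in SIGNATURES:
        if prefix.startswith(magic):
            return label
    return None

def sniff_file(path, limit=1024):
    """Read at most `limit` bytes from the start of the file and sniff them."""
    with open(path, "rb") as f:
        return sniff(f.read(limit))
```

For the PDF and PNG cases a handful of bytes settles it. For the two container signatures the prefix only tells you which container you have; telling a .docx from any other zip means looking at the entries inside, which is the part that needs the whole file.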
I want to categorise the video files that a user loads based on the genre stored in the file metadata. I know this is possible for MP3 files, where the format of this data and its location at the end of the file are well documented.
I'm looking for information on how video file metadata is formatted and where it is stored in the file (e.g. how many bytes at the end of the file are dedicated to metadata). While I appreciate that different file formats use different methods to store the information, I'm trying to figure out if there is a known format for certain video file formats, or a basic model that can be applied to most file formats.
You would have to go through all the video formats and parse each one like this: http://www.fastgraph.com/help/avi_header_format.html
An easier way is to use a library that already exists: http://mediaarea.net/en/MediaInfo
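To give an idea of what "going through a format" involves, here is a rough Python sketch for AVI only. It checks the RIFF/AVI signature and unpacks the first ten little-endian 32-bit fields of the main header ('avih') chunk. The function name is hypothetical, and locating the chunk with a byte search is a shortcut; a proper parser would walk the RIFF chunk tree.

```python
import struct

def read_avi_main_header(data):
    """Parse a few fields of the AVI main header from the start of a file.

    `data` is the first few hundred bytes of the file, not the whole file.
    """
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"AVI ":
        raise ValueError("not a RIFF/AVI file")
    pos = data.find(b"avih", 12)  # shortcut instead of walking the chunks
    if pos < 0 or len(data) < pos + 8 + 40:
        raise ValueError("AVI main header not found in the supplied bytes")
    # Skip the 4-byte chunk id and 4-byte chunk size, then read ten DWORDs.
    fields = struct.unpack_from("<10I", data, pos + 8)
    usec_per_frame, _, _, _, total_frames, _, streams, _, width, height = fields
    return {"usec_per_frame": usec_per_frame, "total_frames": total_frames,
            "streams": streams, "width": width, "height": height}
```

Note that this header sits at the start of the file, not the end, and it describes the streams rather than things like genre. Every other container (MP4, MKV, and so on) has a different layout again, which is why a library such as MediaInfo is the easier route.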
I've asked similar questions before, but have not received a definitive answer. Seems that there must be a way to simply add/modify metadata to an image without loading the image into memory, without having to deal with directly reading bits.
Seems like ways exist when using CMSampleBufferRefs, but I need to be able to do this with a regular image already saved to disk.
For instance, given a very large png at /Documents/photo.png, I want to modify its exif metadata without having to load that image.
You can use libexif - I've had success with compiling it for iOS before. With libexif, you can modify any image's EXIF metadata.
If you know how to modify the EXIF, you can modify the binary data directly in the file. Just replace the relevant binary portion of the image with the new one.
I don't know if Objective-C permits this, but in ANSI C it should be simple. The complicated part is identifying the exact part to change.
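The in-place replacement itself can be sketched as follows, in Python for brevity (the same seek-and-write approach applies in C with fopen in "r+b" mode). It assumes you already know the byte offset and that the new value has exactly the same length as the old one; `patch_bytes` and its parameters are hypothetical names.

```python
def patch_bytes(path, offset, new_bytes, expected_old=None):
    """Overwrite len(new_bytes) bytes at `offset` without reading the rest of the file."""
    with open(path, "r+b") as f:  # read/write, no truncation
        f.seek(offset)
        old = f.read(len(new_bytes))
        if len(old) != len(new_bytes):
            raise ValueError("patch would run past the end of the file")
        if expected_old is not None and old != expected_old:
            raise ValueError("bytes at offset are not what was expected")
        f.seek(offset)
        f.write(new_bytes)
```

Only the touched bytes are read and written, so the size of the image does not matter. The limits are real, though: a value of a different length means rewriting everything after it, and in a PNG each chunk carries a CRC that has to be recomputed after its data changes.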