What does the JSON file generated by AzCopy mean? - azcopy

When I use AzCopy v7.3 to copy Table Storage, I receive two files: a JSON file and a manifest. The name of the JSON file is generated in the format myfilename_XXXXXXX. When I rename the JSON file, AzCopy throws an exception. I really want to know how the XXXXXXX suffix is generated and how the JSON file maps to the manifest file.
Thanks for your help

The suffix is the CRC64 calculated from the entity content in that JSON file, and the manifest file stores the total CRC64 aggregated across all the JSON files. This ensures that the file list is complete and that each individual JSON file isn't corrupted.
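As a rough illustration of the idea only (not AzCopy's actual implementation; the CRC64 variant and the file name below are assumptions), a checksum over an exported JSON file's contents can be recomputed with the digest-crc Ruby gem:

require 'digest/crc64'  # from the digest-crc gem (assumed available)

# Hypothetical file name; recompute a checksum over the exported entity content
content = File.binread('exported_entities.json')
puts Digest::CRC64.hexdigest(content)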

Related

Find the file with the error record while processing many files in the same bucket in the Apache Beam Java SDK

I have 20 files (CSV files) in the same bucket. I am able to read all the files in one go and load them into BigQuery. But when there are data type mismatches, I am able to route those rows to invalidDataTag, whereas I am unable to find the name of the file that contains the error record.
inputFilePattern is gs://bucket-name/*, which picks up all the files present in the bucket. I am reading the files as below:
PCollection<String> sourceData = pipeline.apply(Constants.READ_CSV_STAGE_NAME, TextIO.read().from(options.getInputFilePattern()));
Is there a way I can find the name of the file that has the error row in it?
My suggestion would be to add a column to the BigQuery table that indicates which file the record came from.

Rails: convert a binary string to .docx

I am working in Rails and I downloaded a Word document from OneDrive through the Graph API; it returns a binary string which is a collection of files. I need to convert this string into a .docx file, but whether I save it in a simple way or write it as a binary file after decoding it with Base64, it doesn't save in the right format and the file contents look garbled.
Any help in this regard will be appreciated.
Thanks
Can you not just save the binary string to a file?
data = binary_string  # the binary string returned by the Graph API download (placeholder)
File.open('document.docx', 'wb') do |f|
  f.write(data)
end
A docx file is actually a zipped collection of files, with the file extension .docx substituted for .zip. There should be no conversion necessary, and there should be no encoding necessary in order to download it across the 'net.
You should be able to change the file extension to .zip and then unzip it, with the result being a collection of xml files (text) and directories. If you can't do this, then you haven't correctly decoded it, so you should figure out what encoding you have requested and reverse that, or better, don't request encoding at all.
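As a quick sanity check (a minimal sketch; data is a placeholder for the raw bytes returned by the download), a correctly downloaded .docx should begin with the ZIP signature "PK":

# data = raw response body from the Graph API download (placeholder variable)
if data.byteslice(0, 2) == 'PK'
  # Looks like a real ZIP/docx payload; write it out unchanged
  File.open('document.docx', 'wb') { |f| f.write(data) }
else
  # Probably still encoded (e.g. Base64 or JSON-wrapped); decode it before saving
  warn "Unexpected leading bytes: #{data.byteslice(0, 8).inspect}"
end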

how to understand the concept of Object Container Files in avro?

I'm quite confused about the concept of Object Container Files in Avro.
https://avro.apache.org/docs/current/spec.html#Object+Container+Files
Does "Object Container Files" mean the files produced by Avro when serializing the data? Avro persists the serialized data into one or more files; are these files called Object Container Files?
If you store Avro files on disk, those files follow the object container file specification referenced there.
The files contain binary data produced after the records are serialized.
One file contains a schema plus many serialized records matching that schema.
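For a concrete sense of what such a file holds, here is a minimal sketch using the official avro Ruby gem (the schema and file names are made up for illustration):

require 'avro'

# Schema describing each record stored in the container file
schema = Avro::Schema.parse('{"type":"record","name":"User","fields":[{"name":"name","type":"string"}]}')

# Writing: one file holds the schema plus many serialized records
File.open('users.avro', 'wb') do |file|
  writer = Avro::DataFile::Writer.new(file, Avro::IO::DatumWriter.new(schema), schema)
  writer << { 'name' => 'Alice' }
  writer << { 'name' => 'Bob' }
  writer.close
end

# Reading: the reader picks the schema up from the file header
File.open('users.avro', 'rb') do |file|
  Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new).each { |record| puts record }
end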

Insert a local file path in a JSON file

I am trying to insert the paths of two local folders into a JSON file in my project. My JSON file has two keys, "sprite" and "glyphs", and I've included two folders named sprites and glyphs whose paths are to be the values of those two keys. My JSON file looks like:
"sprite": "asset://sprites/sprite",
"glyphs": "asset://glyphs/{fontstack}/{range}.pbf"
But the code doesn't seem to locate them. I've also tried the following, but it didn't work either:
"sprite": "file:///sprites/sprite",
"glyphs": "file:///glyphs/{fontstack}/{range}.pbf"

How to get the file size of a GridFS file in a mongodb collection?

So every time a file is uploaded using GridFS, metadata is attached to it (http://docs.mongodb.org/manual/reference/gridfs/#gridfs-files-collection), including the file size.
I'm uploading files using the carrierwave-mongoid gem, and I have an index page where I list the names of the files, from which they can be downloaded.
I need to get the file sizes of all of those files. My question is, how can I get the file size of every file? How can I grab that information, which is already in the GridFS file metadata, through Ruby?
Yes, the file size is stored as length in the files collection, according to the MongoDB docs.
You will have to grab some identifiable piece of information from the file to look it up in Mongo. The MongoDB docs show using filename: "<your_file_name>". To access it by the file's _id:
# fs_bucket is the GridFS bucket, e.g. client.database.fs
myid = BSON::ObjectId.from_string("57898fa2b5e1b565d4b9b5c8")
file_doc = fs_bucket.find(_id: myid).first
puts file_doc['length']  # the file size in bytes
Then you can read the length field (and the rest of the stored metadata) directly from that document.
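If you need the sizes for every file on the index page, a minimal sketch (assuming fs_bucket is the Ruby mongo driver's GridFS bucket, e.g. client.database.fs) would be to iterate over the files collection documents:

# Each files collection document carries the stored metadata, including length
fs_bucket.find.each do |doc|
  puts "#{doc['filename']}: #{doc['length']} bytes"
end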
