Add new attribute in HDFView with Array Size: Scalar instead of one

I am trying to reverse engineer some data saved in HDF5 files. I want to understand the structure of the HDF5 file well enough to create similar files that can be fed to a pipeline already built for this kind of file.
The pipeline kept on throwing the error:
TypeError: buffer size mismatch
I was copying and adding the attributes using HDFView. Going through them, it looks like adding an attribute through HDFView (selecting VLength and pasting the JSON string) leaves the attribute wrapped in an array.
I think one of the errors may come from this. How can I add the JSON attribute without the array() wrap?
Any hints on this would be greatly appreciated

Related

How to create "list of type" for HtmlProvider<list of HTML files> in F#?

I am very new to F# and am trying to convert my Python script to F# code as a learning exercise.
I want to parse multiple (around 25) static HTML files to extract similar information from each file.
I want to have a list of file handles for all the HTML files.
I am able to do it for single file as:
type SummaryHtmlType = HtmlProvider<@"C:/MyLocation/Summary_1.html">
I tried something similar to XmlProvider (even not sure if that's correct for XmlProvider), but no success.
type MyType = HtmlProvider<htmlFileList; SampleIsList=true>
Please let me know a solution, even if it takes an altogether different approach.
The path "C:/MyLocation/Summary_1.html" in type SummaryHtmlType = HtmlProvider<@"C:/MyLocation/Summary_1.html"> is a sample file that HtmlProvider uses to infer the basic structure.
To parse a file or URL, use the Load method, for example SummaryHtmlType.Load(url).
For more information, see http://fsharp.github.io/FSharp.Data/library/HtmlProvider.html

Multiple file generation while writing to XML through Apache Beam

I'm trying to write an XML file where the source is a text file stored in GCS. The code runs fine, but instead of a single XML file it generates multiple XML files. (The number of XML files seems to follow the total number of records in the source text file.) I have observed this when using 'DataflowRunner'.
When I run the same code locally, two files are generated. The first contains all the records with the proper elements, and the second contains only the opening and closing root element.
Any idea why this unexpected behaviour occurs? Please find below the code snippet I'm using:
PCollection<String> input_records = p.apply(TextIO.read().from("gs://balajee_test/xml_source.txt"));
PCollection<XMLFormatter> input_object = input_records.apply(ParDo.of(new DoFn<String, XMLFormatter>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        // Each input line is a comma-separated record with five fields.
        String[] elements = c.element().toString().split(",");
        c.output(new XMLFormatter(elements[0], elements[1], elements[2], elements[3], elements[4]));
        System.out.println("Values to be written have been provided to constructor ");
    }
})).setCoder(AvroCoder.of(XMLFormatter.class));
input_object.apply(XmlIO.<XMLFormatter>write()
    .withRecordClass(XMLFormatter.class)
    .withRootElement("library")
    .to("gs://balajee_test/book_output"));
Please let me know how to generate a single XML file (book_output.xml) as output.
XmlIO.write().to() is documented as follows:
/**
* Writes to files with the given path prefix.
*
* <p>Output files will have the name {@literal {filenamePrefix}-0000i-of-0000n.xml} where n is
* the number of output bundles.
*/
I.e. it is expected that it may produce multiple files: e.g. if the runner chooses to process your data parallelizing it into 3 tasks ("bundles"), you'll get 3 files. Some of the parts may turn out empty in some cases, but the total data written will always add up to the expected data.
Asking the IO to produce exactly one file is a reasonable request if your data is not particularly big. It is supported in TextIO and AvroIO via .withoutSharding(), but not yet supported in XmlIO. Please feel free to file a JIRA with the feature request.
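Until single-file output is available in XmlIO, one possible workaround is to merge the shards after the pipeline finishes. The following is a minimal sketch in Python, not part of Beam: it assumes the shard files have already been copied to a local directory, that every shard uses the same root element ("library" in the question), and that the data is small enough to hold in memory. The function name merge_xml_shards is hypothetical.

```python
import glob
import xml.etree.ElementTree as ET


def merge_xml_shards(pattern, output_path, root_tag="library"):
    """Concatenate the records of all shard files matching `pattern`.

    Hypothetical helper, not a Beam API. Returns the number of records
    written under the single merged root element.
    """
    merged_root = ET.Element(root_tag)
    # Sort so that shard 00000 comes before 00001, and so on.
    for shard_path in sorted(glob.glob(pattern)):
        shard_root = ET.parse(shard_path).getroot()
        # Copy every record element under the merged root.
        # Shards holding only the empty root element contribute nothing.
        merged_root.extend(list(shard_root))
    ET.ElementTree(merged_root).write(
        output_path, encoding="utf-8", xml_declaration=True
    )
    return len(merged_root)
```

It could be called as merge_xml_shards("book_output-*-of-*.xml", "book_output.xml"). The glob pattern deliberately does not match the merged output file itself.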

How to merge multiple plist files into one?

Just to start, I really have no idea what I'm doing. I was given this task for an internship and am learning as I go. I have multiple plist files; each consists of around 22 items listing color values. I need to merge all of these files into one and have a certain structure I need to follow, but I'm not sure how to go about it. I was told to open the plists in a text editor and paste all of the raw code into one text file. This doesn't seem to work, as I only end up getting the values for the first plist I pasted into the text file. Any help would be nice. Thanks.
Assume your from.plist contains keys 1, 2 and to.plist contains 2, 3
Run this:
/usr/libexec/PlistBuddy -x -c "Merge from.plist" to.plist
to.plist will contain 1, 2, 3
There are a number of ways to handle this. By default a plist is a special form of XML file. If you figure out the syntax you can in fact use a text editor to merge the contents of multiple files together, but you need to make sure you get it right.
A plist file has a specific header for the entire file. You could not just copy/paste multiple plists together because then they would have that header repeated.
The next way to do it is programmatically. If you can figure out the type of outer collection these files contain (probably an array or a dictionary), then you could write a few lines of code that read in each of the plists as arrays, combine them using NSArray code (assuming they contain arrays of colors), and then save the combined array back to a new plist. As vadian says, you can also use the NSPropertyListSerialization class. That's a more general-purpose way of handling plist files, but it's also more complex and harder to figure out.
A third way to do it is in Xcode. If you right-click on a plist file and select "open in Xcode" it should give you Xcode's property list editor. You can then copy and paste the contents of the files together and save the results to a new file.
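The programmatic approach described above can also be sketched outside Cocoa. This is a minimal example using Python's standard plistlib module rather than NSArray; it assumes each plist has a dictionary at its root (for array-rooted files you would concatenate lists instead). The function name merge_plists is hypothetical, and the choice that later files win on duplicate keys is an assumption of this sketch.

```python
import plistlib


def merge_plists(source_paths, output_path):
    """Merge several dictionary-rooted plists into one file.

    Hypothetical helper: when a key appears in more than one file,
    the value from the later file in `source_paths` wins.
    """
    merged = {}
    for path in source_paths:
        with open(path, "rb") as f:
            data = plistlib.load(f)
        if not isinstance(data, dict):
            raise TypeError(f"{path} does not have a dictionary at its root")
        merged.update(data)
    # Writing through plistlib produces a single valid header for the file.
    with open(output_path, "wb") as f:
        plistlib.dump(merged, f)
    return merged
```

Because the output is written by the library, the file gets exactly one plist header, which avoids the repeated-header problem of pasting files together by hand.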
I figured it out!! First create the structure, or use the template given to you. I suggest opening this template/structure in Xcode, as it makes it easier to switch between viewing the list as a plist and as source code. Open your template as source code. Then open each of your plists in a text editor, and copy and paste the code from your plists into the appropriate area of your template's source code. You can then view it in Xcode as a property list to make sure it's correct. The only thing you have to be careful about is making sure you get no errors. Otherwise this works great!!

Add new values to XML dynamically

I have an XML file in my app resources folder. I am trying to update that file with new dictionaries dynamically. In other words I am trying to edit an existing XML file to add new keys and values to it.
First of all, can we edit a static XML file and add a new dictionary with keys and values to it? What is the best way to do this?
In general, you can read an XML file into a document object (choose your language), use methods to modify it (add your new dictionary), and (re-)write it back out to either the original XML file, or a new one.
That's straightforward ... just roll up the ol' sleeves and code it up.
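As one illustration of that read, modify, write cycle, here is a minimal sketch using Python's standard xml.etree.ElementTree. The element names dict, key and string are assumptions based on the question's wording and may not match your file; the function name add_dictionary is hypothetical.

```python
import xml.etree.ElementTree as ET


def add_dictionary(xml_path, entries, output_path=None):
    """Append a <dict> element built from `entries` to the document root.

    Hypothetical helper: the <dict>/<key>/<string> layout is assumed,
    so adapt the tag names to your actual file.
    """
    tree = ET.parse(xml_path)      # read the file into a document object
    root = tree.getroot()
    new_dict = ET.SubElement(root, "dict")
    for key, value in entries.items():
        ET.SubElement(new_dict, "key").text = key
        ET.SubElement(new_dict, "string").text = str(value)
    # Write to a new file if one is given, else overwrite the original.
    tree.write(output_path or xml_path, encoding="utf-8", xml_declaration=True)
    return root
```

Note that a file bundled in an app's resources is often read-only at run time, so writing to a separate output path may be the safer choice.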
The real problem comes in with formatting in the XML file before and after said additions.
If you are going to 'unix diff' the XML file before and after, then order is important. Some standard XML processors do better with order than others.
If the order changes behind the scenes, and is gratuitously propagated into your output file, you lose standard diffing advantages, such as some gui differs, and some scm diffs (svn, cvs, etc.).
For example, see the question "Order of XML attributes after DOM processing", where they discuss that DOM loses order whereas SAX does not.
You can also write a custom XML 'diff'er (there may be such off-the-shelf ... for example check out 'http://diffxml.sourceforge.net/') that compares 2 XML documents tag-by-tag, attribute-by-attribute, etc.
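A custom comparison of that kind can be quite small. The sketch below, using Python's standard xml.etree.ElementTree, is an assumption about what such a differ might look like and is unrelated to the diffxml project: it reports only whether two trees are equal, ignores attribute order and surrounding whitespace, and still treats the order of child elements as significant. The function name elements_equal is hypothetical.

```python
import xml.etree.ElementTree as ET


def elements_equal(a, b):
    """Compare two elements recursively, tag by tag and attribute by attribute."""
    if a.tag != b.tag:
        return False
    # Attributes are held in dicts, so this comparison ignores their order.
    if a.attrib != b.attrib:
        return False
    # Ignore indentation and other leading/trailing whitespace in text.
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    # Child order is still significant in this sketch.
    if len(a) != len(b):
        return False
    return all(elements_equal(x, y) for x, y in zip(a, b))
```

Two files can then be compared with elements_equal(ET.parse(path_a).getroot(), ET.parse(path_b).getroot()).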
Perhaps some standard XML-related tool such as XSLT will allow you to keep the formatting constant without changing tag or attribute order. You'd have to research that.
BTW, a related problem is the config (.ini) file problem ... many common processors flippantly announce that the write-order may not agree with the read-order.

TClientDataset: 'Fieldtype not supported for XML.'

I've got a bunch of data loaded into a TClientDataset, representing an array of complex objects. But when I try to run
Dataset.SaveToFile('c:\test.xml', dfXMLUTF8);
it doesn't like it:
Project testing.exe raised exception class EDBClient with message 'Fieldtype not supported for XML.'.
This is a lot less useful than it should be, for two reasons. First off, it doesn't say which field or which field type isn't supported, and second, the actual saving is taking place inside a black-box DLL.
The only field types I'm using in this dataset are integers, strings, booleans, and a few TArrayFields that hold arrays of integer fields. Nothing I'd expect to be all that difficult to serialize. Anyone have any idea why this isn't working?
Is everything saved or just some fields? Maybe for example TArrayFields are throwing an exception? Try removing different fieldtypes one-by-one and see when things start working.
My wild guess is that array fields are not supported in XML export, but you should check.
Go to Project options->Compiler and turn on "Use debug DCUs". Rebuild.
Set breakpoint on your SaveToFile() call. Run.
Then you can step into VCL source and try to hunt for what is unsupported.

Resources