I need to analyze a dozen similarly formatted data files. I wish to generate a similar HTML report for each file, containing some statistics and graphs that describe the data: one HTML report per file, the same graphs in each, just different numbers. For a single file this is easy to do, for instance using the FsLab journal. Despite my best efforts, I haven't found any way to do this efficiently for many similar files (same format, different numbers).
If I have 10 files, I'd need to copy-paste the journal 10 times and change the line that defines which file to load in each copy. Then whenever I wish to add a new graph I'd need to edit all 10 files. This clearly cannot be the best way to do this.
I am willing to use other methods than the journal and other libraries than FsLab if they suit the problem better, but I'd expect there to be an easy solution for a basic task like this.
This is something that is not very nicely supported by the FsLab Journals system, but you can definitely find some way to do it. One simple option I can think of would be to modify the build.fsx script for the journals so that it processes the script repeatedly and uses, for example, an environment variable to specify the input file.
If you are using the standard template, look at the generateJournals function:
let generateJournals ctx =
  let builtFiles = Journal.processJournals ctx
  traceImportant "All journals updated."
  Journal.getIndexJournal ctx builtFiles
I think you should be able to modify it along the following lines:
let generateJournals ctx =
  // Iterate over all inputs you want to process
  // ('inputFiles' is assumed to be a list of your data file paths)
  let mutable builtFiles = []
  for input in inputFiles do
    // Set an environment variable to pass 'input' to the journal
    // (the name "JOURNAL_INPUT" is arbitrary; read the same name in the journal)
    System.Environment.SetEnvironmentVariable("JOURNAL_INPUT", input)
    builtFiles <- Journal.processJournals ctx
    // Move the resulting files here, so that they do not
    // get overwritten by the next run
  traceImportant "All journals updated."
  // Just return the journal you want to open first below
  Journal.getIndexJournal ctx builtFiles
Then in the journal, you should be able to use System.Environment to read the variable set in the build script.
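In the journal itself, reading it could be as simple as this (a minimal sketch; "JOURNAL_INPUT" is just the arbitrary name used in the build script above, and loading the file stays whatever the journal already does):
let inputFile = System.Environment.GetEnvironmentVariable("JOURNAL_INPUT")
// ... load 'inputFile' exactly where the journal currently loads its hard-coded path ...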
I'm using SPSS 25 syntax to open and process a set of datafiles. I would like these syntax files to be as portable as possible. For that reason, I want the user to be able to select the file locations at runtime without having to recode the syntax itself.
I'm running Windows 10, although hopefully that doesn't matter. I do have the Python plugin for SPSS, although ideally this would be a base SPSS syntax solution.
In SPSS right now, I'm doing this:
GET
FILE='C:\Users\xkcd\studies\project\rawdata'+
'\reallyraw\veryraw.sav'
PASSWORD='CorrectHorseBatteryStaple'.
DATASET NAME Demo WINDOW=FRONT.
In R, I would do this:
library(data.table)  # for fread()
message("Where is the veryraw.sav file?")
demo <- fread(file.choose())
Ideally, the user would, at runtime, select the individual files one at a time.
Less ideally, the user would select a folder containing all of the files, which have known names.
I could use FILE HANDLE so that the user would only have to hardcode a few folder locations, but that's less than ideal; I'd really rather the user not edit the syntax at all.
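For reference, the FILE HANDLE approach would look roughly like this (the handle name and folder path are just placeholders):
FILE HANDLE rawdata /NAME='C:\Users\xkcd\studies\project\rawdata\reallyraw'.
GET FILE='rawdata/veryraw.sav' PASSWORD='CorrectHorseBatteryStaple'.
DATASET NAME Demo WINDOW=FRONT.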
Thanks in advance!
Following up on the idea of a fully automated process: the following code will work assuming there is a specific file name you need to run your code on, and only one copy of it exists in the folder you are searching. It is possible to run this on drive C: directly, but it will take much less time if you can narrow down the path:
* this will create a text file that has the path of the required file.
HOST COMMAND=['dir /s /b "C:\Users\somename\*required file name.sav" > C:\Users\somename\tempname.sps'].
* now to read the name and put it in a handle.
DATA LIST file = "C:\Users\somename\tempname.sps" fixed / pth 1-500 (a).
exe.
string cmd(a500).
compute cmd=concat("file handle myfile / name='", rtrim(pth), "'.").
write out="C:\Users\somename\tempname.sps" /cmd.
exe.
* inserting the new syntax will activate the handle.
insert file = "C:\Users\somename\tempname.sps".
Now you can use the handle myfile in the syntax, e.g.:
get file=myfile.
I have been given the task of creating a DXL script. First problem is that I have never used DXL before, even though I have many years experience with DOORS itself. I have been surfing the Net to seek guidance on my particular problem. I also have a few specimen DXL scripts for reference.
My new client requires that for each View of a given Module, of which there are many Views, new "reduced" Modules are to be produced reflecting each View.
By "reduced", I mean that these new Modules are to contain nothing that isn't actually needed for that View., i.e. Columns, Attributes etc. These new Modules will only have the single View.
So, the way forward as I see it, is to take copies of the single master Module, one for each View, rename those copies to reflect a given Master Module/Required View, select that required View in the given copy Module and then delete everything that is not needed by that View, i.e. available Columns, Attributes etc.
This would be simple if I had the required DXL knowledge, which I am endeavouring to pick up as fast as I can.
If at all possible, this script has to be generic and be able to work upon any of the master Module copies to produce the associated "reduced" Module reflecting a particular View.
The client aims to use the script periodically for View archiving (I know, that's the way they want it).
Clarification
Some clarification of what I believe is required, given the following text from my original question:
If at all possible, this script has to be generic and be able to work upon any of the master Module copies to produce the associated "reduced" Module reflecting a particular View.
So, say there are ten views of the master Module, outside of the DXL script, I would copy the master Module ten times, renaming each copy to reflect each of the ten views. Unless you know different, each of those ten copies will reflect the same “Absolute Number”s as are in the master Module, so no problem there?
So, starting with the first of the copied Modules, each named to reflect the View it will eventually represent, its View would be set to whichever of the ten available Views matches its title.
The single generic DXL script would then be run against that first copy Module, the aim being to delete everything not actually needed for that view, i.e. Attributes, Columns etc. Would some kind of purging command be required in the script for any aforementioned deleted items?
The single generic DXL script would then delete ALL views from that copy Module. The log that is produced when running the script also needs capturing, but I’m not sure whether this should be done from within the script, if possible or as a separate manual task outside of the script.
The aforementioned (indented) process would then be repeated, using the same generic script, against the remaining nine copied Modules. The intention is to leave us with ten copy Modules, each one reflecting one of the ten possible Views, with each one containing only the Attributes, Columns etc. required for that View.
Creating a mirror of a module with this approach is not so easy IMO. Think e.g. about "Absolute Number". If the original module contains the numbers 15 (level 1), 2000 (level 2), 1 (level 1), you will have to create 2000 objects, purge 1997 of them and move them to the correct place.
There is a "duplicate" tool at https://www.ibm.com/developerworks/community/forums/html/topic?id=43862118-113d-4eac-b3f1-21d3b73959d1 which tries to do this, but as stated there, this script is said not to work correctly in all situations.
So, I would rather use the approach "string clipCopy (Item i); string clipPaste(Folder folderRef)". It should be faster and less error prone. But: all Out-Links will also be copied with this method, so you will probably have to delete these after the copy, or else the link target module(s) will have lots of In-Links.
The problem is still not so easy to solve, as every view might have DXL columns that rely on some attribute or other, and it might contain DXL attributes which in turn might rely on something else. I doubt that there is a way to analyze DXL code "on the fly" and find out which columns may be deleted.
Perhaps a totally different approach would be feasible: open each view and create an export to Excel; this way you will get rid of any dynamic dependencies. Then re-import the Excel sheet into a new DOORS module. You will still have the "Absolute Number" problem, but perhaps you can make a deal that you will have a pseudo attribute "Original Absolute Number" and disregard the "new" "Absolute Number".
Quite a big task for a DXL beginner....
Update: On second thought, perhaps you might want to combine these approaches
agree with your employer that you will use an alternative attribute for Absolute Number
use a loop like Russel suggested; when creating objects, remember that objects might have to be created "below" or "after" their predecessor or sibling
for DXL attributes do not copy the DXL code but the actual current value of the object
for DXL columns create pseudo attributes and create a new view that uses these pseudo attributes instead of the original value
Copying the entire module, then deleting everything not in that view, seems worse than just copying the things you need from each particular view.
I would take the following as the outline of your program:
for view in main module do {
    for column in view do {
        Find attribute for each column and store (possibly in a skip list?)
        Store name of column
    }
    create new module
    create needed types / attributes in new module
    create new view in new module
    for object in main module {
        create object in new module
        for attribute in main module {
            check if attribute is in new module {
                copy info from old object to new
            }
        }
    }
}
Each of these "for X in Y" loops should be in the DXL reference manual in some form or another.
If you need more help, let me know!
I need to process a (GCS) bucket of files, where each file is compressed and contains a single multi-line JSON record. Also, the name of the file being processed is significant and I need to know it within my transform.
Starting with the examples in the docs, TextIO looks pretty close, but it looks like it's designed to process each file line by line and does not allow me to read an entire file at once. Also, I don't see any way to get the filename that's being processed.
PCollectionTuple results = p.apply(TextIO.Read
    .from("gs://bucket/a/*.gz")
    .withCompressionType(TextIO.CompressionType.GZIP)
    .withCoder(MyJsonCoder.of()));
Looks like I need to write a custom IO reader, or some such? Any tips for best place to start?
You are correct that right now none of the existing classes do exactly what you want. There are 2 reasonable approaches:
Match the filepattern yourself (using IOChannelUtils and IOChannelFactory) and wrap the resulting files into a PCollection<String> where the String will be a filename, using Create.of(filenames). Then apply a ParDo with a function which reads the given filename.
Write your own subclass of Source (there's also FileBasedSource, but it's not quite right for your use case). It would be configured by the filepattern, and splitIntoBundles would match the filepattern and expand into individual sources each corresponding to one file.
I would recommend the first approach because it seems like less code and your use case does not require the full power of Source; a rough sketch of it follows.
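Something along these lines, assuming the 1.x Dataflow SDK (the filepattern and the KV<filename, contents> output shape are purely illustrative, CharStreams is Guava's, and error handling is omitted):

String pattern = "gs://bucket/a/*.gz";

// Expand the filepattern into concrete filenames ourselves.
IOChannelFactory factory = IOChannelUtils.getFactory(pattern);
List<String> filenames = new ArrayList<>(factory.match(pattern));

PCollection<KV<String, String>> files = p
    .apply(Create.of(filenames))
    .apply(ParDo.of(new DoFn<String, KV<String, String>>() {
      @Override
      public void processElement(ProcessContext c) throws Exception {
        String filename = c.element();
        // Open and gunzip the whole file; each file holds a single JSON record.
        try (Reader reader = new InputStreamReader(
                 new GZIPInputStream(Channels.newInputStream(
                     IOChannelUtils.getFactory(filename).open(filename))),
                 StandardCharsets.UTF_8)) {
          // Keep the filename alongside the record so it is available downstream.
          c.output(KV.of(filename, CharStreams.toString(reader)));
        }
      }
    }));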
I have an XML file in my app resources folder. I am trying to update that file with new dictionaries dynamically. In other words I am trying to edit an existing XML file to add new keys and values to it.
First of all, can we edit a static XML file and add a new dictionary with keys and values to it? What is the best way to do this?
In general, you can read an XML file into a document object (choose your language), use methods to modify it (add your new dictionary), and (re-)write it back out to either the original XML file, or a new one.
That's straightforward ... just roll up the ol' sleeves and code it up.
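For instance, with a DOM-style API in Java it might look like this (a minimal sketch; the file name and the <entry> element are made up, so adapt them to your actual schema and language):

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AddDictEntry {
    public static void main(String[] args) throws Exception {
        File xmlFile = new File("settings.xml");   // hypothetical file
        // Read the existing XML into an in-memory document.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(xmlFile);

        // Add a new key/value entry under the root element.
        Element entry = doc.createElement("entry");
        entry.setAttribute("key", "newKey");
        entry.setTextContent("newValue");
        doc.getDocumentElement().appendChild(entry);

        // Write the modified document back out over the original file.
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(xmlFile));
    }
}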
The real problem comes in with the formatting of the XML file before and after said additions.
If you are going to 'unix diff' the XML file before and after, then order is important. Some standard XML processors do better with order than others.
If the order changes behind the scenes, and is gratuitously propagated into your output file, you lose standard diffing advantages, such as some GUI diff tools and some SCM diffs (svn, cvs, etc.).
For example, browse to:
Order of XML attributes after DOM processing
They discuss that DOM loses order where SAX does not.
You can also write a custom XML 'diff'er (there may be such a tool off the shelf ... for example, check out 'http://diffxml.sourceforge.net/') that compares 2 XML documents tag-by-tag, attribute-by-attribute, etc.
Perhaps some standard XML-related tool such as XSLT will allow you to keep the formatting constant without changing tag or attribute order. You'd have to research that.
BTW, a related problem is the config (.ini) file problem ... many common processors flippantly announce that the write-order may not agree with the read-order.
I am using tshark to filter some packets based on Display/Read filters from one file into another.
I want to have one final output file out.pcap after executing multiple read filters over number of files and combine all into out.pcap.
I was trying to use mergecap, but it does not allow appending (combining) two files and storing the result in one of them without overwriting it.
Is there any way to do this, as I don't want to keep creating temporary files and merging them all together at the end?
This is not possible that I know of with existing tools, although given the way the capture file format is laid out, it should be possible to write a new tool (or extend mergecap) to do this.
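In the meantime, the usual workaround is to merge into a temporary file and then replace the original, e.g. (paths are illustrative):
mergecap -w /tmp/combined.pcap out.pcap filtered.pcap && mv /tmp/combined.pcap out.pcap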