What is the conceptual difference between bin and gen? - bazel

For ordinary rules, output gets written to the bin directory, while for e.g. genrules output is written to the genfiles directory. While this is not surprising given the name of the latter, I wonder why there is a distinction and what the conceptual difference is.

There isn't a particularly good reason (and you can actually write genrules' output to bin with the output_to_bindir attribute and put Skylark outputs anywhere you want).
It's just historical. There are actually a couple of other output directories like those (e.g., testlogs, include), they're just the most common.
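For reference, here is a minimal BUILD-file sketch of the output_to_bindir attribute mentioned above (the target name, file names, and cmd are made up for illustration):

genrule(
    name = "make_header",
    srcs = ["input.txt"],
    outs = ["generated.h"],
    cmd = "cp $< $@",
    output_to_bindir = True,  # write the output to bazel-bin instead of bazel-genfiles
)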

How to parse only user defined source files with clang tools

I am writing a clang tool, yet I am quite new to it, so I came across a problem that I couldn't find covered in the docs (yet).
I am using the great Matchers API to find some nodes that I will later want to manipulate in the AST. The problem is that the clang tool will actually parse everything that belongs to the source file, including headers like iostream etc.
Since my manipulation will probably include some refactoring, I definitely do not want to touch each and every thing the parser finds.
Right now I am dealing with this by comparing the source files of the nodes I matched against the arguments in argv, but needless to say, this feels wrong, since the tool still parses through ALL the iostream code - it just ignores it whilst doing so. I just can't believe there is not a way to tell the ClangTool something like:
"only match nodes whose location's source file is something the user fed to this tool"
Thinking about it, it would only make sense if it's possible to create an AST for each source file individually, but I need them to be aware of each other or share contextual knowledge, and I haven't figured out a way to do that either.
I feel like I am missing something very obvious here.
thanks in advance :)
There are several narrowing matchers that might help: isExpansionInMainFile and isExpansionInSystemHeader. For example, one could combine the latter with unless to limit matches to AST nodes that are not in system files.
There are several examples of using these in the Code Analysis and Refactoring with Clang Tools repository. For example, see the file lib/callsite_expander.h around line 34, where unless(isExpansionInSystemHeader()) is used to exclude call expressions that are in system headers. Another example is at line 27 of lib/function_signature_expander.h, where the same is used to exclude function declarations in system headers that would otherwise match.
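As a rough sketch of what those matchers look like in use (the binding names and the callback mentioned in the comment are placeholders, not taken from the repository above):

#include "clang/ASTMatchers/ASTMatchers.h"
#include "clang/ASTMatchers/ASTMatchFinder.h"

using namespace clang::ast_matchers;

// Only match function declarations spelled in the main file handed to the tool.
DeclarationMatcher UserFunctions =
    functionDecl(isExpansionInMainFile()).bind("userFunc");

// Match call expressions anywhere except system headers.
StatementMatcher NonSystemCalls =
    callExpr(unless(isExpansionInSystemHeader())).bind("call");

// Registration would then look roughly like:
//   MatchFinder Finder;
//   Finder.addMatcher(UserFunctions, &MyCallback);  // MyCallback: your MatchCallback subclass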

Converting between M3 `loc` scheme and regular `loc` type?

The M3 Core module returns a sort of simplified loc representation in Rascal. For instance, a method in file MapParser might have the loc: |java+method:///MapParser/a()|.
However, this is evidently different from the other loc scheme I tend to see, which would look more or less like: |project://main-scheme/src/tests/MapParser.java|.
This wouldn't be a problem, except that some functions only accept one scheme or another. For instance, the function appendToFile(loc file, value V...) does not accept this scheme M3 uses, and will reject it with an error like: IO("Unsupported scheme java+method").
So, how can I convert between both schemes easily? I would like to preserve all information, like highlighted sections for instance.
Cheers.
There are two differences at play here.
Physical vs Logical Locations
java+method is a logical location, and project is a physical location. I think the best way to describe the difference is that a physical location describes the location of an actual file, or a subset of an actual file, while a logical location describes the location of a certain entity in the context of a bigger model - for example, a Java method in a Java class/project. Often logical locations can be mapped to a physical location, but that is not always true.
For M3, for example, you can use resolveLocation from IO to get the actual offset in the file that the logical location points to.
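A rough sketch of that, assuming the M3 model for MapParser has already been extracted so the logical scheme can be resolved (the exact result depends on your project setup):

import IO;
// Map the logical M3 location onto a physical location with file and offsets.
loc logical  = |java+method:///MapParser/a()|;
loc physical = resolveLocation(logical);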
Read-only vs writeable locations
Not all locations are writeable; I don't think any logical location is. There are also physical locations that are read-only. The error you are getting is generic in that sense.
Rascal does support writing in the middle of text files, but most likely you do not want to use appendToFile, as it will append after the location you point it to. More likely you want to replace a section of the text with your new section, so a regular writeFile should work.
Some notes
Note that you would have to recalculate all the offsets in the file after every write, so the resolved physical locations for the logical locations would be outdated: the file has changed since the M3 model and its corresponding map between logical and physical locations were constructed.
So for this use case, you might want to think of a better way. The nicest solution is to use a grammar, rewrite the parse trees of the file, and after rewriting overwrite the old file. Note that the most recent Java grammar shipped with Rascal is for Java 5, so this might be a bit more work than you would like. Perhaps frame your goal as a new Stack Overflow question, and we'll see what other options might be applicable.

How can multiple PDBs be written to a single PDB file using Biopython libraries?

I wonder how multiple PDBs can be written to a single PDB file using Biopython libraries. For reading multiple PDBs, such as NMR structures, there is material in the documentation, but I cannot find anything about writing. Does anybody have an idea?
Yes, you can. It's documented here.
Imagine you have a list of structure objects; let's name it structures. You might want to try:
from Bio import PDB

pdb_io = PDB.PDBIO()
target_file = 'all_struc.pdb'
with open(target_file, 'w') as open_file:
    for struct in structures:
        pdb_io.set_structure(struct[0])  # struct[0] is the first model of each structure
        pdb_io.save(open_file)
That is the simplest solution for this problem. Some important things:
Different protein crystal structures have different coordinate systems, so you probably need to superimpose them, or apply some transformation function before comparing.
With pdb_io.set_structure you can select an entity, a chain, or even a bunch of atoms.
pdb_io.save has a second argument, which is a Select class instance. It will help you remove waters, heteroatoms, unwanted chains... (see the sketch after this list).
Be aware that NMR structures contain multiple models; you might want to select one. Hope this can help you.
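As a rough illustration of that Select hook (the class name, the water-stripping criterion, and the structure variable are just an example):

from Bio import PDB

class NoWater(PDB.Select):
    # Keep every residue except water molecules (residue name 'HOH').
    def accept_residue(self, residue):
        return residue.get_resname() != 'HOH'

pdb_io = PDB.PDBIO()
pdb_io.set_structure(structure[0])             # e.g. the first model of one structure
pdb_io.save('no_water.pdb', select=NoWater())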
Mithrado's solution may not actually achieve what you want. With his code, you will indeed write all the structures into a single file; however, it does so in such a way that the result might not be readable by other software, because it adds an "END" line after each structure. Many pieces of software will stop reading the file at that point, as that is how the PDB file format is specified.
A better solution, but still not perfect, is to remove a chain from one Structure and add it to a second Structure as a different chain. You can do this by:
# Get a list of the chains in a structure
chains = list(structure2.get_chains())
# Rename the chain (in my case, I rename from 'A' to 'B')
chains[0].id = 'B'
# Detach this chain from structure2
chains[0].detach_parent()
# Add it onto structure1
structure1[0].add(chains[0])
Note that you have to be careful that the name of the chain you're adding doesn't yet exist in structure1.
In my opinion, the Biopython library is poorly structured or non-intuitive in many respects, and this is just one example. Use something else if you can.
Inspired by Nate's solution, but adding multiple models to one structure, rather than multiple chains to one model:
from Bio import PDB

ms = PDB.Structure.Structure("master")
i = 0
for structure in structures:
    for model in list(structure):
        new_model = model.copy()
        new_model.id = i
        new_model.serial_num = i + 1
        i = i + 1
        ms.add(new_model)

pdb_io = PDB.PDBIO()
pdb_io.set_structure(ms)
pdb_io.save("all.pdb")

Hyper directory - Can a (Linux) directory, in addition to a list of subdirectories and files, also contain text itself?

Under Linux, are there ways to add comments, description (text, rich text, hypertext .. ) to a directory itself, rather than by means of auxiliary files in such a directory, like README.txt, INSTALL.txt, NOTE_ON_WHY_WE_DID_THIS_THIS_WAY.txt, .. ?
In such a generalized directory, a directory entry (subdirectory/file) would be represented as a (hyper)link, at least in one view of such a generalized directory. A "classical directory view" may also be available for generalized directories, in which the comments and descriptions mentioned above would be omitted, or be available through an auxiliary file. I am aware this may require either special formatting of the storage medium, or a software layer on top of a classical disk formatting structure. The views would have to be derived from the generalized directory and not vice versa (in order to avoid consistency problems between the views).
Not in general, but some file-systems have extended file attributes. You could use getfattr(1), setfattr(1). See attr(5), listxattr(2), setxattr(2) etc...
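A quick illustration (the attribute name and text are made up; this needs a filesystem mounted with xattr support):

# attach a short comment to a directory as a user extended attribute
setfattr -n user.comment -v "notes on why we did this this way" mydir/
# read it back
getfattr -n user.comment mydir/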
AFAIK, few utilities are using these extended file attributes (and that surprises me; I would imagine that desktop environments would use them to store e.g. the MIME type of files, but they usually don't). There is also a significant (file-system specific) size limit on these extended attributes, e.g. 255 bytes.
A more practical and traditional way would be to decide to store your additional meta-data in some hidden directory (with a name starting with a dot, like .git/ used by git)
I can't speak for all filesystems, but at least in extX a directory contains only the names of the files/dirs inside it, their inode numbers, and the offset to where the next (name, inode) pair starts. Data describing a directory is generally kept in its inode structure (not inside the directory itself): the owner, atime, ctime, extended attributes, number of links, and so on. You can look at this structure in the kernel source; there is no field that allows you to put "labels" on a file/dir. In theory you could use some "unused" fields of this structure, but only in theory, since the space is very limited.
Interesting question, but I believe not. From what I remember, directories are just pointers to files and other directories, so I don't think it would be possible to store text in them. Maybe if you re-engineer the whole filesystem...

How do I "diff" multiple files against a single base file?

I have a configuration file that I consider to be my "base" configuration. I'd like to compare up to 10 other configuration files against that single base file. I'm looking for a report where each file is compared against the base file.
I've been looking at diff and sdiff, but they don't completely offer what I am looking for.
I've considered diff'ing the base against each file individually, but my problem then becomes merging those into a report. Ideally, if the same line is missing in all 10 config files (when compared to the base config), I'd like that reported in an easy-to-visualize manner.
Notice that some rows are missing in several of the config files (when compared individually to the base). I'd like to be able to put those on the same line (as above).
Note, the screenshot above is simply a mockup, and not an actual application.
I've looked at using some Delphi controls for this and writing my own (I have Delphi 2007), but if there is a program that already does this, I'd prefer it.
The Delphi controls I've looked at are TDiff, and the TrmDiff* components included in rmcontrols.
For people who are still wondering how to do this, diffuse is the closest answer: it does N-way merges by displaying all files and doing three-way merges among neighbours.
None of the existing diff/merge tools will do what you want. Based on your sample screenshot you're looking for an algorithm that performs alignments over multiple files and gives appropriate weights based on line similarity.
The first issue is weighting the alignment based on line similarity. Most popular alignment algorithms, including the one used by GNU diff, TDiff, and TrmDiff, do an alignment based on line hashes, and just check whether the lines match exactly or not. You can pre-process the lines to remove whitespace or change everything to lower-case, but that's it. Add, remove, or change a letter and the alignment thinks the entire line is different. Any alignment of different lines at that point is purely accidental.
Beyond Compare does take line similarity into account, but it really only works for 2-way comparisons. Compare It! also has some sort of similarity algorithm, but it is also limited to 2-way comparisons. It can slow down the comparison dramatically, and I'm not aware of any other component or program, commercial or open source, that even tries.
The other issue is that you also want a multi-file comparison. That means either running the 2-way diff algorithm a bunch of times and stitching the results together or finding an algorithm that does multiple alignments at once.
Stitching will be difficult: your sample shows that the original file can have missing lines, so you'd need to compare every file to every other file to get a bunch of alignments, and then you'd need to work out the best way to match those alignments up. A naive stitching algorithm is pretty easy to do, but it will get messed up by trivial matches (blank lines for example).
There are research papers that cover aligning multiple sequences at once, but they're usually focused on DNA comparisons, so you'd definitely have to code it up yourself. Wikipedia covers a lot of the basics, then you'd probably need to switch to Google Scholar:
Sequence alignment
Multiple sequence alignment
Gap penalty
Try Scooter Software's Beyond Compare. It supports 3-way merge and is written in Delphi / Kylix for multi-platform support. I've used it pretty extensively (even over a VPN) and it's performed well.
for f in file1 file2 file3 file4 file5; do printf '%s\n\n' "$f" >> outF; diff "$f" baseFile >> outF; printf '\n\n' >> outF; done
Diff3 should help. If you're on Windows, you can use it from Cygwin or from diffutils.
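For example (the filenames here are placeholders), diff3 takes your file, the common base, and one other file:

diff3 config1.cfg base.cfg config2.cfg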
I made my own diff tool, DirDiff, because I didn't want matching parts shown twice on screen, and I wanted differing parts placed above each other for easy comparison. You could use it in directory mode on a directory containing an equal number of copies of the base file.
It doesn't render exports of diffs, but I'll list that as a feature request.
You might want to look at some Merge components as what you describe is exactly what Merge tools do between the common base, version control file and local file. Except that you want more than 2 files (+ base)...
Just my $0.02
SourceGear DiffMerge is nice (and free) for Windows-based file diffing.
I know this is an old thread but vimdiff does (almost) exactly what you're looking for with the added advantage of being able to edit the files right from the diff perspective.
But still, none of the solutions handles more than 3 files.
What I did was messier, but it serves the same purpose (comparing the contents of multiple config files, with no limit except memory and BASH variables).
A while loop to read the source file into an array:
loadsauce () {
    index=0
    while read SRCCNT[$index]
    do let index=index+1
    done < $SRC
}
Again for the target file
loadtarget () {
    index=0
    while read TRGCNT[$index]
    do let index=index+1
    done < $TRG
}
And the string comparison:
brutediff () {
    # Brute-force string compare, probably duplicates diff
    # This is very ugly but it will compare every line in SRC against every line in TRG
    # Grep might do better; version included for completeness
    for selement in $(seq 0 $((${#SRCCNT[@]} - 1)))
    do for telement in $(seq 0 $((${#TRGCNT[@]} - 1)))
       do [[ "${SRCCNT[$selement]}" == "${TRGCNT[$telement]}" ]] && echo "${SRCCNT[$selement]} is in ${SRC} and ${TRG}" >> $OUTMATCH
       done
    done
}
and finally a loop to do it against a list of files
# TRG is assumed to already point at the base file read by loadtarget
for sauces in $(cat $SRCLIST)
do  echo "Checking ${sauces}..."
    SRC="$sauces"   # point loadsauce at the current file from the list
    loadsauce
    loadtarget
    brutediff
    echo -n "Done, "
done
It's still untested/buggy and incomplete (e.g. sorting out duplicates, or compiling a list, for each line, of the files that share it), but it's definitely a move in the direction the OP was asking for.
I do think Perl would be better for this though.
