What is the Diff file syntax - parsing

I am currently playing around with parsing diff files, and have yet to come across solid documentation on the diff file format.
I am especially interested in specifications. E.g. I don't really understand the lines that look like this (at the beginning of each changed code block):
@@ -296,7 +296,8 @@
I know they have to do with line numbers and how many lines have changed, but I haven't been able to figure out the details so far.
What is the syntax of the output diff files (at least, the main parts)?

Check out the documentation for GNU diffutils. There you'll find this section:
Next come one or more hunks of differences; each hunk shows one area where the files differ. Unified format hunks look like this:
@@ from-file-line-numbers to-file-line-numbers @@
line-from-either-file
line-from-either-file...
If a hunk contains just one line, only its start line number appears. Otherwise its line numbers look like ‘start,count’. An empty hunk is considered to start at the line that follows the hunk.
If a hunk and its context contain two or more lines, its line numbers look like ‘start,count’. Otherwise only its end line number appears. An empty hunk is considered to end at the line that precedes the hunk.
The lines common to both files begin with a space character. The lines that actually differ between the two files have one of the following indicator characters in the left print column:
‘+’
A line was added here to the first file.
‘-’
A line was removed here from the first file.
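To make the quoted rules concrete, here is a minimal Python sketch that parses a unified-format hunk header such as @@ -296,7 +296,8 @@ into its four numbers. The names HUNK_HEADER and parse_hunk_header are made up for this example, and the regular expression is my own reading of the format described above, not something taken from diffutils.

```python
import re

# Matches a unified-diff hunk header such as "@@ -296,7 +296,8 @@".
# The ",count" part is optional on both sides.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (from_start, from_count, to_start, to_count) for a hunk header."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a unified hunk header: {line!r}")
    from_start, from_count, to_start, to_count = match.groups()
    # An omitted count means the hunk covers exactly one line on that side.
    return (
        int(from_start),
        int(from_count) if from_count is not None else 1,
        int(to_start),
        int(to_count) if to_count is not None else 1,
    )


print(parse_hunk_header("@@ -296,7 +296,8 @@"))  # (296, 7, 296, 8)
```

So in the example from the question, the hunk covers 7 lines starting at line 296 of the first file and 8 lines starting at line 296 of the second file.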

The Wikipedia page on the diff utility describes the format pretty well.

Related

Can I have vi number lines starting from 0

I frequently use the vi command
:set number
I recently was trying to align some data that had a zero based index with the line numbers in vim, which has a 1 based index. Is there a way to have vi start numbering lines from 0, or more broadly from any other starting point?
@David's comment led me on a bit of a goose chase, eventually finding this link: Add line numbers to source code in vim
For my purposes what I needed was not necessarily a persistent numbering, but just a temporary one: quickly insert or remove line numbers, then revert to the standard numbering. For this case, I hadn't thought of simply inserting and then deleting my own line numbering. The below works well (use NR-1 in place of NR for zero-based numbers), and the possibilities for numbering are endless (see for example https://stackoverflow.com/a/252774/2816571):
:%!awk '{print NR,$0}'
For a native vi solution (from https://stackoverflow.com/a/253041/2816571), the below inserts each line number followed by a space; use (line('.')-1) in place of line('.') for zero-based numbers:
:%s/^/\=line('.').' '/
Then (thanks Lance, https://stackoverflow.com/a/4674574/2816571), when done with the numbering, do something like this. You might need to tweak the search pattern depending on whether you used a space or a comma as the separator:
:%s/^[0-9]* //
If there is something I am missing inside vim that can do the persistent 0 based numbering, please let me know.

Ruby convention for chaining calls over multiple lines

What are the conventions for this?
I use the following style, but I'm not sure it is the preferred one, since if I miss a dot at the end of a line I can run into a lot of issues without realising it.
query = reservations_scope.for_company(current_company).joins{property.development}.
  group{property.development.id}.
  group{property.development.name}.
  group{property.number}.
  group{created_at}.
  group{price}.
  group{reservation_path}.
  group{company_id}.
  group{user_id}.
  group{fee_paid_date}.
  group{contract_exchanged_date}.
  group{deposit_paid_date}.
  group{cancelled_date}.
  select_with_reserving_agent_name_for(current_company, [
    "developments.id as dev_id",
    "developments.name as dev_name",
    "properties.number as prop_number",
    "reservations.created_at",
    "reservations.price",
    "reservations.fee_paid_date",
    "reservations.contract_exchanged_date",
    "reservations.deposit_paid_date",
    "reservations.cancelled_date"
  ]).reorder("developments.name")
query.to_a # ....
So what are the conventions for chaining methods over multiple lines and which one should I prefer?
NOTE: I couldn't find a good example from the Ruby coding style guide.
There is actually a section on that in the Ruby style guide:
Adopt a consistent multi-line method chaining style. There are two
popular styles in the Ruby community, both of which are considered
good - leading . (Option A) and trailing . (Option B).
(Option A) When continuing a chained method invocation on
another line keep the . on the second line.
# bad - need to consult first line to understand second line
one.two.three.
  four
# good - it's immediately clear what's going on the second line
one.two.three
  .four
(Option B) When continuing a chained method invocation on another line,
include the . on the first line to indicate that the
expression continues.
# bad - need to read ahead to the second line to know that the chain continues
one.two.three
  .four
# good - it's immediately clear that the expression continues beyond the first line
one.two.three.
  four
A discussion on the merits of both alternative styles can be found
here.
In Ruby 1.9+ it's possible to write like this:
query = reservations_scope
  .for_company(current_company)
  .joins{property.development}
  .group{property.development.id}
  .group{property.development.name}
  .group{property.number}
  .group{created_at}
  .group{price}
  .group{reservation_path}
  .group{company_id}
  .group{user_id}
Much more readable, I think.
Here is a complete list of pros and cons of four options. Two of the options have not been mentioned in any other answer.
Pros and cons can be broken into unique ones and shared ones. Shared pros are the inverses of a unique con of another option. Similarly, shared cons are the inverses of a unique pro of another option. There are also some points that are pros for two options and cons for the other two.
To avoid repeating explanations, I describe each option’s shared pros and cons with just a summary of that point. Full details about a shared pro or con are available in the description of their inverse con or pro in another option’s unique section. For the points that are pros of two options and cons of the other two, I arbitrarily chose to put the full explanations in the set that starts with “. at line beginning”.
For a shorter list that leaves the shared pros and cons implicit instead of repeating them, see this old version of this answer.
. at line end
items.get.lazy.
  take(10).
  force
Pros
Shared with only one other option:
Continuing lines can be commented out freely, and comments can be added between lines
Pastable into IRB/Pry
Supported in Ruby 1.8
Shared with two other options:
When you read the initial line, it is clear that the expression continues
Plenty of horizontal space for continuing lines
Doesn’t require manual alignment of characters into columns
Looks fine when viewed in a proportional font
Has a minimum of punctuation, reducing typing and visual noise
Cons
Unique:
Continuing lines look strange on their own. You must read the preceding line to understand that they are a continuation.
Indentation is not a reliable indicator that a line continues from the previous line – it could merely mean the start of a block.
Shared with only one other option:
When editing the code, it is harder to comment out or reorder the last line
. at line beginning
items.get.lazy
  .take(10)
  .force
Pros
Shared with only one other option:
When editing the code, it is easier to comment out or change the order of the last line – no need to delete and add . or \.
Shared with two other options:
Continuing lines can be understood when seen on their own
Plenty of horizontal space for continuing lines
Doesn’t require manual alignment of characters into columns
Looks fine when viewed in a proportional font
Has a minimum of punctuation, reducing typing and visual noise
Cons
Unique:
When you read the initial line, it’s not immediately clear that the expression continues
If you use this in your codebase, then when you read a line, you must always check the line after to make sure that it doesn’t affect the initial line’s meaning.
Shared with only one other option:
The code silently breaks if you # comment out a continuing line, or add a comment between lines
You can’t paste this code into IRB/Pry without it being misinterpreted
Not supported in Ruby 1.8 and below
. at line beginning, indented to the previous .
items.get.lazy
         .take(10)
         .force
Pros
Shared with only one other option:
When editing the code, it is easier to comment out or change the order of the last line – no need to delete and add . or \.
Shared with two other options:
When you read the initial line, it is clear that the expression continues
Continuing lines can be understood when seen on their own
Has a minimum of punctuation, reducing typing and visual noise
Cons
Unique:
Each line’s code must fit into less horizontal space
Requires manual alignment of the .s into columns
It is easier if you have an editor plugin for aligning text, but still more work than using default indentation rules.
Even if your editing setup includes a plugin for alignment, your coworkers’ setups may not.
The code will look misaligned when viewed in a proportional font
Shared with only one other option:
The code silently breaks if you # comment out a continuing line, or add a comment between lines
You can’t paste this code into IRB/Pry without it being misinterpreted
Not supported in Ruby 1.8 and below
\ at line end, . at next line’s beginning
items.get.lazy \
  .take(10) \
  .force
Pros
Shared with only one other option:
Continuing lines can be commented out freely, and comments can be added between lines
Pastable into IRB/Pry
Supported in Ruby 1.8
Shared with two other options:
When you read the initial line, it is clear that the expression continues
Continuing lines can be understood when seen on their own
Plenty of horizontal space for continuing lines
Doesn’t require manual alignment of characters into columns
Looks fine when viewed in a proportional font
Cons
Unique:
Requires more typing
Creates more visual noise
Shared with only one other option:
When editing the code, it is harder to comment out or reorder the last line
The reason why I choose the dot in the end of the line is that it will allow you to paste code in an IRB session. Also, you can't comment lines in the middle of the multi-line code if you use the dots in the beginning of the lines. Here's a good discussion to read: https://github.com/bbatsov/ruby-style-guide/pull/176

LaTeX - Listings - Code Indention

I'm working on a homework paper about git and I want to insert some console output examples. I'm working with TextMate.
I have my LaTeX code indented like any other source code, to make it more readable.
My question now is: why do the listings get indented in my output PDF, and how do I prevent that?
Some example code:
\begin{lstlisting}
$ git ls-files
README
TU_Logo_SW.pdf
beleg.pdf
beleg.tex
\end{lstlisting}
In my file there is one tab in front of \begin and two in the lines following.
When I run pdflatex, the code is indented by two tab stops in the PDF. A quick fix is to format all the listings without indentation in my tex file, but that's pretty ugly ;-(
lstlisting has a key that lets you remove spaces:
\begin{lstlisting}[gobble=4] will remove the first four characters from every input line in the environment. (I think a tab should still count as one character at that point.)

Tex command which affects the next complete word

Is it possible to have a TeX command which will take the whole next word (or the next letters up to but not including the next punctuation symbol) as an argument and not only the next letter or {} group?
I’d like to have a \caps command on certain acronyms but don’t want to type curly brackets over and over.
First of all create your command, for example
\def\capsimpl#1{{\sc #1}}% Your main macro
The solution to catch a space or punctuation:
\catcode`\@=11
\def\addtopunct#1{\expandafter\let\csname punct@\meaning#1\endcsname\let}
\addtopunct{ }
\addtopunct{.} \addtopunct{,} \addtopunct{?}
\addtopunct{!} \addtopunct{;} \addtopunct{:}
\newtoks\capsarg
\def\caps{\capsarg{}\futurelet\punctlet\capsx}
\def\capsx{\expandafter\ifx\csname punct@\meaning\punctlet\endcsname\let
  \expandafter\capsend
  \else \expandafter\continuecaps\fi}
\def\capsend{\expandafter\capsimpl\expandafter{\the\capsarg}}
\def\continuecaps#1{\capsarg=\expandafter{\the\capsarg#1}\futurelet\punctlet\capsx}
\catcode`\@=12
@Debilski - I wrote something similar to your active * code for the acronyms in my thesis. I activated < and then \def<#1> to print the acronym, as well as the expansion if it's the first time it's encountered. I also went a bit off the deep end by allowing defining the expansions in-line and using the .aux files to send the expansions "back in time" if they're used before they're declared, or to report errors if an acronym is never declared.
Overall, it seemed like it would be a good idea at the time - I rarely needed < to be catcode 12 in my actual text (since all my macros were in a separate .sty file), and I made it behave in math mode, so I couldn't foresee any difficulties. But boy was it brittle... I don't know how many times I accidentally broke my build by changing something seemingly unrelated. So all that to say, be very careful activating characters that are even remotely commonly-used.
On the other hand, with XeTeX and higher unicode characters, it's probably a lot safer, and there are generally easy ways to type these extra characters, such as setting up a multi (or compose) key (I usually map either numlock or one of the windows keys to this), so that e.g. multi-!-! produces ¡. Or if you're running in emacs, you can use C-\ to switch into TeX input mode briefly to insert unicode by typing the TeX command for it (though this is a pain for actually typing TeX documents, since it intercepts your actual \'s, and please please don't try defining your own escape character!).
Regarding whitespace after commands: see package xspace, and TeX FAQ item Commands gobble following space.
Now why this is very difficult: as you noted yourself, things like that can only be done by changing catcodes, it seems. Catcodes are assigned to characters when TeX reads them, and TeX reads one line at a time, so you can not do anything with other spaces on the same line, IMHO. There might be a way around this, but I do not see it.
Dangerous code below!
This code will do what you want only at the end of the line, so if what you want is more "fluent" typing without brackets, but you are willing to hit 'return' after each acronym (and not run any auto-indent later), you can use this:
\def\caps{\begingroup\catcode`^^20 =11\mcaps}
\def\mcaps#1{\def\next##1 {\sc #1##1\catcode`^^20 =10\endgroup\ }\next}
One solution might be setting another character as active and using this one for escaping. This does not remove the need for a closing character but avoids typing the \caps macro, thus making it overall easier to type.
Therefore under very special circumstances, the following works.
\catcode`\*=\active
\def*#1*{\textsc{\MakeTextLowercase{#1}}}
Now follows an *Acronym*.
Unfortunately, this makes uses of \section*{} impossible without additional macro definitions.
In Xetex, it seems to be possible to exploit unicode characters for this, so one could define
\catcode`\•=\active
\def•#1•{\textsc{\MakeTextLowercase{#1}}}
Now follows an •Acronym•.
Which should reduce the effects on other commands but of course needs to have the character ‘•’ mapped to the keyboard somewhere to be of use.

How do I "diff" multiple files against a single base file?

I have a configuration file that I consider to be my "base" configuration. I'd like to compare up to 10 other configuration files against that single base file. I'm looking for a report where each file is compared against the base file.
I've been looking at diff and sdiff, but they don't completely offer what I am looking for.
I've considered diff'ing the base against each file individually, but my problem then become merging those into a report. Ideally, if the same line is missing in all 10 config files (when compared to the base config), I'd like that reported in an easy to visualize manner.
Notice that some rows are missing in several of the config files (when compared individually to the base). I'd like to be able to put those on the same line (as above).
Note, the screenshot above is simply a mockup, and not an actual application.
I've looked at using some Delphi controls for this and writing my own (I have Delphi 2007), but if there is a program that already does this, I'd prefer it.
The Delphi controls I've looked at are TDiff, and the TrmDiff* components included in rmcontrols.
For people who are still wondering how to do this, diffuse is the closest answer: it does an N-way merge by displaying all files and doing three-way merges among neighbors.
None of the existing diff/merge tools will do what you want. Based on your sample screenshot you're looking for an algorithm that performs alignments over multiple files and gives appropriate weights based on line similarity.
The first issue is weighting the alignment based on line similarity. Most popular alignment algorithms, including the one used by GNU diff, TDiff, and TrmDiff, do an alignment based on line hashes, and just check whether the lines match exactly or not. You can pre-process the lines to remove whitespace or change everything to lower-case, but that's it. Add, remove, or change a letter and the alignment thinks the entire line is different. Any alignment of different lines at that point is purely accidental.
Beyond Compare does take line similarity into account, but it really only works for 2-way comparisons. Compare It! also has some sort of similarity algorithm, but it is also limited to 2-way comparisons. Taking similarity into account can slow down the comparison dramatically, and I'm not aware of any other component or program, commercial or open source, that even tries.
The other issue is that you also want a multi-file comparison. That means either running the 2-way diff algorithm a bunch of times and stitching the results together or finding an algorithm that does multiple alignments at once.
Stitching will be difficult: your sample shows that the original file can have missing lines, so you'd need to compare every file to every other file to get a bunch of alignments, and then you'd need to work out the best way to match those alignments up. A naive stitching algorithm is pretty easy to do, but it will get messed up by trivial matches (blank lines for example).
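For the simpler case the question describes (lines of the base that are missing from some of the other files), a naive stitching of 2-way diffs could be sketched in Python with the standard difflib module. This is only a sketch under stated assumptions: missing_from_each and the in-memory sample data are made up for illustration, lines are matched exactly (no similarity weighting), and lines that exist only in the other files are ignored.

```python
import difflib


def missing_from_each(base_lines, others):
    """Map each base line index to the names of files that lack that line.

    `others` is a dict of {name: list_of_lines}. Only exact matches count.
    """
    report = {}
    for name, lines in others.items():
        matcher = difflib.SequenceMatcher(a=base_lines, b=lines, autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            # 'delete' and 'replace' both mean that base lines i1..i2-1 have
            # no exact counterpart at this position in the other file.
            if tag in ("delete", "replace"):
                for i in range(i1, i2):
                    report.setdefault(i, []).append(name)
    return report


# Hypothetical sample data standing in for the base and the other configs.
base = ["host=a", "port=80", "debug=off"]
others = {
    "cfg1": ["host=a", "debug=off"],
    "cfg2": ["host=a", "port=80", "debug=off"],
    "cfg3": ["host=a", "debug=on"],
}
for index, names in sorted(missing_from_each(base, others).items()):
    print(f"{index + 1}: {base[index]!r} missing or changed in {', '.join(names)}")
```

Each base line is reported once, together with every file that lacks it, which is the "same line missing in several files" view asked for. It inherits the weakness mentioned above: a changed line is treated the same as a missing one.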
There are research papers that cover aligning multiple sequences at once, but they're usually focused on DNA comparisons, so you'd definitely have to code it up yourself. Wikipedia covers a lot of the basics, then you'd probably need to switch to Google Scholar.
Sequence alignment
Multiple sequence alignment
Gap penalty
Try Scooter Software's Beyond Compare. It supports 3-way merge and is written in Delphi / Kylix for multi-platform support. I've used it pretty extensively (even over a VPN) and it's performed well.
for f in file1 file2 file3 file4 file5; do printf '%s\n\n' "$f" >> outF; diff "$f" baseFile >> outF; printf '\n\n' >> outF; done
Diff3 should help. If you're on Windows, you can use it from Cygwin or from diffutils.
I made my own diff tool, DirDiff, because I didn't want matching parts shown twice on screen, and wanted differing parts above each other for easy comparison. You could use it in directory mode on a directory with an equal number of copies of the base file.
It doesn't render exports of diffs, but I'll list it as a feature request.
You might want to look at some Merge components as what you describe is exactly what Merge tools do between the common base, version control file and local file. Except that you want more than 2 files (+ base)...
Just my $0.02
SourceGear Diffmerge is nice (and free) for windows based file diffing.
I know this is an old thread but vimdiff does (almost) exactly what you're looking for with the added advantage of being able to edit the files right from the diff perspective.
But none of the solutions does more than 3 files still.
What I did was messier, but for the same purpose (comparing contents of multiple config files, no limit except memory and BASH variables)
While loop to read a file into an array:
loadsauce () {
    index=0
    while read SRCCNT[$index]
    do let index=index+1
    done < $SRC
}
Again for the target file
loadtarget () {
    index=0
    while read TRGCNT[$index]
    do let index=index+1
    done < $TRG
}
string comparison
brutediff () {
    # Brute-force string compare, probably duplicates diff.
    # This is very ugly, but it compares every line in SRC against every line in TRG.
    # Grep might do better; this version is included for completeness.
    for selement in $(seq 0 $((${#SRCCNT[@]} - 1)))
    do for telement in $(seq 0 $((${#TRGCNT[@]} - 1)))
        # Compare the line contents, not the loop indices.
        do [[ "${SRCCNT[$selement]}" == "${TRGCNT[$telement]}" ]] && echo "${SRCCNT[$selement]} is in ${SRC} and ${TRG}" >> "$OUTMATCH"
        done
    done
}
and finally a loop to do it against a list of files
for sauces in $(cat $SRCLIST)
do echo "Checking ${sauces}..."
loadsauce
loadtarget
brutediff
echo -n "Done, "
done
It's still untested/buggy and incomplete (like sorting out duplicates or compiling a list for each line with common files,) but it's definitely a move in the direction OP was asking for.
I do think Perl would be better for this though.

Resources