Different checksum but same archives


We generated SHA-256 checksums for two archives (*.zip) that are supposed to have exactly the same contents. However, the two checksums are different.
We checked the files and compared them between the two archives. Their contents are the same; however, the last modified times of some files and folders are different.
Also, when we use Linux 'zip', it stores the last modified time, which causes the checksums to differ. (We tried zip -X, but it still generates different checksums.)
How do you generate the same checksum in this case?
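One way around this is to checksum the archive's contents rather than its bytes. As far as we understand the zip man page, -X only drops the extra fields (such as Unix uid/gid and extended timestamps); the basic modification time stored in every entry header remains, so the raw archive bytes still differ. The Python sketch below is an assumption of how such a content checksum could look, not an established tool: it hashes only entry names and decompressed data, in sorted order, and ignores timestamps, permissions and entry order. Note that it also skips directory entries, so an empty folder present in only one archive would not be detected. The function name content_digest is hypothetical.

```python
import hashlib
import zipfile


def content_digest(zip_path: str) -> str:
    """SHA-256 over entry names and decompressed bytes only.

    Timestamps, permissions, extra fields and entry order are ignored,
    so two archives holding the same files produce the same digest.
    """
    h = hashlib.sha256()
    with zipfile.ZipFile(zip_path) as zf:
        for name in sorted(zf.namelist()):
            if name.endswith("/"):
                continue  # directory entries carry only metadata (e.g. mtime)
            data = zf.read(name)
            encoded = name.encode("utf-8")
            # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
            h.update(len(encoded).to_bytes(8, "big"))
            h.update(encoded)
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)
    return h.hexdigest()
```

Calling content_digest("first.zip") and content_digest("second.zip") should then give equal results whenever both archives hold the same file names with the same bytes, even though sha256sum on the raw .zip files differs.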

Related

How do I get the target directory in bazel

I've got a genrule that produces some output files but the tool I'm using needs to know where to put the files.
So far, I've been able to get it working by using dirname $(location outputfile), but this seems like a very fragile solution.

You can read about which make variables are available in a genrule here:
https://docs.bazel.build/versions/master/be/make-variables.html
In particular:
$(@D): The output directory. If there is only one filename in outs, this expands to the directory containing that file. If there are multiple filenames, this variable instead expands to the package's root directory in the genfiles tree, even if all the generated files belong to the same subdirectory! If the genrule needs to generate temporary intermediate files (perhaps as a result of using some other tool like a compiler) then it should attempt to write the temporary files to $(@D) (although /tmp will also be writable), and to remove any such generated temporary files. Especially, avoid writing to directories containing inputs - they may be on read-only filesystems, and even if they aren't, doing so would trash the source tree.
In general, if the tool lets you (or if you're writing your own tool), it's best to give the tool the individual input and output file names. For example, if the tool accepts inputs only as directories, that's usually OK when the directory contains only the files you want; if it doesn't, you have to rely on sandboxing to show the tool only the files you want, or you have to create temporary directories manually. Taking outputs as a directory gives you less control over how the outputs are named, and you still have to enumerate the files in the genrule's outs.
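A minimal sketch of the tool side of that advice, assuming you write the tool yourself in Python: a hypothetical generator that receives every output path as an argument instead of an output directory. In the genrule you would then list it in tools and invoke it as something like "$(location :gen) $(OUTS)", where :gen is a hypothetical py_binary target; $(OUTS) expands to the paths of all files in outs, so the tool never has to guess a directory.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def generate(outputs: list[str]) -> None:
    """Write one file per path given on the command line.

    The caller (the genrule) decides every path, so the files written
    always match the genrule's outs exactly.
    """
    for out in outputs:
        path = Path(out)
        path.parent.mkdir(parents=True, exist_ok=True)  # harmless if the directory exists
        path.write_text(f"generated content for {path.name}\n")


def main(argv: list[str] | None = None) -> None:
    parser = argparse.ArgumentParser(description="Hypothetical code generator")
    parser.add_argument("outputs", nargs="+", help="explicit output file paths")
    args = parser.parse_args(argv)
    generate(args.outputs)
```

As a script, the file would end with the usual if __name__ == "__main__": main() guard. The design choice is that the BUILD file stays the single source of truth for output names, which is exactly what $(@D) cannot guarantee once outs has more than one entry.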

What do all the options on GetOptions mean?

The MSDN documentation lists four options, with limited explanation:
Overwrite "Overwrite existing writable files if they conflict with the downloaded files." Does this apply to all files, or just ones we've told TFS we've edited?
GetAll "Gets all files." What files does TFS not normally get?
Preview "Executes a get without modifying the disk." This one seems pretty clear.
Remap "Remaps existing items on the disk to the server items where the content and disk location are not changing." I have no idea what this means.

Overwrite: will blindly overwrite writable files that you have not pended for edit. If you have marked a file as 'writable' then you have violated the contract with TFS, and it assumes that you have done this for a good reason (e.g., modifying the file without taking a checkout because you were working offline). This will generally produce a writable conflict on the file, but if you specify this flag, the writable file will be overwritten.
This only applies to server workspaces (local workspaces are always writable). This has no effect on files that you have pended for edit. Get will always produce conflicts for files that are edited locally and updated on the server; if you want to update files that are checked out, you must undo the checkout (or resolve the conflict with TakeTheirs).
Get All: will download every file and update it, even if TFS believes that the local version is the same as the remote version and that downloading a new version would be a no-op. TFS tracks every version that you have locally, as well as remotely, so this is only useful if you edit files locally without checking them out.
If you have kept them writable, then - as mentioned above - this will be a writable conflict. If you have then marked them read-only, TFS assumes that you have not made any changes and will not bother updating them when you do a get (because it believes the file contents haven't changed). If you have manually changed the file contents, then setting this option will update those files to the server version.
Preview: will just fire events and provide results that indicate what would be downloaded with the given parameters.
Remap: is a clever option that allows you to perform an in-place branch switching (which is very common with some version control systems that branch at the repository level - like Git - but somewhat complicated in TFVC.)
Consider that you have mapped $/Foo/main to C:\Foo, and done a get latest. If you update your working folder mappings so that $/Foo/branches/feature now points to C:\Foo, then issue a get with Remap, then the server will download only the changed files between main and branches/feature, so it's an inexpensive way to update your local workspace to a feature branch.
(If you're looking for an example, this functionality exists in the command-line interface and in Team Explorer Everywhere but not in Visual Studio.)

Moving TFS workspace to another PC without re-downloading

I have a TFS workspace which I need to move to my new PC. I have copied the whole folder structure over and ensured that the workspace is mapping to the correct folders. However the "Latest" column for every file displays as "Not downloaded". How can I reconcile this such that TFS is aware that the files match the server version?
The standard answer seems to be to re-download the whole thing. Unfortunately the repository is huge, my connection is unreliable, and I have monthly download quotas. Is there anything in the command-line tools or power tools that can make it compare file hashes or similar and realise that the files are identical?
Thanks.

There's a binary metadata file inside the working copy that stores the mapping of every path in your repo to the path on the filesystem.
It uses absolute paths - so unless your new project folder occupies the exact same location as it did on the original computer they won't match.
Because it's a binary format, you can't do something simple like mass replace the paths with a text editor or sed.

Is there any simple automated way of finding out all the source files associated with a Delphi project?

I like to back up the source code set for a project when I release a version. I use GExperts project backups, which seems to gather up all the files in the project manager into the ZIP file. You can also add arbitrary files to this file set, but I'm always conscious of the fact that I haven't necessarily got all the files. Unless I specifically go through the uses clauses and add all the units I have sources for to the project, I'll never be sure of storing all the files necessary to recreate the installable/executable.
I've thought about rolling an app to traverse a project, following all the units used and looking down all the search paths and seeing if there is a source file available for that unit, and building a list of files to back up that way, but hey - maybe someone has already done the work?

I highly recommend that you look into version control,
e.g. SVN (Subversion) or CVS.
This lets you track revisions of all of your source. It lets you add or remove source files, roll back, merge, and do all the other nice things related to managing project sources.
This WILL save your a$%# one day.

You can interpret your question in two ways:
How can I make sure that I back up at least enough files so I can build the project?
How can I make sure that I back up not too many files, so I can still build the project?
The first is to make sure you can build the system at all, the second to allow you to clean up unused files.
For both, a version control system including a separate build system is the way to go.
Then, for each new set of changes, you can use these steps to ensure that both conditions hold:
1. On your daily development system, check the new revision of your source code into your version control system.
2. On your separate build system, get the latest version from your version control system.
3. Build the project on the build system; if this fails, go to step 1 and add the missing files from your development system to your version control system.
4. Start removing (one by one) files from the project that you suspect are not needed, rebuilding after each removal until the build fails.
5. When the build fails, restore that particular file from the version control system, then continue with step 4 for the next candidate.
6. When the build succeeds and no candidates are left, you have the minimum set of files.
7. Now compare the files in your version control system with the files on the build machine.
8. Mark the files that are in your version control system but not on your build machine as deprecated or deleted.
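The remove/rebuild/restore loop can be sketched in Python. This is only an illustration under stated assumptions: builds is a hypothetical callback that stands in for "check this exact file set out into a clean directory and run the Delphi compiler", and the function name minimize_file_set is made up. The loop drops each suspect file, keeps the removal if the build still succeeds, and otherwise restores the file.

```python
from typing import Callable, Iterable, Set


def minimize_file_set(
    files: Iterable[str],
    candidates: Iterable[str],
    builds: Callable[[Set[str]], bool],
) -> Set[str]:
    """Greedily remove suspect files while the build keeps succeeding."""
    current = set(files)
    if not builds(current):
        # The checked-in set is incomplete: add the missing files first.
        raise ValueError("the full file set does not build; add the missing files first")
    for candidate in candidates:
        if candidate not in current:
            continue
        trial = current - {candidate}
        if builds(trial):
            current = trial  # still builds without it, so leave it removed
        # otherwise the build broke: "restore" the file by keeping current unchanged
    return current
```

Because files are tried one at a time, the result is a set from which no single candidate can be removed without breaking the build; when each file is independently needed or unneeded, that is the minimum set the steps describe.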
Most version control systems have good ways of generating a difference between the files on your development or build system against the files in the version control system (usually fine grained for each historic point in time you added/removed/updated files in your version control system).
The reason you want a separate build system (or two separate development systems) is that you want them to be independent: you use one for developing, and the other for checking if the build is still OK.
This is also a first step toward a continuous integration system that you might build in the future (one that runs unit tests, automatically creates product setups, and much more).
--jeroen

I'm not sure if you're asking about version control or how to be sure you've got all the files.
One useful utility I run occasionally is a program that makes a DirList of all of the files in my dcu output folder. Changing the extensions from .dcu to .pas gives me a list of all of the source code files.
Of course it misses .inc files and other non-.pas files, but perhaps this line of thinking would be helpful to you in some way?
The value of this utility to me is that a second housekeeping utility program then makes a list of all .pas files in my source tree that do not have corresponding .dcu files. This (after a full compile of all programs) generally reveals some "junk" .pas files that are no longer in use.
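A rough Python equivalent of those two housekeeping utilities, assuming a single dcu output folder and a single source tree (both paths are placeholders you would supply). Unit names are compared case-insensitively because Delphi unit names are case-insensitive. As the answer notes, .inc files and other non-.pas sources are not covered.

```python
from pathlib import Path
from typing import List, Set


def compiled_units(dcu_dir: Path) -> Set[str]:
    """Lower-cased unit names that have a .dcu in the output folder."""
    return {p.stem.lower() for p in dcu_dir.rglob("*.dcu")}


def used_sources(src_dir: Path, dcu_dir: Path) -> List[Path]:
    """.pas files whose unit was compiled: the list worth backing up."""
    units = compiled_units(dcu_dir)
    return sorted(p for p in src_dir.rglob("*.pas") if p.stem.lower() in units)


def unused_sources(src_dir: Path, dcu_dir: Path) -> List[Path]:
    """.pas files with no matching .dcu: likely junk after a full compile."""
    units = compiled_units(dcu_dir)
    return sorted(p for p in src_dir.rglob("*.pas") if p.stem.lower() not in units)
```

As with the original utility, unused_sources is only meaningful right after a full compile of all programs; otherwise units that simply haven't been compiled yet show up as junk.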

For getting a list of all units compiled into an executable, you could let the compiler generate a MAP file. This file will contain entries for all the units used.

Comparison between two big directories

I have a large directory that contains only CS and math material. It is over 16 GB in size. The file types are text, PNG, PDF and CHM. I currently have two branches: my brother's and mine. The initial files were the same. I need to compare them. I tried Git, but it takes a long time to load.
What is the best way to compare two big directories?
[Mixed Solution]
Do a "ls -R > different_files" in both directories [1]
"sdiff <(cd dir1 && md5deep -rl . | sort) <(cd dir2 && md5deep -rl . | sort)" [2]
What do you think? Any drawbacks?
[1] thanks to Paul Tomblin
[2] great thanks to all repliers!

Use FSlint (see its website). One of the options of the tool is "Duplicates". As per the description from the site:
One of the most commonly used features of FSlint is the ability to find duplicate files. The easiest way to remove lint from a hard drive is to discard any duplicate files that may exist. Often a computer user may not know that they have four, five, or more copies of the exact same song in their music collection under different names or directories. Any file type whether it be music, photos, or work documents can easily be copied and replicated on your computer. As the duplicates are collected, they eat away at the available hard drive space. The first menu option offered by FSlint allows you to find and remove these duplicate files.

How to compare 2 folders without pre-existing commands/products:
Simply create a program that scans each directory and creates a file hash of each file. It outputs a file with each relative file path and the file hash.
Run this program on both folders.
Then you simply compare the 2 output files to see if they are the same. To compare those 2 files you just load them into a string and do a string compare.
The hashing algorithm you use doesn't matter. You can use MD5, SHA, CRC, ...
You could also use the file size in the output files to help reduce the chance of collisions.
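A minimal Python sketch of that program, choosing SHA-256 as the hash (any of the algorithms mentioned would do). Each manifest line holds the digest, the file size and the relative path, sorted by path so both directories produce comparable output; the two manifests are then compared as plain strings. The function names are illustrative only.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Read in 1 MiB chunks so large PDF/CHM files don't have to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: str) -> str:
    """One line per file: digest, size, relative path (sorted by path)."""
    base = Path(root)
    entries = sorted(
        (p.relative_to(base).as_posix(), p) for p in base.rglob("*") if p.is_file()
    )
    return "".join(f"{file_digest(p)} {p.stat().st_size} {rel}\n" for rel, p in entries)


def same_tree(a: str, b: str) -> bool:
    # Plain string comparison of the two manifests.
    return manifest(a) == manifest(b)
```

Writing manifest(...) to a file for each directory also lets you run a line-based diff on the two outputs to see which paths differ, rather than just a yes/no answer.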
How to compare 2 folders with pre-existing commands/products:
Now if you just want a program that does it, use diff -r, or WinDiff on Windows-based systems.

Use md5deep to create recursive md5sum listings of every file in those directories.
You can then use a diff tool to compare the generated listings.

Are you just trying to discover what files are present in one that aren't in the other, and vice versa? A couple of suggestions:
Do a "ls -R" in both directories, redirect to files, and diff the files.
Do a "rsync -n" between them to see what rsync would have to copy if it were to be allowed to copy. (-n means don't do the rsync, just show you what it would do if you ran it without the -n)
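If only presence matters, the "ls -R and diff" idea can be sketched in Python with relative paths, which avoids the per-directory headers in ls -R output. This is an illustrative assumption, not a replacement for rsync -n: it reports files that exist on only one side and does not compare contents.

```python
import os
from typing import Set, Tuple


def relative_files(root: str) -> Set[str]:
    """All file paths under root, relative to root, with '/' separators."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            found.add(rel.replace(os.sep, "/"))
    return found


def presence_diff(a: str, b: str) -> Tuple[Set[str], Set[str]]:
    """Return (files only in a, files only in b)."""
    fa, fb = relative_files(a), relative_files(b)
    return fa - fb, fb - fa
```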

I would diff the output of md5sum * | sort run in each directory.
That will point you to the files that are different or missing.

I know this question has already been answered; however, if you are not into writing such a tool yourself, there's a very good open source project called tardiff, available on SourceForge, which does basically exactly what you want and even supports automated creation of patches (in tar format, obviously) to account for differences.
Hope this helps
