Techniques for avoiding dataset path hardcoding

I have some shared projects under version control (concretely svn and bazaar, but I'm looking for a general solution), but the datasets the projects use are not: they are too big and are shared by different projects.
In the source code I need to "store" the path to the dataset somewhere. The path may differ for each user, so hardcoding it is definitely a bad idea (as always, I guess).
My current workaround is to hardcode the name of a text file (say "dataPath.txt") in which the actual path is stored; this file is not under version control, and each project contributor creates their own copy with their customised path.
The solution is, however, quite fragile:
1) if a contributor accidentally adds the file to version control, it is annoying;
2) when I export the "executable", I also have to ship the file, which is expected to sit in the same directory (relative path).
In my concrete case I'm using Java, so I find this question relevant (even though I've never used properties), but I would like to know whether there are more general techniques that can be reused across different programming languages.

Write your program so that it accepts the path to the dataset as a command-line argument. Make sure that either a) there is a sensible default if the dataset file is not specified, or b) the program exits gracefully if no dataset file is provided. There is no need to hard-code dataset paths in the source. Then you'd invoke the program e.g. like this (of course you can take any other command-line option character you like :-) ):
prog -d dataPath.txt
In general, providing such settings in a config file is a good idea. With Java, properties help (as pointed out in the SO question you linked). In other languages I'd probably use a JSON-formatted settings file -- parsing libraries are available.
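A minimal Java sketch combining both ideas, i.e. command-line argument first, untracked config file second, default last (the option letter, file names and the property key are only illustrative choices, not a fixed convention):

import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class DatasetPathResolver {

    // Fallback used when neither a -d argument nor a config entry is present.
    private static final String DEFAULT_DATASET_PATH = "data/dataset.csv";

    public static void main(String[] args) throws IOException {
        String datasetPath = null;

        // 1) Command-line argument takes precedence: prog -d /path/to/dataset
        for (int i = 0; i < args.length - 1; i++) {
            if ("-d".equals(args[i])) {
                datasetPath = args[i + 1];
            }
        }

        // 2) Otherwise fall back to an untracked, per-user config file.
        if (datasetPath == null && Files.exists(Paths.get("dataset.properties"))) {
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream("dataset.properties")) {
                props.load(in);
            }
            datasetPath = props.getProperty("dataset.path");
        }

        // 3) Last resort: a sensible default, or exit gracefully if that is missing too.
        if (datasetPath == null) {
            datasetPath = DEFAULT_DATASET_PATH;
        }
        Path dataset = Paths.get(datasetPath);
        if (!Files.exists(dataset)) {
            System.err.println("Dataset not found: " + dataset);
            System.exit(1);
        }
        System.out.println("Using dataset: " + dataset);
    }
}

The same lookup order works in any language; only the config-file parser changes.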

Related

How do I execute ransomware in a Windows sandbox environment when the file is not an executable [closed]

I have several malware repositories; however, I am unable to execute .bin files, or files that Windows simply classifies as "file". I have included some file names so you can see what I'm working with. I have been trying to mount some of the .bin files, with no luck.
Tank_3d.jar
b0ffb939b3df60f8561fadf2cbfa1733_WEXTRACT.EXE_
userinit.exe with a desktop.BIN
Why the extra file with the executable?
13ce4cd747e450a129d900e842315328 (Windows says the type of this file is just "file"?)
Any help you can provide would be greatly appreciated. I have searched the web, but (for obvious reasons) I haven't found any sites that tell you how to execute these files. I have changed some of the file extensions to .exe and some of them will execute in this manner; however, a lot of them still will not. I have conducted static analysis of these files prior to trying dynamic analysis. Also, I forgot to add that I'm doing this research for a university. Thank you.
The question is not completely clear to me, but as I understood from what you said, you have some files (probably related to some malware/ransomware) that you don't know how to execute.
Before just executing a piece of malware or any other suspicious file, you need to collect as much information as possible about it. This step is called information gathering. So this is what you need to do:
(these are optional steps and can be changed based on your experience)
Calculate the MD5 hash of the file, then search for that value on VirusTotal or Hybrid-Analysis to check whether these engines have already analyzed the sample (a small hashing sketch follows after these steps).
(or you can directly upload your sample to these engines without calculating the MD5 value)
Search on Google for whatever information you have about your file (you can even search for the file name itself). You don't want to re-analyze a sample if someone already did that for you, unless you are looking for some variants or some specific features. Even in that case, reading other related analysis reports can help you do it faster.
Get the type of the file by using any tool that extracts the magic header (signature) of the file. I would suggest the Linux file command, but you can use other tools as well.
Try to open the file in a hex editor/viewer (you can find lots of them if you search) to see if there is anything interesting in the file.
Use the Linux strings command or the Windows Strings utility to extract human-readable strings from the file and see what you can find.
After all the above-mentioned steps, you will have an idea of how you should deal with the file.
Use PEiD or DIE (Detect It Easy) to identify the programming language, the entropy and the possible packer used for the file.
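For the hashing step mentioned above, here is a minimal Java sketch (the sample file name is only a placeholder):

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;

public class Md5OfFile {
    public static void main(String[] args) throws Exception {
        // Placeholder default name; normally you would pass the sample's path as an argument.
        Path sample = Paths.get(args.length > 0 ? args[0] : "sample.bin");
        MessageDigest md5 = MessageDigest.getInstance("MD5");

        // Stream the file through the digest so large samples don't need to fit in memory.
        try (InputStream in = Files.newInputStream(sample)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }

        // Print the digest as the usual 32-character hex string for VirusTotal lookups.
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        System.out.println(hex);
    }
}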
And finally, to execute different file formats:
If it's a .jar file: java -jar sample.jar
If it's a .dll file: use rundll32 or OllyDbg (see the example just below).
If you have an .exe file: just run it.
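For the .dll case, the usual rundll32 invocation looks roughly like this (the exported entry point name here is a placeholder; you would have to look up the real export name, e.g. with a PE viewer):

rundll32.exe sample.dll,ExportedFunction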
People who start learning malware analysis often just try to execute the file or start with dynamic analysis, but one needs to know that these steps are very helpful before executing the sample, since most of the time you will get what you want from information gathering and static analysis.
If you explain the problem better, maybe people can help you better!
Edit:
I am going to add this part to the answer to cover the comments.
why are there additional files in the malware folder like an executable with a bin file?
This is a simple trick which has been used by malware writers for several years. For example, in one scenario, the main file of the malware can be an executable (.exe) that is actually not harmful at all! All it does is download another file (e.g., a .dll) which is the real malicious code (you can call it the payload). However, sending and receiving .dll files is also suspicious, so malware authors use other file extensions to hide the malicious content (like a .bin file, or even a .png file in one of the variants of Emotet). The problem is that you CANNOT execute these files just like that, since sometimes they are encrypted/encoded.
You need to know the procedure for executing them, which you can only learn by reverse engineering the sample.
For example:
13ce4cd747e450a129d900e842315328 -> .DLL file
This means you may be able to analyze it using OllyDbg or any debugger plus rundll32, but there is no guarantee! It may be encrypted or encoded, and only the parent file (the .exe sample, for example) can decrypt/decode it.
I am now interested in performing memory analysis of the malware which I possess; however, the problem I encountered was how to execute a lot of the ransomware files I have to examine.
I would say it would be nice to execute all of them in a Win10 VM + Cuckoo Sandbox and dump the memory for further analysis. It's all an automated job and can be done nicely.

How does `clangd` know where a function definition is when only one file has been indexed?

How does clangd know where a function definition is when only one file has been indexed through the LSP (Language Server Protocol) message textDocument/didOpen?
This question is based off of the assumption that there is no compile_commands.json file for clangd to work with.
To the best of my knowledge clangd will partially index(?) a given file when clangd receives the LSP message textDocument/didOpen with no compile_commands.json file in the workspace(?).
Thus the index of the file being partially indexed will only reside in memory.
So how is clangd aware of definitions outside of the partially indexed file when it has no awareness of any outside files?
Or is it aware?
Or is it made aware of other files by some heuristic that looks at the relative path or at the includes (#include "filename.hpp") that reside in directories like root-project-dir/src and the likes thereof?
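For reference, my understanding is that a compile_commands.json (when present) is just a list of per-file compile commands, roughly like this (paths and flags are only illustrative):

[
  {
    "directory": "/home/user/project/build",
    "command": "clang++ -Iinclude -std=c++17 -c ../src/main.cpp -o main.o",
    "file": "../src/main.cpp"
  }
]

So my question is really about what clangd can do when none of this information is available.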

How to identify what projects have been affected by a code change

I have a large application to manage, consisting of three or four executables and as many as fifty .dlls. Many of the source code files are shared across many of the projects.
The problem is a familiar one to many of us - if I change some source code I want to be able to identify which of the binaries will change and, therefore, what it is appropriate to retest.
A simple approach would be to compare file sizes. That is an 80% acceptable solution, but there is at least a theoretical possibility of missing something. Secondly, it gives me very little indication as to WHAT has changed; it would be ideal to get some form of report on this so I can then filter out irrelevant changes (e.g. dates/versions, copyrights, etc.).
On the plus side:
all my .dcus are in a row - I mean they are all built into a single folder
the build is controlled by a script (.bat)(easy, for example, to emit .obj files if that helps)
svn makes it easy to collect together any (two) revisions for comparison
On the minus side:
There is no policy to include all used units in all projects; some units get included because they are on a search path.
Just knowing that a changed unit is used/compiled by a project is not sufficient proof that the binary is affected.
Before I begin writing some code to solve the problem I would like to ask the panel what suggestions they might have as to how to approach this.
The rules of StackOverflow forbid me to ask for recommended software, but if anyone has any positive experiences of continuous integration tools that would help - great
I am open to any suggestion or observation that is relevant in this context.
It seems to me that your question boils down to knowing which units are contained in your various executables. Since you are using search paths, it will be hard for you to work this out ahead of time. The most robust way to find out is to consult the .map file that the compiler emits. This contains a list of all units contained in your executable.
Once you know which units are contained in each executable, you need to know whether or not anything has changed in those units. That information is contained in your revision control system. Put this all together and you have the information that you need.
Of course, just because the source code for a unit has changed, you might argue that re-testing is not needed. Perhaps the only change made was the version, or the date in a copyright label, or some such. But it is asking too much to expect a computer to make such a judgement. At some point you need a human to step up and take responsibility.
What is odd about this though is that you are asking the question at all. It seems to me to be enormously risky to attempt partial testing. I cannot understand why you don't simply retest the entire product.
After using it for more than 10 years for commercial in-house and freelance work in large projects, I can recommend trying Apache Ant. It is a build tool which supports dependencies, and it has many very helpful features.
Apache Ant also integrates nicely with CI tools such as Hudson/Jenkins, Bamboo etc.
Another suggestion, based on experience with Maven, is to design the general software architecture to be as modular as possible. If modules (single or multiple source or DCU files in one directory) carry a version number in the directory name, it is possible to control exactly how applications are composed from these modules.
If you want to program such a tool yourself, the approach would be something like this:
First you need to detect whether any changes were made to individual source files. As you already figured out, comparing file sizes is a bad idea, as the file size can stay the same despite lots of changes (as long as there is the same amount of text in a .pas file, its size won't change). So instead you could check the last modification time of each file, or compute some hash value such as an MD5 hash for comparison (which can be quite slow).
Then you need to build a dependency tree which tells you which files are used by which project/subproject.
Finally, based on the changes detected in individual files, you check the dependency tree to see which projects need to be recompiled (a rough sketch follows below).
The problem with this approach is that you would probably have to update the dependency tree manually each time a new unit is added to a project or an existing one is removed.
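A rough Java sketch of this idea, assuming a hand-maintained dependency tree and a marker file whose timestamp records the last build (all file, unit and binary names below are placeholders):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AffectedBinaries {

    // Hand-maintained dependency tree: which binaries compile in which units.
    static final Map<String, List<String>> UNIT_TO_BINARIES = Map.of(
            "src/SharedUtils.pas", List.of("App1.exe", "App2.exe", "Plugin1.dll"),
            "src/App1Main.pas", List.of("App1.exe"));

    public static void main(String[] args) throws Exception {
        // Anything modified after this marker file's timestamp counts as changed.
        FileTime lastBuild = Files.getLastModifiedTime(Paths.get("lastbuild.marker"));

        Set<String> affected = new HashSet<>();
        for (Map.Entry<String, List<String>> entry : UNIT_TO_BINARIES.entrySet()) {
            Path unit = Paths.get(entry.getKey());
            if (Files.getLastModifiedTime(unit).compareTo(lastBuild) > 0) {
                // Changed unit: retest every binary that uses it.
                affected.addAll(entry.getValue());
            }
        }
        System.out.println("Binaries to retest: " + affected);
    }
}

A hash-based comparison would catch cases where timestamps are unreliable, at the cost of reading every file.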
But the best way would be to use some version control software instead of reinventing the wheel. I myself like the way Git works, and I believe that a proper integration of Git into the project manager itself could be quite powerful due to Git's support for branching/sub-branching (each project is its own branch, each version of your software can be its own sub-branch).
The latest version of Delphi does have Git integration, done through SVN, but this unfortunately limits some of the best Git functionality. So if you decide to integrate Git support directly into Delphi, I'm first in line to use it.

How do compilers create the executable file at the end of the compilation process?

I've been reading about the compilation process. I understand some of the earlier concepts, like parsing, but I stop short of understanding how the executable file is created at the end.
In the examples I've seen around, the "compiler" takes input in the form of a language defined by a BNF grammar and, upon parsing it, outputs assembly.
Is the executable file literally just that assembly in binary form? I feel like this can't be the case, given that there are separate applications for making executables from assembly.
If this isn't answerable (ie it's too complex for the stack overflow format) I'd totally be happy with links/books so I can educate myself.
The compiler (or more specifically, the linker) creates the executable.
The format of the file generally varies depending on the operating system.
There are currently two main formats, ELF and COFF:
http://en.wikipedia.org/wiki/Executable_and_Linkable_Format
http://en.wikipedia.org/wiki/COFF
If you understand the concept of a structure, this is the same, only within a file. Each file has a first structure called a header, and from there you can access the other structures as required.
In most cases, only the resulting binary code is saved in these files, although you often find debug information. Some formats could save the source along with the code, but nowadays only the necessary references to the source are saved.
With dynamic linking, you also find symbol tables that include the actual symbol name. Otherwise, only relocation tables would be required.
Under the Amiga we also had the possibility to define code in a "segment". Only one segment could be loaded at a time. Once you were done with the segment, you could unload it and load another. Yet, in the end the concepts were similar. Structures in a file.
Microsoft offers a PDF about the COFF format. I could not find it on their website just now, but it looks like others have it. ELF has many links in the Wikipedia page so you should be able to find a PDF to get started.
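To make "a header at the start of the file" concrete: the very first bytes already identify the format. ELF files begin with the magic bytes 0x7F 'E' 'L' 'F', and Windows PE/COFF executables begin with 'MZ' (the old DOS header). A tiny Java sketch of sniffing that (not a real parser; the default file name is just a placeholder):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ExecutableFormatSniffer {
    public static void main(String[] args) throws IOException {
        // Placeholder file name; pass a real path on the command line.
        byte[] b = Files.readAllBytes(Paths.get(args.length > 0 ? args[0] : "a.out"));

        if (b.length >= 4 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' && b[3] == 'F') {
            System.out.println("ELF executable or object file");
        } else if (b.length >= 2 && b[0] == 'M' && b[1] == 'Z') {
            System.out.println("Windows PE/COFF executable (MZ/DOS header)");
        } else {
            System.out.println("Unknown or raw binary format");
        }
    }
}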
Not all, but some compilers (gcc, etc.) go from the high-level language to assembly language and then spawn the assembler. The assembler reads the assembly language, generates machine code, and produces an object file which, as you have guessed, contains more than just the machine code bits.

If you think about it for a second, you may realize that a variable or function defined in another source file lives in another object file; until link time, one object doesn't know how to reach that external function. So 1) the machine code is not finished, because patching up external addresses is not done until link time, and 2) there needs to be some information in the object file that defines which public items this object file provides and which external items are missing (names of functions, for example, which are obviously not embedded in the machine code). So the objects contain machine code in various states of completion, as well as other data needed by the linker.

The linker then...links...the objects together into one program with everything resolved: it basically completes all the machine code and puts the fragments of machine code (from the separate objects) into one place. Then it has to save all that on disk in some format, and typically that format is not just raw machine code. It has extra stuff in the file, starting with a header, and a way to define each binary blob and where it needs to live in memory before executing. When you run a program on the command line of your operating system, or by double-clicking in a file manager GUI, the operating system knows how to read that file format, extract the blobs of binary, place those blobs in RAM as defined by the file format, and then start executing at the place defined by the file format.
a.out, ELF, COFF, Intel HEX and Motorola S-record are all popular formats, as well as raw binary, which some toolchains can produce. The GNU tools will default to one (COFF, ELF, EXE or a.out), and objcopy is then used to convert from one to another, or at least from the default one to the others; its help shows what your possible choices are. Then simply Google those or look them up on Wikipedia to find the definitions of the file formats. Intel HEX or Motorola S-record are good ones to start with on Wikipedia, then perhaps ELF.
If you want to produce a native executable file you have two options: you can assemble the binary form yourself, or you can translate your program to another language and use its compiler to produce the executable.

Organizing the search path

Via "Tools | Options | Environment Variables" we create variables like this:
$(Sources) = D:\Sources\Delphi
$(OurLib) = $(Sources)\OurLib\Src
$(OurApp1) = $(Sources)\Applications\App1\3.x
$(ThirdParty) = $(Sources)\ThirdPartyComponents
We use these variables in the project search path like this:
$(OurApp1)\Src\Core;$(OurApp1)\Src\GUI;$(OurApp1)\Src\Plugins;$(ThirdParty)\JVCL
But this has been broken since Delphi 2009 (meanwhile fixed), as these variables are no longer evaluated completely (see QC #73276), so the files in those directories are not found by the compiler. A workaround: use only complete directory paths in the environment variables.
We use this approach because on all developer machines and the build servers the files can be found and we only have to point $(Sources) to the right place.
We don't have anything in our global library path (except the Delphi defaults), because that wouldn't be in the version control and isn't reflected on other developers or build machines.
One problem is: if one unit in $(OurLib) decides to include another new unit, perhaps in a new path, all projects break because they don't find this new unit. Then we have to go through all projects and add the search path. (BTW: I really hate the search path editor... wouldn't a simple memo field be much better to edit than this replace/add/delete logic?)
Another thing we do is not add many units to our projects, especially anything from $(OurLib); but we often have units, like plugins, which add functionality just by being included. For different editions of our products, we want to include different units. As Delphi always messes up $IFDEFs in the uses clause of the .dpr, we help ourselves by including units named like "IncludePlugins" which then include the units depending on IFDEFs.
But not including units in the project makes navigation a pain. The units don't appear in the project, they are not found by Ctrl+F12 (Show Units), they are not shown in code completion, etc.
Does anybody have a better way to cope with these problems?
We use only relative paths, any libraries are always below the libs subdirectory while the project source code resides in the src subdir. So our search paths always look like:
..\libs\library1;..\libs\library2\common;
etc.
All libraries are added as svn:external to each project, so checking out the project will automatically check out the libraries as well and the search path will always point to the correct version of the library for that project.
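For reference, defining such an external might look roughly like this (the repository URL and local path are placeholders):

svn propset svn:externals "https://svn.example.com/repos/library1/tags/1.2 libs/library1" .

After that, an svn update on the project also pulls the pinned library revision into the libs subdirectory.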
Not perfect, but it works most of the time.
I have to agree about the search path editor, it is even worse for relative paths because you must not use the "..." buttons otherwise Delphi will insert an absolute path.
We use standard drive mappings.
Our current project is always on W:, regardless of whether it is a network drive or a substituted (SUBST) drive.
This works great.
When you need to work on a different project, swap the W: and you can continue.
You can copy the search path out to an editor, modify it and then copy it back.
Your search path is much too big. It should contain only the things you want Delphi to recompile with your project. You don't really want to recompile the Jedi VCL every day, do you?
I create a single directory where all compiled units go. Say, C:\dcu. Specify that as the "unit output directory" in all packages. My "search path," then, is always just this:
$(Delphi)\Lib;C:\dcu
The compiler finds everything it needs, and it never finds any source code. The only source code it ever sees is in the files that directly belong to whatever project I'm compiling. The project's own source directories don't need to be on the search path because all of those files are already direct members of the project. The compiler knows exactly where they are.
For me, all a project's source files go in a single directory. If you want separate directories for different parts, like Core and GUI, then I would put those in separate packages so I could work on them and compile them separately. Even if the final program doesn't use the resultant BPLs, packages are still a good way of segmenting your project and defining dependencies.
When compiling units for one project doesn't automatically compile units for all the other projects, you're forced to change active projects. It takes a moment of your time, but it also serves as a mental reminder that you're "changing hats," too.
Although you're producing just one product, that doesn't mean you should have just one project in Delphi. You should have at least one project for each executable module (EXE, DLL, BPL) in your product. Use project groups to manage multiple projects in a single IDE session. No unit should be a member of more than one project.
I don't understand your part about plug-ins and different editions of your project. When you say "plug-in," I assume you're talking about separate executable modules, like DLLs or packages, that the customer can choose to include or not. Couldn't you turn your different editions' features into plug-in modules that you simply don't include in the lesser editions? Then you don't have to worry about conditional compilation of your project; just have several different installers that grab different sets of plug-ins.
I have always found it odd that this has never been addressed adequately. I suggested recently to David I that Delphi should allow the user to set up some sort of preferred development structure and that third party library publishers could be made aware of this so that they could automatically adjust their installers to install correctly in the preferred development framework. If the preferred development structure was stored in an XML file or similar, then, it could be copied from one computer to another on a development team.
As an alternative, it could make an interesting project to create a Delphi application that would allow a user to "refactor" their library installation in a high level way. You specify which folders on your system contain source or compiled components or whatever and where you want to keep source files or compiled units, hit Go and your system gets rearranged for you, while updating your Delphi environment so that when you start Delphi, it finds everything it should.
I've just recently discovered a way to have project-specific environment variables in Delphi builds using XE6. It's not quite as good as a full-blown #define like in C, but at least I can now have consistent search paths across multiple projects and create some shared option sets.
What I've done is set up environment variables in the same manner as the original poster and then override them in the .dproj or option set.
The BuildPaths.optset added to the project looks like this:
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
    <PropertyGroup>
        <SVN_Root>..\..\..</SVN_Root>
        <SVN_Riemann>$(SVN_Root)\Riemann</SVN_Riemann>
        <SVN_Library>$(SVN_Root)\Library</SVN_Library>
        <SVN_ThirdParty>$(SVN_Library)\Third Party</SVN_ThirdParty>
    </PropertyGroup>
    <ProjectExtensions>
        <Borland.Personality>Delphi.Personality.12</Borland.Personality>
        <Borland.ProjectType>OptionSet</Borland.ProjectType>
        <BorlandProject>
            <Delphi.Personality/>
        </BorlandProject>
        <ProjectFileVersion>12</ProjectFileVersion>
    </ProjectExtensions>
</Project>
