I've been confused by this topic several times and been bitten by it. I've looked up many best-practices articles and definitions, but it seems to me that path terminology isn't precisely defined, and what counts as an absolute or relative path varies a lot depending on the context.
Short version of the question: should I always carry around (and convert into) absolute paths in code, but always write them out as relative paths?
As a hobbyist I switch contexts constantly because I work on backend and frontend at the same time, and I haven't come up with rules for writing consistent, portable code where paths are involved.
Essentially there seem to be two types of paths: absolute paths and relative paths. But I constantly come upon a third type, which to me would probably be named an Absolute Relative Path. The challenge for me is to know which one I should store and what complications could arise.
This is an Absolute Path:
/home/me/css/style.css
But not on Windows. What would the above do on Windows? It has to be this:
C:/home/me/css/style.css
The absolute path that works on Linux refers to something completely different on Windows: it becomes relative to the current drive.
/home/me/css/style.css
becomes
C:/home/me/css/style.css
This happens to be correct in this case, but might not be if the application is running on a different drive. Using absolute paths "Unix style" is therefore problematic if portability is required.
This is an Absolute Path to a directory:
/home/me/css/
But this is also an Absolute Path to a directory:
/home/me/css
Adding the trailing slash therefore only conveys additional information to whoever reads the path, doesn't it?
This is a Relative Path:
css/style.css
And to a directory:
css/
This is an Absolute Relative Path if used in an HTML document:
<a href="/css/style.css">Link</a>
While it refers to the topmost css directory regardless of where the address bar currently points, it is still relative to the directory the web server is serving from. As such it is absolute in use, but relative with respect to the actual filesystem, or in many cases to the current working directory. The same is true for plain relative paths. If I create such paths I therefore have to keep in mind that even though they look like absolute paths, they are in fact not. Confusing.
If the application itself is moved to a different directory I therefore have to come up with a solution. And this is where I wonder if it is a bad idea to ever store Relative Paths. Say we write a static blog generator called blogger which is downloaded via git to some directory.
/home/me/blogger
All paths in the application are relative, we therefore call something akin to create_dir('output') in pseudocode. Now we might think it is a good idea to move the application to somewhere else and give the user the chance to supply the output directory himself. Suddenly all path definitions will break. They become create_dir(absolute($output_dir) + 'output').
I find this kind of calculation to become very long and complicated at some point. Especially if there are still some files or directories which stay relative to the application, like templates or similar. But they might also move in the future so I find myself writing code like copy_dir(cwd() + 'templates', absolute($output_dir) + 'output').
But wait, the current working dir is not necessarily the place where the application resides, especially on unix-like systems.
And how does absolute() generate an absolute path if the application doesn't know whether the path should be resolved against the application directory or against the current directory? We must assume the latter, so calling absolute() is not a good idea; rather cwd() + $output_dir. Thankfully cwd() usually returns an already absolute path. We can only do this if the path the user provided is actually relative, not if it is absolute. Thankfully standard libraries have ways to deal with that, but the naive approach would blow up and we would end up with something like "C:/C:/css".
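To make the blow-up concrete, here is a minimal Python sketch (my own illustration; the paths are hypothetical) of what a standard library typically does when the second part is already absolute, versus naive string concatenation:

import os

cwd = "/home/me/blogger"         # pretend this is cwd(); usually already absolute
user_output = "/var/www/output"  # the user supplied an *absolute* path

# Naive concatenation ignores that the second part is already absolute
# (on Windows this is where results like "C:/C:/css" come from):
naive = cwd + "/" + user_output           # "/home/me/blogger//var/www/output"

# os.path.join instead discards everything before an absolute component:
joined = os.path.join(cwd, user_output)   # "/var/www/output"

# A relative path is resolved against the first argument, as expected:
rel = os.path.join(cwd, "output")         # "/home/me/blogger/output"

print(naive, joined, rel, sep="\n")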
Assuming the user never provides absolute paths we could use:
copy_dir(absolute(app_dir()) + 'output', cwd() + $output_dir)
Then I say to myself: well, I have to do this every time I work with those files, so why not do it once and carry around $abs_output_dir instead? Suddenly I come across the problem that I have to output an Absolute Relative Path into an HTML document, and I can't use $abs_output_dir because I then somehow have to cut out the absolute part so it is relative to the user-supplied $output_dir. At this point my mind usually breaks down and I either hardcode every path, or I carry around $output_dir, $abs_output_dir and $rel_abs_output_dir as well as $abs_app_dir and try to keep them all up to date.
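One way out of juggling all those variables is to carry only absolute paths and derive the relative form at the single point where it is written into the HTML. A minimal Python sketch, with hypothetical paths standing in for the variables above:

import os

abs_output_dir = "/home/me/site/output"   # stand-in for $abs_output_dir
web_root       = "/home/me/site"          # directory the web server serves from

# Compute the relative form only where it is needed, instead of storing it:
rel = os.path.relpath(abs_output_dir, start=web_root)   # "output"
href = "/" + rel.replace(os.sep, "/") + "/style.css"    # "/output/style.css"
print(href)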
As such, the idea to only carry around absolute paths seems to be a good one, unless we ever come to the point where we have to output relative paths or Absolute Relative Paths to a file. Is that how programmers usually do it?
I'm trying to implement a couple of services using terraform and I am not quite sure how to efficiently handle variables (ideally the proper terraform way).
Let's say I want to spin up a couple of VMs in a couple of datacenters, one per datacenter, and every datacenter differs slightly (think AWS regions, VPC IDs, security group IDs, etc.).
Currently (in Ansible) I have a dict that contains one dict per region, holding the configuration specific to that region.
I would like to be able to deploy each datacenter on its own.
I have read through a lot of documentation and I came up with a couple of ways I could use to realise this.
1. use vars-files
have one vars-file per datacenter containing exactly the config per DC and call terraform -var-file ${file}
That somehow seems not that cool, but I'd rethink it if there were a way to dynamically load the vars-file according to the datacenter name I set.
2. use maps
have loads of maps in an auto-loaded vars-file and reference them by data-center-name.
I've looked at this and it does not look like it would stay readable in the future. It could work out if I create separate workspaces per datacenter, but since maps are string -> string only, I can't use lists.
3. use an external source
Somehow that sounds good, but since the docs already label the external data source as an 'escape hatch for exceptional situations' it's probably not what I'm looking for.
4. use modules and vars in .tf-file
Set up a module that does the work, set up one directory per datacenter, set up one .tf-file per datacenter-directory that contains the appropriate variables and uses the module
Seems the most elegant, but then I don't have one central config but lots of them to keep track of.
Which way is the 'proper' way to tackle this?
To at least provide an answer to anyone else that's got the same problem:
I went ahead with option 4.
That means I've set up modules that take care of orchestrating the services, defaulting all variables I use to reflect the testing environment (as in: if you don't specify anything extra you're setting up testing; if you want anything else you've got to override the defaults).
Then I set up three 'branches' in my directory tree, testing, staging and production, and added subdirectories for every datacenter/region.
Every region directory contains a main.tf that sources the modules; all but testing contain a terraform.tfvars that defines the overrides. I also have a backend.tf in all of those directories that defines the backends for state storage and locking.
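For anyone trying to picture that layout, a rough sketch (the region names are only examples):

environments/
├── testing/
│   ├── eu-west-1/
│   │   ├── main.tf        # sources the shared modules, defaults apply
│   │   └── backend.tf     # state storage and locking
│   └── us-east-1/
│       ├── main.tf
│       └── backend.tf
├── staging/
│   └── eu-west-1/
│       ├── main.tf
│       ├── terraform.tfvars   # overrides the testing defaults
│       └── backend.tf
└── production/
    └── eu-west-1/
        ├── main.tf
        ├── terraform.tfvars
        └── backend.tf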
I initially thought that doing it this way is a bit too complex and that I may be overengineering the problem, but it turned out that this solution is easier to understand and maintain.
On the iOS filesystem, is there a way to optimize file access performance by using a tiered directory structure vs. a flat directory structure?
Specifically, my app has Objects that each contain a number of images and data files. A user could create thousands of these Objects and I need to optimize access to one image for ~100 arbitrary Objects at a time.
In this situation, how should I organize files on the filesystem? Would a tiered directory structure be faster than a flat one? And if so, how should I structure the tiered system (i.e. how many tiers, and how many subdirectories / files per tier)?
THANKS!
Well, first of all you might as well try it with a flat structure to see if it is slow or not. Perhaps Apple has put in code to optimize how files are found and you don't even need to worry about this. You can probably build out the whole app, test how quickly it loads, and see if that meets your requirements.
If you need to speed it up, I would suggest making some sort of structure based on the name of the file. You could have a folder which holds all of the items beginning with the letter 'a', one for 'b', and so on. This would split them into 26 folders, which should significantly decrease the number of items in each. Depending on how you name the files you might want a different scheme, so that each folder ends up with a similar number of items in it.
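A minimal sketch of that first-letter bucketing in Python, just to show the shape of the idea (directory names are hypothetical, and on iOS you would express the same thing with FileManager/NSFileManager):

import os

def bucketed_path(base_dir, filename):
    # Pick a one-letter bucket from the file name; anything non-alphabetic
    # falls into a shared "_" bucket.
    first = filename[:1].lower()
    bucket = first if first.isalpha() else "_"
    folder = os.path.join(base_dir, bucket)
    os.makedirs(folder, exist_ok=True)
    return os.path.join(folder, filename)

# bucketed_path("/tmp/objects", "banana.png") -> "/tmp/objects/b/banana.png"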
If you are using Core Data, you could always just enable the Allows External Storage option on the attribute in your model and let the system decide where it should go.
That would be a decent first step to see if the performance is ok.
I am looking for information on tools, methods, and techniques for the analysis of file path names. I am not talking about file size, read/write times, or file types, but analysis of the path or URL itself.
I am only aware of basic word frequency text tools or methods, but I am wondering if there is something more advanced that people use/apply to this to try and mine extra information out of them.
Thanks!
UPDATE:
Here is the most narrow example of what I would want. OK, so I have some full path names as strings like this:
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File5.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File1.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File2.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File3.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File4.doc
F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File5.doc
What I want to know is that the folder MapShedMaps appears "uniquely" 2 times. If I do a frequency count on the strings I would get 10 appearances. The issue is that I don't know at what level in the directory tree this matters, so I would like a unique count at each level of the directory hierarchy, based on what I am describing.
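To make the requirement concrete, here is a small Python sketch of a per-level unique count (my own illustration, not a specific tool recommendation); with the ten paths above it reports MapShedMaps twice at its level rather than ten times:

import collections

paths = [
    r"F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc",
    r"F:\Task_Order_Projects\TO_02_NYS\Models\MapShedMaps\Random_File1.doc",
    # ... the remaining paths from the list above
]

# Collect every distinct directory prefix (drop the file name first).
unique_dirs = set()
for p in paths:
    parts = p.split("\\")[:-1]
    for depth in range(1, len(parts) + 1):
        unique_dirs.add(tuple(parts[:depth]))

# Count folder names per depth, once per distinct prefix.
per_level = collections.defaultdict(collections.Counter)
for prefix in unique_dirs:
    per_level[len(prefix) - 1][prefix[-1]] += 1

for depth in sorted(per_level):
    print(depth, dict(per_level[depth]))
# with the full list, the MapShedMaps level prints {'MapShedMaps': 2}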
This is an extremely broad question, so it is difficult for me to give you a per se "answer", but I will give you my first thoughts on it.
First,
the regular expression class of .NET is extremely useful for parsing large amounts of information. It is so powerful that it will easily confuse the impatient; however, once mastered, it can be used across text editors, .NET, and pretty much any other respectable language, I believe. This would allow you to search the strings and separate them into directories. It could be overkill depending on how you use it, but it's a thought.
Second,
You will need a database; I prefer to use SQL. Look into how to connect to databases and create them. With this database you can store all the fields abstracted from the original path entered, such as the parent directory, child directory, and common file types accessed. Just have a field for each one of these, and through queries you can form a hypothesis about redundancy.
Third,
I don't know if it's easily accessible, but you might look into whether Windows stores a history of accessed files. It seems to have some inkling as to which files have been opened in the past, so there may be a resource in Windows which already stores much of the information you would otherwise be storing in your database. If you could find a way to access this information, parse it with regular expressions and resubmit it to your application's database, you could control the WORLD! j/k... You could get a pretty good prediction of user access patterns, though.
Fourth,
I always try to stick with what I have available. If .NET is sitting in front of you, hammer away at what you're trying to do; even if you reach a wall, at least you're making forward progress. With today's move toward object-oriented programming, you can usually change data collected by one program into an acceptable format for another. You just gotta dig a little.
Oh, and by the way, Coursera.com is actually doing a free class on machine learning and algorithms. You might want to check it out or reference it for prediction formulas.
Good Luck.
I wanted to post this as a comment, but SO kept collapsing the double \\ into a single \, and it is important that there are two: \ is a special character in regex, and without another \ to escape it, regex will interpret it as a command.
Hey, I just wanted to let you know I've been playing with some regex... I know a pretty easy way to code this up in VB.net and I'll post that as my second answer, but I wanted you to check out back-references. If the part between parentheses matches, it captures that text and moves on to the next group, for instance...
F:\\(directory1)?(directory2)?(directory3)?
You could use these matches to find out how many directories each parent directory has under it. Are you following me?
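The same capture-group idea, sketched in Python rather than VB.net (the pattern is my own generic version, matching any path component instead of literal directory names):

import re

path = r"F:\Task_Order_Projects\TO_01_NYS\Models\MapShedMaps\Random_File1.doc"

# Note the doubled backslashes: \ must itself be escaped inside the pattern.
pattern = r"F:\\([^\\]+)\\([^\\]+)\\([^\\]+)"
match = re.match(pattern, path)
if match:
    print(match.groups())   # ('Task_Order_Projects', 'TO_01_NYS', 'Models')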
I've seen that there are unofficial fan-made English translation patches for several Japanese games. I can see that the Japanese strings in the program have to be translated into English, and any Japanese text in the textures has to be modified as well.
Now, I am wondering: what are some of the tools they use to find these resources, and how do they modify the binary and other assets while still keeping the game working?
Generally, this is done using software that can extract and modify resources in an executable file and modify them in place. Depending on the specific application and/or operating system, this approach will allow you to modify icons, menus, strings, and the labels on UI controls, among other things.
A common utility for this purpose was Resource Hacker by Angus Johnson. However, it is no longer under active development, and has not been released as open source. Other alternatives include:
XN Resource Editor
Resource Hacker FX
For example, Resource Hacker can be used to modify one of the dialog boxes used by the 7-Zip File Manager application.
A hexadecimal editor of your choice can also be used to make modifications to the raw binary data of the compiled executable file. This can allow you to change strings that haven't been placed into a string table for easy modification.
It's worth noting that this is a much more error-prone way of making modifications. It's extremely easy to corrupt the binary by overwriting the wrong sequence. Generally, you must replace a string with another string of exactly the same length.
And, of course, always work on a copy of the original executable!
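As a very rough illustration of the "same length only" rule (file name and strings are hypothetical, and real games often store text in Shift-JIS or UTF-16, so the encoding would have to match), a Python sketch:

old = "Hello".encode("ascii")
new = "Howdy".encode("ascii")
assert len(new) == len(old), "replacement must be exactly the same length"

# Always patch a copy, never the original executable.
with open("game_copy.exe", "rb") as f:
    data = f.read()

offset = data.find(old)
if offset != -1:
    patched = data[:offset] + new + data[offset + len(old):]
    with open("game_copy.exe", "wb") as f:
        f.write(patched)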
Using Rails, is there a reason why I should store attachments (which could be a file of any type) in the filesystem instead of in the database? The database seems simpler to me: no need to worry about filesystem paths, structure, etc., you just look in your blob field. But most people seem to use the filesystem, which leaves me guessing that there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using Postgres.)
This is a pretty standard design question, and there isn't really a "one true answer".
The rule of thumb I typically follow is "data goes in databases, files go in files".
Some of the considerations to keep in mind:
1. If a file is stored in the database, how are you going to serve it out via HTTP? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that stuff for you, very quickly and efficiently (perhaps even in kernel space), with no interpreted code needed. (See the sketch after this list.)
2. Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up, etc. Why make your database huge when you don't have to?
3. Much like 2, it's really easy to copy files to multiple machines. Say you're running a cluster: you can just periodically rsync the filesystem from your master machine to your slaves and use standard static HTTP serving. Obviously databases can be clustered as well, it's just not necessarily as intuitive.
4. On the flip side of 3, if you're already clustering your database, then having to deal with clustered files in addition is administrative complexity. This would be a reason to consider storing files in the DB, I'd say.
5. Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.
6. On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.
7. Accessibility. Files in a filesystem can be opened with regular tools: Vi, Photoshop, Word, whatever you need. This can be convenient. How are you gonna open that Word document out of a blob field?
8. Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.
9. Caching (from Sarah Mei below). This plays into the HTTP question above on the client side (are you going to remember to set lifetimes correctly?). On the server side, files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.
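To illustrate point 1, here is a framework-neutral sketch (a bare WSGI handler in Python, not Rails; the row fields are hypothetical) of the headers you end up setting by hand when the file comes out of a blob column:

def serve_attachment(row, start_response):
    # 'row' is a hypothetical database record with blob, content_type and filename.
    body = row["blob"]
    headers = [
        ("Content-Type", row["content_type"] or "application/octet-stream"),
        ("Content-Length", str(len(body))),
        ("Content-Disposition", 'attachment; filename="%s"' % row["filename"]),
    ]
    start_response("200 OK", headers)
    return [body]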
In short, people tend to use filesystems for files because they support file-like idioms the best. There's no reason you have to do it though, and filesystems are becoming more and more like databases so it wouldn't surprise me at all to see a complete convergence eventually.
There's some good advice here about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be put out in a file; it has to be put into the DB for security reasons. You can't leave it in a filesystem for a user on the server/machine to look at or take with them without proper security. Using a high-class DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to it.
But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.
The DB is pretty easy to set up for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, the filesystem is the better way to go overall, IMO, but the DB is the only viable choice when security or sensitive data is a major concern.
I don't see what the problem with blobstores is. You can always reconstruct a file system store from it, e.g. by caching the stuff to the local web server while the system is being used.
But the authoritative store should always be the database. Which means you can deploy your application by tossing in the database and exporting the code from source control. Done.
And adding a web server is no issue at all.
Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.
If you use a plugin such as Paperclip, you don't have to worry about anything either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!
Unable to find an up-to-date answer to this question, I have implemented a database service for Active Storage (available since Rails 5.2) that works just like any other Active Storage service, but stores file content in a special database column instead of a cloud service.
The implementation is based on a standard Rails Active Storage service, adding a migration with a new model: an extra table that stores blob contents in a binary field. The service creates and destroys records in this table as requested by Active Storage.
Therefore, this service, once installed, can be consumed via a standard Rails Active Storage API.
https://github.com/TitovDigital/activestorage-database-service
Please be aware of all pros and cons of using a database for storing files.
With the right database it will provide full ACID support and can wrap file storage and deletion into transactions. It is also much easier in DevOps as there is one less service to configure.
Large files or large traffic are the risky cases. Either will put an unnecessary strain on the app and database servers.