I've had the idea for a procedurally generated virtual drive bouncing around my head for a long time. It wouldn't have any practical uses - it would just be for the memes - but I've finally settled down and decided to make it.
The idea is to make a fake drive where, whenever a program asks for a sector, a bit of code generates that sector on the fly (instead of reading it from a storage medium like a normal drive). Of course, writing to the disk would be impossible, but that's OK - I'm just doing this for fun.
Question is: how do I actually make this appear as a drive?
I should think there's a library out there somewhere that will let you do this directly, but I haven't been able to find it yet. I don't really know what keywords I should be searching for, either.
I have a lot of experience with Arduinos and hardware - would it be easier to simply connect the SPI pins to an SD card slot and get the Arduino to generate the "sectors"?
I'm thinking of using this to play around with file systems and ridiculously large files - after all, there is no limit to how big the drive can be (since it doesn't require any real storage), apart from 32-bit or 64-bit addressing limits. That could be a lot of fun - if only to pretend you have a zettabyte of disk space. Even though it would be read-only, I'm curious how Notepad would handle a petabyte .txt file.
If anyone has any ideas on how to do this, or knows some better keywords I can use to search for this, let me know!
(I am fairly fluent in Python, Arduino, and I can do a little C if I sit down with a coffee)
You didn't specify how you wanted to tackle this, but on Linux there's FUSE, the Filesystem in Userspace. It's a library that lets you implement functions that handle the usual filesystem operations, i.e. enumerating files and directories, opening files, and reading/writing to them.
It also has a Python wrapper and a tutorial.
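If you go the FUSE route, here's roughly what a procedurally generated read-only filesystem could look like. This is a minimal sketch assuming the fusepy wrapper (pip install fusepy); the mount point, file name, and the byte-generation rule are all placeholders - your generator goes in read().

```python
# A minimal read-only FUSE filesystem sketch using the fusepy wrapper.
# It exposes one enormous virtual file whose contents are generated on
# the fly instead of being read from disk.
import errno
import stat
from fuse import FUSE, FuseOSError, Operations

VIRTUAL_SIZE = 2 ** 50  # pretend the file is a petabyte

class ProceduralFS(Operations):
    def getattr(self, path, fh=None):
        if path == '/':
            return dict(st_mode=stat.S_IFDIR | 0o755, st_nlink=2)
        if path == '/huge.txt':
            return dict(st_mode=stat.S_IFREG | 0o444, st_nlink=1,
                        st_size=VIRTUAL_SIZE)
        raise FuseOSError(errno.ENOENT)

    def readdir(self, path, fh):
        return ['.', '..', 'huge.txt']

    def read(self, path, size, offset, fh):
        # Generate the requested byte range procedurally: here every
        # byte is simply derived from its absolute offset in the file.
        return bytes((offset + i) % 256 for i in range(size))

if __name__ == '__main__':
    FUSE(ProceduralFS(), '/tmp/fakedrive', foreground=True, ro=True)
```

Note this fakes a filesystem rather than a raw block device, but since every read() call hands you an (offset, size) pair, it gives you exactly the "generate a sector on demand" behaviour you described.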
I am looking to speed up the reading of a data file which has been converted from binary to plaintext (it is my understanding that "binary" can mean a lot of different things - I do not know what type of binary file I have, just that it's a binary file). I looked into reading files quickly a while ago, and was informed that reading/parsing a binary file is faster than text. So, I would like to parse/read the binary file (the one that was converted to plaintext) in an effort to speed up the program.
I'm using Matlab for this project (I have a Matlab "program" that needs the data in the file). I guess I need some information on the different "types" of binary, but what I really want is information on how to read/parse a binary file (I know what I'm looking for in plaintext, so I imagine I'll need to convert that to binary, search the file, then pull the result out as plaintext). The file is a log file, if that helps in any way.
Thanks.
There are several issues in what you are asking, but the main one is that you need to know the format of the file you are reading. If you can say "at position xx, I can expect to find data yy", that's what you need to know. In your question/comments you talk about searching for strings. You can also do it much like you would with a text file: "when I find xxxx in the file, give me the following data up to the nth character, or up to the next yyyy".
You want to look at the documentation for fread. The documentation contains snippets of code that will get you started, but as I (and others) said, you need to know the format of your binary files. You can use a hex editor to ascertain some information if you are desperate, but it should be quicker to consult the documentation for the program that outputs these files.
Regarding different "binary files", well, there is least significant byte first or LSB last. You really don't need to know about that for this work. There are also other platform-dependent issues which I am almost certain you don't need to know about (unless you are moving the binary files from Mac to PC to unix machines). If you read to almost the bottom of the fread documentation, there is a section entitled "Reading Files Created on Other Systems" which talks about the issues and how to deal with them.
Another comment I have to make: you say that "reading/parsing a binary file is faster than text". This is not true (or even if it is, odds are you won't notice the performance gain). In terms of development time, however, reading/parsing a text file will save you huge amounts of time.
The simple way to store data in a binary file is to use the 'save' command.
If you load from a saved variable it should be significantly faster than if you load from a text file.
I am trying to deserialize an old file format that was serialized in Delphi using binary serialization. I know nothing about the structure of the file except some very high-level records that are in it.
What steps would you take to solve this problem? Any tools etc?
A good hex editor, and use your gray matter to identify structures.
If you get a hint what kind of file it is, you can search for more specialized tools.
Running the Unix/Linux "file" command can be good too (*). See Barry's comment below for how it works. It can be a quick check for common file types like DBF, ZIP, etc. hidden behind a different extension.
(*) There are third-party builds for Windows, but they might lag behind in version. If you can run it on a recent *nix distro, do so.
The serialization process simply loops over all published properties and streams their values to the file. If you do not know the exact classes that were streamed to the file, you will have a very hard time deserializing it (if it's possible at all).
A good hex editor comes first. If the file is read without buffering (e.g. read directly from a TFileStream), you can gain some information by using ProcMon from Sysinternals: you can see exactly what data is read in what chunks, and thus determine more quickly where the boundaries are between the structures you have already identified.
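As a complement to the hex editor: once you've spotted a recurring byte sequence (Delphi streaming often embeds class or property names in the data, for instance), a small script can list every occurrence so you can measure the gaps between records. A hypothetical sketch; the marker and file name below are invented:

```python
# List every offset of a recurring marker in a binary file; the gaps
# between hits often hint at a fixed record size or framing scheme.
marker = b'TCustomerRecord'  # hypothetical marker found in a hex editor

with open('legacy.dat', 'rb') as f:  # hypothetical file name
    data = f.read()

offsets = []
pos = data.find(marker)
while pos != -1:
    offsets.append(pos)
    pos = data.find(marker, pos + 1)

print(offsets)
# Distances between consecutive hits:
print([b - a for a, b in zip(offsets, offsets[1:])])
```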
Are there any differences or advantages to using a binary file versus an XML file with TClientDataSet?
Binary will be smaller and faster.
XML will be more portable and human readable.
The binary file will be a little smaller.
The main advantage of the XML format is that you can pass it around via HTTP(S).
Binary is smaller and faster, but only readable by TClientDataSets.
XML is larger and slower (though neither by much, i.e. not orders of magnitude bigger or slower).
XML is readable by people (not recommended in general, but it is doable) and by software.
Therefore it is more portable (as Nick wrote).
TClientDataSets can load and save their own style of XML, or you can use the Delphi XML Mapper tool to read and write any kind of XML.
XSLT can for instance be used to transform those XML files into any kind of text, including other XML, HTML, CSV, fixed columns, etc.
In contrast to what Tim indicates, both binary and XML can be transferred over HTTP and HTTPS. However, XML is often appreciated because it is easier to trace.
Without having tested it: I guess the binary format would be quite a lot faster when reading and writing. You'd better do your own benchmarks for that, though.
Another advantage of binary might be that it cannot easily be edited, which prevents people from mucking up the data outside the application.
When using Delphi 2009, we have noticed that if the file has an extension of .XML, it will not save in binary format over an existing dfXMLUTF8 format, even with a LoadFromFile, SaveToFile. Changing the file extension to something else (.DAT, for example) allows saving the file in dfBinary. Our experience is that the binary file, in addition to being somewhat more difficult for the end-user to manipulate (a plus!), is approximately 50% smaller than the dfXMLUTF8 format file.
I am looking for a text editor that will be able to load a 4+ gigabyte file into it. Textpad doesn't work. I own a copy of it and have been to its support site; it just doesn't do it. Maybe I need new hardware, but that's a different question. The editor needs to be free OR, if it's going to cost me, then no more than $30. For Windows.
glogg could also be considered, for a different usage:
Caveat (reported by Simon Tewsi in the comments, Feb. 2013):
One caveat - glogg has two search functions, Main Search and Quick Find. The lower one, which I assume is Quick Find, is at least an order of magnitude slower than the upper one, which is fast.
I've had to look at monster (runaway) log files (20+ GB). I used the free version of HexEdit, which can work with files of any size. It is also open source, and it's a Windows executable.
Jeff Atwood has a post on this here: http://www.codinghorror.com/blog/archives/000229.html
He eventually went with Edit Pad Pro, because "Based on my prior usage history, I felt that EditPad Pro was the best fit: it's quite fast on large text files, has best-of-breed regex support, and it doesn't pretend to be an IDE."
Instead of loading a gigantic log file in an editor, I use Unix command-line tools like grep, tail, gawk, etc. to filter the interesting parts into a much smaller file, and then I open that.
On Windows, try Cygwin.
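The same filter-then-open idea also works as a few lines of Python when grep and friends aren't at hand - the point is to stream the file line by line instead of loading it whole. The file names and search string below are placeholders:

```python
# Stream a multi-gigabyte log line by line (never holding it all in
# memory) and keep only the interesting lines in a small output file
# that any editor can open.
with open('huge.log', 'r', errors='replace') as src, \
     open('filtered.log', 'w') as dst:
    for line in src:
        if 'ERROR' in line:  # placeholder filter condition
            dst.write(line)
```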
Have you tried the ConTEXT editor? It is small and fast.
I stumbled on this post many times, as I often need to handle huge files (10+ GB).
After getting tired of buggy and rather limited freeware, and not being willing to pay for costly editors after their trials expired (not worth the money after all), I just used VIM for Windows with great success and satisfaction.
It is simply PERFECT for this need: fully customizable, with ALL the features one can think of when dealing with text files (searching, replacing, reading, etc. - you name it).
I am very surprised nobody suggested it (except a previous answer, but for MacOS)...
For the record, I stumbled on it in this blog post, which wisely advised it.
It's really tough to handle a 4 GB file as such. I used to handle larger text files, but I never used to load them into my editor. I mostly used UltraEdit in my previous company; now I use Notepad++, but I would get just those parts which I needed to edit. (In most cases, the files never needed an edit.)
Why do you want to load such a big file into an editor? When I handled files of this size, I used the GNU Core Utilities. The most common operations I performed on those files were head (to get the top 250k lines, etc.), tail, split, sort, shuf, uniq, and so on. They're really powerful.
There's a lot you can do with the GNU Core Utilities. I would definitely recommend those instead of a new editor.
Sorry to post on such an old thread, but I tried several of the tips here, and none of them worked for me.
It's slightly different than a text editor, but I found that Beyond Compare could handle an extremely large (3.6 GB) file on my Vista 32-bit machine.
This is a file that Emacs, Large Text File Viewer, HexEdit, and Notepad++ all choked on.
My favourite after trying a few to read a 6GB mysqldump file:
PilotEdit Lite http://www.pilotedit.com/
Because:
Memory usage (somehow?!) never went above 25 MB, so there was basically no impact on the rest of my system - though it took several minutes to open.
There was an accurate progress bar during that time, so I knew how it was getting on.
Once open, simple searching and browsing through the file worked just as well as with a small Notepad file.
It's free.
Others I tried...
The EmEditor Pro trial was very impressive - the file opened almost instantly - but it was unfortunately too expensive for my requirements.
EditPad Pro loaded the whole 6GB file into memory and slowed everything to a crawl.
For Windows, Unix, or Mac? On the Mac or *nix you can use command-line or GUI versions of Emacs or vim.
For the Mac: TextWrangler handles big files well. I'm not versed enough in the Windows landscape to help out there.
If you just want to view a large file rather than edit it, there are a couple of freeware programs that read files a chunk at a time rather than trying to load the entire file into memory. I use these when I need to read through large (> 5 GB) files.
Large Text File Viewer by swiftgear http://www.swiftgear.com/ltfviewer/features.html
Big File Viewer by Team Walrus.
You'll have to find the link yourself for that last one, because as a newbie I can only post a maximum of one hyperlink.
When I'm faced with an enormous log file, I don't try to look at the whole thing; I use Free File Splitter.
Admittedly this is a workaround rather than a solution, and there are times when you would need the whole file. But often I only need to see a few lines from a larger file and that seems to be your problem too. If not, maybe others would find that utility useful.
A viewer that lets you see enormous text files isn't much help if you are trying to get it loaded into Excel to use the Autofilter, for example. Since we all spend the day breaking down problems into smaller parts to be able to solve them, applying the same principle to a large file didn't strike me as contentious.
HxD -- it's a hex editor, but it allows in-place edits and doesn't barf on large files.
Tweak is a hex editor which can handle edits to very large files, including inserts and deletes.
EmEditor should handle this. As their site claims:
EmEditor is now able to open files even larger than 248 GB (or 2.1 billion lines) by opening a portion of the file with the new custom bar - the Large File Controller. The Large File Controller allows you to specify the beginning point, end point, and range of the file to be opened. It also allows you to stop the opening of the file and to monitor the real size of the file and the size of the temporary disk space available.
Not free, though.
I found that FAR Commander could open large files (I tried a 4.2 GB XML file).
It does not load the entire file into memory, and it works fast.
Opened a 5 GB file (quickly) with:
1) Hex Editor Neo
2) 010 editor
Textpad also works well at opening files that size. I have done it many times when having to deal with extremely large log files in the 3-5 GB range. Also, using grep to pull out the worthwhile lines and then looking at those works great.
The question needs more details.
Do you want just to look at a file (e.g. a log file), or do you want to edit it?
Do you have more memory than the size of the file you want to load, or less?
For example, TheGun, a very small text editor written in assembly language, claims to "not have an effective file size limit and the maximum size that can be loaded into it is determined by available memory and loading speed of the file. [...] It has been speed optimised for both file load and save."
To get around the memory limit, I suppose one could use memory mapping. But then, if you need to edit the file, some clever method would be needed, such as storing the local changes in memory and applying them chunk by chunk when saving. This might be inefficient in some cases (a big search/replace, for example).
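For the read-only case, the memory-mapping idea looks roughly like this - a sketch assuming you only need to search, not edit; the OS pages the file in on demand, so the file can be far larger than RAM. The file name and search string are placeholders:

```python
# Search a file far larger than RAM via memory mapping: the OS pages
# the contents in and out on demand instead of loading them up front.
import mmap

with open('huge.log', 'rb') as f:  # placeholder file name
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b'ERROR')    # search without loading the file
        if pos != -1:
            mm.seek(pos)
            print(mm.readline())   # print the matching line
```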
I have had problems with TextPad on 4G files too. Notepad++ works nicely.
Emacs can handle huge file sizes and you can use it on Windows or *nix.
What OS and CPU are you using? If you are using a 32-bit OS, then a process on your system physically cannot address more than 4 GB of memory. Since most text editors try to load the entire file into memory, I doubt you'll find one that will do what you want. It would have to be a very fancy text editor that can do out-of-core processing, i.e. load a chunk of the file at a time.
You may be able to load such a huge file if you use a 64-bit text editor on a computer with a 64-bit CPU and a 64-bit operating system. You also have to make sure that you have enough space in your swap partition or swap file.
Why do you want to load a 4+ GB file into memory? Even if you find a text editor that can do that, does your machine have 4 GB of memory? And unless it has a lot more than 4 GB of physical memory, your machine will slow down a lot and go swap-file crazy.
So why do you want to load a 4+ GB file? If you want to transform it, or do a search and replace, you may be better off writing a small, quick program to do it.
I also like Notepad++.