This seems to be the most reliable in-process data store I've found. I tried a few things locally (SIGKILL, SIGTERM, System.exit(), etc. in the middle of a transaction) and Xodus could pick up from its last good state.
I'm interested to know whether Xodus supports storing data over NFS (using an NFS folder as the environment)? Is it possible to corrupt the datastore if file locking does not work well, as with some NFS setups, when multiple processes open the same folder from different hosts?
I took a quick look at the lock file (xd.lck, well, at least it looks like a lock file to me), which seems to include a PID, a host name, and a call stack for the LockingManager. However, I'm not sure how this lock file works with Xodus. I found that this file is not removed after the environment closes, nor does its content change.
It's not recommended to use any kind of remote or removable storage for hosting database files. The database can easily be corrupted, not only on attempts at shared access but also due to possible connectivity issues. In upcoming versions (released after 1.3.232), an attempt to use remote or removable storage will fail if and where it can be reliably detected.
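To give an idea of what "reliably detected" can mean in practice, here is a minimal sketch of one possible detection approach; it is my own illustration, not Xodus's actual check. It inspects /proc/mounts on Linux and refuses to open an environment folder that sits on a network filesystem. The directory path and the set of filesystem types are assumptions.

# Sketch only: one way to detect that a directory lives on a network filesystem
# (Linux). This is NOT how Xodus itself does it; it just illustrates the idea.
import os

NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "fuse.sshfs"}

def mount_fs_type(path):
    """Return the filesystem type of the mount containing `path`."""
    path = os.path.realpath(path)
    best_match, fs_type = "", "unknown"
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _device, mount_point, fstype, *_ = line.split()
            # Keep the longest mount point that is a prefix of the path.
            if path.startswith(mount_point) and len(mount_point) > len(best_match):
                best_match, fs_type = mount_point, fstype
    return fs_type

def looks_remote(path):
    return mount_fs_type(path) in NETWORK_FS_TYPES

if __name__ == "__main__":
    env_dir = "/var/data/xodus-environment"  # hypothetical environment folder
    if looks_remote(env_dir):
        raise SystemExit("Refusing to open environment on network storage: " + env_dir)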
You know, when an application opens a file and writes to it, the system chooses which clusters the data will be stored in. I want to choose them myself! Let me tell you what I really want to do... In fact, I don't necessarily want to write anything. I have an HDD with a bad range of clusters in the middle, and I want to mark that space as occupied by a file, and eventually set it as a hidden, unmovable, system file (like the page file in Windows) so that it won't be accessed anymore. Any ideas on how to do that?
Later Edit:
I think THIS is my last hope. I just found it, but I need to investigate... Maybe a file could be created anywhere and then relocated to the desired cluster. But that requires writing, and the function may fail if that cluster is bad.
I believe the answer to your specific question: "Can I write a file to a specific cluster location" is, in general, "No".
The reason for that is that the architecture of modern operating systems is layered so that the underlying disk store is accessed at a lower level than you can access, and of course disks can be formatted in different ways so there will be different kernel mode drivers that support different formats. Even so, an intelligent disk controller can remap the addresses used by the kernel mode driver anyway. In short there are too many levels of possible redirection for you to be sure that your intervention is happening at the correct level.
If you are talking about Windows - which you haven't stated but which appears to be assumed - then you need to be looking at storage drivers in the kernel (see https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/). I think the closest you could reasonably come would be to write your own Installable File System driver (see https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/_ifsk/). This is really a 'filter' as it sits in the IO request chain and can intercept and change IO Request Packets (IRPs). Of course this would run in the kernel, not in userspace, and normally this would be written in C, and I note your question is tagged for Delphi.
Your IFS driver can sit at different levels in the request chain. I have used this technique to intercept calls to specific file system locations (paths / file names) and alter the IRP so as to virtualise the request - even calling back to user space from the kernel to resolve how the request should be handled. Using the provided examples, implementing basic functionality with an IFS driver is not too involved because it's a filter and not a complete storage system.
However the very nature of this approach means that another filter can also alter what you are doing in your driver.
You could look at replacing the file system driver that interfaces to the hardware, but I think that's likely to be an excessive task under the circumstances ... and as pointed out already by #fpiette the disk controller hardware can remap your request anyway.
In the days of MSDOS the access to the hardware was simpler and provided by the BIOS which could be hooked to allow the requests to be intercepted. Modern environments aren't that simple anymore. The IFS approach does allow IO to be hooked, but it does not provide the level of control you need.
EDIT: regarding the OP's suggestion of using FSCTL_MOVE_FILE
For a simple environment this may well do what you want; it is designed to support a defragmentation process.
However I still think there's no guarantee that this actually will do what you want.
You will note that the page you have linked to states that it moves "one or more virtual clusters of a file from one logical cluster to another within the same volume".
This is a control code that's passed to the underlying storage drivers which I have referred to above. What the storage layer does is up to the storage layer and will depend on the underlying technology. With more advanced storage there's no guarantee this actually addresses the physical locations which I believe your question is asking about.
However, that's entirely dependent on the underlying storage system. For some types of storage, relocation by the OS may not be honoured in the same way. As an example, consider an enterprise storage array that has a built-in data-tiering function. Without the OS being aware of it, data will be relocated within the storage based on the tiering algorithms. Also consider that there are technologies which allow data to be directly accessed (like NVMe) and that you are working with 'virtual' and 'logical' clusters, not physical locations.
However, you may well find that in a simple case, with support in the underlying drivers and no remapping done outside the OS and kernel, this does what you need.
Since your problem is to mark bad clusters, you don't need to write any program. Use the command line utility CHKDSK that Windows provides.
In an elevated command prompt (Run as administrator), run the command:
chkdsk /r c:
The check will be done on the next reboot.
Don't forget to read the documentation.
I have a node app that allows people to upload their profile picture. Profile pictures are stored on the file system.
I now want to turn my node app into a docker container.
I would like to be able to deploy it pretty much anywhere (Amazon, etc.) and realise that storing files within the file system is a no-go.
So:
Option 1: store files on Amazon's S3 (or something equivalent)
Option 2: creating a "data volume". This makes me wonder: if I deploy this remotely, will this work? Would this be a long-term way to go about it?
Are volumes what I want to do here? Is this how you use docker volumes in Amazon?
(Damn this stuff is hard to crack...)
The answer is: that depends hehehe
Option 1 is good, resilient, and works just "out-of-the-box", but creates a vendor lock-in. Meaning that if you ever decide to stop using AWS, you'll have some code to refactor. Plus, the bills for S3 will be high if you perform lots and LOTs of requests. (A minimal upload sketch follows after this list.)
Option 2 works partially: your docker containers will likely run on VMs on AWS, Azure, etc., which are also ephemeral. Meaning your data can disappear just as quickly as if it were inside the containers, unless you back it up.
Other options I know:
Option 3: AWS has an NFS service (Elastic File System, EFS, if I remember the name right), which seems VERY interesting. In theory, it would be like plugging a USB storage drive into the VM instances, which can then be mounted as volumes inside the containers. I have never done this myself, but it seems viable to reduce S3 costs. However, this also creates vendor lock-in.
Option 4: AWS S3 with a caching mechanism for files in the VMs.
If you are just testing, option 1 sounds good! But later on you might have to work on that.
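To make Option 1 concrete, here is a minimal sketch of the upload path. Your app is Node, but the shape is the same in any language; this is Python with boto3, the bucket name and key layout are placeholders, and it assumes AWS credentials are available through the usual environment variables or instance role.

# Sketch of Option 1: store uploaded profile pictures in S3 instead of the
# container's filesystem. Bucket name and key layout are placeholders.
import mimetypes
import boto3

s3 = boto3.client("s3")
BUCKET = "my-profile-pictures"  # hypothetical bucket

def save_profile_picture(user_id, local_path):
    """Upload a local file to S3 and return the object key to store in your DB."""
    content_type = mimetypes.guess_type(local_path)[0] or "application/octet-stream"
    key = "profile-pictures/{}.jpg".format(user_id)
    s3.upload_file(local_path, BUCKET, key, ExtraArgs={"ContentType": content_type})
    return key

The container then only needs the credentials and the bucket name as configuration; nothing has to persist on the host.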
I'm developing a site-specific installation for an office lobby which will display content on 6 iPads. The installation has several megabytes of data which will be managed by a django webapp. I'm considering different strategies for fetching the content data from the web app. So far, I have simply been dumping the data in to xml format and fetching it via a single http request from the iPad to the content server. I then load all of the content in to memory on the iPad.
I'm beginning to have some concern that I may run into memory issues as the amount of content grows, and that storing the entire database in memory won't work. The natural next step is to think about a database on the iPads. I'm using SQLite for the content server. It seems to me that it may be feasible to simply download the entire database file itself and query it directly from the iPad.
Proposed Approach
Download the actual SQLite database file nightly from the django content server to each of the six iPads used in an office lobby installation.
Things I like about this approach:
It could be really simple. It removes the whole web services layer from the system.
It protects against network problems nicely. If the network is unavailable, the worst problem is that the iPads display stale data, as opposed to there being no content at all if the system were network-dependent.
Things I don't like about this approach
I'm not sure how to safely download the file. How do I ensure that the file I'm downloading is in a valid state, and that I'm not downloading while someone is updating it?
I've never heard of anybody doing this, or even considered doing it. It seems like it's far from tried and true.
My questions
Can anyone think of reasons why this is a bad idea?
How can I safely download a SQLite file with confidence that it's in a valid state? (I sketch one idea I'm considering just below.)
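One idea I've had for the "valid state" problem, not yet tried: have the content server publish a snapshot of the live database via the sqlite3 backup API and then swap it into place with an atomic rename, so the iPads only ever download a complete, consistent file. Roughly (the paths are made up, and it assumes Python 3.7+ on the server):

# Sketch: publish a consistent snapshot of the live SQLite database for download.
# Would run on the django content server, e.g. from a nightly cron job.
import os
import sqlite3

LIVE_DB = "/srv/content/content.sqlite3"                    # hypothetical live db
SNAPSHOT = "/srv/content/static/content-snapshot.sqlite3"   # what the iPads fetch

def publish_snapshot():
    tmp_path = SNAPSHOT + ".tmp"
    src = sqlite3.connect(LIVE_DB)
    dst = sqlite3.connect(tmp_path)
    try:
        # The backup API takes a read-consistent copy even if writers are active.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    # Atomic rename: a downloader sees either the old snapshot or the new one,
    # never a half-written file.
    os.replace(tmp_path, SNAPSHOT)

if __name__ == "__main__":
    publish_snapshot()

On the iPad side I would likewise download to a temporary file and only swap it in once the download completes.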
Why don't you create a syncing system, perhaps with JSON?
I've done something like this before - I had a central repository server on site that was running my Django web application. The different iPads would sync regularly with the web app's database, making sure their local data matched the server data; if not, they would update via JSON.
On the iPad itself, I was using PhoneGap's SQLite API, which worked perfectly for storing the client-side data. But the key was syncing this database via JSON to the central repository's database, rather than physically moving the SQLite db over to the iPad.
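The server side of that sync can be a very small Django view that returns everything modified since the client's last sync timestamp. The model name and fields below are made up for illustration; the point is just the shape of the endpoint.

# Sketch of a delta-sync endpoint in Django. ContentItem and its fields are
# hypothetical; assume the model has an updated_at DateTimeField.
from django.http import JsonResponse
from myapp.models import ContentItem

def changes_since(request):
    """Return all content rows modified after the client's last sync timestamp."""
    since = request.GET.get("since", "1970-01-01T00:00:00+00:00")
    rows = ContentItem.objects.filter(updated_at__gt=since).values(
        "id", "title", "body", "updated_at"
    )
    return JsonResponse({"items": list(rows)})

Each iPad stores the timestamp of its last successful sync and applies the returned rows to its local SQLite database.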
I have an application that connects to a database and can be used in multi-user mode, whereby multiple computers can connect to the same database server to view and modify data. One of the clients is always designated to be the 'Master' client. This master also receives text information from either RS232 or UDP input and logs this data every second to a text file on the local machine.
My issue is that the other clients need to access this data from the Master client. I am just wondering the best and most efficient way to proceed to solve this problem. I am considering two options:
Write a folder synchronize class to synchronize the folder on the remote (Master) computer with the folder on the local (client) computer. This would be a threaded, buffered file copying routine.
Implement a client/server so that the Master computer can serve this data to any client that connects and requests the data. The master would send the file over TCP/UDP to the requesting client.
The solution will have to take the following into account:
a. The log files are being written to every second. It must avoid any potential file locking issues.
b. The copying routine should only copy files that have been modified at a later date than the ones already on the client machine.
c. Be as efficient as possible
d. All machines are on a LAN
e. The synchronization need only be performed, say, every 10 minutes or so.
f. The amount of data is only in the order of ~50MB, but once the initial (first) sync is complete, then the amount of data to transfer would only be in the order of ~1MB. This will increase in the future
Which would be the better method to use? What are the pros/cons? I have also seen the Fast File Copy post, which I am considering using.
If you use a database, why does the "master" write data to a text file instead of to the database, if that data needs to be shared?
Why reinvent the wheel? Use rsync instead. There's a package for Windows: cwRsync.
For example, install an rsync server on the Master machine, and on the client machines install rsync clients or simply drop the rsync files into your project directory. Whenever needed, your application on a client machine can execute rsync.exe, requesting to synchronize the necessary files from the server (a minimal sketch of this follows below).
In order to copy open files you will need to setup Windows Volume Shadow Copy service. Here's a very detailed description on how the Master machine can be setup to allow copying of open files using Windows Volume Shadow Copy.
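If the application itself should trigger the sync on your 10-minute schedule, it only has to shell out to rsync; the same call can be made from Delphi, but here is a Python sketch just to show the flags. The host name and the rsync daemon module name are placeholders, and it assumes the Master exports the log folder through an rsync daemon (e.g. cwRsync).

# Sketch: trigger rsync periodically instead of writing a custom folder-sync class.
# "master-pc" and the "logs" module are placeholders for your own daemon config.
import subprocess
import time

MASTER = "rsync://master-pc/logs/"   # hypothetical rsync daemon module on the Master
LOCAL_DIR = "C:/LogMirror/"

def sync_once():
    subprocess.run(
        [
            "rsync",
            "-az",        # archive mode (preserves timestamps) plus compression
            "--update",   # don't overwrite files that are newer on the client side
            MASTER,
            LOCAL_DIR,
        ],
        check=True,
    )

if __name__ == "__main__":
    while True:
        sync_once()
        time.sleep(600)  # requirement (e): sync roughly every 10 minutes

rsync only transfers files (and parts of files) that changed, which covers requirements (b), (c) and (f) without any extra code.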
Write a web service interface, so that the clients can connect to the server and pull new data as needed. Or, you could write it as a subscribe/push mechanism so that clients connect to the server, "subscribe", and then the server pushes all new content to the registered clients. Clients would need to fully sync (get all changes since the last sync) when registering, in case they were offline when updates occurred. (A rough client-side sketch of the pull variant follows at the end of this answer.)
Both solutions would work just fine on the LAN; the choice is yours. You might also want to consider these issues related to the technology you choose:
Deployment flexibility. Using file shares and file copy requires file sharing to work, and all LAN users might gain access to the log files.
Longer term plans: File shares are only good on the local network, while IP based solutions work over routed networks, including Internet.
The file-based solution would be significantly easier to implement compared to the IP solution.
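To illustrate the pull variant, here is a rough client-side sketch. The endpoint paths and the JSON manifest shape are invented; the idea is simply: ask the Master's web service what changed since the last sync, then download only those files, swapping each into place atomically.

# Sketch of a pull client. The /changes and /files endpoints and the manifest
# format are hypothetical; any small web service on the Master could provide them.
import json
import os
import urllib.parse
import urllib.request

MASTER = "http://master-pc:8080"   # hypothetical web service on the Master
LOCAL_DIR = "C:/LogMirror"
STATE_FILE = os.path.join(LOCAL_DIR, ".last_sync")

def read_last_sync():
    try:
        with open(STATE_FILE) as f:
            return f.read().strip()
    except FileNotFoundError:
        return "1970-01-01T00:00:00Z"

def pull_changes():
    since = read_last_sync()
    url = "{}/changes?since={}".format(MASTER, urllib.parse.quote(since))
    with urllib.request.urlopen(url) as resp:
        manifest = json.load(resp)  # e.g. {"now": "...", "files": ["2024/log1.txt"]}
    for name in manifest["files"]:
        target = os.path.join(LOCAL_DIR, name)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        tmp = target + ".part"
        urllib.request.urlretrieve("{}/files/{}".format(MASTER, urllib.parse.quote(name)), tmp)
        os.replace(tmp, target)     # never leave a half-downloaded file visible
    with open(STATE_FILE, "w") as f:
        f.write(manifest["now"])

if __name__ == "__main__":
    pull_changes()

The push/subscribe variant is the same idea inverted: the server keeps a list of registered clients and notifies them when new log data arrives.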
I need to synchronize a few directories/files within a cluster. Say a file's content changes on one node; I need to propagate the change to the other nodes so that the file content is the same at any point in time. The same applies when files/directories are deleted. DRBD is not an option, so is there any library which can do this for me?
I'd consider using rsync :) A handy tool for syncing between remote hosts.
You need to use a distributed filesystem (GlusterFS comes to mind) that can guarantee synchronization and locking depending on the cluster's usage of the files. Otherwise, you may want to consider centralized storage served via NFS for simplicity. Beyond that, but still centralized, would be a SAN filesystem like GFS, but be aware that its setup requires more, such as fencing.
Have you considered NFS? SMB? If the updates don't have to be immediate, you could consider rsync.