How is data stored on the storage device in IPFS?

I am going through the concepts of IPFS. One of its important aspects is Bitswap, which deals with how blocks of data are requested using wantlists.
My question is with regards to once a peer gets the wantlists from other peers,
how does it actually fetch the data from the actual storage device?
What are the steps involved in it?
How does the conversion happen with respect to different storage protocols, based on the Bitswap requests?
Please help me with these answers.

I'm still learning, so questions like this are a good opportunity to dig deeper :)
"How does it actually fetch the data from the actual storage device? What are the steps involved in it?"
Based on the Bitswap API docs, it looks like Bitswap operates on a provided libp2p instance and a blockstore instance.
The blockstore instance is an abstraction over the actual data storage, which could be a software abstraction of anything: a storage service like S3, a virtualized device, or a real device.
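To make the blockstore idea concrete, here is a minimal sketch in Python. It is not the js-ipfs API: the `Blockstore` interface, the two backends, `block_key` (a plain SHA-256 hex digest standing in for a real CID) and `serve_wantlist` are all hypothetical names invented for illustration. It only shows how code answering a wantlist can stay unaware of what the storage behind the interface actually is.

```python
import hashlib
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Dict, List, Optional


def block_key(data: bytes) -> str:
    # Stand-in for a real CID: the hex SHA-256 digest of the block.
    return hashlib.sha256(data).hexdigest()


class Blockstore(ABC):
    """Hypothetical interface: maps a content key to raw block bytes."""

    @abstractmethod
    def put(self, data: bytes) -> str:
        """Store a block and return its key."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the block for this key, or None if we do not have it."""


class MemoryBlockstore(Blockstore):
    """Keeps blocks in a dict; nothing touches a disk."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = block_key(data)
        self._blocks[key] = data
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


class DirectoryBlockstore(Blockstore):
    """Keeps one file per block in a directory on a real file system."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes) -> str:
        key = block_key(data)
        (self._root / key).write_bytes(data)
        return key

    def get(self, key: str) -> Optional[bytes]:
        path = self._root / key
        return path.read_bytes() if path.is_file() else None


def serve_wantlist(store: Blockstore, wantlist: List[str]) -> Dict[str, bytes]:
    """Return the wanted blocks this peer has, whatever the backend is."""
    found: Dict[str, bytes] = {}
    for key in wantlist:
        data = store.get(key)
        if data is not None:
            found[key] = data
    return found
```

Swapping `MemoryBlockstore` for `DirectoryBlockstore` (or an S3-backed class) changes nothing in `serve_wantlist`, which is the point of the abstraction.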
Based on the configuration bits I've read, though, fetching could be done over whichever transports the libp2p instance was configured with and that the connected node also supports (on a per-node basis).
Assuming multiple transports are supported on both ends between two nodes, I don't know how the best connection is negotiated or dictated by libp2p...
"How does the conversion happen with respect to different storage protocols based on the bitswap requests."
IIUC, at the block level there would not be any conversion happening - that would happen at a higher level in the stack (IPLD).
I read through these to get a better understanding:
Bitswap spec
JS-IPFS Bitswap implementation
JS-IPFS Blockservice

Related

Why are read-only nodes called read-only in the case of data store replication?

I was going through the article, https://learn.microsoft.com/en-us/azure/architecture/patterns/cqrs which says, "If separate read and write databases are used, they must be kept in sync". One obvious benefit I can understand from having separate read replicas is that they can be scaled horizontally. However, I have some doubts:
1. It says, "Updating the database and publishing the event must occur in a single transaction". My understanding is that there is no guarantee that the updated data will be available immediately on the read-only nodes because it depends on when the event will be consumed by the read-only nodes. Did I get it correctly?
2. Data must first be written to read-only nodes before it can be read, i.e. write operations are also performed on the read-only nodes. Why are they called read-only nodes? Is it because the write operations are performed on these nodes not directly by the data producer application, but rather by some serverless function (e.g. AWS Lambda or Azure Function) that picks up the event from the topic (e.g. Kafka topic) to which the write-only node has sent the event?
3. Is the data sharded across the read-only nodes or does every read-only node have the complete set of data?
All of these have "it depends"-like answers...
1. Yes, usually, although some implementations might choose to (try to) update read models transactionally with the update. With multiple nodes you're quickly forced to learn the CAP theorem, though, and so in many CQRS contexts, eventual consistency is just accepted as a feature, as the gains from tolerating it usually significantly outweigh the losses. I suspect the bit you quoted actually refers to transactionally updating the write store together with publishing the event. Even this can be difficult to achieve, and is one of the problems event sourcing seeks to solve.
2. Yes. It's trivially obvious - in this context - that data must be written before it can be read, but your apps, as consumers of the data, see the nodes as read-only.
3. Both are valid outcomes. Usually this part is less an application concern and is delegated to the capabilities of your chosen read-model infrastructure (Mongo, Cosmos, Dynamo, etc).
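As a rough illustration of the eventual-consistency point, here is a small in-memory sketch in Python. `WriteStore`, `ReadModel` and `project` are hypothetical names, and the `outbox` deque merely stands in for a topic such as Kafka. The write side records the change and the event together, and the read side only catches up when the projector runs.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Event = Tuple[str, int]  # (key, new value)


class WriteStore:
    """Write side: holds the source of truth plus events waiting to be published."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}
        self.outbox: Deque[Event] = deque()

    def update(self, key: str, value: int) -> None:
        # In a real system these two steps would share one transaction,
        # so an update is never stored without its event (or vice versa).
        self.rows[key] = value
        self.outbox.append((key, value))


class ReadModel:
    """Read side: applications only query it; only the projector writes to it."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def apply(self, event: Event) -> None:
        key, value = event
        self._rows[key] = value

    def get(self, key: str) -> Optional[int]:
        return self._rows.get(key)


def project(write_store: WriteStore, read_model: ReadModel) -> int:
    """Drain pending events into the read model; return how many were applied."""
    applied = 0
    while write_store.outbox:
        read_model.apply(write_store.outbox.popleft())
        applied += 1
    return applied
```

Between `update` and the next `project` call, `ReadModel.get` still returns the old state. That window is the eventual consistency discussed in point 1, and the fact that only `project` ever calls `apply` is why the node looks read-only to applications.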

Can I write a file to a specific cluster location?

You know, when an application opens a file and writes to it, the system chooses which cluster it will be stored in. I want to choose myself! Let me tell you what I really want to do... In fact, I don't necessarily want to write anything. I have an HDD with a BAD range of clusters in the middle, and I want to mark that space as if it were occupied by a file, and eventually set that file as a hidden, unmovable system file (like the page file in Windows) so that the space won't be accessed anymore. Any ideas on how to do that?
Later Edit:
I think THIS (FSCTL_MOVE_FILE) is my last hope. I just found it, but I need to investigate... Maybe a file could be created anywhere and then relocated to the desired cluster. But that requires writing, and the function may fail if that cluster is bad.
I believe the answer to your specific question: "Can I write a file to a specific cluster location" is, in general, "No".
The reason for that is that the architecture of modern operating systems is layered so that the underlying disk store is accessed at a lower level than you can access, and of course disks can be formatted in different ways so there will be different kernel mode drivers that support different formats. Even so, an intelligent disk controller can remap the addresses used by the kernel mode driver anyway. In short there are too many levels of possible redirection for you to be sure that your intervention is happening at the correct level.
If you are talking about Windows - which you haven't stated but which appears to be assumed - then you need to be looking at storage drivers in the kernel (see https://learn.microsoft.com/en-us/windows-hardware/drivers/storage/). I think the closest you could reasonably come would be to write your own Installable File System driver (see https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/_ifsk/). This is really a 'filter' as it sits in the IO request chain and can intercept and change IO Request Packets (IRPs). Of course this would run in the kernel, not in userspace, and normally this would be written in C, and I note your question is tagged for Delphi.
Your IFS driver can sit at different levels in the request chain. I have used this technique to intercept calls to specific file system locations (paths / file names) and alter the IRP so as to virtualise the request - even calling back to user space from the kernel to resolve how the request should be handled. Using the provided examples, implementing basic functionality with an IFS driver is not too involved, because it's a filter and not a complete storage system.
However the very nature of this approach means that another filter can also alter what you are doing in your driver.
You could look at replacing the file system driver that interfaces to the hardware, but I think that's likely to be an excessive task under the circumstances ... and as pointed out already by @fpiette the disk controller hardware can remap your request anyway.
In the days of MSDOS the access to the hardware was simpler and provided by the BIOS which could be hooked to allow the requests to be intercepted. Modern environments aren't that simple anymore. The IFS approach does allow IO to be hooked, but it does not provide the level of control you need.
EDIT regarding suggestion by the OP of using FSCTL_MOVE_FILE
For a simple environment this may well do what you want; it is designed to support a defragmentation process.
However I still think there's no guarantee that this actually will do what you want.
You will note that the page you have linked to states that it is moving one or more virtual clusters of a file from one logical cluster to another within the same volume.
This is a code that's passed to the underlying storage drivers which I have referred to above. What the storage layer does is up to the storage layer and will depend on the underlying technology. With more advanced storage there's no guarantee this actually addresses the physical locations which I believe your question is asking about.
However that's entirely dependent on the underlying storage system. For some types of storage relocation by the OS may not be honoured in the same way. As an example consider an enterprise storage array that has a built in data-tiering function. Without the awareness of the OS data will be relocated within the storage based on the tiering algorithms. Also consider that there are technologies which allow data to be directly accessed (like NVMe) and that you are working with 'virtual' and 'logical' clusters, not physical locations.
However, you may well find that in a simple case, with support in the underlying drivers and no remapping done outside the OS and kernel, this does what you need.
Since your problem is to mark bad clusters, you don't need to write any program. Use the command-line utility CHKDSK that Windows provides.
In an elevated command prompt (Run as administrator), run the command:
chkdsk /r c:
The check will be done on the next reboot.
Don't forget to read the documentation.

How to do Neo4j Cache-based Sharding?

I've been reading Neo4j's Operations Manual on cache sharding, and posts all over the web; however, I can hardly find any detailed example of how to configure HAProxy for cache sharding (yes, the one in the Operations Manual is rather brief) on a real-world graph, which may contain multiple node labels.
Has anyone ever done this before? Would be lovely if you could share your experience.
Moreover, I'm a bit confused about the mechanism for sharding the graph using HAProxy. How do sub-graphs get cached on certain slaves, merely by providing rules in HAProxy? It surprised me to learn that cache sharding isn't handled by Neo4j.
The goal is to always send queries hitting the same region of your graph to the same instance. This of course means that the request data must indicate the region. What to use as the "region indicator" depends heavily on the structure and shape of your graph.
In a lot of customer-facing applications, people have successfully used the current user ID, set as an additional HTTP header which is then evaluated by HAProxy.
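The routing idea itself is independent of HAProxy and can be sketched in a few lines of Python. This is not HAProxy configuration; `pick_instance` and the instance names are made up for illustration. The only point is that a stable hash of the region indicator (here the user ID taken from the header) always selects the same instance, so that instance's cache stays warm for that user's part of the graph.

```python
import hashlib
from typing import List


def pick_instance(user_id: str, instances: List[str]) -> str:
    """Map a user ID to one instance, deterministically.

    A cryptographic hash is used only because it is stable across
    processes and restarts (unlike Python's built-in hash() for str).
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(instances)
    return instances[index]


# Hypothetical read instances sitting behind the load balancer.
INSTANCES = ["neo4j-read-1", "neo4j-read-2", "neo4j-read-3"]

if __name__ == "__main__":
    for user in ("alice", "bob", "alice"):
        print(user, "->", pick_instance(user, INSTANCES))
```

Note that with plain modulo hashing, adding or removing an instance remaps most users, so caches have to warm up again. Neo4j takes no part in this: each instance simply caches whatever its queries touch.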

What is the performance overhead of Apache ActiveMQ vs. raw sockets?

We're looking to implement ActiveMQ to handle messaging between two of our servers, over a geographically diverse environment (Australia to the UK and back, via the internet).
I've been looking for some vague indicators of performance round the net but so far have had no luck.
My question: compared to a DIY TCP/SSL implementation of basic messaging, how would ActiveMQ perform? Similar systems of our own can send and receive messages across Australia in 100-150ms, over an SSL layer with an already established connection.
Also, does ActiveMQ persist its TLS/SSL connections, thus saving the substantial amount of time that would otherwise be spent on connection creation/teardown?
What I am hoping is that it will at least perform better than HTTPS, at a per-request level.
I am aware that performance can vary remarkably, depending on hardware, networks, code and so on. I'm just after something to start with.
I know the above is a little fuzzy - if you need any clarification please let me know and I will only be too happy to oblige.
Thank you.
What Tim means is that this is not an apples to apples comparison. If you are solely concerned with the performance of a single point to point connection to transfer data, a direct link will give you a good result (although DIY is still a dubious design decision). If you are building a system that requires the transfer of data and you have more complex functional requirements, then a broker-based messaging platform like ActiveMQ will come into play.
You should consider broker-based messaging if you want:
a post-office style system where a producer sends a message, and knows that it will be consumed at some point, even if there is no consumer there at that time
to not care where the consumer of a message is, or how many of them there are
a guarantee that a message will be consumed, even if the consumer that first handles it dies mid-way through the process (transactions, redelivery)
many consumers, with a guarantee that a message will only be consumed once - queues
many consumers that will each react to a single message - topics
These patterns are pretty standard, and apply to all off-the-shelf messaging products. As a general rule, DIY in this domain is a bad idea, as messaging is complex (see http://www.ohloh.net/p/activemq/estimated_cost for an estimate of how long it would take you to do the same) and has many existing implementations of various flavours (some without a broker) that are all well used, commercially supported and don't require you to maintain them. I would think very hard before going down to the TCP level for any sort of data transfer, as there is so much prior art.
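To show the difference between the queue and topic patterns listed above, here is a toy in-process sketch in Python. It is not the ActiveMQ or JMS API; `Queue` and `Topic` are hypothetical classes, and persistence, transactions and redelivery are deliberately left out. It only models who receives what: a queue holds messages until a consumer exists and hands each one to exactly one consumer, while a topic hands each message to every current subscriber.

```python
from collections import deque
from typing import Callable, Deque, List

Handler = Callable[[str], None]


class Queue:
    """Each message is delivered to exactly one consumer (round-robin)."""

    def __init__(self) -> None:
        self._consumers: List[Handler] = []
        self._pending: Deque[str] = deque()
        self._next = 0

    def subscribe(self, handler: Handler) -> None:
        self._consumers.append(handler)
        self._flush()  # deliver anything sent before a consumer existed

    def send(self, message: str) -> None:
        self._pending.append(message)
        self._flush()

    def _flush(self) -> None:
        while self._pending and self._consumers:
            handler = self._consumers[self._next % len(self._consumers)]
            self._next += 1
            handler(self._pending.popleft())


class Topic:
    """Each message is delivered to every current subscriber."""

    def __init__(self) -> None:
        self._subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self._subscribers.append(handler)

    def publish(self, message: str) -> None:
        for handler in self._subscribers:
            handler(message)
```

A raw socket gives you none of this: with a DIY TCP link the sender has to know where the single receiver is and that it is up, which is why the comparison is not like for like.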

Sharing data system wide

Good evening.
I'm looking for a method to share data from my application system-wide, so that other applications could read that data and then do whatever they want with it (e.g. format it for display, use it for logging, etc). The data needs to be updated dynamically in the method itself.
WMI came to mind first, but then you've got the issue of applications pausing while reading from WMI. Additionally, I've no real idea how to set up my own namespace or classes, if that's even possible in Delphi.
Using files is another idea, but that could get disk heavy, and it's a really awful method to use for realtime data.
Using a driver would probably be the best option, but that's a little too intrusive on the user's end for my liking, and I've no idea where to even start with it.
WM_COPYDATA would be great, but I'm not sure if that's dynamic enough, or whether it'll be heavy on resources.
Using TCP/IP would be the best choice over the network, but obviously is of little use when run on a single system with no networking requirement.
As you can see, I'm struggling to figure out where to go with this. I don't want to go into one method only to find that it's not going to work out in the end. Essentially, I want something like a service, or background process, to record data and then allow other applications to read that data. I'm just unsure about methods. I'd prefer NOT to need elevation/UAC to do this, but if need be, I'll settle for it.
I'm running in Delphi 2010 for this exercise.
Any ideas?
What you want is a kind of client-server architecture, built on inter-process communication (IPC).
Using WM_COPYDATA is a very good idea. I found it to be very fast, lightweight, and efficient on a local machine. It can also be broadcast over the system, to all applications at once (to be used with care, in case some application does not handle it correctly).
You can also share some memory, using memory-mapped files. This may be the fastest IPC option around for huge amounts of data, but synchronization is a bit complex (if you want to share more than one buffer at once).
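Since the question is about Delphi but contains no code, here is a language-neutral sketch of the memory-mapped-file idea in Python, using only the standard `mmap` and `struct` modules. The layout (a small header holding a sequence number and payload length, followed by the payload) and the names `create_region`, `open_region`, `publish` and `read_latest` are assumptions made for this example. There is deliberately no locking, which is exactly the synchronization problem mentioned above.

```python
import mmap
import struct

# Assumed layout: 4-byte sequence number, 4-byte payload length, then payload.
HEADER = struct.Struct("<II")
REGION_SIZE = 4096


def create_region(path: str) -> None:
    """Create a zero-filled backing file of the agreed size."""
    with open(path, "wb") as f:
        f.truncate(REGION_SIZE)


def open_region(path: str) -> mmap.mmap:
    """Map the backing file; every process that maps it sees the same bytes."""
    with open(path, "r+b") as f:
        # mmap keeps its own duplicate of the file handle, so the file
        # object can be closed once the mapping exists.
        return mmap.mmap(f.fileno(), REGION_SIZE)


def publish(region: mmap.mmap, payload: bytes) -> None:
    """Writer side: store the payload, then bump the sequence number."""
    if len(payload) > REGION_SIZE - HEADER.size:
        raise ValueError("payload does not fit in the shared region")
    seq, _ = HEADER.unpack_from(region, 0)
    region[HEADER.size:HEADER.size + len(payload)] = payload
    # The header is written last, but this is NOT a real synchronization
    # primitive; a production version needs a mutex or similar.
    HEADER.pack_into(region, 0, seq + 1, len(payload))


def read_latest(region: mmap.mmap):
    """Reader side: return (sequence number, payload bytes)."""
    seq, length = HEADER.unpack_from(region, 0)
    return seq, region[HEADER.size:HEADER.size + length]
```

A reader can poll the sequence number, or the writer can notify readers with a window message so they only look at the region when something has changed.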
Named pipes are a good candidate for local communication. They tend to be difficult to implement/configure over a network, due to security issues on modern Windows versions (and they use TCP/IP for network communication anyway, so you would be better off using TCP/IP directly).
My personal advice is to implement your data sharing with abstract classes, so that you can provide several implementations. You may use WM_COPYDATA first, then switch to named pipes, TCP/IP or HTTP in order to spread your application over a network.
For our Open Source Client-Server ORM, we implemented several protocols, including WM_COPYDATA, named pipes, HTTP, and direct in-process access. You can take a look at the source code provided for implementation patterns. Here are some benchmarks, to give you data from real implementations:
Client server access:
- Http client keep alive: 3001 assertions passed
first in 7.87ms, done in 153.37ms i.e. 6520/s, average 153us
- Http client multi connect: 3001 assertions passed
first in 151us, done in 305.98ms i.e. 3268/s, average 305us
- Named pipe access: 3003 assertions passed
first in 78.67ms, done in 187.15ms i.e. 5343/s, average 187us
- Local window messages: 3002 assertions passed
first in 148us, done in 112.90ms i.e. 8857/s, average 112us
- Direct in process access: 3001 assertions passed
first in 44us, done in 41.69ms i.e. 23981/s, average 41us
Total failed: 0 / 15014 - Client server access PASSED
As you can see, the fastest is direct access, then WM_COPYDATA, then named pipes, then HTTP (i.e. TCP/IP). The message was around 5 KB of JSON data containing 113 rows, retrieved from the server, then parsed on the client 100 times (yes, our framework is fast :) ). For huge blocks of data (like 4 MB), WM_COPYDATA is slower than named pipes or HTTP-TCP/IP.
There are several IPC (inter-process communication) methods in Windows. Your question is rather general; I can suggest memory-mapped files to store your shared data, and message broadcasting via PostMessage to inform other applications that the shared data has changed.
If you don't mind running another process, you could use one of the NoSQL databases.
I'm pretty sure that a lot of them won't have Delphi drivers, but some of them have REST drivers and hence can be driven from pretty much anything.
Memcached is an easy way to share data between applications. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects).
A Delphi 2010 client for Memcached can be found on Google Code:
http://code.google.com/p/delphimemcache/
related question:
Are there any Caching Frameworks for Delphi?
Googling for 'delphi interprocess communication' will give you lots of pointers.
I suggest you take a look at http://madshi.net/, especially MadCodeHook (http://help.madshi.net/madCodeHook.htm)
I have good experience with the product.

Resources