Enumerating Mainline DHT - network-programming

I'm trying to understand why, historically, a DHT (distributed hash table) was a good system to use for decentralized p2p networks.
From an efficiency point-of-view: it's a fantastic way to have a bunch of nodes know how each node is reachable without complicated communication between them (using XOR distance in the case of mainline DHT).
From an anonymity point-of-view, I don't think that's the case: I'd like to know if it is possible to enumerate a DHT's nodes and whether protection from this discovery is a problem that a DHT should even solve.
For example: imagine a DHT with 100 nodes. By virtue of the DHT's design (at least Mainline DHT), a node would (please correct me if I'm wrong):
know that resource X is in node Y
Also know how to reach node Y
I know that a DHT crawler (like https://github.com/boramalper/magnetico) would be able to enumerate all nodes.
Is my reasoning correct, or did I misunderstand the attack vector?
Many thanks

Bittorrent makes no attempt to hide the IP address of any swarm member and on top of that some trackers expose APIs that allow fetching a list of all infohashes and then in turn fetching all IPs for each infohash. So in essence the set of bittorrent peers was mostly public anyway. The DHT adds another way to get this list.
This isn't unique to the bittorrent DHT, other p2p networks have similar properties.
Also note that participating in the DHT is not the same as participating in any particular torrent. A node may simply operate as a pure DHT node without any torrent client attached.
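To make the enumeration point concrete, here is a rough sketch (in Python, using the public KRPC find_node message format; the bootstrap host and the random IDs are just examples) of the single step a crawler repeats: ask a known node for the contacts closest to a random target, collect the returned addresses, then query those in turn.

```python
import os
import socket

# Mainline DHT speaks KRPC: bencoded dicts over UDP. A crawler repeatedly
# sends find_node queries with random targets and harvests the "nodes"
# field (compact node info: 20-byte ID + 4-byte IP + 2-byte port) from
# each response, then queries the newly discovered nodes in turn.

BOOTSTRAP = ("router.bittorrent.com", 6881)  # example bootstrap node

def bencode(obj):
    """Minimal bencoder for the query dict (bytes, int, dict only)."""
    if isinstance(obj, bytes):
        return str(len(obj)).encode() + b":" + obj
    if isinstance(obj, int):
        return b"i" + str(obj).encode() + b"e"
    if isinstance(obj, dict):
        items = sorted(obj.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(obj)

def find_node(sock, addr, my_id, target):
    query = {
        b"t": b"aa", b"y": b"q", b"q": b"find_node",
        b"a": {b"id": my_id, b"target": target},
    }
    sock.sendto(bencode(query), addr)
    data, _ = sock.recvfrom(4096)
    # Crude parse: locate the compact "nodes" string in the response.
    key = b"5:nodes"
    i = data.find(key)
    if i < 0:
        return []
    i += len(key)
    j = data.index(b":", i)
    length = int(data[i:j])
    blob = data[j + 1 : j + 1 + length]
    nodes = []
    for off in range(0, len(blob) - 25, 26):
        ip = socket.inet_ntoa(blob[off + 20 : off + 24])
        port = int.from_bytes(blob[off + 24 : off + 26], "big")
        nodes.append((ip, port))
    return nodes

if __name__ == "__main__":
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    my_id = os.urandom(20)    # our (random) node ID
    target = os.urandom(20)   # random target; vary it to sweep the keyspace
    print(find_node(sock, BOOTSTRAP, my_id, target))
```

Nothing in the protocol prevents this: answering find_node is exactly what a well-behaved DHT node is supposed to do, which is why crawlers like magnetico work.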

Related

experimenting with BGP using Quagga and a set of openWrt routers

I want to learn about and experiment with the BGP protocol in some non-trivial scenarios: set up anycasting, see how quickly and in what way routing changes after I disconnect some link, etc. As I understand it, I cannot easily (and should not) do this on "the real internet", as I would need to register/obtain Autonomous System number(s), obtain a pool of IP addresses, etc., not to mention the chaos I could cause with my experiments.
Therefore I'm considering buying a few (5-6) cheap, openWRT-compatible routers (I was thinking about the MikroTik RB750Gr3), setting up my own small isolated "clone of the internet" and playing with BGP using Quagga that I would install on these routers. So now I need help with verifying whether my idea makes sense:
is my understanding (described at the beginning) that I cannot/should not do it on "the real internet" correct? or maybe there are some publicly available "sandboxes" that would allow me to experiment?
is it even possible to create such a small isolated clone of the internet as I described or maybe it will not work because of some reasons that I'm missing? (for example some central registry like IANA would need to be also present on my clone or something else that I'm not aware of?)
is there maybe an easier/simpler way to conduct such experiments than by purchasing several routers? Maybe I could somehow create several interconnected virtual networks on Qemu-KVM/libvirt instead and play there? (I couldn't google anything related)
is Quagga BGP software capable of doing what I intend or maybe it has some limitations which will not allow me to try some/many of the typical "real internet" scenarios?
assuming that I'm more or less on the right track up to this moment, is the MikroTik RB750Gr3 a good model to conduct such experiments? or maybe I could use something significantly cheaper? or maybe the opposite: I need something "more capable"?
are there any resources on the web that describe more or less the thing that I intend to do? so far I found mostly either very high-level overviews of BGP or documents that describe situation from the point of view of a single AS.
I've asked this question originally on the network engineering stackexchange, but it turned out that openWRT and Quagga are off-topic there, so it was closed immediately: I hope this is a good place ;)
I admire the ambition, but I don't see how a network of 5-6 routers is really going to be an effective "clone of the Internet".
You talk of "how quickly and in what way routing changes after I disconnect some link" -- but in the real Internet, the response to a failed link depends first on the network in which that happens, and then on how the resulting change(s) propagate across the "BGP Mesh", which in turn depends on the networks involved. A small network of closely connected routers is going to struggle to simulate that -- even if you could find out how to configure your simulation to emulate real networks.
You say that the resources you have found describe BGP in terms of a single AS. I guess that's mostly because all BGP does is exchange routing information between an AS and the neighbouring ASes it is connected to. In a way, global routing is an "emergent property" of all the individual AS-to-AS BGP connections across the Internet -- what I call the "BGP Mesh". If you look for "Route Flap Damping" or "BGP Route Convergence" or "BGP Route Stability" using your chosen search engine, you should start to find stuff related to the behavior of the BGP Mesh. Also see caida.org, RIPE and Renesys.
With 5-6 routers and data extracted from the RIPE or CAIDA route collectors, you could probably set up a Quagga instance to be some AS connected to a couple of transit providers and three or four peers... but it sounds like a lot of work for not much return.
Sorry to be negative.
It's been a while since I last did anything with Quagga, but it's a capable BGP implementation. There's also BIRD.
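For the lab itself, the per-router configuration is small enough to generate. Here is a rough sketch (in Python; the AS numbers, router IDs and link addresses are made up, and the topology is just a three-router chain) of producing one minimal bgpd.conf per lab router:

```python
# Generates a minimal Quagga bgpd.conf per lab router. ASNs, router IDs and
# link subnets below are invented for illustration; adapt them to your topology.

ROUTERS = {
    # name: (asn, router_id, announced_prefix, [(peer_ip, peer_asn), ...])
    "r1": (65001, "10.255.0.1", "192.0.2.0/24",    [("10.0.12.2", 65002)]),
    "r2": (65002, "10.255.0.2", "198.51.100.0/24", [("10.0.12.1", 65001),
                                                    ("10.0.23.3", 65003)]),
    "r3": (65003, "10.255.0.3", "203.0.113.0/24",  [("10.0.23.2", 65002)]),
}

TEMPLATE = """\
hostname {name}
router bgp {asn}
 bgp router-id {router_id}
 network {prefix}
{neighbours}
line vty
"""

def render(name, asn, router_id, prefix, peers):
    neighbours = "\n".join(
        f" neighbor {ip} remote-as {peer_asn}" for ip, peer_asn in peers
    )
    return TEMPLATE.format(name=name, asn=asn, router_id=router_id,
                           prefix=prefix, neighbours=neighbours)

if __name__ == "__main__":
    for name, (asn, rid, prefix, peers) in ROUTERS.items():
        with open(f"bgpd-{name}.conf", "w") as f:
            f.write(render(name, asn, rid, prefix, peers))
```

With something like this you can pull a link and then watch convergence from vtysh with "show ip bgp" on each router; the same configs work whether the routers are physical boxes or VMs/namespaces.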

Geo-aware partitioning in cassandra

I am currently planning to set up a service that should be (sooner or later) globally available, with high demands on availability and fault tolerance. There will be both a high read and a high write ratio, and the system should be able to scale on demand.
A particular property of my planned service is that the data is strongly bound to a certain geo-location - e.g. in 99.99% of all cases, data meant for a city in the USA will never be queried from Europe (actually, even data meant for a certain city is unlikely to be queried from the city next to it).
What I want to minimize is:
Administration overhead
Network latency
Unnecessary data replication (I don't want to have a full replication of the data meant for Europe in USA)
In terms of storage technologies, I think my best solution would be Cassandra. The options that I see for my use-case are:
Use a completely isolated cassandra cluster per geo-location combined with a manually configured routing service that chooses the right cluster per insert/select query
Deploy a global cluster and define multiple data centers for certain geo-locations to ensure high availability in those regions
Deploy a global cluster without using data centers
Deploy a global cluster without using data centers and manipulate the partitioning to be geo-aware. My plan here is to set the first 3 bits of the partition key based on the geo-location (e.g. 000: North America, 001: South America, 010: Africa, 011: South/West Europe, etc.) and to assign the remaining bits by using a hash algorithm (similar to Cassandra's random partitioner).
The disadvantage of solution 1 would probably be a huge administrative overhead and a lot of manual work; the disadvantage of the second solution would be a huge amount of unnecessary data replication; and the disadvantage of the third solution would be a quite high network latency due to random partitioning across the world.
Therefore, in theory, I like solution 4 most. Here I would have a fair amount of administrative overhead, a low amount of unnecessary data replication and decent availability. However, to implement this (as far as I know) I would need the ByteOrderedPartitioner, which is strongly discouraged by many sources.
Is there a way to implement a solution close to solution 4 without using the ByteOrderedPartitioner, is this a case where the ByteOrderedPartitioner could make sense, or am I missing an obvious fifth solution?
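For illustration, here is a rough Python sketch of the key layout I have in mind for solution 4; the region codes are the ones from the list above, while the hash and key size are just placeholders:

```python
import hashlib

# Sketch of the "geo-prefixed partition key" idea from solution 4: the top
# 3 bits encode the region, the remaining bits come from a hash of the
# natural key (MD5 here, like Cassandra's RandomPartitioner).

REGIONS = {"north_america": 0b000, "south_america": 0b001,
           "africa": 0b010, "sw_europe": 0b011}

def geo_partition_key(region, natural_key, size_bytes=16):
    digest = hashlib.md5(natural_key.encode()).digest()[:size_bytes]
    value = int.from_bytes(digest, "big")
    bits = size_bytes * 8
    value &= (1 << (bits - 3)) - 1          # clear the top 3 bits
    value |= REGIONS[region] << (bits - 3)  # stamp in the region code
    return value.to_bytes(size_bytes, "big")

# e.g. geo_partition_key("sw_europe", "city:paris")
```

Such a key only places data geographically if the partitioner preserves key order, which is exactly what pushes me toward the ByteOrderedPartitioner.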
Reconsider option 2.
Not only will it solve your problems, it will even solve geo-redundancy for you. As you mentioned, you need high availability, and having a copy in a different datacenter is valuable in case one of the datacenters dies.
If you are dead set on refraining from replication between DCs, then that's an option too. You can have multiple DCs over different regions without replicating between them.
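For example (a sketch using the DataStax Python driver; the hosts, datacenter names and keyspaces are placeholders), NetworkTopologyStrategy lets you state per keyspace which datacenters get copies, so "no replication between regions" is simply a keyspace that only lists its local DC:

```python
# Each region gets its own keyspace, replicated only within its local
# datacenter, so no cross-region copies are made. Contact points, DC names
# and table layout are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.eu-west.example.com"])  # any reachable node
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS data_eu_west
    WITH replication = {'class': 'NetworkTopologyStrategy', 'eu_west': 3}
""")
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS data_us_east
    WITH replication = {'class': 'NetworkTopologyStrategy', 'us_east': 3}
""")

# The application (or a thin routing layer) picks the keyspace by region.
session.execute("""
    CREATE TABLE IF NOT EXISTS data_eu_west.readings (
        city text, id uuid, payload blob,
        PRIMARY KEY (city, id)
    )
""")
```

If you later decide you do want a disaster copy after all, adding the remote DC with a small replication factor to a keyspace is an ALTER KEYSPACE away rather than a redesign.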

Is there a reason that Cassandra doesn't have Geospatial support?

Cassandra is based on the Dynamo paper (a distributed, self-balancing hash table) plus BigTable, and there are spatial indexes that would fit nicely into that paradigm (quadkey or geohash). Is there a reason that geospatial support hasn't been implemented?
You could add a GeoPoint datatype as a tuple with an internal geohash and specify a CF as containing geo data. From there you can choose the behavior as having the geo data be a secondary index, or a denormalized SCF. That could lay the groundwork for geospatial development, and you could start by implementing some low-hanging fruit such as .nearby(), which could just return columns that share the same geohash. (I know that wouldn't give you the "nearest"; you'd have to walk the surrounding geohashes or use a shape and a space-filling curve for that, which could be implemented later, but it is a general operation for finding some nearby columns.)
I know SimpleGeo/Urban Airship built geo support into Cassandra, but it doesn't look like that was ever opened up. Also, let me know if there's a better place to ask this (quora, mailing lists, etc...)
I think there are two parts to the answer.
The reason it's not there is that nobody who commits code to Cassandra has thought of this feature, or thought that this capability is of high enough priority to spend major time on it. Most of the development in Cassandra is done by Datastax, and they, being a commercial entity, pay close attention to user demands and suggestions and are also pretty pragmatic about what can give them the most ROI in terms of new features.
If there were a good enough third-party developer (or a team) with enough time on their hands, this could be done, and conceptually C* committers would likely have no problem with adding a major feature like this.
The second aspect is that Cassandra supports blobs (byte arrays), which means that what you're describing can be implemented in the client app/driver in a relatively straightforward manner. The driver would in that case be responsible for translating geo calls into appropriate raw byte operations. I also suspect this would be less work than supporting a whole new data primitive with a relevant set of operators in the core storage engine.
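As a rough illustration of that client-side route (in Python; the geohash algorithm is the standard one, while the table and column names are made up), something like this is enough for a crude prefix-based .nearby():

```python
# Minimal geohash encoder (standard algorithm) plus a prefix-based lookup
# sketch. Storing the geohash as a clustering column lets a client do a
# crude "nearby" query by prefix range, without any server-side support.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=8):
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count, even, out = 0, 0, True, []
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        bits = (bits << 1) | (1 if val >= mid else 0)
        if val >= mid:
            rng[0] = mid
        else:
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

def nearby_bounds(lat, lon, precision=6):
    """Return a (lower, upper) geohash range covering one cell around a point."""
    prefix = geohash(lat, lon, precision)
    return prefix, prefix + "~"   # '~' sorts after all base32 characters

# Hypothetical CQL usage, with the geohash as a clustering column:
#   SELECT * FROM points WHERE region = ? AND geo >= ? AND geo < ?
```

As the answer above notes, this covers only the "same cell" case; walking the eight surrounding cells (or a space-filling curve) would be the next step.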

Middleware to build data-gathering and monitoring for a distributed system [closed]

I am currently looking for a good middleware to build a monitoring and maintenance system. We are tasked with the challenge to monitor, gather data from and maintain a distributed system consisting of up to 10,000 individual nodes.
The system is clustered into groups of 5-20 nodes. Each group produces data (as a team) by processing incoming sensor data. Each group has a dedicated node (blue boxes) acting as a facade/proxy for the group, exposing data and state from the group to the outside world. These clusters are geographically separated and may connect to the outside world over different networks (one may run over fiber, another over 3G/Satellite). It is likely we will experience both shorter (seconds/minutes) and longer (hours) outages. The data is persisted by each cluster locally.
This data needs to be collected (continuously and reliably) by external & centralized server(s) (green boxes) for further processing, analysis and viewing by various clients (orange boxes). Also, we need to monitor the state of all nodes through each group's proxy node. It is not required to monitor each node directly, even though it would be good if the middleware could support that (handle heartbeat/state messages from ~10,000 nodes). In case of proxy failure, other methods are available to pinpoint individual nodes.
Furthermore, we need to be able to interact with each node to tweak settings etc. but that seems to be more easily solved since that is mostly manually handled per-node when needed. Some batch tweaking may be needed, but all-in-all it looks like a standard RPC situation (Web Service or alike). Of course, if the middleware can handle this too, via some Request/Response mechanism that would be a plus.
Requirements:
1000+ nodes publishing/offering continuous data
Data needs to be reliably (in some way) and continuously gathered to one or more servers. This will likely be built on top of the middleware using some kind of explicit request/response to ask for lost data. If this could be handled automatically by the middleware this is of course a plus.
More than one server/subscriber needs to be able to be connected to the same data producer/publisher and receive the same data
Data rate is max in the range of 10-20 per second per group
Messages sizes range from maybe ~100 bytes to 4-5 kbytes
Nodes range from embedded constrained systems to normal COTS Linux/Windows boxes
Nodes generally use C/C++, servers and clients generally C++/C#
Nodes should (preferably) not need to install additional SW or servers, i.e. one dedicated broker or extra service per node is expensive
Security will be message-based, i.e. no transport security needed
We are looking for a solution that can handle the communication between primarily proxy nodes (blue) and servers (green) for the data publishing/polling/downloading and from clients (orange) to individual nodes (RPC style) for tweaking settings.
There seems to be a lot of discussions and recommendations for the reverse situation - distributing data from server(s) to many clients - but it has been harder to find information related to the described situation. The general solution seems to be to use SNMP, Nagios, Ganglia etc. to monitor and modify a large number of nodes, but the tricky part for us is the data gathering.
We have briefly looked at solutions like DDS, ZeroMQ, RabbitMQ (broker needed on all nodes?), SNMP, various monitoring tools, Web Services (JSON-RPC, REST/Protocol Buffers) etc.
So, do you have any recommendations for an easy-to-use, robust, stable, light, cross-platform, cross-language middleware (or other) solution that would fit the bill? As simple as possible but not simpler.
Disclosure: I am a long-time DDS specialist/enthusiast and I work for one of the DDS vendors.
Good DDS implementations will provide you with what you are looking for. Collection of data and monitoring of nodes is a traditional use-case for DDS and should be its sweet spot. Interacting with nodes and tweaking them is possible as well, for example by using so-called content filters to send data to a particular node. This assumes that you have a means to uniquely identify each node in the system, for example by means of a string or integer ID.
Because of the hierarchical nature of the system and its sheer (potential) size, you will probably have to introduce some routing mechanisms to forward data between clusters. Some DDS implementations can provide generic services for that. Bridging to other technologies, like DBMS or web-interfaces, is often supported as well.
Especially if you have multicast at your disposal, discovery of all participants in the system can be done automatically and will require minimal configuration. This is not required though.
To me, it looks like your system is complicated enough to require customization. I do not believe that any solution will "fit the bill easily", especially if your system needs to be fault-tolerant and robust. Most of all, you need to be aware of your requirements. A few words about DDS in the context of the ones you have mentioned:
1000+ nodes publishing/offering continuous data
This is a big number, but should be possible, especially since you have the option to take advantage of the data-partitioning features supported by DDS.
Data needs to be reliably (in some way) and continuously gathered to one or more servers. This will likely be built on top of the middleware using some kind of explicit request/response to ask for lost data. If this could be handled automatically by the middleware this is of course a plus.
DDS supports a rich set of so-called Quality of Service (QoS) settings specifying how the infrastructure should treat the data it is distributing. These are name-value pairs set by the developer. Reliability and data availability are among the supported QoS-es. This should take care of your requirement automatically.
More than one server/subscriber needs to be able to be connected to the same data producer/publisher and receive the same data
One-to-many or many-to-many distribution is a common use-case.
Data rate is max in the range of 10-20 per second per group
Adding up to a total maximum of 20,000 messages per second is doable, especially if data-flows are partitioned.
Messages sizes range from maybe ~100 bytes to 4-5 kbytes
As long as messages do not get excessively large, the number of messages is typically more limiting than the total amount of kbytes transported over the wire -- unless large messages are of very complicated structure.
Nodes range from embedded constrained systems to normal COTS Linux/Windows boxes
Some DDS implementations support a large range of OS/platform combinations, which can be mixed in a system.
Nodes generally use C/C++, servers and clients generally C++/C#
These are typically supported and can be mixed in a system.
Nodes should (preferably) not need to install additional SW or servers, i.e. one dedicated broker or extra service per node is expensive
Such options are available, but the need for extra services depends on the DDS implementation and the features you want to use.
Security will be message-based, i.e. no transport security needed
That certainly makes life easier for you -- but not so much for those who have to implement that protection at the message level. DDS Security is one of the newer standards in the DDS ecosystem that provides a comprehensive security model transparent to the application.
Seems ZeroMQ will fit the bill easily, with no central infrastructure to manage. Since your monitoring servers are fixed, it's really quite a simple problem to solve. This section in the 0MQ Guide may help:
http://zguide.zeromq.org/page:all#Distributed-Logging-and-Monitoring
You mention "reliability", but could you specify the actual set of failures you want to recover? If you are using TCP then the network is by definition "reliable" already.
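For the proxy-to-server path specifically, here is a rough pyzmq sketch (the endpoints, topics and message format below are placeholders): each group's proxy binds a PUB socket, and every collecting server connects a SUB socket to all proxies, which directly covers the "more than one server receives the same data" requirement.

```python
# Sketch with pyzmq. Each group's proxy node publishes its readings on a
# PUB socket; any number of central servers connect SUB sockets to all
# proxies and receive the same stream. Addresses and topics are placeholders.
import json
import time
import zmq

def run_proxy(bind_addr="tcp://*:5556", group_id="group-042"):
    ctx = zmq.Context.instance()
    pub = ctx.socket(zmq.PUB)
    pub.bind(bind_addr)
    seq = 0
    while True:
        payload = {"group": group_id, "seq": seq, "ts": time.time(),
                   "data": [1.0, 2.0, 3.0]}           # stand-in for sensor output
        pub.send_multipart([group_id.encode(), json.dumps(payload).encode()])
        seq += 1
        time.sleep(0.05)                              # ~20 messages per second

def run_collector(proxy_addrs):
    ctx = zmq.Context.instance()
    sub = ctx.socket(zmq.SUB)
    for addr in proxy_addrs:                          # connect to every proxy node
        sub.connect(addr)
    sub.setsockopt(zmq.SUBSCRIBE, b"")                # subscribe to all groups
    while True:
        topic, body = sub.recv_multipart()
        msg = json.loads(body)
        # A jump in "seq" tells the server which samples to re-request from
        # the proxy's local store (the explicit request/response path the
        # question already plans for).
        print(topic.decode(), msg["seq"])
```

PUB/SUB drops messages while a subscriber is disconnected, so the sequence numbers plus the per-cluster local persistence are what let a server ask a proxy to re-send the gap - i.e. the explicit request/response layer you already planned to build on top.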

p2p long term storage

We are building a long-term preservation cluster made of 3 geographically distant nodes of 32TB each.
The 3 nodes must have the same files (3-way redundancy).
My idea is to use a p2p protocol to keep the 3 nodes synchronized. I mean: if someone puts a file (a document) on one node (using a specific web-based app), the other 2 nodes must take a copy of it (in an asynchronous way) automatically.
I searched for p2p file systems, but it seems that, in general, they split files across many nodes and optimize access performance, which is not our case. We need only an automated replica system. We expect a large number of files.
Anyone knows some open source project can help?
Thanks.
P2P is overkill in your case. Especially if your servers all have a public address, rsync or something similar would be much easier to implement.
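A rough sketch of that approach (hostnames and paths below are placeholders), run from cron or triggered by the web app after each upload:

```python
# Mirrors the local archive to the two remote nodes with rsync over SSH.
# Hostnames and paths are placeholders. --partial keeps interrupted large
# transfers resumable; omitting --delete keeps the copies append-only,
# which suits a preservation archive.
import subprocess

LOCAL_DIR = "/archive/"
PEERS = ["node2.example.org:/archive/", "node3.example.org:/archive/"]

def replicate():
    for peer in PEERS:
        subprocess.run(["rsync", "-az", "--partial", LOCAL_DIR, peer], check=True)

if __name__ == "__main__":
    replicate()
```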

Resources