How can I back up CouchDB data? Once we bring down the Hyperledger Fabric network, we lose the data previously stored in CouchDB.
Is there any cloud-hosted CouchDB service available for storing data?
Yes, IBM Cloudant is a cloud service based on and fully compatible with CouchDB.
IBM also has a Hyperledger-based blockchain offering, so you might be able to combine both for your project.
Full disclosure: I work for IBM.
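For reference, one common way to get a copy of CouchDB data off the box is CouchDB's own replication API, which Cloudant also speaks. Below is a minimal sketch, not your exact setup: the host names, database name, account URL, and credentials are all placeholders you would replace with your own.

    # Continuously replicate a local CouchDB database to a remote instance
    # (e.g. Cloudant) via the /_replicate endpoint. All names/credentials
    # below are placeholders.
    curl -X POST http://admin:password@localhost:5984/_replicate \
      -H "Content-Type: application/json" \
      -d '{
            "source": "http://admin:password@localhost:5984/mychannel_mychaincode",
            "target": "https://USER:APIKEY@ACCOUNT.cloudantnosqldb.appdomain.cloud/fabric_backup",
            "create_target": true,
            "continuous": true
          }'

Note that for Fabric this only copies the world state; if you want the peer to survive a restart, the peer's own data directory also needs to be on a persistent volume.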
We are planning to host our Artifactory on ECS (Fargate) and mount the data to EFS. We will use an ALB in front of the containers (ports 8081 and 8082). We still have some open questions:
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Is EFS a good solution or is S3 better?
What about the metadata? I read that Artifactory stores this in a Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS, or do we need RDS?
Can we use multiple containers at the same time or will there be upload/write issues to EFS?
Ans: Yes, you can use multiple containers to host Artifactory instances on a single host. However, it is generally recommended to use multiple hosts to avoid a single point of failure. I don't anticipate any read/write issues with EFS/S3.
Is EFS a good solution or is S3 better?
Ans: In my opinion, both S3 and EFS are known more as scalable solutions than as high-performance ones, and the choice depends entirely on the use case. You can mitigate this by enabling cache-fs in Artifactory, which stores frequently used binaries in a defined location (such as a local disk with higher read/write speeds). You can read more about cache-fs here: https://jfrog.com/knowledge-base/what-is-cache-fs-video/
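To make that concrete, here is a rough sketch of what enabling cache-fs can look like in binarystore.xml. The path, cache size, and cache directory below are examples only (and the file location shown is the Artifactory 7.x layout); check the JFrog docs for your version and merge with whatever binarystore.xml you already have.

    # Example only: built-in "cache-fs" chain with an overridden cache
    # location/size. Adjust path, size and cacheProviderDir to your setup.
    cat > $JFROG_HOME/artifactory/var/etc/artifactory/binarystore.xml <<'EOF'
    <config version="2">
        <chain template="cache-fs"/>
        <provider id="cache-fs" type="cache-fs">
            <maxCacheSize>10000000000</maxCacheSize>
            <cacheProviderDir>/opt/artifactory-cache</cacheProviderDir>
        </provider>
    </config>
    EOF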
What about the metadata? I read that Artifactory stores this in a Derby database. What if we redeploy a new container? Will the data be gone? Can this data be persisted on EFS, or do we need RDS?
Ans: When you configure more than one Artifactory node, it is mandatory to have an external database (such as RDS) to store the configurations/references. On a side note: Artifactory generates the metadata for packages/artifacts and stores it in the filesystem only; the references, however, are stored in the DB.
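As a hedged sketch of what pointing Artifactory at an external database looks like: the database section goes into system.yaml (Artifactory 7.x layout shown). The RDS endpoint, database name, and credentials below are placeholders, and you should merge this with any existing shared: block rather than blindly appending.

    # Placeholder values; merge into the existing shared: section of system.yaml.
    cat >> $JFROG_HOME/artifactory/var/etc/system.yaml <<'EOF'
    shared:
      database:
        type: postgresql
        driver: org.postgresql.Driver
        url: jdbc:postgresql://<your-rds-endpoint>:5432/artifactory
        username: artifactory
        password: <password>
    EOF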
I have already started a local Fabric 2.2 network by following the official guide (the fabcar project).
This is the output of docker ps -a:
docker ps
Now, I have these questions:
Where is the ledger file which stores the transaction history?
How can I see the historical transactions and the created blocks?
I have read the docs on the Fabric website, but they don't really help.
Ledger data is under /var/hyperledger/production/ledgersData inside your peer container. You might have /var/hyperledger/production mapped to a volume to save your peer state. When using CouchDB to store world state, the world state is stored in the CouchDB container (under something like /opt/couchdb/data, which might be mapped to a volume, too).
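Assuming the default container names from the Fabric samples (yours may differ), you can peek at both locations like this:

    # Ledger files inside the peer container
    docker exec peer0.org1.example.com ls /var/hyperledger/production/ledgersData
    # World-state files inside the CouchDB container
    docker exec couchdb0 ls /opt/couchdb/data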
In my opinion, the easiest way is to deploy Hyperledger Explorer for your network: https://github.com/hyperledger/blockchain-explorer/blob/main/README.md.
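If you just want a quick look without deploying Explorer, the peer CLI can also report the chain height and fetch blocks. This assumes the channel from the samples is called "mychannel" and that you run the commands inside the CLI container (or with the peer environment variables from the samples set up).

    # Current block height and hashes for the channel
    peer channel getinfo -c mychannel
    # Fetch the newest block and decode it to JSON
    peer channel fetch newest newest_block.pb -c mychannel
    configtxlator proto_decode --type common.Block --input newest_block.pb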
I have created a Hyperledger installation on Google Cloud Platform.
I then installed the Hyperledger sample network. All of this went fine. The asset creation also went fine after I created the static IP on the VM. I am now wondering where my "hello world" asset ended up.
I saw that a verification peer should have a /var/hyperledger...
With the default Google Cloud Platform installation, what are my peers? This all seems to be hidden. Does that mean the data is just "out there"?
I am now checking how to tweak the Google Cloud Platform installation to have private data storage.
When you use Google Cloud Platform and run everything in a VM, all your information is stored on the persistent disk you selected while installing the platform.
Regarding assets: you cannot physically see the assets in Fabric; they are stored in LevelDB or CouchDB, and the default Fabric configuration is LevelDB.
If you configure CouchDB, you can see the data through its HTTP interface. Hope this helps.
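For example, if CouchDB is the state database and its port (5984 by default) is reachable from your machine, you can list the databases or open the Fauxton UI. The credentials below are the common sample defaults, not necessarily yours.

    # List all databases in the state CouchDB (placeholder credentials)
    curl -s http://admin:adminpw@localhost:5984/_all_dbs
    # Fauxton web UI for browsing documents:
    #   http://localhost:5984/_utils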
I am trying to set up a server for team note-taking, and I am wondering what is the best way to back up its data (i.e., my notes) automatically.
Currently I plan to run the server in a Docker container.
The container will be hosted by a hosting service (such as Google).
I found a free hosting service that fits my needs, but it does not allow mounting volumes into a container.
Therefore, I think the only way for me to back up my data is to transfer it to some other cloud service.
However, this means I would have to store some sort of sensitive authentication data in my Docker image, which is obviously not great.
So:
Is it possible to transfer data from a Docker container to a cloud service without risking leaking a password/private key?
Is there any other way to backup my data?
I don't have to use Docker, as all I really need is Node.js.
But the server must be hosted on some remote machine, because I don't have the ability/time/money to host a machine on my own...
I use Borg Backup to back up our servers (including Docker volumes) ... and it has saved the day many times due to failure and stupidity.
It transfers over SSH so comms are encrypted. The repositories it uses are also encrypted on disk so that makes all your data safe. It de-duplicates, snapshots, prunes, compresses ... the feature list is quite large.
After the first backup, subsequent backups are much faster because it only submits the changes since the previous backup.
You can also mount the snapshots as filesystems so you can hunt down the single file you deleted or just restore the whole lot. The mounts can also be done remotely.
I've configured ours to backup /home, /etc and the /var/lib/docker/volumes directories (among others).
We rent a few cheap storage VPSs and send the data up to them nightly. They're in different geographic locations with different hosting providers, you know, because we're paranoid.
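To give a flavour of the setup described above, here is a minimal sketch. The repository URL, paths, retention policy, and mount point are placeholders; adapt them to your own storage VPS.

    # One-time: create an encrypted repository on the remote box
    borg init --encryption=repokey ssh://backup@storage-vps.example.com/./backups
    # Nightly: back up the interesting directories (deduplicated, compressed)
    borg create --compression lz4 --stats \
        ssh://backup@storage-vps.example.com/./backups::{hostname}-{now} \
        /home /etc /var/lib/docker/volumes
    # Keep a sane number of old archives
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 \
        ssh://backup@storage-vps.example.com/./backups
    # Mount the repository to fish out a single deleted file
    mkdir -p /mnt/borg
    borg mount ssh://backup@storage-vps.example.com/./backups /mnt/borg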
Besides Docker swarm secrets, don't forget bind-mount strategies: you could keep your data in a volume.
In that case, you can run the backup on the host (instead of in the container at runtime): take that volume, compress it, and save it elsewhere. See for instance this answer or this one.
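A hedged sketch of that host-side approach, with the volume name and destination path as placeholders: spin up a throwaway container that mounts the volume read-only and tars it to the host, where your sync-to-cloud tooling (with its credentials) lives.

    # Archive the named volume "notes_data" into the current host directory
    docker run --rm \
        -v notes_data:/data:ro \
        -v "$(pwd)":/backup \
        alpine tar czf /backup/notes-$(date +%F).tar.gz -C /data .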
I am currently trying to break into data engineering, and I figured the best way to do this was to get a basic understanding of the Hadoop stack (I played around with the Cloudera Quickstart VM and went through the tutorial) and then try to build my own project. I want to build a data pipeline that ingests Twitter data, stores it in HDFS or HBase, and then runs some sort of analytics on the stored data. I would also prefer to use real-time streaming data, not historical/batch data. My data flow would look like this:
Twitter Stream API --> Flume --> HDFS --> Spark/MapReduce --> Some DB
Does this look like a good way to bring in my data and analyze it?
Also, how would you guys recommend I host/store all this?
Would it be better to run Hadoop on a single AWS EC2 instance, or should I run it all in a local VM on my desktop?
I plan to start with a single-node cluster.
First of all, Spark Streaming can read from Twitter, and in CDH, I believe that is the streaming framework of choice.
Your pipeline is reasonable, though I might suggest using Apache NiFi (which is in the Hortonworks HDF distribution) or StreamSets, which is easily installable in CDH, from what I understand.
Note, these are running completely independently of Hadoop. Hint: Docker works great with them. HDFS and YARN are really the only complex components that I would rely on a pre-configured VM for.
Both NiFi and StreamSets give you a drag-and-drop UI for hooking Twitter up to HDFS and "other DBs".
Flume can work, and one pipeline is easy, but it just hasn't matured to the level of the other streaming platforms. Personally, I like a Logstash -> Kafka -> Spark Streaming pipeline better, for example because Logstash configuration files are nicer to work with (it has a built-in Twitter plugin). And Kafka works with a bunch of tools.
You could also try out Kafka with Kafka Connect, or use Apache Flink for the whole pipeline.
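For the Logstash -> Kafka leg mentioned above, a rough config sketch using the twitter input and kafka output plugins. The API credentials, keywords, broker address, and topic name are placeholders; this is not a full pipeline, just the ingest step.

    # Placeholder credentials/brokers/topic; adjust before running.
    cat > twitter-to-kafka.conf <<'EOF'
    input {
      twitter {
        consumer_key       => "KEY"
        consumer_secret    => "SECRET"
        oauth_token        => "TOKEN"
        oauth_token_secret => "TOKEN_SECRET"
        keywords           => ["hadoop", "spark"]
        full_tweet         => true
      }
    }
    output {
      kafka {
        bootstrap_servers => "localhost:9092"
        topic_id          => "tweets"
        codec             => json
      }
    }
    EOF
    bin/logstash -f twitter-to-kafka.conf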
Primary takeaway: you can bypass Hadoop here, or at least end up with something like this:
Twitter > Streaming Framework > HDFS
                          ... > Other DB
                          ... > Spark
Regarding running locally or not, as long as you are fine with paying for idle hours on a cloud provider, go ahead.