I'm looking for an easy way to automatically send old logs from Graylog to S3 to save disk space.
Thanks!
Graylog offers archiving capabilities, using S3-compatible storage as a backend, in its commercial offering. Graylog Ops and Graylog Security both offer this functionality and are available as self-managed or cloud-based platforms.
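If the commercial archiving isn't an option, one do-it-yourself route is to snapshot Graylog's rotated Elasticsearch indices into an S3 snapshot repository and then let them be deleted from the cluster. A rough Python sketch, assuming Elasticsearch with the repository-s3 plugin installed and S3 credentials already in the keystore; the host, bucket, and index names are placeholders:

    # Rough sketch: archive one rotated Graylog index to an S3-backed Elasticsearch
    # snapshot repository. Assumes the repository-s3 plugin is installed and the S3
    # credentials are already in the Elasticsearch keystore; the host, bucket, and
    # index names below are placeholders.
    import requests

    ES = "http://localhost:9200"

    # Register the S3 snapshot repository (only needs to be done once).
    requests.put(f"{ES}/_snapshot/graylog_archive", json={
        "type": "s3",
        "settings": {"bucket": "my-graylog-archive", "base_path": "graylog"},
    }).raise_for_status()

    # Snapshot an old, already-rotated Graylog index.
    requests.put(
        f"{ES}/_snapshot/graylog_archive/graylog_42",
        params={"wait_for_completion": "true"},
        json={"indices": "graylog_42"},
    ).raise_for_status()

After the snapshot succeeds, you can let Graylog's index retention delete the old index so the disk space is actually freed, and restore from the repository later if you ever need the data back.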
How can I back up CouchDB data? Once we bring down the Hyperledger Fabric network, we lose the data previously stored in CouchDB.
Is there any cloud-hosted CouchDB available for storing data?
Yes, IBM Cloudant is a cloud service based on and fully compatible with CouchDB.
IBM also has a Hyperledger-based blockchain offering, so you might be able to combine both for your project.
Full disclosure: I work for IBM.
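For getting the data off the box in the first place, CouchDB's built-in replication works against Cloudant as well, since it speaks the same protocol. A minimal Python sketch; the URLs, credentials, and database name are placeholders:

    # Minimal sketch: push a local CouchDB database to Cloudant (or any remote
    # CouchDB) as an off-box backup. The URLs, credentials, and database name are
    # placeholders for this example.
    import requests

    resp = requests.post(
        "http://admin:password@localhost:5984/_replicate",
        json={
            "source": "channel_ledger",  # hypothetical local database name
            "target": "https://user:apikey@myaccount.cloudant.com/channel_ledger",
            "create_target": True,       # create the remote database if it doesn't exist
        },
    )
    resp.raise_for_status()
    print(resp.json())

Setting "continuous": true in the same request keeps the remote copy in sync instead of doing a one-off copy, which is handy before you tear the Fabric network down.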
I have taken a Moodle AMI from the AWS Marketplace (Moodle by Bitnami) and launched an instance.
The instance is up and running and working fine, but if I upload any videos or images to that Moodle, where does my data get stored? I did not create any S3 buckets or RDS instances.
Please help if anyone has already used this Moodle by Bitnami offering from the AWS Marketplace.
Bitnami Engineer here,
The data is stored on the instance where Moodle is running. If you access the instance using an SSH connection, you can get the app's files from /opt/bitnami/apps/moodle/htdocs.
I hope this information helps.
I have a Moodle site running on AWS, and my opinion is that hosting this way does not take full advantage of a cloud-based solution. You may wish to consider using EC2 to run the code, EFS for the moodledata files, RDS for a managed database, ElastiCache for Redis caching, ELB for load balancing across multiple EC2 instances and HTTPS termination, and S3 Glacier for backups. If you have more than one EC2 instance, you can use spot instances to save money.
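For the backup part specifically, here is a small sketch of pushing an archive of the moodledata directory to S3 with the Glacier storage class using boto3. The bucket name and paths are placeholders, and AWS credentials are assumed to come from an instance role or the usual config files:

    # Sketch: upload a Moodle backup archive to S3 with the Glacier storage class.
    # The bucket name and file paths are placeholders; AWS credentials are assumed
    # to come from the instance role or the usual environment/config files.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="/tmp/moodledata-backup.tar.gz",  # archive created beforehand, e.g. a tar of moodledata
        Bucket="my-moodle-backups",
        Key="backups/moodledata-backup.tar.gz",
        ExtraArgs={"StorageClass": "GLACIER"},
    )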
I am trying to find a way to configure MinIO with Docker so that it backs onto a single S3 bucket, enabling my client to expose S3 capabilities to their internal customers.
To meet some very specialized compliance rules in an air-gapped environment, my client was provisioned a single bucket in an on-premise S3-compatible solution. They cannot get additional buckets but need to provide their internal organizational customers access to S3 capabilities, including the ability to leverage buckets, ACLs, etc. The requirement is to use their existing S3 storage bucket and not other on-premise storage.
I tried MinIO Gateway, but it tries to create and manage new buckets on the underlying S3 provider. I couldn't find anything like a "prefix" capability I could supply to force it to work only inside {host}/{bucketName} instead of the root endpoint for their keys.
MinIO server might work, but we'd need to mount a Docker volume backed by their underlying bucket, and I'm concerned about the solution becoming brittle. Also, I can't seem to find any well-regarded, production-ready, vendor-supported S3 volume drivers. Since I don't have a volume plugin, I haven't validated performance yet, though I'm concerned it will be sub-par as well.
How can I, in a docker environment, make gateway work to provide bucket/user/management capabilities all rooted in a single underlying bucket/folder? I'm open to alternative designs provided I can meet the customer's requirements (run via docker, store in their underlying S3 storage, provide ability to provision and secure new buckets).
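For context, this is roughly the per-tenant layout I'd want something gateway-like to enforce inside the single provisioned bucket, expressed with plain boto3; the endpoint, bucket, and prefix names are made up for illustration:

    # Illustration only: treating key prefixes inside the single provisioned bucket
    # as per-tenant "pseudo-buckets". The endpoint, bucket, and prefix names are
    # made up for this example.
    import boto3

    s3 = boto3.client("s3", endpoint_url="https://onprem-s3.example.internal")

    SHARED_BUCKET = "the-one-bucket"  # the single bucket we were provisioned
    tenant_prefix = "team-alpha/"     # what a tenant would see as its own "bucket"

    # All writes and listings stay inside the tenant's prefix.
    s3.put_object(Bucket=SHARED_BUCKET, Key=tenant_prefix + "reports/2023.csv", Body=b"col1,col2\n")
    resp = s3.list_objects_v2(Bucket=SHARED_BUCKET, Prefix=tenant_prefix)
    for obj in resp.get("Contents", []):
        print(obj["Key"])

The missing piece is something that enforces that mapping (and the corresponding access policies) for my internal customers, rather than relying on client discipline.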
I am currently trying to break into data engineering, and I figured the best way to do this was to get a basic understanding of the Hadoop stack (I played around with the Cloudera QuickStart VM and went through the tutorial) and then try to build my own project. I want to build a data pipeline that ingests Twitter data, stores it in HDFS or HBase, and then runs some sort of analytics on the stored data. I would also prefer to use real-time streaming data, not historical/batch data. My data flow would look like this:
Twitter Stream API --> Flume --> HDFS --> Spark/MapReduce --> Some DB
Does this look like a good way to bring in my data and analyze it?
Also, how would you guys recommend I host/store all this?
Would it be better to run Hadoop on a single AWS EC2 instance, or should I run it all in a local VM on my desktop?
I plan to start with only a one-node cluster.
First of all, Spark Streaming can read from Twitter, and in CDH, I believe that is the streaming framework of choice.
Your pipeline is reasonable, though I might suggest using Apache NiFi (which is in the Hortonworks HDF distribution) or StreamSets, which, from what I understand, is easy to install in CDH.
Note that these run completely independently of Hadoop. Hint: Docker works great with them. HDFS and YARN are really the only complex components that I would rely on a pre-configured VM for.
Both NiFi and StreamSets give you a drag-and-drop UI for hooking Twitter up to HDFS and your "other DB".
Flume can work, and one pipeline is easy, but it just hasn't matured to the level of the other streaming platforms. Personally, I like a Logstash -> Kafka -> Spark Streaming pipeline better, for example because Logstash configuration files are nicer to work with (there's a built-in Twitter input plugin), and Kafka works with a bunch of tools.
You could also try out Kafka with Kafka Connect, or use Apache Flink for the whole pipeline.
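To make the Kafka -> Spark Streaming leg concrete, here is a minimal PySpark Structured Streaming sketch that reads raw tweets from a Kafka topic and lands them in HDFS as Parquet; the broker address, topic name, and HDFS paths are assumptions:

    # Minimal sketch: consume raw tweets from a Kafka topic and land them in HDFS
    # as Parquet. The broker address, topic name, and HDFS paths are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (SparkSession.builder
             .appName("twitter-kafka-to-hdfs")
             .getOrCreate())

    # Requires the spark-sql-kafka package on the classpath, e.g. via
    # spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0
    tweets = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "localhost:9092")
              .option("subscribe", "tweets")  # hypothetical topic fed by Logstash
              .load()
              .select(col("value").cast("string").alias("raw_json")))

    query = (tweets.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/tweets")  # landing zone for later Spark/Hive jobs
             .option("checkpointLocation", "hdfs:///checkpoints/tweets")
             .start())

    query.awaitTermination()

From there, batch Spark or Hive jobs can analyze the Parquet files, or you can add a second writeStream for your "other DB".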
Primary takeaway: you can bypass Hadoop here, or at least have something like this:
Twitter > Streaming Framework > HDFS
                              > Other DB
                              > Spark
Regarding running locally or not, as long as you are fine with paying for idle hours on a cloud provider, go ahead.
Could someone please advise how to back up the Consul cluster datastore to S3? I know we can use EBS snapshots, but I would like a script that moves the Consul datastore over to S3 instead of the snapshot approach, which is not very effective.
I use Consul-Snapshot at work to back up our Consul data centers. You can provide AWS credentials or use an IAM role so Consul-Snapshot can upload the snapshots to an AWS S3 bucket.
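If you'd rather script it yourself, the built-in consul snapshot save command plus a plain S3 upload gets you the same result. A minimal sketch, where the bucket name and paths are placeholders and AWS credentials come from an IAM role:

    # Minimal sketch: take a Consul snapshot and ship it to S3. Assumes the consul
    # binary is on PATH and the local agent is reachable; the bucket name and paths
    # are placeholders, and AWS credentials come from an IAM role.
    import datetime
    import subprocess

    import boto3

    snapshot_file = "/tmp/consul.snap"
    subprocess.run(["consul", "snapshot", "save", snapshot_file], check=True)

    key = "consul/backup-{}.snap".format(
        datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
    )
    boto3.client("s3").upload_file(snapshot_file, "my-consul-backups", key)

Run it from cron on a node that can reach the agent, and consul snapshot restore will take the same file if you ever need to recover.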