Is there an Erlang API for BigQuery?
I would like to use BigQuery from a Linux instance on Google Compute Engine, where I also plan to run Riak NoSQL.
As far as I can tell, there is no Erlang client for BigQuery. You can always generate the HTTP REST requests by hand -- it is relatively straightforward for most use cases. Alternatively, you could execute a shell command that runs the bq.py command-line client.
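To illustrate what "by hand" means, here is a minimal sketch of the jobs.query REST call. It is written in Python purely to show the shape of the request; the equivalent from Erlang would go through httpc or any other HTTP client. The project ID, access token, and query are placeholders.

import requests

# Placeholders: substitute your own project ID and an OAuth2 access token
# (e.g. fetched from the GCE instance metadata server or gcloud auth).
PROJECT_ID = "my-project"
ACCESS_TOKEN = "ya29.placeholder"

resp = requests.post(
    "https://bigquery.googleapis.com/bigquery/v2/projects/%s/queries" % PROJECT_ID,
    headers={"Authorization": "Bearer %s" % ACCESS_TOKEN},
    json={"query": "SELECT 17", "useLegacySql": False},
)
print(resp.json())  # query results come back under the "rows" key on success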
I have a Python application (using Flask) which uses the Google Cloud Vision API client (via the pip package google-cloud-vision) to analyze images for text using OCR (via the TEXT_DETECTION feature in the API). This works fine when run locally, providing Google credentials via the GOOGLE_APPLICATION_CREDENTIALS environment variable pointing to the JSON file I got from a service account in my project with access to the Vision API. It also works fine locally in a Docker container, when the same JSON file is injected via a volume (following the recommendations in the Cloud Run docs).
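For reference, the call is essentially the following minimal sketch (the image file name is a placeholder, and on older versions of google-cloud-vision the Image type lives under vision.types instead):

from google.cloud import vision

client = vision.ImageAnnotatorClient()  # uses GOOGLE_APPLICATION_CREDENTIALS locally

with open("photo.jpg", "rb") as f:  # placeholder image file
    image = vision.Image(content=f.read())

response = client.text_detection(image=image)
if response.text_annotations:
    print(response.text_annotations[0].description)  # full detected text block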
However, when I deploy my application to Cloud Run, the same code fails to successfully make a request to the Cloud Vision API in a timely manner and eventually times out (the Flask app returns an HTTP 504). Then the container seems to become unhealthy: all subsequent requests (even those not interacting with the Vision API) also time out.
In the Cloud Run logs, the last thing logged appears to be related to Google Cloud authentication:
DEBUG:google.auth.transport.requests:Making request: GET http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=true
I believe my project is configured correctly to access this API: as already stated, I can use the API locally via the environment variable. The service is also running in Cloud Run using this same service account (at least I believe it is: the serviceAccountName field in the YAML tab matches, and I'm deploying it via gcloud run deploy --service-account ...).
Furthermore, the application can access the same Vision API without using the official Python client (both locally and in Cloud Run) when it is accessed using an API key and a plain HTTP POST via the requests package. So the Cloud Run deployment is able to make this API call and the project has access; something is wrong specifically with the combination of Cloud Run and the official Python client.
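That working fallback is a plain REST call of roughly this shape (a minimal sketch; the API key and image file are placeholders):

import base64
import requests

API_KEY = "placeholder-api-key"

with open("photo.jpg", "rb") as f:  # placeholder image file
    content = base64.b64encode(f.read()).decode("ascii")

resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate?key=%s" % API_KEY,
    json={"requests": [{"image": {"content": content},
                        "features": [{"type": "TEXT_DETECTION"}]}]},
)
print(resp.json())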
I admit this is basically a duplicate of this four-year-old question. But aside from being old, that question has no resolution, and I hope I have provided more details that might help get a good answer. I would much prefer to use the official client.
I am trying to find out how to get information on the number of consumers from a Kafka server running in a Docker container.
But I'll also take almost any info that points me in a forward direction. I've been trying through Python and URI requests, but I'm getting the feeling I need to go back to Java to ask Kafka questions about its status.
In the threads I've seen so far, many handy scripts from $KAFKA_HOME are referenced, but the configured systems I have access to do not have $KAFKA_HOME defined, nor do they have the contents of that directory. My world is a Docker container without CLI access, so I haven't been able to apply the solutions requiring shell scripts or other tools from $KAFKA_HOME to my running system.
One of the things I have tried is a Python script using requests.get(uri...), where the URI looks like:
http://localhost:9092/connectors/
The code looks like:
import requests

r = requests.get("http://%s:%s/connectors" % (config.parameters['kafkaServerIPAddress'],
                                              config.parameters['kafkaServerPort']))
currentConnectors = r.json()
So far I get a "nobody's home at that address" response.
I'm really stuck, and a pointer to something akin to a "Beginner's Guide to Getting Kafka Monitoring Information" would be great. Also, if there's a way to grab the helpful Kafka shell scripts & tools, that would be great too - where do they come from?
One last thing - I'm new enough to Kafka that I don't know what I don't know.
Thanks.
running in a Docker container
That shouldn't matter, but Confluent maintains a few pages that go over how to configure the containers for monitoring
https://docs.confluent.io/platform/current/installation/docker/operations/monitoring.html
https://docs.confluent.io/platform/current/kafka/monitoring.html
number of consumers
Such a broker-level metric doesn't exist; Kafka tracks membership per consumer group rather than exposing a global consumer count.
Python and URI requests
You appear to be using the /connectors endpoint of the Kafka Connect REST API (which runs on port 8083, not 9092). It is not a monitoring endpoint for brokers or for consumers outside the Connect API.
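If you do run a Kafka Connect worker, a request along these lines reaches its REST API (the host is a placeholder); port 9092 speaks Kafka's binary protocol rather than HTTP, which is why the request in the question got no useful answer:

import requests

# Placeholder host: point this at the machine actually running a Connect worker.
# 8083 is Connect's default REST port; 9092 is the broker port and is not HTTP.
connect_host = "localhost"

r = requests.get("http://%s:8083/connectors" % connect_host, timeout=5)
print(r.json())  # names of deployed connectors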
way to grab the helpful kafka shell scripts & tools
https://kafka.apache.org/downloads > Binary downloads
You don't need container shell access, but you will need external network access, just as all clients outside of a container would.
I have a test framework running locally (and in Git) that is based on the TestCafe-Cucumber (Node.js) example https://github.com/rquellh/testcafe-cucumber, and it works really well.
Now, I am trying to use this framework in the deployment (post-deployment) cycle by hosting it as a service or creating a Docker container.
The framework executes through the CLI command (npm test) with a few parameters.
I know the easiest way is to call the Git repo directly as and when required by adding a Jenkins step; however, that is not the solution I am looking for.
So far, I have successfully built the Docker image, and the container now runs on my localhost port 8085 as http://0.0.0.0:8085 (although I only get a DNS server response, as it's not an app - please correct me if I am wrong here).
The concern here is: how can I make it work like a hosted app, so that once the deployment completes, Jenkins/Octopus could call it as a service through the URL (http://0.0.0.0:8085), along with the few parameters the framework uses to execute the test cases?
I would appreciate any solutions the experts here can suggest.
I guess there is no production-ready application or service to solve this task.
However, you can use a REST framework to handle network requests and subprocesses to start test sessions. If you like Node.js, you can start with the Express framework and the execa module.
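A minimal sketch of that pattern, shown in Python/Flask purely for illustration (an Express + execa version has the same shape); the endpoint name, the tags parameter, and the working directory are hypothetical:

import subprocess
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/run-tests", methods=["POST"])
def run_tests():
    # Forward whatever parameters the framework expects, e.g. a tag filter.
    tags = request.json.get("tags", "") if request.is_json else ""
    cmd = ["npm", "test"] + (["--", "--tags", tags] if tags else [])
    result = subprocess.run(cmd, cwd="/path/to/test-framework",  # placeholder path
                            capture_output=True, text=True)
    return jsonify({"exitCode": result.returncode, "output": result.stdout[-2000:]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8085)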
This way you can build a basic service that can start your tests. If you need a more flexible solution, you can take a look at gherkin-testcafe, which provides access to TestCafe's API. You can use it instead of starting TestCafe as a subprocess, since that way you will have more options for managing your test sessions.
I am currently trying to break into data engineering, and I figured the best way to do this was to get a basic understanding of the Hadoop stack (I played around with the Cloudera Quickstart VM and went through the tutorial) and then try to build my own project. I want to build a data pipeline that ingests Twitter data, stores it in HDFS or HBase, and then runs some sort of analytics on the stored data. I would also prefer to use real-time streaming data, not historical/batch data. My data flow would look like this:
Twitter Stream API --> Flume --> HDFS --> Spark/MapReduce --> Some DB
Does this look like a good way to bring in my data and analyze it?
Also, how would you guys recommend I host/store all this?
Would it be better to have one instance on AWS EC2 for Hadoop to run on, or should I run it all in a local VM on my desktop?
I plan to have only a one-node cluster to start.
First of all, Spark Streaming can read from Twitter, and in CDH, I believe that is the streaming framework of choice.
Your pipeline is reasonable, though I might suggest using Apache NiFi (which is in the Hortonworks HDF distribution) or StreamSets, which, from what I understand, is easy to install in CDH.
Note that these run completely independently of Hadoop. Hint: Docker works great with them. HDFS and YARN are really the only complex components that I would rely on a pre-configured VM for.
Both NiFi and StreamSets give you a drag-and-drop UI for hooking Twitter to HDFS and "other DB".
Flume can work, and a single pipeline is easy, but it just hasn't matured to the level of the other streaming platforms. Personally, I like a Logstash -> Kafka -> Spark Streaming pipeline better, for example because the Logstash configuration files are nicer to work with (the Twitter input plugin is built in). And Kafka works with a bunch of tools.
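For example, the Kafka -> Spark leg of that pipeline might look roughly like the sketch below with Structured Streaming; the broker address, topic name, and output paths are placeholders, and the spark-sql-kafka connector package has to be available to Spark.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("tweet-stream").getOrCreate()

# Read raw tweets that something upstream (e.g. Logstash) wrote to a Kafka topic.
tweets = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
          .option("subscribe", "tweets")                        # placeholder topic
          .load())

# Kafka values arrive as bytes; cast to string before parsing or analytics.
query = (tweets.selectExpr("CAST(value AS STRING) AS json")
         .writeStream
         .format("parquet")                                   # lands files on HDFS
         .option("path", "hdfs:///data/tweets")                # placeholder path
         .option("checkpointLocation", "hdfs:///checkpoints/tweets")
         .start())

query.awaitTermination()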
You could also try out Kafka with Kafka Connect, or use Apache Flink for the whole pipeline.
The primary takeaway: you can bypass Hadoop here, or at least have something like this:
Twitter > Streaming Framework > HDFS
.. > Other DB
... > Spark
Regarding running locally or not, as long as you are fine with paying for idle hours on a cloud provider, go ahead.
I need to distribute some Erlang code which is closed source to a client. I think the easiest way would be to simply give an Erlang shell command to pull the code from a remote host. The remote host will be an Erlang VM which does not share the same secret cookie as the client. How can I do this?
For example, if I am in the Erlang shell, I would like something that lets me do:
load_lib(mysql).
load_lib(postgres).
and then Erlang would download and install the BEAM files, allowing me to use the mysql: and postgres: Erlang modules from that point on.
Update:
1)
It has been suggested that I use tarballs, so I guess the procedure in this case would be something like:
Find the Erlang lib directory and cd to it
wget the tarball into the current directory
Not as nice as gem install, but it's the best that Erlang can do.
You can't really have this done between two untrusted Erlang nodes. That secret cookie is the only security measure that exists between nodes, so you will need to roll your own protocol, even if it's just straight HTTP.
What you could do from that point on is either send a BEAM file over the network, or just send the binary data contained inside one. You can then load the module by calling code:load_file/1 for the BEAM file, or code:load_binary/3 for the binary data.
This all sounds relatively brittle to me, though. A repository, as suggested by Roberto Aloi, would likely be the best idea.
Couldn't you simply upload your code to some repository and provide access to the client?
You could also provide a script to automatically load the new versions of the code without stopping the live system... Or am I missing something?