Suppose I have a shell script stored in a GCS bucket. Is it possible to execute it using Apache Beam? If yes, then how?
I haven't tried anything yet, as I couldn't find anything of this sort in the documentation for Apache Beam or Dataflow. So I just wanted to know what approach I should take.
Thanks.
It's unusual, but not unheard of, to want to execute a whole shell script from something like a DoFn. Is this what you want to do? Do you want to run it once for each element in a PCollection?
If so, you'd want to use the GCS API or the FileSystems API to read the whole contents of the shell script into a String or byte array, and then pass it as a side input into your ParDo.
Then you can execute it using a tool like subprocess in Python, or ProcessBuilder in Java.
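To make that concrete, here is a rough Python sketch of that approach; the bucket path, script name, and DoFn below are assumptions for illustration, not something taken from the Beam docs:

import subprocess
import tempfile

import apache_beam as beam
from apache_beam.io.filesystems import FileSystems


def read_script(path):
    # Read the whole script from GCS once, at pipeline construction time.
    with FileSystems.open(path) as f:
        return f.read()


class RunScriptFn(beam.DoFn):
    def process(self, element, script_bytes):
        # Write the script to a temp file on the worker and run it with bash,
        # passing the element as an argument (assumes bash exists on the worker).
        with tempfile.NamedTemporaryFile(suffix='.sh', delete=False) as tmp:
            tmp.write(script_bytes)
            tmp_path = tmp.name
        result = subprocess.run(['bash', tmp_path, str(element)],
                                capture_output=True, text=True, check=True)
        yield result.stdout


with beam.Pipeline() as p:
    # The script contents become a one-element PCollection used as a side input.
    script = p | 'Script' >> beam.Create([read_script('gs://my-bucket/script.sh')])
    (p
     | 'Elements' >> beam.Create(['a', 'b', 'c'])
     | 'RunScript' >> beam.ParDo(RunScriptFn(),
                                 script_bytes=beam.pvalue.AsSingleton(script))
     | 'Print' >> beam.Map(print))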
Let me know if you need something more specific, and we can iterate a solution.
So I am working on a little project that sets up a streaming pipeline using Google Dataflow and Apache Beam. I went through some tutorials and was able to get a pipeline up and running, streaming into BigQuery, but I am going to want to stream into a full relational DB (i.e. Cloud SQL). I have searched through this site and throughout Google, and it seems that the best route to achieve that would be to use JdbcIO. I am a bit confused here, because when I look up info on how to do this, it all refers to writing to Cloud SQL in batches and not full-out streaming.
My simple question is: can I stream data directly into Cloud SQL, or would I have to send it via batch instead?
Cheers!
You should use JdbcIO - it does what you want, and it makes no assumption about whether its input PCollection is bounded or unbounded, so you can use it in any pipeline and with any Beam runner; the Dataflow Streaming Runner is no exception to that.
In case your question is prompted by reading its source code and seeing the word "batching": it simply means that, for efficiency, it writes multiple records per database call. The overloaded use of the word "batch" can be confusing, but here it just means that it tries to avoid the overhead of an expensive database call for every single record.
In practice, the number of records written per call is at most 1000 by default, but in general depends on how the particular runner chooses to execute this particular pipeline on this particular data at this particular moment, and can be less than that.
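For reference, here is a minimal sketch of a JdbcIO write from the Beam Python SDK (which wraps the Java transform via a cross-language expansion); the table, schema, and connection details below are placeholders, and in a real streaming job the Create step would be an unbounded source such as Pub/Sub:

import typing

import apache_beam as beam
from apache_beam import coders
from apache_beam.io.jdbc import WriteToJdbc


class OrderRow(typing.NamedTuple):
    id: int
    amount: float

# The JdbcIO wrapper expects schema'd (Row-coded) elements.
coders.registry.register_coder(OrderRow, coders.RowCoder)

with beam.Pipeline() as p:
    _ = (p
         # Stand-in for an unbounded source in a streaming pipeline.
         | beam.Create([OrderRow(1, 9.99)]).with_output_types(OrderRow)
         | 'WriteToCloudSQL' >> WriteToJdbc(
             table_name='orders',
             driver_class_name='org.postgresql.Driver',
             jdbc_url='jdbc:postgresql://10.0.0.5:5432/mydb',
             username='user',
             password='secret'))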
I am using Parse.com as the backend for my iOS app. Parse has a big Export Data button for backing up your database that will send an email with a zip containing each table and its data in JSON format. That's great, but is there any way to automate this task? I want to be able to do this every night, and I know you can use Background Jobs for automated tasks, but is it possible to hook into this particular feature? I couldn't find an answer on Parse's forums, and searching didn't turn up anything except old threads talking about how this feature was on the horizon.
The best I can work out, without Parse providing a true way of achieving this, is to have a job creating File objects in a "backup" table, and then use an external service (with the REST API) to pull them out into S3 or similar.
It's not ideal, but it would work. Also, it will count against your API requests, so you may want to optimise using the updated flag.
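A rough Python sketch of that external step, assuming a hypothetical "Backup" class whose rows have a "file" column; the class name, column name, keys, and bucket are all placeholders:

import boto3
import requests

PARSE_HEADERS = {
    'X-Parse-Application-Id': 'YOUR_APP_ID',
    'X-Parse-REST-API-Key': 'YOUR_REST_KEY',
}

s3 = boto3.client('s3')

# Fetch the rows that the scheduled background job created.
resp = requests.get('https://api.parse.com/1/classes/Backup', headers=PARSE_HEADERS)
resp.raise_for_status()

for row in resp.json()['results']:
    file_info = row['file']  # a Parse File pointer: {"name": ..., "url": ...}
    data = requests.get(file_info['url']).content
    # Copy each exported file into S3 under its Parse-generated name.
    s3.put_object(Bucket='my-backup-bucket', Key=file_info['name'], Body=data)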
What I do for this issue is run a simple Windows Server instance in AWS EC2 to run a scheduled task.
Create a simple .bat file that runs the command node parse-backup.js.
Create a basic scheduled task using the built-in Windows scheduler and have it run the .bat file.
You can use this Node script: https://github.com/mkim871/parse-node-backup
I want to run many Lua scripts one after another, without allowing any commands to run in between. I also need to pass the result of the first script to the second one, etc.
I've solved the problem temporarily by putting all my scripts in one file. However, the second script modifies a key returned by the first script. Because of this, putting everything in one file violates the EVAL command semantics as all the keys that the second script uses should be passed using the KEYS array.
Actually, it is possible. Redis has an undocumented feature that allows doing just that. The basic premise is that once you EVAL or SCRIPT LOAD a script, you can call that script from another one by invoking the function f_<sha1 hash> (where sha1 hash is the SHA1 hash of the first script).
Credit for this goes to Josiah Carlson (who, in turn, gives credit to Nathan Fritz). Dr. Josiah was kind enough to provide all the details here (the linked file is part of a Python package that helps manage Lua scripts that call other scripts).
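A minimal redis-py sketch of that trick; the scripts and key name are made up, and since the f_<sha1> function name is an undocumented internal of older Redis versions, treat this as version-dependent:

import redis

r = redis.Redis(decode_responses=True)

# Load the first script; Redis compiles it into a Lua function named f_<sha1>.
inner_sha = r.script_load("return redis.call('INCR', KEYS[1])")

# The second script calls the first one through that generated function name;
# it sees the same KEYS/ARGV tables as the outer EVAL call.
outer_script = f"""
local incremented = f_{inner_sha}()
return incremented * 2
"""

print(r.eval(outer_script, 1, 'counter'))  # e.g. 2 on a fresh 'counter' key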
You cannot do that. However, violating EVAL semantics this way should not be a problem as long as you do not use Redis Cluster.
I'm a little bit of an Ant n00b, but I hope this question isn't too n00bish:
I have a section of an Ant script that needs to monitor a remote process and report whether it has succeeded or failed. The remote information is exposed as a REST resource. The process behind it may take several minutes, so I will likely need to poll several times before I get a result other than "in progress".
In very rough pseudo-code, I need something like this:
while (true) {
    get REST resource status
    if status = 'success' or status = 'failure'
        break
    sleep 10
}
I know that I can (ab)use the <waitfor> task to repeatedly evaluate a condition, but I can't for the life of me figure out what that condition should be. The best I can come up with is to use a <scriptcondition>, but then I'm faced with the problem that Rhino JS (which Ant uses) has no XMLHttpRequest to send the REST query.
In other parts of the Ant script, we're using <exec> to run curl commands to interact with the REST service, but I don't see how to do that within the <waitfor>.
EDIT: I forgot to mention I'm stuck (for the time being) with Ant 1.7.1. Also, I realize it might be easier to push this to an external (bash, python, php, whatever) script, but I would prefer to keep it in the Ant script.
I'm very new to Ruby, so please try to look past my ignorance; I have no idea what I am talking about currently. However, I know the ability to do what I want exists. Essentially, I have some Java server-side code that can be used via a command line. I am trying to figure out where and how to begin sending it commands the same way I would by typing them into the CLI, without actually typing them into the CLI. Basically, I want to pass the commands as if I were using the CLI, but I'm not. Does that make sense?
It's for a CLI-to-UI conversion. I have seen the process done from RoR to Java in such a fashion, but I couldn't tell you where to begin to save my life.
First of all, I would suggest at least looking into JRuby, which can interact with Java classes as though they were Ruby classes.
If you still want the CLI integration, the naive approach is extremely simple: all you need to do is wrap your CLI command in backticks (`), and Ruby will execute the command as if you had typed it into a shell and return the results as a string.
If you need to do this very frequently, check out https://github.com/rtomayko/posix-spawn, which is a much more efficient way of doing it than the backtick approach.
If the Java program has a command prompt of its own, look into popen. It allows you to open a subprocess as an I/O stream, letting you send it input and read its output. If all you need is to start the process and get its output, then use backticks as suggested by Matt Briggs:
output = `the-command-to-start-the-java-program`