How to modify videomixer sink pad alpha value dynamically - gstreamer-1.0

I want to take a video file and overlay subtitles that fade in and fade out.
I'm just beginning to learn how to work with Gstreamer.
So far, I've managed to put together a pipeline that composites a subtitle stream drawn by the textrender element onto the original video stream with the videomixer element. Unfortunately, textrender and its sister element textoverlay do not have a fade-in/fade-out feature.
The videomixer sink pad does have an alpha property. For now, I have set the alpha value of the pad named videomixer.sink_1 to 1.0. Here is the command-line version of that pipeline:
#!/bin/bash
gst-launch-1.0 \
filesrc location=sample_videos/my-video.mp4 ! decodebin ! mixer.sink_0 \
filesrc location=subtitles.srt ! subparse ! textrender ! mixer.sink_1 \
videomixer name=mixer sink_0::zorder=2 sink_1::zorder=3 sink_1::ypos=-25 sink_1::alpha=1 \
! video/x-raw, height=540 \
! videoconvert ! autovideosink
I am looking for a way to dynamically modify that alpha value over time so that I can make the subtitle component fade in and out at the appropriate times. (I will parse the SRT file separately to determine when fades begin and end.)
I am studying the GstBin C API (my actual code is in Python). I think after I create the pipeline with Gst.parse_launch(), I can grab any named element with gst_bin_get_by_name(), then use that value to access the pad "sink_1".
Once I've gotten that far, will I be able to modify that alpha value dynamically from an event handler that receives timer events? Will the videomixer element respond instantly to changes in that pad's property? Has anyone else done this?
I found a partial answer here: https://stackoverflow.com/a/17331845/270511 but it doesn't tell me whether this will work after the pipeline is running.

Yes, it will work.
The videomixer pads respond dynamically to changes; I have done this with both the alpha and position properties. The pad properties can be changed using
g_object_set (mix_sink_pad, "alpha", 0.5, NULL);
I am using C, but your Python strategy for accessing the bin and pad sounds correct. My GStreamer code responds to inputs from a UDP socket, but timer events will work perfectly fine. For example, if you wanted to change the alpha value every 100 ms, you could do something like this:
g_timeout_add (100, alpha_changer_cb, loop);
You can then change the alpha property using g_object_set in the callback; it will update dynamically and looks very smooth.
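In Python with PyGObject, the same approach might look roughly like the sketch below. It is untested and makes a few assumptions: it reuses the pipeline string from the question, relies on the mixer element being named "mixer", and uses an arbitrary fade step and 100 ms interval.
#!/usr/bin/env python3
import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    "filesrc location=sample_videos/my-video.mp4 ! decodebin ! mixer.sink_0 "
    "filesrc location=subtitles.srt ! subparse ! textrender ! mixer.sink_1 "
    "videomixer name=mixer sink_0::zorder=2 sink_1::zorder=3 sink_1::ypos=-25 sink_1::alpha=1 "
    "! video/x-raw, height=540 ! videoconvert ! autovideosink")

mixer = pipeline.get_by_name("mixer")           # GstBin.get_by_name()
subtitle_pad = mixer.get_static_pad("sink_1")   # the videomixer sink pad from the question

fade = {"alpha": 1.0, "step": -0.05}            # fade out in 5% steps

def on_fade_tick():
    # Runs every 100 ms: clamp alpha to [0, 1] and update the pad property.
    fade["alpha"] = max(0.0, min(1.0, fade["alpha"] + fade["step"]))
    subtitle_pad.set_property("alpha", fade["alpha"])
    return True                                  # keep the timeout installed

GLib.timeout_add(100, on_fade_tick)

pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
The mixer picks up the new alpha on the next frame it composites, so small steps on a 100 ms tick give a reasonably smooth fade.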

I got this to work. You can read about it in this post: https://westside-consulting.blogspot.com/2017/03/getting-to-know-gstreamer-part-4.html

Related

Updating the drake system states from robot hardware pose during initialization

I have been trying to set up a custom manipulation station with Kuka IIWA hardware in drake. I got the hardware interface working. When running a joint teleoperation code (adapted from drake/examples/manipulation_station/joint_teleop.py), the robot jerks violently (all joints try to move to the zero position) at first and then continues to operate normally. On digging deeper, I found that this is caused by the FirstOrderLowPassFilter system. While advancing the simulation a tiny bit (simulator.AdvanceTo(1e-6)) to evaluate the LCM messages and set the initial GUI sliders, filter_initial_output_value, and plant joint positions to match the hardware, the FirstOrderLowPassFilter outputs a momentary value of 0. This sets the IIWA_COMMAND position to zero for an instant and causes a jerk.
How can I avoid this behavior?
As a workaround, I am subscribing separately to the raw LCM message from the hardware before initializing the drake systems, and setting filter_initial_output_value before advancing the simulation. Is this the recommended way?
I think what you're doing (manually reading the LCM message) is fine.
Alternatively, look at how DiscreteDerivative offers the suppress_initial_transient = true option. Perhaps a similar option (via an unrestricted update event) could be added to FirstOrderLowPassFilter so that the initial output value is sampled from the input at t == 0. But the event sequencing of startup may still be difficult. We essentially need to initialize the systems in their dataflow order, including refreshing output ports as events fire, which is not natively supported.
As another alternative, perhaps the IIWA_COMMAND publisher could be configured not to publish at t == 0, and instead publish only for t >= 0.005.
FirstOrderLowPassFilter has a method to set the initial value. https://drake.mit.edu/doxygen_cxx/classdrake_1_1systems_1_1_first_order_low_pass_filter.html#aaef7539cfbf1acfa0cf487c371bc5360
It is used in the example that you copied from:
https://github.com/RobotLocomotion/drake/blob/master/examples/manipulation_station/joint_teleop.py#L146
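For reference, the pattern used in that example looks roughly like the fragment below. This is a sketch rather than a complete program: it assumes you already have a diagram containing the filter, a simulator built on that diagram, a low_pass_filter variable referring to the FirstOrderLowPassFilter, and a vector q0 holding the joint positions read from the hardware status message.
# Seed the low-pass filter with the measured joint positions before the first
# advance, so IIWA_COMMAND never momentarily drops to zero.
filter_context = diagram.GetMutableSubsystemContext(
    low_pass_filter, simulator.get_mutable_context())
low_pass_filter.set_initial_output_value(filter_context, q0)

simulator.AdvanceTo(1e-6)   # the filter output now starts at q0, not 0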

pixel_cloud_fusion - "camera" passed to lookupTransform argument target_frame does not exist

I'm trying to get Autoware to send out a tracked-object list for the LGSVL simulation. I turn on YOLOv3, Euclidean cluster detection, then pixel_cloud_fusion. When I do, it constantly states that it's looking for TF and intrinsics data. Looking further, this seems to be a missing "camera_info" topic. So I made one up just to get it working (I'm not sure if LGSVL has any kind of native support?). I used a bunch of 1s for the matrices and "plumb bob" for the model type, and matched the width/height to the published camera images. Once I send it, however, I get the error:
[pixel_cloud_fusion] "camera" passed to lookupTransform argument target_frame does not exist
I have no idea what this means and the text does not appear in the Autoware software. Am I doing something wrong? Is there another topic I'm lacking?
P.S. Maybe someone with 1500 rep should create an Autoware tag.
It seems like this might be an issue with the TF tree being incomplete. For lookupTransform to work, it needs a well-defined TF tree connecting the target frame to whatever other fixed frame is being used. To add your camera to the TF tree, you should be able to use the static transform publisher.
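As a rough sketch of that idea in Python (ROS 1), a node like the one below would add a "camera" frame to the tree. The parent frame name and the identity transform are placeholders; use the frame your point cloud actually lives in and the real camera extrinsics.
#!/usr/bin/env python
import rospy
import tf2_ros
from geometry_msgs.msg import TransformStamped

rospy.init_node('camera_static_tf')
broadcaster = tf2_ros.StaticTransformBroadcaster()

t = TransformStamped()
t.header.stamp = rospy.Time.now()
t.header.frame_id = 'velodyne'   # placeholder: the frame of your lidar/point cloud
t.child_frame_id = 'camera'      # the frame pixel_cloud_fusion is looking for
t.transform.translation.x = 0.0  # placeholder extrinsics
t.transform.translation.y = 0.0
t.transform.translation.z = 0.0
t.transform.rotation.x = 0.0
t.transform.rotation.y = 0.0
t.transform.rotation.z = 0.0
t.transform.rotation.w = 1.0     # identity rotation

broadcaster.sendTransform(t)
rospy.spin()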

Add Silence to multiple tracks in Audacity

I have hundreds of audio tracks and I want to add Silence before all tracks.
I know it can be done through Chain, but I do not know how exactly it can be done.
PS: Here is what I have tried:
File > Edit Chains, then File > Apply Chain > Apply to Files > selected the required files; the output files are all silenced. I do not know coding; this is the first time I am using Audacity.
I am attaching the Edit Chains window.
Actually, Audacity provides an easier way to add silence to each of your files.
First import all the tracks and select them all. Then follow Transport > Cursor to > Section Start.
This puts your cursor at the beginning (the 0th millisecond).
Then Generate > Silence will ask for the duration. (My favorite format is hh:mm:ss + milliseconds.)
Sample screen for entering duration
After confirmation, silence will be added to the start of all of your tracks.
Audacity's Chain logic doesn't let you set a duration for the Silence command; it takes the length of the longest track and produces all-silence output. Doing it this way is easier than applying a Chain.

Scan video for text string?

My goal is to find the title screen from a movie trailer. I need a service where I can search a video for a string, then return the frame with that string. Pretty obscure, does anything like this exist?
e.g. for this movie, I'd scan for "Sausage Party" and retrieve this frame:
Edit: I found the CloudSight API, which would actually work, except the cost is prohibitive at $0.04 per call, assuming I need to split the video into 1 s intervals and scan every image (at least 60 calls per video).
No exact service that I can find, but you could attempt to do this yourself...
# extract one frame per second from the video
ffmpeg -i sausage_party.mp4 -r 1 %04d.png
# OCR every extracted frame in parallel (8 jobs), writing 0001.txt, 0002.txt, ...
/usr/local/bin/parallel --no-notice -j 8 \
/usr/local/bin/tesseract -psm 6 -l eng {} {.} \
::: *.png
This extracts one frame per second from the video file, and then uses tesseract to extract the text via OCR into files with the same name as the image frame (e.g. 0135.txt). However, your results are going to vary massively depending on the font used and the quality of the video file.
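If you'd rather drive the whole thing from one script, here is a rough Python sketch of the same approach using pytesseract instead of calling tesseract directly. The filename and search string are just examples, and it assumes ffmpeg and Tesseract are installed.
#!/usr/bin/env python3
import glob
import subprocess

from PIL import Image
import pytesseract

VIDEO = "sausage_party.mp4"
QUERY = "sausage party"

# One PNG per second of video, named 0001.png, 0002.png, ...
subprocess.run(["ffmpeg", "-i", VIDEO, "-r", "1", "%04d.png"], check=True)

for frame in sorted(glob.glob("[0-9]*.png")):
    text = pytesseract.image_to_string(Image.open(frame))
    if QUERY in text.lower():
        print("Found '%s' at ~%d s (%s)" % (QUERY, int(frame[:4]), frame))
        break
else:
    print("String not found in any frame.")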
You'd probably find it cheaper/easier to use something like Amazon Mechanical Turk, especially since the OCR is going to have a hard time doing this automatically.
Another option could be implementing this service yourself using the Scene Text Detection and Recognition module in OpenCV (docs.opencv.org/3.0-beta/modules/text/doc/text.html). You can take a look at this video to get an idea of how such a system would operate. As pointed out above, the accuracy would depend on the font used in the movie titles, the quality of the video files, and the OCR.
OpenCV relies on Tesseract as the underlying OCR, but alternatively you could use the text detection and localization functions (docs.opencv.org/3.0-beta/modules/text/doc/erfilter.html) in OpenCV to find text areas in the image and then employ a different OCR to perform the recognition. The text detection and localization stage can be done very quickly, so achieving real-time performance would mostly be a matter of picking a fast OCR.

World of Warcraft (Lua) communication with Adafruit Gemma

I have an Adafruit (Gemma) / Arduino and a Neopixel LED ring that I would like to control from World of Warcraft in-game events. This part is soldered and working.
Question:
Is there any way to send communications between World of Warcraft and some sort of listener on the PC that can then in turn send messages over USB to the Arduino/Gemma device?
My aim is to create an on-desk LED indicator, e.g. if I'm a healer, then I want green/yellow/red lights to represent the health of each raid/party member, so refreshes would be required at a high rate (every 0.5 s).
Thanks in advance for your feedback, and I welcome any future possibilities with the soon-to-be-released Warlords of Draenor.
Is there any way to send communications between World of Warcraft and some sort of listener on the PC
Not directly via the WoW API. I came up with a way which I've never shared, because my usage broke Blizzard rules. But I haven't played in years, so here ya go. :)
I used an addon to create a one pixel frame in the top-left of the WoW window. I manipulated the color of this pixel to send data to the outside world.
The "listener" app can read this pixel with three Win32 calls:
HWND hwnd = FindWindow(NULL, "World of Warcraft"); // find WoW window
HDC hdc = GetDC(hwnd); // get the device context (graphics drawing abstraction)
COLORREF color = GetPixel(hdc, 0,0); // read the pixel at x 0, y 0
I then interpreted the bits of the color like this:
4 bits: sequence number
7 bits: checksum: (sequence + key code + ctrl + alt + shift + win)/6
8 bits: key code or ASCII character
1 bit: 1 = virtual key code, 0 = ASCII
1 bit: CTRL key pressed
1 bit: ALT key pressed
1 bit: SHIFT key pressed
1 bit: WINDOWS key pressed
The "sequence number" was just means of detecting that a new message had been posted to the pixel. The checksum was to prevent bogus reads when my special pixel was not active, like during loading screens. The rest was keystroke information. This allowed me to generate keystrokes from an addon. The entire watcher app is about 100 lines of C. Very simple.
I wrote an in-game script editor and used this with "pixelbot" to automate things in game. Towards the end of my WoW life I had more fun coding for Wow than playing it, which is saying a lot, because it's a fun game. :) One upon a time I knew everything there was to know about WoW addon programming, but I'm several years out of date now. I'll see if I can dig up some pixelbot Lua code for, though.
Anyway, you can adapt this scheme to send any messages you like. For instance:
4 bits: sequence number
7 bits: checksum: (sequence + player number + LED color)/3
5 bits: player number
2 bits: LED color (0: green, 1: yellow, 2: red)
6 bits: reserved
As for speed, I never actually measured it, but it blows away your 0.5 second requirement. At most a few milliseconds of latency between writes and reads.
that can then in turn send messages over USB to the Arduino/Gemma device?
That's just writing to the serial port in the "watcher" app and using the Arduino libraries to read from the serial port inside your device.
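For example, a bare-bones watcher on the PC side could look something like the Python sketch below (untested). It assumes pywin32 and pyserial, a Windows client, the adapted LED layout above packed into the pixel's 24 RGB bits from the high end, and a placeholder COM port; adjust all of those to your setup.
import time
import serial      # pyserial
import win32gui    # pywin32

port = serial.Serial("COM3", 9600)                 # placeholder port/baud

hwnd = win32gui.FindWindow(None, "World of Warcraft")
hdc = win32gui.GetDC(hwnd)

last_seq = -1
while True:
    color = win32gui.GetPixel(hdc, 0, 0)           # COLORREF: 0x00BBGGRR
    r, g, b = color & 0xFF, (color >> 8) & 0xFF, (color >> 16) & 0xFF
    value = (r << 16) | (g << 8) | b               # repack as 24 bits, red high

    seq      = (value >> 20) & 0xF                 # 4-bit sequence number
    checksum = (value >> 13) & 0x7F                # 7-bit checksum
    player   = (value >> 8)  & 0x1F                # 5-bit player number
    led      = (value >> 6)  & 0x3                 # 2-bit LED color

    if seq != last_seq and checksum == (seq + player + led) // 3:
        port.write(bytes([player, led]))           # forward to the Gemma over USB serial
        last_seq = seq
    time.sleep(0.01)                               # poll far faster than the 0.5 s requirement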
I have source code for the "listener" app (pixel watcher) and for the WoW-side stuff that writes messages to the pixel. Let me know if you're interested and I'll help you out of band or dramatically increase the size of this post.
After some research, I did not find any built-in functionality to signal/pipe/communicate with an external program. I believe this is due to Blizzard's anti-bot policy. You could actually do this with a memory watcher (just like Cheat Engine), but there is a chance you'll be banned for using it.
The only thing you could do, if you can't find anything, is to ask on the official forum and hope a technically minded blue poster will answer =)
If you find anything, update your post; your idea is pretty interesting =)
There are only two ways to communicate with the game client without breaking the ToU:
Saving variables between sessions. Meaning that you can have an addon read and write to its storage file, but this requires you to either relog or /reload the UI for the file to be written to and read from. In short, this wouldn't be very viable.
Have an addon use a tiny space on screen to write colors and use said colors to communicate with your external software by reading the pixels on screen.
There are many ways to achieve the second suggestion. You only need to be able to write this addon for the game. Then write an external program to read pixels. Sending commands back to the game would require hotkeys or sending it in the chat window.
Note that you are still limited by the in-game API calls that require hardware events, so for those you'd have to push a button or use the mouse to bypass the restriction.
