Flume use of multiplexing channel selector - flume

I am trying to use Flume to ingest around 1 TB of data, and I want to use the multiplexing channel selector for this.
There are some examples available that show how to use the multiplexing channel selector.
My question is: how can one determine the header content of an event?
For example, in the configuration below, I am using the multiplexing channel selector with State as the header, and the mapping is made to the values CN, ID, IN, etc.
spoolDirAgent.sources.source1.selector.type = multiplexing
spoolDirAgent.sources.source1.selector.header = State
spoolDirAgent.sources.source1.selector.mapping.CN = channel1
spoolDirAgent.sources.source1.selector.mapping.IN = channel2
spoolDirAgent.sources.source1.selector.mapping.ID = channel2
spoolDirAgent.sources.source1.selector.default = channel1
Also if possible, please let me know how to use the event headers in flume sources?
Thanks in advance!

As the Flume NG user guide indicates, if the event header doesn't contain the key 'State', the default channel is used.
Search for "static interceptor" in the Flume NG user guide and you'll see how to add the key 'State' to the header.
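To make the routing rule concrete, here is a minimal Python sketch of the behaviour described above. It is not Flume code: select_channels and add_static_header are hypothetical helpers that only imitate what the multiplexing selector and a static interceptor do, using the mapping from the question.

```python
def add_static_header(headers, key, value, preserve_existing=True):
    """Imitate a static interceptor: stamp a fixed key/value on an event's headers."""
    stamped = dict(headers)
    if preserve_existing and key in stamped:
        # Keep the value the event already carries.
        return stamped
    stamped[key] = value
    return stamped


def select_channels(headers, header_name, mapping, default):
    """Imitate the multiplexing selector: route on one header value."""
    value = headers.get(header_name)
    # A missing header or an unmapped value falls back to the default channel.
    return mapping.get(value, default)


# Mapping taken from the question's configuration.
MAPPING = {"CN": ["channel1"], "IN": ["channel2"], "ID": ["channel2"]}
DEFAULT = ["channel1"]

if __name__ == "__main__":
    event_headers = add_static_header({}, "State", "IN")
    print(select_channels(event_headers, "State", MAPPING, DEFAULT))
    print(select_channels({}, "State", MAPPING, DEFAULT))
```

In a real agent the header would be set by an interceptor on the source (or by whatever client produces the events), and the selector would then read it.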

Related

How to properly configure SQS without using SNS topics in MassTransit?

I'm having some issues configuring MassTransit with SQS. My goal is to have N consumers which create N queues and each of them accept a different message type. Since I always have a 1 to 1 consumer to message mapping, I'm not interested in having any sort of fan-out behaviour. So publishing a message of type T should publish it directly to that queue. How exactly would I configure that? This is what I have so far:
services.AddMassTransit(x =>
{
x.AddConsumers(Assembly.GetEntryAssembly());
x.UsingAmazonSqs((context, cfg) =>
{
cfg.Host("aws", h =>
{
h.AccessKey(mtSettings.AccessKey);
h.SecretKey(mtSettings.SecretKey);
h.Scope($"{mtSettings.Environment}", true);
var sqsConfig = new AmazonSQSConfig() { RegionEndpoint = RegionEndpoint.GetBySystemName(mtSettings.Region) };
h.Config(sqsConfig);
var snsConfig = new AmazonSimpleNotificationServiceConfig()
{ RegionEndpoint = RegionEndpoint.GetBySystemName(mtSettings.Region) };
h.Config(snsConfig);
});
cfg.ConfigureEndpoints(context, new BusEnvironmentNameFormatter(mtSettings.Environment));
});
});
The BusEnvironmentNameFormatter class overrides KebabCaseEndpointNameFormatter and adds the environment as a prefix, and the effect is that all the queues start with 'dev', while the h.Scope($"{mtSettings.Environment}", true) line does the same for topics.
I've tried to get this working without configuring topics at all, but I couldn't get it working without any errors. What am I missing?
The SQS docs are a bit thin, but is it actually possible to do a bus.Publish() without using SNS topics, or are they necessary? If it's not possible, how would I use bus.Send() without hardcoding queue names in the call?
Cheers!
Publish requires the use of topics, which in the case of SQS uses SNS.
If you want to configure the endpoints yourself, and prevent the use of topics, you'd need to:
Set ConfigureConsumeTopology = false – this prevents topics from being created and connected to the receive endpoint queue.
Set PublishFaults = false – this prevents fault topics from being created when a consumer throws an exception.
Don't call Publish, because, obviously that will create a topic.
If you want to somehow establish a convention for your receive endpoint names that aligns with your ability to send messages, you could create your own endpoint name formatter that would use message types and then use those same names to call GetSendEndpoint using the queue:name short name syntax to Send messages directly to those queues.
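The naming convention in the last paragraph can be illustrated with a small sketch. This is plain Python rather than MassTransit code, and queue_name_for and send_address_for are hypothetical helpers; the kebab-casing here is a simplification and may not match KebabCaseEndpointNameFormatter in every case. The point is only that the receiver and the sender derive the same name from the message type.

```python
import re


def queue_name_for(message_type, environment):
    """Build a queue name such as 'dev-order-submitted' from 'OrderSubmitted'."""
    # Put a hyphen at each lowercase/digit-to-uppercase boundary, then lowercase.
    kebab = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", message_type).lower()
    return f"{environment}-{kebab}"


def send_address_for(message_type, environment):
    """Return the short 'queue:name' address form mentioned in the answer."""
    return "queue:" + queue_name_for(message_type, environment)


if __name__ == "__main__":
    print(send_address_for("OrderSubmitted", "dev"))
```

The same function would be used once when naming the receive endpoint and again when resolving the send endpoint, so no queue name is hardcoded at the call site.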

Passing custom CNAM through Twilio SIP Domain

I have a Twilio phone number configured to direct inbound calls to a PHP webhook. The webhook uses some of the addon information to try and find a useful caller name. I'm also using Twilio's built-in CNAM lookups, but they don't work right in Canada (I always get the caller's number as their name).
The webhook is designed to forward calls to a Twilio SIP Domain first, where I expect I'll be answering most of the calls. Other calls, if deemed urgent, will be forwarded via PSTN.
I've reached the point where I can pull out a relevant name, but I'm having difficulty trying to forward that information to my FXS (HT802). As per the device's documentation:
http://www.grandstream.com/sites/default/files/Resources/ht80x_administration_guide.pdf
Auto: When set to “Auto”, the HT801/HT802 will look for the caller ID in the order of P-Asserted Identity Header, Remote-Party-ID Header and From Header in the incoming SIP INVITE
I'm not able to find a means to pass these headers via a SIP noun in TwiML. Based on Twilio's documentation:
https://www.twilio.com/docs/voice/twiml/sip#custom-headers
UUI (User-to-User Information) header can be sent without prepending x-
https://www.twilio.com/docs/voice/api/sending-sip#sip-x-headers
If you send headers without X- prefix, Twilio will not read the header. As a result, the header will not be passed in the output.
For context, here's a reduced snippet of the PHP code I'm using so far. Note: I'm not actually doing anything with the $callerName value yet.
<?php
// Simple "starting value", in case we can't resolve the name.
// (will also resolve the numbers used for unknown/blocked IDs)
$callerName = FriendlyFormatPhoneNumber($_POST['From']);
use Twilio\Twiml;
$addOns = null;
if (array_key_exists('CallerName', $_POST)) {
$callerName = $_POST['CallerName'];
} elseif (array_key_exists('AddOns', $_POST)) {
$addOns = json_decode($_POST['AddOns']);
$teloName = $addOns->results->telo_opencnam->result->name;
// If we pulled a telo name, and it doesn't seem to be a phone number
// (in case that could happen), use the telo name.
if (isset($teloName) && preg_match('/.*[0-9]{4,}/', $teloName) == 0) {
$callerName = $teloName;
}
}
$response = new TwiML;
$dialParams = array(
'timeout' => 20,
'hangupOnStar' => false,
'answerOnBridge' => true,
'action' => API_BASE_URL . '/dial-callback.php'
);
$dialer = $response->dial($dialParams);
$dialer->sip('sip:101@mytwiliodomain.sip.us1.twilio.com;transport=tls');
echo $response;
Long story short: How do I pass a custom caller name to my SIP devices using TwiML and the Twilio SIP Domains? I don't want to overwrite the number, just the name. And only on the inbound calls to the devices registered to my Twilio SIP domain.
In case it helps: Don't worry about translating to PHP if that's not your field; I can translate from TwiML :)
Unfortunately, this is not possible with Twilio SIP Domains. Currently, there is no way to set the Caller Name via TwiML.

How do I ensure WebChat is hidden when no workers are available?

I don't see anywhere in the documentation the ability to hide the chat.
https://www.twilio.com/docs/flex/flex-webchat-basic-configuration
Essentially, I want the chat NOT to show on a website if no agents are available. Is this possible?
It looks like I have to call Twilio Flex to get the available workers and set this property accordingly.
const defaultConfiguration: Config = {
...
available: {BoolValueDependingOnAgentAvail},
Here is an example of JavaScript that uses the necessary API to obtain the worker count. Based on this value, you can set BoolValueDependingOnAgentAvail accordingly.
client.taskrouter
  .workspaces('WSxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')
  .workers.list()
  .then(workers => {
    // Count workers that are available and whose attributes string mentions "sales".
    data = {
      availWorkersCount: workers.filter(x => x.available === true && x.attributes.includes("sales")).length
    };
  });
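As a rough illustration of the same filtering step, here is a Python sketch that works on worker records that have already been fetched. It makes no Twilio calls; the dict layout and the "skills" attribute key are assumptions for the example (the snippet above only checks that the attributes string contains "sales").

```python
import json


def count_available(workers, skill="sales"):
    """Count workers that are available and list the given skill."""
    count = 0
    for worker in workers:
        if not worker.get("available"):
            continue
        try:
            attributes = json.loads(worker.get("attributes") or "{}")
        except json.JSONDecodeError:
            attributes = {}
        # Assumed attribute layout: {"skills": ["sales", ...]}
        if isinstance(attributes, dict) and skill in attributes.get("skills", []):
            count += 1
    return count


def webchat_available(workers, skill="sales"):
    """Value to feed into the 'available' setting: True if anyone can answer."""
    return count_available(workers, skill) > 0


if __name__ == "__main__":
    sample = [
        {"available": True, "attributes": '{"skills": ["sales"]}'},
        {"available": False, "attributes": '{"skills": ["sales"]}'},
    ]
    print(webchat_available(sample))
```

This check would have to run server-side, since the TaskRouter credentials should not be exposed to the page that embeds the chat.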

How to find if a youtube channel is currently live streaming without using search?

I'm working on a website that loads the live streams of multiple YouTube channels. At first I was trying to figure out a way to do this without using YouTube's API, but I have decided to give in.
To find whether a channel is live streaming and to get the live stream links I've been using:
https://www.googleapis.com/youtube/v3/search?part=snippet&channelId={CHANNEL_ID}&eventType=live&maxResults=10&type=video&key={API_KEY}
However, with the minimum quota being 10000 and each search being worth 100, I'm only able to do about 100 searches before I exceed my quota limit, which doesn't help at all. I ended up exceeding the quota limit in about 10 minutes. :(
Does anyone know of a better way to figure out if a channel is currently live streaming and what the live stream links are, using as minimal quota points as possible?
I want to reload youtube data for each user every 3 minutes, save it into a database, and display the information using my own api to save server resources as well as quota points.
Hopefully someone has a good solution to this problem!
If nothing can be done about links just determining if the user is live without using 100 quota points each time would be a big help.
Since the question only specified that Search API quotas should not be used in finding out if the channel is streaming, I thought I would share a sort of work-around method. It might require a bit more work than a simple API call, but it reduces API quota use to practically nothing:
I used a simple Perl GET request to retrieve a Youtube channel's main page. Several unique elements are found in the HTML of a channel page that is streaming live:
The number of live viewers tag, e.g. <li>753 watching</li>.
The LIVE NOW badge tag: <span class="yt-badge yt-badge-live" >Live now</span>.
To ascertain whether a channel is currently streaming live requires a simple match to see if the unique HTML tag is contained in the GET request results. Something like: if ($get_results =~ /$unique_html/) (Perl). Then, an API call can be made only to a channel ID that is actually streaming, in order to obtain the video ID of the stream.
The advantage of this is that you already know the channel is streaming, instead of using thousands of quota points to find out. My test script successfully identifies whether a channel is streaming, by looking in the HTML code for: <span class="yt-badge yt-badge-live" > (note the weird extra spaces in the code from Youtube).
I don't know what language OP is using, or I would help with a basic GET request in that language. I used Perl, and included browser headers, User Agent and cookies, to look like a normal computer visit.
Youtube's robots.txt doesn't seem to forbid crawling a channel's main page, only the community page of a channel.
Let me know what you think about the pros and cons of this method, and please comment with what might be improved rather than disliking if you find a flaw. Thanks, happy coding!
2020 UPDATE
The yt-badge-live badge seems to have been deprecated; it no longer reliably shows whether the channel is streaming. Instead, I now check the HTML for this string:
{"text":" watching"}
If I get a match, it means the page is streaming. (Non-streaming channels don't contain this string.) Again, note the weird extra whitespace. I also escape all the quotation marks since I'm using Perl.
Here are my two suggestions:
Check my answer where I explain how you can retrieve videos from channels that are livestreaming.
Another option is to request the following URL each time you want to check whether there's a livestream:
https://www.youtube.com/channel/<CHANNEL_ID>/live
Where CHANNEL_ID is the ID of the channel you want to check for a livestream [1].
[1] Note that the URL may not work for all channels (that depends on the channel itself).
For example, if you check the channel_id UC7_YxT-KID8kRbqZo7MyscQ - link to this channel livestreaming - https://www.youtube.com/channel/UC4nprx9Vd84-ly7N-1Ce6Og/live, this channel will show if he is livestreaming, but, with his channel id UC4nprx9Vd84-ly7N-1Ce6Og - link to this channel livestreaming -, it will show his main page instead.
Adding to the answer by Bman70, I tried to eliminate the need for a costly search request once you know the channel is streaming live. I did this using two indicators found in the HTML response of the channel page when the channel is streaming live.
function findLiveStreamVideoId(channelId, cb){
$.ajax({
url: 'https://www.youtube.com/channel/'+channelId,
type: "GET",
headers: {
'Access-Control-Allow-Origin': '*',
'Accept-Language': 'en-US, en;q=0.5'
}}).done(function(resp) {
//one method to find live video
let n = resp.search(/\{"videoId[\sA-Za-z0-9:"\{\}\]\[,\-_]+BADGE_STYLE_TYPE_LIVE_NOW/i);
//If found
if(n>=0){
let videoId = resp.slice(n+1, resp.indexOf("}",n)-1).split("\":\"")[1]
return cb(videoId);
}
//If not found, then try another method to find live video
n = resp.search(/https:\/\/i.ytimg.com\/vi\/[A-Za-z0-9\-_]+\/hqdefault_live.jpg/i);
if (n >= 0){
let videoId = resp.slice(n,resp.indexOf(".jpg",n)-1).split("/")[4]
return cb(videoId);
}
//No streams found
return cb(null, "No live streams found");
}).fail(function() {
return cb(null, "CORS Request blocked");
});
}
However, there's a tradeoff. This method confuses a recently ended stream with currently live streams. A workaround for this issue is to get status of the videoId returned from Youtube API (costs a single unit from your quota).
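For readers not using jQuery, the second indicator (the hqdefault_live.jpg thumbnail URL) can be sketched in Python as below. find_live_video_id is a hypothetical helper, and it assumes the page still embeds the thumbnail URL in the same form the JavaScript above looks for, which YouTube may change at any time. The same caveat about recently ended streams applies.

```python
import re

# Thumbnail URL pattern used by the second lookup in the JavaScript above.
LIVE_THUMBNAIL = re.compile(
    r"https://i\.ytimg\.com/vi/([A-Za-z0-9_-]+)/hqdefault_live\.jpg"
)


def find_live_video_id(html):
    """Return the video ID from the first live thumbnail URL, or None."""
    match = LIVE_THUMBNAIL.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = '<img src="https://i.ytimg.com/vi/abc123XYZ_-/hqdefault_live.jpg">'
    print(find_live_video_id(sample))
```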
I found the YouTube API to be very restrictive given the cost of the search operation. The accepted answer did not work for me, as I found the string on non-live streams as well. Web scraping with aiohttp and BeautifulSoup was not an option, since the better indicators required JavaScript support. Hence I turned to Selenium. I looked for the CSS selector
#info-text
and then searched its text for the string "Started streaming" or "watching now".
To reduce load on my tiny server, which would otherwise have required a lot more resources, I moved this functionality to a Heroku dyno with a small Flask app.
# import flask dependencies
import os
from flask import Flask, request, make_response, jsonify
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

base = "https://www.youtube.com/watch?v={0}"
delay = 3

# initialize the flask app
app = Flask(__name__)


# default route
@app.route("/")
def index():
    return "Hello World!"


# create a route for webhook
@app.route("/islive", methods=["GET", "POST"])
def is_live():
    chrome_options = Options()
    chrome_options.binary_location = os.environ.get('GOOGLE_CHROME_BIN')
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--remote-debugging-port=9222')
    driver = webdriver.Chrome(executable_path=os.environ.get('CHROMEDRIVER_PATH'), chrome_options=chrome_options)
    # "url" may be a full watch URL or just a video ID
    url = request.args.get("url")
    if "youtube.com" in url:
        video_id = url.split("?v=")[-1]
    else:
        video_id = url
        url = base.format(url)
    print(url)
    response = { "url": url, "is_live": False, "ok": False, "video_id": video_id }
    driver.get(url)
    try:
        element = WebDriverWait(driver, delay).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#info-text")))
        result = element.text.lower().find("Started streaming".lower())
        if result != -1:
            response["is_live"] = True
        else:
            result = element.text.lower().find("watching now".lower())
            if result != -1:
                response["is_live"] = True
        response["ok"] = True
        return jsonify(response)
    except Exception as e:
        print(e)
        return jsonify(response)
    finally:
        driver.close()


# run the app
if __name__ == "__main__":
    app.run()
You'll however need to add the following buildpacks in settings
https://github.com/heroku/heroku-buildpack-google-chrome
https://github.com/heroku/heroku-buildpack-chromedriver
https://github.com/heroku/heroku-buildpack-python
Set the following Config Vars in settings
CHROMEDRIVER_PATH=/app/.chromedriver/bin/chromedriver
GOOGLE_CHROME_BIN=/app/.apt/usr/bin/google-chrome
You can find the supported Python runtimes here, but anything below Python 3.9 should be fine, since Selenium had problems with improper use of the is operator.
I hope youtube will provide better alternatives than workarounds.
I know this is an old thread, but I thought I'd share my way of checking, for example to grab the status code to use in an app.
This is for a single channel, but you could easily wrap it in a foreach.
<?php
#####
$ytchannelID = "UCd0BTXriKLvOs1ANx3puZ3Q";
#####
$ytliveurl = "https://www.youtube.com/channel/".$ytchannelID."/live";
$ytchannelLIVE = '{"text":" watching now"}';
$contents = file_get_contents($ytliveurl);
if ( strpos($contents, $ytchannelLIVE) !== false ){http_response_code(200);} else {http_response_code(201);}
unset($ytliveurl);
?>
Adding onto the other answers here, I use a GET request to https://www.youtube.com/c/<CHANNEL_NAME>/live and then search for "isLive":true (rather than {"text":" watching"})
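The string checks from the answers above can be combined in a short Python sketch. looks_live and fetch_live_page are hypothetical helpers; the two markers are the ones quoted in this thread and depend on YouTube's current page markup, and one answer above reports seeing the "watching" string on non-live pages, so treat a match as a hint rather than proof.

```python
from urllib.request import Request, urlopen

# Markers quoted in the answers above; either one is treated as a hint.
LIVE_MARKERS = ('"isLive":true', '{"text":" watching"}')


def looks_live(html):
    """Return True if the page source contains one of the known markers."""
    return any(marker in html for marker in LIVE_MARKERS)


def fetch_live_page(channel_id, timeout=10):
    """Download the /live page for a channel ID (performs a network request)."""
    url = f"https://www.youtube.com/channel/{channel_id}/live"
    request = Request(
        url,
        headers={"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"},
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Offline demonstration on a made-up fragment of page source.
    print(looks_live('... "isLive":true ...'))
```

Once a channel looks live, a single API call for that channel is enough to confirm it and obtain the video ID, which keeps quota use low.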

SEMP API equivalent for url "/SEMP/v2/config/msgVpns/default"

URL .../SEMP/v2/config/msgVpns/default returns data
{
"data":{
"authenticationBasicEnabled":true,
"authenticationBasicProfileName":"default",
"authenticationBasicRadiusDomain":"",
"authenticationBasicType":"radius",
"authenticationClientCertAllowApiProvidedUsernameEnabled":false,
....
What is the Java API to return this data? Apparently there is no getMsgVpnsDefault(...) method
Generally speaking what is the translation of URL's into API calls? This doesn't seem to be addressed in the documentation.
What is the Java API to return this data? Apparently there is no getMsgVpnsDefault(...) method
There's no API provided by Solace.
SEMP(v2 in your case) is a series of REST commands to be executed over the management port to manage the configuration of the Solace routers.
This is not to be mistaken for the Java API that's provided for messaging over the messaging port/interface.
Generally speaking what is the translation of URL's into API calls?
The complete list of URL's is documented here:
https://docs.solace.com/API-Developer-Online-Ref-Documentation/swagger-ui/index.html#/
In the Solace Samples repository on GitHub there's a gradle file which uses Swagger CodeGen to generate a POJO wrapper around SEMP v2.
This then gives you a Java API to interact with Solace routers.
WRT your original question about getMsgVpnsDefault(...) I believe you'd use
MsgVpn defaultVPN = sempApiInstance.getMsgVpn("default", null);
Or you could grab the list of all VPNs
MsgVpnsResponse resp = sempApiInstance.getMsgVpns(1000, null, null, null);
List<MsgVpn> allVpns = resp.getData();
then iterate over the list checking until you find one whose name is "default"
https://github.com/SolaceSamples/solace-samples-semp/tree/master/java
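Since SEMP v2 is a REST interface, the URL from the question can also be requested directly without a generated client. Below is a minimal Python sketch that only builds such a request. The host, port and credentials are placeholders, build_msg_vpn_request is a hypothetical helper, and HTTP basic authentication is assumed to be how your broker's management interface is secured.

```python
import base64
from urllib.parse import quote
from urllib.request import Request


def build_msg_vpn_request(base_url, vpn_name, username, password):
    """Build a GET request for /SEMP/v2/config/msgVpns/<vpn_name>."""
    path = f"/SEMP/v2/config/msgVpns/{quote(vpn_name, safe='')}"
    url = base_url.rstrip("/") + path
    # Assumed: the management interface accepts HTTP basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host, port and credentials.
    req = build_msg_vpn_request("http://localhost:8080", "default", "admin", "admin")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen and decoding the body with json.loads would give the same "data" object shown in the question.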
