FlumeRpcClient multithreading - sdk

I'm trying to understand the correct way to use the Flume RpcClient in a multithreaded application. Information I have found so far indicates that the components are thread safe, but the example in the Flume documentation clouds the issue when it comes to error handling. This code:
public void sendDataToFlume(String data) {
// Create a Flume Event object that encapsulates the sample data
Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
// Send the event
try {
client.append(event);
} catch (EventDeliveryException e) {
// clean up and recreate the client
client.close();
client = null;
client = RpcClientFactory.getDefaultInstance(hostname, port);
// Use the following method to create a thrift client (instead of the above line):
// this.client = RpcClientFactory.getThriftInstance(hostname, port);
}
}
If more than one thread calls this method and the exception is thrown, there will be a problem, because multiple threads will try to recreate the client in the exception handler.
Is the intent of the SDK that it should only be used by a single thread? Should this method be synchronized, as it appears to be in the log4jappender that is part of the Flume source? Should I put this code in its own worker and pass it events via a queue?
Does anyone have an example of RpcClient being used by more than one thread (including the error condition)?
Would I be better off using the "embedded agent"? Is that multithread friendly?

With the embedded agent you face the same situation, except that it is not clear what to do in the handler:
try {
agent.put(event);
} catch (EventDeliveryException e) {
// ???
}
You could stop the agent and restart it, but you would need a synchronized block (or a ReentrantReadWriteLock, so that threads are not blocked while they only read the client field). Since I'm not a Flume expert, I can't tell you which option is better.
Example:
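import java.nio.charset.Charset;
import java.util.Objects;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

class MyClass {
    private final ReentrantReadWriteLock lock;
    private final Lock readLock;
    private final Lock writeLock;
    private RpcClient client;
    private final String hostname;
    private final Integer port;

    // Constructor
    MyClass(String hostname, Integer port) {
        this.hostname = Objects.requireNonNull(hostname, "hostname");
        this.port = Objects.requireNonNull(port, "port");
        this.lock = new ReentrantReadWriteLock();
        this.readLock = this.lock.readLock();
        this.writeLock = this.lock.writeLock();
        this.client = buildClient();
    }

    private RpcClient buildClient() {
        return RpcClientFactory.getDefaultInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        // Send the event
        readLock.lock(); // shared lock for reading 'client'
        RpcClient failed = client;
        try {
            failed.append(event);
            return;
        } catch (EventDeliveryException e) {
            // fall through and recreate the client below
        } finally {
            readLock.unlock();
        }
        // A ReentrantReadWriteLock cannot upgrade a read lock to a write lock
        // (the thread would block forever), so the read lock is released first.
        writeLock.lock(); // exclusive lock for replacing 'client'
        try {
            // Recreate only if another thread has not already replaced the client
            if (client == failed) {
                client.close();
                client = buildClient();
            }
        } finally {
            writeLock.unlock();
        }
    }
}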
Besides, the example loses the event, because it is never resent. Some kind of loop with a maximum number of retries would probably do the trick:
int i = 0;
for (; i < maxRetry; ++i) {
try {
client.append(event);
break;
} catch (EventDeliveryException e) {
// clean up and recreate the client
client.close();
client = null;
client = RpcClientFactory.getDefaultInstance(hostname, port);
// Use the following method to create a thrift client (instead of the above line):
// this.client = RpcClientFactory.getThriftInstance(hostname, port);
}
}
if (i == maxRetry) {
logger.error("flume client is offline, losing events {}", event);
}
That's the idea, but I don't think this should be the task of the user (i.e. us). It should rather be an option in the client or the agent to store events that could not be processed because of such errors.
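As a rough illustration of the two ideas above (a synchronized send with a bounded retry loop, plus a buffer for events that could not be delivered), here is a minimal sketch that uses only the JDK. EventSink, RetryingSender and the in-memory buffer are hypothetical names introduced for this example; EventSink merely stands in for the two RpcClient calls used in the answer, and nothing here is part of the Flume SDK.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Supplier;

/** Hypothetical stand-in for a Flume RpcClient: only the two calls used here. */
interface EventSink {
    void append(String event) throws Exception;
    void close();
}

/** Sends events through one guarded sink, recreating it and retrying on failure. */
class RetryingSender {
    private final Supplier<EventSink> factory; // hypothetical: wraps client creation
    private final int maxRetry;
    private final Queue<String> undelivered = new ArrayDeque<>();
    private EventSink sink;

    RetryingSender(Supplier<EventSink> factory, int maxRetry) {
        this.factory = factory;
        this.maxRetry = maxRetry;
        this.sink = factory.get();
    }

    /** Returns true if delivered; otherwise the event is kept in the buffer. */
    synchronized boolean send(String event) {
        for (int attempt = 0; attempt < maxRetry; attempt++) {
            try {
                sink.append(event);
                return true;
            } catch (Exception e) {
                // Clean up and recreate the sink before the next attempt
                sink.close();
                sink = factory.get();
            }
        }
        // All attempts failed: keep the event instead of dropping it
        undelivered.add(event);
        return false;
    }

    synchronized int undeliveredCount() {
        return undelivered.size();
    }
}
```

Because send is synchronized, only one thread at a time can hit the failure path, so the sink is never recreated concurrently. The price is that all senders are serialized, which may or may not be acceptable for a given workload.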

Related

Cloud Dataflow - how does Dataflow do parallelism?

My question is: behind the scenes, how does Cloud Dataflow parallelize the workload for an element-wise Beam DoFn (ParDo)? For example, in my ParDo I send one HTTP request to an external server per element, and I use 30 workers with 4 vCPUs each.
1. Does that mean each worker runs at most 4 threads?
2. Does that mean each worker needs, or can establish, only 4 HTTP connections if I keep them alive for best performance?
3. How can I adjust the level of parallelism other than by using more cores or more workers?
4. With my current setup (30 workers with 4 vCPUs each), I can establish around 120 HTTP connections to the HTTP server, but both the server and the workers show very low resource usage. I want to make them work much harder by sending more requests per second. What should I do?
Code Snippet to illustrate my work:
public class NewCallServerDoFn extends DoFn<PreparedRequest,KV<PreparedRequest,String>> {
private static final Logger Logger = LoggerFactory.getLogger(ProcessReponseDoFn.class);
private static PoolingHttpClientConnectionManager _ConnManager = null;
private static CloseableHttpClient _HttpClient = null;
private static HttpRequestRetryHandler _RetryHandler = null;
private static String[] _MapServers = MapServerBatchBeamApplication.CONFIG.getString("mapserver.client.config.server_host").split(",");
@Setup
public void setupHttpClient(){
Logger.info("Setting up HttpClient");
//Question: the value of maxConnection below is actually 10, but with 30 worker machines, I can only see 115 TCP connections established on the server side. So this setting doesn't really take effect as I expected.....
int maxConnection = MapServerBatchBeamApplication.CONFIG.getInt("mapserver.client.config.max_connection");
int timeout = MapServerBatchBeamApplication.CONFIG.getInt("mapserver.client.config.timeout");
_ConnManager = new PoolingHttpClientConnectionManager();
for (String mapServer : _MapServers) {
HttpHost serverHost = new HttpHost(mapServer,80);
_ConnManager.setMaxPerRoute(new HttpRoute(serverHost),maxConnection);
}
// config timeout
RequestConfig requestConfig = RequestConfig.custom()
.setConnectTimeout(timeout)
.setConnectionRequestTimeout(timeout)
.setSocketTimeout(timeout).build();
// config retry
_RetryHandler = new HttpRequestRetryHandler() {
public boolean retryRequest(
IOException exception,
int executionCount,
HttpContext context) {
Logger.info(exception.toString());
Logger.info("try request: " + executionCount);
if (executionCount >= 5) {
// Do not retry if over max retry count
return false;
}
if (exception instanceof InterruptedIOException) {
// Timeout
return false;
}
if (exception instanceof UnknownHostException) {
// Unknown host
return false;
}
if (exception instanceof ConnectTimeoutException) {
// Connection refused
return false;
}
if (exception instanceof SSLException) {
// SSL handshake exception
return false;
}
return true;
}
};
_HttpClient = HttpClients.custom()
.setConnectionManager(_ConnManager)
.setDefaultRequestConfig(requestConfig)
.setRetryHandler(_RetryHandler)
.build();
Logger.info("Setting up HttpClient is done.");
}
@Teardown
public void tearDown(){
Logger.info("Tearing down HttpClient and Connection Manager.");
try {
_HttpClient.close();
_ConnManager.close();
}catch (Exception e){
Logger.warn(e.toString());
}
Logger.info("HttpClient and Connection Manager have been torn down.");
}
@ProcessElement
public void processElement(ProcessContext c) {
PreparedRequest request = c.element();
if(request == null)
return;
String response="{\"my_error\":\"failed to get response from map server with retries\"}";
String chosenServer = _MapServers[request.getHardwareId() % _MapServers.length];
String parameter;
try {
parameter = URLEncoder.encode(request.getRequest(),"UTF-8");
} catch (UnsupportedEncodingException e) {
Logger.error(e.toString());
return;
}
StringBuilder sb = new StringBuilder().append(MapServerBatchBeamApplication.CONFIG.getString("mapserver.client.config.api_path"))
.append("?coordinates=")
.append(parameter);
HttpGet getRequest = new HttpGet(sb.toString());
HttpHost host = new HttpHost(chosenServer,80,"http");
CloseableHttpResponse httpRes;
try {
httpRes = _HttpClient.execute(host,getRequest);
HttpEntity entity = httpRes.getEntity();
if(entity != null){
try
{
response = EntityUtils.toString(entity);
}finally{
EntityUtils.consume(entity);
httpRes.close();
}
}
}catch(Exception e){
Logger.warn("failed by get response from map server with retries for " + request.getRequest());
}
c.output(KV.of(request, response));
}
}
1. Yes, based on this answer.
2. No, you can establish more connections. Based on my answer, you can use an async HTTP client to have more concurrent requests. As this answer also describes, you need to collect the results of these asynchronous calls and output them synchronously in @ProcessElement or @FinishBundle.
3. See 2.
4. Since your resource usage is low, the workers apparently spend most of their time waiting for responses. With the approach described above, you can use your resources far better and achieve the same performance with far fewer workers.
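The pattern described above (start many requests without waiting, then collect the results and emit them synchronously) can be sketched with the JDK alone. Note that this sketch replaces the async HTTP client with a fixed thread pool and CompletableFuture, and that ConcurrentCaller, maxInFlight and the call function are hypothetical names standing in for the real HTTP request. It also assumes that submit and drain are invoked from a single thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Issues many calls concurrently, then gathers the results on the calling thread. */
class ConcurrentCaller {
    private final ExecutorService pool;
    private final Function<String, String> call; // hypothetical stand-in for the HTTP request
    private final List<CompletableFuture<String>> pending = new ArrayList<>();

    ConcurrentCaller(int maxInFlight, Function<String, String> call) {
        this.pool = Executors.newFixedThreadPool(maxInFlight);
        this.call = call;
    }

    /** Analogous to processing one element: start the call, do not wait for it. */
    void submit(String request) {
        pending.add(CompletableFuture.supplyAsync(() -> call.apply(request), pool));
    }

    /**
     * Analogous to finishing a bundle: wait for every call and return the
     * results in submission order. join() rethrows a failed call as a
     * CompletionException.
     */
    List<String> drain() {
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            results.add(future.join());
        }
        pending.clear();
        return results;
    }

    void shutdown() {
        pool.shutdown();
    }
}
```

With this shape, the number of requests in flight is bounded by maxInFlight rather than by the number of vCPUs, which is the point of the answer: the worker threads no longer sit idle while waiting for individual responses.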

Get Publish Response/PubAck latency with paho org.eclipse.paho.client.mqttv3.MqttClient publish

I am using the Paho class org.eclipse.paho.client.mqttv3.MqttClient for MQTT connections (not MqttAsyncClient).
In my case I publish with:
mqttClient.publish(uid + "/p", new MqttMessage(payload.toString().getBytes()));
This method does the job, but it returns nothing, so I can't measure the latency between the publish and the PubAck.
To get the latency, I use the following instead of calling mqttClient.publish directly:
public long publish(JsonObject payload , String uid, int qos) {
try {
MqttTopic topic = mqttClient.getTopic(uid + "/p");
MqttMessage message = new MqttMessage(payload.toString().getBytes());
message.setQos(qos);
message.setRetained(true);
long publishTime = System.currentTimeMillis();
MqttDeliveryToken token = topic.publish(message);
token.waitForCompletion(10000);
long pubCompleted = System.currentTimeMillis();
if (token.getResponse() != null && token.getResponse() instanceof MqttPubAck) {
return pubCompleted-publishTime;
}
return -1;
} catch (Exception e) {
e.printStackTrace();
return -1;
}
}
This gets the work done, but I am not sure whether it is the right approach. Please let me know if there is another way to do this.
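The timing part of the approach above can be isolated into a small helper. The following is only a sketch: LatencyTimer and timeMillis are hypothetical names, the blocking operation stands in for "publish and wait for the acknowledgement", and System.nanoTime is swapped in for System.currentTimeMillis because it is intended for measuring elapsed time and does not depend on the wall clock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;

class LatencyTimer {
    /**
     * Runs a blocking operation and returns its duration in milliseconds,
     * or -1 if the operation throws or reports that it was not acknowledged.
     */
    static long timeMillis(Callable<Boolean> blockingOperation) {
        long start = System.nanoTime(); // elapsed-time clock, not wall-clock time
        try {
            boolean acknowledged = blockingOperation.call();
            long elapsed = System.nanoTime() - start;
            return acknowledged ? TimeUnit.NANOSECONDS.toMillis(elapsed) : -1;
        } catch (Exception e) {
            return -1;
        }
    }
}
```

In the publish method above, the callable would wrap the topic.publish and token.waitForCompletion calls and return whether the expected response arrived. The measured value includes client-side queuing and thread scheduling, so it is an upper bound on the network round trip rather than an exact figure.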

tfs addWorkItemSaveListener is not getting events

I am adding a TFS WorkItemSaveListener, but I do not receive any event when a work item is saved.
public static void main(String[] args) {
// Connecting to Project
final TFSTeamProjectCollection collection = ConsoleSettings.connectToTFS();
// Creating an object of listener
WorkItemSaveListenerImpl listener = new WorkItemSaveListenerImpl();
//Adding the listener
collection.getWorkItemClient().getEventEngine().addWorkItemSaveListener(listener);
for(;;) {
// keeping the program alive
try {
Thread.sleep(10000);
}
catch (InterruptedException exception) {
// TODO Auto-generated catch block
exception.printStackTrace();
}
}
}
I'm only guessing here, as I don't know the Java SDK. But is it possible that the listener added with addWorkItemSaveListener is only triggered for work items changed by that particular work item client?
You may need to set up a SOAP subscription, or write a server plugin instead.
Here is some C# that sets up a SOAP subscription.
Sorry, it's for the wrong event, but it may be enough to give you an idea.
TfsTeamProjectCollection tpc = TfsTeamProjectCollectionFactory.GetTeamProjectCollection(new Uri(txtServerUrl.Text));
tpc.EnsureAuthenticated();
IEventService eventSrv = tpc.GetService(typeof(IEventService)) as IEventService;
DeliveryPreference delPref = new DeliveryPreference();
delPref.Address = "http://" + System.Environment.MachineName + ":8001/CheckInNotify";
delPref.Schedule = DeliverySchedule.Immediate;
delPref.Type = DeliveryType.Soap;
subscriptionId = eventSrv.SubscribeEvent(System.Environment.UserDomainName + "\\" + System.Environment.UserName, "CheckInNotify", "", delPref);

Deferring persistence as device is being used in BlackBerry when listening file change

I tried to listen for file change events on BlackBerry, based on the FileExplorer example, but whenever I added or deleted a file, the log always showed "Deferring persistence as device is being used" and my listener caught nothing. Here is my code:
public class FileChangeListenner implements FileSystemJournalListener{
private long _lastUSN; // = 0;
public void fileJournalChanged() {
long nextUSN = FileSystemJournal.getNextUSN();
String msg = null;
for (long lookUSN = nextUSN - 1; lookUSN >= _lastUSN && msg == null; --lookUSN)
{
FileSystemJournalEntry entry = FileSystemJournal.getEntry(lookUSN);
// We didn't find an entry
if (entry == null)
{
break;
}
// Check if this entry was added or deleted
String path = entry.getPath();
if (path != null)
{
switch (entry.getEvent())
{
case FileSystemJournalEntry.FILE_ADDED:
msg = "File was added.";
break;
case FileSystemJournalEntry.FILE_DELETED:
msg = "File was deleted.";
break;
}
}
}
_lastUSN = nextUSN;
if ( msg != null )
{
System.out.println(msg);
}
}
}
Here is the caller:
Thread t = new Thread(new Runnable() {
public void run() {
new FileChangeListenner();
try {
Thread.sleep(5000);
createFile();
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
t.start();
Create file method worked fine:
private void createFile() {
try {
FileConnection fc = (FileConnection) Connector
.open("file:///SDCard/newfile.txt");
// If no exception is thrown, then the URI is valid, but the file
// may or may not exist.
if (!fc.exists()) {
fc.create(); // create the file if it doesn't exist
}
OutputStream outStream = fc.openOutputStream();
outStream.write("test content".getBytes());
outStream.close();
fc.close();
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
and output:
0:00:44.475: Deferring persistence as device is being used.
0:00:46.475: AG,+CPT
0:00:46.477: AG,-CPT
0:00:54.476: VM:+GC(f)w=11
0:00:54.551: VM:-GCt=9,b=1,r=0,g=f,w=11,m=0
0:00:54.553: VM:QUOT t=1
0:00:54.554: VM:+CR
0:00:54.596: VM:-CR t=5
0:00:55.476: AM: Exit net_rim_bb_datatags(291)
0:00:55.478: Process net_rim_bb_datatags(291) cleanup started
0:00:55.479: VM:EVTOv=7680,w=20
0:00:55.480: Process net_rim_bb_datatags(291) cleanup done
0:00:55.481: 06/25 03:40:41.165 BBM FutureTask Execute: net.rim.device.apps.internal.qm.bbm.platform.BBMPlatformManagerImpl$3@d1e1ec79
0:00:55.487: 06/25 03:40:41.171 BBM FutureTask Finish : net.rim.device.apps.internal.qm.bbm.platform.BBMPlatformManagerImpl$3@d1e1ec79
I also tried removing the thread, and creating or deleting the file directly in the simulator's SD card directory, but that doesn't help. Please tell me where my problem is. Thanks.
You instantiate the FileChangeListenner, but you never register it, and also don't keep it as a variable anywhere. You probably need to add this call
FileChangeListenner listener = new FileChangeListenner();
UiApplication.getUiApplication().addFileSystemJournalListener(listener);
You also might need to keep a reference (listener) around for as long as you want to receive events. But maybe not (the addFileSystemJournalListener() call might do that). But, you at least need that call to addFileSystemJournalListener(), or you'll never get fileJournalChanged() called back.

calling a webservice from scheduled task agent class in windows phone 7.1

First, can a web service be called from a scheduled periodic task agent class at all? If so: I am trying to call a web service method with parameters from the scheduled periodic task agent class in Windows Phone 7.1. I get a NullReferenceException when calling the method, even though I pass the expected values for the web method's parameters.
I retrieve the id from isolated storage.
The following is my code.
protected override void OnInvoke(ScheduledTask task)
{
if (task is PeriodicTask)
{
string Name = IName;
string Desc = IDesc;
updateinfo(Name, Desc);
}
}
public void updateinfo(string name, string desc)
{
AppSettings tmpSettings = Tr.AppSettings.Load();
id = tmpSettings.myString;
if (name == "" && desc == "")
{
name = "No Data";
desc = "No Data";
}
tservice.UpdateLogAsync(id, name,desc);
tservice.UpdateLogCompleted += new EventHandler<STservice.UpdateLogCompletedEventArgs>(t_UpdateLogCompleted);
}
Someone please help me resolve the above issue.
I've done this before without a problem. The one thing you need to make sure of is that you wait until your async read processes have completed before you call NotifyComplete();.
Here's an example from one of my apps. I had to remove much of the logic, but it should show you how the flow goes. This uses a slightly modified version of WebClient where I added a Timeout, but the principles are the same with the service that you're calling... Don't call NotifyComplete() until the end of t_UpdateLogCompleted
Here's the example code:
private void UpdateTiles(ShellTile appTile)
{
try
{
var wc = new WebClientWithTimeout(new Uri("URI Removed")) { Timeout = TimeSpan.FromSeconds(30) };
wc.DownloadAsyncCompleted += (src, e) =>
{
try
{
//process response
}
catch (Exception ex)
{
// Handle exception
}
finally
{
FinishUp();
}
};
wc.StartReadRequestAsync();
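}
catch (Exception)
{
// The request could not even be started, so the completion handler will not run
FinishUp();
}
}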
private void FinishUp()
{
#if DEBUG
try
{
ScheduledActionService.LaunchForTest(_taskName, TimeSpan.FromSeconds(30));
System.Diagnostics.Debug.WriteLine("relaunching in 30 seconds");
}
catch (Exception ex)
{
System.Diagnostics.Debug.WriteLine(ex.ToString());
}
#endif
NotifyComplete();
}

Resources