I want to write a Kafka consumer that writes each record to BigQuery, and I want to commit the offsets manually only after a successful insert into BigQuery. I have written some sample code, but it is not working. Can someone help?
ReceiverOptions<Integer, String> options = receiverOptions.subscription(Collections.singleton(topic));
Flux<ReceiverRecord<Integer, String>> kafkaFlux1 = KafkaReceiver.create(options).receive()
.doOnNext(r -> {
try {
writeBigquery(r);
} catch (IOException e) {
e.printStackTrace();
}
r.receiverOffset().commit().block();
});
return kafkaFlux1.subscribe(record -> {
System.out.println("hello"+record);
});
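One common way to restructure this with reactor-kafka is sketched below, assuming options and writeBigquery are the ones defined above: chain the commit into the reactive pipeline instead of calling block() inside doOnNext, and skip the commit when the write fails. Treat it as a sketch, not a verified fix:
Flux<ReceiverRecord<Integer, String>> kafkaFlux = KafkaReceiver.create(options).receive();
Disposable disposable = kafkaFlux
        .concatMap(r -> Mono.fromCallable(() -> {
                    writeBigquery(r);        // may throw IOException
                    return r;
                })
                .flatMap(rec -> rec.receiverOffset().commit().thenReturn(rec))
                .onErrorResume(e -> {
                    e.printStackTrace();     // write failed: do not commit this offset
                    return Mono.empty();
                }))
        .subscribe(record -> System.out.println("processed " + record));
Here concatMap processes records one at a time, so commits happen in order and only after the corresponding BigQuery write has succeeded.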
I have a JSON response from the Elasticsearch REST client. I would like to create an Elasticsearch SearchResponse or GetResponse object from that string (JSON), so that I can reuse the un-marshaling part from the grails-2.4.3 elasticsearch plugin. Can someone help me with that?
I'm not sure if this question is still relevant, but this one worked for me:
String responseJson = "{\"took\":5,\"timed_out\":false,\"_shards\".....}";
try {
JsonXContentParser xContentParser = new JsonXContentParser(NamedXContentRegistry.EMPTY,
new JsonFactory().createParser(responseJson));
SearchResponse response = SearchResponse.fromXContent(xContentParser);
// ... do whatever with the response ...
} catch (IOException e) {
handleException....
}
I did manage to find something that may help you.
I wrote the response out as JSON like this:
XContentBuilder builder = XContentFactory.jsonBuilder();
response.toXContent(builder, ToXContent.EMPTY_PARAMS);
String result = Strings.toString(builder);
and then I managed to read it back like this:
try {
NamedXContentRegistry registry = new NamedXContentRegistry(getDefaultNamedXContents());
XContentParser parser = JsonXContent.jsonXContent.createParser(registry, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, result);
SearchResponse searchResponse = SearchResponse.fromXContent(parser);
} catch (Exception e) {
System.out.println("exception " + e);
}
public static List<NamedXContentRegistry.Entry> getDefaultNamedXContents() {
Map<String, ContextParser<Object, ? extends Aggregation>> map = new HashMap<>();
map.put(TopHitsAggregationBuilder.NAME, (p, c) -> ParsedTopHits.fromXContent(p, (String) c));
map.put(StringTerms.NAME, (p, c) -> ParsedStringTerms.fromXContent(p, (String) c));
List<NamedXContentRegistry.Entry> entries = map.entrySet().stream()
.map(entry -> new NamedXContentRegistry.Entry(Aggregation.class, new ParseField(entry.getKey()), entry.getValue()))
.collect(Collectors.toList());
return entries;
}
Hope it works :)
I have the following code for training the OpenNLP POS tagger:
Trainer(String trainingData, String modelSavePath, String dictionary){
try {
dataIn = new MarkableFileInputStreamFactory(
new File(trainingData));
lineStream = new PlainTextByLineStream(dataIn, "UTF-8");
ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);
POSTaggerFactory fac = new POSTaggerFactory();
if (dictionary != null && dictionary.length() > 0)
{
fac.setDictionary(new Dictionary(new FileInputStream(dictionary)));
}
model = POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
} catch (IOException e) {
// Failed to read or parse training data, training failed
e.printStackTrace();
} finally {
if (lineStream != null) {
try {
lineStream.close();
} catch (IOException e) {
// Not an issue, training already finished.
// The exception should be logged and investigated
// if part of a production system.
e.printStackTrace();
}
}
}
}
and this works just fine. Now, is it possible to do the same without involving files? I want to store the training data in a database somewhere, then read it as a stream or in chunks and feed it to the trainer. I do not want to create a temp file. Is this possible?
Yes: instead of passing a FileInputStream to the dictionary, you can create your own implementation of InputStream, say a DatabaseSourceInputStream, and use it instead. A sketch follows below.
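For illustration, here is a minimal sketch of that idea with the OpenNLP API, assuming the training text and dictionary XML have already been loaded from the database into Strings (the DbTrainer class and method names are placeholders, not part of the question's code):
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import opennlp.tools.dictionary.Dictionary;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSSample;
import opennlp.tools.postag.POSTaggerFactory;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.postag.WordTagSampleStream;
import opennlp.tools.util.InputStreamFactory;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.TrainingParameters;

public class DbTrainer {

    // trainingText and dictionaryText are assumed to have been loaded
    // from the database already (e.g. via your DAO); no temp file needed.
    static POSModel train(String trainingText, String dictionaryText) throws IOException {
        InputStreamFactory dataIn = () ->
                new ByteArrayInputStream(trainingText.getBytes(StandardCharsets.UTF_8));
        ObjectStream<String> lineStream = new PlainTextByLineStream(dataIn, StandardCharsets.UTF_8);
        ObjectStream<POSSample> sampleStream = new WordTagSampleStream(lineStream);

        POSTaggerFactory fac = new POSTaggerFactory();
        if (dictionaryText != null && !dictionaryText.isEmpty()) {
            fac.setDictionary(new Dictionary(
                    new ByteArrayInputStream(dictionaryText.getBytes(StandardCharsets.UTF_8))));
        }
        return POSTaggerME.train("en", sampleStream, TrainingParameters.defaultParams(), fac);
    }
}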
I got the page source using
String pageSource = driver.getPageSource();
Now I need to save this XML file locally as a cache, so that I can read element attributes such as the x and y values from it instead of calling element.getAttribute("x") every time. But I am not able to parse the pageSource XML file because of some special characters. I cannot simply strip those characters, because if I later need an element's value/text it would then differ from the original; Appium itself works with this same page source.
I was also facing the same issue and resolved it with the code below, which works fine for me:
public static void removeEscapeCharacter(File xmlFile) {
String pattern = "(\\\"([^=])*\\\")";
String contentBuilder = null;
try {
contentBuilder = Files.toString(xmlFile, Charsets.UTF_8);
} catch (IOException e1) {
e1.printStackTrace();
}
if (contentBuilder == null)
return;
Pattern pattern2 = Pattern.compile(pattern);
Matcher matcher = pattern2.matcher(contentBuilder);
StrBuilder sb = new StrBuilder(contentBuilder);
while (matcher.find()) {
String str = matcher.group(1).substring(1, matcher.group(1).length() - 1);
try {
sb = sb.replaceFirst(StrMatcher.stringMatcher(str),
StringEscapeUtils.escapeXml(str));
} catch (Exception e) {
e.printStackTrace();
}
}
try {
Writer output = null;
output = new BufferedWriter(new FileWriter(xmlFile, false));
output.write(sb.toString());
output.close();
} catch (IOException e) {
e.printStackTrace();
}
}
If you get that kind of parsing problem, catch the exception, remove the special characters, and parse again:
try {
doc = db.parse(fileContent);
} catch (Exception e) {
removeEscapeCharacter(file);
doc = db.parse(file);
}
It might work for you.
I was also able to do the same using a SAXParser with a custom handler; a sketch follows below.
Refer to SAX Parser
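For illustration, a minimal sketch of the SAXParser approach (the PageSourceHandler class, the pageSource.xml file name, and the println output are illustrative placeholders, not the poster's actual code):
import java.io.File;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class PageSourceHandler extends DefaultHandler {

    // Collect the x/y attributes of every element in a single parsing pass,
    // instead of calling element.getAttribute("x") per lookup.
    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes) {
        String x = attributes.getValue("x");
        String y = attributes.getValue("y");
        if (x != null && y != null) {
            System.out.println(qName + " -> x=" + x + ", y=" + y);
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("pageSource.xml"), new PageSourceHandler());
    }
}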
We decided to use the MQTT protocol for the chat module in our mobile application. I want to save each topic's messages on the server side as well. Since the MQTT client here is global, one approach is to subscribe a single MQTT client instance to all topics and save the messages in a database. But is that the right approach? I am just worried about it.
private void buildClient(){
log.debug("Connecting... "+CLIENT_ID);
try {
mqttClient = new MqttClient(envConfiguration.getBrokerUrl(), CLIENT_ID);
} catch (MqttException e) {
log.debug("build client stopped due to "+e.getCause());
}
chatCallback = new ChatCallback();
mqttClient.setCallback(chatCallback);
mqttConnectOptions = new MqttConnectOptions();
mqttConnectOptions.setCleanSession(false);
}
@Override
public void connect() {
if(mqttClient == null || !mqttClient.getClientId().equals(CLIENT_ID)){
buildClient();
}
boolean tryConnecting = true;
while(tryConnecting){
try {
mqttClient.connect(mqttConnectOptions);
} catch (Exception e) {
log.debug("connection attempt failed "+ e.getCause() + " trying...");
}
if(mqttClient.isConnected()){
tryConnecting = false;
}else{
pause();
}
}
}
@Override
public void publish() {
boolean publishCallCompletedErrorFree = false;
while (!publishCallCompletedErrorFree) {
try {
mqttClient.publish(TOPIC, "hello".getBytes(), 1, true);
publishCallCompletedErrorFree = true;
} catch (Exception e) {
log.debug("error occured while publishing "+e.getCause());
}finally{
pause();
}
}
}
@Override
public void subscribe() {
if(mqttClient != null && mqttClient.isConnected()){
try {
mqttClient.subscribe(TOPIC, 2);
} catch (MqttException e) {
log.debug("subscribing error.."+e.getCause());
}
}
}
@Override
public void disconnect() {
System.out.println(this.mqttClient.isConnected());
try {
mqttClient.disconnect();
log.debug("disconnected..");
} catch (MqttException e) {
log.debug("erro occured while disconneting.."+e.getCause());
}
}
There are two possibilities for solving this:
Write a MQTT client that subscribes to all topics using a wildcard (# in MQTT)
Write a broker plugin that does the job for you, depending on the broker implementation you're using
There is a good description of how to implement both options on the HiveMQ website, which also describes the limitations of the first option. A sketch of the first option follows below.
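For illustration, a minimal sketch of the first option with the Paho client used above; the "chat-archiver" client ID, the "chat/#" topic filter, and saveMessage(...) are placeholders for your own setup:
try {
    // One dedicated server-side client subscribes to every chat topic
    // via the multi-level wildcard and persists each message.
    MqttClient auditClient = new MqttClient(envConfiguration.getBrokerUrl(), "chat-archiver");
    auditClient.setCallback(new MqttCallback() {
        @Override
        public void connectionLost(Throwable cause) {
            log.debug("archiver connection lost: " + cause);
        }

        @Override
        public void messageArrived(String topic, MqttMessage message) {
            saveMessage(topic, new String(message.getPayload())); // write to DB
        }

        @Override
        public void deliveryComplete(IMqttDeliveryToken token) {
            // this client only consumes, nothing to do here
        }
    });
    auditClient.connect();
    auditClient.subscribe("chat/#", 1);
} catch (MqttException e) {
    log.debug("archiver setup failed: " + e.getCause());
}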
I'm trying to understand the correct way to use the Flume RpcClient in a multithreaded application. Information I have found so far indicates that the components are thread safe, but the example in the Flume documentation clouds the issue when it comes to error handling. This code:
public void sendDataToFlume(String data) {
// Create a Flume Event object that encapsulates the sample data
Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
// Send the event
try {
client.append(event);
} catch (EventDeliveryException e) {
// clean up and recreate the client
client.close();
client = null;
client = RpcClientFactory.getDefaultInstance(hostname, port);
// Use the following method to create a thrift client (instead of the above line):
// this.client = RpcClientFactory.getThriftInstance(hostname, port);
}
}
If more than one thread calls this method and the exception is thrown, there will be a problem as multiple threads try to recreate the client in the exception handler.
Is the intent of the SDK that it should only be used by a single thread? Should this method be synchronized, as it appears to be in the log4jappender that is part of the Flume source? Should I put this code in its own worker and pass it events via a queue?
Does anyone have an example of RpcClient being used by more than one thread (including the error condition)?
Would I be better off using the "embedded agent"? Is that multithread friendly?
With the embedded agent, you get the same case except you don't know what to do:
try {
agent.put(event);
} catch (EventDeliveryException e) {
// ???
}
You could stop the agent and restart it - but you would need a synchronized block (or a ReentrantReadWriteLock, to avoid blocking threads while "reading" the client field). But since I'm not a Flume expert, I can't tell you which one is better.
Example:
class MyClass {
private final ReentrantReadWriteLock lock;
private final Lock readLock;
private final Lock writeLock;
private RpcClient client;
private final String hostname;
private final Integer port;
// Constructor
MyClass(String hostname, Integer port) {
this.hostname = Objects.requireNonNull(hostname, "hostname");
this.port = Objects.requireNonNull(port, "port");
this.lock = new ReentrantReadWriteLock();
this.readLock = this.lock.readLock();
this.writeLock = this.lock.writeLock();
this.client = buildClient();
}
private RpcClient buildClient() {
return RpcClientFactory.getDefaultInstance(hostname, port);
}
public void sendDataToFlume(String data) {
// Create a Flume Event object that encapsulates the sample data
Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
// Send the event
readLock.lock(); // lock for reading 'client'
try {
try {
client.append(event);
} catch (EventDeliveryException e) {
writeLock.lock(); // lock for reading/writing client
try {
// clean up and recreate the client
client.close();
client = null;
client = buildClient();
} finally {
writeLock.unlock();
}
}
} finally {
readLock.unlock();
}
}
}
Besides, the example will lose the event because it is not re-sent. Some kind of loop plus a max retry count would probably do the trick:
int i = 0;
for (; i < maxRetry; ++i) {
try {
client.append(event);
break;
} catch (EventDeliveryException e) {
// clean up and recreate the client
client.close();
client = null;
client = RpcClientFactory.getDefaultInstance(hostname, port);
// Use the following method to create a thrift client (instead of the above line):
// this.client = RpcClientFactory.getThriftInstance(hostname, port);
}
}
if (i == maxRetry) {
logger.error("flume client is offline, loosing events {}", event);
}
That's the idea, but I don't think this should be the user's job (i.e., ours); rather, the client or the agent should offer an option to store events that could not be processed due to such errors.