I am creating a batch data streamer in Apache Ignite and need to control what happens after the data is received.
My batch has the following structure:
public class Batch implements Binarylizable, Serializable {
private String eventKey;
private byte[] bytes;
// etc.
}
Then I try to stream my data:
try (IgniteDataStreamer<Integer, Batch> streamer = serviceGrid.getIgnite().dataStreamer(cacheName);
StreamBatcher batcher = StreamBatcherFactory.create(event) ){
streamer.receiver(StreamTransformer.from(new BatchDataProcessor(event)));
streamer.autoFlushFrequency(1000);
streamer.allowOverwrite(true);
statusService.updateStatus(event.getKey(), StatusType.EXECUTING);
int counter = 0;
Batch batch = null;
IgniteFuture<?> future = null;
while ((batch = batcher.batch()) != null) {
future = streamer.addData(counter++, batch);
}
Object getted = future.get();
}
Just for testing purposes, let's take only the last future and try to analyze the resulting object. In the code above I'm using a BatchDataProcessor, which looks like this:
public class BatchDataProcessor implements CacheEntryProcessor<Integer, Batch, Object> {
private final Event event;
private final String eventKey;
public BatchDataProcessor(Event event) {
this.event = event;
this.eventKey = event.getKey();
}
@Override
public Object process(MutableEntry<Integer, Batch> mutableEntry, Object... objects) throws EntryProcessorException {
Node node = NodeIgniter.node(Ignition.localIgnite().cluster().localNode().id());
ServiceGridContainer container = (ServiceGridContainer) node.getEnvironmentContainer().getContainerObject(ServiceGridContainer.class);
ProcessMarshaller marshaller = (ProcessMarshaller) container.getService(ProcessMarshaller.class);
LocalProcess localProcess = marshaller.intoProccessing(event.getLambdaExecutionKey());
try {
localProcess.addBatch(mutableEntry);
} catch (IOException e) {
e.printStackTrace();
} finally {
return "111";
}
}
}
So after localProcess.addBatch(mutableEntry) I want to send back information about the status of this particular batch. I think the IgniteFuture object is the place to do this, but I can't find any information on how to work with the future returned by the addData function.
Can anybody help me understand how to work with the future returned by addData, or suggest another way to implement a callback for a streamed batch?
When you do StreamTransformer.from(), you forfeit the result of your BatchDataProcessor, because internally it does:
for (Map.Entry<K, V> entry : entries)
cache.invoke(entry.getKey(), this, entry.getValue());
// ^ result of cache.invoke() is discarded here
DataStreamer is for one-directional streaming of data. It is not supposed to return values as far as I know.
If you depend on the result of cache.invoke(), I recommend calling it directly instead of relying on DataStreamer.
BTW, be careful with fut.get(). You should do dataStreamer.flush() first, or DataStreamer's futures will wait indefinitely.
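To illustrate both points, here is a minimal sketch reusing the names from the question (an untested assumption, not code from the original post):

// Option 1: keep the streamer, but flush before blocking on the future
while ((batch = batcher.batch()) != null) {
    future = streamer.addData(counter++, batch);
}
streamer.flush();   // push buffered entries now, otherwise future.get() can block indefinitely
future.get();       // completes, but carries no result from BatchDataProcessor

// Option 2: if the processor's return value matters, bypass the streamer entirely
IgniteCache<Integer, Batch> cache = serviceGrid.getIgnite().cache(cacheName);
while ((batch = batcher.batch()) != null) {
    cache.put(counter, batch);                                               // the streamer no longer writes the value for you
    Object status = cache.invoke(counter++, new BatchDataProcessor(event));  // "111" in the processor above
}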
RxJava2 has a doAfterNext operator that emits items downstream and then invokes the consumer. Project Reactor doesn't seem to have such an operator, so I'd like some pointers on the best way to create my own to achieve the same thing.
The use case is freeing memory after the subscriber has received the item.
I'm not sure whether leveraging doOnEach is a valid solution:
public class ByteBufferSafeReleaseConsumer implements Consumer<Signal<ByteBuffer<?>>> {
private final List<ByteBuffer<?>> elements = new ArrayList<>();
@Override
public void accept(Signal<ByteBuffer<?>> signal) {
if (signal.isOnNext()) {
ByteBuffer<?> next = signal.get();
if (next != null) {
elements.add(next);
}
}
if (signal.isOnComplete() || signal.isOnError()) {
for (ByteBuffer<?> buffer : elements) {
ByteBufferUtils.safeRelease(buffer);
}
}
}
}
ByteBufferSafeReleaseConsumer consumer = new ByteBufferSafeReleaseConsumer();
Flux.from(byteBufferPublisher).doOnEach(consumer);
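If collecting signals in a dedicated consumer feels indirect, a roughly equivalent sketch of my own (reusing the ByteBuffer and ByteBufferUtils types from the question, which I'm assuming exist as shown) keeps the bookkeeping inline with doOnNext plus doFinally:

List<ByteBuffer<?>> retained = new ArrayList<>();
Flux.from(byteBufferPublisher)
    .doOnNext(retained::add)                                                  // remember each emitted buffer
    .doFinally(signalType -> retained.forEach(ByteBufferUtils::safeRelease))  // release on complete, error, or cancel
    .subscribe(buffer -> { /* downstream processing */ });

One difference worth noting: doFinally also fires on cancellation, which the Signal-based doOnEach consumer above never sees.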
I have an RSocket endpoint that responds with a Flux:
@MessageMapping("responses")
Flux<?> deal(@Payload String message) {
return myService.generateResponses(message);
}
The responses can be any of three different types of objects, produced asynchronously using the following code (if it worked):
public Flux<?> generateResponses(String request) {
// Setup response sinks
final FluxProcessor publish = EmitterProcessor.create().serialize();
final FluxSink<Response1> sink1 = publish.sink();
final FluxSink<Response2> sink2 = publish.sink();
final FluxSink<Response3> sink3 = publish.sink();
// Get async responses: starts new thread to gather responses and update sinks
new MyResponses(request, sink1, sink2, sink3);
// Return the Flux
Flux<?> output = Flux
.from(publish
.log());
return output;
}
The problem is that when I populate the sinks with different objects, only the first sink actually publishes back to the subscriber.
public class MyResponses extends CacheListenerAdapter {
private FluxSink<Response1> sink1;
private FluxSink<Response2> sink2;
private FluxSink<Response3> sink3;
// Constructor is omitted for brevity
@Override
public void afterCreate(EntryEvent event) {
if (event.getNewValue() instanceof Response1) {
Response1 r1 = (Response1)event.getNewValue();
sink1.next(r1);
}
if (event.getNewValue() instanceof Response2) {
Response2 r2 = (Response2)event.getNewValue();
sink2.next(r2);
}
if (event.getNewValue() instanceof Response3) {
Response3 r3 = (Response3)event.getNewValue();
sink3.next(r3);
}
}
}
If I make the sinks of type <?>, then there's an error on .next:
The method next(capture#2-of ?) in the type FluxSink<capture#2-of ?> is not applicable for the arguments (Response1)
Is there a better approach to this requirement?
The reason this did not work with different objects had to do with Spring Boot Data Geode serialization of the underlying object types. The way to get the object Flux to work was to use a single sink of type <Object>:
public Flux<Object> generateResponses(String request) {
// Setup the Flux
EmitterProcessor<Object> emitter = EmitterProcessor.create();
FluxSink<Object> sink = emitter.sink(FluxSink.OverflowStrategy.LATEST);
// Get async responses: starts new thread to gather responses and update sinks
new MyResponses(request, sink);
// Setup an output Flux to publish the input Flux
Flux<Object> out = Flux
.from(emitter
.log(log.getName()));
return out;
}
The event handler then uses the single sink:
public class MyResponses extends CacheListenerAdapter {
private FluxSink<Object> sink;
// Constructor is omitted for brevity
@Override
public void afterCreate(EntryEvent event) {
if (event.getNewValue() instanceof Response1) {
Response1 r1 = (Response1)event.getNewValue();
sink.next(r1);
}
if (event.getNewValue() instanceof Response2) {
Response2 r2 = (Response2)event.getNewValue();
sink.next(r2);
}
if (event.getNewValue() instanceof Response3) {
Response3 r3 = (Response3)event.getNewValue();
sink.next(r3);
}
}
}
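As a side note (my addition, not part of the original answer): EmitterProcessor is deprecated in more recent Reactor releases, so on Reactor 3.4+ the same single-sink approach would look roughly like this with the Sinks API:

Sinks.Many<Object> sink = Sinks.many().multicast().onBackpressureBuffer();
// the listener would call sink.tryEmitNext(r1) / tryEmitNext(r2) / tryEmitNext(r3) instead of sink.next(...)
Flux<Object> out = sink.asFlux().log(log.getName());
return out;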
I'm trying to use Dataflow to delete many millions of Datastore entities and the pace is extremely slow (5 entities/s). I am hoping you can explain to me the pattern I should follow to allow that to scale up to a reasonable pace. Just adding more workers did not help.
The Datastore Admin console has the ability to delete all entities of a specific kind but it fails a lot and takes me a week or more to delete 40 million entities. Dataflow ought to be able to help me delete millions of entities that match only certain query parameters as well.
I'm guessing that some type of batching strategy should be employed (where I create a mutation with 1000 deletes in it, for example), but it's not obvious to me how I would go about that. DatastoreIO gives me just one entity at a time to work with. Pointers would be greatly appreciated.
Below is my current slow solution.
Pipeline p = Pipeline.create(options);
DatastoreIO.Source source = DatastoreIO.source()
.withDataset(options.getDataset())
.withQuery(getInstrumentQuery(options))
.withNamespace(options.getNamespace());
p.apply("ReadLeafDataFromDatastore", Read.from(source))
.apply("DeleteRecords", ParDo.of(new DeleteInstrument(options.getDataset())));
p.run();
static class DeleteInstrument extends DoFn<Entity, Integer> {
String dataset;
int count = 0;
DeleteInstrument(String dataset) {
this.dataset = dataset;
}
@Override
public void processElement(ProcessContext c) {
DatastoreV1.Mutation.Builder mutation = DatastoreV1.Mutation.newBuilder();
mutation.addDelete(c.element().getKey());
final DatastoreV1.CommitRequest.Builder request = DatastoreV1.CommitRequest.newBuilder();
request.setMutation(mutation);
request.setMode(DatastoreV1.CommitRequest.Mode.NON_TRANSACTIONAL);
try {
DatastoreOptions.Builder dbo = new DatastoreOptions.Builder();
dbo.dataset(dataset);
dbo.credential(getCredential());
Datastore db = DatastoreFactory.get().create(dbo.build());
db.commit(request.build());
c.output(1);
count++;
if(count%100 == 0) {
LOG.info(count+"");
}
} catch (Exception e) {
c.output(0);
e.printStackTrace();
}
}
}
There is no direct way of deleting entities using the current version of DatastoreIO. This version of DatastoreIO is going to be deprecated in favor of a new version (v1beta3) in the next Dataflow release. We think there is a good use case for providing a delete utility (either through an example or a PTransform), but it is still a work in progress.
For now you can batch your deletes, instead of deleting one at a time:
public static class DeleteEntityFn extends DoFn<Entity, Void> {
// Datastore max batch limit
private static final int DATASTORE_BATCH_UPDATE_LIMIT = 500;
private Datastore db;
private List<Key> keyList = new ArrayList<>();
@Override
public void startBundle(Context c) throws Exception {
// Initialize Datastore Client
// db = ...
}
@Override
public void processElement(ProcessContext c) throws Exception {
keyList.add(c.element().getKey());
if (keyList.size() >= DATASTORE_BATCH_UPDATE_LIMIT) {
flush();
}
}
@Override
public void finishBundle(Context c) throws Exception {
if (keyList.size() > 0) {
flush();
}
}
private void flush() throws Exception {
// Make one delete request instead of one for each element.
CommitRequest request =
CommitRequest.newBuilder()
.setMode(CommitRequest.Mode.NON_TRANSACTIONAL)
.setMutation(Mutation.newBuilder().addAllDelete(keyList).build())
.build();
db.commit(request);
keyList.clear();
}
}
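For completeness, here is a sketch of how this DoFn might be wired into the pipeline from the question (same source and options as above; the step names are my own and the code is untested):

Pipeline p = Pipeline.create(options);
p.apply("ReadLeafDataFromDatastore", Read.from(source))
 .apply("BatchDeleteRecords", ParDo.of(new DeleteEntityFn()));
p.run();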
I have a Xamarin Android project that was using mono.data.sqlite and had problems with multithreading, so I tried the Zumero component. I'm still having problems. I'm trying to set serialized mode, as with the SQLITE_CONFIG_SERIALIZED flag described at http://www.sqlite.org/threadsafe.html, but I'm still getting random crashes. Can I set the serialized flag with Zumero? Any suggestions other than recompiling SQLite from source?
Thanks,
Brian
I used to have this problem. Despite conflicting recommendations, here's how I stopped getting the exceptions:
Share a static instance of SQLiteConnection between all threads. This is safe to do, since a SQLite connection is essentially only a file pointer; it's not like a traditional data connection.
Wrap all SQLite queries/inserts/updates in a mutex, with the static instance of my SQLiteConnection as the lock. I've been advised that I shouldn't need to do this when using serialized mode; however, my experience begs to differ.
lock(myStaticConnection) {
myStaticConnection.Query<Employee>("....");
}
As a backup, I also use some retry logic to wrap every query. I'm not sure whether SQLite does this on its own (I've seen references to busytimeout and claims that it is now gone). Something like this:
public static List<T> Query<T> (string query, params object[] args) where T : new()
{
return Retry.DoWithLock (() => {
return Data.connection.Query<T> (query, args);
}, Data.connection, 0);
}
public static T DoWithLock<T>(
Func<T> action,
object lockable,
long retryIntervalTicks = defaultRetryIntervalTicks,
int retryCount = defaultRetryCount)
{
return Do (() => {
lock (lockable) {
return action();
}
}, retryIntervalTicks, retryCount);
}
public static T Do<T>(
Func<T> action,
long retryIntervalTicks = defaultRetryIntervalTicks,
int retryCount = defaultRetryCount)
{
var exceptions = new List<Exception> ();
for (int retry = 0; retry < retryCount; retry++) {
try{
return action();
} catch (Exception ex) {
exceptions.Add (ex);
ManualSleepEvent (new TimeSpan(retryIntervalTicks));
}
}
throw new AggregateException (exceptions);
}
Hi,
I am doing some simple synchronous socket programming, in which I use two threads: one accepts clients and puts each socket object into a collection, and the other loops through the collection and sends a message to each client through its socket object.
The problem is:
1. I connect two clients to the server and start sending messages.
2. Now I want to connect a new client. While doing this I can't update the collection and add the new client to my Hashtable; it raises the exception "Collection was modified; enumeration operation may not execute."
How can I add a new value to the Hashtable without running into this problem?
private void Listen()
{
try
{
//lblStatus.Text = "Server Started Listening";
while (true)
{
Socket ReceiveSock = ServerSock.Accept();
//keys.Clear();
ConnectedClients = new ListViewItem();
ConnectedClients.Text = ReceiveSock.RemoteEndPoint.ToString();
ConnectedClients.SubItems.Add("Connected");
ConnectedList.Items.Add(ConnectedClients);
ClientTable.Add(ReceiveSock.RemoteEndPoint.ToString(), ReceiveSock);
//foreach (System.Collections.DictionaryEntry de in ClientTable)
//{
// keys.Add(de.Key.ToString());
//}
//ClientTab.Add(
//keys.Add(
}
//lblStatus.Text = "Client Connected Successfully.";
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
private void btn_receive_Click(object sender, EventArgs e)
{
Thread receiveThread = new Thread(new ThreadStart(Receive));
receiveThread.IsBackground = true;
receiveThread.Start();
}
private void Receive()
{
while (true)
{
//lblMsg.Text = "";
byte[] Byt = new byte[2048];
//ReceiveSock.Receive(Byt);
lblMsg.Text = Encoding.ASCII.GetString(Byt);
}
}
private void btn_Send_Click(object sender, EventArgs e)
{
Thread SendThread = new Thread(new ThreadStart(SendMsg));
SendThread.IsBackground = true;
SendThread.Start();
}
private void btnlist_Click(object sender, EventArgs e)
{
//Thread ListThread = new Thread(new ThreadStart(Configure));
//ListThread.IsBackground = true;
//ListThread.Start();
}
private void SendMsg()
{
while (true)
{
try
{
foreach (object SockObj in ClientTable.Keys)
{
byte[] Tosend = new byte[2048];
Socket s = (Socket)ClientTable[SockObj];
Tosend = Encoding.ASCII.GetBytes("FirstValue&" + GenerateRandom.Next(6, 10).ToString());
s.Send(Tosend);
//ReceiveSock.Send(Tosend);
Thread.Sleep(300);
}
}
catch (Exception ex)
{
MessageBox.Show(ex.Message);
}
}
}
You simply can't modify a Hashtable, Dictionary, List or anything similar while you're iterating over it - whether in the same thread or a different one. There are concurrent collections in .NET 4 which allow this, but I'm assuming you're not using .NET 4. (Out of interest, why are you still using Hashtable rather than a generic Dictionary?)
You also shouldn't be modifying a Hashtable from one thread while reading from it in another thread without any synchronization.
The simplest way to fix this is:
1. Create a new read-only variable used for locking.
2. Obtain the lock before you add to the Hashtable:
lock (tableLock)
{
ClientTable.Add(ReceiveSock.RemoteEndPoint.ToString(), ReceiveSock);
}
3. When you want to iterate, create a new copy of the data in the Hashtable within a lock.
4. Iterate over the copy instead of the original table.
Do you definitely even need a Hashtable here? It looks to me like a simple List<T> or ArrayList would be okay, where each entry was either the socket or possibly a custom type containing the socket and whatever other information you need. You don't appear to be doing arbitrary lookups on the table.
Yes. Don't do that.
The bigger problem here is unsafe multithreading.
The most basic "answer" is just to say: use a synchronization lock on the shared object. However, this hides a number of important aspects (like understanding what is happening) and isn't a real solution to this problem in my mind.