Can a thread run another fiber when the running fiber is blocked?

As far as I understand, a thread can run another fiber when the running fiber is blocked, but that is not what I observe. I create 100 fibers that each search Solr. The result is that all the fibers execute in order: a fiber can only run after the previous one has finished, just like a thread. This is my code.
import co.paralleluniverse.fibers.Fiber;
import co.paralleluniverse.fibers.FiberForkJoinScheduler;
import co.paralleluniverse.fibers.FiberScheduler;
import co.paralleluniverse.fibers.SuspendExecution;
public class FilterThreadTest {
static FiberForkJoinScheduler fiberForkJoinScheduler = new FiberForkJoinScheduler("fork-join-schedule", 1);
static SolrService solrService = new SolrService();
public static void main(String[] args) {
solrService.init();
for (int i = 0; i < 100; i++) {
new CountFiber(fiberForkJoinScheduler, i, solrService).start();
}
try {
Thread.sleep(10000000);
} catch (InterruptedException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
class CountFiber extends Fiber<Void> {
/**
*
*/
private static final long serialVersionUID = 1L;
private int count;
private SolrService solrService;
public CountFiber(FiberScheduler scheduler, int count, SolrService solrService) {
super(scheduler);
// TODO Auto-generated constructor stub
this.count = count;
this.solrService = solrService;
}
@Override
public Void run() throws SuspendExecution, InterruptedException {
System.out.println(count + " fiber is starting!");
solrService.search();
System.out.println(count + " fiber is ended!");
return null;
}
}
Did I misunderstand fiber?

Fibers will yield execution to other non-blocked fibers only when they perform fiber-blocking calls, not thread-blocking ones. Quasar doesn't automatically transform thread-blocking calls into fiber-blocking ones, so you need to write (usually small) integrations for pre-existing tools that don't know about Quasar.
The concurrent programming libraries provided by Quasar (Go-like channels, Erlang-like actors, dataflow programming, reactive streams and the java.util.concurrent port) support both fiber-blocking calls (when called from fibers) and thread-blocking ones (when called from threads); the same is true for the Comsat integrations, which cover many tools but, as of today, not Solr. Did you build a Solr integration yourself, or is solrService.search() only thread-blocking?
For more information about integrating tools with Quasar (it's usually quite easy) see for example this blog post.
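For illustration only, here is a minimal sketch of what such an integration could look like using Quasar's FiberAsync. It assumes a hypothetical callback-based SolrService.searchAsync(SearchCallback) method (not part of the code above); the fiber suspends while the request is in flight instead of blocking its scheduler thread.
import co.paralleluniverse.fibers.FiberAsync;
import co.paralleluniverse.fibers.SuspendExecution;

public class FiberSolrService {
    // Hypothetical callback interface; replace with whatever async API your Solr client exposes.
    public interface SearchCallback {
        void onSuccess(String result);
        void onFailure(Throwable t);
    }

    private final SolrService solrService;

    public FiberSolrService(SolrService solrService) {
        this.solrService = solrService;
    }

    // Fiber-blocking search: suspends the calling fiber and frees its scheduler thread.
    public String search() throws SuspendExecution, InterruptedException {
        return new FiberAsync<String, RuntimeException>() {
            @Override
            protected void requestAsync() {
                // Assumed asynchronous variant of SolrService.search();
                // the callback completes the FiberAsync when the response arrives.
                solrService.searchAsync(new SearchCallback() {
                    @Override
                    public void onSuccess(String result) {
                        asyncCompleted(result);
                    }

                    @Override
                    public void onFailure(Throwable t) {
                        asyncFailed(t);
                    }
                });
            }
        }.run();
    }
}
With this in place, CountFiber.run() would call the fiber-blocking search() instead, and other fibers could run on the same scheduler thread while each request is outstanding.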

Related

Not able to receive onNext and onComplete calls on a subscribed Mono

I was trying the Reactor library and I'm not able to figure out why the Mono below never comes back with an onNext or onComplete call. I think I'm missing something very trivial. Here's a sample of the code.
MyServiceService service = new MyServiceService();
service.save("id")
.map(myUserMono -> new MyUser(myUserMono.getName().toUpperCase(), myUserMono.getId().toUpperCase()))
.subscribe(new Subscriber<MyUser>() {
@Override
public void onSubscribe(Subscription s) {
System.out.println("Subscribed!" + Thread.currentThread().getName());
}
@Override
public void onNext(MyUser myUser) {
System.out.println("OnNext on thread " + Thread.currentThread().getName());
}
@Override
public void onError(Throwable t) {
System.out.println("onError!" + Thread.currentThread().getName());
}
@Override
public void onComplete() {
System.out.println("onCompleted!" + Thread.currentThread().getName());
}
});
}
private static class MyServiceService {
private Repository myRepo = new Repository();
public Mono<MyUser> save(String userId) {
return myRepo.save(userId);
}
}
private static class Repository {
public Mono<MyUser> save(String userId) {
return Mono.create(myUserMonoSink -> {
Future<MyUser> submit = exe.submit(() -> this.blockingMethod(userId));
ListenableFuture<MyUser> myUserListenableFuture = JdkFutureAdapters.listenInPoolThread(submit);
Futures.addCallback(myUserListenableFuture, new FutureCallback<MyUser>() {
@Override
public void onSuccess(MyUser result) {
myUserMonoSink.success(result);
}
@Override
public void onFailure(Throwable t) {
myUserMonoSink.error(t);
}
});
});
}
private MyUser blockingMethod(String userId) throws InterruptedException {
Thread.sleep(5000);
return new MyUser("blocking", userId);
}
}
The above code only prints Subscribed!main. What I can't figure out is why the future callback is not pushing values through myUserMonoSink.success.
The important thing to keep in mind is that a Flux or Mono is asynchronous, most of the time.
Once you subscribe, the asynchronous processing of saving the user starts in the executor, but execution continues in your main code after .subscribe(...).
So the main thread exits, terminating your test before anything was pushed to the Mono.
[sidebar]: when is it ever synchronous?
When the source of data is a Flux/Mono synchronous factory method, BUT with the added prerequisite that the rest of the chain of operators doesn't switch execution context. That could happen either explicitly (you use a publishOn or subscribeOn operator) or implicitly (some operators, like time-related ones, e.g. delayElements, run on a separate Scheduler).
Simply put, your source runs on the ExecutorService thread of exe, so the Mono is indeed asynchronous. Your snippet, on the other hand, runs on main.
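As a quick illustration of the synchronous case (my own example, not from the question's code), a Mono built from an in-memory value with no context switch emits on the very thread that calls subscribe:
import reactor.core.publisher.Mono;
// Everything below runs on the calling thread; the println happens before subscribe() returns.
Mono.just("id")
    .map(String::toUpperCase)
    .subscribe(v -> System.out.println(v + " on " + Thread.currentThread().getName()));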
How to fix the issue
To observe the correct behavior of Mono in an experiment (as opposed to fully async code in production), several possibilities are available:
keep the subscribe with the System.out.println calls, but add a new CountDownLatch(1) that you countDown() inside onComplete and onError, then await() on the latch after the subscribe (a sketch of this option is shown after this list).
use .log().block() instead of .subscribe(...). You lose the customization of what to do on each event, but log() will print those out for you (provided you have a logging framework configured). block() will revert to blocking mode and do pretty much what I suggested with the CountDownLatch above. It returns the value once available or throws an Exception in case of error.
instead of log() you can customize logging or other side effects using .doOnXXX(...) methods (there's one for pretty much every type of event + combinations of events, eg. doOnSubscribe, doOnNext...)
If you're doing a unit test, use StepVerifier from the reactor-tests project. It will subscribe to the flux/mono and wait for events when you call .verify(). See the reference guide chapter on testing (and the rest of the reference guide in general).
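As a rough sketch of the first option (my adaptation of the question's code, using the lambda-based subscribe variant of Mono, which requests data for you; CountDownLatch is java.util.concurrent.CountDownLatch):
CountDownLatch latch = new CountDownLatch(1);
MyServiceService service = new MyServiceService();
service.save("id")
    .map(myUser -> new MyUser(myUser.getName().toUpperCase(), myUser.getId().toUpperCase()))
    .subscribe(
        user -> System.out.println("OnNext on thread " + Thread.currentThread().getName()),
        error -> { System.out.println("onError! " + error); latch.countDown(); },
        () -> { System.out.println("onCompleted! " + Thread.currentThread().getName()); latch.countDown(); });
// Keep the main thread alive until the Mono terminates (completes or errors).
latch.await();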
The issue is that the onSubscribe method of the anonymous class you created does nothing, so no data is ever requested.
If you look at the implementation of LambdaSubscriber, it requests some number of events.
Also, it's easier to extend BaseSubscriber, as it has some predefined logic.
So your subscriber implementation would be:
MyServiceService service = new MyServiceService();
service.save("id")
.map(myUserMono -> new MyUser(myUserMono.getName().toUpperCase(), myUserMono.getId().toUpperCase()))
.subscribe(new BaseSubscriber<MyUser>() {
@Override
protected void hookOnSubscribe(Subscription subscription) {
System.out.println("Subscribed!" + Thread.currentThread().getName());
request(1); // or requestUnbounded();
}
@Override
protected void hookOnNext(MyUser myUser) {
System.out.println("OnNext on thread " + Thread.currentThread().getName());
// request(1); // needed here if requestUnbounded() wasn't called in hookOnSubscribe
}
@Override
protected void hookOnComplete() {
System.out.println("onCompleted!" + Thread.currentThread().getName());
}
@Override
protected void hookOnError(Throwable throwable) {
System.out.println("onError!" + Thread.currentThread().getName());
}
});
Maybe it's not the best implementation; I'm new to Reactor too.
Simon's answer has a pretty good explanation of testing asynchronous code.
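For completeness, a small sketch of the StepVerifier approach mentioned above (my own example; it assumes the reactor-test dependency is on the classpath):
StepVerifier.create(
        service.save("id")
               .map(myUser -> new MyUser(myUser.getName().toUpperCase(), myUser.getId().toUpperCase())))
    .expectNextMatches(user -> "ID".equals(user.getId()))
    .verifyComplete(); // subscribes, waits for onComplete, and fails on error or timeout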

Sharing a BigTable Connection object among Dataflow DoFn subclasses

I am setting up a Java Pipeline in DataFlow to read a .csv file and to create a bunch of BigTable rows based on the content of the file. I see in the BigTable documentation the note that connecting to BigTable is an 'expensive' operation and that it's a good idea to do it only once and to share the connection among the functions that need it.
However, if I declare the Connection object as a public static variable in the main class and first connect to BigTable in the main function, I get a NullPointerException when I subsequently try to reference the connection in the processElement() function of DoFn subclass instances that are part of my DataFlow pipeline.
Conversely, if I declare the Connection as a static variable in the actual DoFn class, then the operation works successfully.
What is the best-practice or optimal way to do this?
I'm concerned that if I implement the second option at scale, I will be wasting a lot of time and resources. If I keep the variable as static in the DoFn class, is it enough to ensure that the APIs don't try to re-establish the connection every time?
I realize there is a special BigTable I/O call to sync DataFlow pipeline objects with BigTable, but I think I need to write one on my own to build in some special logic into the DoFn processElement() function...
This is what the "working" code looks like:
class DigitizeBT extends DoFn<String, String>{
private static Connection m_locConn;
@Override
public void processElement(ProcessContext c)
{
try
{
m_locConn = BigtableConfiguration.connect("projectID", "instanceID");
Table tbl = m_locConn.getTable(TableName.valueOf("TableName"));
Put put = new Put(Bytes.toBytes(rowKey));
put.addColumn(
Bytes.toBytes("CF1"),
Bytes.toBytes("SomeName"),
Bytes.toBytes("SomeValue"));
tbl.put(put);
}
catch (IOException e)
{
e.printStackTrace();
System.exit(1);
}
}
}
This is what updated code looks like, FYI:
public void SmallKVJob()
{
CloudBigtableScanConfiguration config = new CloudBigtableScanConfiguration.Builder()
.withProjectId(DEF.ID_PROJ)
.withInstanceId(DEF.ID_INST)
.withTableId(DEF.ID_TBL_UNITS)
.build();
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);
options.setProject(DEF.ID_PROJ);
options.setStagingLocation(DEF.ID_STG_LOC);
// options.setNumWorkers(3);
// options.setMaxNumWorkers(5);
// options.setRunner(BlockingDataflowPipelineRunner.class);
options.setRunner(DirectPipelineRunner.class);
Pipeline p = Pipeline.create(options);
p.apply(TextIO.Read.from(DEF.ID_BAL))
.apply(ParDo.of(new DoFn1()))
.apply(ParDo.of(new DoFn2()))
.apply(ParDo.of(new DoFn3(config)));
m_log.info("starting to run the job");
p.run();
m_log.info("finished running the job");
}
}
class DoFn1 extends DoFn<String, KV<String, Integer>>
{
@Override
public void processElement(ProcessContext c)
{
c.output(KV.of(c.element().split("\\,")[0],Integer.valueOf(c.element().split("\\,")[1])));
}
}
class DoFn2 extends DoFn<KV<String, Integer>, KV<String, Integer>>
{
@Override
public void processElement(ProcessContext c)
{
int max = c.element().getValue();
String name = c.element().getKey();
for(int i = 0; i<max;i++)
c.output(KV.of(name, 1));
}
}
class DoFn3 extends AbstractCloudBigtableTableDoFn<KV<String, Integer>, String>
{
public DoFn3(CloudBigtableConfiguration config)
{
super(config);
}
@Override
public void processElement(ProcessContext c)
{
try
{
Integer max = c.element().getValue();
for(int i = 0; i<max; i++)
{
String owner = c.element().getKey();
String rnd = UUID.randomUUID().toString();
Put p = new Put(Bytes.toBytes(owner+"*"+rnd));
p.addColumn(Bytes.toBytes(DEF.ID_CF1), Bytes.toBytes("Owner"), Bytes.toBytes(owner));
getConnection().getTable(TableName.valueOf(DEF.ID_TBL_UNITS)).put(p);
c.output("Success");
}
} catch (IOException e)
{
c.output(e.toString());
e.printStackTrace();
}
}
}
The input .csv file looks something like this:
Mary,3000
John,5000
Peter,2000
So, for each row in the .csv file, I have to put in x number of rows into BigTable, where x is the second cell in the .csv file...
We built AbstractCloudBigtableTableDoFn (Source & Docs) for this purpose. Extend that class instead of DoFn, and call getConnection() instead of creating a Connection yourself.
10,000 small rows should take a second or two of actual work.
EDIT: As per the comments, BufferedMutator should be used instead of Table.put() for best throughput.
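For illustration, a sketch of what DoFn3 could look like with a BufferedMutator instead of per-Put Table.put() calls. This reuses the DEF constants from the question and is an assumption about your setup, not a drop-in replacement:
class DoFn3 extends AbstractCloudBigtableTableDoFn<KV<String, Integer>, String>
{
    public DoFn3(CloudBigtableConfiguration config)
    {
        super(config);
    }
    @Override
    public void processElement(ProcessContext c) throws IOException
    {
        String owner = c.element().getKey();
        int max = c.element().getValue();
        // BufferedMutator batches Puts client-side instead of issuing one RPC per Table.put().
        try (BufferedMutator mutator =
                getConnection().getBufferedMutator(TableName.valueOf(DEF.ID_TBL_UNITS)))
        {
            for (int i = 0; i < max; i++)
            {
                Put p = new Put(Bytes.toBytes(owner + "*" + UUID.randomUUID()));
                p.addColumn(Bytes.toBytes(DEF.ID_CF1), Bytes.toBytes("Owner"), Bytes.toBytes(owner));
                mutator.mutate(p);
            }
        } // close() flushes any remaining buffered mutations
        c.output("Success");
    }
}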

How can I batch delete millions of entities using DatastoreIO and Dataflow?

I'm trying to use Dataflow to delete many millions of Datastore entities and the pace is extremely slow (5 entities/s). I am hoping you can explain to me the pattern I should follow to allow that to scale up to a reasonable pace. Just adding more workers did not help.
The Datastore Admin console has the ability to delete all entities of a specific kind but it fails a lot and takes me a week or more to delete 40 million entities. Dataflow ought to be able to help me delete millions of entities that match only certain query parameters as well.
I'm guessing that some type of batching strategy should be employed (where I create a mutation with 1000 deletes in it, for example), but it's not obvious to me how I would go about that. DatastoreIO gives me just one entity at a time to work with. Pointers would be greatly appreciated.
Below is my current slow solution.
Pipeline p = Pipeline.create(options);
DatastoreIO.Source source = DatastoreIO.source()
.withDataset(options.getDataset())
.withQuery(getInstrumentQuery(options))
.withNamespace(options.getNamespace());
p.apply("ReadLeafDataFromDatastore", Read.from(source))
.apply("DeleteRecords", ParDo.of(new DeleteInstrument(options.getDataset())));
p.run();
static class DeleteInstrument extends DoFn<Entity, Integer> {
String dataset;
DeleteInstrument(String dataset) {
this.dataset = dataset;
}
@Override
public void processElement(ProcessContext c) {
DatastoreV1.Mutation.Builder mutation = DatastoreV1.Mutation.newBuilder();
mutation.addDelete(c.element().getKey());
final DatastoreV1.CommitRequest.Builder request = DatastoreV1.CommitRequest.newBuilder();
request.setMutation(mutation);
request.setMode(DatastoreV1.CommitRequest.Mode.NON_TRANSACTIONAL);
try {
DatastoreOptions.Builder dbo = new DatastoreOptions.Builder();
dbo.dataset(dataset);
dbo.credential(getCredential());
Datastore db = DatastoreFactory.get().create(dbo.build());
db.commit(request.build());
c.output(1);
count++;
if(count%100 == 0) {
LOG.info(count+"");
}
} catch (Exception e) {
c.output(0);
e.printStackTrace();
}
}
}
There is no direct way of deleting entities using the current version of DatastoreIO. This version of DatastoreIO is going to be deprecated in favor of a new version (v1beta3) in the next Dataflow release. We think there is a good use case for providing a delete utility (either through an example or a PTransform), but it is still a work in progress.
For now you can batch your deletes, instead of deleting one at a time:
public static class DeleteEntityFn extends DoFn<Entity, Void> {
// Datastore max batch limit
private static final int DATASTORE_BATCH_UPDATE_LIMIT = 500;
private Datastore db;
private List<Key> keyList = new ArrayList<>();
@Override
public void startBundle(Context c) throws Exception {
// Initialize Datastore Client
// db = ...
}
@Override
public void processElement(ProcessContext c) throws Exception {
keyList.add(c.element().getKey());
if (keyList.size() >= DATASTORE_BATCH_UPDATE_LIMIT) {
flush();
}
}
@Override
public void finishBundle(Context c) throws Exception {
if (keyList.size() > 0) {
flush();
}
}
private void flush() throws Exception {
// Make one delete request instead of one for each element.
CommitRequest request =
CommitRequest.newBuilder()
.setMode(CommitRequest.Mode.NON_TRANSACTIONAL)
.setMutation(Mutation.newBuilder().addAllDelete(keyList).build())
.build();
db.commit(request);
keyList.clear();
}
}

How to pause and resume a download in JavaFX

I am building a download manager in JavaFX.
I have added a function to the download button which initialises a new task. More than one download also executes properly.
But I need to add pause and resume functionality. Please tell me how to implement it using the executor. The task is started through the executor's execute function, but how do I pause and then resume it?
Below I am showing the relevant portions of my code. Please tell me if you need more details. Thanks.
Main class
public class Controller implements Initializable {
public Button addDownloadButton;
public Button pauseResumeButton;
public TextField urlTextBox;
public TableView<DownloadEntry> downloadsTable;
ExecutorService executor;
@Override
public void initialize(URL location, ResourceBundle resources) {
// here tableview and table columns are initialised and cellValueFactory is set
executor = Executors.newFixedThreadPool(4);
}
public void addDownloadButtonClicked() {
DownloadEntry task = new DownloadEntry(new URL(urlTextBox.getText()));
downloadsTable.getItems().add(task);
executor.execute(task);
}
public void pauseResumeButtonClicked() {
//CODE FOR PAUSE AND RESUME
}
}
DownloadEntry.java
public class DownloadEntry extends Task<Void> {
public URL url;
public int downloaded;
final int MAX_BUFFER_SIZE=50*1024;
private String status;
//Constructor
public DownloadEntry(URL ur) throws Exception{
url = ur;
//other variables are initialised here
this.updateMessage("Downloading");
}
@Override
protected Void call() {
file = new RandomAccessFile(filename, "rw");
file.seek(downloaded);
stream = con.getInputStream();
while (status.equals("Downloading")) {
byte[] buffer = new byte[MAX_BUFFER_SIZE];
int c=stream.read(buffer);
if (c==-1){
break;
}
file.write(buffer,0,c);
downloaded += c;
status = "Downloading";
}
if (status.equals("Downloading")) {
status = "Complete";
updateMessage("Complete");
}
return null;
}
}
You may be interested in Concurrency in JavaFX.
I guess you should also have a look at the Observer pattern.
By the way, I think you should not use constant strings as a status ("Downloading", etc.); creating an enum would be a better approach.
In your loop, around the read/write part, there should be a synchronization mechanism, controlled by your pause/resume buttons (see the two links).
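For illustration only, here is a minimal sketch of one way to do that inside DownloadEntry, using a monitor that the pause/resume button toggles (the paused flag, pauseLock and waitIfPaused names are my own, not from the question):
private final Object pauseLock = new Object();
private volatile boolean paused = false;

public void pause() {
    paused = true;
}

public void resume() {
    synchronized (pauseLock) {
        paused = false;
        pauseLock.notifyAll();
    }
}

// Call this at the top of the read/write loop in call().
private void waitIfPaused() throws InterruptedException {
    synchronized (pauseLock) {
        while (paused) {
            updateMessage("Paused");
            pauseLock.wait(); // wakes up when resume() calls notifyAll()
        }
    }
    updateMessage("Downloading");
}
pauseResumeButtonClicked() in the controller would then call pause() or resume() on the task selected in the table.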

EntityManager won't delete records if I put it in a Runnable

I searched for an answer to my question on this site but failed at every turn. I can delete fine if I don't put this in an ExecutorService, but if I do, it doesn't delete. No error occurs; the records are just still in the database. Please advise.
public void deleteAllTrials(List<Trials>list) {
threadList = list;
ExecutorService executor = Executors.newFixedThreadPool(1);
executor.execute(new Job1());
executor.shutdown();
}
public class Job1 implements Runnable {
@Override
public void run() {
//Session session = (Session) entityManager.getDelegate();
EntityManagerFactory emf = entityManager.getEntityManagerFactory();
EntityManager em = emf.createEntityManager();
System.out.println("Size of threadList" + threadList.size());
long start = System.currentTimeMillis();
for(int i =0; i<threadList.size(); i++){
System.out.println("In thread...");
Trials mergedEntity = em.merge(threadList.get(i));
em.remove(mergedEntity);
}
//System.out.println("Result list in service:" + list.size());
//em.close();
long end = System.currentTimeMillis();
System.out.println("Threads took this long:" + (end - start));
}
}
I found out that EJBs are more powerful than I thought. If you just add @Asynchronous on top of the method you want the application to run in a background thread, it will act as a separate thread, allowing the user to continue doing what they want to do without waiting for the process to finish.
@Asynchronous
public void deleteAllTrials(List<TrialBillet>list) {
List<TrialBillet> threadList = new ArrayList<TrialBillet>();
threadList = list;
for(int i =0; i<threadList.size(); i++){
this.delete(threadList.get(i));
}
}
If you want to go with executors, Java EE 7 has introduced the ManagedExecutorService.
From Java EE 7 tutorial
ManagedExecutorService: A managed executor service is used by
applications to execute submitted tasks asynchronously. Tasks are
executed on threads that are started and managed by the container. The
context of the container is propagated to the thread executing the
task. For example, by using an ManagedExecutorService.submit() call, a
task, such as the GenerateReportTask, could be submitted to execute at
a later time and then, by using the Future object callback, retrieve
the result when it becomes available.
code sample:
public class MyClass {
@Resource
private ManagedExecutorService mes;
public void myMethod() {
mes.execute(new Worker());
}
}