Poor performance of Flux::mergeComparing - project-reactor

I'm migrating my application to Reactor and noticed a weird performance issue with Flux::mergeComparing. Here's a benchmark of my existing implementation and the Flux alternative.
@Fork(value = 1)
@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
@Measurement(iterations = 3)
@OutputTimeUnit(TimeUnit.SECONDS)
public class MergeComparingBenchmark
{
private List<List<Integer>> numbers;
@Setup
public void setUp()
{
numbers = IntStream.range(0, 1_000)
.mapToObj(i -> IntStream.range(0, 10_000))
.map(IntStream::boxed)
.map(stream -> stream.collect(toList()))
.collect(toList());
}
@Benchmark
public List<Integer> myMerge()
{
final List<Stream<Integer>> data = numbers.stream().map(Collection::stream).collect(toList());
final MergingIterator mergingIterator =
new MergingIterator(data);
return Streams.stream(mergingIterator).collect(toList());
}
@Benchmark
public List<Integer> mergeComparing()
{
final Flux<Integer>[] data = numbers.stream().map(Flux::fromIterable)
.toArray(Flux[]::new);
return Flux.mergeComparing(data)
.collectList()
.block();
}
static class MergingIterator implements Iterator<Integer>
{
private final PriorityQueue<Tuple2<Integer, Iterator<Integer>>> queue;
MergingIterator(final List<? extends Stream<Integer>> iterators)
{
this.queue = new PriorityQueue<>(Comparator.comparingInt(entry -> entry.v1));
checkNotNull(iterators).stream()
.map(Stream::iterator)
.filter(Iterator::hasNext)
.forEach(iterator -> queue.add(Tuple.tuple(iterator.next(), iterator)));
}
@Override
public boolean hasNext()
{
return !queue.isEmpty();
}
@Override
public Integer next()
{
final Tuple2<Integer, Iterator<Integer>> element = queue.poll();
final Iterator<Integer> iterator = element.v2;
if (iterator.hasNext()) {
queue.add(Tuple.tuple(iterator.next(), iterator));
}
return element.v1;
}
}
}
And the results are:
MergeComparingBenchmark.mergeComparing thrpt 3 0.181 ± 0.664 ops/s
MergeComparingBenchmark.myMerge thrpt 3 5.146 ± 1.390 ops/s
Why is the reactor case so slow? What can I do to improve it?
I tried changing the subscriber thread pools, the prefetch size and the Reactor buffers, but there was no difference.

If your intention is to get a list of sorted integers, then I think you can simply call the collectSortedList method:
return Flux.merge(data)
.collectSortedList()
.block();
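If you want to measure this under the same JMH harness as the question, here is a minimal sketch of such a benchmark method (the method name is made up; it reuses the numbers field from the question):
@Benchmark
public List<Integer> mergeThenSort()
{
    // merge without ordering, then sort once at collection time
    final Flux<Integer>[] data = numbers.stream().map(Flux::fromIterable)
            .toArray(Flux[]::new);
    return Flux.merge(data)
            .collectSortedList()
            .block();
}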
I also don't understand why you implemented your own iterator; you basically just need to call the sorted method:
return numbers.stream()
        .flatMap(Collection::stream)
        .sorted()
        .collect(Collectors.toList());
MergeComparing
From the documentation:
Merge data from provided Publisher sequences into an ordered merged sequence, by picking the smallest values from each source (as defined by the provided Comparator). This is not a sort(Comparator), as it doesn't consider the whole of each sequences.
Instead, this operator considers only one value from each source and picks the smallest of all these values, then replenishes the slot for that picked source.
It only considers a single element from each source when doing the comparison:
Flux<Integer> flux1 = Flux.just(9, 6, 11);
Flux<Integer> flux2 = Flux.just(10, 2, 13);
Flux.mergeComparing(flux1, flux2)
.doOnNext(System.out::println)
.subscribe();
/**
Output => 9 6 10 2 11 13
Compare 9 and 10 => 9
Compare 6 and 10 => 6
Compare 11 and 10 => 10
Compare 11 and 2 => 2
Compare 11 and 13 => 11
=> 13
**/
This is probably not what you need.
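For completeness, when every source is itself already sorted, mergeComparing does produce a globally ordered sequence; a minimal sketch with made-up values:
Flux<Integer> sorted1 = Flux.just(1, 3, 5);
Flux<Integer> sorted2 = Flux.just(2, 4, 6);
Flux.mergeComparing(sorted1, sorted2)
    .doOnNext(System.out::println)
    .subscribe();
// Output => 1 2 3 4 5 6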


Convert Stream<Stream<T>> to Stream<T>

I am using the rxdart package to handle streams in Dart. I am stuck on a peculiar problem.
Please have a look at this dummy code:
final userId = BehaviorSubject<String>();
Stream<T> getStream(String uid) {
// a sample code that returns a stream
return BehaviorSubject<T>().stream;
}
final Observable<Stream<T>> oops = userId.map((uid) => getStream(uid));
Now I want to convert the oops variable to get only Observable<T>.
I am finding it difficult to explain clearly. But let me try. I have a stream A. I map each output of stream A to another stream B. Now I have Stream<Stream<B>> - a kind of recurrent stream. I just want to listen to the latest value produced by this pattern. How may I achieve this?
I will list several ways to flatten a Stream<Stream<T>> into a single Stream<T>.
1. Using pure Dart
As answered by @Irn, this is a pure Dart solution:
Stream<T> flattenStreams<T>(Stream<Stream<T>> source) async* {
await for (var stream in source) yield* stream;
}
Stream<int> getStream(String v) {
return Stream.fromIterable([1, 2, 3, 4]);
}
void main() {
List<String> list = ["a", "b", "c"];
Stream<int> s = flattenStreams(Stream.fromIterable(list).map(getStream));
s.listen(print);
}
Outputs: 1 2 3 4 1 2 3 4 1 2 3 4
2. Using Observable.flatMap
Observable has a flatMap method that flattens the output stream and attaches it to the ongoing stream:
import 'package:rxdart/rxdart.dart';
Stream<int> getStream(String v) {
return Stream.fromIterable([1, 2, 3, 4]);
}
void main() {
List<String> list = ["a", "b", "c"];
Observable<int> s = Observable.fromIterable(list).flatMap(getStream);
s.listen(print);
}
Outputs: 1 2 3 4 1 2 3 4 1 2 3 4
3. Using Observable.switchLatest
Convert a Stream that emits Streams (aka a "Higher Order Stream") into a single Observable that emits the items emitted by the most-recently-emitted of those Streams.
This is the solution I was looking for! I just needed the latest output emitted by the internal stream.
import 'package:rxdart/rxdart.dart';
Stream<int> getStream(String v) {
return Stream.fromIterable([1, 2, 3, 4]);
}
void main() {
List<String> list = ["a", "b", "c"];
Observable<int> s = Observable.switchLatest(
Observable.fromIterable(list).map(getStream));
s.listen(print);
}
Outputs: 1 1 1 2 3 4
It's somewhat rare to have a Stream<Stream<Something>>, so it isn't something that there is much explicit support for.
One reason is that there are several (at least two) ways to combine a stream of streams of things into a stream of things.
Either you listen to each stream in turn, waiting for it to complete before starting on the next, and then emit the events in order.
Or you listen on each new stream the moment it becomes available, and then emit the events from any stream as soon as possible.
The former is easy to write using async/await:
Stream<T> flattenStreams<T>(Stream<Stream<T>> source) async* {
await for (var stream in source) yield* stream;
}
The latter is more complicated because it requires listening to more than one stream at a time and combining their events. (If only StreamController.addStream allowed more than one stream at a time, it would be much easier.) You can use the StreamGroup class from package:async for this:
import "package:async/async" show StreamGroup;
Stream<T> mergeStreams<T>(Stream<Stream<T>> source) {
var sg = StreamGroup<T>();
source.forEach(sg.add).whenComplete(sg.close);
// This doesn't handle errors in [source].
// If you worry about errors in [source], insert
// .catchError((e, s) { sg.add(Future<T>.error(e, s).asStream()); })
// before `.whenComplete`.
return sg.stream;
}
If you want a Stream<Stream<T>> to return a Stream<T>, you basically need to flatten the stream.
The flatMap function is what you would use here:
public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b", "c");
    Stream<String> s = l.stream().map(i -> getStream(i)).flatMap(i -> i);
}

static <T> Stream<T> getStream(T uid) {
    // a sample method that returns a stream
    return Stream.of(uid);
}
If you need only the first object, then use the findFirst() method:
public static void main(String[] args) {
    List<String> l = Arrays.asList("a", "b", "c");
    String str = l.stream().map(i -> getStream(i)).flatMap(i -> i).findFirst().get();
}
You need to call the asyncExpand method on your stream; it lets you transform each element into a sequence of asynchronous events.

How to count the number of rows in a file in Dataflow in an efficient way?

I want to count the total number of rows in a file.
Please explain your code if possible.
String fileAbsolutePath = "gs://sourav_bucket_dataflow/" + fileName;
PCollection<String> data = p.apply("Reading Data From File", TextIO.read().from(fileAbsolutePath));
PCollection<Long> count = data.apply(Count.<String>globally());
Now I want to get the value.
There are a variety of sinks that you can use to get data out of your pipeline. https://beam.apache.org/documentation/io/built-in/ has a list of the current built-in IO transforms.
It sort of depends on what you want to do with that number. Assuming you want to use it in your future transformations, you may want to convert it to a PCollectionView object and pass it as a side input to other transformations.
PCollection<String> data = p.apply("Reading Data From File", TextIO.read().from(fileAbsolutePath));
PCollection<Long> count = data.apply(Count.<String>globally());
final PCollectionView<Long> view = count.apply(View.asSingleton());
A quick example to show you how to use the value as a side input:
data.apply(ParDo.of(new FuncFn(view)).withSideInputs(view));
Where:
class FuncFn extends DoFn<String,String>
{
private final PCollectionView<Long> mySideInput;
public FuncFn(PCollectionView<Long> mySideInput) {
this.mySideInput = mySideInput;
}
@ProcessElement
public void processElement(ProcessContext c) throws IOException
{
Long count = c.sideInput(mySideInput);
//other stuff you may want to do
}
}
Hope that helps!
where "input" in line 1 is the input. This will work.
PCollection<Long> number = input.apply(Count.globally());
number.apply(MapElements.via(new SimpleFunction<Long, Long>()
{
public Long apply(Long total)
{
System.out.println("Length is: " + total);
return total;
}
}));
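If you want the count written to a file through one of the built-in sinks instead of just printed, a minimal sketch (the output path is just an example, not from the answers above):
count
    // turn the single Long into a String element
    .apply(MapElements.into(TypeDescriptors.strings())
        .via((Long total) -> "Length is: " + total))
    // write it as a single file to GCS
    .apply(TextIO.write().to("gs://sourav_bucket_dataflow/row_count").withoutSharding());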

Recommendation Engine using Apache Spark MLlib showing zero recommendations after processing all operations

I am a newbie when it comes to implementing ML algorithms. I wanted to build a recommendation engine and, after a little experimenting, learned that collaborative filtering can be used for this. I am using Apache Spark for it. I got help from one of the blogs and tried to implement the same locally. Please find below the code that I tried out. Every time I execute it, the count of recommendations that gets printed is always zero. I don't see any evident error as such. Could someone please help me understand this? Also, please feel free to point to any other reference that can be consulted in this regard.
package mllib.example;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.mllib.recommendation.ALS;
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel;
import org.apache.spark.mllib.recommendation.Rating;
import scala.Tuple2;
public class RecommendationEngine {
public static void main(String[] args) {
// Create Java spark context
SparkConf conf = new SparkConf().setAppName("Recommendation System Example").setMaster("local[2]").set("spark.executor.memory","1g");
JavaSparkContext sc = new JavaSparkContext(conf);
// Read user-item rating file. format - userId,itemId,rating
JavaRDD<String> userItemRatingsFile = sc.textFile(args[0]);
System.out.println("Count is "+userItemRatingsFile.count());
// Read item description file. format - itemId, itemName, Other Fields,..
JavaRDD<String> itemDescritpionFile = sc.textFile(args[1]);
System.out.println("itemDescritpionFile Count is "+itemDescritpionFile.count());
// Map file to Ratings(user,item,rating) tuples
JavaRDD<Rating> ratings = userItemRatingsFile.map(new Function<String, Rating>() {
public Rating call(String s) {
String[] sarray = s.split(",");
return new Rating(Integer.parseInt(sarray[0]), Integer
.parseInt(sarray[1]), Double.parseDouble(sarray[2]));
}
});
System.out.println("Ratings RDD Object"+ratings.first().toString());
// Create tuples(itemId,ItemDescription), will be used later to get names of item from itemId
JavaPairRDD<Integer,String> itemDescritpion = itemDescritpionFile.mapToPair(
new PairFunction<String, Integer, String>() {
@Override
public Tuple2<Integer, String> call(String t) throws Exception {
String[] s = t.split(",");
return new Tuple2<Integer,String>(Integer.parseInt(s[0]), s[1]);
}
});
System.out.println("itemDescritpion RDD Object"+ratings.first().toString());
// Build the recommendation model using ALS
int rank = 10; // 10 latent factors
int numIterations = Integer.parseInt(args[2]); // number of iterations
MatrixFactorizationModel model = ALS.trainImplicit(JavaRDD.toRDD(ratings),
rank, numIterations);
//ALS.trainImplicit(arg0, arg1, arg2)
// Create user-item tuples from ratings
JavaRDD<Tuple2<Object, Object>> userProducts = ratings
.map(new Function<Rating, Tuple2<Object, Object>>() {
public Tuple2<Object, Object> call(Rating r) {
return new Tuple2<Object, Object>(r.user(), r.product());
}
});
// Calculate the itemIds not rated by a particular user, say user with userId = 1
JavaRDD<Integer> notRatedByUser = userProducts.filter(new Function<Tuple2<Object,Object>, Boolean>() {
@Override
public Boolean call(Tuple2<Object, Object> v1) throws Exception {
if (((Integer) v1._1).intValue() != 0) {
return true;
}
return false;
}
}).map(new Function<Tuple2<Object,Object>, Integer>() {
@Override
public Integer call(Tuple2<Object, Object> v1) throws Exception {
return (Integer) v1._2;
}
});
// Create user-item tuples for the items that are not rated by user, with user id 1
JavaRDD<Tuple2<Object, Object>> itemsNotRatedByUser = notRatedByUser
.map(new Function<Integer, Tuple2<Object, Object>>() {
public Tuple2<Object, Object> call(Integer r) {
return new Tuple2<Object, Object>(0, r);
}
});
// Predict the ratings of the items not rated by user for the user
JavaRDD<Rating> recomondations = model.predict(itemsNotRatedByUser.rdd()).toJavaRDD().distinct();
// Sort the recommendations by rating in descending order
recomondations = recomondations.sortBy(new Function<Rating,Double>(){
@Override
public Double call(Rating v1) throws Exception {
return v1.rating();
}
}, false, 1);
System.out.println("recomondations Total is "+recomondations.count());
// Get top 10 recommendations
JavaRDD<Rating> topRecomondations = sc.parallelize(recomondations.take(10));
// Join top 10 recommendations with item descriptions
JavaRDD<Tuple2<Rating, String>> recommendedItems = topRecomondations.mapToPair(
new PairFunction<Rating, Integer, Rating>() {
@Override
public Tuple2<Integer, Rating> call(Rating t) throws Exception {
return new Tuple2<Integer,Rating>(t.product(),t);
}
}).join(itemDescritpion).values();
System.out.println("recommendedItems count is "+recommendedItems.count());
//Print the top recommendations for user 1.
recommendedItems.foreach(new VoidFunction<Tuple2<Rating,String>>() {
@Override
public void call(Tuple2<Rating, String> t) throws Exception {
System.out.println(t._1.product() + "\t" + t._1.rating() + "\t" + t._2);
}
});
Also, I see that this job runs for a really long time and creates the model from scratch every time. Is there a way I can create the model once, persist it and load it for consecutive predictions? Can we by any chance improve the speed of execution of this job?
Thanks in advance.
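On the persistence part of the question: MLlib's MatrixFactorizationModel can be saved once and reloaded later instead of being retrained on every run. A minimal sketch, assuming the model and JavaSparkContext from the code above (the path is made up):
// After training, persist the factorization model once
model.save(sc.sc(), "hdfs:///models/als-recommendation-model");

// In later runs, load it back instead of calling ALS.trainImplicit again
MatrixFactorizationModel persistedModel =
        MatrixFactorizationModel.load(sc.sc(), "hdfs:///models/als-recommendation-model");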

Stateful ParDo not working on Dataflow Runner

Based on the Javadocs and the blog post at https://beam.apache.org/blog/2017/02/13/stateful-processing.html, I tried a simple de-duplication example using the 2.0.0-beta-2 SDK, which reads a file from GCS (containing a list of JSONs, each with a user_id field) and then runs it through the pipeline explained below.
The input data contains about 146K events, of which only 50 are unique. The entire input is about 50 MB, which should be processable in considerably less time than the 2-minute fixed window. I just placed a window there to make sure the per-key-per-window semantics hold without using a GlobalWindow. I run the windowed data through 3 parallel stages to compare the results, each of which is explained below:
1. just copies the contents into a new file on GCS - this ensures all the events are processed as expected, and I verified the contents are exactly the same as the input
2. Combine.PerKey on the user_id, picking only the first element from the Iterable - this essentially deduplicates the data and it works as expected. The resulting file has the exact number of unique items from the original list of events - 50 elements
3. stateful ParDo which checks if the key has been seen already and emits an output only when it's not. Ideally, the result from this should match the deduplicated data from [2], but all I am seeing is only 3 unique events. These 3 unique events always point to the same 3 user_ids in the few runs I did.
Interestingly, when I just switch from the DataflowRunner to the DirectRunner and run this whole process locally, I see that the output from [3] matches [2], having only 50 unique elements as expected. So, I suspect there is an issue with the DataflowRunner for the stateful ParDo.
public class StatefulParDoSample {
private static Logger logger = LoggerFactory.getLogger(StatefulParDoSample.class.getName());
static class StatefulDoFn extends DoFn<KV<String, String>, String> {
final Aggregator<Long, Long> processedElements = createAggregator("processed", Sum.ofLongs());
final Aggregator<Long, Long> skippedElements = createAggregator("skipped", Sum.ofLongs());
@StateId("keyTracker")
private final StateSpec<Object, ValueState<Integer>> keyTrackerSpec =
StateSpecs.value(VarIntCoder.of());
@ProcessElement
public void processElement(
ProcessContext context,
#StateId("keyTracker") ValueState<Integer> keyTracker) {
processedElements.addValue(1l);
final String userId = context.element().getKey();
int wasSeen = firstNonNull(keyTracker.read(), 0);
if (wasSeen == 0) {
keyTracker.write( 1);
context.output(context.element().getValue());
} else {
keyTracker.write(wasSeen + 1);
skippedElements.addValue(1l);
}
}
}
public static void main(String[] args) {
DataflowPipelineOptions pipelineOptions = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
pipelineOptions.setRunner(DataflowRunner.class);
pipelineOptions.setProject("project-name");
pipelineOptions.setStagingLocation(GCS_STAGING_LOCATION);
pipelineOptions.setStreaming(false);
pipelineOptions.setAppName("deduper");
Pipeline p = Pipeline.create(pipelineOptions);
final ObjectMapper mapper = new ObjectMapper();
PCollection<KV<String, String>> keyedEvents =
p
.apply(TextIO.Read.from(GCS_SAMPLE_INPUT_FILE_PATH))
.apply(WithKeys.of(new SerializableFunction<String, String>() {
@Override
public String apply(String input) {
try {
Map<String, Object> eventJson =
mapper.readValue(input, Map.class);
return (String) eventJson.get("user_id");
} catch (Exception e) {
}
return "";
}
}))
.apply(
Window.into(
FixedWindows.of(Duration.standardMinutes(2))
)
);
keyedEvents
.apply(ParDo.of(new StatefulDoFn()))
.apply(TextIO.Write.to(GCS_SAMPLE_OUTPUT_FILE_PATH).withNumShards(1));
keyedEvents
.apply(Values.create())
.apply(TextIO.Write.to(GCS_SAMPLE_COPY_FILE_PATH).withNumShards(1));
keyedEvents
.apply(Combine.perKey(new SerializableFunction<Iterable<String>, String>() {
@Override
public String apply(Iterable<String> input) {
return !input.iterator().hasNext() ? "empty" : input.iterator().next();
}
}))
.apply(Values.create())
.apply(TextIO.Write.to(GCS_SAMPLE_COMBINE_FILE_PATH).withNumShards(1));
PipelineResult result = p.run();
result.waitUntilFinish();
}
}
This was a bug in the Dataflow service in batch mode, fixed in the upcoming 0.6.0 Beam release (or HEAD if you track the bleeding edge).
Thank you for bringing it to my attention! For reference, or if anything else comes up, this was tracked by BEAM-1611.

Facing Critical Performance issue in Primefaces 4 & 5

I am working on a project which deals with heavy data sets. I am using PrimeFaces 4 & 5, Spring and Hibernate. I have to display very large datasets, such as a minimum of 3000 rows with 100 columns, with various features such as sorting, filtering, row expansion etc. My problem is that my application takes 8 to 10 minutes to show the whole page, and the other functionality (sorting, filtering) also takes a lot of time. My client is not happy at all. I could use pagination for this, but my client does not want paging. So I decided to use livescroll, but unfortunately I failed to implement livescroll with or without lazy load, as there were bugs in PF regarding livescroll. I have also posted this question here earlier but no solution was found.
This performance issue is very critical and a show stopper for me. To show 3000 rows with 100 columns, the size of the page that gets loaded is ~10MB.
I have measured the time consumed by the various JSF lifecycle phases; using a PhaseListener I figured out that it is the browser that takes the time to parse the response rendered by JSF. Completing all the phases takes my application only 25 seconds.
At a minimum I want to improve the performance of my project. Please share any idea, suggestion or anything else which could help to overcome this problem.
Note: there are no database manipulations in the getters and setters, and no complex business logic.
UPDATE:
This is my datatable without lazyload:
<p:dataTable
style="width:100%"
id="cdTable"
selection="#{controller.selectedArray}"
resizableColumns="true"
draggableColumns="true"
var="cd"
value="#{controller.cdDataModel}"
editable="true"
editMode="cell"
selectionMode="multiple"
rowSelectMode="add"
scrollable="true"
scrollHeight="650"
rowKey="#{cd.id}"
rowIndexVar="rowIndex"
styleClass="screenScrollStyle"
liveScroll="true"
scrollRows="50"
filterEvent="enter"
widgetVar="dt4"
>
Here everything works except filtering. Once I filter, the first page is displayed but I am unable to sort or livescroll on the datatable. Note that I have tested this in PrimeFaces 5.
2nd Approach
With lazy load and the same datatable:
1) When I add rows="100", livescroll happens but there are problems with row editing and row expansion, while filtering & sorting work.
2) When I remove rows, livescroll works with row editing, row expansion etc., but filtering & sorting don't work.
My LazyDataModel is as follows:
public class MyDataModel extends LazyDataModel<YData>
{
@Override
public List<YData> load(int first, int pageSize,
List<SortMeta> multiSortMeta, Map<String, Object> filters) {
System.out.println("multisort wala load");
return super.load(first, pageSize, multiSortMeta, filters);
}
/**
*
*/
private static final long serialVersionUID = 1L;
private List<YData> datasource;
public MyDataModel() {
}
public MyDataModel(List<YData> datasource) {
this.datasource = datasource;
}
@Override
public YData getRowData(String rowKey) {
// In a real app, a more efficient way like a query by rowKey should be
// implemented to deal with huge data
// List<YData> yList = (List<YData>) getWrappedData();
for (YData y : datasource)
{
System.out.println("datasource :"+datasource.size());
if(y.getId()!=null)
{
if (y.getId()==(new Long(rowKey)))
{
return y;
}
}
}
return null;
}
@Override
public Object getRowKey(YData y) {
return y.getId();
}
@Override
public void setRowIndex(int rowIndex) {
/*
* The following is in ancestor (LazyDataModel):
* this.rowIndex = rowIndex == -1 ? rowIndex : (rowIndex % pageSize);
*/
if (rowIndex == -1 || getPageSize() == 0) {
super.setRowIndex(-1);
}
else
super.setRowIndex(rowIndex % getPageSize());
}
@Override
public List<YData> load(int first, int pageSize, String sortField, SortOrder sortOrder, Map<String,Object> filters) {
List<YData> data = new ArrayList<YData>();
System.out.println("sort order : "+sortOrder);
//filter
for(YData yInfo : datasource) {
boolean match = true;
for(Iterator<String> it = filters.keySet().iterator(); it.hasNext();) {
try {
String filterProperty = it.next();
String filterValue = String.valueOf(filters.get(filterProperty));
Field yField = yInfo.getClass().getDeclaredField(filterProperty);
yField.setAccessible(true);
String fieldValue = String.valueOf(yField.get(yInfo));
if(filterValue == null || fieldValue.startsWith(filterValue)) {
match = true;
}
else {
match = false;
break;
}
} catch(Exception e) {
e.printStackTrace();
match = false;
}
}
if(match) {
data.add(yInfo);
}
}
//sort
if(sortField != null) {
Collections.sort(data, new LazySorter(sortField, sortOrder));
}
int dataSize = data.size();
this.setRowCount(dataSize);
//paginate
if(dataSize > pageSize) {
try {
List<YData> subList = data.subList(first, first + pageSize);
return subList;
}
catch(IndexOutOfBoundsException e) {
return data.subList(first, first + (dataSize % pageSize));
}
}
else
return data;
}
@Override
public int getRowCount() {
// TODO Auto-generated method stub
return super.getRowCount();
}
}
I am fed up with these issues and they have become a show stopper for me. I even tried PrimeFaces 5.
If your data is loaded from a database, I suggest you write a better LazyDataModel, like:
public class ElementiLazyDataModel<T> extends LazyDataModel<T> implements Serializable {
private Service<T> abstractFacade;
public ElementiLazyDataModel(Service<T> abstractFacade) {
this.abstractFacade = abstractFacade;
}
public Service<T> getAbstractFacade() {
return abstractFacade;
}
public void setAbstractFacade(Service<T> abstractFacade) {
this.abstractFacade = abstractFacade;
}
@Override
public List<T> load(int first, int pageSize, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
PaginatedResult<T> pr = abstractFacade.findRange(new int[]{first, first + pageSize}, sortField, sortOrder, filters);
setRowCount(new Long(pr.getTotalItems()).intValue());
return pr.getItems();
}
}
The service is some kind of backend communication (like an EJB) injected into the ManagedBean that uses this model.
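For context, a hedged sketch of how such a model might be wired into a managed bean (the bean name, scope and the YData entity from the question are assumptions, not part of the answer above):
// assumes javax.faces.bean.ManagedBean / ViewScoped, javax.ejb.EJB and javax.annotation.PostConstruct
@ManagedBean
@ViewScoped
public class CdController implements Serializable {

    @EJB
    private Service<YData> service;            // the backend facade used by the lazy model

    private LazyDataModel<YData> cdDataModel;

    @PostConstruct
    public void init() {
        // build the lazy model once; PrimeFaces calls load() on demand while paging or scrolling
        cdDataModel = new ElementiLazyDataModel<>(service);
    }

    public LazyDataModel<YData> getCdDataModel() {
        return cdDataModel;
    }
}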
The service for pagination may be like this:
@Override
public PaginatedResult<T> findRange(int[] range, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
final Query query = getEntityManager().createQuery("select x from " + entityClass.getSimpleName() + " x")
.setFirstResult(range[0]).setMaxResults(range[1] - range[0] + 1);
// Add filter sort etc.
final Query queryCount = getEntityManager().createQuery("select count(x) from " + entityClass.getSimpleName() + " x");
// Add filter sort etc.
Long rowCount = (Long) queryCount.getSingleResult();
List<T> resultList = query.getResultList();
return new PaginatedResult<T>(resultList, rowCount);
}
Note that you have to run a paginated query (with JPA like this, the ORM builds the paginated query for you, but if you don't use an ORM you have to write the paginated query yourself; for Oracle look at TOP-N queries, for example: http://oracle-base.com/articles/misc/top-n-queries.php).
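As an illustration of the TOP-N approach mentioned above, a ROWNUM-based paginated query for older Oracle versions might look like this (table and column names are placeholders):
// Classic ROWNUM pagination: the innermost query sorts, the middle query caps the row number,
// the outer query slices the rows in the range (firstRow, lastRow]
String sql =
    "select * from ( " +
    "  select t.*, rownum rn from ( " +
    "    select * from my_table order by my_sort_column " +
    "  ) t where rownum <= :lastRow " +
    ") where rn > :firstRow";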
Remember that your return object must also contain the total number of records, obtained with a fast count query:
public class PaginatedResult<T> implements Serializable {
private List<T> items;
private long totalItems;
public PaginatedResult() {
}
public PaginatedResult(List<T> items, long totalItems) {
this.items = items;
this.totalItems = totalItems;
}
public List<T> getItems() {
return items;
}
public void setItems(List<T> items) {
this.items = items;
}
public long getTotalItems() {
return totalItems;
}
public void setTotalItems(long totalItems) {
this.totalItems = totalItems;
}
}
All this is useful only if your database table is set up correctly; pay attention to the execution plan of the resulting queries and add the right indexes.
I hope this gives some hints to improve your performance.
In the end, remind your end user that the human eye can't take in more than 10-20 records at once, so it is quite useless to have thousands of records on one page.
You have used the default load implementation that is used in the PrimeFaces showcases. This is not the correct implementation for your case, where you load your data from a database.
The load method should build the correct query, taking into account the following (a combined sketch follows this list):
1) the filter fields that are used, for example:
String query = "select e from Entity e where lower(e.f1) like lower('" + filters.get(key) + "%') and ..."; // and so on for the other filter fields
2) the sorting columns that are used, for example:
query.append(" order by ").append(sortField).append(" ").append(sortOrder == SortOrder.ASCENDING ? "asc" : "desc"); // and so on for the other sort columns
3) the total count of your query, with the filters from 1) attached to it. Example:
Long totalCount = (Long) entityManager.createQuery("select count(*) from Entity e where lower(e.f1) like lower('filterKey1%') and lower(e.f2) like lower('filterKey2%') ...").getSingleResult();
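Putting the three points together, a minimal sketch of such a load method, assuming a JPA EntityManager and a placeholder Entity class (names are illustrative, not from the question), with filter values bound as parameters instead of concatenated:
@Override
public List<Entity> load(int first, int pageSize, String sortField, SortOrder sortOrder,
                         Map<String, Object> filters) {
    StringBuilder where = new StringBuilder();
    Map<String, Object> params = new HashMap<>();
    int i = 0;
    for (Map.Entry<String, Object> f : filters.entrySet()) {
        where.append(where.length() == 0 ? " where " : " and ")
             .append("lower(e.").append(f.getKey()).append(") like :p").append(i);
        params.put("p" + i, f.getValue().toString().toLowerCase() + "%");
        i++;
    }
    String order = (sortField == null) ? ""
            : " order by e." + sortField + (sortOrder == SortOrder.ASCENDING ? " asc" : " desc");

    // 3) total count with the same filters attached, used by the paginator
    TypedQuery<Long> countQuery = entityManager.createQuery(
            "select count(e) from Entity e" + where, Long.class);
    params.forEach(countQuery::setParameter);
    setRowCount(countQuery.getSingleResult().intValue());

    // 1) + 2) filtered, sorted and paginated data query
    TypedQuery<Entity> dataQuery = entityManager.createQuery(
            "select e from Entity e" + where + order, Entity.class);
    params.forEach(dataQuery::setParameter);
    return dataQuery.setFirstResult(first).setMaxResults(pageSize).getResultList();
}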
