Aggregation Support for Spring Data Elastic Search - spring-data-elasticsearch

Elasticsearch has deprecated facets and recommends using aggregations instead (http://www.elastic.co/guide/en/elasticsearch/reference/1.x/search-aggregations.html).
Does Spring Data Elasticsearch currently support this?
If yes, are there any samples available?

Yes, aggregation is supported. Example:
@Test
public void shouldReturnAggregatedResponseForGivenSearchQuery() {
    // given
    IndexQuery article1 = new ArticleEntityBuilder("1").title("article four").subject("computing").addAuthor(RIZWAN_IDREES).addAuthor(ARTUR_KONCZAK).addAuthor(MOHSIN_HUSEN).addAuthor(JONATHAN_YAN).score(10).buildIndex();
    IndexQuery article2 = new ArticleEntityBuilder("2").title("article three").subject("computing").addAuthor(RIZWAN_IDREES).addAuthor(ARTUR_KONCZAK).addAuthor(MOHSIN_HUSEN).addPublishedYear(YEAR_2000).score(20).buildIndex();
    IndexQuery article3 = new ArticleEntityBuilder("3").title("article two").subject("computing").addAuthor(RIZWAN_IDREES).addAuthor(ARTUR_KONCZAK).addPublishedYear(YEAR_2001).addPublishedYear(YEAR_2000).score(30).buildIndex();
    IndexQuery article4 = new ArticleEntityBuilder("4").title("article one").subject("accounting").addAuthor(RIZWAN_IDREES).addPublishedYear(YEAR_2002).addPublishedYear(YEAR_2001).addPublishedYear(YEAR_2000).score(40).buildIndex();
    elasticsearchTemplate.index(article1);
    elasticsearchTemplate.index(article2);
    elasticsearchTemplate.index(article3);
    elasticsearchTemplate.index(article4);
    elasticsearchTemplate.refresh(ArticleEntity.class, true);
    SearchQuery searchQuery = new NativeSearchQueryBuilder()
            .withQuery(matchAllQuery())
            .withSearchType(COUNT)
            .withIndices("articles").withTypes("article")
            .addAggregation(terms("subjects").field("subject"))
            .build();
    // when
    Aggregations aggregations = elasticsearchTemplate.query(searchQuery, new ResultsExtractor<Aggregations>() {
        @Override
        public Aggregations extract(SearchResponse response) {
            return response.getAggregations();
        }
    });
    // then
    assertThat(aggregations, is(notNullValue()));
    assertThat(aggregations.asMap().get("subjects"), is(notNullValue()));
}
The code is copied from ElasticsearchTemplateAggregationTests.java.

To prevent a second call to Elasticsearch, you can extract the search results directly:
elasticsearchTemplate.query(query.build(), new ResultsExtractor<Object>() {
    @Override
    public Object extract(SearchResponse searchResponse) {
        Aggregations aggregations = searchResponse.getAggregations();
        List<AnyClass> ta = new DefaultResultMapper().mapResults(searchResponse, AnyClass.class, new PageRequest(page != null ? page : 0, 15)).getContent();
        return ta;
    }
});
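For completeness, here is a minimal sketch of how the returned Aggregations object can be unpacked with the Elasticsearch 1.x Java API; it assumes the "subjects" terms aggregation built above and an import of org.elasticsearch.search.aggregations.bucket.terms.Terms:

// A minimal sketch, assuming the "subjects" terms aggregation from the query above.
Terms subjects = aggregations.get("subjects");
for (Terms.Bucket bucket : subjects.getBuckets()) {
    String subject = bucket.getKey();      // e.g. "computing" or "accounting"
    long docCount = bucket.getDocCount();  // number of articles with that subject
}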

Related

How do I write a custom sorter to sort my springdoc swagger tags by name in the UI?

I am using springdoc-openapi with the latest version (1.3.0). Now I would like to sort my tags in the UI by the "name" property.
I know about the "springdoc.swagger-ui.tagsSorter" configuration and that I can use a custom sorter function, but I cannot find examples of what that function should look like.
I tried the following, which does not seem to work:
springdoc.swagger-ui.tagsSorter=(a, b) => a.get("name").localeCompare(b.get("name"))
By default, you can sort tags alphabetically:
https://springdoc.org/faq.html#how-can-i-sort-endpoints-alphabetically
You can take control of the tag order by using an OpenApiCustomiser and defining your own Comparator:
@Bean
public OpenApiCustomiser sortTagsAlphabetically() {
    return openApi -> openApi.setTags(openApi.getTags()
            .stream()
            .sorted(Comparator.comparing(tag -> StringUtils.stripAccents(tag.getName())))
            .collect(Collectors.toList()));
}
With reference to @brianbro's answer, as suggested at https://springdoc.org/faq.html#how-can-i-sort-endpoints-alphabetically
I added
@Tag(name = "1. Admin endpoints")
@Tag(name = "2. Everyone's endpoints!")
and the property below to application.yml:
springdoc.swagger-ui.tagsSorter=alpha
and I can see them sorted according to the numbering in my Swagger UI.
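For illustration, a minimal sketch of where such annotations might sit; the controller and endpoint names are made up, and @Tag comes from io.swagger.v3.oas.annotations.tags:

// Hypothetical controller; only the @Tag placement matters here.
@Tag(name = "1. Admin endpoints")
@RestController
@RequestMapping("/admin")
public class AdminController {

    @GetMapping("/users")
    public List<String> listUsers() {
        return List.of("alice", "bob");
    }
}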
For sorting schemas, paths and tags in OpenAPI:
@Bean
public OpenApiCustomiser openApiCustomiser() {
    return openApi -> {
        Map<String, Schema> schemas = openApi.getComponents().getSchemas();
        openApi.getComponents().setSchemas(new TreeMap<>(schemas));
    };
}

@Bean
public OpenApiCustomiser sortPathsAndTagsAlphabetically() {
    return openApi -> {
        Map<String, PathItem> paths = openApi.getPaths();
        Paths sortedPaths = new Paths();
        TreeMap<String, PathItem> sortedTree = new TreeMap<String, PathItem>(paths);
        Set<Map.Entry<String, PathItem>> pathItems = sortedTree.entrySet();
        Map<String, Map.Entry<String, PathItem>> distinctTagMap = new TreeMap<String, Map.Entry<String, PathItem>>();
        for (Map.Entry<String, PathItem> entry : pathItems) {
            PathItem pathItem = entry.getValue();
            Operation getOp = pathItem.getGet();
            if (getOp != null) {
                String tag = getOp.getTags().get(0);
                if (!distinctTagMap.containsKey(tag)) {
                    distinctTagMap.put(tag, entry);
                }
            }
            Operation postOp = pathItem.getPost();
            if (postOp != null) {
                String tag1 = postOp.getTags().get(0);
                if (!distinctTagMap.containsKey(tag1)) {
                    distinctTagMap.put(tag1, entry);
                }
            }
            Operation putOp = pathItem.getPut();
            if (putOp != null) {
                String tag2 = putOp.getTags().get(0);
                if (!distinctTagMap.containsKey(tag2)) {
                    distinctTagMap.put(tag2, entry);
                }
            }
        }
        LinkedHashMap<String, PathItem> customOrderMap = new LinkedHashMap<String, PathItem>();
        for (Map.Entry<String, PathItem> entry : distinctTagMap.values()) {
            customOrderMap.put(entry.getKey(), entry.getValue());
        }
        for (Map.Entry<String, PathItem> entry : sortedTree.entrySet()) {
            customOrderMap.putIfAbsent(entry.getKey(), entry.getValue());
        }
        sortedPaths.putAll(customOrderMap);
        openApi.setPaths(sortedPaths);
    };
}

ASP.NET Core MVC 2.2 Batch Requests Middleware

I would like a simple middleware that I can use to combine multiple requests into one request and return the result as a single array response.
I don't want to use OData because it's too heavy, plus I don't like it.
I have no idea how I can split one HttpContext into multiple small internal sub-HttpContexts.
This is my attempt:
public static IApplicationBuilder BatchRequest(this IApplicationBuilder app)
{
    app.Map("/api/batch", builder =>
    {
        builder.Use(async (context, next) =>
        {
            string[] paths = (context.Request.Query.Get("path") as StringValues?) ?? new string[] { };
            Stream originalBody = context.Response.Body;
            RecyclableMemoryStreamManager _recyclableMemoryStreamManager = new RecyclableMemoryStreamManager();
            IEnumerable<string> responses = await paths.SelectAsync(async path =>
            {
                context.Request.Path = path;
                MemoryStream newResponseBody = _recyclableMemoryStreamManager.GetStream();
                context.Response.Body = newResponseBody;
                await next.Invoke();
                return RequestResponseLoggingMiddleware.ReadStreamInChunks(newResponseBody);
            });
            await context.Response.WriteAsync(responses.Serialize());
        });
    });
    return app;
}
There is this example, but I am not yet sure how to use it: https://github.com/Tornhoof/HttpBatchHandler
Please be kind.

How to stop the current job and start a new job using Quartz.NET in ASP.NET MVC?

I've made a crawler that reads a website and fetches some values from it, using Quartz.NET and ASP.NET MVC. My problem: say a user starts scraping "stackoverflow.com", and after about 5 hours decides to stop it and start scraping a new website instead. How can I do that?
[HttpPost]
public ActionResult Index(string keyword, string url)
{
    IScheduler scheduler = StdSchedulerFactory.GetDefaultScheduler();
    scheduler.Start();
    IJobDetail job = JobBuilder.Create<ScrapJob>()
        .WithIdentity("MyScrapJob")
        .UsingJobData("url", url)
        .UsingJobData("keyword", keyword)
        .Build();
    ITrigger trigger = TriggerBuilder.Create().WithDailyTimeIntervalSchedule(
        s => s.WithIntervalInSeconds(20).OnEveryDay().StartingDailyAt(TimeOfDay.HourAndMinuteOfDay(0, 0))
    ).Build();
    scheduler.ScheduleJob(job, trigger);
    return View(db.Scraps.ToList());
}

public List<ScrapJob> Scraping(string url, string keyword)
{
    int count = 0;
    List<ScrapJob> scraps = new List<ScrapJob>();
    ScrapJob scrap = null;
    HtmlDocument doc = new HtmlDocument();
    try
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        using (var response = (HttpWebResponse)request.GetResponse())
        {
            using (var stream = response.GetResponseStream())
            {
                doc.Load(stream, Encoding.GetEncoding("UTF-8"));
                foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//text()"))
                {
                    if (node.InnerText.ToString().Contains(keyword))
                    {
                        count++;
                        scrap = new ScrapJob { Keyword = keyword, DateTime = System.DateTime.Now.ToString(), Count = count, Url = url };
                    }
                }
            }
        }
    }
    catch (WebException ex)
    {
        Console.WriteLine(ex.Message);
    }
    // scraps.Add(scrap);
    var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
    if (isExist == 0)
    {
        db.Scraps.Add(scrap);
        db.SaveChanges();
    }
    return scraps;
}

public void Execute(IJobExecutionContext context)
{
    //ScrapJob scraps = null;
    using (var scrap = new ScrapJob())
    {
        JobKey key = context.JobDetail.Key;
        JobDataMap dataMap = context.JobDetail.JobDataMap;
        string url = dataMap.GetString("url");
        string keyword = dataMap.GetString("keyword");
        scrap.Scraping(url, keyword);
    }
}
I'm not sure why you picked Quartz, but here is something that I think will help you.
This is a code sample that interrupts and deletes a job by its unique identifier:
public void DeleteJob(JobKey jobKey)
{
    var scheduler = StdSchedulerFactory.GetDefaultScheduler();
    var executingJobs = scheduler.GetCurrentlyExecutingJobs();
    if (executingJobs.Any(x => x.JobDetail.Key.Equals(jobKey)))
    {
        scheduler.Interrupt(jobKey);
    }
    scheduler.DeleteJob(jobKey);
}
But I believe you need to define what behavior you expect, because it can be a bit more complex. For example:
Whether you want to just pause the job and resume it after finishing with the other website (persisting some state and progress), or simply log the progress.
Whether you want them to run in parallel and process multiple sites simultaneously. (You just need to give different names instead of the hardcoded .WithIdentity("MyScrapJob").)
Also, with scheduler.GetCurrentlyExecutingJobs() you can get the currently executing jobs, show them to the user, and let him decide what to do.
Looking at your action method, I'm not sure whether this is the behavior you expect of that trigger. What also bothers me is db.Scraps.ToList(): it materializes the whole table. Consider adding pagination as well; in your case it is not strictly necessary because you only show a count, but it is mandatory if you have a lot of records in the grid.
About the scraping method
Instead of
var isExist = db.Scraps.Where(s => s.Keyword == keyword && s.Count == scrap.Count).Max(s => (int?)s.Id) ?? 0;
you can use .Any
var exists = db.Scraps.Any(s => s.Keyword == keyword && s.Count == scrap.Count);
This returns a boolean, so you can simply check if (!exists).
You can check out https://github.com/AngleSharp/AngleSharp; it's a high-performance web parsing library and super easy to use as well.
I see the possibility of duplicated records per keyword if you check them by keyword and count - not sure whether you want this, or whether you just want to update the existing record with its counter.
Good luck! I hope this answer helps you :)

Facing Critical Performance issue in Primefaces 4 & 5

I am working on a project which deals with heavy data sets, using Primefaces 4 & 5, Spring and Hibernate. I have to display very large datasets, e.g. at least 3000 rows with 100 columns, with features such as sorting, filtering, row expansion, etc. My problem is that my application takes 8 to 10 minutes to show the whole page, and other functionality (sorting, filtering) also takes a lot of time. My client is not happy at all. I could use pagination, but my client does not want paging. So I decided to use livescroll, but unfortunately I failed to implement livescroll with or without lazy load, as there were bugs in PF regarding livescroll. I have also posted this question here earlier, but no solution was found.
This performance issue is critical and a show stopper for me. To show 3000 rows with 100 columns, the page being loaded is ~10 MB in size.
I have measured the time consumed by the various JSF lifecycle phases using a PhaseListener, and figured out that it is the browser that takes the time, parsing the response rendered by JSF. Completing all phases takes my application only 25 seconds.
At a minimum I want to improve the performance of my project. Please share any idea or suggestion that could help overcome this problem.
Note: There are no database manipulations in the getters and setters, and no complex business logic.
UPDATE :
This is my datatable without lazyload:
<p:dataTable
style="width:100%"
id="cdTable"
selection="#{controller.selectedArray}"
resizableColumns="true"
draggableColumns="true"
var="cd"
value="#{controller.cdDataModel}"
editable="true"
editMode="cell"
selectionMode="multiple"
rowSelectMode="add"
scrollable="true"
scrollHeight="650"
rowKey="#{cd.id}"
rowIndexVar="rowIndex"
styleClass="screenScrollStyle"
liveScroll="true"
scrollRows="50"
filterEvent="enter"
widgetVar="dt4"
>
Here everything works except filtering. Once I filter, the first page is displayed, but I am then unable to sort or livescroll the datatable. Note that I tested this in Primefaces 5.
2nd approach
With lazyload, using the same datatable:
1) When I add rows="100", livescroll works but there are problems with row editing and row expansion; filtering & sorting work.
2) When I remove rows, livescroll works along with row editing, row expansion, etc., but filtering & sorting don't work.
My LazyDataModel is as follows:
public class MyDataModel extends LazyDataModel<YData>
{
    @Override
    public List<YData> load(int first, int pageSize,
            List<SortMeta> multiSortMeta, Map<String, Object> filters) {
        System.out.println("multisort wala load");
        return super.load(first, pageSize, multiSortMeta, filters);
    }

    private static final long serialVersionUID = 1L;

    private List<YData> datasource;

    public MyDataModel() {
    }

    public MyDataModel(List<YData> datasource) {
        this.datasource = datasource;
    }

    @Override
    public YData getRowData(String rowKey) {
        // In a real app, a more efficient way like a query by rowKey should be
        // implemented to deal with huge data
        // List<YData> yList = (List<YData>) getWrappedData();
        for (YData y : datasource)
        {
            System.out.println("datasource :" + datasource.size());
            if (y.getId() != null)
            {
                if (y.getId().equals(Long.valueOf(rowKey)))
                {
                    return y;
                }
            }
        }
        return null;
    }

    @Override
    public Object getRowKey(YData y) {
        return y.getId();
    }

    @Override
    public void setRowIndex(int rowIndex) {
        /*
         * The following is in the ancestor (LazyDataModel):
         * this.rowIndex = rowIndex == -1 ? rowIndex : (rowIndex % pageSize);
         */
        if (rowIndex == -1 || getPageSize() == 0) {
            super.setRowIndex(-1);
        }
        else {
            super.setRowIndex(rowIndex % getPageSize());
        }
    }

    @Override
    public List<YData> load(int first, int pageSize, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
        List<YData> data = new ArrayList<YData>();
        System.out.println("sort order : " + sortOrder);
        //filter
        for (YData yInfo : datasource) {
            boolean match = true;
            for (Iterator<String> it = filters.keySet().iterator(); it.hasNext();) {
                try {
                    String filterProperty = it.next();
                    String filterValue = String.valueOf(filters.get(filterProperty));
                    Field yField = yInfo.getClass().getDeclaredField(filterProperty);
                    yField.setAccessible(true);
                    String fieldValue = String.valueOf(yField.get(yInfo));
                    if (filterValue == null || fieldValue.startsWith(filterValue)) {
                        match = true;
                    }
                    else {
                        match = false;
                        break;
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                    match = false;
                }
            }
            if (match) {
                data.add(yInfo);
            }
        }
        //sort
        if (sortField != null) {
            Collections.sort(data, new LazySorter(sortField, sortOrder));
        }
        int dataSize = data.size();
        this.setRowCount(dataSize);
        //paginate
        if (dataSize > pageSize) {
            try {
                List<YData> subList = data.subList(first, first + pageSize);
                return subList;
            }
            catch (IndexOutOfBoundsException e) {
                return data.subList(first, first + (dataSize % pageSize));
            }
        }
        else {
            return data;
        }
    }

    @Override
    public int getRowCount() {
        return super.getRowCount();
    }
}
I am fed up with these issues; they have become a show stopper for me, even after trying Primefaces 5.
If your data is loaded from a database, I suggest you write a better LazyDataModel like:
public class ElementiLazyDataModel<T> extends LazyDataModel<T> implements Serializable {

    private Service<T> abstractFacade;

    public ElementiLazyDataModel(Service<T> abstractFacade) {
        this.abstractFacade = abstractFacade;
    }

    public Service<T> getAbstractFacade() {
        return abstractFacade;
    }

    public void setAbstractFacade(Service<T> abstractFacade) {
        this.abstractFacade = abstractFacade;
    }

    @Override
    public List<T> load(int first, int pageSize, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
        PaginatedResult<T> pr = abstractFacade.findRange(new int[]{first, first + pageSize}, sortField, sortOrder, filters);
        setRowCount(new Long(pr.getTotalItems()).intValue());
        return pr.getItems();
    }
}
The service is some kind of backend communication (like an EJB) injected into the managed bean that uses this model.
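For illustration, a minimal sketch of that wiring in a managed bean; the bean and entity names are made up, only the pattern matters:

// Hypothetical managed bean wiring for the lazy model above.
@ManagedBean
@ViewScoped
public class ElementiBean implements Serializable {

    @EJB
    private Service<Elemento> service; // the backend facade described above

    private LazyDataModel<Elemento> model;

    @PostConstruct
    public void init() {
        model = new ElementiLazyDataModel<>(service);
    }

    public LazyDataModel<Elemento> getModel() {
        return model;
    }
}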
The service for pagination may be like this:
@Override
public PaginatedResult<T> findRange(int[] range, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
    final Query query = getEntityManager().createQuery("select x from " + entityClass.getSimpleName() + " x")
            .setFirstResult(range[0]).setMaxResults(range[1] - range[0] + 1);
    // Add filter, sort, etc.
    final Query queryCount = getEntityManager().createQuery("select count(x) from " + entityClass.getSimpleName() + " x");
    // Add filter, sort, etc.
    Long rowCount = (Long) queryCount.getSingleResult();
    List<T> resultList = query.getResultList();
    return new PaginatedResult<T>(resultList, rowCount);
}
Note that you have to run a paginated query (with JPA like this, the ORM builds the query for you; if you don't use an ORM you have to write the paginated query yourself - for Oracle, look at TOP-N queries, for example: http://oracle-base.com/articles/misc/top-n-queries.php).
Remember that your return object must also contain the total record count, obtained via a fast count query:
public class PaginatedResult<T> implements Serializable {

    private List<T> items;
    private long totalItems;

    public PaginatedResult() {
    }

    public PaginatedResult(List<T> items, long totalItems) {
        this.items = items;
        this.totalItems = totalItems;
    }

    public List<T> getItems() {
        return items;
    }

    public void setItems(List<T> items) {
        this.items = items;
    }

    public long getTotalItems() {
        return totalItems;
    }

    public void setTotalItems(long totalItems) {
        this.totalItems = totalItems;
    }
}
All this is useful only if your database tables are set up correctly; pay attention to the execution plans of the resulting queries and add the right indexes.
I hope this gives some hints to improve your performance.
Finally, remind your end user that the human eye can't take in more than 10-20 records at once, so having thousands of records on one page is fairly useless.
You have used the default load implementation, which is used in the Primefaces showcases. This is not the correct implementation for your case, where you load your data from a database.
The load method should build the correct query with consideration of the following (a combined sketch follows the list):
1) the filter fields that are used, for example:
String query = "select e from Entity e where lower(e.f1) like lower('" + filters.get(key) + "%') and ...";  // etc. for the other fields
2) the sorting columns that are used, for example:
query.append(" order by ").append(sortField).append(sortOrder == SortOrder.ASCENDING ? " asc" : " desc");  // etc. for the other columns
3) the total count of your query WITH 1) attached to it, for example:
Long totalCount = (Long) entityManager.createQuery("select count(e) from Entity e where lower(e.f1) like lower('filterKey1%') and lower(e.f2) like lower('filterKey2%') ...").getSingleResult();
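Putting 1)-3) together, here is a minimal sketch of such a load method under the same assumptions (a JPA entityManager and an entity named Entity with filterable fields; all names are illustrative). It binds filter values as parameters instead of concatenating them, which also avoids injection issues:

@Override
public List<Entity> load(int first, int pageSize, String sortField, SortOrder sortOrder, Map<String, Object> filters) {
    // 1) build the where clause from the active filters
    StringBuilder where = new StringBuilder();
    for (String key : filters.keySet()) {
        where.append(where.length() == 0 ? " where " : " and ")
             .append("lower(e.").append(key).append(") like lower(:").append(key).append(")");
    }
    // 2) append the sort column and direction, if any
    String order = sortField != null
            ? " order by e." + sortField + (sortOrder == SortOrder.ASCENDING ? " asc" : " desc")
            : "";
    TypedQuery<Entity> dataQuery = entityManager.createQuery("select e from Entity e" + where + order, Entity.class);
    // 3) the count query reuses the same where clause
    TypedQuery<Long> countQuery = entityManager.createQuery("select count(e) from Entity e" + where, Long.class);
    for (Map.Entry<String, Object> f : filters.entrySet()) {
        dataQuery.setParameter(f.getKey(), f.getValue() + "%");
        countQuery.setParameter(f.getKey(), f.getValue() + "%");
    }
    setRowCount(countQuery.getSingleResult().intValue());
    return dataQuery.setFirstResult(first).setMaxResults(pageSize).getResultList();
}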

Batch cypher queries generated by RestCypherQueryEngine

I am trying to batch together a few Cypher queries with the REST API (using the Java bindings library) so that only one call is made over the wire. But it seems not to respect the batching on the client side, and gives this error:
java.lang.RuntimeException: Error reading as JSON ''
at org.neo4j.rest.graphdb.util.JsonHelper.readJson(JsonHelper.java:57)
at org.neo4j.rest.graphdb.util.JsonHelper.jsonToSingleValue(JsonHelper.java:62)
at org.neo4j.rest.graphdb.RequestResult.toEntity(RequestResult.java:114)
at org.neo4j.rest.graphdb.RequestResult.toMap(RequestResult.java:123)
at org.neo4j.rest.graphdb.batch.RecordingRestRequest.toMap(RecordingRestRequest.java:138)
at org.neo4j.rest.graphdb.ExecutingRestAPI.query(ExecutingRestAPI.java:489)
at org.neo4j.rest.graphdb.ExecutingRestAPI.query(ExecutingRestAPI.java:509)
at org.neo4j.rest.graphdb.RestAPIFacade.query(RestAPIFacade.java:233)
at org.neo4j.rest.graphdb.query.RestCypherQueryEngine.query(RestCypherQueryEngine.java:50)
...
Caused by: java.io.EOFException: No content to map to Object due to end of input
at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2766)
at org.codehaus.jackson.map.ObjectMapper._readMapAndClose(ObjectMapper.java:2709)
at org.codehaus.jackson.map.ObjectMapper.readValue(ObjectMapper.java:1854)
at org.neo4j.rest.graphdb.util.JsonHelper.readJson(JsonHelper.java:55)
... 41 more
This is how I am trying to batch them:
graphDatabaseService.getRestAPI().executeBatch(new BatchCallback<Void>() {
    @Override
    public Void recordBatch(RestAPI batchRestApi) {
        String query = "CREATE accounts=({userId:{userId}})-[r:OWNS]->({facebookId:{facebookId}})";
        graphDatabaseService.getQueryEngine().query(query, map("userId", 1, "facebookId", "1"));
        graphDatabaseService.getQueryEngine().query(query, map("userId", 2, "facebookId", "2"));
        graphDatabaseService.getQueryEngine().query(query, map("userId", 3, "facebookId", "3"));
        return null;
    }
});
I am using Neo4j version 1.9 and the corresponding client library. Should this be possible?
Here is JUnit sample code that works for your batch. No Cypher string template is used here; instead it calls native methods on the RestAPI object:
public static final DynamicRelationshipType OWNS = DynamicRelationshipType.withName("OWNS");

@Autowired
private SpringRestGraphDatabase graphDatabaseService;

@Test
public void batchTest()
{
    Assert.assertNotNull(this.graphDatabaseService);
    this.graphDatabaseService.getRestAPI().executeBatch(new BatchCallback<Void>()
    {
        @Override
        public Void recordBatch(RestAPI batchRestApi)
        {
            for (int counter = 1; counter <= 3; counter++)
            {
                RestNode userId = batchRestApi.createNode(map("userId", Integer.valueOf(counter)));
                RestNode facebookId = batchRestApi.createNode(map("facebookId", Integer.valueOf(counter).toString()));
                batchRestApi.createRelationship(userId, facebookId, OWNS, map());
            }
            return null;
        }
    });
}
