Spring data elasticsearch - Aggregations in new version

We had been using spring-data-elasticsearch 4.1.13 until recently for querying Elasticsearch. For grouping we used aggregations.
Consider an index of books. For each book there can be one or multiple authors.
To get the count of books by author we used a TermsAggregationBuilder, as shown below:
SearchSourceBuilder builder = this.getQuery(filter, false);
String aggregationName = "group_by_author_id";
TermsAggregationBuilder aggregationBuilders =
    AggregationBuilders.terms(aggregationName).field("authors");
var query =
    new NativeSearchQueryBuilder()
        .withQuery(builder.query())
        .addAggregation(aggregationBuilders)
        .build();
var result = elasticsearchOperations.search(query, EsBook.class, ALIAS_COORDS);
if (!result.hasAggregations()) {
    throw new IllegalStateException("No aggregations found after query with aggregations!");
}
Terms groupById = result.getAggregations().get(aggregationName);
var buckets = groupById.getBuckets();
Map<Long, Integer> booksCount = new HashMap<>();
buckets.forEach(
    bucket ->
        booksCount.put(
            bucket.getKeyAsNumber().longValue(), Math.toIntExact(bucket.getDocCount())));
return booksCount;
We recently upgraded to spring-data-elasticsearch 4.4.2 and found some breaking changes.
First, .addAggregation was replaced by withAggregations.
Second, unlike before, I can't seem to directly get the Terms and buckets after querying: result.getAggregations().get(aggregationName) is no longer possible, and the only other option I see is result.getAggregations().aggregations(). So I am wondering if anyone has done the same migration. The Elasticsearch documentation itself is quite poor on this.

"First, .addAggregation was replaced by withAggregations."
addAggregation(AbstractAggregationBuilder<?>) has been deprecated and should be replaced by withAggregations. This is not a breaking change.
The change in the return value of SearchHits.getAggregations() is documented in the migration guide from 4.2 to 4.3, under "Removal of org.elasticsearch classes from the API".
So from 4.3 on, result.getAggregations().aggregations() returns what result.getAggregations() returned previously.
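For illustration, the code from the question could then be migrated roughly like this (a minimal sketch, assuming 4.4 with the RestHighLevelClient-based implementation, where the container returned by getAggregations() is an ElasticsearchAggregations; not tested against your mapping):
NativeSearchQuery query =
    new NativeSearchQueryBuilder()
        .withQuery(builder.query())
        .withAggregations(aggregationBuilders) // replaces the deprecated addAggregation(...)
        .build();
SearchHits<EsBook> result = elasticsearchOperations.search(query, EsBook.class, ALIAS_COORDS);

// Unwrap the AggregationsContainer to get the org.elasticsearch Aggregations
// that getAggregations() used to return directly before 4.3
Aggregations aggregations =
    ((ElasticsearchAggregations) result.getAggregations()).aggregations();
Terms groupById = aggregations.get(aggregationName);
Map<Long, Integer> booksCount = new HashMap<>();
groupById.getBuckets().forEach(
    bucket ->
        booksCount.put(
            bucket.getKeyAsNumber().longValue(), Math.toIntExact(bucket.getDocCount())));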

Related

Cannot do sorting when using the startScroll method with spring-data-elasticsearch

I'm using spring-data-elasticsearch to develop an API with ES as the backend, and I'm using the startScroll(long scrollTimeInMillis, SearchQuery searchQuery, Class<T> clazz) method to get results from Elasticsearch, but sorting is not working.
I set the sorting in the searchQuery as follows:
NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder().withIndices(<indices>).withTypes(<types>).withSort(<sort>)
and I passed the following for <sort>:
new FieldSortBuilder("created_at").unmappedType("date").order(SortOrder.valueOf("ASC"))
I also tried putting the sort in a pageable, like below:
NativeSearchQueryBuilder nativeSearchQueryBuilder = new NativeSearchQueryBuilder().withIndices(<indices>).withTypes(<types>).withPageable(<pageable>)
and I passed the following for <pageable>:
Sort sortRequest = Sort.by(Sort.Direction.valueOf("ASC"), "created_at");
PageRequest.of(<pageNumber>, <pageSize>, sortRequest)
Neither is working.
I'm starting to think that maybe scroll does not support sorting.
The expected result is for the hits to come back in order of created_at ASC, but right now they are retrieved in effectively random order.
I encountered the same problem using 3.2, and I noticed that 4.0 has resolved it.
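For anyone on 4.x: sorted scrolling can also be expressed through searchForStream, which drives the scroll API internally. A minimal sketch (the Book entity and the page size of 500 are assumptions for illustration):
NativeSearchQuery query = new NativeSearchQueryBuilder()
    .withQuery(QueryBuilders.matchAllQuery())
    .withSort(new FieldSortBuilder("created_at").unmappedType("date").order(SortOrder.ASC))
    .withPageable(PageRequest.of(0, 500)) // page size used for each scroll batch
    .build();

// SearchHitsIterator keeps a scroll context open until it is closed
try (SearchHitsIterator<Book> stream = elasticsearchOperations.searchForStream(query, Book.class)) {
    stream.forEachRemaining(hit -> System.out.println(hit.getContent()));
}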

AWS DynamoDB session table keeps growing, can't delete expired sessions

The ASP.NET_SessionState table grows all the time, already at 18 GB, with no sign of expired sessions ever being deleted.
We have tried to execute DynamoDBSessionStateStore.DeleteExpiredSessions, but it seems to have no effect.
Our system is running fine, sessions are created, and end users are not aware of the issue. However, it doesn't make sense that the table keeps growing all the time...
We have triple-checked permissions/security and everything seems to be in order. We use SDK version 3.1.0. What else remains to be checked?
Your table being over 18 GB is quite large (in this context), so after looking at the code for the DeleteExpiredSessions method on GitHub, it does not surprise me that this isn't working.
Here is the code:
public static void DeleteExpiredSessions(IAmazonDynamoDB dbClient, string tableName)
{
    LogInfo("DeleteExpiredSessions");
    Table table = Table.LoadTable(dbClient, tableName, DynamoDBEntryConversion.V1);

    ScanFilter filter = new ScanFilter();
    filter.AddCondition(ATTRIBUTE_EXPIRES, ScanOperator.LessThan, DateTime.Now);

    ScanOperationConfig config = new ScanOperationConfig();
    config.AttributesToGet = new List<string> { ATTRIBUTE_SESSION_ID };
    config.Select = SelectValues.SpecificAttributes;
    config.Filter = filter;

    DocumentBatchWrite batchWrite = table.CreateBatchWrite();
    Search search = table.Scan(config);

    do
    {
        List<Document> page = search.GetNextSet();
        foreach (var document in page)
        {
            batchWrite.AddItemToDelete(document);
        }
    } while (!search.IsDone);

    batchWrite.Execute();
}
The above algorithm executes in two parts. First it performs a Search (table scan) using a filter to identify all expired records. These are then added to a DocumentBatchWrite request that is executed as the second step.
Since your table is so large, the table scan step will take a very, very long time to complete before a single record is deleted. Basically, the above algorithm is useful for lazy garbage collection on small tables, but it does not scale well for large tables.
The best I can tell is that the execution never actually gets past the table scan, and you may be consuming all of the read throughput of your table.
A possible solution would be to run a slightly modified version of the above method yourself. You would want to call the DocumentBatchWrite inside the do-while loop so that records start to be deleted before the table scan is concluded.
That would look like:
public static void DeleteExpiredSessions(IAmazonDynamoDB dbClient, string tableName)
{
    LogInfo("DeleteExpiredSessions");
    Table table = Table.LoadTable(dbClient, tableName, DynamoDBEntryConversion.V1);

    ScanFilter filter = new ScanFilter();
    filter.AddCondition(ATTRIBUTE_EXPIRES, ScanOperator.LessThan, DateTime.Now);

    ScanOperationConfig config = new ScanOperationConfig();
    config.AttributesToGet = new List<string> { ATTRIBUTE_SESSION_ID };
    config.Select = SelectValues.SpecificAttributes;
    config.Filter = filter;

    Search search = table.Scan(config);

    do
    {
        // Perform a batch delete for each page returned
        DocumentBatchWrite batchWrite = table.CreateBatchWrite();
        List<Document> page = search.GetNextSet();
        foreach (var document in page)
        {
            batchWrite.AddItemToDelete(document);
        }
        batchWrite.Execute();
    } while (!search.IsDone);
}
Note: I have not tested the above code; it is just a simple modification of the open source code, so it should work, but it would need to be tested to ensure pagination behaves correctly on a table whose records are being deleted while it is being scanned.

Searching HP-Trim / Records-Manager using SDK

Using the HP-Trim SDK, how do you search for a document by its reference number?
The alleged documentation refers to methods for straightforward searches:
SelectByPrefix
SelectFavorites
SelectByUserLabel
SelectNone
SelectAll
SelectByUris
SelectTopLevels
SelectThoseWithin
and a generic search:
records.SetSearchString("createdOn:this week and assignee:me");
but all I want to do is find a document by its index.
These don't work:
records.SetSearchString("recordNum: <RecordNumber>");
records.SetSearchString("recordNumber: <RecordNumber>");
records.SetSearchString("reference: <RecordNumber>");
Any suggestions?
Are you using the .NET SDK? If so you can grab a record by its record number like so (C# example):
using (Database db = new Database()) {
    db.Connect();
    Record record = new Record(db, "123456"); // Replace with the record number
    // Do stuff with the record
    Console.WriteLine(record.Title);
}
You aren't required to construct a 'formal search' as such.
In case you're curious about the correct string search syntax, this would have worked:
records.SetSearchString("number: <RecordNumber>");
Using the COM SDK:
using (Database db = new Database()) {
    db.Connect();
    Records records = db.MakeRecords();
    records.SelectAll();
    records.FilterString = "number:<RecordNumber>";
    if (records == null || records.Count.Equals(0))
        return;
    Record existing = records.Item(0);
}

Adding a relationship to an index in neo4jclient

This might be a real newbie question, but please bear with me as I am new. I looked at this code sample in the documentation.
graphClient
    .Cypher
    .Start(new {
        n1 = "custom",
        n2 = nodeRef,
        n3 = Node.ByIndexLookup("indexName", "property", "value"),
        n4 = Node.ByIndexQuery("indexName", "query"),
        r1 = relRef,
        moreRels = new[] { relRef, relRef2 },
        r2 = Relationship.ByIndexLookup("indexName", "property", "value"),
        r3 = Relationship.ByIndexQuery("indexName", "query"),
        all = All.Nodes
    });
In the example above I would like to get a relationship by index lookup, so I created a relationship index:
_graphClient.CreateIndex("item_relationship_idx",
    new IndexConfiguration
    {
        Provider = IndexProvider.lucene,
        Type = IndexType.exact
    },
    IndexFor.Relationship);
Question: how do I get a relationship created by _graphClient.CreateRelationship into an index? Most of the samples only show getting a NodeReference into an index. I am sure I am missing something obvious. Any help would be appreciated.
Update to Neo4jClient 1.0.0.568 or above and you'll find the (new) support for relationship indexing, consistent with how node indexing works.
(You should also look at Neo4j 2.0 and try and use the new indexing infrastructure though. No point writing new code against old approaches.)
"Update to Neo4jClient 1.0.0.568 or above and you'll find the (new) support for relationship indexing, consistent with how node indexing works."
Does that mean I can use the Create method or do I still need the CreateRelationship method? I have Neo4jClient 1.0.0.590 but I don't find it that obvious.

db4o issue with graph of objects

I am new to db4o. I have a big problem with persistence of a graph of objects. I am trying to migrate from an old persistence component to a new one, using db4o.
Before I persisted all the objects, the graph looked like below (take a look at the Zrodlo.Metadane.abstrakt string field with the focused value) [a view from the Eclipse debugger]. I persisted it with this code:
ObjectContainer db = Db4o.openFile(DB_FILE);
try {
    db.store(encja);
    db.commit();
} finally {
    db.close();
}
After that, I tried to read it back with this code:
ObjectContainer db = Db4o.openFile(DB_FILE);
try {
    Query q = db.query();
    q.constrain(EncjaDanych.class);
    ObjectSet<Object> objectSet = q.execute();
    logger.debug("objectSet.size" + objectSet.size());
    EncjaDanych encja = (EncjaDanych) objectSet.get(0);
    logger.debug("ENCJA" + encja.toString());
    return encja;
} finally {
    db.close();
}
and I got it (picture below) - the string field "abstrakt" is null now!
I took a look at it using ObjectManager (picture below) and the abstrakt field has a non-null value there - the same value as in the first picture.
Please help me :) It is my second day with db4o. Thanks in advance!
I am attaching some code with the structure of the persisted class:
public class EncjaDanych {
    Map mapaIdRepo = new HashMap();
    public Map mapaNazwaRepo = new HashMap();
}
UPDATE:
When I tried to read only the Metadane object (there was only one such object), everything was fine - its string field abstrakt was read correctly.
try {
    Query q = db.query();
    q.constrain(Metadane.class);
    ObjectSet<Object> objectSet = q.execute();
    logger.error("objectSet.size" + objectSet.size());
    Metadane meta = (Metadane) objectSet.get(0);
    logger.debug("Metadane" + meta.toString());
    return meta;
} finally {
    db.close();
}
This is a common db4o FAQ: an issue with what db4o calls "activation". db4o won't instantiate the entire graph you stored when you load an object from an ObjectContainer. By default, objects are instantiated to depth 5. You can raise that default in the configuration, but that is not recommended, since it slows down all object loading: the configured depth is applied everywhere you load an object with a query.
Two approaches are possible to solve your issue:
(1) You can activate an object to a desired depth by hand when you need a specific depth.
db.activate(encja, 10); // 10 is arbitrary
(2) You can work with Transparent Activation. There are multiple chapters on how to use Transparent Activation (TA) in the db4o tutorial and in the reference documentation.
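As a starting point for option (2), registering TA support when opening the container could look like this (a minimal sketch using the legacy configuration API from the question, assuming db4o 7.4+; for full benefit your persistent classes would also implement Activatable, as the tutorial describes):
Configuration config = Db4o.newConfiguration();
// Transparent Activation: objects are activated on demand as their fields are accessed
config.add(new TransparentActivationSupport());
ObjectContainer db = Db4o.openFile(config, DB_FILE);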
You're not setting a filter in your query so you're reading the first object. Are you sure you didn't have a previous object in the database?