Spring Data query into Elasticsearch for an exact match - spring-data-elasticsearch

I am learning Elasticsearch with Spring Data, so can someone help me understand better what the Elasticsearch query is doing here? I am trying to return only a set of results based on a certain value, in this case env. It seems to me that the derived query makes no difference and does not return only what I ask for. I have also used an @Query, with no difference.
-- Here is part of my repository class
public interface MyFormRepo extends ElasticsearchRepository<MyForm, String> {
    // ??? these method calls are not affecting my return
    @Query("{\"bool\": {\"must\": [{\"match\": {\"env\": \"?0\"}}]}}")
    Page<MyForm> getAllByEnv(String env, Pageable pageable);

    Page<MyForm> findAllByEnv(String env, Pageable pageable);
}
-- Here is part of my entity class
@Document(indexName = "my_form")
public class MyForm {
    @Id
    private String id;
    @Field(type = FieldType.Text)
    private String schema;
    @Field(type = FieldType.Long)
    private long version;
    @Field(type = FieldType.Text)
    private String env;
    ...
}

Here is what I understand. Elasticsearch has a concept called fuzziness (https://www.elastic.co/guide/en/elasticsearch/reference/current/common-options.html#fuzziness) in its searches, which is based on the Levenshtein distance (https://en.wikipedia.org/wiki/Levenshtein_distance). Spring Data does not let us modify this out of the box, and it is Fuzziness.AUTO by default: https://docs.spring.io/spring-data/elasticsearch/docs/current/reference/html/#elasticsearch.misc. As for the queries, neither of them will do anything different: both findAllByEnv and getAllByEnv use Fuzziness.AUTO. As for the @Query, I found a good reason stated at this site: What is difference between match and bool must match query in Elasticsearch. What I ended up finding is that I had to implement a custom repository; for that I found this example/explanation: How to query Elastic with Spring-data-elastic.
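For reference, here is a minimal sketch of what such a custom repository implementation could look like, assuming an ElasticsearchTemplate is available and that env is mapped as a not-analyzed/keyword field so a term query can match it exactly; MyFormRepoCustom and findExactByEnv are illustrative names, not part of the original code:

// assumes: org.springframework.data.elasticsearch.core.ElasticsearchTemplate,
// org.springframework.data.elasticsearch.core.query.{SearchQuery, NativeSearchQueryBuilder},
// org.elasticsearch.index.query.QueryBuilders
public class MyFormRepoImpl implements MyFormRepoCustom {

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Override
    public Page<MyForm> findExactByEnv(String env, Pageable pageable) {
        // A term query bypasses analysis, so the value is compared as-is.
        // This assumes "env" is a keyword/not-analyzed field; adjust the field
        // name (e.g. "env.keyword") to match your actual mapping.
        SearchQuery query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.termQuery("env", env))
                .withPageable(pageable)
                .build();
        return elasticsearchTemplate.queryForPage(query, MyForm.class);
    }
}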

Related

What's the meaning of 'acceptPartial' and 'backdoorToggles' in Kylin's SQLRequest?

I am studying Kylin, and found that when Kylin executes a query there are two parameters, "acceptPartial" and "backdoorToggles". What do they mean?
public class SQLRequest implements Serializable {
    protected static final long serialVersionUID = 1L;
    private String sql;
    private String project;
    private Integer offset = 0;
    private Integer limit = 0;
    private boolean acceptPartial = false;
    private Map<String, String> backdoorToggles;
    ...
}
I've searched a lot and didn't find the answer.
acceptPartial is a hint to the query engine that a partial query result is acceptable and that the engine should return as quickly as possible rather than compute the full, exact result. This is useful when a user just wants to try out a query and get a few sample rows back, without needing the complete result.
backdoorToggles is a bunch of parameters for debugging and troubleshooting purposes. You can come back to them once you are familiar with the basics of Kylin.
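As a rough illustration (assuming SQLRequest exposes standard bean setters, which are not shown above), a request that opts into partial results might be assembled like this; the SQL and project name are hypothetical:

SQLRequest request = new SQLRequest();
request.setSql("select count(*) from sample_table");  // hypothetical SQL
request.setProject("sample_project");                 // hypothetical project name
request.setLimit(10);                                 // only a few sample rows are needed
request.setAcceptPartial(true);                       // partial result is fine, return quickly
// backdoorToggles can be left null; they are debug/troubleshooting switches only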

Security Context in terms of QueryDslPredicateExecutor and Spring Data Rest

I'm building a REST API on top of Spring Data Rest. Initially all repositories were extending JpaRepository. Lately a decision has been made to take a more flexible approach and use QueryDslPredicateExecutor<T> along with QuerydslBinderCustomizer<Q>.
Pretty much all findAll methods exposed in repositories should address two scenarios:
if the principal has the role ROLE_ADMIN, then no filtering should be applied apart from Pageable and Sort
if the principal does not have the role ROLE_ADMIN, return only those entities which belong to the current user
Getting that done was as simple as annotating the findAll method as below.
#Query("select e from Entity e where e.field = ?#{principal} or 1=?#{hasRole('ROLE_ADMIN') ? 1 : 0}")
Page<Entity> findAll(Pageable pageable);
Now I want our findAll to be something similar to below
Page<Entity> findAll(Predicate predicate, Pageable pageable)
The Predicate is built from request parameters (courtesy of @QuerydslPredicate) and is passed into RepositoryEntityController, which is all managed by spring-data-rest, which is great.
@ResponseBody
@RequestMapping(value = BASE_MAPPING, method = RequestMethod.GET)
public Resources<?> getCollectionResource(@QuerydslPredicate RootResourceInformation resourceInformation,
        DefaultedPageable pageable, Sort sort, PersistentEntityResourceAssembler assembler)
        throws ResourceNotFoundException, HttpRequestMethodNotSupportedException {
I want to tweak that predicate to address the two scenarios above.
It would be something similar to the below.
BooleanBuilder builder = new BooleanBuilder(predicateBuildFromHttpRequest);
builder.and(predicateAddressingOurRequirements);
builder.getValue();
@PostFilter won't be an option, as the return type for all repos is Page<Entity>.
The use case I want to address seems quite common to me. Having said that, I had a look at the spring-data and spring-data-rest documentation and could not find anything related to my question.
The question is: am I missing something obvious here and there is a quick win for it, or would I need to implement a custom solution myself? Any comments very much appreciated!
The Querydsl predicates are constructed by QuerydslAwareRootResourceInformationHandlerMethodArgumentResolver, which is sadly package private and can't be directly extended.
However, you can make a copy of that, add your security predicate logic and then drop in your implementation instead of the former resolver.
public class MyQueryDslRootResourceArgumentResolver extends RootResourceInformationHandlerMethodArgumentResolver {

    // most of the code is omitted; the content is identical with
    // QuerydslAwareRootResourceInformationHandlerMethodArgumentResolver,
    // the important part is the postProcess method where you can modify the predicate
    @Override
    @SuppressWarnings({"unchecked"})
    protected RepositoryInvoker postProcess(MethodParameter parameter, RepositoryInvoker invoker,
            Class<?> domainType, Map<String, String[]> parameters) {
        Object repository = repositories.getRepositoryFor(domainType);
        if (!QueryDslPredicateExecutor.class.isInstance(repository)
                || !parameter.hasParameterAnnotation(QuerydslPredicate.class)) {
            return invoker;
        }
        ClassTypeInformation<?> type = ClassTypeInformation.from(domainType);
        QuerydslBindings bindings = factory.createBindingsFor(null, type);
        // modify your predicate here
        Predicate predicate = predicateBuilder.getPredicate(type, toMultiValueMap(parameters), bindings);
        return new QuerydslRepositoryInvokerAdapter(invoker, (QueryDslPredicateExecutor<Object>) repository, predicate);
    }
}
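For the "modify your predicate here" part, the security restriction could be combined with the request-derived predicate roughly as follows. This is only a sketch: QEntity (the Querydsl-generated type), its field, and the role check are placeholders for your own domain model.

Authentication auth = SecurityContextHolder.getContext().getAuthentication();
boolean isAdmin = auth.getAuthorities().stream()
        .anyMatch(a -> "ROLE_ADMIN".equals(a.getAuthority()));

// predicate built from the request parameters, exactly as in the resolver above
Predicate requestPredicate = predicateBuilder.getPredicate(type, toMultiValueMap(parameters), bindings);

Predicate effectivePredicate = requestPredicate;
if (!isAdmin) {
    // restrict non-admin users to their own entities (QEntity/field are placeholder names)
    BooleanBuilder builder = new BooleanBuilder(requestPredicate);
    builder.and(QEntity.entity.field.eq(auth.getName()));
    effectivePredicate = builder.getValue();
}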
Then add your own configuration class with the custom resolver implementation.
public class CustomRepositoryRestMvcConfiguration extends RepositoryRestMvcConfiguration {

    @Autowired
    ApplicationContext applicationContext;

    @Override
    public RootResourceInformationHandlerMethodArgumentResolver repoRequestArgumentResolver() {
        QuerydslBindingsFactory factory = applicationContext.getBean(QuerydslBindingsFactory.class);
        QuerydslPredicateBuilder predicateBuilder = new QuerydslPredicateBuilder(defaultConversionService(),
                factory.getEntityPathResolver());
        return new MyQueryDslRootResourceArgumentResolver(repositories(),
                repositoryInvokerFactory(defaultConversionService()), resourceMetadataHandlerMethodArgumentResolver(),
                predicateBuilder, factory);
    }
}
Here is an example project that modifies the Predicate (produced from the URL parameters) before passing it to the repository, demonstrating what David Siro explained above:
https://github.com/yeldarxman/QueryDslPredicateModifier

Dataflow output parameterized type to avro file

I have a pipeline that successfully outputs an Avro file as follows:
@DefaultCoder(AvroCoder.class)
class MyOutput_T_S {
    T foo;
    S bar;
    Boolean baz;
    public MyOutput_T_S() {}
}

@DefaultCoder(AvroCoder.class)
class T {
    String id;
    public T() {}
}

@DefaultCoder(AvroCoder.class)
class S {
    String id;
    public S() {}
}
...
PCollection<MyOutput_T_S> output = input.apply(myTransform);
output.apply(AvroIO.Write.to("/out").withSchema(MyOutput_T_S.class));
How can I reproduce this exact behavior, except with a parameterized output MyOutput<T, S> (where T and S are both Avro-encodable using reflection)?
The main issue is that Avro reflection doesn't work for parameterized types. So based on these responses:
Setting Custom Coders & Handling Parameterized types
Using Avrocoder for Custom Types with Generics
1) I think I need to write a custom CoderFactory, but I am having difficulty figuring out exactly how this works (I'm having trouble finding examples). Oddly enough, a completely naive coder factory appears to let me run the pipeline and inspect proper output using DataflowAssert:
cr.registerCoder(MyOutput.class, new CoderFactory() {
    @Override
    public Coder<?> create(List<? extends Coder<?>> componentCoders) {
        Schema schema = new Schema.Parser().parse("{\"type\":\"record\","
                + "\"name\":\"MyOutput\","
                + "\"namespace\":\"mypackage\","
                + "\"fields\":[]}");
        return AvroCoder.of(MyOutput.class, schema);
    }

    @Override
    public List<Object> getInstanceComponents(Object value) {
        MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
        List components = new ArrayList();
        return components;
    }
});
While I can successfully assert against the output now, I expect this will not cut it for writing to a file. I haven't figured out how I'm supposed to use the provided componentCoders to generate the correct schema, and if I try to just shove the schema of T or S into fields I get:
java.lang.IllegalArgumentException: Unable to get field id from class null
2) Assuming I figure out how to encode MyOutput, what do I pass to AvroIO.Write.withSchema? If I pass either MyOutput.class or the schema, I get type mismatch errors.
I think there are two questions (correct me if I am wrong):
How do I enable the coder registry to provide coders for various parameterizations of MyOutput<T, S>?
How do I write values of MyOutput<T, S> to a file using AvroIO.Write?
The first question is to be solved by registering a CoderFactory as in the linked question you found.
Your naive coder is probably allowing you to run the pipeline without issues because serialization is being optimized away. Certainly an Avro schema with no fields will result in those fields being dropped in a serialization+deserialization round trip.
But assuming you fill in the schema with the fields, your approach to CoderFactory#create looks right. I don't know the exact cause of the message java.lang.IllegalArgumentException: Unable to get field id from class null, but the call to AvroCoder.of(MyOutput.class, schema) should work for an appropriately assembled schema. If there is an issue with this, more details (such as the rest of the stack trace) would be helpful.
However, your override of CoderFactory#getInstanceComponents should return a list of values, one per type parameter of MyOutput. Like so:
@Override
public List<Object> getInstanceComponents(Object value) {
    MyOutput<Object, Object> myOutput = (MyOutput<Object, Object>) value;
    return ImmutableList.of(myOutput.foo, myOutput.bar);
}
The second question can be answered using some of the same support code as the first, but is otherwise independent. AvroIO.Write.withSchema always explicitly uses the provided schema. It does use AvroCoder under the hood, but this is actually an implementation detail. Providing a compatible schema is all that is necessary; such a schema will have to be composed for each combination of T and S for which you want to output MyOutput<T, S>.
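As a sketch of that last point, for the concrete T and S from the original example the schema could be assembled with Avro's ReflectData and SchemaBuilder and then handed to AvroIO.Write.withSchema; the record name, namespace, and field names here are assumed to match MyOutput's declaration (foo, bar, baz):

// assumes: org.apache.avro.Schema, org.apache.avro.SchemaBuilder,
// org.apache.avro.reflect.ReflectData
Schema tSchema = ReflectData.get().getSchema(T.class);  // reflection-derived schema for T
Schema sSchema = ReflectData.get().getSchema(S.class);  // reflection-derived schema for S

// hand-assembled schema for MyOutput<T, S>
Schema outputSchema = SchemaBuilder.record("MyOutput").namespace("mypackage")
        .fields()
        .name("foo").type(tSchema).noDefault()
        .name("bar").type(sSchema).noDefault()
        .name("baz").type().booleanType().noDefault()
        .endRecord();

output.apply(AvroIO.Write.to("/out").withSchema(outputSchema));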

Is there support for routing on delete in repository or query annotation?

I'm trying to get deletion of a single document working in a Spring Data Elasticsearch repository, and I can't find a way to solve this error:
[userindex] RoutingMissingException[routing is required for
[userindex]/[address]/[12]
I have two linked documents:
#Document(indexName = "userindex", type = "user")
public class User {
#Field(index = FieldIndex.not_analyzed, type = FieldType.Long)
private Long userId;
...
}
#Document(indexName = "userindex", type = "address")
public class Address {
#Field(type = FieldType.String)
private String name;
#Field(index = FieldIndex.not_analyzed, type = FieldType.String)
private String addressId;
#Field(type = FieldType.String, store = true)
#Parent(type = "user")
private String parentId;
...
}
When I try to delete one address via ElasticsearchCrudRepository<Address, Long> using the standard method delete(Long id), I receive the RoutingMissingException mentioned above.
If I do it using the Elasticsearch client, like this:
client.prepareDelete().setIndex("userindex")
.setType("address")
.setParent("user")
.setId(id.toString())
.execute().get();
everything works fine, but it seems to me that working directly with the client is not the Spring Data way.
Also, I can't find a way to customize the delete method with the annotation org.springframework.data.elasticsearch.annotations.Query.
I checked the sources of org.springframework.data.elasticsearch.core.ElasticsearchTemplate and can't find a way to add support for a delete query.
Does anybody know how to solve this other than by using the client?
The version of spring-data-elasticsearch is 2.0.1
Update 03.05.2017
First of all, there was an error in my deletion code; I don't know how it worked before, but it should be:
client.prepareDelete().setIndex("userindex")
.setType("address")
.setParent("500")
.setId(id.toString())
.execute().get();
Here 500 is the parent id instead of the type name.
And now about the spring-data way: there is no spring-data way in the Elasticsearch integration.
Proof:
DATAES-257
DATAES-331
If you want to do it the Spring way, you can use ElasticsearchTemplate, which is similar to RestTemplate.
ElasticsearchTemplate has a deleteIndex() method which can delete the whole index. You can also do lots of other things with the template.
Example project with delete index is here: https://github.com/TechPrimers/spring-data-elastic-example-4/blob/master/src/main/java/com/techprimers/elastic/resource/SearchResource.java
Code:
@Autowired
ElasticsearchTemplate template;

template.deleteIndex(Users.class);
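If you still want to hide the client call behind a repository-style abstraction, one workaround is to put the parent-aware delete from the question into a custom repository fragment; the interface and class names below are only illustrative:

public interface AddressRepositoryCustom {
    void deleteByIdAndParent(Long id, Long parentId);
}

public class AddressRepositoryImpl implements AddressRepositoryCustom {

    @Autowired
    private Client client; // org.elasticsearch.client.Client

    @Override
    public void deleteByIdAndParent(Long id, Long parentId) {
        // the parent id acts as the routing value for the child document
        client.prepareDelete("userindex", "address", id.toString())
                .setParent(parentId.toString())
                .execute().actionGet();
    }
}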

Spring Data Neo4j neo4jTemplate.fetch() only returns one value

I'm converting a working system that uses @Fetch to a lazy-load strategy. However, when I retrieve an object with a container, the container only has one entry, and neo4jTemplate.fetch(obj.getContainer()) does not retrieve the other entries.
Here are the pertinent snippets
@NodeEntity
public class SourcePage {
    @GraphId
    private Long id;
    @RelatedTo(type = "GROUP_MEMBER")
    private Group group;
Group Class:
@NodeEntity
public class Group {
    @GraphId
    private Long id;
    @RelatedTo(type = "GROUP_MEMBER", direction = Direction.INCOMING)
    private Set<SourcePage> sourcePages = new HashSet<>();
Test Code:
Group group1 = groupRepository.findByName("Test Group");
neo4jTemplate.fetch(group1.getSourcePages());
assertThat(group1.getSourcePages().size(), is(254));
The Result:
java.lang.AssertionError:
Expected: is <254>
but: was <1>
If I do nothing but add @Fetch to private Group group, then it all works as expected.
Also, I looked at the database server with only this test example and ran this query:
MATCH (a)-[:`GROUP_MEMBER`]->(b) RETURN count(b)
It returned 254 as expected. I have also tried Direction.BOTH on each side of the relationship - same results.
I found the problem. It's esoteric and plausible enough that it might help someone else seeing the same symptom.
First, I didn't show that I had my own hashCode() for SourcePage. It hashed a field defined as:
@NotEmpty
@Indexed
private String url;
Without @Fetch the 'url' is not retrieved automatically, so all the SourcePages in the container have the same hash code. This resulted in 100% collisions and only one entry being added to the set.
If I removed the hashCode() then the default hash worked and all the objects were added to the set.
After several hours of debugging, I posted my question. Of course right after that I stumbled on the solution.
Moral of the story: don't base a hash function on member data without including the ID, especially when that data may not be loaded yet.
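For example, a hashCode()/equals() pair based on the graph id rather than on a lazily loaded field avoids the collisions; this is just a sketch, and how you treat not-yet-persisted entities (null id) is up to you:

@Override
public int hashCode() {
    // use the graph id, which is present on loaded entities, instead of a
    // field like url that may not have been fetched yet
    return id == null ? 0 : id.hashCode();
}

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SourcePage)) return false;
    SourcePage other = (SourcePage) o;
    return id != null && id.equals(other.id);
}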
