How to find document with SOLR query and exact string match - datastax-enterprise

Considering a simple table:
CREATE TABLE transactions (
enterprise_id uuid,
transaction_id text,
state text,
PRIMARY KEY ((enterprise_id, transaction_id))
and Solr core with default, auto-generated parameters.
How do I construct a Solr query that will find me record(s) in this table that have state value exact match to an input, considering the state can be arbitrary string?
I tried this with state value of a+b. This works fine with q=state:"a+b", but that creates a "phrase query":
"rawquerystring": "state:\"a+b\"",
"querystring": "state:\"a+b\"",
"parsedquery": "PhraseQuery(state:\"a b\")",
"parsedquery_toString": "state:\"a b\"",
So, the same record is found if I use query like q=state:"a(b", which results into the same phrased query and finds the record with state of a+b. That is unacceptable to me, because I need an exact match.
I went through https://cwiki.apache.org/confluence/display/solr/Other+Parsers, and tried using q={!term f=state}a+b or q={!raw f=state}a+b, but neither even finds my sample transaction record.

Probably you got state generated as a TextField where standard tokenization is applied StandardTokenizer and then a split is made on + and the plus sign itself is discarded. You could use a different tokenizer (whitespace?) or just make state an StrField for an exact match.
This works for me with state as an StrField:
select * from transactions where solr_query='state:a+b';

Related

How to filter out non-null path between nodes in Neo4J/Cypher

My current graph monitors board members at a company through time.
However, I'm only interested in currently employed directors. This can be observed because director nodes connect to company nodes through an employment path which includes an end date (r.to) when the director is no longer employed at the firm. If he is currently employed, there will be no end date(null as per below picture). Therefore, I would like to filter the path not containing an end date. I am not sure if the value is an empty string, a null value, or other types so I've been trying different ways without much success. Thanks for any tips!
Current formula
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to Is null
RETURN c,d,c2
Unless the response from the Neo4j browser was edited, it looks like the value of r.to is not null or empty, but the string None.
This query will help verify if this is the case:
MATCH (d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
RETURN DISTINCT r.to ORDER by r.to DESC
Absence of the property will show a null in the tabular response. Any other value is a real value of that property. If None shows up, then your query would be
MATCH (c2:Company)-[r2:MANAGED]-(d:Director)-[r:MANAGED]-(c:Company {ticker:'COMS'})
WHERE r.to="None"
RETURN c,d,c2

Auto increment id Neo4j to retrieve elements in insert order

Recently, I am experimenting Neo4j. I like the idea but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
While to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned respecting the query insertion order (through the id function).
In fact the id of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto incremental id but it seems a little strange his mechanism. How do I return items respecting the query insertion order? I thought about creating a timestamp attribute for each node but I don't think it's the best choice
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
The id of Neo4j is maintained internally, which your business code should not depend on.
Generally it's auto incrementally, but if there is delete operation, you may reuse the deleted id according to the Reuse Policy of Neo4j Server.

How to concatenate three columns into one and obtain count of unique entries among them using Cypher neo4j?

I can query using Cypher in Neo4j from the Panama database the countries of three types of identity holders (I define that term) namely Entities (companies), officers (shareholders) and Intermediaries (middle companies) as three attributes/columns. Each column has single or double entries separated by colon (eg: British Virgin Islands;Russia). We want to concatenate the countries in these columns into a unique set of countries and hence obtain the count of the number of countries as new attribute.
For this, I tried the following code from my understanding of Cypher:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
SET BEZ4.countries= (BEZ1.countries+","+BEZ2.countries+","+BEZ3.countries)
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,DISTINCT count(BEZ4.countries) AS NoofConnections
The relevant part is the SET statement in the 7th line and the DISTINCT count in the last line. The code shows error which makes no sense to me: Invalid input 'u': expected 'n/N'. I guess it means to use COLLECT probably but we tried that as well and it shows the error vice-versa'd between 'u' and 'n'. Please help us obtain the output that we want, it makes our job hell lot easy. Thanks in advance!
EDIT: Considering I didn't define variable as suggested by #Cybersam, I tried the command CREATE as following but it shows the error "Invalid input 'R':" for the command RETURN. This is unfathomable for me. Help really needed, thank you.
CODE 2:
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)-
[:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND
NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND
BEZ3.countries="Belize") OR
(BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved",
"Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections{countries:
split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries),";")
RETURN BEZ3.countries AS IntermediaryCountries, BEZ3.name AS
Intermediaryname, BEZ2.countries AS OfficerCountries , BEZ2.name AS
Officername, BEZ1.countries as EntityCountries, BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress, AS TOTAL, collect (DISTINCT
COUNT(p.countries)) AS NumberofConnections
Lines 8 and 9 are the ones new and to be in examination.
First Query
You never defined the identifier BEZ4, so you cannot set a property on it.
Second Query (which should have been posted in a separate question):
You have several typos and a syntax error.
This query should not get an error (but you will have to determine if it does what you want):
MATCH (BEZ2:Officer)-[:SHAREHOLDER_OF]->(BEZ1:Entity),(BEZ3:Intermediary)- [:INTERMEDIARY_OF]->(BEZ1:Entity)
WHERE BEZ1.address CONTAINS "Belize" AND NOT ((BEZ1.countries="Belize" AND BEZ2.countries="Belize" AND BEZ3.countries="Belize") OR (BEZ1.status IN ["Inactivated", "Dissolved shelf company", "Dissolved", "Discontinued", "Struck / Defunct / Deregistered", "Dead"]))
CREATE (p:Connections {countries: split((BEZ1.countries+";"+BEZ2.countries+";"+BEZ3.countries), ";")})
RETURN BEZ3.countries AS IntermediaryCountries,
BEZ3.name AS Intermediaryname,
BEZ2.countries AS OfficerCountries ,
BEZ2.name AS Officername,
BEZ1.countries as EntityCountries,
BEZ1.name AS Companyname,
BEZ1.address AS CompanyAddress,
SIZE(p.countries) AS NumberofConnections;
Problems with the original:
The CREATE clause was missing a closing } and also a closing ).
The RETURN clause had a dangling AS TOTAL term.
collect (DISTINCT COUNT(p.countries)) was attempting to perform nested aggregation, which is not supported. In any case, even if it had worked, it probably would not have returned what you wanted. I suspect that you actually wanted the size of the p.countries collection, so that is what I used in my query.

How to match ets:match against a record in Erlang?

I have heard that specifying records through tuples in the code is a bad practice: I should always use record fields (#record_name{record_field = something}) instead of plain tuples {record_name, value1, value2, something}.
But how do I match the record against an ETS table? If I have a table with records, I can only match with the following:
ets:match(Table, {$1,$2,$3,something}
It is obvious that once I add some new fields to the record definition this pattern match will stop working.
Instead, I would like to use something like this:
ets:match(Table, #record_name{record_field=something})
Unfortunately, it returns an empty list.
The cause of your problem is what the unspecified fields are set to when you do a #record_name{record_field=something}. This is the syntax for creating a record, here you are creating a record/tuple which ETS will interpret as a pattern. When you create a record then all the unspecified fields will get their default values, either ones defined in the record definition or the default default value undefined.
So if you want to give fields specific values then you must explicitly do this in the record, for example #record_name{f1='$1',f2='$2',record_field=something}. Often when using records and ets you want to set all the unspecified fields to '_', the "don't care variable" for ets matching. There is a special syntax for this using the special, and otherwise illegal, field name _. For example #record_name{record_field=something,_='_'}.
Note that in your example you have set the the record name element in the tuple to '$1'. The tuple representing a record always has the record name as the first element. This means that when you create the ets table you should set the key position with {keypos,Pos} to something other than the default 1 otherwise there won't be any indexing and worse if you have a table of type 'set' or 'ordered_set' you will only get 1 element in the table. To get the index of a record field you can use the syntax #Record.Field, in your example #record_name.record_field.
Try using
ets:match(Table, #record_name{record_field=something, _='_'})
See this for explanation.
Format you are looking for is #record_name{record_field=something, _ = '_'}
http://www.erlang.org/doc/man/ets.html#match-2
http://www.erlang.org/doc/programming_examples/records.html (see 1.3 Creating a record)

Ruby on Rails+PostgreSQL: usage of custom sequences

Say I have a model called Transaction which has a :transaction_code attribute.
I want that attribute to be automatically filled with a sequence number which may differ from id (e.g. Transaction with id=1 could have transaction_code=1000).
I have tried to create a sequence on postgres and then making the default value for the transaction_code column the nextval of that sequence.
The thing is, if I do not assign any value to #transaction.transaction_code on RoR, when I issue a #transaction.save on RoR, it tries to do the following SQL:
INSERT INTO transactions (transaction_code) VALUES (NULL);
What this does is create a new row on the Transactions table, with transaction_code as NULL, instead of calculating the nextval of the sequence and inserting it on the corresponding column. Thus, as I found out, if you specify NULL to postgres, it assumes you really want to insert NULL into that column, regardless of it having a default value (I'm coming from ORACLE which has a different behavior).
I'm open to any solution on this, either if it is done on the database or on RoR:
either there is a way to exclude attributes from ActiveRecord's
save
or there is a way to change a column's value before insert with a trigger
or there is a way to generate these sequence numbers within RoR
or any other way, as long as it works :-)
Thanks in advance.
For the moment, you might be stuck fetching and assigning the sequence in your ROR model like this:
before_create :set_transaction_code_sequence
def set_transaction_code_sequence
self.transaction_code = self.class.connection.select_value("SELECT nextval('transaction_code_seq')")
end
I'm not particularily fond of this solution, since I'd like to see this corrected in AR directly... but it does do the trick.
If you want to insert the default value in to a column in an INSERT statement, you can use the keyword DEFAULT - no quotes:
INSERT INTO mytable (col1, col2) VALUES (105, DEFAULT);
Or you could spell out the default, nextval(...) in your case. See the manual here.
A trigger for that case is simple. That's actually what I would recommend if you want to make sure that only numbers from your sequence are entered, no matter what.
CREATE OR REPLACE FUNCTION trg_myseq()
RETURNS trigger AS
$BODY$
BEGIN
NEW.mycol := nextval('my_seq');
RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql VOLATILE;
CREATE TRIGGER myseq
BEFORE INSERT
ON mytable
FOR EACH ROW
EXECUTE PROCEDURE trg_myseq();
On a side note:
If you want to assign your own (non-sequential) numbers as 'sequence', I have written a solution for that in an answer a couple of days ago:
How to specify list of values for a postgresql sequence
I was still experiencing this issue with Rails7 - I could see that Rails was generating a NULL in the insert, but changing the column from integer to bigint solved it. - Rails then does not supply a value for my sequenced column and the DEFAULT nextval('number_seq') is used.

Resources