YQL showing results from multiple sources

select * from people
<root>
<row>
<name>a</name>
<address>address1</address>
<loc_id>1</loc_id>
</row>
<row>
<name>b</name>
<address>address2</address>
<loc_id>2</loc_id>
</row>
</root>
select * from locations
<root>
<row>
<id>1</id>
<name>location1</name>
<details>locationdetails1</details>
</row>
<row>
<id>2</id>
<name>location2</name>
<details>locationdetails2</details>
</row>
</root>
Is there any way in YQL to join these two data sources on people.loc_id <-> locations.id and return a result that includes all the values together? I know there is a possibility of naming conflicts, but is there a way of resolving that too? Basically, any YQL query that would return results like the following (or something similar):
<root>
<row>
<name>a</name>
<address>address1</address>
<loc_id>1</loc_id>
<location.id>1</location.id>
<location.name>location1</location.name>
<location.details>locationdetails1</location.details>
</row>
<row>
<name>b</name>
<address>address2</address>
<loc_id>2</loc_id>
<location.id>2</location.id>
<location.name>location2</location.name>
<location.details>locationdetails2</location.details>
</row>
</root>
Cross-posted on YQL forums

No, this is not possible directly in YQL; it has no way of combining results like this.
Instead, your application (whatever is calling into YQL) can make the two calls itself and combine the results.
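Combining the two result sets in the calling application might look like the following. This is a minimal sketch in Java; the row maps stand in for the parsed `<row>` elements of each response, and the `location.` key prefix is one way to resolve the naming conflict mentioned in the question.

```java
import java.util.*;

// Sketch: client-side join of two YQL result sets on people.loc_id = locations.id.
public class ClientSideJoin {
    public static List<Map<String, String>> join(List<Map<String, String>> people,
                                                 List<Map<String, String>> locations) {
        // Index locations by id for O(1) lookup.
        Map<String, Map<String, String>> byId = new HashMap<>();
        for (Map<String, String> loc : locations) {
            byId.put(loc.get("id"), loc);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> person : people) {
            Map<String, String> loc = byId.get(person.get("loc_id"));
            if (loc == null) continue; // inner join: skip people with no matching location
            Map<String, String> row = new LinkedHashMap<>(person);
            // Prefix location keys to avoid clashes (both sides have a "name" field).
            for (Map.Entry<String, String> e : loc.entrySet()) {
                row.put("location." + e.getKey(), e.getValue());
            }
            joined.add(row);
        }
        return joined;
    }
}
```

The same approach works regardless of whether the responses are parsed as XML or JSON, since YQL can return either.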

Handle whitespace in Neo4j full text search

I need some help with full text search.
I have created an index like so:
CALL db.index.fulltext.createNodeIndex("ReasourceName",["Resource"],["name"])
I can query it and get results:
CALL db.index.fulltext.queryNodes('ReasourceName', 'bmc pumping station~') YIELD node, score
WITH node, score
RETURN node.name, score
limit 10;
output:
╒════════════════════════════════╤══════════════════╕
│"node.name" │"score" │
╞════════════════════════════════╪══════════════════╡
│"BMC Pumping Station" │8.143752098083496 │
├────────────────────────────────┼──────────────────┤
│"BMC Office" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"BMC Office" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"BMC Dispensary" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"BMC Office" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"BMC Dispensary" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"BMC Office" │2.944127082824707 │
├────────────────────────────────┼──────────────────┤
│"Police Station" │2.6569595336914062│
├────────────────────────────────┼──────────────────┤
│"Momo Station" │2.6569595336914062│
├────────────────────────────────┼──────────────────┤
│"BMC Shikshak Bhavan" │2.515393018722534 │
└────────────────────────────────┴──────────────────┘
However, it performs poorly if the input query differs in whitespace. For example, I would expect the queries bmcpumpingstation or bmcpumpingstation~ to return a similar result set; instead they return nothing.
There does not appear to be an analyzer that works on Levenshtein distance.
(I also asked this question on the Neo4j community forum but didn't get a response.)
The underlying engine is Lucene. The flow is that it tokenizes the text and stores it as tokens; then, when you search, it tokenizes your search string and compares the tokens of your search against what is in the index.
I would suggest reading this article (note that Neo4j has made some changes since, but 90% of it is still valid today): https://graphaware.com/neo4j/2019/01/11/neo4j-full-text-search-deep-dive.html
So, if you search for bmcpumpingstation and your index contains the following tokens:
bmc, pumping, station
then there is simply no match.
If you want to hack a bit and make this type of search work, you can create a dedicated index for it: remove all whitespace from the names when you index them, and then you can search for bmcpumpingstation with some fuzziness.
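A minimal sketch of that dedicated-index hack (the `nameNormalized` property and the index name are assumptions for illustration, not from the question):

```cypher
// Store a whitespace-free, lowercased copy of each name
MATCH (r:Resource)
SET r.nameNormalized = replace(toLower(r.name), " ", "");

// Index the normalized property in a separate full-text index
CALL db.index.fulltext.createNodeIndex("ResourceNameNormalized", ["Resource"], ["nameNormalized"]);

// Fuzzy search against the normalized index
CALL db.index.fulltext.queryNodes("ResourceNameNormalized", "bmcpumpingstation~")
YIELD node, score
RETURN node.name, score LIMIT 10;
```

Since the normalized values contain no whitespace, each name becomes a single token, so the fuzzy operator can match the concatenated query.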
Looks like you need to clean up your data!
match (m:Resource) set m.name = replace(m.name, "~", "")
or do the cleanup before loading the data.

XSLT Lookup Document returning multiple rows

I'm using XSLT 2.0 and am doing a document lookup; however, in some cases it returns multiple rows:
<xsl:value-of select="$LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace"/>
How can I determine that multiple rows were returned, or is there a way to just return the first occurrence? I've tried the following, but it didn't work:
<xsl:value-of select="$LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace[1]"/>
Use <xsl:value-of select="($LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace)[1]"/>
In your attempt, the [1] predicate applies to the @RegexReplace step relative to each matching Row, and since every Row has at most one such attribute it filters nothing; wrapping the whole path in parentheses applies [1] to the combined result sequence and selects only its first item.

QBXML : Can there be multiple credit/debit line items in a journal entry query response?

I am retrieving the journal entries and trying to determine whether there will only ever be one JournalCreditLine node and one JournalDebitLine node per JournalEntryRet or if there could be multiple line entries.
EDIT:
I have added multiple journal entries in one place with the same timestamp, but I always get multiple <JournalEntryRet> and never multiple <JournalDebitLine> or <JournalCreditLine>
Query I am sending:
<?xml version="1.0" encoding="utf-8"?>
<?qbxml version="11.0"?>
<QBXML>
<QBXMLMsgsRq onError="stopOnError">
<JournalEntryQueryRq requestID="[request id from DB]">
<IncludeLineItems>1</IncludeLineItems>
</JournalEntryQueryRq>
</QBXMLMsgsRq>
</QBXML>
Example Response (with all customer data removed):
<!-- opening envelope tags and response header fields redacted -->
<JournalEntryRet>
<TxnID>[data]</TxnID>
<TimeCreated>[data]</TimeCreated>
<TimeModified>[data]</TimeModified>
<EditSequence>[data]</EditSequence>
<TxnNumber>[data]</TxnNumber>
<TxnDate>[data]</TxnDate>
<RefNumber>[data]</RefNumber>
<IsAdjustment>[data]</IsAdjustment>
<JournalDebitLine>
<TxnLineID>[data]</TxnLineID>
<AccountRef>
<ListID>[data]</ListID>
<FullName>[data]</FullName>
</AccountRef>
<Amount>[data]</Amount>
<Memo>[data]</Memo>
</JournalDebitLine>
<JournalCreditLine>
<TxnLineID>[data]</TxnLineID>
<AccountRef>
<ListID>[data]</ListID>
<FullName>[data]</FullName>
</AccountRef>
<Amount>[data]</Amount>
<Memo>[data]</Memo>
</JournalCreditLine>
</JournalEntryRet>
<!-- more JournalEntryRet nodes -->
</JournalEntryQueryRs>
</QBXMLMsgsRs>
</QBXML>
There could be multiple journal credit lines, and multiple journal debit lines, in a single JournalEntry object. This mirrors the behavior of the QuickBooks GUI.
The business rule is that the sum of all credit lines must equal the sum of all debit lines.
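That rule can be checked mechanically when processing the response. Below is a minimal sketch; the tag names come from the response above, while the XML in the test is a made-up entry with two debit lines (illustrating that multiple lines per side do occur):

```java
import java.io.ByteArrayInputStream;
import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: checking the debits-equal-credits rule on a single JournalEntryRet.
public class JournalBalance {

    // Sum the <Amount> values of every line element with the given tag name.
    static BigDecimal sumAmounts(Document doc, String lineTag) {
        BigDecimal sum = BigDecimal.ZERO;
        NodeList lines = doc.getElementsByTagName(lineTag);
        for (int i = 0; i < lines.getLength(); i++) {
            Element line = (Element) lines.item(i);
            String amount = line.getElementsByTagName("Amount").item(0).getTextContent();
            sum = sum.add(new BigDecimal(amount));
        }
        return sum;
    }

    // True when total debits equal total credits for the given entry XML.
    public static boolean isBalanced(String journalEntryXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(journalEntryXml.getBytes(StandardCharsets.UTF_8)));
            return sumAmounts(doc, "JournalDebitLine")
                    .compareTo(sumAmounts(doc, "JournalCreditLine")) == 0;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

BigDecimal is used rather than double so that monetary amounts compare exactly.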

Getting fuzzy searching to work for Sunspot?

I have in my database or Solr index the following 2 Products: Total War: Shogun 2 [Download] and Eggs.
What I want the search to be able to do is match these 2 Products even when the query contains mistakes, e.g.:
"Egggs", "Eggz", "Eg", "Egs" and "Shogn Download", "Totle War","Tutal War: Shogunn 2 Download" etc.
EDIT (working somewhat):
This will get you started, but I'm still having issues with searches that use different characters, i.e. only names like "Eggs" and "Great Value Vitamin D Whole Milk" can be misspelled, not "Total War: Shogun 2".
New code:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="1" splitOnNumerics="1" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="1" splitOnNumerics="1" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
<filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
The ideal is to have my search work like Google's, where it does a pretty good job of correcting your spelling, whether lowercase or uppercase, even with a couple of errors. How would I make my search similar to what Google does?
Fuzzy searches do not undergo query-time analysis, so there is a chance your query does not match the indexed terms.
The terms in the above config undergo lowercase filtering during indexing, which stores all the terms in lower case. Searching for Egggs would therefore never produce any results, as Egggs would not match eggs. The search terms need to be lowercased explicitly.
Also, in the above config, the index-time analysis is very different from the query-time analysis. It is usually recommended to have similar filters for query and index, so that the indexed terms match the searched terms.
solr.PorterStemFilterFactory may produce a completely different root for the searched term, which may never match the indexed terms.
Revisit your configuration, and maybe check the example Solr schema.xml for reference.

JOINS in Lucene

Is there any way to implement JOINS in Lucene?
You can also use the new BlockJoinQuery; I described it in a blog post here:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
You can do a generic join by hand: run two searches, get all results (instead of the top N), sort them on your join key, and intersect the two ordered lists. But that's going to thrash your heap really hard (if the lists even fit in it).
There are possible optimizations, but only under very specific conditions. E.g., you do a self-join and only use (random-access) Filters for filtering, no Queries. Then you can manually iterate the terms on your two join fields (in parallel), intersect the docId lists for each term, filter them, and there's your join.
There's an approach handling a popular use case, simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like: you have a number of resumes, each with multiple work_experience children, and you try to find someone who worked at company NNN in year YYY. If you simply flatten, you'll get back resumes for people that worked for NNN in any year and worked somewhere in year YYY.
An alternative for handling simple parent-child cases is indeed to flatten your doc, but ensure values for different children are separated by a big position-increment gap, and then use a SpanNear query to prevent your several subqueries from matching across children. There was a few-years-old LinkedIn presentation about this, but I failed to find it.
Lucene does not support relationships between documents, but a join is nothing other than a specific combination of multiple ANDs within parentheses; you will, however, need to flatten the relationship first.
Sample (SQL => Lucene):
SQL:
SELECT Order.* FROM Order
JOIN Customer ON Order.CustomerID = Customer.ID
WHERE Customer.Name = 'SomeName'
AND Order.Nr = 400
Lucene:
Make sure you have all the necessary fields and their respective values on the document, like:
Customer.Name => "Customer_Name" and
Order.Nr => "Order_Nr"
The query would then be:
( Customer_Name:"SomeName" AND Order_Nr:"400" )
https://issues.apache.org/jira/browse/SOLR-2272
Use joinutil. It allows query time joins.
See: http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/JoinUtil.html
A little late, but you could use the package org.apache.lucene.search.join: https://lucene.apache.org/core/6_3_0/join/org/apache/lucene/search/join/package-summary.html
Its query-time joining support joins documents while searching, via JoinUtil.createJoinQuery(), using a from field and a to field:
String fromField = "from"; // Name of the from field
boolean multipleValuesPerDocument = false; // Set to true only when your fromField has multiple values per document in your index
String toField = "to"; // Name of the to field
ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values
Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
// Render topDocs...
There are some implementations on top of Lucene that make these kinds of joins across several different indexes possible. Numere (http://numere.stela.org.br/) enables that and makes it possible to get results as an RDBMS-style result set.
Here is an example; Numere provides an easy way to extract analytical data from Lucene indexes:
select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
from a (index)
inner join b (index) on (a.seq_id = b.seq_id)
group by a.type, b.category
order by a.type asc, b.category asc
Join join = RequestFactory.newJoin();
// inner join a.seq_id = b.seq_id
join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);
// left
{
Request left = join.left();
left.repository(UtilTest.getPath("indexes/md/master"));
left.addColumn("type").textType().asc();
left.addMeasure("value").alias("sales").intType().sum();
}
// right
{
Request right = join.right();
right.repository(UtilTest.getPath("indexes/md/detail"));
right.addColumn("category").textType().asc();
right.addMeasure("product_id").intType().alias("total").count_distinct();
}
Processor processor = ProcessorFactory.newProcessor();
try {
ResultPacket result = processor.execute(join);
System.out.println(result);
} finally {
processor.close();
}
Result:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<DATAPACKET Version="2.0">
<METADATA>
<FIELDS>
<FIELD attrname="type" fieldtype="string" WIDTH="20" />
<FIELD attrname="category" fieldtype="string" WIDTH="20" />
<FIELD attrname="sales" fieldtype="i8" />
<FIELD attrname="total" fieldtype="i4" />
</FIELDS>
<PARAMS />
</METADATA>
<ROWDATA>
<ROW type="Book" category="stand" sales="127003304" total="2" />
<ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
<ROW type="Meat" category="food" sales="3193526428" total="110" />
... continue
