XSLT Lookup Document returning multiple rows - xslt-2.0

I'm using XSLT 2.0 and doing a document lookup; however, in some cases it returns multiple rows:
<xsl:value-of select="$LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace"/>
How can I determine that multiple rows were returned, or is there a way to return just the first occurrence? I've tried the following, but it didn't work:
<xsl:value-of select="$LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace[1]"/>

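Use <xsl:value-of select="($LookupRegexReplace/Row[matches($seq4, @Key1, 'i')]/@RegexReplace)[1]"/>
The parentheses make [1] apply to the whole result sequence. Without them, [1] applies to the attribute step inside each matching Row, so every matching row still contributes its RegexReplace attribute.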
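To see the difference between the two predicates outside of XSLT, here is a minimal sketch using the JDK's built-in XPath engine. It is XPath 1.0 only, so matches() is not available and a plain equality test on @Key1 stands in for it. The Lookup root element, the sample rows, and the class name FirstMatchDemo are assumptions made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FirstMatchDemo {
    // Hypothetical lookup data; only the Row, Key1 and RegexReplace names come from the question.
    static final String LOOKUP =
        "<Lookup>"
        + "<Row Key1='abc' RegexReplace='first'/>"
        + "<Row Key1='abc' RegexReplace='second'/>"
        + "<Row Key1='xyz' RegexReplace='third'/>"
        + "</Lookup>";

    // Evaluates an XPath 1.0 expression and returns the values of the selected attributes.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(LOOKUP)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // [1] binds to the attribute step: one attribute per matching Row.
        System.out.println(select("/Lookup/Row[@Key1='abc']/@RegexReplace[1]"));
        // [1] binds to the parenthesized result: only the first match overall.
        System.out.println(select("(/Lookup/Row[@Key1='abc']/@RegexReplace)[1]"));
    }
}
```

With this sample data the first expression selects two attributes and the second selects one. In XSLT 2.0 itself, count(...) over the same expression would tell you how many rows matched.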

Related

Extract multiple Substrings from XML stored in a table with datatype CLOB (Oracle 9i)

<!DOCTYPE PODesc SYSTEM "PODesc.dtd"><PODesc><doc_type>P</doc_type><order_no>62249675</order_no><order_type>N/B</order_type><order_type_desc>N/B</order_type_desc><supplier>10167</supplier><qc_ind>N</qc_ind><not_before_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></not_before_date><not_after_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></not_after_date><otb_eow_date><year>2016</year><month>09</month><day>25</day><hour>00</hour><minute>00</minute><second>00</second></otb_eow_date><earliest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></earliest_ship_date><latest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></latest_ship_date><terms>10003</terms><terms_code>45 days</terms_code><freight_terms>SHIP</freight_terms><cust_order>N</cust_order><status>A</status><exchange_rate>1</exchange_rate><bill_to_id>BT</bill_to_id><po_type>00</po_type><po_type_desc>No Store Cluster</po_type_desc><pre_mark_ind>N</pre_mark_ind><currency_code>CZK</currency_code><comment_desc>created by the Tesco Group Ordering 
System</comment_desc><PODtl><item>120000935</item><physical_location_type>W</physical_location_type><physical_location>207</physical_location><physical_qty_ordered>625</physical_qty_ordered><unit_cost>281.5</unit_cost><origin_country_id>CZ</origin_country_id><supp_pack_size>25</supp_pack_size><earliest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></earliest_ship_date><latest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></latest_ship_date><packing_method>FLAT</packing_method><round_lvl>C</round_lvl><POVirtualDtl><location_type>W</location_type><location>507</location><qty_ordered>625</qty_ordered></POVirtualDtl></PODtl><PODtl><item>218333522</item><physical_location_type>W</physical_location_type><physical_location>207</physical_location><physical_qty_ordered>180</physical_qty_ordered><unit_cost>230.94</unit_cost><origin_country_id>CZ</origin_country_id><supp_pack_size>18</supp_pack_size><earliest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></earliest_ship_date><latest_ship_date><year>2016</year><month>09</month><day>22</day><hour>00</hour><minute>00</minute><second>00</second></latest_ship_date><packing_method>FLAT</packing_method><round_lvl>C</round_lvl><POVirtualDtl><location_type>W</location_type><location>507</location><qty_ordered>180</qty_ordered></POVirtualDtl></PODtl><PODtl><item>218333416</item>
Above is part of an XML file stored in a table column. I want to extract all the strings between the tags <item> and </item>. There are multiple <item> values in a single file. I am using Oracle 9i. Can anyone please provide a proper query for that?
Figure out what the XPath of the values is in your XML, then use ExtractValue:
http://docs.oracle.com/cd/B10501_01/appdev.920/a96620/xdb04cre.htm#1024805
e.g.
select <your_rowid>, extractvalue( xmltype(<your_column>), <your_xpath>) from <your_table>
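For multiple values, perform multiple extractvalue calls in the same select, each with an XPath that identifies a single node (for example, a positional predicate such as /PODesc/PODtl[1]/item).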

XSLT 2.0: filter on match

<xsl:template match="lat:entry[document(lat:file)//h2]"/>
Is this template called ONLY on "entry" elements that contain a lat:file tag with a file name, which file contains h2 tags?
Or on ANY lat:entry ?
If the latter, how can I construct a correct match? (correct being the former option)
The match pattern lat:entry[document(lat:file)//h2] does match elements with local name entry, in the namespace bound to the prefix lat, that have one or more file child elements in the same namespace, where document(lat:file) finds at least one XML document containing h2 elements (in no namespace, or in the xpath-default-namespace, depending on the context). So your first description is essentially right, with one exception: if there are several lat:file child elements, document(lat:file)//h2 can cause several documents to be loaded and checked for h2 elements.

Orbeon - how to set values of numerous fields with one query

I have created a database service that retrieves numerous columns. I have successfully created actions that call other queries, pass in a parameter, and display the output in a drop-down box or check boxes. However, with this new query I would like to set the values of 5 different fields on the form based on a single query call. What XPath expression syntax is needed in the 'Set Response Control Values' section to make this work? Or is this not the right place or way to do this?
Sounds like you're using Form Builder - in the "Set Response Control Values" section in the Actions Editor, you should set up one item for each form field to be updated, with the Destination Control drop-down specifying the form field. So in your case you'll have 5 rows pointing to your 5 fields.
Let's assume that your query returns a single row, with the values that will go into your form fields in separate columns. Your query results come from the database service looking like this:
<response>
<row>
<query-column-1>value</query-column-1>
<query-column-2>value</query-column-2>
...
</row>
</response>
So if the column name for your first item is "id", the "Set Response Control Values" entry would look like this:
/response/row/id
There is one gotcha: if a column name in the database includes an underscore, it is converted to a hyphen in the results from the database service. So if your column name were "asset_id", you'd put response/row/asset-id.
If your query returns multiple rows, you can refer to a specific row using a predicate, like so: response/row[1]/id
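As a rough way to check which node each of these expressions picks, the sketch below evaluates them with the JDK's XPath engine against a made-up response document. This runs outside Orbeon, so it only illustrates the XPath itself. The sample values and the class name ResponseXPathDemo are assumptions.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class ResponseXPathDemo {
    // Hypothetical service response: two rows, with the asset_id column already hyphenated.
    static final String RESPONSE =
        "<response>"
        + "<row><id>7</id><asset-id>A-100</asset-id></row>"
        + "<row><id>8</id><asset-id>A-200</asset-id></row>"
        + "</response>";

    // Returns the string value of the first node the expression selects ("" if none).
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(RESPONSE)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval("/response/row[1]/id"));
        System.out.println(eval("/response/row[2]/asset-id"));
        // The underscore form selects nothing, because the element is named asset-id.
        System.out.println(eval("/response/row[1]/asset_id").isEmpty());
    }
}
```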

xpath parent attribute of selection

Syntax of the xml document:
<x name="GET-THIS">
<y>
<z>Z</z>
<z>Z__2</z>
<z>Z__3</z>
</y>
</x>
I'm able to get all z elements using:
xpath("//z")
But after that I got stuck, I'm not sure what to do next. I don't really understand the syntax of the .. parent method
So, how do I get the attribute of the parent of the parent of the element?
Instead of traversing back to the parent, just find the right parent to begin with:
//x will select all x elements.
//x[//z] will select all x elements which have z elements as descendants.
//x[//z]/@name will get the name attribute of each of those elements.
You already have a good accepted answer, but here are some other helpful expressions:
//z/ancestor::x/@name - Find <z> elements anywhere, then find all the ancestor <x> elements, and then their name="…" attributes.
//z/../../@name - Find the <z> elements, then the parent node(s) of those, then the parent node(s) of those, and then the name attribute(s) of the final set.
This is the same as //z/parent::*/parent::*/@name, where the * means "an element with any name".
The // is useful, but inefficient. If you know that the hierarchy is x/y/z, then it is more efficient to do something like //x[y/z]/@name
I don't have enough reputation to comment on the accepted answer by Blender, but that answer will not work in general.
The correct version is
//x[.//z]/@name
The explanation is simple: a filter like [//z] searches for z in the global context, i.e. it is true if the document contains at least one z node anywhere. For example, it will select both names from the XML below:
<root>
<x name="NOT-THIS">
</x>
<x name="GET-THIS">
<y>
<z>Z</z>
<z>Z__2</z>
<z>Z__3</z>
</y>
</x>
</root>
The filter [.//z] uses the context of the current node (.), which is x, and returns only the second name.
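The difference between the absolute and the relative predicate can be checked with the JDK's XPath 1.0 engine. This is a minimal sketch run against the two-element sample above; the class name ContextPredicateDemo and the helper names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextPredicateDemo {
    // The sample document from the answer above, written on fewer lines.
    static final String XML =
        "<root>"
        + "<x name='NOT-THIS'/>"
        + "<x name='GET-THIS'><y><z>Z</z><z>Z__2</z><z>Z__3</z></y></x>"
        + "</root>";

    // Returns the values of the attribute nodes selected by the expression, in document order.
    static List<String> names(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                expr, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(names("//x[//z]/@name"));        // predicate searches the whole document
        System.out.println(names("//x[.//z]/@name"));       // predicate searches below each x
        System.out.println(names("//z/ancestor::x/@name")); // walk up from z instead
        System.out.println(names("//z/../../@name"));       // parent of parent of z
    }
}
```

On this document only the first expression returns both names; the other three return just GET-THIS.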

JOINS in Lucene

Is there any way to implement JOINS in Lucene?
You can also use the new BlockJoinQuery; I described it in a blog post here:
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html
You can do a generic join by hand: run two searches, get all results (instead of the top N), sort them on your join key, and intersect the two ordered lists. But that's going to thrash your heap really hard (if the lists even fit in it).
There are possible optimizations, but only under very specific conditions.
For example, say you do a self-join and only use (random access) Filters for filtering, no Queries. Then you can manually iterate the terms on your two join fields (in parallel), intersect the docId lists for each term, filter them, and there's your join.
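The generic join by hand (sort both result lists on the join key, then intersect them) can be sketched in plain Java without any Lucene classes. The Hit type below is a hypothetical stand-in for whatever you collect from each search (a join key plus a docId). It is not a Lucene API, and the sketch assumes both complete result lists fit in memory.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortMergeJoinDemo {
    // Hypothetical stand-in for one collected search result.
    static final class Hit {
        final String joinKey;
        final int docId;

        Hit(String joinKey, int docId) {
            this.joinKey = joinKey;
            this.docId = docId;
        }
    }

    // Returns {leftDocId, rightDocId} pairs for every pair of hits that share a join key.
    static List<int[]> join(List<Hit> left, List<Hit> right) {
        List<Hit> l = new ArrayList<>(left);
        List<Hit> r = new ArrayList<>(right);
        Comparator<Hit> byKey = Comparator.comparing((Hit h) -> h.joinKey);
        l.sort(byKey);
        r.sort(byKey);

        List<int[]> pairs = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < l.size() && j < r.size()) {
            int cmp = l.get(i).joinKey.compareTo(r.get(j).joinKey);
            if (cmp < 0) {
                i++; // left key has no partner yet, advance left
            } else if (cmp > 0) {
                j++; // right key has no partner yet, advance right
            } else {
                String key = l.get(i).joinKey;
                // Find the end of the run of equal keys on the right side.
                int jEnd = j;
                while (jEnd < r.size() && r.get(jEnd).joinKey.equals(key)) {
                    jEnd++;
                }
                // Pair every left hit carrying this key with every right hit in the run.
                while (i < l.size() && l.get(i).joinKey.equals(key)) {
                    for (int k = j; k < jEnd; k++) {
                        pairs.add(new int[] {l.get(i).docId, r.get(k).docId});
                    }
                    i++;
                }
                j = jEnd;
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Hit> orders = List.of(new Hit("c", 1), new Hit("a", 2), new Hit("b", 3));
        List<Hit> customers = List.of(new Hit("b", 10), new Hit("d", 11), new Hit("a", 12));
        for (int[] pair : join(orders, customers)) {
            System.out.println(pair[0] + " joins " + pair[1]);
        }
    }
}
```

The copies and the two sorts are where the memory pressure mentioned above comes from: both complete result sets have to be materialized before the merge can start.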
There's an approach handling the popular use case of simple parent-child relationships with a relatively small number of children per document: https://issues.apache.org/jira/browse/LUCENE-2454
Unlike the flattening method mentioned by @ntziolis, this approach correctly handles cases like this one: you have a number of resumes, each with multiple work_experience children, and you try to find someone who worked at company NNN in year YYY. If simply flattened, you'll get back resumes for people who worked for NNN in any year and worked somewhere in year YYY.
An alternative for handling simple parent-child cases is to flatten your doc, indeed, but ensure values for different children are separated by a big posIncrement gap, and then use SpanNear query to prevent your several subqueries from matching across children. There was a few-years old LinkedIn presentation about this, but I failed to find it.
Lucene does not support relationships between documents, but a join is nothing more than a specific combination of multiple AND clauses within parentheses. You will need to flatten the relationship first.
Sample (SQL => Lucene):
SQL:
SELECT Order.* FROM Order
JOIN Customer ON Order.CustomerID = Customer.ID
WHERE Customer.Name = 'SomeName'
AND Order.Nr = 400
Lucene:
Make sure you have all the necessary fields and their respective values on the document, like:
Customer.Name => "Customer_Name" and
Order.Nr => "Order_Nr"
The query would then be:
( Customer_Name:"SomeName" AND Order_Nr:"400" )
https://issues.apache.org/jira/browse/SOLR-2272
Use joinutil. It allows query time joins.
See: http://lucene.apache.org/core/4_0_0/join/org/apache/lucene/search/join/JoinUtil.html
A little late but you could use Package org.apache.lucene.search.join : https://lucene.apache.org/core/6_3_0/join/org/apache/lucene/search/join/package-summary.html
From their documentation:
The index-time joining support joins while searching, where joined
documents are indexed as a single document block using
IndexWriter.addDocuments().
String fromField = "from"; // Name of the from field
boolean multipleValuesPerDocument = false; // Set to true only when your fromField has multiple values per document in your index
String toField = "to"; // Name of the to field
ScoreMode scoreMode = ScoreMode.Max; // Defines how the scores are translated into the other side of the join.
Query fromQuery = new TermQuery(new Term("content", searchTerm)); // Query executed to collect from values to join to the to values
Query joinQuery = JoinUtil.createJoinQuery(fromField, multipleValuesPerDocument, toField, fromQuery, fromSearcher, scoreMode);
TopDocs topDocs = toSearcher.search(joinQuery, 10); // Note: toSearcher can be the same as the fromSearcher
// Render topDocs...
There are some implementations on top of Lucene that make these kinds of joins among several different indexes possible. Numere (http://numere.stela.org.br/) enables that and makes it possible to get results as an RDBMS result set.
Here is an example. Numere provides an easy way to extract analytical data from Lucene indexes:
select a.type, sum(a.value) as "sales", b.category, count(distinct b.product_id) as "total"
from a (index)
inner join b (index) on (a.seq_id = b.seq_id)
group by a.type, b.category
order by a.type asc, b.category asc
Join join = RequestFactory.newJoin();
// inner join a.seq_id = b.seq_id
join.on("seq_id", Type.INTEGER).equal("seq_id", Type.INTEGER);
// left
{
Request left = join.left();
left.repository(UtilTest.getPath("indexes/md/master"));
left.addColumn("type").textType().asc();
left.addMeasure("value").alias("sales").intType().sum();
}
// right
{
Request right = join.right();
right.repository(UtilTest.getPath("indexes/md/detail"));
right.addColumn("category").textType().asc();
right.addMeasure("product_id").intType().alias("total").count_distinct();
}
Processor processor = ProcessorFactory.newProcessor();
try {
ResultPacket result = processor.execute(join);
System.out.println(result);
} finally {
processor.close();
}
Result:
<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<DATAPACKET Version="2.0">
<METADATA>
<FIELDS>
<FIELD attrname="type" fieldtype="string" WIDTH="20" />
<FIELD attrname="category" fieldtype="string" WIDTH="20" />
<FIELD attrname="sales" fieldtype="i8" />
<FIELD attrname="total" fieldtype="i4" />
</FIELDS>
<PARAMS />
</METADATA>
<ROWDATA>
<ROW type="Book" category="stand" sales="127003304" total="2" />
<ROW type="Computer" category="eletronic" sales="44765715835" total="896" />
<ROW type="Meat" category="food" sales="3193526428" total="110" />
... continue
