I've been running Solr 5 on a German site for the past few months, and dealing with umlauts (Umlaute) has turned out to be a nightmare. I'm not a Solr specialist; on all my other projects I use Elasticsearch. Finding your way through the Solr documentation is a bit of an uphill battle.
I am wondering if the following two things can be easily configured via schema.xml:
1.) Umlauts and special characters
Special characters are stored in the database as HTML entities. For example:
"an einer Außenwand. Eine Brandschutztür sorgt für maximale Sicherheit."
Solr doesn't seem to know how to deal with this at all. If a user searches for "für", nothing comes up. I also tried searching for the HTML-encoded form of "für" and for "fr"; neither returns the expected result.
The same happens with "Regelungs-App": nothing comes up, but if I enter "Regelungs App" I get hits. Why does a simple dash throw Solr off track, and what setting (or anything else) can I use to make it ignore the dash?
2.) Length of search string
When I search for a string taken from the indexed content, the search seems to be limited to a certain number of characters. For example:
"Erreicht als einziger Staubemissionen" - no results
"als einziger Staubemissionen" - no results
"einziger Staubemissionen" - correct results
"Staubemissionen" - correct result
Where does this limit come from, and how can I change it?
My current schema.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!--
This is the Solr schema file. This file should be named "schema.xml" and
should be in the conf directory under the solr home
(i.e. ./solr/conf/schema.xml by default)
or located where the classloader for the Solr webapp can find it.
This example schema is the recommended starting point for users.
It should be kept correct and concise, usable out-of-the-box.
For more information, on how to customize this file, please see
http://wiki.apache.org/solr/SchemaXml
PERFORMANCE NOTE: this schema includes many optional features and should not
be used for benchmarking. To improve performance one could
- set stored="false" for all fields possible (esp large fields) when you
only need to search on the field but don't need to return the original
value.
- set indexed="false" if you don't need to search on the field, but only
return the field as a result of searching on other indexed fields.
- remove all unneeded copyField statements
- for best index size and searching performance, set "indexed" to false
for all general text fields, use copyField to copy them to the
catchall "text" field, and use that for searching.
- For maximum indexing performance, use the StreamingUpdateSolrServer
java client.
- Remember to run the JVM in server mode, and use a higher logging level
that avoids logging every request
-->
<schema name="sunspot" version="1.0">
<types>
<!-- field type definitions. The "name" attribute is
just a label to be used by field definitions. The "class"
attribute and any other attributes determine the real
behavior of the fieldType.
Class names starting with "solr" refer to java classes in the
org.apache.solr.analysis package.
-->
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="string" class="solr.StrField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="tdouble" class="solr.TrieDoubleField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="rand" class="solr.RandomSortField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer type="index">
<tokenizer class="solr.NGramTokenizerFactory" minGramSize="3" maxGramSize="15" />
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.PorterStemFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="1" splitOnNumerics="1" splitOnCaseChange="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1"/>
</analyzer>
</fieldType>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="boolean" class="solr.BoolField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="tint" class="solr.TrieIntField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="tlong" class="solr.TrieLongField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="tfloat" class="solr.TrieFloatField" omitNorms="true"/>
<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="tdate" class="solr.TrieDateField"
omitNorms="true"/>
<fieldType name="daterange" class="solr.DateRangeField" omitNorms="true" />
<!-- Special field type for spell correction. Be careful about
adding filters here, as they apply *before* your values go in
the spellcheck. For example, the lowercase filter here means
all spelling suggestions will be lower case (without it,
though, you'd have duplicate suggestions for lower and proper
cased words). -->
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<!-- A specialized field for geospatial search. If indexed, this fieldType must not be multivalued. -->
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
</types>
<fields>
<!-- Valid attributes for fields:
name: mandatory - the name for the field
type: mandatory - the name of a previously defined type from the
<types> section
indexed: true if this field should be indexed (searchable or sortable)
stored: true if this field should be retrievable
compressed: [false] if this field should be stored using gzip compression
(this will only apply if the field type is compressable; among
the standard field types, only TextField and StrField are)
multiValued: true if this field may contain multiple values per document
omitNorms: (expert) set to true to omit the norms associated with
this field (this disables length normalization and index-time
boosting for the field, and saves some memory). Only full-text
fields or fields that need an index-time boost need norms.
termVectors: [false] set to true to store the term vector for a
given field.
When using MoreLikeThis, fields used for similarity should be
stored for best performance.
termPositions: Store position information with the term vector.
This will increase storage costs.
termOffsets: Store offset information with the term vector. This
will increase storage costs.
default: a value that should be used if no value is specified
when adding a document.
-->
<!-- *** This field is used by Sunspot! *** -->
<field name="id" stored="true" type="string" multiValued="false" indexed="true"/>
<!-- *** This field is used by Sunspot! *** -->
<field name="type" stored="false" type="string" multiValued="true" indexed="true"/>
<!-- *** This field is used by Sunspot! *** -->
<field name="class_name" stored="false" type="string" multiValued="false" indexed="true"/>
<!-- *** This field is used by Sunspot! *** -->
<field name="text" stored="false" type="string" multiValued="true" indexed="true"/>
<!-- *** This field is used by Sunspot! *** -->
<field name="lat" stored="true" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This field is used by Sunspot! *** -->
<field name="lng" stored="true" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="random_*" stored="false" type="rand" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="_local*" stored="false" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_text" stored="false" type="text" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_texts" stored="true" type="text" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_b" stored="false" type="boolean" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_bm" stored="false" type="boolean" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_bs" stored="true" type="boolean" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_bms" stored="true" type="boolean" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_d" stored="false" type="tdate" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dm" stored="false" type="tdate" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ds" stored="true" type="tdate" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dms" stored="true" type="tdate" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_e" stored="false" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_em" stored="false" type="tdouble" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_es" stored="true" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ems" stored="true" type="tdouble" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_f" stored="false" type="tfloat" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_fm" stored="false" type="tfloat" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_fs" stored="true" type="tfloat" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_fms" stored="true" type="tfloat" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_i" stored="false" type="tint" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_im" stored="false" type="tint" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_is" stored="true" type="tint" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ims" stored="true" type="tint" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_l" stored="false" type="tlong" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_lm" stored="false" type="tlong" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ls" stored="true" type="tlong" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_lms" stored="true" type="tlong" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_s" stored="false" type="string" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_sm" stored="false" type="string" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ss" stored="true" type="string" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_sms" stored="true" type="string" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_it" stored="false" type="tint" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_itm" stored="false" type="tint" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_its" stored="true" type="tint" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_itms" stored="true" type="tint" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ft" stored="false" type="tfloat" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ftm" stored="false" type="tfloat" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_fts" stored="true" type="tfloat" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ftms" stored="true" type="tfloat" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dt" stored="false" type="tdate" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dtm" stored="false" type="tdate" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dts" stored="true" type="tdate" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dtms" stored="true" type="tdate" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_textv" stored="false" termVectors="true" type="text" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_textsv" stored="true" termVectors="true" type="text" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_et" stored="false" termVectors="true" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_etm" stored="false" termVectors="true" type="tdouble" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_ets" stored="true" termVectors="true" type="tdouble" multiValued="false" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_etms" stored="true" termVectors="true" type="tdouble" multiValued="true" indexed="true"/>
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_dr" stored="false" type="daterange" multiValued="false" indexed="true" />
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_drm" stored="false" type="daterange" multiValued="true" indexed="true" />
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_drs" stored="true" type="daterange" multiValued="false" indexed="true" />
<!-- *** This dynamicField is used by Sunspot! *** -->
<dynamicField name="*_drms" stored="true" type="daterange" multiValued="true" indexed="true" />
<!-- Type used to index the lat and lon components for the "location" FieldType -->
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false" multiValued="false"/>
<dynamicField name="*_p" type="location" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="*_ll" stored="false" type="location" multiValued="false" indexed="true"/>
<dynamicField name="*_llm" stored="false" type="location" multiValued="true" indexed="true"/>
<dynamicField name="*_lls" stored="true" type="location" multiValued="false" indexed="true"/>
<dynamicField name="*_llms" stored="true" type="location" multiValued="true" indexed="true"/>
<field name="textSpell" stored="false" type="textSpell" multiValued="true" indexed="true"/>
<!-- required by Solr 4 -->
<field name="_version_" type="string" indexed="true" stored="true" multiValued="false" />
</fields>
<!-- Field to use to determine and enforce document uniqueness.
Unless this field is marked with required="false", it will be a required field
-->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>text</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>
<!-- copyField commands copy one field to another at the time a document
is added to the index. It's used either to index the same field differently,
or to add multiple fields to the same field for easier/faster
searching. -->
<!-- Use copyField to copy the fields you want to run spell checking
on into one field. For example: -->
<copyField source="*_text" dest="textSpell" />
<copyField source="*_s" dest="textSpell" />
</schema>
You don't say which fields you're searching, but if they're of the type "text", the analysis chain looks unsuitable for what you're trying to do. Indexing with an NGramTokenizer and only lowercasing will not give the results you expect when the query side uses the StandardTokenizer followed by stemming and word-delimiter splitting.
Create a new field with a simpler definition (probably the same for both index and query, for now) that consists of just a whitespace tokenizer or another, more standard tokenizer; see the reference manual for examples of how they differ. You'll probably want a LowerCaseFilter as well.
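A minimal sketch of such a simplified type, assuming you add it to the <types> section of the schema above. The names text_simple and *_simple are hypothetical. A single <analyzer> element with no type attribute is applied at both index and query time, which keeps the two sides symmetric:

```xml
<!-- Hypothetical simplified type: identical analysis at index and query time -->
<fieldType name="text_simple" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "Regelungs-App" stays one token -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Lowercase so "Für" and "für" produce the same term -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical dynamic field using the type; belongs in the <fields> section -->
<dynamicField name="*_simple" type="text_simple" indexed="true" stored="false" multiValued="true"/>
```

Note that the whitespace tokenizer leaves punctuation attached (the last token of the example sentence would be "sicherheit." with the period), so it is worth comparing it against the StandardTokenizer before settling on one. Either way, the data has to be reindexed after the change.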
You might run into issues with umlauts and other German-specific terms, but the ICU* family of filters and tokenizers handles international text better than the others. There's also a filter for splitting compound words into their components (you have the same issue as us Norwegians, where words are written together instead of being split up the English way).
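A hedged sketch of what that could look like. ICUTokenizerFactory and ICUFoldingFilterFactory live in Solr's analysis-extras contrib, so those jars have to be on the core's classpath (for example via <lib> directives in solrconfig.xml). The compound splitter used here is DictionaryCompoundWordTokenFilterFactory; the type name text_de_icu, the dictionary file name compounds-de.txt and the size values are assumptions to adapt, and the dictionary holds one word part per line:

```xml
<!-- Hypothetical German-oriented type; requires the analysis-extras (ICU) jars -->
<fieldType name="text_de_icu" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <!-- Lowercases and folds accents/umlauts, e.g. "Brandschutztür" -> "brandschutztur" -->
    <filter class="solr.ICUFoldingFilterFactory"/>
    <!-- Emits dictionary sub-words alongside the original compound token.
         Entries must be lowercase and folded, because this runs after folding. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="compounds-de.txt"
            minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
            onlyLongestMatch="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Splitting only at index time means a query for a single part can match the compound, while a query for the whole compound still matches the original token.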
The "Analysis" page in the Solr Admin UI is a great place to start debugging this: it shows you exactly which transformations are applied on both the index and the query side, so you can see what the terms look like at each step and why they don't match.
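The Analysis page is backed by Solr's field analysis handler, which can also be called directly. A minimal sketch, assuming your solrconfig.xml does not already register it under /analysis/field (many of the example configs do):

```xml
<!-- solrconfig.xml: field analysis handler used by the Admin UI "Analysis" page -->
<requestHandler name="/analysis/field"
                startup="lazy"
                class="solr.FieldAnalysisRequestHandler"/>
```

A request along the lines of /solr/<core>/analysis/field?analysis.fieldtype=text&analysis.fieldvalue=Brandschutztür&analysis.query=für&wt=json (URL-encoded, with <core> replaced by your core name) should return the tokens produced at each stage of both the index and the query chain.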
For Latin accented characters, use
<filter class="solr.ASCIIFoldingFilterFactory"/>
This folds every token to its plain ASCII form, removing the accents (for example, "für" becomes "fur").
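A sketch of where the folding filter fits in a chain. It has to run at both index and query time; otherwise the indexed "fur" never meets a query for "für". Because the question says the text is stored with HTML entities, this sketch also adds solr.HTMLStripCharFilterFactory (an assumption, not part of the answer above), which removes HTML markup and decodes character entities before tokenizing. The type name text_folded is hypothetical:

```xml
<!-- Hypothetical type: decode HTML entities, then fold accents to ASCII -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Strips markup and decodes entities such as &uuml; to real characters -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "für" -> "fur", "Außenwand" -> "aussenwand" -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```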
Your text type looks quite unbalanced on the query side. Try something like this to start with, then reindex your data (I'm currently working on a French site). I don't think PorterStemFilterFactory is a good fit for German, since it is an English stemmer; there are other stemmers that work better:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="40" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
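As one example of a German-aware alternative to the Porter stemmer, Lucene ships solr.GermanNormalizationFilterFactory and solr.GermanLightStemFilterFactory. A sketch of a symmetric type using them (the name text_de is hypothetical; solr.SnowballPorterFilterFactory with language="German" is another option):

```xml
<!-- Hypothetical German type: same chain at index and query time, no n-grams -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds ä/ö/ü to a/o/u and ß to ss; also normalizes ae/oe/ue spellings -->
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <!-- Light German stemmer, less aggressive than Snowball -->
    <filter class="solr.GermanLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Without n-grams this type matches whole (stemmed) words rather than arbitrary substrings, so it is worth deciding which behaviour the site needs before replacing the n-gram type.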
Related
I'm wondering if there's a way, using Apache Jena or the OWL API, to retrieve an ontology's individuals by a given data property, and then to match the relations between those individuals through the object properties they have in common.
EDIT: Here's a sample of the CSV file
,California,Texas,New York,Alabama
Hillary Clinton,69%,31%,33%,67%
Donald Trump,31%,69%,67%,33%
And this is a simple domain ontology created with Protégé:
<?xml version="1.0"?>
<rdf:RDF xmlns="http://www.kdm.com/OWL/elections2016#"
xml:base="http://www.kdm.com/OWL/elections2016"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:owl="http://www.w3.org/2002/07/owl#"
xmlns:xml="http://www.w3.org/XML/1998/namespace"
xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:elections2016="http://www.kdm.com/OWL/elections2016#">
<owl:Ontology rdf:about="http://www.kdm.com/OWL/elections2016#"/>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Object Properties
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://www.kdm.com/OWL/elections2016#HasVote -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#HasVote">
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Votes"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#HasVoteByClasseSociale -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#HasVoteByClasseSociale">
<rdfs:subPropertyOf rdf:resource="http://www.kdm.com/OWL/elections2016#HasVote"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#ClasseSociale"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#HasVoteByPeriode -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#HasVoteByPeriode">
<rdfs:subPropertyOf rdf:resource="http://www.kdm.com/OWL/elections2016#HasVote"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#HasVoteByRegion -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#HasVoteByRegion">
<rdfs:subPropertyOf rdf:resource="http://www.kdm.com/OWL/elections2016#HasVote"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Region"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#HasVoteByVotingAge -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#HasVoteByVotingAge">
<rdfs:subPropertyOf rdf:resource="http://www.kdm.com/OWL/elections2016#HasVote"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#VotingAge"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#hasNomineeDemocratic -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#hasNomineeDemocratic">
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Democratic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#hasNomineeRepublic -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#hasNomineeRepublic">
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Republic"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#hasPartone -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#hasPartone">
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Political_parties"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Democratic"/>
</owl:ObjectProperty>
<!-- http://www.kdm.com/OWL/elections2016#haspartytwo -->
<owl:ObjectProperty rdf:about="http://www.kdm.com/OWL/elections2016#haspartytwo">
<rdfs:domain rdf:resource="http://www.kdm.com/OWL/elections2016#Political_parties"/>
<rdfs:range rdf:resource="http://www.kdm.com/OWL/elections2016#Republic"/>
</owl:ObjectProperty>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Data properties
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://www.kdm.com/OWL/elections2016#age -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#age"/>
<!-- http://www.kdm.com/OWL/elections2016#asset -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#asset"/>
<!-- http://www.kdm.com/OWL/elections2016#currentLocation -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#currentLocation"/>
<!-- http://www.kdm.com/OWL/elections2016#name -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#name"/>
<!-- http://www.kdm.com/OWL/elections2016#occupation -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#occupation"/>
<!-- http://www.kdm.com/OWL/elections2016#spouse -->
<owl:DatatypeProperty rdf:about="http://www.kdm.com/OWL/elections2016#spouse"/>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Classes
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://www.kdm.com/OWL/elections2016#ClasseSociale -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#ClasseSociale">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Votes"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Democratic -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Democratic">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Political_parties"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Nominee_democratic -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Nominee_democratic">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Democratic"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Nominee_republic -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Nominee_republic">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Republic"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Periode -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Periode">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Votes"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Political_parties -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Political_parties"/>
<!-- http://www.kdm.com/OWL/elections2016#Region -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Region">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Votes"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Republic -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Republic">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Political_parties"/>
</owl:Class>
<!-- http://www.kdm.com/OWL/elections2016#Votes -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#Votes"/>
<!-- http://www.kdm.com/OWL/elections2016#VotingAge -->
<owl:Class rdf:about="http://www.kdm.com/OWL/elections2016#VotingAge">
<rdfs:subClassOf rdf:resource="http://www.kdm.com/OWL/elections2016#Votes"/>
</owl:Class>
<!--
///////////////////////////////////////////////////////////////////////////////////////
//
// Individuals
//
///////////////////////////////////////////////////////////////////////////////////////
-->
<!-- http://www.kdm.com/OWL/elections2016#Alabama -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Alabama">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Region"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#April -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#April">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#August -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#August">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Between_18_and_49 -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Between_18_and_49">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#VotingAge"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#December -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#December">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#DemocraticNominee -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#DemocraticNominee">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Democratic"/>
<name>HillaryClinton</name>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#DonaldTrump -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#DonaldTrump">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_republic"/>
<age rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">70</age>
<asset>4,500,000,000</asset>
<currentLocation>NewYork</currentLocation>
<occupation>Businessman</occupation>
<spouse>MelaniaKnauss</spouse>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Etudiant -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Etudiant">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#ClasseSociale"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#February -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#February">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#HillaryClinton -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#HillaryClinton">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Nominee_democratic"/>
<currentLocation>chicago</currentLocation>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Ingenieur -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Ingenieur">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#ClasseSociale"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#January -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#January">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#July -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#July">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#June -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#June">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#March -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#March">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#May -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#May">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#New_York -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#New_York">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Region"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#November -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#November">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#October -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#October">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Over_50 -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Over_50">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#VotingAge"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Professeur -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Professeur">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#ClasseSociale"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#RepublicNominee -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#RepublicNominee">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Republic"/>
<name>Donald Trump</name>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#September -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#September">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Periode"/>
</owl:NamedIndividual>
<!-- http://www.kdm.com/OWL/elections2016#Texas -->
<owl:NamedIndividual rdf:about="http://www.kdm.com/OWL/elections2016#Texas">
<rdf:type rdf:resource="http://www.kdm.com/OWL/elections2016#Region"/>
</owl:NamedIndividual>
</rdf:RDF>
E.g.: I'd like to get the individuals that have a data property value equal to "Hillary clinton"^^xsd:string and "Texas"^^xsd:string, and then match the relations between those individuals via their object properties.
It is possible. Here is a short preview of the three steps you need to make it work:
1. Push your CSV file into the ontology. Using the OWL API you will have to create a 'propertyAssertion' for each value; using Jena, you will have to create a 'statement' for each value.
2. Add a reasoner to your ontology and transform the pattern you want to "match" into a class expression (OWLClassExpression in the OWL API).
3. Ask the reasoner for the individuals, properties, and classes of the entities that "match" the class expression.
Here is a simple example of querying individuals through the OWL API reasoner interface:
// OWL is assumed to be the static helper class shipped with Pellet/Openllet
OWLClassExpression expr = OWL.min(OWL.DataProperty("http://myDataProperty"), 1);
NodeSet<OWLNamedIndividual> namedIndividuals = reasoner.getInstances(expr, false);
This simple example is a pattern matcher for individuals that have at least one value for the data property "http://myDataProperty".
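To make the two matching steps concrete without pulling in the OWL API or a reasoner, here is a library-free Java sketch over a hand-built list of triples. It is not how the OWL API or Jena does it (there is no reasoning or inference here), only the filtering idea: first find the individuals carrying the wanted data values, then find the object-property links between them. The data values come from the ontology above; the object property `hypotheticalNomineeOf` is made up for illustration, since the excerpt shows no object-property assertions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class TripleMatchSketch {
    // One assertion: subject, property, and either a literal value or another individual.
    static final class Triple {
        final String subject;
        final String property;
        final String object;
        final boolean literal; // true for data-property values, false for object-property links

        Triple(String subject, String property, String object, boolean literal) {
            this.subject = subject;
            this.property = property;
            this.object = object;
            this.literal = literal;
        }

        @Override
        public String toString() {
            return subject + " " + property + " " + object;
        }
    }

    // Step 1: individuals that have at least one data-property value in the wanted set.
    static Set<String> individualsWithValues(List<Triple> triples, Set<String> wanted) {
        Set<String> matches = new TreeSet<>();
        for (Triple t : triples) {
            if (t.literal && wanted.contains(t.object)) {
                matches.add(t.subject);
            }
        }
        return matches;
    }

    // Step 2: object-property links whose two ends are both among the matched individuals.
    static List<Triple> linksBetween(List<Triple> triples, Set<String> individuals) {
        List<Triple> links = new ArrayList<>();
        for (Triple t : triples) {
            if (!t.literal && individuals.contains(t.subject) && individuals.contains(t.object)) {
                links.add(t);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple("DemocraticNominee", "name", "HillaryClinton", true),
            new Triple("RepublicNominee", "name", "Donald Trump", true),
            new Triple("DonaldTrump", "currentLocation", "NewYork", true),
            new Triple("HillaryClinton", "currentLocation", "chicago", true),
            // Hypothetical object property, not part of the ontology above.
            new Triple("DemocraticNominee", "hypotheticalNomineeOf", "HillaryClinton", false));

        Set<String> matched = individualsWithValues(triples, Set.of("HillaryClinton", "chicago"));
        System.out.println(matched);                        // [DemocraticNominee, HillaryClinton]
        System.out.println(linksBetween(triples, matched)); // [DemocraticNominee hypotheticalNomineeOf HillaryClinton]
    }
}
```

With a real reasoner, step 1 becomes a class expression such as a `hasValue` restriction and the reasoner also returns inferred matches, which this plain filter cannot do.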
In my model I define an instance for the form resources.
I want to set the "xml:lang" value dynamically from an XPath expression stored in a variable.
I tried, but the xml:lang attribute of the resource element doesn't evaluate the variable; it is treated as a literal string instead (xml:lang="$language").
...
<xf:model id="fr-form-model" xxf:expose-xpath-types="true">
<xf:var name="language" value="de" as="xs:string" />
...
<xf:instance id="fr-form-resources" xxf:readonly="false">
<resources>
<!-- How "xml:lang" attribute can have dynamic value
from a variable/xpath instead of static string 'de' ? -->
<resource xml:lang="de">
<IntegerField>
<label>%translation.IntegerField%</label>
<IntegerField />
</IntegerField>
<cancel>
<label>%translation.cancel%</label>
<hint />
</cancel>
<ok>
<label>%translation.ok%</label>
<hint />
</ok>
</resource>
</resources>
</xf:instance>
</xf:model>
Is there any way to achieve this?
Using Orbeon 4.5
Use an xf:bind for this, placed inside the xf:model; something like:
<xf:bind
ref="instance('fr-form-resources')/resource/@xml:lang"
calculate="$language"/>
This is the general XForms mechanism for having a value in an instance automatically calculated and re-evaluated as users interact with the form.
When I was trying to add an image to an Orbeon form, I found that in some cases it works fine, and in others it does not.
For example, here is a simple form that uses a remote image by URL:
<xh:html xmlns:xh="http://www.w3.org/1999/xhtml" xmlns:xf="http://www.w3.org/2002/xforms"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ev="http://www.w3.org/2001/xml-events"
xmlns:xi="http://www.w3.org/2001/XInclude"
xmlns:xxi="http://orbeon.org/oxf/xml/xinclude"
xmlns:xxf="http://orbeon.org/oxf/xml/xforms"
xmlns:exf="http://www.exforms.org/exf/1-0"
xmlns:fr="http://orbeon.org/oxf/xml/form-runner"
xmlns:saxon="http://saxon.sf.net/"
xmlns:sql="http://orbeon.org/oxf/xml/sql"
xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:fb="http://orbeon.org/oxf/xml/form-builder">
<xh:head>
<xh:title>Form with Image by URL</xh:title>
<xf:model id="fr-form-model" xxf:expose-xpath-types="true">
<!-- Main instance -->
<xf:instance id="fr-form-instance" xxf:exclude-result-prefixes="#all">
<form>
<section-1>
<control-3>http://media2.giphy.com/avatars/aap/gjQXEptJHq99.gif</control-3>
</section-1>
</form>
</xf:instance>
<!-- Bindings -->
<xf:bind id="fr-form-binds" ref="instance('fr-form-instance')">
<xf:bind id="section-1-bind" name="section-1" ref="section-1">
<xf:bind id="control-3-bind" ref="control-3" name="control-3" type="xf:anyURI"/>
</xf:bind>
</xf:bind>
<!-- Metadata -->
<xf:instance xxf:readonly="true" id="fr-form-metadata" xxf:exclude-result-prefixes="#all">
<metadata>
<application-name>UrlImage</application-name>
<form-name>UrlImage</form-name>
<title xml:lang="en">Form with Image by URL</title>
<description xml:lang="en"/>
<singleton>false</singleton>
</metadata>
</xf:instance>
<!-- Attachments -->
<xf:instance id="fr-form-attachments" xxf:exclude-result-prefixes="#all">
<attachments>
<css mediatype="text/css" filename="" size=""/>
<pdf mediatype="application/pdf" filename="" size=""/>
</attachments>
</xf:instance>
<!-- All form resources -->
<!-- Don't make readonly by default in case a service modifies the resources -->
<xf:instance id="fr-form-resources" xxf:readonly="false" xxf:exclude-result-prefixes="#all">
<resources>
<resource xml:lang="en">
<section-1>
<label>Untitled Section</label>
</section-1>
<control-3>
<label>This is a remote image</label>
</control-3>
</resource>
</resources>
</xf:instance>
<!-- Utility instances for services -->
<xf:instance id="fr-service-request-instance" xxf:exclude-result-prefixes="#all">
<request/>
</xf:instance>
<xf:instance id="fr-service-response-instance" xxf:exclude-result-prefixes="#all">
<response/>
</xf:instance>
</xf:model>
</xh:head>
<xh:body>
<fr:view>
<fr:body xmlns:xbl="http://www.w3.org/ns/xbl"
xmlns:oxf="http://www.orbeon.com/oxf/processors"
xmlns:p="http://www.orbeon.com/oxf/pipeline">
<fr:section id="section-1-control" bind="section-1-bind">
<xf:label ref="$form-resources/section-1/label"/>
<fr:grid>
<xh:tr>
<xh:td>
<xf:output id="control-3-control" bind="control-3-bind" mediatype="image/*">
<xf:label ref="$form-resources/control-3/label"/>
<!-- No hint? -->
<xf:alert ref="$fr-resources/detail/labels/alert"/>
</xf:output>
</xh:td>
</xh:tr>
</fr:grid>
</fr:section>
</fr:body>
</fr:view>
</xh:body>
</xh:html>
The important part is the URL http://media2.giphy.com/avatars/aap/gjQXEptJHq99.gif, which works fine. But if you use a secure connection such as https://media2.giphy.com/avatars/aap/gjQXEptJHq99.gif, Orbeon is not able to obtain the image.
I am not sure, but maybe the problem is similar to this one: Trusting all certificates using HttpClient over HTTPS.
Is this something that can be overridden in the configuration?
Can I use an image from an HTTPS site by URL without adding the certificate for each server to my Java store?
There are properties to configure that, but you have to be careful, because in general you really shouldn't trust all certificates!
Your example of https://media2.giphy.com/avatars/aap/gjQXEptJHq99.gif works from Chrome without any warning or error, so I would expect it to work from the JVM as well. Maybe the JVM is not configured with the same set of CAs as the browser, in which case the JVM can be configured to add some, although it's a bit tricky.
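One way to test the "JVM lacks the CA" theory is to list what the JVM's default trust store actually contains and look for the CA that issued the image host's certificate (the browser shows the issuer in the certificate details). The sketch below uses only standard JSSE classes. It inspects the JVM default store (cacerts, or the file named by javax.net.ssl.trustStore); whether Orbeon uses that store or the one set via oxf.http.ssl.keystore.uri depends on your configuration, which this sketch does not check.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class TrustedCaList {
    // Returns the subject names of every CA the JVM's default trust store accepts.
    static List<String> defaultTrustedIssuers() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null); // null means: load the JVM default trust store
        List<String> names = new ArrayList<>();
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate ca : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    names.add(ca.getSubjectX500Principal().getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        List<String> names = defaultTrustedIssuers();
        System.out.println(names.size() + " trusted CAs");
        // Optional filter: pass part of the issuer name, e.g. the CA shown by the browser.
        for (String n : names) {
            if (args.length == 0 || n.contains(args[0])) {
                System.out.println(n);
            }
        }
    }
}
```

If the issuer is missing, importing that one CA into the trust store Orbeon actually uses keeps certificate verification on for every other host, which is safer than trusting all certificates.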
This behaviour is probably due to Orbeon not using the image URL directly, but doing intermediate processing in the Orbeon "server-media".
OK, it seems that in the end it only works if I add the certificate to the Orbeon keystore defined in oxf.http.ssl.keystore.uri. But this is only viable if I know the servers the images will be linked from; it is not a valid solution for linking any image from any server. Java allows disabling this check. Is that possible in Orbeon?
I use DSE 3.2.0. When I try to index a DateType column in Solr (the system time zone is GMT+3), I get the following Solr exception:
org.apache.solr.common.SolrException: org.apache.solr.common.SolrException: Invalid Date String:'2013-10-10 23:59:59+0300'
at com.datastax.bdp.cassandra.index.solr.CassandraDirectUpdateHandler2.deleteByQuery(CassandraDirectUpdateHandler2.java:230)
at com.datastax.bdp.cassandra.index.solr.AbstractSolrSecondaryIndex.doDelete(AbstractSolrSecondaryIndex.java:628)
at com.datastax.bdp.cassandra.index.solr.Cql3SolrSecondaryIndex.updateColumnFamilyIndex(Cql3SolrSecondaryIndex.java:138)
at com.datastax.bdp.cassandra.index.solr.AbstractSolrSecondaryIndex$3.run(AbstractSolrSecondaryIndex.java:896)
at com.datastax.bdp.cassandra.index.solr.concurrent.IndexWorker.run(IndexWorker.java:38)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'2013-10-10 23:59:59+0300'
at org.apache.solr.schema.DateField.parseMath(DateField.java:182)
at org.apache.solr.analysis.TrieTokenizer.reset(TrieTokenizerFactory.java:135)
at org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:409)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:959)
at org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:574)
at org.apache.solr.parser.SolrQueryParserBase.handleQuotedTerm(SolrQueryParserBase.java:779)
Schema below:
<schema name="mach" version="1.1">
<types>
<fieldType name="string" class="solr.StrField"/>
<fieldType name="int" class="solr.TrieIntField"/>
<fieldType name="date" class="solr.TrieDateField"/>
</types>
<fields>
<field name="snapshot_date" type="date" indexed="true" stored="true"/>
<field name="account_id" type="string" indexed="true" stored="true"/>
<field name="account_type" type="string" indexed="true" stored="true" />
</fields>
<uniqueKey>(snapshot_date, account_id)</uniqueKey>
<defaultSearchField>account_id</defaultSearchField>
</schema>
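Solr uses a particular subset of the ISO 8601 date format: YYYY-MM-DDThh:mm:ssZ, optionally with fractional seconds (ss.tttZ) at the end, with trailing zeros suppressed. Only UTC is supported (the "Z" suffix), so a value with a space instead of "T" and a numeric offset such as +0300 is rejected.
So your value of "2013-10-10 23:59:59+0300" should be expressed as "2013-10-10T20:59:59Z" (23:59:59 at UTC+3 is 20:59:59 UTC).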
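If you build the documents yourself, the conversion can be done with java.time before the value is sent to Solr. A minimal sketch, assuming the input always looks like the value in the error message (date, space, time, offset written as +HHMM):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class SolrDateFormat {
    // Matches values such as "2013-10-10 23:59:59+0300"; a single 'Z' parses an offset like +0300.
    private static final DateTimeFormatter INPUT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssZ");

    // Converts to Solr's UTC form, e.g. "2013-10-10T20:59:59Z".
    static String toSolrDate(String value) {
        OffsetDateTime withOffset = OffsetDateTime.parse(value, INPUT);
        // Instant.toString() prints ISO-8601 in UTC with a trailing 'Z' and omits a zero fraction.
        return withOffset.toInstant().toString();
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate("2013-10-10 23:59:59+0300")); // 2013-10-10T20:59:59Z
    }
}
```

Converting through an Instant handles day and year rollover for free, e.g. 01:00 at +0300 on January 1 becomes 22:00 UTC on December 31 of the previous year.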
This is a bug affecting reindexing of deleted rows, and will be fixed in DSE 3.2.3.
A Sunspot model that indexes the string "Ordoñez" with:
text :lastname
and is then searched with:
User.solr_search do
keywords 'Ordonez'
end
returns 0 results.
How can I index the string "Ordoñez" with Solr and get a match when the search is performed with either
keywords 'Ordonez' or keywords 'Ordoñez'?
I have tried the ASCIIFoldingFilter at index time, but this did not do the job.
Here's what I did to try to make this work.
You probably need to add the handling on the container side as well.
You can check the FAQ entry "Why don't International Characters Work?".
My problem was these three fields, which happened to be unused:
<field name="firstname_text" type="textgen" stored="false" multiValued="true" indexed="true"/>
<field name="lastname_text" type="textgen" stored="false" multiValued="true" indexed="true"/>
<field name="specialty_text" type="textgen" stored="false" multiValued="true" indexed="true"/>
I'm not sure why, but as soon as I removed them, the ASCII folding filter started working.
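The ASCIIFoldingFilterFactory does do the job:
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<!-- SynonymFilterFactory requires a synonyms file; the file name here is assumed -->
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
</analyzer>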
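To see what folding does to the two spellings, here is a small Java approximation using java.text.Normalizer: decompose to NFD, then strip the combining marks. This is not Lucene's ASCIIFoldingFilter, which uses its own mapping table and also covers characters with no Unicode decomposition (for example "ø"), but for "ñ" and the German umlauts it shows the same effect: both spellings reduce to one term.

```java
import java.text.Normalizer;

public class FoldingSketch {
    // Approximates accent folding: NFD turns "ñ" into "n" + combining tilde,
    // then every combining mark (Unicode category M) is removed.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String accented = "Ordo\u00f1ez"; // "Ordoñez", escaped to avoid source-encoding issues
        System.out.println(fold(accented));                         // Ordonez
        System.out.println(fold(accented).equals(fold("Ordonez"))); // true
    }
}
```

A single `<analyzer>` without a `type` attribute is used for both indexing and querying, so 'Ordonez' and 'Ordoñez' typed by a user fold to the same term as the indexed text. After changing index-time analysis, the documents have to be reindexed.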