FullTextIndex in NexusDB, How to tokenize the searchstring

FullTextIndex in NexusDB, How to tokenize the searchstring - delphi

We are using NexusDB for a small database. We have a table with a FulltextIndex defined on it.
The index is configured with the following options:
Character separator
ccPunctuationDash
ccPunctuationOther
The user enters a search text in an edit box, and then an SQL statement is constructed with the following WHERE clause (%s substituted with the Editbox.text of course):
WHERE CONTAINS(FullIdx, ''%s'')
When the user enters multiple words in the editbox this goes wrong as the two separate words should have been embedded in the WHERE clause like this:
WHERE CONTAINS(FullIdx, 'word1' and 'word2')
So i have to parse the textbox value, scan it for spaces and split the text at those points. That made me wonder if it was possible to parse the search text for every setting of the fulltextindex, using the actual definition of the fulltextindex to create the correct where clause.
So if ccPunctuationDash is enabled in the FulltextIndex definition, than the search text is also split on a '-'.
If you think of it, it is exactly the same process as when the index is created and all strings are tokenized ...
My question: what is the easiest way of tokenizing a searchstring according to the settings of a FUlltextIndex?

The easiest way is... to create an empty #temporary table with a string field, with the same fulltext index settings as your real table. Set the TnxTable.Options to include dsoAddKeyAsVariantField. Load the string to tokenize into the string field, then view the table indexed by the fulltext index. Presto, you get an extra field displayed, which is the sorted tokens. You can now iterate over the table to read the tokens.

Related

Delphi - How to filter a ADOTable for strings that contain a substring

noob here. The question once again says it all, :).
I have a ADOTable that is connected to a dbgrid and .mdb file. I want to filter my table field "OwnerName" for all instances of a string that contain another substring and display them on the dbGrid. Each record has this string field "OwnerName". How do I do this?
Ex:
substring: 'J'
Strings: 'Jannie', Johanna, Ko-Ja etc..
If possible I also want to be able to filter for string that not only start with that exact wubstring, but contain it later in as well, as with my stupid example: Ko-Ja.
Regards!!!

The are just two properties you have to set: Filter and Filtered. In the first set the filter condition (similar to SQL) and the second is a boolean stating whether to apply the filter or not.
Example:
YourADOTable.Filter := 'OwnerName LIKE ''%J%''';
YourADOTable.Filtered := True;
The %s in '%J%' means 'anything'.. so this way you are filtering for records which has in the OwnerName the text 'anything followed by a J and then anything again'.
Once you apply the filter, the dbGrid updates automatically.
You can find more info on Filter String at:
http://docwiki.embarcadero.com/Libraries/Sydney/en/Data.DB.TDataSet.Filter

How do i remove rows based on comma-separated list of values in a Power BI parameter in Power Query?

I have a list of data with a title column (among many other columns) and I have a Power BI parameter that has, for example, a value of "a,b,c". What I want to do is loop through the parameter's values and remove any rows that begin with those characters.
For example:
Title
a
b
c
d
Should become
Title
d
This comma separated list could have one value or it could have twenty. I know that I can turn the parameter into a list by using
parameterList = Text.Split(<parameter-name>,",")
but then I am unsure how to continue to use that to filter on. For one value I would just use
#"Filtered Rows" = Table.SelectRows(#"Table", each Text.StartsWith([key], <value-to-filter-on>))
but that only allows one value.
EDIT: I may have worded my original question poorly. The comma separated values in the parameterList can be any number of characters (e.g.: a,abcd,foo,bar) and I want to see if the value in [key] starts with that string of characters.

Try using List.Contains to check whether the starting character is in the parameter list.
each List.Contains(parameterList, Text.Start([key], 1)
Edit: Since you've changed the requirement, try this:
Table.SelectRows(
#"Table",
(C) => not List.AnyTrue(
List.Transform(
parameterList,
each Text.StartsWith(C[key], _)
)
)
)
For each row, this transforms the parameterList into a list of true/false values by checking if the current key starts with each text string in the list. If any are true, then List.AnyTrue returns true and we choose not to select that row.

Since you want to filter out all the values from the parameter, you can use something like:
= Table.SelectRows(#"Changed Type", each List.Contains(Parameter1,Text.Start([Title],1))=false)
Another way to do this would be to create a custom column in the table, which has the first character of title:
= Table.AddColumn(#"Changed Type", "FirstChar", each Text.Start([Title],1))
and then use this field in the filter step:
= Table.SelectRows(#"Added Custom", each List.Contains(Parameter1,[FirstChar])=false)
I tested this with a small sample set and it seems to be running fine. You can test both and see if it helps with the performance. If you are still facing performance issues, it would probably be easier if you can share the pbix file.

This seems to work fairly well:
= List.Select(Source[Title], each Text.Contains(Parameter1,Text.Start(_,1))=false)
Replace Source with the name of your table and Parameter1 with the name of your Parameter.

Teradata:Get a specific part of a large variable text field

My first Post: (be kind )
PROBLEM: I need to extract the View Name from a Text field that contains a full SQL Statements so I can link the field a different data source. There are two text strings that always exist on both sides of the target view. I was hoping to use these as identifying "anchors" along with a substring to bring in the View Name text from between them.
EXAMPLE:
from v_mktg_dm.**VIEWNAME** as lead_sql
(UPPER CASE/BOLD is what I want to extract)
I tried using
SELECT
SUBSTR(SQL_FIELD,INSTR(SQL_FIELD,'FROM V_MKTG_TRM_DM.',19),20) AS PARSED_FIELD
FROM DATABASE.SQL_STORAGE_DATA
But am not getting good results -
Any help is appreciated

You can apply a Regular Expression:
RegExp_Substr_gpl(SQL_FIELD, '(v_mktg_dm\.)(.*?)( AS lead_sql)',1,1,'i',2)
This looks for the string between 'v_mktg_dm.' and ' AS lead_sql'.
RegExp_Substr_gpl is an undocumented variation of RegExp_Substr which simplifies the syntax for ignoring parts of the match

Postgresql text searching, matching multiple words

I don't know the name for this kind of search, but I see that it's getting pretty common.
Let's say I have records with the following file names:
'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'
If I search with the following string 'oc' I would like the result to return 'orders_controller_spec.rb' due to match the o in orders and the c in controller.
If the string is 'os' then I'd like all 3 to match, 'order_spec.rb', 'order.sass', 'orders_controller_spec.rb'.
If the string is 'oco' then I'd like 'orders_controller_spec.rb'
What is the name for this kind of search and how would I go about getting this done in Postgresql?

This is a called a subsequence search. One simple way to do it in Postgres is to use the LIKE operator (or several of the other options in those docs) and fill the spaces between your letters with a wildcard, which for LIKE is %. To match anything with an o followed by an s in the words column, that would look like this:
SELECT * FROM table WHERE words LIKE '%o%s%';
This is a relatively expensive search, but you can improve performance with a varchar_pattern_ops or text_pattern_ops index to support faster pattern matching.
CREATE INDEX pattern_index ON table (words varchar_pattern_ops);

Neo4j - search like query with non english characters

Is there an option in neo4j to write a select query with where clause, that ignores non-latin characters ?
MATCH (places:Place)
WHERE (places.name =~ '.*(?ui)Fabergé.*')
RETURN places
I have place with Fabergé name in graph and i want to find it when user type Fabergé or Faberge without this special character.

I'm not aware of an easy way to do this directly with a regex match in Cypher.
One possible workaround is to store the string in question in a normalized form in a second property e.g. place.name_normalized and then compare it with the normalized search string. Of course normalization needs to be done on client side, see another SO question on how to achive this: Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart

FullTextIndex in NexusDB, How to tokenize the searchstring - delphi

Related

Delphi - How to filter a ADOTable for strings that contain a substring

How do i remove rows based on comma-separated list of values in a Power BI parameter in Power Query?

Teradata:Get a specific part of a large variable text field

Postgresql text searching, matching multiple words

Neo4j - search like query with non english characters

Categories

Resources