Word separators for Postgres full text search with Rails - ruby-on-rails

I'm using pg_search for some text searching within my model. Among other attributes, I have an url field.
Unfortunately, Postgres doesn't seem to treat / and . as word separators, so I cannot search within the URL.
Example: searching for test in http://test.com yields no results.
Is there a way to fix this, perhaps with another gem or some inline SQL?

As stated in the documentation (and noted by AJcodez), one solution is to create a dedicated column for the tsvector index, then define a trigger that populates it on insert so URLs are indexed properly:
CREATE TABLE test_url (id serial PRIMARY KEY, url varchar NOT NULL, url_tsvector tsvector NOT NULL);
This function transforms any run of non-word characters into a single space and turns the string into a tsvector:
CREATE OR REPLACE FUNCTION generate_url_tsvector(varchar)
RETURNS tsvector
LANGUAGE sql
AS $_$
SELECT to_tsvector(regexp_replace($1, '[^\w]+', ' ', 'gi'));
$_$;
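To see what the replacement buys you, the same transformation can be sketched in plain Ruby (purely illustrative; Postgres does the real work in the SQL function above):

```ruby
# Mirror of regexp_replace(url, '[^\w]+', ' ', 'gi'): every run of
# non-word characters collapses to a single space, so the URL splits
# into plain words that to_tsvector can then index.
url = 'http://test.com'
words = url.gsub(/[^\w]+/, ' ').strip.split
puts words.inspect  # => ["http", "test", "com"]
```

Once the URL is reduced to those tokens, a search for "test" matches as expected.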
Now create a trigger that calls this function:
CREATE OR REPLACE FUNCTION before_insert_test_url()
RETURNS TRIGGER
LANGUAGE plpgsql AS $_$
BEGIN
NEW.url_tsvector := generate_url_tsvector(NEW.url);
RETURN NEW;
END;
$_$;
CREATE TRIGGER before_insert_test_url_trig
BEFORE INSERT ON test_url
FOR EACH ROW EXECUTE PROCEDURE before_insert_test_url();
Now, when URLs are inserted, the `url_tsvector` field will be automatically populated.
INSERT INTO test_url (url) VALUES ('http://www.google.fr');
TABLE test_url;
 id |         url          |           url_tsvector
----+----------------------+-----------------------------------
  2 | http://www.google.fr | 'fr':4 'googl':3 'http':1 'www':2
(1 row)
To full-text search on URLs, you only need to query against this field.
SELECT * FROM test_url WHERE url_tsvector @@ 'google'::tsquery;

I ended up modifying the pg_search gem to support arbitrary ts_vector expressions instead of just column names.
The changes are here
Now I can write:
pg_search_scope :search,
                against: [[:title, 'B'], ["to_tsvector(regexp_replace(url, '[^\\w]+', ' ', 'gi'))", 'A']],
                using: { tsearch: { dictionary: "simple" } }

A slightly simpler approach: add the protocol token type to the simple configuration:
ALTER TEXT SEARCH CONFIGURATION simple
ADD MAPPING FOR protocol
WITH simple;
You can also add it to the english configuration if you need stemming.
https://www.postgresql.org/docs/13/textsearch-parsers.html
https://www.postgresql.org/docs/13/sql-altertsconfig.html

Related

How to convert a ruby text paragraph to string?

My application allows the administrator to define SQL template scripts that users can apply to data structures. The templates are stored in a database text field and are merged with the data structure metadata.
A template statement looks like the following:
CREATE TABLE #{@target.code} (
#{@target.getColList('ORACLE', 'typed')}
)
and I plan to evaluate it to get the desired SQL query:
CREATE TABLE DSD_TEST (
AHVN13_1 number,
ALTER_MT_ERFUELLT_1 number,
CANTONS_2_1 varchar2(32),
POPULATION_TYPE_1 varchar2(32)
)
but actually the template statement (from the Postgres text field) returns the following:
"CREATE TABLE \#{#target.code} ( \r\n\#{#target.getColList('ORACLE', 'typed')} \r\n)"
I found out that I can get rid of the newlines using task.statement.gsub("\r\n", ''), but I cannot get rid of the special character '\'.
How can I build this template so it can be evaluated?
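No answer is shown here, but one common way around this is to store the templates as ERB rather than relying on `#{}` interpolation, which only happens inside Ruby source literals, never in strings loaded from a database. A minimal sketch, with a hypothetical `Target` struct standing in for the question's `@target` object:

```ruby
require 'erb'

# Hypothetical stand-in for the @target metadata object from the question.
Target = Struct.new(:code, :col_list)

# Template stored as plain text in the database, using ERB tags instead of #{}.
template = "CREATE TABLE <%= target.code %> (\n<%= target.col_list %>\n)"

target = Target.new('DSD_TEST', 'AHVN13_1 number')
sql = ERB.new(template).result_with_hash(target: target)
puts sql
```

Because ERB renders at call time, there is no backslash-escaped `\#{` residue to strip out of the stored text.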

Avoiding quotes in a map variable

I'm trying to create a query with variables from a map function, but one of these fields can contain ' (a single quote, as in Barney's). So the statement breaks every time, since the ' ends the string literal. How can I get around it?
I tried to use the .split function but had no success.
No worries about SQL Injection since I'm just loading data from an API to my db.
Code:
query_values = activities.map do |activity|
  '(' +
  "#{activity['id']},
  ""'#{activity['type']}""'" # using ""' just to fill the column when empty cells are raised
  + ')'
end
query = "INSERT INTO pd_activities VALUES #{query_values.join(', ')}"
Thanks in advance.
How to do this properly is listed on the cheat sheet:
db[:pd_activities].insert(
id: activity['id'],
type: activity['type']
)
This takes care of all the escaping issues for you. If those two keys are all activity has, you might even be able to do this:
db[:pd_activities].insert(activity)
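If you do end up building literals by hand anyway, the standard SQL escape for a single quote inside a string is to double it. A minimal hand-rolled sketch (the insert call above is still the safer route):

```ruby
# Quote a Ruby value as a SQL string literal: NULL for nil, and any
# embedded single quote doubled ('' is the SQL escape for ').
def sql_literal(value)
  return 'NULL' if value.nil?
  "'#{value.to_s.gsub("'", "''")}'"
end

puts sql_literal("Barney's")  # => 'Barney''s'
puts sql_literal(nil)         # => NULL
```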

How to get all parameter names along with their values in stored procedure which is being executed

I'm using SQL Server 2012; is there any way to get all the parameters of a stored procedure along with the values passed to it?
I need these to build an XML document. This should happen inside the procedure being executed, and it should work generically for all procedures.
For example, suppose we have two procedures:
uspSave @name='test', @age=20
uspDelete @id=2
Now, in uspSave I need to get @name and @age with the values 'test' and 20, and in uspDelete I should get @id with the value 2.
For getting the parameter names, I tried this:
select parameter_name
from information_schema.PARAMETERS
where specific_name = OBJECT_NAME(@@PROCID)
Now, is it possible to loop through the result of the above query and get the values?
I think your best bet would be to use some code generation to generate the code block you require.
i.e.:
1. Create your sproc without the code to XML-ify the parameters.
2. Knock up a quick script (could be done in T-SQL) to construct the sproc-specific block of T-SQL that converts the parameters into XML, using INFORMATION_SCHEMA.PARAMETERS.
3. Copy that bit of auto-generated script into your sproc.
The way you were thinking of with dynamic SQL wouldn't work because of scope - the parameters would not be accessible within the dynamically generated SQL; you'd need to pass them in as args via sp_executesql, which puts you back at square one.
e.g.
DECLARE @someval int = 7
EXECUTE('SELECT @someval') -- @someval is not in scope
So, if it will help save time, then code gen looks like your best bet.
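The generation step itself is simple once the parameter names are read out of INFORMATION_SCHEMA.PARAMETERS. A sketch of such a generator in Ruby (the helper name and the XML shape are hypothetical; adjust to taste):

```ruby
# Given a sproc name and its parameter names (as you would read them from
# INFORMATION_SCHEMA.PARAMETERS), emit the sproc-specific T-SQL block that
# serializes those parameters to XML. Paste the output into the sproc.
def xmlify_params(proc_name, param_names)
  selects = param_names
    .map { |p| "    #{p} AS [#{p.delete('@')}]" }
    .join(",\n")
  <<~SQL
    DECLARE @params xml;
    SET @params = (SELECT
    #{selects}
      FOR XML PATH('#{proc_name}'));
  SQL
end

puts xmlify_params('uspSave', ['@name', '@age'])
```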

Thinking sphinx word match

The data I have is:
kiran@test.com - first record
kiran1@test.com - second record
I need to search using the email address. I have forums and users indexed in my web app.
First scenario
I kept the '@' symbol in the charset table, which works fine; the problem is that, for example, if the search keyword is 'kiran@test.com' it gives me the exact result, but if I use only 'test', no results are found.
Second scenario
If I don't keep the '@' symbol in the charset table: if I use 'kiran@test.com' I get both records, and for 'test' I also get both records.
Expected Scenario
If I use the entire email 'kiran@test.com' - I should get only the first record
If I use only 'test' - I should get both records
In plain MySQL, something like "select * from users where email like '%search-key%'"
I use the following code for searching
ThinkingSphinx.search params[:search_key], :star => Regexp.new('\w+@*\w+', nil, 'u') (I don't want to treat '@' as a separator)
Please suggest any options I can pass to achieve the expected result.
Thanks
Kiran
Take a look at blended char support
http://sphinxsearch.com/docs/current.html#conf-blend-chars
Or if you really want [ email like '%search-key%' ] style support, maybe min_infix_len (leaving . and @ in the charset table).
To search for full email you could use Phrase Search operator.
http://sphinxsearch.com/docs/current.html#extended-syntax
So, if you can determine that the search query is an email, use "phrase search"; otherwise use general search.
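For reference, the blended-chars suggestion is an index-level setting in sphinx.conf; a minimal sketch (the index name is hypothetical):

```
index users_idx
{
    # @ and . are indexed both as part of the token and as separators,
    # so 'kiran@test.com' matches as the whole email and as its parts
    blend_chars = @, .
}
```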

COPY CSV file with an ID column

I have a rails application and I am trying to load data into it via PostgreSQL's COPY command. The file is CSV. The table maps to a Rails model and I need to retain the primary-key ID column so I can use model.for_each to play with the data--the table will have 5M+ rows.
My question is, how can I structure my CSV so that I can load data into the table and still allow the ID column to be there? It's a bummer because the only reason I need it is for the for_each method.
Regardless, I tried sending NULLs, as in:
NULL,col1,col2,col3,col4, etc.
But it doesn't work.
Thanks.
Passing in NULL for the primary key will never work, no matter what options you have set for the null string. You would be telling the backend to set the primary key to NULL, which it will never allow, no matter what the insert command might be.
I really have no idea what you mean by retaining the primary key - that is something that is going to be retained no matter what you do. If you mean letting the DB pick the value for you, and the primary key is a serial (auto-increment), then explicitly name all the columns except the primary key:
COPY country (colA, colB, colC) FROM '/usr1/proj/bray/sql/country_data'; -- leave out pkey
It also might be quicker to read the documentation on the null string options instead of guessing possible values:
http://www.postgresql.org/docs/9.0/static/sql-copy.html
The default null string when using WITH CSV is an unquoted empty string, such as:
,col1,col2,col3,,col5
Which would create a record that looks like:
NULL,'col1','col2','col3',NULL,'col5'
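The CSV side of this is easy to get right from Ruby's standard library, which writes nil fields as exactly that unquoted empty string (the column values here are illustrative):

```ruby
require 'csv'

# nil fields come out as unquoted empty strings, which COPY ... WITH CSV
# reads back in as NULL.
rows = [
  [nil, 'col1', 'col2', 'col3', nil, 'col5'],
]
data = CSV.generate { |csv| rows.each { |r| csv << r } }
puts data  # => ,col1,col2,col3,,col5
```

For the primary key specifically, the point above still stands: leave the id column out of the COPY column list entirely rather than sending an empty field for it.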
