How to identify duplicate emails that differ only by dots (.) in a table in Ruby on Rails
E.g.:
user 1: testaccount#gmail.com
user 2: test.account#gmail.com
user 3: tes.t.account#gmail.com
user 4: test.a.ccount#gmail.com
Gmail treats all of these as the same account, because Gmail ignores dots in the username.
In PostgreSQL:
select distinct a from (select replace(adr,'.','') as a from t) t2;
gives you the set of unique normalized addresses;
select a,count(*) from (select replace(adr,'.','') as a from t) t2 group by a;
gives you how many times each normalized value occurs.
http://sqlfiddle.com/#!15/e893a2/3
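Note that replace(adr,'.','') removes every dot, including the one in the domain and dots in non-Gmail addresses, where they may distinguish different mailboxes. A minimal plain-Ruby sketch that strips dots only from the username of gmail.com addresses, assuming real `@` addresses (the `#` in the question appears to stand in for `@`) and that only gmail.com should get this treatment:

```ruby
# Normalize an address for duplicate detection: for gmail.com only, drop
# the dots in the username (the part before the last "@").
def normalize_email(email)
  local, at, domain = email.rpartition("@")
  return email if at.empty?                          # no "@": leave as-is
  return email unless domain.downcase == "gmail.com" # other domains keep dots
  "#{local.delete('.')}@#{domain}"
end

emails = %w[
  testaccount@gmail.com
  test.account@gmail.com
  tes.t.account@gmail.com
  first.last@example.com
]
# Count users per normalized address (Array#tally needs Ruby 2.7+)
counts = emails.map { |e| normalize_email(e) }.tally
```

Here the three Gmail variants collapse to one key, while first.last@example.com is left untouched.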
Email records that are duplicates once dots are ignored can be identified either directly in your SQL statement or in your Ruby application code.
Here's a simple query that returns every normalized email and the number of users associated with it:
User.group("replace(email,'.','')").count
which is translated to the following SQL:
SELECT COUNT(*) AS count_all, replace(email,'.','') AS replace_email FROM "users" GROUP BY replace(email,'.','')
and returns something like the following hash:
{"x#gmailcom"=>1, "da#gmailcom"=>2}
This indicates there are 2 users whose normalized email equals da#gmailcom.
Alternatively, you can use group_by in Ruby code:
User.all.group_by{ |u| u.email.gsub('.','') }
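group_by returns every group, including addresses used by a single user. A minimal sketch that keeps only the actual duplicates; a plain Struct stands in for the User model here, so the sample users are hypothetical:

```ruby
# Stand-in for the User model; only id and email matter here
User = Struct.new(:id, :email)

users = [
  User.new(1, "x@gmail.com"),
  User.new(2, "da@gmail.com"),
  User.new(3, "d.a@gmail.com")
]

# Same normalization as above: drop every "."
groups = users.group_by { |u| u.email.gsub(".", "") }

# Keep only normalized emails shared by two or more users, mapped to their ids
duplicate_ids = groups
  .select { |_email, members| members.size > 1 }
  .transform_values { |members| members.map(&:id) }
```

This loads every user into memory, so for large tables it is cheaper to filter in the database, e.g. by chaining `.having("count(*) > 1")` onto the `User.group(...)` query before `.count`.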
I have a data flow with a Union of two tables, whose result is then joined to another table. I keep receiving the following error when I try to debug the pipeline or preview the data:
DF-JOIN-002 at Join 'Join1'(Line 40/Col 26): Only 2 join condition(s) allowed
I'm basically trying to build a pipeline to automate this query:
SELECT DISTINCT k.acct_id, s.Id, Email, FirstName, LastName FROM table_3 s
INNER JOIN
( (SELECT acct_id, event_date FROM table_1)
UNION (SELECT acct_id, event_date FROM table_2)) k
ON k.acct_id = s.Archtics_acct_id__c
WHERE event_date = 'xxxx-xx-xx'
I figured it out after some time. I had to delete the Join activity and add it again: it was still linked to another source, even though only two sources were selected in the join settings.
I know precious little about Rails, so please excuse my naivete about this question.
I'm trying to modify a piece of code that I got from somewhere so that it executes for a randomly selected set of users. Here it is:
users = RedshiftRecord.connection.execute(<<~SQL
select distinct user_id
from tablename
order by random()
limit 1000
SQL
).to_a
sql = 'select user_id, count(*) from tablename where user_id in (?) group by user_id'
<Library>.on_replica(:something) do
Something::SomethingElse.
connection.
exec_query(sql, users.join(',')).to_h
end
This gives me the following error:
ActiveRecord::StatementInvalid: PG::SyntaxError: ERROR: syntax error at or near ")"
LINE 1: ...ount(*) from tablename where user_id in (?) group by...
^
users is an array; I know this because I executed the following and it printed true:
p users.instance_of? Array
Would someone please help me execute this code? I want to execute a simple SQL query that would look like this:
select user_id, count(*) from tablename where user_id in (user1,user2,...,user1000) group by user_id
The problem here is that IN takes a list of parameters. Using a single bind, IN (?), with a comma-separated string will not magically turn it into a list of arguments. That's just not how SQL works.
What you want is:
where user_id in (?, ?, ?, ...)
where the number of binds matches the length of the array you want to pass.
The simple but hacky way to do this is to interpolate n question marks into the SQL string:
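Building that placeholder list is plain string work. A small sketch with hypothetical helper names; the second helper shows PostgreSQL's own prepared-statement style, which numbers its parameters ($1, $2, ...) instead of using `?`:

```ruby
# One "?" per value, e.g. "?, ?, ?" for three values.
def question_mark_placeholders(count)
  raise ArgumentError, "IN () needs at least one value" if count.zero?
  Array.new(count, "?").join(", ")
end

# PostgreSQL-style numbered parameters, e.g. "$1, $2, $3".
def numbered_placeholders(count)
  raise ArgumentError, "IN () needs at least one value" if count.zero?
  (1..count).map { |i| "$#{i}" }.join(", ")
end

user_ids = [11, 42, 97] # hypothetical ids
sql = "select user_id, count(*) from tablename " \
      "where user_id in (#{numbered_placeholders(user_ids.length)}) " \
      "group by user_id"
```

The guard matters because an empty array would otherwise produce `in ()`, which is not valid SQL.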
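# execute(...).to_a gives one hash per row, e.g. {"user_id" => 42},
# so pull out the ids themselves
user_ids = users.map { |row| row["user_id"] }
# one "?" per id, e.g. "?,?,?"
binds = Array.new(user_ids.length, '?').join(',')
sql = <<~SQL
  select user_id, count(*)
  from tablename
  where user_id in (#{binds})
  group by user_id
SQL
# replace each "?" with the matching, properly quoted id
sql = ActiveRecord::Base.sanitize_sql_array([sql, *user_ids])
<Library>.on_replica(:something) do
  Something::SomethingElse.
    connection.
    exec_query(sql).rows.to_h # rows are [user_id, count] pairs
end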
But in a Rails app you would typically do this by creating a model and using the ActiveRecord query interface, or by using Arel to build the SQL query programmatically.
I want to get each user's info along with their last login date.
select * from Users as UU
inner join
(select user_id, max(d_login) from Logins group by user_id) as LL
on UU.user_id = LL.user_id
The join doesn't work in VFP (Visual FoxPro). Can't we join a table to a subquery here?
In VFP you need to use ALLTRIM() in the join; otherwise comparing two values could give undesirable results. Your code should look like this:
select * from Users as UU
inner join
(select user_id, max(d_login) from Logins group by user_id) as LL
on ALLT(UU.user_id) == ALLT(LL.user_id)
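The query does two things: aggregate Logins to one latest date per user, then join that result back to Users on trimmed keys. A plain-Ruby sketch of the same steps, assuming the problem ALLTRIM() guards against is trailing padding in fixed-width character fields; the rows below are hypothetical:

```ruby
require "date"

# Hypothetical rows. The Users keys carry trailing spaces, as values from a
# fixed-width character field would; the Logins keys do not.
users = [
  { user_id: "1   ", name: "Ann" },
  { user_id: "2   ", name: "Bob" },
  { user_id: "3   ", name: "Cy" }
]
logins = [
  { user_id: "1", d_login: Date.new(2020, 1, 5) },
  { user_id: "1", d_login: Date.new(2020, 3, 9) },
  { user_id: "2", d_login: Date.new(2020, 2, 1) }
]

# Like: select user_id, max(d_login) from Logins group by user_id
last_login = logins
  .group_by { |row| row[:user_id].strip }
  .transform_values { |rows| rows.map { |row| row[:d_login] }.max }

# Like: inner join ... on ALLT(UU.user_id) == ALLT(LL.user_id)
# (strip plays the role of ALLTRIM; users with no logins are dropped,
# as in an inner join; filter_map needs Ruby 2.7+)
result = users.filter_map do |user|
  date = last_login[user[:user_id].strip]
  user.merge(last_login: date) if date
end
```

Looking up the untrimmed key "1   " finds nothing, which is the kind of silent mismatch the trimming avoids.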
This assumes User_ID is normalised as the primary key in both tables (Users and Logins) and has the same field type and field length in each. I created the Users and Logins tables, tested this query with sample data, and it worked smoothly and gave exact results.
select
Users.User_id,
Users.Cell1,
Users.Name,
max(d_login)
from Logins, users
WHERE users.user_id = logins.user_id
GROUP BY
Users.user_id, Users.Cell1, Users.Name
I have the following statement:
Customer.where(city_id: cities)
which results in the following SQL statement:
SELECT customers.* FROM customers WHERE customers.city_id IN (SELECT cities.id FROM cities...
Is this intended behavior? Is it documented somewhere? I will not use the Rails code above, and will use one of the following instead:
Customer.where(city_id: cities.pluck(:id))
or
Customer.where(city: cities)
which results in the exact same SQL statement.
The AREL querying library allows you to pass in ActiveRecord objects as a shortcut. It then passes their primary key attributes into the SQL it sends to the database.
When looking for multiple objects, the AREL library will attempt to find the information in as few database round-trips as possible. It does this by holding the query you're making as a set of conditions, until it's time to retrieve the objects.
This way would be inefficient:
users = User.where(age: 30).to_a
# ^^^ get all these users from the database
memberships = Membership.where(user_id: users)
# ^^^^^ This will pass in each of the ids as a condition
Basically, this way would issue two SQL statements:
select * from users where age = 30;
select * from memberships where user_id in (1, 2, 3);
Each of these involves a round-trip over the network between the application and the database, with the data then passed back over that same connection.
This would be more efficient:
users = User.where(age: 30)
# This is still a query object, it hasn't asked the database for the users yet.
memberships = Membership.where(user_id: users)
# Note: this line is the same, but users is an AREL query, not an array of users
It will instead build a single, nested query so it only has to make a round-trip to the database once.
select * from memberships
where user_id in (
select id from users where age = 30
);
So, yes, it's expected behaviour. It's a bit of Rails magic, designed to improve your application's performance without you having to know how it works.
There are also some cool optimisations: for example, if you call first or last instead of all, it will only retrieve one record.
User.where(name: 'bob').all
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob'
User.where(name: 'bob').first
# SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' AND ROWNUM <= 1
Or, if you set an order and call last, it will reverse the order and then grab only the first record of the reversed list (instead of grabbing all the records and only giving you the last one).
User.where(name: 'bob').order(:login).first
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login) WHERE ROWNUM <= 1
User.where(name: 'bob').order(:login).last
# SELECT * FROM (SELECT "USERS".* FROM "USERS" WHERE "USERS"."NAME" = 'bob' ORDER BY login DESC) WHERE ROWNUM <= 1
# Notice, login DESC
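The order-flipping step behind last can be sketched in plain Ruby. This is a hypothetical helper for illustration, not ActiveRecord's code (Rails exposes the idea publicly as `Relation#reverse_order`), and it assumes each ORDER BY term is either `column` or `column ASC/DESC`:

```ruby
# Flip the direction of each ORDER BY term: "login" -> "login DESC",
# "name DESC" -> "name ASC". A term without a direction counts as ASC.
def reverse_order_terms(terms)
  terms.map do |term|
    column, direction = term.split
    direction.to_s.casecmp?("desc") ? "#{column} ASC" : "#{column} DESC"
  end
end

# Taking the first row of the reversed ordering gives the same row as the
# last row of the original ordering, so only one row has to be fetched.
logins = [3, 1, 2]
ascending_last = logins.sort.last                       # what last should return
descending_first = logins.sort { |a, b| b <=> a }.first # what the DESC query fetches
```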
Why does it work?
Something deep in the ActiveRecord query builder is smart enough to see that if you pass an array or a query/criteria, it needs to build an IN clause.
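A toy version of that decision, picking the SQL operator from the type of the value passed in, looks roughly like this. ToyRelation is a hypothetical class, not ActiveRecord's implementation; it only builds SQL text (never executes anything) and interpolates values without quoting, so it is for illustration only:

```ruby
# Hypothetical toy query object. It holds a condition and only turns it into
# SQL text on demand; values are NOT quoted, so never use this for real SQL.
class ToyRelation
  def initialize(table, column, value)
    @table = table
    @column = column
    @value = value
  end

  # Nothing is executed here; the query stays a description until used.
  def to_sql(select = "*")
    "select #{select} from #{@table} where #{@column} #{condition}"
  end

  private

  # Choose the operator from the value's type.
  def condition
    case @value
    when ToyRelation then "in (#{@value.to_sql('id')})" # nested subquery
    when Array then "in (#{@value.join(', ')})"         # literal IN list
    else "= #{@value}"                                  # plain equality
    end
  end
end

users = ToyRelation.new("users", "age", 30)
memberships = ToyRelation.new("memberships", "user_id", users)
by_ids = ToyRelation.new("memberships", "user_id", [1, 2, 3])
```

Passing the unexecuted users relation yields a single statement with a subquery, while passing an array yields the literal IN (1, 2, 3) list, mirroring the two SQL shapes shown earlier.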
Is this documented anywhere?
Yes, http://guides.rubyonrails.org/active_record_querying.html#hash-conditions
2.3.3 Subset conditions
If you want to find records using the IN expression you can pass an array to the conditions hash:
Client.where(orders_count: [1,3,5])
This code will generate SQL like this:
SELECT * FROM clients WHERE (clients.orders_count IN (1,3,5))
In the product I'm developing, I have a Message model. A message can be restricted to groups, or not restricted (available to everyone). If the user belongs to one of the message's groups OR the message is not restricted, the user can see the message.
Here is the query selecting visible messages (in the hope that it clarifies what I mean); (2,3,4,5,6,1) are the groups the user belongs to, and they are different for each user:
SELECT `messages`.* FROM `messages`
LEFT JOIN groups_messages ON
messages.id=groups_messages.message_id AND groups_messages.group_id in (2,3,4,5,6,1)
WHERE (messages.restricted=0 OR groups_messages.group_id is not NULL)
GROUP BY messages.id
Here is an equivalent query using a subquery, in the hope it helps clarify what is needed:
SELECT * FROM `messages` WHERE
(
restricted=0 OR id in ( select distinct message_id from groups_messages where group_id in (2,3,4,5,6,1) )
)
Is it possible to apply this visibility setting to Thinking Sphinx results? That is, can this OR and IN be applied to the following search using :with/:with_all?
Message.search "test"
If it is not possible, another question: is it possible to get the ids of all objects found by a search, so that I could perform the query myself, just adding an AND to my WHERE condition?
SELECT * FROM `messages` WHERE
(
restricted=0 OR id in ( select distinct message_id from groups_messages where group_id in (2,3,4,5,6,1) )
)
AND id in (ids_of_the_messages_found_by_thinking_sphinx)
I imagine both the query without the LEFT JOIN and adding the AND to the WHERE clause will be a bit resource-intensive for MySQL, but if other solutions are not possible, this would do.
thanks,
Pavel K
Received a response from Pat Allan, developer of Thinking Sphinx:
I think the best way is to build a string that is 0 if the message is unrestricted, and otherwise the group ids concatenated together with commas, i.e.:
"2,3,4,5,6" or "0"
So, you'll want to build a SQL snippet for an attribute, something vaguely like:
has "IF(messages.restricted = 0, '0', GROUP_CONCAT(groups_messages.group_id SEPARATOR ','))", :as => :group_ids, :type => :multi
And then for searching:
Message.search "foo", :with => {
:group_ids => [0] + current_user.message_group_ids
}
The SQL snippet will have to be different if you're using PostgreSQL, though... let me know if that's the case.
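The visibility rule behind this suggestion can be checked in plain Ruby. A minimal sketch, assuming (as I understand Sphinx) that a :with filter on a multi-value attribute matches when any of the document's values is in the filter list, and that 0 is never a real group id; both method names are hypothetical:

```ruby
# Value of the group_ids attribute: "0" for an unrestricted message,
# otherwise its group ids joined with commas (what the SQL snippet yields).
def group_ids_attribute(restricted, group_ids)
  restricted ? group_ids.join(",") : "0"
end

# Mimic :with => { :group_ids => [0] + user_group_ids }: the message is
# visible when its attribute shares at least one value with the filter.
def visible?(attribute, user_group_ids)
  values = attribute.split(",").map(&:to_i)
  !(values & ([0] + user_group_ids)).empty?
end

user_groups = [2, 3] # hypothetical groups of the current user
```

Prepending 0 to the user's groups is what makes unrestricted messages match for everyone, while restricted ones only match through a shared group.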
I'll try that.