Unique query based off another column - ruby-on-rails

So, I'd like to get a list of ids based on the uniqueness of another column. Given these [id, letter] rows:
[1,'A']
[2,'B']
[3,'C']
[4,'A']
I'd like to return [1,2,3] (dropping the 4th row because 'A' is no longer unique). I've tried .select('DISTINCT letter, id') and some grouping, but I can't seem to get it.
My database is PostgreSQL. SQL or ActiveRecord is fine.

Probably not the best solution, but it works with Postgres.
Model.select("min(id) as id, letter").group(:letter).map(&:id)
EDIT:
Actually, I found an even better (read: shorter) solution.
Model.group(:letter).pluck("min(id)")
ANOTHER EDIT:
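I've found a solution that works without aggregation (min or max). DISTINCT ON keeps the first row of each letter group, and PostgreSQL only guarantees which row counts as "first" when an ORDER BY starts with the DISTINCT ON column, so order explicitly:
Model.select("distinct on (letter) letter, id").order(:letter, :id).map(&:id)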
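All of these queries boil down to "keep one id per distinct letter". As a sanity check, here is a plain-Ruby, in-memory sketch of the same logic on the sample rows from the question (no database or ActiveRecord involved; the method names are hypothetical). `min_id_per_letter` mirrors the `GROUP BY letter` + `min(id)` version, and `first_id_per_letter` mirrors `DISTINCT ON (letter)` with `ORDER BY letter, id`.

```ruby
# Sample rows from the question: [id, letter]
ROWS = [[1, 'A'], [2, 'B'], [3, 'C'], [4, 'A']].freeze

# Mirrors: SELECT min(id) FROM models GROUP BY letter
# group_by builds {'A' => [[1, 'A'], [4, 'A']], 'B' => [...], 'C' => [...]}
def min_id_per_letter(rows)
  rows.group_by { |_id, letter| letter }
      .map { |_letter, group| group.map(&:first).min }
end

# Mirrors: SELECT DISTINCT ON (letter) id FROM models ORDER BY letter, id
# sort_by plays the ORDER BY, uniq keeps the first row seen for each letter
def first_id_per_letter(rows)
  rows.sort_by { |id, letter| [letter, id] }
      .uniq { |_id, letter| letter }
      .map(&:first)
end

p min_id_per_letter(ROWS)   # => [1, 2, 3]
p first_id_per_letter(ROWS) # => [1, 2, 3]
```

In SQL the order of the returned ids is not guaranteed unless you add an ORDER BY, so compare them as a set if that matters.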

Related

How do you group by years from 'created_at', using ActiveRecord?

In SQL, I'd use something like this:
SELECT year(created_at)
FROM videos
GROUP BY year(created_at)
How can I run this query with Rails?
I've tried this:
Video.all.group_by { |m| m.created_at.year }
... but this will return ALL my records grouped by the corresponding years, which is not entirely bad, except it's returning hundreds of records, when I only need to know the years.
Any pointers?
In case you're wondering, I'm using Rails 5.1.5 and MariaDB (10.2.14-MariaDB-10.2.14+maria~xenial-log).
You could do something like this to get an array of unique years...
Video.pluck(:created_at).map{ |dt| dt.year }.uniq
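With pluck + uniq the deduplication happens in Ruby, not in the database. A minimal plain-Ruby sketch of that step, assuming the plucked values are Time objects (the timestamps below are made-up stand-ins for the result of Video.pluck(:created_at)):

```ruby
# Hypothetical stand-ins for Video.pluck(:created_at)
timestamps = [
  Time.utc(2017, 3, 1, 12, 0),
  Time.utc(2018, 7, 15, 9, 30),
  Time.utc(2018, 1, 2, 8, 0),
  Time.utc(2017, 11, 30, 23, 59)
]

# Same as .map { |dt| dt.year }.uniq, using the &:year shorthand
years = timestamps.map(&:year).uniq
p years # => [2017, 2018]
```

Every timestamp is still transferred from the database first, which is why pushing the DISTINCT into SQL scales better on large tables.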
Turns out, pluck accepts SQL expressions, not only column names. So you can do this for maximum efficiency (at the cost of portability). On newer Rails versions, raw SQL passed to pluck may need to be wrapped in Arel.sql(...).
Video.distinct.pluck("year(created_at)") # mysql
Video.distinct.pluck("extract(year from created_at)") # postgresql
Same efficiency as running raw SQL (from my other answer), but looks a bit more ActiveRecord-y.
If you want a portable solution, then I can't think of anything better than Mark's answer.
Not everything can be expressed in ActiveRecord-speak without losing some efficiency, and some things can't be expressed at all. That's why you can run arbitrary SQL:
Video.connection.execute("select year(created_at)....")
Having actually read your SQL: you want the distinct years of the records? Then perhaps this, which should be even faster (benchmarking needed):
SELECT DISTINCT YEAR(created_at) FROM videos
You can also do:
Video.all.map {|u| u.created_at.year}
This returns an array containing the year in which each video was created. It has one entry per record, so it loads every row and repeats years (append .uniq to deduplicate).
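I'm using the following (strftime is a SQLite function, so this won't work as-is on MySQL/MariaDB, where YEAR(created_at) does the same job):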
Event.group("strftime('%Y', created_at)")
For example:
Event.group("strftime('%Y', created_at)").count
has the output:
{"2019"=>456}
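group(...).count returns a hash of year => row count computed by the database. The same shape can be reproduced in plain Ruby for a few in-memory timestamps, for example to check expectations in a test (the sample data is hypothetical):

```ruby
# Hypothetical in-memory created_at values
created_ats = [
  Time.utc(2019, 1, 5),
  Time.utc(2019, 6, 20),
  Time.utc(2020, 2, 29)
]

# Key by the year as a string, like strftime('%Y', ...) returns in SQLite,
# then replace each group of rows with its size
counts = created_ats.group_by { |t| t.strftime('%Y') }.transform_values(&:size)
puts counts['2019'] # prints 2
```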
You can even group by a child's created_at this way, as long as the child table is joined:
Application.joins(:events).group("strftime('%Y', events.created_at)")
In my particular case, this requires that an Application has many events.
If you only want the years from the data, you can do:
videos = Video.group("created_at").select("videos.created_at")
This fetches only the created_at column (one row per distinct timestamp, not per year), and you can read the year from each row with:
videos.first.created_at.year

How do I join multiple hive queries?

I am trying to join a simple query with a very ugly query that resolves to a single line. They have a date and a userid in common but nothing else. Alone both queries work but for the life of me I cannot get them to work together. Can someone assist me in how I would do this?
Fixed it: when you UNION queries in Hive, it looks like each query needs to return the same number of fields.

RoR/Squeel - How do I use Squeel::Nodes::Join/Predicates?

I just recently inherited a project where the previous developer used Squeel.
I've been studying Squeel for the past day now and know a bit about how to use it from what I could find online. The basic use of it is simple enough.
What I haven't been able to find online (except for on ruby-doc.org, which didn't give me much), is how to use Squeel::Nodes::Join and Squeel::Nodes::Predicate.
The only thing I've been able to find out is that they are nodes representing join associations / predicate expressions, which I had figured as much. What I still don't know is how to use them.
Can someone help me out or point me toward a good tutorial/guide?
I might as well answer this, since I was able to figure out quite a bit through trial and error and by using ruby-doc as a guide. Nothing I say here is a definitive description of these nodes; it's just what I know, which may help someone in the future who is stuck building dynamic queries with Squeel.
Squeel::Nodes::Stub
Let's actually start with Squeel::Nodes::Stub. This is a Squeel object that takes either a symbol or a string and turns it into the name of a table or column. So you can create a new Squeel::Nodes::Stub.new("value") or Squeel::Nodes::Stub.new(:value) and use this stub in other Squeel nodes. You'll see examples of it being used below.
Squeel::Nodes::Join
Squeel::Nodes::Join acts just as you might suspect. It is essentially an object you can pass to a Squeel joins{} block, which will then perform the join you want. You give it a stub (with a table name), and you can also give it a second argument to change the type of join (so far I only know how to change it to an outer join). You create one like so:
Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields), Arel::OuterJoin)
The stub tells the Join that we want to join the custom_fields table, and Arel::OuterJoin tells it that we want an outer join. Again, the second parameter to Squeel::Nodes::Join.new() is optional, and I think it defaults to an inner join. You can then join this to a model:
Person.joins{Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields), Arel::OuterJoin)}
Squeel::Nodes::Predicate
Squeel::Nodes::Predicate may seem pretty obvious at this point. It's just a comparison. You give it a stub (with a column name), a method of comparison (you can find them all in the Predicates section on Squeel's github) and a value to compare with, like so:
Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value), :eq, 5)
You can even AND or OR two of them together pretty easily.
AND: Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) & Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)
OR: Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) | Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)
These will return either a Squeel::Nodes::And or a Squeel::Nodes::Or with the nested Squeel::Nodes::Predicates.
Then you can put it all together like this (of course, you'd probably keep the joins in a variable a and the predicates in a variable b, because you're doing this dynamically; otherwise you should probably be using regular Squeel instead of Squeel nodes):
Person.joins{Squeel::Nodes::Join.new(Squeel::Nodes::Stub.new(:custom_fields),
Arel::OuterJoin)}.where{Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value1), :eq, 5) | Squeel::Nodes::Predicate.new(Squeel::Nodes::Stub.new(:value2), :eq, 10)}
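To make the &/| composition concrete without a Squeel dependency, here is a toy model of the idea: small predicate nodes whose & and | operators return And/Or nodes, plus a dynamically built list folded together with reduce. The classes (ToyPredicate, ToyAnd, ToyOr) and their to_sql output are hypothetical; they are not Squeel's classes and do not reproduce the SQL Squeel generates. The same reduce(:|) fold should in principle work on Squeel's own predicate nodes, since they support | as described above, but that has not been verified here.

```ruby
# Toy illustration only: these classes are hypothetical, not Squeel's API.
TOY_OPS = { eq: '=', gt: '>', lt: '<' }.freeze

# Shared & and | operators that build And/Or nodes
module ToyOps
  def &(other)
    ToyAnd.new(self, other)
  end

  def |(other)
    ToyOr.new(self, other)
  end
end

# A single comparison, e.g. value1 = 5
class ToyPredicate
  include ToyOps

  def initialize(column, op, value)
    @column = column
    @op = op
    @value = value
  end

  def to_sql
    "#{@column} #{TOY_OPS.fetch(@op)} #{@value}"
  end
end

# Base class for a node that wraps two child nodes
class ToyJunction
  include ToyOps

  def initialize(left, right)
    @left = left
    @right = right
  end

  def to_sql
    "(#{@left.to_sql} #{keyword} #{@right.to_sql})"
  end
end

class ToyAnd < ToyJunction
  def keyword
    'AND'
  end
end

class ToyOr < ToyJunction
  def keyword
    'OR'
  end
end

single = ToyPredicate.new(:value1, :eq, 5) | ToyPredicate.new(:value2, :eq, 10)
puts single.to_sql # prints (value1 = 5 OR value2 = 10)

# Dynamic case: build the predicates from data, then fold them with |
conditions = { value1: 5, value2: 10, value3: 15 }
combined = conditions.map { |col, val| ToyPredicate.new(col, :eq, val) }.reduce(:|)
puts combined.to_sql # prints ((value1 = 5 OR value2 = 10) OR value3 = 15)
```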
I unfortunately could not figure out how to do subqueries though :(

Sphinx: Can one update the limit for SQL query size used for indexing?

I seem to have hit a Sphinx edge case. I'm indexing a table that produces ≈ 140 indexed fields per record (trust me, they are all important). For 27 * 3 of them, the sub-query that produces the field is already quite big on its own. This results in a massive query being generated in my development.sphinx.conf (17 lines). The query produces results (I've tested it directly in the db), but Sphinx can't index it. It complains:
"ERROR: index 'vendor_song_core': sql_query_range: : macro '$start' not found in match fetch query."
My assumption is that what this really means is that the daemon is not loading the full query, because it is apparently too long. Is my assumption right? And if so, can I work around it (e.g. a magical max_query_length setting I can update somewhere)?
Answer copied from the Sphinx forum...
http://sphinxsearch.com/forum/view.html?id=10403
Move the 'long' query definition into a mysql VIEW.
Then the sql_query can be really short :)
I.e. the view itself contains all the column names, so the sql_query can just be "SELECT * FROM" the view. Similarly, if you're joining lots of tables, that can all move into the view.
It seems there is no real way of doing this. Sphinx defines the query size limit directly in its source code, so the only options are to edit the source code and compile it locally, or to do as barryhunter suggested, provided you are able to define such a view. More details about this issue can be found at the link barryhunter provided.

Do I have to use UNION instead of OR?

An article about optimizing SQL queries suggests using UNION instead of OR, because:
Utilize Union instead of OR
Indexes lose their speed advantage when using them in OR-situations in MySQL at least. Hence, this will not be useful although indexes is being applied:
SELECT * FROM TABLE WHERE COLUMN_A = 'value' OR COLUMN_B = 'value'
On the other hand, using Union such as this will utilize Indexes.
SELECT * FROM TABLE WHERE COLUMN_A = 'value'
UNION
SELECT * FROM TABLE WHERE COLUMN_B = 'value'
How true is this suggestion? Should I turn my OR queries into UNIONs?
I do not recommend this as a general rule. I don't have MySQL installed here, so I can't check the generated execution plan, but certainly in Oracle the plans are quite different and the 'OR' plan is more efficient than the 'UNION' plan. (Basically, the 'UNION' has to perform two SELECTs and then merge the results, whereas the 'OR' only has to do a single SELECT.) Even a 'UNION ALL' plan has a higher cost than the 'OR' plan. Stick with the 'OR'.
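One detail to keep in mind when rewriting an OR as a UNION: UNION removes duplicate rows, so a row matching both conditions still appears once, while UNION ALL would return it twice. A small in-memory Ruby sketch of that set logic, assuming every row is distinct (e.g. has a primary key); the rows and column names are made up, Array#| stands in for UNION and Array#+ for UNION ALL:

```ruby
# Hypothetical table rows
rows = [
  { id: 1, column_a: 'value', column_b: 'other' },
  { id: 2, column_a: 'other', column_b: 'value' },
  { id: 3, column_a: 'value', column_b: 'value' }, # matches both conditions
  { id: 4, column_a: 'other', column_b: 'other' }
]

matches_a = rows.select { |r| r[:column_a] == 'value' }
matches_b = rows.select { |r| r[:column_b] == 'value' }

# WHERE column_a = 'value' OR column_b = 'value'
or_ids = rows.select { |r| r[:column_a] == 'value' || r[:column_b] == 'value' }.map { |r| r[:id] }

# UNION: set union, the duplicate row 3 is kept once (order not significant, as in SQL)
union_ids = (matches_a | matches_b).map { |r| r[:id] }

# UNION ALL: plain concatenation, row 3 appears twice
union_all_ids = (matches_a + matches_b).map { |r| r[:id] }

p or_ids        # => [1, 2, 3]
p union_ids     # => [1, 3, 2]
p union_all_ids # => [1, 3, 2, 3]
```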
I very strongly recommend writing the clearest, simplest code you possibly can. To me, that means you should use the 'OR' rather than the 'UNION'. I've found over the years that attempting to "pre-optimize" something, or in other words to guess where I'll encounter performance problems and then to try and code around those problems, is a waste of time. Code tends to spend a lot of time in the darndest places, and you'll only find them once you've got your code running. A long time ago I learned three rules that I've found to be useful in this context:
Make it run.
Make it run right.
Make it run right fast.
Optimization is literally the last thing you should be doing.
Share and enjoy.
Followup: I hadn't noticed that the 'OR' was looking at different columns (my bad), but my advice regarding "keep it simple" still holds.
It helps to think of indexes like names in a phone book. A phone book, you could say, has a naturally ordered index by name, meaning that if you want to find all entries named John Smith, it would take you very little time. You'd simply open the phone book to the S section and start looking for Smith.
Now what if I told you to look for entries in the phone book with the name John Smith or the phone number 863-2253? Not as quick to do, eh? To provide a precise answer, you'd need one phone book to look up John Smith and another one sorted by phone number in order to find a name by his or her phone number.
Perhaps a more sophisticated engine could see the need for this separation and do it automatically, but apparently MySQL does not always do so (newer versions can sometimes combine two indexes for an OR via the index merge optimization, which EXPLAIN will show). So while it might seem a hassle to do it this way, I assure you the difference is noticeable on tables with high record counts.
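The phone-book analogy can be sketched in Ruby: one hash "index" keyed by name and another keyed by phone number, so each half of the OR becomes a direct lookup and the two result sets are then merged, which is roughly what the UNION rewrite asks the database to do. The entries and helper names are hypothetical, and a real database index is usually an ordered B-tree rather than a hash, but the lookup idea is the same.

```ruby
Entry = Struct.new(:id, :name, :phone)

ENTRIES = [
  Entry.new(1, 'John Smith', '555-0101'),
  Entry.new(2, 'Jane Doe',   '863-2253'),
  Entry.new(3, 'John Smith', '555-0199'),
  Entry.new(4, 'Ann Lee',    '555-0123')
].freeze

# Two separate "phone books": one keyed by name, one keyed by phone number
BY_NAME  = ENTRIES.group_by(&:name)
BY_PHONE = ENTRIES.group_by(&:phone)

# name = ? OR phone = ? answered with two index lookups plus a merge (set union)
def lookup_name_or_phone(name, phone)
  (BY_NAME.fetch(name, []) | BY_PHONE.fetch(phone, [])).map(&:id).sort
end

# Without an index on phone, every entry has to be scanned
def scan_name_or_phone(name, phone)
  ENTRIES.select { |e| e.name == name || e.phone == phone }.map(&:id).sort
end

p lookup_name_or_phone('John Smith', '863-2253') # => [1, 2, 3]
```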
