Select distinct records based on one field while keeping other fields intact

I've got a table like this:
table: searches
+------------------------------+
| id | address | date |
+------------------------------+
| 1 | 123 foo st | 03/01/13 |
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
| 5 | 456 foo st | 03/01/13 |
| 6 | 567 foo st | 03/01/13 |
+------------------------------+
And want a result set like this:
+------------------------------+
| id | address | date |
+------------------------------+
| 2 | 123 foo st | 03/02/13 |
| 3 | 456 foo st | 03/02/13 |
| 4 | 567 foo st | 03/01/13 |
+------------------------------+
But ActiveRecord seems unable to achieve this result. Here's what I'm trying:
Model has a 'most_recent' scope: scope :most_recent, order('date_searched DESC')
Model.most_recent.uniq returns the full set (SELECT DISTINCT "searches".* FROM "searches" ORDER BY date DESC) -- obviously the query is not going to do what I want, but neither is selecting only one column. I need all columns, but only rows where the address is unique in the result set.
I could do something like Model.select('distinct(address), date, id'), but that feels...wrong.

You could do a
select max(id), address, max(date) as latest
from searches
group by address
order by latest desc
According to sqlfiddle that does exactly what I think you want.
It's not quite the same as your required output, which doesn't seem to care about which ID is returned. Still, the query needs to specify something, which is done here by the "max" aggregate function.
I don't think you'll have any luck with ActiveRecord's autogenerated query methods for this case. So just add your own query method using that SQL to your model class. It's completely standard SQL that'll also run on basically any other RDBMS.
Edit: One big weakness of the query is that it doesn't necessarily return actual records. If the highest ID for a given address doesn't correlate with the highest date for that address, the resulting "record" will be different from the one actually stored in the DB. Depending on the use case that might matter or not. For MySQL, changing max(id) to id makes the query return a stored id, but only when ONLY_FULL_GROUP_BY is disabled, and even then the id is taken from an arbitrary row of each group rather than from the row with the latest date. IIRC Oracle rejects such a query outright.
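To make the caveat above concrete, here is a small plain-Ruby sketch (no database involved) that uses the sample rows from the question. It compares computing max(id) and max(date) independently per address with picking one whole stored row per address. The tie-break on the lowest id is an assumption, chosen so the result matches the desired output in the question; the constant and method names are made up for illustration.

```ruby
require 'date'

# Sample rows copied from the question's searches table.
SEARCH_ROWS = [
  { id: 1, address: '123 foo st', date: '03/01/13' },
  { id: 2, address: '123 foo st', date: '03/02/13' },
  { id: 3, address: '456 foo st', date: '03/02/13' },
  { id: 4, address: '567 foo st', date: '03/01/13' },
  { id: 5, address: '456 foo st', date: '03/01/13' },
  { id: 6, address: '567 foo st', date: '03/01/13' }
].map { |r| r.merge(date: Date.strptime(r[:date], '%m/%d/%y')) }.freeze

# Mirrors "select max(id), address, max(date) ... group by address":
# each aggregate is computed on its own, so id and date may come from different rows.
def independent_aggregates(rows)
  rows.group_by { |r| r[:address] }.map do |address, group|
    { id: group.map { |r| r[:id] }.max,
      address: address,
      date: group.map { |r| r[:date] }.max }
  end
end

# Picks one stored row per address: latest date first,
# lowest id on ties (an assumed tie-break).
def latest_row_per_address(rows)
  rows.group_by { |r| r[:address] }.map do |_address, group|
    group.max_by { |r| [r[:date], -r[:id]] }
  end
end

AGGREGATED_IDS = independent_aggregates(SEARCH_ROWS).map { |r| r[:id] }
LATEST_IDS = latest_row_per_address(SEARCH_ROWS).map { |r| r[:id] }
```

With this data the independent aggregates give ids 2, 5 and 6, where id 5 ends up paired with a date it was never stored with, while the per-row pick gives the stored rows with ids 2, 3 and 4.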

To show unique addresses:
Searches.group(:address)
Then you can select columns if you want:
Searches.group(:address).select('id,date')


With a composite index, what column order do ActiveRecord queries use to decide which composite index to search?

Rails v. 5.2.4
ActiveRecord v5.2.4.3
I have a Rails app with a MySQL database, and my app has a Skill model and a SkillAdjacency model. The SkillAdjacency model has the following attributes:
requested_skill_id, table_name: 'Skill'
adjacent_skill_id, table_name: 'Skill'
score, integer
SkillAdjacencies are used to determine how "similar" two instances of Skill are to each other.
One of the app's constraints is that you can't create more than one instance of SkillAdjacency for each combination of requested_skill and adjacent_skill, and I plan to enforce this both with ActiveModel validations and with a composite index which employs a uniqueness constraint. So far I have the following:
add_index :skill_adjacencies, [:requested_skill_id, :adjacent_skill_id], unique: true, name: 'index_adjacencies_on_requested_then_adjacent', using: :btree
However, I know that the order in which the composite columns are declared is important, so I'm considering adding this 2nd composite index to account for the other possible order:
add_index :skill_adjacencies, [:adjacent_skill_id, :requested_skill_id], unique: true, name: 'index_adjacencies_on_adjacent_then_requested', using: :btree
But because writing to an index isn't free, I only want to add the 2nd index if it will actually result in a performance benefit. The problem is, whether or not this 2nd index will be beneficial depends on whether ActiveRecord will start with adjacent_skill_id vs. requested_skill_id when searching for a composite index to search.
How can I determine what order ActiveRecord uses? Does it just use the same order that's specified in the query? For example, if I query SkillAdjacency.where(requested_skill: Skill.last, adjacent_skill: Skill.first), will it always search for a composite index composed of requested_skill 1st and adjacent_skill 2nd? If that's the case, should I cover all my bases by creating that additional composite index?
Alternately, is there some under-the-hood magic which determines if the relevant composite index exists regardless of the order provided in the query?
EDIT:
I ran EXPLAIN and saw the following:
irb(main):013:0> SkillAdjacency.where(requested_skill_id: 1, adjacent_skill_id: 200).explain
SkillAdjacency Load (0.3ms) SELECT `skill_adjacencies`.* FROM `skill_adjacencies` WHERE `skill_adjacencies`.`requested_skill_id` = 1 AND `skill_adjacencies`.`adjacent_skill_id` = 200
=> EXPLAIN for: SELECT `skill_adjacencies`.* FROM `skill_adjacencies` WHERE `skill_adjacencies`.`requested_skill_id` = 1 AND `skill_adjacencies`.`adjacent_skill_id` = 200
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | skill_adjacencies | NULL | const | index_adjacencies_on_requested_then_adjacent,index_adjacencies_on_adjacent_then_requested,index_skill_adjacencies_on_requested_skill_id,index_skill_adjacencies_on_adjacent_skill_id | index_adjacencies_on_requested_then_adjacent | 10 | const,const | 1 | 100.0 | NULL |
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
1 row in set (0.00 sec)
irb(main):014:0> SkillAdjacency.where(adjacent_skill_id: 200, requested_skill: 1).explain
SkillAdjacency Load (0.3ms) SELECT `skill_adjacencies`.* FROM `skill_adjacencies` WHERE `skill_adjacencies`.`adjacent_skill_id` = 200 AND `skill_adjacencies`.`requested_skill_id` = 1
=> EXPLAIN for: SELECT `skill_adjacencies`.* FROM `skill_adjacencies` WHERE `skill_adjacencies`.`adjacent_skill_id` = 200 AND `skill_adjacencies`.`requested_skill_id` = 1
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
| 1 | SIMPLE | skill_adjacencies | NULL | const | index_adjacencies_on_requested_then_adjacent,index_adjacencies_on_adjacent_then_requested,index_skill_adjacencies_on_requested_skill_id,index_skill_adjacencies_on_adjacent_skill_id | index_adjacencies_on_requested_then_adjacent | 10 | const,const | 1 | 100.0 | NULL |
+----+-------------+-------------------+------------+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------+---------+-------------+------+----------+-------+
1 row in set (0.00 sec)
In both cases, I see that the value in the key column is index_adjacencies_on_requested_then_adjacent, despite each query passing in a different order for the query params. Can I assume this means the order of those params doesn't matter?
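As far as the EXPLAIN output above shows, the index is picked by MySQL's query optimizer rather than by ActiveRecord, and both orderings of the conditions end up on the same key. The toy model below is plain Ruby and is not how MySQL is implemented; it only illustrates why the written order of AND-ed equality conditions need not matter, while the column order inside the index does matter when just one column is filtered. All names in it are hypothetical.

```ruby
# Toy stand-in for a composite index on (requested_skill_id, adjacent_skill_id):
# a list of [requested, adjacent] pairs kept in sorted order.
COMPOSITE_INDEX = [[1, 200], [1, 300], [2, 200], [3, 100]].sort.freeze

# Equality on both columns: the conditions arrive as a hash, so the order in
# which the caller wrote them has no effect on the key that is built.
def find_pair(conditions)
  key = [conditions.fetch(:requested_skill_id), conditions.fetch(:adjacent_skill_id)]
  COMPOSITE_INDEX.bsearch { |entry| key <=> entry }
end

# Filtering on the leading column alone can still binary-search the sorted list.
def find_by_requested(requested_id)
  start = COMPOSITE_INDEX.bsearch_index { |entry| entry[0] >= requested_id }
  return [] if start.nil?

  COMPOSITE_INDEX.drop(start).take_while { |entry| entry[0] == requested_id }
end

# Filtering on the trailing column alone cannot: matches are scattered through
# the list, so this toy falls back to scanning every entry.
def find_by_adjacent(adjacent_id)
  COMPOSITE_INDEX.select { |entry| entry[1] == adjacent_id }
end

SAME_RESULT = find_pair(requested_skill_id: 1, adjacent_skill_id: 200) ==
              find_pair(adjacent_skill_id: 200, requested_skill_id: 1)
```

Under this picture, a second composite index in the opposite column order would mainly help queries that filter on adjacent_skill_id without requested_skill_id, and the possible_keys column above already lists a single-column index on adjacent_skill_id.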

What is the best way to attach a running total to selected row data?

I have a table that looks like this:
Created at | Amount | Register Name
--------------+---------+-----------------
01/01/2019... | -150.01 | Front
01/01/2019... | 38.10 | Back
What is the best way to attach an ascending-by-date running total to each record which applies only to the register name the record has? I can do this in Ruby, but doing it in the database will be much faster as it is a web application.
The application is a Rails application running Postgres 10, although the answer can be Rails-agnostic of course.
Use the aggregate sum() as a window function, e.g.:
with my_table (created_at, amount, register_name) as (
values
('2019-01-01', -150.01, 'Front'),
('2019-01-01', 38.10, 'Back'),
('2019-01-02', -150.01, 'Front'),
('2019-01-02', 38.10, 'Back')
)
select
created_at, amount, register_name,
sum(amount) over (partition by register_name order by created_at)
from my_table
order by created_at, register_name;
created_at | amount | register_name | sum
------------+---------+---------------+---------
2019-01-01 | 38.10 | Back | 38.10
2019-01-01 | -150.01 | Front | -150.01
2019-01-02 | 38.10 | Back | 76.20
2019-01-02 | -150.01 | Front | -300.02
(4 rows)
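For cross-checking the window function on a small sample, the same per-register running total can be sketched in plain Ruby. This is only an illustration using the four rows from the query above; the constant and method names are made up. One difference to keep in mind: the SQL's default window frame treats rows with the same created_at within a register as peers and gives them the same sum, whereas this sketch accumulates strictly row by row.

```ruby
# Sample rows from the query above; amounts are parsed into exact Rationals
# to avoid floating-point rounding.
REGISTER_ROWS = [
  ['2019-01-01', '-150.01', 'Front'],
  ['2019-01-01', '38.10', 'Back'],
  ['2019-01-02', '-150.01', 'Front'],
  ['2019-01-02', '38.10', 'Back']
].map do |created_at, amount, register_name|
  { created_at: created_at, amount: amount.to_r, register_name: register_name }
end.freeze

# Adds a :running_total to each row, accumulated per register in created_at order.
def with_running_totals(rows)
  totals = Hash.new(0)
  rows.sort_by { |r| [r[:created_at], r[:register_name]] }.map do |r|
    totals[r[:register_name]] += r[:amount]
    r.merge(running_total: totals[r[:register_name]])
  end
end

RUNNING_TOTALS = with_running_totals(REGISTER_ROWS)
```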

Order by A then B using Ruby on Rails Model

This is not a homework question. I am trying to learn more.
I have the following entities with attributes
Manufacturer {name} //Store Manufacturers
Model {manufacturer_id, name} //Store Models
Tint {manufacturer_id, model_id, front, side, rear} //Store measurements
I have the following data in my Tint entity. The letters stand for different manufacturer and model names.
Manufacturer | Model | Front | Side | Rear |
-------------+-------+-------+------+-------
A | AD | 10 | 10 | 10 |
B | AB | 10 | 10 | 10 |
A | AA | 10 | 10 | 10 |
A | AC | 10 | 10 | 10 |
B | AA | 10 | 10 | 10 |
A | AB | 10 | 10 | 10 |
When I print it out in the view, I would like to have it sorted by Manufacturer name and then by Model, so the result will be as below. The Manufacturer names are sorted alphabetically, then the Models.
Manufacturer | Model | Front | Side | Rear |
-------------+-------+-------+------+-------
A | AA | 10 | 10 | 10 |
A | AB | 10 | 10 | 10 |
A | AC | 10 | 10 | 10 |
A | AD | 10 | 10 | 10 |
B | AA | 10 | 10 | 10 |
B | AB | 10 | 10 | 10 |
I have set up the model to make sure each Manufacturer and Model pair is distinct.
My question is: since I am referencing rows by manufacturer_id and model_id, how can I get the names of the Manufacturer and Model from the Manufacturer and Model tables?
In my tints_controller.rb, I have @tints = Tint.all.order(:manufacturer_id). However, it only sorts by manufacturer_id (the number) instead of the name of the manufacturer.
I know that I can do it the SQL way (SELECT, FROM, WHERE) in the RoR model. However, I would like to know whether it is possible to use ActiveRecord to sort the data by name.
If I understand correctly, you have 3 models: Tint, Manufacturer and Model. I am assuming you have the appropriate has_many and belongs_to associations set up correctly.
Tint.rb
belongs_to :model
Manufacturer.rb
has_many :models
has_many :tints, through: :models
Model.rb:
belongs_to :manufacturer
has_many :tints
You need to first join the three models together, and then order by some criteria:
tints_controller.rb
@tints = Tint.joins(model: :manufacturer).order('manufacturers.name, models.name').pluck('manufacturers.name, models.name, tints.front, tints.side, tints.rear')
That will give you all tints records and their appropriate models and manufacturers.
Any time you have the id of an entity in Rails, you can easily retrieve other associated fields simply by instantiating that entity:
@manufacturer = Manufacturer.find(params[:manufacturer_id])
Then it's a simple matter to retrieve any of the other fields:
@manufacturer_name = @manufacturer.name
If you need a collection of manufacturers or manufacturer names, then it's advisable to build yourself an ActiveRecord::Relation object immediately via a scoped query (as you already know). I have no idea what your criteria are; otherwise, I'd supply some sample code. I can tell you that your scoped query should include an .order clause at the end:
@manufacturers = Manufacturer.where("some_column = ?", some_criterion).order(:sort_field)
In the above example, :sort_field would be the field by which you want to sort your ActiveRecord::Relation. I'm guessing in your case, it's :name.
All this having been said, if you want fancy sorted tables, you should look into the jQuery DataTables gem. DataTables can do a lot of the heavy lifting for you, and it's convenient for your users because they can then sort and re-sort by any column you present.
In your tints_controller.rb, instead of
@tints = Tint.all.order(:manufacturer_id)
please write:
@tints = Tint.all.order(:manufacturer_id, :model_id)
Answer to my question:
In tints_controller.rb, I wrote
@tints = Tint.joins(:manufacturer, :model).order("manufacturers.name ASC, models.name ASC") to join the tables and order them accordingly.
I tried the answer provided by @Goston above and I had an issue when I was trying to edit the tints. It did not allow me to edit.
Note: The answer provided by @Goston will order them, but it broke the edit function in my case.
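The two-key ordering that the ORDER BY clause above asks the database for can be illustrated in memory with plain Ruby. This is only a sketch over the sample rows from the question, with made-up constant names; a real database may also apply a collation that differs from Ruby's string comparison.

```ruby
# Rows from the question's table: [manufacturer name, model name, front, side, rear].
TINT_ROWS = [
  ['A', 'AD', 10, 10, 10],
  ['B', 'AB', 10, 10, 10],
  ['A', 'AA', 10, 10, 10],
  ['A', 'AC', 10, 10, 10],
  ['B', 'AA', 10, 10, 10],
  ['A', 'AB', 10, 10, 10]
].freeze

# Sort by manufacturer name first, then by model name within each manufacturer.
SORTED_TINTS = TINT_ROWS.sort_by do |manufacturer, model, *_measurements|
  [manufacturer, model]
end
```

Comparing the [manufacturer, model] pairs element by element is what makes the model name act only as a tie-breaker within a manufacturer.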

Thinking Sphinx group by, with distinct count

I have the following manual Sphinx query (via the mySQL client), that is producing proper results, and I would like to call it through Thinking Sphinx from Rails. For the life of me, I am struggling with how to make a 'distinct' query work in Thinking Sphinx.
mysql> select merchant_name, count (distinct part_number) from product_core group by merchant_name;
+-----------------------+-----------------------------------------+
| merchant_name | count (distinct part_number) |
+-----------------------+-----------------------------------------+
| 1962041491 | 1 |
| 3208850848 | 1 |
| 1043652526 | 48754 |
| 770188128 | 1 |
| 374573991 | 34113 |
+-----------------------+-----------------------------------------+
Please note: This MySQL query is against Sphinx, NOT MySQL. I use the MySQL client to connect to Sphinx, as: mysql -h 127.0.0.1 -P 9306. This works well for debugging/development. My actual db is Postgres.
Given this, and to add more context, I am attempting to combine a group_by in thinking Sphinx, with a count('Distinct' ...).
So, this query works:
Product.search group_by: :merchant_name
... and, this query works:
Product.count ('DISTINCT part_number')
... but, this combined query throws an error:
Product.search group_by: :merchant_name, count ('DISTINCT part_number')
SyntaxError: (irb):90: syntax error, unexpected ( arg, expecting keyword_do or '{' or '('
...merchant_name, count ('DISTINCT part_num...
Both merchant_name and part_number are defined as attributes.
Environment:
Sphinx 2.2.10-id64-release (2c212e0)
thinking-sphinx 3.1.4
rails 4.2.4
postgres (PostgreSQL) 9.3.4
I have also tried using Facets, but to no avail:
Product.search group_by: :merchant_name, facets: :part_number
Product.facets :part_number, group_by: :merchant_name
For additional information, and to see if this could be accomplished through a Thinking Sphinx call, here is a basic example. I have one product table (and associated index) that lists both merchants and their products (I agree, it could be normalized, but it's coming in from a data feed, and Sphinx can handle it as is):
+-----------------+-------------------+
| merchant | product |
+-----------------+-------------------+
| Best Buy | Android phone |
| Best Buy | Android phone |
| Best Buy | Android phone |
| Best Buy | iPhone |
| Amazon | Android phone |
| Amazon | iPhone |
| Amazon | iPhone |
| Amazon | iPhone |
| Amazon | Onkyo Receiver |
+-----------------+-------------------+
With Thinking Sphinx, I want to: a) group the rows by merchant, and b) create a “distinct” product count for each group.
The above example, should give the following result:
+-----------------+------------------------+
| merchant | count(DISTINCT product) |
+-----------------+------------------------+
| Best Buy | 2 |
| Amazon | 3 |
+-----------------+------------------------+
You're not going to be able to run this query through a model's search call, because that's set up to always return instances of a model, whereas what you're wanting is raw results. The following code should do the trick:
ThinkingSphinx::Connection.take do |connection|
  result = connection.execute <<-SQL
    SELECT merchant_name, COUNT(distinct part_number)
    FROM product_core
    GROUP BY merchant_name
  SQL
  result.to_a
end
Or, I think this will work to go through a normal search call:
Product.search(
  select: "merchant_name, COUNT(distinct part_number) AS count",
  group_by: :merchant_name,
  middleware: ThinkingSphinx::Middlewares::RAW_ONLY
)
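To check what either query should return, the expected result for the small merchant/product example in the question can be worked out in plain Ruby. This is only an in-memory sketch of "group by merchant, count distinct products"; it does not call Sphinx or Thinking Sphinx, and the constant names are made up.

```ruby
# Rows from the merchant/product example table in the question.
PRODUCT_ROWS = [
  ['Best Buy', 'Android phone'],
  ['Best Buy', 'Android phone'],
  ['Best Buy', 'Android phone'],
  ['Best Buy', 'iPhone'],
  ['Amazon', 'Android phone'],
  ['Amazon', 'iPhone'],
  ['Amazon', 'iPhone'],
  ['Amazon', 'iPhone'],
  ['Amazon', 'Onkyo Receiver']
].freeze

# Group the rows by merchant, then count the distinct products in each group.
DISTINCT_COUNTS = PRODUCT_ROWS
  .group_by { |merchant, _product| merchant }
  .transform_values { |rows| rows.map { |_merchant, product| product }.uniq.size }
```

For this data the hash maps 'Best Buy' to 2 and 'Amazon' to 3, matching the result table in the question.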

Database Design for state, cities and districts

I have users represented in a user table and need to design a model to associate them with state/cities/districts that they choose:
On the database side,
Each user will be associated with 1 state, 1 city and a number of districts within that state/city combination. For instance, User A can choose to be associated with "NY" and "Brooklyn" and any X number of districts in "Brooklyn" (or none).
On the view side,
I'd like to present the district choices with checkboxes so they should be able to be pulled from the database field with simple_form in Rails pretty easily.
The design of the database should make it easy to query for the user and get the associated state / city and district relations that the user has chosen.
One idea I have is to simply have a one-to-many field for districts and a district table listing all the different districts. However, is there a way to enforce that the districts have to be valid for the city/state combination on the backend using validate?
Any tips would be appreciated.
Below I have outlined the database schema I would use based on the information you have given.
Every city belongs to exactly one state.
cities
id unsigned int(P)
state_id unsigned int(F states.id)
name varchar(50)
+----+----------+---------------+
| id | state_id | name |
+----+----------+---------------+
| 1 | 33 | New York City |
| .. | ........ | ............. |
+----+----------+---------------+
See ISO 3166 for more information. You didn't ask for countries but it's trivial to add them...
countries
id char(2)(P)
iso3 char(3)(U)
iso_num char(3)(U)
name varchar(45)(U)
+----+------+---------+---------------+
| id | iso3 | iso_num | name |
+----+------+---------+---------------+
| ca | can | 124 | Canada |
| mx | mex | 484 | Mexico |
| us | usa | 840 | United States |
| .. | .... | ....... | ............. |
+----+------+---------+---------------+
Every district belongs to exactly one city.
districts
id unsigned int(P)
city_id unsigned int(F cities.id)
name varchar(50)
+----+---------+-----------+
| id | city_id | name |
+----+---------+-----------+
| 1 | 1 | The Bronx |
| 2 | 1 | Brooklyn |
| 3 | 1 | Manhattan |
| .. | ....... | ......... |
+----+---------+-----------+
See ISO 3166-2:US for more information. Every state belongs to exactly one country.
states
id unsigned int(P)
country_id char(2)(F countries.id)
code char(2)
name varchar(50)
+----+------------+------+----------+
| id | country_id | code | name |
+----+------------+------+----------+
| 1 | us | AL | Alabama |
| .. | .......... | .... | ........ |
| 33 | us | NY | New York |
| .. | .......... | .... | ........ |
+----+------------+------+----------+
Based on your information a user belongs to exactly one city. In the example data Bob is associated with New York City. By joining tables you can very easily find that Bob is in New York state and the country of United States.
users
id unsigned int(P)
username varchar(255)
city_id unsigned int(F cities.id)
...
+----+----------+---------+-----+
| id | username | city_id | ... |
+----+----------+---------+-----+
| 1 | bob | 1 | ... |
| .. | ........ | ....... | ... |
+----+----------+---------+-----+
Users can belong to any number of districts. In the example data Bob belongs to The Bronx and Brooklyn. user_id and district_id form the Primary Key, which ensures a user cannot be associated with the same district more than once.
users_districts
user_id unsigned int(F users.id) \_(P)
district_id unsigned int(F districts.id) /
+---------+-------------+
| user_id | district_id |
+---------+-------------+
| 1 | 1 |
| 1 | 2 |
| ....... | ........... |
+---------+-------------+
My database model does NOT enforce the rule that the districts a user belongs to must be in the city that user belongs to - in my opinion that logic should be done at the application level. If Bob moves from New York City to Baltimore I think all of his records should be deleted from the users_districts table and then add any new ones for his new city.
As for the user interface, I would have the user:
Select a country - this will auto-populate a drop down list of associated states.
Select a state - this will auto-populate a drop down list of associated cities.
Select a city - this will auto-populate a list of associated districts.
Allow the user to select any number of districts.
You will need some combination of database and application-level logic.
Here is how I would build the database fields:
users = id, <other user fields>, city_id
districts = id, <other district fields>, city_id
cities = id, name, state_id
states = id, name
And then in the application, set it up so that the user can type in one city and multiple districts, and can not edit the state (view only):
When the user types in a city - maybe through an autocomplete field - it automatically updates the read-only state field with the state of the city
When the user types in a district, list only the districts that have district.city_id == cities.id
If you don't want to restrict the district selection in the UI, you will need to enforce the district.city_id == cities.id check in your application, though I personally think that's less intuitive than doing it right in the front-end UI.
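The application-level check described above, that every chosen district's city_id matches the user's city, could be sketched in plain Ruby as follows. The structs are minimal stand-ins for the tables, not ActiveRecord models, and the method name and the 'Fells Point' district are hypothetical. In a Rails app the same comparison would typically live in a custom validation on the model; that wiring is not shown here.

```ruby
# Minimal stand-ins for the users and districts tables described above.
District = Struct.new(:id, :city_id, :name)
User = Struct.new(:id, :username, :city_id)

# Returns the chosen districts that do not belong to the user's city;
# an empty result means the selection is consistent.
def mismatched_districts(user, chosen_districts)
  chosen_districts.reject { |district| district.city_id == user.city_id }
end

BOB = User.new(1, 'bob', 1)
BRONX = District.new(1, 1, 'The Bronx')
BROOKLYN = District.new(2, 1, 'Brooklyn')
FELLS_POINT = District.new(99, 2, 'Fells Point') # hypothetical district in another city

VALID_CHOICE_ERRORS = mismatched_districts(BOB, [BRONX, BROOKLYN])
INVALID_CHOICE_ERRORS = mismatched_districts(BOB, [BRONX, FELLS_POINT])
```

Returning the offending districts, rather than just true or false, leaves room for an error message that names them.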
Indian States AND UT MySQL QUERY
INSERT INTO `states`
VALUES
(1,'Andhra Pradesh'),
(2,'Telangana'),
(3,'Arunachal Pradesh'),
(4,'Assam'),
(5,'Bihar'),
(6,'Chhattisgarh'),
(7,'Chandigarh'),
(8,'Dadra and Nagar Haveli'),
(9,'Daman and Diu'),
(10,'Delhi'),
(11,'Goa'),
(12,'Gujarat'),
(13,'Haryana'),
(14,'Himachal Pradesh'),
(15,'Jammu and Kashmir'),
(16,'Jharkhand'),
(17,'Karnataka'),
(18,'Kerala'),
(19,'Madhya Pradesh'),
(20,'Maharashtra'),
(21,'Manipur'),
(22,'Meghalaya'),
(23,'Mizoram'),
(24,'Nagaland'),
(25,'Orissa'),
(26,'Punjab'),
(27,'Pondicherry'),
(28,'Rajasthan'),
(29,'Sikkim'),
(30,'Tamil Nadu'),
(31,'Tripura'),
(32,'Uttar Pradesh'),
(33,'Uttarakhand'),
(34,'West Bengal'),
(35,'Lakshadweep'),
(36,'Ladakh');
