I'm new to Ruby on Rails, and want to learn the basics of how a database works.
I created a table named Users with three columns: username, password, and email, and I have three records inside. I made a query:
@user = Users.select(:username).limit(2)
which works.
Now I want to make a query like:
SELECT username FROM Users LIMIT 1,2
which means:
select 2 rows of the username column from the Users table, starting at offset 1,
like a normal MySQL query.
I tried
@user = Users.select(:username).limit(1,2)
but it's not working. I don't actually know what to search for in my situation. I tried "limit query with range" and "limit from one index to another query", but neither works.
I think I'm going to need it for printing out products with pages navigation.
I believe you need to use .offset(1):
@user = Users.select(:username).limit(2).offset(1)
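MySQL's LIMIT 1,2 is just LIMIT 2 OFFSET 1 in standard form, which is why ActiveRecord splits it into two calls. For the pages navigation you mention, the same pair does the job; a minimal sketch, assuming a hypothetical 1-based page number and a fixed page size (both names are illustrative, not from the original code):

page = 2      # hypothetical current page, 1-based
per_page = 10 # hypothetical page size
# Skip the rows of the earlier pages, then take one page's worth:
@users = Users.select(:username).limit(per_page).offset((page - 1) * per_page)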
Related
For the analytics of my site, I'm required to extract the 4 states of my users.
@members = list.members.where(enterprise_registration_id: registration.id)
# This pulls roughly 10,000 records, which is evidently a huge data pull for Rails
# Member Load (155.5ms)
@invited = @members.where("user_id is null")
# Member Load (21.6ms)
@not_started = @members.where("enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id))
# Member Load (82.9ms)
@in_progress = @members.joins(:quizzes).where('quizzes.section_id IN (?) and (quizzes.completed is null or quizzes.completed = ?)', @sections.map(&:id), false).group("enterprise_members.id HAVING count(quizzes.id) > 0")
# Member Load (28.5ms)
@completes = Quiz.where(enterprise_member_id: registration.members, section_id: @sections.map(&:id)).completed
# Quiz Load (138.9ms)
The operation returns a 503, meaning my app gives up on the request. Any ideas on how I can refactor this code to run faster? Maybe with better join syntax? I'm curious how sites with larger datasets accomplish what seem like such trivial DB calls.
The answer is your indexes. Check your Rails logs (or watch the console in development mode) and copy the queries into your db tool. Slap an "EXPLAIN" in front of each query and it will give you a breakdown. From there you can see which indexes you need to optimize the query.
For a quick pass, you should at least have these in your schema (a sample migration follows the list):
enterprise_members: enterprise_member_id
members: user_id
quizzes: section_id
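A minimal migration sketch for those (the class name is made up, and the exact column placement is inferred from the queries above, so double-check against your actual schema):

class AddAnalyticsIndexes < ActiveRecord::Migration
  def change
    add_index :enterprise_members, :user_id    # for the "user_id is null" filter
    add_index :quizzes, :section_id            # for the section_id IN (?) filters
    add_index :quizzes, :enterprise_member_id  # for the NOT IN subselect
  end
end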
As someone else posted, definitely look into adding indexes if needed. Some of how to refactor depends on what exactly you are trying to do with all these records. For the @members query, what are you using the @members records for? Do you really need to retrieve all attributes for every member record? If not, I suggest fetching only the attributes that you actually use for something; .pluck usage could be warranted.

The 3rd and 4th queries look fishy. I assume you've run them in a console? Again, not sure what the queries are being used for, but I'll toss in that it is often useful to write raw SQL and query the db directly first. Then you can apply your findings to rewriting the ActiveRecord queries.

What is the .completed tagged on the end? Is it supposed to be there? The only thing I found close in the Rails API is .completed?. If it is a custom method, definitely look into it. You potentially also have a use case for scopes; see the sketch below.
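For example (a sketch only; column names assumed from the queries above): fetching a single column with pluck skips building full ActiveRecord objects, and a scope would make a "completed" filter explicit:

# Fetch just the ids instead of ~10,000 full member objects:
member_ids = @members.pluck(:id)

class Quiz < ActiveRecord::Base
  # Hypothetical scope; adjust to however "completed" is actually stored
  scope :completed, -> { where(completed: true) }
end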
THIRD QUERY:
I unfortunately don't know Ruby on Rails, but from a PostgreSQL perspective, changing your "not in" to a left outer join should make it a little faster:
Your code:
enterprise_members.id not in (select enterprise_member_id from quizzes where quizzes.section_id IN (?)) AND enterprise_members.user_id in (select id from users)", @sections.map(&:id) )
Better version (in SQL):
select blah
from enterprise_members em
left outer join quizzes q
  on q.enterprise_member_id = em.id
  and q.section_id in (?)
join users u on u.id = em.user_id
where q.enterprise_member_id is null
(Note the section_id filter has to live in the join condition rather than the where clause, or the anti-join check on q.enterprise_member_id would never match.) Based on my understanding, this will allow Postgres to sort both the enterprise_members table and the quizzes table and do a hash join. This is better than what it does now: right now it finds everything in the quizzes subquery, brings it into memory, and then tries to match it against enterprise_members.
FIRST QUERY:
You could also create a partial index on user_id for your first query. This helps especially when a relatively small number of user_ids are null in a large table. Partial index creation:
CREATE INDEX user_id_null_ix ON enterprise_members (user_id)
WHERE (user_id is null);
Any time you query enterprise_members with a condition that matches the index's where clause, the partial index can be used to quickly limit the rows returned. See http://www.postgresql.org/docs/9.4/static/indexes-partial.html for more info.
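On the Rails side, add_index supports partial indexes on PostgreSQL through the :where option (Rails 4+; on earlier versions you would run the raw SQL above via execute). A sketch under that assumption:

add_index :enterprise_members, :user_id,
          name: "user_id_null_ix",
          where: "user_id IS NULL"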
Thanks everyone for your ideas. I basically did what everyone said: I added indexes and reordered how I called everything, but the major difference was using the pluck method. Here are my new stats:
@alt_members = list.members.pluck(:id) # 23ms
if list.course.sections.tests.present? && @sections = list.course.sections.tests
@quiz_member_ids = Quiz.where(section_id: @sections.map(&:id)).pluck(:enterprise_member_id) # 8.5ms
@invited = list.members.count('user_id is null') # 12.5ms
@not_started = ( @alt_members - ( @alt_members & @quiz_member_ids ) ).count # 0ms
@in_progress = ( @alt_members & @quiz_member_ids ).count # 0ms
@completes = ( @alt_members & Quiz.where(section_id: @sections.map(&:id), completed: true).pluck(:enterprise_member_id) ).count # 9.7ms
@question_count = Quiz.where(section_id: @sections.map(&:id), completed: true).limit(5).map { |quiz| quiz.answers.count }.max # 3.5ms
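For anyone copying this: & is plain Ruby Array intersection, so once the pluck queries return, the set math happens in memory rather than in the database. For example:

[1, 2, 3, 4] & [2, 4, 9] # => [2, 4]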
I have a table called Shoppers and another table called Users. Shoppers has a shopper_id column, a foreign key that refers to the primary key id in the Users table.
I ran a query:
@shoppers = shoppers.where("some condition")
This gave me a set of shoppers who satisfy the condition. Next, I would like to select the Users whose id matches the shopper_id of the individual objects in @shoppers.
I know I could do this by writing a loop, but I am wondering if Ruby on Rails lets me write a Users.where condition that returns the subset of user objects whose id matches a shopper_id, arranged in ascending order by the name field in the Users table.
Any ideas?
Try this.
@shoppers = Shopper.where("some condition")
@users = User.where(id: @shoppers.collect(&:shopper_id)).order('name asc')
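A slightly leaner variant (a sketch, assuming the same schema as the question): pluck reads the one column without instantiating Shopper objects, and on newer Rails versions passing a select-ed relation folds everything into a single SQL statement with a subselect:

# Two queries, but no Shopper objects built:
@users = User.where(id: Shopper.where("some condition").pluck(:shopper_id)).order('name asc')
# One query with a subselect (newer Rails):
@users = User.where(id: Shopper.where("some condition").select(:shopper_id)).order('name asc')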
In SQLite (development) I don't have any errors, but in production with Postgres I get the following error, which I don't really understand.
PG::Error: ERROR: column "commits.updated_at" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: ...mmits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at...
^
: SELECT COUNT(*) AS count_all, mission_id AS mission_id FROM "commits" WHERE "commits"."user_id" = 1 GROUP BY mission_id ORDER BY updated_at DESC
My controller method:
def show
  @user = User.find(params[:id])
  @commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
  @missions_commits = @commits.group("mission_id").count.length
end
UPDATE:
So I dug further into this PostgreSQL-specific annoyance, and I am surprised that this exception is not mentioned in the Ruby on Rails Guides.
I am using psql (PostgreSQL) 9.1.11
From what I understand, every column I select has to be specified in the GROUP BY clause (or used in an aggregate function). I thought narrowing the SELECT would help, but that can be annoying if you need to SELECT a lot of columns.
Interesting discussion here
Anyway, when I look at the error, the caret always points at updated_at. In the generated SQL, Rails will always ORDER BY updated_at. So I have tried this horrible query:
#commits.group("mission_id, date(updated_at)")
.select("date(updated_at), count(mission_id)")
.having("count(mission_id) > 0")
.order("count(mission_id)").length
which gives me the following SQL
SELECT date(updated_at), count(mission_id)
FROM "commits"
WHERE "commits"."user_id" = 1
GROUP BY mission_id, date(updated_at)
HAVING count(mission_id) > 0
ORDER BY updated_at DESC, count(mission_id)
LIMIT 25 OFFSET 0
and the error is the same.
Note that no matter what, it will ORDER BY updated_at, even if I ask to order by something else.
Also, I don't want to group the records by updated_at, just by mission_id.
This PostgreSQL error is misleading and offers little guidance toward solving it. I have tried many formulas from the Stack Overflow sidebar; nothing works, and it is always the same error.
UPDATE 2:
So I got it to work, but it has to group by updated_at as well, because of the automatic ORDER BY updated_at. How do I count only by mission_id?
@missions_commits = @commits.group("mission_id, updated_at").count("mission_id").size
I guess you want to show the total number of distinct Missions related to Commits; in any case, it won't be the number on the current page.
Try this:
@commits = @user.commits.order("updated_at DESC").page(params[:page]).per(25)
@missions_commits = @user.commits.distinct.count(:mission_id)
However, if you want the number of distinct Missions on the page, I suppose it should be:
@missions_commits = @commits.collect(&:mission_id).uniq.count
Update
In Rails 3, distinct does not exist; counting with a SQL DISTINCT is done this way instead:
@missions_commits = @user.commits.count(:mission_id, distinct: true)
See the docs for PostgreSQL GROUP BY here:
http://www.postgresql.org/docs/9.3/interactive/sql-select.html#SQL-GROUPBY
Basically, unlike SQLite (and MySQL), Postgres requires that any column you select or order by must either appear in an aggregate function or in the GROUP BY clause.
If you think it through, you'll see that this actually makes sense. SQLite/MySQL cheat under the hood and silently drop those fields (not sure that's technically what happens).
Or, thinking about it another way: if you are grouping by a field, what's the point of ordering by another one? How would that even make sense unless you also had an aggregate function on the ordered field?
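Applied to the controller above, one hedged way out is to strip the inherited ORDER BY (and the pagination's LIMIT/OFFSET) before grouping, since ordering a grouped count by updated_at is exactly what Postgres rejects; except is a standard relation method:

# Drop the order/limit/offset that @commits inherited, then group and count:
@missions_commits = @commits.except(:order, :limit, :offset).group(:mission_id).count.length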
I am trying to generate a report to screen of accounting transaction history. In most situations it is one display row per record in the AccountingTransaction table. But occasionally there are transactions that I wish to display to the end user as one transaction which are really, behind the scenes, two accounting transactions. This is caused by deferral of revenues and fund splitting since this app is a fund accounting app.
If I display all rows one by one, those double entries look odd to the user since the fund splitting and deferral is "behind the scenes". So I want to roll up all the related transactions into one display row on screen.
I have my query now using group by to group the related transactions
@history = AccountingTransaction.where("customer_id in (?) AND no_download <> 1", customers_in_account).group(:transaction_type_id, :reference_id).order(:created_at)
As I loop through, I get the transactions grouped as I want, but I am struggling with how to display the total sum of the credit field for all records in each group. (It only shows the credit of the first record in the group.) If I add .sum(:credit) to my query, it returns the sums just as I want, but none of the other data.
Is there a way for me to group these records like in my #history query and also get the sum of the credit field for each respective group?
* Addition *
What I really want is what the following SQL query would give me.
SELECT transaction_type_id, reference_id, sum(credit)
FROM accounting_transactions
WHERE customer_id in (21,22,23,24) AND no_download <> 1
GROUP BY reference_id, transaction_type_id ORDER BY created_at
I'm not sure you can do "ORDER BY created_at" and not include it in the select fields, but here is an example.
@history = AccountingTransaction.
  select([:reference_id, :transaction_type_id, :created_at]).
  select(AccountingTransaction.arel_table[:credit].sum.as("credit_sum")).
  where("customer_id in (?) AND no_download <> 1", customers_in_account).
  group(:transaction_type_id, :reference_id).
  order(:created_at)
To access the credit_sum you could do:
@history[0].attributes["credit_sum"]
I guess if you'd like, you could create a method:
def credit_sum
  attributes["credit_sum"]
end
* EDIT *
As stated in comments you can access the attribute directly:
@history[0].credit_sum
I have a table of records;
Users: id, screen_name, tweet
I want to return a list of the users and group by the screen_name so that there aren't duplicate screen_names in the list.
@users = User.all(:group => "screen_name")
This works fine when using MySQL, but not when I push to Heroku, which uses PostgreSQL.
How can I get a similar set of results using PostgreSQL?
There seem to be a good few posts on this but I couldn't figure out an answer from the comments.
Unless screen_name is the primary key, you should add all selected columns to your GROUP BY statement or aggregate them. But it looks like you can't meaningfully aggregate id or tweet.
.group("id, screen_name, tweet")
You can also normalize your database: put screen names into another table and you will be able to get the desired data easily.
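On PostgreSQL specifically, another option is DISTINCT ON, which keeps one full row per screen_name without grouping or aggregating the other columns (a sketch; column names taken from the question, and note the ORDER BY must start with the DISTINCT ON expression):

@users = User.select("DISTINCT ON (screen_name) *").order(:screen_name)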