I have a table like this:
online | offline
10:32 | 11:06
12:28 | 13:04
14:07 | NULL
As you can see, the user is not currently offline (assume that now is 14:15). I need to get the total duration of all the user's online sessions; if the user is not currently offline, I need the difference Time.now - online.
In my Ruby code I created a method:
def offline_datetime
  offline || Time.current
end
It works, but with many records it will be slow... How can I do this in the database?
I don't think it will make any difference. Even with hundreds of records, all you add is one nil check per record and a single computation of Time.now. Compared to building the model objects, that is nearly nothing. You could compute the differences when storing, and then let the database compute the sum. But I would only do that if necessary (don't tune performance that isn't a problem).
EDIT
If you have to compute the sum of the online times many times, you should compute the online time upfront when storing the offline timestamp:
Add, via a migration, an integer attribute diff holding the difference in minutes.
When storing the offline timestamp, compute the difference as well and store it with the record.
Get the total by calling: Model.sum("diff")
Add to that the single record that is still open (there can be only one in your example), as #davidb has written in his solution. A sketch of this approach follows.
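A minimal sketch of what those steps could look like, assuming a model named Session with online/offline timestamp columns and the new integer diff column (the model and method names here are illustrative, not from the question):

class AddDiffToSessions < ActiveRecord::Migration
  def change
    add_column :sessions, :diff, :integer
  end
end

class Session < ActiveRecord::Base
  before_save :store_diff

  # Total online time in minutes: the precomputed sums plus the single
  # still-open record, if there is one.
  def self.total_online_minutes
    finished = sum(:diff).to_i
    open_record = where(offline: nil).first
    open_minutes = open_record ? ((Time.current - open_record.online) / 60).to_i : 0
    finished + open_minutes
  end

  private

  # When the offline timestamp is set, store the difference in minutes.
  def store_diff
    self.diff = ((offline - online) / 60).to_i if offline.present?
  end
end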
I think in this situation the easiest solution would be to do two queries: one excluding datasets where offline is NULL, and one grabbing only those. Then you can use MySQL's CURRENT_TIMESTAMP (it always contains the timestamp of the current time) like this:
Model.where("offline IS NULL").sum("CURRENT_TIMESTAMP-online")
Then you can add the results of the first and second query. Maybe make a class method out of it:
def self.your_method_name_here
  seconds = 0
  seconds += Model.where("offline IS NOT NULL").sum("offline-online").to_i
  seconds += Model.where("offline IS NULL").sum("CURRENT_TIMESTAMP-online").to_i
  seconds
end
It might be possible to do this in one query, but I think this is quite practicable!
Intro
I'm building a system with a very simple layout consisting only of transactions (with basic CRUD). Each transaction has a date, a type, a debit amount (minus), and a credit amount (plus). Think of an online banking statement and that's pretty much it.
The issue I'm having is keeping my controller skinny and worrying about possibly over-querying the database.
A Simple Report Example
The total debit over the chosen period e.g. SUM(debit) as total_debit
The total credit over the chosen period e.g. SUM(credit) as total_credit
The overall total e.g. total_credit - total_debit
The report must allow a dynamic date range e.g. where(date BETWEEN 'x' and 'y')
The date range would never be more than a year and will only be a max of say 1000 transactions/rows at a time
So in the controller I create:
def report
  @d = Transaction.select("SUM(debit) AS total_debit").where("date BETWEEN 'x' AND 'y'").first
  @c = Transaction.select("SUM(credit) AS total_credit").where("date BETWEEN 'x' AND 'y'").first
  @t = @c.total_credit - @d.total_debit
end
Additional Question Info
My actual report has closer to 6 or 7 database queries (e.g. pulling out the total credit/debit for type == 1 or type == 2, etc.) and many more calculations, e.g. totalling up certain credit/debit types and then adding and subtracting those totals from other totals.
I'm trying my best to adhere to 'fat model, skinny controller', but I'm having issues with the number of variables my controller needs to pass to the view. Rails has seemed very straightforward up until the point where you create variables to pass to the view. I don't see how else you do it apart from creating the variables in the controller and making it 'skinnier' by moving some of the query bits and pieces into the model.
Is there something I'm missing where you create variables in the model and then have the controller pass those to the view?
A more idiomatic way of writing your query in ActiveRecord would probably be something like:
class Transaction < ActiveRecord::Base
  def self.within(start_date, end_date)
    where(:date => start_date..end_date)
  end

  def self.total_credit
    sum(:credit)
  end

  def self.total_debit
    sum(:debit)
  end
end
This would mean issuing 3 queries in your controller, which should not be a big deal if you create database indices, and limit the number of transactions as well as the time range to a sensible amount:
@transactions = Transaction.within(start_date, end_date)
@total = @transactions.total_credit - @transactions.total_debit
Finally, you could also use Ruby's Enumerable#reduce method to compute your total by directly traversing the list of transactions retrieved from the database.
@total = @transactions.reduce(0) { |memo, t| memo + (t.credit - t.debit) }
For very small datasets this might result in faster performance, as you would hit the database only once. However, I reckon the first approach is preferable, and it will certainly deliver better performance when the number of records in your db starts to increase.
I'm putting in params[:year_start]/params[:year_end] for x and y, is that safe to do?
You should never embed params[:anything] directly in a query string. Instead use this form:
where("date BETWEEN ? AND ?", params[:year_start], params[:year_end])
My actual report probably has closer to 5 database calls and then 6 or 7 calculations on those variables, should I just be querying the date range once and then doing all the work on the array/hash etc?
This is a little subjective but I'll give you my opinion. Typically it's easier to scale the application layer than the database layer. Are you currently having performance issues with the database? If so, consider moving the logic to Ruby and adding more resources to your application server. If not, maybe it's too soon to worry about this.
I'm really not seeing how I would get the majority of the work/calculations into the model. I understand scopes, but how would you put the date range into a scope and still utilise GET params?
Have you seen has_scope? This is a great gem that lets you define scopes in your models and have them automatically get applied to controller actions. I generally use this for filtering/searching, but it seems like you might have a good use case for it.
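For what it's worth, a rough sketch of how has_scope could tie GET params to a model scope; the scope name and param keys are illustrative, and the exact option semantics are documented in the has_scope README:

class Transaction < ActiveRecord::Base
  scope :created_between, ->(start_date, end_date) { where(date: start_date..end_date) }
end

class TransactionsController < ApplicationController
  # Maps params[:created_between][:start] and params[:created_between][:end]
  # onto the two arguments of the scope above.
  has_scope :created_between, using: [:start, :end], type: :hash

  def report
    @transactions = apply_scopes(Transaction).all
  end
end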
If you could give an example on creating an array via a broad database call and then doing various calculations on that array and then passing those variables to the template that would be awesome.
This is not a great fit for Stack Overflow and it's really not far from what you would be doing in a standard Rails application. I would read the Rails guide and a Ruby book and it won't be too hard to figure out.
I'm trying to figure out how to ask this - so I'll update the question as it goes to clear things up if needed.
I have a virtual stock exchange game site I've been building for fun. People make tons of trades, and each trade is its own record in a table.
When showing the portfolio page, I have to calculate everything on the fly from that table of data - i.e. how many shares the user has, total gains, losses, etc.
Things have really started slowing down when I try to segment it by trades, by company, by day.
I don't really have any code to show to demonstrate this - but it just feels like I'm not approaching this correctly.
UPDATE: This code in particular is very slow
# Returns an array of values for the total portfolio over time
def portfolio_value_over_time
  portfolio_value_array = []
  days = self.from_the_first_funding_date
  companies = self.list_of_companies

  days.each_with_index do |day, index|
    # Starting value
    days_value = 0

    companies.each do |company|
      holdings = self.holdings_by_day_and_company(day, company)
      price = Company.find_by_id(company).day_price(day)
      days_value = days_value + (holdings * price).to_i
    end

    # Adding all companies together for that day
    portfolio_value_array[index] = days_value
  end

  portfolio_value_array
end
The page load time can be up to 20+ seconds - totally insane. And I've cached a lot of the requests in Memcache.
Should I not be generating reports / charts on the live data like this? Should I be running a cron task and storing them somewhere? Whats the best practice for handling this volume of data in Rails?
The Problem
Of course it's slow. You're presumably looking up large volumes of data from each table, and performing multiple lookups on multiple tables on every iteration through your loop.
One Solution (Among Many)
You need to normalize your data, create a few new models to store expensive calculated values, and push more of the calculations onto the database or into tables.
The fact that you're doing a nested loop over high-volume data is a red flag. You're making many calls to the database, when ideally you should be making as few sequential requests as possible.
I have no idea how you need to normalize your data or optimize your queries, but you can start by looking at the output of EXPLAIN. In general, though, you will probably want to eliminate any full table scans and return data in larger chunks rather than one record at a time.
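For example, on Rails 3.2+ you can print the plan straight from ActiveRecord (on older versions, run EXPLAIN against the same SQL in your database console); the Trade model and condition below are just placeholders for whichever query is slow:

# Prints the database's query plan for this relation (Rails 3.2+).
puts Trade.where(user_id: 42).explain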
This really seems more like a modeling or query problem than a Rails problem, per se. Hope this gets you pointed in the right direction!
You should precompute and store all this data on another table. An example table might look like this:
Table: PortfolioValues
Column: user_id
Column: day
Column: company_id
Column: value
Index: user_id
Then you can easily load all the user's portfolio data with a single query, for example:
current_user.portfolio_values
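For charting you could then collapse those rows to one number per day in a single grouped query; a sketch assuming the PortfolioValue model and columns from the table above:

# One entry per day: { day => total value across all companies }
@daily_values = current_user.portfolio_values.group(:day).sum(:value)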
Since you're using memcached anyway, use it to cache some of those queries. For example:
Company.find_by_id(company).day_price(day)
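A hedged sketch of that caching using Rails.cache (which can be backed by memcached); the cache key layout and expiry are arbitrary choices:

def cached_day_price(company_id, day)
  Rails.cache.fetch("company/#{company_id}/day_price/#{day}", expires_in: 12.hours) do
    Company.find_by_id(company_id).day_price(day)
  end
end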
I'm performing a query against an SQLite db where I pull out a fairly large set of call records. On the same page I want to show the breakdown of counts per day for those call records, so I perform about 30 count queries against the database.
Is there a way I can filter the set I retrieve initially and perform the counts on the in-memory set, so I don't have to run those repeated queries? I need the counts for graphing and display purposes, but even with an index on date it takes about 10 seconds to run the initial query plus all of the count queries.
What I'm basically asking is there a way to perform the counts on the records returned or perform analysis on it, or is there a smarter way to cache this data?
@set = Record.get_records_for_range(date1, date2)

while date1 < date2
  @count = Record.count_records_for_date(date1)
  date1 = date1 + 1
end
is basically what I'm doing. Surely there's a simpler and faster way?
Using @set.length will get you the count of the in-memory set without querying the database, because it is performed by Ruby rather than Active Record (unlike .count).
Read about it here https://batsov.com/articles/2014/02/17/the-elements-of-style-in-ruby-number-13-length-vs-size-vs-count/
Here is a quote pulled out of that article
length is a method that’s not part of Enumerable - it’s part of a concrete class (like String or Array) and it’s usually running in O(1) (constant) time. That’s as fast as it gets, which means that using it is probably a good idea.
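For the per-day breakdown itself, a single grouped query (or a Ruby group_by over the set you already loaded) avoids the ~30 separate count queries; a sketch using the names from the question and assuming a date column, which the index mentioned in the question suggests exists:

# One query instead of one per day: { date => count }
daily_counts = Record.where(date: date1..date2).group(:date).count

# Or, counting the already-loaded in-memory set in Ruby:
daily_counts = @set.group_by(&:date).each_with_object({}) { |(day, records), h| h[day] = records.length }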
I want to perform some simple calculations while staying database-agnostic in my rails app.
I have three models:
.---------------. .--------------. .---------------.
| ImpactSummary |<------| ImpactReport |<----------| ImpactAuction |
`---------------'1 *`--------------'1 *`---------------'
Basically:
ImpactAuction holds data about... auctions (prices, quantities and such).
ImpactReport holds monthly reports that have many auctions as well as other attributes; it also shows some calculation results based on the auctions.
ImpactSummary holds a collection of reports as well as some information about a specific year, and also shows calculation results based on the two other models.
What I intend to do is store the results of these really simple calculations (just means, sums, and the like) in the relevant tables, so that reading them is fast and so that I can easily perform queries on the calculation results.
Is it good practice to store calculation results? I'm pretty sure that's not a very good thing, but is it acceptable?
Is it useful, or should I not bother and just perform the calculations on the fly?
If it is good practice and useful, what's the best way to achieve what I want?
That's the tricky part. At first, I implemented a simple chain of callbacks that update the calculation fields of the parent model upon save (that is, when an auction is created or updated, it marks some_attribute_will_change! on its report and saves it, which triggers the report's own callbacks, and so on).
This approach works well when creating or updating a single record, but if I want to work on several records, it triggers the calculations on the whole chain for each record... So I suddenly find myself forced to put a condition on the callbacks depending on whether I have one or many records, and I can't figure out how (using a class method that could be called on a relation? an instance attribute @skip_calculations on each record? just using an outdated flag to mark the parent records for later calculation?).
Any advice is welcome.
Bonus question: would it still be considered DB-agnostic if I implement this with DB views?
As usual, it depends. If you can perform the calculations in the database, either using a view or using #find_by_sql, I would do so. You'll save yourself a lot of trouble: otherwise you have to keep your summaries up to date whenever a value changes, and you've already met that problem when updating multiple rows. Having a view, or a query that implements the view stored as text in ImpactReport, will allow you to always have fresh data.
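A hedged sketch of the find_by_sql variant, computing the per-report aggregates on read instead of storing them (the table and column names are guesses based on the models described above):

class ImpactReport < ActiveRecord::Base
  # Loads report ids together with auction aggregates computed by the database.
  def self.with_auction_totals
    find_by_sql(<<-SQL)
      SELECT impact_reports.id,
             AVG(impact_auctions.price)    AS average_price,
             SUM(impact_auctions.quantity) AS total_quantity
      FROM impact_reports
      LEFT JOIN impact_auctions
        ON impact_auctions.impact_report_id = impact_reports.id
      GROUP BY impact_reports.id
    SQL
  end
end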
The answer? Benchmark, benchmark, benchmark ;)
I have a table in my database that stores event totals, something like:
event1_count
event2_count
event3_count
I would like to transition from these simple aggregates to more of a time series, so the user can see on which days these events actually happened (like how Stack Overflow shows daily reputation gains).
Elsewhere in my system I already did this by creating a separate table with one record for each daily value - but to collect a time series that way you end up with a huge database table and the need to query tens or hundreds of records. It works, but I'm not convinced it's the best way.
What is the best way of storing these individual events along with their dates so I can do a daily plot for any of my users?
When building tables like this, the real key is having effective indexes. Test your queries with the EXPLAIN statement or the equivalent in your database of choice.
If you want summary tables you can query, either build a view that represents the query or roll the daily results into a new table on a regular schedule. Summary tables are often the best way to go, as they are quick to query.
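A rough sketch of that scheduled rollup, assuming an Event model with a created_at timestamp and a DailyEventCount summary table (both model names, and the kind/total columns, are illustrative):

class DailyEventCount < ActiveRecord::Base
  # Run once a day (e.g. from a cron task) to summarise yesterday's events.
  def self.roll_up(day = Date.yesterday)
    Event.where(created_at: day.beginning_of_day..day.end_of_day)
         .group(:kind)
         .count
         .each do |kind, count|
      create!(day: day, kind: kind, total: count)
    end
  end
end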
The best way to implement this is to use Redis. If you haven't worked with Redis before, I suggest you start - you will be surprised how fast it can be :). The way I would do this is with the Hash data structure Redis provides. Assign every user their own Hash (building a unique key per user, like "user:23:counters"). Inside this Hash, use a daily timestamp such as "05/06/2011" as the field and increment its counter every time an event happens, or do whatever else you want with it!
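A minimal sketch of that Redis approach using the redis-rb gem; the key layout is just one possible choice:

require "date"
require "redis"

redis = Redis.new

# Increment today's counter for a user/event pair.
def record_event(redis, user_id, event)
  redis.hincrby("user:#{user_id}:#{event}", Date.today.to_s, 1)
end

# All daily counts for that user/event, e.g. { "2011-06-05" => "3", ... }
def daily_counts(redis, user_id, event)
  redis.hgetall("user:#{user_id}:#{event}")
end

record_event(redis, 23, "page_views")
daily_counts(redis, 23, "page_views")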
A good start would be this thread - it has a simple, beginner-level solution: Time Series Starter. If you are OK with Rails models, that is one way it could work for a so-called "irregular" time series, i.e. an event here and there rather than at a regular interval - like a sensor that sends data whenever your door is opened.
The other thing, and what I was looking for in this thread, is a regular time-series DB: values arrive at a fixed interval, say 60 per minute (i.e. 1 per second), for example from a temperature sensor. This all boils down to datasets with "buckets", as you are suspecting: a time-series table gets long, indexes stop helping at some point, etc. Here is one "bucket" approach using Postgres arrays that would be a feasible idea.
It's not available as "plug and play", as far as I have researched on the web.