Data Structure (or Gem) for Usage Tracking & Metrics

Requirement:
I'm building a Rails application whose primary feature is used through an API. I want to track API usage per object and per user, so I can display metrics to each user for each of their objects and also bill them based on total usage.
I'm looking for a data structure or other solution that allows me to track and report on the number of times the API is called for a given resource (in this case the 'Flow' object which belongs_to a user).
Current Implementation:
In an attempt to solve this, I have added a 'daily_calls' Hash field to my model that holds a counter of API calls by date. I'd also like to add another level with hours below the day, but I know that this will not be very performant when running aggregated queries (e.g. calls in the last month).
# Flow daily_calls field example (Y => M => D)
{2016=>{12=>{14=>5}}, 2017=>{1=>{9=>6}}}
# Flow daily_calls example with hours (Y => M => D => H)
{2016=>{12=>{14=>{23=>5}}}, 2017=>{1=>{9=>{3=>4,4=>2}}}}
Currently I'm updating the count with a method when the API is called and I have some methods for aggregating data for a specific object:
class Flow < ApplicationRecord
  belongs_to :user
  ...
  # Update usage metrics
  def api_called
    update_lifetime_count!
    update_daily_count!
  end
  # Example method for displaying usage
  def calls_in_month(date)
    return 0 unless daily_calls[date.year] && daily_calls[date.year][date.month]
    daily_calls[date.year][date.month].values.sum
  end
end
The more I explain the above, the crazier the approach sounds! I'm hoping the application will be high volume, so I don't want to create excessive data volumes if it can be helped, though it may be useful in future to save more information around API usage, e.g. the geolocation / IP of the request.
An alternative approach I am considering is creating an object instance for each hour and incrementing an integer field on that object. Then I could run queries for a time range and sum the integers (which would be significantly easier than all this Hash-work).
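Roughly, I imagine that alternative looking something like the sketch below (untested; the model and column names ApiCallBucket, bucket_start and calls are made up for illustration):
# One row per Flow per hour; increment a counter instead of rewriting a Hash.
class ApiCallBucket < ApplicationRecord
  belongs_to :flow
  # Find (or create) the row for the current hour and bump its counter.
  def self.record_call!(flow, at: Time.current)
    bucket = find_or_create_by!(flow: flow, bucket_start: at.beginning_of_hour)
    bucket.increment!(:calls)
  end
  # Total calls for a flow over an arbitrary range, answered by a single SUM query.
  def self.calls_between(flow, from, to)
    where(flow: flow, bucket_start: from..to).sum(:calls)
  end
end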
Request:
I suspect this is not the optimal solution, so I would like to know the best data structure for this, or whether there is a gem flexible enough to track these sorts of user activities and to use that information both for displaying usage to the user and for aggregating usage for reporting.
Some example use cases that it should support:
Display a usage graph by hour for a given Flow (or a more granular level)
Display the total number of API calls for a given month for all Flows that belong to a given User
Report on application usage for all users for a given time period

I think you might be able to use a gem like public_activity or paper_trail.
If you use one of those gems, you would be able to query them as needed for the information that you want to show to your user.
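For example, with public_activity the tracking and the reporting queries could look roughly like this (untested sketch; the activity key name is just an example):
class Flow < ApplicationRecord
  include PublicActivity::Common
  belongs_to :user
end
# Wherever the API call is handled:
flow.create_activity(key: 'flow.api_called', owner: flow.user)
# Calls for one Flow in a given month:
PublicActivity::Activity.where(trackable: flow, key: 'flow.api_called')
                        .where(created_at: date.beginning_of_month..date.end_of_month)
                        .count
# Calls per day across all of a user's Flows (PostgreSQL date_trunc shown; adjust for your database):
PublicActivity::Activity.where(owner: user, key: 'flow.api_called')
                        .group("date_trunc('day', created_at)")
                        .count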

Related

.where statement to filter posts by date causing large number of database queries?

This is my first Stack Overflow question, so pardon me if it's a little rough around the edges. I recently began my first engineering job and have inherited some legacy Ruby on Rails code to work through.
My goal:
is to fetch posts (Posts is a model, though with no association to User) belonging to a user, as seen below. The posts should be filtered to only include those whose end_date is null or in the future.
The problem:
The ActiveRecord query @valid_posts ||= Posts.for_user(user).where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil).pluck(:post_set_id) (some further context below)
generates ~15 calls to my database per user per second when testing with Postman, causing significant memory spikes, notably with an increased number of posts. I would only expect 2 at most (though I'm not sure about this): one to fetch posts for the user, and a second to fetch posts that match the date constraint.
In the absence of the .where('end_date > ? OR end_date IS ?', Time.now.in_time_zone, nil), there are no memory issues whatsoever. My question, essentially, is why this particular line causes so many queries to the database (which seems to be the cause of the memory spikes), and what an improved implementation would look like.
My reasoning thus far:
My initial suspicion was that I was making an N+1 query, though I no longer believe this to be the case (I compared .select with .where in the query, with no significant changes). A third option would possibly be to use .includes, though there is no association between a user and a post, and I do not believe it would be feasible to create one, as, to my understanding, users are a function of an organization rather than their own model.
My second thought is that because I am using a date that is precise to the millisecond, the time is ever-changing, and therefore the updated time is run against the posts table every time the time changes. Would it be possible to capture the current time in a variable and then pass that to the .where statement, rather than a constantly varying time, as is currently implemented? This would effectively act as a sort of caching mechanism, if I am not mistaken. Something like the sketch below is what I have in mind.
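# Untested sketch of what I mean: freeze the timestamp once per request and
# let a single query handle both the NULL check and the comparison.
now = Time.now.in_time_zone
@valid_posts ||= Posts.for_user(user)
                      .where('end_date IS NULL OR end_date > ?', now)
                      .pluck(:post_set_id)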
My third thought was to add an index to end_date on the posts table for quicker lookups (a migration along the lines of the sketch below), though on its own I do not believe this provides a solution.
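# Index sketch; the migration superclass version is a guess, match it to your Rails version.
class AddIndexToPostsEndDate < ActiveRecord::Migration[6.0]
  def change
    add_index :posts, :end_date
  end
end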
Some basic context:
While there are many files working together, I have tried to overly-simplify them to essentially reflect the information that I believe is necessary to understand the issue at hand. If there is no identifiable cause for this issue, then perhaps I need to dig into other areas of code.
for_user is a user scope defined below:
user_scope
module UserScopable
  extend ActiveSupport::Concern
  ...
  scope(:for_user,
        lambda { |poster|
          for_user_scope(
            { user_id: poster.user_id, organization_id: poster.organization_id }
          )
        })
  scope(:for_user_scope, lambda { |hash|
    where(user_id: hash.fetch(:user_id), organization_id: hash.fetch(:organization_id))
  })
end
@valid_posts is contained within a module, PostSetFilter, and is called in the users controller:
users_controller
def post_ids
  post_pools = PostSetFilter.new(user: user)
  render json: { PostPools: post_pools }
end
Ultimately, there's a lot that I do not know, and there seem to be many possible approaches, so I'm not entirely sure how to proceed. Any guidance on how to reduce the number of queries, and any reasoning as to why this happens, would be greatly appreciated.
I am happy to provide further context if needed, though everything points to the aforementioned line as being the culprit. Thank you in advance.

How to append apis with incrementing numbers

How can I hit multiple APIs like example.com/1000/getUser, example.com/1001/getUser in Gatling? They are GET calls.
Note: the numbers start from a non-zero integer.
Hard to give good advice based on the small amount of information in your question, but I'm guessing that passing the userIDs in with a feeder could be a simple, straightforward solution. It largely depends on how your API works, what kind of tests you're planning, and how many users (I'm assuming the numbers are userIds) you need to test with.
If you need millions of users, a custom feeder that generates increments would probably be better, but beyond that the strategy would otherwise be the same. I advise you to read up on the feeder documentation for more information, both on usage in general and on how to make custom feeders: https://gatling.io/docs/3.0/session/feeder/
As an example, if you just need a relatively small number of users, something along these lines could work:
Make a simple CSV file (for example named userid.csv) with all your userIDs and add it to the resources folder:
userid
1000
1001
1002
...
...
The .feed() step adds one value from the CSV file to your Gatling user session, which you can then fetch as you would any other session value. Each of the ten users injected in this example will get the next value from the CSV file.
setUp(
  scenario("ScenarioName")
    .feed(csv("userid.csv"))
    .exec(http("Name of your request").get("/${userid}/getUser"))
    .inject(
      atOnceUsers(10)
    )
).protocols(http.baseUrl("https://example.com"))

Valence API Grade Export

I've been testing the grade export functionality using Valence and I have noticed while benchmarking that the process is very slow (about 1.5 to 2 seconds per user).
Here is the api call I am using:
/d2l/api/le/(D2LVERSION: version)/(D2LID: orgUnitId)/grades/(D2LID: gradeObjectId)/values/(D2LID: userId)
What I am looking to do is export a large number of grades upwards of 10k. Is this possible using this API?
An alternative to consider is to get all the grades for a particular user with GET /d2l/api/le/(version)/(orgUnitId)/grades/values/(userId)/.
(In your question, it looks like with the call you're using, you're getting the grade values one at a time for each user.)
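As a rough illustration only (untested; it assumes the request URL has already been signed with your Valence app/user ID-key pairs, e.g. via whatever helper you already use for the per-grade call, here called sign_url):
require 'net/http'
require 'json'
version  = '1.0'  # the LE API version you target
org_unit = 6606   # example orgUnitId
user_id  = 123    # example userId
route = "/d2l/api/le/#{version}/#{org_unit}/grades/values/#{user_id}/"
signed_url = sign_url(route) # hypothetical auth helper returning a full, signed URL
grades = JSON.parse(Net::HTTP.get(URI(signed_url)))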
In the future, we plan to support paging of results, in order to better support the case of large class sizes and a high number of grade items. We also plan to offer a call that retrieves a user's grades set across all courses.

What is the 'Rails Way' to implement a dynamic reporting system on data

Intro
I'm building a system with a very simple layout consisting only of transactions (with basic CRUD). Each transaction has a date, a type, a debit amount (minus) and a credit amount (plus). Think of an online banking statement and that's pretty much it.
The issue I'm having is keeping my controller skinny and worrying about possibly over-querying the database.
A Simple Report Example
The total debit over the chosen period e.g. SUM(debit) as total_debit
The total credit over the chosen period e.g. SUM(credit) as total_credit
The overall total e.g. total_credit - total_debit
The report must allow a dynamic date range e.g. where(date BETWEEN 'x' and 'y')
The date range would never be more than a year and will only be a max of say 1000 transactions/rows at a time
So in the controller I create:
def report
  @d = Transaction.select("SUM(debit) AS total_debit").where("date BETWEEN 'x' AND 'y'")
  @c = Transaction.select("SUM(credit) AS total_credit").where("date BETWEEN 'x' AND 'y'")
  @t = @c[0].total_credit - @d[0].total_debit
end
Additional Question Info
My actual report has closer to 6 or 7 database queries (e.g. pulling out the total credit/debit per type == 1 or type == 2, etc.) and has many more calculations, e.g. totalling up certain credit/debit types and then adding and subtracting these totals from other totals.
I'm trying my best to adhere to 'fat model, skinny controller' but am having issues with the number of variables my controller needs to pass to the view. Rails has seemed very straightforward up until the point where you create variables to pass to the view. I don't see how else you do it apart from putting the variable-creating line into the controller and making it 'skinnier' by pushing some of the query bits and pieces into the model.
Is there something I'm missing where you create variables in the model and then have the controller pass those to the view?
A more idiomatic way of writing your query in ActiveRecord would probably be something like:
class Transaction < ActiveRecord::Base
  def self.within(start_date, end_date)
    where(:date => start_date..end_date)
  end
  def self.total_credit
    sum(:credit)
  end
  def self.total_debit
    sum(:debit)
  end
end
This would mean issuing 3 queries in your controller, which should not be a big deal if you create database indices, and limit the number of transactions as well as the time range to a sensible amount:
@transactions = Transaction.within(start_date, end_date)
@total = @transactions.total_credit - @transactions.total_debit
Finally, you could also use Ruby's Enumerable#reduce method to compute your total by directly traversing the list of transactions retrieved from the database.
@total = @transactions.reduce(0) { |memo, t| memo + (t.credit - t.debit) }
For very small datasets this might result in faster performance, as you would hit the database only once. However, I reckon the first approach is preferable, and it will certainly deliver better performance when the number of records in your DB starts to increase.
I'm putting in params[:year_start]/params[:year_end] for x and y, is that safe to do?
You should never embed params[:anything] directly in a query string. Instead use this form:
where("date BETWEEN ? AND ?", params[:year_start], params[:year_end])
My actual report probably has closer to 5 database calls and then 6 or 7 calculations on those variables, should I just be querying the date range once and then doing all the work on the array/hash etc?
This is a little subjective but I'll give you my opinion. Typically it's easier to scale the application layer than the database layer. Are you currently having performance issues with the database? If so, consider moving the logic to Ruby and adding more resources to your application server. If not, maybe it's too soon to worry about this.
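If you do go that route, a rough sketch (building on the within class method above) might be:
# One query for the date range, then all the arithmetic in Ruby.
transactions = Transaction.within(start_date, end_date).to_a
@total_credit = transactions.sum(&:credit)
@total_debit  = transactions.sum(&:debit)
@total        = @total_credit - @total_debit
# Further per-type figures can be derived from the same in-memory array with
# Enumerable#select instead of issuing extra queries.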
I'm really not seeing how I would get the majority of the work/calculations into the model. I understand scopes, but how would you put the date range into a scope and still utilise GET params?
Have you seen has_scope? This is a great gem that lets you define scopes in your models and have them automatically get applied to controller actions. I generally use this for filtering/searching, but it seems like you might have a good use case for it.
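Roughly, and from memory (double-check the has_scope README for the exact options), it could look like this:
class TransactionsController < ApplicationController
  # Expects params such as ?within[start_date]=2014-01-01&within[end_date]=2014-12-31
  has_scope :within, :using => [:start_date, :end_date], :type => :hash
  def report
    @transactions = apply_scopes(Transaction)
    @total = @transactions.total_credit - @transactions.total_debit
  end
end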
If you could give an example on creating an array via a broad database call and then doing various calculations on that array and then passing those variables to the template that would be awesome.
This is not a great fit for Stack Overflow and it's really not far from what you would be doing in a standard Rails application. I would read the Rails guide and a Ruby book and it won't be too hard to figure out.

Question about Ruby on Rails, Constants, belongs_to & Database Optimization/Performance

I've developed a web-based point of sale system for one of my clients in Ruby on Rails with a MySQL backend. These guys are growing so fast that they are ringing close to 10,000 transactions per day corporate-wide. For this question, I will use the transactions table as an example. Currently, I store transactions.status as a string (i.e. 'pending', 'completed', 'incomplete') within a varchar(255) field that has an index. In the beginning, it was fine when I was looking up records by different statuses, as I didn't have to worry about so many records. Over time, using the query analyzer, I have noticed that performance has worsened and that varchar fields can really slow down your query speed over thousands of lookups. I've been thinking about converting these varchar fields to integer-based status fields utilizing a STATUS constant within the Transaction model, like so:
class Transaction < ActiveRecord::Base
  STATUS = { :incomplete => 0, :pending => 1, :completed => 2 }
  def self.expensive_query_by_status(status)
    find(:all,
         :select => "id, cashier, total, status",
         :conditions => { :status => STATUS[status.to_sym] })
  end
end
Is this the best route for me to take? What do you guys suggest? I am already using proper indexes on various lookup fields, and memcached for query caching wherever possible. They're currently set up in a distributed server environment of 3 servers: the 1st for the application, the 2nd for the DB and the 3rd for caching (all in one datacenter and on the same VLAN).
Have you tried the alternative on a representative database? From the example given, I'm a little sceptical that it's going to make much difference, you see. If there are only three statuses then a query by status may be better off not using an index at all.
Say "completed" comprises 80% of your table - with no other indexed column involved, you're going to be requiring more reads if the index is used than not. So a query of that type is almost certainly going to get slower as the table grows. "incomplete" and "pending" queries would probably still benefit from an index, however; they'd only be affected as the total number of rows with those statuses grew.
How often do you look at everything, complete and otherwise, without some more selective criterion? Could you partition the table in some (internal or external) way? For example, store completed transactions in a separate table, moving new ones there as they reach their final (?) state. I think internal database partitioning was introduced in MySQL 5.1 - looking at the documentation it seems that a RANGE partition might be appropriate.
All that said, I do think there's probably some benefit to moving away from storing statuses as strings. Storage and bandwidth considerations aside, it's a lot less likely that you'll inadvertently mis-spell an integer or, better yet, a constant or symbol.
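For instance, something along these lines (Rails 2.x-era syntax to match your example; untested):
class Transaction < ActiveRecord::Base
  STATUS = { :incomplete => 0, :pending => 1, :completed => 2 }.freeze
  # e.g. Transaction.with_status(:pending), so there are no string literals to mistype.
  named_scope :with_status, lambda { |status|
    { :conditions => { :status => STATUS[status.to_sym] } }
  }
end
# Plus a migration to convert the status column to an integer and index it.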
You might want to start limiting your searches (if you're not doing that already); #find(:all) is pretty taxing at that scale. Also, you might want to think about what your Transaction model is reaching out for as it gets rendered in your views, and perhaps eager-load those associations to minimize requests to the DB for extra information.
