Valence API Grade Export - Desire2Learn

I've been testing the grade export functionality using Valence, and while benchmarking I noticed that the process is very slow (about 1.5 to 2 seconds per user).
Here is the API call I am using:
/d2l/api/le/(D2LVERSION: version)/(D2LID: orgUnitId)/grades/(D2LID: gradeObjectId)/values/(D2LID: userId)
What I am looking to do is export a large number of grades, upwards of 10k. Is this possible using this API?

An alternative to consider is to get all the grades for a particular user with GET /d2l/api/le/(version)/(orgUnitId)/grades/values/(userId)/.
(In your question, it looks like with the call you're using, you're getting the grade values one at a time for each user.)
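As a rough illustration, driving that per-user call from a simple loop might look like the sketch below. This is only a sketch: the host, version, org unit, and user list are placeholders, and the Valence ID/key request signing that real calls require is omitted for brevity.

require "net/http"
require "json"

# One request per user instead of one per (grade object, user) pair.
HOST     = "https://lms.example.com"   # placeholder
VERSION  = "1.0"                       # placeholder
ORG_UNIT = 6606                        # placeholder

all_grades = user_ids.map do |user_id|
  uri = URI("#{HOST}/d2l/api/le/#{VERSION}/#{ORG_UNIT}/grades/values/#{user_id}/")
  [user_id, JSON.parse(Net::HTTP.get(uri))]
end.to_h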
In the future, we plan to support paging of results, in order to better support the case of large class sizes combined with a high number of grade items. We also plan to offer a call which retrieves a user's set of grades across all courses.

Related

Data Structure (or Gem) for Usage Tracking & Metrics

Requirement:
I'm building a Rails application whose primary feature is used through an API. I want to track API usage for the object that is called, per user, so I can display metrics to the user for each of their objects and also bill them based on total usage.
I'm looking for a data structure or other solution that allows me to track and report on the number of times the API is called for a given resource (in this case the 'Flow' object which belongs_to a user).
Current Implementation:
In an attempt to solve this, I have added a 'daily_calls' Hash field in my model that keeps a counter of API calls by date. I'd also like to add another level with hours below the day, but I know that this will not be very performant when running aggregated queries (e.g. calls in the last month).
# Flow daily_calls field example (Y => M => D)
{2016=>{12=>{14=>5}}, 2017=>{1=>{9=>6}}}
# Flow daily_calls example with hours (Y => M => D => H)
{2016=>{12=>{14=>{23=>5}}}, 2017=>{1=>{9=>{3=>4,4=>2}}}}
Currently I'm updating the count with a method when the API is called and I have some methods for aggregating data for a specific object:
class Flow < ApplicationRecord
  belongs_to :user
  ...

  # Update usage metrics
  def api_called
    update_lifetime_count!
    update_daily_count!
  end

  # Example method for displaying usage
  def calls_in_month(date)
    return 0 unless daily_calls[date.year] && daily_calls[date.year][date.month]
    daily_calls[date.year][date.month].values.sum
  end
end
The more I explain the above, the crazier this approach sounds! I'm hoping the application will be high volume, so I did not want to create excessive data volumes if it can be helped, though it may be useful in the future to save more information around API usage, e.g. the geolocation / IP of each request.
An alternative approach I am considering is creating an object instance for each hour and incrementing an integer field on that object. Then I could run queries for a time range and sum the integers (which would be significantly easier than all this Hash-work).
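Roughly, that per-hour alternative might look like the following sketch (the model and column names are purely illustrative):

# Assumed table: flow_usage_buckets(flow_id, hour datetime, calls integer default 0)
class FlowUsageBucket < ApplicationRecord
  belongs_to :flow

  # Bump the counter for the bucket covering the given time.
  def self.record_call!(flow, at: Time.current)
    bucket = find_or_create_by!(flow: flow, hour: at.beginning_of_hour)
    bucket.increment!(:calls)
  end

  # Aggregate over any range with plain SQL instead of Hash-walking.
  def self.calls_between(flow, from, to)
    where(flow: flow, hour: from..to).sum(:calls)
  end
end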
Request:
I suspect this is not the optimal solution, so I would like to know the best data structure for this, or whether there is a gem flexible enough to track these sorts of user activities and to use that information both for displaying usage to the user and for aggregated reporting.
Some example use cases that it should support:
Display a usage graph by hour for a given Flow (or a more granular level)
Display the total number of API calls for a given month for all Flows that belong to a given User
Report on application usage for all users for a given time period
I think you might be able to use a gem like public_activity or paper_trail.
If you use one of those gems, you would be able to query them as needed for the information that you want to show to your user.
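For example, with public_activity the tracking and a monthly aggregate might look roughly like this (a sketch; double-check the gem's README for the exact API):

class Flow < ApplicationRecord
  include PublicActivity::Common
  belongs_to :user
end

# Record a hit wherever the API endpoint is served:
flow.create_activity(:api_called, owner: flow.user)

# Aggregate later, e.g. calls for a given month:
PublicActivity::Activity
  .where(trackable: flow, key: "flow.api_called")
  .where(created_at: date.beginning_of_month..date.end_of_month)
  .count

The trade-off versus the counter Hash is one row per call: more data, but arbitrary time-range queries become trivial.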

Mahout recommendations on two events on a similar item

I am trying to solve a problem with Mahout. The situation is that we have users and courses; a user can view a course or take a course. If a user views a course frequently, then I have to recommend taking the course. My data consists of user IDs and item IDs, with no preference values associated with them.
Example:
1 2
1 7
2 4
2 8
3 5
4 6
In the first column, 1 is the user ID, and in the second column, 2 is the course ID. The twist is that the second column can hold a 'viewed' and/or a 'completed' event for a particular course. Suppose course A, when viewed, has ID 2, and the same course A, when taken, has ID 7 for user 1. If a user other than user 1 comes along and views course A, then I have to predict that course A should be taken. The problem here is that if all users are viewing a course but not taking it, user-based recommendation in Mahout will fail, because from a business perspective we have to recommend that the course they are viewing should be taken. Do I need to refactor my dataset here, or which algorithm is best suited for this kind of problem?
One problem is that viewing may not predict (and certainly won't predict as well) that the user wants to take the course. You should look at the new cross-cooccurrence recommender in Mahout v1. It's part of a complete revamp of Mahout on Spark, using a new Scala DSL and a built-in optimizer for linear algebra. The command-line job you are looking for is spark-itemsimilarity, and it can ingest your user and item IDs directly, without translating them into non-negative cardinal numbers.
The algorithm takes the actions you know you want to recommend (user takes a course); these are the strongest "indicators" that can be used in your recommender. It then finds correlated views, that is, views that led to the user taking that course. This is done with the spark-itemsimilarity job, which can take two actions at a time, finding correlations, filtering out noise, and producing two "indicators". From the job you get two sparse matrices; each row is an item from the "user takes a course" action dataset, and the values are an ordered list of the most similar item IDs. The first output will be items similar by other people taking the course; the second will be items similar by other people viewing and then taking the course.
Input uses application-specific IDs. You can leave your data mixed if you include a filter term that identifies the action. It looks something like:
user-id-1,item-id1,user-took-class
user-id-1,item-id2,user-viewed-class-page
user-id-1,item-id5,user-viewed-class-page
...
The output is text-delimited (think CSV, but you can control the format) and consists entirely of item-id tokens; by default it looks like this:
item-id-1,item-id-100 item-id-200 item-id-250 ...
This is an item ID, a comma, and an ordered list of similar items separated by spaces. Index this with a search engine, then use the current user's history of action 1 to query against the primary indicator and the user's history of action 2 against the secondary cross-cooccurrence indicator. These can be indexed together as two fields of the same doc, so there is only one query against two fields. This also gives you a server that is as scalable as Solr or Elasticsearch: you just create the data models with Mahout, then index and query them with a search engine.
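For instance, with Elasticsearch the single two-field query could be sketched like this (the index name, field names, and the example histories are all assumptions):

require "net/http"
require "json"

# took_indicator and viewed_indicator are the two indexed indicator fields;
# the query terms are the current user's "took" and "viewed" histories.
query = {
  query: {
    bool: {
      should: [
        { match: { took_indicator:   "course-7 course-31" } },
        { match: { viewed_indicator: "course-2 course-9" } }
      ]
    }
  }
}

uri = URI("http://localhost:9200/courses/_search")
res = Net::HTTP.post(uri, query.to_json, "Content-Type" => "application/json")
hits = JSON.parse(res.body)["hits"]["hits"].map { |h| h["_id"] }
# hits is the ranked list of recommended courses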
Mahout docs: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
Presentation on the theory and other things you can do with these techniques: http://www.slideshare.net/pferrel/unified-recommender-39986309
Using this technique, you can take virtually the entire user clickstream, recorded as separate actions, and use it to make better recommendations. The actions don't even have to be on the same items. You can use the user's search term history, for instance, and get a cross-cooccurrence indicator. In this case the output would contain search terms that lead users to take the course, and so your query would be the current user's search term history.

Volusion API - Retrieve Daily Unprocessed Orders

The answer to my question should be relatively simple, but I can't seem to find a simple answer. I'm trying to find a way to regularly connect to the Volusion API to retrieve orders in a manner that assures no duplicates and no missing orders. However, because of the way that queries must be written, I'm finding it difficult:
1) You can't use comparisons in queries (greater than, less than, etc.).
2) The date fields require a full date/time with HHmmss.
3) There is a limit on the number of orders retrieved with each query.
If you query against a date/time, for example, you usually can only get one order, because I can't use a comparative function. I saw one post here that suggested iterating through order IDs until no data is received. Has anyone found an easier way to accomplish order retrieval with the Volusion API?
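For reference, the iterate-through-order-IDs workaround mentioned above might be sketched like this. Note that the endpoint and parameter names below are placeholders rather than the real Volusion API surface; check the Volusion docs for the actual Generic\Orders export parameters and authentication.

require "net/http"

BASE_URL = "https://yourstore.example.com/net/WebService.aspx"  # placeholder

# Fetch a single order by ID; returns nil when the API sends back no data.
def fetch_order(order_id)
  uri = URI(BASE_URL)
  uri.query = URI.encode_www_form(
    "API_Name"     => "Generic\\Orders",  # placeholder parameter names
    "WHERE_Column" => "OrderID",
    "WHERE_Value"  => order_id
  )
  body = Net::HTTP.get(uri)
  body.strip.empty? ? nil : body
end

order_id = last_processed_order_id + 1   # persisted from the previous run
while (order = fetch_order(order_id))
  process(order)   # your own handler
  order_id += 1    # a genuine gap in IDs would stop this loop early
end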

How do I get current courses for a user in Desire2Learn's Valence API? What can we do to fetch courses when they number in the thousands?

We need to find all the courses for a user whose startDate is less than today's date and whose endDate is greater than today's date. We are using the API
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3
In one particular case I have more than 18 thousand courses against one user. The service cannot return 18 thousand records in one go; I can only get 100 records at a time, so I need to use the bookmark field to fetch the data in sets of 100 records. The bookmark is the course ID of the last record in the set of 100 we fetched, used to get the next set of 100 records.
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3&bookmark=12528
I need to repeat the loop about 180 times, which results in a "Request time out" error.
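The loop is essentially the following sketch (the host and version are placeholders, and the required Valence request signing is omitted):

require "net/http"
require "json"

HOST = "https://lms.example.com"   # placeholder
VER  = "1.0"                       # placeholder

enrollments = []
bookmark = nil
loop do
  path = "/d2l/api/lp/#{VER}/enrollments/myenrollments/?orgUnitTypeId=3"
  path += "&bookmark=#{bookmark}" if bookmark
  page = JSON.parse(Net::HTTP.get(URI("#{HOST}#{path}")))
  enrollments.concat(page["Items"])
  break unless page["PagingInfo"]["HasMoreItems"]
  bookmark = page["PagingInfo"]["Bookmark"]
end
# ~180 iterations for 18 thousand records, which is where the timeout hits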
I need to filter the records on the basis of startDate and endDate, but no sorting criterion is available that can sort the data by startDate or endDate. Can anyone help me find a way to sort this data, or suggest any other API which can do this type of sorting?
Note: all 18 thousand records have the property "IsActive": true.
Rather than getting to the list of org units by user, you can try getting to the user by the list of org units. You could try using /d2l/api/lp/{ver}/orgstructure/{orgUnitId}/descendants/?ouTypeId={courseOfferingType} to retrieve the entire list of course offering IDs descended from the highest common ancestor known for the user's enrollments. You can then loop through /d2l/api/lp/{ver}/courses/{orgUnitId} to fetch back the course offering info for each one of those org units to pre-filter and cut out all the course offerings you don't care about based on dates. Then, for the ones left, you can check for the user's enrollment in each one of those to figure out which of your smaller set the user matches with.
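Sketched out, that approach might look like this (the host, version, and request signing are omitted, and the ancestor org unit and course offering type IDs are assumed to be known already):

require "net/http"
require "json"
require "time"

def get_json(path)
  JSON.parse(Net::HTTP.get(URI("https://lms.example.com/d2l/api/lp/1.0#{path}")))
end

# 1. Every course offering descended from the common ancestor org unit.
offerings = get_json("/orgstructure/#{ancestor_id}/descendants/?ouTypeId=#{course_offering_type}")

# 2. Pre-filter by date using each offering's course info.
now = Time.now
current = offerings.select do |ou|
  info   = get_json("/courses/#{ou["Identifier"]}")
  starts = info["StartDate"] && Time.parse(info["StartDate"])
  ends   = info["EndDate"]   && Time.parse(info["EndDate"])
  starts && ends && starts < now && ends > now
end

# 3. For the remaining offerings, check the user's enrollment in each one.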
This will certainly result in more calls to the service, not fewer, so it has only two advantages I can see:
You should be able to get the entire starting set of course offerings you need off the hop rather than getting it back in pages (although it's entirely possible that this call will get turned into a paging call in the future and the "fetch all the org units at once" nature it currently has deprecated).
If you need to do this entire use-case for more than one user, you can fetch the org structure data once, cache it, and then only do queries for users on the subset of the data.
In the meantime, I think it's totally reasonable to request an enhancement on the enrollments calls to provide better filtering (active/inactive, start dates, end dates, etc.): I suspect that such a request might see more traction than a request to give clients control over paging (i.e. the number of responses in each page frame).

Building a (simple) twitter-clone with CouchDB

I'm trying to build a (simple) twitter-clone which uses CouchDB as Database-Backend.
Because of its reduced feature set, I'm almost finished with coding, but there's one thing left I can't solve with CouchDB - the per-user timeline.
As with Twitter, the per-user timeline should show the tweets of all people I'm following, in chronological order. With SQL it's a quite simple SELECT statement, but I don't know how to reproduce this with CouchDB's map/reduce.
Here's the SQL-Statement I would use with an RDBMS:
SELECT * FROM tweets WHERE user_id IN [1,5,20,33,...] ORDER BY created_at DESC;
CouchDB schema details
user-schema:
{
  "_id": "xxxxxxx",
  "_rev": "yyyyyy",
  "type": "user",
  "user_id": 1,
  "username": "john",
  ...
}
tweet-schema:
{
  "_id": "xxxx",
  "_rev": "yyyy",
  "type": "tweet",
  "text": "Sample Text",
  "user_id": 1,
  ...
  "created_at": "2011-10-17 10:21:36 +000"
}
With view collations it's quite simple to query CouchDB for a list of "all tweets with user_id = 1, ordered chronologically".
But how do I retrieve a list of "all tweets which belong to the users with IDs 1, 2, 3, ..., ordered chronologically"? Do I need another schema for my application?
The best way of doing this would be to save created_at as a timestamp and then create a view that maps all tweets to the user_id:
function(doc) {
  if (doc.type == 'tweet') {
    emit(doc.user_id, doc);
  }
}
Then query the view with the user IDs as keys, and sort the results however you want in your application (most languages have a sort method for arrays).
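Concretely, you can POST the list of followed user IDs as the view's keys and sort the rows client-side; for example, from Ruby (the database, design doc, and view names are assumptions):

require "net/http"
require "json"

uri  = URI("http://localhost:5984/twitter/_design/tweets/_view/by_user_id")
body = { keys: [1, 5, 20, 33] }.to_json
res  = Net::HTTP.post(uri, body, "Content-Type" => "application/json")

tweets = JSON.parse(res.body)["rows"].map { |row| row["value"] }
# Sort chronologically (newest first) in the application:
timeline = tweets.sort_by { |t| t["created_at"] }.reverse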
Edited one last time - Was trying to make it all in couchDB... see revisions :)
Is that a CouchDB-only app? Or do you use something in between for additional business logic? In the latter case, you could achieve this by running multiple queries.
This might include merging different views. Another approach would be to add a list of "private readers" to each tweet. That allows user-specific (partial) views, but it also introduces the complexity of adding the list of readers to each new tweet, or even updating the list when users follow or unfollow.
It's important to think about the possible operations and their frequencies. If you're mostly generating lists of tweets, it's better to shift the complexity into the way the reader information is integrated into your documents (i.e. embedding the readers in your tweet doc), so that you can then easily build efficient view indices.
If you have many changes to your data, it's better to design your database so that it doesn't update too many existing documents at the same time. Instead, try to add data by adding new documents and aggregate via complex views.
But you have shown an edge case where a simple (one-dimensional) list-based index is not enough. You'd actually need secondary indices to filter by time and user IDs (given that you also need partial ranges for both). This is not possible in CouchDB, so you need to work around it by shifting "query" data into your docs and using it when building the view.
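To illustrate that workaround: if every tweet doc carried a "readers" array and a view emitted a composite [reader_id, created_at] key for each reader, one follower's timeline would become a simple range query. A sketch (the database, view name, and the view itself are assumptions):

require "net/http"
require "json"
require "cgi"

# Assumes a view whose map function calls emit([reader, doc.created_at], doc)
# once per entry in doc.readers.
reader_id = 42
params = {
  "startkey" => [reader_id].to_json,
  "endkey"   => [reader_id, {}].to_json   # {} collates after all timestamps
}
query = params.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join("&")
uri   = URI("http://localhost:5984/twitter/_design/tweets/_view/timeline?#{query}")

rows     = JSON.parse(Net::HTTP.get(uri))["rows"]
timeline = rows.map { |r| r["value"] }   # already ordered by created_at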
