I've set up ingestion of Teams CallRecords into Splunk and am now stuck trying to make sense of them. I thought that one CallRecord represents one unique Teams call (Mr. A dials Mr. B, Mr. B answers, they talk and eventually hang up - that's a CallRecord), and the documentation suggests as much: "callRecord resource type represents a single peer-to-peer call or a group call between multiple participants", "id - String - Unique identifier for the call record. Read-only."
But what I actually see is many CallRecords with the same id but different versions ("version" field). These records may have different start and end DateTimes and different lastModifiedDateTime values, and some versions have null values in the organizer* and participant* fields. I've seen anywhere from 1 to 66 versions per id.
So here are my questions:
Does one CallRecord represent one unique conversation? If yes, what would be its unique identifier - id+version? Then why are there records with the same id and different versions where all other data is identical except lastModifiedDateTime (such records are effectively duplicates and would lead to double counting in the final report)? And why do some records have null organizer* fields?
Does the set of all CallRecords with the same id and different versions represent one call? If I merge all such records into one, I get multivalue fields for startDateTime, endDateTime and the other DateTimes - which of those values should I use for accounting: min(startDateTime) and max(endDateTime), or something else?
Maybe there is some deep-dive Microsoft documentation on this versioning? Frankly, I'm completely lost here.
I have a neo4j database where two users attempt to trade a number of cards.
Each trade node has two outgoing relationships towards the two users the trade involves,
and the cards that are being traded.
If no agreement is made, a subsequent trade is created pointing to the previous one with a PREVIOUS relationship.
If an agreement is made the last node of the trade chain is marked with a success:true property.
The image below represents an example trade between two users.
I am trying to get all last trade nodes between two users with ids 10 and 20.
The last trade node is the one that has no incoming relationship.
My attempt is this:
MATCH (u:User)<--(t:Trade)-->(n:User)
WHERE (ID(u)=10 AND ID(n)=20) OR (ID(u)=20 AND ID(n)=10)
AND NOT (t)<-[:PREVIOUS]-()
RETURN t
The above however returns all 3 trade nodes. In fact the third line seems to make no
difference in the result of the query.
Why is that? How else can I achieve my objective?
I think the problem is with the order of boolean evaluations.
That is, AND is evaluated before OR (but after parentheses), so what you have (simplified down) is:
WHERE (<id check 1>) OR (<id check 2>) AND <not pattern>
The AND grouping gets evaluated first, so it behaves like:
WHERE (<id check 1>) OR ((<id check 2>) AND <not pattern>)
so as long as the first id check evaluates to true, the entire WHERE clause comes out as true.
To fix, add parentheses around the ID predicates like so:
WHERE ((ID(u)=10 AND ID(n)=20) OR (ID(u)=20 AND ID(n)=10))
AND NOT (t)<-[:PREVIOUS]-()
I have a fact table that includes "wait times in hours" for certain services. I have a lot of dimensions that could describe the wait times across different slices; however, I am also interested in knowing how many people (counts) came for services, filtered by the same dimensions.
Given that the dimensions for both the wait times in hours and the number of people who got services are exactly the same, I think it's best practice to keep them in the same fact table. My questions are:
Should there be a different fact table for the count measure mentioned?
How would I include this measure? Do I just put 1 in every single row? Because regardless of the wait-time, they've gotten the service only once (you cannot go above/below 1 in my scenario).
1) Think about the grain of your existing fact table. It sounds like it's probably "an occasion on which a person received a service." If that's the same thing you're trying to count, then yes - the waiting time and the count are the same grain.
However, while they may well be the same grain, there might be no need to add anything to the table. Read point 2 for an explanation.
2) You could put a 1 in a column on every row, but I'm not sure what you'd gain from it. You've not said what tools will be consuming this data, but you should be able to do a count/distinct count of some kind.
Working on the basis that you've tagged SSIS so are likely using Microsoft's BI stack:
TSQL has count(), and you can do count(distinct [column]).
SSAS has both counts and distinct counts as aggregation types.
MDX offers several different types of count.
SSRS has Count, CountDistinct, and CountRows.
Whether you do a normal count or a distinct count will depend on whether you're trying to ask "How many people used this service?" or "How many different people used this service?"
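For illustration, here's a minimal T-SQL sketch of the two kinds of count side by side; the fact table FactServiceWait and the columns PersonKey and WaitHours are made-up names, not anything from your actual model:

-- FactServiceWait, PersonKey and WaitHours are hypothetical names for this sketch
SELECT
    COUNT(*)                  AS ServiceEvents,   -- one per fact row, i.e. per service occasion
    COUNT(DISTINCT PersonKey) AS DistinctPeople,  -- how many different people used the service
    AVG(WaitHours)            AS AvgWaitHours
FROM FactServiceWait;

Run at the fact table's natural grain, the first figure answers "how many times was the service used?" and the second "how many different people used it?", with no need for an extra always-1 column.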
We need to find all the courses for a user whose startDate is less than today's date and whose endDate is greater than today's date. We are using the API
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3
In one particular case I have more than 18 thousand courses for one user. The service cannot return 18 thousand records in one go; I can only get 100 records at a time, so I need to use the bookmark field to fetch the data in sets of 100. The bookmark is the courseId of the last record of the previous set of 100, used to get the next set of 100 records.
/d2l/api/lp/{ver}/enrollments/myenrollments/?orgUnitTypeId=3&bookmark=12528
I need to repeat the loop about 180 times, which results in a "Request time out" error.
I need to filter the records on the basis of startDate and endDate, but no sorting criteria are documented that would let me sort the data by startDate or endDate. Can anyone help me find a way to sort this data, or suggest another API that can do this type of sorting?
Note: all 18 thousand records have the property "IsActive": true.
Rather than getting to the list of org units by user, you can try getting to the user by the list of org units. You could try using /d2l/api/lp/{ver}/orgstructure/{orgUnitId}/descendants/?ouTypeId={courseOfferingType} to retrieve the entire list of course offering IDs descended from the highest common ancestor known for the user's enrollments. You can then loop through /d2l/api/lp/{ver}/courses/{orgUnitId} to fetch back the course offering info for each one of those org units to pre-filter and cut out all the course offerings you don't care about based on dates. Then, for the ones left, you can check for the user's enrollment in each one of those to figure out which of your smaller set the user matches with.
This will certainly result in more calls to the service, not fewer, so it only has two advantages I can see:
You should be able to get the entire starting set of course offerings you need right off the hop rather than getting it back in pages (although it's entirely possible that this call will get turned into a paging call in the future, and its current "fetch all the org units at once" behaviour deprecated).
If you need to do this entire use-case for more than one user, you can fetch the org structure data once, cache it, and then only do queries for users on the subset of the data.
In the meantime, I think it's totally reasonable to request an enhancement to the enrollments calls to provide better filtering (active/inactive, start dates, end dates, etc.): I suspect that such a request might see more traction than a request to give clients control over paging (i.e. the number of responses in each page frame).
I have ten master tables and one transaction table. In my transaction table (an in-memory table, just like a ClientDataSet) there are ten lookup fields pointing to my ten master tables.
Now I am trying to dynamically assign key field values to all the lookup key fields of the transaction table from a different server (the data arrives as SOAP XML). Before assigning these values I need to check whether the corresponding value is valid in the master table or not. I am using a filter (e.g. status = 1) to check whether it is valid.
Currently, before assigning each key field value, we filter the master table using this filter and use the Locate function to check whether the value is there or not; if it is located, we assign its key field value.
This works fine if there are only a few records in my master tables. But consider master tables with fifty thousand records each (yes, the customer really has that much data); this leads to a big performance issue.
Could you please help me handle this situation?
Thanks
Basil
The only way to know if it is slow, why, where, and what solution works best is to profile.
Don't make a priori assumptions.
That being said, minimizing round trips to the server and the amount of data transferred is often a good thing to try.
For instance, if your master tables are on the server (not 100% clear from your question), sending only one query (or stored procedure call) that passes all the values to check at once as parameters, does a bunch of "IF EXISTS..." checks, and returns all the answers at once (either as output parameters or as a one-record dataset) would be a good start.
And 50,000 records is not much, so, as I said initially, you may not even have a performance problem. Check it first!
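As a rough sketch of that single-round-trip idea (the names CustomerMaster, CustomerID and Status are made up; substitute your own master table and columns), one query per master table could return just the keys that pass the validity filter:

-- returns only the keys that are valid, so the client can assign those
-- and skip the rest, with no per-key filtering or Locate calls
SELECT CustomerID
FROM CustomerMaster
WHERE Status = 1
  AND CustomerID IN (:Key1, :Key2, :Key3);

That's ten round trips (one per master table) instead of one per key field per record, and the server gets to use its indexes instead of a client-side filter.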
In my present Rails application, I am resolving scheduling conflicts by sorting the models by the "created_at" field. However, I realized that when inserting multiple models from a form that allows this, all of the created_at times are exactly the same!
This is more a question of best programming practices: Can your application rely on your ID column in your database to increment greater and greater with each INSERT to get their order of creation? To put it another way, can I sort a group of rows I pull out of my database by their ID column and be assured this is an accurate sort based on creation order? And is this a good practice in my application?
The generated identification numbers will be unique, regardless of whether you use sequences, as in PostgreSQL and Oracle, or another mechanism such as MySQL's auto-increment.
However, sequence values are most often acquired in blocks of, for example, 20 numbers.
So with PostgreSQL you cannot assume that a lower id means the row was inserted first. There might even be gaps in the ids of inserted records.
Therefore you shouldn't use a generated id field for a task like that, so that you don't rely on database implementation details.
Setting a created or updated field when the statement executes is much better for sorting by creation or update time later on.
For example:
INSERT INTO A (data, created) VALUES ('something', NOW());
UPDATE A SET data = 'something', updated = NOW();
That depends on your database vendor.
MySQL, I believe, absolutely orders auto-increment keys. For SQL Server I don't know for certain, but I believe it does as well.
Where you'll run into problems is with databases that don't support this behaviour, most notably Oracle, which uses sequences that are roughly but not absolutely ordered.
An alternative might be to go for created time and then ID.
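Something like the following, where posts and created_at are hypothetical names, with the id only breaking ties between rows created in the same instant:

SELECT *
FROM posts
ORDER BY created_at, id;  -- id is only a tie-breaker for identical created_at values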
I believe the answer to your question is yes... if I read between the lines, I think you are concerned that the system may re-use ID numbers that are 'missing' from the sequence. If you had used 1, 2, 3, 5, 6, 7 as ID numbers, then in all the implementations I know of the next ID number will always be 8 (or possibly higher); I don't know of any DB that would try to figure out that record ID 4 is missing and attempt to re-use that ID number.
Though I am most familiar with SQL Server, I can't see why any vendor would try to fill the gaps in a sequence - think of the overhead of keeping that list of unused IDs, as opposed to just always keeping track of the last ID used and adding 1.
I'd say you can safely rely on the next assigned ID always being higher than the last - not just unique.
Yes, the id will be unique, and no, you cannot and should not rely on it for sorting - it is there to guarantee row uniqueness only. The best approach is, as emktas indicated, to use a separate "updated" or "created" field for just this information.
For setting the creation time, you can just use a default value like this:
CREATE TABLE foo (
    id      INTEGER UNSIGNED AUTO_INCREMENT NOT NULL,
    created TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated TIMESTAMP NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB; # or whatever engine you like :P
Now, that takes care of the creation time. For the update time I would suggest a BEFORE UPDATE trigger like this one (of course you can do it in a separate query, but the trigger, in my opinion, is a better solution - more transparent). Note that it has to be a BEFORE trigger, because MySQL does not allow modifying NEW values in an AFTER trigger:
DELIMITER $$
CREATE TRIGGER foo_bu_upd BEFORE UPDATE ON foo
FOR EACH ROW BEGIN
    -- keep the updated column in sync on every change to the row
    SET NEW.updated = NOW();
END$$
DELIMITER ;
And that should do it.
EDIT:
Woe is me. Foolishly, I didn't specify that this is for MySQL; in other databases there might be differences in the function names (namely NOW) and other subtle itty-bitty details.
One caveat to EJB's answer:
SQL does not give any guarantee of ordering if you don't specify an ORDER BY clause. E.g. if you delete some early rows and then insert new ones, the new rows may end up living in the same place in the db the old ones did (albeit with new IDs), and that may be what the database uses as its default sort.
FWIW, I typically use ORDER BY id as an effective version of ORDER BY created_at. It's cheaper in that it doesn't require adding an index on a datetime field (which is bigger and therefore slower than a simple integer primary key index), the values are guaranteed to be different, and I don't really care if a few rows that were added at about the same time sort in a slightly different order.
This is probably DB engine dependent. I would check how your DB implements sequences, and if there are no documented problems then I would decide to rely on the ID.
E.g. Postgresql sequence is OK unless you play with the sequence cache parameters.
There is a possibility that another programmer will manually create or copy records from a different DB with the wrong ID column. However, I would simplify the problem: do not bother with low-probability cases where someone manually destroys data integrity. You cannot protect against everything.
My advice is to rely on sequence generated IDs and move your project forward.
In theory, yes, the highest id number is the last created. Remember, though, that databases have the ability to temporarily turn off the insertion of the autogenerated value, insert some records manually, and then turn it back on. Such inserts are not typically done on a production system, but they can happen occasionally when moving a large chunk of data from another system.