Firestore map field performance with large dataset - iOS

I have a list of products, and each user can buy multiple tickets for a product. When I fetch a product I need to know how many tickets the user has bought for that product. I currently have products as a separate collection, and I fetch the list of tickets per product for the currently logged-in user. This worked well, but I was using an in query to fetch the tickets for multiple products in a single request, and Firestore only allows 10 items in an in query.
I'm considering storing the count of tickets on each product as a map. How does this affect the performance of the snapshotListener? Let's say I listen to 100 products and each has a map with 100k key-value pairs, where the key is a String and the value is an Int.
Is it easy to update the map? Can I increment the value of a specific field?
Security is not an issue; it's OK for the ticket counts of all users to be public.
Product {
    name: String
    ...
    tickets: {
        "USERID": 5,
        // 100k more records
    }
}
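On the iOS side, that document shape could map to a plain Codable model along these lines (a sketch; the struct simply mirrors the pseudocode above):

import Foundation

// Client-side model for the product document. The tickets map is keyed
// by user ID; the value is that user's ticket count.
struct Product: Codable {
    var name: String
    var tickets: [String: Int]
}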

I'm considering storing the count of tickets on each product as a map. How does this affect the performance of the snapshotListener?
It has no effect on the performance of anything other than the time it takes to download the document; with a snapshot listener, the entire document is re-delivered every time it changes. Bear in mind, though, that a single Firestore document is capped at 1 MiB, so a map with 100k entries may well not fit in one document at all.
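For illustration, a minimal Swift sketch of such a listener (the products collection name, the document ID, and the "USERID" key are assumptions carried over from the question):

import FirebaseFirestore

// Hypothetical listener on one product document. Any change to the
// document, including any key in the tickets map, delivers the whole
// document to this closure again.
let db = Firestore.firestore()
let registration = db.collection("products").document("some-product-id")
    .addSnapshotListener { snapshot, _ in
        guard let data = snapshot?.data() else { return }
        let tickets = data["tickets"] as? [String: Int] ?? [:]
        print("Tickets for this user: \(tickets["USERID"] ?? 0)")
    }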
Is it easy to update the map? Can I increment the value of a specific field?
Yes, it's easy. Use dot notation to locate the field to increment, and use FieldValue.increment() to increment it.
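In Swift, that might look like the sketch below (the products collection name and the addTicket helper are assumptions, not part of the question):

import FirebaseFirestore

// Hypothetical helper: record one more ticket for a user on a product.
// "tickets.<userId>" uses dot notation to address a single key inside
// the map, and FieldValue.increment applies the addition atomically on
// the server, creating the key if it doesn't exist yet.
func addTicket(productId: String, userId: String) {
    let ref = Firestore.firestore().collection("products").document(productId)
    ref.updateData([
        "tickets.\(userId)": FieldValue.increment(Int64(1))
    ]) { error in
        if let error = error {
            print("Increment failed: \(error)")
        }
    }
}

Because the increment is applied server-side, concurrent purchases by different users touch different map keys and don't overwrite each other.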

Related

Is there a way to consolidate multiple rows of data in Microsoft Access?

I'm very new to Microsoft Access, and I'm trying to figure out how to output data from a table in a format that's more concise and easier to read.
I have transaction data throughout the year for specific customers. Based on the Customer number and Catalog number, I would like to consolidate multiple instances of a customer purchasing the same product into one row, summing the Quantity column and the Price column across those instances.
(Screenshots of the desired output and the input table were attached.)
I tried using an append query with Customer # and Catalog # as the primary key to try to filter out duplicate entries, but this wasn't working. I'm not very familiar with writing functions in Access, but I'm willing to try anything.

How to count cases with the same ID but different variables in SPSS

I have a data set with 4420 attendances at a medical department from 1120 people. Each person has a unique ID number, and the other columns are demographics and primary care provider. I want to filter the data so I can work out how many times each person attends the department, and then analyse the data by demographics, e.g. primary care provider or age. The data shows whether each attendance is primary or duplicate, but I can't figure out how to work out attendances per person.
If what you want to do is count the number of times each person has visited (assuming each visit is represented by a single row in the data), use the AGGREGATE command, breaking on the ID variable, to add the number of instances to the file as a new variable. In the menus: Data > Aggregate, move the ID variable into the Break Variable(s) box, check Number of cases under Aggregated Variables, change the default name N_BREAK if you want, and click OK. That will add a new variable to the data with the number of instances for each unique ID.

Granularity in Star Schema leads to multiple values in Fact Table?

I'm trying to understand star schemas at the moment and struggling a lot with granularity.
Say I have a fact table that has session_id, user_id, order_id, product_id, and I want to roll up to sessions by user by week (keeping in mind that not every session leads to an order or a product, and the DW needs to track the sessions of non-purchasing users as well as those who purchase).
I can see no reason to track order_ids or session_ids in the fact table so it would become something like:
week_date, user_id, total_orders, total_sessions ...
But how would I then track product_ids if a user makes more than one purchase in a week? I assume I can't keep multiple product IDs in an array (e.g. "20/02/2012","5","3","PR01,PR32,PR22")?
I'm thinking it may have to be kept at the 'every session' level, but that could lead to a very large amount of data. How would you implement granularity for an example such as the above?
Dimensional modelling requires dimensions as well as facts.
You need a date/calendar dimension, which includes columns like this:
calendar (id, cal_date, cal_year, cal_month, ...)
The "grain" of your fact table is the key to data storage. If you have transactions, then the transaction should be the grain, and you store one row per transaction. Use proper (integer) surrogate keys to your dimensions, and your table won't be as large as you fear.
Now you can write a query like this to sum sales of a product by year:
select product_name, cal_year, sum(purchase_amount)
from fact_whatever
inner join calendar on calendar.id = fact_whatever.calendar_id
inner join product on product.id = fact_whatever.product_id
group by product_name, cal_year

Repeated query on related table are very slow. How to adjust schema to be efficient?

I have two related tables in a Postgres DB. Let's say the first table is Products and the second is Transactions. I have a search feature that queries Products based on specific attributes, some of which depend on Transactions (such as the last sale price of a specific product).
My problem is that the search query often has to query Transactions for the most recent transaction every time a search runs, which takes a long time. It is noteworthy that Transactions will be updated monthly or quarterly with the latest data.
My simple mind thought a solution would be to add fields for the most recent sale price, etc. to the Products table, so that Products has a most_recent_sales_price field which is updated via a query whenever the Transactions table is updated. My gut is telling me that this is a hacky way of caching (which I know very little about). Is there a better approach for this?
Edit: there are approximately 1 million transactions and 50,000 products in the DB.

Rating System Database Structure

I have two entity groups: Restaurants and Users. Restaurants can be rated (1-5) by users, and the rating from each user should be retrievable.
Restaurant (id, name, ..... , total_number_of_votes, total_voting_points)
User (id, name ...... )
Rating (id, restaurant_id, user_id, rating_value)
Do I need to store the average value so that it need not be calculated every time? Which table is the best place to store avg_rating, total_no_of_votes, total_voting_points?
Well, if you store the average value somewhere, it will only be accurate as of the last time you calculated it (i.e. you have 5 reviews and store the average; you then get 5 new reviews, and your saved average is incorrect).
My opinion is that this sort of logic is perfectly suited to a middle tier. Calculating an average shouldn't be very resource-intensive, and really shouldn't impact performance.
If you really want to store it in the database, I would probably keep those values in their own table and update them via triggers. However, this could be even more resource-intensive than calculating the average in the middle tier.
Some databases, for example PostgreSQL, allow you to store an array as part of a row, e.g.
create table restaurants (
    ...,
    ratings integer[],
    ...
);
So you could, for example, keep the last 5 ratings in the same row as the restaurant. When you get a new rating, shuffle the old ratings left, add the new rating at the end, and then recalculate the average.
