NoSql Data Modeling with Firebase - ios

I'm second-guessing how I have modeled my data for a buy/sell app. Each entity is stored as a node, which contains a list of that entity keyed by unique IDs.
users -> userId -> [name: "joe", age: 21].
Where it gets interesting is the way I store items. First, I have an items node designed just like the example above. This way I can easily search for any item, or all items. This is handy because I can load all items and add business logic to display items that can be categorized as Recently Added, Local, Trending, etc. These are not nodes in the database.
Now, a user needs access to items they are personally involved with. This kind of item has its own node like so:
selling -> userId -> itemId -> [title: "shoes", size: 11].
Other categories like buying, bought, selling, sold, liked, and archive follow this pattern.
It seems taxing to make changes/searches at so many locations in the db when something happens. For instance, if a user wins an item (there's bidding), I have to remove it from buying, add it to bought, add it to archive, remove it from all items, and for the user who sold the item do pretty much the inverse.
Is it normal to execute this many queries, or should items be more tightly related?
I'm using Firebase by the way. Thanks for your time

Is it normal to execute this many queries, or should items be more tightly related?
Since Firebase doesn't have server-side joins, it is quite common to do joins from the client. This is in itself not always a performance problem, since Firebase pipelines the requests over a single connection.
But it's also quite common to duplicate some of the data to prevent or reduce the number of joins. You'll then need a strategy for how (or even whether) to keep the duplicated data in sync.
A final alternative is to preload certain data. While that likely doesn't apply here, it can be quite common if the list where you look up from is relatively short, e.g. the list of categories for items.
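To make the duplication option more concrete, here is a minimal sketch of the "user wins an item" case from the question, written as one fan-out map of path -> value. The node names (buying, bought, selling, sold, archive, items) come from the question; the function names and the in-memory dict that stands in for the database are assumptions made for illustration. The Realtime Database SDKs offer multi-location updates (for example updateChildValues on iOS) that can take a map shaped like this, so the many writes can go out as a single request instead of many separate ones.

```python
from typing import Any, Dict, Optional

# Path -> value map; None marks a path for deletion (mirrors null in Firebase).
FanOut = Dict[str, Optional[Dict[str, Any]]]


def build_sale_fanout(item_id: str, item: Dict[str, Any],
                      buyer_id: str, seller_id: str) -> FanOut:
    """Collect every write needed when buyer_id wins item_id from seller_id."""
    return {
        f"items/{item_id}": None,                 # no longer listed for sale
        f"buying/{buyer_id}/{item_id}": None,     # buyer: bidding is over
        f"bought/{buyer_id}/{item_id}": item,     # buyer: now owns the item
        f"archive/{buyer_id}/{item_id}": item,
        f"selling/{seller_id}/{item_id}": None,   # seller: listing is closed
        f"sold/{seller_id}/{item_id}": item,
        f"archive/{seller_id}/{item_id}": item,
    }


def apply_fanout(db: Dict[str, Any], updates: FanOut) -> None:
    """Apply the map to a nested dict that stands in for the real database."""
    for path, value in updates.items():
        *parents, leaf = path.split("/")
        node = db
        for key in parents:
            node = node.setdefault(key, {})
        if value is None:
            node.pop(leaf, None)
        else:
            node[leaf] = dict(value)


shoes = {"title": "shoes", "size": 11}
db = {
    "items": {"item1": dict(shoes)},
    "buying": {"buyer1": {"item1": dict(shoes)}},
    "selling": {"seller1": {"item1": dict(shoes)}},
}
updates = build_sale_fanout("item1", shoes, "buyer1", "seller1")
apply_fanout(db, updates)
```

Keeping all the paths in one function also gives you a single place to decide which copies are kept in sync and which are allowed to go stale.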

Related

Is a table (from source system) that contains only relationships and current status of a row from another table a fact table in Data Warehouse?

I am developing a BI system for our company, from scratch, and currently, I am designing a data warehouse. I am completely new to this so there are many things that I don't really understand, so I need to hear some more insights into this.
My problems are:
1) In our source system, there are tables called "Booking" and "BookingAccess". Booking table holds the data of a booking, such as check-in time and check-out time, booking date, booking number, gross amount of that booking.
BookingAccess, in turn, holds foreign keys related to the booking, such as bookerID, customerID, processID, hotelID, paymentproviderID, and the current status of that booking. Booking and BookingAccess have a 1:1 relationship.
Our source system is about checking the validity of those bookings; these bookings are not ours. We receive the booking information from other sources and handle the above process for them as an outsourced service. The gross amount is just a piece of information about the booking that we need to validate; it is not part of our business. The current status of a booking, which is held in the BookingAccess table, is the current status of that booking in our system, which can be "Processing" or "Finished".
From what I read from Ralph Kimball, in this situation "Booking" is the dimension table and BookingAccess should be the fact. I feel that BookingAccess is somewhat like an accumulating snapshot table, in which I should track the time when a booking is "Processing" and when a booking is "Finished".
Do I get it right?
2) In the "Booking" table, there is also a foreign key called "ImportID". This key links to a table called "Import". The "Import" table holds history records of files (these files contain bookings which will be written to the "Booking" table) that were imported into our system, including attributes such as file name, imported date, total bookings imported...
From my point of view, this is clearly a fact table.
But the problem is that the "Import" table and the "Booking" table have a one-to-many relationship (1 ImportID in the "Import" table can have 1, 2 or more records with the same ImportID in the "Booking" table). This goes against the idea of fact tables, which insists that the relationship between fact and dimension must be many-to-one, with the fact always on the many side.
So what approach should I use to solve this case? I'm thinking of using bridge tables to solve this problem. But I don't know if this is good practice, as there are a lot of records in the "Import" table, so I would have to create a big bridge table just to cover all of them.
3) Should I separate a table (from source system) which contains a mix of relationships and information to a fact table containing only relationships, and dimension table containing only information? (For example, a table called "Customer" in source system. This table contains some things like customer name, customer address and customertype id, customer parentID....)
I am asking this because if I use BI tools to analyze things (for example, the number of customers with customertypeid = 1), it feels somewhat weird if there are no fact tables involved.
Or should I treat it as a mere dimension table and use a snowflake schema? But this will lead to a mix of star schema and snowflake schema in our data warehouse. Is this normal? I have read some official sources (most likely Oracle) stating that one should avoid using and mixing in snowflake schemas as much as possible. But some sources like Microsoft say that this is very normal. Even the AdventureWorks Data Warehouse sample database uses this kind of approach.
Or should I de-normalize every relation in that "Customer" table? But I don't think this is a good approach, as it will make Customer contain a lot of columns, and it will be very hard to track the history of every row in the "DIM_Customer" table. For example, if any change occurs in any relation of the "Customer" table, the whole "DIM_Customer" table will need to be updated.
I still have a lot of questions regarding data warehouses. I am working on this nearly alone, without any help or consultant. So pardon me if I have caused any inconvenience or made mistakes.

Firestore: Is it possible to have duplicate auto-generated IDs across different subcollections?

I have a collection of Shop documents; every shop has a subcollection of items.
Item document has a property isAvailable which is a boolean.
Then, I need to put item in the user's shopping cart.
It's important to observe the item's isAvailable value so that users can be informed in real time that an item is no longer available and it can be auto-removed from all shopping carts.
So I decided to put an array of user IDs in the Item object and create a duplicated list of all objects at the root level of the db to simulate an observable shopping cart (I thought this was a good structure for the purpose; if you have better ideas, just tell me).
My problem is: since I duplicate all the subcollections in a single collection and use the same document IDs, there may be duplicates in the final big collection. Is that right?
In short, auto-generated IDs are statistically unique, with a probability high enough that you can treat them as unique in practice. See here.
Also, in Firestore the time-based component has been removed, so the IDs are no longer chronological, unlike in the Realtime Database.
Regarding your data structure, I wouldn't recommend duplicating, since one of the benefits of Firestore is that you can avoid it, whereas with the Realtime Database you would need to do it in some cases.
Also avoid arrays as much as you can and use objects instead, as you can query them.
As I understand, you just want to make sure the items are available. I suggest you do a check when a user wants to proceed to checkout or anytime the page is refreshed and this way you ensure no unavailable product is purchased. That's it.
If you still have a problem, perhaps give me a snapshot of your data rather than explaining, something like
ShopsCollection
  - itemDocument
    - isAvailable : true
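The check at checkout suggested above could look roughly like this. It is only a sketch: plain dicts stand in for the item documents, and the function name and sample IDs are made up for the example. In the real app the items dict would be filled from a fresh read of the item documents just before checkout.

```python
from typing import Any, Dict, List, Tuple


def split_cart_by_availability(
    cart_item_ids: List[str],
    items: Dict[str, Dict[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Return (still available, to be removed) for the IDs in a cart.

    An item that is missing or has no isAvailable field counts as unavailable.
    """
    available: List[str] = []
    unavailable: List[str] = []
    for item_id in cart_item_ids:
        item = items.get(item_id)
        if item is not None and item.get("isAvailable", False):
            available.append(item_id)
        else:
            unavailable.append(item_id)
    return available, unavailable


# Stand-in for freshly fetched item documents (hypothetical IDs).
items = {
    "itemA": {"isAvailable": True},
    "itemB": {"isAvailable": False},
}
cart = ["itemA", "itemB", "itemC"]  # itemC no longer exists at all
keep, drop = split_cart_by_availability(cart, items)
```

Here keep ends up holding itemA only, while itemB and itemC are reported back so the UI can tell the user why they were removed.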

Storing Product Properties

I'm creating a jewellery product catalogue application and I need to store properties for each product such as material, finishes, product type etc.
I've concluded that there needs to be a model for each property, mainly because things like material and finishes might have prices and weights and other things associated with them.
Which of these two options would be the more efficient and scalable way to store the data?
1) Create a model PropertyMap that maps property types and IDs to a Product ID.
2) Create several other models such as ProductMaterial, ProductFinish, etc. that each map a property to a product.
All the data needs to be searchable & filterable. The database will probably index around 10K products.
Open to other smarter ways to store this data as well!
As a rule of thumb, to get the most out of your database tools, it's best to normalize your data according to the typical SQL conventions. That means that a bunch of fields that have a one-to-one relationship with each other should be collected together into the same table. That way you can grab them all (and they're frequently needed together) with a simple and efficient query.
If you instead have to gather them up from some different organization, both you and the database will end up having to do a lot more work. It will scale poorly, both on the hardware and in your brain as you struggle to maintain and extend it.
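As an illustration of what a normalized layout might look like for one property, here is a small sketch using SQLite from Python. It follows the ProductMaterial idea from the question; the table names, column names, and sample values (including the prices) are assumptions, not a recommendation for your actual schema. ProductFinish would follow the same shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    product_type TEXT NOT NULL
);
-- Facts about a material live in one place only.
CREATE TABLE material (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    price_per_gram REAL NOT NULL
);
-- One row per (product, material) pair, with data specific to that pair.
CREATE TABLE product_material (
    product_id   INTEGER NOT NULL REFERENCES product(id),
    material_id  INTEGER NOT NULL REFERENCES material(id),
    weight_grams REAL NOT NULL,
    PRIMARY KEY (product_id, material_id)
);
-- Supports filtering products by material.
CREATE INDEX idx_product_material_material ON product_material(material_id);
""")

conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Signet ring", "ring"), (2, "Chain", "necklace")])
conn.executemany("INSERT INTO material VALUES (?, ?, ?)",
                 [(1, "gold", 60.0), (2, "silver", 0.8)])  # made-up prices
conn.executemany("INSERT INTO product_material VALUES (?, ?, ?)",
                 [(1, 1, 5.0), (2, 2, 20.0), (2, 1, 1.0)])


def products_with_material(material_name: str) -> list:
    """Names of all products that contain the given material, sorted by name."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM product p
        JOIN product_material pm ON pm.product_id = p.id
        JOIN material m ON m.id = pm.material_id
        WHERE m.name = ?
        ORDER BY p.name
        """,
        (material_name,),
    ).fetchall()
    return [row[0] for row in rows]


gold_products = products_with_material("gold")
```

At around 10K products, joins like this over indexed keys are well within what an ordinary relational database handles comfortably.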

Relational Database Design (E-Commerce) - Core Data

In my e-commerce app (for cafés/restaurants) I currently have the following database structure.
The Cart is the shopping cart, to which you can add products; it is a temporary place to hold products before an order is sent to the server. ProductCart is a line item: many products (possibly the same one) with different quantities, sizes, frying levels, etc. When an order is sent, the cart is cleared and the products in the cart are transferred to the ProductOrder entity (an Order).
I now want to extend this further so that products can have ingredients, and this is where it gets tricky and too complex for my head and database skills :-). Just as the same product can have different sizes and frying levels (hence the line item), a product should be able to have many different ingredients (add-ons), for example a pizza, where you can choose the toppings. This is what I have tried so far:
But I am not sure if this is the right structure or way to do it?
This is my suggestion.
Remove ProductOrder and Order entities. They are the same as ProductCart and Cart.
Now ProductCart should have an attribute like synchronized that is 1 or 0 depending on whether it has been sent to the server.
This should simplify your model a lot. The Ingredient… entities seem OK to me.
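To show the shape of that simplified model, here is a rough sketch using plain Python dataclasses rather than Core Data entities. The entity names Cart and ProductCart and the synchronized attribute come from the discussion above; the other attribute names, the Ingredient class, and the helper methods are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ingredient:
    name: str


@dataclass
class ProductCart:
    """A line item: one product together with the options chosen for it."""
    product_name: str
    quantity: int
    size: str
    ingredients: List[Ingredient] = field(default_factory=list)
    synchronized: bool = False  # stays False until sent to the server


@dataclass
class Cart:
    lines: List[ProductCart] = field(default_factory=list)

    def pending(self) -> List[ProductCart]:
        """Line items that have not been sent to the server yet."""
        return [line for line in self.lines if not line.synchronized]

    def mark_sent(self) -> None:
        """Flag every pending line item as sent instead of copying it elsewhere."""
        for line in self.pending():
            line.synchronized = True


cart = Cart()
cart.lines.append(
    ProductCart("Pizza", quantity=1, size="large",
                ingredients=[Ingredient("mushrooms"), Ingredient("olives")])
)
pending_before = len(cart.pending())
cart.mark_sent()
pending_after = len(cart.pending())
```

The point is that sending an order flips a flag on the existing line items, so there is no second pair of entities to copy them into.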
There is something fundamental you have not grasped about Core Data. Your ProductOrder entity is essentially a join table. This is completely unnecessary if you are not tracking additional attributes in this table.
Instead, you should have a many-to-many relationship between Order and Product.
It might seem that ProductCart satisfies my condition above, so that a join table makes sense in this case. But no: you should simply add the orders to your cart and track all the information in the Order entity.

Single Inheritance or Polymorphic?

I'm programming a website that allows users to post classified ads with detailed fields for different types of items they are selling. However, I have a question about the best database schema.
The site features many categories (e.g. Cars, Computers, Cameras) and each category of ads has its own distinct fields. For example, Cars have attributes such as number of doors, make, model, and horsepower, while Computers have attributes such as CPU, RAM, Motherboard Model, etc.
Now since they are all listings, I was thinking of a polymorphic approach, creating a parent LISTINGS table and a different child table for each of the different categories (COMPUTERS, CARS, CAMERAS). Each child table will have a listing_id that will link back to the LISTINGS TABLE. So when a listing is fetched, it would fetch a row from LISTINGS joined by the linked row in the associated child table.
LISTINGS
-listing_id
-user_id
-email_address
-date_created
-description
CARS
-car_id
-listing_id
-make
-model
-num_doors
-horsepower
COMPUTERS
-computer_id
-listing_id
-cpu
-ram
-motherboard_model
Now, is this schema a good design pattern or are there better ways to do this?
I considered single inheritance but quickly brushed off the thought because the table will get too large too quickly, but then another dilemma came to mind - if the user does a global search on all the listings, then that means I will have to query each child table separately. What happens if I have over 100 different categories, wouldn't it be inefficient?
I also thought of another approach where there is a master table (meta table) that defines the fields in each category and a field table that stores the field values of each listing, but would that go against database normalization?
How would sites like Kijiji do it?
Your database design is fine. No reason to change what you've got. I've seen the search done a few ways. One is to have your search stored procedure join all the tables you need to search across and index the columns to be searched. The second way I've seen it done, which worked pretty well, was to have a table that is used only for search and gets a copy of whichever fields need to be searched. You would then put triggers on those fields to update the search table.
They both have drawbacks, but I preferred the first to the second.
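For reference, the second approach (a search-only table kept up to date by triggers) could be sketched like this. SQLite is used here only so the example runs on its own; the listing_search table, the trigger names, and the sample rows are assumptions. Only insert triggers are shown; a real setup would also need update and delete triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    listing_id  INTEGER PRIMARY KEY,
    user_id     INTEGER,
    description TEXT
);
CREATE TABLE cars (
    car_id     INTEGER PRIMARY KEY,
    listing_id INTEGER REFERENCES listings(listing_id),
    make       TEXT,
    model      TEXT
);
-- Search-only table: one row of searchable text per listing.
CREATE TABLE listing_search (
    listing_id  INTEGER PRIMARY KEY,
    search_text TEXT NOT NULL
);

-- Copy the description when a listing is created.
CREATE TRIGGER listings_after_insert AFTER INSERT ON listings
BEGIN
    INSERT INTO listing_search (listing_id, search_text)
    VALUES (NEW.listing_id, COALESCE(NEW.description, ''));
END;

-- Append the car-specific fields when the child row is created.
CREATE TRIGGER cars_after_insert AFTER INSERT ON cars
BEGIN
    UPDATE listing_search
    SET search_text = search_text || ' '
                      || COALESCE(NEW.make, '') || ' ' || COALESCE(NEW.model, '')
    WHERE listing_id = NEW.listing_id;
END;
""")

conn.execute("INSERT INTO listings VALUES (1, 10, 'Low mileage, one owner')")
conn.execute("INSERT INTO cars VALUES (1, 1, 'Honda', 'Civic')")
conn.execute("INSERT INTO listings VALUES (2, 11, 'Gaming desktop')")


def search(term: str) -> list:
    """Global search: touches only listing_search, never the child tables."""
    rows = conn.execute(
        "SELECT listing_id FROM listing_search "
        "WHERE search_text LIKE ? ORDER BY listing_id",
        (f"%{term}%",),
    ).fetchall()
    return [row[0] for row in rows]


civic_hits = search("Civic")
```

The drawback is visible even here: every child table needs its own set of triggers, which is part of why I preferred the join-based approach.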
EDIT
You need the following tables.
Categories
- Id
- Description
CategoriesListingsXref
- CategoryId
- ListingId
With this cross-reference model you can join all your listings for a given category during search. Then add a little dynamic SQL (because it's easier to understand) and build up your query to include the field(s) you want to search against, and call execute on your query.
That's it.
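A rough sketch of that dynamic query, written in Python against SQLite so it is runnable on its own. The xref table follows the layout above; the function name, the column whitelist, and the sample rows are assumptions. Table and column names are checked against the whitelist before they go into the SQL text, and the values are always passed as parameters.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical whitelist: child table -> columns a user may filter on.
SEARCHABLE_COLUMNS = {
    "cars": {"make", "model", "num_doors"},
    "computers": {"cpu", "ram"},
}


def build_category_search(category_id: int, table: str,
                          filters: Dict[str, object]) -> Tuple[str, List[object]]:
    """Build a parameterized query for one category and its child table."""
    allowed = SEARCHABLE_COLUMNS.get(table)
    if allowed is None:
        raise ValueError(f"unknown child table: {table}")
    sql = (
        "SELECT l.listing_id FROM listings l "
        "JOIN categories_listings_xref x ON x.listing_id = l.listing_id "
        f"JOIN {table} c ON c.listing_id = l.listing_id "
        "WHERE x.category_id = ?"
    )
    params: List[object] = [category_id]
    for column, value in filters.items():
        if column not in allowed:
            raise ValueError(f"column not searchable: {column}")
        sql += f" AND c.{column} = ?"
        params.append(value)
    return sql + " ORDER BY l.listing_id", params


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (listing_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE categories_listings_xref (category_id INTEGER, listing_id INTEGER);
CREATE TABLE cars (car_id INTEGER PRIMARY KEY, listing_id INTEGER,
                   make TEXT, model TEXT, num_doors INTEGER);
INSERT INTO categories VALUES (1, 'Cars');
INSERT INTO listings VALUES (1, 'Reliable commuter'), (2, 'Weekend coupe');
INSERT INTO categories_listings_xref VALUES (1, 1), (1, 2);
INSERT INTO cars VALUES (1, 1, 'Honda', 'Civic', 4), (2, 2, 'Ford', 'Mustang', 2);
""")

sql, params = build_category_search(1, "cars", {"make": "Honda", "num_doors": 4})
matches = [row[0] for row in conn.execute(sql, params)]
```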
EDIT 2
This seems to be a bigger discussion than we can fit in these comment boxes. But anything we would discuss can be understood by reading the following post.
http://www.sommarskog.se/dyn-search-2008.html
It is really complete and shows you more than one way of doing it, with pros and cons.
Good luck.
I think the design you have chosen will be good for the scenario you just described. Though I'm not sure if the sub class tables should have their own ID. Since a CAR is a Listing, it makes sense that the values are from the same "domain".
In the typical classified ads site, the data for an ad is written once and then is basically read-only. You can exploit this and store the data in a second set of tables that are more optimized for searching in just the way you want the users to search. Also, the search problem only really exists for a "general" search. Once the user picks a certain type of ad, you can switch to the sub class tables in order to do more advanced search (RAM > 4gb, cpu = overpowered).
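A small sketch of both points, again in Python with SQLite so that it can be run as-is. The child table reuses listing_id as its own primary key instead of a separate computer_id, the general search reads only LISTINGS, and the advanced search joins the subclass table. The ram_gb column, the function names, and the sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE listings (
    listing_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
-- The child row shares the parent's key: a computer *is* a listing.
CREATE TABLE computers (
    listing_id INTEGER PRIMARY KEY REFERENCES listings(listing_id),
    cpu        TEXT NOT NULL,
    ram_gb     INTEGER NOT NULL
);
INSERT INTO listings VALUES
    (1, 'Office laptop'), (2, 'Gaming laptop'), (3, 'Family car');
INSERT INTO computers VALUES (1, 'i3', 4), (2, 'i9', 32);
""")


def general_search(term: str) -> list:
    """General search: only the parent table is read."""
    rows = conn.execute(
        "SELECT listing_id FROM listings "
        "WHERE description LIKE ? ORDER BY listing_id",
        (f"%{term}%",),
    ).fetchall()
    return [row[0] for row in rows]


def computers_with_ram_over(min_gb: int) -> list:
    """Advanced search once the user has picked the Computers category."""
    rows = conn.execute(
        "SELECT l.listing_id FROM listings l "
        "JOIN computers c ON c.listing_id = l.listing_id "
        "WHERE c.ram_gb > ? ORDER BY l.listing_id",
        (min_gb,),
    ).fetchall()
    return [row[0] for row in rows]


laptops = general_search("laptop")
big_ram = computers_with_ram_over(4)
```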
