Creating a variable of team demographics - similarity/dissimilarity - SPSS

I wish to create a variable depicting racial similarity to team members. In other words, of the people an individual shares a manager with, what percentage of the team is of the same race?
The variables I currently have are participant id, participant race, manager id, manager race, and team size. I know the racial breakdown of the teams, but I need the percentage of similar others in a team for each participant (in one column, not across columns split by race).

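First calculate, for each participant, the number of team members of the same race, then divide that by the team size to get the ratio (multiply by 100 for a percentage). Note that this count includes the participant; for the share of similar others only, use (NRaceInTeam-1)/(Team_Size-1).
In the following code I assume you have a team ID variable (team_id), although you didn't mention one in your post; since a team is everyone who shares a manager, manager id can serve as team_id: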
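* Count team members who share the race of each participant (the count includes the participant).
aggregate outfile=* mode=addvariables /break=team_id participant_race /NRaceInTeam=n.
* Proportion of the team with the same race.
compute PRaceInTeam=NRaceInTeam/Team_Size.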
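For readers outside SPSS, here is a minimal Python sketch of the same computation. It assumes each record is a (participant_id, team_id, race) tuple and that every team member appears in the data, so the team size is simply the number of rows per team. The name same_race_share and the exclude_self option are illustrative only; exclude_self gives the share of similar others rather than counting the participant.

```python
from collections import Counter


def same_race_share(rows, exclude_self=False):
    """rows: iterable of (participant_id, team_id, race) tuples.

    Returns {participant_id: share of the team with the same race}.
    Assumes every team member appears in rows, so team size = row count.
    """
    rows = list(rows)
    team_sizes = Counter(team for _, team, _ in rows)
    same_race = Counter((team, race) for _, team, race in rows)
    shares = {}
    for pid, team, race in rows:
        n_same, size = same_race[(team, race)], team_sizes[team]
        if exclude_self:
            # Remove the participant from both the count and the team size.
            n_same, size = n_same - 1, size - 1
        # size is 0 only for a one-person team with exclude_self: no "others".
        shares[pid] = n_same / size if size else None
    return shares
```

With team m1 holding three participants of race A and one of race B, each A member gets 0.75 (or 2/3 with exclude_self=True) and the B member gets 0.25.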

Related

Unit Price and Discounts - Fact or Dimension Table

I'm working on a datamart for our sales and marketing departments, and I've come across a modeling challenge. Our ERP stores pricing data in a few different ways:
List pricing for each item
A discount percentage from list pricing for a product line, either for groups of customers or for a specific account
A custom price for an item, either for groups of customers or for a specific account
The Pricing department primarily uses this data operationally, not analytically. For example, they generate reports for customers ("What special pricing / discount %s do I have?") and identify which items / item groups need to be changed when they engage in a new pricing strategy.
Pricing changes happen somewhat regularly on a small scale, usually on a customer-by-customer or item-by-item basis. Infrequently, there are large-scale adjustments to list pricing and group pricing (discounts and individual items) in addition to the customer-level discounts.
My instinct has been to create one or more fact tables to represent this process. Unfortunately, there's no pre-existing business key for pricing. There's also no specific "transaction date," since the ERP doesn't (accurately) maintain records of when pricing is changed. Essentially, a "pricing event" is going to be a combination of:
Effective date
End date
Item OR product line
(Not required for list price) customer or customer group
A price amount OR discount percentage
A single fact table seems problematic in that I'm going to have to deal with a lot of invalid combinations of dimensions and facts. First, a record will never have both a non-NULL price amount and a non-NULL discount percentage; pricing events are either-or. Second, only certain combinations of dimensions are valid for each fact. For example, a discount percentage will only ever have a product line, not an individual item.
Does it make sense to model pricing as a fact table in the first place? If so, how many tables should I be considering? My intuition is to use at least two, one for the percentages and one for the price amounts, but this still leaves a problem where each record will either have a valid customer group OR a valid customer (or neither, for list prices), since we need to maintain customer-specific pricing separate from any group pricing that customer might have.
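The either-or rules described in the question can be made explicit as validation, for example as CHECK constraints on a table or as checks in the load process. A minimal Python sketch, assuming a hypothetical PricingEvent record whose field names mirror the list above; it encodes only the rules the question states (price or discount but not both, item or product line but not both, discounts only on product lines, customer or customer group but not both).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PricingEvent:
    """Hypothetical pricing event; raises ValueError on invalid combinations."""
    effective_date: date
    end_date: Optional[date]
    item_id: Optional[str] = None
    product_line_id: Optional[str] = None
    customer_id: Optional[str] = None        # both None for a list price
    customer_group_id: Optional[str] = None
    price: Optional[float] = None
    discount_pct: Optional[float] = None

    def __post_init__(self):
        if (self.price is None) == (self.discount_pct is None):
            raise ValueError("exactly one of price or discount_pct must be set")
        if (self.item_id is None) == (self.product_line_id is None):
            raise ValueError("exactly one of item_id or product_line_id must be set")
        if self.discount_pct is not None and self.product_line_id is None:
            raise ValueError("a discount applies to a product line, not an item")
        if self.customer_id is not None and self.customer_group_id is not None:
            raise ValueError("use a customer or a customer group, not both")
```

Splitting this into one table for prices and one for discounts, as the question suggests, would let the first and third checks become structural instead of runtime rules.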
You may need to keep them both as attributes and as facts.
The price a certain item was sold for is a fact. Multiplied by the quantity sold, it becomes an additive measure, so keep it in the fact table. The total discount applied is also additive; I'd keep it too. You can later query "how much was discounted in 2019 per customer", which would be much harder to achieve without those facts.
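A minimal sketch of why keeping the extended amounts as facts pays off, assuming hypothetical fact rows of (customer, sale_date, quantity, list_price, sale_price): the discount per customer per year is just a sum over those rows.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales fact rows: (customer, sale_date, quantity, list_price, sale_price)
sales = [
    ("ACME", date(2019, 3, 1), 10, 5.00, 4.50),
    ("ACME", date(2019, 7, 9), 4, 12.00, 12.00),
    ("Beta", date(2019, 5, 2), 2, 20.00, 18.00),
    ("ACME", date(2020, 1, 15), 1, 5.00, 4.00),
]


def discount_by_customer_year(rows):
    """Total discount amount per (customer, year)."""
    totals = defaultdict(float)
    for customer, sale_date, qty, list_price, sale_price in rows:
        # Extended list and sale amounts are additive, so their difference is too.
        totals[(customer, sale_date.year)] += (list_price - sale_price) * qty
    return dict(totals)
```

In a warehouse the same result is a GROUP BY over the fact table joined to the date and customer dimensions.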
But if you also need to query things like "what discount is customer X on?", then you should also keep the discount as an attribute of the customer dimension and treat it as a type 2 slowly changing dimension, so as to keep the discount history. If you know when a certain discount was applied, great; if not, take the first sale as the start date and you won't be too far off.
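A minimal sketch of a type 2 change for the customer discount attribute, using a hypothetical in-memory list of CustomerVersion rows in place of a real dimension table (which would also give each version its own surrogate key). Validity ranges are half-open: valid_from is inclusive and valid_to exclusive. When the real start date is unknown, the first sale date can be passed as the effective date.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One row of a hypothetical type 2 customer dimension."""
    customer_id: str
    discount_pct: float
    valid_from: date                  # inclusive
    valid_to: Optional[date] = None   # exclusive; None marks the current row


def apply_discount_change(dim: List[CustomerVersion], customer_id: str,
                          new_pct: float, effective: date) -> None:
    """Close the customer's current row and append a new version."""
    for row in dim:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.discount_pct == new_pct:
                return                # no change, keep the current row
            row.valid_to = effective
    dim.append(CustomerVersion(customer_id, new_pct, effective))


def discount_as_of(dim: List[CustomerVersion], customer_id: str,
                   on: date) -> Optional[float]:
    """Discount in force for the customer on a given date, if any."""
    for row in dim:
        if (row.customer_id == customer_id and row.valid_from <= on
                and (row.valid_to is None or on < row.valid_to)):
            return row.discount_pct
    return None
```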
Maybe the list price can also be kept as an attribute of the product or product line dimension, but only if it doesn't change too often; and if most customers get discounts anyway, that would be of limited use.

ER Model representing entities not stored in DB and user choice

I'm trying to create an ER diagram of a simple retail chain type database model. You have your customers, the various stores, inventory etc.
My first question is, how to represent a customer placing an order in a store. If the customer is a discount card holder, the company has their name, address etc, so I can have a cardHolder entity connect to item and store with an order relationship. But how do I represent an order being placed by a customer who is not really an entity in the database?
Secondly, how are conditional choices represented in ER diagrams? For example, in a car dealership a customer may choose one or more optional extras when buying a car. I would think that there is a Car entity with the relevant attributes and the options as a multi-valued attribute, but how do you represent a user picking those options (i.e. the order table shows the car ordered, the extras chosen and the added cost of the extras) in the order relationship?
First, do you really need to model customers as distinct entities, or do you just need order, payment and delivery details? Many retail systems don't track individual customers. If you need to, you can have a customer table with a surrogate key and unique constraints on identifying attributes like SSN or discount card number (even if those attributes are optional). It's generally hard to prevent duplication in customer tables since there's no ideal natural key for people, so consider whether this is really required.
How to model optional extras depends on what they depend on. Some extras might be make- or model-specific, e.g. the choice of certain colors or manual/automatic transmission. Extended warranties might be available across the board.
Here's an example of car-specific optional extras:
car (car_id PK, make, model, color, vin, price, ...)
car_extras (extra_id PK, car_id FK, option_name, price)
order (order_id PK, date_time, car_id FK, customer_id FK, payment_id FK, discount)
order_extras (order_id PK/FK, car_id FK, extra_id PK/FK)
I excluded price totals since those can be calculated via aggregate queries.
In my example, order_extras.car_id is redundant, but it supports better integrity via composite FK constraints: (order_id, car_id) references the corresponding columns in order, and (car_id, extra_id) references the corresponding columns in car_extras, which prevents invalid extras from being linked to an order. Each referenced column pair needs a unique constraint in its parent table.
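A minimal sketch of the composite-FK idea using Python's built-in sqlite3 module. It is a trimmed, hypothetical version of the schema above: the table is named orders because ORDER is a reserved word, the customer, payment and discount columns are left out, and a UNIQUE constraint on each parent column pair makes it a valid foreign-key target. The ORDER_TOTAL query shows the aggregate that replaces a stored price total.

```python
import sqlite3

SCHEMA = """
CREATE TABLE car (
    car_id INTEGER PRIMARY KEY,
    make TEXT, model TEXT,
    price REAL NOT NULL
);
CREATE TABLE car_extras (
    extra_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car (car_id),
    option_name TEXT NOT NULL,
    price REAL NOT NULL,
    UNIQUE (car_id, extra_id)          -- parent key for the composite FK below
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    car_id INTEGER NOT NULL REFERENCES car (car_id),
    UNIQUE (order_id, car_id)          -- parent key for the composite FK below
);
CREATE TABLE order_extras (
    order_id INTEGER NOT NULL,
    car_id INTEGER NOT NULL,
    extra_id INTEGER NOT NULL,
    PRIMARY KEY (order_id, extra_id),
    -- The extra must belong to the same car as the order.
    FOREIGN KEY (order_id, car_id) REFERENCES orders (order_id, car_id),
    FOREIGN KEY (car_id, extra_id) REFERENCES car_extras (car_id, extra_id)
);
"""

# Price totals are derived with an aggregate query rather than stored.
ORDER_TOTAL = """
SELECT c.price + COALESCE(SUM(e.price), 0)
FROM orders AS o
JOIN car AS c ON c.car_id = o.car_id
LEFT JOIN order_extras AS oe ON oe.order_id = o.order_id
LEFT JOIN car_extras AS e ON e.car_id = oe.car_id AND e.extra_id = oe.extra_id
WHERE o.order_id = ?
GROUP BY o.order_id, c.price
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def order_total(conn, order_id):
    row = conn.execute(ORDER_TOTAL, (order_id,)).fetchone()
    return row[0] if row else None
```

Linking an extra that belongs to a different car, or using a car that does not match the order, is rejected with sqlite3.IntegrityError.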
Here's an ER diagram for the tables above:
First, as you suggest, you can definitely have two kinds of customers: discount card holders, whose details the company holds, and new customers, whose details it doesn't.
There are three possible ways to achieve this:
1) Have two different order tables in the system (which I personally wouldn't suggest).
2) Have a single Order table in the system and record customer details only for those who hold a discount card.
3) Have only one order table in the system and insert a row in the discount card holder table for each new/unregistered customer.
Having a single order table would make the system standardized and would be more convenient while performing many other operations.
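One way to read option 2 is a single orders table whose reference to the card holder is nullable, so walk-in orders simply leave it empty. A minimal sqlite3 sketch under that assumption; the table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE card_holder (
    card_holder_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    address TEXT
);
CREATE TABLE store (
    store_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    store_id INTEGER NOT NULL REFERENCES store (store_id),
    -- NULL when the customer is not a registered discount card holder
    card_holder_id INTEGER REFERENCES card_holder (card_holder_id),
    placed_at TEXT NOT NULL
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the FKs in SQLite
    conn.executescript(SCHEMA)
    return conn


def orders_with_customer(conn):
    # LEFT JOIN keeps walk-in orders; their card holder name comes back as None.
    return conn.execute(
        "SELECT o.order_id, h.name FROM orders AS o "
        "LEFT JOIN card_holder AS h ON h.card_holder_id = o.card_holder_id "
        "ORDER BY o.order_id"
    ).fetchall()
```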
Secondly, to address your concern, normalize the schema. That solves the current problem, removes redundancy and keeps the entities lightweight, which directly helps performance as the system grows.
The chosen extras can be recorded against the customer's order when the bill is generated, by referencing them with foreign keys. Working with keys gives fast and robust results instead of storing the same details redundantly in several places.
In short, wherever you need to refer to data, reference it with a foreign key rather than copying it; that avoids inconsistencies and errors.
Preferably aim for 4NF. Have a look at the following link to get started with normalization:
http://www.w3schools.in/dbms/database-normalization/

Enforcing association uniqueness in ActiveRecord

There are three sets of entities: Players, Teams and Games. A Team consists of one or two Players and is formed voluntarily for each particular Game, i.e.
players A, B, C, D can form 10 Teams (A, B, C, D, AB, AC, AD, BC, BD, CD), because Team AB is the same as Team BA.
In other words, a Team may only contain a unique set of players: BA is a duplicate of AB.
The most obvious way to model the Teams and Games relationship is many-to-many, but is this the way to go? The real question is how to model these restrictions in a robust and scalable way, so that they can handle, say, not only teams of 1-2 players but also teams of 1-20 players without (much) modification.
Here is an abstract interface I'm thinking of -
Team.find_or_create_by(player_ids: [p1.id, p2.id]) # find p1 and p2 team
Team.find_by(player_id: p1) # find all teams that p1 has participated in
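One common approach to the uniqueness requirement is to store a canonical key per team (the sorted, de-duplicated player ids joined into a string) and put a unique index on it, so AB and BA collapse to one row whatever the team size. A minimal in-memory Python sketch; TeamRegistry and its methods are hypothetical stand-ins for a teams table, not ActiveRecord API. In Rails, the same idea would be a key column with add_index :teams, :player_key, unique: true, where player_key is an assumed column name.

```python
class TeamRegistry:
    """Hypothetical stand-in for a teams table with a unique canonical key."""

    def __init__(self):
        self._by_key = {}    # canonical key -> team id (the "unique index")
        self._members = {}   # team id -> frozenset of player ids

    @staticmethod
    def canonical_key(player_ids):
        # Sorting makes AB and BA produce the same key, for any team size.
        return "-".join(str(pid) for pid in sorted(set(player_ids)))

    def find_or_create(self, player_ids):
        key = self.canonical_key(player_ids)
        if key not in self._by_key:
            team_id = len(self._by_key) + 1
            self._by_key[key] = team_id
            self._members[team_id] = frozenset(player_ids)
        return self._by_key[key]

    def teams_of(self, player_id):
        """All team ids the player belongs to."""
        return sorted(t for t, members in self._members.items() if player_id in members)
```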
P.S. I don't think the question title is very good, and this is probably a well-known problem with an established name, so if someone can point it out I'll gladly change the title.

Is it possible to have conditional OLAP dimension aggregators?

I have a set of OLAP cubes, in the form of a snow-flake schema, each representing one factory.
I have three concepts that for some factories clearly behave as 3 dimensions, and for other factories clearly behave as 2 dimensions.
The concepts are always the same: "products", "sales agents" and "customers".
But in some cases I'm unsure whether I should model it as a purely 3-dimensional cube or use some tweak or trick with a 2-dimensional cube.
Cases A and B are clear to me; Case C is the one that raises my doubts.
CASE A: Clearly a 3 dimensional cube
Any agent can sell any product to any company. Several agents are jointly responsible for the same set of customers.
I model this case as this:
CASE B: Clearly a 2 dimensional cube
Every agent is 'responsible' for a portfolio of customers, and he can sell any product but only to his own customers. The analysis is based on 'current responsibility for the portfolio', so if an agent leaves the company, all his customers are reassigned to a new agent and each customer belongs uniquely to the new agent.
I model this case as this:
CASE C: My doubts
A customer may have been assigned a single agent or a set of several agents each one being responsible for a ProductCategory.
For example:
Alice manages TablesAndWoods ltd and GreenForest ltd.
Bob manages Chairs ltd and FastWheels ltd.
Carol manages Forniture ltd ONLY for ProductType = 'machinery' and also manages FrozenBottles ltd for ANY type of product.
Dave also manages Forniture ltd but ONLY for ProductType = 'consumables' and also manages HighCeilings ltd for ANY type of product.
QUESTION:
In this example "Case C":
Are customer and agent independent dimensions, because Forniture ltd is related to both Carol and Dave, making it a 3D cube?
Or is it a 2D cube, where agent is not an independent dimension but an aggregator of customer, somehow "conditioned" by the ProductCategory aggregator of product?
I would like to see how you would model this.
Thanks in advance.
Here is how I would model it:
Your fact table is Sales.
Your dimensions are (probably) Date, Product, Customer and Agent. This is closest to your Case A.
Collapse your snowflake (white entities) into the dimensions. The presence of these entities suggests that you should consider whether type 2 slowly changing dimensions are needed for point-in-time analysis.
Consider a Bridge table to capture the many-to-many relationship between Agent and Product.
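A minimal Python sketch of how a bridge could resolve Case C. Here the bridge rows are keyed by customer and product category (None meaning any category), which is an assumption about how to populate it; a category-specific assignment takes precedence over a general one. The customer and agent names come from the example; ASSIGNMENTS, agent_for and sales_by_agent are hypothetical.

```python
# Hypothetical bridge rows: (customer, product_category or None for "any", agent)
ASSIGNMENTS = [
    ("TablesAndWoods ltd", None, "Alice"),
    ("GreenForest ltd", None, "Alice"),
    ("Chairs ltd", None, "Bob"),
    ("FastWheels ltd", None, "Bob"),
    ("Forniture ltd", "machinery", "Carol"),
    ("FrozenBottles ltd", None, "Carol"),
    ("Forniture ltd", "consumables", "Dave"),
    ("HighCeilings ltd", None, "Dave"),
]


def agent_for(customer, category, bridge=ASSIGNMENTS):
    """Agent responsible for a customer's sales in a product category."""
    # A category-specific assignment wins over an "any category" assignment.
    specific = [a for c, cat, a in bridge if c == customer and cat == category]
    general = [a for c, cat, a in bridge if c == customer and cat is None]
    matches = specific or general
    return matches[0] if matches else None


def sales_by_agent(sales, bridge=ASSIGNMENTS):
    """sales: iterable of (customer, product_category, amount) fact rows."""
    totals = {}
    for customer, category, amount in sales:
        agent = agent_for(customer, category, bridge)
        totals[agent] = totals.get(agent, 0) + amount
    return totals
```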

Point of Sale and Inventory database schema

I’m trying to create a basic Point of Sale and Inventory management system.
Some things to take into account:
The products are always the same (same ID) through the whole system, but inventory (available units for sale per product) is unique per location. Location Y and Z may both have for sale units of product X, but if, for example, two units are sold from location Y, location Z’s inventory should not be affected. Its stocked units are still intact.
Selling one (1) unit of product X from location Y, means inventory of location Y should subtract one unit from its inventory.
From that, I thought of these tables:
locations (id, name)
products (id, name)
transactions (id, description)
inventories_header (id, location_id, product_id)
inventories_detail (inventories_id, transaction_id, unit_cost, unit_price, quantity)
orders_header (id, date, total), where total is calculated from orders_detail quantity * price, just for future data validation
orders_detail (order_id, transaction_id, product_id, quantity, price)
Okay, so, are there any questions? Of course.
How do I keep track of changes in unit cost? If some day I start paying more for a certain product, I would need to keep track of the margin ((price*quantity) - (cost*quantity)) some way. I thought of inventories_detail mostly for this; I wouldn't have cared otherwise.
Are the relationships well established? I still have a hard time deciding whether locations have inventories or inventories have several locations. It's maddening.
How would you keep/know your current stock levels? Since I had to separate the inventory table to keep up with cost updates, I guess I would just have to add up all the quantities stated in inventories_detail.
Any other suggestions you'd like to share?
I’m sure I still have some questions, but these are mostly the ones I need addressing. Also, since I’m using Ruby on Rails for the first time, actually, as a learning experience, it’s a shame to be stopped at design, not letting me punch through implementation quicker, but I guess that’s the way it should be.
Thanks in advance.
The tricky part here is that you're really doing more than a POS solution. You're also doing an inventory management & basic cost accounting system.
The first scenario you need to address is what accounting method you'll use to determine the cost of any item sold. The most common options would be FIFO, LIFO, or Specific Identification (all terms that can be Googled).
In all 3 scenarios, you should record your purchases of your goods in a data structure (typically called PurchaseOrder, but in this case I'll call it SourcingOrder to differentiate from your orders tables in the original question).
The structure below assumes that each sourcing order line will be for one location (otherwise things get even more complex). In other words, if I buy 2 widgets for store A and 2 for store B, I'd add 2 lines to the order with quantity 2 for each, not one line with quantity 4.
SourcingOrder
- order_number
- order_date
SourcingOrderLine
- product_id
- unit_cost
- quantity
- location_id
Inventory can be one level...
InventoryTransaction
- product_id
- quantity
- sourcing_order_line_id
- order_line_id
- location_id
- source_inventory_transaction_id
Each time a SourcingOrderLine is received at a store, you'll create an InventoryTransaction with a positive quantity and FK references to the sourcing_order_line_id, product_id and location_id.
Each time a sale is made, you'll create an InventoryTransaction with a negative quantity and FK references to the order_line_id, product_id, location_id and source_inventory_transaction_id.
The source_inventory_transaction_id would link the negative-quantity InventoryTransaction back to the positive-quantity InventoryTransaction chosen by whichever accounting method you use.
Current inventory of a product at a location would be:
SELECT SUM(quantity) FROM inventory_transactions WHERE product_id = ? AND location_id = ? GROUP BY product_id, location_id;
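A minimal sqlite3 sketch of the InventoryTransaction ledger: receipts insert positive quantities, sales insert negative ones pointing at their source receipt, and stock is the sum per product and location. Column names follow the structure above; the helper functions are hypothetical. This version of the query omits GROUP BY so that SUM always returns a single row, giving 0 rather than no row when nothing matches.

```python
import sqlite3

SCHEMA = """
CREATE TABLE inventory_transactions (
    id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL,
    location_id INTEGER NOT NULL,
    quantity INTEGER NOT NULL,             -- positive = received, negative = sold
    sourcing_order_line_id INTEGER,        -- set on receipts
    order_line_id INTEGER,                 -- set on sales
    source_inventory_transaction_id INTEGER
        REFERENCES inventory_transactions (id)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def receive(conn, product_id, location_id, qty, sourcing_order_line_id):
    cur = conn.execute(
        "INSERT INTO inventory_transactions "
        "(product_id, location_id, quantity, sourcing_order_line_id) VALUES (?, ?, ?, ?)",
        (product_id, location_id, qty, sourcing_order_line_id))
    return cur.lastrowid


def sell(conn, product_id, location_id, qty, order_line_id, source_id):
    cur = conn.execute(
        "INSERT INTO inventory_transactions "
        "(product_id, location_id, quantity, order_line_id, source_inventory_transaction_id) "
        "VALUES (?, ?, ?, ?, ?)",
        (product_id, location_id, -qty, order_line_id, source_id))
    return cur.lastrowid


def current_stock(conn, product_id, location_id):
    # Without GROUP BY, SUM returns exactly one row (NULL when nothing matches).
    (total,) = conn.execute(
        "SELECT SUM(quantity) FROM inventory_transactions "
        "WHERE product_id = ? AND location_id = ?",
        (product_id, location_id)).fetchone()
    return total or 0
```

Receiving 10 units of a product at location 1 and 5 at location 2, then selling 2 at location 1, leaves 8 at location 1 while location 2 keeps its 5.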
The cost of a sale (and hence its margin) would be calculated by tracing back from the sale, through the 2 related inventory transactions, to the SourcingOrderLine.
NOTE: You have to handle the case where one order line is allocated across 2 inventory transactions because the ordered quantity was larger than what was left in the next inventory transaction to be allocated. This data structure handles that, but you'll need to work out the logic and queries yourself.
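A minimal Python sketch of that allocation step using FIFO (one of the methods named above; LIFO would walk the receipts newest first). It splits a sale across receipts when the oldest one runs out, producing one (source_txn_id, qty, unit_cost) entry per negative InventoryTransaction, and computes the sale's margin from the traced costs. Receipt, allocate_fifo and sale_margin are hypothetical names, and the receipts are assumed to be for a single product at a single location.

```python
from dataclasses import dataclass, field


@dataclass
class Receipt:
    """A positive InventoryTransaction (hypothetical in-memory stand-in)."""
    txn_id: int
    quantity: int
    unit_cost: float                       # from the SourcingOrderLine
    remaining: int = field(init=False)     # units not yet allocated to sales

    def __post_init__(self):
        self.remaining = self.quantity


def allocate_fifo(receipts, qty_sold):
    """Split a sale across receipts, oldest first (FIFO).

    Returns [(source_txn_id, qty, unit_cost)]: one entry per negative
    InventoryTransaction to create, each pointing at its source receipt.
    """
    # Check availability first so a failed sale leaves the receipts untouched.
    if sum(r.remaining for r in receipts) < qty_sold:
        raise ValueError("not enough stock at this location to cover the sale")
    allocations = []
    for r in receipts:                     # assumed to be ordered oldest first
        if qty_sold == 0:
            break
        take = min(r.remaining, qty_sold)
        if take > 0:
            r.remaining -= take
            qty_sold -= take
            allocations.append((r.txn_id, take, r.unit_cost))
    return allocations


def sale_margin(allocations, unit_price):
    """Revenue minus the cost traced back through the allocations."""
    qty = sum(q for _, q, _ in allocations)
    cost = sum(q * c for _, q, c in allocations)
    return unit_price * qty - cost
```

Selling 5 units when the oldest receipt has only 3 left yields two allocations (3 from the first receipt, 2 from the second), i.e. two negative inventory transactions for one order line.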
Brian is correct; just to add some information: if you are building a complete system for your business or a client, I would suggest you start at the organizational level and work down to the POS and accounting processes. That would make your database experience more extensive... :P In my experience in system development, inventory modules always start from: stock count + (purchases - purchase returns) = SKU quantity available for sale. POS is not directly attached to the inventory module but is reconciled daily by the sales supervisor; total daily sales quantities are then deducted from the SKU quantity available for sale. You will also need to work out the costing and pricing modules. Correct normalization of the database is always a must.
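The availability formula described above, as a small Python sketch; the function and parameter names are hypothetical, and daily_sales stands for the reconciled POS quantities for one SKU.

```python
def available_for_sale(stock_count, purchases, purchase_returns, daily_sales=()):
    """Stock count + (purchases - purchase returns), less reconciled daily sales."""
    available = stock_count + (purchases - purchase_returns)
    # Daily POS totals are deducted after the supervisor reconciles them.
    return available - sum(daily_sales)
```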
