Mahout Aggregate Multiple Similarity Objects - mahout-recommender

I have 3 different types of boolean preference data that I'm using to build 3 separate similarity objects, and I want to know whether it is possible to combine them into one. I have 3 separate data files: one for items followed, one for items messaged, and one for item page views. All three are CSV files of the form user_id, item_id. Items 101 and 567 might have a similarity rating of .047 based on the items followed data, while in the items messaged data their similarity could be .10. How can I combine these to get a single similarity rating for each pair of items?
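One possible way to frame the combination, offered as a minimal sketch rather than a Mahout API: treat each data file's item-item similarities as a lookup table and take a weighted average per item pair. The function name `combine_similarities`, the equal weights, and the choice to count a missing pair as 0 are all assumptions made for illustration; only the .047 and .10 values come from the question.

```python
from collections import defaultdict


def combine_similarities(sources, weights):
    """Weighted average of per-source item-item similarities.

    sources: dict of source name -> dict of (item_a, item_b) -> similarity
    weights: dict of source name -> non-negative weight (hypothetical values)
    A pair that is absent from a source contributes 0 for that source.
    """
    total_weight = sum(weights[name] for name in sources)
    combined = defaultdict(float)
    for name, sims in sources.items():
        for (a, b), score in sims.items():
            # Similarity is treated as symmetric, so order the pair.
            key = (min(a, b), max(a, b))
            combined[key] += weights[name] * score
    return {pair: value / total_weight for pair, value in combined.items()}


# The two values below are the ones quoted in the question; the page view
# table is left empty because the question gives no value for it.
sources = {
    "followed": {(101, 567): 0.047},
    "messaged": {(101, 567): 0.10},
    "viewed": {},
}
weights = {"followed": 1.0, "messaged": 1.0, "viewed": 1.0}  # assumed equal
combined = combine_similarities(sources, weights)
```

Raising the weight of a stronger signal (for example, messaging over page views) shifts the combined score toward that source. In Mahout, the same idea could presumably be wrapped in a custom ItemSimilarity implementation that delegates to the three existing objects.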

Related

Spatial join a feature with multiple records and keep all records ARCGIS PRO

I have two Polygon feature classes, one contains parcel information, the other building footprints. I have many parcels that contain multiple buildings. In the attached photo, this one parcel has 149 records associated with it, and it contains 33 buildings. I am trying to do a spatial join (with the buildings as target features) and keep ALL 149 records associated with the parcel in the join. The merge rules allow me to choose to keep only the first or last record and associate the same record to all 33 buildings. I tried to concatenate (join on merge rules) but still only one record was kept. I need, at the very least, for all 149 records to be attached to each building when I join the features. Optimally, I would like each building to have only the subset of the 149 records associated with it joined which could be accomplished if I had building addresses in the building feature class, which I do not. Any suggestions would be appreciated.
Parcel with buildings.

Two UIPickerViews with Large Data Set

I would like to create a simple app in Xcode with two UIPickerViews that reference a data set, where the second UIPickerView depends on the first. The user first selects the manufacturer of a vehicle: Chevrolet, Dodge, Ford, etc. Then the user selects a vehicle based on that choice. For example, if "Ford" is selected in the first UIPickerView, only Ford vehicles show up in the second: F150, Focus, Mustang, etc. After selecting both values, the user can look up the average price, with the prices kept in a data set. I found many examples with one UIPickerView referencing arrays, but I want to reference a much larger data set. How would I go about doing this? I am fairly new to Xcode, but I write SAS and SQL code daily.
I am assuming you have all of your records saved in a database. I did something similar with 250k+ records.
1. Do not fetch the full representation of your models into memory. Fetch only one property (the string column needed for the current picker) with a DISTINCT on it. Both SQLite and Core Data allow this.
2. Your subsequent pickers (2nd, 3rd and so on) will automatically see less data because of the filter applied by the previous picker (for example, only Ford vehicles remain as options).
3. Rule #1 applies to all of your pickers: pull only the relevant field into memory as a String, with the right filters applied.
I had no issues at all with this approach on my dataset. I am not sure how big your dataset is.
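The DISTINCT-plus-filter idea from this answer can be sketched against SQLite. This is an illustration in Python rather than the Swift an iOS app would use, and the `vehicles` table, its column names, and the "Sample Model" rows are hypothetical; only the Ford models come from the question.

```python
import sqlite3

# In-memory database standing in for the app's vehicle data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vehicles (make TEXT, model TEXT)")
conn.executemany(
    "INSERT INTO vehicles VALUES (?, ?)",
    [
        ("Ford", "F150"),
        ("Ford", "F150"),  # duplicate row, e.g. another model year
        ("Ford", "Focus"),
        ("Ford", "Mustang"),
        ("Chevrolet", "Sample Model C"),  # placeholder name
        ("Dodge", "Sample Model D"),      # placeholder name
    ],
)


def distinct_makes(conn):
    """Values for the first picker: one string column, de-duplicated."""
    rows = conn.execute("SELECT DISTINCT make FROM vehicles ORDER BY make")
    return [row[0] for row in rows]


def distinct_models(conn, make):
    """Values for the second picker, filtered by the first selection."""
    rows = conn.execute(
        "SELECT DISTINCT model FROM vehicles WHERE make = ? ORDER BY model",
        (make,),
    )
    return [row[0] for row in rows]


makes = distinct_makes(conn)
ford_models = distinct_models(conn, "Ford")
```

Each picker only ever holds a short list of strings, and the second query is rerun whenever the first selection changes.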

Best way to use actors when predicting a movie rating

So I'm trying to predict movie ratings based on several variables. I would like to include actors because that has a pretty large impact on the success of a movie. I've come up with several options.
1. Get the top 5 actors for each movie. Just have a unique integer that represents those actors and use that. I'm worried there are too many unique actors for a model to use this effectively though.
2. Take an average of all ratings of the movies that the actor performs in and use it like a key performance indicator. Have 5 separate columns for the top 5 actors in the movie with the KPI of each actor in the column.
3. Same as two except instead of five separate columns, combine them into a single value for the movie.
I'm thinking option two will be best. Is there a better way to go about this? If anyone has had any similar experiences I would love to hear how you solved it.
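Options two and three can be sketched together, since the third is just an aggregate of the second. This is a minimal sketch, not a recommended model: the record layout (`actors`, `rating`), the function names, the default of 0.0 for unknown actors, and the sample movies are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def actor_kpis(movies):
    """Average rating of all movies each actor appears in."""
    ratings = defaultdict(list)
    for movie in movies:
        for actor in movie["actors"]:
            ratings[actor].append(movie["rating"])
    return {actor: mean(values) for actor, values in ratings.items()}


def actor_features(movie, kpis, top_n=5, default=0.0):
    """Return (option two, option three) features for one movie.

    Option two: one KPI column per top-billed actor, padded to top_n.
    Option three: the mean of those KPIs as a single value.
    """
    values = [kpis.get(actor, default) for actor in movie["actors"][:top_n]]
    padded = values + [default] * (top_n - len(values))
    combined = mean(values) if values else default
    return padded, combined


# Made-up sample data, with actors assumed to be listed in billing order.
movies = [
    {"title": "A", "actors": ["x", "y"], "rating": 8.0},
    {"title": "B", "actors": ["x"], "rating": 6.0},
]
kpis = actor_kpis(movies)
columns, single_value = actor_features(movies[0], kpis)
```

One caveat with either option: if an actor's KPI includes the rating of the movie being predicted, the feature leaks the target, so the averages are better computed from training movies only.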

Merging patient data based on IDs from 4 dimensions into 1 new dimension - there are problems however

I have 4 dimensions with patient data. In each dimension there's an ID for the patients. The only problem is that I have no idea how to merge the 4 dimensions into 1 new dimension.
I would use a merge join, but that doesn't work since I also have patient records with no ID. I can't match the patient records to anything if they don't have an ID. Also, there are patients that have IDs in only 2 of the 4 dimensions, so how do I load those into my new dimension?
Typically, if you are importing business object data from multiple sources, you generate a new ID in your data warehouse for each business object. Then you code the business rules into SSIS that match the records, resolve the conflicts, and merge the appropriate records.
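The surrogate-key idea can be sketched outside SSIS. The Python below is only an illustration of the logic, under some loudly stated assumptions: the same patient carries the same `patient_id` in every dimension, records without an ID each get their own row, and the first non-missing value for a field wins. The function and field names are hypothetical.

```python
from itertools import count


def build_patient_dimension(sources):
    """Merge patient records from several sources under new warehouse keys.

    sources: dict of source name -> list of record dicts, each with an
    optional 'patient_id'. Returns a dict of warehouse key -> merged row.
    """
    next_key = count(1)
    key_by_source_id = {}
    dimension = {}
    for source_name, records in sources.items():
        for record in records:
            source_id = record.get("patient_id")
            if source_id is None:
                # No ID to match on: the record becomes its own row.
                key = next(next_key)
            else:
                if source_id not in key_by_source_id:
                    key_by_source_id[source_id] = next(next_key)
                key = key_by_source_id[source_id]
            row = dimension.setdefault(
                key, {"patient_key": key, "sources": []}
            )
            row["sources"].append(source_name)
            for field, value in record.items():
                # Assumed conflict rule: first non-missing value wins.
                if value is not None and field not in row:
                    row[field] = value
    return dimension


# Made-up sample records from two of the four dimensions.
sources = {
    "dim_a": [{"patient_id": "P1", "name": "n1"}],
    "dim_b": [
        {"patient_id": "P1", "city": "c1"},
        {"patient_id": None, "name": "n2"},
    ],
}
dimension = build_patient_dimension(sources)
```

A patient present in only two of the four dimensions simply ends up with fewer entries in `sources`, and the real matching and conflict rules would replace the simple "first value wins" rule used here.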

Core Data Model Design: use a single complex Entity or group homogeneous attributes in other entities?

I am dealing with a very complex Entity which has several homogeneous attributes that could be grouped in some kind of "macro categories".
To simplify greatly, let's think about an Entity, myCar, with only two macro categories: "financial attributes" and "physical attributes":
Financial attributes: cost, resale value, annual expenses.
Physical attributes: height, width, weight, color.
I have two options to model it:
Option 1: Store all the attributes in a single Entity:
Single Entity: MyCar with the following attributes:
cost
resale value
annual expenses
height
width
weight
color
Option 2: Use three entities and two relationships to model it:
Entity 1: MyCar
1 to 1 relationship 1: Financials
1 to 1 relationship 2: Physicals
Entity 2: Financials
cost
resale value
annual expenses
1 to 1 relationship: myCar
Entity 3: Physicals
Height
Width
Weight
Color
1 to 1 relationship: myCar
Up to now I have always used Option 1, but thinking about how the data should be displayed on an iPad, inside a UISplitViewController with "Financials" and "Physicals" options on the master side on the left and the related attributes on the detail side on the right, I started considering Option 2.
Which is the better approach to model this complex Entity with Core Data? Why?
The choice should depend on which data you need at any given time. If you always need all of the data, then using multiple entities offers little value. If, however, you have a master view which lists only a subset of the data and a detail view which lists all of it, then it is very beneficial to separate the data into different entities based on that usage. This limits the amount of data that will be faulted in as you scroll through the master list and improves performance.
That doesn't mean you shouldn't also set the fetch request's batch size (fetchBatchSize), which is also a massive factor in how effective and smooth your scrolling will be.
