I have a list of user data (user name, age, sex, address, location, etc.) and
a set of product data (product name, cost, description, etc.).
Now I would like to build a recommendation engine that is able to:
1. Figure out similar products
e.g.:
name : category : cost : ingredients
x : x1 : 15 : xx1, xx2, xx3
y : y1 : 14 : yy1, yy2, yy3
z : x1 : 12 : xx1, xy1
Here x and z are similar (same category, overlapping ingredients).
2. Recommend relevant products from the product list to a user
How can this kind of recommendation engine be implemented with Mahout? Which methods are available? Is there a useful tutorial/link? Please help.
In Mahout v1 (from https://github.com/apache/mahout) you can use "spark-rowsimilarity" to create indicators for each type of metadata: category, cost, and ingredients. This will give you three matrices containing, for each item, the items most similar to it based on that particular piece of metadata, i.e. a "more like this" type of recommendation. You can also try combining the metadata into one input matrix and see if that gives better results.
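As a toy illustration of metadata-based similarity (plain Python, not Mahout itself), Jaccard overlap on the ingredient lists from the question already ranks z closest to x:

```python
# Toy illustration: Jaccard similarity on ingredient sets (not Mahout itself).
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

products = {
    "x": ["xx1", "xx2", "xx3"],
    "y": ["yy1", "yy2", "yy3"],
    "z": ["xx1", "xy1"],
}

# Rank every other product by ingredient overlap with "x".
similar_to_x = sorted(
    (name for name in products if name != "x"),
    key=lambda name: jaccard(products["x"], products[name]),
    reverse=True,
)
print(similar_to_x)  # z shares an ingredient with x, y shares none
```

Mahout's indicator matrices do this at scale (and with a log-likelihood test rather than raw overlap), but the intuition is the same.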
To personalize this, record which items each user has expressed some preference for. Index the indicator matrices in Solr, one indicator per Solr "field", all attached to the item ID (name?). Then the query is the user's history against each field. You can boost certain fields to increase their weight in the recommendations.
This is described on the Mahout site: http://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html
and in some slides here: http://occamsmachete.com/ml/2014/10/07/creating-a-unified-recommender-with-mahout-and-a-search-engine/
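A sketch of what such a personalized query could look like. The field names and item IDs below are hypothetical; the `^boost` suffix is standard Lucene/Solr query syntax:

```python
# Build a Lucene/Solr-style query string from a user's interaction history.
# Field names and item IDs here are hypothetical examples.
def build_query(history_by_field, boosts):
    clauses = []
    for field, item_ids in history_by_field.items():
        clause = f"{field}:({' '.join(item_ids)})"
        boost = boosts.get(field)
        if boost:
            clause += f"^{boost}"  # Lucene boost syntax: field:(...)^weight
        clauses.append(clause)
    return " OR ".join(clauses)

history = {
    "category_indicator": ["x", "z"],
    "ingredient_indicator": ["x"],
}
boosts = {"ingredient_indicator": 2}  # weight ingredient similarity higher
print(build_query(history, boosts))
# category_indicator:(x z) OR ingredient_indicator:(x)^2
```

The search engine then ranks items by how strongly they match the user's history in each indicator field, which is exactly the "unified recommender" pattern the links describe.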
Related
I'm trying to classify the sentences of a specific column into three labels with BART Large MNLI. The problem is that the output of the model is "sentence + the three labels + the scores for each label". Output example:
{'sequence': 'Growing special event set production/fabrication company
is seeking a full-time accountant with experience in entertainment
accounting. This position is located in a fast-paced production office
located near downtown Los Angeles.Responsibilities:• Payroll
management for 12+ employees, including processing new employee
paperwork.', 'labels': ['senior', 'middle', 'junior'], 'scores':
[0.5461998581886292, 0.327671617269516, 0.12612852454185486]}
What I need is to get a single column with only the label with the highest score, in this case "senior".
Any feedback that could help me do this? Right now my code looks like:
from transformers import pipeline

df_test = df.sample(frac=0.0025)
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")
candidate_labels = ['senior', 'middle', 'junior']
df_test["seniority_label"] = df_test.apply(
    lambda x: classifier(x.full_description, candidate_labels, multi_label=True),
    axis=1)
df_test.to_csv("Seniority_Classified_SampleTest.csv")
(Using a Sample of the df for testing code).
And the code I've followed comes from this page, where they do receive a column with labels as output (I don't know how): https://practicaldatascience.co.uk/machine-learning/how-to-classify-customer-service-emails-with-bart-mnli
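One way to reduce the result to a single label is to take the label paired with the maximum score; a minimal sketch, using a `result` dict shaped like the output shown in the question:

```python
# Extract only the highest-scoring label from a zero-shot classifier result.
# `result` mimics the dict shown in the question.
result = {
    "sequence": "Growing special event set production company ...",
    "labels": ["senior", "middle", "junior"],
    "scores": [0.5461998581886292, 0.327671617269516, 0.12612852454185486],
}

def top_label(result):
    # Pair each label with its score and keep the label of the max score.
    return max(zip(result["labels"], result["scores"]), key=lambda p: p[1])[0]

print(top_label(result))  # senior
```

In the `apply` above one could then return `top_label(classifier(...))` instead of the whole dict (a hypothetical adaptation, untested against the asker's data).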
I have a graph database with information about different companies and their subsidiaries. My task is to display the structure of a company, which I have achieved with d3 and a vertical tree.
But additionally I have to write summary statistics about the company that is currently displayed. Companies can be chosen from a dropdown list which is fetching this data dynamically via AJAX call.
I have to write in the same HTML a short summary like :
Total amount of subsidiaries for CompanyA: 300
Companies in corporate havens: 45%
Companies in tax havens: 5%
My database consists of two node types, Company and Country, and country nodes carry labels like CH (corporate haven) and TH (tax haven).
CREATE (:TH:Country {name:'Nauru', capital:'Yaren', lng:166.920867, lat:-0.5477})
WITH 1 AS dummy MATCH (a:Company), (b:Country) WHERE a.name='CompanyA' AND b.name='Netherlands' CREATE (a)-[:IS_REGISTERED]->(b)
So how can I find the number of subsidiaries of CompanyA that are registered in corporate and tax havens? And how do I pass this info on to the HTML page?
I found different Cypher queries to list all the labels, as well as apoc.meta.stats, but these do not allow me to filter on the mother company. I appreciate any help.
Cypher is good because you write queries almost in natural language (the query below may not be exactly right - I did not check it - but the idea is clear):
MATCH (motherCompany:Company {name: 'CompanyA'})-[:HAS_SUBSIDIARY]->(childCompany:Company)
MATCH (childCompany)-[:IS_REGISTERED]->(country:Country)
WITH motherCompany,
     collect(labels(country)) AS countriesLabels
WITH motherCompany,
     countriesLabels,
     size([countryLabels IN countriesLabels WHERE 'TH' IN countryLabels]) AS inTaxHaven
RETURN motherCompany,
       size(countriesLabels) AS total,
       inTaxHaven,
       size(countriesLabels) - inTaxHaven AS inCorporateHaven
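To get the summary onto the page, the endpoint serving the AJAX call can turn the query result into percentages and return them as strings/JSON. A minimal sketch in plain Python, assuming the query returns a total plus tax-haven and corporate-haven counts under the keys shown (the key names and numbers are illustrative):

```python
# Turn the counts returned by the Cypher query into summary strings
# for the page. Record keys are assumed to match the query's RETURN aliases.
def summarize(company, record):
    total = record["total"]
    pct = lambda n: round(100 * n / total) if total else 0
    return {
        "total": f"Total amount of subsidiaries for {company}: {total}",
        "corporate": f"Companies in corporate havens: {pct(record['inCorporateHaven'])}%",
        "tax": f"Companies in tax havens: {pct(record['inTaxHaven'])}%",
    }

summary = summarize("CompanyA", {"total": 300, "inCorporateHaven": 135, "inTaxHaven": 15})
print(summary["tax"])  # Companies in tax havens: 5%
```

The dict can be serialized with `json.dumps` and inserted into the HTML by the same JavaScript that handles the dropdown's AJAX response.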
Assume that a tourist has no idea about which city to visit. I want to recommend the top 10 cities based on his preferences for a city (budgetToTravel, isCoastel, isHistorical, withFamily, etc.).
My dataset contains features for every city, for example:
Venice, Italy: (budgetToTravel='5000', isCoastel=1, isHistorical=1, withFamily=1, ...)
Berlin, Germany: (budgetToTravel='6000', isHistorical=1, isCoastel=0, withFamily=1, ...)
I want to know the best machine-learning algorithm to recommend the top 10 cities to visit based on the tourist's features.
As stated by Pierre S., you can start with KNearestNeighbours.
This algorithm will let you do exactly what you want:

from sklearn.neighbors import NearestNeighbors

n_cities_to_recommend = 10

# Scale your data to [0, 1] first (e.g. with MinMaxScaler), or tune radius.
neigh = NearestNeighbors(n_neighbors=2, radius=1.0)
neigh.fit(cities)  # cities: one feature row per city

user_input = [budgetToTravel, isCoastel, isHistorical, withFamily, ...]
# Returns the indices of the closest cities from `cities`.
neigh.kneighbors([user_input], n_cities_to_recommend, return_distance=False)
You can use an (unsupervised) clustering algorithm like Hierarchical Clustering or K-Means Clustering to build clusters of cities, and then match the person's (tourist's) features against those clusters.
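A sketch of the matching step, assuming the clusters and their centroids have already been computed (the centroids, cluster names, and feature vectors below are hypothetical; feature order is budget-scaled-to-[0,1], coastal, historical, family):

```python
import math

# Hypothetical precomputed cluster centroids over scaled features
# (budget scaled to [0, 1]; the other features are 0/1 flags).
centroids = {
    "beach_family": [0.5, 1.0, 0.5, 1.0],
    "city_history": [0.6, 0.0, 1.0, 0.5],
}
cluster_members = {
    "beach_family": ["Venice", "Barcelona"],
    "city_history": ["Berlin", "Vienna"],
}

def nearest_cluster(tourist):
    # Assign the tourist to the centroid at minimal Euclidean distance.
    return min(centroids, key=lambda c: math.dist(tourist, centroids[c]))

tourist = [0.55, 0.0, 1.0, 0.4]  # prefers history, not coastal
print(cluster_members[nearest_cluster(tourist)])
```

The recommendation is then the cities of the matched cluster (or the 10 nearest cities within it).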
I am trying to create a report in SSRS. Below is a small example of what my dataset looks like.
Example Data Set
So, there are three different stores (A,B,C) and each has a landlord (a,b,c). Landlords can pay via three different methods (1,2,3) and the amounts paid per method are shown.
Right now, I have two filters set up. The first is by Store and the second is by Landlord.
What I am having trouble with is:
How can I set up a filter by the Amount that will return information from an entire Store/Landlord?
So for example, if I wanted to filter Amount by 150, I would like to return all the "payment" information for the store(s) that have a payment of 150. Such as the following:
Desired Result
Is it possible to add a filter to return information from the entire group? (Store and Landlord are the group in this case)
I am new to SSRS so any help/insight would be greatly appreciated!
You can use LookUpSet to locate the matching groups, JOIN to put the results in a string and the INSTR function to filter your results.
=IIF(ISNOTHING(Parameters!AMOUNT.Value) OR INSTR(
Join(LOOKUPSET(Fields!Amount.Value, Fields!Amount.Value, Fields!Store.Value, "DataSet1"), ", ") ,
Fields!Store.Value
) > 0, 1, 0)
This translates to:
If the Store value is found (INSTR > 0) in the list (JOIN) of Stores where the Amount is the current Amount (Lookupset).
In your filter, put the above expression in the Expression, change the type to INTEGER and the Value to 1.
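The group-level logic that the expression implements can be illustrated in plain Python (the rows below are hypothetical, mirroring the example data; 150 is the filter amount):

```python
# Keep every row of any store that has at least one payment of the
# filtered amount, i.e. filter at the group (Store) level, not the row level.
rows = [  # (store, landlord, method, amount) - hypothetical sample data
    ("A", "a", 1, 100), ("A", "a", 2, 150),
    ("B", "b", 1, 200), ("B", "b", 3, 50),
    ("C", "c", 2, 150), ("C", "c", 3, 75),
]

def filter_by_amount(rows, amount):
    # First pass: which stores have a payment of `amount`? (the LookupSet/Join step)
    matching_stores = {store for store, _, _, amt in rows if amt == amount}
    # Second pass: keep all rows of those stores (the InStr > 0 check)
    return [r for r in rows if r[0] in matching_stores]

for row in filter_by_amount(rows, 150):
    print(row)
```

Stores A and C each have a 150 payment, so all of their rows survive the filter while store B is dropped entirely.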
I have different documents with a list of hashtags in each. I would like to group them under the most relevant hashtag (which would be present in the document itself).
E.g.: if there are #Eco, #Ecofriendly, #GoingGreen, I would like to group all of these under the most relevant and representative hashtag (say #Eco). How should I approach this, and what techniques and algorithms should I be looking at?
I would create a bipartite graph of documents-hashtags and use clustering on a bipartite graph:
http://www.cs.utexas.edu/users/inderjit/public_papers/kdd_bipartite.pdf
This way I am not using the content of the document, but just clustering the hashtags, which is what you wanted.
Your question is not very precise, and as such may have multiple answers. However, if we assume that you literally want to "group all these under the most common hashtag", then simply loop through all hashtags, count how often each occurs, and then for each document select the one with the highest number of occurrences.
Something like

from collections import Counter

# Count global hashtag frequencies across all documents.
N = Counter()
for D in documents:
    for h in D.hashtags:
        N[h] += 1

# Tag each document with its globally most frequent hashtag.
for D in documents:
    best = max(D.hashtags, key=lambda h: N[h], default=None)
    print('Document', D, 'should be tagged with', best)