I want to predict whether an SSL certificate containing “PayPal” in the Subject Alternative Names actually belongs to PayPal, based on a combination of the text fields within the certificate. For example: if all of PayPal's certificates are issued by DigiCert and we make a prediction on a certificate issued by Let's Encrypt containing “PayPal” in the SAN field, the certificate likely does not belong to PayPal.
What machine learning algorithms and techniques are suited to this type of problem?
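One simple way to frame the issuer intuition from the question is as a per-feature probability estimate. The sketch below is a toy frequency model on made-up data (all issuer names and labels are invented for illustration), not a recommendation of a specific algorithm:

```python
from collections import Counter, defaultdict

# Toy training data: (issuer, is_really_paypal). Values are invented.
certs = [
    ("DigiCert", True),
    ("DigiCert", True),
    ("DigiCert", True),
    ("Let's Encrypt", False),
    ("Let's Encrypt", False),
]

# Count, per issuer, how often a "PayPal" SAN really belongs to PayPal.
counts = defaultdict(Counter)
for issuer, legit in certs:
    counts[issuer][legit] += 1

def p_legit(issuer):
    """Estimated probability that a 'PayPal' cert from this issuer is
    genuine, with add-one (Laplace) smoothing so unseen issuers get 0.5."""
    c = counts[issuer]
    return (c[True] + 1) / (c[True] + c[False] + 2)

print(p_legit("DigiCert"))       # 0.8
print(p_legit("Let's Encrypt"))  # 0.25
```

A real system would combine several such fields (issuer, validity period, key length, SAN count) as features in a classifier such as logistic regression or a random forest.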
I am wondering, in federated machine learning, when we train our local models and intend to update the cloud model, what protocol is used to transmit those weights? Also, when we use TensorFlow Federated, how are the weights transmitted (using which library and protocol)?
Kind regards,
Most authors of federated computations using TensorFlow Federated write in the "TFF language". The specific protocol used during communication is determined by the platform running the computation and the instructions given in the algorithm.
For computation authors, TFF supports a few different instructions for the platform which may result in different protocols. For example, looking at summation of CLIENT values to a SERVER value:
tff.federated_sum does not mandate any particular protocol.
tff.federated_secure_sum, tff.federated_secure_sum_bitwidth, and tff.federated_secure_modular_sum all use a secure protocol such that the server cannot learn the value of any individual summand, only the aggregate sum (https://research.google/pubs/pub47246/ provides more details).
All of these can be composed with transport-layer security schemes to prevent third parties on the network from learning the transmitted values, depending on the execution platform's implementation. For example, TFF's own runtime uses gRPC, which supports a few different schemes: https://grpc.io/docs/guides/auth/.
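To make the "server only learns the aggregate" idea concrete, here is a toy sketch of pairwise additive masking, the core trick behind secure aggregation. This is a simplified illustration of the concept, not the actual protocol TFF or the linked paper implements:

```python
import random

def mask_values(values, modulus=2**16):
    """Each pair of clients agrees on a random mask that one adds and the
    other subtracts; the masks cancel in the sum, so individual masked
    values look random while the total is preserved (mod `modulus`)."""
    masked = list(values)
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            m = random.randrange(modulus)  # shared pairwise mask
            masked[i] = (masked[i] + m) % modulus
            masked[j] = (masked[j] - m) % modulus
    return masked

client_updates = [3, 5, 7]
masked = mask_values(client_updates)
# The server sees only `masked`, yet recovers the true sum:
print(sum(masked) % 2**16)  # 15
```

The real protocol additionally handles client dropout and key agreement, which is where most of its complexity lives.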
I am practicing with this tutorial. I would like each client to train a different architecture and a different model. Is this possible?
TFF does support different clients having different model architectures.
However, the Federated Learning for Image Classification tutorial uses tff.learning.build_federated_averaging_process, which implements the Federated Averaging (McMahan et al., 2017) algorithm, defined as each client receiving the same architecture. This is accomplished in TFF by "mapping" (in the functional-programming sense) the model to each client dataset to produce a new model, and then aggregating the results.
To achieve different clients having different architectures, a different federated learning algorithm would need to be implemented. There are a couple of (non-exhaustive) ways this could be expressed:
Implement an alternative to ClientFedAvg. This method applies a fixed model to the client's dataset; an alternate implementation could potentially create a different architecture per client.
Create a replacement for tff.learning.build_federated_averaging_process that uses a different function signature, splitting out groups of clients that would receive different architectures. For example, FedAvg currently looks like:
(<state@SERVER, data@CLIENTS> → <state@SERVER, metrics@SERVER>)
This could be replaced with a method with the signature:
(<state@SERVER, data1@CLIENTS, data2@CLIENTS, ...> → <state@SERVER, metrics@SERVER>)
This would allow the function to internally tff.federated_map() different model architectures to different client datasets. This would likely only be useful in FL simulations or in experimentation and research.
However, in federated learning there will be difficult questions around how to aggregate the models back on the server into a single global model. This probably needs to be designed first.
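The aggregation step that makes this hard can be seen in a minimal sketch of the FedAvg server update, written here in plain Python for illustration (TFF's actual implementation differs). The weighted average is only well-defined because every client shares the same weight shapes, which is exactly what breaks with heterogeneous architectures:

```python
def federated_average(client_weights, client_sizes):
    """FedAvg server step: average per-client weight vectors, weighting
    each client by its number of training examples. Requires every
    client to have an identical architecture (same weight dimensions)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for weights, n in zip(client_weights, client_sizes):
        for k in range(dim):
            avg[k] += weights[k] * n / total
    return avg

# Two clients with 1 and 3 examples respectively:
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```

With different architectures per client, there is no single `avg` vector to compute, so the algorithm needs a different fusion strategy (e.g. per-group averaging or knowledge distillation).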
To address the problem of non-IID data in federated learning, I read a paper which adds a new node with a different data domain and transfers knowledge from the decentralized nodes. My question is: what information is transferred, model updates or data?
In layman's terms, non-IID means that class labels are not distributed evenly between clients for training. For obvious reasons, in a federated environment it is not feasible for every client to hold and train on IID data. Regarding your specific question of how it works in the paper you mention, could you please share a link to the paper?
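To make the uneven-label-distribution point concrete, here is a small sketch of the shard-based non-IID partitioning commonly used in FL experiments (the shard scheme from the FedAvg paper); the data here is synthetic:

```python
import random

def partition_non_iid(labels, num_clients, shards_per_client=2):
    """Sort example indices by label and deal out contiguous shards, so
    each client ends up holding examples from only a few classes."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    num_shards = num_clients * shards_per_client
    shard_size = len(labels) // num_shards
    shards = [order[s * shard_size:(s + 1) * shard_size]
              for s in range(num_shards)]
    random.shuffle(shards)
    return [sum(shards[c * shards_per_client:(c + 1) * shards_per_client], [])
            for c in range(num_clients)]

labels = [i % 10 for i in range(100)]  # 10 classes, 10 examples each
clients = partition_non_iid(labels, num_clients=5)
for c in clients:
    print(sorted({labels[i] for i in c}))  # each client sees at most 2 classes
```

An IID split, by contrast, would shuffle all indices and give every client roughly the same class mixture.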
I am currently working on software that connects users to jobs based on their user profiles. I ran text analytics on the job descriptions and derived important keywords from them. I have also collected user information from their profiles. Matching jobs to user profiles seems to be a challenging task. Are there any machine learning algorithms that can be used for this kind of matchmaking?
OK, so basically you have keywords for each job description, and then you have some text data (user profiles) to which you try to match those keywords.
Since your training data (user profiles) is not labeled, supervised learning will not help you here. Unsupervised learning (clustering) could help you find certain patterns (keywords) in large numbers of user profiles, but you would certainly need to experiment with different techniques (such as Gaussian mixture models) and observe the resulting patterns.
A simpler approach would be to derive keywords for each user profile as well (in other words, identify how many of your job keywords also appear in the user profile) and then compare the distance between them using cosine similarity. You would then only need to determine a minimal angle threshold; this would be a parameter to tune. Of course, you would need to vectorize your text data using bigrams or similar; if you use Python, scikit-learn already provides feature extraction. You could also apply a TF-IDF vectorizer to both the job descriptions and the user profiles, but with a carefully chosen stop-word list.
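The cosine-similarity idea can be sketched without any libraries using raw term-frequency vectors (a real pipeline would use scikit-learn's TF-IDF vectorizer, bigrams, and stop words as described above; the example texts are made up):

```python
import math
import re
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector over lowercase word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

job = "python developer with machine learning experience"
profile = "experienced python developer, interested in machine learning"
unrelated = "pastry chef with ten years of baking experience"

print(cosine(tf_vector(job), tf_vector(profile)))    # high
print(cosine(tf_vector(job), tf_vector(unrelated)))  # low
```

The angle threshold mentioned above corresponds to a cutoff on this similarity score (similarity = cos of the angle between the vectors).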
I have two kinds of profiles in my database: one is the candidate profile, the other is the job profile posted by a recruiter. Both profiles share three common fields: location, skill, and experience.
I know the algorithm, but I am having trouble creating the training data set: my input features would be the location, skill, and salary taken from the candidate profile, but I do not see how to choose the output (the relevant job profile).
As far as I know, the output can only be a single variable, so how do I choose the relevant job profile as the output in my training set? Or should I choose some other method? Another thought is clustering.
As I understand it, you want to predict a job profile given a candidate profile using some prediction algorithm.
Well, if you want to use regression, you need some historical data -- which candidates were given which jobs -- so that you can build a model from it. If you don't have such training data, you need another algorithm. For example, you could treat location, skill, and experience as features in a three-dimensional space and use clustering or nearest neighbors to find the candidate profile closest to a job profile.
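A nearest-neighbor match over those three fields could look like the sketch below. The distance function and all field encodings here are invented for illustration (exact match on the categorical fields, normalized gap on experience); a real system would need a more careful encoding:

```python
def distance(candidate, job):
    """Toy distance over the three shared fields: 0/1 mismatch penalty on
    location and skill, plus experience gap scaled to roughly [0, 1]."""
    d_loc = 0.0 if candidate["location"] == job["location"] else 1.0
    d_skill = 0.0 if candidate["skill"] == job["skill"] else 1.0
    d_exp = abs(candidate["experience"] - job["experience"]) / 10.0
    return d_loc + d_skill + d_exp

candidate = {"location": "Berlin", "skill": "python", "experience": 4}
jobs = [
    {"id": 1, "location": "Berlin", "skill": "python", "experience": 3},
    {"id": 2, "location": "Munich", "skill": "java", "experience": 4},
    {"id": 3, "location": "Berlin", "skill": "java", "experience": 5},
]

# 1-nearest-neighbor: the job profile with the smallest distance wins.
best = min(jobs, key=lambda j: distance(candidate, j))
print(best["id"])  # 1
```

This sidesteps the "single output variable" problem entirely: instead of predicting a job label, you rank all job profiles by distance and return the closest ones.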
You could look at "recommender systems"; they could be an answer to your problem.
Start with a content-based algorithm (you will have to find a way to automate the labeling of the jobs, or do it manually); you can then improve it into a hybrid recommender by gathering which jobs your users were actually interested in.