Identify Windows computer names using ML - machine-learning

Windows computer names documentation by Microsoft: https://oofhours.com/2020/11/12/whats-in-a-windows-computer-name/
I want to build an ML model that identifies Windows computer names from Windows event log fields.
More about Windows event log fields: https://www.elastic.co/guide/en/beats/winlogbeat/6.8/exported-fields-eventlog.html
My questions are:
What is the best method to use? I thought about supervised ML and word embeddings.
Maybe Naive Bayes or an SVM?
Is it right for me to create a dataset myself (according to the documentation) that contains values that are machine names and values that are not?
Thanks!
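One way to picture the supervised approach asked about above is a small character n-gram Naive Bayes classifier. This is only a sketch under stated assumptions: the training values and labels below are hypothetical and hand-made, and the `char_ngrams` helper and `NaiveBayes` class are illustrative names, not part of any library. A real dataset would need far more examples drawn from actual event log fields.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(value, n=3):
    """Return the character n-grams of a padded, lower-cased string."""
    padded = "^" + value.lower() + "$"
    return [padded[i:i + n] for i in range(max(1, len(padded) - n + 1))]


class NaiveBayes:
    """Multinomial Naive Bayes over character n-grams with add-one smoothing."""

    def fit(self, values, labels):
        self.class_counts = Counter(labels)
        self.ngram_counts = defaultdict(Counter)
        self.vocab = set()
        for value, label in zip(values, labels):
            grams = char_ngrams(value)
            self.ngram_counts[label].update(grams)
            self.vocab.update(grams)
        return self

    def predict(self, value):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each n-gram.
            score = math.log(count / total)
            denom = sum(self.ngram_counts[label].values()) + len(self.vocab) + 1
            for gram in char_ngrams(value):
                score += math.log((self.ngram_counts[label][gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled examples: 1 = computer name, 0 = something else.
train_values = [
    "DESKTOP-4F7K2LQ", "LAPTOP-9XQ2M1AB", "WIN-SRV01", "DESKTOP-ABC1234",
    "john.smith", "C:\\Windows\\System32", "Security", "An account was logged on",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = NaiveBayes().fit(train_values, train_labels)
print(model.predict("DESKTOP-7H2K9PZ"))
print(model.predict("An account was logged off"))
```

Character n-grams are used here instead of word embeddings because computer names are usually single tokens with recognizable prefixes and separators, which word-level features would miss.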

Related

Does AutoML accept external models?

I used random search and got the best hyperparameters for my model. Can I pass that model to AutoML?
Does AutoML do the random search for the best hyperparameters by itself, or is there something I need to pass?
I presume you're referring to Google Cloud AutoML. It is a cloud-based Machine Learning (ML) platform that offers a no-code approach to building data-driven solutions. AutoML was designed to build custom models for both newcomers and experienced machine learning engineers.
For newcomers, you could use Vertex AI (fully automated) to build an ML model.
For experienced ML engineers, you could also use AutoML Tabular to build a custom model, with the ability to select a model and input the selected hyperparameters.
You can read more details from here

Machine Learning - Derive information from a text

I'm a newbie in the field of Machine Learning and Supervised learning.
My task is the following: from the name of a movie file on a disk, I'd like to retrieve some metadata about the file. I have no control on how the file is named, but it has a title and one or more additional info, like a release year, a resolution, actor names and so on.
Currently I have developed a rule-based heuristic system, where I split the name into tokens and try to understand what each word could represent, either alone or with adjacent ones. For detecting people's names, for example, I'm using a dataset of English names, and score the word as a potential person's name if I find it in the dataset. If the word adjacent to it is one that I scored as a potential surname, I score the two words as being an actor. And so on. It works with decent accuracy, but changing heuristic scores manually to "teach" the system is tedious and unpredictable.
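The token-based heuristic described above might look roughly like the following. This is a simplified sketch, not my actual code: the name sets are tiny hypothetical stand-ins for the real datasets, the tags are hard decisions rather than scores, and `tokenize` and `tag_tokens` are illustrative names. It also assumes the file name ends with an extension.

```python
import re

# Hypothetical, tiny stand-ins for the name datasets mentioned above.
FIRST_NAMES = {"tom", "emma", "keanu"}
SURNAMES = {"hanks", "stone", "reeves"}

YEAR = re.compile(r"^(19|20)\d{2}$")
RESOLUTION = re.compile(r"^(\d{3,4}p|4k)$", re.IGNORECASE)


def tokenize(filename):
    """Drop the extension and split the rest on common separators."""
    stem = filename.rsplit(".", 1)[0]
    return [t for t in re.split(r"[.\s_\-\[\]()]+", stem) if t]


def tag_tokens(filename):
    """Tag each token, merging a first name followed by a surname."""
    tokens = tokenize(filename)
    tags = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok.lower() in FIRST_NAMES and nxt.lower() in SURNAMES:
            tags.append(("actor", tok + " " + nxt))
            i += 2
            continue
        if YEAR.match(tok):
            tags.append(("year", tok))
        elif RESOLUTION.match(tok):
            tags.append(("resolution", tok))
        else:
            tags.append(("title_or_unknown", tok))
        i += 1
    return tags


print(tag_tokens("The.Matrix.1999.1080p.Keanu.Reeves.mkv"))
```

Even in this reduced form the weak spot is visible: a title that is itself a four-digit number would be tagged as a year, and every such case needs another hand-written rule.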
Such a rule-based system is hard to maintain or develop further, so, out of curiosity, I was exploring the field of machine learning. What I would like to know is:
Is there some kind of public literature about these kinds of problems?
Is ML a good way to approach the problem, given the limited data set available?
How would I proceed to debug or try to understand the results of such a machine? I already have problems with the "simplistic" heuristic engine I have developed.
Thanks, any advice would be appreciated.
You need to look into NLP (natural language processing). NLP deals with text processing and other things; for example entity recognition and tagging.
Here is an example using the spaCy library: https://spacy.io/usage/linguistic-features.
Some time ago I did a similar thing; you can see it here: https://github.com/Erlemar/Erlemar.github.io/blob/master/Notebooks/Fate_Zero_explore.ipynb

How to get a specific machine type for ML Engine online prediction?

Is there an option to request a faster node for online prediction in ML Engine?
For example, when training I can configure any of these machines for my job:
standard,
large_model,
complex_model_s,
complex_model_m,
complex_model_l,
standard_gpu,
complex_model_m_gpu,
complex_model_l_gpu,
standard_p100,
complex_model_m_p100
See description of available clusters and machines for training here and here
I am struggling to find if it is possible to control what kind of machine runs my online prediction.
We are currently adding that capability and will let you know when it's publicly available.
ML Engine offers a 4-core instance type in addition to the default serving instance type for online prediction. However, the feature is still at the alpha stage and is only available to a selected list of accounts that opted in as "Trusted Testers". Please contact cloudml-feedback#google.com if you need help to set up the prediction service with a faster node.

Customizing the Named Entity Recognition model in Azure ML

Can we customize the Named Entity Recognition (NER) model in Azure ML Studio with a separate training dataset? What I want to do is find non-English names in a text. (The training dataset includes the set of names that will be used for training.)
Unfortunately, this module's ability to perform NER with a custom set of entities is planned for the future, but not currently available.
If you're familiar with Python and willing to put in the extra footwork, you might consider using the Natural Language Toolkit (NLTK). Sujit Pal has a nice blog post and sample code describing the creation of a custom NER with that package. You may be able to train an NLTK NER model and apply it to your data of interest from within an Execute Python Script module on Azure ML.

Publish azure machine learning service with feature hashing

I have created an experiment in Azure Machine Learning Studio. The experiment is a multi-class classification problem using the multi-class neural network algorithm. I have also added a 'feature hashing' module to transform a stream of English text into a set of features represented as integers. The experiment runs successfully, but when I publish it as a web service endpoint I get the message "Reduce the total number of input and output columns to less than 1000 and try publishing again."
After some research I understood that feature hashing converts text into thousands of features, but how do I publish the experiment as a web service? I don't want to remove the 'feature hashing' module.
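To illustrate why the column count grows so quickly, here is a minimal sketch of the hashing trick in general. It is an assumption-laden illustration: `hash_features` is a made-up helper, and MD5 is used only as a convenient stand-in, not necessarily the hash function the Azure module uses.

```python
import hashlib


def hash_features(text, bits=10):
    """Map whitespace-separated tokens to counts in 2**bits buckets."""
    size = 2 ** bits
    vector = [0] * size
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # Use the first 4 bytes of the digest as the bucket index.
        index = int.from_bytes(digest[:4], "big") % size
        vector[index] += 1
    return vector


features = hash_features("the quick brown fox jumps over the lazy dog")
print(len(features))
```

With 10 bits the vector already has 1024 columns, which is above the 1000-column limit quoted in the error message.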
It sounds like you are trying to return all those thousands of columns as output. All you really need is the scored probability or the scored label. To solve this, drop all the feature-hashed columns from the output of the score model module. To do this, add a project columns module and tell it to start with "no columns", then "include" by "column names", and add only the predicted column (scored probability/scored label).
Then hook up the output of that project columns module to your web service output module. Your web service should now be returning only 1-3 columns rather than thousands.
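Outside of the Studio designer, the effect of that project columns step can be pictured as a simple column filter. The sketch below is illustrative only: the column names "Scored Labels", "Scored Probabilities", and "HashingFeature.N" are assumptions about how the columns are named, and `keep_scored_columns` is a made-up helper, not an Azure ML API.

```python
def keep_scored_columns(row, keep=("Scored Labels", "Scored Probabilities")):
    """Return only the prediction columns of a scored row."""
    return {name: value for name, value in row.items() if name in keep}


# Hypothetical scored row: 1024 hashed feature columns plus two predictions.
row = {"HashingFeature.%d" % i: 0 for i in range(1024)}
row["Scored Labels"] = "sports"
row["Scored Probabilities"] = 0.93

slim = keep_scored_columns(row)
print(len(row), len(slim))
```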
