Best Features for Term Level Clustering [closed] - twitter

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 7 years ago.
Improve this question
At the moment, I am working on a project that related to mining Twitter data. The aim of the project is to find the themes that can be used to represent the set of tweets. To help us finding the themes, we came up with an idea to do term level clustering. The terms are some important concepts that already extracted using some TextMining tools.
Well, my main question is, what are the best features to define term similarity? In this project, due to an insufficient amount of data, I am doing the unsupervised learning, which is clustering using the k-means algorithm.
I do have some extracted features. As I understand, one way to know the semantic (not actually) meaning of a term is by seeing the context of which the term is mentioned. Therefore, what I have at the moment are preceding and following WORD and POS of the term. For instance:
I drink a cup of XYZ
She had a spoon of ABC yesterday.
By seeing the preceding word and POS - cup/NN and of/IN for XYZ and spoon/NN and of/IN for ABC - I knew that XYZ and ABC might be a liquid material or component. Well, it sounds very naive, in fact, I don't get the good clusters. In addition to the previous features, I have some named entity types that I considered as features. For instance, entity type like Person, Location, Problem (in medical), MEDTERM etc.
So, what are the common features for term level clustering? Any comments and suggestions would be appreciated. I do open to any guidance, such as paper, link etc. Thanks
EDIT: In addition to those features, I've extracted the head nouns of each term and considered it as one of my features. I am thinking of using a head noun in the case for multi-word terms.

Well, let me see if I understood correctly what you need. You already extracted/found the terms you want as centres of your clusters, and now you want to find all terms which are similar to them so they get grouped in the proper cluster?.
In general you need to define a similarity measure (distance) and here is the main point, what you want that similarity distance to measure or determine. If you are looking for term to term similarity, just letters then you can try things like Levenshtein distance for example, but if what you want to find are contextual similar terms, even they are written in a very different way but could mean the same thing, thats different from Levenshtein pretty much harder to do.
What is important to keep in mind is that you need a measure of similarity to find the similar terms. What I see you call features some named entity types, normally k-means is bad when dealing with non continuos data.

Related

ML - Instance-based learning [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 5 years ago.
Improve this question
I am quite new to machine learning and I've been reading a book where the author describes instance-based learning as follows
Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam filter
this way, it would just flag all emails that are identical to emails that have already been flagged by users
— not the worst solution, but certainly not the best.
Instead of just flagging emails that are identical to known spam emails, your spam filter could be
programmed to also flag emails that are very similar to known spam emails. This requires a measure of
similarity between two emails. A (very basic) similarity measure between two emails could be to count
the number of words they have in common. The system would flag an email as spam if it has many words
in common with a known spam email.
This is called instance-based learning: the system learns the examples by heart, then generalizes to new
cases using a similarity measure
But I couldn't understand it completely as he used the words similar and identical. I didn't understand difference. Any explanation would be appreciated. Thank you.
Identical literally means identical - there are zero differences and it is an exact match.
The strings "aaaaa" and "aaaaa" are identical. There is no other string that can ever exist that is also identical to "aaaaa" other than itself.
Similar again is being used in the literal sense. "aaaaa" and "aaaab" are not identical, they differ by one character. But they are similar in that they share 4 out of 5 characters. There are many-many possible strings that are similar to "aaaaa".
Naively looking at the number of different characters in a string is one way to define similarity.
The trick to all instance based learning is the answering the question: how do we explicitly define similar for this application. Every application would likely benefit from different measures of similarity, though there are some common ones do exist and get re-used frequently, that doesn't mean they are optimal.

Fuzzy logic vs AI vs Machine learning vs Deep learning [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
How do these four subjects differ from one another? From what I understand, they learn from numerous input data and output an estimated output. My understanding is very lacking thus me questioning these. It made no sense to me about the examples given by people such as the spam email, apple orange cat dog identification, neural network examples.
Is there a better representation of these four subjects in a more simpler example with coding to show the concept? I really appreciate that a lot.
Links to example you think that is very simple with code is more than welcome. I need something relate-able to get the code writing concept better.
Many thanks!
Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. Fuzzy logic has been employed to handle the concept of partial truth, where the truth value may range between completely true and completely false.1 Furthermore, when linguistic variables are used, these degrees may be managed by specific (membership) functions.
The field of AI research defines itself as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of success at some goal. Colloquially, the term "artificial intelligence" is applied when a machine mimics "cognitive" functions that humans associate with other human minds, such as "learning" and "problem solving" (known as Machine Learning).
Machine Learning by Tom Mitchell:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
Deep learning is machine learning with deep neural networks.
Hence: AI is a superset of Machine Learning. Machine Learning is a superset of Deep learning. AI includes fuzzy logic:
Resources
Fuzzy logic lecture notes (German)
Computerphile: Fuzzy logic
IEEE: Fuzzy logic
Tom Mitchel: Machine Learning
Michael Nielsen: Deep Learning

What type of neural network would work best for credit scoring? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
Let me just start by saying I only took the undergrad AI class at school so I know just enough to be dangerous.
Here's the problem I'm looking to solve...accurate credit scoring is a key part to the success of my business. Currently we rely on a team of actuaries and statistical analysis to suss out patterns in the few dozen variables we track about each individual that indicate that they may be a low or high credit risk. As I understand it this is exactly the type of job that neural nets are great at solving, that is, finding high order relationships across many inputs that a human would likely never spot and then rendering a decision or output that is on average more accurate than what a trained human could do. In short, I want to be able to input your name, address, marital status, what car you drive, where you work, hair color, favorite food, etc in and get a credit score back.
My question is what type or architecture for a neural network would be best for this particular problem. I've done a bit of research and it seems I'm generating questions faster than I'm finding answers at this point. The best I've been able to come up with is some kind of generative deep neural network with multiple hidden layers where each layer is able to abstract one level beyond the previous one. Im assuming it's going to be feed-forward just because it seems to be the default. We have historical data on all previous customers including the information we used to make the initial score as well as data on what type of credit risk they actually turned out to be. This would seem to lend itself to unsupervised learning. Where I'm lost is in number of layers, how the layers are different from each other, size of each layer, connectedness of each of the perceptrons and so on. The more I dig the more I'm getting into research papers that are over my head so I just need some smart person to point me in the right direction
Does anyone have any ideas? Again, I don't need a thorough explanation just a general area I should focus on.
This is supervised learning since you have actual data that can be labelled. It's also feedforward since you're not predicting time series but assigning scores. Further, you should probably just prepare your data (assigning credit scores manually or with some rough heuristic) and start experimenting with some tools before you invest time into implementing state-of-the-art architectures. A multi-layer-perceptron (MLP) with 1 hidden layer is a sufficient starting point for such a problem. From there on, you can train the network to generalize your credit assignment heuristic you began with.
You should know that most "new" architectures you probably read about while researching are dealing with much more difficult problems than credit scoring (speech/image/character recognition/detection). There is a collection of papers on the scenario of credit scoring / risk classification, so I'd recommend reshifting your focus from architectures to actual case studies (see e.g. this paper). Just pick a recent paper with MLPs and apply their parameters. Start simple and improve the system incrementally (as #roganjosh stated).

Neural network: should the algorithm be rewritten for every case? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have 2 sequences of numbers and I'd want to continue it using neural algorithms (there is some logic in them, but I don't know what, and there are no external factors affecting the selection). There are some relationship is in each of the two sequences separately, as well as between them.
So, I'm new to machine learning, but I've got such an idea: is there any already written-and-well-working applications (libraries) that implement exact algorithms for me not to learn them all before using. Simply like "most-frequently-used-neural-algorithms-kit".
I'm thinking of analysing some music sheets and two sequences: "notes" and "durations".
OK, according to the comments I think I got what you want.
Generally, no, you don't need to rewrite the standard algorithm of ANN. But be aware that ANN is not an algorithm, but a cluster of algorithms (including BackPropagation-ANN, Hopfield-ANN, Boltzmann Machine etc). Among them I recommend BP-ANN which is simple and suitable for your project. You might want to input a sequences of the known notes and duration, and then expect an output of the next note and duration.
To use BP-ANN, you don't need to rewrite them. Due to its a widely-used algorithm, there are many toolkits and open source implementations of it:
Google "back propagation neural network implementation", you will find it easily. There are also a few opensource projects on Github(in both C language and Matlab): https://github.com/search?q=back+propagation&type=Everything&repo=&langOverride=&start_value=1
For further reading if you also want to deeply understand the details of its implementation, read this: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1279&context=ecetr&sei-redir=1
If you're interested in neural networks there are plenty of libraries available.
ANNIE is one such open source example, the MATLAB Neural Network toolbox is a
commercial example. These are libraries which you tell the architecture of the
neural network, you can train, test, verify, etc. The important part in all
these machine learning methods is how you represent your data, and those were
the comments you were getting (for example Predictor's). Sometimes you get
excellent results with one representation and very bad results with others.
There are also libraries to train SVMs (a specialized algorithm to train neural
networks) with quadratic regularization, LIBSVM is one great example.
There is also plenty of work on predicting time series with neural networks (if
that is what you want to do with music, I am not sure what exactly you want).
If the input is a series of (note, duration) pairs, then I suspect you'd get much farther by summarizing the historical note-to-note transitions or by something similar in an effort to capture the syntax of the music (Markov analysis, etc.), than you would by stuffing this into a neural network. It may help, too, to try representing the series as note differentials, measuring how many notes up or down the scale the new note is, rather than the actual value of the note itself.

How does Collective Intelligence beat Experts' view? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I am interested in doing some Collective Intelligence programming, but wonder how it can work?
It is said to be able to give accurate predictions: the O'Reilly Programming Collective Intelligence book, for example, says a collection of traders' action actually can predict future prices (such as corn) better than an expert can.
Now we also know in statistics class that, if it is a room of 40 students taking exam, there will be 3 to 5 students who will get an "A" grade. There might be 8 that get "B", and 17 that got "C", and so on. That is, basically, a bell curve.
So from these two standpoints, how can a collection of "B" and "C" answers give a better prediction than the answer that got an "A"?
Note that the corn price, for example, is the accurate price factoring in weather, demand of food companies using corn, etc, rather than "self fulfilling prophecy" (more people buy the corn futures and price goes up and more people buy the futures again). It is actually predicting the supply and demand accurately to give out an accurate price in the future.
How is it possible?
Update: can we say Collective Intelligence won't work in stock market euphoria and panic?
The Wisdom of Crowds wiki page offers a good explanation.
In short, you don't always get good answers. There needs to be a few conditions for it to occur.
Well, you might want to think of the following "model" for a guess:
guess = right answer + error
If we ask a lot of people a question, we'll get lots of different guesses. But if, for some reason, the distribution of errors is symmetric around zero (actually it just has to have zero mean) then the average of the guesses will be a pretty good predictor of the right answer.
Note that the guesses don't necessarily have to be good -- i.e., the errors could indeed be large (grade B or C, rather than A) as long as there are grade B and C answers distributed on both sides of the right answer.
Of course, there are cases where this is a terrible model for our guesses, so collective intelligence won't always work...
Crowd Wisdom techniques, like prediction markets, work well in some situations, and poorly in others, just as other approaches (experts, for instance) have their strengths and weaknesses. The optimal arenas therefore, are ones where no other approaches do very well, and prediction markets can do well. Some examples include predicting public elections, estimating project completion dates, and predicting the prevalence of epidemics. These are areas where information is spread around sparsely, and experts haven't found effective models that reliably predict.
The general idea is that market participants make up for one another's weaknesses. The expectation isn't that the markets will always predict every outcome correctly, but that, due to people noticing other people's mistakes, they won't miss crucial information as often, and that over the long haul, they'll do better. In cases where the exerts actually know the answer, they'll be able to influence the outcome. Different experts can weigh in on different questions, so each has more influence where they have the most knowledge. And as markets continue over time, each participant gets feedback from their gains and losses that makes them better informed about which kinds of questions they actually understand and which ones they should stay away from.
In a classroom, people are often graded on a curve, so the distribution of grades doesn't tell you much about how good the answers were. Prediction markets calibrate all the answers against actual outcomes. This public record of successes and failures does a lot to reinforce the mechanism, and is missing in most other approaches to forecasting.
Collective intelligence is really good at coming up to to problems that have complex behavior behind them, because they are able to take multiple sources of opinions/attributes to determine the end result. With a setup like this, training helps to optimize the end result of the processes.
The fault is in your analogy, both opinions are not equal. Traders predict direct profit for their transaction (the little part of the market they have overview over), while the expert tries to predict the overall field.
IOW the overall traders position is pieced together like a jigsaw puzzle based on a large amount of small opinions for their respective piece of the pie (where they are assumed to be experts).
A single mind can't process that kind of detail, which is why the overall position MIGHT overshadow the real expert. Note that this is particularly phenomon is usually restricted to a fairly static market, not in periods of turmoil. Expert usually do better then, since they are often better trained and motivated to avoid going with general sentiment. (which is often comparable to that of a lemming in times of turmoil)
The problem with the class analogy is that the grading system doesn't assume that the students are masters in their (difficult to predict) terrain, so it is not comparable.
P.s. note that the base axiom depends on all players being experts in a small piece of the field. One can debate if this requirement actually transports well to a web 2 environment.

Resources