Can anyone tell me how to create a graph in JUNG for PageRank? - jung

I am trying to implement PageRank for my small network, and I have gone through some answers to these questions:
http://stackoverflow.com/questions/2353898/pagerank-implementation-in-java
http://stackoverflow.com/questions/4784530/pagerank-implementation-in-java
JUNG is a graph library that implements PageRank, but I haven't seen any example of how to build the graph. Can anyone please explain it with an example?

Have a look here:
https://github.com/castagna/mr-pagerank/blob/3b0f11cc5e3aab392769c7e4b0e6bed149cfbf11/src/main/java/com/talis/labs/pagerank/jung/JungPageRank.java
https://github.com/castagna/mr-pagerank/blob/3b0f11cc5e3aab392769c7e4b0e6bed149cfbf11/src/main/java/com/talis/labs/pagerank/memory/PlainPageRank.java
You do not need to use JUNG; if you want, you can write your own implementation (as in PlainPageRank above).
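To give an idea of what a hand-written implementation involves, here is a minimal sketch in Python (not JUNG, and not the code from the linked repository). It assumes the graph is a plain dict mapping each node to the list of nodes it links to; the function name, the damping factor of 0.85 and the fixed iteration count are illustrative choices, not something taken from either link.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {node: [nodes it links to]}."""
    # Collect every node, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(ranks[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {node: base for node in nodes}
        # Every node passes an equal share of its rank along each outgoing link.
        for source, targets in links.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        ranks = new_ranks
    return ranks


# Hypothetical four-page network: D links to C but nothing links to D.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 4))
```

A real implementation would normally stop when the ranks change by less than some tolerance rather than after a fixed number of iterations.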

Related

How to Convert NLP Question to Knowledge Graph triple?

I have what I think is a simple question. I am trying to put together a question answering system and I am having trouble converting a natural question to a knowledge graph triple. Here is an example of what I mean:
Assume I have a prebuilt knowledge graph with the relationship:
((Todd) -[:picked_up_by]-> (Jane))
How can I make this conversion:
"Who picked up Todd today?" -> ((Todd) -[:picked_up_by]-> (?))
I am aware that there is a field dedicated to "Relationship Extraction", but I don't think that this fits that problem. If I could name it, "question triple extraction" would be the name of what I am trying to do.
Generally speaking, it looks like a relation extraction problem, with your custom relations. Since the question is too generic, this is not an answer, just some links.
Check out reading comprehension: projects on GitHub and the lecture by Christopher Manning.
Also, look up Semantic Role Labeling.

Deap: Want to know the generation that created the best individual

I'm running a genetic algorithm program and can find the best individual at the end of the run (hof[0]), but I want to know which generation produced it. Are there any attributes of hof[0] that will help print the individual and the generation that created it?
I tried looking at the manuals and Google for answers but could not find it anywhere.
I also couldn't find a list of the attributes of individuals that I could print. Can someone point me to the right link and documentation for that?
Thanks
This DEAP post suggests tracking the logbook, or explicitly adding the generation to the individual along with the fitness:
https://groups.google.com/g/deap-users/c/r7fZbMwHg3I/m/BAzHh2ogBAAJ
For the latter:
If you are working with a local copy of the algorithm (recommended once you go beyond a tutorial, as something always comes up, like adding plotting or this very question), then you can modify the fitness update lines to resemble:
fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
for ind, fit in zip(invalid_ind, fitnesses):
    ind.fitness.values = fit
    ind.generation = gen  # now we can: print(hof[0].generation)
if halloffame is not None:
    halloffame.update(population)
There is no built-in way to do this (yet, to the best of my knowledge), and implementing it would probably be quite a large task. The simplest approach (simplest in thought, not in implementation) would be to change the individual to be a tuple, where tup[0] is the individual and tup[1] is the generation it was produced in, or something similar.
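To make the tuple idea concrete, here is a rough sketch in plain Python. It does not use DEAP at all: the toy OneMax problem, the function name evolve and all of its parameters are made up for illustration, and the only point is that each individual travels together with the generation it was created in.

```python
import random


def evolve(generations=20, pop_size=10, length=12, seed=0):
    """Toy OneMax GA that remembers which generation produced the best individual."""
    rng = random.Random(seed)

    def fitness(pair):
        return sum(pair[0])  # number of 1 bits in the individual

    # Each entry is a pair: (individual, generation it was created in).
    population = [([rng.randint(0, 1) for _ in range(length)], 0)
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    for gen in range(1, generations + 1):
        # Keep the better half; survivors keep their original generation tag.
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        for parent, _ in parents:
            child = parent[:]
            child[rng.randrange(length)] ^= 1  # flip one random bit
            children.append((child, gen))      # tag the child with this generation
        population = parents + children
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best


best_individual, best_generation = evolve()
print(best_individual, "was created in generation", best_generation)
```

With DEAP itself, setting an attribute on the individual (as in the other answer) is probably less invasive than changing the individual type.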
If you're looking for a hacky way, you could maybe try writing the children of each generation to a text file and cross-checking your final solution with the text file; but other than that I'm not sure.
You could always try posting on their Google Group, though it can take a couple of days for a reply.
Good luck!

What is the use of the "NumericToNominal" method in machine learning?

Recently I have been working on machine learning and built some models for a classification problem with the help of some tutorials. Although I solved my problem successfully, I can't understand the purpose and effect of using the "NumericToNominal" method. Please explain it to me.
I have tried to learn from the available material, but it is very hard-core; I am looking for a simple explanation.
thanks and regards
I searched a lot and finally found a simple explanation: "A set of data is said to be nominal if the values / observations belonging to it can be assigned a code in the form of a number where the numbers are simply labels." Take, for example, the PIN code of a city: although the codes are built from numeric values and you can apply simple algebra to them, the result won't make any sense. Likewise, the attribute SEX can be male or female, so it is also a nominal attribute.
thanks
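To illustrate the difference, here is a small sketch in Python. It is not the NumericToNominal filter itself, just one way of treating numeric codes as labels; the function name and the sample PIN codes are invented for the example. Each distinct code becomes its own category, so a learner no longer sees 560001 as "bigger than" 110001.

```python
def numeric_to_nominal(values):
    """Treat numeric codes as labels: one category per distinct value."""
    categories = sorted(set(values))
    index = {value: i for i, value in enumerate(categories)}
    # One-hot encode: each row has a single 1 in the column of its category.
    one_hot = [[1 if index[value] == i else 0 for i in range(len(categories))]
               for value in values]
    return categories, one_hot


pin_codes = [110001, 400001, 110001, 560001]

# Arithmetic on the raw codes runs fine but describes no real place.
print(sum(pin_codes) / len(pin_codes))

categories, encoded = numeric_to_nominal(pin_codes)
print(categories)
print(encoded)
```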

Can I create an HDF5 link to a hyperslab?

Is it possible to create a link to just a hyperslab of a dataset in HDF5?
For example, I have one dataset of size 1000 x 3, representing (a,b,c) as a function of time, let's say. And now I want a link that points just to the 'a' data (1000 x 1). Is this possible?
[Having googled this extensively, I learned the valuable lesson that "link" is essentially useless in a google query. And I can't tell from the HDF5 documentation, so I'm sorry if this is stupid.]
Having asked at the (very helpful) HDF5 helpdesk, I find that the answer is no. For anyone else looking for this functionality: redesign your code/data structure.
Unfortunately for me, the code is not mine, and the data structure is set by other, stubborn people.

What are some good methods to find the "relatedness" of two bodies of text?

Here's the problem -- I have a few thousand small text snippets, anywhere from a few words to a few sentences - the largest snippet is about 2k on disk. I want to be able to compare each to each, and calculate a relatedness factor so that I can show users related information.
What are some good ways to do this? Are there known algorithms for doing this that are any good, are there any GPL'd solutions, etc?
I don't need this to run in realtime, as I can precalculate everything. I'm more concerned with getting good results than runtime.
I just thought I would ask the Stack Overflow community before going and writing my own thing. There HAVE to be people out there who have found good solutions to this before.
These articles on semantic relatedness and semantic similarity may be helpful. And this SO question about Latent Semantic Analysis.
You could also look into Soundex for words that "sound alike" phonetically.
I've never used it, but you might want to look into Levenshtein distance.
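For reference, Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch in Python follows; the similarity helper that scales the distance to a 0..1 score is just one possible normalization, not part of the definition.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance to 0..1, where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
print(similarity("kitten", "sitting"))
```

It works on characters, so it is better suited to catching typos and near-duplicate snippets than to judging whether two texts are about the same topic.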
Jeff talked about something like this on the podcast (episode 32), when describing how the Related questions listed on the right side here are found.
One big tip was to remove all common words, like "the" "and" "this" etc. This will leave you with more meaningful words to compare.
And here is a similar question: "Is there an algorithm that tells the semantic similarity of two phrases"
This is quite doable for reasonably large texts, but harder for smaller ones.
I did it once like this, and it worked pretty well:
Filter all "general" words (like a, an, the, in, etc...) (filters about 10-30% of the words)
Count the frequencies of the remaining words and store the top x most frequent words; these are your topics.
As an extra step you can create groups of 2/3/4 subsequent words and compare them with the groups in other texts. I used it as a measure for plagiarism.
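The steps above could be sketched roughly like this in Python. The stop-word list is a tiny made-up sample, the word groups are built after the stop words are removed, and Jaccard overlap (shared items divided by all items) is my own choice for comparing two sets; none of that is prescribed by the steps themselves.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "in", "on", "of", "and", "or", "to", "is",
              "this", "that", "it", "for"}


def tokens(text):
    """Lower-case words with the general words filtered out."""
    return [word for word in re.findall(r"[a-z0-9']+", text.lower())
            if word not in STOP_WORDS]


def topics(text, top_x=5):
    """The top_x most frequent remaining words."""
    return {word for word, _ in Counter(tokens(text)).most_common(top_x)}


def word_groups(text, size=2):
    """Groups of `size` subsequent words."""
    words = tokens(text)
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(left, right):
    union = left | right
    return len(left & right) / len(union) if union else 0.0


def relatedness(text_a, text_b, top_x=5, size=2):
    """Return (topic overlap, word-group overlap), each between 0 and 1."""
    return (jaccard(topics(text_a, top_x), topics(text_b, top_x)),
            jaccard(word_groups(text_a, size), word_groups(text_b, size)))


print(relatedness("The cat sat on the mat", "A cat sat on a mat"))
print(relatedness("The cat sat on the mat", "Stock prices fell sharply"))
```

Since everything can be precalculated, the topic sets and word groups can be computed once per snippet and only the overlaps computed per pair.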
See Manning and Raghavan course notes about MinHashing and searching for similar items, and a C#(?) version. I believe the techniques come from Ullman and Motwani's research.
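As a rough idea of what MinHashing does: each text is reduced to a short signature, and the fraction of signature positions on which two texts agree estimates the Jaccard similarity of their word-group sets. The sketch below is Python, not the C# version mentioned above; the group size, the 64 hash functions and the use of salted BLAKE2 as the hash family are arbitrary choices for illustration.

```python
import hashlib


def shingle_set(text, size=3):
    """Set of `size`-word groups; a very short text yields a single group."""
    words = text.lower().split()
    return {" ".join(words[i:i + size])
            for i in range(max(1, len(words) - size + 1))}


def minhash_signature(items, num_hashes=64):
    """For each salted hash function, keep the smallest hash over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                                salt=salt).digest(),
                "big")
            for item in items))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


text_a = "the quick brown fox jumps over the lazy dog"
text_b = "the quick brown fox leaps over the lazy dog"
sig_a = minhash_signature(shingle_set(text_a))
sig_b = minhash_signature(shingle_set(text_b))
print(estimated_similarity(sig_a, sig_b))
```

The signatures have a fixed length regardless of text size, which is what makes comparing every snippet against every other one cheap.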
This book may be relevant.
Edit: here is a related SO question
Phonetic algorithms
The article, Beyond SoundEx - Functions for Fuzzy Searching in MS SQL Server, shows how to install and use the SimMetrics library in SQL Server. This library lets you find relative similarity between strings and includes numerous algorithms.
I ended up mostly using Jaro-Winkler to match on names. Here's more information where I asked about matching names on SO: Matching records based on Person Name
A few algorithms based on Levenshtein distance are also available in the SimMetrics library and would probably be useful in your application.

Resources