How do we define 'true p2p' - network-programming

I have seen a number of questions in which the term 'true p2p' is used. What exactly is meant by this term, if it has an exact meaning? I am familiar with the term p2p, and I can think of several possible meanings for 'true p2p'. What does it mean to you?
What are some examples of:
a) 'true' p2p?
b) 'untrue p2p'

I think that by "true P2P" you mean "pure P2P".
Pure P2P means totally decentralization. I.e. true P2P doesnt have a central coordinator server.

I think I would differ slightly from Tom - I think a piece of software can be regarded as peer-to-peer if peer communication is a substantial part of the way it works. So Skype and Napster are/were P2P even though they require co-ordinating servers.
I've not come across the term 'true P2P' - it may be referring to the fact that some people use it as a marketing buzzword without knowing what it means.

Related

How to prove that admissible/consistent heuristics in A* searching method would lead to optimal solution?

we have proved in class that if A* in Tree-Search is optimal, then h(n) is admissible(Admissible Heuristics). If using A* in Graph-Search finds the optimal solution, then h(n) is consistent. We proved the properties of admissible and consistent if we assume that A* can find the optimal solution. This indicates that consistent /admissible are necessary conditions for optimality in Graph/Tree Searching.
However, I am not too sure how to prove that they are also both sufficient conditions as well. I tried to figure it out, but I still could not find a good way to prove it. For example, I am not too sure how to prove that being admissible can lead to optimality in Tree-Searching using A*? And similarly, how to prove that being consistent can lead to optimality in Graph-Searching using A*? Thank you in advance!
This is my first time asking on StackOverflow, sorry if I am not phrasing my question well. : )Thank you in advance!
This indicates that consistent /admissible are necessary conditions for optimality in Graph/Tree Searching.
No, it implies they're sufficient conditions. In fact, the converse is not true - it is possible to find cases where a given non-admissible heuristic returns the optimal result for a specific graph (simple counter-example: a tree with only one path will return the optimal path for any heuristic). Thus they are not necessary conditions.
As a side note, 'consistent' implies 'admissible', and trees are a type of graph, so it is enough to prove the "admissible + graph" case, and all four cases (admissible/consistent, tree/graph) are immediately implied.

z3 alternative for Gecode branch() function?

In constraint solver like Gecode , We can control the exploration of search space with help of branching function. for e.g. branch(home , x , INT_VAL_MIN ) This will start exploring the search space from the minimum possible value of variable x in its domain and try to find solution.(There are many such alternatives .)
For z3, do we have this kind of flexibility in-built ?? Any alternative possible??
SMT solvers usually do not allow for these sorts of "hints" to be given, they act more as black-boxes.
Having said that, each solver uses a ton of internal heuristics, and z3 itself has a number of settings that you can play with to give it hints. If you run:
z3 -pd
it will display all the options you can provide, and there are literally over 600 of them! Unfortunately, these options are not really well documented, and how they impact the solver is rather cryptic. The only reliable way to find out would be to study the source code and see what they do, which isn't for the faint of heart. But in any case, it will not be as obvious as the branch feature you cite for gecode.
There are, however, other tricks one can use to speed up solving for SMT-solvers, unfortunately, these things are usually very problem-specific. If you post specific instances, you might get better suggestions.

Profanity filter import

I am looking to write a basic profanity filter in a Rails based application. This will use a simply search and replace mechanism whenever the appropriate attribute gets submitted by a user. My question is, for those who have written these before, is there a CSV file or some database out there where a list of profanity words can be imported into my database? We are submitting the words that we will replace the profanities with on our own. We more or less need a database of profanities, racial slurs and anything that's not exactly rated PG-13 to get triggered.
As the Tin Man suggested, this problem is difficult, but it isn't impossible. I've built a commercial profanity filter named CleanSpeak that handles everything mentioned above (leet speak, phonetics, language rules, whitelisting, etc). CleanSpeak is capable of filtering 20,000 messages per second on a low end server, so it is possible to build something that works well and performs well. I will mention that CleanSpeak is the result of about 3 years of on-going development though.
There are a few things I tell everyone that is looking to try and tackle a language filter.
Don't use regular expressions unless you have a small list and don't mind a lot of things getting through. Regular expressions are relatively slow overall and hard to manage.
Determine if you want to handle conjugations, inflections and other language rules. These often add a considerable amount of time to the project.
Decide what type of performance you need and whether or not you can make multiple passes on the String. The more passes you make the slow your filter will be.
Understand the scunthrope and clbuttic problems and determine how you will handle these. This usually requires some form of language intelligence and whitelisting.
Realize that whitespace has a different meaning now. You can't use it as a word delimiter any more (b e c a u s e of this)
Be careful with your handling of punctuation because it can be used to get around the filter (l.i.k.e th---is)
Understand how people use ascii art and unicode to replace characters (/ = v - those are slashes). There are a lot of unicode characters that look like English characters and you will want to handle those appropriately.
Understand that people make up new profanity all the time by smashing words together (likethis) and figure out if you want to handle that.
You can search around StackOverflow for my comments on other threads as I might have more information on those threads that I've forgotten here.
Here's one you could use: Offensive/Profane Word List from CMU site
Based on personal experience, you do understand that it's an exercise in futility?
If someone wants to inject profanity, there's a slew of words that are innocent in one context, and profane in another so you'll have to write a context parser to avoid black-listing clean words. A quick glance at CMU's list shows words I'd never consider rude/crude/socially unacceptable. You'll see there are many words that could be proper names or nouns, countries, terms of endearment, etc. And, there are myriads of ways to throw your algorithm off using L33T speak and such. Search Wikipedia and the internets and you can build tables of variations of letters.
Look at CMU's list and imagine how long the list would be if, in addition to the correct letter, every a could also be 4, o could be 0 or p, e could be 3, s could be 5. And, that's a very, very, short example.
I was asked to do a similar task and wrote code to generate L33T variations of the words, and generated a hit-list of words based on several profanity/offensive lists available on the internet. After running the generator, and being a little over 1/4 of the way through the file, I had over one million entries in my DB. I pulled the plug on the project at that point, because the time spent searching, even using Perl's Regex::Assemble, was going to be ridiculous, especially since it'd still be so easy to fool.
I recommend you have a long talk with whoever requested that, and ask if they understand the programming issues involved, and low-likelihood of accuracy and success, especially over the long-term, or the possible customer backlash when they realize you're censoring them.
I have one that I've added to (obfuscated a bit) but here it is: https://github.com/rdp/sensible-cinema/blob/master/lib/subtitle_profanity_finder.rb

do record_info and tuple_to_list return the same key order in Erlang?

I.e, if I have a record
-record(one, {frag, left}).
Is record_info(fields, one) going to always return [frag,
left]?
Is tl(tuple_to_list(#one{frag = "Frag", left = "Left"}))
always gonna be ["Frag", "Left"]?
Is this an implementation detail?
Thanks a lot!
The short answer is: yes, as of this writing it will work. The better answer is: it may not work that way in the future, and the nature of the question concerns me.
It's safe to use record_info/2, although relying on the order may be risky and frankly I can't think of a situation where doing so makes sense which implies that you are solving a problem the wrong way. Can you share more details about what exactly you are trying to accomplish so we can help you choose a better method? It could be that simple pattern matching is all you need.
As for the example with tuple_to_list/1, I'll quote from "Erlang Programming" by Cesarini and Thompson:
"... whatever you do, never, ever use the tuple representations of records in your programs. If you do, the authors of this book will disown you and deny any involvement in helping you learn Erlang."
There are several good reasons why, including:
Your code will become brittle - if you later change the number of fields or their order, your code will break.
There is no guarantee that the internal representation of records will continue to work this way in future versions of erlang.
Yes, order is always the same because records represented by tuples for which order is an essential property. Look also on my other answer about records with examples: Syntax Error while accessing a field in a record
Yes, in both cases Erlang will retain the 'original' order. And yes it's implementation as it's not specifically addressed in the function spec or documentation, though it's a pretty safe bet it will stay like that.

Probabilistic Generation of Semantic Networks

I've studied some simple semantic network implementations and basic techniques for parsing natural language. However, I haven't seen many projects that try and bridge the gap between the two.
For example, consider the dialog:
"the man has a hat"
"he has a coat"
"what does he have?" => "a hat and coat"
A simple semantic network, based on the grammar tree parsing of the above sentences, might look like:
the_man = Entity('the man')
has = Entity('has')
a_hat = Entity('a hat')
a_coat = Entity('a coat')
Relation(the_man, has, a_hat)
Relation(the_man, has, a_coat)
print the_man.relations(has) => ['a hat', 'a coat']
However, this implementation assumes the prior knowledge that the text segments "the man" and "he" refer to the same network entity.
How would you design a system that "learns" these relationships between segments of a semantic network? I'm used to thinking about ML/NL problems based on creating a simple training set of attribute/value pairs, and feeding it to a classification or regression algorithm, but I'm having trouble formulating this problem that way.
Ultimately, it seems I would need to overlay probabilities on top of the semantic network, but that would drastically complicate an implementation. Is there any prior art along these lines? I've looked at a few libaries, like NLTK and OpenNLP, and while they have decent tools to handle symbolic logic and parse natural language, neither seems to have any kind of proabablilstic framework for converting one to the other.
There is quite a lot of history behind this kind of task. Your best start is probably by looking at Question Answering.
The general advice I always give is that if you have some highly restricted domain where you know about all the things that might be mentioned and all the ways they interact then you can probably be quite successful. If this is more of an 'open-world' problem then it will be extremely difficult to come up with something that works acceptably.
The task of extracting relationship from natural language is called 'relationship extraction' (funnily enough) and sometimes fact extraction. This is a pretty large field of research, this guy did a PhD thesis on it, as have many others. There are a large number of challenges here, as you've noticed, like entity detection, anaphora resolution, etc. This means that there will probably be a lot of 'noise' in the entities and relationships you extract.
As for representing facts that have been extracted in a knowledge base, most people tend not to use a probabilistic framework. At the simplest level, entities and relationships are stored as triples in a flat table. Another approach is to use an ontology to add structure and allow reasoning over the facts. This makes the knowledge base vastly more useful, but adds a lot of scalability issues. As for adding probabilities, I know of the Prowl project that is aimed at creating a probabilistic ontology, but it doesn't look very mature to me.
There is some research into probabilistic relational modelling, mostly into Markov Logic Networks at the University of Washington and Probabilstic Relational Models at Stanford and other places. I'm a little out of touch with the field, but this is is a difficult problem and it's all early-stage research as far as I know. There are a lot of issues, mostly around efficient and scalable inference.
All in all, it's a good idea and a very sensible thing to want to do. However, it's also very difficult to achieve. If you want to look at a slick example of the state of the art, (i.e. what is possible with a bunch of people and money) maybe check out PowerSet.
Interesting question, I've been doing some work on a strongly-typed NLP engine in C#: http://blog.abodit.com/2010/02/a-strongly-typed-natural-language-engine-c-nlp/ and have recently begun to connect it to an ontology store.
To me it looks like the issue here is really: How do you parse the natural language input to figure out that 'He' is the same thing as "the man"? By the time it's in the Semantic Network it's too late: you've lost the fact that statement 2 followed statement 1 and the ambiguity in statement 2 can be resolved using statement 1. Adding a third relation after the fact to say that "He" and "the man" are the same is another option but you still need to understand the sequence of those assertions.
Most NLP parsers seem to focus on parsing single sentences or large blocks of text but less frequently on handling conversations. In my own NLP engine there's a conversation history which allows one sentence to be understood in the context of all the sentences that came before it (and also the parsed, strongly-typed objects that they referred to). So the way I would handle this is to realize that "He" is ambiguous in the current sentence and then look back to try to figure out who the last male person was that was mentioned.
In the case of my home for example, it might tell you that you missed a call from a number that's not in its database. You can type "It was John Smith" and it can figure out that "It" means the call that was just mentioned to you. But if you typed "Tag it as Party Music" right after the call it would still resolve to the song that's currently playing because the house is looking back for something that is ITaggable.
I'm not exactly sure if this is what you want, but take a look at natural language generation wikipedia, the "reverse" of parsing, constructing derivations that conform to the given semantical constraints.

Resources