I want to write an article about frameworks like ROS and industrial control operating systems. One of my chapters must contain some of the mathematics used for calculating trajectories. Is there a way to find the math formulas that MoveIt uses when calculating movements?
MoveIt is a big ROS package. By default, it uses the OMPL library for motion planning. Within MoveIt, many different planners from OMPL are accessible (RRT, RRT*, PRM, etc.). So if you want the mathematical formulas, you'll first need to find out which planner you've been using with MoveIt, and then look for detailed information about that planner on the web.
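For context, sampling-based planners such as RRT do not use a closed-form trajectory formula; they grow a tree by repeatedly sampling a random configuration, finding the nearest tree node, and "steering" a small step toward the sample. The sketch below is not MoveIt or OMPL code, just a minimal 2-D illustration (with assumed step size, bounds and a user-supplied collision check) of the math behind the basic RRT extend step:

```python
import math
import random

def rrt(start, goal, is_free, bounds=((0.0, 10.0), (0.0, 10.0)),
        step=0.5, goal_tol=0.5, max_iters=5000, goal_bias=0.05):
    """Minimal 2-D RRT: returns a list of waypoints from start to goal, or None."""
    nodes = [start]          # tree vertices
    parent = {0: None}       # index of each vertex's parent

    for _ in range(max_iters):
        # 1. Sample a random configuration (occasionally biased toward the goal).
        q_rand = goal if random.random() < goal_bias else (
            random.uniform(*bounds[0]), random.uniform(*bounds[1]))

        # 2. Find the nearest vertex already in the tree.
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        q_near = nodes[i_near]
        d = math.dist(q_near, q_rand)
        if d == 0.0:
            continue

        # 3. Steer: move at most `step` from q_near toward q_rand.
        q_new = (q_near[0] + step * (q_rand[0] - q_near[0]) / d,
                 q_near[1] + step * (q_rand[1] - q_near[1]) / d)
        if not is_free(q_new):
            continue  # reject configurations in collision

        parent[len(nodes)] = i_near
        nodes.append(q_new)

        # 4. Stop when the new vertex is close enough to the goal.
        if math.dist(q_new, goal) < goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

# Toy usage: free space everywhere except a circular obstacle at (5, 5).
path = rrt((1.0, 1.0), (9.0, 9.0),
           is_free=lambda q: math.dist(q, (5.0, 5.0)) > 1.5)
print(path)
```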
I'm trying to find the best way to compare two text documents using AI and machine learning methods. I've used TF-IDF cosine similarity and other similarity measures, but these compare the documents at the word (or n-gram) level.
I'm looking for a method that allows me to compare the meaning of the documents. What is the best way to do that?
You should start by reading about the word2vec model.
Use gensim and get Google's pretrained word2vec model.
For vectorizing a whole document, use gensim's Doc2Vec model.
After getting vectors for all your documents, use a distance metric such as cosine distance or Euclidean distance for comparison.
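A minimal sketch of that pipeline with gensim (the document list, vector size and epoch count are made-up placeholders; gensim does not ship a pretrained Doc2Vec model, so this trains a small one on your own documents):

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

documents = [
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
    "stock markets fell sharply on monday",
]

# Train a small Doc2Vec model on the corpus.
tagged = [TaggedDocument(words=doc.lower().split(), tags=[i])
          for i, doc in enumerate(documents)]
model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Infer vectors for two new texts and compare them.
v1 = model.infer_vector("the cat sat on a mat".split())
v2 = model.infer_vector("markets dropped on monday".split())
print(cosine(v1, v2))
```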
This is very difficult. There is actually no computational definition of "meaning". You should dive into text mining, summarization and libraries like gensim, spaCy or Pattern.
In my opinion, the libraries with the highest return on investment (ROI), i.e. the most readily usable ones if you are a newbie, are the tools built around chatbots: they aim to extract structured data from natural language, which is the closest thing to "meaning". One example of a free software tool for this is Rasa natural language understanding.
The drawback of such tools is that they work reasonably well, but only in the domain they were trained and prepared for. In particular, they do not aim at comparing documents the way you want.
I'm trying to find the best way to compare two text documents using AI
You must come up with a more precise task and from there find out which technique applies best to your use case. Do you want to classify documents into predefined categories? Do you want to compute some similarity between two documents? Given an input document, do you want to find the most similar documents in a database? Do you want to extract important topics or keywords from the document? Do you want to summarize the document? Is it an abstractive summary or key-phrase extraction?
In particular, there is no software that extracts some kind of semantic fingerprint from any document. Depending on the end goal, the way to achieve it might be completely different.
You must narrow down the precise goal you are trying to achieve; from there, you will be able to ask another question (or improve this one) that describes your goal precisely.
Text understanding is AI-complete, so just telling the computer "tell me something about these two documents" doesn't work.
As others have said, word2vec and other word embeddings are tools used to achieve many goals in NLP, but they are only a means to an end. You must define the input and output of the system you are trying to design before you can start working on the implementation.
There are two other Stack Exchange communities that you might want to dig into:
Linguistics
Data Science
Given the TF-IDF value for each token in your corpus (or the most meaningful ones), you can compute a sparse representation for a document.
This is implemented in scikit-learn's TfidfVectorizer.
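For example, a sparse TF-IDF representation plus cosine similarity can be computed in a few lines (the example documents are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
    "stock markets fell sharply on monday",
]

# Sparse TF-IDF matrix: one row per document.
X = TfidfVectorizer().fit_transform(docs)

# Pairwise cosine similarities between documents.
print(cosine_similarity(X))
```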
As other users have pointed out, this is not the best solution to your task.
You should take embeddings into account.
The easiest solution consists in using an embedding at the word level, such as the one provided by the FastText framework.
Then you can create an embedding for the whole document by summing the embeddings of the individual words that compose it.
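A rough sketch of that idea, assuming the pretrained "fasttext-wiki-news-subwords-300" vectors available through gensim's downloader (averaging instead of summing changes only the vector's scale, not its direction, so cosine similarity is unaffected):

```python
import numpy as np
import gensim.downloader as api

# Pretrained FastText word vectors (large download on first use).
wv = api.load("fasttext-wiki-news-subwords-300")

def doc_vector(text):
    """Average the word vectors of the tokens found in the vocabulary."""
    vecs = [wv[w] for w in text.lower().split() if w in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v1 = doc_vector("the cat sat on the mat")
v2 = doc_vector("a dog chased the cat")
print(cosine(v1, v2))
```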
An alternative consists in training an embedding directly at the document level, using some Doc2Vec framework such as the one in gensim or DL4J.
You can also use LDA or LSI models for a text corpus. These methods (and other methods like word2vec and doc2vec) can summarize documents into fixed-length vectors that reflect their meaning and the topics the documents belong to.
Read more:
https://radimrehurek.com/gensim/models/ldamodel.html
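A minimal gensim LDA sketch (the tiny corpus and topic count are placeholders; real corpora need far more documents): it turns each document into a fixed-length topic-probability vector that can then be compared with any distance metric.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

docs = [
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
    "stock markets fell sharply on monday",
    "investors sold shares as markets dropped",
]
texts = [d.lower().split() for d in docs]

# Bag-of-words corpus for gensim.
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=20)

# Fixed-length topic distribution for the first document.
topic_vector = lda.get_document_topics(corpus[0], minimum_probability=0.0)
print(topic_vector)   # e.g. [(0, 0.91), (1, 0.09)]
```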
I heard there are three approaches from Dr. Golden:
- Cosine Angular Separation
- Hamming Distance
- Latent Semantic Analysis (LSA) or Latent Semantic Indexing (LSI)
These methods are based on semantic similarity.
I have also heard of companies using a tool called spaCy to summarize documents in order to compare them to each other.
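For the LSA/LSI approach specifically, one common recipe is to apply a truncated SVD to the TF-IDF matrix and compare documents in the reduced space; a minimal scikit-learn sketch (toy documents and component count are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a dog chased the cat across the yard",
    "stock markets fell sharply on monday",
    "investors sold shares as markets dropped",
]

# TF-IDF followed by truncated SVD = classic LSA.
X = TfidfVectorizer().fit_transform(docs)
X_lsa = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)

# Documents are now dense low-dimensional vectors; compare with cosine similarity.
print(cosine_similarity(X_lsa))
```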
I want to calculate the coefficients of a Kaiser FIR filter. I have seen some implementations, but they limit the filter order to less than 20; I don't want that limit, since I may use an order of 19 or 89 for experimentation.
I need some tool for this. The link below shows the GUI of a tool (a Java applet) that calculates filter coefficients, but the applet is not available any more. Has anyone got this or a similar tool to share?
The tool was nice and needed only minimal inputs, but I am unable to find it for download or for online calculation.
Thanks
If you have access to MATLAB, you can use the fdatool command, which opens a new window and lets you design filters according to your specifications.
Once MATLAB creates the filter, there is a lot you can do with it, including extracting the coefficients and viewing them.
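If MATLAB isn't available, the same Kaiser-window design is exposed in SciPy without any order limit; a minimal sketch (the sample rate, ripple, transition width and cutoff below are made-up example values):

```python
from scipy.signal import kaiserord, firwin

fs = 1000.0          # sample rate, Hz
cutoff = 150.0       # desired cutoff, Hz
ripple_db = 60.0     # stopband attenuation / ripple spec, dB
width_hz = 50.0      # transition width, Hz

# Estimate the required number of taps and the Kaiser beta parameter.
numtaps, beta = kaiserord(ripple_db, width_hz / (0.5 * fs))

# Or fix the order yourself, e.g. order 89 -> 90 taps:
# numtaps = 90

# FIR coefficients of the Kaiser-windowed lowpass filter.
taps = firwin(numtaps, cutoff, window=("kaiser", beta), fs=fs)
print(len(taps), taps[:5])
```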
I've heard of a max-flow min-cut method for sharding or segmenting a graph database. Does someone have a sample Cypher query that can do that, say against the MovieLens dataset? Basically, I want to segment users into different shards/clusters based on what they like, so maybe the min cuts can naturally find clusters of users around genres such as Horror or Drama, or maybe they will create non-intuitive clusters/segments like hipster/romantics and conservative/comedy/horror groups.
My short answer is no; sorry, I don't know how you would express that.
My longer answer is that even if it were possible (which it very well may be), I would advise against it.
Multiple algorithms "do" min-cut max-flow, and they all have different performance characteristics; because clustering is computationally expensive, I'd guess you want control over the specific algorithm implementation used.
Cypher is a declarative language: you specify what you're looking for, but not how to compute it, and it will be difficult to specify such a complex problem in a way that the Cypher engine can figure out what you're trying to do. That will make it hard for Cypher (or any declarative language engine) to produce an efficient query plan.
My suggestion is to find the specific algorithm you wish to use and implement it using the Neo4j Java API.
If you're running Neo4j in embedded mode, you're done at that point. If you're running Neo4j server, you'll then just have to run that code as an Unmanaged Server Extension.
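Outside of Neo4j, the underlying idea is easy to prototype, e.g. with NetworkX in Python (an illustration only, not something Cypher provides; the user/genre edges and capacities below are made up). A single s-t minimum cut splits a toy user/genre graph into two groups:

```python
import networkx as nx

# Toy "users like genres" graph; undirected likes are modelled as
# directed edges in both directions with the same capacity (weight).
likes = [
    ("alice", "Horror", 3), ("alice", "Drama", 1),
    ("bob",   "Comedy", 3), ("bob",   "Horror", 1),
    ("carol", "Drama",  2), ("carol", "Romance", 2),
    ("dave",  "Comedy", 2), ("dave",  "Horror", 2),
]

G = nx.DiGraph()
for user, genre, weight in likes:
    G.add_edge(user, genre, capacity=weight)
    G.add_edge(genre, user, capacity=weight)

# Minimum cut separating two "seed" users; the two resulting node sets
# are a crude 2-way segmentation of users and genres.
cut_value, (side_a, side_b) = nx.minimum_cut(G, "alice", "bob")
print(cut_value)
print(sorted(side_a))
print(sorted(side_b))
```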
AFAIK you're after 'Community Detection' algorithms. There are non-overlapping (communities do not overlap) and overlapping variants, where non-overlapping is generally easier to implement and understand. Common algorithms are:
Non-overlapping: Louvain
Overlapping: extensions of the Label Propagation Algorithm (LPA); plain LPA is typically non-overlapping, but there are variants that make it overlapping
Here are a few C++ code examples for these algorithms: Louvain, OSLOM (overlapping), LPA (non-overlapping), and Infomap.
And if you want something bleeding edge, I was recommended the SCD algorithm:
Academic paper: "High Quality, Scalable and Parallel Community Detection for Large Real Graphs"
C++ implementation
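As a rough illustration of what community detection gives you on MovieLens-style data (the toy edges below are made up; louvain_communities requires NetworkX 2.8+):

```python
import networkx as nx
from networkx.algorithms.community import (
    louvain_communities,
    label_propagation_communities,
)

# Toy user-genre "likes" graph standing in for MovieLens data.
G = nx.Graph()
G.add_edges_from([
    ("alice", "Horror"), ("alice", "Thriller"),
    ("dave", "Horror"), ("dave", "Thriller"),
    ("bob", "Comedy"), ("bob", "Romance"),
    ("carol", "Comedy"), ("carol", "Romance"),
])

# Non-overlapping communities via Louvain modularity optimisation.
print(louvain_communities(G, seed=42))

# Non-overlapping communities via label propagation.
print(list(label_propagation_communities(G)))
```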
I have a bunch of text documents that describe diseases. Those documents are in most cases quite short and often only contain a single sentence. An example is given here:
Primary pulmonary hypertension is a progressive disease in which widespread occlusion of the smallest pulmonary arteries leads to increased pulmonary vascular resistance, and subsequently right ventricular failure.
What I need is a tool that finds all disease terms (e.g. "pulmonary hypertension" in this case) in the sentences and maps them to a controlled vocabulary like MeSH.
Thanks in advance for your answers!
Here are two pipelines that are specifically designed for medical document parsing:
Apache cTAKES
NLM's MetaMap
Both use UMLS, the Unified Medical Language System, and thus require that you have a (free) license. Both are Java-based and more or less easy to set up.
See http://www.ebi.ac.uk/webservices/whatizit/info.jsf
Whatizit is a text-processing system that lets you run text-mining tasks on text. The tasks are defined by the pipelines in the drop-down list on that page, and the text can be pasted into the text area.
You could also ask biostars: http://www.biostars.org/show/questions/
There are many tools to do that. Some popular ones:
NLTK (python)
LingPipe (java)
Stanford NER (java)
OpenCalais (web service)
Illinois NER (java)
Most of them come with predefined models, i.e. they have already been trained on general datasets (news articles, etc.). However, your texts are pretty specific, so you might want to first build a corpus and re-train one of those tools in order to adapt it to your data.
More simply, as a first test, you can try a dictionary-based approach: build a list of entity names and perform exact or approximate matching. This operation is described, for instance, in LingPipe's tutorial.
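A tiny sketch of that dictionary-based approach in Python, using the example sentence from the question (the lexicon and the MeSH identifiers are placeholders, not real mappings):

```python
import re

sentence = ("Primary pulmonary hypertension is a progressive disease in which "
            "widespread occlusion of the smallest pulmonary arteries leads to "
            "increased pulmonary vascular resistance, and subsequently right "
            "ventricular failure.")

# Placeholder lexicon: surface form -> controlled-vocabulary ID (e.g. MeSH).
lexicon = {
    "primary pulmonary hypertension": "MESH:placeholder-1",
    "pulmonary hypertension": "MESH:placeholder-2",
    "right ventricular failure": "MESH:placeholder-3",
}

# Longest terms first so "primary pulmonary hypertension" wins over its substring.
pattern = re.compile(
    "|".join(re.escape(t) for t in sorted(lexicon, key=len, reverse=True)),
    re.IGNORECASE,
)

for match in pattern.finditer(sentence):
    term = match.group(0).lower()
    print(match.start(), match.end(), term, "->", lexicon[term])
```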
Open Targets has a module for this as part of LINK. It's not meant to be used directly, so it might require some hacking and tinkering, but it's the most complete medical NER (named entity recognition) tool I've found for Python. For more info, read their blog post.
A bash script that includes, as an example, a lexicon generated from the Disease Ontology:
https://github.com/lasigeBioTM/MER
I have to start work on an application that analyses satellite imagery to identify man-made structures. I would like to use C or Java for this.
For the satellite imagery, I am planning to use Google Maps data.
I have three questions here:
What is the best source for GIS data besides Google Maps/Earth?
What is the best language for writing such an application, considering I will have to use third-party APIs?
Is there an open image-processing engine available that identifies man-made structures?
That's a lot of questions, but I hope the smarter folks here can help me.
Overly processed imagery such as Google or Bing maps is a horrible source of imagery for performing feature extraction or feature recognition. Usually, you want the most unprocessed, raw form possible with camera models... of course, if you don't have access to this sort of data, then you have to work with what you have.
A more important consideration with Google Maps/Earth imagery is that you may run afoul of their License Agreement. I suggest you check it before you decide on their data as your imagery source. In particular, if you bypass their APIs, you've violated the license agreement.
As far as libraries and languages go, there are dozens of machine vision libraries available. I can't recommend one over another, as I've only been a downstream consumer of their results. My understanding of the problem is that the biggest concern is how you build the "models" to compare against, i.e. how you give the system an "example" of what you're looking for.
Once you've found a library, you can decide on the language. Generally, a high-level language like Python or MATLAB is used for this kind of prototyping. Once a method has been found, conversion to a "higher performance" language is done, if necessary.
Personally, I'd probably use Python because (1) it's freely available, (2) has a significant community in the scientific and research worlds, and (3) can interop with a wide variety of languages and platforms.
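As a very crude starting point (not a full recognizer), man-made structures tend to produce long straight edges, which you can pull out of a grayscale tile with OpenCV's Canny edge detector and probabilistic Hough transform; the file name and thresholds below are placeholders to tune:

```python
import cv2
import numpy as np

# Load one imagery tile as grayscale (placeholder path).
img = cv2.imread("tile.png", cv2.IMREAD_GRAYSCALE)

# Edge map, then straight line segments -- a rough proxy for buildings/roads.
edges = cv2.Canny(img, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, 80,
                        minLineLength=30, maxLineGap=5)

print(0 if lines is None else len(lines), "line segments found")

# Draw the detected segments for visual inspection.
overlay = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(overlay, (x1, y1), (x2, y2), (0, 0, 255), 1)
cv2.imwrite("tile_lines.png", overlay)
```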
Specifically, check out Glovis: http://glovis.usgs.gov/
You can browse the earth, and download maps from several different satellites and sensors. Even though you have to go through a bogus "ordering" process, the imagery is free.
You may find the USGS (United States Geological Survey) website helpful. They provide both GIS information and a wide range of data sets.
I agree with James Schek. Google gives you RGB images, which are not the most helpful for your task. Most imagery will have a couple of additional channels that may be better suited to you. Different channels show different features: water, urban areas, types of foliage, etc. For example, an infra-red channel could be used to pick out buildings in a cool climate. If you contact several data providers, they may be able to recommend the best channels to use in their data.
Aerial imagery can be huge, numerous terabytes for a detailed world database. Carefully consider how much information you need to process. If you are only processing a few square miles, performance is not an issue. If you are processing thousands of square miles, performance becomes an issue. If you are processing millions, performance is mission-critical and must be considered from day one.
Knowing the number of channels you need to process, your performance requirements and the file format of your data, look around for libraries that fulfil all your requirements. Many of them are written in C/C++, so using a language that interops with both could be helpful.
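Many of those C/C++ libraries (GDAL is a common one for raster formats) ship Python bindings, which makes the interop question easier; a minimal sketch of reading individual channels with GDAL (the file name and band numbers are placeholders, since band order depends on your provider):

```python
from osgeo import gdal
import numpy as np

ds = gdal.Open("scene.tif")  # placeholder multi-band raster
print("bands:", ds.RasterCount, "size:", ds.RasterXSize, "x", ds.RasterYSize)

# Read two channels as NumPy arrays (band numbering is 1-based).
red = ds.GetRasterBand(3).ReadAsArray().astype(np.float32)
nir = ds.GetRasterBand(4).ReadAsArray().astype(np.float32)
print(red.shape, nir.shape)
```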
Take a look at this demo: Finding Vegetation in a Multispectral Image, part of the Image Processing Toolbox in MATLAB. It is related to your problem of analysing satellite images to find specific patterns.
I believe it's an excellent example of the sort of things you can achieve easily with MATLAB using very little code.
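The Python equivalent of that kind of analysis is short as well; for example, the standard NDVI index (NIR - Red) / (NIR + Red) highlights vegetation, and thresholding it (the 0.3 threshold is an assumption to tune) roughly separates vegetated areas from non-vegetated, often man-made, ones:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index for two same-sized arrays."""
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    return (nir - red) / (nir + red + eps)

# Toy 2x2 example; in practice use the red/NIR bands read from your imagery.
red = np.array([[80.0, 60.0], [200.0, 30.0]])
nir = np.array([[200.0, 180.0], [210.0, 40.0]])

index = ndvi(nir, red)
vegetation_mask = index > 0.3   # assumed threshold; tune per sensor/scene
print(index)
print(vegetation_mask)
```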