Resources on realistic procedural leaf generation with variegation - procedural-generation

I know there exists a lot of material on the procedural generation of plants and the like, especially using L-systems etc.
But after some quick research, I haven't been able to find any good / in-depth material on the structure of leaves. Specifically, I've found a few articles on the shapes of leaves, but nothing on variegation.
Are there any good / in-depth resources on the realistic generation of leaf variegation?
I'm looking to be able to generate leaves such as in this image:
(I know this question might be pushing the boundaries of a good Stack Overflow question, but it is specific in the sense that a specific type of resource is asked for (variegation in leaves).)

For something emergent or constructive I would recommend cellular automata of many kinds. After a given number of steps + some rules (constraints ...), interesting patterns that resemble natural ones can emerge.
https://pdfs.semanticscholar.org/c82a/8dd17ea8d6a0c35a82f573da51869cfb4bc4.pdf
Also see shells:
https://tickblog.files.wordpress.com/2008/12/shell-automata.png?w=485&h=186
For leaves I think you could have a central symmetry + some rules concerning water propagation from the central stem with branching rules ... so maybe something between a cellular automaton and graphs/L-systems?
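As a starting point, here is a toy 2D cellular automaton in Python/NumPy. The birth/survival rule used (the classic Life-like B3/S23) is chosen purely for illustration, not a published variegation model; grid size, density, and step count are arbitrary:

```python
import numpy as np

def step(grid):
    # Count the 8 neighbours of every cell (periodic boundary via np.roll).
    n = sum(np.roll(np.roll(grid, dy, axis=0), dx, axis=1)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    born    = (grid == 0) & (n == 3)            # dead cell becomes pigmented
    survive = (grid == 1) & ((n == 2) | (n == 3))  # pigmented cell stays
    return (born | survive).astype(int)

rng = np.random.default_rng(0)
grid = (rng.random((128, 128)) < 0.15).astype(int)  # sparse random seed
for _ in range(50):
    grid = step(grid)
# `grid` now holds an emergent binary pattern you could map onto leaf pigment.
```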

One way is to generate a mask & combine it with some noise. Here's a simple example that uses a distance from center mask:
(distance-from-center mask) + (noise) = (combined leaf texture)
Experimenting with these techniques, I was able to generate this leaf half:
This general technique is an easy & popular way to make procedurally generated islands, & searching for that term will bring up a number of tutorials. It may take a bit of tweaking to get the specific results you're after. For instance, my proof of concept doesn't account for interior leaf veins & my colors don't match the target. I made my mask by repeatedly blurring a black & white image of the leaf silhouette - I suspect a more sophisticated mask-generation technique would give better results.
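A minimal Python/NumPy sketch of the mask + noise idea described above; all the weights, the blur sigma, and the thresholds are arbitrary knobs to tweak:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

h, w = 256, 256
ys, xs = np.mgrid[0:h, 0:w]
dist = np.hypot(ys - h / 2, xs - w / 2)
mask = np.clip(1.0 - dist / dist.max(), 0.0, 1.0)     # 1 at centre, 0 at edges

rng = np.random.default_rng(42)
noise = gaussian_filter(rng.random((h, w)), sigma=8)  # smooth "cloud" noise
noise = (noise - noise.min()) / (np.ptp(noise) + 1e-9)  # normalise to [0, 1]

value = 0.6 * mask + 0.4 * noise                      # blend mask with noise
# Threshold into pigment bands: 0 = pale, 1 = mid, 2 = dark green.
bands = np.digitize(value, [0.45, 0.60])
```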

Related

Grouping points that represent lines

I am looking for an algorithm that is able to solve this problem.
The problem:
I have the following set of points:
I want to group the points that represent a line (with some epsilon) into one group.
So, the optimal output will be something like:
Some notes:
A point belongs to one and only one line.
If a point could belong to two lines, it should belong to the stronger one.
A line is considered stronger than another when it has more points belonging to it.
The algorithm should not have to cover all points, because some of them may be outliers.
The space contains many outliers; they may account for up to 50% of all points.
Performance is critical; real-time is a must.
The solutions I have found so far:
1) Dealing with it as a clustering problem:
The main drawback of this method is that there is no direct distance metric between points. The distance metric is defined on the cluster itself (how linear it is). So I cannot use traditional clustering methods, and I would have to (as far as I can tell) use some kind of clustering via, for example, a genetic algorithm, where the evaluation occurs on the whole cluster rather than between two points. I also do not want to use something like a genetic algorithm, since I am aiming for a real-time solution.
2) Accumulate pairs and then do clustering:
Since it is hard to cluster the points directly, I thought of extracting pairs of points and then trying to cluster them with other pairs. So I have a distance between two pairs that can represent linearity (two pairs are really 4 points).
The drawback of this method is how to choose these pairs. If I rely on the Euclidean distance between them, it may not be accurate, because two points may be very near each other yet very far from forming a line with the others.
I appreciate any solution, suggestion, clue, or note. Please feel free to ask for clarification.
P.S. You may use any ready-made OpenCV function when thinking of a solution.
As Micka advised, I used sequential RANSAC to solve my problem. The results were fantastic and exactly what I wanted.
The idea is simple:
Apply RANSAC with a fit-line model to the points.
Delete all points that are inliers of the line RANSAC returns.
While 2 or more points remain, go to step 1.
I have implemented my own fit-line RANSAC, but unfortunately I cannot share the code because it belongs to the company I work for. However, there is an excellent fit-line RANSAC implementation here on SO by Srinath Sridhar. The post is: RANSAC-like implementation for arbitrary 2D sets.
It is easy to build a sequential RANSAC from the 3 simple steps I mentioned above.
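For illustration, a minimal Python/NumPy sketch of those three steps (not the proprietary implementation mentioned above; the iteration count, epsilon, and minimum inlier count are placeholder values to tune):

```python
import numpy as np

def fit_line_ransac(pts, n_iter=200, eps=1.0, seed=0):
    """Return a boolean inlier mask for the best line found, or None."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(n_iter):
        i, j = rng.choice(len(pts), size=2, replace=False)
        d = pts[j] - pts[i]
        norm = np.linalg.norm(d)
        if norm < 1e-12:
            continue                    # degenerate sample, skip
        d /= norm
        # Perpendicular distance of every point to the candidate line.
        r = pts - pts[i]
        dist = np.abs(r[:, 0] * d[1] - r[:, 1] * d[0])
        mask = dist < eps
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    return best_mask

def sequential_ransac(pts, min_inliers=10):
    pts = np.asarray(pts, dtype=float)
    lines = []
    while len(pts) >= 2:                # step 3: loop while points remain
        mask = fit_line_ransac(pts)     # step 1: fit one line
        if mask is None or mask.sum() < min_inliers:
            break                       # remaining points are outliers
        lines.append(pts[mask])
        pts = pts[~mask]                # step 2: delete the inliers
    return lines
```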
Here are some results:

Definition of integration point in Abaqus

I need to know the definition of "integration points" in Abaqus subroutines.
I'm new to the Abaqus software and am hoping for your help.
It is now 2.5 years after the OP asked this question, so my answer is probably more for anyone who has followed a link here hoping for some insight. On the grounds that FEM programming is special[0], I will try to answer this question rather than flag it as off-topic. Anyway, some of my answer is applicable to FEM in general, and some is specific to Abaqus.
Quick check:
If you're only asking for the specific numerical value to use for the (usual or standard) location of integration points, then the answer is that it depends. Luckily, standard values are widely available for a variety of elements (see resources below).
However, I assume you're asking about writing a User-Element (UEL) subroutine but are not yet familiar with how elements are formulated, or what an integration point is.
The answer: In standard displacement-based FEM, the constitutive response of an individual finite element is usually obtained by numerical integration (aka quadrature) at one or more points on or within the element. How many points there are and where they are located depends on the element type, certain performance tradeoffs, and the particular integration technique being used. Integration techniques that I have seen used for continuum (solid) finite elements include:
More common: Gauss integration -- the number & position of sampling points are determined by the Gauss quadrature rule used; the sampling points lie strictly inside the element domain (-1, 1), so the nodes are not included.
Less common: Newton-Cotes integration -- evenly spaced sampling points over the closed domain [-1, 1], so the nodes are included.
In my experience, the standard practice by far is to use Gauss quadrature or reduced-integration methods (which are often variations of Gauss quadrature). In Gauss quadrature, the integration points are taken at special ("optimal") locations within the element known as Gauss points, which have been shown to provide reliably accurate solutions for a given level of computational expense - at least for the typical polynomial functions used for many isoparametric finite elements. Other integration techniques have been found to be competitive in some cases[1], but Gauss quadrature is certainly the gold standard. There are other techniques that I'm not familiar with.
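As a standalone numerical check of that "optimal points" claim (plain Python, not Abaqus code): the 2-point Gauss rule on [-1, 1] uses points at +-1/sqrt(3) with unit weights and integrates any cubic polynomial exactly.

```python
import numpy as np

gauss_pts = np.array([-1.0, 1.0]) / np.sqrt(3.0)  # the two Gauss points
weights = np.array([1.0, 1.0])                    # their weights

f = lambda x: 4 * x**3 + 3 * x**2 + 2 * x + 1     # an arbitrary cubic
approx = float(np.sum(weights * f(gauss_pts)))
exact = 4.0   # odd terms integrate to 0; 3x^2 gives 2 and the constant gives 2
print(approx, exact)   # both 4.0 up to floating-point rounding
```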
Practical advice: Assuming an isoparametric formulation, in the UEL you use "element shape functions" and the primary field variables defined by the nodal degrees of freedom (with a solid mechanics focus, these are typically the displacements) to calculate the element strains, stresses, etc. at each integration point. If this doesn't make sense to you, see resources below.
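If that paragraph is opaque, here is a deliberately tiny 1D analogue in Python (UELs themselves are written in Fortran; the element length and nodal displacements below are made-up values): a 2-node linear bar element, with the displacement interpolated by shape functions and the strain evaluated at the single Gauss point xi = 0.

```python
import numpy as np

L = 2.0                      # element length (hypothetical)
u = np.array([0.0, 0.01])    # nodal displacements (hypothetical)

xi = 0.0                                      # 1-point Gauss location
N = np.array([(1 - xi) / 2, (1 + xi) / 2])    # linear shape functions at xi
dN_dx = np.array([-1.0, 1.0]) / L             # their spatial derivatives ("B")

u_gp = N @ u          # displacement interpolated at the Gauss point
strain_gp = dN_dx @ u # strain at the Gauss point: (u2 - u1) / L = 0.005
```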
Note that if you need the stresses at the nodes (or at any other point) you must extrapolate them from the integration points, again using the shape functions, or calculate/integrate directly at the nodes.
Suggested resources:
Please: If you're writing a user subroutine you should already know what an integration point is. I'm sorry, but that's just how it is. You have to know at least the basics before you attempt to write a UEL.
That said, I think it's great that you're interested in programming for FEA/FEM. If you're motivated but not at university where you can enroll in an FEM course or two, then there are a number of resources available, from Massive Open Online Courses (MOOCs), to a plethora of textbooks - I generally recommend anything written by Zienkiewicz. For a readable yet "solid" introduction with an emphasis on solid mechanics, I like Concepts and Applications of Finite Element Analysis, 4th Edition, by Cook et al (aka the "Cook Book"). Good luck!
[0] You typically need a lot of background before you even ask the right questions.
[1] Trefethen, 2008, "Is Gauss Quadrature Better than Clenshaw-Curtis?", DOI 10.1137/060659831
Your question is not really clear.
Do you mean in the Python environment? Shell elements have section points through the thickness; you set these through your shell section. The number of integration points depends on your element type.
You can find a lot of info in the Abaqus Scripting manual. For example:
http://www.tu-chemnitz.de/projekt/abq_hilfe/docs/v6.12/books/cmd/default.htm
An integration point in FEM is where the element's constitutive response is evaluated. Just keep that in mind. In Abaqus user subroutines, the calculation takes place at each integration point. Remember that and go forward. If you are unsatisfied, take a look at any FEM book for the definition/explanation of integration points; the concept does not depend on subroutines.
An integration point is one of the sampling points within an element, not a node. For example, a fully integrated eight-node C3D8 continuum brick element has eight integration points, one in each octant of the brick (its reduced-integration counterpart, C3D8R, has just one, at the centroid).
Also, within a subroutine, other variables, such as the state variables SVARS, are stored at the integration points. So if your element has, say, 4 SVARS that you need to keep track of, there will be 8 * 4 = 32 SVARS in the entire fully integrated element.
I hope this answers your question.

What is the best method to template match an image with noise?

I have a large image (5400x3600) that has multiple CCTVs that I need to detect.
The detection takes a lot of time (4-7 minutes) with rotation, and it still fails to resolve certain CCTVs.
What is the best method to match a template like this?
I am using skimage - OpenCV is not an option for me, but I am open to suggestions on that too.
For example: in the images below, the template is correctly matched with the second image - but the first image is not matched - I guess due to the noise created by the text "BLDG..."
Template:
Source image:
Match result:
The fastest method is probably a cascade of boosted classifiers trained with several variations of your logo, possibly a few rotations, and some negative examples too (non-logos). You have to roughly scale your overall image so the test and training examples are approximately matched in scale. Unlike SIFT or SURF, which spend a lot of time searching for interest points and creating descriptors for both learning and searching, binary classifiers shift most of the burden to the training stage, so your testing or search will be much faster.
In short, the cascade runs in such a way that the very first test discards a large portion of the image. If the first test passes, the others follow and refine. They are super fast, consisting of just a few intensity comparisons on average around each point. Only a few locations will pass the whole cascade, and those can be verified with additional tests such as your rotation-correlation routine.
Thus, the classifiers are effective not only because they quickly detect your object but because they can also quickly discard non-object areas. To read more about boosted classifiers, see the corresponding section of the OpenCV documentation.
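If OpenCV is available after all (the asker noted it may not be), a hypothetical usage sketch in Python; it assumes a cascade already trained with OpenCV's opencv_traincascade tool, and both file names are made up:

```python
import cv2

# Load a cascade trained on logo / non-logo patches (hypothetical file).
cascade = cv2.CascadeClassifier("cctv_cascade.xml")
img = cv2.imread("floorplan.png", cv2.IMREAD_GRAYSCALE)

# scaleFactor / minNeighbors trade recall against false positives.
hits = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=4)
for (x, y, w, h) in hits:
    # Each hit survived the whole cascade; verify with e.g. rotation-correlation.
    cv2.rectangle(img, (x, y), (x + w, y + h), 255, 2)
```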
This problem is generally addressed as logo detection. See this for a similar discussion.
There are many robust methods for template matching. See this, or Google, for a very detailed discussion.
But from your example I can guess that the following approach would work.
Create a feature for your search image. It essentially has a rectangle enclosing the word "CCTV". So the width, height, angle, and individual character features could be a suitable choice for matching the textual information. (Or you may also use the image containing "CCTV"; in that case the method will not be scale invariant.)
Now when searching, first detect rectangles. Then use the angle to prune your search space, and also use an image transformation to align the rectangles parallel to the axes (this should take care of the need for rotation). Then, according to the feature chosen in step 1, match the text content. If you use individual character features, your template matching step is essentially a classification step. Otherwise, if you use an image for matching, you may use cv::matchTemplate.
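To sketch the rotation-swept matching step with scikit-image (the asker's library) rather than OpenCV - the angle step and the scoring scheme are illustrative choices, not a tested pipeline, and the template is assumed smaller than the image:

```python
import numpy as np
from skimage.feature import match_template
from skimage.transform import rotate
from skimage.util import img_as_float

def best_match(image, template, angles=range(0, 360, 15)):
    """Try the template at several rotations; return (score, angle, top_left)."""
    image = img_as_float(image)
    template = img_as_float(template)
    best = (-1.0, None, None)
    for a in angles:
        t = rotate(template, a, resize=True)   # zero-padded rotated template
        r = match_template(image, t)           # normalised cross-correlation
        ij = np.unravel_index(np.argmax(r), r.shape)
        if r[ij] > best[0]:
            best = (float(r[ij]), a, ij)
    return best   # score is in [-1, 1]; ij is the match's top-left corner
```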
Hope it helps.
Symbol spotting is more complicated than logo spotting because interest points hardly work on document images such as architectural plans. Many conferences deal with pattern recognition, and each year there are many new algorithms for symbol spotting, so giving you the best method is not possible. You could check the IAPR conferences: ICPR, ICDAR, DAS, GREC (Workshop on Graphics Recognition), etc. These researchers focus on this topic: M Rusiñol, J Lladós, S Tabbone, J-Y Ramel, M Liwicki, etc. They work on several techniques for improving symbol spotting, such as vectorial signatures, graph-based signatures, and so on (check Google Scholar for more papers).
An easy way to start a new approach is to work with simple shapes such as lines, rectangles, and triangles instead of matching everything at once.
Your example can be recognized by shape matching (contour matching), much faster than 4 minutes.
For a good match, you need decent preprocessing and denoising.
Examples can be found at http://www.halcon.com/applications/application.pl?name=shapematch
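A rough sketch of such contour matching, shown with OpenCV's Hu-moment matcher for brevity (skimage.measure.moments_hu is the scikit-image building block if OpenCV really is unavailable); the file names and the 0.1 cut-off are made up:

```python
import cv2

tmpl = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)
scene = cv2.imread("plan.png", cv2.IMREAD_GRAYSCALE)

# Binarise both images (Otsu threshold, dark symbols on light background).
_, tb = cv2.threshold(tmpl, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
_, sb = cv2.threshold(scene, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# OpenCV 4 return signature: (contours, hierarchy).
tc, _ = cv2.findContours(tb, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
sc, _ = cv2.findContours(sb, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

ref = max(tc, key=cv2.contourArea)   # largest template contour as reference
for c in sc:
    # Lower score = more similar; 0.1 is a guess to tune per data set.
    if cv2.matchShapes(ref, c, cv2.CONTOURS_MATCH_I1, 0.0) < 0.1:
        x, y, w, h = cv2.boundingRect(c)
        print("candidate CCTV at", (x, y, w, h))
```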

The options for the first step of document clustering

I checked several document clustering algorithms, such as LSA, pLSA, and LDA. It seems they all require representing the documents to be clustered as a document-word matrix, where the rows stand for documents and the columns stand for the words appearing in them. And the matrix is often very sparse.
I am wondering: are there any other options for representing documents besides the document-word matrix? I ask because I believe the way we express a problem has a significant influence on how well we can solve it.
As @ffriend pointed out, you cannot really avoid the term-document-matrix (TDM) paradigm. Clustering methods operate on points in a vector space, and this is exactly what the TDM encodes. However, within that conceptual framework there are many things you can do to improve the quality of the TDM:
feature selection and re-weighting attempt to remove or down-weight features (words) that do not contribute useful information (in the sense that your chosen algorithm does just as well or better without these features, or with their counts reduced). You might want to read more about Mutual Information (and its many variants) and TF-IDF.
dimensionality reduction is about encoding the information in the TDM as accurately as possible using fewer columns. Singular Value Decomposition (the basis of LSA) and Non-Negative Tensor Factorisation are popular in the NLP community. A desirable side effect is that the TDM becomes considerably less sparse.
feature engineering attempts to build a TDM where the choice of columns is motivated by linguistic knowledge. For instance, you may want to use bigrams instead of words, or only use nouns (which requires a part-of-speech tagger), or only use nouns with their associated adjectival modifiers (e.g. big cat, which requires a dependency parser). This is a very empirical line of work and involves a lot of experimentation, but it often yields improved results.
the distributional hypothesis makes it possible to get a vector representing the meaning of each word in a document. There has been work on trying to build up a representation of an entire document from the representations of the words it contains (composition). Here is a shameless link to my own post describing the idea.
There is a massive body of work on formal and logical semantics that I am not intimately familiar with. A document can be encoded as a set of predicates instead of a set of words, i.e. the columns of the TDM can be predicates. In that framework you can do inference and composition, but lexical semantics (the meaning of individual words) is hard to deal with.
For a really detailed overview, I recommend Turney and Pantel's "From Frequency to Meaning: Vector Space Models of Semantics".
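To make two of the bullets above concrete, a small scikit-learn sketch combining TF-IDF re-weighting with SVD-based dimensionality reduction (the toy corpus and the component count are arbitrary illustration values):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = ["the cat sat on the mat", "dogs and cats", "stock markets fell"]

tdm = TfidfVectorizer(stop_words="english").fit_transform(docs)  # sparse TDM
lsa = TruncatedSVD(n_components=2).fit_transform(tdm)            # dense, low-rank
print(lsa.shape)   # (3, 2): each document is now a 2-d point
```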
Your question says you want document clustering, not term clustering or dimensionality reduction. Therefore I'd suggest you steer clear of the LSA family of methods, since they're a preprocessing step.
Define a feature-based representation of your documents (which can be, or include, term counts but needn't be), and then apply a standard clustering method. I'd suggest starting with k-means as it's extremely easy and there are many, many implementations of it.
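A minimal sketch of that suggestion, assuming plain term counts as the feature representation (toy corpus, and k = 2 is an arbitrary choice):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

docs = ["cats and dogs", "dogs chase cats", "stocks fell", "markets and stocks"]
X = CountVectorizer().fit_transform(docs)               # term-count features
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # e.g. [0 0 1 1]: pet documents vs. finance documents
```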
OK, this is quite a general question, and many answers are possible, none definitive, because it's an ongoing research area. So far, the answers I have read mainly concern so-called "vector-space models", and your question is worded in a way that suggests such "statistical" approaches. Yet if you want to avoid manipulating explicit term-document matrices, you might want to take a closer look at the Bayesian paradigm, which relies on the same distributional hypothesis but exploits a different theoretical framework: you no longer manipulate raw distances, but rather probability distributions and, most importantly, you can do inference based on them.
You mentioned LDA; I guess you mean Latent Dirichlet Allocation, which is the best-known such Bayesian model for document clustering. It is an alternative paradigm to vector-space models, and a winning one: it has been shown to give very good results, which justifies its current success. Of course, one can argue that you still use a kind of term-document matrix through the multinomial parameters, but it's clearly not the most important aspect, and Bayesian researchers rarely (if ever) use that term.
Because of its success, there is a lot of software on the net that implements LDA. Here is one package, but there are many others:
http://jgibblda.sourceforge.net/
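If you prefer Python to the Java package above, a minimal scikit-learn sketch of the same idea (the corpus and the choice of 2 topics are toy values):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["genes and proteins", "protein folding simulations",
        "stock markets fell", "markets and traders"]
counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
theta = lda.fit_transform(counts)   # per-document topic distributions
print(theta.argmax(axis=1))         # hard assignment: topic with highest mass
```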

Clustering a huge number of URLs

I have to find similar URLs like
'http://teethwhitening360.com/teeth-whitening-treatments/18/'
'http://teethwhitening360.com/laser-teeth-whitening/22/'
'http://teethwhitening360.com/teeth-whitening-products/21/'
'http://unwanted-hair-removal.blogspot.com/2008/03/breakthroughs-in-unwanted-hair-remo'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-products.html'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-by-shaving.ht'
and gather them into groups or clusters. My problems:
The number of URLs is large (1,580,000)
I don't know which clustering method, or which way of finding similarities, is better
I would appreciate any suggestion on this.
There are a few problems at play here. First you'll probably want to wash the URLs with a dictionary, for example to convert
http://teethwhitening360.com/teeth-whitening-treatments/18/
to
teeth whitening 360 com teeth whitening treatments 18
then you may want to stem the words somehow, e.g. using the Porter stemmer:
teeth whiten 360 com teeth whiten treatment 18
Then you can use a simple vector-space model to map the URLs into an n-dimensional space and run k-means clustering on them. It's a basic approach, but it should work.
The number of URLs involved shouldn't be a problem; it depends on what language/environment you're using. I would think Matlab would be able to handle it.
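Putting the wash/stem/cluster steps together in Python - a sketch, not a tuned pipeline: dictionary-based word splitting is skipped (so "teethwhitening360" becomes the two tokens "teethwhitening" and "360" rather than "teeth whitening 360"), and k = 2 is arbitrary:

```python
import re
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

stem = PorterStemmer().stem

def wash(url):
    # Split on non-alphanumerics, lowercase, and stem each token.
    tokens = re.findall(r"[a-z]+|\d+", url.lower())
    return " ".join(stem(t) for t in tokens)

urls = [
    "http://teethwhitening360.com/teeth-whitening-treatments/18/",
    "http://teethwhitening360.com/laser-teeth-whitening/22/",
    "http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-products.html",
]
X = TfidfVectorizer().fit_transform(wash(u) for u in urls)
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```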
Tokenizing and stemming are obvious things to do. You can then easily turn the token counts into sparse TF-IDF vectors. Crawling the actual web pages to get additional tokens is probably too much work.
After this, you should be able to use any flexible clustering algorithm on the data set. By flexible I mean that you need to be able to use, for example, cosine distance instead of Euclidean distance (which does not work well on sparse vectors). k-means in GNU R, for example, only supports Euclidean distance and dense vectors, unfortunately. Ideally, choose a framework that is very flexible but also optimizes well. If you want to try k-means, since it is a simple (and thus fast) and well-established algorithm, I believe there is a variant called "convex k-means" that could be applicable to cosine distance and sparse TF-IDF vectors.
Classic "hierarchical clustering" (apart from being outdated and not performing very well) is usually a problem due to the O(n^3) complexity of most algorithms and implementations. There are some specialized cases where an O(n^2) algorithm is known (SLINK, CLINK), but often the toolboxes only offer the naive cubic-time implementation (including GNU R, Matlab, and SciPy, from what I just googled). Plus, again, they often have only a limited choice of distance functions available, probably not including cosine.
The methods are, however, often easy enough to implement yourself, in a way that is optimized for your actual use case.
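One common workaround for the cosine-vs-Euclidean limitation, sketched below with scikit-learn (the toy corpus is illustrative): L2-normalise the tf-idf vectors, after which squared Euclidean distance between documents is a monotone function of cosine distance, so ordinary k-means behaves much like a cosine-based clustering.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize
from sklearn.cluster import KMeans

docs = ["teeth whiten treatment", "laser teeth whiten", "hair removal product"]
# TfidfVectorizer already L2-normalises rows by default; the explicit call
# just makes the unit-length trick visible.
X = normalize(TfidfVectorizer().fit_transform(docs))
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))
```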
These two research papers, published by Google and Yahoo respectively, go into detail on algorithms for clustering similar URLs:
http://www.google.com/patents/US20080010291
http://research.yahoo.com/files/fr339-blanco.pdf
