Procedural Hash Function

I am wondering what the best hash function is for procedural textures, especially Perlin noise. I know about the PRNG posted on this page, but it is claimed that it is not a good PRNG.
Thanks

I use the "Mersenne Twister with improved initialization" PRNG for my implementation perlin-noise and other procedural textures (http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html)
It is very efficient and the randomness is very good.
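For reference, here is a minimal sketch (in Python, whose built-in random module happens to use the Mersenne Twister) of the classic Perlin-style approach: seed the PRNG once, shuffle a 256-entry permutation table, and hash integer lattice coordinates through it. The names and seed are illustrative, not taken from any particular library.

    import random

    # Seed the PRNG (Python's random module is a Mersenne Twister) and build a
    # 256-entry permutation table, duplicated to avoid index wrapping.
    rng = random.Random(12345)          # any fixed seed gives a repeatable texture
    perm = list(range(256))
    rng.shuffle(perm)
    perm = perm + perm                  # 512 entries so perm[x + perm[y]] never overflows

    def lattice_hash(x, y, z=0):
        """Hash integer lattice coordinates to a value in [0, 255].

        This is the classic Perlin-style permutation hash; the result is then
        typically used to pick a gradient vector for the lattice point.
        """
        return perm[perm[perm[x & 255] + (y & 255)] + (z & 255)]

    # The hash is deterministic for a given seed, so the texture is repeatable.
    print(lattice_hash(10, 20), lattice_hash(10, 21))

The quality of the PRNG only matters when filling the permutation table; the per-sample hash itself is just table lookups, so it stays cheap at texture-generation time.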

Related

Image processing - What are kernel space, functions and data?

I'm reading Kernelized Locality-Sensitive Hashing, which is obviously based on the concept of kernels applied to spaces, functions and data.
I'm not confident with this concept in either math or image processing (it's not my domain, sorry if I'm naive), so could someone please help me understand it?
I found an exhaustive and simple explanation of what kernel functions are in this article (which I strongly recommend).
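As a rough illustration of the idea (not taken from the paper): a kernel is a function k(x, y) that behaves like an inner product between two data points in some implicit feature space, so algorithms written purely in terms of inner products can be applied without ever constructing that space. A small sketch of the common Gaussian (RBF) kernel, with made-up example vectors:

    import numpy as np

    def rbf_kernel(x, y, gamma=1.0):
        """Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).

        It acts like an inner product in an implicit feature space; values
        near 1 mean "similar", values near 0 mean "dissimilar".
        """
        diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
        return float(np.exp(-gamma * np.dot(diff, diff)))

    # Two nearby descriptors score high, a distant one scores low.
    a, b, c = [0.1, 0.2, 0.3], [0.1, 0.2, 0.35], [5.0, -3.0, 2.0]
    print(rbf_kernel(a, b), rbf_kernel(a, c))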

What is the absolute best theoretical lossless compression of data possible?

To start with:
Assume that the algorithm takes finite space.
Assume that the computational resources are infinite.
What form would the result of such compression take? My intuition tells me it would be some form of PRNG-like algorithm with an irreducible seed that gives rise to the compressed data. Could there be something even more efficient?
Now what if we assume all resources are finite? Would the problem of perfect compression equate to the problem of perfect pattern recognition? What form would the result of such compression take? Factorization into primes? Something else? And would having such an algorithm imply that the problem of AI has been cracked?
As a side question, have there been successful attempts to use machine learning for data compression?
There is a mathematical proof that your question cannot be answered in general. The best compression possible is not computable. See Kolmogorov complexity.
Compression only works when the data can be modeled in some way to expose redundancy.
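A quick, hedged illustration of that last point with a general-purpose compressor (zlib/DEFLATE): data with an exploitable model (repetition) shrinks dramatically, while data that is already effectively random does not.

    import zlib, os

    repetitive = b"teeth whitening " * 4096     # highly redundant: a short program could regenerate it
    random_ish = os.urandom(len(repetitive))    # no model to exploit (with overwhelming probability)

    print(len(repetitive), len(zlib.compress(repetitive, 9)))   # compresses to a tiny fraction
    print(len(random_ish), len(zlib.compress(random_ish, 9)))   # barely compresses, may even grow slightly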

Delphi mathematical optimisation techniques including simplex or genetic algorithms, etc

Just wondering whether anyone has recommendations for optimisation utilities for Delphi.
e.g. Simplex, genetic algorithms, etc.
Basically I need to optimise my overarching model as a complete black-box function, with input variables like tilt angle or array size, within pre-determined boundaries. The output is usually a smooth curve, usually with no false summits.
The old NR Pascal stuff is looking a bit dated (no functions as variables etc).
Many thanks, Brian
I found a program, written in Pascal, that implements the Simplex method. It's a little old, but you could convert it to Delphi. You can find it here.
I hope it's of some use to you.
PS: If you have some cash to spend, try here.
TSimplex Class
https://iie.fing.edu.uy/svn/SimSEE/src/rchlib/mat/usimplex.pas
For Mixed Integer Simplex
TMIPSimplex Class
https://iie.fing.edu.uy/svn/SimSEE/src/rchlib/mat/umipsimplex.pas
User: simsee_svn
Password: publico
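Not Delphi, but as a hedged sketch of the kind of routine being asked about: the Nelder-Mead downhill simplex method applied to a bounded black-box model (the objective below is a made-up stand-in with hypothetical "tilt angle" and "array size" inputs; the SciPy call is only there to show the shape of the interface you would port or wrap).

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical black-box model: yield as a function of tilt angle (degrees)
    # and array size (arbitrary units). In practice this calls your real model.
    def model_yield(tilt, size):
        return -((tilt - 35.0) ** 2) * 0.01 - ((size - 12.0) ** 2) * 0.05 + 100.0

    BOUNDS = [(0.0, 90.0), (1.0, 50.0)]   # pre-determined boundaries for each input

    def objective(x):
        # Minimise the negative yield; penalise points outside the boundaries so
        # the simplex stays in the feasible region.
        if any(v < lo or v > hi for v, (lo, hi) in zip(x, BOUNDS)):
            return 1e9
        return -model_yield(*x)

    result = minimize(objective, x0=[10.0, 5.0], method="Nelder-Mead")
    print(result.x, -result.fun)   # should land near tilt=35, size=12

For a smooth, single-summit response like the one described, a downhill simplex of this sort is usually enough; genetic algorithms only start to pay off when the surface has many false summits.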

Clustering a huge number of URLs

I have to find similar URLs like
'http://teethwhitening360.com/teeth-whitening-treatments/18/'
'http://teethwhitening360.com/laser-teeth-whitening/22/'
'http://teethwhitening360.com/teeth-whitening-products/21/'
'http://unwanted-hair-removal.blogspot.com/2008/03/breakthroughs-in-unwanted-hair-remo'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-products.html'
'http://unwanted-hair-removal.blogspot.com/2008/03/unwanted-hair-removal-by-shaving.ht'
and gather them in groups or clusters. My problems:
The number of URLs is large (1,580,000)
I don't know which clustering method or way of measuring similarity is best
I would appreciate any suggestion on this.
There are a few problems at play here. First, you'll probably want to wash the URLs with a dictionary, for example to convert
http://teethwhitening360.com/teeth-whitening-treatments/18/
to
teeth whitening 360 com teeth whitening treatments 18
then you may want to stem the words somehow, e.g. using the Porter stemmer:
teeth whiten 360 com teeth whiten treatment 18
Then you can use a simple vector space model to map the URLs into an n-dimensional space and run k-means clustering on them. It's a basic approach, but it should work.
The number of URLs involved shouldn't be a problem; it depends on what language/environment you're using. I would think Matlab would be able to handle it.
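A hedged sketch of that washing/stemming step in Python (the regular expression and the simple letter/digit splitting are illustrative choices, not the only way to "wash" a URL, and no dictionary-based word splitting is attempted here):

    import re
    from urllib.parse import urlparse

    def wash_url(url):
        """Turn a URL into a bag of lower-case tokens: split host and path on
        non-alphanumerics and on letter/digit boundaries. Splitting fused words
        like "teethwhitening" apart would need a dictionary, omitted here."""
        parsed = urlparse(url)
        text = parsed.netloc + " " + parsed.path
        return re.findall(r"[a-zA-Z]+|\d+", text.lower())

    print(wash_url("http://teethwhitening360.com/teeth-whitening-treatments/18/"))
    # -> ['teethwhitening', '360', 'com', 'teeth', 'whitening', 'treatments', '18']

    # Optional stemming step, e.g. with NLTK's Porter stemmer (requires nltk):
    # from nltk.stem import PorterStemmer
    # stem = PorterStemmer().stem
    # stemmed = [stem(t) for t in wash_url(url)]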
Tokenizing and stemming are obvious things to do. You can then easily turn the token lists into sparse TF-IDF vectors. Crawling the actual web pages to get additional tokens is probably too much work.
After this, you should be able to use any flexible clustering algorithm on the data set. By flexible I mean that you need to be able to use, for example, cosine distance instead of Euclidean distance (which does not work well on sparse vectors). k-means in GNU R, for example, unfortunately only supports Euclidean distance and dense vectors. Ideally, choose a framework that is very flexible but also optimizes well. If you want to try k-means, since it is a simple (and thus fast) and well-established algorithm, I believe there is a variant called "convex k-means" that is applicable to cosine distance and sparse TF-IDF vectors.
Classic "hierarchical clustering" (apart from being outdated and not performing very well) is usually a problem due to the O(n^3) complexity of most algorithms and implementations. There are some specialized cases where an O(n^2) algorithm is known (SLINK, CLINK), but the toolboxes often only offer the naive cubic-time implementation (including GNU R, Matlab, and SciPy, from what I just googled). And again, they often have only a limited choice of distance functions available, probably not including cosine.
The methods are, however, often easy enough to implement yourself, in a way optimized for your actual use case.
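A hedged sketch of that approach with scikit-learn, assuming the washed/stemmed token strings from the earlier answer: build sparse TF-IDF vectors and keep them L2-normalised, so that ordinary Euclidean k-means on the normalised vectors behaves like clustering by cosine similarity (spherical k-means in spirit). For the full 1.5M URLs, MiniBatchKMeans is a more memory-friendly drop-in alternative.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    # Each URL has already been "washed" into a space-separated token string.
    docs = [
        "teeth whiten 360 com teeth whiten treatment 18",
        "teeth whiten 360 com laser teeth whiten 22",
        "unwanted hair removal blogspot com unwanted hair removal product",
        "unwanted hair removal blogspot com unwanted hair removal shave",
    ]

    # TF-IDF with L2 normalisation (the default), so dot products are cosine
    # similarities and Euclidean k-means approximates cosine-based clustering.
    tfidf = TfidfVectorizer(norm="l2")
    X = tfidf.fit_transform(docs)            # sparse matrix

    km = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = km.fit_predict(X)
    print(labels)                            # e.g. [0 0 1 1]: the two sites separate cleanly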
These two publications, from Google and Yahoo respectively, go into detail on algorithms for clustering similar URLs:
http://www.google.com/patents/US20080010291
http://research.yahoo.com/files/fr339-blanco.pdf

CUDA vs. cuBLAS memory management

I have noticed that I can call cuBLAS functions on matrices in memory blocks allocated with either cudaMalloc() or cublasAlloc(). The matrix transfer and computation rates are slower for arrays allocated with cudaMalloc() than for those allocated with cublasAlloc(), although there are other advantages to using cudaMalloc(). Why is that the case? It would be great to hear some comments.
cublasAlloc() is essentially a wrapper around cudaMalloc(), so there should be no difference. Is there anything else that changes in your code?
