Measure distance by RSSI in veins4.4 Omnet++5 SUMO0.25 - localization

I am a master student working with localization in VANEts
in this moment I am working on a trilateration method based on RSSI for
Cooperative Positioning (CP).
I am considering the Analogue Model : Simple Path Loss Model
But I have some doubts in how to calculate the distance correctly for a determined Phy Model.
I spent some time (one day) reading some papers of Dr. Sommer about the PHY models included in veins.
Would anyone help-me with this solution?
I need a way to:
1) Measure the power of an receiver when its receive a beacon (I found this in the Decider class).
In the Decider802.11p the received Power can be obtained with this line in method Decider80211p::processSignalEnd(AirFrame* msg):
double recvPower_dBm = 10*log10(signal.getReceivingPower()->getValue(start));
2) Apply a formula of RSSI accordingly the phy model in order to achieve a distance estimation between transmiter and receiver.
3) Asssociate this measure (distance by RSSI) with the Wave Short Message to be delivered in AppLayer of the receiver (that is measuring the RSSI).
After read the paper "On the Applicability of Two-Ray Path Loss Models for Vehicular Network Simulation"
and the paper "A Computationally Inexpensive Empirical Model of IEEE 802.11p Radio Shadowing in Urban Environments"
and investigating how it works in the veins project. I noticed that each analogue model have your own path loss model
with your own variables to describe the model.
For example for the SimplePathLossModel we have these
variables defined on AnalogueModels folder of veins modules:
lambda = 0.051 m (wave length to IEEE 802.11p CCH center frequency of 5.890 GHz)
A constant alpha = 2 (default value used)
a distance factor given by pow(sqrDistance, -pathLossAlphaHalf) / (16.0 * M_PI * M_PI);
I found one formula for indoor environments in this link, but I am in doubt if it is applicable for vehicular environments.
Any clarification is welcome. Thanks a lot.

Technically, you are correct. Indeed, you could generate a simple look-up table: have one vehicle drive past another one, record distance and RSSIs, and you have a table that can map RSSI to mean distance (without knowing how the TX power, antenna gains, path loss model, fading models, etc, are configured).
In the simplest case, if you assume that antennas are omnidirectional, that path loss follows the Friis transmission equation, that no shadow fading occurs, and that fast fading is negligible, your table will be perfect.
In a more complicated case, where your simulation also includes probabilistic fast fading (say, a Nakagami model), shadow fading due to radio obstacles (buildings), etc. your table will still be roughly correct, but less so.
It is important to consider a real-life application, though. Consider if your algorithm still works if conditions change (more reflective road surface changing reflection parameters, buildings blocking more or less power, antennas with non-ideal or even unknown gain characteristics, etc).

Related

Best strategy to reduce false positives: Google's new Object Detection API on Satellite Imagery

I'm setting up the new Tensorflow Object Detection API to find small objects in large areas of satellite imagery. It works quite well - it finds all 10 objects I want, but I also get 50-100 false positives [things that look a little like the target object, but aren't].
I'm using the sample config from the 'pets' tutorial, to fine-tune the faster_rcnn_resnet101_coco model they offer. I've started small, with only 100 training examples of my objects (just 1 class). 50 examples in my validation set. Each example is a 200x200 pixel image with a labeled object (~40x40) in the center. I train until my precision & loss curves plateau.
I'm relatively new to using deep learning for object detection. What is the best strategy to increase my precision? e.g. Hard-negative mining? Increase my training dataset size? I've yet to try the most accurate model they offer faster_rcnn_inception_resnet_v2_atrous_coco as i'd like to maintain some speed, but will do so if needed.
Hard-negative mining seems to be a logical step. If you agree, how do I implement it w.r.t setting up the tfrecord file for my training dataset? Let's say I make 200x200 images for each of the 50-100 false positives:
Do I create 'annotation' xml files for each, with no 'object' element?
...or do I label these hard negatives as a second class?
If I then have 100 negatives to 100 positives in my training set - is that a healthy ratio? How many negatives can I include?
I've revisited this topic recently in my work and thought I'd update with my current learnings for any who visit in the future.
The topic appeared on Tensorflow's Models repo issue tracker. SSD allows you to set the ratio of how many negative:postive examples to mine (max_negatives_per_positive: 3), but you can also set a minimum number for images with no postives (min_negatives_per_image: 3). Both of these are defined in the model-ssd-loss config section.
That said, I don't see the same option in Faster-RCNN's model configuration. It's mentioned in the issue that models/research/object_detection/core/balanced_positive_negative_sampler.py contains the code used for Faster-RCNN.
One other option discussed in the issue is creating a second class specifically for lookalikes. During training, the model will attempt to learn class differences which should help serve your purpose.
Lastly, I came across this article on Filter Amplifier Networks (FAN) that may be informative for your work on aerial imagery.
===================================================================
The following paper describes hard negative mining for the same purpose you describe:
Training Region-based Object Detectors with Online Hard Example Mining
In section 3.1 they describe using a foreground and background class:
Background RoIs. A region is labeled background (bg) if its maximum
IoU with ground truth is in the interval [bg lo, 0.5). A lower
threshold of bg lo = 0.1 is used by both FRCN and SPPnet, and is
hypothesized in [14] to crudely approximate hard negative mining; the
assumption is that regions with some overlap with the ground truth are
more likely to be the confusing or hard ones. We show in Section 5.4
that although this heuristic helps convergence and detection accuracy,
it is suboptimal because it ignores some infrequent, but important,
difficult background regions. Our method removes the bg lo threshold.
In fact this paper is referenced and its ideas are used in Tensorflow's object detection losses.py code for hard mining:
class HardExampleMiner(object):
"""Hard example mining for regions in a list of images.
Implements hard example mining to select a subset of regions to be
back-propagated. For each image, selects the regions with highest losses,
subject to the condition that a newly selected region cannot have
an IOU > iou_threshold with any of the previously selected regions.
This can be achieved by re-using a greedy non-maximum suppression algorithm.
A constraint on the number of negatives mined per positive region can also be
enforced.
Reference papers: "Training Region-based Object Detectors with Online
Hard Example Mining" (CVPR 2016) by Srivastava et al., and
"SSD: Single Shot MultiBox Detector" (ECCV 2016) by Liu et al.
"""
Based on your model config file, the HardMinerObject is returned by losses_builder.py in this bit of code:
def build_hard_example_miner(config,
classification_weight,
localization_weight):
"""Builds hard example miner based on the config.
Args:
config: A losses_pb2.HardExampleMiner object.
classification_weight: Classification loss weight.
localization_weight: Localization loss weight.
Returns:
Hard example miner.
"""
loss_type = None
if config.loss_type == losses_pb2.HardExampleMiner.BOTH:
loss_type = 'both'
if config.loss_type == losses_pb2.HardExampleMiner.CLASSIFICATION:
loss_type = 'cls'
if config.loss_type == losses_pb2.HardExampleMiner.LOCALIZATION:
loss_type = 'loc'
max_negatives_per_positive = None
num_hard_examples = None
if config.max_negatives_per_positive > 0:
max_negatives_per_positive = config.max_negatives_per_positive
if config.num_hard_examples > 0:
num_hard_examples = config.num_hard_examples
hard_example_miner = losses.HardExampleMiner(
num_hard_examples=num_hard_examples,
iou_threshold=config.iou_threshold,
loss_type=loss_type,
cls_loss_weight=classification_weight,
loc_loss_weight=localization_weight,
max_negatives_per_positive=max_negatives_per_positive,
min_negatives_per_image=config.min_negatives_per_image)
return hard_example_miner
which is returned by model_builder.py and called by train.py. So basically, it seems to me that simply generating your true positive labels (with a tool like LabelImg or RectLabel) should be enough for the train algorithm to find hard negatives within the same images. The related question gives an excellent walkthrough.
In the event you want to feed in data that has no true positives (i.e. nothing should be classified in the image), just add the negative image to your tfrecord with no bounding boxes.
I think I was passing through the same or close scenario and it's worth it to share with you.
I managed to solve it by passing images without annotations to the trainer.
On my scenario I'm building a project to detect assembly failures from my client's products, at real time.
I successfully achieved very robust results (for production env) by using detection+classification for components that has explicity a negative pattern (e.g. a screw that has screw on/off(just the hole)) and only detection for things that doesn't has the negative pattens (e.g. a tape that can be placed anywhere).
On the system it's mandatory that the user record 2 videos, one containing the positive scenario and another containing the negative (or the n videos, containing n patterns of positive and negative so the algorithm can generalize).
After a while testing I found out that if I register to detected only tape the detector was giving very confident (0.999) false positive detections of tape. It was learning the pattern where the tape was inserted instead of the tape itself. When I had another component (like a screw on it's negative format) I was passing the negative pattern of tape without being explicitly aware of it, so the FPs didn't happen.
So I found out that, in this scenario, I had to necessarily pass the images without tape so it could differentiate between tape and no-tape.
I considered two alternatives to experiment and try to solve this behavior:
Train passing an considerable amount of images that doesn't has any annotation (10% of all my negative samples) along with all images that I have real annotations.
On the images that I don't have annotation I create a dummy annotation with a dummy label so I could force the detector to train with that image (thus learning the no-tape pattern). Later on, when get the dummy predictions, just ignore them.
Concluded that both alternatives worked perfectly on my scenario.
The training loss got a little messy but the predictions work with robustness for my very controlled scenario (the system's camera has its own box and illumination to decrease variables).
I had to make two little modifications for the first alternative to work:
All images that didn't had any annotation I passed a dummy annotation (class=None, xmin/ymin/xmax/ymax=-1)
When generating the tfrecord files I use this information (xmin == -1, in this case) to add an empty list for the sample:
def create_tf_example(group, path, label_map):
with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
encoded_jpg = fid.read()
encoded_jpg_io = io.BytesIO(encoded_jpg)
image = Image.open(encoded_jpg_io)
width, height = image.size
filename = group.filename.encode('utf8')
image_format = b'jpg'
xmins = []
xmaxs = []
ymins = []
ymaxs = []
classes_text = []
classes = []
for index, row in group.object.iterrows():
if not pd.isnull(row.xmin):
if not row.xmin == -1:
xmins.append(row['xmin'] / width)
xmaxs.append(row['xmax'] / width)
ymins.append(row['ymin'] / height)
ymaxs.append(row['ymax'] / height)
classes_text.append(row['class'].encode('utf8'))
classes.append(label_map[row['class']])
tf_example = tf.train.Example(features=tf.train.Features(feature={
'image/height': dataset_util.int64_feature(height),
'image/width': dataset_util.int64_feature(width),
'image/filename': dataset_util.bytes_feature(filename),
'image/source_id': dataset_util.bytes_feature(filename),
'image/encoded': dataset_util.bytes_feature(encoded_jpg),
'image/format': dataset_util.bytes_feature(image_format),
'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
'image/object/class/label': dataset_util.int64_list_feature(classes),
}))
return tf_example
Part of the traning progress:
Currently I'm using tensorflow object detection along with tensorflow==1.15, using faster_rcnn_resnet101_coco.config.
Hope it will solve someone's problem as I didn't found any solution on the internet. I read a lot of people telling that faster_rcnn is not adapted for negative training for FPs reduction but my tests proved the opposite.

Does the Izhikevich neuron model use weights?

I've been working a bit with neural networks and I'm interested on implementing a spiking neuron model.
I've read a fair amount of tutorials but most of them seem to be about generating pulses and I haven't found any application of it on a given input train.
Say for example I got input train:
Input[0] = [0,0,0,1,0,0,1,1]
It enters the Izhikevich neuron, does the input multiply a weight or only makes use of the parameters a, b, c and d?
Izhikevich equations are:
v[n+1] = 0.04*v[n]^2 + 5*v[n] + 140 - u[n] + I
u[n+1] = a*(b*v[n] - u[n])
where v[n] is input voltage and u[n] is a general recovery variable.
Are there any texts on implementations of Izhikevich or similar spiking neuron models on a practical problem? I'm trying to understand how information is encoded on this models but it looks different from what's done with standard second generation neurons. The only tutorial I've found where it deals with a spiking train and a set of weights is [1] but I haven't seen the same with Izhikevich.
[1] https://msdn.microsoft.com/en-us/magazine/mt422587.aspx
The plain Izhikevich model by itself, does not include weights.
The two equations you mentioned, model the membrane potential (v[]) over time of a point neuron. To use weights, you could connect two or more of such cells with synapses.
Each synapse could include some sort spike detection mechanism on the source cell (pre-synaptic), and a synaptic current mechanism in the target (post-synaptic) cell side. That synaptic current could then be multiplied by a weight term, and then become part of the I term (in the 1st equation above) for the target cell.
As a very simple example of a two cell network, at every time step, you could check if pre- cell v is above (say) 0 mV. If so, inject (say) 0.01 pA * weightPrePost into the post- cell. weightPrePost would range from 0 to 1, and could be modified in response to things like firing rate, or Hebbian-like spike synchrony like in STDP.
With multiple synaptic currents going into a cell, you could devise various schemes how to sum them. The simplest one would be just a simple sum, more complicated ones could include things like distance and dendrite diameters (e.g. simulated neural morphology).
This chapter is a nice introduction to other ways to model synapses: Modelling
Synaptic Transmission

Understanding ibeacon distancing

Trying to grasp a basic concept of how distancing with ibeacon (beacon/ Bluetooth-lowenergy/BLE) can work. Is there any true documentation on how far exactly an ibeacon can measure. Lets say I am 300 feet away...is it possible for an ibeacon to detect this?
Specifically for v4 &. v5 and with iOS but generally any BLE device.
How does Bluetooth frequency & throughput affect this? Can beacon devices enhance or restrict the distance / improve upon underlying BLE?
ie
| Range | Freq | T/sec | Topo |
|–—–––––––––––|–—––––––––––|–—––––––––––|–—––––––––––|
Bluetooth v2.1 | Up to 100 m | < 2.481ghz | < 2.1mbit | scatternet |
|-------------|------------|------------|------------|
Bluetooth v4 | ? | < 2.481ghz | < 305kbit | mesh |
|-------------|------------|------------|------------|
Bluetooth v5 | ? | < 2.481ghz | < 1306kbit | mesh |
The distance estimate provided by iOS is based on the ratio of the beacon signal strength (rssi) over the calibrated transmitter power (txPower). The txPower is the known measured signal strength in rssi at 1 meter away. Each beacon must be calibrated with this txPower value to allow accurate distance estimates.
While the distance estimates are useful, they are not perfect, and require that you control for other variables. Be sure you read up on the complexities and limitations before misusing this.
When we were building the Android iBeacon library, we had to come up with our own independent algorithm because the iOS CoreLocation source code is not available. We measured a bunch of rssi measurements at known distances, then did a best fit curve to match our data points. The algorithm we came up with is shown below as Java code.
Note that the term "accuracy" here is iOS speak for distance in meters. This formula isn't perfect, but it roughly approximates what iOS does.
protected static double calculateAccuracy(int txPower, double rssi) {
if (rssi == 0) {
return -1.0; // if we cannot determine accuracy, return -1.
}
double ratio = rssi*1.0/txPower;
if (ratio < 1.0) {
return Math.pow(ratio,10);
}
else {
double accuracy = (0.89976)*Math.pow(ratio,7.7095) + 0.111;
return accuracy;
}
}
Note: The values 0.89976, 7.7095 and 0.111 are the three constants calculated when solving for a best fit curve to our measured data points. YMMV
I'm very thoroughly investigating the matter of accuracy/rssi/proximity with iBeacons and I really really think that all the resources in the Internet (blogs, posts in StackOverflow) get it wrong.
davidgyoung (accepted answer, > 100 upvotes) says:
Note that the term "accuracy" here is iOS speak for distance in meters.
Actually, most people say this but I have no idea why! Documentation makes it very very clear that CLBeacon.proximity:
Indicates the one sigma horizontal accuracy in meters. Use this property to differentiate between beacons with the same proximity value. Do not use it to identify a precise location for the beacon. Accuracy values may fluctuate due to RF interference.
Let me repeat: one sigma accuracy in meters. All 10 top pages in google on the subject has term "one sigma" only in quotation from docs, but none of them analyses the term, which is core to understand this.
Very important is to explain what is actually one sigma accuracy. Following URLs to start with: http://en.wikipedia.org/wiki/Standard_error, http://en.wikipedia.org/wiki/Uncertainty
In physical world, when you make some measurement, you always get different results (because of noise, distortion, etc) and very often results form Gaussian distribution. There are two main parameters describing Gaussian curve:
mean (which is easy to understand, it's value for which peak of the curve occurs).
standard deviation, which says how wide or narrow the curve is. The narrower curve, the better accuracy, because all results are close to each other. If curve is wide and not steep, then it means that measurements of the same phenomenon differ very much from each other, so measurement has a bad quality.
one sigma is another way to describe how narrow/wide is gaussian curve.
It simply says that if mean of measurement is X, and one sigma is σ, then 68% of all measurements will be between X - σ and X + σ.
Example. We measure distance and get a gaussian distribution as a result. The mean is 10m. If σ is 4m, then it means that 68% of measurements were between 6m and 14m.
When we measure distance with beacons, we get RSSI and 1-meter calibration value, which allow us to measure distance in meters. But every measurement gives different values, which form gaussian curve. And one sigma (and accuracy) is accuracy of the measurement, not distance!
It may be misleading, because when we move beacon further away, one sigma actually increases because signal is worse. But with different beacon power-levels we can get totally different accuracy values without actually changing distance. The higher power, the less error.
There is a blog post which thoroughly analyses the matter: http://blog.shinetech.com/2014/02/17/the-beacon-experiments-low-energy-bluetooth-devices-in-action/
Author has a hypothesis that accuracy is actually distance. He claims that beacons from Kontakt.io are faulty beacuse when he increased power to the max value, accuracy value was very small for 1, 5 and even 15 meters. Before increasing power, accuracy was quite close to the distance values. I personally think that it's correct, because the higher power level, the less impact of interference. And it's strange why Estimote beacons don't behave this way.
I'm not saying I'm 100% right, but apart from being iOS developer I have degree in wireless electronics and I think that we shouldn't ignore "one sigma" term from docs and I would like to start discussion about it.
It may be possible that Apple's algorithm for accuracy just collects recent measurements and analyses the gaussian distribution of them. And that's how it sets accuracy. I wouldn't exclude possibility that they use info form accelerometer to detect whether user is moving (and how fast) in order to reset the previous distribution distance values because they have certainly changed.
The iBeacon output power is measured (calibrated) at a distance of 1 meter. Let's suppose that this is -59 dBm (just an example). The iBeacon will include this number as part of its LE advertisment.
The listening device (iPhone, etc), will measure the RSSI of the device. Let's suppose, for example, that this is, say, -72 dBm.
Since these numbers are in dBm, the ratio of the power is actually the difference in dB. So:
ratio_dB = txCalibratedPower - RSSI
To convert that into a linear ratio, we use the standard formula for dB:
ratio_linear = 10 ^ (ratio_dB / 10)
If we assume conservation of energy, then the signal strength must fall off as 1/r^2. So:
power = power_at_1_meter / r^2. Solving for r, we get:
r = sqrt(ratio_linear)
In Javascript, the code would look like this:
function getRange(txCalibratedPower, rssi) {
var ratio_db = txCalibratedPower - rssi;
var ratio_linear = Math.pow(10, ratio_db / 10);
var r = Math.sqrt(ratio_linear);
return r;
}
Note, that, if you're inside a steel building, then perhaps there will be internal reflections that make the signal decay slower than 1/r^2. If the signal passes through a human body (water) then the signal will be attenuated. It's very likely that the antenna doesn't have equal gain in all directions. Metal objects in the room may create strange interference patterns. Etc, etc... YMMV.
iBeacon uses Bluetooth Low Energy(LE) to keep aware of locations, and the distance/range of Bluetooth LE is 160ft (http://en.wikipedia.org/wiki/Bluetooth_low_energy).
Distances to the source of iBeacon-formatted advertisement packets are estimated from the signal path attenuation calculated by comparing the measured received signal strength to the claimed transmit power which the transmitter is supposed to encode in the advertising data.
A path loss based scheme like this is only approximate and is subject to variation with things like antenna angles, intervening objects, and presumably a noisy RF environment. In comparison, systems really designed for distance measurement (GPS, Radar, etc) rely on precise measurements of propagation time, in same cases even examining the phase of the signal.
As Jiaru points out, 160 ft is probably beyond the intended range, but that doesn't necessarily mean that a packet will never get through, only that one shouldn't expect it to work at that distance.
With multiple phones and beacons at the same location, it's going to be difficult to measure proximity with any high degree of accuracy. Try using the Android "b and l bluetooth le scanner" app, to visualize the signal strengths (distance) variations, for multiple beacons, and you'll quickly discover that complex, adaptive algorithms may be required to provide any form of consistent proximity measurement.
You're going to see lots of solutions simply instructing the user to "please hold your phone here", to reduce customer frustration.

Kohonen Self Organizing Maps: Determining the number of neurons and grid size

I have a large dataset I am trying to do cluster analysis on using SOM. The dataset is HUGE (~ billions of records) and I am not sure what should be the number of neurons and the SOM grid size to start with. Any pointers to some material that talks about estimating the number of neurons and grid size would be greatly appreciated.
Thanks!
Quoting from the som_make function documentation of the som toolbox
It uses a heuristic formula of 'munits = 5*dlen^0.54321'. The
'mapsize' argument influences the final number of map units: a 'big'
map has x4 the default number of map units and a 'small' map has
x0.25 the default number of map units.
dlen is the number of records in your dataset
You can also read about the classic WEBSOM which addresses the issue of large datasets
http://www.cs.indiana.edu/~bmarkine/oral/self-organization-of-a.pdf
http://websom.hut.fi/websom/doc/ps/Lagus04Infosci.pdf
Keep in mind that the map size is also a parameter which is also application specific. Namely it depends on what you want to do with the generated clusters. Large maps produce a large number of small but "compact" clusters (records assigned to each cluster are quite similar). Small maps produce less but more generilized clusters. A "right number of clusters" doesn't exists, especially in real world datasets. It all depends on the detail which you want to examine your dataset.
I have written a function that, with the data set as input, returns the grid size. I rewrote it from the som_topol_struct() function of Matlab's Self Organizing Maps Toolbox into a R function.
topology=function(data)
{
#Determina, para lattice hexagonal, el número de neuronas (munits) y su disposición (msize)
D=data
# munits: número de hexágonos
# dlen: número de sujetos
dlen=dim(data)[1]
dim=dim(data)[2]
munits=ceiling(5*dlen^0.5) # Formula Heurística matlab
#munits=100
#size=c(round(sqrt(munits)),round(munits/(round(sqrt(munits)))))
A=matrix(Inf,nrow=dim,ncol=dim)
for (i in 1:dim)
{
D[,i]=D[,i]-mean(D[is.finite(D[,i]),i])
}
for (i in 1:dim){
for (j in i:dim){
c=D[,i]*D[,j]
c=c[is.finite(c)];
A[i,j]=sum(c)/length(c)
A[j,i]=A[i,j]
}
}
VS=eigen(A)
eigval=sort(VS$values)
if (eigval[length(eigval)]==0 | eigval[length(eigval)-1]*munits<eigval[length(eigval)]){
ratio=1
}else{
ratio=sqrt(eigval[length(eigval)]/eigval[length(eigval)-1])}
size1=min(munits,round(sqrt(munits/ratio*sqrt(0.75))))
size2=round(munits/size1)
return(list(munits=munits,msize=sort(c(size1,size2),decreasing=TRUE)))
}
hope it helps...
Iván Vallés-Pérez
I don't have a reference for it, but I would suggest starting off by using approximately 10 SOM neurons per expected class in your dataset. For example, if you think your dataset consists of 8 separate components, go for a map with 9x9 neurons. This is completely just a ballpark heuristic though.
If you'd like the data to drive the topology of your SOM a bit more directly, try one of the SOM variants that change topology during training:
Growing SOM
Growing Neural Gas
Unfortunately these algorithms involve even more parameter tuning than plain SOM, but they might work for your application.
Kohenon has written on the issue of selecting parameters and map size for SOM in his book "MATLAB Implementations and Applications of the Self-Organizing Map". In some cases, he suggest the initial values can be arrived at after testing several sizes of the SOM to check that the cluster structures were shown with sufficient resolution and statistical accuracy.
my suggestion would be the following
SOM is distantly related to correspondence analysis. In statistics, they use 5*r^2 as a rule of thumb, where r is the number of rows/columns in a square setup
usually, one should use some criterion that is based on the data itself, meaning that you need some criterion for estimating the homogeneity. If a certain threshold would be violated, you would need more nodes. For checking the homogeneity you would need some records per node. Agai, from statistics you could learn that for simple tests (small number of variables) you would need around 20 records, for more advanced tests on some variables at least 8 records.
remember that the SOM represents a predictive model. So validation is the key, absolutely mandatory. Yet, validation of predictive models (see typeI / II error entry in Wiki) is a subject on its own. And the acceptable risk as well as the risk structure also depend fully on your purpose.
You may test the dynamics of the error rate of the model by reducing its size more and more. Then take the smallest one with acceptable error.
It is a strength of the SOM to allow for empty nodes. Yet, there should not be too much of them. Let me say, less than 5%.
Taken all together, from experience, I would recommend the following criterion a minimum of the absolute number of 8..10 records, but those should not be more than 5% of all clusters.
Those 5% rule is of of course a heuristics, which however can be justified by the general usage of the confidence level in statistical tests. You may choose any percentage from 1% to 5%.

Learning of Outcome Space Given Noisy Actions and Non-Monotonic Reinforcment

I'm looking to construct or adapt a model preferably based in RL theory that can solve the following problem. Would greatly appreciate any guidance or pointers.
I have a continuous action space, where actions can be chosen from the range 10-100 (inclusive). Each action is associated with a certain reinforcement value, ranging from 0 to 1 (also inclusive) according to a value function. So far, so good. Here's where I start to get in over my head:
Complication 1:
The value function V maps actions to reinforcement according to the distance between a given action x and a target action A. The less the distance between the two, the greater the reinforcement (that is, reinforcement is inversely proportional to abs(A - x). However, the value function is only nonzero for actions close to A ( abs(A - x) is less than, say, epsilon) and zero elsewhere. So:
**V** is proportional to 1 / abs(**A** - **x**) for abs(**A** - **x**) < epsilon, and
**V** = 0 for abs(**A** - **x**) > epsilon.
Complication 2:
I do not know precisely what actions have been taken at each step. I know roughly what they are, such that I know they belong to the range x +/- sigma, but cannot exactly associate a single action value with the reinforcement I receive.
The precise problem I would like to solve is as follows: I have a series of noisy action estimates and exact reinforcement values (e.g. on trial 1 I might have x of ~15-30 and reinforcement of 0; on trial 2 I might have x of ~25-40 and reinforcement of 0; on trial 3, x of ~80-95 and reinforcment of 0.6.) I would like to construct a model which represents the estimate of the most likely location of the target action A after each step, probably weighting new information according to some learning rate parameter (since certainty will increase with increasing samples).
This journal article which may be relevant: It addresses delayed rewards and robust learning in the presence of noise and inconsistent rewards.
"Rare neural correlations implement robot conditioning with delayed rewards and disturbances"
Specifically, they trace (remember) which synapses (or actions) had been firing before a reward event and reinforce all of them, where the amount of the reinforcement decays with time between the action and the reward.
An individual reward event will reward any synapses which happen to be firing before the reward (or actions performed) including those irrelevant to the reward. However, with a suitable learning rate, this should stabilize over a handful of iterations, with only the desired action being consistently rewarded and reinforced.

Resources