What can be safely assumed about neighborhood operations near the edge/border? - image-processing

What can be safely assumed when the authors of a research article do not say, mention, or hint at anything about how they dealt with neighborhood operations close to the image border?
My question may seem naive, as some options are mentioned at https://en.wikipedia.org/wiki/Neighborhood_operation.
I am replicating a work reported in a journal article, where a 300x300 neighborhood around the current_point is used for the computations. The authors did not mention how they dealt with border cases.

There are a couple of ways to deal with borders:
1) Crop: Just get rid of the pixels outside the image. Typically implemented in software by filling in the outside values as 0s. Example:
          0 0 0 0 0
1 2 3     0 1 2 3 0
4 5 6 --> 0 4 5 6 0
7 8 9     0 7 8 9 0
          0 0 0 0 0
2) Extend: Simply "copy" the nearest edge pixels to the out of bounds areas. Example,
          1 1 2 3 3
1 2 3     1 1 2 3 3
4 5 6 --> 4 4 5 6 6
7 8 9     7 7 8 9 9
          7 7 8 9 9
or, keep going for however far your neighborhood/kernel needs to be.
3) Wrap: Just like Pacman. Example:
          9 7 8 9 7
1 2 3     3 1 2 3 1
4 5 6 --> 6 4 5 6 4
7 8 9     9 7 8 9 7
          3 1 2 3 1
In this case I arbitrarily chose to wrap diagonally (copied opposite corners). Some people like to interpolate the corners instead. I think this type of edge handling can be particularly useful if you plan on doing a Fourier Transform on your data (or if it's already in frequency space, the same idea as any kind of spectral periodic wrapping), but I'm not really sure; I've never used it in practice.
4) Reflection: This is a method I've never used either, but I have heard of it. For example:
1 2 3     2 1 1 2 3 3 2
4 5 6 --> 5 4 4 5 6 6 5
7 8 9     8 7 7 8 9 9 8
I chose not to pad in the top/bottom there, as it would be verbose.
It also gets kind of tricky doing off-diagonals with some of these methods. You can either extend columns as needed to try to find the diagonals you might need, or interpolate to get the value.
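As a hedged illustration of the four strategies (not from the original answer; it assumes the image lives in a NumPy array), numpy.pad can produce each padding in one call. Note that the reflection example above, which repeats the edge pixel, corresponds to NumPy's "symmetric" mode rather than "reflect":

import numpy as np

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

# Pad by one pixel on every side with each border strategy.
crop_like  = np.pad(img, 1, mode="constant", constant_values=0)  # 1) zero fill
extend     = np.pad(img, 1, mode="edge")                         # 2) replicate edge pixels
wrap       = np.pad(img, 1, mode="wrap")                         # 3) periodic / Pac-Man
reflection = np.pad(img, 1, mode="symmetric")                    # 4) mirror, edge pixel included

for name, padded in (("crop", crop_like), ("extend", extend),
                     ("wrap", wrap), ("reflect", reflection)):
    print(name)
    print(padded)

For a 300x300 neighborhood you would pad by 150 on each side instead of 1; which mode to pick is exactly the assumption the paper leaves unstated.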

In the case of edge points, it totally depends on the operation you are performing. You need to look at what kind of operation you are doing with the image (especially at the edges/boundary).
The simplest way is to use zero padding.
         0 0 0 0 0
1 2 3 => 0 1 2 3 0
         0 0 0 0 0
I don't know how you are implementing it (MATLAB or OpenCV?).
The following link may be helpful for a MATLAB implementation:
MATLAB Neighborhood Operations
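If it happens to be OpenCV, a minimal sketch of the padding step (shown here with the Python cv2 bindings; this is my addition, not part of the original answer) would be:

import cv2
import numpy as np

img = np.arange(1, 10, dtype=np.uint8).reshape(3, 3)

# Add a 1-pixel border on every side; swap the border type to match whatever
# convention the paper (implicitly) used.
padded = cv2.copyMakeBorder(img, 1, 1, 1, 1, cv2.BORDER_CONSTANT, value=0)
# Alternatives: cv2.BORDER_REPLICATE, cv2.BORDER_WRAP, cv2.BORDER_REFLECT
print(padded)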

Related

Comsol: Infinite Element Domain module

I want to simulate a 2D heat transfer process in the subsurface on a region that is infinite in the r-direction. The very basic way to model this is to draw a geometry that is very long in the r-direction. I have done this, and the results I obtain are correct, since they match the analytical solution. Comsol also has a capability called Infinite Element Domain that serves exactly this purpose: you define a bounded geometry on which you want to solve the PDE and draw a small adjacent domain acting as the Infinite Element Domain. However, in this case the results are not correct, because they do not match the analytical solution. Is there anything I am missing to correctly use the Infinite Element Domain in Comsol?
Any help or comment would be appreciated.
Edit:
I edited the post to be more specific.
Please consider the following figure where a fluid with high temperature is being injected into a region with lower temperature:
https://i.stack.imgur.com/BQycC.png
The equation to solve is:
https://i.stack.imgur.com/qrZcK.png
With the following initial and boundary conditions (note that the upper and lower boundary condition is no-flux):
https://i.stack.imgur.com/l7pHo.png
We want to obtain the temperature profile over the interval rw < r < 140 m (rw is very small, equal to 0.005 m here) at different times. One way to model this numerically in Comsol is to draw a rectangle that is 2000 m long in the r-direction and read off the results only over the span r in [rw, 140] m:
https://i.stack.imgur.com/BKCOi.png
The results in this case are fine, because they match the analytical solution well.
Another way to model this is to replace the above geometry with a bounded one spanning [rw, 140] m in the r-direction and then augment it with an Infinite Element Domain that uses a mapped mesh, as follows:
https://i.stack.imgur.com/m9ksm.png
Here, I have set the thickness of the Infinite Element Domain to 10 m in the r-direction. However, the results in this case do not match the analytical solution (or the above case, where the Infinite Element Domain was not used). Is there anything I am missing in Comsol? I have also changed some Infinite Element settings, such as the physical width or distance, but I did not see any change in the results.
BTW, here are the results:
https://i.stack.imgur.com/cdaPH.png

How to scale % change based features so that they are viewed "similarly" by the model

I have some features that are zero-centered and are supposed to represent the change between a current value and a previous value. Generally speaking, I believe there should be some symmetry between these values, i.e. there should be roughly the same number of positive values as negative values, and the values should operate on roughly the same scale.
When I try to scale my samples using MaxAbsScaler, I notice that the negative values for this feature get almost completely drowned out by the positive values, and I don't really have any reason to believe my positive values should be that much larger than my negative values.
What I've noticed is that, fundamentally, the magnitudes of percentage-change values are not symmetrical in scale. For example, if a value goes from 50 to 200, that is a 300.0% change; if it goes from 200 to 50, that is a -75.0% change. I understand there is a reason for this, but in terms of my feature, I don't see a reason why a change from 50 to 200 should be four times more "important" than the same change in value in the opposite direction.
Given this, I do not believe there is any reason to want my model to treat a change of 200 to 50 as a "lesser" change than a change of 50 to 200. Since I am trying to represent the change of a value over time, I want to abstract this pattern so that my model can "visualize" the change of a value over time the same way a person would.
Right now I am solving this with the following formula:
def change(prev, curr):
    if curr > prev:
        return curr / prev - 1
    else:
        return (prev / curr - 1) * -1
And this does seem to treat changes in value similarly regardless of the direction, i.e. from the example above, 50 to 200 = 300 and 200 to 50 = -300. Is there a reason why I shouldn't be doing this? Does this accomplish my goal? Has anyone run into similar dilemmas?
This is a discussion question and it's difficult to know the right answer without knowing the physical relevance of your feature. You are calculating a percentage change, and a percent change depends on the original value. I am not a big fan of a custom formula whose only purpose is to make percent change symmetric, since it adds a layer of complexity that is, in my opinion, unnecessary.
If you want change to be symmetric, you can try a direct difference or a factor change. There's nothing to suggest that a difference or a factor change is less correct than a percent change. So, depending on the physical relevance of your feature, each of the following symmetric measures would be a correct way to measure change:
Difference change -> 50 to 200 yields 150, 200 to 50 yields -150
Factor change with logarithm -> 50 to 200 yields log(4), 200 to 50 yields log(1/4) = -log(4)
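A minimal sketch comparing the three measures (plain Python/NumPy; the function names are mine, for illustration only):

import numpy as np

def pct_change(prev, curr):
    return curr / prev - 1       # asymmetric: +300% vs. -75%

def diff_change(prev, curr):
    return curr - prev           # symmetric: +150 vs. -150

def log_change(prev, curr):
    return np.log(curr / prev)   # symmetric: +log(4) vs. -log(4)

for prev, curr in [(50, 200), (200, 50)]:
    print(prev, "->", curr,
          round(pct_change(prev, curr), 3),
          diff_change(prev, curr),
          round(log_change(prev, curr), 3))

The log version also has the nice property that changes over consecutive time steps add up, which tends to play well with standard scalers.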
You're having trouble because you haven't brought the abstract questions into your paradigm.
"... my model can "visualize" ... same way a person would."
In this paradigm, you need a metric for "same way". There is no such empirical standard. You've dropped both of the simple standards -- relative error and absolute error -- and you posit some inherently "normal" standard that doesn't exist.
Yes, we run into these dilemmas: choosing a success metric. You've chosen a classic example from "How To Lie With Statistics"; depending on the choice of starting and finishing proportions and the error metric, you can "prove" all sorts of things.
This brings us to your central question:
Does this accomplish my goal?
We don't know. First of all, you haven't given us your actual goal. Rather, you've given us an indefinite description and a single example of two data points. Second, you're asking the wrong entity. Make your changes, run the model on your data set, and examine the properties of the resulting predictions. Do those properties satisfy your desired end result?
For instance, given your posted data points, (200, 50) and (50, 200), how would other examples fit in, such as (1, 4), (1000, 10), etc.? If you're simply training on the proportion of change over the full range of values involved in that transaction, your proposal is just what you need: use the higher value as the basis. Since you didn't post any representative data, we have no idea what sort of distribution you have.

objective-c looking for algorithm

In my application I need to determine what plates a user can load on their barbell to achieve the desired weight.
For example, the user might specify they are using a 45 lb bar and have 45, 35, 25, 10, 5, and 2.5 pound plates to use. For a weight like 115, this is an easy problem to solve, as the result neatly matches a common plate: (115 - 45) / 2 = 35.
So the objective here is to find the largest to smallest plate(s) (from a selection) the user needs to achieve the weight.
My starter method looks like this...
- (void)imperialNonOlympic:(float)barbellWeight workingWeight:(float)workingWeight {
    float realWeight = (workingWeight - barbellWeight);
    float perSide = realWeight / 2;
    // ... lots of inefficient mod and division ...
}
My thought process is to first determine what the weight per side would be: (total weight - barbell weight) / 2. Then determine the largest to smallest plates needed, and the number of each (e.g. for a total of 325, each side is (325 - 45) / 2 = 140, which is 45 * 3 + 5, i.e. 45, 45, 45, 5).
Messing around with fmodf and a couple of other ideas, it occurred to me that there might be a standard algorithm that solves this problem. I was looking into BFS, and admit that it is above my head, but I'm still willing to give it a shot.
Appreciate any tips on where to look in algorithms or code examples.
Your problem is the Knapsack problem. You will find a lot of solutions for it, and there are several variants of the problem. It is basically a Dynamic Programming (DP) problem.
One common approach is greedy: start by taking the largest weight that is not more than your desired weight, then take the largest that fits into the remaining weight, and so on. It's easy. I am adding some more links (Link 1, Link 2, Link 3) so that it becomes clearer. Some of the problems there may be hard to understand; skip them and focus on the basic knapsack problem. Good luck.. :)
Let me know if that helps.. :)
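A rough sketch of that greedy idea (in Python rather than Objective-C, just to show the logic; the function name and default plate list are illustrative):

def plates_per_side(bar_weight, target_weight,
                    plates=(45, 35, 25, 10, 5, 2.5)):
    # Each side of the bar carries half of what is left after the bar itself.
    per_side = (target_weight - bar_weight) / 2.0
    chosen = []
    for plate in sorted(plates, reverse=True):
        # Greedily take as many of the current (largest) plate as still fit.
        while per_side >= plate:
            chosen.append(plate)
            per_side -= plate
    return chosen, per_side  # a non-zero remainder means no exact loadout was found

print(plates_per_side(45, 115))  # ([35], 0.0)
print(plates_per_side(45, 325))  # ([45, 45, 45, 5], 0.0)

The greedy pass works for the usual plate denominations; for arbitrary plate sets (or a limited count of each plate) the DP/knapsack formulation in the links is the safer route.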

Genetic Algorithms - Crossover and Mutation operators for paths

I was wondering if anyone knew any intuitive crossover and mutation operators for paths within a graph? Thanks!
The question is a bit old, but the problem doesn't seem to be outdated or solved, so I think my research might still be helpful for someone.
While mutation and crossover are quite trivial in the TSP problem, where every mutation is valid (because the chromosome represents an order of visiting fixed nodes, so swapping the order always creates a valid result), in the case of Shortest Path or Optimal Path, where the chromosome is an exact route representation, this doesn't apply and the operators aren't that obvious. So here is how I approach the problem of solving Optimal Path using a GA.
For crossover, there are few options:
For routes that have at least one common point (besides the start and end nodes): find all common points and swap the subroutes at the place of crossing.
Parent 1: 51 33 41 7 12 91 60
Parent 2: 51 9 33 25 12 43 15 60
Potential crossing points are 33 and 12. Using both of these crossing points, we can get the following children: 51 9 33 41 7 12 43 15 60 and 51 33 25 12 91 60.
When two routes don't have a common point, randomly select two points, one from each parent, and connect them (you can use random traversal, backtracking, or a heuristic search like A* or beam search for that). This connecting path may then be treated as the crossover path. For a better understanding, see the picture of the two crossover methods below:
see http://i.imgur.com/0gDTNAq.png
Black and gray paths are the parents, pink and orange paths are the children, the green point is the crossover place, and the red points are the start and end nodes. The first graph shows the first type of crossover; the second graph is an example of the other one.
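For illustration, a minimal sketch of the common-point crossover with a single crossing point (plain Python; it assumes paths are stored as lists of node ids, which the answer does not specify):

import random

def common_point_crossover(parent1, parent2):
    # Interior nodes shared by both parents (start and end nodes excluded).
    common = [n for n in parent1[1:-1] if n in parent2[1:-1]]
    if not common:
        return None  # fall back to the "connect two random points" variant
    cut = random.choice(common)
    i, j = parent1.index(cut), parent2.index(cut)
    # Swap the tails after the shared node. Children may revisit nodes,
    # so a repair/validation step can still be needed.
    return parent1[:i] + parent2[j:], parent2[:j] + parent1[i:]

p1 = [51, 33, 41, 7, 12, 91, 60]
p2 = [51, 9, 33, 25, 12, 43, 15, 60]
print(common_point_crossover(p1, p2))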
For mutation, there are also a few options. Generally, dummy mutations like swapping the order of nodes or adding a random node are really ineffective for graphs of average density. So here are approaches that guarantee valid mutations:
Randomly take two points from the path and replace the segment between them with a random path between those two nodes.
Chromosome: 51 33 41 7 12 91 60, random points: 33 and 12, random/shortest path between them: 33 29 71 12, mutated chromosome: 51 33 29 71 12 91 60
Find a random point in the path, remove it, and connect its neighbours (really very similar to the first one).
Find a random point in the path and find a random path to its neighbour.
Try sub-traversing the path from some randomly chosen point until reaching any point on the initial route (a slight modification of the first method).
see http://i.imgur.com/19mWPes.png
Each graph corresponds to one of the mutation methods, in order. In the last example, the orange path is the one that would replace the original path between the mutation points (green nodes).
Note: these methods may obviously have a performance drawback when finding an alternative subroute (using a random or heuristic method) gets stuck somewhere or finds a very long and useless subpath, so consider bounding the mutation's execution time or the number of trials.
In my case, which is finding an optimal path in terms of maximizing the sum of vertices' weights while keeping the sum of nodes' weights below a given bound, these methods are quite effective and give good results. Should you have any questions, feel free to ask. Also, sorry for my MS Paint skills ;)
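A minimal sketch of the first mutation method (again plain Python; the bounded random-walk helper and the dict-of-neighbours graph representation are my assumptions, any path-finding routine would do):

import random

def random_path(graph, start, goal, max_steps=100):
    # Bounded random walk from start until goal is reached
    # (see the note above about limiting time / number of trials).
    path = [start]
    for _ in range(max_steps):
        if path[-1] == goal:
            return path
        neighbours = graph.get(path[-1], [])
        if not neighbours:
            return None
        path.append(random.choice(neighbours))
    return None  # give up; the caller can retry or skip this mutation

def mutate(graph, chromosome):
    # Pick two positions on the route and splice in a fresh subpath between them.
    i, j = sorted(random.sample(range(len(chromosome)), 2))
    sub = random_path(graph, chromosome[i], chromosome[j])
    if sub is None:
        return chromosome  # mutation failed within the step budget
    return chromosome[:i] + sub + chromosome[j + 1:]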
Update
One big hint: I basically used this approach in my implementation, but there was one big drawback to using random path generation. I decided to switch to semi-random route generation using shortest paths through randomly picked point(s); it is much more efficient (but obviously may not be applicable to all problems).
Emm.. That is a very difficult question; people write dissertations on it and there is still no right answer.
The general rule is "it all depends on your domain".
There are some generic GA libraries that will do some work for you, but for the best results it is recommended to implement your GA operations yourself, specifically for your domain.
You might have more luck with answers on Theoretical CS, but you need to expand your question more and add more details about your task and domain.
Update:
So you have a graph. In GA terms, a path through the graph represents an individual (the chromosome), and the nodes in the path would be its genes.
In that case I would say a mutation can be represented as a deviation of the path somewhere from the original: one of the nodes is moved somewhere, and the path is adjusted so that the start and end values in the path remain the same.
Mutation can lead to invalid individuals, and in that case you need to make a decision: allow invalid ones and hope that they will converge to some unexplored solution, or kill them on the spot. When I was working with GAs, I did allow invalid solutions, adding an "Unfitness" value along with fitness. Some researchers suggest this can help with broad exploration of the solution space.
Crossover can only happen between paths that cross each other: at the point of the crossing, swap the remainders of the parents' paths.
Bear in mind that there are various ways for crossover: individuals can be crossed-over in multiple points or just in one. In the case with graphs you can have multiple crossing points, and that can naturally lead to the multiple children graphs.
As I said before, there is no right or wrong way of doing this, but you will find out the best way only by experimenting on it.

How to fill in the 'holes' in an irregular spaced grid or array having missing data?

Does anyone have a straightforward Delphi example of filling in a grid using Delaunay triangles or kriging? Either method can fill a grid by 'interpolating.'
What do I want to do? I have a grid, similar to:
22 23 xx 17 19 18 05
21 xx xx xx 17 18 07
22 24 xx xx 18 21 20
30 22 25 xx 22 20 19
28 xx 23 24 22 20 18
22 23 xx 17 23 15 08
21 29 30 22 22 17 09
where the xx's represent grid cells with no data, and the x,y coordinates of each cell are known. Both kriging and Delaunay triangles can supply the 'missing' points (which are, of course, fictitious but reasonable values).
Kriging is a statistical method for filling in 'missing' or unavailable data in a grid with 'reasonable' values. Why would you need it? Principally to 'contour' the data. Contouring algorithms (like CONREC for Delphi, http://local.wasp.uwa.edu.au/~pbourke/papers/conrec/index.html) can contour regularly spaced data. Google around for 'kriging' and 'Delphi' and you are eventually pointed to the GEOBLOCK project on SourceForge (http://geoblock.sourceforge.net/). Geoblock has numerous Delphi .pas units for kriging based on GSLIB (a Fortran statistical package developed at Stanford). However, all the kriging/Delaunay units depend on units referred to in the Delphi uses clause, and unfortunately these 'helper' units are not posted with the rest of the source code. It appears none of the kriging units can stand alone; they rely on helper units that are not posted or, in some cases, on undefined data types.
Delaunay triangulation is described at http://local.wasp.uwa.edu.au/~pbourke/papers/triangulate/index.html. Posted there is a Delphi example, pretty neat, that generates 'triangles.' Unfortunately, I haven't a clue how to use the unit with a static grid; the example 'generates' a data field on the fly.
Has anyone got either of these units to work to fill an irregular data grid? Any code or hints how to use the existing code for kriging a simple grid or using Delaunay to fill in the holes would be appreciated.
I'm writing this as an answer because it's too long to fit into a comment.
Assuming your grid really is irregular (you give no examples of a typical pattern of grid coordinates), then triangulation only partially helps. Once you have triangulated you would then use that triangulation to do an interpolation, and there are different choices that could be made.
But you've not said anything about how you want to interpolate, what you want to do with that interpolation.
It seems to me that you have asked for some code, but it's not clear that you know what algorithm you want. That's really the question you should have asked.
For example, since you appear to have no criteria for how the interpolation should be done, why not choose the nearest neighbour for your missing values? Or why not use the overall mean? Both of these choices meet all the criteria you have specified, since you haven't specified any!
Really I think you need to spend some more time explaining what properties you want this interpolation to have, what you are going to do with it etc. I also think you should stop thinking about code for now and think about algorithms. Since you have mentioned statistics you should consider asking at https://stats.stackexchange.com/.
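For what it's worth, here is a minimal sketch of the nearest-neighbour (or triangulation-based linear) fill suggested above, in Python with SciPy rather than Delphi, just to show the shape of the algorithm; the array mirrors the question's grid with NaN standing in for xx:

import numpy as np
from scipy.interpolate import griddata

grid = np.array([
    [22, 23, np.nan, 17, 19, 18,  5],
    [21, np.nan, np.nan, np.nan, 17, 18,  7],
    [22, 24, np.nan, np.nan, 18, 21, 20],
    [30, 22, 25, np.nan, 22, 20, 19],
    [28, np.nan, 23, 24, 22, 20, 18],
    [22, 23, np.nan, 17, 23, 15,  8],
    [21, 29, 30, 22, 22, 17,  9],
])

known = ~np.isnan(grid)
coords = np.argwhere(known)    # (row, col) of cells that have data
values = grid[known]
holes = np.argwhere(~known)    # cells to fill

# method="nearest" copies the closest known value; method="linear" interpolates
# over the Delaunay triangulation of the known points.
filled = grid.copy()
filled[~known] = griddata(coords, values, holes, method="nearest")
print(filled)

Here the cell indices stand in for the x,y coordinates; since your grid is irregularly spaced, you would pass the real coordinates instead of the row/column indices.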
Code posted by Richard Winston on the Embarcadero Developer Network Code Central, titled "Delaunay triangulation and contouring code" (ID: 29365), demonstrates routines for generating constrained Delaunay triangulations and for plotting contour lines based on data points at arbitrary locations. These algorithms do not manipulate a grid and fill in its holes; they do, however, provide a method for contouring arbitrary data and do not require a grid without missing values.
I still have not found an acceptable kriging algorithm in Pascal that actually fills in the holes in a grid.
