Parsing a dimension field with variable formatting in Teradata?

I have a dimension field that holds data in the formats below. I am using Teradata to query this field.
10 x 10 x 10
5.0x6x7
10 x 12x 1
6.0 x6.0 x6.0
0 X 0 X 0
I was wondering how I should go about parsing this field to extract just the numbers into 3 different columns.

Something like this should work or at least get you close.
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 1, 'i', 1) AS length,
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 2, 'i', 1) AS width,
REGEXP_SUBSTR(DATA, '(.*?)(x ?|$)', 1, 3, 'i', 1) AS height
These return the first captured group: a lazy run of characters followed by either a case-insensitive 'x' with an optional space, or the end of the string. The 4th argument is the occurrence of the match to return.
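To see what that pattern actually captures, here is a quick sketch in Python's re module run against the sample values from the question (for illustration only; picking the nth match here plays the role of the occurrence argument in the Teradata call):
import re

samples = ["10 x 10 x 10", "5.0x6x7", "10 x 12x 1", "6.0 x6.0 x6.0", "0 X 0 X 0"]
# Same pattern as above: a lazy group of characters followed by a
# case-insensitive 'x' with an optional space, or the end of the string.
pattern = re.compile(r'(.*?)(x ?|$)', re.IGNORECASE)
for s in samples:
    length, width, height = [m.group(1).strip() for m in pattern.finditer(s)][:3]
    print(length, width, height)  # e.g. "10 10 10", "5.0 6 7", ...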

Related

How to understand this sentence?

'Let V be the set of intensity values used to define adjacency. In a binary image, V = {1} if we are referring to adjacency of pixels with value 1.'
I read this sentence in Section 2.5, 'Adjacency, Connectivity, Regions, and Boundaries', of Digital Image Processing, 4th edition, by Gonzalez.
I just don't understand it. Does it mean that if a pixel's value is 1, then V = {1}? Is V determined by the value of the pixel? A pixel's intensity value can only be one particular integer, and so can V, so why is V called a 'set' rather than an integer?
How do I correctly understand this sentence?
1st case:
When V={1}
It simply means that two neighbouring pixels must both have value 1 to be considered connected.
Example: V={1} and 4-connectivity:
2nd case:
When V={20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30}
It means that two neighbouring pixels must both have one of these values to be considered connected.
Example: V = {2, 3, 5} and 4-connectivity:
In this picture there is no 4-connectivity between the two pixels (they are marked in gray, and both have value 3). However, there is 8-connectivity. See:
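To make the role of V concrete, here is a small sketch (my own, not from the book) that checks whether two pixels are 4-adjacent for a given set V:
def four_adjacent(img, p, q, V):
    # img is a 2D list of intensity values; p and q are (row, col) tuples.
    # Two pixels are 4-adjacent if both their values are in V and they are
    # 4-neighbours (directly above/below/left/right of each other).
    (pr, pc), (qr, qc) = p, q
    values_in_V = img[pr][pc] in V and img[qr][qc] in V
    are_4_neighbours = abs(pr - qr) + abs(pc - qc) == 1
    return values_in_V and are_4_neighbours

img = [[0, 1, 0],
       [1, 1, 0],
       [0, 0, 1]]
print(four_adjacent(img, (0, 1), (1, 1), V={1}))  # True: both are 1 and they touch
print(four_adjacent(img, (1, 1), (2, 2), V={1}))  # False: diagonal neighbours only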

How to get solution report from lp_select gem (lpsolve)

Thank you for your time.
I couldn't find how to get the variable values after solving.
Make a three-row, five-column model
@lp = LPSolve::make_lp(3, 5)
Set some column names
LPSolve::set_col_name(@lp, 1, "fred")
LPSolve::set_col_name(@lp, 2, "bob")
Add a constraint and a row name; the API expects a 1-indexed array
constraint_vars = [0, 0, 1]
FFI::MemoryPointer.new(:double, constraint_vars.size) do |p|
p.write_array_of_double(constraint_vars)
LPSolve::add_constraint(@lp, p, LPSelect::EQ, 1.0.to_f)
end
LPSolve::set_row_name(@lp, 1, "onlyBob")
Set the objective function and minimize it
constraint_vars = [0, 1.0, 3.0]
FFI::MemoryPointer.new(:double, constraint_vars.size) do |p|
p.write_array_of_double(constraint_vars)
LPSolve::set_obj_fn(@lp, p)
end
LPSolve::set_minim(@lp)
Solve it and retrieve the result
LPSolve::solve(@lp)
@objective = LPSolve::get_objective(@lp)
Output
Model name: '' - run #1
Objective: Minimize(R0)
SUBMITTED
Model size: 4 constraints, 5 variables, 1 non-zeros.
Sets: 0 GUB, 0 SOS.
Using DUAL simplex for phase 1 and PRIMAL simplex for phase 2.
The primal and dual simplex pricing strategy set to 'Devex'.
Optimal solution 3 after 1 iter.
Excellent numeric accuracy ||*|| = 0
MEMO: lp_solve version 5.5.0.15 for 64 bit OS, with 64 bit REAL
variables.
In the total iteration count 1, 0 (0.0%) were bound flips.
There were 0 refactorizations, 0 triggered by time and 0 by density.
... on average 1.0 major pivots per refactorization.
The largest [LUSOL v2.2.1.0] fact(B) had 5 NZ entries, 1.0x largest basis.
The constraint matrix inf-norm is 1, with a dynamic range of 1.
Time to load data was 0.031 seconds, presolve used 0.000 seconds,
... 0.000 seconds in simplex solver, in total 0.031 seconds. => 3.0
retvals = []
FFI::MemoryPointer.new(:double, 2) do |p|
err = LPSolve::get_variables(@lp, p)
retvals = p.get_array_of_double(0,2)
end
retvals[0]
retvals[1]
give the solution values.

if (freq) x$counts else x$density length > 1 and only the first element will be used

For my thesis I have to calculate the number of workers at risk of substitution by machines. I have calculated the probability of substitution (X) and the number of employees at risk (Y) for each occupation category. I have a dataset like this:
X Y
1 0.1300 0
2 0.1000 0
3 0.0841 1513
4 0.0221 287
5 0.1175 3641
....
700 0.9875 4000
I tried to plot a histogram with this command:
hist(dataset1$X,dataset1$Y,xlim=c(0,1),ylim=c(0,30000),breaks=100,main="Distribution",xlab="Probability",ylab="Number of employee")
But I get this error:
In if (freq) x$counts else x$density
length > 1 and only the first element will be used
Can someone tell me what the problem is and what the right command would be?
Thank you!
It is worth pointing out that the message displayed is a warning, not an error, and it should not prevent the results from being plotted. (The warning itself appears because hist() has no second vector argument for counts, so dataset1$Y ends up matched to the freq argument.) However, it does indicate there are some issues with the data.
Without the full dataset it is not 100% obvious what the problem is. I believe it is caused by the data not being in the correct format, with two potential issues. Firstly, some rows have a Y value of 0, and these won't show up in the histogram. Secondly, the observations appear to be inconsistently spaced.
Histograms are best built from one of two datasets:
A dataframe that has already been aggregated into consistently sized bins.
A vector of X values with one entry per observation in the data.
I prefer the second technique. The expandRows() function from the splitstackshape package can be used to repeat each row of the dataframe according to its number of observations:
set.seed(123)
# Example data: X is a probability, Y a whole-number count of employees
dataset1 <- data.frame(X = runif(900, 0, 1), Y = round(runif(900, 0, 1000)))
library(splitstackshape)
# Repeat each row Y times, then plot the histogram of the expanded X values
dataset2 <- expandRows(dataset1, "Y")
hist(dataset2$X, xlim = c(0, 1))
For the first technique, the X values can instead be binned directly with cut():
dataset1$bins <- cut(dataset1$X, breaks = seq(0, 1, 0.01), labels = FALSE)

Dijkstra algorithm under constraint

I have N vertices, one of which is the source. I would like to find the shortest path that connects all the vertices (so an N-step path), with the constraint that not every vertex can be visited at every step.
A network is defined by N, the number of vertices; the source; the cost to travel between each pair of vertices; and, for each step, the list of vertices that can be visited.
For example, if N = 5 and the vertices are 1 (the source), 2, 3, 4 and 5, the list [[2, 3, 4], [2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5]] means that at step 2 only vertices 2, 3 and 4 can be visited, and so forth...
I can't figure out how to adapt Dijkstra's algorithm to my problem, and I would really appreciate some ideas. Or maybe a better option is to use something else; are there other algorithms that can handle this problem?
Note: I posted the same question on math.stackexchange; I apologize if it is considered a duplicate.
You don't need any adaptation. Dijkstra's algorithm will work fine under these constraints.
Following your example:
Starting from vertex 1 we can get to 2 (let's suppose the edge cost is d = 2), 3 (d = 7) and 4 (d = 11) - the current distance values are [0, 2, 7, 11, N/A]
Next, pick the unvisited vertex with the shortest distance (vertex 2) - from it we can get back to already-visited vertices (which shouldn't be counted), or to 3 (d = 3), 4 (d = 4) or 5 (d = 9). We see that we can reach vertex 3 with distance 2 + 3 = 5 < 7, which is shorter than 7, so we update the value. The same goes for vertex 4 (2 + 4 = 6 < 11), and vertex 5 gets 2 + 9 = 11 - the current values are [0, 2, 5, 6, 11]
Mark the vertices we have visited and follow the algorithm until all the vertices have been selected.
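For reference, here is a minimal sketch of plain Dijkstra in Python; the edge costs below are made up to match the walkthrough, and the rest of the graph is hypothetical:
import heapq

def dijkstra(adj, source):
    # adj maps each vertex to a list of (neighbour, edge_cost) pairs.
    dist = {v: float('inf') for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry; u was already settled with a shorter distance
        for v, cost in adj[u]:
            if d + cost < dist[v]:
                dist[v] = d + cost
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical costs chosen to reproduce the distances in the walkthrough above.
adj = {1: [(2, 2), (3, 7), (4, 11)],
       2: [(3, 3), (4, 4), (5, 9)],
       3: [], 4: [], 5: []}
print(dijkstra(adj, 1))  # {1: 0, 2: 2, 3: 5, 4: 6, 5: 11}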

How to find same edges of two paths?

A path is represented by a vector containing node ids. The edges in a path are directed.
Given two paths, for example <1,6,11,7,2,5 ...> and <3, 4, 8, 2, 7, 3, 1, 6 ...>, here <1,6> is a shared edge. Sometimes the shared edges are consecutive, sometimes not, so it's better to put a flag between the runs of edges. For example,
(1,2) * (5,7,9) * (6,11,12) means the shared edges are 1->2, 5->7, 7->9, 6->11 and 11->12, but there are no shared edges from 2 to 5 or from 9 to 6, so a '*' or some other symbol is put in as a flag.
Does anyone have any ideas? I would appreciate it.
Assume each node has only one incoming and one outgoing edge.
Call P1 the first path, of length n, and P2 the second path, of length m. You can turn P2 into a hashmap from the start node of each edge to its end node (e.g. <3,4,5> would become [3->4, 4->5]).
Then, for each index i of P1, you compare P1(i+1) to Hashmap(key = P1(i)). If the hashmap doesn't have the key, or has it but with a different value, you don't have a common edge; otherwise you do.
(If a node can have multiple outgoing edges, the values of the hashmap can be sets of ints, and you check whether P1(i+1) is contained in Hashmap(key = P1(i)), as in the sketch below.)
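A quick sketch of that lookup in Python (handling the multiple-edges case with sets; the names are illustrative):
def common_edges(p1, p2):
    # Map each node in p2 to the set of nodes it points to.
    next_nodes = {}
    for a, b in zip(p2, p2[1:]):
        next_nodes.setdefault(a, set()).add(b)
    # An edge (a, b) of p1 is shared if b is among the successors of a in p2.
    return {(a, b) for a, b in zip(p1, p1[1:]) if b in next_nodes.get(a, set())}

print(common_edges([1, 6, 11, 7, 2, 5], [3, 4, 8, 2, 7, 3, 1, 6]))  # {(1, 6)}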
Here's an example solution in Clojure:
(defn same-edges [& paths]
  (->> paths
       (map (comp set (partial partition 2 1)))
       (apply clojure.set/intersection)))
So, for each path (map over all paths), you partition the path into 2-element subpaths (using a step of 1 to get all pairs of adjacent items), then calculate the set of all unique pairs attained from that partition. Then you find the intersection of all those sets.
Example:
(same-edges [1 6 11 7 2 5] [3 4 8 2 7 3 1 6])
;=> #{(1 6)}
In other words, the set of shared edges between the two paths represented by the vectors [1 6 11 7 2 5] and [3 4 8 2 7 3 1 6] contains only one item: the pair (1 6).
