Splitting a string where one item is in parentheses - lua

Please find my code below.
str = "1791 (AR6K Async) S 2 0 0 0 -1 2129984 0 0 0 0 0 113 0 0 20 0 1 0 2370 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 3221520956 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
for val in str:gmatch("%S+") do
print(val)
end
Output:
1791
(AR6K
Async)
S
2
0
0
0
-1
....
But I am expecting the output like,
1791
(AR6K Async)
S
2
0
0
0
-1
...
Can anyone please help me how to get the values in bracket as a single value instead getting separate values.

str = "1791 (AR6K Async) S 2 0 0 0 -1 2129984 0 0 0 0 0 113 0 0 20 0 1 0 2370 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 3221520956 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
for val in str:gsub("%S+","\0%0\0")
:gsub("%b()", function(s) return s:gsub("%z","") end)
:gmatch("%z(.-)%z") do
print(val)
end
Explanation:
1. Surround all spaceless substrings with "zero marks"
(add one binary-zero-character at the beginning and one at the end)
2. Remove "zero marks" from inside parentheses
3. Display all surrounded parts

It may not be possible to use a single lua pattern alone to do this.
However it can be easy to roll your own parsing / splitting of the string or just extend your code a bit to concatenate the parts from a part that starts with ( to the part that ends with )
Here is a small example
str = "1791 (AR6K Async) S 2 0 0 0 -1 2129984 0 0 0 0 0 113 0 0 20 0 1 0 2370 0 0 4294967295 0 0 0 0 0 0 0 2147483647 0 3221520956 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0"
local temp
for val in str:gmatch("%S+") do
if temp then
if val:sub(#val, #val) == ")" then
print(temp.." "..val)
temp = nil
else
temp = temp.." "..val
end
elseif val:sub(1,1) == "(" then
temp = val
else
print(val)
end
end
This code behaves exactly like your own, except that when it encounters a substring that starts with an opening bracket, it will save it to temp variable. Then it will concatenate new values to temp until a substring with the closing bracket at the end of it is encountered. At that point the whole string saved to temp will be printed and temp is set to nil and the loop is continued normally.
So there is just a special case coded for when a string with brackets comes by.
This may not be the most efficient implementation, but it works. Also this assumes that the separators are spaces, since when the strings are concatenated to temp variable, they will be concatenated with an ordinary space. This does not handle nested brackets.
This was just a quick demonstration of the idea however so I believe you can fix these shortcomings on your own as you need to if you use it.

Related

How to get connected components label in a binary image?

I've a binary image where removing green dot gets me separate line segments. I've tried using label_components() function from Julia but it labels only verticall joined pixels as one label.
I'm using
using Images
img=load("current_img.jpg")
img[findall(img.==RGB(0.0,0.1,0.0))].=0 # this makes green pixels same as background, i.e. black
labels = label_components(img)
I'm expecteing all lines which are disjoint to be given a unique label
(as was a funciton in connected component labeling in matlab, but i can't find something similar in julia)
Since you updated the question and added more details to make it clear, I decided to post the answer. Note that this answer utilizes some of the functions that I wrote here; so, if you didn't find documentation for any of the following functions, I refer you to the previous answer. I operated on several examples and brought the results in the continue.
Let's begin with an image similar to the one you brought in the question and perform the entire operation from the scratch. for this, I drew the following:
I want to perform a segmentation process on it and labelize each segment and highlight the segments using the achieved labels.
Let's define the functions:
using Images
using ImageBinarization
function check_adjacent(
loc::CartesianIndex{2},
all_locs::Vector{CartesianIndex{2}}
)
conditions = [
loc - CartesianIndex(0,1) ∈ all_locs,
loc + CartesianIndex(0,1) ∈ all_locs,
loc - CartesianIndex(1,0) ∈ all_locs,
loc + CartesianIndex(1,0) ∈ all_locs,
loc - CartesianIndex(1,1) ∈ all_locs,
loc + CartesianIndex(1,1) ∈ all_locs,
loc - CartesianIndex(1,-1) ∈ all_locs,
loc + CartesianIndex(1,-1) ∈ all_locs
]
return sum(conditions)
end;
function find_the_contour_branches(img::BitMatrix)
img_matrix = convert(Array{Float64}, img)
not_black = findall(!=(0.0), img_matrix)
contours_branches = Vector{CartesianIndex{2}}()
for nb∈not_black
t = check_adjacent(nb, not_black)
(t==1 || t==3) && push!(contours_branches, nb)
end
return contours_branches
end;
"""
HighlightSegments(img::BitMatrix, labels::Matrix{Int64})
Highlight the segments of the image with random colors.
# Arguments
- `img::BitMatrix`: The image to be highlighted.
- `labels::Matrix{Int64}`: The labels of each segment.
# Returns
- `img_matrix::Matrix{RGB}`: A matrix of RGB values.
"""
function HighlightSegments(img::BitMatrix, labels::Matrix{Int64})
colors = [
# Create Random Colors for each label
RGB(rand(), rand(), rand()) for label in 1:maximum(labels)
]
img_matrix = convert(Matrix{RGB}, img)
for seg∈1:maximum(labels)
img_matrix[labels .== seg] .= colors[seg]
end
return img_matrix
end;
"""
find_labels(img_path::String)
Assign a label for each segment.
# Arguments
- `img_path::String`: The path of the image.
# Returns
- `thinned::BitMatrix`: BitMatrix of the thinned image.
- `labels::Matrix{Int64}`: A matrix that contains the labels of each segment.
- `highlighted::Matrix{RGB}`: A matrix of RGB values.
"""
function find_labels(img_path::String)
img::Matrix{RGB} = load(img_path)
gimg = Gray.(img)
bin::BitMatrix = binarize(gimg, UnimodalRosin()) .> 0.5
thinned = thinning(bin)
contours = find_the_contour_branches(thinned)
thinned[contours] .= 0
labels = label_components(thinned, trues(3,3))
highlighted = HighlightSegments(thinned, labels)
return thinned, labels, highlighted
end;
The main function in the above is find_labels which returns
The thinned matrix.
The labels of each segment.
The highlighted image (Matrix, actually).
First, I load the image, and binarize the Gray scaled image. Then, I perform the thinning operation on the binarized image. After that, I find the contours and the branches using the find_the_contour_branches function. Then, I turn the color of contours and branches to black in the thinned image; this gives me neat segments. After that, I labelize the segments using the label_components function. Finally, I highlight the segments using the HighlightSegments function for the sake of visualization (this is the bonus :)).
Let's try it on the image I drew above:
result = find_labels("nU3LE.png")
# you can get the labels Matrix using `result[2]`
# and the highlighted image using `result[3]`
# Also, it's possible to save the highlighted image using:
save("nU3LE_highlighted.png", result[3])
The result is as follows:
Also, I performed the same thing on another image:
julia> result = find_labels("circle.png")
julia> result[2]
14×16 Matrix{Int64}:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0
0 1 1 0 0 0 3 3 0 0 0 5 5 5 0 0
0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
As you can see, the labels are pretty clear. Now let's see the results of performing the procedure in some examples in one glance:
Original Image
Labeled Image

Conditionally Assign Value to Dask Dataframe using Apply

I am trying to iterate through a Dask dataframe and compare the values in one of its columns to a column in another Dask dataframe with the same name. If the columns match I would like to update the value is the target Dask dataframe. The code below runs, but the values are not updated to '1' where I expected, or anywhere. I am new to Dask and suspect I am missing some crucial step or am not understanding the framework.
def populateSymptomsDDF(row):
for vac in row['vac_codes']:
if vac in symptoms_ddf.columns:
symptoms_ddf[vac] = symptoms_ddf[vac].where(symptoms_ddf['dog'] == row['dog'], 1)
with ProgressBar():
x = vac_ddf.apply(lambda x: populateSymptomsDDF(x), meta=('int64'), axis=1)
x.compute(scheduler='processes')
symptoms_ddf.compute()
Head of icd_ddf:
dog vac_codes
0 1 [G35, E11.40, R53.1, Z79.899, I87.2]
1 2 [G35, R53.83, G47.00]
2 3 [G35, G95.9, R53.83, F41.9]
3 4 [G35, N53.9, E55.9, Z74.09]
4 5 [G35, M51.26, R53.1, M47.816, R25.2, G82.50, R...
Head of symptoms_ddf (before running code):
dog W19 W10 W05.0 V00.811 R53.83 R53.8 R53.1 R47.9 R47.89 ... G81.12 G81.11 G81.10 G50.0 G31.84 F52.8 F52.31 F52.22 F52.0 F03
0 1 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
1 2 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
2 3 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
3 4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 5 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
Thank you for any insights you can provide!
Dask dataframes don't have the same in-place behavior as pandas. Generally every operation should be a bulk parallel operation. Otherwise there isn't much reason to use Dask.
Also, iterating through dataframes will generally be quite slow. This is also true with Pandas.
Fortunately, I think that you're maybe just looking for a join or merge operation. I would encourage you to look up the documentation for Pandas merge
https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

R-Package vegan Decorana

enter image description hereI'm new to R and I was trying to run a Detrended correspondence analysis (DCA) which is a multivariate statistical analysis for ordination of species, I have four sites. I keep getting the error message:
> Error rowsums x must be numeric
Species Haasfontein Mini Pit Vlaklaagte Mini Pit Vlaklaagte Block 3 Mini Pit Block 10 Mini Pit
Agrostis lachnantha 1 0 0 0
Aristida congesta subsp. Congesta 0 0 0 0
Brachiaria nigropedata 0 0 0 0
Cynodon dactylon 0 12 2 3
Cyperus esculentus  0 5 0 0
Digitaria eriantha 0 1 6 20
Elionurus muticus 0 0 0 0
Eragrostis acraea De Winter 0 0 1 0
Eragrostis chloromelas 35 0 12 4
Eragrostis curvula 6 0 0 0
Eragrostis lehmanniana 5 0 0 0
Eragrostis rigidior 3 0 1 0
Eragrostis rotifer 3 0 0 0
Eragrostis trichophora 10 1 2 2
Hyparrhenia hirta 0 0 9 1
Melinis repens 0 0 2 0
Panicum coloratum 0 4 0 0
Panicum deustum  3 0 0 0
Paspalum dilatatum 0 0 0 0
Setaria sphacelata var. sphacelata 0 1 0 0
Sporobolus africanus 0 0 2 0
Sporobolus centrifuges 1 0 1 0
Sporobolus fimbriatus 0 0 0 0
Sporobolus ioclados 2 0 5 1
Themeda triandra 0 0 0 0
Trachypogon spicatus 0 0 0 0
Tragus berteronianus 0 0 0 1
Verbena bonariensis 16 0 2 0
Cirsium vulgare 0 0 0 0
Eucalyptus cameldulensis 1 0 0 0
Xanthium strumarium 0 0 0 0
Argemone ochroleuca 0 0 0 0
Solanum sisymbriifolium 0 0 0 0
Campuloclinium macrocephalum  7 0 0 0
Paspalum dilatatum 0 0 0 0
Senecio ilicifolius 0 0 0 0
Pseudognaphalium luteoalbum (L.) 8 0 0 0
 Cyperus esculentus  0 0 0 0
Foeniculum vulgare  0 0 0 0
Conyza canadensis 0 0 0 1
Tagetes minuta 0 0 0 0
Hypochaeris radicata 0 0 0 0
Solanum incanum 0 0 0 0
Asclepias fruticosa 11 0 0 0
Hypochaeris radicata 0 0 0 0
My data is organised as shown above and I'm not sure if my data is organised correctly or there is some other error. Can someone please assist me
You're still fighting to get you data into R. That is your first problem. After you tackle this problem and manage to read in your data, you have the following problems:
You should not have empty (all zero) rows in your data, but they will give an error (empty columns are removed and only give a warning).
DCA treats rows and columns non-symmetrically, and you should have species as columns and sampling units as rows. You should transpose your data (function t()).
You really should not use DCA with only four sampling units. It will be meaningless.
I think the last point is most important.

Why is my convolution result shifted when using FFT

I'm implementing Convolutions using Radix-2 Cooley-Tukey FFT/FFT-inverse, and my output is correct but shifted upon completion.
My solution is to zero-pad both input size and kernel size to 2^m for smallest possible m, tranforming both input and kernel using FFT, then multiply the two element-wise and transform the result back using FFT-inverse.
As an example on the resulting problem:
0 1 2 3 0 0 0 0
4 5 6 7 0 0 0 0
8 9 10 11 0 0 0 0
12 13 14 15 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
with identity kernel
0 0 0 0
0 1 0 0
0 0 0 0
0 0 0 0
becomes
0 0 0 0 0 0 0 0
0 0 1 2 3 0 0 0
0 4 5 6 7 0 0 0
0 8 9 10 11 0 0 0
0 12 13 14 15 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
It seems any sizes of inputs and kernels produces the same shift (1 row and 1 col), but I could be wrong. I've performed the same computations using the online calculator at this link! and get same results, so it's probably me missing some fundamental knowledge. My available litterature has not helped. So my question, why does this happen?
So I ended up finding the answer why this happens myself. The answered is given through the definition of the convolution and the indexing that happens there. So by definition the convolution of s and k is given by
(s*k)(x) = sum(s(k)k(x-k),k=-inf,inf)
The center of the kernel is not "known" by this formula, and thus an abstraction we make. Define c as the center of the convolution. When x-k = c in the sum, s(k) is s(x-c). So the sum containing the interesting product s(x-c)k(c) ends up at index x. In other words, shifted to the right by c.
FFT fast convolution does a circular convolution. If you zero pad so that both the data and kernel are circularly centered around (0,0) in the same size NxN arrays, the result will also stay centered. Otherwise any offsets will add.

Search for a location in a file based on a number of spaces

What is the best way to extract the two fields (I have marked them in parenthesis) in the following line on a Linux system.
<measResults>0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 311133336 325126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10427349176 3527288 284344 439048 3582 3583 0 0 0 0 0 0 0 0 14422 14422 (311133336) 325126 (10427349176) 3527288 </measResults>
This is a single line in a *.xml file.
I am hoping some variant of grep would work but I am open to suggestions.
Are they always at the same position (i.e. after the same amount of spaces)? If so, I would use awk:
# echo "$string" | awk '{ print $51 }'
(311133336)
This should do:
awk '{print $51,$53}' <<< "$string"
(311133336) (10427349176)

Resources