Shuffle 16 bit vectors SSE - sse

I am working on SSE and a newbie here. I am trying to use shuffle instruction to shuffle a 16 bit vector like below:
Input:
1 2 3 4 5 6 7 8
Output:
1 5 2 6 3 7 4 8
How do I achieve the desired target? I am confused about the constant being used here and I don't see any 16 bit shuffle instruction. shuffle instruction is only available for 8 bit and 32 bit.

So long as you can assume SSSE3 then you can use pshufb aka _mm_shuffle_epi8:
#include <stdio.h>
#include <tmmintrin.h> // SSSE3
int main()
{
__m128i v_in = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
__m128i v_perm = _mm_setr_epi8(0, 1, 8, 9, 2, 3, 10, 11, 4, 5, 12, 13, 6, 7, 14, 15);
__m128i v_out = _mm_shuffle_epi8(v_in, v_perm); // pshufb
printf("v_in = %vhd\n", v_in);
printf("v_out = %vhd\n", v_out);
return 0;
}
Compile and run:
$ gcc -Wall -mssse3 green_goblin.c
$ ./a.out
v_in = 1 2 3 4 5 6 7 8
v_out = 1 5 2 6 3 7 4 8
Alternate solution which relies only on SSE2:
#include <stdio.h>
#include <emmintrin.h> // SSE2
int main()
{
__m128i v_in = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
__m128i v_out = _mm_shuffle_epi32(v_in, _MM_SHUFFLE(3, 1, 2, 0)); // pshufd
v_out = _mm_shufflelo_epi16(v_out, _MM_SHUFFLE(3, 1, 2, 0)); // pshuflw
v_out = _mm_shufflehi_epi16(v_out, _MM_SHUFFLE(3, 1, 2, 0)); // pshufhw
printf("v_in = %vhd\n", v_in);
printf("v_out = %vhd\n", v_out);
return 0;
}
Compile and run:
$ gcc -Wall -msse2 green_goblin_sse2.c
$ ./a.out
v_in = 1 2 3 4 5 6 7 8
v_out = 1 5 2 6 3 7 4 8

Related

Yolov5 model not able to train

I'm making a model to detect potholes in an image. I've done everything right or so it seems to me, but I can't train the model for some reason. What might be the problem here?
!python train.py --img 640 --cfg yolov5m.yaml --hyp data/hyps/hyp.scratch-med.yaml --batch 20 --epochs 300 --data data/potholeData.yaml --weights yolov5m.pt --workers 4 --name yolo_pothole_det_m
This is the final line of the code, which outputs the following.
train: weights=yolov5m.pt, cfg=yolov5m.yaml, data=data/potholeData.yaml, hyp=data/hyps/hyp.scratch-med.yaml, epochs=300, batch_size=20, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=4, project=runs/train, name=yolo_pothole_det_m, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: up to date with https://github.com/ultralytics/yolov5 ✅
YOLOv5 🚀 v7.0-23-g5dc1ce4 Python-3.9.13 torch-1.13.0 CPU
hyperparameters: lr0=0.01, lrf=0.1, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.3, cls_pw=1.0, obj=0.7, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.9, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.1, copy_paste=0.0
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML
Comet: run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=1
from n params module arguments
0 -1 1 5280 models.common.Conv [3, 48, 6, 2, 2]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 2 65280 models.common.C3 [96, 96, 2]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 4 444672 models.common.C3 [192, 192, 4]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 6 2512896 models.common.C3 [384, 384, 6]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 2 4134912 models.common.C3 [768, 768, 2]
9 -1 1 1476864 models.common.SPPF [768, 768, 5]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 2 1182720 models.common.C3 [768, 384, 2, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 2 296448 models.common.C3 [384, 192, 2, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 2 1035264 models.common.C3 [384, 384, 2, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 2 4134912 models.common.C3 [768, 768, 2, False]
24 [17, 20, 23] 1 24246 models.yolo.Detect [1, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Isn't it supposed to train the model after that? What am I doing wrong for it to stop it right here?
in cmd you can see that it didn't read any images dataset. make sure that your potholedata.yaml file true located. in this file you have to write this code:
train: ../train/images #path to train images
val: ../valid/images #path to valid images
nc: 1 #number of classes
names: ['Weapon'] #name of classes
After this you can run and your train will continue

Z3 vZ - Adding constraint improves optimum

I'm new to using Z3 and am trying to model an ILP, which I have already successfully done using the MILP Solver PuLP. I now implemented the same objective function (which is to be minimized) and the same constraints in Z3 and am experiencing strange behaviour. Mainly, that adding a constraint decreases the minimum.
My question is: How can that be? Is it a bug or can it be explained somehow?
More detail, in case needed:
I'm trying to solve a Teacher Assignment Problem. Courses are scheduled before a year and the teachers get the list of the courses. They then can select which courses they want to teach, with which priority they want to teach it, how many workdays (each course lasts several days) they desire to teach and a max and min number of courses they definitly want to teach. The program gets as input a list of possible teacher-assignments. A teacher-assignment is a tuple consisting of:
teacher-name
event-id
priority of teacher towards event
the distance between teacher and course
The goal of the program to find a combination of assignments that minimize:
the average relative deviation 'desired workdays <-> assigned workdays' of all teachers
the maximum relative deviation 'desired workdays <-> assigned workdays' of any teacher
the overall distance of courses to assigned teachers
the sum of priorities (higher priority means less willingness to teach)
Main Constraints:
number of teachers assigned to course must match needed amount of teachers
the number of assigned courses to a teacher must be within the specified min/max range
the courses to which a teacher is assigned may not overlap in time (a list of overlap-sets are given)
To track the average relative deviation and the maximum deviation of workdays two more 'helper-constraints' are introduced:
for each teacher: overload (delta_plus) - underload (delta_minus) = assigned workdays - desired workdays
for each teacher: delta_plus + delta_minus <= max relative deviation (DELTA)
Here you have this as Python code:
from z3 import *
def compute_optimum(a1, a2, a3, a4, worst_case_distance=0):
"""
Find the optimum solution with weights a1, a2, a3, a4
(average workday deviation, maximum workday deviation, cummulative linear distance, sum of priority 2 assignments)
Higher weight = More optimized (value minimized)
Returns all assignment-tuples which occur in the calculated optimal model.
"""
print("Powered by Z3")
print(f"\n\n\n\n ------- FINDING OPTIMUM TO WEIGHTS: a1={a1}, a2={a2}, a3={a3}, a4={a4} -------\n")
# key: assignment tuple value: z3-Bool
x = {assignment : Bool('e%i_%s' % (assignment[1], assignment[0])) for assignment in possible_assignments}
delta_plus = {teacher : Int('d+_%s' % teacher) for teacher in teachers}
delta_minus = {teacher : Int('d-_%s' % teacher) for teacher in teachers}
DELTA = Real('DELTA')
opt = Optimize()
# constraint1: number of teachers needed per event
num_available_per_event = {event : len(list(filter(lambda assignment: assignment[1] == event, possible_assignments))) for event in events}
for event in events:
num_teachers_to_assign = min(event_size[event], num_available_per_event[event])
opt.add(Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[1] == event] ) == num_teachers_to_assign)
for teacher in teachers:
# constraint2: max and min number of events for each teacher
max_events = len(events)
min_events = 0
num_assigned_events = Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[0] == teacher] )
opt.add(num_assigned_events >= min_events, num_assigned_events <= max_events)
# constraint3: teacher can't work in multiple overlapping events
for overlapping_events in event_overlap_sets:
opt.add(Sum( [If(x[assignment], 1, 0) for assignment in x.keys() if assignment[1] in overlapping_events and assignment[0] == teacher] ) <= 1)
# constraint4: delta (absolute over and underload of teacher)
num_teacher_workdays = Sum( [If(x[assignment], event_durations[assignment[1]], 0) for assignment in x.keys() if assignment[0] == teacher])
opt.add(delta_plus[teacher] >= 0, delta_minus[teacher] >= 0)
opt.add(delta_plus[teacher] - delta_minus[teacher] == num_teacher_workdays - desired_workdays[teacher])
# constraint5: DELTA (maximum relative deviation of wished to assigned workdays)
opt.add(DELTA >= ToReal(delta_plus[teacher] + delta_minus[teacher]) / desired_workdays[teacher])
#opt.add(DELTA <= 1) # adding this results in better optimum
average_rel_workday_deviation = Sum( [ToReal(delta_plus[teacher] + delta_minus[teacher]) / desired_workdays[teacher] for teacher in teachers]) / len(teachers)
overall_distance = Sum( [If(x[assignment], assignment[3], 0) for assignment in x.keys()])
num_prio2 = Sum( [If(x[assignment], assignment[2]-1, 0) for assignment in x.keys()])
obj_fun = opt.minimize(
a1 * average_rel_workday_deviation
+ a2 * DELTA
+ a3 * overall_distance
+ a4 * num_prio2
)
#print(opt)
if opt.check() == sat:
m = opt.model()
optimal_assignments = []
for assignment in x.keys():
if m.evaluate(x[assignment]):
optimal_assignments.append(assignment)
for teacher in teachers:
print(f"{teacher}: d+ {m.evaluate(delta_plus[teacher])}, d- {m.evaluate(delta_minus[teacher])}")
#print(m)
print("DELTA:::", m.evaluate(DELTA))
print("min value:", obj_fun.value().as_decimal(2))
return optimal_assignments
else:
print("Not satisfiable")
return []
compute_optimum(1,1,1,1)
Sample input:
teachers = ['fr', 'hö', 'pf', 'bo', 'jö', 'sti', 'bi', 'la', 'he', 'kl', 'sc', 'str', 'ko', 'ba']
events = [5, 6, 7, 8, 9, 10, 11, 12]
event_overlap_sets = [{5, 6}, {8, 9}, {10, 11}, {11, 12}, {12, 13}]
desired_workdays = {'fr': 36, 'hö': 50, 'pf': 30, 'bo': 100, 'jö': 80, 'sti': 56, 'bi': 20, 'la': 140, 'he': 5.0, 'kl': 50, 'sc': 38, 'str': 42, 'ko': 20, 'ba': 20}
event_size = {5: 2, 6: 2, 7: 2, 8: 3, 9: 2, 10: 2, 11: 3, 12: 2}
event_durations = {5: 5.0, 6: 5.0, 7: 5.0, 8: 16, 9: 7.0, 10: 5.0, 11: 16, 12: 5.0}
# assignment: (teacher, event, priority, distance)
possible_assignments = [('he', 5, 1, 11), ('sc', 5, 1, 48), ('str', 5, 1, 199), ('ko', 6, 1, 53), ('jö', 7, 1, 317), ('bo', 9, 1, 56), ('sc', 10, 1, 25), ('ba', 11, 1, 224), ('bo', 11, 1, 312), ('jö', 11, 1, 252), ('kl', 11, 1, 248), ('la', 11, 1, 303), ('pf', 11, 1, 273), ('str', 11, 1, 228), ('kl', 5, 2, 103), ('la', 5, 2, 16), ('pf', 5, 2, 48), ('bi', 6, 2, 179), ('la', 6, 2, 16), ('pf', 6, 2, 48), ('sc', 6, 2, 48), ('str', 6, 2, 199), ('sc', 7, 2, 354), ('sti', 7, 2, 314), ('bo', 8, 2, 298), ('fr', 8, 2, 375), ('hö', 9, 2, 95), ('jö', 9, 2, 119), ('sc', 9, 2, 37), ('sti', 9, 2, 95), ('bi', 10, 2, 211), ('hö', 11, 2, 273), ('bi', 12, 2, 408), ('bo', 12, 2, 318), ('ko', 12, 2, 295), ('la', 12, 2, 305), ('sc', 12, 2, 339), ('str', 12, 2, 218)]
Output (just the delta+ and delta-):
------- FINDING OPTIMUM TO WEIGHTS: a1=1, a2=1, a3=1, a4=1 -------
fr: d+ 17, d- 37
hö: d+ 26, d- 69
pf: d+ 0, d- 25
bo: d+ 41, d- 120
jö: d+ 0, d- 59
sti: d+ 27, d- 71
bi: d+ 0, d- 15
la: d+ 0, d- 119
he: d+ 0, d- 0
kl: d+ 0, d- 50
sc: d+ 0, d- 33
str: d+ 0, d- 32
ko: d+ 0, d- 20
ba: d+ 10, d- 14
DELTA::: 19/10
min value: 3331.95?
What I observe that does not make sense to me:
often, neither delta_plus nor delta_minus for a teacher equals 0, DELTA is bigger than 1
adding constraint 'DELTA <= 1' results in a smaller objective function value, faster computation and observation 1 cannot be observed anymore
Also: the computation takes forever (although this is not the point of this)
I am happy for any sort of help!
Edit:
Like suggested by alias, changing the delta+/- variables to Real and removing the two ToReal() statements yields the desired result. If you look at the generated expressions of my sample input, there are in fact slight differences (also besides the different datatype and missing to_real statements).
For example, when looking at the constraint, which is supposed to constrain that delta_plus - delta_minus of 'fri' is equals to 16 - 36 if he works for event 8, 0 - 36 if he doesn't.
My old code using integers and ToReal-conversions produces this expression:
(assert (= (- d+_fr d-_fr) (- (+ (ite e8_fr 16 0)) 36)))
The code using Reals and no type-conversions produces this:
(assert (let ((a!1 (to_real (- (+ (ite e8_fr 16 0)) 36))))
(= (- d+_fr d-_fr) a!1)))
Also the minimization expressions are slightly different:
My old code using integers and ToReal-conversions produces this expression:
(minimize (let (
(a!1 ...)
(a!2 (...))
(a!3 (...))
)
(+ (* 1.0 (/ a!1 14.0)) (* 1.0 DELTA) a!2 a!3)))
The code using Reals and no type-conversions produces this:
(minimize (let (
(a!1 (/ ... 14.0))
(a!2 (...))
(a!3 (...))
)
(+ (* 1.0 a!1) (* 1.0 DELTA) a!2 a!3)))
Sadly I don't know really know how to read this but it seems quite the same to me.

how to extract matrix from big matrix in torch/lua

In torch7 / lua
There is a matrix:
[ 1, 2, 3, 4;
5, 6, 7, 8;
9, 10, 11, 12;
13, 14, 15, 16 ]
how to extract this:
[ 6, 7;
10, 11 ]
and how to overwrite it by matrix operation
[ 1, 2, 3, 4;
5, 78, 66, 8;
9, 45, 21, 12;
13, 14, 15, 16 ]
Thanks in advance.
matrix
matrix:sub(2,3,2,3)
z = torch.Tensor({{78,66},{45,21}})
matrix:sub(2,3,2,3):copy(z)

What is correct implementation of LDA (Linear Discriminant Analysis)?

I found that the result of LDA in OpenCV is different from other libraries. For example, the input data was
DATA (13 data samples with 4 dimensions)
7 26 6 60
1 29 15 52
11 56 8 20
11 31 8 47
7 52 6 33
11 55 9 22
3 71 17 6
1 31 22 44
2 54 18 22
21 47 4 26
1 40 23 34
11 66 9 12
10 68 8 12
LABEL
0 1 2 0 1 2 0 1 2 0 1 2 0
The OpenCV code is
Mat data = (Mat_<float>(13, 4) <<\
7, 26, 6, 60,\
1, 29, 15, 52,\
11, 56, 8, 20,\
11, 31, 8, 47,\
7, 52, 6, 33,\
11, 55, 9, 22,\
3, 71, 17, 6,\
1, 31, 22, 44,\
2, 54, 18, 22,\
21, 47, 4, 26,\
1, 40, 23, 34,\
11, 66, 9, 12,\
10, 68, 8, 12);
Mat mean;
reduce(data, mean, 0, CV_REDUCE_AVG);
mean.convertTo(mean, CV_64F);
Mat label(data.rows, 1, CV_32SC1);
for (int i=0; i<label.rows; i++)
label.at<int>(i) = i%3;
LDA lda(data, label);
Mat projection = lda.subspaceProject(lda.eigenvectors(), mean, data);
The matlab code is (used Matlab Toolbox for Dimensionality Reduction)
cd drtoolbox\techniques\
load hald
label=[0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
[projection, trainedlda] = lda(ingredients, label)
The eigenvalues are
OpenCV (lda.eigenvectors())
0.4457 4.0132
0.4880 3.5703
0.5448 3.3466
0.5162 3.5794
Matlab Toolbox for Dimensionality Reduction (trainedlda.M)
0.5613 0.7159
0.6257 0.6203
0.6898 0.5884
0.6635 0.6262
Then the projections of data are
OpenCV
1.3261 7.1276
0.8892 -4.7569
-1.8092 -6.1947
-0.0720 1.1927
0.0768 3.3105
-0.7200 0.7405
-0.3788 -4.7388
1.5490 -2.8255
-0.3166 -8.8295
-0.8259 9.8953
1.3239 -3.1406
-0.5140 4.2194
-0.5285 4.0001
Matlab Toolbox for Dimensionality Reduction
1.8030 1.3171
1.2128 -0.8311
-2.3390 -1.0790
-0.0686 0.3192
0.1583 0.5392
-0.9479 0.1414
-0.5238 -0.9722
1.9852 -0.4809
-0.4173 -1.6266
-1.1358 1.9009
1.6719 -0.5711
-0.6996 0.7034
-0.6993 0.6397
The eigenvectors and projections are different even though these LDAs have the same data. I believe there are 2 possibilities.
One of the libraries is wrong.
I am doing it wrong.
Thank you!
The difference is because eigenvectors are not normalized.
The normalized (L2 norm) eigenvectors are
OpenCV
0.44569 0.55196
0.48798 0.49105
0.54478 0.46028
0.51618 0.49230
Matlab Toolbox for Dimensionality Reduction
0.44064 0.55977
0.49120 0.48502
0.54152 0.46008
0.52087 0.48963
They look simliar now, although they have quite different eigenvalues.
Even though the PCA in OpenCV returns normalized eigenvectors, LDA does not. My next question is 'Is normalizing eigenvectors in LDA not necessary?'

How to Line Chart this data?

I would like to create line chart for following data. I would also like to have the ability to hover over each data point to look at the x and y value at that data point. I have the following data structure:
x[0] = [23 4 2 2 4 4 5 3 334 2]
y[0] = [6 24 1 2 2 5 1 3 8 0]
x[1] = [5 6 8 6 3 4 6 3 3]
y[1] = [9 7 8 6 3 4 1 9 2]
x[2] = [6 9 9 6 2 5 8 3]
y[2] = [1 0 2 5 6 2 1 5]
... so that I will have 3 lines on the same chart.
I played with "Seer" without much success. Can anyone provide any recommendations / examples / references for plotting similar data using Seer or anything else?
Thanks.
Give the lazy_high_charts gem a try.
#app/views/layouts/appliction.*
= javascript_include_tag 'highcharts.js'
#Gemfile
gem 'lazy_high_charts'
# my_controller#my_action
x_0 = [23, 4, 2, 2, 4, 4, 5, 3, 334, 2]
y_0= [6, 24, 1, 2, 2, 5, 1, 3, 8, 0]
x_1 = [5, 6, 8, 6, 3, 4, 6, 3, 3]
y_1 = [9, 7, 8, 6, 3, 4, 1, 9, 2]
x_2 = [6, 9, 9, 6, 2, 5, 8, 3]
y_2 = [1, 0, 2, 5, 6, 2, 1, 5]
data_0 = x_0.zip(y_0)
data_1 = x_1.zip(y_1)
data_2 = x_2.zip(y_2)
#h = LazyHighCharts::HighChart.new('graph') do |f|
f.series(:name => "xy0", :data => data_0)
f.series(:name => "xy1", :data => data_1)
f.series(:name => "xy3", :data => data_2)
f.chart({:defaultSeriesType=>"line" })
f.yAxis(:title => { :text => "y axis values" } )
f.xAxis(:title => { :text => "x axis values"} )
f.title(:text => "XY Graph")
f.plotOptions({}) # override the default values that lazy_high_charts puts there
f.legend({}) # override the default values
end
#app/views/my_controller/my_action
= high_chart("chart", #h)
Caveat:
HighCharts is only free for non-commercial use. That may or may not be a dealbreaker for you.
I've really liked jQuery flot for this kind of thing:
http://code.google.com/p/flot/
Check out the example here:
http://flot.googlecode.com/svn/trunk/README.txt
In your controller or view, you can use Ruby's zip to zip together arrays of x and y values if you need to:
> a = [1,2,3]
=> [1, 2, 3]
> b = [5,6,7]
=> [5, 6, 7]
> a.zip(b)
=> [[1, 5], [2, 6], [3, 7]]

Resources