I'm trying to use my energy (a rapidly changing variable) to describe the color of my bar.
function(progress, r1, g1, b1, a1, r2, g2, b2, a2)
    local maxMana = UnitPowerMax("player");
    local currMana = UnitPower("player");
    if (currMana > maxMana) then
        currMana = maxMana;
    end
    if currMana < 120 then
        return (currMana*2), (currMana/2), (255/currMana), a1
    else
        return r2, g2, b2, a2 -- blue
    end
end
This gives me the same result as returning r1, g1, b1, a1.
Check these WeakAuras import strings:
!TJvBVjUrq4)lr6Kivqo)(lv6(qifYLtein25stvKHf8AW354LU2obNpWV9o7U2GbSVY176jv1QqwwpE2X7m78mpJXt2Z1ZbbdPEojEokNjRFMUNZtEo(SpWnNIM9zFkz5fKicfemuM)rHpQYh18gkbRcmXuc1ht7wSepNUrHV(kI6VEIlHeLgcIsMHIWSL5KsM9mMMesIHlLvLukxV30Yz98NJ9CC(JmefVEs)SOO1tUFrykiKINdR0nFjmFoLKTSCnoHVIz7phuw6ccD0YuqVeHlk0OQRiXDfjURiXDfzMIOzLRYjjfrtflpmoSywamnzbFoCvknC(CWve20DJa(u(oeLrrGPYMIFghN6KfeeUYZz8fN74o2X98Bbf53XZ59yuu6IT6EdfZ115MEdgaIxIJIUYV0Fad5SG8YOy4o8Vph24pdpWy0t4cLYy7ANLrOCyh54Jzlse3EFVb30)UbcL2UNtlJapJDfcVM4JlImRgfeKGtzrliEKIPXOOpwEmQOblmEge2VHeY8Ml6n0T3TmHHpHyH0Qr0Iytmjgg9HietHXcHj4zKyWnbB8ekm(41EZjZXPp7ufs1VGazgjSBefuS17oY1D01d61hmvafcNWPkkfXtDf(yFMurK05IB71Byzc2vX8ieOygZ40HHx37HBZcpfYbGhCq4CXjtELyjBdfwjvnIG44VWyr2nm9PmacT58HLcXpI3vEkaVItpqmK3m7aHZIqjhAb)qW0Zaj5hClkAg(aHzj4Xclb4wkjc3WEs9a5lFEzdB3abaSblPuVNmgf7pUwhnHvtypHBkhuEiOXn0gmmjlnkKL7m6o3bxnShSPia0LxDrwP2Yx1uXXpmHH8CXRGlEZm4bKLKsEsC96pS(dbzXCxT1PWvte)frGIKRNSQD(7Uy81OLNDjo9go(fU4gscppP1MB1fNKct7tO3bq9wNiG6NCA7nZ(zqRF7HAEeZ459RN8U1tciuaI26K3CMsq71tyFDs7v)KSKu7C2yLftXPz04wIfdYXX(BtM73Xs1qYstws1qtvwvZWlOJM9UvSt5(F9vfAaRwfpxPQWXP92QchL(PnugRk4D7b5Dl9rmkPSIVH0acZVMDFOpRu(5W1nrPuHizl9YrrPagjnlra)qrVGYtsPzyVU)f0mL(VNJ8goNl2w7544DyptoTs39Px2H3PW8iU))nq80rwIhX)ITGilcH1Wi9j4Kkmi3Zrum)fiR9EkcAy4(nZ2WbGJr(OKdlhF0mahWGuqcO9(pE5d9dKV)Qh4b5B5OHcRxsQ2xt32svww2urttsYWIbDuzzjRW(ISPHkssBzh(kkR3qnqwrnpN(0Wxxp5xZq(mpeAxJD)H86S1vNRjN)f(wSVUPTHSHPLLPHSLKUeZlmRT84MUj0STod8AlDzDdfwhOHukH3j4M6k6k2kAMQA6sA2622QmRA1iqD)ufElE91vnSSTnvKuSumKGY4bD012M(mfyx855CoDh0B4VCyEW(fXYOmaF0YffPf)pi)Oa5aT4N4eFSEvrX5Fjyph(8kEylNF73BDjPBdO8nNgxXUl0ab(XZ99hfN849y0NphIijpEn2pe9ORqZKhfVyX4I3OOcYTmnI9KxX3TZaop8ykjvexzDSTFjbFCcCyrzua8gHQOAvEJ9X(lWHZxWsV1nvTvmS1vunLaOqbOHBhUf)(H5RN3LJzUv02kC0lErTPfYpNlNp9sXBVXpJfKGfVv3QnRva)6UbzivSu3Dwuo)gR4JBZyHMhyv44wmF7tTpKUaX8YULwsjZHJWeOtfQmmmNnmLnGydufMm2Wu2asPs3l)q7ZQtNv8URG49IZcIakMw8EQo9TWWH7KWguwr34qLd4k3cEaDA4bCkBg7)DnqNofMaqjW7xWnZCCQWleYQPxriVd7Z1f8sweWHjyByy9K3UEIP0(TjY8Q2SDB7ct0wUOJXIu7Qh)73vg)OVBr15)HB5lLXqupH3Uq7gP)(BY6udVyD0FYwSnJUS
j0yGQ2xK(BBrJdz0QLbBBdiMMAgAAkMakuXs2OObK6Ra(v(RK8TwGUAv06Pa(xeF6o0MFZ)AnFn8PnWZLFe8cvkZFmScBrVnXk0e0Sk1qnmh)NKCiavw8vSn6xuO(TgNPyPQCG(5hqMK3mzsEbzs((Kj5FNjtkjfYzKc5msHauDmc70cY(Dh8d4xfC)U7kF1gdlnBddllBvOFnzPQDPvwwK3WAIBGt6Gi5LF8dFHE(0mumv0T11KLS1adjSwt)wGFh68R8nv4x9mmttHDH3F(
!TR16snUXs43L8RDtbSA0fFPQKTklWMLeV2CSeHDpfLThBn2wbHKJUWc8d)SF6EMr3TmM6KCojvTflksT6UNE6zMV(RfzkzQ9ulkCjEQv0ul1ZigNzm16HPwo4)GxUGU8ENWGTNh4feccgr4)tLFvJFvF6ifWkWflccDyHMstMAz65(Yl0qNDZTdc8IDbrpnE1Qiwm4b1Zuvv1jQeLUATHbnoy5JSWi3aFWRenf1u)nDXuRNLMn40okkgDiDv062wxxrrz6Qtn0GjH)YnbHxh46dXX59hz3FsQ9wUVWWiw(yAaONkO4utHp1u4tnf(uJGtm6YyiUIWesumnmwKXC9DL3TcUnAd)E4P4q31RHPIWN2zc43(8wg4VKqk4QKfShz(XwjRw5(0uRzN3ZYEMLDVjGI83m16tmQx8MCDVoKX1166(dhcI3Y88UYjseglahzTj4BJ9H3W)V9Ga)rya9PpWKkLGrT1wp6Zqez5WqJS5r1N6p86b3muOuEmhNMbEKzle(5ahMmZisH9Dwd2B9hj0q2U5ds882n)2nUXmmjfZc9PE)w6ARQoo)JHvRXBtZQ42pM3k5YN5yB7XFEy)bqAW1jnhSBUjfYPWkT7du0WIRgY8QFGpC1bYUOcZecJyld8Hue45hOU(hV2zRQhN(OfHqEgwUPXum)i3voaLksXwNpPF)rtTaBw5UwUQXtHx5Z3vcgLGZ5FHCZq3HTgFzCcmglPE8eoANJBHSMxaLFmnguWpwfV9He4Kw2YgUZIVYxwUq)AID9xhgKSTM8LE0OO9lDg13zgSnCzTxVxHoUqeTeK8CTxfsxYAiq1QjFL4izDNe4v3jBFCBdt4Ki2mXKdazIqGIkAGUKTggjX6Ni9GZ98d31XTkSxofkQXDcPaId6O1YqJ0TfrZODx1wDaGn99Fu5b3WWGYWI6T162HORQ1PnXORIoACRYHEm7P4KqgMIIPWjttyNhKtC4hMTmh2F0fnaJU8vW(BZb)Dyr0y8SbdJS3iIzASDfIwaRTS7654m2p6UBz077bEn6UpZCCP3zl0m6obyZmjktXZt7fX51XKHWmojQrKwCNcxoxBZsqQPqctTiza3NNFsTcK7bH9XHHRUzvW98b366H9(A)jZM0)Y(JMDXvw9GvVlWHro2WEMaHROEi8VP4HfSn0hDf7CocC)xFeBQYWPef(B2lsnhm7zoAMW3ltIIdEqEwKVvZuUNBIeZdU1UKQnGbBr92UHIvUTEIF9z(vUdmZ91L532l)2bj(lH9LWv0ZVBByWAyFw0j7MhsGlRXllWlu8sOkkdVSaVqvF)UFz3CXpEbWuC38K06w)8U53alPIhF3pi2h8dnAWNPpvXgqYHm76GVXcZmH)0RQEXbjvWbm68GhwqJZflFg9GsUu3vch2B1kgKf9xluRGFNhVH5xYCYU5aiuUpkM1YU)dLtoWp56Np7tV9dfNKzkMFxidGq8ZC5jPQFsAGDcgwfmecWmytCdv(5DTwifuXE6NKuI(RMKbspWscw54gTeMoSz8OJJ2PKDma2z)nxhG94G26T6s6QQRamM70TvlS8qB5uk1ioTdV3p(IBhDZFOFjHdunHxarIaTH5UEdwPsXG0XakvPQzqAr6i9wtCt2lpG9ZOOHkWvOvHDpiCmn8(B5ZWref5ZIkIFrKhC(DaWG5a7d4zJSYLTBRyOb9q0gMbTieE5sJgQ(X96KIzQsseLp7DJ9yrZsidbFNd3tHSDPJVMecnUr0jWIA48nuWAFfLs7M4nuqQqzUY1JkwPAllCj4aziAPPjEDzXlQO3Sa4rAmKNS(PJVreyh)VZXOX2AO(pFOcqUWwYzhM9cPrgd7He3cAyT(GbEBaiaCatrhiF1goiWBiTsdKLjhvyJ6xf7OkEstUxiRrryuNb)MTThjlnlAd1j4BFjTz2SIPGNW3l1sCB6Qox0rKo46TkaN5dcDFz38)vc1bp8b9XxE8pIwNlO9xtJ2tjPYZZXItqcqwBEypA8O(f
cgXwJrC4rUWsNwVA0iKFYvx(j70ZXzdNI0aemWD1Z5RNz(w8jciQPM(LQM(OBKlWwgw8IlIeAae(n6O2QlaM0rVTUej84q(IGcKGlTUT3KHJp)xZvltI5brf57ImZPW37Il4h76jr9e4EMLjhVN2jkBcu7Ar5ghK7bfhN4C7pqh4Pm0K83gO3v3GO2vrrTLIANoyLhdJsKaZRDwKs2RZWBp0cFUa1VQm9mZL(ppMEVvsrYHjK5utgmvbgyvLUWlHvMhx2W8rGI3zTmsPUvGxK8hHhfC5Q9smeWx9oYPsp((FK8bLZ0mY1Te)VsJ9pvASB05KdfuVt6TtHbfiG(JG6V24ln(JC2PIPDXjj3G5hkK1(VkIvAi0QZGfC3jslpPiFwcNZkOp8ZHzUwSPR)xWDfRih5IWvsSAokMD9()pNgHDqysdV7MRon)PtTqdK2D(RxmJR3Ne41J0uk8TgSc8qkV4VxZpXTBEALYxca4KmQEiM3((kpCp764WG05O()g)Z8KX8(O)uGsk6ikj4w126QeshT0sjCw5xApg4z25xVwXhjRGbI4JSG8)aa(GqxGvLCH9tJNC1)E8i7EdRtjp7JnPBObLSm6cCGnq0zC00EJF1gmFEC0WRxZPrM5Tuv0jT7cKP0iDHkR)jXmpT01HOC7gZEyeS8i6gIFUayFI53suPfCNHSZAw8(PTEqY7hKSDdu5RqVwgEzXB5pgvfQ(Ijbg4ntFFpF4OYdPzZu3)4plhJYYQ(TKkL1(7gr(gOU(Ds7hhP9kmK)lLYE9geAGXEjA9)dN0(rZq)n(biY(JXAqmotx3OJIs7oevJ)0RtND(5qfI3x90ScXPL62tDy0wBoWsEb8gR2wTS8HR(Mxo)yPBuO6mV09JpTyZ93FFRLRB)gQtV)6WfQ(MxtUXVZEnQELBqRABxn3R2EA0Qq7BvAiZS63DVx1V7(t)FOrmrDN3CJy1n77nI99gXE7nIvQfSs)zUE9)xvOPpesHoUk1j2rPVSy0JWbsaWdUD6)5
You will get these results:
The top two bluish squares are the code you are looking for.
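As a rough illustration of the idea (not taken from the import strings above), the sketch below maps the current energy to a color by linear interpolation. It assumes that color components are expected in the 0 to 1 range, as is usual in the WoW API, rather than 0 to 255; the function name energy_color and the two endpoint colors are made up for this example.

```python
def energy_color(curr, max_energy,
                 low=(1.0, 0.0, 0.0, 1.0),    # assumed color at zero energy (red)
                 high=(0.0, 0.0, 1.0, 1.0)):  # assumed color at full energy (blue)
    """Blend between two RGBA colors according to curr / max_energy."""
    if max_energy <= 0:
        return low
    # Clamp the fraction so the result always stays inside the 0..1 range.
    t = min(max(curr / max_energy, 0.0), 1.0)
    return tuple(lo + (hi - lo) * t for lo, hi in zip(low, high))


print(energy_color(60, 120))
```

The same arithmetic carries over to the Lua color function: compute the fraction once, clamp it, and return four values in the 0 to 1 range.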
Following a tutorial on building neural networks from scratch, I created an ANN with one hidden layer for the MNIST dataset. The tutorial uses np.random.rand() - 0.5 to initialize the weights, and the network works fine. I did the same, but when I use np.random.randn() instead of rand() - 0.5, the final accuracy drops significantly.
I don't understand why that would happen, because in this question the answer was that using randn is better.
Code for the network:
import numpy as np

def init_params():
    W1 = np.random.rand(10, 784) - 0.5
    b1 = np.random.rand(10, 1) - 0.5
    W2 = np.random.rand(10, 10) - 0.5
    b2 = np.random.rand(10, 1) - 0.5
    return W1, b1, W2, b2

def ReLU(Z):
    return np.maximum(0, Z)

def dReLU(Z):
    return Z > 0

def softmax(Z):
    A = np.exp(Z) / sum(np.exp(Z))
    return A

def forward_prop(W1, b1, W2, b2, X):
    Z1 = W1.dot(X) + b1
    A1 = ReLU(Z1)
    Z2 = W2.dot(A1) + b2
    A2 = softmax(Z2)
    return Z1, A1, Z2, A2

def one_hot(Y):
    one_hot_Y = np.zeros((Y.size, Y.max() + 1))
    one_hot_Y[np.arange(Y.size), Y] = 1
    one_hot_Y = one_hot_Y.T
    return one_hot_Y

def back_prop(Z1, A1, Z2, A2, W2, X, Y):
    m = Y.size
    one_hot_Y = one_hot(Y)
    dZ2 = A2 - one_hot_Y
    dW2 = 1 / m * np.dot(dZ2, A1.T)
    db2 = 1 / m * np.sum(dZ2)
    dZ1 = np.dot(W2.T, dZ2)
    dW1 = 1 / m * np.dot(dZ1, X.T)
    db1 = 1 / m * np.sum(dZ1)
    return dW1, db1, dW2, db2

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2

def get_predictions(A2):
    return np.argmax(A2, 0)

def get_accuracy(predictions, Y):
    print(predictions, Y)
    return np.sum(predictions == Y) / Y.size

def gradient_descent(X, Y, iterations, alpha):
    W1, b1, W2, b2 = init_params()
    for i in range(iterations):
        Z1, A1, Z2, A2 = forward_prop(W1, b1, W2, b2, X)
        dW1, db1, dW2, db2 = back_prop(Z1, A1, Z2, A2, W2, X, Y)
        W1, b1, W2, b2 = update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha)
        if (i % 10 == 0):
            print('Iteration: ', i)
            print('Accuracy: ', get_accuracy(get_predictions(A2), Y))
    return W1, b1, W2, b2
Number of epochs = 500
Learning rate = 0.15
Accuracy of using np.random.rand() - 0.5 after 500 epochs: 80.6%
Accuracy of using np.random.randn() after 500 epochs: 55.2%
It also seems that the accuracy stops changing after the 100th epoch for the version that uses np.random.randn().
The data I am using is the MNIST digit classification dataset.
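One thing worth checking (an assumption, since nothing in the post confirms it is the cause) is the scale of the two initializers. rand() - 0.5 is uniform on [-0.5, 0.5) with a standard deviation of about 0.29, while randn() has a standard deviation of 1, so the initial weights are roughly 3.5 times larger. With 784 inputs per hidden unit, that can make the values fed to softmax much larger in magnitude. The sketch below uses only the standard library to compare the two spreads; random.random() and random.gauss() stand in for the NumPy calls.

```python
import math
import random
import statistics

random.seed(0)
n = 100_000

# Stand-in for np.random.rand(...) - 0.5: uniform on [-0.5, 0.5)
uniform_weights = [random.random() - 0.5 for _ in range(n)]
# Stand-in for np.random.randn(...): standard normal
normal_weights = [random.gauss(0.0, 1.0) for _ in range(n)]

std_uniform = statistics.pstdev(uniform_weights)
std_normal = statistics.pstdev(normal_weights)

print("uniform std:", round(std_uniform, 3))   # theory: 1 / sqrt(12)
print("normal std: ", round(std_normal, 3))    # theory: 1
print("ratio:      ", round(std_normal / std_uniform, 2))
```

If scale is the issue, multiplying the randn() weights by a small factor (for example one that depends on the number of inputs to the layer) would be a reasonable experiment.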
I have two sets of values defined:
local A1 = {100, 200, 300, 400}
local A2 = {500, 600, 700, 800}
I want to loop over them, assigning values to two other variables, B1 and B2, as pairs from A1 and A2 as follows:
B1 = 100 and B2 = 500 (first iteration)
B1 = 200 and B2 = 600 (second iteration)
B1 = 300 and B2 = 700 (third iteration)
B1 = 400 and B2 = 800 (fourth iteration)
I tried to use ipairs as follows:
for i, f1 in ipairs(A1) do
    for j, f2 in ipairs(A2) do
        B1 = f1
        B2 = f2
    end
end
but this gave me
B1 = 100 and B2 = 500 (first iteration)
B1 = 100 and B2 = 600 (second iteration)
B1 = 100 and B2 = 700 (third iteration)
B1 = 100 and B2 = 800 (fourth iteration)
B1 = 200 and B2 = 500 (fifth iteration)
B1 = 200 and B2 = 600 (sixth iteration)
B1 = 200 and B2 = 700 (seventh iteration)
... and so on.
Can anyone help me code this the right way?
You can easily do this with a numerical loop:
for i = 1, 4 do
    local a, b = A1[i], A2[i]
    -- use them
end
How you go about determining the number of iterations you'll need is the tricky part. If the sizes vary but all the tables have the same length, you can use the length operator (#A1) instead.
Alternatively, you might want a function that returns the largest length of a given set of tables.
local function max_table_len (...)
    local tabs = { ... }
    local len = 0
    for i = 1, #tabs do
        local l = #tabs[i]
        if l > len then
            len = l
        end
    end
    return len
end
And maybe even a helper function to get each value.
local function get_from_tables (index, ...)
    local values = { ... }
    local len = #values
    for i = 1, len do
        values[i] = values[i][index]
    end
    return table.unpack(values, 1, len)
end
Ending up with something like:
for index = 1, max_table_len(A1, A2) do
    local a, b = get_from_tables(index, A1, A2)
end
You can build on the ipairs example from Programming in Lua. For instance this version iterates over 2 sequences in parallel:
-- iterator function
local function iter_ipairs2(tablePair, i)
    i = i + 1
    local v1 = tablePair[1][i]
    local v2 = tablePair[2][i]
    -- if you use 'and' here the iteration stops after finishing
    -- the shortest sequence. If you use 'or' the iteration
    -- will stop after it finishes the longest sequence.
    if v1 and v2 then
        return i, v1, v2
    end
end
-- this is the function you'll call from your other code:
local function ipairs2(t1, t2)
    return iter_ipairs2, {t1, t2}, 0
end
-- usage:
local A1 = {100, 200, 300, 400, 500}
local A2 = {500, 600, 700, 800}
for i, v1, v2 in ipairs2(A1, A2) do
    print(i, v1, v2)
end
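For comparison only, the 'and' versus 'or' choice described in the comment above has a direct analogue in Python's standard library, which may make the two stopping rules easier to see. This is a sketch in a different language, not part of the Lua answer: zip stops at the shortest sequence, and itertools.zip_longest continues to the longest, filling gaps with None.

```python
from itertools import zip_longest

A1 = [100, 200, 300, 400, 500]
A2 = [500, 600, 700, 800]

# Like 'v1 and v2': stop as soon as either sequence runs out.
shortest = list(zip(A1, A2))

# Like 'v1 or v2': keep going until both sequences are exhausted.
longest = list(zip_longest(A1, A2))

for b1, b2 in shortest:
    print(b1, b2)
```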
The previous answers are more detailed and provide a more general and better answer.
This one is for someone very new to Lua. Not only does it show two loops, it reinforces that there is usually more than one way to get where you want to go.
local A1 = {100, 200, 300, 400}
local A2 = {500, 600, 700, 800}
print("simplest answer:")
-- doesn't use ipairs and assumes A1 and A2 are the same size
for i = 1, #A1 do
    B1 = A1[i]
    B2 = A2[i]
    print(B1, B2, "(iteration #"..i..")")
end
print()
print("answer that uses ipairs:")
-- again, assumes A1 and A2 are the same size
for i, v in ipairs(A1) do
    B1 = A1[i] -- i steps through A1 and A2
    B2 = A2[i] -- this works because A1 and A2 are same size
    print(B1, B2, "(iteration #"..i..")")
end
Gives this output:
simplest answer:
100 500 (iteration #1)
200 600 (iteration #2)
300 700 (iteration #3)
400 800 (iteration #4)
answer that uses ipairs:
100 500 (iteration #1)
200 600 (iteration #2)
300 700 (iteration #3)
400 800 (iteration #4)
How would I go about adding up the scores in C1 to C8 and entering the totals into A2 and B2?
a1 = blue
b1 = red
a2 = team blue score
b2 = team red score
between c1 to c8 = winning team & score (NOTE: c1 = $a$1&" 1.25" )
c1 = blue 1.25
c2 = blue 2
c3 = red .5
c4 = draw
c5 = blue 1.5
c6 = blue 1.75
c7 = red 2
c8 = draw
So what I should get is:
A2 should = 6.5
B2 should = 2.5
You can get the total score of the blue team with
=sum(arrayformula(if(left(C1:C, 4)="blue", value(regexreplace(C1:C, "[^0-9.]", "")), 0)))
For the red team, use left(C1:C, 3)="red" in the formula.
The conversion from text to number happens in two steps: regexreplace removes all characters except . and 0-9; then value converts text to number.
It would be better to keep the winning team and their score in separate cells (team in column C, their score in column D), which would simplify the handling of this data: you'd only need =sumif(C1:C, "blue", D1:D).
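To see the two-step conversion outside the spreadsheet, here is a small Python sketch of the same logic applied to the sample data from the question. The function name team_total is made up for this illustration; re.sub plays the role of regexreplace and float the role of value.

```python
import re

cells = ["blue 1.25", "blue 2", "red .5", "draw",
         "blue 1.5", "blue 1.75", "red 2", "draw"]


def team_total(cells, team):
    """Sum the scores of every cell whose text starts with the team name."""
    total = 0.0
    for cell in cells:
        if cell[:len(team)] == team:                   # like left(C1:C, 4)="blue"
            digits = re.sub(r"[^0-9.]", "", cell)      # keep only digits and '.'
            total += float(digits) if digits else 0.0  # text -> number
    return total


print(team_total(cells, "blue"), team_total(cells, "red"))
```

The totals match the values expected in the question for A2 and B2.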
This approach uses helper columns and no array formula. The formulas still work if you change a team to green or any other colour.
Formula in D1 (fill down):
=VALUE(RIGHT(C1,(LEN(C1)-LEN($A$1))))
Formula in E1 (fill down):
=LEFT(C1,(MIN(FIND({0,1,2,3,4,5,6,7,8,9},C1&"0123456789"))-2))
Formula in A2:
=SUMIF(E1:E9,"blue",D1:D9)
Formula in B2:
=SUMIF(E1:E9,"red",D1:D9)
(I'm using Lua 5.2 and LPeg 0.12)
Suppose I have a pattern P that produces some indeterminate number of captures, if any, and I want to create a pattern Q that captures P as well as the position after P, but with that position returned before the captures of P. Essentially, if lpeg.match(P * lpeg.Cp(), str, i) results in v1, v2, ..., j, then I want lpeg.match(Q, str, i) to result in j, v1, v2, ....
Is this achievable without having to create a new table every time P is matched?
Mostly I want to do this to simplify some functions that produce iterators. Lua's stateless iterator functions only get one control variable, and it needs to be the first value returned by the iterator function.
In a world that let people name the last arguments of a variadic function, I could write:
function pos_then_captures(pattern)
    local function roll(..., pos)
        return pos, (...)
    end
    return (pattern * lpeg.Cp()) / roll
end
Alas. The easy solution is judicious use of lpeg.Ct():
function pos_then_captures(pattern)
    -- exchange the order of two values and unpack the first parameter
    local function exch(a, b)
        -- Lua 5.2: unpack lives in the table library
        return b, table.unpack(a)
    end
    return (lpeg.Ct(pattern) * lpeg.Cp()) / exch
end
or to have the caller of lpeg.match do a pack/remove/insert/unpack dance. And as yucky as the latter sounds, I would probably do that one because lpeg.Ct() might have some unintended consequences for pathological but "correct" arguments to pos_then_captures.
Either of these creates a new table every time pattern is successfully matched, which admittedly doesn't matter too much in my application, but is there a way to do this without any pack-unpack magic?
I'm not too familiar with the internals of Lua, but it feels like what I really want to do is pop something from Lua's stack and put it back in somewhere else, which doesn't seem like an operation that would be directly or efficiently supported, but maybe something that LPeg can do in this specific case.
Match-time captures and upvalues get the job done. This function uses Cmt to ensure pos is set before sticking it in front of pattern's captures in pattern / prepend.
Cmt = lpeg.Cmt
Cp = lpeg.Cp
function prepend_final_pos(pattern)
    -- Upvalues are dynamic, so this variable belongs to a
    -- new environment for each call to prepend_final_pos.
    local pos
    -- lpeg.Cmt(patt, func) passes the entire text being
    -- searched to `func` as the first parameter, then
    -- any captures. Ignore the first parameter.
    local function setpos(_, x)
        pos = x
        -- If we return nothing, Cmt will fail every time
        return true
    end
    -- Keep the varargs safe!
    local function prepend(...)
        return pos, ...
    end
    -- The `/ 0` in `Cmt(etc etc) / 0` is to get rid of that
    -- captured `true` that we picked up from setpos.
    return (pattern / prepend) * (Cmt(Cp(), setpos) / 0)
end
Sample session:
> bar = lpeg.C "bar"
> Pbar = prepend_final_pos(bar)
> print(lpeg.match(Pbar, "foobarzok", 4))
7 bar
> foo = lpeg.C "foo" / "zokzokzok"
> Pfoobar = prepend_final_pos(foo * bar)
> print(lpeg.match(Pfoobar, "foobarzok"))
7 zokzokzok bar
As intended, the actual captures have no influence on the position returned by the new pattern; only the length of the text matched by the original pattern.
You can do it with your original solution, without table captures or match-time captures, like this:
function pos_then_captures(pattern)
    local function exch(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, ...)
        if a1 == nil then return end
        if a2 == nil then return a1 end
        if a3 == nil then return a2, a1 end
        if a4 == nil then return a3, a1, a2 end
        if a5 == nil then return a4, a1, a2, a3 end
        if a6 == nil then return a5, a1, a2, a3, a4 end
        if a7 == nil then return a6, a1, a2, a3, a4, a5 end
        if a8 == nil then return a7, a1, a2, a3, a4, a5, a6 end
        if a9 == nil then return a8, a1, a2, a3, a4, a5, a6, a7 end
        if a10 == nil then return a9, a1, a2, a3, a4, a5, a6, a7, a8 end
        local t = { a10, ... }
        return t[#t], a1, a2, a3, a4, a5, a6, a7, a8, a9, unpack(t, 1, #t-1)
    end
    return (pattern * lpeg.Cp()) / exch
end
The following sample usage returns each matched 'a' with the end position of its match in front of it:
local p = lpeg.P{ (pos_then_captures(lpeg.C'a') + 1) * lpeg.V(1) + -1 }
print(p:match('abababcd'))
-- output: 2 a 4 a 6 a
Suppose I have the following 4 vectors of doubles in Xeon Phi registers:
A-> |a8|a7|a6|a5|a4|a3|a2|a1|
B-> |b8|b7|b6|b5|b4|b3|b2|b1|
C-> |c8|c7|c6|c5|c4|c3|c2|c1|
D-> |d8|d7|d6|d5|d4|d3|d2|d1|
I want to permute them into the following:
A_new ->|d2|d1|c2|c1|b2|b1|a2|a1|
B_new ->|d4|d3|c4|c3|b4|b3|a4|a3|
C_new ->|d6|d5|c6|c5|b6|b5|a6|a5|
D_new ->|d8|d7|c8|c7|b8|b7|a8|a7|
The goal is to get :
O = _mm512_add_pd(_mm512_add_pd(A_new,B_new),_mm512_add_pd(C_new,D_new));
How can I achieve the above with the least number of instructions/cycles?
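To make the target layout concrete, here is a small Python model of the desired permutation. It uses plain lists rather than intrinsics, and the helper name permute_pairs is made up for this illustration. Index 0 is the lowest lane, so the diagrams above read right to left (a1 is index 0).

```python
def permute_pairs(A, B, C, D):
    """Return [A_new, B_new, C_new, D_new] for four 8-lane vectors.

    Output vector k takes lanes 2k and 2k+1 from A, then B, then C, then D.
    """
    result = []
    for k in range(4):
        vec = []
        for src in (A, B, C, D):
            vec.extend(src[2 * k:2 * k + 2])
        result.append(vec)
    return result


A = [f"a{i}" for i in range(1, 9)]
B = [f"b{i}" for i in range(1, 9)]
C = [f"c{i}" for i in range(1, 9)]
D = [f"d{i}" for i in range(1, 9)]
A_new, B_new, C_new, D_new = permute_pairs(A, B, C, D)
print(A_new)

# Numeric check of O = (A_new + B_new) + (C_new + D_new), lane by lane.
nums = permute_pairs(list(range(1, 9)), list(range(11, 19)),
                     list(range(21, 29)), list(range(31, 39)))
O = [sum(lanes) for lanes in zip(*nums)]
```

In this model, lane 0 of O is a1 + a3 + a5 + a7 and lane 1 is a2 + a4 + a6 + a8, and likewise for b, c and d in the following lane pairs.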
Answered by Evgueni Petrov in the Intel forums:
__m512i a1 = (__m512i)_mm512_mask_blend_pd(0x33, B, _mm512_swizzle_pd(A, _MM_SWIZ_REG_BADC));
__m512i a0 = (__m512i)_mm512_mask_blend_pd(0xcc, A, _mm512_swizzle_pd(B, _MM_SWIZ_REG_BADC));
__m512i a3 = (__m512i)_mm512_mask_blend_pd(0x33, D, _mm512_swizzle_pd(C, _MM_SWIZ_REG_BADC));
__m512i a2 = (__m512i)_mm512_mask_blend_pd(0xcc, C, _mm512_swizzle_pd(D, _MM_SWIZ_REG_BADC));
__m512d C_new = (__m512d)_mm512_mask_alignr_epi32(a2, 0x00ff, a0, a0, 8);
__m512d A_new = (__m512d)_mm512_mask_alignr_epi32(a0, 0xff00, a2, a2, 8);
__m512d D_new = (__m512d)_mm512_mask_alignr_epi32(a3, 0x00ff, a1, a1, 8);
__m512d B_new = (__m512d)_mm512_mask_alignr_epi32(a1, 0xff00, a3, a3, 8);
As of this writing, the _mm512_mask_blend_pd() intrinsic isn't mentioned in the Intel C++ User Guide, but that should be corrected soon. It is present in the "zmmintrin.h" header file.