I am learning little by little SIMD programming, and I've devised a (seemingly) simple problem that I hope I can speed-up using SIMD (AVX, at the moment I have access only to AVX CPUs).
I have a long string constituted by an alphabet of 2^k characters (for instance 0, 1, 2, 3), and I'd like to:
generate all substrings of a given length substringlength
convert all the substrings in bits
The substrings are just sequences of characters from the input string:
012301230123012301230123012301233012301301230123123213012301230
substringlength = 6;
string bits
------+--+-----------------
012301 -> 01 00 11 10 01 00
123012 -> 10 01 00 11 10 01
230123 -> 11 10 01 00 11 10
301230 -> 00 11 10 01 00 11
...
My question is due to my inexperience with SIMD (I've only read "Modern x86 Assembly Language Programming", by Kusswurm):
Is this a task where SIMD could help?
Edit: for simplicity, let's just assume k = 2, and so the ASCII numbers will be just '0'..'3'.
Iteration 1
Reading the comments and playing around I've come to these realizations. I can convert the the ASCII into values, and as suggested, multiply-add adjacent bytes:
// SIMD 128-bit registers, apparently I cannot use AVX ones directly (some operations are AVX2 or AVX-512)
__m128i sse, val, adj, res;
auto mask = _mm_set_epi8(1, 1<<4, 1, 1<<4, 1, 1<<4, 1, 1<<4, 1, 1<<4, 1, 1<<4, 1, 1<<4, 1, 1<<4);
auto zero = _mm_set_epi8('0', '0', '0', '0', '0', '0', '0', '0',
'0', '0', '0', '0', '0', '0', '0', '0');
// Load ascii values
sse = _mm_loadu_si128((__m128i*) s.data());
// Convert to integer values
val = _mm_sub_epi8(sse[0], zero);
// Multiply with mask byte by byte (aka SHL second bytes of val) and sum
adj = _mm_maddubs_epi16(val, mask);
An idea of what it does, to people learning like me, is given here (I will need more 128-bits to encode one substring, ascii is in hex):
bytes 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
ascii 30 31 30 31 30 31 30 31 30 31 30 31 30 31 30 31
_mm_sub_epi8:
value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
_mm_maddubs_epi16:
value 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
* * * * * * * * * * * * * * * *
mask 1 4 1 4 1 4 1 4 1 4 1 4 1 4 1 4
+ + + + + + + +
| | | | | | | | |
(16-bits)
bits ....0100 ....0100 ....0100 ....0100 ....0100 ....0100 ....0100 ....0100
In other words the first 4 bits are correct, encoding 2 ascii chars, if I understand correctly what _mm_maddubs_epi16 did to my values, which I am not sure at all!
Now I'd need some sort of "shift-or" of adjacent bytes, something like _mm_maddubs_epi16 that shifts left the first, and ORs with the second argument, producing an 8-bit or 16-bit value:
(16-bits)
bits ....0100 ....0100 ....0100 ....0100 ....0100 ....0100 ....0100 ....0100
| shl 4 | | shl 4 | | shl 4 | | shl 4 |
0100.... ....0100 0100.... ....0100 0100.... ....0100 0100.... ....0100
OR OR OR OR
....01000100 ....01000100 ....01000100 ....01000100
However, I cannot see how _mm_bslli_si128 could help me here, or if there is a smarter way to do this. Maybe even this "horizontal" approach is foolish, and I have to rethink it.
Any hint is welcome!
Related
I have a table created with the following script:
n=15
ts=now()+1..n * 1000 * 100
status=rand(0 1 ,n)
val=rand(100,n)
t=table(ts,status,val)
select * from t order by ts
where
ts is the time, status indicates the device status (0: down; 1: running), and val indicates the running time.
Suppose I have the following data:
ts status val
2023.01.03T18:17:17.386 1 58
2023.01.03T18:18:57.386 0 93
2023.01.03T18:20:37.386 0 24
2023.01.03T18:22:17.386 1 87
2023.01.03T18:23:57.386 0 85
2023.01.03T18:25:37.386 1 9
2023.01.03T18:27:17.386 1 46
2023.01.03T18:28:57.386 1 3
2023.01.03T18:30:37.386 0 65
2023.01.03T18:32:17.386 1 66
2023.01.03T18:33:57.386 0 56
2023.01.03T18:35:37.386 0 42
2023.01.03T18:37:17.386 1 82
2023.01.03T18:38:57.386 1 95
2023.01.03T18:40:37.386 0 19
So how do I calculate the longest continuous running time? For example, both the 7th and 8th records have the status 1, I want to sum their val values. Or the 14th-15th records, I want to sum up their val values.
You can use the built-in function segment to group the consecutive identical values. The full script is as follows:
select first(ts), sum(iif(status==1, val, 0)) as total_val
from t
group by segment(status)
having sum(iif(status==1, val, 0)) > 0
The result:
segment_status first_ts total_val
0 2023.01.03T18:17:17.386 58
3 2023.01.03T18:22:17.386 87
5 2023.01.03T18:25:37.386 58
9 2023.01.03T18:32:17.386 66
12 2023.01.03T18:37:17.386 177
I am reading elixir doc of binary operator: https://elixir-lang.org/getting-started/binaries-strings-and-char-lists.html#binaries-and-bitstrings
In doc:
iex> <<255>>
<<255>>
iex> <<256>> # truncated
<<0>>
iex> <<256 :: size(16)>> # use 16 bits (2 bytes) to store the number
<<1, 0>>
the default is 8 bits of elixir binary, if over 8 bits, the result will truncate to 0.
But why <<256 :: size(16)>> will present <<1, 0>>? I think it should be <<1, 255>>
<<1, 0>> is correct. 256 in binary is 0b100000000.
iex(1)> 0b100000000
256
When you extend it to 16 bits you get 0b0000000100000000.
iex(2)> 0b0000000100000000
256
When you split it into two bytes in big-endian byte order, you get 0b00000001 and 0b00000000, which is 1 and 0.
iex(3)> <<256::size(16)>>
<<1, 0>>
In little-endian byte order, you'll get 0 and 1 as the order of the bytes is reversed:
iex(4)> <<256::little-size(16)>>
<<0, 1>>
To get the original number back from big-endian bytes, you can think of it is multiplying the last number by 1, the second last by 256, the third last by 256 * 256 and so on, and then summing all of them.
iex(5)> <<256::size(16)>>
<<1, 0>>
iex(6)> 1 * 256 + 0 * 1
256
iex(7)> <<123456::size(24)>>
<<1, 226, 64>>
iex(8)> 1 * 256 * 256 + 226 * 256 + 64 * 1
123456
given a 64bit int I need to split it into 4 x 2bytes int.
for example decimal 66309 is 0000 0000 0000 0001 0000 0011 0000 0101
I need to convert this into an array of 4 ints {0, 1, 3, 5}. How can I do it in lua?
First, the conversion of 66309 into four 16 bit integers wouldn't be {0, 1, 3, 5}, but {0, 0, 1, 773}. In your example, you are splitting it into 8 bit integers. The below does 16 bit integers.
local int = 66309
local t = {}
for i = 0, 3 do
t[i+1] = (int >> (i * 16)) & 0xFFFF
end
If you want it to be 8 bit integers change the 3 in the loop to 7, the 16 in the shift expression to an 8, and the hex mask 0xFFFF to 0xFF.
And finally, this only works for Lua 5.3. You cannot accurately represent a 64 bit integer in Lua before this version without external libraries.
Probably answer is 256 but I am not satisfied with it.
Suppose a variable has 8 bits , its mean its 8th bit can hold the value 256 . But it also has other seven bits . Wouldn't the total value be the sum of all bits?
To me final value that 8 bit variable holds would be the sum of all bits. But it doesn't. Why?
The max value 8 bits can hold is: 11111111 which is equal to 255. If you have a signed value, the max value it can hold is 127, the left-most bit is used for sign.
The binary 10000000 equals 128 (2 ^ 7), not 256. That's where your confusion lays I think.
00000001 = 2 ^ 0 = 1
00000010 = 2 ^ 1 = 2
00000100 = 2 ^ 2 = 4
00001000 = 2 ^ 3 = 8
00010000 = 2 ^ 4 = 16
00100000 = 2 ^ 5 = 32
01000000 = 2 ^ 6 = 64
10000000 = 2 ^ 7 = 128
The value is indeed the sum of all bits set to 1, but the place value of the eighth bit is 27 (128), not 256 as you suggest - the least significant bit is 20 (i.e. 1), so for eight bits the MSB is 27. You appear to have started from 21 (2) .
For an unsigned integer:
Bit 0 = 20 = 1
Bit 1 = 21 = 2
Bit 2 = 22 = 4
Bit 3 = 23 = 8
Bit 4 = 24 = 16
Bit 5 = 25 = 32
Bit 6 = 26 = 64
Bit 7 = 27 = 128
Sum of all ones = 255 - not 256 as you suggest: 0 to 255 = 28 (256) values.
For a two's complement signed 8 bit type:
Bit 7 = -27 = -128
Sum of all ones = -1,
while if Bit 8 = 0, sum = +127,
and all zeros except bit 8 = -128.
(-128 to +127 = 28 (256) values).
Either way an 8 bit integer signed or otherwise has 28 (256) possible bit patterns.
This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 9 years ago.
I am not a specialist in C/C++.
I found this declaration today:
typedef NS_OPTIONS(NSUInteger, PKRevealControllerType)
{
PKRevealControllerTypeNone = 0,
PKRevealControllerTypeLeft = 1 << 0,
PKRevealControllerTypeRight = 1 << 1,
PKRevealControllerTypeBoth = (PKRevealControllerTypeLeft | PKRevealControllerTypeRight)
};
Can you guys translate what values every value will have?
opertor << is bitwise left shift operator. Shift all the bits to left a specified number of times: (arithmetic left shift and reserves sign bit)
m << n
Shift all the bits of m to left a n number of times. (notice one shift == multiply by two).
1 << 0 means no shift so its equals to 1 only.
1 << 1 means one shift so its equals to 1*2 = 2 only.
I explain with one byte: one in one byte is like:
MSB
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 / 0
| / 1 << 1
| |
▼ ▼
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 2
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
Whereas 1 << 0 do nothing but its like figure one. (notice 7th bit is copied to preserve sign)
OR operator: do bit wise or
MSB PKRevealControllerTypeLeft
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | == 1
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
| | | | | | | | OR
MSB PKRevealControllerTypeRight
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | == 2
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
=
MSB PKRevealControllerTypeBoth
+----+----+----+---+---+---+---+---+
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | == 3
+----+----+----+---+---+---+---+---+
7 6 5 4 3 2 1 0
| is bit wise operator. in below code it or 1 | 2 == 3
PKRevealControllerTypeNone = 0, // is Zero
PKRevealControllerTypeLeft = 1 << 0, // one
PKRevealControllerTypeRight = 1 << 1, // two
PKRevealControllerTypeBoth = (PKRevealControllerTypeLeft |
PKRevealControllerTypeRight) // three
There is not more technical reason to initialized values like this, defining like that makes things line up nicely read this answer:define SOMETHING (1 << 0)
compiler optimization convert them in simpler for like: (I am not sure for third one, but i think compiler will optimize that too)
PKRevealControllerTypeNone = 0, // is Zero
PKRevealControllerTypeLeft = 1, // one
PKRevealControllerTypeRight = 2, // two
PKRevealControllerTypeBoth = 3, // Three
Edit: #thanks to Till.
read this answer App States with BOOL flags show the usefulness of declarations you got using bit wise operators.
It's an enum of bit flags:
PKRevealControllerTypeNone = 0 // no flags set
PKRevealControllerTypeLeft = 1 << 0, // bit 0 set
PKRevealControllerTypeRight = 1 << 1, // bit 1 set
And then
PKRevealControllerTypeBoth =
(PKRevealControllerTypeLeft | PKRevealControllerTypeRight)
is just the result of bitwise OR-ing the other two flags. So, bit 0 and bit 1 set.
The << operator is the left shift operator. And the | operator is bitwise OR.
In summary the resulting values are:
PKRevealControllerTypeNone = 0
PKRevealControllerTypeLeft = 1
PKRevealControllerTypeRight = 2
PKRevealControllerTypeBoth = 3
But it makes a lot more sense to think about it in terms of flags of bits. Or as a set where the universal set is: { PKRevealControllerTypeLeft, PKRevealControllerTypeRight }
To learn more you need to read up about enums, shift operators and bitwise operators.
This looks like Objective C and not C++, but regardless:
1 << 0
is just one bitshifted left (up) by 0 positions. Any integer "<<0" is just itself.
So
1 << 0 = 1
Similarly
1 << 1
is just one bitshifted left by 1 position. Which you could visualize a number of ways but the easiest is to multiply by 2.[Note 1]
So
x << 1 == x*2
or
1 << 1 == 2
Lastly the single pipe operator is a bitwise or.
So
1 | 2 = 3
tl;dr:
PKRevealControllerTypeNone = 0
PKRevealControllerTypeLeft = 1
PKRevealControllerTypeRight = 2
PKRevealControllerTypeBoth = 3
[1] There are some limitations on this generalization, for example when x is equal to or greater than 1/2 the largest value capable of being stored by the datatype.
This all comes down to bitwise arithmetic.
PKRevealControllerTypeNone has a value of 0 (binary 0000)
PKRevealControllerTypeLeft has a value of 1 (binary 0001)
PKRevealControllerTypeRight has a value of 2 (binary 0010) since 0001 shifted left 1 bit is 0010
PKRevealControllerTypeBoth has a value of 3 (binary 0011) since 0010 | 0001 (or works like addition) = 0011
In context, this is most-likely used to determine a value. The property is & (or bitwise-and) works similar to multiplication. If 1 ands with a number, then the number is preserved, if 0 ands with a number, then the number is cleared.
Thus, if you want to check if a particular controller is specifically type Left and it has a value of 0010 (i.e. type Right) 0010 & 0001 = 0 which is false as we expect (thus, you have determined it is not of correct type). However, if the controller is Both 0011 & 0001 = 1 so the result is true which is correct since we determined this is of Both types.