I am using the adler32 function from zlib to calculate the weak checksum of a chunk of memory x (4096 bytes in length). Everything is fine, but now I would like to compute a rolling checksum when the chunks from the two files do not match. However, I am not sure how to write a function that does this with the value returned by zlib's adler32. So if the checksum does not match, how do I calculate the rolling checksum from the original checksum, the byte at x + 1, and the byte at x + 4096 + 1? Basically, I am trying to build an rsync implementation.
Pysync has implemented rolling on top of zlib's Adler32 like this:
_BASE = 65521  # largest prime smaller than 65536
_NMAX = 5552   # largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1
_OFFS = 1      # default initial s1 offset

import zlib

class adler32:
    def __init__(self, data=''):
        value = zlib.adler32(data, _OFFS)
        self.s2, self.s1 = (value >> 16) & 0xffff, value & 0xffff
        self.count = len(data)

    def update(self, data):
        value = zlib.adler32(data, (self.s2 << 16) | self.s1)
        self.s2, self.s1 = (value >> 16) & 0xffff, value & 0xffff
        self.count = self.count + len(data)

    def rotate(self, x1, xn):
        x1, xn = ord(x1), ord(xn)
        self.s1 = (self.s1 - x1 + xn) % _BASE
        self.s2 = (self.s2 - self.count * x1 + self.s1 - _OFFS) % _BASE

    def digest(self):
        return (self.s2 << 16) | self.s1

    def copy(self):
        n = adler32()
        n.count, n.s1, n.s2 = self.count, self.s1, self.s2
        return n
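A sketch of how this class might be used to slide a 4096-byte window one byte at a time (illustrative only; Python 2-style byte strings, matching the Pysync code above):

data = open('somefile', 'rb').read()   # illustrative input
window = 4096

csum = adler32(data[:window])
for i in range(len(data) - window):
    # compare csum.digest() against the other file's block checksums here
    csum.rotate(data[i], data[i + window])   # drop data[i], pull in data[i + window]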
But as Peter stated, rsync does not use Adler32 directly, but a faster variant of it.
The code of the rsync tool is a bit hard to read, but check out librsync. It's a completely separate project, and it's much more readable. Take a look at rollsum.c and rollsum.h; there is an efficient implementation of the variant in C macros:
/* the Rollsum struct type */
typedef struct _Rollsum {
    unsigned long count;  /* count of bytes included in sum */
    unsigned long s1;     /* s1 part of sum */
    unsigned long s2;     /* s2 part of sum */
} Rollsum;

#define ROLLSUM_CHAR_OFFSET 31

#define RollsumInit(sum) { \
    (sum)->count=(sum)->s1=(sum)->s2=0; \
}

#define RollsumRotate(sum,out,in) { \
    (sum)->s1 += (unsigned char)(in) - (unsigned char)(out); \
    (sum)->s2 += (sum)->s1 - (sum)->count*((unsigned char)(out)+ROLLSUM_CHAR_OFFSET); \
}

#define RollsumRollin(sum,c) { \
    (sum)->s1 += ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->s2 += (sum)->s1; \
    (sum)->count++; \
}

#define RollsumRollout(sum,c) { \
    (sum)->s1 -= ((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->s2 -= (sum)->count*((unsigned char)(c)+ROLLSUM_CHAR_OFFSET); \
    (sum)->count--; \
}

#define RollsumDigest(sum) (((sum)->s2 << 16) | ((sum)->s1 & 0xffff))
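For comparison with the Python class above, here is a sketch of the same variant transcribed to Python (not from librsync itself; the & 0xffffffff masks stand in for C's unsigned wraparound):

ROLLSUM_CHAR_OFFSET = 31

class Rollsum:
    def __init__(self):
        self.count = 0   # count of bytes included in sum
        self.s1 = 0
        self.s2 = 0

    def rollin(self, c):
        # append byte value c to the window
        self.s1 = (self.s1 + c + ROLLSUM_CHAR_OFFSET) & 0xffffffff
        self.s2 = (self.s2 + self.s1) & 0xffffffff
        self.count += 1

    def rollout(self, c):
        # remove byte value c from the front of the window
        self.s1 = (self.s1 - (c + ROLLSUM_CHAR_OFFSET)) & 0xffffffff
        self.s2 = (self.s2 - self.count * (c + ROLLSUM_CHAR_OFFSET)) & 0xffffffff
        self.count -= 1

    def rotate(self, out, inp):
        # slide the window one byte: drop `out`, append `inp`
        self.s1 = (self.s1 + inp - out) & 0xffffffff
        self.s2 = (self.s2 + self.s1 - self.count * (out + ROLLSUM_CHAR_OFFSET)) & 0xffffffff

    def digest(self):
        return ((self.s2 << 16) | (self.s1 & 0xffff)) & 0xffffffff

rollin/rollout grow and shrink the window; rotate slides a full window along by one byte.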
I'm in the process of creating a cryptography package for Dart (https://pub.dev/packages/steel_crypt). Right now, most of what I've done is either exposed from PointyCastle or simple-ish algorithms where bitwise rotations are unnecessary or replaceable by >> and <<.
However, as I move toward more complicated cryptography solutions, which I can handle mathematically, I'm unsure how to implement bitwise rotation in Dart with maximum efficiency. Because of the nature of cryptography, speed is emphasized and uncompromising: I need the absolute fastest implementation.
I've ported a method of bitwise rotation from Java. I'm pretty sure it's correct, but I'm unsure of its efficiency and readability. My tested implementation is below:
int INT_BITS = 64; // Dart ints are 64-bit

static int leftRotate(int n, int d) {
  // In n << d, the last d bits are 0.
  // To put the first d bits of n at the end,
  // bitwise-or n << d with n >> (INT_BITS - d).
  return (n << d) | (n >> (INT_BITS - d));
}

static int rightRotate(int n, int d) {
  // In n >> d, the first d bits are 0.
  // To put the last d bits of n at the front,
  // bitwise-or n >> d with n << (INT_BITS - d).
  return (n >> d) | (n << (INT_BITS - d));
}
EDIT (for clarity): Dart has no unsigned right shift (>>>), meaning that >> is a signed (arithmetic) right shift, which bears more significance than I might have thought. It poses a challenge that other languages don't in terms of devising an answer. The accepted answer below explains this and also shows a correct method of bitwise rotation.
As pointed out, Dart has no >>> (unsigned right shift) operator, so you have to rely on the signed shift operator.
In that case,
int rotateLeft(int n, int count) {
  const bitCount = 64; // make it 32 for JavaScript compilation.
  assert(count >= 0 && count < bitCount);
  if (count == 0) return n;
  // n >> x is an arithmetic shift, so mask off the sign-extension bits
  // to emulate an unsigned shift of the top `count` bits.
  return (n << count) |
      ((n >> (bitCount - count)) & ((1 << count) - 1));
}
should work.
This code only works for the native VM. When compiling to JavaScript, numbers are doubles, and bitwise operations are only done on 32-bit numbers.
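If it helps to cross-check results, the same rotation is easy to express in Python, where ints are arbitrary precision, so the value is masked to 64 bits explicitly (a reference sketch, not Dart):

MASK64 = (1 << 64) - 1

def rotate_left_64(n, count):
    n &= MASK64
    return ((n << count) | (n >> (64 - count))) & MASK64

print(hex(rotate_left_64(0x8000000000000000, 1)))   # 0x1: the top bit wraps around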
I am trying to parse the MNIST database of handwritten digits. However, the values fread gives me aren't right. I have changed the endianness, but the numerical values are still incorrect. Link to the database is here: http://yann.lecun.com/exdb/mnist/
int ChangeEndianness(int value) {
    int result = 0;
    result |= (value & 0x000000FF) << 24;
    result |= (value & 0x0000FF00) << 8;
    result |= (value & 0x00FF0000) >> 8;
    result |= (value & 0xFF000000) >> 24;
    return result;
}

FILE *imageTestFiles = fopen("train-images-idx3-ubyte.gz", "r");
if (imageTestFiles == NULL) {
    perror("File Not Found");
}

int magic_number_bytes;
fread(&magic_number_bytes, sizeof(int), 1, imageTestFiles);
printf("%d\n", ChangeEndianness(magic_number_bytes));
All this is supposed to do is print the "magic number", which for the image file is 2051 or 0x00000803, but it instead prints 529205256, which is 0x1F8B0808. I am sorta new to C; I've always used Java before. Thanks in advance!
The file must first be decompressed rather than simply removing the gz extension.
One can tell your code is operating on a compressed file because 0x1F8B is the magic number for the gzip file format.
If you use xxd to display the file contents right after downloading, you get the observed 0x1F8B0808:
$ xxd -p train-images-idx3-ubyte.gz | head -c 8
1f8b0808
However, if you decompress the file:
$ gunzip train-images-idx3-ubyte.gz
$ xxd -p train-images-idx3-ubyte | head -c 8
00000803
you get the expected magic number for the MNIST data.
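If you would rather not decompress the file on disk at all, you can also read it through a gzip layer; for example, a Python sketch using the same file name (the '>' in the struct format reads big-endian, so no byte swapping is needed):

import gzip
import struct

with gzip.open('train-images-idx3-ubyte.gz', 'rb') as f:
    magic, n_images, n_rows, n_cols = struct.unpack('>4i', f.read(16))
print(magic)   # 2051 (0x00000803) for the image file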
Hi all. I need help: I have a signal like this one
/\
/\ / \
/ \ /\ / \
0 ---------------------------------------
/ \ / \ / \ /
\/ \ / \ /
\/ \/
and I need to detect all peaks (negative and positive). All values are floats, and I get one every 66 ms. I want to know the time between two peaks. I think I need to store all values in an array with a timestamp from the last peak; does anyone have a better approach?
Thanks.
To detect a peak you might want to look for a change in direction. You would not necessarily have to store the values in an array.
Pseudocode:

// every frame:
frameIncrement++
currentDir = currentVal - prevVal
if ((prevDir < 0 && currentDir > 0) || (prevDir > 0 && currentDir < 0)) {
    // change in direction!
    time = frameIncrement * 66
    frameIncrement = 0
}
prevDir = currentDir
prevVal = currentVal
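For concreteness, the same idea as a small runnable Python sketch (the 66 ms frame period is from the question; the sample values are made up):

def peak_times(samples, frame_ms=66):
    # yield the elapsed time (ms) between successive direction changes
    prev_val = None
    prev_dir = None
    frames = 0
    for val in samples:
        frames += 1
        if prev_val is not None:
            cur_dir = val - prev_val
            if prev_dir is not None and prev_dir * cur_dir < 0:
                # sign flip: the previous sample was a peak or a trough
                yield frames * frame_ms
                frames = 0
            if cur_dir != 0:
                prev_dir = cur_dir
        prev_val = val

print(list(peak_times([0.0, 1.0, 2.0, 1.0, -1.0, -2.0, -1.0, 0.5])))   # [264, 198]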
Hope this helps!
I have a specific requirement to convert a stream of bytes into a character encoding that happens to be 6-bits per character.
Here's an example:
Input: 0x50 0x11 0xa0
Character Table:
010100 T
000001 A
000110 F
100000 SPACE
Output: "TAF "
Logically I can understand how this works:
Taking 0x50 0x11 0xa0 and showing it as binary:
01010000 00010001 10100000
Regrouping the same 24 bits into 6-bit units gives
010100 000001 000110 100000
which is "TAF ".
What's the best way to do this programmatically (pseudocode or C++)? Thank you!
Well, every 3 bytes, you end up with four characters. So for one thing, you need to work out what to do if the input isn't a multiple of three bytes. (Does it have padding of some kind, like base64?)
Then I'd probably take each 3 bytes in turn. In C#, which is close enough to pseudo-code for C :)
for (int i = 0; i < array.Length; i += 3)
{
    // Top 6 bits of byte i
    int value1 = array[i] >> 2;
    // Bottom 2 bits of byte i, top 4 bits of byte i+1
    int value2 = ((array[i] & 0x3) << 4) | (array[i + 1] >> 4);
    // Bottom 4 bits of byte i+1, top 2 bits of byte i+2
    int value3 = ((array[i + 1] & 0xf) << 2) | (array[i + 2] >> 6);
    // Bottom 6 bits of byte i+2
    int value4 = array[i + 2] & 0x3f;

    // Now use value1...value4, e.g. putting them into a char array.
    // You'll need to decode from the 6-bit number (0-63) to the character.
}
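For illustration, here is the same unpacking as runnable Python, using the four entries from the question's table (a real decoder would define all 64 codes):

# 6-bit code -> character, from the question's example table (partial)
TABLE = {0b010100: 'T', 0b000001: 'A', 0b000110: 'F', 0b100000: ' '}

def decode(data):
    out = []
    for i in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        out.append(TABLE[b0 >> 2])                        # top 6 bits of b0
        out.append(TABLE[((b0 & 0x3) << 4) | (b1 >> 4)])  # low 2 of b0, top 4 of b1
        out.append(TABLE[((b1 & 0xF) << 2) | (b2 >> 6)])  # low 4 of b1, top 2 of b2
        out.append(TABLE[b2 & 0x3F])                      # low 6 bits of b2
    return ''.join(out)

print(decode(bytes([0x50, 0x11, 0xA0])))   # TAF (with a trailing space)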
In case someone is interested, here is another variant that extracts 6-bit numbers from the stream as soon as they appear there. That is, results can be obtained even if fewer than 3 bytes have been read so far, which is useful for unpadded streams.
The code keeps its state in the accumulator a and in the variable n, which stores the number of bits left in the accumulator from the previous read.
int n = 0;            /* bits carried over from the previous byte (0, 2 or 4) */
unsigned char a = 0;  /* accumulator for the next 6-bit value */
unsigned char b = 0;  /* current input byte */

/* read_byte() and store_6bit() are assumed to be supplied by the caller */
while (read_byte(&b)) {
    /* save (6-n) most significant bits of input byte to proper position
       in accumulator */
    a |= (b >> (n + 2)) & (077 >> n);
    store_6bit(a);
    a = 0;
    /* save remaining least significant bits of input byte to proper
       position in accumulator */
    a |= (b << (4 - n)) & ((077 << (4 - n)) & 077);
    if (n == 4) {
        store_6bit(a);
        a = 0;
    }
    n = (n + 2) % 6;
}
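The same streaming idea, transcribed to Python for a quick check (store_6bit becomes a yield; the input is any iterable of byte values):

def stream_6bit(byte_iter):
    acc = 0   # partially assembled 6-bit value
    n = 0     # bits carried over from the previous byte (0, 2 or 4)
    for b in byte_iter:
        acc |= (b >> (n + 2)) & (0o77 >> n)
        yield acc
        acc = (b << (4 - n)) & ((0o77 << (4 - n)) & 0o77)
        if n == 4:
            yield acc
            acc = 0
        n = (n + 2) % 6

print(list(stream_6bit([0x50, 0x11, 0xA0])))   # [20, 1, 6, 32] -> "TAF "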
I am trying to create a software simulation on an Ubuntu GNU/Linux machine which will work like PPPoE. I would like this simulator to take outgoing packets, strip off the Ethernet header, insert the PPP flags (7E, FF, 03, 00, and 21), and place the IP-layer information in the PPP packet. I am having trouble with the FCS that goes after the data. From what I can tell, the cell modem I am using has a 2-byte FCS using the CRC16-CCITT method. I have found several pieces of software that will calculate this checksum, but none of them produce what is coming out of the serial line (I have a serial line "sniffer" that shows me everything the modem is being sent).
I have been looking into the source of pppd and the linux kernel itself, and I can see that both of them have a method of adding an FCS to the data. It seems quite difficult to implement, as I have no experience in kernel hacking. Can someone come up with a simple way (preferably in Python) of calculating an FCS that matches the one that the kernel produces?
Thanks.
P.S. If anyone wants, I can add a sample of the data output I am getting to the serial modem.
I used the simple Python library crcmod.
import crcmod  # pip3 install crcmod

fcs_data = "A0 19 03 61 DC".replace(" ", "")
print(fcs_data)

crc16 = crcmod.mkCrcFun(0x11021, rev=True, initCrc=0x0000, xorOut=0xFFFF)
fcs = hex(crc16(bytes.fromhex(fcs_data)))
print(fcs)
I recently did something like this while testing code to kill a PPP connection.
This worked for me:
# RFC 1662 Appendix C
def mkfcstab():
    P = 0x8408

    def valiter():
        for b in range(256):
            v = b
            i = 8
            while i:
                v = (v >> 1) ^ P if v & 1 else v >> 1
                i -= 1
            yield v & 0xFFFF

    return tuple(valiter())

fcstab = mkfcstab()

PPPINITFCS16 = 0xffff  # Initial FCS value
PPPGOODFCS16 = 0xf0b8  # Good final FCS value

def pppfcs16(fcs, bytelist):
    for b in bytelist:
        fcs = (fcs >> 8) ^ fcstab[(fcs ^ b) & 0xff]
    return fcs
To get the value:
fcs = pppfcs16(PPPINITFCS16, (ord(c) for c in frame)) ^ 0xFFFF
and swap the bytes (I used chr((fcs & 0xFF00) >> 8), chr(fcs & 0x00FF))
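As a quick sanity check: RFC 1662 transmits the FCS least-significant byte first, and running the sum over the frame plus its FCS should give the "good" constant. A sketch assuming Python 3 bytes, reusing the sample payload from the crcmod answer above:

frame = bytes.fromhex('A0 19 03 61 DC')   # illustrative payload
fcs = pppfcs16(PPPINITFCS16, frame) ^ 0xFFFF
wire = frame + bytes([fcs & 0xFF, (fcs >> 8) & 0xFF])   # LSB first on the wire
assert pppfcs16(PPPINITFCS16, wire) == PPPGOODFCS16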
Got this from mbed.org PPP-Blinky:
// http://www.sunshine2k.de/coding/javascript/crc/crc_js.html - Correctly calculates
// the 16-bit FCS (crc) on our frames (Choose CRC16_CCITT_FALSE)
int crc;

void crcReset()
{
    crc = 0xffff; // crc restart
}

void crcDo(int x) // cumulative crc
{
    for (int i = 0; i < 8; i++) {
        crc = ((crc & 1) ^ (x & 1)) ? (crc >> 1) ^ 0x8408 : crc >> 1; // crc calculator
        x >>= 1;
    }
}

int crcBuf(char *buf, int size) // crc on an entire block of memory
{
    crcReset();
    for (int i = 0; i < size; i++) crcDo(*buf++);
    return crc;
}