Background:
I have a section of memory, 1024 bytes. The last 1020 bytes will always be the same. The first 4 bytes will change (serial number of a product). I need to calculate the CRC-16 CCITT (0xFFFF starting, 0x1021 mask) for the entire section of memory, CRC_WHOLE.
Question:
Is it possible to calculate the CRC for only the first 4 bytes, CRC_A, then apply a function such as the one below to calculate the full CRC? We can assume that the checksum for the last 1020 bytes, CRC_B, is already known.
CRC_WHOLE = XOR(CRC_A, CRC_B)
I know that this formula does not work (tried it), but I am hoping that something similar exists.
Yes. You can see how in zlib's crc32_combine(). If you have two sequences A and B, then the pure CRC of AB is the exclusive-or of the CRC of A0 and the CRC of 0B, where the 0's represent a series of zero bytes with the length of the corresponding sequence, i.e. B and A respectively.
For your application, you can pre-compute a single operator that applies 1020 zeros to the CRC of your first four bytes very rapidly. Then you can exclusive-or that with the pre-computed CRC of the 1020 bytes.
Update:
Here is a post of mine from 2008 with a detailed explanation that #ArtemB discovered (that I had forgotten about):
crc32_combine() in zlib is based on two key tricks. For what follows,
we set aside the fact that the standard 32-bit CRC is pre and post-
conditioned. We can deal with that later. Assume for now a CRC that
has no such conditioning, and so starts with the register filled with
zeros.
Trick #1: CRCs are linear. So if you have stream X and stream Y of
the same length and exclusive-or the two streams bit-by-bit to get Z,
i.e. Z = X ^ Y (using the C notation for exclusive-or), then CRC(Z) =
CRC(X) ^ CRC(Y). For the problem at hand we have two streams A and B
of differing length that we want to concatenate into stream Z. What
we have available are CRC(A) and CRC(B). What we want is a quick way
to compute CRC(Z). The trick is to construct X = A concatenated with
length(B) zero bits, and Y = length(A) zero bits concatenated with B.
So if we represent concatenation simply by juxtaposition of the
symbols, X = A0, Y = 0B, then X^Y = Z = AB. Then we have CRC(Z) =
CRC(A0) ^ CRC(0B).
Now we need to know CRC(A0) and CRC(0B). CRC(0B) is easy. If we feed
a bunch of zeros to the CRC machine starting with zero, the register
is still filled with zeros. So it's as if we did nothing at all.
Therefore CRC(0B) = CRC(B).
CRC(A0) requires more work however. Taking a non-zero CRC and feeding
zeros to the CRC machine doesn't leave it alone. Every zero changes
the register contents. So to get CRC(A0), we need to set the register
to CRC(A), and then run length(B) zeros through it. Then we can
exclusive-or the result of that with CRC(B) = CRC(0B), and we get what
we want, which is CRC(Z) = CRC(AB). Voila!
Well, actually the voila is premature. I wasn't at all satisfied with
that answer. I didn't want a calculation that took a time
proportional to the length of B. That wouldn't save any time compared
to simply setting the register to CRC(A) and running the B stream
through. I figured there must be a faster way to compute the effect
of feeding n zeros into the CRC machine (where n = length(B)). So
that leads us to:
Trick #2: The CRC machine is a linear state machine. If we know the
linear transformation that occurs when we feed a zero to the machine,
then we can do operations on that transformation to more efficiently
find the transformation that results from feeding n zeros into the
machine.
The transformation of feeding a single zero bit into the CRC machine
is completely represented by a 32x32 binary matrix. To apply the
transformation we multiply the matrix by the register, taking the
register as a 32 bit column vector. For the matrix multiplication in
binary (i.e. over the Galois Field of 2), the role of multiplication
is played by and'ing, and the role of addition is played by exclusive-
or'ing.
There are a few different ways to construct the magic matrix that
represents the transformation caused by feeding the CRC machine a
single zero bit. One way is to observe that each column of the matrix
is what you get when your register starts off with a single one in
it. So the first column is what you get when the register is 100...
and then feed a zero, the second column comes from starting with
0100..., etc. (Those are referred to as basis vectors.) You can see
this simply by doing the matrix multiplication with those vectors.
The matrix multiplication selects the column of the matrix
corresponding to the location of the single one.
Now for the trick. Once we have the magic matrix, we can set aside
the initial register contents for a while, and instead use the
transformation for one zero to compute the transformation for n
zeros. We could just multiply n copies of the matrix together to get
the matrix for n zeros. But that's even worse than just running the n
zeros through the machine. However there's an easy way to avoid most
of those matrix multiplications to get the same answer. Suppose we
want to know the transformation for running eight zero bits, or one
byte through. Let's call the magic matrix that represents running one
zero through: M. We could do seven matrix multiplications to get R =
MxMxMxMxMxMxMxM. Instead, let's start with MxM and call that P. Then
PxP is MxMxMxM. Let's call that Q. Then QxQ is R. So now we've
reduced the seven multiplications to three. P = MxM, Q = PxP, and R =
QxQ.
Now I'm sure you get the idea for an arbitrary n number of zeros. We
can very rapidly generate transformation matrices Mk, where Mk is the
transformation for running 2k zeros through. (In the
paragraph above M3 is R.) We can make M1 through Mk with only k
matrix multiplications, starting with M0 = M. k only has to be as
large as the number of bits in the binary representation of n. We can
then pick those matrices where there are ones in the binary
representation of n and multiply them together to get the
transformation of running n zeros through the CRC machine. So if n =
13, compute M0 x M2 x M3.
If j is the number of one's in the binary representation of n, then we
just have j - 1 more matrix multiplications. So we have a total of k
j - 1 matrix multiplications, where j <= k = floor(logbase2(n)).
Now we take our rapidly constructed matrix for n zeros, and multiply
that by CRC(A) to get CRC(A0). We can compute CRC(A0) in O(log(n))
time, instead of O(n) time. We exclusive or that with CRC(B) and
Voila! (really this time), we have CRC(Z).
That's what zlib's crc32_combine() does.
I will leave it as an exercise for the reader as to how to deal with
the pre and post conditioning of the CRC register. You just need to
apply the linearity observations above. Hint: You don't need to know
length(A). In fact crc32_combine() only takes three arguments:
CRC(A), CRC(B), and length(B) (in bytes).
Below is example C code for an alternative approach for CRC(A0). Rather than working with a matrix, a CRC can be cycled forward n bits by muliplying (CRC · ((2^n)%POLY)%POLY . So the repeated squaring is performed on an integer rather than a matrix. If n is constant, then (2^n)%POLY can be pre-computed.
/* crcpad.c - crc - data has a large number of trailing zeroes */
#include <stdio.h>
#include <stdlib.h>
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
#define POLY (0x04c11db7u)
static uint32_t crctbl[256];
void GenTbl(void) /* generate crc table */
{
uint32_t crc;
uint32_t c;
uint32_t i;
for(c = 0; c < 0x100; c++){
crc = c<<24;
for(i = 0; i < 8; i++)
/* assumes twos complement */
crc = (crc<<1)^((0-(crc>>31))&POLY);
crctbl[c] = crc;
}
}
uint32_t GenCrc(uint8_t * bfr, size_t size) /* generate crc */
{
uint32_t crc = 0u;
while(size--)
crc = (crc<<8)^crctbl[(crc>>24)^*bfr++];
return(crc);
}
/* carryless multiply modulo crc */
uint32_t MpyModCrc(uint32_t a, uint32_t b) /* (a*b)%crc */
{
uint32_t pd = 0;
uint32_t i;
for(i = 0; i < 32; i++){
/* assumes twos complement */
pd = (pd<<1)^((0-(pd>>31))&POLY);
pd ^= (0-(b>>31))&a;
b <<= 1;
}
return pd;
}
/* exponentiate by repeated squaring modulo crc */
uint32_t PowModCrc(uint32_t p) /* pow(2,p)%crc */
{
uint32_t prd = 0x1u; /* current product */
uint32_t sqr = 0x2u; /* current square */
while(p){
if(p&1)
prd = MpyModCrc(prd, sqr);
sqr = MpyModCrc(sqr, sqr);
p >>= 1;
}
return prd;
}
/* # data bytes */
#define DAT ( 32)
/* # zero bytes */
#define PAD (992)
/* DATA+PAD */
#define CNT (1024)
int main()
{
uint32_t pmc;
uint32_t crc;
uint32_t crf;
uint32_t i;
uint8_t *msg = malloc(CNT);
for(i = 0; i < DAT; i++) /* generate msg */
msg[i] = (uint8_t)rand();
for( ; i < CNT; i++)
msg[i] = 0;
GenTbl(); /* generate crc table */
crc = GenCrc(msg, CNT); /* generate crc normally */
crf = GenCrc(msg, DAT); /* generate crc for data */
pmc = PowModCrc(PAD*8); /* pmc = pow(2,PAD*8)%crc */
crf = MpyModCrc(crf, pmc); /* crf = (crf*pmc)%crc */
printf("%08x %08x\n", crc, crf);
free(msg);
return 0;
}
Example C code using intrinsic for carryless multiply, pclmulqdq == _mm_clmulepi64_si128:
/* crcpadm.c - crc - data has a large number of trailing zeroes */
/* pclmulqdq intrinsic version */
#include <stdio.h>
#include <stdlib.h>
#include <intrin.h>
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
typedef unsigned long long uint64_t;
#define POLY (0x104c11db7ull)
#define POLYM ( 0x04c11db7u)
static uint32_t crctbl[256];
static __m128i poly; /* poly */
static __m128i invpoly; /* 2^64 / POLY */
void GenMPoly(void) /* generate __m12i8 poly info */
{
uint64_t N = 0x100000000ull;
uint64_t Q = 0;
for(size_t i = 0; i < 33; i++){
Q <<= 1;
if(N&0x100000000ull){
Q |= 1;
N ^= POLY;
}
N <<= 1;
}
poly.m128i_u64[0] = POLY;
invpoly.m128i_u64[0] = Q;
}
void GenTbl(void) /* generate crc table */
{
uint32_t crc;
uint32_t c;
uint32_t i;
for(c = 0; c < 0x100; c++){
crc = c<<24;
for(i = 0; i < 8; i++)
/* assumes twos complement */
crc = (crc<<1)^((0-(crc>>31))&POLYM);
crctbl[c] = crc;
}
}
uint32_t GenCrc(uint8_t * bfr, size_t size) /* generate crc */
{
uint32_t crc = 0u;
while(size--)
crc = (crc<<8)^crctbl[(crc>>24)^*bfr++];
return(crc);
}
/* carryless multiply modulo crc */
uint32_t MpyModCrc(uint32_t a, uint32_t b) /* (a*b)%crc */
{
__m128i ma, mb, mp, mt;
ma.m128i_u64[0] = a;
mb.m128i_u64[0] = b;
mp = _mm_clmulepi64_si128(ma, mb, 0x00); /* p[0] = a*b */
mt = _mm_clmulepi64_si128(mp, invpoly, 0x00); /* t[1] = (p[0]*((2^64)/POLY))>>64 */
mt = _mm_clmulepi64_si128(mt, poly, 0x01); /* t[0] = t[1]*POLY */
return mp.m128i_u32[0] ^ mt.m128i_u32[0]; /* ret = p[0] ^ t[0] */
}
/* exponentiate by repeated squaring modulo crc */
uint32_t PowModCrc(uint32_t p) /* pow(2,p)%crc */
{
uint32_t prd = 0x1u; /* current product */
uint32_t sqr = 0x2u; /* current square */
while(p){
if(p&1)
prd = MpyModCrc(prd, sqr);
sqr = MpyModCrc(sqr, sqr);
p >>= 1;
}
return prd;
}
/* # data bytes */
#define DAT ( 32)
/* # zero bytes */
#define PAD (992)
/* DATA+PAD */
#define CNT (1024)
int main()
{
uint32_t pmc;
uint32_t crc;
uint32_t crf;
uint32_t i;
uint8_t *msg = malloc(CNT);
GenMPoly(); /* generate __m128 polys */
GenTbl(); /* generate crc table */
for(i = 0; i < DAT; i++) /* generate msg */
msg[i] = (uint8_t)rand();
for( ; i < CNT; i++)
msg[i] = 0;
crc = GenCrc(msg, CNT); /* generate crc normally */
crf = GenCrc(msg, DAT); /* generate crc for data */
pmc = PowModCrc(PAD*8); /* pmc = pow(2,PAD*8)%crc */
crf = MpyModCrc(crf, pmc); /* crf = (crf*pmc)%crc */
printf("%08x %08x\n", crc, crf);
free(msg);
return 0;
}
Related
I'm trying to implement a CRC-CCITT calculator in VHDL. I was able to initially do that; however, I recently found out that data is delivered starting at the least-significant byte. In my code, data is transmitted 7 bytes at a time through a frame. So let's say we have the following data: 123456789 in ASCII or 313233343536373839 in hex. The data would be transmitted as such (with the following CRC):
-- First frame of data
RxFrame.Data <= (
1 => x"39", -- LSB
2 => x"38",
3 => x"37",
4 => x"36",
5 => x"35",
6 => x"34",
7 => x"33"
);
-- Second/last frame of data
RxFrame.Data <= (
1 => x"32",
2 => x"31", -- MSB
3 => xx, -- "xx" means irrelevant data, not part of CRC calculation.
4 => xx, -- This occurs only in the last frame, when it specified in
5 => xx, -- byte 0 which bytes contain data
6 => xx,
7 => xx
);
-- Calculated CRC should be 0x31C3
Another example with data 0x4376669A1CFC048321313233343536373839 and its correct CRC is shown below:
-- First incoming frame of data
RxFrame.Data <= (
1 => x"39", -- LSB
2 => x"38",
3 => x"37",
4 => x"36",
5 => x"35",
6 => x"34",
7 => x"33"
);
-- Second incoming frame of data
RxFrame.Data <= (
1 => x"32",
2 => x"31",
3 => x"21",
4 => x"83",
5 => x"04",
6 => x"FC",
7 => x"1C"
);
-- Third/last incoming frame of data
RxFrame.Data <= (
1 => x"9A",
2 => x"66",
3 => x"76",
4 => x"43", -- MSB
5 => xx, -- Irrelevant data, specified in byte 0
6 => xx,
7 => xx
);
-- Calculated CRC should be 0x2848
Is there a concept I'm missing? Is there a way to calculate the CRC with the data being received in reverse order? I am implementing this for CANopen SDO block protocols. Thanks!
CRC calculation algorithm to verify SDO block transfer from CANopen standard
Example code to generate a CRC16 with the bytes read in reverse (last byte first), using a function to do a carryless multiply modulo the CRC polynomial. An explanation follows.
#include <stdio.h>
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
#define POLY (0x1021u)
/* carryless multiply modulo crc polynomial */
uint16_t MpyModPoly(uint16_t a, uint16_t b) /* (a*b)%poly */
{
uint16_t pd = 0;
uint16_t i;
for(i = 0; i < 16; i++){
/* assumes twos complement */
pd = (pd<<1)^((0-(pd>>15))&POLY);
pd ^= (0-(b>>15))&a;
b <<= 1;
}
return pd;
}
/* generate crc in reverse byte order */
uint16_t Crc16R(uint8_t * b, size_t sz)
{
uint8_t *e = b + sz; /* end of bfr ptr */
uint16_t crc = 0u; /* crc */
uint16_t pdm = 0x100u; /* padding multiplier */
while(e > b){ /* generate crc */
pdm = MpyModPoly(0x100, pdm);
crc ^= MpyModPoly( *--e, pdm);
}
return(crc);
}
/* msg will be processed in reverse order */
static uint8_t msg[] = {0x43,0x76,0x66,0x9A,0x1C,0xFC,0x04,0x83,
0x21,0x31,0x32,0x33,0x34,0x35,0x36,0x37,
0x38,0x39};
int main()
{
uint16_t crc;
crc = Crc16R(msg, sizeof(msg));
printf("%04x\n", crc);
return 0;
}
Example code using X86 xmm pclmulqdq and psrlq, to emulate a 16 bit by 16 bit hardware (VHDL) carryless multiply:
/* __m128i is an intrinsic for X86 128 bit xmm register */
static __m128i poly = {.m128i_u32[0] = 0x00011021u}; /* poly */
static __m128i invpoly = {.m128i_u32[0] = 0x00008898u}; /* 2^31 / poly */
/* carryless multiply modulo crc polynomial */
/* using xmm pclmulqdq and psrlq */
uint16_t MpyModPoly(uint16_t a, uint16_t b)
{
__m128i ma, mb, mp, mt;
ma.m128i_u64[0] = a;
mb.m128i_u64[0] = b;
mp = _mm_clmulepi64_si128(ma, mb, 0x00); /* mp = a*b */
mt = _mm_srli_epi64(mp, 16); /* mt = mp>>16 */
mt = _mm_clmulepi64_si128(mt, invpoly, 0x00); /* mt = mt*ipoly */
mt = _mm_srli_epi64(mt, 15); /* mt = mt>>15 = (a*b)/poly */
mt = _mm_clmulepi64_si128(mt, poly, 0x00); /* mt = mt*poly */
return mp.m128i_u16[0] ^ mt.m128i_u16[0]; /* ret mp^mt */
}
/* external code to generate invpoly */
#define POLY (0x11021u)
static __m128i invpoly; /* 2^31 / poly */
void GenMPoly(void) /* generate __m12i8 invpoly */
{
uint32_t N = 0x10000u; /* numerator = x^16 */
uint32_t Q = 0; /* quotient = 0 */
for(size_t i = 0; i <= 15; i++){ /* 31 - 16 = 15 */
Q <<= 1;
if(N&0x10000u){
Q |= 1;
N ^= POLY;
}
N <<= 1;
}
invpoly.m128i_u16[0] = Q;
}
Explanation: consider the data as separate strings of ever increasing length, padded with zeroes at the end. For the first few bytes of your example, the logic would calculate
CRC = CRC16({39})
CRC ^= CRC16({38 00})
CRC ^= CRC16({37 00 00})
CRC ^= CRC16({36 00 00 00})
...
To speed up this calculation, rather than actually pad with n zero bytes, you can do a carryless multiply of a CRC by 2^{n·8} modulo POLY, where POLY is the 17 bit polynomial used for CRC16:
CRC = CRC16({39})
CRC ^= (CRC16({38}) · (2^08 % POLY)) % POLY
CRC ^= (CRC16({37}) · (2^10 % POLY)) % POLY
CRC ^= (CRC16({36}) · (2^18 % POLY)) % POLY
...
A carryless multiply modulo POLY is equivalent to what CRC16 does, so this translates into pseudo code (all values in hex, 2^8 = 100)
CRC = 0
PDM = 100 ;padding multiplier
PDM = (100 · PDM) % POLY ;main loop (2 lines per byte)
CRC ^= ( 39 · PDM) % POLY
PDM = (100 · PDM) % POLY
CRC ^= ( 38 · PDM) % POLY
PDM = (100 · PDM) % POLY
CRC ^= ( 37 · PDM) % POLY
PDM = (100 · PDM) % POLY
CRC ^= ( 36 · PDM) % POLY
...
Implementing (A · B) % POLY is based on binary math:
(A · B) % POLY = (A · B) ^ (((A · B) / POLY) · POLY)
Where multiply is carryless (XOR instead of add) and divide is borrowless (XOR instead of subtract). Since the divide is borrowless, and most significant term of POLY is x^16, the quotient
Q = (A · B) / POLY
only depends on the upper 16 bits of (A · B). Dividing by POLY uses multiplication by the 16 bit constant IPOLY = (2^31)/POLY followed by a right shift:
Q = (A · B) / POLY = (((A · B) >> 16) · IPOLY) >> 15
The process uses a 16 bit by 16 bit carryless multiply, producing a 31 bit product.
POLY = 0x11021u ; CRC polynomial (17 bit)
IPOLY = 0x08898u ; 2^31 / POLY
; generated by external software
MpyModPoly(A, B)
{
MP = A · B ; MP = A · B
MT = MP >> 16 ; MT = MP >> 16
MT = MT · IPOLY ; MT = MT · IPOLY
MT = MT >> 15 ; MT = (A · B) / POLY
MT = MT · POLY ; MT = ((A · B) / POLY) * POLY
return MP xor MT ; (A·B) ^ (((A · B) / POLY) · POLY)
}
A hardware based carryless multiply would look something like this 4 bit · 4 bit example.
p[] = [a3 a2 a1 a0] · [b3 b2 b1 b0]
p[] is a 7 bit product generated with 7 parallel circuits.
The time for multiply would be worst case propagation time for p3.
p6 = a3&b3
p5 = a3&b2 ^ a2&b3
p4 = a3&b1 ^ a2&b2 ^ a1&b3
p3 = a3&b0 ^ a2&b1 ^ a1&b2 ^ a0&b3
p2 = a2&b0 ^ a1&b1 ^ a0&b2
p1 = a1&b0 ^ a0&b1
p0 = a0&b0
If the xor gates available only have 2 bit inputs, the logic can
be split up. For example:
p3 = (a3&b0 ^ a2&b1) ^ (a1&b2 ^ a0&b3)
I don't know if your VHDL toolset includes a library for carryless multiply. For a 16 bit by 16 bit multiply resulting in a 31 bit product (p30 to p00), p15 has 16 outputs from the 16 ands (in parallel), which could be xor'ed using a tree like structure, 8 xors in parallel feeding into 4 xors in parallel feeding into 2 xor's in parallel into a single xor. So the propagation time would be 1 and and 4 xor propagation times.
Here is an example in C that you can adapt. Since you mentioned VHDL, this is a bit-wise implementation suitable for casting into gates and flip-flops. However, if cycles are more precious to you than memory and gates, then there is also a byte-wise table-driven version that would run in 1/8 the number of cycles.
What this does is the inverse of what is done in a normal CRC calculation. It then applies the same size input in zeros with a normal CRC to get what the normal CRC would have been on that input. Running the zeros through takes the same number of cycles as the inverse CRC, i.e. O(n) where n is the size of the input. If that latency is too large, that can be reduced to O(log n) cycles, with some investment in gates.
#include <stddef.h>
// Update crc with the CRC-16/XMODEM of n zero bytes. (This can be done in
// O(log n) time or cycles instead of O(n), with a little more effort.)
static unsigned crc16x_zeros_bit(unsigned crc, size_t n) {
for (size_t i = 0; i < n; i++)
for (int k = 0; k < 8; k++)
crc = crc & 0x8000 ? (crc << 1) ^ 0x1021 : crc << 1;
return crc & 0xffff;
}
// Update crc with the CRC-16/XMODEM of the len bytes at mem in reverse. If mem
// is NULL, then return the initial value for the CRC. When done,
// crc16x_zeros_bit() must be used to apply the total length of zero bytes, in
// order to get what the CRC would have been if it were calculated on the bytes
// fed in the opposite order.
static unsigned crc16x_inverse_bit(unsigned crc, void const *mem, size_t len) {
unsigned char const *data = mem;
if (data == NULL)
return 0;
crc &= 0xffff;
for (size_t i = 0; i < len; i++) {
for (int k = 0; k < 8; k++)
crc = crc & 1 ? (crc >> 1) ^ 0x8810 : crc >> 1;
crc ^= (unsigned)data[i] << 8;
}
return crc;
}
#include <stdio.h>
int main(void) {
// Do framed example.
unsigned crc = crc16x_inverse_bit(0, NULL, 0);
crc = crc16x_inverse_bit(crc, (void const *)"9876543", 7);
crc = crc16x_inverse_bit(crc, (void const *)"21", 2);
crc = crc16x_zeros_bit(crc, 9);
printf("%04x\n", crc);
// Do another one.
crc = crc16x_inverse_bit(0, NULL, 0);
crc = crc16x_inverse_bit(crc, (void const *)"9876543", 7);
crc = crc16x_inverse_bit(crc, (void const *)"21!\x83\x04\xfc\x1c", 7);
crc = crc16x_inverse_bit(crc, (void const *)"\x9a" "fvC", 4);
crc = crc16x_zeros_bit(crc, 18);
printf("%04x\n", crc);
return 0;
}
Here is the O(log n) version of crc16x_zeros_bit():
// Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
// polynomial. For speed, a cannot be zero.
static inline unsigned multmodp(unsigned a, unsigned b) {
unsigned p = 0;
for (;;) {
if (a & 1) {
p ^= b;
if (a == 1)
break;
}
a >>= 1;
b = b & 0x8000 ? (b << 1) ^ 0x1021 : b << 1;
}
return p & 0xffff;
}
// Return x^(8n) modulo p(x).
static unsigned x2nmodp(size_t n) {
unsigned p = 1; // x^0 == 1
unsigned q = 0x10; // x^2^2
while (n) {
q = multmodp(q, q); // x^2^k mod p(x), k = 3,4,...
if (n & 1)
p = multmodp(q, p);
n >>= 1;
}
return p;
}
// Update crc with the CRC-16/XMODEM of n zero bytes.
static unsigned crc16x_zeros_bit(unsigned crc, size_t n) {
return multmodp(x2nmodp(n), crc);
}
I want to use PayMaya EMV Merchant Presented QR Code Specification for Payment Systems everything is good except CRC i don't understand how to generate this code.
that's all exist about it ,but i still can't understand how to generate this .
The checksum shall be calculated according to [ISO/IEC 13239] using the polynomial '1021' (hex) and initial value 'FFFF' (hex). The data over which the checksum is calculated shall cover all data objects, including their ID, Length and Value, to be included in the QR Code, in their respective order, as well as the ID and Length of the CRC itself (but excluding its Value).
Following the calculation of the checksum, the resulting 2-byte hexadecimal value shall be encoded as a 4-character Alphanumeric Special value by converting each nibble to an Alphanumeric Special character.
Example: a CRC with a two-byte hexadecimal value of '007B' is included in the QR Code as "6304007B".
This converts a string to its UTF-8 representation as a sequence of bytes, and prints out the 16-bit Cyclic Redundancy Check of those bytes (CRC-16/CCITT-FALSE).
int crc16_CCITT_FALSE(String data) {
int initial = 0xFFFF; // initial value
int polynomial = 0x1021; // 0001 0000 0010 0001 (0, 5, 12)
Uint8List bytes = Uint8List.fromList(utf8.encode(data));
for (var b in bytes) {
for (int i = 0; i < 8; i++) {
bool bit = ((b >> (7-i) & 1) == 1);
bool c15 = ((initial >> 15 & 1) == 1);
initial <<= 1;
if (c15 ^ bit) initial ^= polynomial;
}
}
return initial &= 0xffff;
}
The CRC for ISO/IEC 13239 is this CRC-16/ISO-HDLC, per the notes in that catalog. This implements that CRC and prints the check value 0x906e:
import 'dart:typed_data';
int crc16ISOHDLC(Uint8List bytes) {
int crc = 0xffff;
for (var b in bytes) {
crc ^= b;
for (int i = 0; i < 8; i++)
crc = (crc & 1) != 0 ? (crc >> 1) ^ 0x8408 : crc >> 1;
}
return crc ^ 0xffff;
}
void main() {
Uint8List msg = Uint8List.fromList([0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39]);
print("0x" + crc16ISOHDLC(msg).toRadixString(16));
}
For example, I calculated CRC checksum of a file of size 1024 KB and file includes 22 KB of padding of zeros at the end of the file.
If given checksum of 1024 KB and
size of the padding of zeros of given file
Is it possible to calculate the checksum of the file without the passing. That is in above case getting the checksum of 1002 KB of the file. Assuming we don't have to recalculate the checksum again and reuse the checksum already calculated for the entire file with padding.
After a normal CRC is calculated, a CRC can be "reverse cycled" backwards past the trailing zeroes, but rather than actually reverse cycling the CRC, a carryless multiply can be used:
new crc = (crc · (pow(2,-1-reverse_distance)%poly))%poly
The -1 represents the cyclic period for a CRC. For CRC32, the period is 2^32-1 = 0xffffffff .
By generating a table for pow(2,-1-(i*8))%poly) for i = 1 to n, time complexity can be reduced to O(1), doing a table lookup followed by a carryless multiply mod polynomial (32 iterations).
Example code for a 32 byte message with 14 data bytes, 18 zero bytes, with the new crc to be located at msg[{14,15,16,17}]. After the new bytes are stored in the message, a normal CRC calculation on the shortened message will be zero. The example code doesn't use a table, and the time complexity is O(log2(n)) for the pow(2,-1-(n*8))%poly) calculation.
#include <stdio.h>
typedef unsigned char uint8_t;
typedef unsigned int uint32_t;
static uint32_t crctbl[256];
void GenTbl(void) /* generate crc table */
{
uint32_t crc;
uint32_t c;
uint32_t i;
for(c = 0; c < 0x100; c++){
crc = c<<24;
for(i = 0; i < 8; i++)
crc = (crc<<1)^((0-(crc>>31))&0x04c11db7);
crctbl[c] = crc;
}
}
uint32_t GenCrc(uint8_t * bfr, size_t size) /* generate crc */
{
uint32_t crc = 0u;
while(size--)
crc = (crc<<8)^crctbl[(crc>>24)^*bfr++];
return(crc);
}
/* carryless multiply modulo crc */
uint32_t MpyModCrc(uint32_t a, uint32_t b) /* (a*b)%crc */
{
uint32_t pd = 0;
uint32_t i;
for(i = 0; i < 32; i++){
pd = (pd<<1)^((0-(pd>>31))&0x04c11db7u);
pd ^= (0-(b>>31))&a;
b <<= 1;
}
return pd;
}
/* exponentiate by repeated squaring modulo crc */
uint32_t PowModCrc(uint32_t p) /* pow(2,p)%crc */
{
uint32_t prd = 0x1u; /* current product */
uint32_t sqr = 0x2u; /* current square */
while(p){
if(p&1)
prd = MpyModCrc(prd, sqr);
sqr = MpyModCrc(sqr, sqr);
p >>= 1;
}
return prd;
}
/* message 14 data, 18 zeroes */
/* parities = crc cycled backwards 18 bytes */
int main()
{
uint32_t pmr;
uint32_t crc;
uint32_t par;
uint8_t msg[32] = {0x01,0x02,0x03,0x04,0x05,0x06,0x07,0x08,
0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00};
GenTbl(); /* generate crc table */
pmr = PowModCrc(-1-(18*8)); /* pmr = pow(2,-1-18*8)%crc */
crc = GenCrc(msg, 32); /* generate crc including 18 zeroes */
par = MpyModCrc(crc, pmr); /* par = (crc*pmr)%crc = new crc */
crc = GenCrc(msg, 14); /* generate crc for shortened msg */
printf("%08x %08x\n", par, crc); /* crc == par */
msg[14] = (uint8_t)(par>>24); /* store parities in msg */
msg[15] = (uint8_t)(par>>16);
msg[16] = (uint8_t)(par>> 8);
msg[17] = (uint8_t)(par>> 0);
crc = GenCrc(msg, 18); /* crc == 0 */
printf("%08x\n", crc);
return 0;
}
Sure. Look at this answer for code that undoes the trailing zeros, crc32_remove_zeros().
What is the most efficient way of generating a sine wave for a device running IOS. For the purposes of the exercise assume a frequency of 440Hz and a sampling rate of 44100Hz and 1024 samples.
A vanilla C implementation looks something like.
#define SAMPLES 1024
#define TWO_PI (3.14159 * 2)
#define FREQUENCY 440
#define SAMPLING_RATE 44100
int main(int argc, const char * argv[]) {
float samples[SAMPLES];
float phaseIncrement = TWO_PI * FREQUENCY / SAMPLING_RATE;
float currentPhase = 0.0;
for (int i = 0; i < SAMPLES; i ++){
samples[i] = sin(currentPhase);
currentPhase += phaseIncrement;
}
return 0;
}
To take advantage of the Accelerate Framework and the vecLib vvsinf function the loop can be changed to only do the addition.
#define SAMPLES 1024
#define TWO_PI (3.14159 * 2)
#define FREQUENCY 440
#define SAMPLING_RATE 44100
int main(int argc, const char * argv[]) {
float samples[SAMPLES] __attribute__ ((aligned));
float results[SAMPLES] __attribute__ ((aligned));
float phaseIncrement = TWO_PI * FREQUENCY / SAMPLING_RATE;
float currentPhase = 0.0;
for (int i = 0; i < SAMPLES; i ++){
samples[i] = currentPhase;
currentPhase += phaseIncrement;
}
vvsinf(results, samples, SAMPLES);
return 0;
}
But is just applying the vvsinf function as far as I should go in terms of efficiency?
I don't really understand the Accelerate framework well enough to know if I can also replace the loop. Is there a vecLib or vDSP function I can use?
For that matter is it possible to use an entirely different alogrithm to fill a buffer with a sine wave?
Given that you are computing the sine of a phase argument which increases in fixed increments, it is generally much faster to implement the signal generation with a recurrence equation as described in this "How to Create Oscillators in Software" post and some more in this "DSP Trick: Sinusoidal Tone Generator" post, both on dspguru:
y[n] = 2*cos(w)*y[n-1] - y[n-2]
Note that this recurrence equation can be subject to numerical roundoff error accumulation, you should avoid computing too many samples at a time (your selection of SAMPLES == 1024 should be fine). This recurrence equation can be used after you have obtained the first two values y[0] and y[1] (the initial conditions). Since you are generating with an initial phase of 0, those are simply:
samples[0] = 0;
samples[1] = sin(phaseIncrement);
or more generally with an arbitrary initial phase (particularly useful to reinitialize the recurrence equation every so often to avoid the numerical roundoff error accumulation I mentioned earlier):
samples[0] = sin(initialPhase);
samples[1] = sin(initialPhase+phaseIncrement);
The recurrence equation can then be implemented directly with:
float scale = 2*cos(phaseIncrement);
// initialize first 2 samples for the 0 initial phase case
samples[0] = 0;
samples[1] = sin(phaseIncrement);
for (int i = 2; i < SAMPLES; i ++){
samples[i] = scale * samples[i-1] - samples[i-2];
}
Note that this implementation could be vectorized by computing multiple tones (each with the same frequency, but with larger phase increments between samples) with appropriate relative phase shifts, then interleaving the results to obtain the original tone (e.g. computing sin(4*w*n), sin(4*w*n+w), sin(4*w*n+2*w) and sin(4*w*n+3*w)). This would however make the implementation a lot more obscure, for a relatively small gain.
Alternatively the equation can be implemented by making use of vDsp_deq22:
// setup dummy array which will hold zeros as input
float nullInput[SAMPLES];
memset(nullInput, 0, SAMPLES * sizeof(float));
// setup filter coefficients
float coefficients[5];
coefficients[0] = 0;
coefficients[1] = 0;
coefficients[2] = 0;
coefficients[3] = -2*cos(phaseIncrement);
coefficients[4] = 1.0;
// initialize first 2 samples for the 0 initial phase case
samples[0] = 0;
samples[1] = sin(phaseIncrement);
vDsp_deq22(nullInput, 1, coefficients, samples, 1, SAMPLES-2);
If efficiency is required, you could pre-load a 440hz (44100 / 440) sine waveform look-up table and loop around it without further mapping or pre-load a 1hz (44100 / 44100) sine waveform look-up table and loop around by skipping samples to reach 440hz just as you did by incrementing a phase counter. Using look-up tables should be faster than computing sin().
Method A (using 440hz sine waveform):
#define SAMPLES 1024
#define FREQUENCY 440
#define SAMPLING_RATE 44100
#define WAVEFORM_LENGTH (SAMPLING / FREQUENCY)
int main(int argc, const char * argv[]) {
float waveform[WAVEFORM_LENGTH];
LoadSinWaveForm(waveform);
float samples[SAMPLES] __attribute__ ((aligned));
float results[SAMPLES] __attribute__ ((aligned));
for (int i = 0; i < SAMPLES; i ++){
samples[i] = waveform[i % WAVEFORM_LENGTH];
}
vvsinf(results, samples, SAMPLES);
return 0;
}
Method B (using 1hz sine waveform):
#define SAMPLES 1024
#define FREQUENCY 440
#define TWO_PI (3.14159 * 2)
#define SAMPLING_RATE 44100
#define WAVEFORM_LENGTH SAMPLING_RATE // since it's 1hz
int main(int argc, const char * argv[]) {
float waveform[WAVEFORM_LENGTH];
LoadSinWaveForm(waveform);
float samples[SAMPLES] __attribute__ ((aligned));
float results[SAMPLES] __attribute__ ((aligned));
float phaseIncrement = TWO_PI * FREQUENCY / SAMPLING_RATE;
float currentPhase = 0.0;
for (int i = 0; i < SAMPLES; i ++){
samples[i] = waveform[floor(currentPhase) % WAVEFORM_LENGTH];
currentPhase += phaseIncrement;
}
vvsinf(results, samples, SAMPLES);
return 0;
}
Please note that:
Method A is susceptible to frequency inaccuracy due to assuming that your frequency always divides correctly the sampling rate, which is not true. That means you may get 441hz or 440hz with a glitch.
Method B is susceptible to aliasing as the frequency goes up an gets closer to the Nyquist frequency, but it's a good trade-off between performance, quality and memory consumption if synthesizing reasonable low frequencies such as the one in your example.
I'm working on an app of hardware communication that I send or require data from an external hardware. I have the require data part done.
And I just find out I could use some help to calculate the checksum.
A package is created as NSMutableData, then it will be converted in to Byte Array before sending out.
A package looks like this:
0x1E 0x2D 0x2F DATA checksum
I'm thinking I can convert hex into binary to calculate them one by one. But I don't know if it's a good idea. Please let me know if this is the only way to do it, or there are some built in functions I don't know.
Any suggestions will be appreciated.
BTW, I just found the code for C# from other's post, I'll try to make it work in my app. If I can, I'll share it with you. Still any suggestions will be appreciated.
package org.example.checksum;
public class InternetChecksum {
/**
* Calculate the Internet Checksum of a buffer (RFC 1071 - http://www.faqs.org/rfcs/rfc1071.html)
* Algorithm is
* 1) apply a 16-bit 1's complement sum over all octets (adjacent 8-bit pairs [A,B], final odd length is [A,0])
* 2) apply 1's complement to this final sum
*
* Notes:
* 1's complement is bitwise NOT of positive value.
* Ensure that any carry bits are added back to avoid off-by-one errors
*
*
* #param buf The message
* #return The checksum
*/
public long calculateChecksum(byte[] buf) {
int length = buf.length;
int i = 0;
long sum = 0;
long data;
// Handle all pairs
while (length > 1) {
// Corrected to include #Andy's edits and various comments on Stack Overflow
data = (((buf[i] << 8) & 0xFF00) | ((buf[i + 1]) & 0xFF));
sum += data;
// 1's complement carry bit correction in 16-bits (detecting sign extension)
if ((sum & 0xFFFF0000) > 0) {
sum = sum & 0xFFFF;
sum += 1;
}
i += 2;
length -= 2;
}
// Handle remaining byte in odd length buffers
if (length > 0) {
// Corrected to include #Andy's edits and various comments on Stack Overflow
sum += (buf[i] << 8 & 0xFF00);
// 1's complement carry bit correction in 16-bits (detecting sign extension)
if ((sum & 0xFFFF0000) > 0) {
sum = sum & 0xFFFF;
sum += 1;
}
}
// Final 1's complement value correction to 16-bits
sum = ~sum;
sum = sum & 0xFFFF;
return sum;
}
}
When I post this question a year ago, I was still quite new to Objective-C. It turned out to be something very easy to do.
The way you calculate checksum is based on how checksum is defined in your communication protocol. In my case, checksum is just the sum of all the previous bytes sent or the data you want to send.
So if I have a NSMutableData *cmd that has five bytes:
0x10 0x14 0xE1 0xA4 0x32
checksum is the last byte of 0x10+0x14+0xE1+0xA4+0x32
So the sum is 01DB, checksum is 0xDB.
Code:
//i is the length of cmd
- (Byte)CalcCheckSum:(Byte)i data:(NSMutableData *)cmd
{ Byte * cmdByte = (Byte *)malloc(i);
memcpy(cmdByte, [cmd bytes], i);
Byte local_cs = 0;
int j = 0;
while (i>0) {
local_cs += cmdByte[j];
i--;
j++;
};
local_cs = local_cs&0xff;
return local_cs;
}
To use it:
Byte checkSum = [self CalcCheckSum:[command length] data:command];
Hope it helps.