How does one create insertion or deletion mutations using libFuzzer? - clang

libFuzzer has functions that can be implemented by the end-user like this:
size_t LLVMFuzzerCustomMutator(
    uint8_t *data, size_t size, size_t max_size, unsigned int seed)
Am I free to sometimes insert bytes into data, thereby making it larger (I assume max_size may not be exceeded)? If I needed more bytes than max_size to perform the necessary insertion, how would I do that? Do I return the new size?

Related

How to use libFuzzer's custom mutator APIs?

libFuzzer offers two APIs for developing custom mutators.
size_t LLVMFuzzerCustomMutator(uint8_t *Data, size_t Size, size_t MaxSize, unsigned int Seed)
size_t LLVMFuzzerCustomCrossOver(const uint8_t *Data1, size_t Size1, const uint8_t *Data2, size_t Size2, uint8_t *Out, size_t MaxOutSize, unsigned int Seed)
How are these APIs supposed to be used?
The fuzzer is required to be deterministic. How do I ensure that with the custom mutators?
You just need to implement those functions alongside your LLVMFuzzerTestOneInput.
The google/fuzzing repository has a tutorial on how to implement structure-aware fuzzing.
Also, you could take inspiration from CustomMutatorTest.cpp, and CustomCrossOverTest.cpp, from the LLVM repository.
The fuzzer is required to be deterministic.
Right, but here you will be writing the mutation functions, which are different; mutations happen before your LLVMFuzzerTestOneInput is called.
However, they have an analogous requirement.
As outlined in the source code, alongside LLVMFuzzerCustomMutator, and LLVMFuzzerCustomCrossOver respectively:
Optional user-provided custom mutator.
Mutates raw data in [Data, Data+Size) inplace.
Returns the new size, which is not greater than MaxSize.
Given the same Seed produces the same mutation.
Optional user-provided custom cross-over function.
Combines pieces of Data1 & Data2 together into Out.
Returns the new size, which is not greater than MaxOutSize.
Should produce the same mutation given the same Seed.
That is, two calls to a mutation function with the same data and seed should produce the same result.
One last thing: you don't need to implement both functions; LLVMFuzzerCustomMutator should be enough in most cases.
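To make the insertion/deletion question from above concrete, here is a minimal sketch of such a mutator. It uses the POSIX rand_r as the deterministic RNG (libFuzzer only requires that the same Seed and Data produce the same mutation; the RNG choice here is an assumption, not part of the API):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: randomly insert or delete one byte in place.
 * Deterministic because rand_r only depends on the seed value passed in. */
size_t LLVMFuzzerCustomMutator(uint8_t *data, size_t size,
                               size_t max_size, unsigned int seed) {
  if (size == 0) return 0;
  if (rand_r(&seed) % 2 == 0 && size < max_size) {
    /* Insert: shift the tail right by one and write a random byte.
     * Only allowed while size < max_size, so the result never
     * exceeds max_size. */
    size_t pos = rand_r(&seed) % (size + 1);
    memmove(data + pos + 1, data + pos, size - pos);
    data[pos] = (uint8_t)(rand_r(&seed) % 256);
    return size + 1;  /* return the new size */
  } else if (size > 1) {
    /* Delete: shift the tail left over the removed byte. */
    size_t pos = rand_r(&seed) % size;
    memmove(data + pos, data + pos + 1, size - pos - 1);
    return size - 1;
  }
  return size;
}
```

Note that insertion works entirely inside the caller's buffer: libFuzzer guarantees the buffer is max_size bytes long, so there is no way to grow beyond that; if the input is already at max_size, the mutator must fall back to some other mutation (here, deletion).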

Fast streaming to framebuffer

What is the best method to write a stream/array of raw RGB pixels (in the format required by Xorg) to a window of fixed size? No synchronisation is required, nor are there any timing requirements; I just want minimal average CPU usage.
Is it necessary to copy the data to a memory location managed by Xorg, or can one just pass a pointer?
If it is necessary to perform a full copy of the data, is there also a portable way (Linux only, but portable between Intel, Nvidia and AMD) of doing this with GPU hardware acceleration?
How to write existing data to the screen in a fast way?
typedef struct {
    unsigned char r;
    unsigned char g;
    unsigned char b;
} pixel_t; // other format/padding also possible, if required

typedef struct {
    pixel_t data[WIDTH * HEIGHT];
} frame_t;

frame_t *frame = get_next_frame_from_stream(); // <= already optimized
set_as_xorg_framebuffer(frame);                // <= I'm searching for this

Detect how many bytes can be written to NSOutputStream

Basic problem I'm trying to solve:
I have two streams: an NSInputStream and an NSOutputStream.
Now I want to take some data from the input, process it (add some framing, encode it, and so on), and pass it to the output. So far so good.
Actual problem
The problem is that the NSOutputStream method write:maxLength: returns the actual number of bytes written, which can be less than the length passed in. This is a problem, since it requires extra logic to maintain some kind of buffer.
I want to avoid this. I'd like to know how many bytes the output stream will accept without buffering, so I could calculate how much data I should read from the input stream (I will add some framing and encoding).
I don't want to maintain an extra buffer.
Output stream is associated with a TCP socket, input stream can be associated with any kind of resource.
This is Apple's sample implementation for the problem:
- (void)stream:(NSStream *)stream handleEvent:(NSStreamEvent)eventCode
{
    switch (eventCode) {
        case NSStreamEventHasSpaceAvailable:
        {
            uint8_t *readBytes = (uint8_t *)[_data mutableBytes];
            readBytes += byteIndex; // instance variable to move pointer
            int data_len = [_data length];
            unsigned int len = ((data_len - byteIndex >= 1024) ?
                                1024 : (data_len - byteIndex));
            uint8_t buf[len];
            (void)memcpy(buf, readBytes, len);
            len = [stream write:(const uint8_t *)buf maxLength:len];
            byteIndex += len;
            break;
        }
        // continued ...
    }
}
In this implementation, a chunk of at most 1024 bytes is written at a time.
And a note was provided:
There is no firm guideline on how many bytes to write at one time.
Although it may be possible to write all the data to the stream in one
event, this depends on external factors, such as the behavior of the
kernel and device and socket characteristics. The best approach is to
use some reasonable buffer size, such as 512 bytes, one kilobyte (as
in the example above), or a page size (four kilobytes).
As described, it depends on the other side; I don't know whether that can be figured out by investigating the receiver. But choosing a sensible chunk size may decrease the chance that some bytes will not be written. That logic should be implemented.
You'll have to buffer. The stream can't predict how much can be written until it makes the attempt. But you can keep the buffer as small as you like by (a) attempting to write less data at once, and (b) buffering only the data that wasn't written on the prior attempt.
The result of such an arrangement is to trade away speed for space. Consider the one byte buffer as the degenerate case.
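The "buffer only what wasn't written" bookkeeping can be sketched in plain C. Here fake_write is a hypothetical stand-in for write:maxLength: on a busy socket (it accepts at most 3 bytes per call); the point is that an offset into the pending data, like byteIndex in Apple's example, is the only state you need:

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for write:maxLength: — writes at most 3 bytes per call and
 * returns the actual number written, simulating a short write. */
static size_t fake_write(char *out, size_t *out_len,
                         const char *buf, size_t len) {
  size_t n = len < 3 ? len : 3;
  memcpy(out + *out_len, buf, n);
  *out_len += n;
  return n;
}

/* On each "space available" event, retry only from the offset; no second
 * buffer is ever allocated, only the unwritten remainder is tracked. */
static void drain(char *out, size_t *out_len,
                  const char *data, size_t len) {
  size_t offset = 0;  /* plays the role of byteIndex */
  while (offset < len)
    offset += fake_write(out, out_len, data + offset, len - offset);
}
```

In a real run loop you would of course not spin in a while loop; each NSStreamEventHasSpaceAvailable event would advance the offset by one write attempt.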

How could I vectorize this for loop?

I have this loop
void f1(unsigned char *data, unsigned int size) {
    unsigned int A[256] = {0u};
    for (register unsigned int i = 0u; i < size; i++) {
        ++A[data[i]];
    }
    ...
Is there any way to vectorize it manually?
Since multiple entries in data might contain the same value, I don't see how this could be vectorized simply, since there can be race conditions. The point of vectorization is that each element is independent of the other elements, and so can be computed in parallel. But your algorithm doesn't allow that. "Vectorize" is not the same thing as "make go faster."
What you seem to be building here is a histogram, and iOS has built-in, optimized support for that. You can create a single-channel, single-row image and use vImageHistogramCalculation_Planar8 like this:
void f1(unsigned char *data, unsigned int size) {
    unsigned long A[256] = {0u};
    vImage_Buffer src = { data, 1, size, size };
    vImage_Error err = vImageHistogramCalculation_Planar8(&src, A, kvImageDoNotTile);
    if (err != kvImageNoError) {
        // error
    }
    ...
}
Be careful about assuming this is always a win, though. It depends on the size of your data. Making a function call is very expensive, so it can take several million bytes of data to make it worth it. If you're computing this on smaller sets than that, then a simple, compiler-optimized loop is often the best approach. You need to profile this on real devices to see which is faster for your purposes.
Just make sure to allow the compiler to apply all vectorizing optimizations by turning on -Ofast (Fastest, Aggressive). That won't matter in this case because your loop can't be simply vectorized. But in general, -Ofast allows the compiler to apply vectorizing optimizations in cases that it might slightly grow code size (which isn't allowed under the default -Os). -Ofast also allows a little sloppiness in how floating point math is performed, so should not be used in cases where strict IEEE floating point conformance is required (but this is almost never the case for iOS apps, so -Ofast is almost always the correct setting).
The optimisation the compiler would attempt to do here is to parallelize ++A[data[i]]
It cannot do so because the contents of A depend on the previous iteration of the loop.
You could break this dependency by using one frequency array (A) per way of parallelism, and then computing the sum of these at the end. I assume here you've got two-way parallelism and that size is even.
void f1(const unsigned char * const data, unsigned int size) {
    unsigned int A0[256] = {0u};
    unsigned int A1[256] = {0u};
    for (unsigned int i = 0u; i < size / 2u; i++) {
        ++A0[data[2*i]];
        ++A1[data[2*i+1]];
    }
    for (unsigned i = 0u; i < 256; ++i) {
        A0[i] = A0[i] + A1[i];
    }
}
Does this win you much? There's only one way to find out: try it and measure the results. I suspect that the Accelerate framework will do much better than this, even for relatively small values of size. It's also optimised at run-time for the target architecture.
Compilers are pretty smart, but there are things you can do in C or C++ to help the compiler:
Apply const wherever possible: It's then obvious which data is invariant.
Identify pointers to non-overlapping memory regions with the restrict (__restrict in C++) qualifier. Without knowing this, the compiler must assume a write through one pointer potentially alters data that could be read with another. clang will in fact generate run-time checks and code-paths for both the overlapping and non-overlapping case, but there will be limits to this, and you can probably reduce code-size by being explicit.
I doubt the register qualifier for i makes any difference.
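As a sketch of the restrict point above, consider a routine that reads one array and writes another (the function name and operation are illustrative, not from the question). Declaring both pointers restrict tells the compiler the regions never overlap, so it can vectorize the loop without emitting the runtime overlap checks mentioned above:

```c
#include <stddef.h>

/* With restrict, the compiler may assume dst and src never alias,
 * so the loop can be vectorized unconditionally. */
void add_bytes(unsigned char *restrict dst,
               const unsigned char *restrict src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(dst[i] + src[i]);
}
```

Calling this with overlapping arrays would be undefined behavior; that is exactly the contract restrict expresses to the optimizer.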

Optimum buffer to memory ratio

I am trying to build a DAQ using Sparrow's Kmax. I have a ready template in which the total memory is 16 MB.
static final int evSize = 4; // The num of parameters per event of this type
static final int BUF_SIZE = evSize*1000; /** <------------------Why pick this buffer size*/ // Buffer size
static final int LP_MEM_TOP = 0xFFFF00; // Memory size 16MB
static final int READ_START = LP_MEM_TOP - BUF_SIZE; // We start the read/write pointer 1 buffer before the end
In the above code you can see that the buffer is very small compared to the total memory. From what I know, the buffer is the temporary memory where data is stored before being sent to the computer.
In my case I am using a SCSI bus to transfer the data, and the system is really slow. What can I do with the buffer to increase the speed or the performance? Is there a particular reason to have such a small buffer? I am not sure I have understood what exactly the memory and the buffer do.
Any help is more than welcome!!!
