c stream buffer - stream

I am using C and need a stream buffer mechanism that I can write arbitrary bytes to and read bytes from. I would prefer something that is platform independent (or that can at least run on OS X and Linux). Is anyone aware of any permissively licensed, lightweight libraries or code that I can drop in?
I've used buffers within libevent and I may end up going that route, but it seems overkill to have libevent as a dependency when I don't do any sort of event based io.

If you don't mind depending on C++ and possibly some bits of STL, you can use std::stringstream. It shouldn't be too difficult to write a thin C wrapper around it.

Is setbuf(3) (and its aliases) the 'mechanism' you are searching for?
Please consider the following example:
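#include <stdio.h>

int main(void)
{
    char buf[256];

    /* Make stderr fully buffered, with buf as its buffer (BSD/glibc extension). */
    setbuffer(stderr, buf, sizeof buf);
    fprintf(stderr, "Error: no more oxygen.\n");

    /* The text is still sitting in buf, so it can be edited before it is written. */
    buf[1] = 'R';
    buf[2] = 'R';
    buf[3] = 'O';
    buf[4] = 'R';

    fflush(stderr); /* typically writes "ERROR: no more oxygen." */
    return 0;
}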
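If the goal is simply "write bytes in, read bytes out", a small growable byte queue may be enough and avoids any dependency. The following is a minimal sketch, assuming a single-threaded caller; the names `bytebuf`, `bytebuf_write` and `bytebuf_read` are made up for illustration and do not come from any library.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable FIFO of bytes: write appends, read consumes. */
typedef struct {
    unsigned char *data;
    size_t cap;   /* allocated size */
    size_t head;  /* index of the first unread byte */
    size_t len;   /* number of unread bytes */
} bytebuf;

static void bytebuf_init(bytebuf *b) { memset(b, 0, sizeof *b); }
static void bytebuf_free(bytebuf *b) { free(b->data); memset(b, 0, sizeof *b); }

/* Append n bytes; returns 0 on success, -1 if memory runs out. */
static int bytebuf_write(bytebuf *b, const void *src, size_t n)
{
    if (n == 0) return 0;
    /* Slide unread bytes to the front so the free space is contiguous. */
    if (b->head > 0) {
        memmove(b->data, b->data + b->head, b->len);
        b->head = 0;
    }
    if (b->len + n > b->cap) {
        size_t ncap = b->cap ? b->cap : 64;
        while (ncap < b->len + n) ncap *= 2;
        unsigned char *p = realloc(b->data, ncap);
        if (!p) return -1;
        b->data = p;
        b->cap = ncap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Copy up to n bytes out; returns the number of bytes actually read. */
static size_t bytebuf_read(bytebuf *b, void *dst, size_t n)
{
    if (n > b->len) n = b->len;
    if (n == 0) return 0;
    memcpy(dst, b->data + b->head, n);
    b->head += n;
    b->len -= n;
    if (b->len == 0) b->head = 0;
    return n;
}
```

A caller would `bytebuf_init` once, then interleave `bytebuf_write` and `bytebuf_read` freely, and `bytebuf_free` at the end. It uses only standard C, so it should build the same way on OS X and Linux.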

Import and write GeoTIFF in Octave

I am using MATLAB in my office and Octave when I am at home. Although they are very similar, I was trying to do something I expected to be easy and obvious, but it turned out to be really annoying: I can't find out how to import TIFF images in Octave. I know the MATLAB geotiffread function is not present, but I thought there would be another method.
I could also skip importing them, as I can work with the imread function in some cases, but then the second problem would be that I can't find a way to write a georeferenced TIFF file (in MATLAB I normally call geotiffwrite with geotiffinfo inputs inside). My TIFF files are usually 8 bit unsigned integer or 32 bit signed integer. I hope someone can suggest a way to solve this problem. I also saw this thread but did not understand if it is possible to use the code proposed by Ashish in Octave.
You may want to look at the mapping library in Octave.
You can also use the raster functions to work with GeoTiffs
Example:
pkg load mapping
filename = "C:\\sl\\SDK\\DTED\\n45_w122_1arc_v2.tif"
rasterinfo (filename)
rasterdraw (filename)
The short answer is you can't do it in Octave out of the box. But this is not because it is impossible to do it. It is simply because no one has yet bothered to implement it. As a piece of free software, Octave has the features that its users are willing to spend time or money implementing.
About writing of signed 32-bit images
As of version 3.8.1, Octave uses either GraphicsMagick or ImageMagick to handle the reading and writing of images. This introduces some problems. The main one is that your precision is limited by how GraphicsMagick was built (its quantum-depth option). In addition, you can only write unsigned integers. Hopefully this will change in the future, but since few users require it, it has stayed this way until now.
Dealing with geotiff
Provided you know C++, you can write these functions yourself. This shouldn't be too hard since there is already libgeotiff, a C library for it. You would only need to write a wrapper as an Octave oct function (of course, if you don't know C or C++, then this "only" becomes a lot of work).
Here is example oct-file code, which needs to be compiled. It is adapted from https://gerasimosmichalitsianos.wordpress.com/2018/01/08/178/
#include <octave/oct.h>
#include <string>
#include <vector>
#include "gdal_priv.h"
#include "cpl_conv.h"

// Usage from Octave: test1 (data, geotransform)
//   data:         2-D matrix, written as a single Float32 band
//   geotransform: 6-element GDAL geotransform vector
DEFUN_DLD (test1, args, , "write geotiff")
{
  if (args.length () != 2)
    {
      print_usage ();
      return octave_value_list ();
    }

  NDArray maindata = args(0).array_value ();
  NDArray transform1 = args(1).array_value ();
  if (transform1.numel () != 6)
    {
      error ("test1: the geotransform must have 6 elements");
      return octave_value_list ();
    }

  const dim_vector dims = maindata.dims ();
  const int nrows = dims(0);
  const int ncols = dims(1);
  const std::string tiffname = "nameoftiff2.tif";

  double transform[6];
  octave_stdout << "The transformation matrix is ";
  for (int i = 0; i < 6; i++)
    {
      transform[i] = transform1(i);
      octave_stdout << transform[i] << " ";
    }
  octave_stdout << "\nNumber of rows and columns in array are:\n"
                << nrows << " " << ncols << "\n";

  GDALAllRegister ();
  CPLPushErrorHandler (CPLQuietErrorHandler);

  // Describe the coordinate system (WGS84) as a WKT string.
  OGRSpatialReference oSRS;
  char *pszWKT = NULL;
  oSRS.SetWellKnownGeogCS ("WGS84");
  oSRS.exportToWkt (&pszWKT);

  GDALDriver *driverGeotiff = GetGDALDriverManager ()->GetDriverByName ("GTiff");
  GDALDataset *geotiffDataset = driverGeotiff
    ? driverGeotiff->Create (tiffname.c_str (), ncols, nrows, 1, GDT_Float32, NULL)
    : NULL;
  if (geotiffDataset == NULL)
    {
      CPLFree (pszWKT);
      CPLPopErrorHandler ();
      error ("test1: could not create %s", tiffname.c_str ());
      return octave_value_list ();
    }
  geotiffDataset->SetGeoTransform (transform);
  geotiffDataset->SetProjection (pszWKT);

  // Write the band one row at a time, converting each row to float.
  std::vector<float> rowBuff (ncols);
  for (int i = 0; i < nrows; i++)
    {
      for (int j = 0; j < ncols; j++)
        rowBuff[j] = maindata(i,j);
      geotiffDataset->GetRasterBand (1)->RasterIO (GF_Write, 0, i, ncols, 1,
                                                   rowBuff.data (), ncols, 1,
                                                   GDT_Float32, 0, 0);
    }

  GDALClose (geotiffDataset);
  CPLFree (pszWKT);
  CPLPopErrorHandler ();
  GDALDestroyDriverManager ();
  return octave_value_list ();
}
It can be compiled and run as follows:
mkoctfile -lgdal test1.cc
aa=rand(50,53);
b=[60,1,0,40,0,-1];
test1(aa,b);

Reading and Writing Structs to and from Arduino's EEPROM

I'm trying to write data structures defined in C to my Arduino Uno board's non-volatile memory, so the values of the struct will be retained after the power goes off or the board is reset.
To my understanding, the only way to do this (while the sketch is running) would be to write to arduino's EEPROM. Although I can write individual bytes (sets a byte with value 1 at address 0):
eeprom_write_byte(0,1);
I am stuck trying to write a whole struct:
typedef struct NewProject_Sequence {
NewProject_SequenceId sequenceId;
NewProject_SequenceLength maxRange;
NewProject_SequenceLength minRange;
NewProject_SequenceLength seqLength;
NewProject_SceneId sceneList[5];
} NewProject_Sequence;
Because of the EEPROM's limit of 100,000 writes, I don't want to write to the Arduino in a loop going through each byte, for this will probably use it up pretty fast. Does anyone know a more efficient way of doing this, either with EEPROM or if there's a way to write to PROGMEM while the sketch is running? (without using the Arduino Library, just C).
RESOLVED
I ended up writing two custom functions -- eepromWrite and eepromRead. They are listed below:
void eepromRead(uint16_t addr, void* output, uint16_t length) {
    uint8_t* src;
    uint8_t* dst;
    src = (uint8_t*)addr;
    dst = (uint8_t*)output;
    for (uint16_t i = 0; i < length; i++) {
        *dst++ = eeprom_read_byte(src++);
    }
}
void eepromWrite(uint16_t addr, void* input, uint16_t length) {
    uint8_t* src;
    uint8_t* dst;
    src = (uint8_t*)input;
    dst = (uint8_t*)addr;
    for (uint16_t i = 0; i < length; i++) {
        eeprom_write_byte(dst++, *src++);
    }
}
They would be used like this:
uint16_t currentAddress = 0; /* example EEPROM offset */
struct {
    uint16_t x;
    uint16_t y;
} data;
/* output needs the same layout as data so the bytes read back line up */
struct {
    uint16_t x;
    uint16_t y;
} output;
eepromWrite(currentAddress, &data, sizeof(data));
eepromRead(currentAddress, &output, sizeof(output));
Several solutions, or combinations of them:
Set up a timer event to store the values periodically, rather than back to back.
Use a checksum, and increment the starting offset on each write. When reading, try each offset until you find a valid checksum. This spreads the writes across the entire EEPROM and extends its life; modern flash drives do the same.
Catch the unit turning off: use an external brown-out detector (BOD) to trigger an interrupt, then quickly write the EEPROM. You can also use the internal BOD to prevent corruption by stopping writes before the supply falls below a safe writing voltage. Set the external threshold significantly higher than the internal one. Increasing the VCC capacitance gives more time to write before complete shutdown. The external BOD should sense the supply upstream of VCC rather than VCC itself.
Here is a video explaining how to enable the internal BOD on an ATtiny; it is nearly identical on the ATmegas. Video
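The checksum-and-offset idea above can be sketched as follows. This is only an illustration under stated assumptions: it runs on a PC against an in-memory array (`fake_eeprom`) standing in for the EEPROM, the 4-byte payload and the XOR checksum are arbitrary choices, and every name here is hypothetical. It also marks the old slot invalid after each write, which is one possible way to make "first valid checksum" unambiguous. On an AVR, the array accesses would be replaced by avr-libc calls such as `eeprom_read_byte` and `eeprom_update_byte`.

```c
#include <stdint.h>
#include <string.h>

#define EEPROM_SIZE 1024                    /* assumed size (1 KB on an Uno) */
#define REC_DATA    4                       /* payload bytes per record (assumed) */
#define REC_SIZE    (REC_DATA + 1)          /* payload + 1 checksum byte */
#define SLOTS       (EEPROM_SIZE / REC_SIZE)

/* Stand-in for the real EEPROM so the sketch runs on a PC. It starts
   zero-filled; real erased cells read 0xFF. Both fail the checksum below. */
static uint8_t fake_eeprom[EEPROM_SIZE];

static uint8_t checksum(const uint8_t *p, size_t n)
{
    uint8_t c = 0xA5;                       /* seed so blank slots are invalid */
    for (size_t i = 0; i < n; i++) c ^= p[i];
    return c;
}

static int slot_valid(int slot)
{
    const uint8_t *rec = &fake_eeprom[slot * REC_SIZE];
    return rec[REC_DATA] == checksum(rec, REC_DATA);
}

/* Returns the slot holding the current record, or -1 if none is valid. */
static int find_current(void)
{
    for (int s = 0; s < SLOTS; s++)
        if (slot_valid(s)) return s;
    return -1;
}

/* Store data in the slot after the current one, then invalidate the old slot. */
static int record_write(const uint8_t data[REC_DATA])
{
    int cur = find_current();
    int next = (cur + 1) % SLOTS;           /* cur == -1 gives slot 0 */
    uint8_t *rec = &fake_eeprom[next * REC_SIZE];
    memcpy(rec, data, REC_DATA);
    rec[REC_DATA] = checksum(data, REC_DATA);
    if (cur >= 0)
        fake_eeprom[cur * REC_SIZE + REC_DATA] ^= 0xFF;  /* break old checksum */
    return next;
}

/* Copies the current record into out; returns its slot, or -1 if none. */
static int record_read(uint8_t out[REC_DATA])
{
    int cur = find_current();
    if (cur >= 0) memcpy(out, &fake_eeprom[cur * REC_SIZE], REC_DATA);
    return cur;
}
```

Each save touches a different slot, so with these sizes any one cell is written roughly once per 204 saves. A power loss between the new write and the invalidation leaves two valid records; a sequence counter in the record is one way to resolve that.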
The Arduino EEPROM library provides get/put functions that are able to read and write structs...
Link to EEPROM.put(...)
The write is made only when a byte has changed.
So, using put/get is the solution to your problem.
I'm using these in a large (25k) project without any problem.
As already said, I use a timer so that I write only from time to time rather than on every change.
Power-off detection is also a very good way to do this.

Using read() system call of UNIX to find the user given pattern

I am trying to emulate UNIX grep pattern matching with a C program (just for learning). The code I have written gives me a run-time error.
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#define MAXLENGTH 1000
char userBuf[MAXLENGTH];
int main ( int argc, char *argv[])
{
    int numOfBytes,fd,i;
    if (argc != 2)
        printf("Supply correct number of arguments.\n");
    //exit(1);
    fd =open("pattern.txt",O_RDWR);
    if ( fd == -1 )
        printf("File does not exist.\n");
    //exit(1);
    while ( (numOfBytes = read(fd,userBuf,MAXLENGTH)) > 0 )
        ;
    printf("NumOfBytes = %d\n",numOfBytes);
    for(i=0;userBuf[i] != '\0'; ++i)
    {
        if ( strstr(userBuf,argv[1]) )
            printf("%s\n",userBuf);
    }
}
The program keeps printing the lines containing the pattern, seemingly without end. I tried debugging but couldn't figure out the error. Please let me know where I went wrong.
Thanks
Say the string is "fooPATTERN". Your first time through the loop, you check for the pattern in "fooPATTERN" and find it. Then your second time through the loop, you check for the pattern in "ooPATTERN" and find it again. Then your third time, you check for the pattern in "oPATTERN" and find it again.
Since you're doing this to learn, I won't tell you much more. You can decide how best to solve it. There are at least two fundamentally different ways you could solve it. One is to do less on each pass of the loop to ensure you only find it once. The other is to make sure your next pass of the loop is past any pattern that was found.
One thing to think about: If the pattern is 'oo' and the string is 'ooo', how many patterns should be found? 1 or 2?
read() does not terminate the data with a null character.
The while loop should encompass the for loop, but it doesn't.
First, you shouldn't be using raw Unix i/o with open and read if you're just learning C. Start with standard C i/o with fopen and fread/fscanf/fgets and so forth.
Second, you're reading in successive pieces of the file into the same buffer, overwriting the buffer each time, and only ever processing the last contents of the buffer.
Third, nothing guarantees that your buffer will be zero-terminated when you read into it with read(). In fact, it usually won't be.
Fourth, you're not using the i variable in the body of your loop. I can't tell exactly what you were shooting for here, but doing the same thing on the same data umpteen thousand times surely wasn't it.
Fifth, always compile with the fullest warning settings you can abide: at least -Wall with GCC. It should have complained that you call read() without including <unistd.h>.
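Putting the points above together (standard C I/O, a terminated buffer, and processing each piece of input as it is read), one possible line-based matcher is sketched below. It is not the asker's code: the function name `print_matching_lines` and the 1000-byte line limit are assumptions for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAXLINE 1000

/* Print every line of `in` that contains `pattern`; return how many matched.
   Lines longer than MAXLINE-1 bytes are examined in pieces (a simplification). */
static int print_matching_lines(FILE *in, const char *pattern, FILE *out)
{
    char line[MAXLINE];
    int matches = 0;

    /* fgets always NUL-terminates, so strstr is safe to call on line. */
    while (fgets(line, sizeof line, in) != NULL) {
        if (strstr(line, pattern) != NULL) {
            fputs(line, out);   /* each matching line is printed exactly once */
            matches++;
        }
    }
    return matches;
}
```

From main, this could be called as `print_matching_lines(fp, argv[1], stdout)` after opening the file with `fopen` and checking `argc`.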

Lua runs out of memory

I've written a complicated Lua script which uses the Lua sockets library. It reads a list of files from disk, sorts them by date and sends them to an HTTP process. The number of files on disk is around 65K. The memory usage in Task Manager doesn't exceed 200 MB.
After quite a while the script returns:
lua: not enough memory
I print out the current GC count at various points and it never goes above 110 MB:
local freeMem = collectgarbage('count');
print("GC Count : " .. freeMem/1024 .. " MB");
This is on a 32 bit windows machine.
What's the best way to diagnose this?
All memory goes through the single lua_Alloc function. This takes the form of:
typedef void* (*lua_Alloc) (void* ud, void* ptr, size_t osize, size_t nsize);
All allocations, reallocations and frees go through this. The documentation for this can be found at this web page. You can easily write your own to track all memory operations. For example,
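void* MyAlloc (void* ud, void* ptr, size_t osize, size_t nsize)
{
    (void)ud; // Not used
    // In Lua 5.2 and later, osize is a type tag rather than a size when ptr is NULL.
    if (ptr == NULL) osize = 0;
    if (nsize == 0)
    {
        free(ptr);
        TrackSubtract(osize);
        return NULL;
    }
    else
    {
        void* p = realloc(ptr,nsize);
        if (p)
        {
            // Only adjust the totals if the reallocation succeeded.
            TrackSubtract(osize);
            TrackAdd(nsize);
        }
        return p;
    }
}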
You can write the TrackAdd() and TrackSubtract() functions to whatever you want: output to a log; adjust a counter and so on.
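As one possibility, the tracking functions could keep a running total and a peak, and the allocator could log the request that fails, which is the interesting event when chasing "not enough memory". The sketch below is self-contained and uses its own hypothetical names (`track_add`, `track_subtract`, `tracking_alloc`, `bytes_in_use`, `peak_bytes`); only the function signature is taken from `lua_Alloc`, and `lua.h` is deliberately not included so the block compiles without Lua installed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counters; the names are illustrative, not part of the Lua API. */
static size_t bytes_in_use = 0;
static size_t peak_bytes = 0;

static void track_add(size_t n)
{
    bytes_in_use += n;
    if (bytes_in_use > peak_bytes) peak_bytes = bytes_in_use;
}

static void track_subtract(size_t n) { bytes_in_use -= n; }

/* Same signature as lua_Alloc, so it can be passed to lua_newstate(). */
static void *tracking_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    (void)ud;
    /* When ptr is NULL, Lua 5.2+ passes a type tag in osize, not a size. */
    size_t old = ptr ? osize : 0;

    if (nsize == 0) {
        free(ptr);
        track_subtract(old);
        return NULL;
    }

    void *p = realloc(ptr, nsize);
    if (p) {
        track_subtract(old);
        track_add(nsize);
    } else {
        /* The old block is still allocated; report the failing request. */
        fprintf(stderr, "alloc of %zu bytes failed (%zu in use, peak %zu)\n",
                nsize, bytes_in_use, peak_bytes);
    }
    return p;
}
```

If the logged request is very large while the total in use is modest, that would suggest a single oversized allocation or address-space fragmentation on the 32-bit process rather than a slow leak.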
To use your new function you pass a pointer to it when you create the Lua state:
lua_State* L = lua_newstate(&MyAlloc,0);
The documentation to lua_newstate is found here.
Good luck.
Use perfmon to monitor your process and add counters for private bytes and virtual bytes.
When your script ends with 'not enough memory' see the value of each counter. If you see sudden peaks in your memory usage, try to add more points in which you print the memory usage.

Parsing really big log files (>1Gb, <5Gb)

I need to parse very large log files (>1Gb, <5Gb) - actually I need to strip the data into objects so I can store them in a DB. The log file is sequential (no line breaks), like:
TIMESTAMP=20090101000000;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;TIMESTAMP=20090101000100;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;TIMESTAMP=20090101000152;PARAM1=Value11;PARAM2=Value21;PARAM3=Value31;...
I need to strip this into the table:
TIMESTAMP | PARAM1 | PARAM2 | PARAM3
The process needs to be as fast as possible. I'm considering using Perl, but any suggestions using C/C++ would be really welcome. Any ideas?
Best regards,
Arthur
Write a prototype in Perl and compare its performance against how fast you can read data off of the storage medium. My guess is that you'll be I/O bound, which means that using C won't offer a performance boost.
This presentation about the use of Python generators blew my mind:
http://www.dabeaz.com/generators-uk/
David M. Beazley shows how to process multi-gigabyte log files by basically defining a generator for each processing step. The generators are then 'plugged' into each other until you have some simple utility functions
lines = lines_from_dir("access-log*","www")
log = apache_log(lines)
for r in log:
    print r
which can then be used for all sorts of querying:
stat404 = set(r['request'] for r in log
              if r['status'] == 404)
large = (r for r in log
         if r['bytes'] > 1000000)
for r in large:
    print r['request'], r['bytes']
He also shows that performance compares well to the performance of standard unix tools like grep, find etc.
Of course this being Python, it's much easier to understand and most importantly easier to customise or adapt to different problem sets than perl or awk scripts.
(The code examples above are copied from the presentation slides.)
Lex handles this sort of things amazingly well.
But really, use AWK. Its performance is not bad, even compared with Perl, etc. Of course Map/Reduce would work quite well, but what about the overhead of splitting the file into appropriate chunks?
Try AWK
The key won't be the language because the problem is I/O bound, so pick the language that you feel most comfortable with.
The key is how it is coded. You'll be fine as long as you don't load the whole file in memory -- load chunks at a time, and save the data chunks at a time, it will be more efficient.
Java has a PushbackInputStream that may make this easier to code. The idea is that you guess how much to read, and if you read too little, then push the data back, and read a larger chunk.
Then when you've read too much, process the data and then push back the remaining bit and continue to the next iteration of the loop.
Something like this should work.
use strict;
use warnings;

my $filename = shift @ARGV;
open my $io, '<', $filename or die "Can't open $filename: $!";

my $match_buf = '';
my ($read_buf, $count);
while (($count = sysread($io, $read_buf, 1024, 0)) != 0) {
    $match_buf .= $read_buf;
    # Strip each complete record off the buffer; a partial record
    # stays in $match_buf until the next read completes it.
    while ($match_buf =~ s{TIMESTAMP=(\d{14});PARAM1=([^;]+);PARAM2=([^;]+);PARAM3=([^;]+);}{}) {
        my ($timestamp, @params) = ($1, $2, $3, $4);
        print $timestamp . "\n";
        last unless $timestamp;
    }
}
This is easily handled in Perl, Awk, or C. Here's a start on a version in C for you:
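#include <stdio.h>
#include <string.h>
#include <err.h>

int
main(int argc, char **argv)
{
    const char *filename = "noeol.txt";
    FILE *f;
    char buffer[1024], *p;
    char line[1024];
    size_t n, len = 0;

    (void)argc; (void)argv; /* not used yet */

    if ((f = fopen(filename, "r")) == NULL)
        err(1, "cannot open %s", filename);

    /* Loop on fread's result; feof() is only set after a read has failed. */
    while ((n = fread(buffer, 1, sizeof buffer, f)) > 0) {
        for (p = buffer; p < buffer + n; p++) {
            if (*p == ';') {
                /* End of one KEY=VALUE field: print it. */
                line[len] = '\0';
                if (strncmp("TIMESTAMP", line, 9) != 0)
                    printf("\t");
                printf("%s\n", line);
                len = 0;
            } else if (len < sizeof line - 1) {
                /* line persists across reads, so a field split by a
                   chunk boundary is still assembled correctly. */
                line[len++] = *p;
            }
        }
    }
    if (ferror(f))
        err(1, "error reading %s", filename);
    fclose(f);
    return 0;
}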
Sounds like a job for sed:
sed -e 's/;\?[A-Z0-9]*=/|/g' -e 's/\(^\|\)\|\(;$\)//g' < input > output
You might want to take a look at Hadoop (java) or Hadoop Streaming (runs Map/Reduce jobs with any executable or script).
If you code your own solution, you will probably benefit from reading larger chunks of data from the file and processing them in batches (rather than using, say, readline()) and looking for the newline marking the end of each row. With this approach, you need to be mindful that you may not have retrieved the entirety of the last line, so some logic would be required to handle that.
I don't know what performance benefits you'd realize, since I haven't tested it, but I've leveraged similar techniques with success.
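The "read a big chunk, then deal with the incomplete tail" logic described above can be sketched in C as follows. Since the log in the question has no newlines, this version treats `;` as the field delimiter and carries any unfinished field over to the next read. The function names, the output layout (`TIMESTAMP | PARAM1 | ...`) and the caller-supplied buffer are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Emit one complete KEY=VALUE field: a TIMESTAMP key starts a new row,
   any other key appends a column to the current row. */
static void emit_field(const char *field, FILE *out, long *rows)
{
    const char *eq = strchr(field, '=');
    if (eq == NULL) return;                 /* malformed field: skip it */
    if (strncmp(field, "TIMESTAMP=", 10) == 0) {
        if (*rows > 0) fputc('\n', out);
        (*rows)++;
        fputs(eq + 1, out);
    } else {
        fprintf(out, " | %s", eq + 1);
    }
}

/* Parse `in` using the caller's buffer of `cap` bytes; returns the row count.
   A trailing field with no ';' at end of input is ignored. */
static long parse_log(FILE *in, FILE *out, char *buf, size_t cap)
{
    size_t have = 0, n;
    long rows = 0;

    while ((n = fread(buf + have, 1, cap - have, in)) > 0) {
        have += n;
        char *start = buf, *end = buf + have, *semi;

        /* Handle every complete field currently in the buffer. */
        while ((semi = memchr(start, ';', (size_t)(end - start))) != NULL) {
            *semi = '\0';
            emit_field(start, out, &rows);
            start = semi + 1;
        }

        /* Move the unfinished tail to the front for the next read. */
        have = (size_t)(end - start);
        memmove(buf, start, have);
        if (have == cap) have = 0;  /* field longer than the buffer: drop it */
    }
    if (rows > 0) fputc('\n', out);
    return rows;
}
```

In practice the buffer would be large (for example a few megabytes) so that each `fread` call does substantial work; the only requirement is that it be bigger than the longest single field.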
I know this is an exotic language and maybe not the best solution, but when I have ad hoc data, I consider PADS.
