I have my own class (nBuffer), similar to wxMemoryBuffer, which I use to load/save custom data. It's more convenient than using streams because I have a lot of overloaded methods for different data types, all based on these:
class nBuffer
{ // ...
bool wr(void* buf, long unsigned int length);// write
bool rd(void* buf, long unsigned int length);// read
};
I'm trying to implement methods to load/save a wxString from/to this buffer.
With wxWidgets 2.8 I used the following code (simplified):
bool nBuffer::wrString(wxString s)
{ // save string:
int32 lng=s.Length()*4;
wr(&lng,4);// length
wr(s.GetData(),lng);// string itself
return true;
}
bool nBuffer::rdString(wxString &s)
{ // load string:
uint32 lng;
rd(&lng,4);// length
s.Alloc(lng);
rd(s.GetWriteBuf(lng),lng);// string itself
s.UngetWriteBuf();
s=s.Left(lng/4);
return true;
}
This code is not good because:
It assumes there are 4 bytes of data for each string character (it might be less),
With wxWidgets 3.0, wxString::GetData() returns wxCStrData instead of void*, so the compiler fails on wr(s.GetData(),lng); and I have no idea how to convert it to a simple byte buffer.
Strangely, I found nothing after hours of googling, and nothing useful in the wxWidgets docs either.
The questions are:
What is the preferred, correct and safe way to convert a wxString to a byte buffer,
and the same for converting the byte buffer back to a wxString.
For arbitrary wxStrings you need to serialize them in either UTF-8 or UTF-16 format. The former is a de facto standard for data exchange, so I advise using it, but you could prefer UTF-16 if you know your data is biased towards the sort of characters that take less space in UTF-16 than in UTF-8 and if saving space is important to you.
Assuming you use UTF-8, serializing is done using the utf8_str() method:
wxScopedCharBuffer const utf8 = s.utf8_str();
wr(utf8.data(), utf8.length());
Deserializing is as simple as using wxString::FromUTF8(data, length).
For UTF-16 you would use the general mb_str(wxMBConvUTF16) and wxString(data, wxMBConvUTF16, length) methods, which could also be used with wxMBConvUTF8, but the UTF-8-specific methods above are more convenient and, in some build configurations, more efficient.
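For reference, here is a minimal, untested sketch of how the wrString/rdString pair from the question could look on top of these UTF-8 calls. It assumes the wr/rd signatures and the uint32 typedef from the question, and omits most error handling:
#include <vector>  // temporary read buffer

bool nBuffer::wrString(const wxString& s)
{ // save string as UTF-8:
    const wxScopedCharBuffer utf8 = s.utf8_str();
    uint32 lng = utf8.length();                      // length in bytes, not characters
    wr(&lng, 4);                                     // length
    return wr(const_cast<char*>(utf8.data()), lng);  // wr() takes a non-const void*, hence the cast
}

bool nBuffer::rdString(wxString& s)
{ // load string from UTF-8:
    uint32 lng = 0;
    rd(&lng, 4);                                     // length
    if (lng == 0) { s.clear(); return true; }
    std::vector<char> buf(lng);
    if (!rd(buf.data(), lng)) return false;          // string itself
    s = wxString::FromUTF8(buf.data(), lng);
    return true;
}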
Using Embarcadero C++ Builder 10.3.
I have a DynamicArray<uint8_t> myData object. I want to send/write its raw binary content (bytes) to a server using the TIdTcpClient component. I'm going about it like this:
TIdTcpClient tcpClient1;
// Bla Bla Bla
tcpClient1->IOHandler->Write(rawData);
Where rawData should be of type TIdBytes or TIdStream
So basically, it boils down to the following: How to convert myData object to a rawData type of either TIdBytes or TIdStream?
First off, TIdStream has not been part of Indy in a VERY VERY LONG time, which makes me wonder if you are using a very old version of Indy, not the one that shipped with C++Builder 10.3. Indy has supported the RTL's standard TStream class for a very long time.
That being said...
TIdBytes is an alias for System::DynamicArray<System::Byte>, where System::Byte is an alias for unsigned char, which has the same size and signedness as uint8_t (depending on the compiler, uint8_t might even just be an alias for unsigned char).
So, the simplest solution, without having to make a separate copy of your data, is to simply type-cast it, eg:
tcpClient1->IOHandler->Write(reinterpret_cast<TIdBytes&>(myData));
This is technically undefined behavior, since DynamicArray<uint8_t> and DynamicArray<Byte> are unrelated types (unless uint8_t and Byte are both aliases for unsigned char), but it will work in your case since it is the same underlying code behind both arrays, and uint8_t and Byte have the same underlying memory layout.
Alternatively, the next simplest solution, without copying data or invoking undefined behavior, is to use Indy's TIdReadOnlyMemoryBufferStream class in IdGlobal.hpp, eg:
TIdReadOnlyMemoryBufferStream *ms = new TIdReadOnlyMemoryBufferStream(&myData[0], myData.Length);
try {
tcpClient1->IOHandler->Write(ms);
}
__finally {
delete ms;
}
Or:
{
auto ms = std::make_unique<TIdReadOnlyMemoryBufferStream>(&myData[0], myData.Length);
tcpClient1->IOHandler->Write(ms.get());
}
Otherwise, the final solution is to just copy the data into a TIdBytes, eg:
{
TIdBytes bytes;
bytes.Length = myData.Length;
memcpy(&bytes[0], &myData[0], myData.Length);
// or:
// std::copy(myData.begin(), myData.end(), bytes.begin());
tcpClient1->IOHandler->Write(bytes);
}
In C, before using the scanf or gets "stdio.h" functions to get and store user input, the programmer has to manually allocate memory for the data that's read to be stored in. In Rust, the std::io::Stdin.read_line function can seemingly be used without the programmer having to manually allocate memory prior. All it needs is for there to be a mutable String variable to store the data it reads in. How does it do this seemingly without knowledge about how much memory will be required?
Well, if you want a detailed explanation, you can dig a bit into the read_line method, which is part of the BufRead trait. Heavily simplified, the function looks like this:
fn read_line(&mut self, target: &mut String) {
loop {
// That method fills the internal buffer of the reader (here stdin)
// and returns a slice reference to whatever part of the buffer was filled.
// That buffer is actually what you need to allocate in advance in C.
let available = self.fill_buf();
match memchr(b'\n', available) {
Some(i) => {
// A '\n' was found, we can extend the string and return.
target.push_str(&available[..=i]);
return;
}
None => {
// No '\n' found, we just have to extend the string.
target.push_str(available);
},
}
}
}
So basically, that method keeps extending the string until it finds a \n character in stdin.
If you want to allocate a bit of memory in advance for the String that you pass to read_line, you can create it using String::with_capacity. This will not prevent the String from reallocating if it is not large enough, though.
The TIdCompressorZLib component is used for compression and decompression in the Delphi/C++Builder Indy library. The CompressStream method has the following definition:
public: virtual __fastcall CompressStream(TStream AInStream, TStream AOutStream, const TIdCompressionLevel ALevel, const int AWindowBits, const int AMemLevel, const int AStrategy);
The complete description of those parameters in the help file is:
CompressStream is a public overridden procedure that implements the
abstract virtual method declared in the ancestor class.
AInStream is the stream containing the uncompressed contents used in
the compression operation.
AOutStream is the stream used to store the compressed contents from
the compression operation. AOutStream is cleared prior to outputting
the compressed contents from the operation. When AOutStream is
omitted, the stream in AInStream is cleared and reused for the output
from the compression operation.
Use ALevel to indicate the desired compression level for the
operation.
Use AWindowsBits and AMemLevel to control the memory footprint
required to perform in-memory compression using the ZLib library.
Use AStrategy to control the RLE-encoding strategy used in the
compression operation.
ALevel's values are defined on the help page for TIdCompressionLevel, but I cannot find any indication of what values should be used for AWindowBits, AMemLevel, or AStrategy, which are just integers.
I looked in the source code, but CompressStream just delegates to IndyCompressStream, which is listed in the help file as:
IndyCompressStream(TStream InStream, TStream OutStream, const int level = Z_DEFAULT_COMPRESSION, const int WinBits = MAX_WBITS, const int MemLevel = MAX_MEM_LEVEL, const int Stratagy = Z_DEFAULT_STRATEGY);
The help for IndyCompressStream doesn't even list the minimal description of the parameters that CompressStream does.
I tracked down the file where (I think) those default constants mentioned in IndyCompressStream live, source\Indy10\Protocols\IdZLibHeaders.pas, and they are
Z_DEFAULT_STRATEGY = 0;
Z_DEFAULT_COMPRESSION = -1;
MAX_WBITS = 15; { 32K LZ77 window }
MAX_MEM_LEVEL = 9;
However, the value given for Z_DEFAULT_COMPRESSION is not even a legal value for that parameter according to the documentation for TIdCompressionLevel.
Is there some documentation somewhere about what AWindowBits, AMemLevel, and AStrategy mean to this component, and what values are reasonable to use for them? Are the values listed above the actual recommended defaults? Also, the source files include "indy", "Indy10", and "indyimpl" directories. Which of those should we be using to find the source for the current Indy components?
Thanks!
You will need to look at the zlib documentation in zlib.h. In particular, the parameters to deflateInit2().
In nearly all cases, the only ones you should mess with are the compression level and the window bits. For window bits, you would normally leave the window size at 32K (15), but either add 16 for the gzip format (31), or negate (-15) to get the raw deflate format with no header or trailer. For some special kinds of data, you may get an improvement with a different compression strategy, e.g. image or other numerical arrays of data.
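To make the window-bits mapping concrete, here is a small sketch of the direct zlib call these parameters correspond to (deflateInit2() from zlib.h); the Indy wrapper presumably forwards AWindowBits, AMemLevel and AStrategy into the same slots, so the numbers below are only illustrative:
#include <zlib.h>

z_stream strm = {};          // zlib requires zalloc/zfree/opaque to be null-initialised
int windowBits = 15 + 16;    // 15 = 32K LZ77 window; +16 selects the gzip wrapper,
                             // use -15 instead for a raw deflate stream
int ret = deflateInit2(&strm,
                       Z_DEFAULT_COMPRESSION,   // compression level
                       Z_DEFLATED,              // the only supported method
                       windowBits,
                       8,                       // memLevel (zlib's recommended default)
                       Z_DEFAULT_STRATEGY);
// ... then call deflate() in a loop and finish with deflateEnd(&strm).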
Thank you for the comments and answers, especially Remy and Mark. I had not realized that the Indy units were wrappers around zlib, and that the parameters were defined in the zlib library.
I was trying to create a gzip format stream for uploading to a server that was expecting gzip.
Here is the working code for gzip compression and decompression:
void __fastcall TForm1::Button1Click(TObject *Sender)
{
TStringStream* streamIn = new TStringStream(String("This is some data to compress"));
TMemoryStream* streamCompressed = new TMemoryStream;
TStringStream* streamOut = new TStringStream;
/* this also works to compress to gzip format, but you must #include <IdZlib.hpp>
CompressStreamEx(streamIn, streamCompressed, Idzlib::clDefault, zsGZip); */
// NOTE: according to docs, you can leave outstream null, and instream
// will be replaced and reused, but I could not get that to work
IdCompressorZLib1->CompressStream(
streamIn, // System::Classes::TStream* AInStream,
streamCompressed, // System::Classes::TStream* AOutStream,
1, // const Idzlibcompressorbase::TIdCompressionLevel ALevel,
15 + 16, // const int AWindowBits, -- add 16 to get gzip format
8, // const int AMemLevel, -- see note below
0); // const int AStrategy);
streamCompressed->Position = 0;
IdCompressorZLib1->DecompressGZipStream(streamCompressed, streamOut);
String out = streamOut->DataString;
ShowMessage(out);
}
In particular, note that passing -1 for ALevel produces ZLib Error -2, Z_STREAM_ERROR which means invalid parameter, in spite of the defaults I had found. Also, AWindowBits normally ranges from 8 to 15, but adding 16 gives you a gzip format, and negative numbers give you a raw format, as described in the zlib documentation referenced by Mark Adler, one of the authors of the zlib library. I changed AMemLevel from Indy's default per Mark Adler's comment.
Also, as noted the CompressStreamEx function will produce gzip compression using the parameters included in the comments above.
The above was tested in RAD Studio XE3. Thanks again for your help!
I'm having a problem with UTF-8 encoding in my ASP.NET MVC 2 application in C#. I'm trying to let the user download a simple text file from a string. I am trying to get the byte array with the following line:
var x = Encoding.UTF8.GetBytes(csvString);
but when I return it for download using:
return File(x, ..., ...);
I get a file which is without a BOM, so Croatian characters don't show up correctly. This is because my byte array does not include a BOM after encoding. I tried inserting those bytes manually and then it shows up correctly, but that's not the best way to do it.
I also tried creating a UTF8Encoding class instance and passing a boolean value (true) to its constructor to include the BOM, but it doesn't work either.
Anyone has a solution? Thanks!
Try like this:
public ActionResult Download()
{
var data = Encoding.UTF8.GetBytes("some data");
var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
return File(result, "application/csv", "foo.csv");
}
The reason is that the UTF8Encoding constructor that takes a boolean parameter doesn't do what you would expect:
byte[] bytes = new UTF8Encoding(true).GetBytes("a");
The resulting array would contain a single byte with the value of 97. There's no BOM because UTF8 doesn't require a BOM.
I created a simple extension method to convert any string, in any encoding, to the byte array that would be produced when it is written to a file or stream:
public static class StreamExtensions
{
public static byte[] ToBytes(this string value, Encoding encoding)
{
using (var stream = new MemoryStream())
using (var sw = new StreamWriter(stream, encoding))
{
sw.Write(value);
sw.Flush();
return stream.ToArray();
}
}
}
Usage:
stringValue.ToBytes(Encoding.UTF8)
This will also work for other encodings, like UTF-16, which do require a BOM.
UTF-8 does not require a BOM, because it is a sequence of 1-byte words. UTF-8 = UTF-8BE = UTF-8LE.
In contrast, UTF-16 requires a BOM at the beginning of the stream to identify whether the remainder of the stream is UTF-16BE or UTF-16LE, because UTF-16 is a sequence of 2-byte words and the BOM identifies whether the bytes in the words are BE or LE.
The problem does not lie with the Encoding.UTF8 class. The problem lies with whatever program you are using to view the files.
Remember that .NET strings are all Unicode while they stay in memory, so if you can see your csvString correctly in the debugger, the problem is in writing the file.
In my opinion you should return a FileResult with the same encoding as the file. Try setting the returned File's encoding.
I found the following code sample in BlackBerry Java Development, Best Practices. Could somebody explain what the code sample below means? What is the this in the code sample pointing to?
Avoiding StringBuffer.append (StringBuffer)
To append a String buffer to another, a BlackBerry® Java Application should use net.rim.device.api.util.StringUtilities.append( StringBuffer dst, StringBuffer src[, int offset, int length ] ).
Code sample
public synchronized StringBuffer append(Object obj) {
if (obj instanceof StringBuffer) {
StringBuffer sb = (StringBuffer)obj;
net.rim.device.api.util.StringUtilities.append( this, sb, 0, sb.length() );
return this;
}
return append(String.valueOf(obj));
}
StringBuffer does not offer an overload for the append() method that takes another StringBuffer. This means developers are likely to use StringBuffer.append(String str) and call .toString() on the second StringBuffer. This requires the second buffer to be turned into a string, which is immutable, and then the characters from the string are appended to the first StringBuffer. Thus every character in the second buffer is touched twice, and there is the unnecessary allocation of the String just to transfer the characters to the first StringBuffer.
The efficient way of doing this would copy each character from the second buffer onto the end of the first. However, StringBuffer does not provide any easy way of doing this. Thus the recommendation is to use StringUtilities.append(StringBuffer, StringBuffer) which is able to directly read the characters from the second buffer without copying them into an intermediate collection.
This saves the runtime of the extra copying, the runtime needed to allocate a temporary String, and the memory needed to allocate a temporary string.
It means that the StringBuffer class is not implemented efficiently. Java Strings are supposed to be immutable; that's what StringBuffer is there for. However, the StringBuffer class you're using is not efficient when appending another buffer via StringBuffer.append(), so you need to use net.rim.device.api.util.StringUtilities. That's what the code is doing: encapsulating the use of that class in a new append() method.