How to GetBytes() in C# with UTF8 encoding with BOM?

I'm having a problem with UTF-8 encoding in my ASP.NET MVC 2 application in C#. I'm trying to let the user download a simple text file generated from a string. I get the byte array with the following line:
var x = Encoding.UTF8.GetBytes(csvString);
but when I return it for download using:
return File(x, ..., ...);
I get a file without a BOM, so Croatian characters are not displayed correctly. This is because my byte array does not include the BOM after encoding. I tried inserting those bytes manually, and then the file displays correctly, but that's not the best way to do it.
I also tried creating a UTF8Encoding instance and passing true to its constructor to include the BOM, but that doesn't work either.
Does anyone have a solution? Thanks!

Try like this:
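public ActionResult Download()
{
    // Concat() and ToArray() require "using System.Linq;".
    var data = Encoding.UTF8.GetBytes("some data");
    // GetPreamble() returns the BOM bytes, which are placed in front of the data.
    var result = Encoding.UTF8.GetPreamble().Concat(data).ToArray();
    return File(result, "application/csv", "foo.csv");
}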
The reason is that the UTF8Encoding constructor that takes a boolean parameter doesn't do what you would expect:
byte[] bytes = new UTF8Encoding(true).GetBytes("a");
The resulting array contains a single byte with the value 97. GetBytes() never emits the BOM; the constructor flag only determines what GetPreamble() returns, and UTF-8 itself doesn't require a BOM.
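At the byte level, the Concat call in the answer simply places the three-byte UTF-8 BOM (EF BB BF) in front of the encoded text. The following is a minimal sketch of that idea in C++, assuming the payload is already UTF-8 encoded; the name with_utf8_bom is hypothetical and not part of any library.

```cpp
#include <string>

// UTF-8 encoding of U+FEFF, the byte order mark.
static const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Hypothetical helper: returns the payload with a UTF-8 BOM in front,
// unless the payload already starts with one.
std::string with_utf8_bom(const std::string& utf8_payload) {
    if (utf8_payload.compare(0, kUtf8Bom.size(), kUtf8Bom) == 0) {
        return utf8_payload;  // already has a BOM; do not add a second one
    }
    return kUtf8Bom + utf8_payload;
}
```

The check for an existing BOM is a design choice of this sketch: it keeps the helper safe to call twice on the same data.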

I created a simple extension method that converts a string to the bytes it would produce when written to a file or stream in a given encoding:
using System.IO;
using System.Text;

public static class StreamExtensions
{
    public static byte[] ToBytes(this string value, Encoding encoding)
    {
        using (var stream = new MemoryStream())
        using (var sw = new StreamWriter(stream, encoding))
        {
            // StreamWriter writes the encoding's preamble (BOM), if it has one, ahead of the text.
            sw.Write(value);
            sw.Flush();
            return stream.ToArray();
        }
    }
}
Usage:
stringValue.ToBytes(Encoding.UTF8)
This also works for other encodings, such as UTF-16, which requires the BOM.

UTF-8 does not require a BOM, because it is a sequence of single-byte code units, so there is no byte order to signal. UTF-8 = UTF-8BE = UTF-8LE.
In contrast, UTF-16 requires a BOM at the beginning of the stream to identify whether the remainder of the stream is UTF-16BE or UTF-16LE, because UTF-16 is a sequence of 2-byte words and the BOM identifies whether the bytes in the words are BE or LE.
The problem does not lie with Encoding.UTF8. The problem lies with whatever program you are using to view the files.
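To make the byte-order point concrete, here is a small C++ sketch that encodes the BOM code point U+FEFF in both UTF-16 byte orders and in UTF-8. The function names and the ByteOrder enum are illustrative assumptions, and the UTF-8 encoder only handles code points below U+10000.

```cpp
#include <cstdint>
#include <vector>

enum class ByteOrder { BigEndian, LittleEndian };

// Serializes one 16-bit code unit in the requested byte order.
std::vector<std::uint8_t> encode_utf16_unit(std::uint16_t unit, ByteOrder order) {
    const std::uint8_t high = static_cast<std::uint8_t>(unit >> 8);
    const std::uint8_t low = static_cast<std::uint8_t>(unit & 0xFF);
    if (order == ByteOrder::BigEndian) {
        return {high, low};
    }
    return {low, high};
}

// Encodes a code point below U+10000 as UTF-8 (surrogates are not checked).
// There is no byte-order parameter: UTF-8 is defined byte by byte.
std::vector<std::uint8_t> encode_utf8_bmp(std::uint16_t cp) {
    if (cp < 0x80) {
        return {static_cast<std::uint8_t>(cp)};
    }
    if (cp < 0x800) {
        return {static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    }
    return {static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}
```

Encoding 0xFEFF gives FE FF in big-endian UTF-16, FF FE in little-endian UTF-16, and always EF BB BF in UTF-8. A reader can tell the two UTF-16 orders apart from the first two bytes, while the UTF-8 BOM serves only as a signature.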

Remember that .NET strings are all Unicode while they stay in memory, so if you can see your csvString correctly in the debugger, the problem is in writing the file.
In my opinion you should return a FileResult with the same encoding as the file. Try setting the encoding of the returned File.

Related

Encoding utf-8 with TIdHTTP->Put()

I want to PUT JSON data to a REST service with TIdHTTP.
It works as long as I don't have Scandinavian letters in the JSON data (ÅÄÖ). When I do, the server rejects the message. I can send the same data successfully with Postman, so it is not a server issue.
My code:
String JsonData = "{...}";
TStringStream *JsonStream = new TStringStream(JsonData);
IdHTTP1->Request->CustomHeaders->AddValue("user", AUser);
IdHTTP1->Request->CustomHeaders->AddValue("password", APassword);
IdHTTP1->Request->ContentType = "application/json";
IdHTTP1->Request->CharSet = "utf-8";
IdHTTP1->Put("https://restserver", JsonStream);
delete JsonStream;
I've found examples in Delphi where you create the TStringStream with an encoding flag:
AStream := TStringStream.Create(SomeData, TEncoding.UTF8);
But I cannot see how an equivalent works in C++.
This is a multi-device application written with C++Builder v10.3.

The TEncoding class is declared in the <System.SysUtils.hpp> header, and has a static UTF8 property (an example of its use is in Embarcadero's DocWiki). In your case, the construction of the TStringStream should look like this:
TStringStream *JsonStream = new TStringStream(JsonData, TEncoding::UTF8, false);

How to load/save wxString from/to wxStream or wxMemoryBuffer?

I have my own class (nBuffer), similar to wxMemoryBuffer, that I use to load/save custom data. It's more convenient than using streams because I have a lot of overloaded methods for different data types based on these:
class nBuffer
{ // ...
    bool wr(void* buf, long unsigned int length); // write
    bool rd(void* buf, long unsigned int length); // read
};
I'm trying to implement methods to load/save a wxString from/to this buffer.
With wxWidgets 2.8 I used the following code (simplified):
bool nBuffer::wrString(wxString s)
{ // save string:
    int32 lng=s.Length()*4;
    wr(&lng,4); // length
    wr(s.GetData(),lng); // string itself
    return true;
}
bool nBuffer::rdString(wxString &s)
{ // load string:
    uint32 lng;
    rd(&lng,4); // length
    s.Alloc(lng);
    rd(s.GetWriteBuf(lng),lng); // string itself
    s.UngetWriteBuf();
    s=s.Left(lng/4);
    return true;
}
This code is not good because:
It assumes there are 4 bytes of data for each string character (there might be fewer),
With wxWidgets 3.0, wxString::GetData() returns wxCStrData instead of void*, so the compiler fails on wr(s.GetData(),lng); and I have no idea how to convert it to a simple byte buffer.
Strangely, I found nothing after googling for hours, and nothing useful in the wxWidgets docs either.
The questions are:
What is the preferred, correct and safe way to convert a wxString to a byte buffer?
The same for converting the byte buffer back to a wxString.

For arbitrary wxStrings you need to serialize them in either UTF-8 or UTF-16 format. The former is a de facto standard for data exchange, so I advise using it, but you might prefer UTF-16 if you know that your data is biased towards the sort of characters that take less space in it than in UTF-8 and if saving space is important to you.
Assuming you use UTF-8, serializing is done using utf8_str() method:
wxScopedCharBuffer const utf8 = s.utf8_str();
wr(utf8.data(), utf8.length());
Deserializing is as simple as using wxString::FromUTF8(data, length).
For UTF-16 you would use the general mb_str(wxMBConvUTF16()) and wxString(data, wxMBConvUTF16(), length) methods, which could also be used with wxMBConvUTF8, but the UTF-8-specific methods above are more convenient and, in some build configurations, more efficient.
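The length-prefixed layout from the question combines naturally with UTF-8 serialization. Below is a minimal sketch in standard C++, under these assumptions: ByteBuffer is a hypothetical stand-in for the question's nBuffer, a std::string holding UTF-8 bytes stands in for what utf8_str() produces, and write_utf8/read_utf8 are illustrative names. The length is stored in host byte order, as in the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for nBuffer: an append-only byte store with a read cursor.
class ByteBuffer {
public:
    bool wr(const void* buf, std::size_t length) {
        const unsigned char* p = static_cast<const unsigned char*>(buf);
        bytes_.insert(bytes_.end(), p, p + length);
        return true;
    }
    bool rd(void* buf, std::size_t length) {
        if (length > remaining()) return false;  // not enough data left
        if (length > 0) std::memcpy(buf, bytes_.data() + pos_, length);
        pos_ += length;
        return true;
    }
    std::size_t remaining() const { return bytes_.size() - pos_; }

private:
    std::vector<unsigned char> bytes_;
    std::size_t pos_ = 0;
};

// Writes the byte count first (not the character count), then the UTF-8 bytes.
bool write_utf8(ByteBuffer& out, const std::string& utf8) {
    const std::uint32_t length = static_cast<std::uint32_t>(utf8.size());
    return out.wr(&length, sizeof length) && out.wr(utf8.data(), utf8.size());
}

// Reads the byte count, then exactly that many UTF-8 bytes.
bool read_utf8(ByteBuffer& in, std::string& utf8) {
    std::uint32_t length = 0;
    if (!in.rd(&length, sizeof length)) return false;
    if (length > in.remaining()) return false;  // corrupt or truncated data
    std::string bytes(length, '\0');
    if (!in.rd(&bytes[0], length)) return false;
    utf8.swap(bytes);
    return true;
}
```

With wxWidgets, the bytes passed to write_utf8 would come from s.utf8_str(), and the bytes returned by read_utf8 would go to wxString::FromUTF8(data, length). Storing the byte count avoids the "4 bytes per character" assumption entirely.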

.NET RSAKeyValue base64 private key to a single base64 private key

I am supplied the following RSA private key in the format:
<RSAKeyValue>
<Modulus>XXXXXXXX</Modulus>
<Exponent>XXXXXXXX</Exponent>
<P>XXXXXXXX</P>
<Q>XXXXXXXX</Q>
<DP>XXXXXXXX</DP>
<DQ>XXXXXXXX</DQ>
<InverseQ>XXXXXXXXXX</InverseQ>
<D>XXXXXXXX</D>
</RSAKeyValue>
The XXXX values are in Base64 format.
I want to know how to combine all the XXXXXX parts into a single Base64 string.
With this single Base64 string I do the following:
1. Feed it to a TMemoryStream
2. Use Indy's TIdDecoderMIME class to decode the Base64 from the memory stream
3. Feed the decoded memory stream into the CryptDecrypt function from wcrypt2.pas (a Delphi wrapper of Microsoft's Cryptographic API) from Jedi
I know the solution for a public key in the same format:
<RSAKeyValue>
<Modulus>xqiYKv0umaLdmrKPyBfYmAfzZYVsvsOJyS4c1lBPjqpn7zh+XyxPXK7MxJkAlenQJM33M+ZYfmlPLya7JWXXTPviylEEtlmul9GshpX2caxWu2YO9vNIHRZYYau4ccbkm95iMyJi8KN2ANtqDwiJv55vcXZDqjPSDE4ap49xmog==</Modulus>
<Exponent>AAQC</Exponent>
</RSAKeyValue>
The solution is to add "BgIAAACkAABSU0ExAAQAAAE" + Exponent + Modulus
The result is:
BgIAAACkAABSU0ExAAQAAAEAAQCxqiYKv0umaLdmrKPyBfYmAfzZYVsvsOJyS4c1lBPjqpn7zh+XyxPXK7MxJkAlenQJM33M+ZYfmlPLya7JWXXTPviylEEtlmul9GshpX2caxWu2YO9vNIHRZYYau4ccbkm95iMyJi8KN2ANtqDwiJv55vcXZDqjPSDE4ap49xmog==
With the private key how do we combine it? I know it starts off like this:
"BwIAAACkAABSU0ExAAQAAAE" + Exponent + Modulus + ???????

The XXXX values in the RSAKeyValue XML are in Base64; I just do not want to expose the details there. I want to know how to combine all the XXXX Base64 codes into a single Base64 private key.
I suspect that this means that you are performing the base64 encoding line by line. It's much simpler to perform the encoding on the entire file.
For example you might do this as follows:
Load the file into a TStringList.
Extract a single string representing the file using the Text property of the string list.
Base64 encode that string.
Send it over the wire.
At the receiving end, decode the string.
Assign the string to the Text property of a string list.

System.IO.Stream in favor of HttpPostedFileBase

I have a site where I allow members to upload photos. In the MVC Controller I take the FormCollection as the parameter to the Action. I then read the first file as type HttpPostedFileBase. I use this to generate thumbnails. This all works fine.
In addition to allowing members to upload their own photos, I would like to use the System.Net.WebClient to import photos myself.
I am trying to generalize the method that processes the uploaded photo (file) so that it can take a general Stream object instead of the specific HttpPostedFileBase.
I am trying to base everything off of Stream since the HttpPostedFileBase has an InputStream property that contains the stream of the file and the WebClient has an OpenRead method that returns Stream.
However, by going with Stream over HttpPostedFileBase, it looks like I am losing the ContentType and ContentLength properties, which I use for validating the file.
I have not worked with binary streams before. Is there a way to get the ContentType and ContentLength from a Stream? Or is there a way to create an HttpPostedFileBase object using the Stream?

You're right to look at it from a raw stream perspective because then you can create one method that handles streams and therefore many scenarios from which they come.
In the file upload scenario, the stream you're acquiring is on a separate property from the content type. Sometimes magic numbers can be used to detect the data type from the header bytes of the stream, but this might be overkill since the information is already available to you through other means (e.g. the Content-Type header or the file extension).
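For readers unfamiliar with magic numbers, the check amounts to comparing the first few bytes of the data against known file signatures. A minimal sketch in C++ for three common image formats follows; the function name sniff_image_type is hypothetical, and the returned MIME strings are just one possible way to report the result.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Guesses an image MIME type from the first bytes of a file.
// Returns an empty string when no known signature matches.
std::string sniff_image_type(const unsigned char* data, std::size_t size) {
    static const unsigned char kPng[] = {0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
    static const unsigned char kJpeg[] = {0xFF, 0xD8, 0xFF};

    if (size >= sizeof kPng && std::memcmp(data, kPng, sizeof kPng) == 0) {
        return "image/png";
    }
    if (size >= sizeof kJpeg && std::memcmp(data, kJpeg, sizeof kJpeg) == 0) {
        return "image/jpeg";
    }
    // GIF files start with the ASCII text "GIF87a" or "GIF89a".
    if (size >= 6 && (std::memcmp(data, "GIF87a", 6) == 0 ||
                      std::memcmp(data, "GIF89a", 6) == 0)) {
        return "image/gif";
    }
    return "";
}
```

Only the first chunk read from the stream needs to be passed in, so the check fits into a chunked read loop without buffering the whole file.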
You can measure the byte length of the stream just by virtue of reading it so you don't really need the Content-Length header: the browser just finds it useful to know what size of file to expect in advance.
If your WebClient is accessing a resource URI on the Internet, the URI usually includes a file extension, as in http://www.example.com/image.gif, and that can be a good file type identifier.
Since the file info is already available to you, why not open up one more argument on your custom processing method to accept a content type string identifier like:
public static class Custom {
    // Works with a stream from any source and a content type string identifier.
    public static void SavePicture(Stream inStream, string contentIdentifier) {
        // Parse and recognize contentIdentifier to know the kind of file.
        // Read the bytes of the file in the stream (while counting them).
        // Write the bytes to wherever the destination is (e.g. disk).
        // Example:
        long totalBytesSeen = 0L;
        byte[] bytes = new byte[1024]; // 1K buffer to store bytes.
        // Read one chunk of bytes at a time.
        do
        {
            int num = inStream.Read(bytes, 0, bytes.Length); // read up to 1024 bytes
            // No bytes read means end of stream.
            if (num == 0)
                break; // good bye
            totalBytesSeen += num; // Actual length is accumulating.
            /* Can check for "magic number" here, while reading this stream
             * in the case the file extension or content-type cannot be trusted.
             */
            /* Write logic here to write the byte buffer to
             * disk or do what you want with them.
             */
        } while (true);
    }
}
Some useful filename parsing features are in the IO namespace:
using System.IO;
Use your custom method in the scenarios you mentioned like so:
From an HttpPostedFileBase instance named myPostedFile:
Custom.SavePicture(myPostedFile.InputStream, myPostedFile.ContentType);
When using a WebClient instance named webClient1:
var imageFilename = "pic.gif";
// OpenRead returns a readable Stream; DownloadFile returns void and writes to disk.
var stream = webClient1.OpenRead("http://www.example.com/images/" + imageFilename);
//...
Custom.SavePicture(stream, Path.GetExtension(imageFilename));
Or even when processing a file from disk:
Custom.SavePicture(File.OpenRead(pathToFile), Path.GetExtension(pathToFile));
Call the same custom method for any stream with a content identifier that you can parse and recognize.

BlackBerry J2ME Efficient Coding GuideLines? Could Somebody elaborate this?

I found the following code sample in BlackBerry Java Development, Best Practices. Could somebody explain what the sample code below means? What is the this in the code sample pointing to?
Avoiding StringBuffer.append (StringBuffer)
To append a String buffer to another, a BlackBerry® Java Application should use net.rim.device.api.util.StringUtilities.append( StringBuffer dst, StringBuffer src[, int offset, int length ] ).
Code sample
public synchronized StringBuffer append(Object obj) {
    if (obj instanceof StringBuffer) {
        StringBuffer sb = (StringBuffer)obj;
        // The last two arguments are the offset and length of the range to copy.
        net.rim.device.api.util.StringUtilities.append( this, sb, 0, sb.length() );
        return this;
    }
    return append(String.valueOf(obj));
}

StringBuffer does not offer an overload for the append() method that takes another StringBuffer. This means developers are likely to use StringBuffer.append(String str) and call .toString() on the second StringBuffer. This requires the second buffer to be turned into a string, which is immutable, and then the characters from the string are appended to the first StringBuffer. Thus every character in the second buffer is touched twice, and there is the unnecessary allocation of the String just to transfer the characters to the first StringBuffer.
The efficient way of doing this would copy each character from the second buffer onto the end of the first. However, StringBuffer does not provide any easy way of doing this. Thus the recommendation is to use StringUtilities.append(StringBuffer, StringBuffer) which is able to directly read the characters from the second buffer without copying them into an intermediate collection.
This saves the runtime of the extra copying, the runtime needed to allocate a temporary String, and the memory needed to allocate a temporary string.

It means that the StringBuffer class is not implemented efficiently for this case. Java Strings are immutable, which is why StringBuffer is used to build up text. However, the StringBuffer class on this platform is inefficient when you append another StringBuffer, so you should use net.rim.device.api.util.StringUtilities instead. That's what the code is doing: it encapsulates the use of that class in a new append() method.
