jsoup and character encoding

I have a bunch of questions relating to jsoup's charset support, most of which are supported by quotes from the API docs:
jsoup.Jsoup:
public static Document parse(File in, String charsetName) ...
Set to null to determine from http-equiv meta tag, if present, or fall back to UTF-8 ...
Does this mean the 'charset' meta-tag isn't used to detect the encoding?
jsoup.nodes.Document:
public void charset(Charset charset)
... This method is equivalent to OutputSettings.charset(Charset) but in addition ...
public Charset charset()
... This method is equivalent to Document.OutputSettings.charset().
Does this mean there isn't an "input charset" and "output charset", and that they are indeed the same setting?
jsoup.nodes.Document:
public void charset(Charset charset)
... Obsolete charset / encoding definitions are removed!
Will this remove the 'http-equiv' meta-tag in lieu of the 'charset' meta-tag? For backwards compatibility, is there any way to keep both?
jsoup.nodes.Document.OutputSettings:
public Charset charset()
Where possible (when parsing from a URL or File), the document's output charset is automatically set to the input charset. Otherwise, it defaults to UTF-8.
I need to know if the document hasn't specified an encoding*. Does this mean jsoup can't provide this information?
* instead of defaulting to UTF-8, I will run juniversalchardet.

The docs are out of date / incomplete. Jsoup does use the charset meta tag, as well as the http-equiv tag to detect the charset. From the source, we see that this method looks like this:
public static Document parse(File in, String charsetName) throws IOException {
return DataUtil.load(in, charsetName, in.getAbsolutePath());
}
DataUtil.load in turn calls parseByteData(...), which looks like this: (Source, scroll down)
//reads bytes first into a buffer, then decodes with the appropriate charset. done this way to support
// switching the chartset midstream when a meta http-equiv tag defines the charset.
// todo - this is getting gnarly. needs a rewrite.
static Document parseByteData(ByteBuffer byteData, String charsetName, String baseUri, Parser parser) {
String docData;
Document doc = null;
if (charsetName == null) { // determine from meta. safe parse as UTF-8
// look for <meta http-equiv="Content-Type" content="text/html;charset=gb2312"> or HTML5 <meta charset="gb2312">
docData = Charset.forName(defaultCharset).decode(byteData).toString();
doc = parser.parseInput(docData, baseUri);
Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
if (meta != null) { // if not found, will keep utf-8 as best attempt
String foundCharset = null;
if (meta.hasAttr("http-equiv")) {
foundCharset = getCharsetFromContentType(meta.attr("content"));
}
if (foundCharset == null && meta.hasAttr("charset")) {
try {
if (Charset.isSupported(meta.attr("charset"))) {
foundCharset = meta.attr("charset");
}
} catch (IllegalCharsetNameException e) {
foundCharset = null;
}
}
(Snip...)
The following line from the above code snippet shows us that indeed, it uses either meta[http-equiv=content-type] or meta[charset] to detect the encoding, otherwise falling back to utf8.
Element meta = doc.select("meta[http-equiv=content-type], meta[charset]").first();
I'm not quite sure what you mean here, but no, the output charset setting controls what characters are escaped when the document HTML / XML is printed to string, whereas the input charset determines how the file is read.
It will only ever remove meta[name=charset] items. From the source, the method which updates / removes the charset definition in the document: (Source, again scroll down)
private void ensureMetaCharsetElement() {
if (updateMetaCharset) {
OutputSettings.Syntax syntax = outputSettings().syntax();
if (syntax == OutputSettings.Syntax.html) {
Element metaCharset = select("meta[charset]").first();
if (metaCharset != null) {
metaCharset.attr("charset", charset().displayName());
} else {
Element head = head();
if (head != null) {
head.appendElement("meta").attr("charset", charset().displayName());
}
}
// Remove obsolete elements
select("meta[name=charset]").remove();
} else if (syntax == OutputSettings.Syntax.xml) {
(Snip..)
Essentially, if you call charset(...) and it does not have a charset meta tag, it will add one, otherwise update the existing one. It does not touch the http-equiv tag.
If you want to find out whether the document specifies an encoding, just look for http-equiv charset or meta charset tags; if there are no such tags, the document does not specify an encoding.
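The check described above can also be done on the raw bytes before handing them to a parser, which is handy if you want to fall back to a detector such as juniversalchardet. The following is only a minimal sketch using the Java standard library, not jsoup's own implementation: the class name DeclaredCharsetSniffer, the 4096-byte scan window and the regular expression are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports the charset a document declares in a meta tag, if any.
final class DeclaredCharsetSniffer {
    // Covers <meta charset="x"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=x">.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta\\s[^>]*?charset\\s*=\\s*[\"']?\\s*([^\\s\"'/>;]+)",
            Pattern.CASE_INSENSITIVE);

    static Optional<String> declaredCharset(byte[] html) {
        // Assumption: the declaration sits near the top, so only a prefix is scanned.
        int len = Math.min(html.length, 4096);
        // ISO-8859-1 maps every byte to one char, so decoding cannot fail here.
        String head = new String(html, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] doc = "<html><head><title>x</title></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        // An empty result means the document did not declare an encoding.
        System.out.println(declaredCharset(doc).isPresent());
    }
}
```

If the result is empty, the document did not declare an encoding and you can run your own detection instead of accepting the UTF-8 default.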
Jsoup is open source, so you can look at the source yourself to see exactly how it works: https://github.com/jhy/jsoup/ (You can also modify it to do exactly what you want!)
I'll update this answer with further details when I have time. Let me know if you have any other questions.

Related

C# MVC: Encoding a png, jpg, or pdf return value to prevent XSS

Suppose I have an C# MVC app which has a controller method that returns one of 3 content types: image png, image jpeg, or application pdf. I have read that it is possible to have images that contain XSS payloads. What would be the best way to Encode/escape these return contents so they aren't vulnerable to XSS? The controller method looks like this:
string contentType = "image/png";
MemoryStream mem = new MemoryStream();
if (ImageFormat == null || ImageFormat == "")
{
image.Save(mem, System.Drawing.Imaging.ImageFormat.Png);
}
else
{
if (ImageFormat.ToUpper() == "PNG") image.Save(mem, System.Drawing.Imaging.ImageFormat.Png);
if (ImageFormat.ToUpper() == "JPEG")
{
image.Save(mem, System.Drawing.Imaging.ImageFormat.Jpeg);
contentType = "image/jpeg";
}
}
mem.Position = 0;
mem.Seek(0, SeekOrigin.Begin);
return this.Image(mem, contentType);
Where Image is defined the following class here:
using …
namespace x.Classes
{
public static class ControllerExtensions
{
public static ImageResult Image(this Controller controller, Stream imageStream, string contentType)
{
return new ImageResult(imageStream, contentType);
}
}
}
And the OutputStream is written to using:
using …
namespace x.Classes
{
public class ImageResult : ActionResult
{
public ImageResult(Stream imageStream, string contentType)
{
if (imageStream == null)
throw new ArgumentNullException("imageStream");
if (contentType == null)
throw new ArgumentNullException("contentType");
this.ImageStream = imageStream;
this.ContentType = contentType;
}
public Stream ImageStream { get; private set; }
public string ContentType { get; private set; }
public override void ExecuteResult(ControllerContext context)
{
if (context == null)
throw new ArgumentNullException("context");
HttpResponseBase response = context.HttpContext.Response;
response.ContentType = this.ContentType;
byte[] buffer = new byte[4096];
while (true)
{
int read = this.ImageStream.Read(buffer, 0, buffer.Length);
if (read == 0)
break;
response.OutputStream.Write(buffer, 0, read);
}
response.End();
}
}
}
Is there a way for me to escape/encode the buffer that is getting written to the OutputStream here:
response.OutputStream.Write(buffer, 0, read);
To protect against XSS attacks? For example if this were HTML that was being returned:
response.OutputStream.Write(HttpUtility.HtmlEncode(buffer), 0, read);
But we know we are returning a jpeg, pdf, or png which means Html encode won't work here. So what do we use to safely escape/encode an image/pdf?
By the time you have buffer ready, it's too late. The same as with HTML, you want to context-sensitively encode any user input in those files, not the whole thing.
Now, with images this doesn't make much sense in the context of XSS, an image is rendered by an image renderer, and not as html, so there won't be any javascript to be run. The general best practice for uploaded images is to process them on the server and save them as a new image, because this removes all unnecessary things, but it has its risks as well if your processor itself is the target of an attack.
SVG for example is a different beast, SVG can have code in it, as can PDF. But again, PDFs will be open on the client with a PDF viewer, not in the context of the web application even if the PDF viewer is the browser itself (the browser hopefully separates Javascript in the PDF from the web page even if the origin is the same).
But javascript in a PDF can still be an issue for the client. Javascript running in a PDF may do harmful things, the simplest of which is consume client resources (ie. DoS of some sort), or it may try to break out of the PDF context somehow exploiting a viewer vulnerability. So the attack would be that one user uploads a malicious PDF for others to download. I think the best you can do against this is scan uploaded files for malware (which you should do anyway).
If you are generating all of this from user input (images, PDFs), then the libraries you use should take care of properly encoding values so that a malicious user can't inject code in a PDF. When the PDF is already generated, you can't "fix" it anymore, user input is mixed with code.
Also make sure to set the following header in responses (along with the correct Content-Type of course):
X-Content-Type-Options: nosniff
You do not need to encode the images themselves, you need to encode/escape the links to the images.
For example:
<a href="image.url.png?logout">Link Title</a>
where image.url.png?logout comes from user input.
You would url encode image.url.png?logout as image.url.png%3Flogout so that it is rendered useless to an attacker.
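As a rough illustration of that encoding step, here is a small sketch in Java (the question itself is C#, where Uri.EscapeDataString plays a similar role). The class and method names are hypothetical. Note that URLEncoder performs form-style encoding, so a space becomes "+" rather than "%20".

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper: percent-encodes a user-supplied value before it is placed in a link.
final class LinkEncoder {
    static String encodeForUrl(String userInput) {
        try {
            // Reserved characters such as '?' are replaced by %XX escapes.
            return URLEncoder.encode(userInput, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeForUrl("image.url.png?logout")); // image.url.png%3Flogout
    }
}
```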

Easiest way of porting html table data to readable document

Ok,
For the past 6 months I've been struggling to build a system that allows user input in the form of big sexy textareas (with loads of support for tables, lists, etc.). It pretty much enables the user to input data as if it were Word. However, when wanting to export all this data, I haven't been able to find a working solution...
My first step was to try to find reporting software that supports raw HTML from the data source and renders it as normal HTML. That worked perfectly, except that the keep-together function is awful: either data is split in half (tables, lists, etc.), which I don't want, or the report always skips to the next page to avoid this, ending up with 15+ empty pages in the final document.
So I'm looking for some kind of tip/direction on what would be the best solution to export my data into a readable document (PDF or Word preferred).
What I got is the following data breakdown, where data is often raw html.
-Period
--Unit
---Group
----Question
-----Data
What would be the best choice? Trying to render html to pdf or rtf? I need tips :(
And also sometimes the data is 2-3 pages long with mixed tables lists and plain text.
I would suggest that you try to keep this in the browser, and add a print stylesheet to the HTML to make it render one way on the screen and another way on paper. Adding a print stylesheet to your HTML is as easy as this:
<link rel="stylesheet" media="print" href="print.css">
You should be able to parse the input with something like Html Agility Pack and transform it (e.g. with XSLT) to whatever output format you want.
Another option is to write HTML to the browser, but with Content-Type set to a Microsoft Word-specific variant (there are several to choose from, depending on the version of Word you're targeting). This should make the browser ask whether the user wants to open the page with Microsoft Word. With Word 2007 and newer you can also write Office Open XML Word directly, since it's XML-based.
The content-types you can use are:
application/msword
For binary Microsoft Word files, but should also work for HTML.
application/vnd.openxmlformats-officedocument.wordprocessingml.document
For the newer "Office Open XML" formats of Word 2007 and newer.
A solution you could use is to run an application on the server using System.Diagnostics.Process that will convert the site and save it as a PDF document.
You could use wkhtmltopdf which is an open source console program that can convert from HTML to PDF or image.
The installer for Windows can be obtained from wkhtmltox-0.10.0_rc2 Windows Installer (i386).
After installing wkhtmltopdf you can copy the files in the installation folder inside your solution. You can use a setup like this in the solution:
The converted pdf's will be saved to the pdf folder.
And here is code for doing the conversion:
var wkhtmltopdfLocation = Server.MapPath("~/wkhtmltopdf/") + "wkhtmltopdf.exe";
var htmlUrl = @"http://stackoverflow.com/q/7384558/750216";
var pdfSaveLocation = "\"" + Server.MapPath("~/wkhtmltopdf/pdf/") + "question.pdf\"";
var process = new Process();
process.StartInfo.UseShellExecute = false;
process.StartInfo.CreateNoWindow = true;
process.StartInfo.FileName = wkhtmltopdfLocation;
process.StartInfo.Arguments = htmlUrl + " " + pdfSaveLocation;
process.Start();
process.WaitForExit();
The htmlUrl is the location of the page you need to convert to pdf. It is set to this stackoverflow page. :)
It's a general question, but two things come to mind: the Visitor Pattern and changing the MIME type.
Visitor Pattern
You can have two separate rendering techniques. This would be up to your implementation.
MIME Type
When the request is made, write the data out in the Response, etc.
HttpContext.Current.Response.Clear();
HttpContext.Current.Response.Charset = "utf-16";
HttpContext.Current.Response.ContentEncoding = System.Text.Encoding.GetEncoding("windows-1250");
HttpContext.Current.Response.AddHeader("content-disposition", string.Format("attachment; filename={0}.doc", filename));
HttpContext.Current.Response.ContentType = "application/msword";
HttpContext.Current.Response.Write("-Period");
HttpContext.Current.Response.Write("\n"); // "\n" is the newline escape; "/n" would be written literally
HttpContext.Current.Response.Write("--Unit");
HttpContext.Current.Response.Write("\n");
HttpContext.Current.Response.Write("---Group");
HttpContext.Current.Response.Write("\n");
HttpContext.Current.Response.Write("----Question");
HttpContext.Current.Response.Write("\n");
HttpContext.Current.Response.Write("-----Data");
HttpContext.Current.Response.Write("\n");
HttpContext.Current.Response.End();
Here is another option: use print screens (although it doesn't take care of scrolling, I think you should be able to build this in). This example can be expanded to meet the needs of your business, although it is a hack of sorts. You pass it a URL and it generates an image.
Call like this
protected void Page_Load(object sender, EventArgs e)
{
int screenWidth = Convert.ToInt32(Request["ScreenWidth"]);
int screenHeight = Convert.ToInt32(Request["ScreenHeight"]);
string url = Request["Url"].ToString();
string bitmapName = Request["BitmapName"].ToString();
WebURLToImage webUrlToImage = new WebURLToImage()
{
Url = url,
BrowserHeight = screenHeight,
BrowserWidth = screenWidth,
ImageHeight = 0,
ImageWidth = 0
};
webUrlToImage.GenerateBitmapForUrl();
webUrlToImage.GeneratedImage.Save(Server.MapPath("~") + @"Images\" + bitmapName + ".bmp");
}
Generate an image from a webpage.
using System;
using System.Drawing;
using System.Windows.Forms;
using System.Threading;
using System.IO;
public class WebURLToImage
{
public string Url { get; set; }
public Bitmap GeneratedImage { get; private set; }
public int ImageWidth { get; set; }
public int ImageHeight { get; set; }
public int BrowserWidth { get; set; }
public int BrowserHeight { get; set; }
public Bitmap GenerateBitmapForUrl()
{
ThreadStart threadStart = new ThreadStart(ImageGenerator);
Thread thread = new Thread(threadStart);
thread.SetApartmentState(ApartmentState.STA);
thread.Start();
thread.Join();
return GeneratedImage;
}
private void ImageGenerator()
{
WebBrowser webBrowser = new WebBrowser();
webBrowser.ScrollBarsEnabled = false;
webBrowser.Navigate(Url);
webBrowser.DocumentCompleted += new
WebBrowserDocumentCompletedEventHandler(webBrowser_DocumentCompleted);
while (webBrowser.ReadyState != WebBrowserReadyState.Complete)
Application.DoEvents();
webBrowser.Dispose();
}
void webBrowser_DocumentCompleted(object sender,
WebBrowserDocumentCompletedEventArgs e)
{
WebBrowser webBrowser = (WebBrowser)sender;
webBrowser.ClientSize = new Size(BrowserWidth, this.BrowserHeight);
webBrowser.ScrollBarsEnabled = false;
GeneratedImage = new Bitmap(webBrowser.Bounds.Width, webBrowser.Bounds.Height);
webBrowser.BringToFront();
webBrowser.DrawToBitmap(GeneratedImage, webBrowser.Bounds);
if (ImageHeight != 0 && ImageWidth != 0)
GeneratedImage =
(Bitmap)GeneratedImage.GetThumbnailImage(ImageWidth, ImageHeight,
null, IntPtr.Zero);
}
}

Formatting a text file to display in an EditText on Android

I got my little HTML fisher working and it grabs a text file from a URL string. I can call setText on my EditText view and it will indeed display the text in the text file, but there is no formatting at all (I am talking simple stuff like carriage returns and line feeds). How can I get this to render a bit more nicely in the EditText? Original text file looks like:
Imagine the following from a resource:
http://www.someaddress.com/thetextfile.txt
Original Text File here.
1. First thing on the text file.
2. Second thing on the text file...
Finally the end of the text file. This is a long string blah blah
that I like to use here. Notice the carriage return line breaks and indentation
on the paragraph.
I can get the above as a string, but there are no carriage returns at all when it displays in the EditText. Is there any way I can add this? It's not a matter of adding \n or \r, because those would already be in the text file, I would suspect (it was written in Notepad++). So is there any way to get this even ever so slightly more formatted (and preserve the formatting when the string is saved back out to disk or to a database)?
EDIT:
<EditText android:id="@+id/contract_text_input"
android:layout_width="650sp" android:layout_height="wrap_content"
android:lines="25"
android:scrollbars = "vertical"
android:gravity="top|left" android:inputType="textMultiLine"
android:scrollHorizontally="false"
android:minWidth="10.0dip"
android:maxWidth="5.0dip"/>
EDIT:
These are the methods that fetch this text file from the internet. They seem to be ripping out all the \n and \r. But if I inspect the file that's online, it only has \t's in it. So maybe it's FileZilla's upload of the file to my web server?
public static InputStream getInputStreamFromUrl(String url){
InputStream contentStream = null;
try{
HttpClient httpclient = new DefaultHttpClient();
HttpResponse response = httpclient.execute(new HttpGet(url));
contentStream = response.getEntity().getContent();
} catch(Exception e){
e.printStackTrace();
}
return contentStream;
}
public static String getStringFromUrl(String url) {
BufferedReader br = new BufferedReader(new InputStreamReader(getInputStreamFromUrl(url)));
StringBuffer sb = new StringBuffer();
try{
String line = null;
while ((line = br.readLine())!=null){
sb.append(line);
}
}catch (IOException e){
e.printStackTrace();
}
return sb.toString();
}
This is how I am ultimately updating the EditText:
private class FragmentHttpHelper extends AsyncTask<Void, Void, String>{
protected void onPostExecute(String result) {
contractTextTxt.setText(result);
}
@Override
protected String doInBackground(Void... params) {
return getStringFromUrl(urlReferenceTxt.getText().toString());
}
}
Check the Android API. The problem is in the following line of code:
while ((line = br.readLine())!=null)
{
//code goes here
}
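The call to readLine() strips the line terminator (\n, \r, or \r\n) from each line it returns, and your loop appends the lines without putting one back. Either append a newline after each line, or use read() if you want to get everything exactly as it is in the file.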
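A minimal sketch of the read() approach is shown here. The class name TextFetcher and the UTF-8 charset are assumptions; use whatever encoding the text file is actually stored in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads a whole stream into a String without dropping line terminators.
final class TextFetcher {
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Assumption: the file is UTF-8 encoded.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            // read(char[]) returns the raw characters, including \r and \n.
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "1. First thing\r\n\r\n2. Second thing\r\n".getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(sample));
        System.out.println(text.contains("\r\n")); // true
    }
}
```

The result of readAll can be passed to setText as before; the line breaks are then still part of the string.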
Hi sorry I just did a similar thing in an app of mine
and called
((EditText) findViewById(R.id.textView1))
.setText("Original Text File here.\r\n\r\n1. First thing on the text file.\r\n\r\n2. Second thing on the text file...\r\n\r\n Finally the end of the text file. This is a long string blah blah that I like to use here. Notice the carriage return line breaks and indentation on the paragraph.");
which would be the same as your string
and I get
which looks fine to me!
my edit text was
<EditText android:id="@+id/textView1" android:layout_width="wrap_content"
android:layout_weight="0" android:layout_height="wrap_content"></EditText>
edit2: check the debugger's string value

writing UI content to a file on a server

I've got a pop-up textarea, where the user writes some lengthy comments. I would like to store the content of the textarea to a file on the server on "submit". What is the best way to do it and how?
Thanks,
This would be very easy to do. The text could be just a String or StringBuffer for size and formatting; then just pass that to your Java code and use file operations to write to a file.
This is some GWT code, but it's still Ajax, so it will be similar. Get a handler for an event to capture the button submittal, then get the text in the text area.
textArea.addChangeHandler(new ChangeHandler() {
public void onChange(ChangeEvent changeEvent) {
String text = textArea.getText();
}
});
The passing-off mechanism I don't know, because you don't show any code, but I just created a file of filenames, line by line, by reading filenames out of a list of files with this:
private void writeFilesListToFile(List<File> filesList) {
for(File file : filesList){
String fileName = file.getName();
appendToFile(fileName);
}
}
private void appendToFile(String text){
try {
// The second argument (true) opens the file in append mode instead of overwriting it.
BufferedWriter out = new BufferedWriter(new FileWriter("<file path and file name>", true));
out.write(text);
out.newLine();
out.close();
} catch (IOException e) {
System.out.println("Error appending file with filename: " + text);
}
}
You could do something similar, only write out the few lines you got from the textarea. Without more to go on I can't really get more specific.
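For the textarea case, the server-side write could look roughly like the sketch below. It assumes the submitted text has already arrived on the server as a String; the class name CommentStore, the method name and the choice of UTF-8 are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: appends one submitted comment to a text file on the server.
final class CommentStore {
    static void appendComment(Path file, String text) throws IOException {
        byte[] bytes = (text + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        // CREATE makes the file on first use; APPEND keeps earlier comments.
        Files.write(file, bytes, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("comments", ".txt");
        appendComment(file, "First lengthy comment");
        appendComment(file, "Second lengthy comment");
        System.out.println(Files.readAllLines(file, StandardCharsets.UTF_8).size()); // 2
    }
}
```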
HTH,
James

In C#, how can I know the file type from a byte[]?

I have a byte array filled from a file uploaded. But, in another part of the code, I need to know this file type uploaded from the byte[] so I can render the correct content-type to browser!
Thanks!!
As mentioned, MIME magic is the only way to do this. Many platforms provide up-to-date and robust MIME magic files and code to do this efficiently. The only way to do this in .NET without any 3rd party code is to use FindMimeFromData from urlmon.dll. Here's how:
public static int MimeSampleSize = 256;
public static string DefaultMimeType = "application/octet-stream";
[DllImport(@"urlmon.dll", CharSet = CharSet.Auto)]
private extern static uint FindMimeFromData(
uint pBC,
[MarshalAs(UnmanagedType.LPStr)] string pwzUrl,
[MarshalAs(UnmanagedType.LPArray)] byte[] pBuffer,
uint cbSize,
[MarshalAs(UnmanagedType.LPStr)] string pwzMimeProposed,
uint dwMimeFlags,
out uint ppwzMimeOut,
uint dwReserverd
);
public static string GetMimeFromBytes(byte[] data) {
try {
uint mimeType;
FindMimeFromData(0, null, data, (uint)MimeSampleSize, null, 0, out mimeType, 0);
var mimePointer = new IntPtr(mimeType);
var mime = Marshal.PtrToStringUni(mimePointer);
Marshal.FreeCoTaskMem(mimePointer);
return mime ?? DefaultMimeType;
}
catch {
return DefaultMimeType;
}
}
This uses the Internet Explorer MIME detector. This is the same code used by IE to send a MIME type along with uploaded files. You can see the list of MIME types supported by urlmon.dll. One thing to watch out for is image/pjpeg and image/x-png which are non-standard. In my code I replace these with image/jpeg and image/png.
Not sure, but maybe you should investigate about magic numbers.
Update:
Reading about it, I don't think it's very reliable though.
If you know it's a System.Drawing.Image, you can do:
public static string GetMimeTypeFromImageByteArray(byte[] byteArray)
{
using (MemoryStream stream = new MemoryStream(byteArray))
using (Image image = Image.FromStream(stream))
{
return ImageCodecInfo.GetImageEncoders().First(codec => codec.FormatID == image.RawFormat.Guid).MimeType;
}
}
You can't know it from the byte stream, but you can store the MIME type when you initially populate the byte[].
Short answer: you can't
Longer answer: Usually, programs use the file extension to know what type of file they're dealing with. If you don't have that extension, you can only make guesses... for instance, you could look at the first few bytes and check if you recognize a well-known header (XML declaration tag for instance, or bitmap or JPEG header). But that will always be a guess in the end : without some metadata or information about the content, an array of bytes is just meaningless...
If you know the extension of the file name, maybe System.Web.MimeMapping will do the trick:
MimeMapping.GetMimeMapping(fileDisplayNameWithExtension)
I used it in MVC Action like this:
return File(fileDataByteArray, MimeMapping.GetMimeMapping(fileDisplayNameWithExtension), fileDisplayNameWithExtension);
Reminds me of back in the day we, er um "some people" used to share 50MB rar files on the early free image hosting sites, by just adding the .gif extension to the .rar filename.
Clearly, if you are public facing and you are expecting a certain file type, and you have to be sure it is that file type, then you can't just trust the extension.
On the other hand, if your app would have no reason to distrust the uploaded extension and/or MIME type, then just get those when the file is uploaded, like the answers you received from @rossfabircant and @RandolphPotter. Create a type that has the byte[] as well as the original extension or MIME type, and pass that around.
If you need to verify that the file is actually a certain expected type like a valid .jpeg, or .png you can try to interpret the file as those types and see if it opens successfully. (System.Drawing.Imaging.ImageFormat)
If you are trying to classify the file only from the binary contents, and it could be any format in the whole wide world, that is really a tough, open-ended problem and there is no 100% reliable way to do it. You could invoke TrID against it, and there are likely similar forensics tools used by law enforcement investigators if you can find (and afford) them.
If you don't have to do it the hard way, don't.
You don't want to do it that way. Call Path.GetExtension when the file is uploaded, and pass the extension around with the byte[].
If you have a limited number of expected file types you want to support, magic numbers can be the way to go.
A simple way to check is to just open example files with a text/hex editor, and study the leading bytes to see if there is something there you can use to differentiate/discard files from the supported set.
If, on the other hand, you are looking to recognize any arbitrary file type, yeah, as everyone has stated already, tough.
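A small sketch of the magic-number check for a limited set of types follows. It is written in Java rather than C#, but the byte comparison translates directly to a C# byte[]. The class name MagicNumbers and the set of supported types are assumptions; the signatures themselves are the well-known leading bytes of PNG, JPEG, GIF and PDF files.

```java
// Hypothetical helper: guesses a MIME type from the leading bytes of a file.
final class MagicNumbers {
    static String sniff(byte[] data) {
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) return "image/png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "image/jpeg";
        if (startsWith(data, 0x47, 0x49, 0x46, 0x38)) return "image/gif";        // "GIF8"
        if (startsWith(data, 0x25, 0x50, 0x44, 0x46, 0x2D)) return "application/pdf"; // "%PDF-"
        // Anything else is outside the supported set.
        return "application/octet-stream";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned, since Java bytes are signed.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] pdf = {0x25, 0x50, 0x44, 0x46, 0x2D, 0x31, 0x2E, 0x34};
        System.out.println(sniff(pdf)); // application/pdf
    }
}
```

A matching signature only shows that the file starts like that format; it does not prove the rest of the file is valid.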
Using the System.Drawing.Image 'RawFormat.Guid' property, you can detect the MIME type of images.
But I am not sure how to find other file types.
http://www.java2s.com/Code/CSharp/Network/GetImageMimeType.htm
UPDATE: you may try taking a look on this post
Using .NET, how can you find the mime type of a file based on the file signature not the extension
I got AccessViolationException while accessing memory using other answers, so I solved my problem using this code:
[DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
private static extern int FindMimeFromData(IntPtr pBc,
[MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
[MarshalAs(UnmanagedType.LPArray, ArraySubType = UnmanagedType.I1, SizeParamIndex = 3)]
byte[] pBuffer,
int cbSize,
[MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
int dwMimeFlags,
out IntPtr ppwzMimeOut,
int dwReserved
);
/**
* This function will detect mime type from provided byte array
* and if it fails, it will return default mime type
*/
private static string GetMimeFromBytes(byte[] dataBytes, string defaultMimeType)
{
if (dataBytes == null) throw new ArgumentNullException(nameof(dataBytes));
// Start from the caller's default so a failed detection (ret != 0) also falls back to it.
var mimeType = defaultMimeType;
try
{
var ret = FindMimeFromData(IntPtr.Zero, null, dataBytes, dataBytes.Length, null, 0, out var outPtr, 0);
if (ret == 0 && outPtr != IntPtr.Zero)
{
mimeType = Marshal.PtrToStringUni(outPtr);
Marshal.FreeCoTaskMem(outPtr);
}
}
catch
{
mimeType = defaultMimeType;
}
return mimeType;
}
How to call it:
string ContentType = GetMimeFromBytes(byteArray, "image/jpeg");
Hope this helps!
