Form Recognizer preview: Train model error Payload Too Large - machine-learning

I'm trying to train a model in the Form Recognizer preview using the test console (West Europe) provided by Azure, but I get error code 413 Payload Too Large. The error message is: Unable to process dataset. Size of dataset exceeds size limit (4.00MB).
I've provided a SAS storage URL pointing to a blob container. This blob container contains 5 PNG files, each between 2.7 and 3.1 MB in size and with a content type of application/png.
From the documentation I know that the file size must be smaller than 4 MB.
Is the size of the complete dataset (the sum of all training files) restricted to 4 MB?

Form Recognizer v1 supports a training set that must be less than 4 megabytes (MB) in total. The v1 APIs are synchronous and have a limit on processing time, hence the size limit. Form Recognizer v2.0 (preview) is an async API and enables training on large data sets. Please use the v2.0 (preview) API.
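For reference, a minimal sketch of kicking off v2.0 (preview) training from C# with HttpClient. The endpoint, key and SAS URL are placeholders, and the route is based on the v2.0 preview REST API, so verify it against the current docs:

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class TrainModel
{
    static async Task Main()
    {
        // Placeholders: substitute your own resource endpoint, subscription key and container SAS URL.
        var endpoint = "https://westeurope.api.cognitive.microsoft.com";
        var apiKey = "<subscription-key>";
        var sasUrl = "<blob-container-sas-url>";

        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", apiKey);

            // v2.0 training is asynchronous: POST the source container, then poll the
            // model URL returned in the Location header for the training status.
            var body = new StringContent("{\"source\": \"" + sasUrl + "\"}",
                                         Encoding.UTF8, "application/json");
            var response = await client.PostAsync(
                endpoint + "/formrecognizer/v2.0-preview/custom/models", body);

            Console.WriteLine((int)response.StatusCode);   // expect 201 Created
            Console.WriteLine(response.Headers.Location);  // model URL to poll
        }
    }
}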

Further detail: try the V2 API or later.
It isn't possible to "solve" this problem other than by shrinking the training documents. A process for shrinking them manually rather than with code is below (a scripted alternative follows the list):
1. PDFEscape (has a free trial, excellent tool)
2. Export all images from the pages
3. IrfanView: batch operations, resize the images by percent, save to PDF
4. PDFEscape: select all PDF versions of the individual page images, right-click in Windows Explorer and merge with PDFEscape. Reorder if necessary, then save the new PDF.
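If you do want to script the downscaling instead, a rough sketch with System.Drawing (assuming local PNG files, made-up folder paths, and an arbitrary 50% scale factor) could look like this:

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

class ShrinkImages
{
    static void Main()
    {
        // Assumption: the training images live in a local folder before upload to blob storage.
        var inputDir = @"C:\training\original";
        var outputDir = @"C:\training\small";
        Directory.CreateDirectory(outputDir);

        foreach (var path in Directory.GetFiles(inputDir, "*.png"))
        {
            using (var source = Image.FromFile(path))
            // Halve the dimensions; adjust the factor until each file is comfortably under the limit.
            using (var scaled = new Bitmap(source, source.Width / 2, source.Height / 2))
            {
                var target = Path.Combine(outputDir, Path.GetFileName(path));
                scaled.Save(target, ImageFormat.Png);
                Console.WriteLine($"{target}: {new FileInfo(target).Length / 1024} KB");
            }
        }
    }
}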

Related

Size limit on chatMessageHostedContent content bytes

Is there any size limit on contentBytes of chatMessageHostedContent in Microsoft Graph API v1? I am figuring out how to download hosted content, given that I am constrained by a ~8 MB content download size limit.
One of the comments on this question says that contentBytes can be in the GB range. If so, what is the way to upload such large hosted content? I was able to send only around 3 MB of hosted content bytes along with the SendMessage API.
According to this discussion, the maximum size when posting is 4 MB. In general, the size should not exceed 3 GB, and it's better to split up the files.
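For the download side, a minimal C# sketch against the documented /$value endpoint (the IDs and token below are placeholders):

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class DownloadHostedContent
{
    static async Task Main()
    {
        // Placeholders: a valid token and the real chat, message and hosted-content IDs.
        var accessToken = "<access-token>";
        var url = "https://graph.microsoft.com/v1.0/chats/<chat-id>/messages/<message-id>"
                + "/hostedContents/<hosted-content-id>/$value";

        using (var client = new HttpClient())
        {
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Bearer", accessToken);

            // /$value returns the raw bytes of the hosted content (e.g. an inline image).
            var bytes = await client.GetByteArrayAsync(url);
            Console.WriteLine($"Downloaded {bytes.Length} bytes");
        }
    }
}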

AWS Cognito dataset sync limits - What happens when you sync more than 1 MB per set

AWS Cognito allows you to sync datasets across devices but the documentation states:
Each dataset can have a maximum size of 1 MB.
You can associate up to 20 datasets with an identity.
However, it appears that if you do NOT sync the datasets and simply keep them local, a dataset can be larger than 1 MB.
What happens if you then try to sync those sets? Does Cognito throw an error and simply not allow it, OR is the dataset trimmed to 1 MB so that only the most recent records sync, OR something else?
NOTE: I am aware that one could split data across multiple sets and then perform a sync, but this is NOT a solution for me, as I already require all 20 sets.
Cognito will throw an exception (LimitExceededException, if memory serves) when you have put more than 1 MB into a dataset. It won't truncate the data and accept the synchronization request.
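Since Cognito rejects rather than truncates, the client has to keep each dataset under 1 MB itself. A rough sketch of a pre-sync size check in C# (SyncDataset is a hypothetical stand-in for whatever SDK call you actually use):

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

class CognitoSizeGuard
{
    const long MaxDatasetBytes = 1 * 1024 * 1024; // the documented 1 MB per-dataset limit

    // Rough estimate of what the dataset will weigh once serialized for sync.
    static long EstimateSize(IDictionary<string, string> records) =>
        records.Sum(r => Encoding.UTF8.GetByteCount(r.Key) + Encoding.UTF8.GetByteCount(r.Value ?? ""));

    static void TrySync(IDictionary<string, string> records)
    {
        var size = EstimateSize(records);
        if (size > MaxDatasetBytes)
        {
            // Syncing anyway would fail with LimitExceededException rather than truncating,
            // so refuse (or trim the records) here instead.
            Console.WriteLine($"Dataset is {size} bytes; over the 1 MB limit, not syncing.");
            return;
        }

        // SyncDataset(records);  // hypothetical call into the Cognito Sync SDK of your choice
        Console.WriteLine($"Dataset is {size} bytes; safe to sync.");
    }

    static void Main() =>
        TrySync(new Dictionary<string, string> { ["theme"] = "dark", ["lastScreen"] = "settings" });
}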

ImageResizer .NET for multiple product images performance issues?

I'm building an ASP.NET MVC4 application with product pages. I have come across the ImageResizer library for handling and serving the images. My page has JPG thumbnails, 160x160 px in dimensions and 3~5 KB in size each.
To my understanding, using the ImageResizer library I could just upload the original large product image (600x600 px, 10~20 KB) and resize it on the fly to the thumbnail size when a visitor requests the page. Something like:
<img src="@Url.Content("~/images/imagename?width=160&height=160")" alt="">
Which I understand is fine for a couple of images, but my product page consists of 20 to 100 unique product JPG thumbnails (depending on page size).
Will performance suffer with processing 20-100 pics on the fly each time? Has anyone faced a similar scenario? I could always go back and generate 2 different images (thumbnail and large) during the upload process, but I'm very curious whether I could get away with just one image per product and dynamic resizing.
When I say performance, I mean that anything above 0.5 - 1 s of extra response time is a no-no for me.
The documentation mentions that there's a caching plugin which improves performance by 100-10,000X:
Every public-facing website needs disk caching for their dynamically resized images (no, ASP.NET's output cache won't work). This module is extremely fast, but decoding the original image requires a large amount of contiguous RAM (usually 50-100MB) to be available. Since it requires contiguous, non-paged, non-fragmented RAM, it can't be used as a (D)DOS attack vector, but it does mean that there is a RAM-based limit on how many concurrent image processing requests can be handled. The DiskCache plugin improves the throughput 100-10,000X by delegating the serving of the cached files back to IIS and by utilizing a hash-tree disk structure. It easily scales to 100,000 variants and can be used with as many as a million images. It is part of the Performance edition, which costs $249. The DiskCache plugin requires you to use the URL API (read why).
http://imageresizing.net/plugins/diskcache
http://imageresizing.net/docs/basics
When it comes to websites, every operation that can be cached should be. This allows the server to deal with more visitors rather than more processing.
You could either use the caching plugin for ImageResizer, or manually write to a file using a certain filename, e.g. product_154_180x180.jpg, where 154 is the product ID and 180 is the width and height, then check whether it exists when you want to display it.
If you do the latter, you may be able to have the server manage this for you: link to the expected filename in the page source, and if the file doesn't exist, have the server call a script that resizes the image and writes it to disk using ImageResizer.
This last method will also avoid the call to ImageResizer on subsequent requests, saving you some processing power.
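A rough sketch of that check-then-generate pattern, assuming ImageResizer's ImageBuilder API; the naming convention and folder handling here are made up:

using System.IO;
using ImageResizer;

public static class ThumbnailCache
{
    // Returns the path of a cached thumbnail, generating it on first request.
    // Naming convention (product_{id}_{size}x{size}.jpg) and folders are assumptions.
    public static string GetThumbnailPath(int productId, int size, string originalPath, string cacheDir)
    {
        var cachedPath = Path.Combine(cacheDir, $"product_{productId}_{size}x{size}.jpg");

        if (!File.Exists(cachedPath))
        {
            Directory.CreateDirectory(cacheDir);
            // One-off resize; subsequent requests are served straight from disk.
            ImageBuilder.Current.Build(
                originalPath,
                cachedPath,
                new ResizeSettings($"width={size}&height={size}&format=jpg"));
        }

        return cachedPath;
    }
}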

I need to embed a > 4 MB content file in a BlackBerry WebWorks application (for BlackBerry 6/7). What are my options?

I am in the process of developing a content-heavy WebWorks app. In order for the app to be useful, it needs to maintain a local content database (approx. 4 MB in SQL form; the only way to reduce it further is to rip out entire categories of content).
My original thinking was that I would embed the SQL file in the app (just like the CSS and JS), then load it into SQLite on first run. The strategy worked in development on the Ripple emulator.
When I attempted to build and run on a real test device, grief resulted. The compiled COD had > 127 sibling CODs, so it wouldn't install (it took a week to find that out).
I have prototyped a different approach: downloading the SQL file from the web on first run. I do not like this second approach, with reason: this application is intended for use in a part of the world that has expensive/spotty bandwidth.
Is there a way to embed a significant amount of content in a BlackBerry application for BB 6/7 without running into application size limits (either the number of sibling CODs, which cannot exceed 127, or the absolute size of the application)?
Doesn't look like it:
http://supportforums.blackberry.com/t5/Testing-and-Deployment/The-maximum-size-of-a-BlackBerry-smartphone-application/ta-p/1300209
Specifically this:
The limit for the number of sibling COD files that can exist within a single application is 127. This means that the maximum theoretical size limit for an application would be 16256 KB, which consists of 8128 KB of application data and 8128 KB of resource data. There is some overhead to this value, which brings the actual maximum size limit closer to 14000 KB. The actual maximum size for an application will vary slightly based on the application's contents.

It is not possible for either data type (application or resource) to make use of unused space of another data type, meaning resource data cannot use application data space even if the application data is well under the limit.

Out-of-memory errors with permsize 1024M and heap size 2048M, for XMLs processed inside a for-each loop in XPL, each about 4.5 MB on disk

I have a few for-each loops that I use to iterate over the elements of a configuration XML, which is very small (2 KB on disk), to arrive at a source URL and a target URL dynamically. Then I retrieve data from the source URL using the URL Generator (because it performs streaming) and load it into an XML database using the XForms Submission processor. The source and target URLs are computed dynamically, and the innermost loops where the retrieval and loads take place run about 32 times in total, each time with an XML file of about 4.5 MB on average (max about 6 MB, min a few KB). Every time I try this, I get an out-of-memory error from Tomcat, even with permgen and heap sizes generously allocated (32-bit JVM, 32-bit OS). I want a way out of these out-of-memory errors:
I had thought the separate XForms submissions would be separate transactions, so they would not accumulate and cause an out-of-memory problem.
Is there a way to perform a streaming load using the XForms Submission processor instead of creating the full document in memory?
I do not know if that would help, but is there a way to perform aggressive garbage collection in Orbeon so I do not get out-of-memory problems?
If needed, I can post the code here (for the XPL).
Source code (in reply to the comment asking for it)
If you run oxf:xforms-submission inside a loop, the submissions run independently, and uploading 30 documents in the loop should only take the memory necessary for the largest document.
The XForms submission needs to have the full document in memory to be able to upload it; it doesn't support streaming (unlike the oxf:url-generator).
The default JVM permgen setting is often too low, so I would recommend trying to increase your permgen space.
