When I have a route that sends data to two different file component endpoints, where for one endpoint I don't really care about the encoding but for the other I need to ensure a specific encoding, should I still set the charsetName on both endpoints?
I'm asking because a client of ours had a problem in that area: the route receives UTF-8, and we need to write ISO-8859-1 to the file.
And now, after the whole hardware was restarted (after a power outage), we found things like "??" instead of the expected "ä".
Now, by specifying the charsetName on all file producer endpoints, we were able to solve the issue.
My actual question now is:
Do you think I can expect that the problem is solved for good?
Or is there no such relation, and would I be well advised not to lean back until I understand the issue 100%?
Notes that might be helpful:
In addition, before writing to either of those two file endpoints, we
also do .convertBodyTo(byte[].class, "iso-8859-1")
We use Camel 2.16.1.
In the end, the problem was not about having two file endpoints in one pipeline.
It was about the JVM's default encoding, as described here:
http://camel.465427.n5.nabble.com/Q-about-how-to-help-the-file2-component-to-do-a-better-job-td5783381.html
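The failure mode described in the linked thread can be reproduced outside Camel: when no charset is given, the JVM falls back to its default encoding (file.encoding), and a character the target charset cannot represent is silently replaced with '?'. A minimal sketch in plain Java (no Camel involved) of the difference between an explicit and an implicit charset:

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        String s = "ä";

        // Explicit charset: ISO-8859-1 can represent ä as the single byte 0xE4.
        byte[] iso = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(iso.length + " byte(s), first = 0x"
                + Integer.toHexString(iso[0] & 0xFF)); // 1 byte(s), first = 0xe4

        // With no explicit charset, file.encoding decides. If the reboot left
        // the JVM in a POSIX/ASCII locale, the effect is the same as this:
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // ?
    }
}
```

That is why pinning the charset on every file producer endpoint fixes the symptom: the conversion no longer depends on whatever locale the machine happened to boot into.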
Related
I am a newbie in CMake and am trying to understand the following CMake command:
FetchContent_Declare(curl
  URL https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz
  URL_HASH SHA256=fe0c49d8468249000bda75bcfdf9e30ff7e9a86d35f1a21f428d79c389d55675
  USES_TERMINAL_DOWNLOAD TRUE)
When I open a browser and put in https://github.com/curl/curl/releases/download/curl-7_75_0/curl-7.75.0.tar.xz, the file curl-7.75.0.tar.xz starts downloading without any need for the URL_HASH. I am sure it is not redundant, but I want to know: what is the purpose of the URL_HASH?
Also, how can the SHA256 be found? When I visit https://github.com/curl/curl/releases/download/curl-7_75_0 to find out more, the link is broken.
I am sure it is not redundant. I wanted to know what the purpose of the URL_HASH is?
Secure hash functions like SHA256 are designed to be one-way; it is (in practice) impossible to craft a malicious version of a file with the same SHA256 hash as the original. It is even infeasible to find any two files that have the same hash. Such a pair is called a "collision", and finding even one would constitute a major breakthrough in cryptanalysis.
The purpose of this hash in a CMakeLists.txt, then, is as an integrity check. If a bad actor has intercepted your connection somehow, then checking the hash of the file you actually downloaded against this hard-coded expected hash will detect whether or not the file changed in transit. This will even catch less nefarious data corruptions, like those caused by a faulty hard drive.
Including such a hash (a "checksum") is absolutely necessary when downloading code or other binary artifacts.
Also how can SHA256 be found?
Often, these will be published alongside the binaries. Use a published value if available.
If you have to compute it yourself, you have a few options. On the Linux command line, you can use the sha256sum command. As a hack, you can write a deliberately wrong SHA256=0 value or something and fish the observed value from the error message.
Note that if you compute the hash yourself, you should either (a) download the file from an absolutely trusted connection and device or (b) download it from multiple independent devices (free CI systems like GitHub Actions are useful for this) and ensure the hash is the same across all of them.
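For instance, using coreutils' sha256sum on Linux (the tarball name in the comment is just the one from the question), with a sanity check on a known input:

```shell
# Print the SHA-256 of a downloaded artifact, e.g.:
#   sha256sum curl-7.75.0.tar.xz
# Sanity check of the tool on a known input ("hello"):
printf 'hello' | sha256sum
# 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824  -
```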
In my Rails app, after executing some code I want to send Slack messages to users to notify them of the execution result. There are multiple processes for which I want to send these messages, and in short, I need somewhere to store message templates for successes/errors (they're just short strings, like "Hi, we've successfully done x!", but they differ for each process).
Right now, I have a SlackMessage model in the DB from which I retrieve the message content. However, I've heard that it's better to manage custom messages like this in a YAML file, since it's easier to add/edit the messages later on (like this, even though that example is for locales).
What is the best practice for this kind of scenario? If it's not to use a DB, I'd appreciate any pointers or a link on how to do it (in terms of using YAML files, the only material I could find was on internationalisation).
Why don't you use the already existing I18n module in Rails? It is perfect for storing messages, and gives you the ability to use translations should you ever need them in the future.
Getting a message is simple:
Slack.message(I18n.t(:slack_message, scope: 'slack'))
In this case you need a translation file like this:
en:
  slack:
    slack_message: This is the message you are going to select.
Read more on I18n: https://guides.rubyonrails.org/i18n.html
YAML is generally much slower to load data from than a DB. Additionally, YAML parsers usually load all of the data, even if there are multiple documents in the YAML stream.
For programs that have a long run-time and use a large share of the messages, loading from YAML is usually not a problem. But in short-running programs the loading can be a significant part of the run-time, and techniques like delayed loading and caching might not help. As an example: some time ago I got a PR for my YAML library that delayed the instantiation of regular expressions, because their eager creation delayed the startup of some programs.
If you have many messages, they all stay in memory after loading from YAML, which might be a problem. With a DB it is much more common to retrieve only what is needed, relying on the DB to do that efficiently (caching, etc.).
If the advantages and criteria mentioned above don't help you decide, you can also have it both ways: the ease of reading/editing of YAML plus the speed, caching, etc. of a DB. "Just" convert the YAML stream to a DB, either explicitly after editing the YAML document or on first use by your program (by comparing the files' timestamps). That is the approach programs like Postfix take with postmap (although their inputs are plain text, not YAML files).
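The "reload on first use after an edit" half of that idea can be sketched in a few lines of Ruby (the class name and file layout are made up; a real setup might write into SQLite or a cache instead of an in-memory Hash): parsed messages are cached and re-read only when the YAML file's mtime changes.

```ruby
require "yaml"

# Hypothetical message store: caches the parsed YAML and reloads it
# only when the file's modification time changes.
class MessageStore
  def initialize(path)
    @path = path
    @mtime = nil
    @messages = {}
  end

  def fetch(key)
    reload_if_stale
    @messages.fetch(key)
  end

  private

  def reload_if_stale
    mtime = File.mtime(@path)
    return if mtime == @mtime

    @messages = YAML.safe_load(File.read(@path))
    @mtime = mtime
  end
end

# store = MessageStore.new("config/slack_messages.yml")
# store.fetch("success")
```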
I run a Ruby on Rails application, and since the site is becoming increasingly popular internationally, I started having errors related to encoding, e.g.:
Encoding::UndefinedConversionError: "\xE8" from ASCII-8BIT to UTF-8
I'm trying to find an HTTP request simulator that supports various encodings so I can reproduce the errors, but I'm not having much luck.
Does anyone know how to simulate or test HTTP requests with non-UTF-8 parameters / path info?
An encoding is just how you represent your text as bytes. As long as you encode/decode the text using the same encoding, you should be fine. If you encode and decode it using different encodings, the byte sequences will be interpreted differently and result in errors.
Normally, you are in control of the encodings used and the webserver can handle basic conversions.
Browser <--encoding--> server <--encoding--> files
There is normally no need to "find an HTTP request simulator that supports various encodings" since you usually define which one is used on server side, or the webserver handles the conversion.
If there is some strange client using some strange encoding that can't be recognized, I'd say it's a serious issue in your webserver, your configuration, or something similar... or the files themselves are not encoded in the format you use to read them.
...lastly, I believe almost any HTTP client supports many encodings for the body.
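In Ruby specifically, the error above happens because ASCII-8BIT means "raw bytes, interpretation unknown", so there is no conversion rule to UTF-8. Once you know (or decide) what the client actually sent, tag the bytes before converting. A small illustration, assuming the client sent ISO-8859-1:

```ruby
raw = "\xE8".b  # raw bytes tagged ASCII-8BIT, as they arrive from the request

# raw.encode("UTF-8") would raise Encoding::UndefinedConversionError here,
# because Ruby has no idea what the byte 0xE8 is supposed to mean.

# Declare the (assumed) source encoding first, then convert:
text = raw.force_encoding("ISO-8859-1").encode("UTF-8")
puts text  # => è
```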
EDIT:
Since you mention URLs:
URLs must be encoded using plain old ASCII. Even if you type fancy UTF-8 characters, the browser will percent-encode them underneath.
http://en.wikipedia.org/wiki/Percent-encoding
Using strange encodings for URLs, or invalid characters, is a client error and should be fixed client side, IMHO.
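In Ruby that translation is one call away; for example, the two UTF-8 bytes of "ä" become two percent-escaped octets:

```ruby
require "cgi"

puts CGI.escape("ä")        # => %C3%A4  (UTF-8 bytes, percent-encoded)
puts CGI.unescape("%C3%A4") # => ä
```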
I'm trying to pass a rather large POST request to PHP, and when I var_dump the $_POST array, one variable, the largest, is missing. (Actually, that's a base64-encoded binary upload sent as part of a POST request.)
Funny thing: on my development PC the exact same request is parsed correctly, without any missing variables.
I checked the contents of php://input on the server and on the development PC, and they are exactly the same; the MD5 matches. Yet the development PC recognizes all variables, and the server misses one.
I tried changing many different options in php.ini, with zero effect.
Maybe someone will point me to the right one.
Here is my php://input (~5 megabytes) http://www.mediafire.com/?lp0uox53vhr35df
It's possible the server is blocking it because of the Suhosin extension.
http://www.hardened-php.net/suhosin/configuration.html#suhosin.post.max_value_length
suhosin.post.max_value_length
Type: Integer. Default: 65000.
Defines the maximum length of a variable that is registered through a POST request.
This will have to be changed in the php.ini.
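A sketch of the relevant php.ini lines (the 10 MB value is illustrative, not a recommendation; pick a limit larger than your biggest expected variable):

```ini
; php.ini — raise Suhosin's per-variable POST limit (default 65000 bytes)
suhosin.post.max_value_length = 10000000
; the request.* counterpart usually needs raising too, since POST data
; also passes through the request filter
suhosin.request.max_value_length = 10000000
```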
Keep in mind that this is different from the Suhosin patch, which is common on a lot of shared hosts. I don't know if the patch would cause this problem.
So, we have a very large and complex website that requires a lot of state information to be placed in the URL. Most of the time this is just peachy and the app works well. However, there are (an increasing number of) instances where the URL gets really long. This causes huge problems in IE because of its URL length restriction.
I'm wondering, what strategies/methods have people used to reduce the length of their URLs? Specifically, I'd just need to reduce certain parameters in the URL, maybe not the entire thing.
In the past, we've pushed some of this state data into the session... however, this decreases addressability in our application (which is really important). So any strategy that can maintain addressability would be favored.
Thanks!
Edit: To answer some questions and clarify a little, most of our parameters aren't an issue... however some of them are dynamically generated with the possibility of being very long. These parameters can contain anything legal in a URL (meaning they aren't just numbers or just letters, could be anything). Case sensitivity may or may not matter.
Also, ideally we could convert these to POST; however, due to the immense architectural changes required, I don't think that is really possible.
If you don't want to store that data in the session scope, you can:
Send the data as a POST parameter (in a hidden field), so the data travels in the HTTP request body instead of the URL.
Store the data in a database and pass a key (that gives you access to the corresponding database record) back and forth. This raises scalability and possibly security issues; I suppose this approach is similar to using the session scope.
most of our parameters aren't an issue... however some of them are dynamically generated with the possibility of being very long
I don't see a way to get around this if you want to keep full state info in the URL without resorting to storing data in the session, or permanently on server side.
You might save a few bytes using some compression algorithm, but it will make the URLs unreadable, most algorithms are not very efficient on small strings, and compression does not produce predictable results.
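That caveat is easy to verify: on a typical short query string, the deflate overhead plus the base64 step needed to make the result URL-safe leaves you with more characters than you started with. A quick Ruby illustration:

```ruby
require "zlib"
require "base64"

params = "query=boots&page=7&sort=price"
packed = Base64.urlsafe_encode64(Zlib::Deflate.deflate(params))

puts "#{params.length} chars -> #{packed.length} chars"
# On a string this short, the "compressed" form ends up longer than the original.
```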
The only other ideas that come to mind are
Shortening parameter names (query => q, page => p, ...) may save a few bytes.
If the parameter order is very static, mod_rewrite-style directory structures (/url/param1/param2/param3) may save a few bytes because you don't need the parameter names.
For any data that is repetitive and can be "shortened" into numeric IDs or shorter identifiers (like names of company branches, product names, ...), keep an internal, global, permanent lookup table (London => 1, Paris => 2, ...).
Other than that, I think storing data on the server side, identified by a random key as #Guido already suggests, is the only real way. The upside is that you have no size limit at all: a URL like
example.com/?key=A23H7230sJFC
can "contain" as much information on server side as you want.
The downside, of course, is that for these URLs to work reliably, you'll have to keep the data on your server indefinitely. It's like running your own little URL shortening service... Whether that is an attractive option will depend on the overall situation.
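A sketch of that approach in Ruby (the names are made up; in production the Hash would be a database table or cache with an expiry policy):

```ruby
require "securerandom"

# Server-side state keyed by a short random token.
STATE_STORE = {}

def save_state(state)
  key = SecureRandom.urlsafe_base64(9) # 9 random bytes -> 12 URL-safe chars
  STATE_STORE[key] = state
  key
end

def load_state(key)
  STATE_STORE.fetch(key)
end

key = save_state("query" => "boots", "page" => 7, "branch" => "London")
puts "example.com/?key=#{key}"
```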
I think that's pretty much it!
One option, which is good when they really are navigable parameters, is to work these parameters into the path section of the URL, e.g.
http://example.site.com/ViewPerson.xx?PersonID=123
=>
http://example.site.com/View/Person/123/
If the data in the URL is automatically generated, can't you just generate it again when needed?
With little information it is hard to think of a solution but I'd start by researching what RESTful architectures do in terms of using hypermedia (i.e. links) to keep state. REST in Practice (http://tinyurl.com/287r6wk) is a very good book on this very topic.
Not sure what application you are using. I have had the same problem, and I use a couple of solutions (ASP.NET):
Use Server.Transfer and HttpContext (PreviousPage in .NET 2+) to access a public property of the source page that holds the data.
Use Server.Transfer along with a hidden field in the source page.
Use compression on the query string.