I am using a remotely triggered Jenkins job to send a POST request (via curl), and the request body can be as large as 100,000 characters. I am using a Multiline String parameter to hold this data.
However, it only holds around 58,000 characters. How can I increase the character limit of a Multiline String parameter?
If that is not possible, what else can I use?
Currently, pasting 112,359,1003 into Google Sheets automatically converts the value to 1123591003.
This prevents me from applying the Split text to columns option as there are no commas left to split by.
Note that my number format is set to the following (rather than being Automatic):
Selecting the Plain text option prevents the commas from being truncated but also prevents me from being able to use the inserted data in formulas.
The workaround for this is undesirable when inserting large amounts of data: select the cells that you expect to occupy, set them to Plain text, paste the data, then set them back to the desired number format.
How do I disable the automatic interpretation by Google Spreadsheet of the commas in my pasted numeric values?
You cannot paste it into a cell with any number format, because of the nature of numeric format types: the value is parsed into an actual number and stored as such. Using the plain text type, as you are doing, is the way to go for this.
However, there are some options to perform these tasks in a slightly different way:
- You might be able to use the CSV import functionality, which avoids having to change types for a sheet.
- You can use the int() function to parse the plain-text value into an integer (and combine this with lookup functions).
TEXT formatting:
Use ' to prepend the number. It'll be stored as text regardless of the actual formatting.
Select the column and set its formatting to plain text.
In both of the above cases, you can multiply the resulting text by 1 (*1) to use it in any formula as a number.
NUMBER formatting:
Keep Number formatting with ,/Automatic.
Here, though Split text to columns might not work, you can use TEXT() or TO_TEXT():
=ARRAYFORMULA(SPLIT(TO_TEXT(A1:A5),","))
I am trying to check whether a string is empty by doing the following:
if Trim(somestring) = '' then
begin
  // that's an empty string
end;
I am doing this empty check in my client application, but I notice that some clients insert empty strings even with this check applied.
On my Linux server side those empty characters show up as squares, and when I copy those characters I am able to bypass the empty string check, as in the following example.
If you copy these empty characters, the check is not triggered. How can I avoid that?
Your code is working correctly and the strings are not empty. An empty string is one whose length is zero: it has no characters. You refer to empty characters, but there is no such thing. When your system displays a small empty square, that indicates that the chosen font has no glyph for that character.
Let us assume that these strings are invalid. In that case the real problem you have is that you don't yet fully understand what properties are required for a string to be valid. Your code is written assuming that a string is valid if it contains at least one character with ordinal value greater than 32. Clearly that test is not correct. You need to step back and work out the precise rules for validity. Only when these are clear in your mind can you correct your program and check for validity correctly.
On the other hand perhaps these strings are valid and the mistake is simply that you are erroneously determining otherwise when you inspect the data. Only you can know this, we don't have the information.
A useful technique in all of this is to inspect the ordinal values of the strings. Loop through the characters printing the ordinal value of each one. That allows you to see what is really there and not be at the mercy of non-printing characters, characters with no glyph, invalid encodings, etc.
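As a rough illustration of that inspection step, here is a minimal sketch in Python rather than Delphi (the idea carries over directly). The helper name describe_chars and the sample string are made up for this example; the sample assumes the "invisible" characters are zero-width spaces, which may not be what your data actually contains.

```python
import unicodedata


def describe_chars(text):
    """Return (index, code point, Unicode name) for every character in text."""
    return [
        (i, ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
    ]


# Hypothetical input: it looks blank, but holds two zero-width spaces (U+200B).
sample = "\u200b\u200b"

for index, code_point, name in describe_chars(sample):
    # Show both the hex and the decimal ordinal so nothing stays hidden.
    print(f"{index}: U+{code_point:04X} ({code_point}) {name}")
```

Once the actual ordinal values are known, it becomes much easier to decide which characters your validity rules should reject.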
Trim is a pretty simple function: it only removes leading and trailing characters whose code is less than or equal to 32 (space) in the ASCII table.
(Sample from System.SysUtils.pas)
while S.Chars[L] <= ' ' do Dec(L);
Therefore it's possible that you just can't see some exotic characters (code points above 127, outside the ASCII range), perhaps because of a wrong encoding used with your input string.
Try to use:
IntToStr(Ord(SomeChar))
on every character that is not trimmed, then remove those characters by hand or check your encoding.
Kind Regards
I am talking about a custom parsing phase that happens in a program unrelated to Solr, before the Solr tokenizers get to work on the data. If I strip, say, white space, tabs and other non-printable characters, then when that data actually reaches the Solr master for indexing, how would the Solr tokenizers tell apart words that were previously separated by spaces, tabs or other non-printable characters?
Example code and output from pre-processor:
<?php
$text = '<div>This is a sample text to be indexed</div>';
//Remove HTML tags
$text_refined1 = strip_tags($text);
//Remove non-printable unicode characters
$text_refined2 = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]/u', '', $text_refined1);
//Remove all whitespace (spaces, line feeds, carriage returns and tabs)
$text_refined3 = preg_replace('/\s+/', '', $text_refined2);
echo $text_refined3;
---output---
Thisisasampletexttobeindexed
Based on the example you give (output Thisisasampletexttobeindexed), Solr's existing query analyzers will not be able to tokenize it correctly.
Solr (Lucene) needs some way to separate the individual words in the input.
You can use Solr's analysis admin UI to test this string with different analyzers. In my Solr test instance, they all return the original string.
You can configure which Tokenizer to use in Solr. There is a list at https://cwiki.apache.org/confluence/display/solr/Tokenizers
Indexing a stream of non-delimited English words properly is not supported by any existing Tokenizer in Solr. You could conceivably build a custom one with a dictionary, but it would produce errors as the input is ambiguous. Or you could use the N-Gram Tokenizer and accept a lot of false positives when you search.
The right solution is not to feed such a stream in the first place. If you need the tightly concatenated string for something internal, then produce a separate version for indexing, where you replace the offending characters with space instead of the empty string.
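A minimal sketch of that "separate version for indexing" idea, written in Python instead of PHP. The function name clean_for_indexing is hypothetical, and the tag-stripping regex is only a crude stand-in for PHP's strip_tags; the point is the last step, which collapses whitespace to a single space rather than deleting it.

```python
import re

# Same control-character ranges as the PHP pattern in the question.
CONTROL_CHARS = re.compile(r"[\x00-\x08\x0B\x0C\x0E-\x1F\x80-\x9F]")
# Crude stand-in for strip_tags; an assumption made for this sketch only.
TAGS = re.compile(r"<[^>]+>")
WHITESPACE = re.compile(r"\s+")


def clean_for_indexing(text):
    """Strip markup and control characters but keep word boundaries."""
    # Replace tags with a space so words on either side of a tag stay apart.
    text = TAGS.sub(" ", text)
    text = CONTROL_CHARS.sub("", text)
    # Collapse every run of whitespace into ONE space instead of removing it.
    return WHITESPACE.sub(" ", text).strip()


print(clean_for_indexing("<div>This is a\tsample text\nto be indexed</div>"))
```

With word boundaries preserved like this, a whitespace-based or standard tokenizer has something to split on.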
I have an application that has a list of IDs as part of the URL, but the number of IDs is soon going to increase drastically. What is the best way to pass a large number of IDs in the URL? There can be 200 or more IDs at a time.
You could encode your ID array in a string (JSON is an easy format for that) and transmit it as a single variable via POST.
Simple GET parameters, and the URL itself, have length limits that cannot be avoided. Many web servers also have security filters in place (Suhosin, for example) that won't accept more than a certain number of parameters.
See:
What is the maximum length of a URL in different browsers?
What is apache's maximum url length?
http://www.suhosin.org/stories/configuration.html
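To make the JSON-over-POST suggestion concrete, here is a small client-side sketch in Python. The endpoint URL, the "ids" key and the helper name build_ids_request are all assumptions made for illustration; your server would need to read the JSON body and decode it accordingly.

```python
import json
import urllib.request


def build_ids_request(url, ids):
    """Build (but do not send) a POST request carrying the IDs as a JSON body."""
    body = json.dumps({"ids": list(ids)}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint and 200 sample IDs.
request = build_ids_request("https://example.com/items/lookup", range(1, 201))
print(request.get_method(), len(request.data), "bytes")

# To actually send it: urllib.request.urlopen(request)
```

Because the IDs travel in the request body, the URL stays short no matter how long the list becomes.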
In EDIFACT there are numeric data elements, specified e.g. as format n..5 -- we want to store those fields in a database table (with alphanumeric fields, so we can check them). How long must the db-fields be, so we can for sure store every possible valid value? I know it's at least two additional chars (for decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE web site, in the documentation on UN/EDIFACT rules: Part 4, UN/EDIFACT rules, Chapter 2.2, Syntax Rules. They don't say it directly, but when you put all the parts together, you get it. See TOC entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
The decimal mark must be transmitted (if needed) as specified in UNA (comma or point, but always one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Separator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves, there are only two (optional) characters allowed in a numeric field: the decimal separator and a minus sign (no blanks are permitted between any of the characters). These two extra characters are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
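The "plus 2" rule can be sketched as follows in Python. The function names are made up for this example, and the validity check is a simplification: it assumes at least one digit on each side of the decimal mark and does not cover every detail of the syntax rules.

```python
import re


def column_width(max_digits):
    """Characters needed to store any syntactically valid n..max_digits value."""
    # One optional minus sign plus one optional decimal mark.
    return max_digits + 2


def is_valid_numeric(value, max_digits, decimal_mark="."):
    """Check value against n..max_digits under the simplified rules above."""
    # Optional minus, digits, optionally a decimal mark followed by digits.
    pattern = r"-?[0-9]+(?:" + re.escape(decimal_mark) + r"[0-9]+)?"
    if re.fullmatch(pattern, value) is None:
        return False
    # Only the digits count against the declared maximum length.
    digits = sum(ch.isdigit() for ch in value)
    return digits <= max_digits


print(column_width(17))                   # width for a field specified as n..17
print(is_valid_numeric("-1234.5", 5))     # 5 digits, 7 characters in total
print(is_valid_numeric("-12345.6", 5))    # 6 digits, too long for n..5
```

A value that passes this check always fits into a column of column_width(max_digits) characters, which is the sizing argument made above.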
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high end" version of their product (EDI Notepad Productivity Suite) comes with a "Dictionary Viewer" tool from which you can export the min / max lengths of the elements, as well as their types. You can export the document to HTML from the Viewer tool. It also handles ANSI X12.