pytesseract OCR: limit the length of output characters using config

I am building an application in Python using OpenCV that extracts characters from an image and runs pytesseract to convert them to text.
I know that the characters are always 2 digits long (range 10-99). How do I configure the parameters so that single-digit outputs are not returned?
I have the following in my code:
text = pytesseract.image_to_string(Image.open(filename),config='--psm 100 --eom 3 -c tessedit_char_whitelist=0123456789')
What do I put instead of config='--psm 100 --eom 3 -c tessedit_char_whitelist=0123456789' so that it only returns 2-digit numbers (e.g. 01 but not 5)?
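I am not aware of a Tesseract config variable that fixes the length of the output, so one possible approach is to keep the digit whitelist and validate the returned string afterwards. The sketch below assumes that approach; `two_digit_or_none` is a hypothetical helper, not part of pytesseract. Note also that Tesseract spells the engine flag `--oem` and its page segmentation modes run from 0 to 13, so the `--psm 100 --eom 3` in the question would probably need correcting first.

```python
import re

# Exactly two ASCII digits, e.g. "01" or "42".
TWO_DIGITS = re.compile(r"[0-9]{2}")


def two_digit_or_none(ocr_text):
    """Return the OCR result if it is exactly two digits, otherwise None."""
    cleaned = ocr_text.strip()  # OCR output often carries trailing whitespace
    return cleaned if TWO_DIGITS.fullmatch(cleaned) else None


# Usage with the call from the question (needs pytesseract and Pillow):
# text = pytesseract.image_to_string(Image.open(filename), config=...)
# number = two_digit_or_none(text)  # None means "reject this reading"
```

A rejected reading could then be retried with different preprocessing rather than being passed on as a wrong value.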

Related

How to generate a sequence code string in Rails

I have a model which has a column named code, which is a combination of the model's name column and its ID with leading zeros.
name = 'Rocky'
id = 16
I have an after_create callback which runs and generates the code:
update(code: "#{self.name[0..2].upcase}%.4d" % self.id)
The generated code will be:
"ROC0016"
The code is working.
I found "%.4d" % self.id in another project, but I don't know how it works.
How does it determine the number of leading zeros to add based on the integer passed in?

You’re using a "format specifier". There are many specifiers, but the one you’re using, "%d", is the decimal integer specifier:
% starts it. .4 is the precision, which for an integer means it should always use at least four digits, so if the number has only two digits, it gets padded with 0s to fill in the rest. The second %, outside the string, is the String#% operator: it replaces %.4d with the formatted value of whatever comes after it. So in your case, %.4d is getting replaced with "0016".
sprintf has more information about format specifiers.
You can read more about String#% in the documentation also.

After the percent sign ("%") come a decimal point (".") and a number. That number is the minimum number of digits in the result. If the value has fewer digits than this, leading zeros are added.
Thus, in the first example below, the value is "34" but the precision is "4", so the result gets two leading zeros to fill it out to four digits.
"This is test string %.4d" % 34
result => "This is test string 0034"
"I want more zeroes in my code %.7d" % 34
result => "I want more zeroes in my code 0000034"
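For comparison, Python's printf-style `%` operator accepts the same `%.4d` specifier, so the behaviour can be tried outside Rails. This is only a sketch; `make_code` is a hypothetical function that mirrors the callback in the question.

```python
def make_code(name, record_id):
    """Build a code such as 'ROC0016' from a name and a numeric id."""
    prefix = name[:3].upper()  # first three letters, like name[0..2].upcase
    # %.4d pads the integer with leading zeros to at least four digits.
    return "%s%.4d" % (prefix, record_id)


print(make_code("Rocky", 16))     # ROC0016
print(make_code("Rocky", 12345))  # ROC12345 (longer ids are not truncated)
print("%.7d" % 34)                # 0000034
```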

Is there a way to convert an integer to always be a 4-digit hex number using Lua?

I'm creating a Lua script which will calculate a temperature value then format this value as a 4 digit hex number which must always be 4 digits. Having the answer as a string is fine.
Previously in C I have been able to use
data_hex=string.format('%h04x', -21)
which would return ffeb
However, the 'h' string formatter is not available to me in Lua.
Dropping the 'h' doesn't cater for negative values, i.e.
data_hex=string.format('%04x', -21)
print(data_hex)
which returns ffffffeb
data_hex=string.format('%04x', 21)
print(data_hex)
which returns 0015
Is there a convenient and portable equivalent to the 'h' string formatter?

I suggest using a bitwise AND to truncate any leading hex digits of the value being printed.
If you have a variable temp that you are going to print, you would use something like data_hex=string.format("%04x",temp & 0xffff), which masks off the leading hex digits and leaves only the least significant 4 hex digits. Note that the & operator requires Lua 5.3 or later.
I like this approach as there is less string manipulation and it is congruent with the actual data type of a signed 16 bit number. Whether reducing string manipulation is a concern would depend on the rate at which the temperature is polled.
For further information on the format function see The String Library article.
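The masking idea is not specific to Lua. As a rough illustration in Python (where `to_hex16` is a hypothetical helper name), the same AND with 0xFFFF gives the 16-bit two's complement form of a negative number:

```python
def to_hex16(value):
    """Format an integer as 4 hex digits, keeping only the low 16 bits."""
    return "%04x" % (value & 0xFFFF)


print(to_hex16(-21))  # ffeb
print(to_hex16(21))   # 0015
```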

pyBrain using letters as input

With pybrain, it's not possible to use letters as input in a dataset. For example, if I do this:
from pybrain.datasets import ClassificationDataSet
ds = ClassificationDataSet(2)
ds.addSample(('a','b'),1)
I get:
ValueError: could not convert string to float: a
Does it make sense to convert each letter to an integer and make those integers be the features for pybrain? For example, the letter a would be 1 and the letter z would be 26.
My concern with this is that there is no ordinal relation between letters, and I'm not sure whether the neural network would incorrectly treat the number replacing each letter as a greater or lesser quantity of some feature.
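One common way around that concern is one-hot encoding, where each letter becomes its own 0/1 feature instead of a magnitude. The sketch below assumes lowercase a-z input; `one_hot` and `encode` are hypothetical helpers, and the resulting 52-element list would replace ('a','b') as the sample, with the dataset's input dimension changed to match.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed input alphabet: 'a'..'z'


def one_hot(letter):
    """Return a 26-element list with a single 1.0 at the letter's position."""
    vec = [0.0] * len(ALPHABET)
    vec[ALPHABET.index(letter)] = 1.0
    return vec


def encode(letters):
    """Concatenate the one-hot vectors of several letters into one list."""
    features = []
    for letter in letters:
        features.extend(one_hot(letter))
    return features


sample = encode(("a", "b"))
print(len(sample))  # 52
```

With this encoding no letter is numerically "larger" than another, at the cost of 26 inputs per letter position.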

Delphi base convert Binary to Decimal

I'm converting binary to decimal and decimal to binary. My problem is the length of the binary string. For example:
Convertx("001110",2,10) = 14
Convertx("14",10,2) = 1110
But the length of the binary is NOT constant, so how can I get the exact original binary with the zeros in front of it? How can I get "001110" instead of "1110"?
I'm using the function from this question in Delphi 7: How can I convert base of decimal string to another base?

The function you are using returns a string that is the shortest length required to express the value you have converted.
Any zeroes in front of that string are simply padding - they do not alter the binary value represented. If you need a string of a minimum length then you need to add this "padding" yourself. e.g. if you want a binary representation of a "byte" (i.e. 8 binary digits) then the minimum length you would need is 8:
binStr := Convertx('14', 10, 2);
while Length(binStr) < 8 do
  binStr := '0' + binStr;
If you need the exact number of zeroes that were included in the "padding" of some original binary value when converting from binary to decimal and then back to "original" binary again, then this is impossible unless you separately record how many padding zeroes there were or the length of the original string, including those zeroes.
i.e. in your example, the ConvertX function has no idea (and no way to figure out) that the number "14" it is asked to convert to binary was originally converted from a 6-digit binary string with 2 leading zeroes, rather than an 8-digit binary with 4 leading zeroes (or a 16-digit binary with 12 leading zeroes, etc.).
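The padding step is the same in any language. A minimal Python sketch of it, where `to_binary` is a hypothetical helper standing in for the conversion plus the padding loop:

```python
def to_binary(value, min_width):
    """Binary digits of a non-negative int, left-padded with zeros."""
    return format(value, "b").zfill(min_width)


print(to_binary(14, 8))  # 00001110
print(to_binary(14, 2))  # 1110 (never shortened below the needed digits)
```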

What you are hoping for is impossible. Consider
Convertx('001110', 2, 10)
and
Convertx('1110', 2, 10)
These both return the same output, 14. At that point there is no way to recover the length of the original input.
The way forward is therefore clear. You must remember the length of the original binary, as well as the equivalent decimal. However, once you have reached that conclusion then you might wonder whether there is an even simpler approach. Just remember the original binary value and save yourself having to convert back from decimal.
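Remembering the length alongside the value could look like the following Python sketch. `parse_binary` and `restore_binary` are hypothetical names; the point is only that the width travels with the number.

```python
def parse_binary(text):
    """Return (value, width) so the original zero padding can be restored."""
    return int(text, 2), len(text)


def restore_binary(value, width):
    """Rebuild the binary string using the recorded width."""
    return format(value, "0{}b".format(width))


value, width = parse_binary("001110")
print(value, width)                  # 14 6
print(restore_binary(value, width))  # 001110
```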

Why 255 is the limit

I've seen lots of places say:
The maximum number of characters is 255.
where characters are ASCII. Is there a technical reason for that?
EDIT: I know ASCII is represented by 8 bits and so there are 256 different characters. The question is why they specify that the maximum NUMBER of characters (with duplicates) is 255.

I assume the limit you're referring to is on the length of a string of ASCII characters.
The limit comes from an optimization in which short strings are stored with the first byte holding the length of the string. Since a byte can hold only 256 different values (0 to 255), the largest length that byte can record is 255.
Some older database systems and programming languages therefore had this restriction on their native string types.
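To make the layout concrete, here is a small Python sketch of a length-prefixed string. `pack_short_string` and `unpack_short_string` are hypothetical names, and the format is a simplified illustration rather than any particular system's storage layout.

```python
def pack_short_string(text):
    """Encode text as one length byte followed by the ASCII bytes."""
    data = text.encode("ascii")
    if len(data) > 255:
        # A single byte cannot record a length above 255.
        raise ValueError("length does not fit in one byte: %d" % len(data))
    return bytes([len(data)]) + data


def unpack_short_string(blob):
    """Decode a string produced by pack_short_string."""
    length = blob[0]
    return blob[1:1 + length].decode("ascii")


packed = pack_short_string("hello")
print(packed)                       # b'\x05hello'
print(unpack_short_string(packed))  # hello
```

A 255-character string still packs (into 256 bytes), while a 256-character one raises the error.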

Extended ASCII is an 8-bit character set. (Original ASCII is 7-bit, but that's not relevant here.)
8 bit means that 2^8 different characters can be referenced.
2^8 equals 256, and as counting starts with 0, the maximum ASCII char code has the value 255.
Thus, the statement:
The maximum number of characters is 255.
is wrong, it should read:
The maximum number of characters is 256, the highest possible character code is 255.
To understand better how characters are mapped to the numbers from 0 to 255, see the 8-bit ASCII table.

The limit is 255 because 9+36+84+126 = 255. The 256th character (which is really the first character) is zero.
Using the combinatoric formula C_k(n) = (n choose k) = n!/(k!(n-k)!) with n = 9 to find the number of non-repeating combinations for 1, 2, 3, 4, 5, 6, 7, 8 digits, you get this:
# of digits:       1  2  3   4   5  6  7  8
# of combinations: 9 36 84 126 126 84 36  9
It is unnecessary to include 5-8 digits since it's a symmetric group of M. In other words, a 4-element generator is a group operation for an octet and its group action has 255 permutations.
Interestingly, it only requires 3 digits to "count" to 1000 (after 789 the rest of the numbers are repetitions of previous combinations).
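The figures quoted above match the binomial coefficients C(9, k), which can be checked with a few lines of Python. This only confirms the arithmetic, assuming n = 9 is what the table is based on.

```python
from math import comb  # available in Python 3.8+

counts = [comb(9, k) for k in range(1, 9)]
print(counts)           # [9, 36, 84, 126, 126, 84, 36, 9]
print(sum(counts[:4]))  # 255
```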

The total number of characters in the extended ASCII table is 256 (0 to 255). Codes 0 to 31 (32 characters in total) are the ASCII control characters. Codes 32 to 127 are the ASCII printable characters (strictly, 127 is DEL, which is also a control character). Codes 128 to 255 are the extended ASCII codes.
The ASCII value of a-z = 97-122
The ASCII value of A-Z = 65-90
The ASCII value of 0-9 = 48-57
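These ranges are easy to confirm with Python's built-in `ord`, which returns the code of a character:

```python
print(ord("a"), ord("z"))  # 97 122
print(ord("A"), ord("Z"))  # 65 90
print(ord("0"), ord("9"))  # 48 57
```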

Is there a technical reason for that?
Yes, there is. The early ASCII encoding standard is 7 bits long, which can represent 2^7 = 128 (0 .. 127) different character codes.
What you are talking about here is a variant of the ASCII encoding developed later, which is 8 bits long and can hold 2^8 = 256 (0 .. 255) character codes.
See Wikipedia for more information on the same.
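As a quick illustration of the two ranges, the following Python sketch computes both counts and checks whether some bytes stay within the 7-bit range. `fits_7bit_ascii` is a hypothetical helper name.

```python
print(2 ** 7)  # 128 codes, 0..127
print(2 ** 8)  # 256 codes, 0..255


def fits_7bit_ascii(data):
    """True if every byte is a valid 7-bit ASCII code (0..127)."""
    return all(byte < 128 for byte in data)


print(fits_7bit_ascii(b"temp"))       # True
print(fits_7bit_ascii(bytes([233])))  # False
```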
