What is this user input encoded as? - character-encoding

What encoding/encryption/manipulation would turn the following values from what you see on the left to what's on the right?
146.00 => 4046401A36E2EB1D
36.30 => 4042266666666666
76.22 => 40530E147AE147AE
3865.20 => 40DA06683E8C7FD4
0.200 => 3FC999999999999A
I am working with an XML file from a software application we use at work. I am trying to set up a tool that helps interpret and manipulate the XML files outside of the software, so that work can be done without tying up one of the limited licenses we have. In the software, users populate fields and can import/export XML files containing the info they have entered. When I open these XML files in a text editor, all the fields are clearly labeled as they would be in the program itself. The user input data, however, is "encoded" (hopefully that's the accurate term), and it appears to be hexadecimal.
I have been able to take string and integer inputs and convert them back and forth to what's in the XML file, although the strings are backwards (the hex decodes to "w im 9" when the user input "9 mi w"). However, anything the user enters as a decimal number is giving me trouble [edit: I determined the trouble is with fields that have associated units]. Some preliminary research has brought me to the idea of "attributes", but I don't know enough about XML to make use/sense of it. Below are two lines from the XML, the first one where the user data plays nice when trying to decode, and the second where something else is happening:
<BRIDGE_ID HEX="true">#31</BRIDGE_ID> Here the user just entered "1" for the Bridge ID
<LENGTH Units="23" HEX="true">#3FD381D7DBF487FD</LENGTH> Here the user entered "1" for length and the program forced it to 1.00 before exporting. This field is in feet.
I have discovered that the fields which assign units to the values are the ones that are not reversing nicely. Any field without units, i.e. no attributes in XML, works great in a simple web decoder. So the attributes complicate it somehow. In the first 5 examples at the top, the first value is in feet (Units="23"), while the second and third fields are both degrees (Units="52").
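The plain text fields (the ones without units) can be round-tripped with a few lines of Python. This is a sketch under assumptions inferred only from the examples above: the leading "#" is a marker rather than data, the payload is ASCII encoded as hex, and the characters are stored in reverse order. The function names are hypothetical.

```python
def decode_text_field(raw: str) -> str:
    """Decode a HEX="true" text field such as "#31" (assumed reversed ASCII)."""
    payload = raw[1:] if raw.startswith("#") else raw  # drop the assumed "#" marker
    return bytes.fromhex(payload).decode("ascii")[::-1]  # hex -> text, then un-reverse


def encode_text_field(text: str) -> str:
    """Inverse of decode_text_field: reverse the text, then hex-encode it."""
    return "#" + text[::-1].encode("ascii").hex().upper()


print(decode_text_field("#31"))       # 1
print(encode_text_field("9 mi w"))    # #7720696D2039
```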
I know this is all over the place! Thank you anyone who can make sense of it and help me out!

For the 2nd, 3rd, and 5th values, the 16-digit hex string is simply the hex representation of the internal 64-bit double-precision IEEE floating point value whose decimal representation appears on the left.
That doesn't work for the 1st and 4th values, where the hex string is the representation of 44.5008 and 26649.628817677338 respectively. Since you talk about units, perhaps there might be conversion from American units to metric involved?
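A minimal Python sketch of this reading, using the standard `struct` module to reinterpret each 16-digit hex string as a big-endian IEEE 754 double. Decoding `3FD381D7DBF487FD` from the LENGTH example gives 0.3048, the number of metres in one international foot, and `4046401A36E2EB1D` gives 44.5008 = 146 × 0.3048, which is consistent with the guess that values entered in feet are stored in metres. The conversion behind the 4th value is not identified here, and the helper names are illustrative.

```python
import struct

FOOT_IN_METRES = 0.3048  # the international foot is defined as exactly 0.3048 m


def hex_to_double(raw: str) -> float:
    """Interpret a 16-digit hex string (optional leading '#') as a big-endian double."""
    return struct.unpack(">d", bytes.fromhex(raw.lstrip("#")))[0]


def double_to_hex(value: float) -> str:
    """Inverse: big-endian IEEE 754 bytes of a double, as upper-case hex."""
    return struct.pack(">d", value).hex().upper()


for raw in ["4046401A36E2EB1D", "4042266666666666", "40530E147AE147AE",
            "40DA06683E8C7FD4", "3FC999999999999A", "#3FD381D7DBF487FD"]:
    print(raw, "->", hex_to_double(raw))

# LENGTH (Units="23", feet) appears to hold metres; dividing recovers the feet value.
print(hex_to_double("#3FD381D7DBF487FD") / FOOT_IN_METRES)  # approximately 1.0
print(hex_to_double("4046401A36E2EB1D") / FOOT_IN_METRES)   # approximately 146.0
```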
This question has nothing to do with XML. Just because the data is wrapped in XML tags doesn't make it an XML question.

Related

How can these strings be different?

I am facing a weird problem.
I have extracted data from an Excel file. It should contain an IBAN account number.
Then I tried to analyze the set of account numbers (which the source guarantees to be good) with a Java library.
To keep the scope of the question narrow: I can't explain the following. The strings below are different:
030​69
03069
The first is a copy & paste from the Excel file, the second is handwritten. Google returns different results for abi [above number] and in fact in the second case I can find that it is the bank code for Intesa Sanpaolo bank (exact page displaying the ABI code, localized, here).
So, to keep the scope narrow: how is that possible? Is it something to do with the encoding?
Try it yourself: press CTRL+F and type "030"; it will select both lines. Now type 6; it will match only the 2nd line.
The same happened in Notepad++.
There's a U+200B ZERO WIDTH SPACE between 030 and 69 in the first text.
Paste the text in https://www.branah.com/unicode-converter for example, or edit in a hexadecimal capable editor.
One way to clean such strings is to whitelist characters, so that everything that isn't A-Z or 0-9 is scrubbed.
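A short Python sketch of both steps: reporting the code point and Unicode name of every character outside the whitelist (which is how the U+200B shows up), then scrubbing with the A-Z0-9 whitelist from the answer. Upper-casing before scrubbing is an assumption made here so that lower-case IBAN letters survive; the function names are hypothetical.

```python
import string
import unicodedata

ALLOWED = set(string.ascii_uppercase + string.digits)  # the A-Z0-9 whitelist


def non_whitelisted(s: str):
    """Return (index, code point, Unicode name) for each character not in ALLOWED."""
    return [(i, f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>"))
            for i, c in enumerate(s) if c not in ALLOWED]


def scrub(s: str) -> str:
    """Upper-case, then keep only whitelisted characters."""
    return "".join(c for c in s.upper() if c in ALLOWED)


pasted = "030\u200b69"  # as copied from Excel
typed = "03069"         # as typed by hand
print(pasted == typed)           # False
print(non_whitelisted(pasted))   # [(3, 'U+200B', 'ZERO WIDTH SPACE')]
print(scrub(pasted) == typed)    # True
```

In Java, the same whitelist can be applied with `s.toUpperCase().replaceAll("[^A-Z0-9]", "")` before the value reaches the IBAN library.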

Delphi - comparison of two "Real" number variables

I have a problem comparing two variables of type "Real". One is the result of a mathematical operation, stored in a dataset; the second is the value of an edit field in a form, converted by StrToFloat and stored in a "Real" variable. The problem is this:
As you can see, the program is trying to tell me that 121,97 is not equal to 121,97... I have read this topic, and I am not completely sure that it is the same problem. If it were, wouldn't both numbers be stored in the variables as exactly the same closest representable number, which for 121.97 is 121.96999 99999 99998 86313 16227 83839 70260 62011 71875?
Now let's say that they are not stored as the same closest representable number. How do I find out exactly how they are stored? When I look in the "CPU" debugging window, I am completely lost. I see the addresses where those values should be, but nothing even resembling a binary, hexadecimal or any other representation of the actual number... I admit that advanced debugging is an unknown universe to me...
Edit: Those two values really are slightly different.
OK, I don't need to understand everything. Although I am not dealing with money, there will be at most 3 decimal places, so "currency" is the way out.
BTW: The calculation is:
DATA[i].Meta.UnUsedAmount := DATA[i].AMOUNT - ObjQuery.FieldByName('USED').AsFloat;
In this case it is 3695 - 3573.03
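The mismatch can be reproduced outside Delphi. Delphi's Real is an alias for Double by default, and Python floats are also IEEE 754 doubles, so the sketch below performs the same subtraction, prints the raw bytes in the little-endian order an x86 memory dump shows them, and then repeats the calculation with exact decimal arithmetic for contrast. It illustrates the cause; it is not Delphi code, and `memory_dump` is a hypothetical helper.

```python
import struct
from decimal import Decimal


def memory_dump(x: float) -> str:
    """Bytes of a 64-bit double in little-endian order, as an x86 memory dump shows them."""
    return struct.pack("<d", x).hex(" ").upper()


computed = 3695 - 3573.03  # AMOUNT - USED, as in the question
typed = float("121.97")    # roughly what StrToFloat produces from the edit field

print(computed == typed)       # False
print(memory_dump(computed))   # the two dumps differ in the low-order bytes
print(memory_dump(typed))
print(computed - typed)        # tiny but non-zero

# Exact decimal arithmetic does not have the problem:
print(Decimal("3695") - Decimal("3573.03") == Decimal("121.97"))  # True
```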
For reasons unknown, you cannot view a float value (single/double or real48) as hexadecimal in the watch list.
However, you can still view the hexadecimal representation by viewing it as a memory dump.
Here's how:
Add the variable to the watch list.
Right click on the watch -> Edit Watch...
View it as memory dump
Now you can compare the two values in the debugger.
Never use floats for monetary amounts
You do know of course that you should not use floats to count money.
You'll get into all sorts of trouble with rounding, and comparisons will not work the way you want them to.
If you want to work with money, use the Currency type instead. It does not have these problems, supports 4 decimal places, and can be compared using the = operator with no rounding issues.
In your database, use the money or currency data type.
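Delphi's Currency type is a fixed-point value stored as a 64-bit integer scaled by 10,000, which is why `=` is reliable for it. The Python sketch below imitates that idea with plain integers. The helper names, and the choice to reject inputs with more than 4 decimal places rather than round them, are assumptions of this sketch, not Delphi's behaviour.

```python
from decimal import Decimal

SCALE = 10_000  # 4 decimal places, the same scale Currency uses


def to_fixed(text: str) -> int:
    """Parse a decimal string into an integer count of 1/10,000 units, without floats."""
    scaled = Decimal(text) * SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{text!r} has more than 4 decimal places")
    return int(scaled)


def fixed_to_str(v: int) -> str:
    """Format a scaled integer back to a decimal string with 4 places."""
    sign = "-" if v < 0 else ""
    whole, frac = divmod(abs(v), SCALE)
    return f"{sign}{whole}.{frac:04d}"


unused = to_fixed("3695") - to_fixed("3573.03")
print(unused == to_fixed("121.97"))  # True: integer subtraction is exact
print(fixed_to_str(unused))          # 121.9700
```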

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in Talend Open Studio for Data Integration v5.5.1.
I am trying to find matches between two customer-name columns: one is a lookup and the other contains dirty data.
The job runs as expected when the customer names are in English. However, for Arabic names, only exact matches are found, regardless of the underlying match algorithm I used (Levenshtein, Metaphone, Double Metaphone), even with loose bounds for the Levenshtein algorithm (min 1, max 50).
I suspect this has to do with character encoding. How should I proceed? Is there any way to make Talend operate on the Unicode (or even UTF-8) interpretation of the text?
I am using Excel data sources through tFileInputExcel.
I got it resolved by moving the data to mysql with a UTF-8 collation. Somehow Excel input wasn't preserving the collation.
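Independently of Talend, it can help to confirm that an edit-distance algorithm behaves as expected on correctly decoded Arabic text. The Python sketch below is a textbook Levenshtein implementation, not Talend's; it also shows that the same pair of names gets a different distance when compared as UTF-8 bytes instead of characters, one way in which the representation reaching the matcher can change results.

```python
def levenshtein(a, b) -> int:
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


a = "\u0645\u062d\u0645\u062f"        # "محمد"
b = "\u0645\u062d\u0645\u0648\u062f"  # "محمود" (one extra letter, waw)
print(levenshtein(a, b))                                  # 1 (compared as characters)
print(levenshtein(a.encode("utf-8"), b.encode("utf-8")))  # 2 (compared as UTF-8 bytes)
```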

iOS - Best way to get numbers out of NSString! (Geo coordinates)

For my internship I'm working on an app that uses GPS data! That's already implemented, and I wrote a class that converts the double value the mapView sends into a user-picked format for geo coordinates (Degrees, Degrees-Minutes or Degrees-Minutes-Seconds)! Now there are also text fields in which the user should enter coordinates for adding waypoints!
What's the best technique here to get the separate numbers out of a string in a format similar to this: 57° 14' 03" N!
Since it's user input, the format won't be exactly this, only similar! So is it better to parse the numbers out of the string, or to limit the user's input method to something stricter that only allows one format (separate text fields for each number, f.ex.)!
Actually a question to UX rather than a how-to-do!
Acting as the delegate of the text field and not allowing invalid content / format is a good first step.
For parsing the string, NSScanner is the appropriate class to use to split out the parts. If you tie the format down though, you could use componentsSeparatedByString: to separate each number by the space between them.
First, a comment. Ending all your sentences with exclamation points is silly!
As to your question: yes, you should enforce a strict input format on your users. If you look in the software developer's dictionary, user is a synonym for idiot.
I would suggest separate UITextFields for each numeric value of each lat/long, with the symbols drawn in place with labels. The user would enter degrees, and the input would jump to minutes. The user enters minutes, and the input jumps to seconds. The user enters seconds, and the input jumps to degrees on the longitude.
Validate each input as a well-formed number.
If you want to use free-form input of strings like "57° 14' 03" N!", you might want to create a regular expression to validate it, plus range-checking on the numeric parts. If you Google it, you should find online docs on regular expressions. I don't use them often enough to be able to write a regular expression off the top of my head.
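A sketch of the regular-expression approach with range checks, in Python for brevity (on iOS, NSRegularExpression plays the same role). The accepted symbols (° or º for degrees, ' or ′ for minutes, " or ″ for seconds), the optional fractional seconds, and the range limits are assumptions about what counts as "similar" input.

```python
import re

DMS = re.compile(
    r"""^\s*([0-9]{1,3})\s*[°º]\s*        # degrees
        ([0-9]{1,2})\s*['′]\s*            # minutes
        ([0-9]{1,2}(?:\.[0-9]+)?)\s*["″]  # seconds, optional fraction
        \s*([NSEW])\s*$                   # hemisphere
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_dms(text: str) -> float:
    """Parse e.g. 57° 14' 03" N into signed decimal degrees (S and W are negative)."""
    m = DMS.match(text)
    if m is None:
        raise ValueError(f"unrecognised coordinate: {text!r}")
    degrees, minutes, seconds = int(m[1]), int(m[2]), float(m[3])
    hemisphere = m[4].upper()
    limit = 90 if hemisphere in "NS" else 180  # latitude vs longitude range
    if minutes >= 60 or seconds >= 60:
        raise ValueError("minutes and seconds must be below 60")
    value = degrees + minutes / 60 + seconds / 3600
    if value > limit:
        raise ValueError(f"value exceeds {limit} degrees")
    return -value if hemisphere in "SW" else value


print(parse_dms('57° 14\' 03" N'))
```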

How many chars can numeric EDIFACT data elements be long?

In EDIFACT there are numeric data elements, specified e.g. as format n..5. We want to store those fields in a database table (in alphanumeric columns, so we can check them). How long must the DB fields be so that we can be sure to store every possible valid value? I know it needs at least two additional chars (for the decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE website, in the documentation on UN/EDIFACT rules: Part 4, UN/EDIFACT rules, Chapter 2.2, Syntax Rules. They don't say it directly, but when you put all the parts together, you get it. See TOC entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
The decimal mark must be transmitted (if needed) as specified in UNA (comma or point; always exactly one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Separator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves, only two (optional) chars are allowed in a numeric field: the decimal separator and a minus sign (no blanks are permitted between any of the characters). These two extra chars are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
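The sizing rule and a basic syntax check can be sketched as follows. The decimal mark defaults to a point here but should come from the UNA segment; requiring at least one digit on each side of the decimal mark is an extra assumption of this sketch, not something stated in the excerpt above. Function names are hypothetical.

```python
import re


def column_width(max_digits: int) -> int:
    """Characters needed for an n..max_digits element: digits + minus sign + decimal mark."""
    return max_digits + 2


def is_valid_numeric(value: str, max_digits: int, decimal_mark: str = ".") -> bool:
    """Syntax check: optional leading minus, digits, optional decimal mark, no blanks."""
    mark = re.escape(decimal_mark)
    if re.fullmatch(rf"-?[0-9]+(?:{mark}[0-9]+)?", value) is None:
        return False
    # Neither the minus sign nor the decimal mark counts toward the maximum length.
    digit_count = sum(ch in "0123456789" for ch in value)
    return digit_count <= max_digits


print(column_width(17))                           # 19
print(is_valid_numeric("-1234567890123456.7", 17))  # True: 17 digits, 19 chars
```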
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high-end" version of their product (EDI Notepad Productivity Suite) comes with a "Dictionary Viewer" tool from which you can export the min/max lengths of the elements, as well as their types. You can export the document to HTML from the Viewer tool. It also handles ANSI X12.
