Impala Column Name Issue - identifier

We are facing a problem with the Impala column naming convention, which seems unclear to us.
The CDH Impala documentation (http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_identifiers.html) says in its third bullet point: An identifier must start with an alphabetic character. The remainder can contain any combination of alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed characters in the name.
Now, due to a dependency on the upstream SAP systems, we had to give a column a name that starts with a zero (0). While defining the table and extracting records from it, Impala does not raise any semantic error. However, when connecting Impala to SAP HANA through SDA (Smart Data Access), the extraction fails for this particular column with the leading zero (0), while it works fine for the rest of the columns, which start with a letter. The error reads: "... ^ Encountered: DECIMAL LITERAL"
I have two points:
1. If the documentation says an identifier cannot start with anything other than a letter, how is the Impala query running without any issues?
2. Why is the error only raised when the data is extracted by SAP HANA?
Any insight will be highly appreciated.
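To illustrate point 1 (table and column names changed for this sketch), Impala accepts the back-ticked, digit-leading identifier both when defining the table and when querying it:
-- Hypothetical names: Impala raises no error for either statement,
-- even though the identifier starts with a digit.
CREATE TABLE sap_feed (`0recordmode` STRING, amount DOUBLE);
SELECT `0recordmode`, amount FROM sap_feed;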

OK, I can only say something about the SAP HANA side here, so you will have to check the Impala side separately.
The error message you get while accessing an external table via SDA typically comes from the 3rd party client software, in this case the ODBC driver you use to connect to Impala.
So, SAP HANA tries to access the table through the Impala ODBC driver and that driver returns the error message.
I assume that the object-name check for Impala is implemented in the client in this case. I am not sure whether the way you run the query in Impala also goes through that driver.
But even if Impala has this naming limitation in place, I fail to see why it would force you to name the table in SAP HANA the same way. If the upstream data access requires the leading 0, just create a view on top of the table and you're good to go.
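A minimal sketch of that workaround, assuming the remote column is given a legal name such as RECORDMODE_0 while upstream consumers still need the leading-zero name (all object names here are hypothetical; SAP HANA allows a quoted identifier to start with a digit):
-- Hypothetical names: the virtual table uses a legal column name,
-- and the view re-exposes it under the SAP-style leading-zero name.
CREATE VIEW "SAP_FEED_V" ("0RECORDMODE", "AMOUNT") AS
  SELECT "RECORDMODE_0", "AMOUNT" FROM "VT_SAP_FEED";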

Related

Informix with en_US.57372 character set gives an error for a LATIN CAPITAL LETTER A WITH CIRCUMFLEX

I am trying to read data from Informix DB using an ODBC driver.
Everything is just fine until I try to read certain characters, such as 'Â' (LATIN CAPITAL LETTER A WITH CIRCUMFLEX).
The error message I get from the driver is error -21005:
"Invalid byte in codeset conversion input."
Is there a reason this character set cannot read those characters? If so, is there a website (I haven't found one) where I can see all the characters supported by this codeset?
Error -21005 can also mean that you have inserted invalid characters into your database because your CLIENT_LOCALE was wrong, and Informix did not detect this because CLIENT_LOCALE was set to the same value as DB_LOCALE, which prevented any conversion from being done.
Then, when you try to read the data containing invalid characters, Informix produces error -21005 to warn you that some characters have been replaced by a placeholder, and that the conversion is therefore not reversible.
See https://www.ibm.com/support/pages/error-21005-when-using-odbc-select-data-database for a detailed explanation on how an incorrect CLIENT_LOCALE can produce error -21005 when querying data.
CLIENT_LOCALE should always be set to the locale of the PC where your queries are generated, and DB_LOCALE must match the locale with which the database was defined; the query below shows how to find that out. Beware that a value such as en_US.57372 really means en_us.utf8; you need to look in gls\cm3\registry to see the mappings.
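A sketch of that lookup (the result rows below are illustrative only; on Informix, the systables rows with tabid 90 and 91 record the database's collation and ctype locales):
-- Find the locale the database was defined with:
SELECT tabname, site FROM systables WHERE tabid IN (90, 91);
-- Illustrative output; en_US.57372 maps to en_us.utf8 per gls\cm3\registry:
-- GL_COLLATE   en_US.57372
-- GL_CTYPE     en_US.57372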
EDIT: The answer on Queries problems using IBM Informix ODBC Driver in C# also explains in great detail the misery a wrong CLIENT_LOCALE can bring, and how to fix it.

How to Deploy/compile multiple Stored Procedures using IBM Data Studio?

I am using the IBM Data Studio 4.1.2 client to connect to a DB2 database. I want to know how to deploy (compile) multiple stored procedures (SPs) at once. It was very straightforward and easy in Toad, but unfortunately I cannot use Toad because of a company policy.
In Data Studio I tried this: Data Source Explorer -> Schema -> Stored Procedures -> right-click -> New Stored Procedure, and wrote two simple SPs one after the other. The 'END' of the first SP raised 'Statement terminator expected'; when I put ';' after END, it became 'Statement terminator expected instead of ;'; and when I used '--' at the top, every statement including the BEGIN of the second SP raised the syntax error 'END was expected to form a complete scope'. I tried all possible workarounds, but still get the same syntax error.
I am 100% sure that there is nothing wrong with the SPs, but I don't know what the solution is. I have gone through many IBM support pages, tutorials, and demos; all of them demonstrate how to compile one SP at a time in a separate editor window. I actually have to deploy some 300 SPs. Please help me.
This question is not a duplicate of others, as it is specifically about deploying multiple SPs using the IDE.
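For what it's worth, a hedged sketch of the usual approach in a Data Studio SQL script (procedure names and bodies are hypothetical): switch the statement terminator away from ';' so the semicolons inside each procedure body no longer end the whole statement, then deploy the script in one run.
--#SET TERMINATOR @
CREATE PROCEDURE myschema.proc_one ()
LANGUAGE SQL
BEGIN
  -- first procedure body; internal statements still end with ';'
  DECLARE v INT;
  SET v = 1;
END @

CREATE PROCEDURE myschema.proc_two ()
LANGUAGE SQL
BEGIN
  -- second procedure body
  DECLARE v INT;
  SET v = 2;
END @
The same script should also deploy from the DB2 command line with: db2 -td@ -vf procs.sql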

How to locate the field that produces the “data type mismatch” exception?

I have a really long insert query with more than 40 fields (from an 'inherited' FoxPro database), processed using OleDb, that produces the exception 'Data type mismatch.' Is there any way to know which field in the query is producing this exception?
For now I am using the brute-force method of reducing the number of fields in the insert until I locate the buggy one, but I imagine there must be a more direct way to find it...
There isn't really any shortcut beyond taking a guess at which 20 might be the problem, chopping out the other 20 and testing, and repeating that reductive process until you hit it.
Alternatively, look at the table structure(s) in the DBF and make sure the field types match the OleDb types you're using. The details of how .NET types map to Visual FoxPro field types are here.
If you have access to the Visual FoxPro IDE you could probably do that a lot quicker by knocking up a little program or even just doing it in the Command Window.
You have not told us which language you are using; otherwise we could give a sample to handle it.
Basically what you would do is:
Get the structure,
Parse the insert statement and get values,
Compare data types.
It should take only a short piece of code to make this check; a sketch follows below.
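A hedged sketch of the reductive test itself, with hypothetical field names ({^...} is the Visual FoxPro date-literal syntax): run the INSERT with half of the fields; if it succeeds, the mismatch is among the omitted fields, otherwise bisect the current subset the same way.
-- Hypothetical names: try a subset of the 40-odd fields first.
INSERT INTO orders (order_id, order_date, amount)
  VALUES (1, {^2024-01-01}, 10.50);
-- Succeeds -> the mismatching field is among those left out.
-- Fails    -> split this subset in half and repeat.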

tFuzzyMatch apparently not working on Arabic text strings

I have created a job in talend open studio for data integration v5.5.1.
I am trying to find matches between two customer names columns, one is a lookup and the other contain dirty data.
The job runs as expected when the customer names are in English. However, for Arabic names only exact matches are found, regardless of the underlying match algorithm I used (Levenshtein, Metaphone, Double Metaphone), even with loose bounds for the Levenshtein algorithm (min 1, max 50).
I suspect this has to do with character encoding. How should I proceed? Is there any way I can operate on the Unicode or UTF-8 interpretation in Talend?
I am using Excel data sources through tFileInputExcel.
I got it resolved by moving the data to MySQL with a UTF-8 collation; somehow the Excel input wasn't preserving the encoding.
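For reference, a sketch of the kind of target table that behaves this way (names are hypothetical); the point is declaring an explicit UTF-8 character set and collation so the Arabic strings reach the match components intact:
-- Hypothetical staging table with an explicit UTF-8 collation:
CREATE TABLE customer_names (
  id   INT PRIMARY KEY,
  name VARCHAR(255)
) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;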

Delphi TBytesField - How to see the text properly - Source is HIT OLEDB AS400

We are connecting to a multi-member AS400 iSeries table via HIT OLEDB and HIT ODBC.
You connect to this table via an alias to access a specific member. We create the alias on the AS400 this way:
CREATE ALIAS aliasname FOR table(membername)
We can then query each member of the table this way:
SELECT * FROM aliasname
We are testing this in Delphi 6 first, but will move it to D2010 later.
We are using HIT OLEDB for the AS400.
We are pulling down records from a table, and one field comes through as a TBytesField. I have also tried the ODBC driver, and it sees the field as a TBytesField as well.
Directly on the AS400 I can query the data and see readable text. I can use the iSeries Navigation tool and see readable text as well.
However, when I bring it down to the Delphi client via HIT OLEDB or HIT ODBC and view it via AsString, I just see unreadable text, something like this:
ñðð#ðõñððððñ÷#õôððõñòøóóöøñðÂÁÕÒ#ÖÆ#ÁÔÅÙÉÃÁ########ÂÈÙÉâãæÁðòñè#ÔK#k#ÉÕÃK#########ç
I jumbled up the text above, but those are the kinds of characters that show up.
When I did a test in D2010, the text looked like Japanese or Chinese characters, but if I display it as an AnsiString then it looks like it does in Delphi 6.
I am thinking this may have something to do with code pages or character sets, but I have no experience in this area, so it is new to me. When I look at the Coded Character Set on the AS400, it is set to 65535.
What do I need to do to make this text readable?
We do have a third-party component (Delphi400) that makes things behave in a more native AS400 manner. When I use its AS400 connection and query components, it shows the field as a TStringField and displays it just fine. BUT we are phasing out this product (for a number of reasons) and would really like to get OLEDB working with the ADO components.
Just for clarification: HIT OLEDB with TADOQuery does show some fields as TStringFields for many of the other tables we use... not sure why this one shows up as a TBytesField. I am not an AS400 expert, but looking at the field definitions on the AS400, the ones showing up as TBytesFields look the same as the ones showing up as TStringFields... yet there must be a difference. Maybe it is due to the table being multi-member?
So... does anyone have any guidance on how to get readable string data?
If you need more info please ask.
Greg
One problem is that your client doesn't know that it ought to convert the data from EBCDIC to ASCII because the CCSID on the server's table was set incorrectly.
A CCSID of 65535 is supposed to mean that the field contains binary data. Your client doesn't know that the column contains an EBCDIC encoded string, and therefore doesn't try to convert it.
On my servers, all of our character fields have a CCSID of 37, which is EBCDIC.
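If you are able to fix it server-side instead, a hedged sketch (library, table, and column names are hypothetical; check the exact syntax for your DB2 for i release): tag the column with a real EBCDIC CCSID so clients convert it automatically.
-- Hypothetical names: retag the column from CCSID 65535 to EBCDIC CCSID 37
ALTER TABLE mylib.mytable
  ALTER COLUMN name_field SET DATA TYPE CHAR(30) CCSID 37;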
I found the answer... on both HIT ODBC 400 and HIT OLEDB 400 there is a property called: "Convert CCSID 65535=True" or in the OLEDB UDL it looks like "Binary Characters=True".
Don't know how I missed those, but that did the trick!
Thanks for the feedback.
