SAS Join Not Matching Character Variables - character-encoding

I am joining two tables by character fields which contain five digits but I'm matching on only about 20 records out of 6,000.
As an example of a non-match, the first table shows a value of '09813' and no match is found, yet manually querying the second table for '09813' yields a result. (This seems to indicate an issue with the value in the first table...) I've also noticed some anomalies: '7144' seems to match '7144D', and manually retrieving '7144' from the second table finds nothing, but '07144' retrieves the correct match.
I've tried converting the value in the first table to Hex and there doesn't seem to be any additional characters aside from the empty spaces (20) added to the end due to the length of the hex conversion. I also tried adding strip(compress([Field Name],,'kw')) to the join statement and a few other variations to try to remove line breaks, etc., but haven't had any luck thus far.
Any suggestions would be greatly appreciated!

I had the incorrect join field, thank you for the help!
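For anyone debugging a similar mismatch, the checks described in the question (hex dump, stripping non-printable characters, leading zeros) can be sketched outside SAS. The following is a minimal Python illustration, not the poster's code: the sample values come from the question, while the helper names and the assumption that keys should be zero-padded to five characters are hypothetical.

```python
def hex_dump(value: str) -> str:
    """Return the UTF-8 bytes of value as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in value.encode("utf-8"))


def normalize_key(value: str, width: int = 5) -> str:
    """Drop whitespace and non-printable characters, then left-pad with zeros.

    Zero-padding to `width` is an assumption made for this sketch.
    """
    cleaned = "".join(ch for ch in value if ch.isprintable() and not ch.isspace())
    return cleaned.zfill(width)


# Hypothetical raw keys from the first table, including padding and a line break.
left_keys = ["09813  ", "7144", "7144D\r\n"]
# Hypothetical keys present in the second table.
right_keys = {"09813", "07144", "7144D"}

for raw in left_keys:
    key = normalize_key(raw)
    print(repr(raw), "|", hex_dump(raw), "|", key, "|", key in right_keys)
```

If every normalized key matches and the join still returns almost nothing, the values are probably fine and the join condition itself (for example, the wrong field) deserves a second look.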

Related

HL7v2: Should the empty/blank repeating fields be removed?

Some segments in HL7v2 can be repeating, but what if one of those repetitions is blank? Should the blank repetition be removed? Or should they remain?
For example, in the below extract PID.13 is a repeating field, but the first repetition is blank. It does not even contain "" (empty string).
PID|||A123456789^^^555^PI||Data^Test^^^Mr||19500101|M|||123 Test Road^Testington^^^AA1 2AA||~07778895566|||M|||||||||||||""|||
The PID-13 field has been deprecated as of v2.7 and should no longer be used. Use PID-40 instead.
PID-13 is a special case because the first occurrence has a special meaning, so if there are multiple field repetitions then you shouldn't remove the first one even if it isn't populated. For other fields which don't have documented special cases, you can safely delete empty field occurrences without changing the meaning of the message.
Please refer to this answer.
There are two things that need to be understood.
First: an empty/blank/null value is also a value. Blank repetitions should not be removed.
Following is what specifications (2.3.2.4 Repetition Separator) say:
2.3.2.4 Repetition Separator.
The repetition separator is used in some data fields to separate multiple occurrences of a field. It is used only where specifically authorized in the descriptions of the relevant data fields. The character that represents the repetition separator is specified for each message as the second character in the Encoding Characters data field of the MSH segment. Absent other considerations it is recommended that all sending applications use '~' as the repetition separator. However, all applications are required to accept whatever character is included in the Message Header and use it to parse the message.
Yes, it does not clearly say anything about removing or keeping empty sub-components. It neither says specifically that an empty value is also a value, nor the opposite. I failed to find this in other parts of the specification as well.
To come to a conclusion, we need to move on to the second thing.
Second: the sequence of repetition values may also be important. This sequence will change if empty values are removed, which may also change the meaning of the values.
Let us take an example of PID.13 you mentioned in the question.
This field contains the patient's personal phone numbers. All personal phone numbers for the patient are sent in the following sequence. The first sequence is considered the primary number (for backward compatibility). If the primary number is not sent, then a repeat delimiter is sent in the first sequence.
As you can see above, an empty value for the first sub-component tells you that "there is no primary number available for the patient". By removing the empty value, you are actually putting the "secondary number" in place of the primary number, which may be wrong depending on your use case or implementation.
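To make the point concrete, here is a minimal Python sketch that splits the PID segment from the question and shows how dropping the blank repetition shifts the secondary number into the primary position. It assumes the default separators ('|' and '~'), ignores HL7 escape sequences, and the function name is hypothetical; a real parser would read the separators from the MSH segment.

```python
SEGMENT = (
    'PID|||A123456789^^^555^PI||Data^Test^^^Mr||19500101|M|||'
    '123 Test Road^Testington^^^AA1 2AA||~07778895566|||M|||||||||||||""|||'
)


def field_repetitions(segment, field_number, field_sep="|", repetition_sep="~"):
    """Return every repetition of a field, keeping empty ones in place.

    For non-MSH segments, index 0 of the split is the segment name,
    so PID.13 is simply element 13.
    """
    fields = segment.split(field_sep)
    return fields[field_number].split(repetition_sep)


pid13 = field_repetitions(SEGMENT, 13)
print(pid13)        # first repetition is blank: no primary number was sent
print(pid13[0])     # the primary slot, empty here

# Removing blanks moves the secondary number into the primary slot.
without_blanks = [rep for rep in pid13 if rep]
print(without_blanks[0])
```

Keeping the list positional, rather than filtering it, is what preserves the "no primary number" meaning.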
Another example is PID.3:
This field contains the list of identifiers (one or more) used by the facility to uniquely identify a patient (e.g., medical record number, billing number, birth registry, national unique individual identifier, etc.).
As you can see, removing empty values in between changes the meaning of the identifier.
I would still prefer a clear reference from the specification, but based on what is said above, I would avoid removing empty values.

How to create 2 query options inside an IF function in google sheets

I am trying to create an IF function that executes one query if true and another query if false, but I keep getting an 'Array Literal was missing values for one or more rows' error. The function goes like this:
=ARRAYFORMULA({"Cases";IF(H2="SFDC",QUERY(OPTION1),QUERY(OPTION2))})
Any idea why? Is there any other way to construct a conditional like this?
Thanks in Advance.
You need to make sure that every part of your output has the same number of columns. For example, in your formula, you have one column of data (the header "Cases"). Because of this, the result of your IF statement must also have only one column of data.
One way to fix this is to define some extra empty columns. For example:
=ARRAYFORMULA({"Cases","","";IF(H2="SFDC",QUERY({1,2,3},"select *"),QUERY({4,5,6},"select *"))})
I added two more columns after Cases, therefore allowing my resulting query to properly expand.
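The rule behind this fix can be modelled outside Sheets. The Python sketch below is only an analogy, with hypothetical helper names: it stacks row blocks the way a `{a; b}` array literal does and rejects blocks whose column counts differ, then shows that padding the header row makes the stack valid.

```python
def stack_rows(*blocks):
    """Stack 2-D blocks vertically, rejecting mismatched column counts."""
    rows = [row for block in blocks for row in block]
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have different column counts: {sorted(widths)}")
    return rows


def pad_header(header, width):
    """Extend a header row with empty strings up to the given width."""
    return header + [""] * (width - len(header))


header = ["Cases"]
query_result = [[1, 2, 3]]  # stands in for a three-column QUERY result

try:
    stack_rows([header], query_result)
except ValueError as err:
    print("rejected:", err)

print(stack_rows([pad_header(header, 3)], query_result))
```

Both branches of the IF must produce the same width as the padded header, otherwise one branch will still fail.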
I managed to solve the issue, I was not specifying the header at the end of the query which was generating a conflict.
Thanks to all.
This error is produced when two arrays have different numbers of columns. In your case, make sure the array returned by QUERY(OPTION1) has the same length as the one returned by QUERY(OPTION2).
Use this formula to check the array length.
for QUERY(OPTION1)
=COUNTA(SPLIT(QUERY(OPTION1),"/"))
and this for QUERY(OPTION2)
=COUNTA(SPLIT(QUERY(OPTION2),"/"))
Replace it with your formula of course.

Google Sheets DGET() does not find values

I have a fairly simple table, which looks like the following:
I want to get a specific value from that table. For that I am using two different formulas:
=iferror(DGET(Bazaar!$A:$K;"Top Sell-Offer";{"Item";"ENCHANTED_COAL"});"?") ==> returns "?"; Should get some numbers
=iferror(DGET(Bazaar!$A:$K;"Top Sell-Offer";{"Item";"ENCHANTED_OBSIDIAN"});"?") ==> returns "2747" as expected
I also tried =index(filter(Bazaar!$B:$K;Bazaar!$A:$A="ENCHANTED_COAL");;1)
which does return what I expected, but I can't specify the column I want by header.
Note that both strings for conditions and column headers are copy-pasted and thus character-perfect (As you can also see in the results.)
Also note that this does not happen with the truncated table provided, please refer to this sheet.
Why do I get such inconsistent results and what can I do about that?
Thanks in advance! Stay healthy!
I made a copy of your sheet and messed around with it. Finally, I decided to remove the IFERROR part and I got the error of "More than one match found in DGET evaluation.". The error was the key.
The formula is seeing "ENCHANTED_COAL_BLOCK" as another match for "ENCHANTED_COAL". Once I removed "COAL" from coal block, the formula worked as it should.
In order to get it to stop seeing double (to find an exact match), simply add an equal sign in front of the word you are looking for:
=IFERROR(DGET(Bazaar!$A:$K,"Top Sell-Offer",{"Item";"=ENCHANTED_COAL"}),"?")
=IFERROR(DGET(Bazaar!$A:$K,C$3,{$A$3;"="&$A5}),"?")
I recommend adding the equal sign in front of all words you search for just for consistency purposes.
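The behaviour described above can be imitated in a few lines of Python. This is a rough model, not the actual DGET implementation: it assumes that a plain text criterion matches any cell that starts with it, and that a leading "=" forces an exact match. The 2747 value comes from the question; the other numbers and the function names are made up for illustration.

```python
ROWS = [
    {"Item": "ENCHANTED_COAL", "Top Sell-Offer": 101},        # made-up value
    {"Item": "ENCHANTED_COAL_BLOCK", "Top Sell-Offer": 202},  # made-up value
    {"Item": "ENCHANTED_OBSIDIAN", "Top Sell-Offer": 2747},
]


def matches(cell: str, criterion: str) -> bool:
    """Assumed rule: '=text' is an exact match, plain text is a prefix match."""
    if criterion.startswith("="):
        return cell == criterion[1:]
    return cell.startswith(criterion)


def dget(rows, field, key, criterion):
    """Return the single matching value, or raise if there is not exactly one."""
    hits = [row for row in rows if matches(row[key], criterion)]
    if len(hits) > 1:
        raise ValueError("More than one match found")
    if not hits:
        raise ValueError("No match found")
    return hits[0][field]


print(dget(ROWS, "Top Sell-Offer", "Item", "ENCHANTED_OBSIDIAN"))
print(dget(ROWS, "Top Sell-Offer", "Item", "=ENCHANTED_COAL"))
try:
    dget(ROWS, "Top Sell-Offer", "Item", "ENCHANTED_COAL")
except ValueError as err:
    print("error:", err)
```

Wrapping the call in IFERROR hides exactly this kind of error, which is why removing it was the key step in diagnosing the problem.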
Reference: https://infoinspired.com/google-docs/spreadsheet/exact-match-in-database-functions-in-google-sheets/

Google Sheets Cross Join Function Tables with More than Two Columns

The crossJoin function posted by #Max Makhrov from the below thread works almost completely for what I was hoping to achieve. It was in response to cross joining two columns and I tried joining two tables, one with two columns and one with five columns. It works but only partially.
The delimiter of the column data is stuck as comma ",". This could be problematic for values with commas. The delimiter variable in the function only defines the two ranges being joined.
If the column being joined is a date for example, it seems to extend out the full date text inclusive of time zone and fixed as text. Is there a way to allow for it to be non-text to be formatted? Even when it's parsed using the split() function it's definitely still text.
Result of JOIN is longer than the limit of 50,000 characters
Below is a link to the example input and output. The first output example is a standard cross join. The other is the actual desired output which filters for any data rows where the date in column 5 is greater than or equal to the date in column 2.
https://docs.google.com/spreadsheets/d/1FGS8lYyy60AH49Qyug8Uxaey5jxDksihOks7ll8Hq10/edit?usp=drivesdk
Your spreadsheet is View Only, so i can't demo it there, but try this. On the demo sheet, start a new tab, then put this formula in cell A2.
Happy to walk you through it a bit if it works. Otherwise, maybe make the sample editable so i can troubleshoot w/ you in the same place?
=ARRAYFORMULA(QUERY({HLOOKUP({"A","B"},{"A","B";Sheet1!A5:B},SEQUENCE(COUNTA(Sheet1!D5:D)*COUNTA(Sheet1!A5:A),1,0)/COUNTA(Sheet1!D5:D)+2),HLOOKUP({"D","E","F","G"},{"D","E","F","G";Sheet1!D5:G},MOD(SEQUENCE(COUNTA(Sheet1!D5:D)*COUNTA(Sheet1!A5:A),1,0),COUNTA(Sheet1!D5:D))+2)},"where Col2>=Col5"))
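The index arithmetic in that formula may be easier to follow in a general-purpose language. The Python sketch below is an interpretation, not a translation verified against Sheets: it assumes the division result is truncated to a whole row index, uses made-up sample rows, and mirrors the formula's "Col2>=Col5" filter. Dates stay as real date values throughout, which sidesteps the text-conversion problem raised in the question.

```python
from datetime import date
from itertools import product


def cross_join(left, right):
    """Pair every left row with every right row using integer index math."""
    n_right = len(right)
    joined = []
    for i in range(len(left) * n_right):
        left_row = left[i // n_right]   # like SEQUENCE(...)/COUNTA(D5:D), truncated
        right_row = right[i % n_right]  # like MOD(SEQUENCE(...), COUNTA(D5:D))
        joined.append(left_row + right_row)
    return joined


# Hypothetical two-column and four-column tables.
left = [("x", date(2021, 3, 1)), ("y", date(2021, 1, 1))]
right = [
    ("p", 1, date(2021, 2, 1), "r1"),
    ("q", 2, date(2021, 4, 1), "r2"),
]

joined = cross_join(left, right)

# Same pairs as itertools.product, just built by index.
assert joined == [l + r for l, r in product(left, right)]

# Mirror of "where Col2>=Col5": index 1 is column 2, index 4 is column 5.
filtered = [row for row in joined if row[1] >= row[4]]
for row in filtered:
    print(row)
```

Because no values are concatenated into a delimited string, this approach is not affected by commas in the data or by a character limit on joined text.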

#NAME? Error with Google Sheets Pivot Table Calculated Field

I looked this issue up and did not find any sufficient solution for myself.
I am trying to create a Calculated Field through Google Sheets Pivot Tables and I'm getting the #NAME? error.
To my best understanding I'm following the steps given to create the Calculated Field correctly and no where can I find an explanation as to what I'm doing wrong.
Here is a sample of my spreadsheet (with private information removed):
https://docs.google.com/spreadsheets/d/1QCtZceVYFaPEWh57TrMLyFS8e7vGXunf7EWIEHZmCpo/edit?usp=sharing
I've gone through all the steps to verify that my method is correct. None of those column headers have any spaces in them, so I don't need the ' ' encasing each word.
What am I doing wrong? Please help.
I figured it out. Funny how the universe works. It's like as soon as I was willing to ask for help, the answer came to me :).
The column headers themselves were in a number format (Accounting to be specific). Because the values of those columns were intended to be dollar amounts, I formatted the entire column to Accounting. In doing so, the header (which contained text) also became formatted as an Accounting value.
Because the headers were in a number format, the Calculated Field was unable to match the text to the value in the header. What this means: when creating a Calculated Field with Google Sheets Pivot Tables, the values being entered are explicitly typed (and matched accordingly) by Google Sheets. Text is probably treated as a string type, whereas numbers are treated as numbers (which makes sense, of course).
Solution: I changed the headers to a text format and the calculated field no longer threw the #NAME? error! Yay :)
