Cant seem to find the Syntax error - data-warehouse

Can somebody tell me what I am doing wrong here, I am getting incorrect syntax near where. Whats wrong with the statement?
INSERT INTO dbo.a1
select x1 as id, x2 as enc_id, x3 as dev,x4 as mang, x4 as sre, x5 as phase,x6, x7, x8
from
(select *,
row_number() over(partition by x2 order by x8) as rank
from ccsm.n9 where xx=1 and xx_pat=1 and xx_encz='Tesz')
where rank=1;

You never told us what version of SQL you are using, but one possibility is that your derived table needs an alias:
INSERT INTO dbo.a1
SELECT t.x1, t.x2, t.x3, t.x4, t.x4, t.x5, t.x6, t.x7, t.x8
FROM
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY x2 ORDER BY x8) AS rank
FROM ccsm.n9
WHERE xx = 1 and xx_pat = 1 and xx_encz = 'Tesz'
) t
WHERE t.rank = 1;
I know that Oracle enforces the derived table alias rule, and possibly other databases as well. Note that the aliases you had in the SELECT statement serve no purpose because the values are just being inserted. If you wish to insert x4 twice then just repeat it twice.

Related

Need help explaining this formula provided to me

I recently posted on here to get help with a formula, here is the link...https://stackoverflow.com/questions/75068029/vlook-up-style-forumla-but-range-is-2-cells A user called rockinfreakshow was really awesome and provided a great solution for me. I'm not very experienced and don't understand what the formula at all but I'd love to be able to add more attributes to it. Is anyone able to help break it down for me ?
I havent tried anything here, it's totally out of my realm of understanding
=MAKEARRAY(COUNTA(B2:B),COUNTA(D1:O1),LAMBDA(r,c,IF(REGEXMATCH(LAMBDA(ax,bx,IFS(REGEXMATCH(ax,"Mixed")*REGEXMATCH(INDEX(C2:C,r),"Blend")*REGEXMATCH(INDEX(C2:C,r),"Filter"),"BLEND-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Blend"))*REGEXMATCH(INDEX(C2:C,r),"Filter"),"ESP-"&bx&"|FILTER-"&bx,REGEXMATCH(ax,"Mixed")*NOT(REGEXMATCH(INDEX(C2:C,r),"Filter")),"BLEND-"&bx&"|ESP-"&bx,LEN(ax),SUBSTITUTE(ax&"-"&bx,"Espresso","ESP")))(regexextract(INDEX(B2:B,r),"([^\s]*?) Subscription"),IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C,r),"Small|Medium|Large"),"Small",250,"Medium",450,"Large",900),SWITCH(REGEXEXTRACT(INDEX(B2:B,r),"Medium|Large"),"Medium",225,"Large",450))),"(?i)"&INDEX(D1:O1,,c)),1,)))
see the WHY LAMBDA? part of this answer to understand the LAMBDA
the formula contains 2x LAMBDA and there are a total of 4 placeholders which translates to:
r - COUNTA(B2:B)
c - COUNTA(D1:O1)
ax - REGEXEXTRACT(INDEX(B2:B, r), "([^\s]*?) Subscription")
bx - IFNA(SWITCH(REGEXEXTRACT(INDEX(C2:C, r), "Small|Medium|Large"),
"Small", 250, "Medium", 450, "Large", 900),
SWITCH(REGEXEXTRACT(INDEX(B2:B, r), "Medium|Large"),
"Medium", 225, "Large", 450))
r counts how many items are in B column
c counts how many items are in row 1 of range D1:O1
ax extracts the word from B column that precedes the word Subscription
bx is a bit complex but essentially it extracts from C column word Small or Medium or Large and replaces it with 250, 450 or 900 respectively. then if C column does not contain one of those 3 words it checks for Medium or Large within B column and assigns 225 or 450 respectively
what we are left with is the core of the formula:
IFS( REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Blend"))*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "ESP-"&bx&"|FILTER-"&bx,
___________________________________________________________________________
REGEXMATCH(ax, "Mixed")*
NOT(REGEXMATCH(INDEX(C2:C, r), "Filter")), "BLEND-"&bx&"|ESP-"&bx,
___________________________________________________________________________
LEN(ax), SUBSTITUTE(ax&"-"&bx, "Espresso", "ESP"))
for better visualization, the IFS formula contains only 4 elements. each of these 4 elements acts as a switch - if there is a match x we get output y. for example let's dissect the first element...
REGEXMATCH(ax, "Mixed")*
REGEXMATCH(INDEX(C2:C, r), "Blend")*
REGEXMATCH(INDEX(C2:C, r), "Filter"), "BLEND-"&bx&"|FILTER-"&bx
there are 3x REGEXMATCHes multiplied by each other. whenever there is such multiplication in array formulae it translates as AND logic gate (if there would be + it would mean OR logic gate) eg.:
1 * 1 = 1
1 * 0 = 0
0 * 1 = 0
0 * 0 = 0
REGEXMATCH outputs TRUE or FALSE so if we get 3x TRUE the whole argument is considered as TRUE (because 1 * 1 * 1 = 1) so we proceed to output our first switch
therefore if B column contains Mixed and C column contains Blend and C column contains Filter then we output Blend-000|Filter-000 where 000 stands for a specific number determined from bx placeholder/formula and also you can notice the | (which btw stands for OR logic within the regex) but in this case, it's just a unique symbol to join stuff for REGEXMATCH. which REGEXMATCH is this for you may ask? ...this one:
so the output of IFS formula is the input for most outer REGEXMATCH and we check if the IFS output matches something within D1:O1 range. IF yes then output 1 otherwise output nothing. shortened:
IF(REGEXMATCH(IFS(...), "(?i)"&INDEX(D1:O1,,c), 1, )
(?i) in regex means "case insensitive". it is there just for safety reasons because regex is by default case sensitive.
and we reached the MAKEARRAY formula that creates an array of numbers across the whole range with height r and width c where output is the result of IF eg. either 1 or empty cell

What is the most efficient way of checking N-way equation equivalence in Z3?

Suppose I have a set of Z3 expressions:
exprs = [A, B, C, D, E, F]
I want to check whether any of them are equivalent and, if so, determine which. The most obvious way is just an N×N comparison (assume exprs is composed of some arbitrarily-complicated boolean expressions instead of the simple numbers in the example):
from z3 import *
exprs = [IntVal(1), IntVal(2), IntVal(3), IntVal(4), IntVal(3)]
for i in range(len(exprs) - 1):
for j in range(i+1, len(exprs)):
s = Solver()
s.add(exprs[i] != exprs[j])
if unsat == s.check():
quit(f'{(i, j)} are equivalent')
Is this the most efficient method, or is there some way of quantifying over a set of arbitrary expressions? It would also be acceptable for this to be a two-step process where I first learn whether any of the expressions are equivalent, and then do a longer check to see which specific expressions are equivalent.
As with anything performance related, the answer is "it depends." Before delving into options, though, note that z3 supports Distinct, which can check whether any number of expressions are all different: https://z3prover.github.io/api/html/namespacez3py.html#a9eae89dd394c71948e36b5b01a7f3cd0
Though of course, you've a more complicated query here. I think the following two algorithms are your options:
Explicit pairwise checks
Depending on your constraints, the simplest thing to do might be to call the solver multiple times, as you alluded to. To start with, use Distinct and make a call to see if its negation is satisfiable. (i.e., check if some of these expressions can be made equal.) If the answer comes unsat, you know you can't make any equal. Otherwise, go with your loop as before till you hit the pair that can be made equal to each other.
Doing multiple checks together
You can also solve your problem using a modified algorithm, though with more complicated constraints, and hopefully faster.
To do so, create Nx(N-1)/2 booleans, one for each pair, which is equal to that pair not being equivalent. To illustrate, let's say you have the expressions A, B, and C. Create:
X0 = A != B
X1 = A != C
X2 = B != C
Now loop:
Ask if X0 || X1 || X2 is satisfiable.
If the solver comes back unsat, then all of A, B, and C are equivalent. You're done.
If the solver comes back sat, then at least one of the disjuncts X0, X1 or X2 is true. Use the model the solver gives you to determine which ones are false, and continue with those until you get unsat.
Here's a simple concrete example. Let's say the expressions are {1, 1, 2}:
Ask if 1 != 1 || 1 != 2 || 1 != 2 is sat.
It'll be sat. In the model, you'll have at least one of these disjuncts true, and it won't be the first one! In this case the last two. Drop them from your list, leaving you with 1 != 1.
Ask again if 1 != 1 is satisfiable. The answer will be unsat and you're done.
In the worst case you'll make Nx(N-1)/2 calls to the solver, if it happens that none of them can be made equivalent with you eliminating one at a time. This is where the first call to Not (Distinct(A, B, C, ...)) is important; i.e., you will start knowing that some pair is equivalent; hopefully iterating faster.
Summary
My initial hunch is that the second algorithm above will be more performant; though it really depends on what your expressions really look like. I suggest some experimentation to find out what works the best in your particular case.
A Python solution
Here's the algorithm coded:
from z3 import *
exprs = [IntVal(i) for i in [1, 2, 3, 4, 3, 2, 10, 10, 1]]
s = Solver()
bools = []
for i in range(len(exprs) - 1):
for j in range(i+1, len(exprs)):
b = Bool(f'eq_{i}_{j}')
bools.append(b)
s.add(b == (exprs[i] != exprs[j]))
# First check if they're all distinct
s.push()
s.add(Not(Distinct(*exprs)))
if(s.check()== unsat):
quit("They're all distinct")
s.pop()
while True:
# Be defensive, bools should not ever become empty here.
if not bools:
quit("This shouldn't have happened! Something is wrong.")
if s.check(Or(*bools)) == unsat:
print("Equivalent expressions:")
for b in bools:
print(f' {b}')
quit('Done')
else:
# Use the model to keep bools that are false:
m = s.model()
bools = [b for b in bools if not(m.evaluate(b, model_completion=True))]
This prints:
Equivalent expressions:
eq_0_8
eq_1_5
eq_2_4
eq_6_7
Done
which looks correct to me! Note that this should work correctly even if you have 3 (or more) items that are equivalent; of course you'll see the output one-pair at a time. So, some post-processing might be needed to clean that up, depending on the needs of the upstream algorithm.
Note that I only tested this for a few test values; there might be corner case gotchas. Please do a more thorough test and report if there're any bugs!

code missing values for many/all variables at once

Suppose i have such table
if i use syntax to indicate missing values for variables, i must for each variable write something like
if missing (s1) s1=999.
MISSING VALUES s1 (999).
exe
if missing (s20) s20=999.
MISSING VALUES s20 (999).
exe
and so on.
but if i have 100 variables it will long and difficult.
is it possible to inditace missing values at once for all vars in my data
something like?
if missing (s1-q35) s1-q35=999.
MISSING VALUES s1-q35 (999).
exe
You can use recode like this:
recode s1 s2 s3 s4 s5 s6 .... q1 q2 q3 q4 q5 ..... (miss=999).
If some of your variables are consecutive in the data you can use "to". For example:
recode s1 to s21 q1 to q35 (miss=999).
If they are all consecutive you can use to for all of them:
missing values s1 to q35 (999).

Computing Variable using SYSMIS

I'm running into a problem that I've never encountered before and I cannot quite figure out what's going on.
I am trying to compute a variable as follows:
COMPUTE A=$SYSMIS.
IF B=$SYSMIS A=$SYSMIS.
IF C > SUM(B, 1) A= 1.
IF C = SUM(B, 1) A= 2.
IF C = B A= 2.
IF C < B A=3.
This runs fine except for the fact that when B=$SYSMIS there are very clear examples of A that are not in fact missing.
I tested it using:
TEMP.
SELECT IF B=$SYSMIS.
FREQ A.
It tells me that "No cases were input to this procedure. Either there are none in the working data file or all of them have been filtered out."
Meaning, the code worked correctly.
But...I found over 1,000 cases that are not fitting this logic.
TEMP.
SELECT IF ID=102.
FREQ A B.
This shows a specific ID that has A=$SYSMIS and B=2.
A, B and C are all numeric.
Thanks in advance for any insight! (:
First, instead of IF B=$SYSMIS you should use if missing(B) - for computing, for analysis and for select.
Another probable reason for your results is in commands like this:
IF C > SUM(B, 1) A=1.
If B is missing, the result of SUM(B, 1) is 1. Therefore if C>1 A gets the value 1, in spite of B being missing.
There are two ways to overcome this.
First, using X+Y instead of sum(X,Y) will result in a missing value when X or Y is missing:
IF C > (B + 1) A=1.
Second option: put the command COMPUTE A=$SYSMIS. at the end of the syntax instead of the beginning, so any values entered in A when B is missing will be replaced with a missing value.

Possible to use less/greater than operators with IF ANY?

Is it possible to use <,> operators with the if any function? Something like this:
select if (any(>10,Q1) AND any(<2,Q2 to Q10))
You definitely need to create an auxiliary variable to do this.
#Jignesh Sutar's solution is one that works fine. However there are often multiple ways in SPSS to accomplish a certain task.
Here is another solution where the COUNT command comes in handy.
It is important to note that the following solution assumes that the values of the variables are integers. If you have float values (1.5 for instance) you'll get a wrong result.
* count occurrences where Q2 to Q10 is less then 2.
COUNT #QLT2 = Q2 TO Q10 (LOWEST THRU 1).
* select if Q1>10 and
* there is at least one occurrence where Q2 to Q10 is less then 2.
SELECT (Q1>10 AND #QLT2>0).
There is also a variant for this sort of solution that deals with float variables correctly. But I think it is less intuitive though.
* count occurrences where Q2 to Q10 is 2 or higher.
COUNT #QGE2 = Q2 TO Q10 (2 THRU HIGHEST).
* select if Q1>10 and
* not every occurences of (the 9 variables) Q2 to Q10 is two or higher.
SELECT IF (Q1>10 AND #QGE2<9).
Note: Variables beginning with # are temporary variables. They are not stored in the data set.
I don't think you can (would be nice if you could - you can do something similar in Excel with COUNTIF & SUMIF IIRC).
You've have to construct a new variable which tests the multiple ANY less than condition, as per below example:
input program.
loop #j = 1 to 1000.
compute ID=#j.
vector Q(10).
loop #i = 1 to 10.
compute Q(#i) = trunc(rv.uniform(-20,20)).
end loop.
end case.
end loop.
end file.
end input program.
execute.
vector Q=Q2 to Q10.
loop #i=1 to 9 if Q(#i)<2.
compute #QLT2=1.
end loop if Q(#i)<2.
select if (Q1>10 and #QLT2=1).
exe.

Resources