How to replace values in an attribute table field using an if-then conditional statement in ArcGIS Pro

I have an attribute table with a column that has many observations below a chemical detection limit of 0.005. Since these are not valid measurements, I would like to replace all values in the field under 0.005 with the value 0.0025. So: if observation < 0.005 then replace with 0.0025.
I am not familiar with Python coding, but using Calculate Field I have tried (unsuccessfully) to reclassify values to another value using:
def Reclass(arg):
    if arg is < 0.005:
        return 0.0025

This worked. Choose Python as the expression type in the dropdown and enter this in the code block:
def r(fld):
    if fld is None or fld >= 0.005:
        return fld
    elif fld < 0.005:
        return 0.0025
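For completeness, a sketch of how this plugs into the Calculate Field dialog, assuming the field being recalculated is named Result (a placeholder; substitute your own field name): the function above goes in the Code Block box, and the Expression box calls it on the field using the !FieldName! syntax, e.g.

r(!Result!)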


Selecting a cut-off score in SPSS

I have 5 variables for one questionnaire about social support. I want to define the group with low vs. high support. According to the authors, low support is defined as a sum score <= 18 AND two items scoring <= 3.
It would be great to get a dummy variable which shows which people are low vs high in support.
How can I do this in the syntax?
Thanks ;)
Assuming your variables are named Var1, Var2, ..., Var5, and that they are consecutive in the dataset, this should work:
recode Var1 to Var5 (1 2 3=1)(4 thru hi=0) into L1 to L5.
compute LowSupport = sum(Var1 to Var5) <= 18 and sum(L1 to L5)>=2.
execute.
New variable LowSupport will have value 1 for rows that have the parameters you defined and 0 for other rows.
Note: If your variables are not consecutive, you'll have to list all of them instead of using Var1 to Var5.
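If you want to sanity-check the same logic outside SPSS, here is a rough pandas sketch (not SPSS syntax; the Var1 to Var5 names are carried over from above and the data are made up):

import pandas as pd

# Tiny made-up example; in practice df would come from your own data file
df = pd.DataFrame({f"Var{i}": [1, 4, 5] for i in range(1, 6)})

items = df[[f"Var{i}" for i in range(1, 6)]]
# Low support: sum score <= 18 AND at least two items scoring <= 3
df["LowSupport"] = ((items.sum(axis=1) <= 18) &
                    ((items <= 3).sum(axis=1) >= 2)).astype(int)
print(df)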

Hypothesis function space in decision trees

I am reading the book "Artificial Intelligence" by Stuart Russell and Peter Norvig (Chapter 18). The following paragraph is from the decision trees context.
For a wide variety of problems, the decision tree format yields a
nice, concise result. But some functions cannot be represented
concisely. For example, the majority function, which returns true if
and only if more than half of the inputs are true, requires an
exponentially large decision tree.
In other words, decision trees are good for some kinds of functions
and bad for others. Is there any kind of representation that is
efficient for all kinds of functions? Unfortunately, the answer is no.
We can show this in a general way. Consider the set of all Boolean
functions on "n" attributes. How many different functions are in this
set? This is just the number of different truth tables that we can
write down, because the function is defined by its truth table.
A truth table over "n" attributes has 2^n rows, one for each
combination of values of the attributes.
We can consider the “answer” column of the table as a 2^n-bit number
that defines the function. That means there are (2^(2^n)) different
functions (and there will be more than that number of trees, since
more than one tree can compute the same function). This is a scary
number. For example, with just the ten Boolean attributes of our
restaurant problem there are 2^1024 or about 10^308 different
functions to choose from.
What does the author mean by considering the "answer" column of the table as a 2^n-bit number that defines the function?
How did the author derive 2^(2^n) different functions?
Please elaborate on the above questions, preferably with a simple example, such as n = 3.
Consider a general truth table for a 3-input function, where the result for each triple is also a Boolean (1 or 0), represented by the variables i through p:
A B C f(a,b,c)
0 0 0 i
0 0 1 j
0 1 0 k
0 1 1 l
1 0 0 m
1 0 1 n
1 1 0 o
1 1 1 p
We can now represent any function on three variables as an 8-bit number, ijklmnop. For instance, and is 00000001; or is 01111111; one_hot (exactly one input True) is 01101000.
For 3 variables, you have 2^3 bits in the "answer", the complete function definition. Since there are 8 bits in the "answer", there are 2^8 possible functions we can define.
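A tiny Python sketch of that counting argument for n = 3 (the variable names are mine, not from the book):

from itertools import product

n = 3
rows = list(product([0, 1], repeat=n))   # the 2^n = 8 rows of the truth table

# Every possible "answer" column is one way to fill those rows with 0/1,
# so the number of distinct Boolean functions on n attributes is 2^(2^n).
print(len(rows))         # 8
print(2 ** len(rows))    # 256, i.e. 2^(2^3)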
Does that outline the field of comprehension for you?
More detail on an example function
You simply (once you see the pattern) make the eight bits correspond to the entries in the table. For instance, the table for one_hot looks like this:
A B C f(a,b,c)
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0
Reading down the "answer" column, labeled f(a,b,c), you get the 8-bit sequence 01101000. That 8-bit number is sufficient to completely define the function: the rows listing all the combinations of a, b, c are in a fixed (numerical) sequence.
You can write any such function in a template format:
def and_func(a, b, c):          # "and" itself is a reserved word in Python, so use another name
    and_def = '00000001'
    index = 4*a + 2*b + 1*c     # the row of the truth table selected by the inputs
    return int(and_def[index])
Now, if we generalize this to any 3-input binary function:
def bin_func(a, b, c, func_def):
    return int(func_def[4*a + 2*b + 1*c])
If you wish, you can further generalize the template for a list of inputs: concatenate the bits and use that integer as the index into the func_def string.
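A minimal sketch of that generalization, following the same template (the name bin_func_list and the argument order are mine):

def bin_func_list(inputs, func_def):
    # Treat the list of 0/1 inputs as a binary number; that number is the
    # row index into the "answer" column encoded by the string func_def.
    index = 0
    for bit in inputs:
        index = index * 2 + bit
    return int(func_def[index])

print(bin_func_list([0, 1, 0], '01101000'))   # 1: one_hot is true when exactly one input is 1
print(bin_func_list([1, 1, 1], '00000001'))   # 1: this is the and function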
Does that clear it up?

Table displaying the percentages of chosen responses for a vignette (male then female)

People in my study have completed a questionnaire. One of the questions involves the participant reading a scenario/vignette; they are then asked to 'identify the problem' in the scenario. They are then presented with 9 options (a multiple response question).
In my data file men and women are coded numerically. I am currently trying to create a table with the percentage of responses to each of the 9 options for men in one column (adding to 100%), and women in the other (100%).
I know this is probably quite simple, but I've completely forgotten! Any help in exactly how to carry this out in SPSS?
You can use the MULT RESPONSE GROUPS command with the /BASE=RESPONSES option.
*** Create some example fake data.
INPUT PROGRAM.
LOOP #i = 1 TO 15.
LOOP #g = 1 TO 2.
COMPUTE Gender = #g.
DO REPEAT response = r1 TO r9.
COMPUTE response = TRUNC(RV.UNIFORM(0,2)).
END REPEAT.
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS Gender r1 TO r9 (F1.0).
VALUE LABELS Gender
1 'Male'
2 'Female'.
EXECUTE.
And this is how the command looks with the example data:
MULT RESPONSE GROUPS=$responses (r1 r2 r3 r4 r5 r6 r7 r8 r9 (1))
/VARIABLES=Gender(1 2)
/TABLES=$responses BY Gender
/BASE=RESPONSES
/CELLS=COLUMN.
It sounds like you can get what you want from the CROSSTABS procedure; see the example below.
*Example fake data.
INPUT PROGRAM.
LOOP #i = 1 TO 15.
LOOP #g = 1 TO 2.
COMPUTE Gender = #g.
COMPUTE Response = TRUNC(RV.UNIFORM(1,10)).
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
FORMATS Gender Response (F1.0).
VALUE LABELS Gender
1 'Male'
2 'Female'.
*Table.
CROSSTABS TABLE Response BY GENDER /CELLS=COL.
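If you ever want to double-check the shape of that output outside SPSS, here is a rough pandas equivalent of the column-percentage crosstab (made-up data, mirroring the fake data above):

import numpy as np
import pandas as pd

# 30 fake cases: Gender 1/2, Response 1..9
rng = np.random.default_rng(0)
df = pd.DataFrame({"Gender": np.tile([1, 2], 15),
                   "Response": rng.integers(1, 10, size=30)})

# Percentage of each response option within each gender column (each column sums to 100)
print(pd.crosstab(df["Response"], df["Gender"], normalize="columns") * 100)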

How to reference a specific row in the "compute variable" dialog of SPSS?

This is my problem:
I have a table of measured fluorescence values depending on a drug concentration. I can't use the values directly, because even in the absence of drug there is some small fluorescence. Consequently, I have to subtract the value of the drug=0 measurement from all other values.
I figured I could calculate a new variable (normalized fluorescence), but how do I reference the fluorescence value in the drug=0 row? In Excel, I'd use something like $34$2 to reference that cell, but how do I do that in SPSS? Entering the value hard-coded seems a bit inflexible, and I want to know how to do it by reference :). Hours of googling and reading in books have yielded no answer so far.
Thanks :)
Edit:
An example would be
Drug conc. | Fluorescence
0 | 0.1 <- this value is to be subtracted from all fluorescence values
1 | 1.1
2 | 2.1
3 | 3.1
4 | 4.1
A constant in SPSS can be represented as a variable that holds the same value in all rows. So you have to make a new variable containing the value of Fluorescence where drug = 0: the code below sorts the cases so the drug = 0 row comes first, and then uses LAG to copy that value down to every other row. Example:
data list free
/drug (f8) Fluorescence (f8.1).
begin data
0 0.1
1 1.1
2 2.1
3 3.1
4 4.1
end data.
sort cases drug.
do if drug = 0.
comp const = Fluorescence.
else.
comp const = lag(const).
end if.
exe.
comp Fluorescence2 = Fluorescence - const.
form const Fluorescence2 (f8.1).
exe.
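As a cross-check of what the syntax above computes (not SPSS, just pandas on the same toy numbers):

import pandas as pd

df = pd.DataFrame({"drug": [0, 1, 2, 3, 4],
                   "Fluorescence": [0.1, 1.1, 2.1, 3.1, 4.1]})

# The drug = 0 measurement becomes the constant subtracted from every row
baseline = df.loc[df["drug"] == 0, "Fluorescence"].iloc[0]
df["Fluorescence2"] = df["Fluorescence"] - baseline
print(df)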

Constrained Sequence to Index Mapping

I'm puzzling over how to map a set of sequences to consecutive integers.
All the sequences follow this rule:
A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1
I'm looking for a solution that will be able to, given such a sequence, compute an integer for doing a lookup into a table, and, given an index into the table, generate the sequence.
Example: for length 3, there are 5 valid sequences. A fast function for doing the following mapping (preferably in both directions) would be a good solution:
1,1,1 0
1,1,2 1
1,2,1 2
1,2,2 3
1,2,3 4
The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't yet know what the length of the sequences will be, but it will be a small constant (< 12) known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.
These are valid sequences:
1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2
These are not:
1,2,2,4
2,
1,1,2,3,5
There is a natural sequence indexing, but it is not so easy to calculate.
Let us look at A_n for n > 0, since A_0 = 1 is fixed.
Indexing is done in two parts.
Part 1:
Group sequences by places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.
On the steps the values are forced: they are the consecutive numbers (2, 3, 4, 5, ...).
On a non-step place we can put any number from 1 up to the current running maximum (one plus the number of steps that occur earlier in the sequence).
Each group can be represented as a binary string where 1 marks a step and 0 a non-step place. E.g. 001001010 means the group 112aa3b4c, with a <= 2, b <= 3, c <= 4. Because groups are indexed by a binary number, there is a natural indexing of groups, from 0 to 2^length - 1. Let's call the value of a group's binary representation the group order.
Part 2:
Index sequences inside a group. Since a group fixes the step positions, only the numbers on non-step places are variable, and each varies over a known range. With that, it is easy to index a sequence of a given group inside that group, using lexicographic order of the variable places.
It is easy to calculate the number of sequences in one group: it is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ..., where i_k counts the non-step places whose allowed range is 1..k.
Combining:
This gives a two-part key, <Steps, Group>, which then needs to be mapped to the integers. To do that we have to find how many sequences are in groups that have order less than some value. For that, let's first find how many sequences there are in groups of a given length. That can be computed by passing through all groups and summing the number of sequences, or similarly with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) where the maximal value of the first element can be n+1. Then the following holds:
T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n
Because l + n <= sequence length + 1, there are roughly sequence_length^2 / 2 values of T(l, n), which can easily be calculated.
Next is to calculate the number of sequences in groups whose order is less than or equal to a given value. That can be done by summing T(l, n) values. E.g. the number of sequences in groups with order <= 1001010 (binary) is equal to
T(7,1) + # for 1000000
2^2 * T(4,2) + # for 001000
2^2 * 3 * T(2,3) # for 010
Optimizations:
This will give a mapping, but a direct implementation of combining the key parts is worse than O(1). On the other hand, the Steps portion of the key is small, and by computing the range of Groups for each Steps value, a lookup table can reduce this to O(1).
I'm not 100% sure about the formula above, but it should be something like that.
With these remarks and the recurrence it is possible to write the functions sequence -> index and index -> sequence. But it is not so trivial :-)
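To make the "not so trivial" part a bit more concrete, here is a minimal Python sketch (names and structure are mine, not from the answer above) of a direct lexicographic ranking that skips the group decomposition entirely; it uses the same kind of recurrence, counting the ways to complete a sequence given the running maximum, and has only been checked against the length-3 example in the question.

from functools import lru_cache

@lru_cache(maxsize=None)
def completions(remaining, cur_max):
    # Number of valid ways to fill `remaining` more positions when the
    # maximum value seen so far is `cur_max`: the next value is either
    # one of 1..cur_max, or cur_max + 1 (a new maximum).
    if remaining == 0:
        return 1
    return (cur_max * completions(remaining - 1, cur_max)
            + completions(remaining - 1, cur_max + 1))

def seq_to_index(seq):
    # Lexicographic rank of a valid sequence (seq[0] must be 1).
    index, cur_max = 0, 1
    for pos in range(1, len(seq)):
        remaining = len(seq) - pos - 1
        # Every smaller choice at this position keeps the running maximum,
        # so each one contributes a whole subtree of the same size.
        index += (seq[pos] - 1) * completions(remaining, cur_max)
        cur_max = max(cur_max, seq[pos])
    return index

def index_to_seq(index, length):
    # Inverse mapping: rebuild the sequence from its rank.
    seq, cur_max = [1], 1
    for pos in range(1, length):
        remaining = length - pos - 1
        block = completions(remaining, cur_max)
        value = min(index // block + 1, cur_max + 1)
        index -= (value - 1) * block
        seq.append(value)
        cur_max = max(cur_max, value)
    return seq

print([seq_to_index(s) for s in ([1,1,1], [1,1,2], [1,2,1], [1,2,2], [1,2,3])])  # [0, 1, 2, 3, 4]
print(index_to_seq(4, 3))   # [1, 2, 3]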
I think a hash without sorting should be the thing.
Since A_0 is always 1, maybe we can think of the sequence as a number in base 12 and use its base-10 value as the key for lookup. (Still not sure about this.)
This is a Python function that can do the job for you, assuming you have the sequences stored in a file and you pass the lines to the function:
def valid_lines(lines):
    for line in lines:
        values = [int(x) for x in line.split(",") if x.strip()]
        # A_0 must be 1, and each later term must stay between 1 and the running maximum + 1
        if values and values[0] == 1 and all(
                1 <= v <= max(values[:i]) + 1 for i, v in enumerate(values[1:], start=1)):
            yield values

lines = (line for line in open('/tmp/numbers.txt'))
for valid_line in valid_lines(lines):
    print(valid_line)
Given the sequence, I would sort it, then use the hash of the sorted sequence as the index of the table.
