Transform string variable into 0-1 columns - spss

As a complete beginner in SPSS, I would like to ask for help transforming table A into table B. I need to recode the values of the "brand" variable into columns of 0-1 variables.
#table A#
nr brand
1 GREEN CARE PROFESSIONAL
1 GREEN CARE PROFESSIONAL
1 GREEN CARE PROFESSIONAL
2 HENKEL
3 HENKEL
3 HENKEL
3 HENKEL
3 VIZIR
4 BIEDRONKA
4 BOBINI
4 BOBINI
4 BOBINI
4 BOBINI
4 BOBINI
4 HENKEL
5 VIZIR
6 HENKEL
#table B#
nr GREEN HENKEL VIZIR BIEDR BOBINI
1 1 0 0 0 0
1 1 0 0 0 0
1 1 0 0 0 0
2 0 1 0 0 0
3 0 1 0 0 0
3 0 1 0 0 0
3 0 1 0 0 0
3 0 0 1 0 0
4 0 0 0 1 0
4 0 0 0 0 1
4 0 0 0 0 1
4 0 0 0 0 1
4 0 0 0 0 1
4 0 0 0 0 1
4 0 1 0 0 0
5 0 0 1 0 0
6 0 1 0 0 0
I can do it in this particular case in this simple way:
compute HENKEL=0.
...
do if BRAND='GREEN_CARE' .
compute GREEN_CARE=1.
else if ....
but the solution has to be reusable with another variable and a different number of values, etc. I have been trying to do this all day and gave up.
Do you have any idea how to do it in an easy way?
Thanks!

The following syntax does the job on the sample data you provided.
First, let's recreate the sample data to demonstrate on:
Data list list/nr (f1) brand (a30).
begin data
1 "GREEN CARE PROFESSIONAL"
1 "GREEN CARE PROFESSIONAL"
1 "GREEN CARE PROFESSIONAL"
2 "HENKEL"
3 "HENKEL"
3 "HENKEL"
3 "HENKEL"
3 "VIZIR"
4 "BIEDRONKA"
4 "BOBINI"
4 "BOBINI"
4 "BOBINI"
4 "BOBINI"
4 "BOBINI"
4 "HENKEL"
5 "VIZIR"
6 "HENKEL"
end data.
dataset name originalDataset.
Now for the restructure.
sort cases by nr brand.
* creating an index to enumerate cases for each combination of `nr` and `brand`.
* This is necessary for the `casestovars` command to work later.
compute ind=1.
if $casenum>1 and lag(nr)=nr and lag(brand)=brand ind=lag(ind)+1.
exe.
* variable names can't have spaces in them, so changing the category names accordingly.
compute brand=replace(rtrim(brand)," ","_").
sort cases by nr ind brand.
compute exist=1.
casestovars /id=nr ind /index= brand/autofix=no.
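For readers who want to check the target layout outside SPSS, here is a minimal sketch in plain Python of the same 0-1 recoding, one output row per input row as in table B. It is not part of the SPSS syntax above; the names `rows` and `one_hot` are hypothetical, and the column order (first appearance of each brand) is an assumption chosen to match table B.

```python
# Sample rows from table A as (nr, brand) pairs.
rows = [
    (1, "GREEN CARE PROFESSIONAL"), (1, "GREEN CARE PROFESSIONAL"),
    (1, "GREEN CARE PROFESSIONAL"), (2, "HENKEL"), (3, "HENKEL"),
    (3, "HENKEL"), (3, "HENKEL"), (3, "VIZIR"), (4, "BIEDRONKA"),
    (4, "BOBINI"), (4, "BOBINI"), (4, "BOBINI"), (4, "BOBINI"),
    (4, "BOBINI"), (4, "HENKEL"), (5, "VIZIR"), (6, "HENKEL"),
]


def clean(brand):
    # Variable names cannot contain spaces, so use underscores instead.
    return brand.strip().replace(" ", "_")


def one_hot(rows):
    """Return (column_names, table) with one 0/1 column per distinct brand."""
    names = []
    for _, brand in rows:
        name = clean(brand)
        if name not in names:  # keep order of first appearance
            names.append(name)
    table = []
    for nr, brand in rows:
        name = clean(brand)
        table.append([nr] + [1 if name == col else 0 for col in names])
    return names, table


names, table = one_hot(rows)
print("nr", *names)
for line in table:
    print(*line)
```

Because the column list is built from the data, the same function works for another variable with a different number of values.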


How can I label connected components in APL?

I'm trying to solve the LeetCode puzzle https://leetcode.com/problems/max-area-of-island/, which requires labelling connected (by sides, not corners) components.
How can I transform something like
0 0 1 0 0
0 0 0 0 0
0 1 1 0 1
0 1 0 0 1
0 1 0 0 1
into
0 0 1 0 0
0 0 0 0 0
0 2 2 0 3
0 2 0 0 3
0 2 0 0 3
I've played with the stencil ⌺ operator and also tried using scan operators but still not quite there. Can somebody help?
We can start off by enumerating the ones. We do this by applying the function ⍸ (where, but since all are 1s, it is equivalent to 1,2,3,…) @ at the subset masked by ⊢ the bits themselves, i.e. ⍸@⊢:
⍸@⊢m
0 0 1 0 0
0 0 0 0 0
0 2 3 0 4
0 5 0 0 6
0 7 0 0 8
Now we need to flood-fill the lowest number in each component. We do this with repeated application until the fix-point ⍣≡ of processing Moore neighbourhoods ⌺3 3. To get the von Neumann neighbours, we reshape the 9 elements in the Moore neighbourhood into a 4-row 2-column matrix with 4 2⍴ and use ⊢/ to select the right column. We remove any 0s with 0~⍨ then prepend , the original value ⍵[2;2] (even if 0) and have ⌊/ select the smallest value:
{⌊/⍵[2;2],0~⍨⊢/4 2⍴⍵}⌺3 3⍣≡⍸@⊢m
0 0 1 0 0
0 0 0 0 0
0 2 2 0 4
0 2 0 0 4
0 2 0 0 4
We map the values to indices by finding their ⊢ indices ⍳⍨ in the unique elements of ∘∪ 0 followed by , the ravelled matrix ,:
(⊢⍳⍨∘∪0,,){⌊/⍵[2;2],0~⍨⊢/4 2⍴⍵}⌺3 3⍣≡⍸@⊢m
1 1 2 1 1
1 1 1 1 1
1 3 3 1 4
1 3 1 1 4
1 3 1 1 4
Finally, we decrement with ¯1+ so that the labels begin with zero again:
¯1+(⊢⍳⍨∘∪0,,){⌊/⍵[2;2],0~⍨⊢/4 2⍴⍵}⌺3 3⍣≡⍸@⊢m
0 0 1 0 0
0 0 0 0 0
0 2 2 0 3
0 2 0 0 3
0 2 0 0 3
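The three steps above (enumerate the ones, propagate the minimum over side neighbours until nothing changes, renumber) can also be sketched in plain Python. This is an illustration of the same idea, not a translation of the APL idioms; the function name `label_components` is hypothetical.

```python
def label_components(grid):
    """Label side-connected components of 1s with 1, 2, 3, ..."""
    rows, cols = len(grid), len(grid[0])

    # Step 1: give every 1 a unique number in row-major order.
    counter = 0
    labels = []
    for r in range(rows):
        row = []
        for c in range(cols):
            if grid[r][c]:
                counter += 1
                row.append(counter)
            else:
                row.append(0)
        labels.append(row)

    # Step 2: replace each nonzero cell by the minimum of itself and its
    # nonzero von Neumann neighbours, repeating until a fixed point.
    while True:
        new = [row[:] for row in labels]
        for r in range(rows):
            for c in range(cols):
                if labels[r][c] == 0:
                    continue
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                        new[r][c] = min(new[r][c], labels[rr][cc])
        if new == labels:
            break
        labels = new

    # Step 3: renumber the surviving labels as 1, 2, 3, ... (0 stays 0).
    survivors = sorted({v for row in labels for v in row if v})
    remap = {v: i + 1 for i, v in enumerate(survivors)}
    remap[0] = 0
    return [[remap[v] for v in row] for row in labels]


m = [
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 1],
]
for row in label_components(m):
    print(*row)
```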

Predict next integer in sequence using ML.NET

Given a lengthy sequence of binary integers (each either 0 or 1), I would like to be able to predict the most likely next integer.
Example dataset:
1 1 1 0 0 0 0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 1 0 1 0 1 0 0 1 0 0 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 0 1 0 1 1 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0
A quick look at the above perhaps shows some obvious patterns which may be recognised by an ML model.
I do have other features available in the dataset but I don't think they correlate to the integer result so the prediction should be based purely on the statistical relevance of the supplied integer dataset.
I'm unsure how to approach this using ML.NET. I have successfully trained classification models previously, but those predictions were all based on multiple features. In this case, if I just supply a single 0 or 1, there's no relevant historical sequence to aid the prediction.
How do I train an ML.NET model to return a prediction based on a range of previous data?
Working theory: the above dataset has 100 integers. I could create a class which has 100 properties (Integer0..Integer99) and painstakingly map each field and submit that but it seems really clunky.
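One way to picture the working theory without 100 hand-written properties is a sliding window: each training row holds the previous few values as features and the value that follows as the label. The sketch below is plain Python rather than ML.NET, the window length of 3 is an arbitrary assumption, and `make_windows` is a hypothetical helper. It only reshapes the data and says nothing about whether the sequence is actually predictable.

```python
def make_windows(sequence, window):
    """Turn a sequence into (features, label) pairs.

    Each pair holds `window` consecutive values and the value right after them.
    """
    pairs = []
    for i in range(len(sequence) - window):
        pairs.append((sequence[i:i + window], sequence[i + window]))
    return pairs


# First ten values of the example dataset.
seq = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
pairs = make_windows(seq, 3)
for features, label in pairs:
    print(features, "->", label)
```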

What kind of encoding is this and how can I decode it?

I asked my friend and he said that it's AST. But I'm not sure and don't know how to decode it.
SCE = {} SCE[0x3B06D907] = function() return SCE[0x16D35994]({SCE[0x2CCB9B3C](), SCE[0x24AA4743](), SCE[0x1E716305](), SCE[0x2A7C5767](), SCE[0x165A6AC3](), SCE[0x1D0ACBC5](), SCE[0x3B79C46E](), SCE[0x1D4DE704](), SCE[0x27680DA1](), SCE[0x1C317CFE](), SCE[0x316934F5](), SCE[0x18C41DEC](), SCE[0x2A7D468A](), SCE[0x3A8183EA](), SCE[0x28E77657](), SCE[0x3159E5B8](), SCE[0x1ACECA1C]() , SCE[0x285218A3](), SCE[0x341688EA](), SCE[0x26DCC896](), SCE[0x1A80B2E1](), SCE[0x17D18EE4](), SCE[0x29C39207](), SCE[0x27BA5263](), SCE[0x166769AE](), SCE[0x1A81C222](), SCE[0x3549905A]()}) end SCE[0x3813D8D1] = '0 3 2 0 3 9 0 3 3 0 4 0 0 3 6 0 3 2 0 3 2 0 3 3 0 3 3 0 4 0 0 3 8 0 3 2 0 3 3 0 4 0 0 3 2 0 3 3 0 3 2 0 3 3 0 3 7 0 3 7 0 3 3 0 3 4 0 3 5 0 3 2 0 3 3 0 3 5 0 3 2 0 ' SCE[0x38B2F6A7] = '3 2 0 3 8 0 3 2 0 3 2 0 3 3 0 3 3 0 4 0 0 4 0 0 3 2 0 3 2 0 3 9 0 3 2 0 3 3 0 3 6 0 3 2 0 3 2 0 3 2 0 3 2 0 3 2 0 3 3 0 3 3 0 3 7 0 3 3 0 3 2 0 3 2 0 3 6 0 3 2 0 3 ' SCE[0x2D2D5130] = function(Ox26464833) if SCE[0x38B2F6A7] == SCE[0x23AC3D05] or SCE[0x38B2F6A7] == SCE[0x3B509C73] or SCE[0x142600A7](SCE[0x38B2F6A7]) <= SCE[0x2F7A2C68] then return else Ox26464833 = SCE[0x2EC981D9] end return SCE[0x38B2F6A7] end SCE[0x3111B14D] = '2 0 3 9 0 3 3 0 4 0 0 3 8 0 3 2 0 3 2 0 3 9 0 3 2 0 3 3 0 3 2 0 3 3 0 3 7 0 3 2 0 3 3 0 3 4 0 3 5 0 3 2 0 3 3 0 3 6 0 3 2 0 3 3 0 3 5 0 3 3 0 3 4 0 3 5 0 3 2 0 3 2 ' SCE[0x32D39C31] = function(Ox26464833) if SCE[0x3111B14D] == SCE[0x23AC3D05] or SCE[0x3111B14D] == SCE[0x3B509C73] or SCE[0x142600A7](SCE[0x3111B14D]) <= SCE[0x2F7A2C68] then return else Ox26464833 = SCE[0x14EF2C5C] end return SCE[0x3111B14D] end SCE[0x290A5F73] = '0 3 6 0 3 3 0 4 0 0 4 0 0 3 2 0 3 2 0 3 7 0 3 2 0 3 3 0 3 6 0 3 2 0 3 2 0 3 5 0 3 2 0 3 2 0 3 9 0 3 3 0 4 0 0 3 6 0 3 3 0 4 0 0 3 8 0 3 3 0 4 0 0 4 0 0 3 2 0 3 2 0 ' SCE[0x1AD49A98] = function(Ox26464833) if SCE[0x290A5F73] == SCE[0x23AC3D05] or SCE[0x290A5F73] == SCE[0x3B509C73] or SCE[0x142600A7](SCE[0x290A5F73]) <= SCE[0x2F7A2C68] then return

R-Package vegan Decorana

I'm new to R and I was trying to run a detrended correspondence analysis (DCA), which is a multivariate statistical analysis for ordination of species. I have four sites. I keep getting the error message:
> Error rowsums x must be numeric
Species Haasfontein Mini Pit Vlaklaagte Mini Pit Vlaklaagte Block 3 Mini Pit Block 10 Mini Pit
Agrostis lachnantha 1 0 0 0
Aristida congesta subsp. Congesta 0 0 0 0
Brachiaria nigropedata 0 0 0 0
Cynodon dactylon 0 12 2 3
Cyperus esculentus  0 5 0 0
Digitaria eriantha 0 1 6 20
Elionurus muticus 0 0 0 0
Eragrostis acraea De Winter 0 0 1 0
Eragrostis chloromelas 35 0 12 4
Eragrostis curvula 6 0 0 0
Eragrostis lehmanniana 5 0 0 0
Eragrostis rigidior 3 0 1 0
Eragrostis rotifer 3 0 0 0
Eragrostis trichophora 10 1 2 2
Hyparrhenia hirta 0 0 9 1
Melinis repens 0 0 2 0
Panicum coloratum 0 4 0 0
Panicum deustum  3 0 0 0
Paspalum dilatatum 0 0 0 0
Setaria sphacelata var. sphacelata 0 1 0 0
Sporobolus africanus 0 0 2 0
Sporobolus centrifuges 1 0 1 0
Sporobolus fimbriatus 0 0 0 0
Sporobolus ioclados 2 0 5 1
Themeda triandra 0 0 0 0
Trachypogon spicatus 0 0 0 0
Tragus berteronianus 0 0 0 1
Verbena bonariensis 16 0 2 0
Cirsium vulgare 0 0 0 0
Eucalyptus cameldulensis 1 0 0 0
Xanthium strumarium 0 0 0 0
Argemone ochroleuca 0 0 0 0
Solanum sisymbriifolium 0 0 0 0
Campuloclinium macrocephalum  7 0 0 0
Paspalum dilatatum 0 0 0 0
Senecio ilicifolius 0 0 0 0
Pseudognaphalium luteoalbum (L.) 8 0 0 0
 Cyperus esculentus  0 0 0 0
Foeniculum vulgare  0 0 0 0
Conyza canadensis 0 0 0 1
Tagetes minuta 0 0 0 0
Hypochaeris radicata 0 0 0 0
Solanum incanum 0 0 0 0
Asclepias fruticosa 11 0 0 0
Hypochaeris radicata 0 0 0 0
My data is organised as shown above, and I'm not sure whether it is organised incorrectly or there is some other error. Can someone please assist me?
You're still fighting to get your data into R. That is your first problem. After you tackle this problem and manage to read in your data, you have the following problems:
You should not have empty (all-zero) rows in your data, because they will give an error (empty columns are removed and only give a warning).
DCA treats rows and columns non-symmetrically, and you should have species as columns and sampling units as rows. You should transpose your data (function t()).
You really should not use DCA with only four sampling units. It will be meaningless.
I think the last point is most important.
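The first two points can be illustrated on a few rows of the table from the question. The sketch below is plain Python, not R (in R the transpose is the t() function named above). The split of the header into four site names is an assumption, and only four species are included. All-zero species are dropped before transposing so that sampling units become rows and species become columns.

```python
# Site names: the header is split on "Mini Pit" (an assumption).
sites = [
    "Haasfontein Mini Pit",
    "Vlaklaagte Mini Pit",
    "Vlaklaagte Block 3 Mini Pit",
    "Block 10 Mini Pit",
]

# A few rows copied from the question: species -> counts per site.
counts = {
    "Agrostis lachnantha": [1, 0, 0, 0],
    "Aristida congesta subsp. Congesta": [0, 0, 0, 0],
    "Cynodon dactylon": [0, 12, 2, 3],
    "Digitaria eriantha": [0, 1, 6, 20],
}

# Drop species that were never recorded (all-zero rows).
kept = {sp: row for sp, row in counts.items() if any(row)}
species = list(kept)

# Transpose: one row per sampling unit, one column per species.
by_site = [[kept[sp][i] for sp in species] for i in range(len(sites))]

print("site", *species, sep=" | ")
for site, row in zip(sites, by_site):
    print(site, *row, sep=" | ")
```

This does not address the third point: four sampling units remain four sampling units however the table is arranged.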

OneVsRestClassifier(svm.SVC()).predict() gives continuous values

I am trying to use y_scores=OneVsRestClassifier(svm.SVC()).predict() on datasets
like iris and titanic. The trouble is that I am getting y_scores as continuous values. For example, for the iris dataset I am getting:
[[ -3.70047231 -0.74209097 2.29720159]
[ -1.93190155 0.69106231 -2.24974856]
.....
I am using the OneVsRestClassifier with other classifier models like kNN, random forest and naive Bayes, and they give appropriate results in the form of
[[ 0 1 0]
[ 1 0 1]...
etc. on the iris dataset. Please help.
Well, this is simply not true.
>>> from sklearn.multiclass import OneVsRestClassifier
>>> from sklearn.svm import SVC
>>> from sklearn.datasets import load_iris
>>> iris = load_iris()
>>> clf = OneVsRestClassifier(SVC())
>>> clf.fit(iris['data'], iris['target'])
OneVsRestClassifier(estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
kernel='rbf', max_iter=-1, probability=False, random_state=None,
shrinking=True, tol=0.001, verbose=False),
n_jobs=1)
>>> print(clf.predict(iris['data']))
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 2 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
Maybe you called decision_function instead. That would match the shape of your output, since predict is supposed to return a vector, not a matrix. For an SVM, decision_function returns the signed distance to each hyperplane, which is its decision function from a mathematical perspective.
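To see how the two outputs relate, here is a small sketch in plain Python (no scikit-learn needed) that takes the two score rows quoted in the question and picks, for each row, the class with the highest score. For a multiclass target this is in essence what the one-vs-rest predict step does with the per-class scores; the helper name `predict_from_scores` is hypothetical.

```python
# The two rows of per-class scores quoted in the question.
scores = [
    [-3.70047231, -0.74209097, 2.29720159],
    [-1.93190155, 0.69106231, -2.24974856],
]


def predict_from_scores(rows):
    """Return, for each row, the index of the class with the highest score."""
    return [max(range(len(row)), key=row.__getitem__) for row in rows]


labels = predict_from_scores(scores)
print(labels)
```

Each row of three continuous values collapses to a single class label, which is why predict returns a vector while decision_function returns a matrix.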
