SPSS delete variables (filter) - spss

I want to delete most of my variables from my SPSS data file.
However, I have following filter:
any(pui, 419, 1018, 1108, 1204, 1511, 2004, 2612)
These parties (party unique identifier), as well as some categories concerning party positions (e.g. S16_09) must not be deleted - however, when I enter following code
delete variables
...
.
and exclude the variables I want to keep (the parties mentioned above, as well as some other like S16_09, S18_14, S13_19), all necessary variables are kept except the filter. Can you please help me?
The parties should be in the rows, whereas the different categories are in columns.
Thanks for your help! In case you have any questions pls let me know.

If you have more variables to delete then keep, this is a better way to do this (and you can filter before deleting the variables and still keep your filter):
compute myfilter=any(pui, 419, 1018, 1108, 1204, 1511, 2004, 2612).
add files /file=* /keep pui S16_09 S18_14 S13_19 myfilter.
exe.

already have an answer, in case someone has the same problem
First delete the variables, then filter for relevant parties. It does not work the other way round.

Related

Lua script for statistics from Diameter 3GPP

I'm trying to create a lua script to go through a Diameter pcap, gather information interesting for me and generate a statistic.
This is partially successful, working script can be found in GitHub but I'm still having some doubts
Field.new() and multiple occurrences of an AVP
I'm using Field.new() to retrieve AVPs, for example:
local rrField = Field.new("diameter.3GPP-Reporting-Reason")
local toField = Field.new("diameter.CC-Total-Octets")
But in a single packet there might be multiple occurrences of an AVP. Of course, I can access them as an array from
local rrFields = {rrField()}
local toFields = {toField()}
But I'm missing a reference where from the AVP was retrieved. A a good example is Result-Code AVP:
It this single Diameter message it occurs three times, but in result I'm getting just an array of three 2001's without a good understanding on which level this appeared.
Situation is becoming even more messy when a single package contains multiple Diameter messages. Then I even cannot figure from which message the AVP is.
Function tap.packet(pinfo, tvb, tapdata) does not populate tapdata
Another idea was to dig into tapdata. If I understood correctly 11.4.1.5. listener.packet, the tapdata (aka tapinfo) shall be populated with dissected data, right? Hence I should be able to parse the message.
However, regardless how hard I try, tapdata always is unset (i.e. nil). In GitHub code
tap = Listener.new("diameter", filter)
but I also experimented with the 3rd parameter, setting it to true (hoping for generating all fields, even in cost of performance penalty). No luck.
[Update 2020/03/20]
Self-answering to Function tap.packet(pinfo, tvb, tapdata) does not populate tapdata
After examining source code of Wireshark (tshark) it turns out that Diameter does not populate this variable as tapdata does not have reference to Diameter. I've tried to add it to taps definition and the variable (table) has been populated, even names of the hashes are OK. But variables in the hashes are not... Anyway, here is the change:
MBP:wireshark jhartman$ git diff epan/wslua/taps
diff --git a/epan/wslua/taps b/epan/wslua/taps
index 11b1132171..ea28865109 100644
--- a/epan/wslua/taps
+++ b/epan/wslua/taps
## -62,4 +62,5 ## tcp ../dissectors/packet-tcp.h tcp_info_t
#tls ../dissectors/packet-tls.h ssl_info_t
#tr ../dissectors/packet-tr.h tr_info_t
wlan ../dissectors/packet-ieee80211.h wlan_hdr_t
+diameter ../dissectors/packet-diameter.h diam_sub_dis_t
#wsp ../dissectors/packet-wsp.h wsp_info_t
Question
Is this approach right? Or should I use other ways - such as chained dissectors or post dissector? But it was not clear to me if I can access dissected data to the level I need?
Any help will be very much appreciated.
Thank you in advance and best regards,
Jarek

Question about SPSS modeler (There is an obstacle for make the stream run automatically)

I have SPSSmodeler stream which is now used and updated every week constantly to generate a certain dataset. A raw data for this stream is also renewed on a weekly basis.
In part of this stream, there is a chunk of nodes that were necessary to modify and update manually every week, and the sequence of this part is below: Type Node => Restructure Node => Aggregate Node
To simplify the explanation of those nodes' role, I drew an image of them as bellow.
Because the original raw data is changed weekly basis, the range of Unit value above is always varied, sometimes more than 6 (maybe 100) others less than 6 (maybe 3). That is why somebody has to modify there and update those chunk of nodes on a weekly basis until now. *Unit value has a certain limitation (300 for now)
However, now we are aiming to run this stream automatically without touching any human operations on it that we need to customize there to work perfectly, automatically. Please help and will appreciate your efforts, thanks!
In order to automatize, I suggest to try to use global nodes combined with clem scripts inside the execution (default script). I have a stream that calculates the first date and the last date and those variables are used to rename files at the end of execution. I think you could use something similar as explained here:
1) Create derive nodes to bring the unit values used in the weekly stream
2) Save this information in a table named 'count_variable'
3) Use a Global node named Global with a query similar to this:
#GLOBAL_MAX(variable created in (2)) (only to record the number of variables. The step 2 created a table with only 1 values, so the GLOBAL_MAX will only bring the number of variables).
4) The query inside the execution tab will be similar to this:
execute count_variable
var tabledata
var fn
set tabledata = count_variable.output
set count_variable = value tabledata at 1 1
execute Global
5) You now can use the information of variables just using the already creatde "count_variable"
It's not easy to explain just by typing, but I hope to have been helpful.
Please mark as +1 in this answer if it was relevant one.
I think there is a better, simpler and more effective (yet risky, due to node's requirements to input data) solution to your problem. It is called Transpose node and does exactly that - pivot your table. But just from version 18.1 on. Here's an example:
https://developer.ibm.com/answers/questions/389161/how-does-new-feature-partial-transpose-work-in-sps/

I want to know Complete project line of code in TFS project collection

Team,
Could you please help on the advise.
I want to know how to calculate the line of code for my TFS project collection. I need for entire instance to calculate the line of code.
Please advise. Thank you
Note: I'm assuming you're using TFVC, not Git.
You should be able to get this from the data warehouse (Tfs_Warehouse) assuming you have Reporting Services configured.
There is a Code Churn table. I believe you should be able to sum the NetLinesAdded field to get the total number of lines of code.
The Analysis Cube has a Total Lines field, as well.
However, you can also get this information from your file system with PowerShell, for example:
(gci -Path 'C:\Users\Daniel\Source\Repos\' -rec -Include '*.cs' | select-string .).Count
This comes with the caveat that "lines of code" is, in almost every single case, a totally meaningless, worthless number.

Delete variables based on the number of observations

I have an SPSS file that contains about 1000 variables and I have to delete the ones having 0 valid values. I can think of a loop with an if statement but I can't find how to write it.
The simplest way would be to use the spssaux2.FindEmptyVars Python function like this:
begin program.
import spssaux2
spssaux2.FindEmptyVars(delete=True)
end program.
If you don't already have the spssaux2 module installed, you would need to get it from the SPSS Community website or the IBM Predictive Analytics site and save it in the python\lib\site-packages directory under your Statistics installation.
Otherwise, the VALIDATEDATA command, if you have it, will identify the variables violating such rules as maximum percentage of missing values, but you would have to turn that output into a DELETE VARIABLES command. You could also look for variables with zero missing values using, say, DESCRIPTIVES and select out the ones with N=0.
If you've never worked with python in SPSS, here's a way to get the job done without it (not as elegant, but should do the job):
This will count the valid cases in each variable, and select only those that have 0 valid cases. Then you'll manually copy the names of these variables into a syntax command that will delete them.
DATASET NAME Orig.
DATASET DECLARE VARLIST.
AGGREGATE /OUTFILE='VARLIST'/BREAK=
/**list_all_the_variable_names_here = NU(*FirstVarName to *LastVarName).
DATASET ACTIVATE VARLIST.
VARSTOCASES /MAKE NumValid FROM *FirstVarName to *LastVarName/INDEX=VarName(NumValid).
SELECT IF NumValid=0.
EXECUTE.
Pause here to copy the remaining names in the list and complete the syntax, then continue:
DATASET ACTIVATE Orig.
DELETE VARIABLES *paste_here_all_the_remaining_variable_names_from_varlist .
Notes:
* I put stars where you have to replace my text with your variable names.
** If the variables are neatly named like Q1, Q2, Q3 .... Q1000, you can use the "FirstVarName to LastVarName" form (Q1 to Q1000) instead of listing all the variable names.
BTW it is of course possible to do this completely automatically without manually copying those names (using only syntax, no Python), but the added complexity is not worth bothering with for a single use...

How do you include categories with 0 responses in SPSS frequency output?

Is there a way to display response options that have 0 responses in SPSS frequency output? The default is for SPSS to omit in the frequency table output any response option that is not selected by at least a single respondent. I looked for a syntax-driven option to no avail. Thank you in advance for any assistance!
It doesn't show because there is no one single case in the data is with that attribute. So, by forcing a row of zero you'll need to realize we're asking SPSS to do something incorrect.
Having said that, you can introduce a fake case with the missing category. E.g. if you have Orange, Apple, and Pear, but no one answered they like Pear, the add one fake case that says Pear.
Now, make a new weight variable that consists of only 1. But for the Pear case, make it very very small like 0.00001. Then, go to Data > Weight Cases > Weight cases by and put that new weight variable over. Click OK to apply. Now what happens is that SPSS will treat the "1" with a weight of 1 and the fake case with a weight that is 1/10000 of a normal case. If you rerun the frequency you should see the one with zero count shows up.
If you have purchased the Custom Table module you can also do that directly as well, as far as I can tell from their technical document. That module costs 637 to 3630 depending on license type, so probably only worth a try if your institute has it.
So, I'm a noob with SPSS, I (shame on me) have a cracked version of SPSS 22 and if I understood your question correctly, this is my solution:
double click the Frequency table in Output
right click table, select Table Properties
go to General and then uncheck the Hide empty rows and columns option
Hope this helps someone!
If your SPSS version has no Custom Tables installed and you haven't collected money for that module yet then use the following (run this syntax):
*Note: please use variable names up to 8 characters long.
set mxloops 1000. /*in case your list of values is longer than 40
matrix.
get vars /vari= V1 V2 /names= names /miss= omit. /*V1 V2 here is your categorical variable(s)
comp vals= {1,2,3,4,5,99}. /*let this be the list of possible values shared by the variables
comp freq= make(ncol(vals),ncol(vars),0).
loop i= 1 to ncol(vals).
comp freq(i,:)= csum(vars=vals(i)).
end loop.
comp names= {'vals',names}.
print {t(vals),freq} /cnames= names /title 'Frequency'. /*here you are - the frequencies
print {t(vals),freq/nrow(vars)*100} /cnames= names /format f8.2 /title 'Percent'. /*and percents
end matrix.
*If variables have missing values, they are deleted listwise. To include missings, use
get vars /vari= V1 V2 /names= names /miss= -999. /*or other value
*To exclude missings individually from each variable, analyze by separate variables.

Resources