Syntax if variable includes a specific string of text - spss

I was wondering if there was a syntax to select all account names that include a certain string of text.
For example, if I have a SPSS file that has 3 million account names, I'd want to look at only the account names that have a / TKS at the end. The account name could like like Stack Overflow / TKS.

You can use char.index to check whether a string includes a specific substring.
So for example:
compute containsTKS=0.
if char.index(account_name,"/ TKS")>0 containsTKS=1.
execute.
You can then use containsTKS to filter or select cases.

The solution eli-k provided checks if / TKS is inside the account_name, at any position.
If you want to check if the "/ TKS" text is at the end of your account_name, you need a slightly changed syntax:
compute containsTKS=0.
if char.index(account_name,"/ TKS")=char.len(rtrim(account_name))-5 containsTKS=1.
execute.
Then, as eli-k mentioned, "You can then use containsTKS to filter or select cases."

Related

How to read out a list of cases in one variable in SPSS and use that to add data?

To explain my problem I use this example data set:
SampleID Date Project Problem
03D00173 03-Dec-2010 1,00
03D00173 03-Dec-2010 1,00
03D00173 28-Sep-2009 YNTRAD
03D00173 28-Sep-2009 YNTRAD
Now, the problem is that I need to replace the text "YNTRAD" with "YNTRAD_PILOT" but only for the cases with Date = 28-Sep-2009.
This is example is part of a much larger database, with many more cases having Project=YNTRAD and Data=28-Sep-2009, so I can not simply select first all cases with 28-Sep-2009, then check which of these cases have Project=YNTRAD and then replace. Instead, what I need to do is:
Look at each case that has a 1,00 in Problem (these are problem
cases)
Then find the SampleID that corresponds with that sample
Then find all other cases with the same SampleID BUT WITH
Date=28-Sep-2009 (this is needed because only those samples are part
of a pilot study) and then replace YNTRAD in Project to
YNTRAD_PILOT.
I read a lot about:
LOOP
- DO REPEAT
- DO IF
but I don't know how to use these in solving this problem.
I first tried making a list containing only the sample ID's that need eventually to be changed (again, this is part of a much larger database).
STRING SampleID2 (A20).
IF (Problem=1) SampleID2=SampleID.
EXECUTE.
AGGREGATE
/OUTFILE=*
/BREAK=SampleID2
/n_SampleID2=N.
This gives a dataset with only the SampleID's for which a change should be made. However I don't know how to read out this dataset case by case and looking up each SampleID in the overall file with all the date and then change only those cases were Date = 28-Sep-2009.
It sounds like once we can identify the IDs that need to be changed we've done the tricky part here. We can use AGGREGATE with MODE=ADDVARIABLES to add a problem Id counter variable to our dataset. From there, it's as you'd expect.
* Add var IdProblemCnt to your database . Stores # of times a given Id had a record with Problem = 1.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=SampleId
/IdProblemCnt=CIN(Problem, 1, 1) .
EXE .
* once we've identified the "problem" Ids we can use `RECODE` Project var.
DO IF (IdProblemCnt>0 AND Date = DATE.MDY(9,28,2009) .
RECODE Project ('YNTRAD' = 'YNTRAD_PILOT') .
END IF .
EXE .

GSheets - How to query a partial string

I am currently using this formula to get all the data from everyone whose first name is "Peter", but my problem is that if someone is called "Simon Peter" this data is gonna show up on the formula output.
=QUERY('Data'!1:1000,"select * where B contains 'Peter'")
I know that for the other formulas if I add an * to the String this issue is resolved. But in this situation for the QUERY formula the same logic do not applies.
Do someone knows the correct syntax or a workaround?
How about classic SQL syntax
=QUERY('Data'!1:1000,"select * where B like 'Peter %'")
The LIKE keyword allows use of wildcard % to represent characters relative to the known parts of the searched string.
See the query reference: developers.google.com/chart/interactive/docs/querylanguage You could split firstname and lastname into separate columns, then only search for firstnames exactly equal to 'Peter'. Though you may want to also check if lowercase/uppercase where lower(B) contains 'peter' or whitespaces are present in unexpected places (e.g., trim()). You could also search only for values that start with Peter by using starts with instead of contains, or a regular expression using matches. – Brian D
It seems that for my case using 'starts with' is a perfect fit. Thank you!

Teradata:Get a specific part of a large variable text field

My first Post: (be kind )
PROBLEM: I need to extract the View Name from a Text field that contains a full SQL Statements so I can link the field a different data source. There are two text strings that always exist on both sides of the target view. I was hoping to use these as identifying "anchors" along with a substring to bring in the View Name text from between them.
EXAMPLE:
from v_mktg_dm.**VIEWNAME** as lead_sql
(UPPER CASE/BOLD is what I want to extract)
I tried using
SELECT
SUBSTR(SQL_FIELD,INSTR(SQL_FIELD,'FROM V_MKTG_TRM_DM.',19),20) AS PARSED_FIELD
FROM DATABASE.SQL_STORAGE_DATA
But am not getting good results -
Any help is appreciated
You can apply a Regular Expression:
RegExp_Substr_gpl(SQL_FIELD, '(v_mktg_dm\.)(.*?)( AS lead_sql)',1,1,'i',2)
This looks for the string between 'v_mktg_dm.' and ' AS lead_sql'.
RegExp_Substr_gpl is an undocumented variation of RegExp_Substr which simplifies the syntax for ignoring parts of the match

Custom order on string

I have a project model. Projects have a code attribute, which is in AAXXXX-YY format like "AA0001-18", "ZA0012-19", where AA is two characters, XXXX is a progressive number, and YY is the last two digits of the year of its creation.
I need to define a default scope that orders projects by code in a way that the year takes precedence over the other part. Supposing I have the codes "ZZ0001-17", "AA0001-18", and "ZZ002-17", "ZZ001-17" is first, "ZZ002-17" is second, and "AA001-18" is third.
I tried:
default_scope { order(:code) }
but I get "AA001-18" first.
Short answer
order("substring(code from '..$') ASC, code ASC")
Wait but why?
So as you said, you want to basically sort by 2 things:
the last 2 characters in the code string. YY
the rest of the code AAXXXX-
So first things first,
the order function as per Rails documentation will take the arguments you added and use them in the ORDER BY clause of the query.
Then, the substring function according to the documentation of PostgreSQL is:
substring(string from pattern)
If we want 2 characters .. from the end of the string $ we use ..$
Hence, substring(code from '..$')
For more information about pattern matching please refer to the documentation here.
Now finally, with the second part of our ordering the code which already will act as a sorter for all the preceding characters AAXXXX-.

How to sort a list of 1million records by the first letter of the title

I have a table with 1 million+ records that contain names. I would like to be able to sort the list by the first letter in the name.
.. ABCDEFGHIJKLMNOPQRSTUVWXYZ
What is the most efficient way to setup the db table to allow for searching by the first character in the table.name field?
The best idea right now is to add an extra field which stores the first character of the name as an observer, index that field and then sort by that field. Problem is it's no longer necessarily alphabetical.
Any suggestions?
You said in a comment:
so lets ignore the first letter part. How can I all records that start with A? All A's no B...z ? Thanks – AnApprentice Feb 21 at 15:30
I issume you meant How can I RETURN all records...
This is the answer:
select * from t
where substr(name, 1, 1) = 'A'
I agree with the questions above as to why you would want to do this -- a regular index on the whole field is functionally equivalent. PostgreSQL (with some new ones in v. 9) has some rather powerful indexing capabilities for special cases which you might want to read about here http://www.postgresql.org/docs/9.1/interactive/sql-createindex.html

Resources