To explain my problem I use this example data set:
SampleID Date Project Problem
03D00173 03-Dec-2010 1,00
03D00173 03-Dec-2010 1,00
03D00173 28-Sep-2009 YNTRAD
03D00173 28-Sep-2009 YNTRAD
Now, the problem is that I need to replace the text "YNTRAD" with "YNTRAD_PILOT" but only for the cases with Date = 28-Sep-2009.
This is example is part of a much larger database, with many more cases having Project=YNTRAD and Data=28-Sep-2009, so I can not simply select first all cases with 28-Sep-2009, then check which of these cases have Project=YNTRAD and then replace. Instead, what I need to do is:
Look at each case that has a 1,00 in Problem (these are problem
cases)
Then find the SampleID that corresponds with that sample
Then find all other cases with the same SampleID BUT WITH
Date=28-Sep-2009 (this is needed because only those samples are part
of a pilot study) and then replace YNTRAD in Project to
YNTRAD_PILOT.
I read a lot about:
LOOP
- DO REPEAT
- DO IF
but I don't know how to use these in solving this problem.
I first tried making a list containing only the sample ID's that need eventually to be changed (again, this is part of a much larger database).
STRING SampleID2 (A20).
IF (Problem=1) SampleID2=SampleID.
EXECUTE.
AGGREGATE
/OUTFILE=*
/BREAK=SampleID2
/n_SampleID2=N.
This gives a dataset with only the SampleID's for which a change should be made. However I don't know how to read out this dataset case by case and looking up each SampleID in the overall file with all the date and then change only those cases were Date = 28-Sep-2009.
It sounds like once we can identify the IDs that need to be changed we've done the tricky part here. We can use AGGREGATE with MODE=ADDVARIABLES to add a problem Id counter variable to our dataset. From there, it's as you'd expect.
* Add var IdProblemCnt to your database . Stores # of times a given Id had a record with Problem = 1.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=SampleId
/IdProblemCnt=CIN(Problem, 1, 1) .
EXE .
* once we've identified the "problem" Ids we can use `RECODE` Project var.
DO IF (IdProblemCnt>0 AND Date = DATE.MDY(9,28,2009) .
RECODE Project ('YNTRAD' = 'YNTRAD_PILOT') .
END IF .
EXE .
I have a list of email addresses in SPSS. I'm trying to write syntax to count how many times each email address appears.
For instance:
In my desired output, if johndoe#aol.com appears in the data 3 times, I want all instances of his email to show a 3 in my new column.
I know I can write syntax to have it count (ie johndoe#aol.com will be assigned 1 the first time, then 2 then 3)... but this is not what I want.
Thanks!
Steps to do this:
Sort cases by email.
Get the counts using the Aggregate command.
Use the Identify Duplicate Cases command to generate an indicator of whether a given email is the first of its kind in the file.
Select cases that aren't the first with that particular email.
All four of those commands are in the Data menu in the GUI. Syntax to do the whole thing:
SORT CASES BY Email.
*This will create a new variable N_EMAIL with the counts. It will appear for every case.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/PRESORTED
/BREAK=Email
/N_EMAIL=N.
*Now we generate a "PrimaryFirst" indicator showing whether a given case is the first instance of its email.
MATCH FILES
/FILE=*
/BY Email
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryLast InDupGrp MatchSequence.
EXECUTE.
*Filter out duplicate cases.
SELECT IF PrimaryFirst = 1.
EXECUTE.
*Final cleanup.
DELETE VARIABLES PrimaryFirst.
Just run this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=EmailAddress /num_instances=N.
A new column will appear in the dataset called num_instances (you can of course select another name) which will have the desired count appear in all instances of each Email address.
How can I convert String Time into Integer in Neo4j. I have been trying for quite a long time, but there does not seem to be any clear solution for it. Only solution, I found is to add a new property to my node while loading it from CSV. I don't want to do this.
I have Time in the following format:
"18:11:00"
and I want to do some subtraction on them.
I tried doing the following but to no avail:
match (st1:Stoptime)-[r:PRECEDES]->(st2:Stoptime)
return st1.arrival_time, toInt(st1.arrival_time)
limit 10
But I get following output:
"18:11:00" null
You can install APOC procedures and do it using the function apoc.date.parse:
return apoc.date.parse('18:11:00','s','HH:mm:ss')
Running this example the output will be:
╒════════════════════════════════════════════╕
│"apoc.date.parse("18:11:00",'s','HH:mm:ss')"│
╞════════════════════════════════════════════╡
│65460 │
└────────────────────────────────────────────┘
The first parameter is the date/time to be parsed. The second parameter is the target time unit. In this case I have specified seconds (s). The third parameter indicates the date/time format of the first parameter.
Note: remember to install APOC procedures according the version of Neo4j you are using. Take a look in the version compatibility matrix.
I want an alternative to running frequency for string variables because I also want to get a case number for each of the string value (I have a separate variable for case ID).
After reviewing the string values I will need to find them to recode which is the reason I need to know the case number.
I know that PRINT command should do what I want but I get an error - is there any alternative?
PRINT / id var2 .
EXECUTE.
>Error # 4743. Command name: PRINT
>The line width specified exceeds the output page width or the record length or
>the maximum record length of 2147483647. Reduce the number of variables or
>split the output line into several records.
>Execution of this command stops.
Try the LIST command.
I often use the TEMPORARY commond prior to the LIST command, as often there is only a small select of record of interest I may want to "list"/investigate.
For example, in the below, only to list the records where VAR2 is not a blank string.
TEMP.
SELECT IF (len(VAR2)>0).
LIST ID VAR2.
Alternatively, you could also (but dependent on having CUSTOM TABLES add-on module), do something like below which would get the results into a tabular format also (which may be preferable if then exporting to Excel, for example.
CTABLES /TABLE CTABLES /VLABELS VARIABLES=ALL DISPLAY=NONE
/TABLE A[C]>B[C]
/CATEGORIES VARIABLES=ALL EMPTY=EXCLUDE.
After having the unpleasant surprise that Comma Seperated Value (CSV) files are not necessarily comma-separated, I'm trying to find out if there is any way to detect what the regional settings list separator value is on the client machine from http request.
Scenario is as follows: A user can download some data in CSV format from web site (RoR, if it matters). That CSV file is generated on the fly, sent to the user, and most of the time double-clicked and opened in MS Excel on Windows machine at the destination. Now, if the user has ',' set as the list separator, the data is properly arranged in columns, but if any other separator (';' is widely used here) is set, it all just gets thrown into a single column. So, is there any way to detect what separator is used on the client machine, and generate the file accordingly?
I have a sinking feeling that it is not, but I'd like to be sure before I pass the 'can't be done, sorry' line to the customer :)
Here's a JavaScript solution that I just wrote based on the method shown here:
function getListSeparator() {
var list = ['a', 'b'], str;
if (list.toLocaleString) {
str = list.toLocaleString();
if (str.indexOf(';') > 0 && str.indexOf(',') == -1) {
return ';';
}
}
return ',';
}
The key is in the toLocaleString() method that uses the system list separator.
You could use JavaScript to get the list separator and set it in a cookie which you could then detect from your server.
I checked all the Windows Locales, and it seems that the default list separator is virtually always either ',' or ';'. For some locales the drop-down list in the Control Panel offers both options; for others it offers just ','. One locale, Divehi, has a strange character that I've not seen before as the list separator, and, for any locale, it is possible for the user to enter any string they want as the list separator.
Putting random strings as the separator in a CSV file sounds like trouble to me, so my function above will only return either a ';' or a '.', and it will only return a ';' if it can't find a ',' in the Array.toLocaleString string. I'm not entirely sure about whether array.toLocaleString has a format that's guaranteed across browsers, hence the indexOf checks rather than picking out a character at a specific index.
Using Array.toLocaleString to get the list separator works on IE6, IE7, and IE8, but unfortunately it doesn't seem to work on Firefox, Safari, Opera, and Chrome (or at least the versions of those browsers on my computer): they all seem to separate array items with a comma, irrespective of the Windows "list separator" setting.
Also worth noting that by default Excel seems to use the system "decimal separator" when it's parsing numbers out of CSV files. Yuk. So, if you're localizing the list separator you might want to localize the decimal separator too.
I think everyone should use Calc from OpenOffice - it asks when you open a file about encoding, column separators and other. I don't know answer for your question, but maybe you can try to send data in html tables or in xml - excel should read both of them correctly. From my experience it isn't easy to export data to excel. Few weeks ago I have problem with it and after few hours of work I asked a person, who couldn't open my csv file in excel, about version. It was Excel 98...
Take a look on html example and xml.
The simplier version of getListSeparator function, enabling any character to be a separator:
function getListSeparator_bis()
{
var list = ['a', 'b'];
return(list.toLocaleString().charAt(1));
}// getListSeparator_bis
Just set any char (f.e. '#') as list separator in your OS and try the code as above. The appropriate char (i.e. '#' if set as sugested) is returned.
Could you just have the users with non comma separators set a profile kind of option and then generate CSVs based on user settings with the default being commas?
Toms, as far as I'm aware there is no way of achieving what you're after. The most you can do is try and detect the user locale and map it against a database of locales/list separators, altering the list separator in the .CSV file as a result.