I have a file with more than 250 variables and more than 100 cases. Some of these variables have an error in decimal dot (20445.12 should be 2.044512).
I want to modify programatically these data, I found a possible way in a Visual Basic editor provided by SPSS (I show you a screen shot below), but I have an absolute lack of knowledge.
How can I select a range of cells in this language?
How can I store the cell once modified its data?
--- EDITED NEW DATA ----
Thank you for your fast reply.
The problem now its the number of digits that number has. For example, error data could have the following format:
Case A) 43998 (five digits) ---> 4.3998 as correct value.
Case B) 4399 (four digits) ---> 4.3990 as correct value, but parsed as 0.4399 because 0 has been removed when file was created.
Is there any way, like:
IF (NUM < 10000) THEN NUM = NUM / 1000 ELSE NUM = NUM / 10000
Or something like IF (Number_of_digits(NUM)) THEN ...
Thank you.
there's no need for VB script, go this way:
open a syntax window, paste the following code:
do repeat vr=var1 var2 var3 var4.
compute vr=vr/10000.
end repeat.
save outfile="filepath\My corrected data.sav".
exe.
Replace var1 var2 var3 var4 with the names of the actual variables you need to change. For variables that are contiguous in the file you may use var1 to var4.
Replace vr=vr/10000 with whatever mathematical calculation you would like to use to correct the data.
Replace "filepath\My corrected data.sav" with your path and file name.
WARNING: this syntax will change the data in your file. You should make sure to create a backup of your original in addition to saving the corrected data to a new file.
Suppose, I have following sample ARFF file with two attributes:
(1) sentiment: positive [1] or negative [-1]
(2) tweet: text
#relation sentiment_analysis
#attribute sentiment {1, -1}
#attribute tweet string
#data
-1,'is upset that he can\'t update his Facebook by texting it... and might cry as a result School today also. Blah!'
-1,'#Kenichan I dived many times for the ball. Managed to save 50\% The rest go out of bounds'
-1,'my whole body feels itchy and like its on fire '
-1,'#nationwideclass no, it\'s not behaving at all. i\'m mad. why am i here? because I can\'t see you all over there. '
-1,'#Kwesidei not the whole crew '
-1,'Need a hug '
1,'#Cliff_Forster Yeah, that does work better than just waiting for it In the end I just wonder if I have time to keep up a good blog.'
1,'Just woke up. Having no school is the best feeling ever '
1,'TheWDB.com - Very cool to hear old Walt interviews! ? http://blip.fm/~8bmta'
1,'Are you ready for your MoJo Makeover? Ask me for details '
1,'Happy 38th Birthday to my boo of alll time!!! Tupac Amaru Shakur '
1,'happy #charitytuesday #theNSPCC #SparksCharity #SpeakingUpH4H '
I want to convert the values of second attribute into equivalent TF-IDF values.
Btw, I tried following code but its output ARFF file doesn't contain first attribute for positive(1) values for respective instances.
// Set the tokenizer
NGramTokenizer tokenizer = new NGramTokenizer();
tokenizer.setNGramMinSize(1);
tokenizer.setNGramMaxSize(1);
tokenizer.setDelimiters("\\W");
// Set the filter
StringToWordVector filter = new StringToWordVector();
filter.setAttributeIndicesArray(new int[]{1});
filter.setOutputWordCounts(true);
filter.setTokenizer(tokenizer);
filter.setInputFormat(inputInstances);
filter.setWordsToKeep(1000000);
filter.setDoNotOperateOnPerClassBasis(true);
filter.setLowerCaseTokens(true);
filter.setTFTransform(true);
filter.setIDFTransform(true);
// Filter the input instances into the output ones
outputInstances = Filter.useFilter(inputInstances, filter);
Sample output ARFF file:
#data
{0 -1,320 1,367 1,374 1,397 1,482 1,537 1,553 1,681 1,831 1,1002 1,1033 1,1112 1,1119 1,1291 1,1582 1,1618 1,1787 1,1810 1,1816 1,1855 1,1939 1,1941 1}
{0 -1,72 1,194 1,436 1,502 1,740 1,891 1,935 1,1075 1,1256 1,1260 1,1388 1,1415 1,1579 1,1611 1,1818 2,1849 1,1853 1}
{0 -1,374 1,491 1,854 1,873 1,1120 1,1121 1,1197 1,1337 1,1399 1,2019 1}
{0 -1,240 1,359 2,369 1,407 1,447 1,454 1,553 1,1019 1,1075 3,1119 1,1240 1,1244 1,1373 1,1379 1,1417 1,1599 1,1628 1,1787 1,1824 1,2021 1,2075 1}
{0 -1,198 1,677 1,1379 1,1818 1,2019 1}
{0 -1,320 1,1070 1,1353 1}
{0 -1,210 1,320 2,477 2,867 1,1020 1,1067 1,1075 1,1212 1,1213 1,1240 1,1373 1,1404 1,1542 1,1599 1,1628 1,1815 1,1847 1,2067 1,2075 1}
{179 1,1815 1}
{298 1,504 1,662 1,713 1,752 1,1163 1,1275 1,1488 1,1787 1,2011 1,2075 1}
{144 1,785 1,1274 1}
{19 1,256 1,390 1,808 1,1314 1,1350 1,1442 1,1464 1,1532 1,1786 1,1823 1,1864 1,1908 1,1924 1}
{84 1,186 1,320 1,459 1,564 1,636 1,673 1,810 1,811 1,966 1,997 1,1094 1,1163 1,1207 1,1592 1,1593 1,1714 1,1836 1,1853 1,1964 1,1984 1,1997 2,2058 1}
{9 1,1173 1,1768 1,1818 1}
{86 1,935 1,1112 1,1337 1,1348 1,1482 1,1549 1,1783 1,1853 1}
As you can see that first few instances are okay(as they contains -1 class along with other features), but the last remaining instances don't contain positive class attribute(1).
I mean, there should have been {0 1,...} as very first attribute in the last instances in output ARFF file, but it is missing.
You have to specify which is your class attribute explicitly in the java program as, when you apply StringToWordVector filter your input gets divided among specified n-grams. Hence class attribute location changes once StringToWordVector vectorizes the input. You can just use Reorder filer which will ultimately place class attribute at last position and Weka will pick last attribute as class attribute.
More info about Reordering in Weka can be found at http://weka.sourceforge.net/doc.stable-3-8/weka/filters/unsupervised/attribute/Reorder.html. Also example 5 at http://www.programcreek.com/java-api-examples/index.php?api=weka.filters.unsupervised.attribute.Reorder may help you in doing reordering.
Hope it helps.
Your process for obtaining TF-IDF seems correct.
According to my experiments, if you have n classes, Weka shows information labels for records for each n-1 classes and records for the nth class are implied.
In your case, you have 2 classes -1 and 1, so weka is showing labels in records with class label -1 and records with label 1 are implied.
I have a VB6 form with a text boxes for minimum and maximum values. The text boxes have a MaxLength of 4, and I have code for the keyPress event to limit it to numeric entry. The code checks to make sure that max > min, however it is behaving very strangely. It seems to be comparing the values in scientific notation or something. For example, it evaluates 30 > 200 = true, and 100 > 20 = false. However if I change the entries to 030 > 200 and 100 > 020, then it gives me the correct answer. Does anyone know why it would be acting this way?
My code is below, I am using control arrays for the minimum and maximum text boxes.
For cnt = 0 To 6
If ParameterMin(cnt) > ParameterMax(cnt) Then
MsgBox ("Default, Min, or Max values out of range. Line not updated.")
Exit Sub
End If
Next cnt
That is how text comparison behaves for numbers represented as variable length text (in general, not just VB6).
Either pad with zeros to a fixed length and continue comparing as text (as you noted)
OR
(preferable) Convert to integers and then compare.
If I understood correctly, you can alter the code to
If Val(ParameterMin(cnt)) > Val(ParameterMax(cnt)) Then
I wish to advise one thing -(IMHO...) if possible, avoid checking data during key_press/key_up/key_down .
Can you change the GUI to contain a "submit" button and check your "form" there ?
Hope I helped...
I'm writing a Wireshark dissector in lua and trying to decode a time-based protocol field.
I've two components 1)
local ref_time = os.time{year=2000, month=1, day=1, hour=0, sec=0}
and 2)
local offset_time = tvbuffer(0:5):bytes()
A 5-Byte (larger than uint32 range) ByteArray() containing the number of milliseconds (in network byte order) since ref_time. Now I'm looking for a human readable date. I didn't know this would be so hard, but 1st it seems I cannot simple add an offset to an os.time value and 2nd the offset exceeds Int32 range ...and most function I tested seem to truncate the exceeding input value.
Any ideas on how I get the date from ref_time and offset_time?
Thank you very much!
Since ref_time is in seconds and offset_time is in milliseconds, just try:
os.date("%c",ref_time+offset_time/1000)
I assume that offset_time is a number. If not, just reconstruct it using arithmetic. Keep in mind that Lua uses doubles for numbers and so a 5-byte integer fits just fine.