How can I filter a SET by its concatenated values according to another SET in Redis - lua

I have a filter optimization problem in Redis.
I have a Redis SET which keeps the doc and pos pairs of a type in a corpus.
example:
smembers type_in_docs.1
result: doc.pos pairs
array (size=216627)
0 => string '2805.2339' (length=9)
1 => string '2410.14208' (length=10)
2 => string '3516.1810' (length=9)
...
I create another Redis set live, according to user choices.
It contains the selected docs.
smembers filteredDocs
I want to filter the doc.pos pairs in the "type_in_docs" set according to the user's doc ID choices.
In fact, if I didn't use concatenated values in the set, this would be easy with SINTER.
So I implemented a PHP filter as below.
It works but needs optimization.
With a big doc.pos set it takes too much time (noticeably so above roughly 150000 members).
$concordance= $this->redis->smembers('types_in_docs.'.$typeID);
$filteredDocs= $this->redis->smembers('filteredDocs');
$filtered = array_filter($concordance, function($pairs) use ($filteredDocs) {
return in_array(substr($pairs, 0, strpos($pairs, '.')), $filteredDocs);
});
I tried a sorted set with the docId as the score.
But I couldn't find an intersect or filter option for score values.
I am looking for a Redis-based solution using keys, sets or a Lua script to cut the time.
But I found nothing.
How can I filter Redis sets with concatenated values?
Thanks for any help.

Your code is slow primarily because you're moving a lot of data from Redis to your PHP filter. The general motivation here should be to perform as much filtering as possible on the server. To do that you'd need to pay some sort of price in CPU & RAM.
There are many ways to do this, here's one:
Ensure you're using Redis v2.8.9 or above.
To look up by doc only, and do so efficiently, keep your doc.pos pairs as they are but store them in a Sorted Set with score = 0. For your example:
ZADD type_in_docs.1 0 2805.2339 0 2410.14208 0 3516.1810
This will allow you to mimic SISMEMBER for a doc in the set with a prefix range query. Include the "." separator in the prefix so that doc 2805 does not also match 28051.x:
ZRANGEBYLEX type_in_docs.1 [<$docID + "."> (<$docID + ".\xff">
You can now just SMEMBERS the (usually) smaller filteredDocs set and then call ZRANGEBYLEX for each doc in it for immediate gains.
If you want to do better: in extreme cases (i.e. large filteredDocs, small type_in_docs) you should do the reverse.
If you want to do even better, use Lua to wrap up the filtering logic - something like:
-- #usage: redis-cli --eval filter_doc_pos.lua <filter set keyname> <type pairs keyname>
-- #returns: list of matching doc.pos pairs
local r = {}
for _, fv in pairs(redis.call("SMEMBERS", KEYS[1])) do
  -- match on "<doc>." so that doc 2805 does not also match 28051.x;
  -- "\255" is byte 0xff (Lua 5.1, as embedded in Redis, has no "\x" escapes)
  local t = redis.call("ZRANGEBYLEX", KEYS[2], "[" .. fv .. ".", "(" .. fv .. ".\255")
  for _, tv in pairs(t) do
    r[#r+1] = tv
  end
end
return r
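
The prefix-range idea behind ZRANGEBYLEX can also be tried outside Redis. The following is only a minimal Python sketch, assuming the pairs are plain ASCII strings kept in sorted order; range_by_prefix and filter_pairs are hypothetical helper names, and the binary searches stand in for what the sorted set does on the server:

```python
from bisect import bisect_left


def range_by_prefix(sorted_members, prefix):
    """Return every member that starts with prefix, using two binary searches."""
    lo = bisect_left(sorted_members, prefix)
    # "\xff" sorts after every ASCII character, so it closes the prefix range.
    hi = bisect_left(sorted_members, prefix + "\xff")
    return sorted_members[lo:hi]


def filter_pairs(pairs, filtered_docs):
    """Keep only the doc.pos pairs whose doc part is in filtered_docs."""
    members = sorted(pairs)  # a sorted set with equal scores is ordered like this
    result = []
    for doc in filtered_docs:
        # The trailing "." stops doc 2805 from also matching 28051.x.
        result.extend(range_by_prefix(members, doc + "."))
    return result


pairs = ["2805.2339", "2410.14208", "3516.1810", "28051.7"]
matches = filter_pairs(pairs, ["2805", "3516"])
print(matches)
```

Each lookup costs two binary searches plus the size of the matching slice, which is why iterating over the smaller of the two sets pays off.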

Related

Issue returning desired data with Lua

Wondering if I could get some help with this:
function setupRound()
local gameModes = {'mode 1','mode 2','mode 3'} -- Game modes
local maps = {'map1','map2','map3'}
--local newMap = maps[math.random(1,#maps)]
local mapData = {maps[math.random(#maps)],gameModes[math.random(#gameModes)]}
local mapData = mapData
return mapData
end
a = setupRound()
print(a[1],a[2]) --Fix from Egor
What the problem is:
When trying to get the info from setupRound() I get table: 0x18b7b20
How I am trying to get mapData:
a = setupRound()
print(a)
Edit:
Output Issues
With the current script I will always get the following output: map3 mode 2.
What is the cause of this?
Efficiency; is this the best way to do it?
While this really isn't a question, I just wanted to know if this method that I am using is truly the most efficient way of doing this.
First of all
this line does nothing useful and can be removed (it does something, just not something you'd want)
local mapData = mapData
Output Issues
The problem is math.random. Write a script that's just print(math.random(1,100)) and run it 100 times. It will print the same number each time. This is because Lua, by default, does not set its random seed on startup. The easiest way is to call math.randomseed(os.time()) at the beginning of your program.
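
The effect of a fixed seed is easy to reproduce in other languages too. Below is a minimal Python sketch (used only for illustration; pick_round and the seed value 42 are hypothetical). Python's random module seeds itself automatically, so the sketch passes an explicit seed to imitate an unseeded Lua program:

```python
import random


def pick_round(seed):
    """Pick a map and a mode from a generator started with the given seed."""
    rng = random.Random(seed)  # same seed, same sequence of "random" picks
    maps = ["map1", "map2", "map3"]
    modes = ["mode 1", "mode 2", "mode 3"]
    return rng.choice(maps), rng.choice(modes)


# Two runs with an identical seed always agree, like repeated runs of the
# unseeded Lua script.
same_a = pick_round(42)
same_b = pick_round(42)
print(same_a, same_b)
```

Seeding from the clock, as math.randomseed(os.time()) does, simply makes the seed differ between runs.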
Efficiency; is this the best way to do it?
Depends. For what you seem to want, yes, it's definitely efficient enough. If anything, I'd change it to the following to avoid magic numbers which will make it harder to understand the code in the future.
--- etc.
local mapData = {
map = maps[math.random(#maps)],
mode = gameModes[math.random(#gameModes)]
}
-- etc.
print(a.map, a.mode)
And remember:
Premature optimization is the root of all evil.
— Donald Knuth
You did very well by creating a separate function for generating your modes and maps. This keeps the code modular and neat.
Now, you have your game modes in a table modes = {} (which is basically a list of strings).
And you have your maps in another table maps = {}.
Each of the table items has a key that, when omitted, becomes a number counted upwards. In your case, there are 3 items in modes and 3 items in maps, so the keys would be 1, 2, 3. The key is used to grab a certain item in that table (=list). E.g. maps[2] would grab the second item in the maps table, whose value is map 2. The same applies to the modes table. Hence the output you asked about.
To get a random game mode, you just call math.random(#modes). math.random can accept up to two parameters. With these you define the range to pick the random number from. You can also pass a single parameter, in which case Lua assumes you want to start at 1. So math.random(3) actually becomes math.random(1, 3). #modes in this case stands for "count all game modes in that table and give me that count", which is 3.
To return your chosen map and game mode from that function we could use another table, just to hold both values. This time however the table would have different keys to access the values inside it; namely "map" and "mode".
Complete example would be:
local function setupRound()
local modes = {"mode 1", "mode 2", "mode 3"} -- different game modes
local maps = {"map 1", "map 2", "map 3"} -- different maps
return {map = maps[math.random(#maps)], mode = modes[math.random(#modes)]}
end
for i = 1, 10 do
local freshRound = setupRound()
print(freshRound.map, freshRound.mode)
end

Other ways to call/eval dynamic strings in Lua?

I am working with a third party device which has some implementation of Lua, and communicates in BACnet. The documentation is pretty janky, not providing any sort of help for any more advanced programming ideas. It's simply, "This is how you set variables...". So, I am trying to just figure it out, and hoping you all can help.
I need to set a long list of variables to certain values. I have a userdata 'ME', with a bunch of variables named MVXX (e.g. - MV21, MV98, MV56, etc).
(This is all kind of background for BACnet.) Variables in BACnet all have 17 'priorities', i.e., every BACnet variable is actually a sort of list of 17 values, with priority 16 being the default. So, typically, if I were to say ME.MV12 = 23, that would set MV12's priority-16 to the desired value of 23.
However, I need to set priority 17. I can do this in the provided Lua implementation, by saying ME.MV12_PV[17] = 23. I can set any of the priorities I want by indexing that PV. (Corollaries - what is PV? What is the underscore? How do I get to these objects? Or are they just interpreted from Lua to some function in C on the backend?)
All this being said, I need to make that variable name dynamic, so that I can set whichever value I need to set, based on some other code. I have made several attempts.
This tells me the object(MV12_PV[17]) does not exist:
x = 12
ME["MV" .. x .. "_PV[17]"] = 23
But this works fine, setting priority 16 to 23:
x = 12
ME["MV" .. x] = 23
I was trying to attempt some sort of what I think is called an evaluation, or eval. But, this just prints out function followed by some random 8 digit number:
x = 12
test = assert(loadstring("MV" .. x .. "_PV[17] = 23"))
print(test)
Any help? Apologies if I am unclear - tbh, I am so far behind the 8-ball I am pretty much grabbing at straws.
Underscores can be part of Lua identifiers (variable and function names). They are just part of the variable name (like letters are) and aren't a special Lua operator like [ and ] are.
In the expression ME.MV12_PV[17] we have ME being an object with a bunch of fields, ME.MV12_PV being an array stored in the "MV12_PV" field of that object and ME.MV12_PV[17] is the 17th slot in that array.
If you want to access fields dynamically, the thing to know is that accessing a field with dot notation in Lua is equivalent to using bracket notation and passing in the field name as a string:
-- The following are all equivalent:
x.foo
x["foo"]
local fieldname = "foo"
x[fieldname]
So in your case you might want to try doing something like this:
local n = 12
ME["MV"..n.."_PV"][17] = 23
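
The same distinction between "the field name" and "the index into the field's value" exists in most languages. Here is a minimal Python sketch of it; the Device class is a hypothetical stand-in for the ME userdata and is not the vendor's API:

```python
class Device:
    """Hypothetical stand-in for the ME userdata; not the vendor's API."""

    def __init__(self):
        # Slots 0..17 so that index 17 exists; slot 0 is unused here.
        self.MV12_PV = [None] * 18


me = Device()
n = 12

# Wrong: the whole string is treated as one field name, brackets included.
try:
    getattr(me, f"MV{n}_PV[17]")
    lookup_failed = False
except AttributeError:
    lookup_failed = True

# Right: build only the field name, then index the value that comes back.
getattr(me, f"MV{n}_PV")[17] = 23
print(lookup_failed, me.MV12_PV[17])
```

This mirrors why the first attempt in the question reports that the object does not exist: the "[17]" part ended up inside the name being looked up.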
BACnet "Commandable" Objects (e.g. Binary Output, Analog Output, and optionally Binary Value, Analog Value and a handful of others) actually have 16 priorities (1-16). The "17th" you are referring to may be the "Relinquish Default", a value that is used if all 16 priorities are set to NULL or "Relinquished".
Perhaps your system will allow you to write to a BACnet Property called "Relinquish Default".

How to combine search text with other criteria using Redis?

I successfully wrote an intersection of text search and other criteria using Redis. To achieve that I'm using a Lua script. The issue is that I'm not only reading, but also writing values from that script. From Redis 3.2 it's possible to achieve that by calling redis.replicate_commands(), but not before 3.2.
Below is how I'm storing the values.
Names
> HSET product:name 'Cool product' 1
> HSET product:name 'Nice product' 2
Price
> ZADD product:price 49.90 1
> ZADD product:price 54.90 2
Then, to get all products that matches 'ice', for example, I call:
> HSCAN product:name 0 MATCH *ice*
However, since HSCAN uses a cursor, I have to call it multiple times to fetch all results. This is where I'm using a Lua script:
local cursor = 0
local fields = {}
local ids = {}
local key = 'product:name'
local value = '*' .. ARGV[1] .. '*'
repeat
local result = redis.call('HSCAN', key, cursor, 'MATCH', value)
cursor = tonumber(result[1])
fields = result[2]
for i, id in ipairs(fields) do
if i % 2 == 0 then
ids[#ids + 1] = id
end
end
until cursor == 0
return ids
It's not possible to use the result of a script in another call, like SADD key EVAL(SHA) ..., and it's also not possible to use global variables within scripts. So I changed the part inside the fields loop to make the list of IDs accessible outside the script:
if i % 2 == 0 then
ids[#ids + 1] = id
redis.call('SADD', KEYS[1], id)
end
I had to add redis.replicate_commands() to the first line. With this change I can get all ID's from the key I passed when calling the script (see KEYS[1]).
And, finally, to get a list of 100 product IDs priced between 40 and 50 where the name contains "ice", I do the following:
> ZUNIONSTORE tmp:price 1 product:price WEIGHTS 1
> ZREMRANGEBYSCORE tmp:price 0 40
> ZREMRANGEBYSCORE tmp:price 50 +INF
> EVALSHA b81c2b... 1 tmp:name ice
> ZINTERSTORE tmp:result 2 tmp:price tmp:name
> ZCOUNT tmp:result -INF +INF
> ZRANGE tmp:result 0 99
I use the ZCOUNT call to know in advance how many result pages I'll have, doing count / 100.
As I said before, this works nicely with Redis 3.2. But when I tried to run the code at AWS, which only supports Redis up to 2.8, I couldn't make it work anymore. I'm not sure how to iterate with the HSCAN cursor without using a script or without writing from the script. Is there a way to make it work on Redis 2.8?
Some considerations:
I know I can do part of the processing outside Redis (like iterate the cursor or intersect the matches), but it'll affect the application overall performance.
I don't want to deploy a Redis instance by my own to use version 3.2.
The criteria above (price range and name) is just an example to keep things simple here. I have other fields and type of matches, not only those.
I'm not sure if the way I'm storing the data is the best way. I'm willing to listen to suggestions about it.
The only problem I see here is storing the values from inside the Lua script. Instead, have the script return the values (as a string[]) and store them in a set from the client in a separate call, using SADD (key, members[]). Then proceed with the intersection and return the results.
> ZUNIONSTORE tmp:price 1 product:price WEIGHTS 1
> ZREMRANGEBYSCORE tmp:price 0 40
> ZREMRANGEBYSCORE tmp:price 50 +INF
> nameSet[] = EVALSHA b81c2b... 0 ice
> SADD tmp:name nameSet
> ZINTERSTORE tmp:result 2 tmp:price tmp:name
> ZCOUNT tmp:result -INF +INF
> ZRANGE tmp:result 0 99
IMO your design is already a good one. One piece of advice would be to use pipelining wherever possible, so that the commands are sent in one go.
Hope this helps
UPDATE
There is no such thing as an array ([ ]) in Lua; you have to use a Lua table instead. Your script already returns ids, which reaches the client as an array, so you can pass it to SADD in a separate call.
String[] nameSet = (String[]) evalsha b81c2b... 0 ice -> This is in Java
SADD tmp:name nameSet
And the corresponding Lua script is the same as your first one.
local cursor = 0
local fields = {}
local ids = {}
local key = 'product:name'
local value = '*' .. ARGV[1] .. '*'
repeat
local result = redis.call('HSCAN', key, cursor, 'MATCH', value)
cursor = tonumber(result[1])
fields = result[2]
for i, id in ipairs(fields) do
if i % 2 == 0 then
ids[#ids + 1] = id
end
end
until cursor == 0
return ids
The problem isn't that you're writing to the database, it's that you're doing a write after an HSCAN, which is a non-deterministic command.
In my opinion there's rarely a good reason to use a SCAN command in a Lua script. The main purpose of the command is to allow you to do things in small batches so you don't lock up the server processing a huge key space (or hash key space). Since scripts are atomic, though, using HSCAN doesn't help—you're still locking up the server until the whole thing's done.
Here are the options I can see:
If you can't risk locking up the server with a lengthy command:
Use HSCAN on the client. This is the safest option, but also the slowest.
If you want to do as much processing in a single atomic Lua command as possible:
Use Redis 3.2 and script effects replication.
Do the scanning in the script, but return the values to the client and initiate the write from there. (That is, Karthikeyan Gopall's answer.)
Instead of HSCAN, do an HKEYS in the script and filter the results using Lua's pattern matching. Since HKEYS is deterministic you won't have a problem with the subsequent write. The downside, of course, is that you have to read in all of the keys first, regardless of whether they match your pattern. (Though HSCAN is also O(N) in the size of the hash.)
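
The "read everything, then filter" approach from the last option can be sketched without a server. This is only an illustration in Python, assuming the hash is small enough to read in full; the dict stands in for the product:name hash and matching_ids is a hypothetical helper. A plain substring test is used here, which in Lua would correspond to string.find with its plain flag set:

```python
def matching_ids(product_names, needle):
    """Return the IDs of all products whose name contains needle."""
    # Every field is inspected, so the cost grows with the size of the hash.
    return [pid for name, pid in product_names.items() if needle in name]


# Stand-in for the product:name hash from the question (field -> value).
product_names = {"Cool product": "1", "Nice product": "2"}

ids = matching_ids(product_names, "ice")
print(ids)
```

Because the filter no longer depends on cursor order, the same input always produces the same list, which is the property the write-after-read restriction is about.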

Stata: perform a foreach loop to calculate kappa across a large data file

I have a data file in Stata with 50 variables
j-r-hp j-p-hp j-m-hp p-c-hp p-r-hp p-p-hp p-m-hp ... etc,
I want to perform a weighted kappa between pairs, so that the first might be
kap j-r-hp j-p-hp, wgt(w2)
and the next would be
kap j-r-hp j-m-hp, wgt(w2)
I am new to Stata. Is there a straightforward way to use a loop for this, like a foreach loop?
Your variable names are not legal names in Stata, so I've changed the hyphens to underscores in the example below. Also, I don't know what it means to 'perform a weighted kappa', so my answer uses random normal variables and the corr[elate] command. You can use the results that Stata leaves behind in r() (see return list) to gather the results for the separate analyses.
The idea is to gather the variables in a list using a local, then to loop over each element in that list (but skipping the repeated pairs using continue). If you have many variables with structured names, you could instead use ds, which leaves r(varlist) in r(). Have a look at the help file for macros (help macro and help extended_fcn), especially the section on 'Macro extended functions for parsing'. Hope this helps.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var'=rnormal()
}
forval ii=1/`: word count `vars'' {
forval jj=1/`: word count `vars'' {
if `ii'>=`jj' continue
corr `: word `ii' of `vars'' `: word `jj' of `vars''
}
}
You can take advantage of the user-written command tuples (run ssc install tuples):
clear
set more off
*----- example data -----
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = abs(round(rnormal()*100))
}
*----- what you want -----
tuples `vars', min(2) max(2)
forvalues i = 1/`ntuples' {
display _newline(3) "variables `tuple`i''"
kappa `tuple`i''
}
How you get the variables names together to feed them into tuples will depend on the dataset.
This is a variation on the helpful answer by @Matthijs, but it really won't fit well into a comment. The main extra twists are
The use of tokenize to avoid repeated use of word # of. After tokenize the separate words of the argument (here separate variable names) are held in macros 1 up. Thus tokenize a b c puts a in local macro 1, b in local macro 2 and c in local macro 3. Nested macro references are treated exactly like parenthesised expressions in elementary algebra; what is on the inside is evaluated first.
Focusing directly on part of the notional matrix of results on one side of the diagonal. The small trick is to ensure that one matrix subscript exceeds the other subscript.
Random normal input doesn't make sense for kap, but you will be using your own data anyway.
clear
set obs 100
local vars j_r_hp j_p_hp j_m_hp p_c_hp p_r_hp p_p_hp p_m_hp
foreach var of local vars {
gen `var' = rnormal()
}
tokenize `vars'
local p : word count `vars'
local pm1 = `p' - 1
forval i = 1/`pm1' {
local ip1 = `i' + 1
forval j = `ip1'/`p' {
di "``i'' and ``j''"
kap ``i'' ``j''
di
}
}
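
The "one side of the diagonal" trick is language independent. As a minimal sketch in Python (chosen only for illustration, with a shortened hypothetical variable list), the nested index loops and the standard library's itertools.combinations yield the same unique pairs:

```python
from itertools import combinations

# Shortened, hypothetical variable list; the names follow the question.
variables = ["j_r_hp", "j_p_hp", "j_m_hp", "p_c_hp"]

# Index loops: j always starts one past i, so each pair appears once
# and no variable is paired with itself.
loop_pairs = [
    (variables[i], variables[j])
    for i in range(len(variables) - 1)
    for j in range(i + 1, len(variables))
]

# The same pairs, generated by the standard library.
pairs = list(combinations(variables, 2))

for first, second in pairs:
    print(first, "and", second)
```

With p variables this gives p(p-1)/2 pairs, so the 50 variables in the question would produce 1225 kap calls.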
I thought I might add my own answer in addition to highlight a few things.
The first thing to note is that for a new user, the most "straightforward" way to do it would likely involve hard-coding all variables into a local to use in a loop (as other answers suggest), or referencing them using a wildcard and writing more than one loop for each group. See the example below on how you might use a wildcard:
clear *
sysuse auto
/* Rename variables to match your .dta file and identify groups */
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/* Loop over all variables beginning with j and ending hp */
foreach x of varlist j*hp {
foreach i of varlist j*hp {
if "`x'" != "`i'" & "`i'" >= "`x'"{ // This section ensures you get only
// unique pairs of x & i
kap `x' `i'
}
}
}
/* Loop over all variables beginning with p and ending hp */
foreach x of varlist p*hp {
* something involving x
}
* etc.
Now, depending on how many groups you have or how many variables you have, this might not seem straightforward after all.
This brings up the second thing I would like to mention. In cases where hard-coding many variables or many repeated commands becomes cumbersome, I tend to favor a programmatic solution. This will often involve writing more code up front, but in many cases tends to be at least quasi-generalizable, and will allow you to easily evaluate hundreds of variables if you ever have the need without having to write them all out.
The code below uses the returned results from describe, along with some foreach loops and some extended macro functions to execute the kappa command over your variables without having to store them in a local manually.
clear *
sysuse auto
rename (price mpg rep78) (j_r_hp j_p_hp j_m_hp)
rename (headroom trunk weight) (p_c_hp p_r_hp p_m_hp)
rename (length turn displacement foreign) (z_r_hp z_m_hp z_p_hp z_c_hp)
/*
use gear_ratio as an arbitrary weight, order it first to easily extract
from the local containing varlist
*/
order gear_ratio, first
qui describe, varlist
local Varlist `r(varlist)' // store varlist in a local macro
preserve // preserve data so changes can be reverted back
foreach x of local Varlist {
capture confirm numeric variable `x'
if _rc {
drop `x' // Keep only numeric variables to use in kappa
}
}
qui describe, varlist // replace the local macro varlist with now numeric only variables
local Varlist `r(varlist)'
local vars : list Varlist - weight // remove weight from analysis varlist
foreach x of local vars {
foreach i of local vars {
if "`x'" != "`i'" & "`i'" >= "`x'" {
gettoken leftx : x, parse("_")
gettoken lefti : i, parse("_")
if "`leftx'" == "`lefti'" {
kap `x' `i'
}
}
}
}
restore
There of course will be a learning curve here for new users but I've found the use of macros, loops and returned results to be wonderfully effective in adding flexibility to my programs and do files - I would highly suggest anybody using Stata at least studies the basics of these three topics.

splitting space delimited entries into new columns in R

I am coding a survey that outputs a .csv file. Within this csv I have some entries that are space delimited, which represent multi-select questions (e.g. questions with more than one response). In the end I want to parse these space delimited entries into their own columns and create headers for them so I know where they came from.
For example I may start with this (note that the multiselect columns have an _M after them):
Q1, Q2_M, Q3, Q4_M
6, 1 2 88, 3, 3 5 99
6, , 3, 1 2
and I want to go to this:
Q1, Q2_M_1, Q2_M_2, Q2_M_88, Q3, Q4_M_1, Q4_M_2, Q4_M_3, Q4_M_5, Q4_M_99
6, 1, 1, 1, 3, 0, 0, 1, 1, 1
6,,,,3,1,1,0,0,0
I imagine this is a relatively common issue to deal with but I have not been able to find it in the R section. Any ideas how to do this in R after importing the .csv ? My general thoughts (which often lead to inefficient programs) are that I can:
(1) pull column numbers that have the special suffix with grep()
(2) loop through (or use an apply) each of the entries in these columns and determine the levels of responses and then create columns accordingly
(3) loop through (or use an apply) and place indicators in appropriate columns to indicate presence of selection
I appreciate any help and please let me know if this is not clear.
I agree with ran2 and aL3Xa that you probably want to change the format of your data to have a different column for each possible response. However, if munging your dataset to a better format proves problematic, it is possible to do what you asked.
process_multichoice <- function(x) lapply(strsplit(x, " "), as.numeric)
q2 <- c("1 2 3 NA 4", "2 5")
processed_q2 <- process_multichoice(q2)
[[1]]
[1] 1 2 3 NA 4
[[2]]
[1] 2 5
The reason different columns for different responses are suggested is because it is still quite unpleasant trying to retrieve any statistics from the data in this form. Although you can do things like
# Number of responses given
sapply(processed_q2, length)
#Frequency of each response
table(unlist(processed_q2), useNA = "ifany")
EDIT: One more piece of advice. Keep the code that processes your data separate from the code that analyses it. If you create any graphs, keep the code for creating them separate again. I've been down the road of mixing things together, and it isn't pretty. (Especially when you come back to the code six months later.)
I am not entirely sure what you are trying to do, or what your reasons are for coding it like this. Thus my advice is more general, so feel free to clarify and I will try to give a more concrete response.
1) I see that you are coding the survey on your own, which is great because it means you have influence on your .csv file. I would NEVER use different kinds of separators in the same .csv file. Just do the naming from the very beginning, just like you suggested in the second block.
Otherwise you might get into trouble with checkboxes, for example. Let's say someone checks 3 out of 5 possible answers and the next person only checks 1 (i.e. "don't know"). Now it will be much harder to create a spreadsheet (data.frame) type of results view, as opposed to having an empty field (which turns out to be an NA in R) that only needs to be recoded.
2) Another important question is whether you intend to do a panel survey (i.e. a longitudinal study asking the same participants over and over again). That (among many others) would be a good reason to think about saving your data to a MySQL database instead of .csv. RMySQL can connect directly to the database and access its tables and, more importantly, its VIEWS.
Views really help with survey data since you can rearrange the data in different views, conditional on many different needs.
3) Besides all the personal opinion and experience, here's some (less biased) literature to get started:
Complex Surveys: A Guide to Analysis Using R (Wiley Series in Survey Methodology)
The book is comparatively simple and leaves out panel surveys but gives a lot of R Code and examples which should be a practical start.
To prevent re-inventing the wheel you might want to check LimeSurvey, a pretty decent (not speaking of the templates :) ) tool for survey conductors. Besides, the TYPO3 CMS extensions pbsurvey and ke_questionnaire (should) work well too (only tested pbsurvey).
Multiple choice items should always be coded as separate variables. That is, if you have 5 alternatives and multiple choice, you should code them as i1, i2, i3, i4, i5, i.e. each one is a binary variable (0-1). I see that you have values 3 5 99 for Q4_M variable in the first example. Does that mean that you have 99 alternatives in an item? Ouch...
First you should go on and create separate variables for each alternative in a multiple choice item. That is, do:
# note that I follow your example with Q4_M variable
dtf_ins <- as.data.frame(matrix(0, nrow = nrow(<initial dataframe>), ncol = 99))
# name vars appropriately
names(dtf_ins) <- paste("Q4_M_", 1:99, sep = "")
Now you have a data.frame with 0s, so what you need to do is to get 1s in the appropriate positions (this is a bit cumbersome); a function will do the job...
# first you gotta change spaces to commas and convert character variable to a numeric one
y <- paste("c(", gsub(" ", ", ", x), ")", sep = "")
z <- eval(parse(text = y))
# now you assign 1 according to indexes in z variable
dtf_ins[1, z] <- 1
And that's pretty much it... basically, you would like to reconsider creating a data.frame with _M variables, so you can write a function that does this insertion automatically. Avoid for loops!
Or, even better, create a matrix with logicals, and just do dtf[m] <- 1, where dtf is your multiple-choice data.frame, and m is matrix with logicals.
I would like to help you more on this one, but I'm recuperating after a looong night! =) Hope that I've helped a bit! =)
Thanks for all the responses. I agree with most of you that this format is kind of silly but it is what I have to work with (survey is coded and going into use next week). This is what I came up with from all the responses. I am sure this is not the most elegant or efficient way to do it but I think it should work.
colnums <- grep("_M",colnames(dat))
responses <- nrow(dat)
for (i in colnums) {
  vec <- as.character(dat[, i])                        # turn into character vector
  picked <- lapply(strsplit(vec, " "), as.numeric)      # split up and turn into numeric
  vals <- sort(unique(unlist(picked)))                  # which values were used
  newcolnames <- paste(colnames(dat)[i], "_", vals, sep = "")  # column names
  e <- matrix(nrow = responses, ncol = length(vals))    # create new matrix for indicators
  colnames(e) <- newcolnames
  # next loop looks for responses and puts indicators in the correct places
  # (it uses its own index r instead of reusing the outer loop's i)
  for (r in seq_len(responses)) {
    e[r, ] <- ifelse(vals %in% picked[[r]], 1, 0)
  }
  dat <- cbind(dat, e)
}
Suggestions for improvement are welcome.
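
For comparison, the same three steps (find the _M columns, collect the codes that occur, write one indicator per code) can be sketched compactly. The version below is in Python purely for illustration; expand_multiselect is a hypothetical helper, the rows are the example from the question, and it assumes an empty answer should produce 0s rather than blanks, as the loop above does:

```python
def expand_multiselect(rows, suffix="_M"):
    """Replace each space-delimited multi-select column with 0/1 indicator columns."""
    columns = list(rows[0].keys())
    # Collect the codes that actually occur in every multi-select column.
    codes = {
        col: sorted({int(tok) for row in rows for tok in row[col].split()})
        for col in columns
        if col.endswith(suffix)
    }
    expanded = []
    for row in rows:
        new_row = {}
        for col in columns:
            if col in codes:
                chosen = {int(tok) for tok in row[col].split()}
                for code in codes[col]:
                    new_row[f"{col}_{code}"] = int(code in chosen)
            else:
                new_row[col] = row[col]
        expanded.append(new_row)
    return expanded


rows = [
    {"Q1": "6", "Q2_M": "1 2 88", "Q3": "3", "Q4_M": "3 5 99"},
    {"Q1": "6", "Q2_M": "", "Q3": "3", "Q4_M": "1 2"},
]
result = expand_multiselect(rows)
print(list(result[0].keys()))
```

Unlike the cbind approach, this keeps the new indicator columns in the position of the column they replace, which matches the layout asked for in the question.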

Resources