Looking to split a data string (paramaters) into an array but have some specific criteria for this to occur on. The string that I have looks like the following:
cyr_ad_id\tproject_number\tname\tremote_reference\tsample_size\tad_length\tsample_description\tmedia_type\tnotes\r\n13342a\t13342\tMore Dad_BC\t2897855894001\t150\t:30\t50% Customers\r\n50% Non-customers\tFilm\tBroadcast\r\n13342c\t13342\tDRTV - Hogs\t2897815438001\t150\t:60\t100% Non-customers\tFilm\tBroadcast\r\n13342d\t13342\tMake Way For More\t2897815439001\t150\t:30\t50% Customers\r\n50% Non-customers\tFilm\tBroadcast\r\n
and I would like to have the following result:
["cyr_ad_id\tproject_number\tname\tremote_reference\tsample_size\tad_length\tsample_description\tmedia_type\tnotes", "13342a\t13342\tMore Dad_BC\t2897855894001\t150\t:30\t50% Customers\r\n50% Non-customers\tFilm\tBroadcast", "13342c\t13342\tDRTV - Hogs\t2897815438001\t150\t:60\t100% Non-customers\tFilm\tBroadcast", "13342d\t13342\tMake Way For More\t2897815439001\t150\t:30\t50% Customers\r\n50% Non-customers\tFilm\tBroadcast"]
What I feel like I need to do is something similar to splitting the string by "\r\n" after the 8th occurrence of "\t". The 8th occurrence is something that I'd like to have passed into a split statement by a variable.
There are potential "\r\n" occurrences that I do not want to split this string by, hence why the nth or 8th occurrence of "\t" is critical in this example.
Thanks!
Does this fit your need?
def ssplit str, n
r = Regexp.new "([^\t]*?\t){#{n}}(.*?\r\n)"
matched = []
while s = str.slice!(r) do
matched << s.strip
end
matched
end
your_string.gsub("\r\n", "\t").gsub("\n", '').split("\t").each_slice(8).to_a
Replace all \r\n with \t, then kill any left over \n. Split on the \ts. Now you have an array of your items.
You could also probably use regex, which I can't seem to get to work. .split(/\\r|\\t|\\n/)
.each_slice(8).to_a takes 8 items at a time and builds them back up, so you have an array for each of the groups.
you could also do .in_groups_of(8) instead of .each_slice. no need for .to_a then, either.
[["cyr_ad_id", "project_number", "name", "remote_reference", "sample_size", "ad_length", "sample_description", "media_type"], ["notes", "13342a", "13342", "More Dad_BC", "2897855894001", "150", ":30", "50% Customers"], ["50% Non-customers", "Film", "Broadcast", "13342c", "13342", "DRTV - Hogs", "2897815438001", "150"], [":60", "100% Non-customers", "Film", "Broadcast", "13342d", "13342", "Make Way For More", "2897815439001"], ["150", ":30", "50% Customers", "50% Non-customers", "Film", "Broadcast"]]
That seems more usable?
Related
I want to be able to copy and paste a large string of words from say a text document where there are spaces, returns and not commas between each and every word. Then i want to be able to take out each word individually and put them in a table for example...
input:
please i need help
output:
{1, "please"},
{2, "i"},
{3, "need"},
{4, "help"}
(i will have the table already made with the second column set to like " ")
havent tried anything yet as nothing has come to mind and all i could think of was using gsub to turn spaces into commas and find a solution from there but again i dont think that would work out so well.
Your delimiters are spaces ( ), commas (,) and newlines (\n, sometimes \r\n or \r, the latter very rarely). You now want to find words delimited by these delimiters. A word is a sequence of one or more non-delimiter characters. This trivially translates to a Lua pattern which can be fed into gmatch. Paired with a loop & inserting the matches in a table you get the following:
local words = {}
for word in input:gmatch"[^ ,\r\n]+" do
table.insert(words, word)
end
if you know that your words are gonna be in your locale-specific character set (usually ASCII or extended ASCII), you can use Lua's %w character class for matching sequences of alphanumeric characters:
local words = {}
for word in input:gmatch"%w+" do
table.insert(words, word)
end
Note: The resulting table will be in "list" form:
{
[1] = "first",
[2] = "second",
[3] = "third",
}
(for which {"first", "second", "third"} would be shorthand)
I don't see any good reasons for the table format you have described, but it can be trivially created by inserting tables instead of strings into the list.
Often I'm facing lines like
result = 'Some text'
result += some_text_variable if some_text_variable.present?
And every time I want to replace that with something more accurate but I don't know how
Any ideas plz?
result += some_text_variable.to_s
It will work if some_text_variable is nil or empty string for example
But it always will concat empty string to original string
You can also use
result += some_text_variable.presence.to_s
It will work for all presence cases (for example for " " string)
You could "compact" and join an array, e.g.
['Some text', some_text_variable].select(&:present?).join
I realise this is a longhand form, just offering as an alternative to mutating strings.
This can look a bit nicer, if you have a large number of variables to munge together, or you want to join them in some other way e.g.
[
var_1,
var_2,
var_3,
var_4
].select(&:present?).join("\n")
Again, nothing gets mutated - which may or may not suit your coding style.
I have user controller input like so (the length and # of items may change):
str = "['honda', 'toyota', 'lexus']"
I would like to convert this into an array, but I'm struggling to find the best way to do so. eval() does exactly what I need, but it is not very elegant and is dangerous in this case, since it's user controller input.
Another way is:
str[1..-2].split(',').collect { |car| car.strip.tr("'", '') }
=> ["honda", "toyota", "lexus"]
But this is also not very elegant. Any suggestions that are more 'Rubyish'?
You could use a regular expression:
# match (in a non-greedy way) characters up to a comma or `]`
# capture each word as a group, and don't capture `,` or `]`
str.scan(/'(.+?)'(?:,|\])/).flatten
Or JSON.parse (but accounting for the fact that single quotes are in fact technically not allowed in JSON):
JSON.parse( str.tr("'", '"') )
JSON.parse probably has a small edge over the regexp in terms of performance, but if you're expecting your users to do single quote escaping, then that tr is going to mess things up. In this case, I'd stick with the regexp.
The JSON.parse looks more correct, but here is another alternative:
str.split(/[[:punct:] ]+/).drop(1)
I only found this related to what I am looking for: Split string by count of characters but it is not useful for what I mean.
I have a string variable, which is an ammount of 3 numbers (can be from 000 to 999). I need to separate each of the numbers (characters) and get them into a table.
I am programming for a game mod which uses lua, and it has some extra functions. If you could help me to make it using: http://wiki.multitheftauto.com/wiki/Split would be amazing, but any other way is ok too.
Thanks in advance
Corrected to what the OP wanted to ask:
To just split a 3-digit number in 3 numbers, that's even easier:
s='429'
c1,c2,c3=s:match('(%d)(%d)(%d)')
t={tonumber(c1),tonumber(c2),tonumber(c3)}
The answer to "How do I split a long string composed of 3 digit numbers":
This is trivial. You might take a look at the gmatch function in the reference manual:
s="123456789"
res={}
for num in s:gmatch('%d%d%d') do
res[#res+1]=tonumber(num)
end
or if you don't like looping:
res={}
s:gsub('%d%d%d',function(n)res[#res+1]=tonumber(n)end)
I was looking for something like this, but avoiding looping - and hopefully having it as one-liner. Eventually, I found this example from lua-users wiki: Split Join:
fields = {str:match((str:gsub("[^"..sep.."]*"..sep, "([^"..sep.."]*)"..sep)))}
... which is exactly the kind of syntax I'd like - one liner, returns a table - except, I don't really understand what is going on :/ Still, after some poking about, I managed to find the right syntax to split into characters with this idiom, which apparently is:
fields = { str:match( (str:gsub(".", "(.)")) ) }
I guess, what happens is that gsub basically puts parenthesis '(.)' around each character '.' - so that match would consider those as a separate match unit, and "extract" them as separate units as well... But I still don't get why is there extra pair of parenthesis around the str:gsub(".", "(.)") piece.
I tested this with Lua5.1:
str = "a - b - c"
fields = { str:match( (str:gsub(".", "(.)")) ) }
print(table_print(fields))
... where table_print is from lua-users wiki: Table Serialization; and this code prints:
"a"
" "
"-"
" "
"b"
" "
"-"
" "
"c"
Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
string.split.size
Edited to explain multiple spaces
From the Ruby String Documentation page
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
" now's the time".split #=> ["now's", "the", "time"]
While that is the current version of ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.
The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.
counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]
# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]
# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", rexexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]
The gem provides a bunch more useful methods.
If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):
>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]
However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.
puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s
puts "your sentence is " + target + " words long"
The best way to do is to use split method.
split divides a string into sub-strings based on a delimiter, returning an array of the sub-strings.
split takes two parameters, namely; pattern and limit.
pattern is the delimiter over which the string is to be split into an array.
limit specifies the number of elements in the resulting array.
For more details, refer to Ruby Documentation: Ruby String documentation
str = "This is a string"
str.split(' ').size
#output: 4
The above code splits the string wherever it finds a space and hence it give the number of words in the string which is indirectly the size of the array.
The above solution is wrong, consider the following:
"one-way street"
You will get
["one-way","", "street"]
Use
'one-way street'.gsub(/[^-a-zA-Z]/, ' ').split.size
This splits words only on ASCII whitespace chars:
p " some word\nother\tword|word".strip.split(/\s+/).size #=> 4