I have a question about foreach in tcl:
foreach id "6 8" {
#do something here;
}
Is this "6 8" a list? And what does "6 8" mean?
The main thing to remember is that Tcl doesn't have types, per se, at least not in a way that the user should need to worry about them. Rather, each value is a string and each command tries to treat it as the type of value it needs.
For example:
set value "1"
expr {$value + 1} ; # treat $value as a number
lindex $value 0 ; # treat $value as a list
For your code, the value 6 8 is being interpreted as a list by the foreach command, with the values 6 and 8. The double quotes around the value just group the content inside them as a single value. They (the double quotes) don't signify any specific type (i.e., string, list, number).
foreach id "6 8" {
# do something
}
In this context, "6 8" is a list of two elements, 6 and 8. The loop first assigns 6 to id and runs the loop's body. The next time around, id is 8 and the body runs again. When the list runs out of items, the loop exits.
Essentially, what I need is to read a certain part of a string.
Example:
I have a string that contains "12 31".
However, I need to put these numbers into separate variables: 12 into, let's say, variable A, and 31 into variable B.
How should I go about this?
You can use Lua Patterns:
> ExampleString = "12 31"
> ExampleString:match("(%d+)%s+(%d+)")
12 31
> SubString1, SubString2= ExampleString:match("(%d+)%s+(%d+)")
> Number1 = tonumber(SubString1)
> Number2 = tonumber(SubString2)
The pattern expression looks complex but is actually quite simple. The things between ( and ) are captures and will be returned if they are found. Here, we want 2 results, so we have 2 pairs of ( and ). %d+ means that we want to find a sequence of at least 1 digit (+).
The 2 numbers are separated by some spaces %s+, at least 1 (+).
In summary, we want to extract (Number1)space(Number2)
The function string.match is used to match against the given pattern and returns the found strings. The last step is to use the function tonumber to convert the found sub-strings into Lua numbers.
I want to extract the VALUE of lines containing key="VALUE", and I am trying to use a simple Lua pattern to solve this.
It works for all lines except those which contain a literal 1 in the VALUE. It seems the pattern parser is confusing my capture group for an escape sequence.
> return string.find('... key = "PHONE2" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
5 18 " PHONE2
> return string.find('... key = "PHONE1" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
nil
>
You do not need to use the [^%1] at all. Just use .- as it, by definition, matches the smallest possible string.
Also, you can use multiline string syntax, to not have to escape the quotes in your pattern:
> s=[[... key = "PHONE1" ...]]
> return s:find [[key%s*=%s*(["'])(.-)%1]]
5 18 " PHONE1
Inside a character set, %1 is not a back-reference to the first capture. In practice, [^%1] excludes the literal character 1, which is why the match fails on PHONE1.
I am trying to make a program that will read in a number and then output every digit of that number in a list. Most things look fine until I try the numbers 8 and 9: the program outputs only \b and \t instead.
If the input number contains 8 or 9 alongside other digits, for example 283, it prints normally. But if it contains only 8s or 9s, such as 8 or 99, it gives me what looks like the binary representation of 8 and 9 (if I remember correctly).
My program is as below:
digitize(0)-> 0;
digitize(N) when N < 10 -> [N];
digitize(N) when N >= 10 -> digitize(N div 10)++[N rem 10].
The function returns the expected list, but the shell shows lists of numbers which are ASCII-codes of characters as strings (because that's just what strings are in Erlang; there's no special string type). You can see it by just entering [8, 8] (e.g.) at the prompt and disable this behavior by calling shell:strings(false) (and shell:strings(true) when you need the normal behavior again).
Strings in Erlang are not a separate type but lists of numbers. List printing uses a heuristic to detect when a list might be a string; if it thinks so, the list is printed as a string. \b is the backspace character and \t is the tab character, which have ASCII codes 8 and 9.
See also:
Description what a string means
Erlang escape sequences
Explanation of this in LYSE
Is there anything better than string.scan(/(\w|-)+/).size (the - is so, e.g., "one-way street" counts as 2 words instead of 3)?
string.split.size
Edited to explain multiple spaces
From the Ruby String Documentation page
split(pattern=$;, [limit]) → anArray
Divides str into substrings based on a delimiter, returning an array
of these substrings.
If pattern is a String, then its contents are used as the delimiter
when splitting str. If pattern is a single space, str is split on
whitespace, with leading whitespace and runs of contiguous whitespace
characters ignored.
If pattern is a Regexp, str is divided where the pattern matches.
Whenever the pattern matches a zero-length string, str is split into
individual characters. If pattern contains groups, the respective
matches will be returned in the array as well.
If pattern is omitted, the value of $; is used. If $; is nil (which is
the default), str is split on whitespace as if ' ' were specified.
If the limit parameter is omitted, trailing null fields are
suppressed. If limit is a positive number, at most that number of
fields will be returned (if limit is 1, the entire string is returned
as the only entry in an array). If negative, there is no limit to the
number of fields returned, and trailing null fields are not
suppressed.
" now's the time".split #=> ["now's", "the", "time"]
While that is the current version of ruby as of this edit, I learned on 1.7 (IIRC), where that also worked. I just tested it on 1.8.3.
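As a small sketch of the behavior quoted above (the helper name word_count is made up for this example), calling split with no arguments ignores leading whitespace and runs of spaces, so the array size can serve as a word count:

```ruby
# Hypothetical helper: count words by splitting on runs of whitespace,
# which is what String#split does when called without arguments.
def word_count(text)
  text.split.size
end

puts word_count(" now's the   time")  # leading and repeated spaces are ignored
puts word_count("one-way street")     # a hyphenated word stays in one piece
puts word_count("")                   # an empty string contains no words
```

This treats anything between whitespace as a word, including numbers and stray punctuation, which may or may not be what you want.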
I know this is an old question, but this might be useful to someone else looking for something more sophisticated than string.split. I wrote the words_counted gem to solve this particular problem, since defining words is pretty tricky.
The gem lets you define your own custom criteria, or use the out of the box regexp, which is pretty handy for most use cases. You can pre-filter words with a variety of options, including a string, lambda, array, or another regexp.
counter = WordsCounted::Counter.new("Hello, Renée! 123")
counter.word_count #=> 2
counter.words #=> ["Hello", "Renée"]
# filter the word "hello"
counter = WordsCounted::Counter.new("Hello, Renée!", reject: "Hello")
counter.word_count #=> 1
counter.words #=> ["Renée"]
# Count numbers only
counter = WordsCounted::Counter.new("Hello, Renée! 123", regexp: /[0-9]/)
counter.word_count #=> 1
counter.words #=> ["123"]
The gem provides a bunch more useful methods.
If the 'word' in this case can be described as an alphanumeric sequence which can include '-' then the following solution may be appropriate (assuming that everything that doesn't match the 'word' pattern is a separator):
>> 'one-way street'.split(/[^-a-zA-Z]/).size
=> 2
>> 'one-way street'.split(/[^-a-zA-Z]/).each { |m| puts m }
one-way
street
=> ["one-way", "street"]
However, there are some other symbols that can be included in the regex - for example, ' to support the words like "it's".
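A minimal sketch of that extension, assuming a "word" is a run of ASCII letters, hyphens, and apostrophes. The constant WORD and the method words are names invented for this example, and it uses scan rather than split so that leading separators cannot produce empty strings:

```ruby
# Assumption: a word is one or more ASCII letters, apostrophes, or hyphens.
# The hyphen sits last in the character class so it is taken literally.
WORD = /[a-zA-Z'-]+/

# Hypothetical helper: return every match of WORD in the text.
def words(text)
  text.scan(WORD)
end

p words("it's a one-way street")
puts words("it's a one-way street").size
```

Other characters (digits, accented letters) can be added to the class in the same way if your definition of a word needs them.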
This is pretty simplistic but does the job if you are typing words with spaces in between. It ends up counting numbers as well but I'm sure you could edit the code to not count numbers.
puts "enter a sentence to find its word length: "
word = gets
word = word.chomp
splits = word.split(" ")
target = splits.length.to_s
puts "your sentence is " + target + " words long"
The best way to do this is to use the split method.
split divides a string into substrings based on a delimiter, returning an array of the substrings.
split takes two parameters, namely pattern and limit.
pattern is the delimiter on which the string is split into an array.
limit specifies the maximum number of elements in the resulting array.
For more details, refer to Ruby Documentation: Ruby String documentation
str = "This is a string"
str.split(' ').size
#output: 4
The above code splits the string wherever it finds whitespace, so the size of the resulting array is the number of words in the string.
The above solution is wrong. Consider a string with two consecutive spaces between the words:
"one-way  street"
You will get
["one-way", "", "street"]
Use
'one-way  street'.gsub(/[^-a-zA-Z]/, ' ').split.size
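To see the difference side by side, here is a short sketch (the variable names are only illustrative) that runs both approaches on a string with two spaces between the words:

```ruby
text = "one-way  street"  # note the two spaces between the words

# Splitting on every single non-word character leaves an empty string
# between the two adjacent separators.
naive = text.split(/[^-a-zA-Z]/)

# Turning every separator into a space first lets the no-argument split
# collapse runs of whitespace.
fixed = text.gsub(/[^-a-zA-Z]/, ' ').split

p naive
p fixed
```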
This splits words only on ASCII whitespace chars:
p " some word\nother\tword|word".strip.split(/\s+/).size #=> 4
I've been given a large file with a funny CSV format to parse into a database.
The separator character is a semicolon (;). If one of the fields contains a semicolon it is "escaped" by wrapping it in doublequotes, like this ";".
I have been assured that there will never be two adjacent fields with trailing/leading doublequotes, so this format should technically be ok.
Now, for parsing it in VBScript I was thinking of
Replacing each instance of ";" with a GUID,
Splitting the line into an array by semicolon,
Running back through the array, replacing the GUIDs with ";"
It seems to be the quickest way. Is there a better way? I guess I could use substrings but this method seems to be acceptable...
Your method sounds fine with the caveat that there's absolutely no possibility that your GUID will occur in the text itself.
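The placeholder idea could look roughly like this. It is sketched in Ruby rather than VBScript, the method name split_with_placeholder is made up, and it assumes the escaped sequence ";" should end up as a plain semicolon in the field:

```ruby
require 'securerandom'

# Hypothetical helper: split one line of the semicolon-separated format,
# treating ";" (semicolon wrapped in double quotes) as an escaped semicolon.
def split_with_placeholder(line)
  placeholder = SecureRandom.uuid
  # Guard against the caveat above: pick another GUID if this one
  # happens to occur in the text itself.
  placeholder = SecureRandom.uuid while line.include?(placeholder)

  line.gsub('";"', placeholder)   # 1. hide the escaped semicolons
      .split(';', -1)             # 2. split; -1 keeps trailing empty fields
      .map { |field| field.gsub(placeholder, ';') }  # 3. restore them
end

p split_with_placeholder('Pax;is;a;good;guy";" so;says;his;wife.')
```

Checking the line for the placeholder before using it is cheap and removes the one failure mode of this approach.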
One approach I've used for this type of data before is to just split on the semicolons regardless and then, if two adjacent fields end and start with a quote, combine them.
For example:
Pax;is;a;good;guy";" so;says;his;wife.
becomes:
0 Pax
1 is
2 a
3 good
4 guy"
5 " so
6 says
7 his
8 wife.
Then, when you discover that fields 4 and 5 end and start (respectively) with a quote, you combine them by replacing the field 4 closing quote with a semicolon and removing the field 5 opening quote (and joining them of course).
0 Pax
1 is
2 a
3 good
4 guy; so
5 says
6 his
7 wife.
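A rough Ruby sketch of this split-then-merge approach (Ruby rather than VBScript, with a made-up method name). It relies on the asker's guarantee that two genuinely separate adjacent fields never end and start with a quote:

```ruby
# Hypothetical helper: split on every semicolon, then rejoin neighbours
# where the left part ends with a quote and the right part starts with one.
def split_and_merge(line)
  fields = []
  line.split(';', -1).each do |part|
    if !fields.empty? && fields.last.end_with?('"') && part.start_with?('"')
      # The semicolon between these two parts was escaped: drop the closing
      # quote on the left, the opening quote on the right, and join with ';'.
      fields[-1] = fields.last.chomp('"') + ';' + part[1..-1]
    else
      fields << part
    end
  end
  fields
end

p split_and_merge('Pax;is;a;good;guy";" so;says;his;wife.')
```

Because each merged field is checked again against the next part, several escaped semicolons in one field are handled as well.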
In pseudo-code, given:
input: A string; the first character is input[0] and the last
       character is input[length-1]. Further, assume one dummy
       character, input[length]. It can be anything except
       ; and ". This string is one line of the "CSV" file.
length: positive integer, number of characters in input
Do this:
set start = 0
if input[0] = ';':
    you have a blank field in the beginning; do whatever with it
    set start = 1
endif
for each c between 1 and length-1:
    next iteration unless input[c] = ';'
    if input[c-1] ≠ '"' or input[c+1] ≠ '"': // test for escape sequence ";"
        found field consisting of half-open range [start,c); do whatever
        with it. Note that in the case of empty fields, start≥c, leaving
        an empty range
        set start = c+1
    endif
end for
the last field is the half-open range [start,length); do whatever with it
Untested, of course. Debugging code like this is always fun….
The special case of input[0] is to make sure we don't ever look at input[-1]. If you can make input[-1] safe, then you can get rid of that special case. You can also put a dummy character in input[0] and then start your data—and your parsing—from input[1].
One option would be to find instances of the regex:
[^"];[^"]
and then break the string apart with substring:
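// needs: using System.Collections.Generic; using System.Text.RegularExpressions;
List<string> ret = new List<string>();
Regex r = new Regex(@"[^""];[^""]");
Match m;
while ((m = r.Match(line)).Success)
{
    // m.Index is the character just before the separating semicolon
    ret.Add(line.Substring(0, m.Index + 1));
    line = line.Substring(m.Index + 2);
}
ret.Add(line); // whatever remains is the last field
(Sorry about the C#, I don't know VBScript)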
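A variation on the same idea, sketched in Ruby and not the code above translated literally: instead of matching the characters on either side of the semicolon, it uses lookarounds so the neighbours are inspected but not consumed. The names SEPARATOR and split_fields are invented for this example, and it assumes ";" should become a plain semicolon in the output:

```ruby
# A semicolon is a separator unless it has a double quote on both sides,
# i.e. it separates if it is NOT preceded by a quote OR NOT followed by one.
SEPARATOR = /(?<!");|;(?!")/

# Hypothetical helper: split on real separators, then unescape ";" to ';'.
def split_fields(line)
  line.split(SEPARATOR, -1).map { |field| field.gsub('";"', ';') }
end

p split_fields('Pax;is;a;good;guy";" so;says;his;wife.')
```

Because the lookarounds do not consume any characters, empty fields and a semicolon at the very start or end of the line are split correctly too.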
Using quotes is normal for .csv files. If you have quotes in the field then you may see opening and closing and the embedded quote all strung together two or three in a row.
If you're using SQL Server you could try using T-SQL to handle everything for you.
SELECT * INTO MyTable FROM OPENDATASOURCE('Microsoft.JET.OLEDB.4.0',
'Data Source=F:\MyDirectory;Extended Properties="text;HDR=No"')...
[MyCsvFile#csv]
That will create and populate "MyTable". Read more on this subject here on SO.
I would recommend using RegEx to break up the strings.
Find every ';' that is not a part of
";" and change it to something else
that does not appear in your fields.
Then go through and replace ";" with ;
Now you have your fields with the correct data.
Most importers can swap out separator characters pretty easily.
This is basically your GUID idea. Just make sure the GUID is unique to your file before you start and you will be fine. I tend to start using 'Z'. After enough 'Z's, you will be unique (sometimes as few as 1-3 will do).
Jacob