Java Scanner Delimiters - delimiter

I am scanning a text file using Scanner and the next() method and adding each individual word to an ArrayList. How can I use delimiters to ignore punctuation? I have words such as:
cat,
dog.
"mouse
I want to remove the comma, period and quotation marks in these words respectively. How can I do this?

You can split a string on a regex with the String.split method, so you can pass it a regex that matches all the punctuation you don't want. The character class should also include whitespace, otherwise the spaces stay attached to the words. For the case you mentioned that would be
String[] words = "cat, dog. \" mouse".split("[\\s\",.]+");
If the input starts with one of these characters, words will contain an empty first element, so to add the words to your list you would need to do something like:
for (String word : words) {
    if (!word.isEmpty()) {
        arrayList.add(word);
    }
}
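Since the question asks about Scanner delimiters specifically, here is a minimal sketch of the same idea using Scanner.useDelimiter instead of String.split. The class name PunctuationScanner and the helper method words are made up for illustration, and the delimiter set (whitespace, comma, period, double quote) is an assumption based on the examples in the question; extend the character class if your file contains other punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original question or answer.
class PunctuationScanner {

    // Collects the tokens of text, treating any run of whitespace,
    // commas, periods, or double quotes as a single delimiter.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        // The trailing + makes one delimiter match cover a whole run such as
        // ", " so next() does not return empty tokens between them.
        try (Scanner scanner = new Scanner(text).useDelimiter("[\\s,.\"]+")) {
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [cat, dog, mouse]
        System.out.println(words("cat, dog. \"mouse"));
    }
}
```

For a file, construct the Scanner from a File or Path instead of a String; the useDelimiter call stays the same.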

Related

How to find a word in a single long string?

I want to be able to copy and paste a large string of words from, say, a text document where there are spaces and returns, not commas, between each and every word. Then I want to be able to take out each word individually and put them in a table, for example:
input:
please i need help
output:
{1, "please"},
{2, "i"},
{3, "need"},
{4, "help"}
(I will have the table already made, with the second column set to something like " ")
I haven't tried anything yet, as nothing has come to mind. All I could think of was using gsub to turn spaces into commas and find a solution from there, but I don't think that would work out so well.
Your delimiters are spaces ( ), commas (,) and newlines (\n, sometimes \r\n or \r, the latter very rarely). You now want to find words delimited by these delimiters. A word is a sequence of one or more non-delimiter characters. This trivially translates to a Lua pattern which can be fed into gmatch. Paired with a loop & inserting the matches in a table you get the following:
local words = {}
for word in input:gmatch"[^ ,\r\n]+" do
    table.insert(words, word)
end
If you know that your words are going to be in your locale-specific character set (usually ASCII or extended ASCII), you can use Lua's %w character class for matching sequences of alphanumeric characters:
local words = {}
for word in input:gmatch"%w+" do
    table.insert(words, word)
end
Note: The resulting table will be in "list" form:
{
[1] = "first",
[2] = "second",
[3] = "third",
}
(for which {"first", "second", "third"} would be shorthand)
I don't see any good reasons for the table format you have described, but it can be trivially created by inserting tables instead of strings into the list.

Split words contained in string based on uppercase

I have a string that has no spaces. Each word is marked by the uppercase letter at its beginning. What would be the best way to extract the words?
Here's what I've got:
str = "TheseAreAFewWordsAndThis-one-contains-wildcards"
Desired output would be:
These
Are
A
Few
Words
And
This-one-contains-wildcards
I don't need to treat any magical characters as such, they can stay in the string no problems
for wrd in str:gmatch("%u%U*") do print(wrd) end
"%u%U*" is a string pattern that matches a single capital letter followed by any number of non capital letter characters.
Please read https://www.lua.org/manual/5.4/manual.html#6.4.1

from list to string and back to list

I have read a multiline file and converted it to a list with the following code:
Lines = string:tokens(erlang:binary_to_list(Binary), "\n"),
I converted it to a string to do some work on it:
Flat = string:join(Lines, "\r\n"),
I finished working on the string and now I need to convert it back to a multiline list. I tried to repeat the first snippet shown above, but that never worked. I also tried string:join, and that didn't work either. How do I convert it back to a list just like it used to be (although now modified)?
Well that depends on the modifications you made on the flattened string.
string:tokens/2 will always explode a string using the separator you provide. So as long as your transformation preserves a specific string as separator between the individual substrings there should be no problem.
However, if you do something more elaborate and destructive in your transformation then the only way is to iterate on the string manually and construct the individual substrings.
Your first snippet above contains a call to erlang:binary_to_list/1, which first converts a binary to a string (list). You then split that string with string:tokens/2 and join the pieces back together with string:join/2. The result of doing the tokens and then the join as you have written it seems to be to convert a string containing lines separated by \n into one containing lines separated by \r\n. N.B. this is a flat list of characters.
Is this what you intended?
What you should do now depends on what you mean by "I need to convert it back to a multiline list". Do you mean everything in a single list of characters (string), or in a nested list of lines where each line is a list of characters (string). I.e. if you ended up with
"here is line 1\r\nhere is line 2\r\nhere is line 3\r\n"
this already is a multiline list, or do you mean
["here is line 1","here is line 2","here is line 3"]
Note that each "string" is itself a list of characters. What do you intend to do with it afterwards?
You have your terms confused. A string in any language is a sequence of integer values corresponding to human-readable characters. Whether the representation of the value is a binary or a list does not matter; both are technically strings because of the data they contain.
That being said, you converted a binary string to a list string in your first set of instructions. To convert a list into a binary, you can call erlang:list_to_binary/1, or erlang:iolist_to_binary/1 if your list is not flat. For instance:
BinString = <<"this\nis\na\nstring">>.
ListString = "this\nis\na\nstring" = binary_to_list(BinString).
Words = ["this", "is", "a", "string"] = string:tokens(ListString, "\n").
<<"thisisastring">> = iolist_to_binary(Words).
Rejoined = "this\r\nis\r\na\r\nstring" = string:join(Words, "\r\n").
BinAgain = <<"this\r\nis\r\na\r\nstring">> = list_to_binary(Rejoined).
For your reference, the string module always expects a flat list (e.g., "this is a string", but not ["this", "is", "a", "string"]), except for string:join, which takes a list of flat strings.

How to capitalise the last alphabet of each word present in the string in ruby?

How do you capitalise the last letter of each word in a string in Ruby?
For example:
Input String: the creator never dies
Output String Must Be: thE creatoR neveR dieS
Note: Length of the string is not constant.
your_string.gsub(/\w\b/) { |s| s.capitalize }
A quick and dirty way is:
(s.reverse.split(" ").each {|w| w.capitalize!}).join(" ").reverse
where s is your string
str.split.map do |word|
  word[-1] = word[-1].upcase
  word
end.join(' ')
That is: split the string at whitespace, form a new array of the words with the last character of each uppercased, then join them back together with single spaces.

In Lua need to separate the string based on backslash through Regex

I have a String like
file:c:\test\xyz.exe
how can I separate the above string in 3 parts through Regex in Lua?
For the example, the first part would be file:,
the second part of string should be c:\test
and the third part of string should be xyz.exe.
Have a look at the string manipulation part of the Lua manual: http://www.lua.org/manual/5.1/manual.html#5.4
In particular match() and gmatch(). For example:
s = "file:c:\\test\\xyz.exe"
for first, second, third in string.gmatch(s, "(%a+):(.+)\\([%a%p]+)") do
    print(first)
    print(second)
    print(third)
end
To allow alphanumeric characters in the first and third parts, replace %a with %w. All other possible patterns are listed at the end of the linked manual chapter.
You must double each '\' when you write the input as a string literal, otherwise pattern matching won't work. Backslash is the escape character in Lua strings, so if you want to have one in your string, you must escape it: "\\"
The given code will work for "file:c:\test\xyz.exe" and "file:C:\test\test3\a\abc.exe"
