How can I delete special characters? - ruby-on-rails

I'm practicing with Ruby and regex to delete certain unwanted characters. For example:
input = input.gsub(/<\/?[^>]*>/, '')
and for special characters, example ☻ or ™:
input = input.gsub('&#', '')
This leaves only numbers, ok. But this only works if the user enters a special character as a code, like this:
™
My question:
How I can delete special characters if the user enters a special character without code, like this:
™ ☻

First of all, I think it might be easier to define what constitutes "correct input" and remove everything else. For example:
input = input.gsub(/[^0-9A-Za-z]/, '')
If that's not what you want (you want to support non-latin alphabets, etc.), then I think you should make a list of the glyphs you want to remove (like ™ or ☻), and remove them one-by-one, since it's hard to distinguish between a Chinese, Arabic, etc. character and a pictograph programmatically.
Finally, you might want to normalize your input by converting to or from HTML escape sequences.

If you just wanted ASCII characters, then you can use:
original = "aøbauhrhræoeuacå"
cleaned = ""
original.each_byte { |x| cleaned << x unless x > 127 }
cleaned # => "abauhrhroeuac"

You can use parameterize:
'#!#$%^&*()111'.parameterize
=> "111"

You can match all the characters you want, and then join them together, like this:
original = "aøbæcå"
stripped = original.scan(/[a-zA-Z]/).to_s
puts stripped
which outputs "abc"

An easier way to do this inspirated by Can Berk Güder answer is:
In order to delete special characters:
input = input.gsub(/\W/, '')
In order to keep word characters:
input = input.scan(/\w/)
At the end input is the same! Try it on : http://rubular.com/

Related

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove all that is before (and including) the __ so all that remains are the things after the ___ like for example 233433.
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and a ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to acces the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__[^_]+\z/
This requires the match to be preceded by two underscores.
There are many ruby ways to extract numbers from string. I hope you're trying to fetch numbers out of a string. Here are some of the ways to do so.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
In the above delete is the fastest to trim numbers out of strings.
All of above extracts numbers from string and joins them. If a string is like this "String-with-67829___numbers-09764" outut would be like this "6782909764"
In case if you want the numbers split like this ["67829", "09764"]
line.split(/[^\d]/).reject { |c| c.empty? }
Hope these answers help you! Happy coding :-)

Rails strip all except numbers commas and decimal points

Hi I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a rails string? The closest I have so far is:-
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try add commas to the expression, everything is getting stripped. I got the aboves from somewhere else and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, you know the ^ means not when inside the character class brackets [] which you are using, and then you can just add the comma to the list. The decimal needs to be escaped with a backslash because in regular expressions they are a special character that means "match anything".
Also, be aware of whether you are using gsub or gsub!
gsub! has the bang, so it edits the instance of the string you're passing in, rather than returning another one.
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign it to a different var):
cleaned_rate = rate.gsub!(/[^0-9,\.]/, '')
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub!(/[^0-9\.\,]/g,'')
Also check this out to learn more about regular expressions:
http://www.regexr.com/

Standardize a String for Filename, remove accents and special chars

I'm trying to find a way to normalize a string to pass it as a filename.
I have this so far:
my_string.mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/n, '').downcase.gsub(/[^a-z]/, '_')
But first problem: the - character. I guess there is more problems with this method.
I don't control the name, the name string can have accents, white spaces and special chars. I want to remove all of them, replace the accents with the corresponding letter ('é' => 'e') and replace the rest with the '_' character.
The names are like:
"Prélèvements - Routine"
"Carnet de santé"
...
I want them to be like a filename with no space/special chars:
"prelevements_routine"
"carnet_de_sante"
...
Thanks for the help :)
Take a look at ActiveSupport::Inflector.transliterate, it's very useful handling this kind of chars problems. Read there: ActiveSupport::Inflector
Then, you could do something like:
ActiveSupport::Inflector.transliterate my_string.downcase.gsub(/\s/,"_")
Use ActiveStorage::Filename#sanitized, if spaces are okay.
If spaces are okay, which I would suggest keeping, if this is a User-provided and/or User-downloadable file, then you can make use of the ActiveStorage::Filename#sanitized method that is meant for exactly this situation.
It removes special characters that are not allowed in a file name, whilst keeping all of the nice characters that Users typically use to nicely organize and describe their files, like spaces and ampersands (&).
ActiveStorage::Filename.new( "Prélèvements - Routine" ).sanitized
#=> "Prélèvements - Routine"
ActiveStorage::Filename.new( "Carnet de santé" ).sanitized
#=> "Carnet de santé"
ActiveStorage::Filename.new( "Foo:Bar / Baz.jpg" ).sanitized
#=> "Foo-Bar - Baz.jpg"
Use String#parameterize, if you want to remove nearly everything.
And if you're really looking to remove everything, try String#parameterize:
"Prélèvements - Routine".parameterize
#=> "prelevements-routine"
"Carnet de santé".parameterize
#=> "carnet-de-sante"
"Foo:Bar / Baz.jpg".parameterize
#=> "foo-bar-baz-jpg"

a simple regexp validator

How do a I create a validator, which has these simple rules. An expression is valid if
it must start with a letter
it must end with a letter
it can contain a dash (minus sign), but not at start or end of the expression
^[a-zA-Z]+-?[a-zA-Z]+$
E.g.
def validate(whatever)
reg = /^[a-zA-Z]+-?[a-zA-Z]+$/
return (reg.match(whatever)) ? true : false;
end
/^[A-Za-z]+(-?[A-Za-z]+)?$/
this seems like what you want.
^ = match the start position
^[A-Za-z]+ = start position is followed by any at least one or more letters.
-? = is there zero or one hyphens (use "*" if there can be multiple hyphens in a row).
[A-Za-z]+ = hyphen is followed by one or more letters
(-?[A-Za-z]+)? = for the case that there is a single letter.
$= match the end position in the string.
xmammoth pretty much got it, with one minor problem. My solution is:
^[a-zA-Z]+\-?[a-zA-Z]+$
Note that the original question states, it can contain a dash. The question mark is needed after the dash to make sure that it is optional in the regex.
^[A-Za-z].*[A-Za-z]$
In other words: letter, anything, letter.
Might also want:
^[A-Za-z](.*[A-Za-z])?$
so that a single letter is also matched.
What I meant, to be able to create tags. For example: "Wild-things" or "something-wild" or "into-the-wild" or "in-wilderness" "my-wild-world" etc...
This regular expression matches sequences, that consist of one or more words of letters, that are concatenated by dashes.
^[a-zA-Z]+(?:-[a-zA-Z]+)*$
Well,
[A-Za-z].*[A-Za-z]
According to your rules that will work. It will match anything that:
starts with a letter
ends with a letter
can contain a dash (among everything else) in between.

removing whitespaces in ActionScript 2 variables

let's say that I have an XML file containing this :
<description><![CDATA[
<h2>lorem ipsum</h2>
<p>some text</p>
]]></description>
that I want to get and parse in ActionScript 2 as HTML text, and setting some CSS before displaying it. Problem is, Flash takes those whitespaces (line feed and tab) and display it as it is.
<some whitespace here>
lorem ipsum
some text
where the output I want is
lorem ipsum
some text
I know that I could remove the whitespaces directly from the XML file (the Flash developer at my workplace also suggests this. I guess that he doesn't have any idea on how to do this [sigh]). But by doing this, it would be difficult to read the section in the XML file, especially when lots of tags are involved and that makes editing more difficult.
So now, I'm looking for a way to strip those whitespaces in ActionScript. I've tried to use PHP's str_replace equivalent (got it from here). But what should I use as a needle (string to search) ? (I've tried to put in "\t" and "\r", don't seem to be able to detect those whitespaces).
edit :
now that I've tried to throw in newline as a needle, it works (meaning that newline successfully got stripped).
mystring = str_replace(newline, '', mystring);
But, newlines only got stripped once, meaning that in every consecutive newlines, (eg. a newline followed by another newline) only one newline can be stripped away.
Now, I don't see that this as a problem in the str_replace function, since every consecutive character other than newline get stripped away just fine.
Pretty much confused about how stuff like this is handled in ActionScript. :-s
edit 2:
I've tried str_replace -ing everything I know of, \n, \r, \t, newline, and tab (by pressing tab key). Replacing \n, \r, and \t seem to have no effect whatsoever.
I know that by successfully doing this, my content can never have real line breaks. That's exactly my intention. I could format the XML the way I want without Flash displaying any of the formatting stuff. :)
Several ways to approach this. Perhaps the simplest answer is, in one sense your Flash developer is probably right, and you should move your whitespace outside of the CDATA container. The reason being, many people (me at least) tend to assume that everything inside a CDATA is "real data", as opposed to markup. On the other hand, whitespace outside a CDATA is normally assumed to be irrelevant, so data like this:
<description>
<![CDATA[<h2>lorem ipsum</h2>
<p>some text</p>]]>
</description>
would be easier to understand and to work with. (The flash developer can use the XML.ignoreWhite property to ignore the whitespace outside the CDATA.)
With that said, if you're editing the XML by hand, then I can see why it would be easier to use the formatting you describe. However, if the extra whitespace is inside the CDATA, then it will inevitable be included in the String data you extract, so your only option is to grab the content of the CDATA and remove the whitespace afterwards.
Then your question reduces to "how do I strip leading/trailing whitespace from a String in AS2?". And unfortunately, since AS2 doesn't support RegEx there's no simple way to do this. I think your best option would be to parse through from the beginning and end to find the first/last non-white character. Something along these lines (untested pseudocode):
myString = stuffFromXML;
whitespace = " " + "\t" + "\n" + "\r" + newline;
start = 0;
end = myString.length;
while ( testString( myString.substr(start,1), whitespace ) ) { start++; }
while ( testString( myString.substr(end-1,1), whitespace ) ) { end--; }
trimmedString = myString.substring( start, end );
function testString( needle, haystack ) {
return ( haystack.indexOf( needle ) > -1 );
}
Hope that helps!
Edit: I notice that in your example you'd also need to remove tabs and whitespace within your text data. This would be tricky, unless you can guarantee that your data will never include "real" tabs in addition to the ones for formatting. No matter what you do with the CDATA tags, it would probably be wiser not to insert extraneous formatting inside your real content and then remove it programmatically afterward. That's just making your own life difficult.
Second edit: As for what character to remove to get rid of newlines, it depends partially on what characters are actually in the XML to begin with (which probably depends on what OS is running where the file is generated), and partially on what character the client machine (that's showing the flash) considers a newline. Lots of gory details here. In practice though, if you remove \r, \n, and \r\n, that usually does the trick. That's why I added both \r and \n to the "whitespace" string in my example code.
its been a while since I've tinkered with AS2.
someXML = new XML();
someXML.ignoreWhite = true;
if you wanted to str_replace try '\n'
Is there a reason that you are using cdata? Admittedly I have no idea what the best practice for this sort of this is, but I tend to leave them out and just have the HTML sit there inside the node.
var foo = node.childnodes.join("") parses it out just fine and I never seem to come across these whitespace problems.
I'm reading this over and over again, and if I'm interpreting you right, all you want to know how to do is strip certain characters (tabs and newlines) from a string in AS2, right? I cannot believe no one has given you the simple one line answer yet:
myString = myString.split("\n").join("");
That's it. Repeat that for \r, \n, and \t and all newlines and tabs will be gone. If you want it as an easy function, then do this:
function stripWhiteSpace(str: String) : String
{
return str.split("\r").join("").split("\n").join("").split("\t").join("");
}
That function won't modify your old string, it will return a new one without \r, \n, or \t. To actually modify the old string use that function like this:
myString = stripWhiteSpace(myString);

Resources