Clean up & style characters from text - ruby-on-rails

I am getting text from a feed that has alot of characters like:
Insignia™ 2.0 Stereo Computer Speaker System (2-Piece) - Black
4th-Generation Apple® iPod® touch
Is there an easy way to get rid of these, or do I have to anticipate which characters I want to delete and use the delete method to remove them? Also, when I try to remove
&
with
str.delete("&")
It leaves behind "amp;" Is there a better way to delete this type of character? Do I need to re-encode the text?

String#delete is certainly not what you want, as it works on characters, not the string as a whole.
Try
str.gsub /&/, ""
You may also want to try replacing the & with a literal ampersand, such as:
str.gsub /&/, "&"
If this is closer to what you really want, you may get the best results unescaping the HTML string. If so try this:
CGI::unescapeHTML(str)
Details of the unescapeHTML method are here.

If you are getting data from a 'feed', aka RSS XML, then you should be using an XML parser like Nokogiri to process the XML. This will automatically unescape HTML entities and allow you to get the proper string representation directly.

For removing try to use gsub method, something like this:
text = "foo&bar"
text.gsub /\b&\b/, "" #=> foobar

Related

How can I put a link inside of a text string in HAML?

This should probably be easier than it is. I just want to put a link inside an HTML paragraph element.
%p{class: "answer"}="Please upload your data to this portal in text (tab-separated) format. Download our template #{raw(link_to 'here', '/templates/upload_template.xlsx')} for sample data and a description of each column."
Rails is encoding the tag information. I don't want tags to be encoded. I want them to be tags.
You can use more than one line inside any block, to solve your problem we will have something like this:
%p{class: "answer"}
Please upload your data to this portal in text (tab-separated) format. Download our template
= link_to 'here', '/templates/upload_template.xlsx'
for sample data and a description of each column."
You can use interpolation directly in Haml, and doing this seems to fix the issue in this case.
So instead of doing this:
%p= "Text with #{something_interpolated} in it."
you can just do
%p Text with #{something_interpolated} in it.
i.e. you don’t need the = or the quotes, since you are just adding a single string. You shouldn’t need to use raw either.
(Also, note you can do %p.answer to set the class attribute, which may be cleaner if the value isn’t being set dynamically.)
Why this is being escaped here is a different matter. I would have expected the two cases (%p= "#{foo}" and %p #{foo}) to behave the same way. However, after a bit of research, this behaviour seems to match how Rails behaves with Erb.

Need to convert string "&#x0398" to "\u0398"

My Rails application stores strings containing html entity codes, e.g. "&#x0398", which display Greeks and other characters on browser pages. To display these same characters in Prawn documents, I need to convert "&#x0398" to "\u0398". Using a regexp I can extract the bare codepoint, "0398", from the original string. But I'm unable to use this to create a new string variable containing "\u0398".
I've tried many variations of string concatenation, interpolation and even array operations, but no joy. Anything that looks like
new_string_var = "\u" + my_codepoint
generates an "invalid Unicode escape" error at "\u".
Anything that looks like
new_string_var = "\\u" + my_codepoint
runs without error but inserts the literal string "\u0398" in the Prawn document.
Is it possible in Ruby to construct a string like this? Is there a better approach?
Actually, you don't need \uxxxx notation - this is for display purposes in Ruby. Try CGI.unescapeHTML(string_with_entities) from built-in CGI module.

Encoding only spaces and commas in URL

I am trying to encode a url in rails for an image attachment, but using CGI::escape or URI.escape seems to encode everything. I just need the commas and spaces encoded, nothing else. How would I go about doing that in rails? I did gsub, but could only replace commas or spaces, not both. Is there a way to do both?
http://"URL"?operation=getfieldclip&outlinePoints=600%2C600%2C400%2CTest Area%2C44.982643%2C-94.696723%2C44.982343%2C-94.696723%2C44.982293%2C-94.697170%2C44.982293%2C-94.697555%2C44.982313%2C-94.697740%2C44.982333%2C-94.697987%2C44.982363%2C-94.698110%2C44.982403%2C-94.698233%2C44.982453%2C-94.698341%2C44.982493%2C-94.698511%2C44.982553
I can get the commas changed, but not sure how to get the space changed at the same time as well.
You should use CGI.escape. Don't use it for the entire URL, but only for the values of it.
require "cgi"
"http://URL?" + params.map{|k, v| "#{k}=#{CGI.escape(v)}"}.join("&")

What is the proper way to sanitize user input when using a Ruby system call?

I have a Ruby on Rails Application that is using the X virtual framebuffer along with another program to grab images from the web. I have structured my command as shown below:
xvfb-run --server-args=-screen 0 1024x768x24 /my/c++/app #{user_provided_url}
What is the best way to make this call in rails with the maximum amount of safety from user input?
You probably don't need to sanitize this input in rails. If it's a URL and it's in a string format then it already has properly escaped characters to be passed as a URL to a Net::HTTP call. That said, you could write a regular expression to check that the URL looks valid. You could also do the following to make sure that the URL is parse-able:
uri = URI.parse(user_provided_url)
You can then query the object for it's relevant parts:
uri.path
uri.host
uri.port
Maybe I'm wrong, but why don't you just make sure that the string given is really an URL (URI::parse), surround it with single quotes and escape any single quote (') character that appears inside?

Why is this query string invalid?

In my asp.net mvc page I create a link that renders as followed:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct and it errors after the first ampersand. It complains about the & not being encoded and the entity &p not recognised etc.
AFAIK the & shouldn't be encoded because it is a separator for the key value pair.
For those who care: I send these pars as querystring and not as "/" seperated values because there is no decent way of passing on optional parameters that I know of.
To put all the bits together:
an anchor (<a>) tag's href attribute needs an encoded value
& encodes to &
to encode an '&' when it is part of your parameter's value, use %26
Wouldn't encoding the ampersand into & make it part of my parameter's value?
I need it to seperate the second variable from the first
Indeed, by encoding my href value, I do get rid of the errors. What I'm wondering now however is what to do if for example my colorId would be "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what to do with encoded ampersands. Do they need to be encoded twice so to speak?
So to get the url:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this.. go check the www and count the pages that get their query string validated in the W3C validator..
Entities which are part of the attributes should be encoded, generally. Thus you need & instead of just &
It works even if it doesn't validate because most browsers are very, very, very lenient in what to accept.
In addition, if you are outputting XHTML you have to encode every entity everywhere, not just inside the attributes.
All HTML attributes need to use character entities. You only don't need to change & into & within script blocks.
Whatever
Anywhere in an HTML document that you want an & to display directly next to something other than whitespace, you need to use the character entity &. If it is part of an attribute, the & will work as though it was an &. If the document is XHTML, you need to use character entities everywhere, even if you don't have something immediately next to the &. You can also use other character entities as part of attributes to treat them as though they were the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example...
Hello
Would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
This is a slightly confusing concept to some people, I've found. When you put & in a HTML page, such as in <a href="abc?def=5&ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of exactly the same as how you need to escape quotes in a string:
// though you define your string like this:
myString = "this is \"something\" you know?"
// the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&baz=2">
// but the url is ACTUALLY: foo?bar=1&bar=2

Resources