Get Text after Quotes In Ruby - ruby-on-rails

My sample text looks like this:
30","formatedMinDeliveryDate":null,"formatedMaxDeliveryDate":null,"actualDeliveryDate":null,"trackingNumber":"ID180135116580CN","shippmentTrackingUrl":"https:\u002F\u002Fwww.057872.m2749.l5119","localizedCurrency":null}},"actions":[{"label":"Leave feedback","icon":null,"value":null,"action":"link","actionParam":{"label":"LEAVE_FEEDBACK_FOR_SELLER","u
I want to get the number ID180135116580CN, and I'm having trouble doing this with a regexp.
The file is full of them, and I'm doing this:
out_file = File.open('public/orders.txt').each do |line|
p line[/(?<="trackingNumber":")[^"]*(?=")/]
end
but it only prints nil and doesn't extract the number I'm looking for.
Is the regexp wrong or do I need to traverse the file differently?
Basically, after every trackingNumber, I want to get whatever is in the quotes that follow it.
Thanks!
Edit:
I tried this, following @WiktorStribiżew's suggestion in the comments:
p line.scan(/"trackingNumber"\s*:\s*"([^"]+)"/)
Now I'm getting all of the tracking numbers as an array like this:
[["UB08578YP"], ["UB085789YP"], ["ID180135791CN"], ["ID180135728CN"]]
How do I modify this to get them in individual lines like this?
UB08578YP
UB085789YP
ID180135791CN
ID180135728CN

It can be simpler to read the whole file at once and run the scan on its contents, something like this:
file_content = File.read("./my_file.txt")
results = file_content.scan(/(?<="trackingNumber":")[^"]*(?=")/)
puts results
Because puts prints each array element on its own line (unlike p, which prints the inspected array), the results come out formatted the way you want.
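The capture-group version from the question's edit also works once the nested result of scan is flattened. A minimal sketch, where tracking_numbers is a hypothetical helper name and the inline sample string stands in for public/orders.txt:

```ruby
# Hypothetical helper: collect every quoted value that follows "trackingNumber":
# With one capture group, String#scan returns [["UB08578YP"], ...], so flatten it.
def tracking_numbers(text)
  text.scan(/"trackingNumber"\s*:\s*"([^"]+)"/).flatten
end

# In the question's setup this would be File.read('public/orders.txt');
# a short inline sample keeps the sketch self-contained.
sample = '{"trackingNumber":"UB08578YP","x":null},{"trackingNumber" : "ID180135791CN"}'
numbers = tracking_numbers(sample)

puts numbers # prints one tracking number per line
```

The \s* around the colon lets the pattern tolerate optional spaces, which the lookbehind version cannot do because its lookbehind text is fixed.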

Related

Rails Plain Text Asset Extract String

Sorry, I am just learning how to use Rails.
I've got a simple .txt file asset which I would like to pull random Strings from to display on my landing page.
Is there an easy way in Rails to do this?
Assuming each string is in a separate line, you can do this:
strings = File.readlines('path/to/file.txt')
Then, to get a random string use sample, like this:
strings.sample
If you want more than one random string, just call sample with an argument, for example:
strings.sample(3)
This will return an array with 3 random lines from the strings array.
Finally, you can do it all in one line; for example, try this in the controller:
@string = File.readlines('path/to/file.txt').sample
And you will have @string available to use in the view.
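One detail worth knowing: File.readlines keeps the trailing newline on each line unless you pass chomp: true. A minimal sketch, where random_lines is a hypothetical helper and a throwaway Tempfile stands in for the asset file:

```ruby
require 'tempfile'

# Hypothetical helper: return `count` random non-blank lines from a text file.
# chomp: true drops the trailing "\n" that File.readlines keeps by default.
def random_lines(path, count = 1)
  File.readlines(path, chomp: true).reject { |line| line.strip.empty? }.sample(count)
end

# Demo with a temporary file standing in for the real asset.
Tempfile.create('quotes') do |file|
  file.write("First quote\nSecond quote\n\nThird quote\n")
  file.flush # make sure the content is on disk before reading it by path
  p random_lines(file.path, 2)
end
```

In a Rails controller the path would typically be built with Rails.root.join(...); the actual file location is up to you.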
You are not giving me much, but I am going to assume that you want to get one random line of a text file.
This is how I would do it:
File.readlines("my/file/path.txt").sample
I hope that gets you started :)

How to format a text area sent as an array via JSON?

I'm pretty new to Ruby, and I'm using it to work with an API. Text areas sent over the API are converted to the format below before being sent to me via a JSON POST request:
"Comment": [
"hdfdhgdfgdfg\r",
"This is just a test\r",
"Thanks!\r",
"- Kyle"
]
And I'm getting the value like this:
comments = params["Comment"]
So each line is broken down into what looks like an array. My issue is that it behaves just like one big string instead of an array with 4 values. I tried using comments[0] and just printing comments, but both return the same result: everything is displayed as a string, i.e.
["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]
But I need to display it as it appears in the text area, i.e.
hdfdhgdfgdfg
This is just a test
Thanks!
- Kyle
I know I could just strip out all the extra characters, but I feel like there has to be a better way. Is there a good way to convert this back to the original format of a text area, or at least to convert it to an array so I can loop through each item and re-format it?
First, get rid of those ugly \rs:
comments.map!(&:chomp)
Then, join the lines together:
comment = comments.join("\n") # using a newline
# OR, for HTML output:
comment = comments.join('<br>')
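Applied to the sample value from the question, the two steps look like this. A minimal sketch; map is used instead of map! so the original array stays untouched:

```ruby
# Sample value shaped like params["Comment"] from the question.
comments = ["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]

# String#chomp with no argument strips a trailing "\r", "\n" or "\r\n".
lines = comments.map(&:chomp)

plain_text = lines.join("\n")   # for a terminal, a log or a <textarea>
html_text  = lines.join("<br>") # for raw HTML output

puts plain_text
```

In a Rails view, a string built with join('<br>') would be HTML-escaped unless it is marked safe, for example by building it with the safe_join helper instead.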
If params["Comment"] still holds the raw JSON string (rather than an array that has already been parsed for you), you should be able to parse it and populate a hash with all of the values:
require 'json'
hash = JSON.parse(params["Comment"])
puts hash
=> {"Comment"=>["all", "of", "your", "values"]}
This should work for all valid JSON. One of the rules of JSON syntax is that
Data is in name/value pairs
The JSON you provided doesn't supply names for the values, so this method might not work. If that is the case, parsing the raw string and extracting the values would also do the job (although it is messier).
How you might go about doing that:
json = params["Comment"]
new_array = []
# keep every whitespace-separated piece longer than one character
json.split(" ").each do |element|
  if element.length > 1
    new_array << element
  end
end
This would at least give you an array with all of your values.
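For the parsing route, a minimal sketch assuming the raw request body is available as a string (raw_body below is a hypothetical stand-in). JSON.parse hands back a real Array for "Comment", so the lines can be cleaned and joined directly instead of being split on spaces:

```ruby
require 'json'

# Hypothetical raw request body; single quotes keep the JSON "\r" escapes intact.
raw_body = '{"Comment": ["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]}'

# JSON.parse expects the raw JSON text (a String), not an already-parsed Array.
data = JSON.parse(raw_body)
comment_lines = data["Comment"] # a real Array of four strings

puts comment_lines.map(&:chomp).join("\n")
```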

Clean up & style characters from text

I am getting text from a feed that has a lot of characters like:
Insignia&#153; 2.0 Stereo Computer Speaker System (2-Piece) - Black
4th-Generation Apple® iPod® touch
Is there an easy way to get rid of these, or do I have to anticipate which characters I want to delete and use the delete method to remove them? Also, when I try to remove
&amp;
with
str.delete("&")
it leaves behind "amp;". Is there a better way to delete this type of character? Do I need to re-encode the text?
String#delete is certainly not what you want, as it works on characters, not the string as a whole.
Try:
str.gsub /&amp;/, ""
You may also want to try replacing the &amp; with a literal ampersand, such as:
str.gsub /&amp;/, "&"
If this is closer to what you really want, you may get the best results by unescaping the HTML string. If so, try this:
require 'cgi'
CGI::unescapeHTML(str)
Details of the unescapeHTML method are in the Ruby CGI documentation.
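To see the difference between the two approaches, here is a minimal sketch on a made-up string: unescapeHTML decodes every entity it recognizes, while gsub only touches the one entity you name:

```ruby
require 'cgi'

str = "Tom &amp; Jerry &lt;3"

# Decode every HTML entity in the string.
decoded = CGI.unescapeHTML(str)

# Replace only the &amp; entity; other entities are left alone.
amp_only = str.gsub("&amp;", "&")

puts decoded  # Tom & Jerry <3
puts amp_only # Tom & Jerry &lt;3
```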
If you are getting data from a 'feed', aka RSS XML, then you should be using an XML parser like Nokogiri to process the XML. This will automatically unescape HTML entities and allow you to get the proper string representation directly.
To remove it, try the gsub method, something like this:
text = "foo&bar"
text.gsub /\b&\b/, "" # => "foobar"

Extracting email addresses in an html block in ruby/rails

I am creating a parser that guards against spamming and harvesting of emails from a block of text that comes from TinyMCE (so it may or may not contain HTML tags).
I've tried regexes, and so far this one has been successful:
/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The problem is, I need to ignore all email addresses in mailto hrefs. For example:
<a href="mailto:test@mail.com">test@mail.com</a>
should only return the second email address.
To give some background on what I'm doing: I'm reversing the email addresses in a block, so the above example would look like this:
moc.liam@tset
The problem with my current regex is that it also replaces the one in the href. Is there a way for me to do this with a single regex, or do I have to check for one and then the other? Can I do this just by using gsub, or do I have to use some Nokogiri/Hpricot magicks and whatnot to parse the mailtos? Thanks in advance!
Here are my references, by the way:
so.com/questions/504860/extract-email-addresses-from-a-block-of-text
so.com/questions/1376149/regexp-for-extracting-a-mailto-address
I'm also testing with this:
http://rubular.com/
Edit:
Here's my current helper code:
def email_obfuscator(text)
  text.gsub(/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i) { |m|
    "<span class='anti-spam'>#{m.reverse}</span>"
  }
end
which results in this:
<a target="_self" href="mailto:<span class='anti-spam'>moc.liamg@tset</span>"><span class="anti-spam">moc.liamg@tset</span></a>
Another option, if lookbehind doesn't work:
/\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i
This would match all emails; you can then check whether the first captured group is "mailto:" and, if so, skip that match.
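Plugged into the question's helper, the optional-group idea might look like the sketch below. Inside a gsub block, Regexp.last_match holds the current match, so the block can return mailto: matches unchanged. The sample anchor tag is an illustration, not the asker's actual markup:

```ruby
# Optional "mailto:" prefix in group 1, the address itself in group 2.
EMAIL_WITH_PREFIX = /\b(mailto:)?([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})\b/i

def email_obfuscator(text)
  text.gsub(EMAIL_WITH_PREFIX) do |match|
    md = Regexp.last_match
    if md[1]
      match # inside a mailto: href, so leave it untouched
    else
      "<span class='anti-spam'>#{md[2].reverse}</span>"
    end
  end
end

html = '<a href="mailto:test@mail.com">test@mail.com</a>'
puts email_obfuscator(html)
```

Because gsub consumes the whole "mailto:..." address before moving on, no part of the href can be matched a second time.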
Would this work?
/\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i
The (?<!mailto:) is a negative lookbehind, which will ignore any match immediately preceded by mailto:
I don't have Ruby set up at work, unfortunately, but it worked with PHP when I tested it...
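Ruby's regex engine does support fixed-length lookbehind like this, so the pattern can drop straight into the helper. A minimal sketch with a made-up anchor tag. One caveat worth testing: \b also fires right after a "." or "-", so an href address such as john.smith@... can still be partially matched starting at ".smith"; the optional-group variant does not have that problem.

```ruby
# Negative lookbehind: skip any address immediately preceded by "mailto:".
EMAIL_NOT_IN_MAILTO = /\b(?<!mailto:)[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i

def email_obfuscator(text)
  text.gsub(EMAIL_NOT_IN_MAILTO) { |m| "<span class='anti-spam'>#{m.reverse}</span>" }
end

html = '<a href="mailto:test@mail.com">test@mail.com</a>'
puts email_obfuscator(html)
```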
Why not just store all the matched emails in an array and remove any duplicates? You can do this easily with the ruby standard library and (I imagine) it's probably quicker/more maintainable than adding more complexity to your regex.
emails = ["email_one@example.com", "email_one@example.com", "email_two@example.com"]
emails.uniq # => ["email_one@example.com", "email_two@example.com"]

How to use ruby to get string between HTML <cite> tags?

Greetings everyone:
I would love to get some information from a huge collection of Google search result pages.
The only thing I need is the URLs inside a bunch of <cite></cite> HTML tags.
I couldn't find any other proper way to handle this problem, so now I am moving to Ruby.
This is so far what I have written:
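require 'net/http'
require 'uri'

url = URI.parse('http://www.google.com.au')
# Anything after "#" is a client-side fragment and is never sent to the server,
# so the search terms go in a normal query string instead.
res = Net::HTTP.start(url.host, url.port) { |http|
  http.get('/search?hl=en&q=helloworld')
}
puts res.body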
Unfortunately, I cannot use the recommended hpricot Ruby gem (it seems to be missing a make command or something?),
so I would like to stick with this approach.
Now that I can get the response body as a string, the only thing I need is to retrieve whatever is inside the <cite> HTML tags.
How should I do that? Using a regular expression? Can anyone give me an example?
Here's one way to do it using Nokogiri:
Nokogiri::HTML(res.body).css("cite").map {|cite| cite.content}
I think this will solve it (scan is called on the body string, not on the response object):
res.body.scan(/<cite>([^<>]*)<\/cite>/imu).flatten
# This one ignores empty tags:
res.body.scan(/<cite>([^<>]*)<\/cite>/imu).flatten.reject(&:empty?)
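To see what the scan returns, here is a minimal sketch on a small hand-written snippet (hypothetical markup, not real Google output). Note that [^<>]* means a <cite> containing nested tags, such as bolded search terms, is skipped entirely; that is a limitation of this regex, which an HTML parser like Nokogiri avoids.

```ruby
# Stand-in for res.body (hypothetical markup).
body = <<~HTML
  <div><cite>www.example.com/hello</cite></div>
  <div><cite></cite></div>
  <div><CITE>hello.example.org</CITE></div>
HTML

# One capture group => scan returns [["..."], ...]; flatten, then drop empty tags.
# The /i flag makes the pattern match <CITE> as well as <cite>.
urls = body.scan(/<cite>([^<>]*)<\/cite>/i).flatten.reject(&:empty?)
p urls
```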
If you're having problems with hpricot, you could also try nokogiri which is very similar, and allows you to do the same things.
Split the string on the tag you want. Assuming only one instance of the tag (or limiting the split to a single cut), you'll have two pieces, which I'll call head and tail. Take tail and split it on the closing tag (once), so you now have two pieces in your new array. The new head is what was between your tags, and the new tail is the remainder of the string, which you can process again if the tag may appear more than once.
An example that may not be exactly correct, but you get the idea:
head1, tail1 = str.split('<tag>', 2) # a limit of 2 cuts only at the first opening tag
head2, tail2 = tail1.split('</tag>', 2) # head2 is the text between the tags
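Repeating the split on the remaining tail handles tags that appear more than once. The sketch below swaps in String#partition, which makes the same single cut as split with a limit of 2 but also returns the separator, so it is easy to tell when no tag was found. between_tags is a hypothetical helper name:

```ruby
# Hypothetical helper: text between every <tag>...</tag> pair, found by repeated splitting.
def between_tags(str, tag)
  open_tag = "<#{tag}>"
  close_tag = "</#{tag}>"
  results = []
  rest = str
  loop do
    _before, found_open, after_open = rest.partition(open_tag)
    break if found_open.empty? # no more opening tags
    inner, found_close, after_close = after_open.partition(close_tag)
    break if found_close.empty? # opening tag without a closing tag
    results << inner
    rest = after_close # keep processing the remainder
  end
  results
end

p between_tags("a<cite>one</cite>b<cite>two</cite>c", "cite")
```

Like the regex approach, this is plain string matching: it does not handle attributes on the opening tag or nested tags of the same name.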
