Extracting sublink in between two characters in Ruby - ruby-on-rails

How would I extract a sub-link between two characters in a string?
For example, I'd like to extract the video ID from a YouTube URL:
http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u
I'd like the text between the "=" and the first "&" sign, which would be "UkzbRkPv4T4".

If you don't want to deal with regular expressions, you can rely on Ruby's standard library to parse the URL:
require 'cgi'
require 'uri'
url = "http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u"
video_id = CGI.parse(URI.parse(url).query)['v'][0]

You just need a regular expression:
uri = 'http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u'
# [\w-]+ also accepts "-", which can appear in video IDs; no trailing "&" is required
m = uri.match(/[?&]v=(?<id>[\w-]+)/)
if m
  puts m[:id]
end

Just to expand upon apneadiving's comment.
>> url = "http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u"
=> "http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u"
>> md = url.match(/v=(.*)&/)
=> #<MatchData "v=UkzbRkPv4T4&" 1:"UkzbRkPv4T4">
>> md[1]
=> "UkzbRkPv4T4"
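One caveat with the regex approaches: a greedy `.*` runs to the last "&" in the string, and a pattern that requires a trailing "&" finds nothing when `v` is the final parameter. A small sketch of the difference, using a hypothetical URL with a third parameter (`t=10` is my addition, not part of the question):

```ruby
# Hypothetical URL with an extra parameter appended for illustration.
url = "http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u&t=10"

# Greedy: (.*) extends to the LAST "&" in the string.
greedy = url[/v=(.*)&/, 1]
# Negated character class: [^&]+ stops at the FIRST "&" or at the end of the string.
strict = url[/[?&]v=([^&]+)/, 1]

puts greedy  # => UkzbRkPv4T4&feature=g-all-u
puts strict  # => UkzbRkPv4T4

# When "v" is the last parameter there is no trailing "&" to match.
short = "http://www.youtube.com/watch?v=UkzbRkPv4T4"
p short[/v=(.*)&/, 1]        # => nil
p short[/[?&]v=([^&]+)/, 1]  # => "UkzbRkPv4T4"
```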

require 'uri'
uri = URI("http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u")
uri.query
# => "v=UkzbRkPv4T4&feature=g-all-u"
URI.decode_www_form(uri.query)
# => [["v", "UkzbRkPv4T4"], ["feature", "g-all-u"]]
URI.decode_www_form(uri.query).map(&:last)
# => ["UkzbRkPv4T4", "g-all-u"]
URI.decode_www_form(uri.query).assoc("v").last
# => "UkzbRkPv4T4"
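If the URL might have no query string or no `v` parameter, the chain above raises on nil. A minimal nil-safe sketch follows; the method name `video_id` is my own, and it assumes Ruby 2.3+ for the `&.` operator:

```ruby
require 'uri'

# Returns the value of the "v" query parameter, or nil when it is absent.
def video_id(url)
  query = URI(url).query.to_s               # nil query becomes ""
  pair  = URI.decode_www_form(query).assoc("v")
  pair&.last                                # nil when there is no "v" pair
end

p video_id("http://www.youtube.com/watch?v=UkzbRkPv4T4&feature=g-all-u")
# => "UkzbRkPv4T4"
p video_id("http://www.youtube.com/")
# => nil
```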

Related

Regexp union in ruby escapes my original regex

I have multiple regexes that I want to combine into one big regex with Regexp.union. Here is one of them as an example:
^image\d*$
So I tried this:
regex = %w(^image\d*$)
=> ["^image\\d*$"]
re = Regexp.union(regex)
=> /\^image\\d\*\$/
It escapes my regex to /\^image\\d\*\$/, so the basic case doesn't match:
"image0".match(re)
=> nil
How can I get around this?
Pass a Regexp object instead. %w(...) creates an array of strings, and Regexp.union escapes strings so that they match literally. Use %r(...) or /.../ for a regular expression literal.
regex = %r(^image\d*$)
# => /^image\d*$/
Regexp.union(regex)
# => /^image\d*$/
array_of_regexs = [/a/, /b/, /c/]
# => [/a/, /b/, /c/]
Regexp.union(array_of_regexs)
# => /(?-mix:a)|(?-mix:b)|(?-mix:c)/
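If the patterns really do arrive as strings (for example, read from a config file), one option is to compile each with Regexp.new before the union. A sketch under that assumption; the second pattern, `^thumb\d*$`, is a made-up example:

```ruby
# Pattern sources as plain strings; single quotes keep the backslashes as-is.
sources = ['^image\d*$', '^thumb\d*$']

# Strings are escaped by Regexp.union, so this matches the literal text only.
literal = Regexp.union(sources)

# Compiling first keeps the regex metacharacters meaningful.
compiled = Regexp.union(sources.map { |src| Regexp.new(src) })

p literal.match?("image0")    # => false
p compiled.match?("image0")   # => true
p compiled.match?("thumb12")  # => true
p compiled.match?("imageX")   # => false
```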

Extracting URLs from a String that do not contain 'http'

I have the following 3 strings...
a = "The URL is www.google.com"
b = "The URL is google.com"
c = "The URL is http://www.google.com"
Ruby's URI extract method only returns the URL in the third string, because it contains the http part.
URI.extract(a)
=> []
URI.extract(b)
=> []
URI.extract(c)
=> ["http://www.google.com"]
How can I create a method to detect and return the URL in all 3 instances?
Use a regular expression.
Here is a basic one that should work for most cases:
/(https?:\/\/)?\w*\.\w+(\.\w+)*(\/\w+)*(\.\w*)?/.match( a ).to_s
This only fetches the first URL in the string and returns it as a string.
There's no perfect solution to this problem: it's fraught with edge cases. However, you might be able to get tolerably good results using something like the regular expressions used by Twitter to extract URLs from tweets (stripping off the extra leading spaces is left as an exercise!):
require './regex.rb'
def extract_url(s)
s[Twitter::Regex[:valid_url]]
end
a = "The URL is www.google.com"
b = "The URL is google.com"
c = "The URL is http://www.google.com"
extract_url(a)
# => " www.google.com"
extract_url(b)
# => " google.com"
extract_url(c)
# => " http://www.google.com"
You seem to be satisfied with Sucrenoir's answer. The essence of Sucrenoir's answer is to identify a URL by assuming that it includes at least one period. If that is the case, Sucrenoir's regex can be simplified (not equivalently, but for the most part) to this:
string[/\S+\.\S+/]
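To collect every URL-like token rather than only the first, the same simplified pattern can be handed to String#scan. This is only a sketch of that heuristic (the method name `extract_urls` is mine), and it inherits the pattern's weakness: a sentence-ending period directly after a URL is captured too.

```ruby
# Returns every whitespace-delimited token that contains an inner period.
def extract_urls(text)
  text.scan(/\S+\.\S+/)
end

p extract_urls("The URL is www.google.com")
# => ["www.google.com"]
p extract_urls("The URL is http://www.google.com")
# => ["http://www.google.com"]
p extract_urls("See google.com and http://example.org/a now")
# => ["google.com", "http://example.org/a"]
```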
This is something I used a while ago; hopefully it helps:
validates :url, :format =>
  { :with => URI::regexp(%w(http https)), :message => "Not Valid URL" }
Pass it through that validation (I assume you're using a database).
Try this method; hopefully it works for you:
def get_url(str)
  url = nil
  # keep the last whitespace-separated token that contains ".com"
  str.split(' ').each { |word| url = word if word.include?('.com') }
  url
end
Applied to your examples:
get_url("The URL is www.google.com") #=> www.google.com
get_url("The URL is google.com") #=> google.com
get_url("The URL is http://www.google.com") #=> http://www.google.com

restclient with ruby

Here I am trying to pass an ID in the URL, but the ID is not appended to the URL:
def retrieve
  url = "http://localhost:3000/branches/"
  resource = RestClient::Resource.new url+$param["id"]
  puts resource
end
I pass the ID on the command line like this:
ruby newrest.rb id="22"
I get this error:
`+': can't convert nil into String (TypeError)
But all of this works with the Mozilla REST client. How do I fix this problem?
Like this:
RestClient.get 'http://localhost:3000/branches', {:params => {:id => 50, 'name' => 'value'}}
You can find the command-line arguments in the global ARGV array.
If calling the script as ruby newrest.rb 22 is acceptable, then just use:
id = ARGV[0]
response = RestClient.get "http://localhost:3000/branches/#{id}"
puts response.body
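If you would rather keep the `id=22` calling style from the question, the arguments have to be split into keys and values yourself; nothing populates a `$param` hash automatically, which is why `$param["id"]` was nil. A minimal sketch using only core Ruby (the helper name `parse_params` is hypothetical, and the literal array stands in for ARGV):

```ruby
# Turns ["id=22", "name=x"] into {"id" => "22", "name" => "x"}.
# Arguments without "=" are ignored.
def parse_params(argv)
  argv.each_with_object({}) do |arg, params|
    key, value = arg.split("=", 2)
    params[key] = value unless value.nil?
  end
end

# The shell strips the quotes, so id="22" arrives as "id=22".
params = parse_params(["id=22"])   # in the real script: parse_params(ARGV)
url = "http://localhost:3000/branches/#{params.fetch("id")}"
puts url  # => http://localhost:3000/branches/22
```

Using `fetch` raises a KeyError with a clear message when the `id` argument is missing, instead of failing later on string concatenation.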
Here are some examples from the documentation:
private_resource = RestClient::Resource.new 'https://example.com/private/resource', 'user', 'pass'
RestClient.post 'http://example.com/resource', :param1 => 'one', :nested => { :param2 => 'two' }
Just experiment with comma-separated parameters or with hashes to see what your URL gives you.
The line puts resource looks strange to me, but leaving it as it is, I'd suggest:
def retrieve
  url = "http://localhost:3000/branches/"
  resource = RestClient::Resource.new url
  res_with_param = resource[$param["id"]]
  puts res_with_param
end
I haven't tried this, so there may be syntax mistakes.
I'm a real newcomer to Ruby, but I hope the idea is good.
Greetings,
KAcper

Finding exact words in a string

I have a list of links to clothing websites that I am categorising by gender using keywords. Depending on what website they are for, they all have different URL structures, for example...
www.website1.com/shop/womens/tops/tshirt
www.website2.com/products/womens-tshirt
I cannot use the .include? method because regardless of whether it is .include?("mens") or .include?("womens"), it will return true. How can I have a method that will only return true for "womens" (and vice versa). I suspect it may have to be some sort of regex, but I am relatively inexperienced with these, and the different URL structures make it all the more tricky. Any help is much appreciated, thanks!
The canonical regex way of doing this is to search on word boundaries:
pry(main)> "foo/womens/bar".match(/\bwomens\b/)
=> #<MatchData "womens">
pry(main)> "foo/womens/bar".match(/\bmens\b/)
=> nil
pry(main)> "foo/mens/bar".match(/\bmens\b/)
=> #<MatchData "mens">
pry(main)> "foo/mens/bar".match(/\bwomens\b/)
=> nil
That said, either splitting, or searching with the leading "/", may be adequate.
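Wrapped in a method, the word-boundary check might look like the sketch below. The method name and the "F"/"M" return values are illustrative choices, and the optional `s` is my assumption that both "women" and "womens" should count:

```ruby
# Classifies a URL by gender keyword using word boundaries.
# Returns "F", "M", or nil when neither keyword appears as a whole word.
def gender(url)
  case url
  when /\bwomens?\b/i then "F"
  when /\bmens?\b/i   then "M"
  end
end

p gender("www.website1.com/shop/womens/tops/tshirt")  # => "F"
p gender("www.website2.com/products/womens-tshirt")   # => "F"
p gender("www.website2.com/products/mens-tshirt")     # => "M"
p gender("www.website3.com/about")                    # => nil
```

Note that `\b` treats "_" as a word character, so a path such as `womens_tops` would not match.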
If you first check for women it should work:
# assumes str is not nil
def gender(str)
  if str.include?("women")
    "F"
  elsif str.include?("men")
    "M"
  else
    nil
  end
end
If this is not what you are looking for, please explain your problem in more detail.
You could split on / and check for string equality on the component(s) you want; there is no need for a regex there.
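The split approach could be sketched as follows. Since the second example URL joins words with "-", this version splits on every non-alphanumeric character rather than on "/" alone; the helper name `path_words` is hypothetical:

```ruby
# Breaks a URL into lowercase alphanumeric words.
def path_words(url)
  url.downcase.split(/[^a-z0-9]+/).reject(&:empty?)
end

words = path_words("www.website2.com/products/womens-tshirt")
p words
# => ["www", "website2", "com", "products", "womens", "tshirt"]
p words.include?("womens")  # => true
p words.include?("mens")    # => false
```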
keyword = "women"
url = "www.website1.com/shop/womens/tops/tshirt"
/\/#{keyword}/ =~ url
=> 21
keyword = "men"
url = "www.website1.com/shop/womens/tops/tshirt"
/\/#{keyword}/ =~ url
=> nil
keyword = "women"
url = "www.website2.com/products/womens-tshirt"
/\/#{keyword}/ =~ url
=> 25
keyword = "men"
url = "www.website2.com/products/womens-tshirt"
/\/#{keyword}/ =~ url
=> nil
Then just apply !! to the result to get a boolean:
!!nil
=> false
!!25
=> true
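As an alternative to double negation, String#match? (Ruby 2.4+) returns true or false directly. The sketch below also passes the keyword through Regexp.escape in case it ever contains regex metacharacters; the method name `keyword_in_path?` is made up for illustration:

```ruby
# True when the keyword appears immediately after a "/" in the URL.
def keyword_in_path?(url, keyword)
  url.match?(%r{/#{Regexp.escape(keyword)}})
end

url = "www.website1.com/shop/womens/tops/tshirt"
p keyword_in_path?(url, "women")  # => true
p keyword_in_path?(url, "men")    # => false
```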

Is there a Ruby library/gem that will generate a URL based on a set of parameters?

Rails' URL generation mechanism (most of which routes through polymorphic_url at some point) allows for the passing of a hash that gets serialized into a query string at least for GET requests. What's the best way to get that sort of functionality, but on top of any base path?
For instance, I'd like to have something like the following:
generate_url('http://www.google.com/', :q => 'hello world')
# => 'http://www.google.com/?q=hello+world'
I could certainly write my own that strictly suits my application's requirements, but if there existed some canonical library to take care of it, I'd rather use that :).
Yes, Ruby's standard library includes a whole module of classes for working with URIs, including one for HTTP. You can call URI::HTTP.build with a hash of components, much like you showed.
http://www.ruby-doc.org/stdlib/libdoc/uri/rdoc/classes/URI/HTTP.html#M009497
For the query string itself, just use Rails' Hash addition #to_query. i.e.
uri = URI::HTTP.build(:host => "www.google.com", :query => { :q => "test" }.to_query)
Late to the party, but let me highly recommend the Addressable gem. In addition to its other useful features, it supports writing and parsing URIs via RFC 6570 URI templates. To adapt the given example, try:
gsearch = Addressable::Template.new('http://www.google.com/{?query*}')
gsearch.expand(query: {:q => 'hello world'}).to_s
# => "http://www.google.com/?q=hello%20world"
or
gsearch = Addressable::Template.new('http://www.google.com/{?q}')
gsearch.expand(:q => 'hello world').to_s
# => "http://www.google.com/?q=hello%20world"
With vanilla Ruby, use URI.encode_www_form:
require 'uri'
query = URI.encode_www_form({ :q => "test" })
url = URI::HTTP.build(:host => "www.google.com", query: query).to_s
#=> "http://www.google.com?q=test"
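Building on that, something close to the `generate_url` from the question can be sketched with the standard library alone. This is one possible shape, not a canonical implementation; keeping any query parameters already present in the base URL is my own design choice:

```ruby
require 'uri'

# Appends params to base as a query string, keeping any existing parameters.
def generate_url(base, params = {})
  uri = URI.parse(base)
  pairs = URI.decode_www_form(uri.query.to_s) + params.to_a
  uri.query = URI.encode_www_form(pairs) unless pairs.empty?
  uri.to_s
end

puts generate_url('http://www.google.com/', :q => 'hello world')
# => http://www.google.com/?q=hello+world
puts generate_url('http://example.com/search?a=1', :b => 2)
# => http://example.com/search?a=1&b=2
```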
I would suggest my iri gem, which makes it easy to build a URL through a fluent interface:
require 'iri'
url = Iri.new('http://google.com/')
  .append('find').append('me') # -> http://google.com/find/me
  .add(q: 'books about OOP', limit: 50) # -> ?q=books+about+OOP&limit=50
  .del(:q) # remove this query parameter
  .del('limit') # remove this one too
  .over(q: 'books about tennis', limit: 10) # replace these params
  .scheme('https') # replace 'http' with 'https'
  .host('localhost') # replace the host name
  .port('443') # replace the port
  .path('/new/path') # replace the path of the URI, leaving the query untouched
  .cut('/q') # replace everything after the host and port
  .to_s # convert it to a string
