Grab the value from a URL that sits between specific characters - BigQuery parsing

I need to parse urls in order to grab a value that comes after .com/ AND before the next / character. My data looks like this:
url
https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas
https://www.delish.com/food-news/news/100-years-of-christmas
The desired output is:
new_string
food-news
food-news
I have tried the following:
SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 4)] AS new_string
But because the URLs are not consistent, it sometimes grabs food-news and sometimes grabs www.delish.com, so an offset counted from the end does not work in this case.

Use the following:
regexp_extract(url, net.host(url) || r'/([^/]+)')

SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '.com/')) + 1)] AS new_string
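To see why the regexp_extract answer works, the same idea (find the host, then capture everything up to the next slash) can be sketched outside BigQuery. The Ruby below is only an illustration under that assumption; first_path_segment is a hypothetical helper name, not part of any library.

```ruby
require 'uri'

# Returns the first path segment after the host, or nil when there is none.
def first_path_segment(url)
  host = URI.parse(url).host
  return nil if host.nil?

  # Escape the host so its dots match literally, then capture up to the next slash.
  match = url.match(%r{#{Regexp.escape(host)}/([^/]+)})
  match && match[1]
end

urls = [
  'https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas',
  'https://www.delish.com/food-news/news/100-years-of-christmas'
]
urls.each { |u| puts first_path_segment(u) }
```

Because the capture is anchored to the host rather than to a position counted from the end, the number of path segments after the first one no longer matters.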

Related

Filter data starts with character - Google Sheets

I have a large set of data, and I want to filter it inside QUERY so that only the values starting with a certain character are returned.
For example:
AVTD1X4K1V0R01IA
AVTD1X4K1V0RXXF1
AVTD1X4K1V0RXXFA
AVTDMAIN1V0R03IA
AVTDMAIN1V0RXXFA
AWEWE23232323232
BLIVSE20122014X1
CA100U50VXSRCCCF
CA330U50VXSRCBCF
CA47UX63VXSRBBCX
From that data, I want to get the codes starting with 'A'.
Thanks in advance
You can use the matches clause.
Based on your example, it will look like this:
QUERY(A1:A11,"where A matches 'A.*'")
Change reference accordingly!
Try
=query(A:A,"select A where A like 'A%' ")

Rails, Postgres 12, Query where pattern matches regex, and contains substring

I have a field in the database which contains strings that look like 58XBF2022L1001390. I need to be able to query for results which match the last letter (in this case 'L') and match or resemble the last four digits.
The regular expression I've been using to find records which match the structure is \d{2}[A-Z]{3}\d{4}[A-Z]\d{7}. So far I've tried using a scope to refine the results, but I'm not getting any results. Here's my scope:
def self.filter_by_shortcode(input)
  q = input
  starting = q.slice!(0)
  ending = q
  where("field ~* ?", "\d{2}[A-Z]{3}\d{4}/[#{starting}]/\d{3}[#{ending}]\g")
end
Here are some more example strings, and the substring that we would be looking for. Not every string stored in this database field matches this format, so we would need to be able to first match the string using the regex provided, then search by substring.
36GOD8837G6154231
G4231
13WLF8997V2119371
V9371
78FCY5027V4561374
V1374
06RNW7194P2075353
P5353
57RQN0368Y9090704
Y0704
edit: added some more examples as well as substrings that we would need to search by.
I do not know Rails, but the SQL for what you want is relatively simple. Since your string has a fixed format, once that format is validated, simple concatenation of substrings gives your desired result.
with base(target, goal) as
( values ('36GOD8837G6154231', 'G4231')
, ('13WLF8997V2119371', 'V9371')
, ('78FCY5027V4561374', 'V1374')
, ('06RNW7194P2075353', 'P5353')
, ('57RQN0368Y9090704', 'Y0704')
)
select substr(target,10,1) || substr(target,14,4) target, goal
from base
where target ~ '^\d{2}[A-Z]{3}\d{4}[A-Z]\d{7}$';
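The same validate-then-concatenate idea can be sketched in plain Ruby, which may help when building the value to compare against on the Rails side. This is a minimal sketch, not a Rails scope; CODE_FORMAT and shortcode are hypothetical names introduced for illustration.

```ruby
# Pattern from the question: 2 digits, 3 letters, 4 digits, 1 letter, 7 digits.
CODE_FORMAT = /\A\d{2}[A-Z]{3}\d{4}[A-Z]\d{7}\z/

# Returns the letter plus the last four digits, or nil if the format does not match.
def shortcode(code)
  return nil unless code.match?(CODE_FORMAT)

  # Index 9 is the 10th character (the letter); the last 4 characters are the digits.
  code[9] + code[-4, 4]
end

puts shortcode('36GOD8837G6154231')
```

Strings that do not match the format return nil, mirroring the where clause in the SQL, which filters such rows out before the substrings are taken.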

Problems with a include or if in rails

I'm having issues in Rails with this code:
if @turno.chop == res[:department].to_s
Here turno contains strings like ABC1 and department contains strings like ABC. I'm trying to check whether turno is equal to department, but I need to shorten the turno string first.
Every time I try this, the code does not finish and gets stuck in another part of the code. When I delete the condition, the code works perfectly but does not apply the filter.
I also tried:
if @turno.include?(res[:department].to_s)
But the same error appears.
I believe something very similar was answered in the Stack Overflow question "How to check whether a string contains a substring in Ruby?"
The include? method sounds like what you should use.
my_string = "abcdefg"
if my_string.include? "cde"
  puts "String includes 'cde'"
end
To be more precise: @turno can contain a string like "ABC1" and res[:department] contains a string like "ABC". I need to reduce the string in @turno to its first X characters and compare it with the content of res[:department].
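The comparison described in that last comment (take the first X characters of turno and compare them with department) could look roughly like the sketch below. same_department? is a hypothetical helper; it assumes X defaults to the length of the department string.

```ruby
# Hypothetical helper: true when the first `length` characters of turno
# equal the department code. By default, length is the department's length.
def same_department?(turno, department, length = department.to_s.length)
  turno.to_s[0, length] == department.to_s
end

puts same_department?('ABC1', 'ABC')
puts same_department?('ABD1', 'ABC')
```

Unlike include?, this only matches at the start of the string, so a turno such as "XABC" would not count as department "ABC".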

URL query string has a comma in the string

I am having an issue with a URL query string and I believe the issue is that my parameter sometimes has a comma in it.
What happens is I have a query string that is generated from a list of group names so that my string looks something like:
Group=GroupName1,GroupName2,GroupName3
While doing some testing I noticed that some of my groups are not being displayed on the page even though they are in the query string. Then I noticed that the groups that are not showing are those that have a comma in the name. For example:
Group=People,%20Places%20and%20Stuff
Obviously the query string gets parsed looking for 'People' as a group and 'Places and Stuff' as a group. This is an issue because the group is 'People, Places and Stuff'. I don't have any control over the group names so they cannot be changed to not include commas. I tried to encode the comma in the string using %2C however that had no impact.
I did some searching, but I couldn't find anything other than a suggestion to change the server so that the delimiter isn't a comma, and I don't have the ability to do that. Is there any other solution, or am I stuck?
After doing a bunch of hunting I finally found the answer.
I was on the right track encoding the comma as %2C; however, it has to be preceded by an escape character, %5C (an encoded backslash). The URL query string would therefore be the following:
Group=People%5C%2C%20Places%20and%20Stuff
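As an illustration, a query string like the one above could be built programmatically. The Ruby sketch below assumes the server treats a backslash before a comma as an escape, as this answer found; other servers may behave differently. encode_group_list is a hypothetical helper name.

```ruby
require 'erb'

# Assumption: the server treats "\," as a literal comma inside a group name.
# Commas inside names are escaped and encoded; the delimiter commas are not.
def encode_group_list(names)
  names.map { |name| ERB::Util.url_encode(name.gsub(',') { '\\,' }) }.join(',')
end

puts "Group=#{encode_group_list(['People, Places and Stuff', 'GroupName2'])}"
```

The block form of gsub is used so that the backslash in the replacement is inserted literally rather than being interpreted as a back-reference escape.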

Get line that matches regex in rails

I have a long list of information stored in a variable and I need to run some regex expressions against that variable and get various pieces of information from what is found.
How can you store the line that matches a regex expression in a variable?
How can you get the line number of the line that matches a regex expression?
Here is an example of what I'm talking about.
body = "service timestamps log datetime msec localtime show-timezone
service password-encryption
!
hostname switch01
!
boot-start-marker"
If I search for the line that contains "hostname" I need the line number, in this case it would be 4. I also need to store the line "hostname switch01" as another variable.
Any ideas?
Thanks!
First, convert the string to lines: body.split("\n"). The double quotes matter, because in single quotes '\n' is a literal backslash followed by n. Then add line numbers to the lines with .each_with_index, and select the matching lines with .select {|line, line_nr| line =~ your_regex }. Putting it all together:
body.split("\n").each_with_index
  .select {|line, line_nr| line =~ your_regex }
  .map {|line, line_nr| line_nr }
This gives you the zero-based indexes of all the lines matching your_regex. Add 1 to get the line number, and drop the final .map if you also want to keep the lines themselves.
Let's say you have an object file that provides a #lines method:
lines = file.lines.each_with_index.select {|line, i| line =~ /regex/ }
If you already have a list of lines you can leave out the call to #lines. If you have a string you can use string.split("\n").
This will result in the variable lines containing an array of 2-element arrays with the line that matched your RegEx and the index of the line in the original file.
Breakdown
file.lines gets the lines - of course the other methods I mentioned might also apply here for you. We then add the index to each element with #each_with_index, because you want to store these as well. This has the same effect as #map.with_index {|e, i| [e, i]}, i.e. map every element to [element, index]. We then use the #select method to get all lines that do match your RegEx (FYI, =~ is the matching operator in Ruby, Perl and other languages - in case you didn't already know). We're done after that, but you might need to further transform the data so you can process it.
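Applied to the example from the question, the approach could be sketched as follows. matching_lines is a hypothetical helper name, and with_index(1) is used so that line numbers start at 1 rather than 0.

```ruby
body = <<~CONFIG
  service timestamps log datetime msec localtime show-timezone
  service password-encryption
  !
  hostname switch01
  !
  boot-start-marker
CONFIG

# Returns [line, line_number] pairs for every line matching the pattern.
def matching_lines(text, pattern)
  text.each_line(chomp: true).with_index(1).select { |line, _number| line =~ pattern }
end

# Take the first match and store the line and its number in separate variables.
line, number = matching_lines(body, /hostname/).first
puts "#{number}: #{line}"
```

Each result is a two-element array, so destructuring the first match gives both the matching line and its 1-based line number.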
