Change thing in the middle of the string notepad++ - hyperlink

My file contains things like this:
http://example.com/main.do?y=yeay
http://example.com/main.do?y=hahahehe
http://example.com/main.do?d=wow
http://example.com/blah/blah/product.do?p=49302
etc...
I want to change them all as follows:
http://example.com/main.do#y=yeay.html
http://example.com/main.do#y=hahahehe.html
http://example.com/main.do#d=wow.html
http://example.com/blah/blah/product.do#p=49302.html
These links appear in HTML/.do/ASP files.
How can I change them? Thanks.
I can also use programs other than Notepad++, and I have both macOS and Windows.
Thanks

Ctrl+H
Find what: https?\S+?\.do\K\?(\S+)
Replace with: #$1.html
UNCHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all
Explanation:
https? # http OR https
\S+? # 1 or more non spaces, not greedy
\. # a dot
do # literally "do"
\K # forget all we have seen until this position
\? # question mark
(\S+) # group 1, 1 or more non spaces
Replacement:
# # literally
$1 # content of group 1
.html # literally
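For anyone scripting this outside Notepad++, the same pattern appears to carry over to Ruby, whose regex engine also supports \K. This is only a minimal sketch, assuming the links are delimited by whitespace as in the question; the method name rewrite_links is made up for illustration.

```ruby
# Hypothetical helper (not part of the original answer): applies the same
# find/replace as the Notepad++ dialog to a string.
def rewrite_links(text)
  # \K drops everything matched so far from the match, so only "?params"
  # is replaced; \1 is the content of group 1.
  text.gsub(/https?\S+?\.do\K\?(\S+)/, '#\1.html')
end

sample = <<~LINKS
  http://example.com/main.do?y=yeay
  http://example.com/blah/blah/product.do?p=49302
LINKS

puts rewrite_links(sample)
```

Because \S+ matches any non-space character, a link that sits inside an href="..." attribute would also pull the closing quote into group 1; a tighter class such as [^\s"']+ may be needed for that case.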

Related

Regular expression to match a particular url and to extract a specific character from it

I have a url with below format:
https://ab.efghix.com/1234567890/f231a8c9ef2008b1d8772c27c359211fa-c0l
I need to split this URL and extract the last character of f231a8c9ef2008b1d8772c27c359211fa, which is a in this example. I want to do this with a regex, and I tried the following pattern:
ptn = /^.*((ab.efghix.com\/)|(^\d{10}$))\??([^-\?]*).*/
url = "https://ab.efghix.com/1663215071/f231a8c9ef2008b1d8772c27c359211fa-c0l"
ptn.match(url)
This gives the following result:
<MatchData "https://ab.efghix.com/1663215071/f231a8c9ef2008b1d8772c27c359211fa-c0l"
1:"ab.efghix.com/"
2:"ab.efghix.com/"
3:nil
4:"1663215071/f231a8c9ef2008b1d8772c27c359211fa" >
I need help fine-tuning this pattern to obtain the last character of the fourth capture group. Any help will be appreciated :)
Among other possibilities, you could use
.*(\w)-[^-]*$
and take the first group, see a demo on regex101.com.
This says:
.* # greedily consume everything, then backtrack until the rest matches
(\w) # a word character: 0-9A-Za-z_
- # a dash
[^-]* # not a dash, 0+ times
$ # anchor it to the end
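Since the question uses Ruby, the pattern can be tried with String#[] and a capture index. This is a small sketch; the method name last_char_before_dash is an assumption chosen for illustration.

```ruby
# Hypothetical helper: returns the word character right before the last
# dash, or nil when the string contains no such dash.
def last_char_before_dash(url)
  # The second argument to String#[] selects capture group 1.
  url[/.*(\w)-[^-]*$/, 1]
end

url = "https://ab.efghix.com/1663215071/f231a8c9ef2008b1d8772c27c359211fa-c0l"
puts last_char_before_dash(url)
```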

assign array value to ENV var on .env file

I need to set an array of strings in my .env file and can't find information about the right syntax. Testing this takes quite a while, so I wanted to save some time. One of these options should work:
MY_ARRAY=[first_string, second_string]
MY_ARRAY=[first_string second_string]
MY_ARRAY=['first_string', 'second_string']
Can someone tell me which?
As far as I know dotenv does not allow setting anything except strings (and multiline strings). The parser syntax is:
LINE = /
\A
(?:export\s+)? # optional export
([\w\.]+) # key
(?:\s*=\s*|:\s+?) # separator
( # optional value begin
'(?:\'|[^'])*' # single quoted value
| # or
"(?:\"|[^"])*" # double quoted value
| # or
[^#\n]+ # unquoted value
)? # value end
(?:\s*\#.*)? # optional comment
\z
/x
The reason behind this is that shell and OS support for setting other types of env variables is spotty.
You could use a separator such as commas or pipes (|) and split the string with ENV['FOO'].split('|'). But maybe what you are trying to do should be solved with an initializer which combines ENV vars.
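The separator approach could look roughly like the sketch below. It assumes a .env line such as MY_ARRAY=first_string,second_string; the helper name env_list and its keyword arguments are hypothetical, and the env parameter exists only so the method can be tried with a plain Hash.

```ruby
# Hypothetical helper: reads a delimited string from the environment and
# returns it as an array of trimmed, non-empty strings.
# Assumed .env entry: MY_ARRAY=first_string,second_string
def env_list(name, separator: ',', env: ENV)
  env.fetch(name, '')        # a missing variable is treated as an empty list
     .split(separator)
     .map(&:strip)           # tolerate "a, b" as well as "a,b"
     .reject(&:empty?)
end

p env_list('MY_ARRAY', env: { 'MY_ARRAY' => 'first_string, second_string' })
```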

Remove contents within a specific tag

Using Rails 3.2. I want to remove all text inside <b> along with the tags themselves, but I have only found ways to strip the tags:
string = "
<p>
<b>Section 1</b>
Everything is good.<br>
<b>Section 2</b>
All is well.
</p>"
string.strip_tags
# => "Section 1 Everything is good. Section 2 All is well."
I want to achieve this:
"Everything is good. All is well."
Should I add regex matching too?
The "right" way would be to use an html parser like Nokogiri.
However for this simple task, you may use a regex. It's quite simple:
Search for (?m)<b\s*>.*?<\/b\s*> and replace it with an empty string. After that, use strip_tags.
Regex explanation:
(?m) # set the m modifier to match newlines with dots .
<b # match <b
\s* # match a whitespace zero or more times
> # match >
.*? # match anything ungreedy until </b found
<\/b # match </b
\s* # match a whitespace zero or more times
> # match >
Online demo
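In plain Ruby the two steps might be sketched as follows. The method name strip_bold_sections is hypothetical, and the crude /<[^>]+>/ substitution is only a stand-in for Rails' strip_tags so that the example runs without Rails; it is not a general HTML sanitizer.

```ruby
# Hypothetical helper: drops <b>...</b> elements including their text,
# then removes the remaining tags and normalizes whitespace.
def strip_bold_sections(html)
  without_bold = html.gsub(/<b\s*>.*?<\/b\s*>/m, '') # /m lets . match newlines
  without_tags = without_bold.gsub(/<[^>]+>/, '')    # stand-in for strip_tags
  without_tags.split.join(' ')                       # collapse whitespace runs
end

string = "
<p>
<b>Section 1</b>
Everything is good.<br>
<b>Section 2</b>
All is well.
</p>"

puts strip_bold_sections(string)
```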
It would be much better to use an HTML/XML parser for this task. Ruby does not have a native one, but Nokogiri is good and wraps libxml/xslt
doc = Nokogiri::XML string
doc.xpath("//b").remove
result = doc.text # or .inner_html to include `<p>`
You can do string.gsub(/<b>.*?<\/b>/, '') (the non-greedy .*? keeps two <b> elements on the same line from being merged into one match)
http://rubular.com/r/hhmpY6Q6fX
If you want to remove only unsafe tags, you can try this:
ActionController::Base.helpers.sanitize("test<br>test<br>test<br> test")
If you want to remove all the tags, you need to use this:
ActionView::Base.full_sanitizer.sanitize("test<br>test<br>test<br> test")
These two differ slightly. The first one is good for stripping script tags to prevent XSS attacks, but it doesn't remove all tags. The second one removes every HTML tag in the text.

I need to keep only lines in txt file that contain "#"

I need to keep only the lines in my text file that contain the symbol #.
I have something like this for example:
something #
anything else fxgbdfg
car #
325235363456356 # dfsjdbfkjfbfds
958395959 #
sdfsnfjkndsnc3r /
And I need this:
something #
car #
958395959 #
Can somebody tell me how to do that with grep?
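grep \#
Or grep \# filename if you want to read from a file rather than stdin.
Note that this also keeps lines with text after the #, such as 325235363456356 # dfsjdbfkjfbfds. To keep only the lines that end in #, as in the expected output, anchor the match to the end of the line: grep '#$' filename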
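If grep is not available, the same filtering can be sketched in Ruby with Enumerable#grep. The method names below are made up for illustration; the second one assumes that "ends in #" may be followed by trailing whitespace.

```ruby
# Hypothetical helpers mirroring the two readings of the question.

# Keeps every line that contains a "#" anywhere.
def lines_containing_hash(text)
  text.lines(chomp: true).grep(/#/)
end

# Keeps only lines whose last non-whitespace character is "#".
def lines_ending_with_hash(text)
  text.lines(chomp: true).grep(/#\s*\z/)
end

sample = <<~'TEXT'
  something #
  anything else fxgbdfg
  car #
  325235363456356 # dfsjdbfkjfbfds
  958395959 #
  sdfsnfjkndsnc3r /
TEXT

puts lines_ending_with_hash(sample)
```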

How to make a Ruby string safe for a filesystem?

I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way for doing this?
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning method and is specific to Rails, which you didn't explicitly mention in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested, and it doesn't collapse runs of underscores into one.
Here's my solution:
def sanitize_filename(filename)
  # Split the name when finding a period which is preceded by some
  # character, and is followed by some character other than a period,
  # if there is no following period that is followed by something
  # other than a period (yeah, confusing, I know)
  fn = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)
  # We now have one or two parts (depending on whether we could find
  # a suitable period). For each of these parts, replace any unwanted
  # sequence of characters with an underscore
  fn.map! { |s| s.gsub(/[^a-z0-9\-]+/i, '_') }
  # Finally, join the parts with a period and return the result
  fn.join('.')
end
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
There should be at most one filename extension, which means that there should be at most one period in the filename
Trailing periods do not mark the start of an extension
Leading periods do not mark the start of an extension
Any sequence of characters beyond A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' – rather than '___' from the parts '$%', '__' and '°#')
The complicated part of this is where I split the filename into the main part and extension. With the help of a regular expression, I'm searching for the last period, which is followed by something else than a period, so that there are no following periods matching the same criteria in the string. It must, however, be preceded by some character to make sure it's not the first character in the string.
My results from testing the function:
1.9.3p125 :006 > sanitize_filename 'my§document$is°° very&interesting___thisIs%nice445.doc.pdf'
=> "my_document_is_very_interesting_thisIs_nice445_doc.pdf"
which I think is what you requested. I hope this is nice and elegant enough.
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
def sanitize_filename(filename)
  returning filename.strip do |name|
    # NOTE: File.basename doesn't work right with Windows paths on Unix
    # get only the filename, not the whole path
    name.gsub!(/^.*(\\|\/)/, '')
    # Strip out the non-ASCII characters
    name.gsub!(/[^0-9A-Za-z.\-]/, '_')
  end
end
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
ActiveStorage::Filename.new("foo:bar.jpg").sanitized # => "foo-bar.jpg"
ActiveStorage::Filename.new("foo/bar.jpg").sanitized # => "foo-bar.jpg"
If you use Rails you can also use String#parameterize. This is not particularly intended for that, but you will obtain a satisfying result.
"my§document$is°° very&interesting___thisIs%nice445.doc.pdf".parameterize
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
filename = "my§doc$is°° very&itng___thsIs%nie445.doc.pdf"
cleaned = filename.split(".").map(&:parameterize).join(".")
For implementation details and ideas, see the source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
def parameterize(string, separator: "-", preserve_case: false)
# Turn unwanted chars into the separator.
parameterized_string.gsub!(/[^a-z0-9\-_]+/i, separator)
#... some more stuff
end
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename produced should be safe to use (and still keep any filename-safe unicode characters untouched):
Zaru.sanitize! " what\ēver//wëird:user:înput:"
# => "whatēverwëirduserînput"
Zaru.sanitize! "my§docu*ment$is°° very&interes:ting___thisIs%nice445.doc.pdf"
# => "my§document$is°° very&interesting___thisIs%nice445.doc.pdf"
There is a library that may be helpful, especially if you're interested in replacing unusual Unicode characters with ASCII: unidecoder.
irb(main):001:0> require 'unidecoder'
=> true
irb(main):004:0> "Grzegżółka".to_ascii
=> "Grzegzolka"

Resources