Help with grep in BBEdit - grep

I'd like to grep the following in BBedit.
Find:
<dc:subject>Knowledge, Mashups, Politics, Reviews, Ratings, Ranking, Statistics</dc:subject>
Replace with:
<dc:subject>Knowledge</dc:subject>
<dc:subject>Mashups</dc:subject>
<dc:subject>Politics</dc:subject>
<dc:subject>Reviews</dc:subject>
<dc:subject>Ratings</dc:subject>
<dc:subject>Ranking</dc:subject>
<dc:subject>Statistics</dc:subject>
OR
Find:
<dc:subject>Social web, Email, Twitter</dc:subject>
Replace with:
<dc:subject>Social web</dc:subject>
<dc:subject>Email</dc:subject>
<dc:subject>Twitter</dc:subject>
Basically, when there's more than one category, I need to find the comma and space, add a linebreak and wrap the open/close around the category.
Any thoughts?

Wow. Lots of complex answers here. How about find:
,
(there's a space after the comma)
and replace with:
</dc:subject>\r<dc:subject>

Find:
(.+?),\s?
Replace:
\1\r
I'm not sure what you meant by “wrap the open/close around the category” but if you mean that you want to wrap it in some sort of tag or link just add it to the replace.
Replace:
\1\r
Would give you
Social web
Email
Twitter
Or get fancier with Replace:
\1\r
Would give you
Social web
Email
Twitter
In that last example you may have a problem with the “Social web” URL having a space in it. I wouldn't recommend that, but I wanted to show you that you could use the \1 backreference more than once.
The Grep reference in the BBEdit Manual is fantastic. Go to Help->User Manual and then Chapter 8. Learning how to use RegEx well will change your life.
UPDATE
Weird, when I first looked at this it didn't show me your full example. Based upon what I see now you should
Find:
(.+?),\s?
Replace:
<dc:subject>\1</dc:subject>\r

I don't use BBEdit, but in Vim you can do this:
%s/(_[^<]+)</dc:subject>/\=substitute(submatch(0), ",[ \t]*", "</dc:subject>\r", "g")/g
It will handle multiple lines and tags that span content with line breaks. It handles lines with multiple too, but won't always get the newline between the close and start tag.
If you post this to the google group vim_use and ask for a Vim solution and the corresponding perl version of it, you would probably get a bunch of suggestions and something that works in BBEdit and then also outside any editor in perl.
Don

You can use sed to do this either, in theory you just need to replace ", " with the closing and opening <dc:subject> and a newline character in between, and output to a new file. But sed doesn't seem to like the html angle brackets...I tried escaping them but still get error messages any time they're included. This is all I had time for so far, so if I get a chance to come back to it I will. Maybe someone else can solve the angle bracket issue:
sed s/, /</dc:subject>\n<dc:subject>/g file.txt > G:\newfile.txt
Ok I think I figured it out. Basically had to put the replacement text containing angle brackets in double quotes and change the separator character sed uses to something other than forward slash, as this is in the replacement text and sed didn't like it. I don't know much about grep but read that grep just matches things whereas sed will replace, so is better for this type of thing:
sed s%", "%"</dc:subject>\n<dc:subject>"%g file.txt > newfile.txt

You can't do this via normal grep. But you can add a "Unix Filter" to BBEdit doing this work for you:
#!/usr/bin/perl -w
while(<>) {
my $line = $_;
$line =~ /<dc:subject>(.+)<\/dc:subject>/;
my $content = $1;
my #arr;
if ($content =~ /,/) {
#arr = split(/,/,$content);
}
my $newline = '';
foreach my $part (#arr) {
$newline .= "\n" if ($newline ne '');
$part =~ s/^\s*(\S*(?:\s+\S+)*)\s*$/$1/;
$newline .= "<dc:subject>$part</dc:subject>";
}
print $newline;
}
How to add this UNIX-Filter to BBEdit you can read at the "Installation"-Part of this URL: http://blog.elitecoderz.net/windows-zeichen-fur-mac-konvertieren-und-umgekehrt-filter-fur-bbeditconverting-windows-characters-to-mac-and-vice-versa-filter-for-bbedit/2009/01/

Related

Is there a script that can extract particular link from txt and write it in another txt file?

I'm looking for a script (or if there isn't, I guess I'll have to write my own).
I wanted to ask if anyone here knows a script that can take a txt file with n links (lets say 200). I need to extract only links that have particular characters in them, let's say I only need links that contain "/r/learnprogramming". I need the script to get those links and write them to another txt files.
Edit: Here is what helped me: grep -i "/r/learnprogramming" 1.txt >2.txt
you can use ajax to read .txt file using jquery
<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.1/jquery.min.js></script>
<script>
jQuery(function($) {
console.log("start")
$.get("https://ayulayol.imfast.io/ajaxads/ajaxads.txt", function(wholeTextFile) {
var lines = wholeTextFile.split(/\n/),
randomIndex = Math.floor(Math.random() * lines.length),
randomLine = lines[randomIndex];
console.log(randomIndex, randomLine)
$("#ajax").html(randomLine.replace(/#/g,"<br>"))
})
})
</script>
<div id=ajax></div>
If you are using linux or macOS you could use cat and grep to output the links.
cat in.txt | grep /r/programming > out.txt
Solution provided by OP:
grep -i "/r/learnprogramming" 1.txt >2.txt
Since you did not provide the exact format of the document I assume those links are separated by newline characters. In this case, the code is pretty straightforward using Python/awk since you can iterate over file.readlines() and print only those that match your pattern (either by using a lines.contains(pattern) or using a regex if the pattern is more complex). To store the links in a new file simply redirect the stdout to a new file like this:
python script.py > links.txt
The solution above works even if links are separated by an arbitrary symbol s, first read the file into a single string and split it over s. I hope this helps.

regex to extract URLs from text - Ruby

I am trying to detect the urls from a text and replace them by wrapping in quotes like below:
original text: Hey, it is a url here www.example.com
required text: Hey, it is a url here "www.example.com"
original text show my input value and required text represents the required output. I searched a lot on web but could not find any possible solution. I already have tried URL.extract feature but that doesn't seem to detect URLs without http or https. Below are the examples of some of urls I want to deal with. Kindly let me know if you know the solution.
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Find words who look like urls:
str = "ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les Belles lettres, 2001.\n\nhttps://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/\n\nwww.jstor.org/stable/24084454\n\nwww.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/\n\ninsu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so\n\nwww.cerege.fr/spip.php?page=pageperso&id_user=94"
str.split.select{|w| w[/(\b+\.\w+)/]}
This will give you an array of words which have no spaces and include a one or more . characters which MIGHT work for your use case.
puts str.split.select{|w| w[/(\b+\.\w+)/]}
www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,
https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/
www.jstor.org/stable/24084454
www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/
insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so
www.cerege.fr/spip.php?page=pageperso&id_user=94
Updated
Complete solution to modify your string:
str_with_quote = str.clone # make a clone for the `gsub!`
str.split.select{|w| w[/(\b+\.\w+)/]}
.each{|url| str_with_quote.gsub!(url, '"' + url + '"')}
Now your cloned object wraps urls inside double quotes
puts str_with_quote
Will give you this output
ANQUETIL-DUPERRON Abraham-Hyacinthe, KIEFFER Jean-Luc, "www.hominides.net/html/actualites/outils-preuve-presence-hominides-asie-0422.php,Les" Belles lettres, 2001.
"https://www.ancient-code.com/indian-archeologists-stumbleacross-ruins-great-forgotten-civilization-mizoram/"
"www.jstor.org/stable/24084454"
"www.biorespire.com/2016/03/22/une-nouvelle-villeantique-d%C3%A9couverte-en-inde/"
"insu.cnrs.fr/terre-solide/terre-et-vie/de-nouvellesdatations-repoussent-l-age-de-l-apparition-d-outils-surle-so"
"www.cerege.fr/spip.php?page=pageperso&id_user=94"

How to cut a line after a phrase is found?

I have a large text file full of websites visited by hosts. This is the format:
Host : Url
A lot of the urls look like this:
http://google.com/?aslkdfjasldkfjaskldfjalskdjfalksdfjalksdjfa;sdlkfjas;dklfjasdklfjasdklfjasdklfjJUSTABUNCHOFRANDOMSTUFFaslkdjfaslkdfjaklsdfjaklsdjfasdkfjasdfklj
And it is hard to see what the original website is. How can I use grep to only show this:
Host : http://google.com
I've been looking everywhere to cut a line after the delimiter ".com" and can't find a solution. Thank you for you help!
Bonus: I forgot about .net, .org, and the other extensions. This might be a more difficult problem than I thought
Try this :
grep -oP 'Host : http://[^/]+'
^^^^
(All characters that's not slashes)
or if you want to specify .com :
grep -oP 'Host : http://.*?\.com'
Another solution :
cut -d'/' -f1-3

Find and replace a URL with grep/sed/awk?

Fairly regularly, I need to replace a local url with a live in large WordPress databases. I can do it in TextMate, but it often takes 10+ minutes to complete.
Basically, I have a 10MB+ .sql file and I want to:
Find: http://localhost:8888/mywebsite
and
Replace with: http://mywebsite.com
After that, I'll save the file and do a mysql import to the local/live servers. I do this at least 3-4 times a week and waiting for Textmate has been a pain. Is there an easier/faster way to do this with grep/sed/awk?
Thanks!
Terry
sed 's/http:\/\/localhost:8888\/mywebsite/http:\/\/mywebsite.com/g' FileToReadFrom > FileToWriteTo
This is running switch (s/) globally (/g) and replacing the first URL with the second. Forward slashes are escaped with a backslash.
kent$ echo "foobar||http://localhost:8888/mywebsite||fooooobaaaaaaar"|sed 's#http://localhost:8888/mywebsite#http://mywebsite.com#g'
foobar||http://mywebsite.com||fooooobaaaaaaar
if you want to do the replace in place (change in your original file)
sed -i 's#http://.....#http://mysite#g' input.sql
You don't need to replace the http://
sed "s/localhost:8888/www.my-awesome-page.com/g" input.sql > output.sql

how to disable bold font in vim?

i've removed all references to bold (gui=bold, cterm=bold, term=bold) in the color syntax file slate.vim but i still see some bolded text. for example in a python file, the keywords class, def, try, except, return, etc. are still in a bold blue font.
also how to disable bold in status messages, like "recording" or "Press ENTER or type command.."?
Instead of removing =bold references you should replace them by
gui=NONE
cterm=NONE
term=NONE
Put the following line in the .vimrc file.
set t_md=
Just in case someone is using iTerm on MacOS and also has this problem (since the same color scheme and vimrc settings under Ubuntu never gave me this problem), there is an option in iTerm under Preference->Profiles->text that stops iTerm from rendering any bold text. That's an easier and quicker fix.
try also to remove the occurrences of standout.
You can find highlighting groups by doing the following:
:sp $VIMRUNTIME/syntax/hitest.vim | source %
You can find where colors and font options were defined by doing:
:verbose highlight ModeMsg
(replace ModeMsg by your highlight group)
In vim, :scriptnames shows a list of all scripts loaded at vim startup.
In bash, grep -rl "=bold" $VIM shows a list of all files in your vim folder that contain that string. If $VIM is not set, or if you have a space in the filename (windows users), cd to your vim directory and run the command with . in place of $VIM
You can compare the two lists to find the files that need editing. Replace =bold with =NONE as stated in the previous answer by Tassos.
A side note: :hi Shows all current highlight formatting, with examples to demonstrate how the syntax is actually being rendered. In my case, standout had no effect on whether the font appeared bold.
Here's the easiest method:
In /colors directory enter sed -i 's/=bold/=NONE/g' *.vim
In /syntax directory enter sed -i 's/=bold/=NONE/g' *.vim
This will replace every instance in all those *.vim files.
For me, it was a tmux/screen issue. https://unix.stackexchange.com/questions/237530/tmux-causing-bold-fonts-in-vim led me to TERM=screen-256color which resolved my problem. It might also be worth exploring the difference when TERM is xterm vs. xterm-256color.
#devskii 's answer in the comment, above, works great for me. I'm going to include specifically the unbolding part here & wiki the answer. (If #devskii would like to make it an answer, I'll delete this... if I can delete wiki answers.)
Put this in your .gvimrc and smoke it:
" Steve Hall wrote this function for me on vim#vim.org
" See :help attr-list for possible attrs to pass
function! Highlight_remove_attr(attr)
" save selection registers
new
silent! put
" get current highlight configuration
redir #x
silent! highlight
redir END
" open temp buffer
new
" paste in
silent! put x
" convert to vim syntax (from Mkcolorscheme.vim,
" http://vim.sourceforge.net/scripts/script.php?script_id=85)
" delete empty,"links" and "cleared" lines
silent! g/^$\| links \| cleared/d
" join any lines wrapped by the highlight command output
silent! %s/\n \+/ /
" remove the xxx's
silent! %s/ xxx / /
" add highlight commands
silent! %s/^/highlight /
" protect spaces in some font names
silent! %s/font=\(.*\)/font='\1'/
" substitute bold with "NONE"
execute 'silent! %s/' . a:attr . '\([\w,]*\)/NONE\1/geI'
" yank entire buffer
normal ggVG
" copy
silent! normal "xy
" run
execute #x
" remove temp buffer
bwipeout!
" restore selection registers
silent! normal ggVGy
bwipeout!
endfunction
autocmd BufNewFile,BufRead * call Highlight_remove_attr("bold")

Resources