Is there a script that can extract particular links from a txt file and write them to another txt file?

I'm looking for a script (or if there isn't, I guess I'll have to write my own).
I wanted to ask if anyone here knows a script that can take a txt file with n links (let's say 200). I need to extract only the links that contain particular characters, for example only links that contain "/r/learnprogramming". I need the script to get those links and write them to another txt file.
Edit: Here is what helped me: grep -i "/r/learnprogramming" 1.txt >2.txt

You can use Ajax to read a .txt file with jQuery:
<script src=https://cdnjs.cloudflare.com/ajax/libs/jquery/2.1.1/jquery.min.js></script>
<script>
jQuery(function($) {
console.log("start")
$.get("https://ayulayol.imfast.io/ajaxads/ajaxads.txt", function(wholeTextFile) {
var lines = wholeTextFile.split(/\n/),
randomIndex = Math.floor(Math.random() * lines.length),
randomLine = lines[randomIndex];
console.log(randomIndex, randomLine)
$("#ajax").html(randomLine.replace(/#/g,"<br>"))
})
})
</script>
<div id=ajax></div>

If you are using Linux or macOS, you could use cat and grep to output the links:
cat in.txt | grep /r/programming > out.txt
Solution provided by OP:
grep -i "/r/learnprogramming" 1.txt >2.txt

Since you did not provide the exact format of the document, I assume the links are separated by newline characters. In that case the code is pretty straightforward in Python or awk: in Python you can iterate over file.readlines() and print only the lines that match your pattern (either with a substring test, pattern in line, or with a regex if the pattern is more complex). To store the links in a new file, simply redirect stdout to a file like this:
python script.py > links.txt
The solution above also works if the links are separated by an arbitrary symbol s: first read the file into a single string, then split it on s. I hope this helps.
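A minimal Python sketch of the approach described above. The function name filter_links and the sample URLs are made up for illustration; the search string is the one from the question.

```python
import re


def filter_links(text, pattern, separator=None, ignore_case=False):
    """Return the links in `text` that contain `pattern`.

    separator=None splits on any whitespace (this covers one link per line);
    pass a string such as ";" if the links use a custom separator.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes the pattern match literally, like a plain substring test
    needle = re.compile(re.escape(pattern), flags)
    return [link.strip() for link in text.split(separator) if needle.search(link)]


# Hypothetical sample input, one link per line
sample = (
    "https://reddit.com/r/learnprogramming/a\n"
    "https://example.com/x\n"
    "https://reddit.com/r/LearnProgramming/b\n"
)
matches = filter_links(sample, "/r/learnprogramming", ignore_case=True)
print("\n".join(matches))
```

To use it on a real file, read the file into a string first (for example the contents of 1.txt) and redirect the printed output to 2.txt, as with the grep command.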

Related

How to fix internal link issues when publishing a Docusaurus site on GitLab pages

In my Docusaurus project my internal links work in my local environment, but when I push to GitLab they no longer work. Instead of replacing the original doc title in the URL with the new one, the link target is appended to the end of the URL ('https://username.io/test-site/docs/overview/add-a-category.html'). I looked over my config file, but I do not understand why this is happening.
I tried updating the id in the front matter for the page, and making sure it matches the id in the sidebars.json file. I have also added customDocsPath and set it to 'docs/' in the config file, though that is supposed to be the default.
---
id: "process-designer-overview"
title: "Process Designer Overview"
sidebar_label: "Overview"
---
# Process Designer
The Process Designer is a collaborative business process modeling and
design workspace for the business processes, scenarios, roles and tasks
that make up governed data processes.
Use the Process Designer to:
- [Add a Category](add-a-category.html)
- [Add a Process or Scenario](Add%20a%20Process%20or%20Scenario.html)
- [Edit a Process or Scenario](Edit%20a%20Process%20or%20Scenario.html)
I changed the "Add a Category" link target in parentheses to use an .md extension, but that broke the link locally and it still didn't work on GitLab. I would expect that when a user clicks the link, the doc title in the URL is replaced with the new doc title ('https://username.gitlab.io/docs/add-a-category.html'); instead it is tacked on to the end ('https://username.gitlab.io/docs/process-designer-overview/add-a-category.html'), so the link is broken because that is not where the doc is located.
There were several issues with my links. First, I converted these files from html to markdown using Pandoc and did not add front matter - relying instead on the file name to connect my files to the sidebars. This was fine, except almost all of the file names had spaces in them, which you can see in my code example above. This was causing real issues, so I found a Bash script to replace all of the spaces in my file names with underscores, but now all of my links were broken. I updated all of the links in my files with a search and replace in my code editor, replacing "%20" with "_". I also needed to replace the ".html" extension with ".md" or my project would no longer work locally. Again, I did this with a search and replace in my code editor.
Finally, I ended up adding the front matter because otherwise my sidebar titles were all covered in underscores. Since I was working with 90 files, I didn't want to do this manually. I looked for a while and found a great gist by thebearJew and adjusted it so that it would take the file name and add it as the id, and the first heading and add it as the title and sidebar_label, since as it happens that works for our project. Here is the Bash script I found online to convert the spaces in my file names to underscores if interested:
find "$1" -name "* *.md" -type f -print0 | \
while IFS= read -r -d '' f; do mv -v "$f" "${f// /_}"; done
Here is the script I ended up with if anyone else has a similar setup and doesn't want to update a huge amount of files with front matter:
# Given a file path as an argument
# 1. get the file name
# 2. prepend template string to the top of the source file
# 3. resave original source file
# command: find . -name "*.md" -print0 | xargs -0 -I file ./prepend.sh file
filepath="$1"
file_name=$(basename "$filepath")
# Getting the file name without the extension (used as the id)
md='.md'
title=${file_name%$md}
# First level-1 heading of this file, with the leading "# " removed
heading=$(grep -m 1 "^# " "$filepath")
heading1=${heading#\# }
# Prepend front-matter to files
TEMPLATE="---
id: $title
title: $heading1
sidebar_label: $heading1
---
"
echo "$TEMPLATE" | cat - "$filepath" > temp && mv temp "$filepath"
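For anyone who prefers Python, here is a rough sketch of the same idea: the file name becomes the id, and the first "# " heading becomes the title and sidebar_label. The function names are made up for illustration, and falling back to the file name when there is no heading is my own assumption, not something the Bash script does.

```python
from pathlib import Path


def build_front_matter(path, text):
    """Build front matter from a file name and the first '# ' heading in text."""
    doc_id = Path(path).stem  # file name without the .md extension
    heading = doc_id  # assumption: fall back to the id if there is no heading
    for line in text.splitlines():
        if line.startswith("# "):
            heading = line[2:].strip()
            break
    return f"---\nid: {doc_id}\ntitle: {heading}\nsidebar_label: {heading}\n---\n"


def prepend_front_matter(path):
    """Rewrite the file in place; skip files that already start with front matter."""
    p = Path(path)
    text = p.read_text(encoding="utf-8")
    if text.startswith("---\n"):
        return False
    p.write_text(build_front_matter(p, text) + text, encoding="utf-8")
    return True


print(build_front_matter("docs/Add_a_Category.md", "# Add a Category\nSome text.\n"))
```

Headings that contain a colon or quotes would need to be quoted to stay valid YAML; this sketch does not handle that.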

Why doesn't grep work on some file, but works on another (same content)

I wrote this grep command:
grep -- "^[0-9a-zA-Z\.-]\+$" file.txt
To get all lines containing only numbers, letters, dots and dashes (legal domains).
This is the result of diff on both files
1,3c1,3
< test.xcom
< hi-th6ere.co.k
< 54
---
> test.xcom
> hi-th6ere.co.k
> 54
I wrote a file with some domains to test and it works great!
But, when I download a file (with the same content!) from the web, and then run this command, grep doesn't return anything.
I've tried to set full permissions on this file, but it still doesn't work.
Any ideas?
Thanks,
What makes you think the file content is the same as the one you've tested?
You can run 'diff filename1 filename2' to see if there are any differences between the two files.
It could be that the file you're downloading is in a Unicode encoding (for example UTF-16), so in a web browser it appears to have the same content as the file you've tested, but the binary content of the file itself is different.
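One way to check for this kind of invisible difference is to look at the raw bytes. The sketch below is only a guess at what might be going on with the downloaded file: it reports a byte order mark, Windows (CRLF) line endings, or NUL bytes, any of which would keep a pattern anchored with ^ and $ from matching. The function name describe_bytes is made up.

```python
import codecs


def describe_bytes(data):
    """List invisible byte-level features that can stop an anchored regex from matching."""
    findings = []
    if data.startswith(codecs.BOM_UTF8):
        findings.append("UTF-8 BOM")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        findings.append("UTF-16 BOM")
    if b"\r\n" in data:
        findings.append("CRLF line endings")
    if b"\x00" in data:
        findings.append("NUL bytes (possibly UTF-16)")
    return findings


# Same visible content as the test file, but with Windows line endings
print(describe_bytes(b"test.xcom\r\nhi-th6ere.co.k\r\n54\r\n"))
```

For a real file, pass in the bytes read in binary mode, for example open("file.txt", "rb").read().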

Find and replace a URL with grep/sed/awk?

Fairly regularly, I need to replace a local URL with a live one in large WordPress databases. I can do it in TextMate, but it often takes 10+ minutes to complete.
Basically, I have a 10MB+ .sql file and I want to:
Find: http://localhost:8888/mywebsite
and
Replace with: http://mywebsite.com
After that, I'll save the file and do a mysql import to the local/live servers. I do this at least 3-4 times a week and waiting for TextMate has been a pain. Is there an easier/faster way to do this with grep/sed/awk?
Thanks!
Terry
sed 's/http:\/\/localhost:8888\/mywebsite/http:\/\/mywebsite.com/g' FileToReadFrom > FileToWriteTo
This runs the substitute command (s/) globally (/g), replacing the first URL with the second. Forward slashes are escaped with a backslash.
kent$ echo "foobar||http://localhost:8888/mywebsite||fooooobaaaaaaar"|sed 's#http://localhost:8888/mywebsite#http://mywebsite.com#g'
foobar||http://mywebsite.com||fooooobaaaaaaar
If you want to do the replacement in place (changing your original file), use -i (with BSD/macOS sed, write -i '' instead):
sed -i 's#http://.....#http://mysite#g' input.sql
You don't need to replace the http://
sed "s/localhost:8888/www.my-awesome-page.com/g" input.sql > output.sql
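If you would rather script this than remember the sed escaping, here is a small Python sketch of the same literal find and replace. It streams the file line by line, so a large dump is never held in memory all at once. The function names are made up; the URLs are the ones from the question.

```python
def replace_url(lines, old, new):
    """Yield each line with every literal occurrence of old replaced by new."""
    for line in lines:
        yield line.replace(old, new)


def replace_in_file(src, dst, old, new):
    """Stream src to dst line by line, leaving line endings and odd bytes untouched."""
    with open(src, encoding="utf-8", errors="surrogateescape", newline="") as fin, \
         open(dst, "w", encoding="utf-8", errors="surrogateescape", newline="") as fout:
        fout.writelines(replace_url(fin, old, new))


demo = ["foobar||http://localhost:8888/mywebsite||fooooobaaaaaaar\n"]
print("".join(replace_url(demo, "http://localhost:8888/mywebsite", "http://mywebsite.com")), end="")
```

Because str.replace works on literal text, nothing in the URL needs escaping. For a real dump you would call replace_in_file with the input and output .sql paths.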

get and save path to php.ini file

Is it possible to get the path to the php.ini file with a php script and save this path to a variable? I know I can call phpinfo() to find out the path, but it prints a lot of info and I only need the path to php.ini file and no output at all. Thank you in advance.
Sure, there are two functions related to what you would like to do. The first one is exactly what you're looking for, the second one shows the bigger picture that there can be more than one ini file:
php_ini_loaded_file() - Retrieve a path to the loaded php.ini file.
php_ini_scanned_files() - Return a list of .ini files parsed from the additional ini dir.
Beyond that, watch out for .user.ini files: they don't show up in php_ini_scanned_files() or phpinfo().
You could exec("php -i | grep php.ini"), and grab that output.
Or you could use output buffering (ob_start()), run phpinfo(), get the contents of the output buffer (ob_get_contents()) and search through that (preg_match) to find the php.ini file...
This works on the command line, might also work for the CGI:
ob_start(); phpinfo();
if ( preg_match( '#>(.+)/php.ini#', ob_get_clean(), $matches ) ) {
echo 'php.ini location: ' . trim( $matches[1] ) . '/php.ini';
}

Help with grep in BBEdit

I'd like to grep the following in BBEdit.
Find:
<dc:subject>Knowledge, Mashups, Politics, Reviews, Ratings, Ranking, Statistics</dc:subject>
Replace with:
<dc:subject>Knowledge</dc:subject>
<dc:subject>Mashups</dc:subject>
<dc:subject>Politics</dc:subject>
<dc:subject>Reviews</dc:subject>
<dc:subject>Ratings</dc:subject>
<dc:subject>Ranking</dc:subject>
<dc:subject>Statistics</dc:subject>
OR
Find:
<dc:subject>Social web, Email, Twitter</dc:subject>
Replace with:
<dc:subject>Social web</dc:subject>
<dc:subject>Email</dc:subject>
<dc:subject>Twitter</dc:subject>
Basically, when there's more than one category, I need to find the comma and space, add a linebreak and wrap the open/close around the category.
Any thoughts?
Wow. Lots of complex answers here. How about find:
,
(there's a space after the comma)
and replace with:
</dc:subject>\r<dc:subject>
Find:
(.+?),\s?
Replace:
\1\r
I'm not sure what you meant by “wrap the open/close around the category” but if you mean that you want to wrap it in some sort of tag or link just add it to the replace.
Replace:
\1\r
Would give you
Social web
Email
Twitter
Or get fancier with Replace:
\1\r
Would give you
Social web
Email
Twitter
In that last example you may have a problem with the “Social web” URL having a space in it. I wouldn't recommend that, but I wanted to show you that you could use the \1 backreference more than once.
The Grep reference in the BBEdit Manual is fantastic. Go to Help->User Manual and then Chapter 8. Learning how to use RegEx well will change your life.
UPDATE
Weird, when I first looked at this it didn't show me your full example. Based upon what I see now you should
Find:
(.+?),\s?
Replace:
<dc:subject>\1</dc:subject>\r
I don't use BBEdit, but in Vim you can do this:
%s/(_[^<]+)</dc:subject>/\=substitute(submatch(0), ",[ \t]*", "</dc:subject>\r", "g")/g
It will handle multiple lines and tags that span content with line breaks. It handles lines with multiple tags too, but won't always get the newline between the close and start tag.
If you post this to the google group vim_use and ask for a Vim solution and the corresponding perl version of it, you would probably get a bunch of suggestions and something that works in BBEdit and then also outside any editor in perl.
Don
You can also use sed to do this. In theory you just need to replace ", " with the closing and opening <dc:subject> tags with a newline character in between, and output to a new file. But sed doesn't seem to like the HTML angle brackets... I tried escaping them but still got error messages any time they're included. This is all I had time for so far, so if I get a chance to come back to it I will. Maybe someone else can solve the angle bracket issue:
sed s/, /</dc:subject>\n<dc:subject>/g file.txt > G:\newfile.txt
OK, I think I figured it out. I basically had to put the replacement text containing angle brackets in double quotes and change the separator character sed uses to something other than a forward slash, since the replacement text contains one and sed didn't like it. I don't know much about grep, but I read that grep just matches things whereas sed will replace, so sed is better for this type of thing:
sed s%", "%"</dc:subject>\n<dc:subject>"%g file.txt > newfile.txt
You can't do this via normal grep. But you can add a "Unix Filter" to BBEdit doing this work for you:
#!/usr/bin/perl -w
use strict;
while (my $line = <>) {
# Pass through lines that do not contain a <dc:subject> element
unless ($line =~ /<dc:subject>(.+)<\/dc:subject>/) {
print $line;
next;
}
# Split the comma-separated categories (a single category yields one part)
my @arr = split(/,/, $1);
my $newline = '';
foreach my $part (@arr) {
$newline .= "\n" if ($newline ne '');
# Trim leading and trailing whitespace
$part =~ s/^\s+|\s+$//g;
$newline .= "<dc:subject>$part</dc:subject>";
}
print "$newline\n";
}
To add this Unix filter to BBEdit, see the "Installation" part of this URL: http://blog.elitecoderz.net/windows-zeichen-fur-mac-konvertieren-und-umgekehrt-filter-fur-bbeditconverting-windows-characters-to-mac-and-vice-versa-filter-for-bbedit/2009/01/
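For completeness, here is a Python sketch of the same transformation outside any editor. It is not a BBEdit feature, just one way to script it: each <dc:subject> element is matched with a regex and expanded into one element per comma-separated category. The function name split_subjects is made up.

```python
import re

# Non-greedy match of one <dc:subject> element; DOTALL lets content span lines
SUBJECT = re.compile(r"<dc:subject>(.*?)</dc:subject>", re.DOTALL)


def split_subjects(text):
    """Expand each comma-separated <dc:subject> element into one element per category."""
    def expand(match):
        parts = [p.strip() for p in match.group(1).split(",") if p.strip()]
        if not parts:
            return match.group(0)  # leave empty elements untouched
        return "\n".join(f"<dc:subject>{p}</dc:subject>" for p in parts)

    return SUBJECT.sub(expand, text)


print(split_subjects("<dc:subject>Social web, Email, Twitter</dc:subject>"))
```

Elements with a single category come out unchanged, so it should be safe to run over the whole file.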

Resources