Bookmarks parsing issue

I have a LARGE number of bookmarks and wanted to export them and share them with a group I work with. The issue is that when I export them, there are ADD_DATE and LAST_MODIFIED fields added by the browser (Firefox). I was hoping to just use cut or awk to pull the fields I want, but the lack of a space before the >(website_name) is making that difficult. And my regex skills are weak.
How do I add a single space before the second-to-last > at the end of the line so that I can use cut or awk to pull out the fields I want into a new file?
Ex: 123456">SecurityTrails would become 123456 >SecurityTrails
Please see below for examples of what I'm working with. Any help is greatly appreciated!
<DT><A HREF="https://..." ADD_DATE="..." LAST_MODIFIED="123456">SecurityTrails</A>

I use Firefox myself. It frequently also embeds the favicon into the exported bookmarks.html file via base64 encoding. So, to account for different scenarios (not just the one mentioned by the OP), maybe something like
{mawk/mawk2/gawk} 'BEGIN { FS = "\042" } $1 = $1'
then do whatever cutting you want. That assumes the OP wanted to keep every bit of it and simply remove the quotation marks.
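For illustration, on a made-up bookmark line (the URL and timestamps below are placeholders of mine, not the OP's data), that one-liner effectively turns every double quote into a space:
printf '<DT><A HREF="https://example.com/" ADD_DATE="123" LAST_MODIFIED="456">Example</A>\n' |
gawk 'BEGIN { FS = "\042" } $1 = $1'
which prints
<DT><A HREF= https://example.com/  ADD_DATE= 123  LAST_MODIFIED= 456 >Example</A>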
Now, if the objective is just to pull out the URL and the site name:
{mawk/mawk2/gawk} 'BEGIN { DBLQT = "\042"; FS = "(<A HREF=" DBLQT "|>)" } /<A HREF=/ {
    url = substr($2, 1, index($2, DBLQT) - 1);
    sitename = $(NF-1);
    sub(/<\/A$/, "", sitename);
    print url " > " sitename; }'   # or whatever way you want the output to be
I just typed it with extra verbosity to show what \042 means: the ASCII octal code for a double quote.
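And if all that's wanted is the literal single-space tweak from the question, a plain sed substitution should do it too (bookmarks_spaced.html is just a placeholder name for the output file):
sed 's/">/ >/' bookmarks.html > bookmarks_spaced.html
After that, cut or awk can split on the space as planned.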

Related

ImageJ/Fiji - Save CSV using macro

I am not a coder, but I am trying to turn ThunderSTORM's batch process into an automated one where I have a single input folder and a single output folder.
input_directory = newArray("C:\\Users\\me\\Desktop\\Images");
output_directory = ("C:\\Users\\me\\Desktop\\Results");
for(i = 0; i < input_directory.length; i++) {
    open(input_directory[i]);
    originalName = getTitle();
    originalNameWithoutExt = replace( originalName , ".tif" , "" );
    fileName = originalNameWithoutExt;
    run("Run analysis", "filter=[Wavelet filter (B-Spline)] scale=2.0 order=3 detector "+
        "detector=[Local maximum] connectivity=8-neighbourhood threshold=std(Wave.F1) "+
        "estimator=[PSF: Integrated Gaussian] sigma=1.6 method=[Weighted Least squares] fitradius=3 mfaenabled=false "+
        "renderer=[Averaged shifted histograms] magnification=5.0 colorizez=true shifts=2 "+
        "repaint=50 threed=false");
    saveAs(fileName+"_Results", output_directory);
}
This probably looks like a huge mess, but the original batch file used arrays and I can't figure out what they are. Taking them out breaks it, so I left them in. The main issues I have revolve around the saveAs part not working.
Using run("Export Results") works but I need to manually pick a location and file name. I tried to set this up to take the file name and rename it to the generic image name so it can save a CSV using that name.
Any help pointing out why I'm a moron? I would also love to only open one file at a time (this opens them all) and close it when the analysis is complete. But I will settle for that happening on a different day if I can just manage to save the damn CSV automatically.
I broke the code a whole bunch of times along the way, but for the most part it's in working condition like this.
I appreciate any and all help. Thank you!
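For what it's worth, here is a sketch of the kind of loop being described, not a tested answer. It leans on getFileList() to visit one image at a time and puts the saveAs() arguments in the (format, path) order ImageJ expects; whether ThunderSTORM's localizations actually land in the standard Results table is an assumption, and if they don't, a recorded "Export results" command would have to replace that line.
// Sketch only: process each .tif in the input folder, run the same analysis,
// save a CSV named after the image, then close everything before the next file.
input_directory  = "C:\\Users\\me\\Desktop\\Images\\";
output_directory = "C:\\Users\\me\\Desktop\\Results\\";
files = getFileList(input_directory);
for (i = 0; i < files.length; i++) {
    if (endsWith(files[i], ".tif")) {
        open(input_directory + files[i]);
        fileName = replace(getTitle(), ".tif", "");
        run("Run analysis", "filter=[Wavelet filter (B-Spline)] scale=2.0 order=3 "+
            "detector=[Local maximum] connectivity=8-neighbourhood threshold=std(Wave.F1) "+
            "estimator=[PSF: Integrated Gaussian] sigma=1.6 method=[Weighted Least squares] fitradius=3 mfaenabled=false "+
            "renderer=[Averaged shifted histograms] magnification=5.0 colorizez=true shifts=2 "+
            "repaint=50 threed=false");
        // saveAs wants the format first and the full path second; this assumes the
        // results reach ImageJ's Results table -- otherwise paste a recorded
        // run("Export results", ...) call here instead.
        saveAs("Results", output_directory + fileName + "_Results.csv");
        close("*");
    }
}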

Run Macro in PhpSpreadsheet

I'm doing a project that allows the customer to export MySQL data into .xls form. I'm using the PhpSpreadsheet library.
That's done, but my data contains a lot of dates, and some of them are 0000-00-00, which means the date is not used.
I want to turn all of these '0000-00-00' values into '-'.
I used Excel's Find and Replace and saved it as a macro (.bas).
What I have tried:
1. Loading the .bas file with IOFactory and a reader in PHP, but it says the file format is not accepted.
2. Using Excel's SUBSTITUTE function in the PHP loop that fetches the SQL data values:
$activeSheet->setCellValue('L'.$i, '=substitute('L'.$i ,"0000-00-00", "-')');
$i starts at 1 and increases by 1 with each loop iteration.
This method fails because I can't include $i inside the SUBSTITUTE() formula due to the clash between "" and '' quoting. I tried switching them around, but it seems 0000-00-00 and - must use "", otherwise the formula is not recognised by the library, and then $i can't be interpolated at all...
Is there any way to solve either of these problems, or can it not be solved in the first place? I can't find any explanation of macros in PhpSpreadsheet from the community or Google.
When setting the value of the cell:
if ($datefromselect == '0000-00-00') {
    $activeSheet->setCellValueByColumnAndRow($colnum, $rownum, '-');
} else {
    $activeSheet->setCellValueByColumnAndRow($colnum, $rownum, $datefromselect);
}
Or get it done in the SELECT, as in:
SELECT lastname,
if(date_closed = '0000-00-00', '-', date_closed)
FROM `lca_clients`
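Either way, here is a minimal sketch of how that could sit in the export loop; the query, the column letters, the output file name and the $mysqli connection are placeholders of mine, not taken from the question:
require 'vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Spreadsheet;
use PhpOffice\PhpSpreadsheet\Writer\Xls;

$spreadsheet = new Spreadsheet();
$activeSheet = $spreadsheet->getActiveSheet();

// the IF() in the SELECT already turns zero dates into '-';
// the PHP-side comparison above is the alternative if you keep the raw column
$result = $mysqli->query("SELECT lastname, IF(date_closed = '0000-00-00', '-', date_closed) AS date_closed FROM lca_clients");
$rownum = 1;
while ($row = $result->fetch_assoc()) {
    $activeSheet->setCellValue('A' . $rownum, $row['lastname']);
    $activeSheet->setCellValue('B' . $rownum, $row['date_closed']);
    $rownum++;
}

(new Xls($spreadsheet))->save('export.xls');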

Nesting an extra Span in a Pandoc filter makes the image disappear

I am currently working on massaging the HTML output of a Pandoc filter due to some annoying restrictions in the CMS that is the eventual beneficiary of my hard work.
My working filter (now with the obvious declarations) is as follows:
local List = require 'pandoc.List'
local Emph = pandoc.Emph
local Quoted = pandoc.Quoted
local Span = pandoc.Span
local Str = pandoc.Str
local Strong = pandoc.Strong
local image_base = "http://my.website.example/images/"
local image_author = "Someone Not Stigma"
function process_images(el)
  el.src = el.src:gsub("^file:images/", image_base)
  el.caption = {
    Strong( Quoted( "DoubleQuote", el.caption ) ),
    Str(" by "),
    Emph(image_author)
  }
  return el
end
return {{Image = process_images}}
In the eventual HTML, this gives me a nice figure with img and figcaption element inside of it. Wonderful. Unfortunately, my CMS destroys the figcaption (like it tends to destroy other stuff), and as such I figured I'd wrap everything in an extra span so I can style that one instead.
function process_images(el)
  el.src = el.src:gsub("^file:images/", image_base)
  el.caption = {
    Span(
      {
        Strong( Quoted( "DoubleQuote", el.caption ) ),
        Str(" by "),
        Emph(image_author)
      },
      { class = "img-caption" }
    )
  }
  return el
end
And yet somehow, this causes Pandoc to completely delete the image from the resulting HTML.
I have tried replacing the table syntaxes with List({}) syntaxes, but that just gives me upvalue complaints. I looked at the manual, but as far as I can tell I am doing everything right.
What am I missing here?
I call pandoc as follows:
pandoc --from=markdown-tex_math_dollars "Content.pure.txt" --lua-filter=".\pandoc-filter.lua" --to=html5 --template=".\pandoc-template.txt" -o "Content.txt"
Extensions are .txt (because these files are not browser ready). The template being used is rather lengthy (there's a fair bit of YAML variables and related markup), but be assured: $body$ can be found in there.
I am not a wise man. Always update to the latest version before posting questions, folks.
I was running an older version of Pandoc (v2.6), and upgrading to v2.9.1.1 suddenly made the output appear again. That's a lot of versions released in the span of about a year!
(In my defense, my Pandoc-filter-fu is not particularly strong, so it makes sense to assume user error rather than program bug. Why is it that every time you assume bug, it is user error, and every time you assume user error, it is an outright bug?)
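If anyone else is stuck on an older Pandoc, it may also be worth spelling the attributes out with pandoc.Attr instead of the bare { class = ... } table; this is an untested variant on my part, not something verified against 2.6:
function process_images(el)
  el.src = el.src:gsub("^file:images/", image_base)
  el.caption = {
    Span(
      {
        Strong( Quoted( "DoubleQuote", el.caption ) ),
        Str(" by "),
        Emph(image_author)
      },
      -- explicit constructor: identifier, classes (key/value pairs would be a third argument)
      pandoc.Attr("", { "img-caption" })
    )
  }
  return el
end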

Parse multiple lines from a file and replace

I need to read a file where the content is like below:
Computer Location = afp.local/EANG
Description = RED_TXT
Device Name = EANG04W
Domain Name = afp.local
Full Name = Admintech
Hardware Monitoring Type = ASIC2
Last Blocked Application Scan Date = 1420558125
Last Custom Definition Scan Date = 1348087114
Last Hardware Scan Date = 1420533869
Last Policy Sync Date = 1420533623
Last Software Scan Date = 1420533924
Last Update Scan Date = 1420558125
Last Vulnerability Scan Date = 1420558125
LDAP Location = **CN=EANG04W**,OU=EANG,DC=afp,DC=local
Login Name = ADMINTECH
Main Board OEM Name = Dell Inc.
Number of Files = 384091
Primary Owner = **CN= LOUHICHI anoir**,OU=EANG,DC=afp,DC=local
I need to replace CN=$value with CN=Compagny, where $value is whatever is retrieved after CN= and before the comma.
Ok, so you really should have updated your question and not posted the code in a comment, because it's really hard to read. Here's what I think you intended:
$file = 'D:\sources\scripts\2.txt'
$content = Get-Content $file | foreach ($line in $content) {
if ($line.Contains('CN=')) {
$variable = $line.Split(',').Split('=')[2]
$variable1 = $variable -replace $variable, "Compagny"
} Set-Content -path $file
}
That definitely has some syntax errors. The first line is great, you define the path. Then things go wrong... Your call to Get-Content is fine; that will get the contents of the file and send them down the pipe.
You pipe that directly into a ForEach loop, but it's the wrong kind. What you really want there is a ForEach-Object loop (which can be confusing, because it can be shortened to just ForEach when used in a pipeline like this). The ForEach-Object loop does not declare an internal variable (such as ($line in $content)) and instead the scriptblock uses the automatic variable $_. So your loop needs to become something like:
Get-Content $file | ForEach { <do stuff> } | Set-Content
Next let's look inside that loop. You use an If statement to see if the line contains "CN=", understandable, and functional. If it does you then split the line on commas, and then again on equals, selecting the second record. Hm, you create an array of strings anytime you split one, and you have split a string twice, but only specify which record of the array you want to work with for the second split. That could be a problem. Anyway, you assign that substring to $variable, and proceed to replace that whole thing with "company" and store that output to $variable1. So there's a couple issues here. Once you split the string on the commas you have the following array of strings:
"LDAP Location = **CN=EANG04W**"
"OU=EANG"
"DC=afp"
"DC=local"
That's an array with 4 string objects. So then you try to split at least one of those (because you don't specify which one) on the equals sign. You now have an array with 4 array objects, where the first has 3 string objects and the rest have 2:
("LDAP Location", "**CN", "EANG04W**")
("OU", "EANG")
("DC","afp")
("DC","local")
You do specify the third record at this point (arrays in PowerShell start at record 0, so [2] specifies the third record). But you didn't specify which record in the first array, so it's just going to throw errors. Let's say that you actually selected what you really wanted though, and I'm guessing that would be "EANG04W". (By the way, the closest that approach gets is $_.Split(",")[0].Split("=")[2], which still carries the trailing asterisks: "EANG04W**".) You then assign that to $variable, and proceed to replace all of it with "Company", so after PowerShell expands the variable it would look like this:
$variable1 = "EANG04W" -replace "EANG04W", "company"
Ok, you just successfully assigned "company" to a variable. And your If statement ends there. You never output anything from inside your If statement, so Set-Content has nothing to set. Also, it would set that nothing for each and every line that is piped to the ForEach statement, re-writing the file each time, but fortunately for you the script didn't work, so it didn't erase your file. Plus, since you were trying to pipe to Set-Content, there was no output at the end of the pipeline, so you have assigned absolutely nothing to $content.
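To make that walk-through concrete, here is the split chain run against the sample line (the exact calls and indices are my own illustration, not part of the original question):
$line = 'LDAP Location = **CN=EANG04W**,OU=EANG,DC=afp,DC=local'
$line.Split(',')[0]                # LDAP Location = **CN=EANG04W**
$line.Split(',')[0].Split('=')     # "LDAP Location ", " **CN", "EANG04W**"
$line.Split(',')[0].Split('=')[2]  # EANG04W** -- the trailing asterisks come along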
So let's try and fix it, shall we? First line? Works great! No change. Now, we aren't saving anything in a variable, we just want to update a file's content, so there's no need to have $Content = there. We'll just move on then, shall we? We pipe the Get-Content into a ForEach loop, just like you tried to do. Once inside the ForEach loop, we're going to do things a bit differently though. The -replace method performs a RegEx match. We can use that to our advantage here. We will replace the text you are interested in for each line, and if it's not found, no replacement will be made, and pass each line on down the pipeline. That will look something like this for the inside of the ForEach:
$_ -replace "(?<=CN\=).*?(?=,)", "Company"
The breakdown of that RegEx match can be seen here: https://regex101.com/r/gH6hP2/1
But, let's just say that it looks for text that has 'CN=' immediately before it, and goes up to the first comma following it. In your example, that includes the two trailing asterisks, but it doesn't touch the leading ones. Is that what you intended? That would make the last line of your example file:
Primary Owner = **CN=Company,OU=EANG,DC=afp,DC=local
Well, if that is as intended, then we have a winner. Now we close out the ForEach loop, and pipe the output to Set-Content and we're all set! Personally, I would highly suggest outputting to a new file, in case you need to reference the original file for some reason later, so that's what I'm going to do.
$file = 'D:\sources\scripts\2.txt'
$newfile = Join-Path (split-path $file) -ChildPath ('Updated-'+(split-path $file -Leaf))
Get-Content $file | ForEach{$_ -replace "(?<=CN\=).*?(?=,)", "Company"} | Set-Content $newfile
Ok, that's it. That code will produce D:\sources\scripts\Updated-2.txt with the following content:
Computer Location = afp.local/EANG
Description = RED_TXT
Device Name = EANG04W
Domain Name = afp.local
Full Name = Admintech
Hardware Monitoring Type = ASIC2
Last Blocked Application Scan Date = 1420558125
Last Custom Definition Scan Date = 1348087114
Last Hardware Scan Date = 1420533869
Last Policy Sync Date = 1420533623
Last Software Scan Date = 1420533924
Last Update Scan Date = 1420558125
Last Vulnerability Scan Date = 1420558125
LDAP Location = **CN=Company,OU=EANG,DC=afp,DC=local
Login Name = ADMINTECH
Main Board OEM Name = Dell Inc.
Number of Files = 384091
Primary Owner = **CN=Company,OU=EANG,DC=afp,DC=local

How to parse a remote website and create a link on every single word for a dictionary tooltip?

I want to parse a random website, modify the content so that every word is a link (for a dictionary tooltip) and then display the website in an iframe.
I'm not looking for a complete solution, but for a hint or a possible strategy. The linking is my problem, parsing the website and displaying it in an iframe is quite simple. So basically I have a String with all the html content. I'm not even sure if it's better to do it serverside or after the page is loaded with JS.
I'm working with Ruby on Rails, jQuery, jRails.
Note: The content of the href attribute depends on the word.
Clarification:
I tried a regexp and it already kind of works:
@site.gsub!(/[A-Za-z]+(?:['-][A-Za-z]+)?|\d+(?:[,.]\d+)?/) {|word| '<a href="...">' + word + '</a>'}
But the problem is to only replace words in the text and leave the HTML as it is. So I guess it is a regex problem...
Thanks for any ideas.
I don't think a regexp is going to work for this - or, at least, it will always be brittle. A better way is to parse the page using Hpricot or Nokogiri, then go through it and modify the nodes that are plain text.
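As a rough sketch of that approach (the html_string variable, the /define?word= link target and the dict-tooltip class are placeholders of mine, not part of the original answer):
require 'nokogiri'

doc = Nokogiri::HTML(html_string)
doc.search('//text()').each do |node|
  # leave text inside links, scripts and styles alone
  next if %w[a script style].include?(node.parent.name)
  linked = node.text.gsub(/[A-Za-z]+/) do |word|
    %(<a href="/define?word=#{word}" class="dict-tooltip">#{word}</a>)
  end
  node.replace(linked)
end
html_with_links = doc.to_html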
It sounds like you have it mostly planned out already.
Split the content into words and then, for each word, create a link such as <a href="...">whatever</a>.
EDIT (based on your comment):
Ahh ... I recommend you search around for screen scraping techniques. Most of them should start with removing anything between < and > characters, and replacing <br> and <p> with newlines.
I would use Nokogiri to remove the HTML structure before you use the regex.
no_html = Nokogiri::HTML(html_as_string).text
Simple. Hash the HTML, run your regex, then unhash the HTML.
<?php
class ht
{
    static $hashes = array();
    # hashes everything that matches $pattern and saves the matches for later unhashing
    static function hash($text, $pattern) {
        return preg_replace_callback($pattern, array(__CLASS__, 'push'), $text);
    }
    # hashes all html tags and saves them
    static function hash_html($html) {
        return self::hash($html, '`<[^>]+>`');
    }
    # hashes and saves $value, returns the key
    static function push($value) {
        if (is_array($value)) $value = $value[0];
        static $i = 0;
        $key = "\x05" . ++$i . "\x06";
        self::$hashes[$key] = $value;
        return $key;
    }
    # unhashes all saved values found in $text
    static function unhash($text) {
        return str_replace(array_keys(self::$hashes), self::$hashes, $text);
    }
    static function get($key) {
        return self::$hashes[$key];
    }
    static function clear() {
        self::$hashes = array();
    }
}
?>
Example usage:
$hashed = ht::hash_html($your_html);
// your word->href converter here, turning $hashed into $your_formatted_html
$final_html = ht::unhash($your_formatted_html);
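The converter step left as a comment above could look something like this (the /define?word= link target is a placeholder of mine, not from the original answer):
$your_formatted_html = preg_replace_callback('`[A-Za-z]+`', function ($m) {
    // every run of letters becomes a dictionary link; the HTML tags are already hashed away
    return '<a href="/define?word=' . urlencode($m[0]) . '">' . $m[0] . '</a>';
}, $hashed);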
Oh... right, I wrote this in PHP. I guess you'll have to convert it to Ruby or JS, but the idea is the same.
