How can I remove $ from imported html - google-sheets

I'm using the following code.
=(index(IMPORTHTML(concat("https://coinmarketcap.com/currencies/","bitcoin"),"table",1),1,2))
I would like to multiply the result by a number; however, I can't, since it is treated as text.
Does anyone have a clue how I can remove the $?

use:
=INDEX(REGEXREPLACE(IMPORTHTML(
"https://coinmarketcap.com/currencies/"&"bitcoin",
"table", 1), "[\$,]", )*1, 1, 2)

Extract string values that are enclosed in slashes

An example url that I'm trying to collect the values from has this pattern:
https://int.soccerway.com/matches/2021/08/18/canada/canadian-championship/hfx-wanderers/blainville/3576866/
The searched value always starts at the seventh / and ends at the ninth /:
/canada/canadian-championship/
The method I know uses LEFT + FIND and RIGHT + FIND, but that is very clunky; I believe there is a better method for this.
Another alternative:
="/"&textjoin("/", 1, query(split(A1, "/"), "Select Col7, Col8"))&"/"
Here's another way you can do it:
="/"&regexextract(A1,"(?:.*?/){7}(.*?/.*?/)")
You can use =REGEXEXTRACT() to match part of the string with a regular expression.
For example, if A1 = https://int.soccerway.com/matches/2021/08/18/canada/canadian-championship/hfx-wanderers/blainville/3576866/ ,
then
=REGEXEXTRACT(A1, "\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*(\/[^\/]*\/[^\/]*\/)")
returns
/canada/canadian-championship/
Explanation: \/ is an escaped '/'. [^\/]* matches any non-'/' character zero or more times. \/[^\/]* is repeated six times to skip past the first six slashes. The parentheses capture the part of the string to be returned, so (\/[^\/]*\/[^\/]*\/) matches the part we want. (The escaping is harmless but optional; '/' has no special meaning in RE2.)
A slightly different approach:
=REGEXEXTRACT(SUBSTITUTE(SUBSTITUTE(A1,"/","|",9),"/","|",7),"\|(.*?)\|")
Note that this returns canada/canadian-championship without the surrounding slashes; wrap it as ="/"&REGEXEXTRACT(...)&"/" if you need them.
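What makes this work is SUBSTITUTE's optional fourth argument, which replaces only the Nth occurrence: replacing the 9th slash first leaves the position of the 7th slash unchanged, and the two | markers then fence the wanted segment for REGEXEXTRACT. A tiny illustration of the occurrence argument (made-up input):
=SUBSTITUTE("a/b/c/d", "/", "|", 2) returns a/b|c/d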

Does anyone know how to fix this grep command

For 1, I can get 101 to 191 to print. How do I include 203 and up as well, so that it covers everything from 10 up? For 2, I can get the first set of names starting with an L to print, but not the later ones (and 230). Please don't suggest I use something else like awk or sed; I want to know how to do it the way I am currently trying. How can I expand the ranges I am searching in order to include more? Thanks.
For 1), since the number has to be 10 or more, it needs 2 or more digits, so just use this:
grep 'per[0-9]\{2,\}'
For 2), just do
grep 'per[0-9]*:L'
and, of course, you can combine them with
grep 'per[0-9]\{2,\}:L'
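To see the difference between the patterns, here is a quick check against a made-up idfile.txt (its contents are only a guess at the unseen data):
$ cat idfile.txt
per9:Lana
per101:Lisa
per191:Lars
per203:Mark
$ grep 'per[0-9]\{2,\}' idfile.txt
per101:Lisa
per191:Lars
per203:Mark
$ grep 'per[0-9]\{2,\}:L' idfile.txt
per101:Lisa
per191:Lars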
Try using the * to grep for repeated numbers, like: grep "per[0-9]*:L" idfile.txt
For a more detailed answer, see: Regex - Matching arbitrary amount of numbers

How to join multiple lines in Notepad++?

In Notepad++ I have thousands of lines of data to modify. Some records sit on one line and end with "$", while other records should be on one line but are currently spread across several lines. How can I join those together so that every record ends with "$"?
Here is the data sample:
1.we love it $
2.its beautiful $
3.how
can
it? $
4. yes I love it $
5. sorry
its
ugly
too $
For that sample, lines 1, 2, and 4 are already correct, but the records on lines 3 and 5 are split across multiple lines. How can I join them together? PS: apart from the "$" at the end of each record, the content contains no other "$" characters.
Use regex replace:
find: (?<!\$)[\n\r]+(( ) *)?
replace: $2
The $2 preserves one of the leading spaces (if any) from the joined line.
Given your input, the above produces:
1.we love it $
2.its beautiful $
3.how can it? $
4. yes I love it $
5. sorry its ugly too $
Note that your sample input is "corrupt" in that it has trailing spaces after the $ (e.g., on the first line), so you'll have to clean those up first.
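If you'd rather script the same transformation outside Notepad++, here is a minimal Python sketch applying an equivalent regex (Python 3.5+; the sample text assumes a leading space on each continuation line, which is what the space-preserving capture relies on):
import re

text = ("1.we love it $\n"
        "2.its beautiful $\n"
        "3.how\n"
        " can\n"
        " it? $\n")
# Remove any line break not preceded by "$", keeping at most one
# leading space of the continuation line (captured as group 2).
joined = re.sub(r'(?<!\$)[\n\r]+(( ) *)?', r'\2', text)
print(joined)
# 1.we love it $
# 2.its beautiful $
# 3.how can it? $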

How can I create a bag of words for LaTeX strings?

I have a set of input paragraphs in LaTeX format, and I want to create a bag of words from them.
Given inputs that look like this:
"Some guy did something with \emph{ yikes } $ \epsilon $"
I want to output a dictionary:
{
"Some": 40,
...
"yikes": 10,
"epsilon (or unicode for it)": 3
}
That is, I need a dictionary whose keys are the set of words/symbols/equations (I'll call all of these "words" for brevity) across all paragraphs, mapped to a count of their occurrences across all paragraphs.
From there, given a k-tuple of words, I need a k-array for each paragraph, where the ith element is the count of the ith word of the tuple in that paragraph.
So, say, (Some, dunk, yikes, epsilon) would give me
[1, 0, 1, 1] for the stated example.
I've tried this by using a lexer to get the tokens out and processing the tokens directly. This is difficult and error-prone, not to mention slow. Is there a better strategy or tool that can do this?
There are some corner cases to consider with special characters:
G\""odel => Gödel
for example. I'd like to preserve these.
Also, I'd like to either drop equations altogether or keep each as one word. Equations occur between $ ... $ signs.
If I understand correctly, you are trying to do the following:
Split the sentence into words:
s = "Some guy did something with \emph{ yikes } \epsilon"
words = s.split()
print words
Output:
['Some', 'guy', 'did', 'something', 'with', '\\emph{', 'yikes', '}', '\\epsilon']
Count the number of occurrences:
from collections import Counter
dictionary = Counter(words)
print(dictionary)
Output:
Counter({'Some': 1, 'guy': 1, 'did': 1, 'something': 1, 'with': 1, '\\emph{': 1, 'yikes': 1, '}': 1, '\\epsilon': 1})
Access words and their corresponding numbers as separate lists:
print(list(dictionary.keys()))
print(list(dictionary.values()))
Output:
['Some', 'guy', 'did', 'something', 'with', '\\emph{', 'yikes', '}', '\\epsilon']
[1, 1, 1, 1, 1, 1, 1, 1, 1]
Note that I haven't processed any of the words yet. You might want to strip brackets or backslashes, but this can easily be done by traversing the dictionary (or the lists) with a for loop and handling each entry individually.
Converting LaTeX umlauts to Unicode characters is somewhat of a separate problem; there are several Stack Overflow questions and answers on that topic. Maybe you just need to find/replace them in the initial string:
s = s.replace('\\"o', chr(246))  # chr(246) is 'ö'
(Note that depending on your terminal's encoding you might not see umlauts with print(s), but they are not lost, as print(repr(s)) will show.)
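If several different umlauts occur, a small replacement table keeps this manageable (a sketch; the \"a, \"o, \"u forms are the standard LaTeX umlaut commands, and the variable names are mine):
umlauts = {'\\"a': 'ä', '\\"o': 'ö', '\\"u': 'ü',
           '\\"A': 'Ä', '\\"O': 'Ö', '\\"U': 'Ü'}
for tex, uni in umlauts.items():
    s = s.replace(tex, uni)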
To preserve equations you can extract tokens with a regular expression rather than split():
import re
s = r"Some guy did something with \emph{ yikes } $ \epsilon $"  # the original string, equation included
print(re.findall(r'\$.+?\$|\w+', s))
Output:
['Some', 'guy', 'did', 'something', 'with', 'emph', 'yikes', '$ \\epsilon $']
(The non-greedy .+? keeps two separate equations from being merged into one token.)
Please see my answer to another question for a similar example and a more detailed explanation.
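Pulling the pieces together, a rough end-to-end sketch of the whole task could look like this (the function and variable names are mine, not from any library; note that since equations are kept as single tokens, an equation must appear verbatim in the word tuple, or be normalized first, to be counted):
import re
from collections import Counter

def tokenize(s):
    # Keep $...$ equations as single tokens; otherwise take word runs.
    return re.findall(r'\$.+?\$|\w+', s)

paragraphs = [
    r"Some guy did something with \emph{ yikes } $ \epsilon $",
    r"Another guy did nothing",
]

# Global counts across all paragraphs.
counts = Counter()
for p in paragraphs:
    counts.update(tokenize(p))
print(counts["guy"])  # 2, since "guy" appears in both paragraphs

# Per-paragraph count vector against a fixed k-tuple of words.
vocab = ("Some", "dunk", "yikes", "$ \\epsilon $")

def vectorize(paragraph):
    c = Counter(tokenize(paragraph))
    return [c[w] for w in vocab]

print(vectorize(paragraphs[0]))  # [1, 0, 1, 1]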

GNU-M4: Strip empty lines

How can I strip (surplus) empty lines from an input file using M4?
I know I can append dnl to the end of each line of my script to suppress the newline output, but the blank lines I mean are not in my script; they are in a data file that is included (and I am not supposed to put dnl's in the data file).
I tried something like this:
define(`
',`')
(replacing a newline with nothing)
But it didn't work.
Thanks.
I use divert() around my definitions:
divert(-1) will suppress the output
divert(0) will restore the output
Eg:
divert(-1)dnl output suppressed starting here
define(..)
define(..)
divert(0)dnl normal output starting here
use_my_definitions()...
I understand your problem to be a data file with extra line breaks: where you want the pattern data<NL>moredata, you have things like data<NL><NL>moredata.
Here's a sample to cut/paste onto your command line; it uses here-documents to generate a data set and then runs an m4 script that removes the extra breaks. You can see the patsubst call replaces every run of one or more newlines (<NL><NL>*) with exactly one newline.
cat > data << -----
1, 2
3, 4

5, 6

7, 8


9, 10
11, 12
-----
m4 << "-----"
define(`rmbreaks', `patsubst(`$*', `

*', `
')')dnl
rmbreaks(include(data))dnl
-----
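Pasting both here-documents into a shell should then print the data set with the blank lines collapsed, roughly:
1, 2
3, 4
5, 6
7, 8
9, 10
11, 12
(Expect one quirk: the unquoted commas inside the included file split macro arguments, and $* re-joins arguments with bare commas, so m4 may print 1,2 rather than 1, 2.)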
