I'm trying to get Tweets using twarc2 in the terminal like this:
twarc2 search --archive --start-time "2017-10-16” ‘(“#metoo”) -is:retweet lang:en' --limit 1000000 tweets.json
However, after I enter the command above, I get
dquote>
It seems that this appears when you use a single quotation and a double quotation. How can I avoid this?
I think the double quotes are not necessary. Otherwise, you can escape them with a backslash: \"
You don’t have a single quote ending the command (or, you’ve used a smart quote not a straight quote).
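One way to check a command line for unbalanced quotes before pasting it into the terminal is Ruby's standard Shellwords module, which splits a string by Bourne-shell-like rules and raises ArgumentError on an unmatched quote. This is only a rough sketch of what the interactive shell does, and the helper name quotes_balanced? is made up for illustration.

```ruby
require "shellwords"

# Hypothetical helper: true if Shellwords can split the command line,
# false if it hits an unmatched quote (roughly the situation in which
# an interactive shell shows a continuation prompt such as dquote>).
def quotes_balanced?(command)
  Shellwords.split(command)
  true
rescue ArgumentError
  false
end

# Command as pasted in the question: the closing double quote and the
# opening single quote are typographic ("smart") quotes, which the
# parser treats as ordinary characters, so the first " is never closed.
SMART = %q{twarc2 search --archive --start-time "2017-10-16” ‘(“#metoo”) -is:retweet lang:en' --limit 1000000 tweets.json}

# Same command with straight quotes only.
STRAIGHT = %q{twarc2 search --archive --start-time "2017-10-16" '("#metoo") -is:retweet lang:en' --limit 1000000 tweets.json}

puts quotes_balanced?(SMART)     # false
puts quotes_balanced?(STRAIGHT)  # true
p Shellwords.split(STRAIGHT)[5]  # the whole query arrives as one argument
```

With straight quotes, the single-quoted query (including its inner double quotes) is passed to twarc2 as one argument.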
So I'm struggling with regex. I'll start with what I want to achieve and then proceed to what I have "so far".
So for example I have commit name lines
merge(#2137): done something
Merge pull request #420 from Example/branch
feat(): done something [#2137JDN]
merge(#690): feat(): done something [#2137JDN]
And I want to grep only by PR ID, or if it's not there then it'd search by that second hash
#2137
#420
#2137JDN
#690
For now I have this regex, but it's not perfect
/(\(|\s|\[)(#\d+|#.+)(\)|\s|\])/g
because it's capturing this
(#2137)
\s#420\s
[#2137JDN]
(#690)[#2137JDN]
How can I improve it to get exactly what I want?
You can use the #[\dA-Z]+ pattern to grep only hashes.
command | grep -Po "#[\dA-Z]+"
This returns all matched strings (in our case, hashes):
#2137
#420
#2137JDN
#690
#2137JDN
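Note that this prints every hash, including the second one on the last line. Also, grep's default POSIX syntax (BRE, or ERE with -E) does not support non-greedy quantifiers; they are only available in Perl-compatible mode (-P). See this answer.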
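If only the first hash on each line is wanted (the PR ID when present, otherwise the later hash), taking the first match per line is enough. Here is a minimal sketch in Ruby rather than grep, assuming the commit lines are available as an array of strings; the names COMMITS, HASH and first_hash are made up for illustration.

```ruby
# Commit lines copied from the question.
COMMITS = [
  "merge(#2137): done something",
  "Merge pull request #420 from Example/branch",
  "feat(): done something [#2137JDN]",
  "merge(#690): feat(): done something [#2137JDN]",
].freeze

# Same pattern as in the grep answer: a # followed by digits or capitals.
HASH = /#[\dA-Z]+/

# String#[] with a regex returns the first match (or nil), so each line
# contributes at most one hash.
def first_hash(line)
  line[HASH]
end

puts COMMITS.map { |line| first_hash(line) }
```

For the four sample lines this prints #2137, #420, #2137JDN and #690, one per line.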
I'm trying to create Cypher statements for import using neo4j-shell. I'm using version 2, M3, and I'm somewhat in the dark as to which characters I should escape in the properties. Here's an example:
MATCH (artist:Artist) WHERE artist.kunstnernavn = 'Ditlev Blunck'
CREATE (artwork:Artwork {titel:'Christian IV's vision på slottet Rothenburg',inventarnummer:'KMS64',datering:'-4622274825',teknik:'Olie på lærred',optagelse:'\\foto-02\globus\globus\GLOBUS 2011\kms64.jpg '})
CREATE (artist)-[:CREATED_ARTWORK]->(artwork);
I have tried to escape "\" by %5C but then I get an error on globus%5C .. apparently s% is a special character in that context. The same goes for titles with " -h" .. apparently interpreted as an option.
Where can I find docs specifying this?
thanx,
Thorbjørn
Try using the back-tick character (`) to quote your strings.
The neo4j syntax docs gloss over this feature, but usage examples can be found across the site.
text.scan(/\"[\d\w\s\+\-\*\/]*\"/)
I'm simply looking to find anything within quotation marks that can contain letters, numbers, spaces, plus, minus, star, or forward slash. Everything works great in the console. Each of the following works in a browser:
"abc"
"123"
"x-1" or "x - 1"
"x/1" or "x / 1"
But the plus sign and star fail in a browser (despite working fine in console with the same regex). Does anyone have any ideas?
Edit #1: I'm performing a quick gsub to add some formatting to the results of the scan. If the quotations have a plus or star in them, they don't even get picked up by the scan. The same code and text pasted in console works just fine.
Edit #2: I figured out a better way to frame this question without extraneous details and got the answer. "Why can't I perform a gsub on each of the results from a scan if the result contains regex special characters?"
Turned out that this problem was related to regexp string insertion (/#{whatever}/) not escaping special characters - manually escaping clears it up (/#{Regexp.escape(whatever)}/). See this question for a full example/explanation.
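A small sketch of that difference, assuming the gsub simply wraps each scanned match in tags (the actual formatting code is not shown in the question; highlight, QUOTED and TEXT are names made up here):

```ruby
# Pattern from the question, slightly simplified (\w already covers digits).
QUOTED = /"[\w\s+\-*\/]*"/

# Wrap every quoted expression found by scan in <b> tags. The escape flag
# switches between plain interpolation and the Regexp.escape fix.
def highlight(text, escape:)
  text.scan(QUOTED).reduce(text) do |result, quoted|
    pattern = escape ? /#{Regexp.escape(quoted)}/ : /#{quoted}/
    result.gsub(pattern) { "<b>#{quoted}</b>" }
  end
end

TEXT = 'Compute "x + 1" and "a*b" and "x - 1".'

puts highlight(TEXT, escape: false)  # only "x - 1" gets wrapped
puts highlight(TEXT, escape: true)   # all three get wrapped
```

Without escaping, the + and * inside the interpolated string act as quantifiers, so the resulting pattern no longer matches the literal text it came from.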
I don't know what you mean by "work in browser", but I'm assuming that you're trying to parse a URL. In a URL, the + and * signs can be encoded as %2B and %2A respectively.
Try this regexp:
/"(?:[\w\s+\-*\/]|%2B|%2A)+"/
...or decode URL before parsing.
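The decode-first option could look like the sketch below. It assumes the input really is percent-encoded form data, which the question does not confirm; quoted_expressions and QUOTED_EXPR are names made up for illustration. URI.decode_www_form_component is part of Ruby's standard library.

```ruby
require "uri"

# Pattern from the question, slightly simplified (\w already covers digits).
QUOTED_EXPR = /"[\w\s+\-*\/]*"/

# Decode percent-escapes first, then scan with the unchanged pattern.
# Note that decode_www_form_component also turns a literal + into a
# space, so an encoded plus sign must arrive as %2B.
def quoted_expressions(encoded)
  URI.decode_www_form_component(encoded).scan(QUOTED_EXPR)
end

p quoted_expressions("%22x%20%2B%201%22%20and%20%22a%2Ab%22")
```

After decoding, the input reads "x + 1" and "a*b", and both quoted expressions are found.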
I have a file where I want to grep for lines that start with either -rwx or drwx AND end in any number.
I've got this, but it isn't quite right. Any ideas?
grep [^.rwx]*[0-9] usrLog.txt
The tricky part is a regex that includes a dash as one of the valid characters in a character class. The dash has to come immediately after the start for a (normal) character class and immediately after the caret for a negated character class. If you need a close square bracket too, then you need the close square bracket followed by the dash. Mercifully, you only need dash, hence the notation chosen.
grep '^[-d]rwx.*[0-9]$' "$@"
See: Regular Expressions and grep for POSIX-standard details.
It looks like you were on the right track. The ^ character matches beginning-of-line, and $ matches end-of-line. Jonathan's pattern will work for you; I just wanted to give you the explanation behind it.
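To see the anchors and the leading dash in the character class at work, here is the same pattern tried in Ruby on a few made-up lines in ls -l style (the sample lines are invented for illustration, not taken from usrLog.txt):

```ruby
# Same pattern as in the grep answer: line starts with - or d, then rwx,
# and ends in a digit. The dash comes first in the class, so it is literal.
PERMS = /^[-d]rwx.*[0-9]$/

# Invented sample lines.
SAMPLE = [
  "-rwxr-xr-x 1 user staff  120 file1",
  "drwxr-xr-x 2 user staff 4096 dir7",
  "-rw-r--r-- 1 user staff   10 notes3",  # starts with -rw-, not -rwx
  "drwx------ 2 user staff 4096 logs",    # does not end in a digit
  "lrwxrwxrwx 1 user staff    9 link2",   # starts with l
].freeze

# Enumerable#grep keeps the elements that match the pattern.
def matching(lines)
  lines.grep(PERMS)
end

puts matching(SAMPLE)
```

Only the first two sample lines are printed.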
It should be noted that the caret (^) behaves differently inside brackets than outside them. Outside brackets it anchors the match to the start of the line; as the first character inside brackets it negates the class, so [^.rwx] matches any single character that is NOT a period, r, w, or x. You also want to place a period before the asterisk between your two bracket expressions, because in grep the period is the "wildcard" that matches any single character, and .* matches any run of characters.
grep '^.rwx.*[0-9]$' usrLog.txt
This should work for you. Some posters used a character class in their expressions, which is an effective method as well, but I am trying to stay as close to your original expression as possible and to explain every minor change along the way so that it is better understood. How can we learn otherwise?
You probably want egrep. Try:
egrep '^[d-]rwx.*[0-9]$' usrLog.txt
Are you parsing the output of ls -l?
If you are, and you just want to get the file names:
find . -iname "*[0-9]"
If you have no choice because usrLog.txt is created by something/someone else and you absolutely must use this file, other options include
awk '/^[-d].*[0-9]$/' file
Ruby(1.9+)
ruby -ne 'print if /^[-d].*[0-9]$/' file
Bash
while read -r line ; do case $line in [-d]*[0-9] ) echo "$line";; esac; done < file
Many answers have been provided for this question. I just wanted to add one more, which uses bashisms:
#! /bin/bash
while read -r || [[ -n "$REPLY" ]]; do
[[ "$REPLY" =~ ^(-rwx|drwx).*[[:digit:]]+$ ]] && echo "Got one -> $REPLY"
done <"$1"
@kurumi's answer for bash, which uses case, is also correct, but it will not read the last line of the file if there is no newline at the end (for example, if you save the file without pressing Enter/Return after the last line).
Could anybody help me make a proper regular expression to extract something from a bunch of text in Ruby? I tried a lot but I don't know how to handle variable-length titles.
The string will be of format <sometext>title:"<actual_title>"<sometext>. I want to extract actual_title from this string.
I tried /title:"."/ but it doesn't find any matches, as it expects the closing quotation mark exactly one character after the opening one. I couldn't figure out how to make it match a string of variable length. Any help is appreciated. Thanks.
. matches any single character. Putting + after a character will match one or more of those characters. So .+ will match one or more characters of any sort. Also, you should put a question mark after it so that it matches the first closing-quotation mark it comes across. So:
/title:"(.+?)"/
The parentheses are necessary if you want to extract the title text that it matched out of there.
/title:"([^"]*)"/
The parentheses create a capturing group. Inside is first a character class. The ^ means it's negated, so it matches any character that's not a ". The * means 0 or more. You can change it to one or more by using + instead of *.
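A short comparison of the two patterns above, plus a greedy variant to show what the question mark prevents. The sample strings and the names title, LAZY, NEGATED, GREEDY, LINE and EMPTY are made up for illustration.

```ruby
LAZY    = /title:"(.+?)"/    # one or more characters, as few as possible
NEGATED = /title:"([^"]*)"/  # zero or more characters that are not a "
GREEDY  = /title:"(.+)"/     # for contrast: as many characters as possible

# String#[] with a regex and a group index returns that capture (or nil).
def title(str, pattern)
  str[pattern, 1]
end

LINE  = 'id:7 title:"Moby Dick" author:"Melville"'
EMPTY = 'id:8 title:"" author:"Anon"'

p title(LINE, LAZY)      # "Moby Dick"
p title(LINE, NEGATED)   # "Moby Dick"
p title(LINE, GREEDY)    # runs on to the last quote on the line
p title(EMPTY, NEGATED)  # ""
p title(EMPTY, LAZY)     # not empty: .+? needs at least one character
```

The two patterns agree on ordinary input; they differ on an empty title, where only the negated class with * returns an empty string.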
I like /title:"(.+?)"/ because of its use of lazy matching, which stops the .+ from consuming all text up to the last " on the line.
It won't work if the string wraps lines or includes escaped quotes.
In programming languages where you want to be able to include the string delimiter inside a string, you usually provide an 'escape' character or sequence.
If your escape character was \ then you could write something like this...
/title:"((?:\\"|[^"])+)"/
This is a railroad diagram. Railroad diagrams show you the order in which things are parsed: imagine you are a train starting at the left. You consume title:" and then \" if you can; if you can't, you consume any character that is not a ". The > means this path is preferred, so you try to loop; if you can't, you have to consume a " to finish.
I made this with https://regexper.com/#%2Ftitle%3A%22((%3F%3A%5C%5C%22%7C%5B%5E%22%5D)%2B)%22%2F
but there is now a plugin for Atom text editor too that does this.
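To try the escaped-quote pattern from this answer in Ruby, here is a minimal sketch, assuming the backslash is the escape character as described above. The sample string and the names ESCAPED, LAZY_TITLE and SAMPLE_TEXT are made up for illustration.

```ruby
ESCAPED    = /title:"((?:\\"|[^"])+)"/  # prefers \" over ending the string
LAZY_TITLE = /title:"(.+?)"/            # stops at the first ", escaped or not

# Single-quoted Ruby string, so each \" below is a literal backslash
# followed by a double quote.
SAMPLE_TEXT = 'x title:"say \"hi\" now" y'

puts SAMPLE_TEXT[ESCAPED, 1]     # say \"hi\" now
puts SAMPLE_TEXT[LAZY_TITLE, 1]  # say \
```

The captured title still contains the backslashes; removing them would be a separate step.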