sed: search for a pattern after a particular line containing a keyword - parsing

input file:
new file
myString
abc
myString
xyz
pattern
file txt
myString
almost the end of file
myString
end of file
The input file contains multiple occurrences of myString throughout, but I need to replace only the first occurrence of myString that comes after a line containing another pattern, pattern.
Also, there is no fixed number of lines between pattern and the first occurrence of myString after it.
desired output:
new file
myString
abc
myString
xyz
pattern
file txt
replacement_pattern
almost the end of file
myString
end of file
I want to do this using sed, if possible.

This might work for you (GNU sed):
sed '/pattern/{:a;N;s/myString/myReplacement/;Ta}' file
Gather up lines after finding one that contains pattern and replace the first match with a replacement string.
Alternative:
sed '/pattern/{:a;n;/myString/!ba;s//myReplacement/}' file
Same idea, but using traditional sed commands rather than GNU-specific ones.
N.B. This solution uses n instead of N. With n, each line that does not contain myString is printed immediately by the implicit print, which may affect speed but uses less memory. With N, lines accumulate in the pattern space and are all printed once a match is found.
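
For comparison, the logic behind both commands can be sketched outside sed. The Python function below is only an illustration: the name replace_first_after and its parameters are hypothetical and not part of the answer. It assumes plain substring matching and starts looking on the line after the one containing pattern. It also stops after a single replacement, whereas the sed commands start again at any later line containing pattern.

```python
def replace_first_after(lines, marker, target, replacement):
    """Replace the first `target` on a line after the first line containing `marker`.

    Hypothetical helper for illustration; uses substring tests, not regexes.
    """
    out = []
    seen_marker = False  # becomes True once a line containing `marker` was read
    done = False         # becomes True once the single replacement was made
    for line in lines:
        if seen_marker and not done and target in line:
            line = line.replace(target, replacement, 1)
            done = True
        elif not seen_marker and marker in line:
            seen_marker = True
        out.append(line)
    return out


sample = ["new file", "myString", "abc", "myString", "xyz", "pattern",
          "file txt", "myString", "almost the end of file", "myString",
          "end of file"]
result = replace_first_after(sample, "pattern", "myString", "replacement_pattern")
print("\n".join(result))
```

Run on the sample input from the question, only the myString that follows pattern is changed; the occurrences before pattern and the last one are left alone.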

The simplest would be just:
/pattern/,/myString/s/myString/replacement_pattern/
For the first, and only the first, myString in the file, I think there should be something simple, but I ended up with:
/pattern/,${ x; /myString/!{ x; //{ h; s//replacement_pattern/}; x}; x}
A bit shorter:
/pattern/,${ /myString/{ x; //!{ g; s//replacement_pattern/; x}; x} }

Related

What Lua pattern behaves like a regex negative lookahead?

My problem is that I need to write Lua code to interpret a text file and match lines with a pattern like
if line_str:match(myPattern) then myAction(arg) end
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world". I found that in regex, what I want is called negative lookahead, and you would write it like
.*hello (?!world).*
but I'm struggling to find the Lua version of this.
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world".
As Wiktor has correctly pointed out, the simplest way to write this would be line:find"hello" and not line:find"hello world" (you can use both find and match here, but find is probably more performant; you can also turn off pattern matching for find).
I found that in regex, what I want is called negative lookahead, and you would write it like .*hello (?!world).*
That's incorrect. If you checked against the existence of such a match, all it would tell you would be that there exists a "hello" which is not followed by a "world". The string hello hello world would match this, despite containing "hello world".
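
This can be checked in any engine that supports lookahead. The sketch below uses Python's re module rather than Lua (Lua patterns have no lookahead); the function name has_hello_without_hello_world is hypothetical and simply mirrors the two find calls suggested above.

```python
import re

# The regex from the question, compiled as-is.
LOOKAHEAD = re.compile(r".*hello (?!world).*")


def has_hello_without_hello_world(line):
    # Plain substring tests, mirroring
    # line:find"hello" and not line:find"hello world"
    return "hello" in line and "hello world" not in line


line = "hello hello world"
lookahead_matches = LOOKAHEAD.search(line) is not None
substring_result = has_hello_without_hello_world(line)
print(lookahead_matches, substring_result)
```

The regex matches because the first "hello " is followed by "hello", not "world", while the two substring tests reject the line as intended.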
Negative lookahead is a questionable feature anyway, as it isn't trivially provided by truly regular expressions and thus may not be implemented in linear time.
If you really need it, look into LPeg; negative lookahead is implemented as pattern1 - pattern2 there.
Finally, the RegEx may be translated to "just Lua" simply by searching for (1) the pattern without the negative part (2) the pattern with the negative part and checking whether there is a match in (1) that is not in (2) simply by counting:
local hello_count = 0; for _ in line:gmatch"hello" do hello_count = hello_count + 1 end
local helloworld_count = 0; for _ in line:gmatch"hello world" do helloworld_count = helloworld_count + 1 end
if hello_count > helloworld_count then
-- there is a "hello" not followed by a "world"
end

How to replace some characters of input file, before it getting lexed in flex?

How can I replace all occurrences of some character or character sequence with some other character or character sequence before flex lexes the input? For example, I want B\65R to match the identifier rule, as it is equivalent to BAR in my grammar. So, essentially, I want to turn a sequence of \dd into its equivalent ASCII character and then lex it (\65 -> A, \66 -> B, …).
I know, I can first search the entire file for a sequence of \dd and replace it with equivalent character and then feed it to flex. But I wonder if there exists a better way. Something like writing a rule that matches \dd and then replacing it with corresponding alternative in the input stream, so that, I don't have to parse entire file twice.
Several options...
You could run something along the lines of
yyin = popen("perl -pe 's/\\\\(\\d\\d)/chr($1)/e'", "r");
yylex()....
With that, flex reads from a filter that substitutes "\dd" by "chr(dd)" (untested).

BBEdit: how to write a replacement pattern when a back reference is immediately followed by a number

I'm new to GREP in BBEdit. I need to find a string inside an XML file. Such string is enclosed in quotes. I need to replace only what's inside the quotes.
The problem is that the replacement string starts with a number thus confuses BBEdit when I put together the replacement pattern. Example:
Original string in XML looks like this:
What I need to replace it with:
01 new file name.png
My grep search and replace patterns:
Using the replacement pattern above, BBEdit wrongly thinks that the first backreference is "\101" when what I really need it to understand is that I mean "\01".
TIA for any help.
Your example is highly artificial because in fact there is no need for your \1 or \3 as you know their value: it is " and you can just type that directly to get the desired result.
"01 new file name.png"
However, just for the sake of completeness, the answer to your actual question (how to write a replacement group number followed by a number) is that you write this:
\0101 new file name.png\3
The reason that works is that there can only be 99 capture groups, so \0101 is parsed as \01 (the first capture group) followed by literal 01.
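
The same ambiguity shows up in other regex engines, each with its own way out. As a rough analogy only, here is how Python's re module (not BBEdit) separates a group reference from a following digit with the \g<1> form. The sample XML line is hypothetical, since the original strings are not shown in the question.

```python
import re

# Hypothetical input; the question's actual XML is not available.
xml = '<image file="old file name.png"/>'

# \g<1> ends the group reference explicitly, so the literal "01" that
# follows cannot be read as part of the group number.
result = re.sub(r'(")([^"]*)(")', r'\g<1>01 new file name.png\g<3>', xml)
print(result)
```

Here groups 1 and 3 are the surrounding quotes, as in the question, and only the text between them is replaced.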

How can I combine words with numbers when pattern matching in Lua?

I'm trying to match any incoming strings that follow the format Word 100.00% ~(45.56, 34.76) in Lua. As such, I'm looking to do a regex close (in theory) to this:
%D%s[%d%.%d]%%(%d.%d, %d.%d)
But I'm having no luck so far. Lua's patterns are weird.
What am I missing?
Your pattern is close, but you neglected to allow for multiple digits. You can do this by adding a + quantifier, as in %d+.
You also did not use [, ( and . correctly in the pattern.
[ in a pattern starts a set of characters to match; for example, [abc] matches a single a, b, or c at that position.
( defines a capture, so that specific values are returned rather than the whole string when there is a match. To match it as a literal character, you need to escape it with a %.
. matches any character rather than specifically a period. Escape it with a % if you want to match a literal . specifically.
local str = "Word 100.00% ~(45.56, 34.76)"
local pattern = "%w+%s%d+%.%d+%%%s~%(%d+%.%d+, %d+%.%d+%)"
print(string.match(str, pattern))
This prints the input string if it matches the pattern; otherwise it prints nil.
Suggested resource: Understanding Lua Patterns

End of line lex

I am writing an interpreter for assembly using lex and yacc. The problem is that I need to parse a word that will strictly be at the end of the file. I've read that there is an anchor, $, which can help. However, it doesn't work as I expected. I wrote this in my lex file:
ABC$ {printf("QWERTY\n");}
The input file is:
ABC
without spaces or any other invisible symbols. So I expect the output to be QWERTY; however, what I get is:
ABC
which I guess means that the program couldn't parse it. Then I thought, that $ might be a regular symbol in lex, so I changed the input file into this:
ABC$
So, if $ isn't a special symbol, then it will be parsed as a normal symbol, and the output will be QWERTY. This doesn't happen, the output is:
ABC$
The question is whether $ in lex is a normal symbol or special one.
In (f)lex, $ matches zero characters followed by a newline character.
That's different from many regex libraries where $ will match at the end of input. So if your file does not have a newline at the end, as your question indicates (assuming you consider newline to be an invisible character), it won't be matched.
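
The difference can be illustrated with a small sketch in Python, whose re module is one of those libraries. The two helper functions are hypothetical: flex_like_dollar only emulates the flex rule "followed by a newline" with a lookahead and is not how flex itself is implemented.

```python
import re


def flex_like_dollar(token, text):
    """Emulate flex's `token$`: the token must be directly followed by a newline."""
    return re.search(re.escape(token) + r"(?=\n)", text) is not None


def python_dollar(token, text):
    """Python's `$`: matches at the end of input or just before a final newline."""
    return re.search(re.escape(token) + r"$", text) is not None


for text in ("ABC", "ABC\n"):
    print(repr(text), flex_like_dollar("ABC", text), python_dollar("ABC", text))
```

For the input without a trailing newline, only the Python-style anchor matches, which corresponds to the behaviour described in the question.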
As @sepp2k suggests in a comment, the pattern also won't be matched if the input file happens to use Windows line endings (which consist of the sequence \r\n), unless the generated flex file was compiled for Windows. So if you created the file on Windows and run the flex-generated scanner in a Unix environment, the \r will also cause the pattern to fail to match. In that case, you can use (f)lex's trailing context operator:
ABC/\r?\n { puts("Matched ABC at the end of a line"); }
See the flex documentation for patterns for a full description of the trailing context operator. (Search for "trailing context" on that page; it's roughly halfway down.) $ is exactly equivalent to /\n.
That still won't match ABC at the very end of the file. Matching strings at the very end of the file is a bit tricky, but it can be done with two patterns if it's ok to recognise the string other than at the end of the file, triggering a different action:
ABC/. { /* Do nothing. This ABC is not at the end of a line or the file */ }
ABC { puts("ABC recognised at the end of a line"); }
That works because the first pattern will match as long as there is some non-newline character following ABC. (. matches any character other than a newline. See the above link for details.) If you also need to work with Windows line endings, you'll need to modify the trailing context in the first pattern.
