Ant delete all lines from text file after a certain keyword - ant

Is it possible, using Ant, to delete all lines in a text file that come after the first occurrence of a specific keyword?
Example
Line1
Line2
Line3
Line4
Line5
.....
Line1000
I want to delete everything in the file that comes after the "Line3" keyword, keeping the line that contains the keyword itself.

Ant's replaceregexp task can handle this pretty easily:
<replaceregexp
file="input.txt"
match="(.*?Line3).*"
replace="\1"
flags="s"
/>
Brief explanation: The regex pattern captures everything up to and including the first "Line3" in a group, then continues to match the rest of the input. The lazy quantifier .*? matters here: a greedy .* would run to the last occurrence of "Line3" in the file (including the one inside "Line30" or "Line300"). The replacement consists of only the captured group, effectively deleting the part you don't want. The s flag is switched on so that the . wildcard also matches newlines.
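The pattern can be tried outside Ant before it is put into a build file. The following is a minimal Python sketch, not part of the Ant answer: the function name truncate_after_keyword is made up for illustration, and it uses a lazy .*? so that the match stops at the first occurrence of the keyword rather than the last.

```python
import re


def truncate_after_keyword(text: str, keyword: str) -> str:
    """Drop everything that follows the first occurrence of keyword."""
    # re.DOTALL plays the role of the "s" flag: "." also matches newlines.
    # The lazy ".*?" stops at the first occurrence of the keyword.
    pattern = re.compile(r"(.*?" + re.escape(keyword) + r").*", re.DOTALL)
    return pattern.sub(r"\1", text, count=1)


sample = "Line1\nLine2\nLine3\nLine4\nLine30\n"
print(truncate_after_keyword(sample, "Line3"))
```

If the keyword does not occur at all, the pattern does not match and the text comes back unchanged.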

Related

Grep lines for multiple words and the line ending, and then replace line ending if matched

I need to grep a long text file for lines that contain any of several possible words and also end in "=1", and then replace each such line with the same text, except with the "=1" changed to "=0".
I'm using BBEdit.
So far I have this to find lines that contain a desired match and also end with 1:
^(.*test|.*disabled|.*inactive|.*server).*(=1)
I'm unable to do the replacement successfully though.
Here are some example lines of text from the file:
OU>2020,OU>Disabled Accounts,DC>net,DC>example,DC>com=1
OU>Distribution Groups,DC>net,DC>example,DC>com=1
OU>Exchange Servers,DC>net,DC>example,DC>com=1
CN>Users,DC>net,DC>example,DC>com=1
OU>Test Servers,OU>Servers,OU>ABC,DC>net,DC>example,DC>com=1
As an example, the first line above would have its =1 changed to =0 like:
OU>2020,OU>Disabled Accounts,DC>net,DC>example,DC>com=0
Other matches would follow that pattern.
After playing around with it more, this seems to work:
Find:
(^.*(test|disable|inactive|server).*)(=1)$
Replace:
\1=0
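For checking the find/replace pair outside BBEdit, here is a small Python sketch of the same idea. It assumes the search is meant to be case-insensitive (the sample lines contain "Disabled" and "Servers", not "disable" and "server"), which is expressed here with re.IGNORECASE; the function name disable_matching is hypothetical.

```python
import re

# Same structure as the Find pattern: keep everything before "=1" in group 1.
PATTERN = re.compile(
    r"^(.*(?:test|disable|inactive|server).*)=1$",
    re.MULTILINE | re.IGNORECASE,
)


def disable_matching(text: str) -> str:
    """Change a trailing =1 to =0 on lines containing one of the words."""
    return PATTERN.sub(r"\1=0", text)


sample = "\n".join([
    "OU>2020,OU>Disabled Accounts,DC>net,DC>example,DC>com=1",
    "OU>Distribution Groups,DC>net,DC>example,DC>com=1",
    "OU>Exchange Servers,DC>net,DC>example,DC>com=1",
    "CN>Users,DC>net,DC>example,DC>com=1",
    "OU>Test Servers,OU>Servers,OU>ABC,DC>net,DC>example,DC>com=1",
])
print(disable_matching(sample))
```

With the sample lines from the question, the first, third and fifth lines end in =0 afterwards and the other two are left alone.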

how to tokenize/parse/search&replace document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and placing a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, "each part" is a sequence of characters (ignoring spaces and punctuation) that have the same font AND font style.
I have tried the standard Search&Replace feature and the AltSearch extension, but neither is able to complete the task. The main problem is that I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudo code for desired effect:
var $delimiter = "|"
Go to beginning of document
While not end of document do:
    var $currLine = get line from doc
    var $currChar = get next character which is not space or punctuation
    var $font = $currChar.font
    var $font_style = $currChar.font_style  (e.g. bold, italic, normal)
    While not end of line do:
        $currChar = next character which is not space or punctuation
        if ($currChar.font != $font || $currChar.font_style != $font_style) {  // font or style has changed
            print $delimiter
            $font = $currChar.font
            $font_style = $currChar.font_style
        }
    end While
end While
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section. For the while loop, oVC.goDown() will return false when the end of the document is reached. (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to deselect followed by oVC.goRight(1, True) to select. Then the selected value is obtained by oVC.getString(). To ignore space and punctuation, perhaps use python's isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Values for attr could simply be CharAutoStyleName and CharStyleName to check for any changes in formatting.
Or grab a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian' etc. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", 0).
This Python code from GitHub shows how to do most of these things, although you'll need to read through it to find the relevant parts.
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.
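Whichever way the characters and their formatting are obtained, the delimiter logic itself is small. The sketch below is plain Python with no LibreOffice calls. It assumes each line has already been turned into a list of (character, font, style) tuples, for example from oVC.getString() and oVC.getPropertyValue(...). That tuple layout, the helper run() and the font names are made up for illustration.

```python
import string


def split_by_format(chars, delimiter="|"):
    """Join runs of identically formatted characters with a delimiter.

    chars is a sequence of (character, font, style) tuples for one line.
    """
    strip_chars = string.whitespace + string.punctuation
    parts = []
    current = None  # (font, style) of the run being collected
    for ch, font, style in chars:
        # Only letters and digits can start a new run; spaces and
        # punctuation are attached to whatever run is currently open.
        if ch.isalnum() and (font, style) != current:
            parts.append("")
            current = (font, style)
        if parts:
            parts[-1] += ch
    # Trim the spaces and ASCII punctuation left at the edges of each run.
    return delimiter.join(p.strip(strip_chars) for p in parts)


def run(text, font, style):
    """Hypothetical helper: tag every character of text with one format."""
    return [(ch, font, style) for ch in text]


line = (
    run("house", "Font1", "bold")
    + run(" ", "Font1", "normal")
    + run("haus", "Font1", "italic")
    + run(", ", "Font1", "normal")
    + run("Haus", "Font2", "bold")
    + run(" ", "Font1", "normal")
    + run("n.", "Font1", "italic")
)
print(split_by_format(line))
```

Because spaces and punctuation never start a new run, a multi-word part in a single format stays together, which matches the "ignore spaces and punctuation" requirement in the question.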

Ant Rename files maintaining directory structure

I want to rename files using Ant maintaining their directory structure.
e.g. Assume following directory structure:
- copy
- new
- testthis.a
Using the code below, I can rename files containing the word "this" to "that.a" with the copy task, but they all end up in the "paste" directory, losing their directory structure.
<copy todir="paste" overwrite="true">
<fileset dir="copy"/>
<regexpmapper from="^(.*)this(.*)\.a$$" to="that.a"/>
</copy>
Output:
- paste
- that.a
If I change the regexpmapper to (notice \1 before that.a):
<regexpmapper from="^(.*)this(.*)\.a$$" to="\1that.a"/>
It generates the correct directory structure but always prepends the text before "this" to "that.a".
Output:
- paste
- new
- testthat.a
Is there any way to rename files maintaining their directory structure without pre-pending or appending any word?
Is there any other mapper which can be used for the same?
Any help would be appreciated.
<copy todir="paste" verbose="true">
<fileset dir="copy" includes="**/*this*.a"/>
<regexpmapper from="((?:[^/]+/)*)[^/]+$$" to="\1that.a" handledirsep="true"/>
</copy>
First, setting handledirsep="true" allows us to use forward slashes to match backslashes as well. This makes the regular expression a bit cleaner.
Next, I'll explain the gnarly regex by breaking it into parts.
I explode ((?:[^/]+/)*) into...
(
(?:
[^/]+
/
)
*
)
What the parts mean:
( -- capture group 1 starts
(?: -- non-capturing group starts
[^/]+ -- greedily match as many non-directory separators as possible
/ -- match a single directory-separator character
) -- non-capturing group ends
* -- repeat the non-capturing group zero-or-more times
) -- capture group 1 ends
The above parts repeatedly match as many subdirectories as possible. The ( and ) put all of the matches into capture group 1. Later, capture group 1 can be used in the to attribute of <regexpmapper> with a \1 backreference.
If there are no / directory separators in a path, then the above parts won't match anything and capture group 1 will be an empty string.
Moving to the end of the regex, the $$ anchors the regex to the end of each path selected by the <fileset>.
In the double dollar-sign expression, $$, the first $ escapes the second $. This is the safe way to write a literal $, because Ant uses $ to introduce property references such as ${name}.
The [^/]+ matches just the filename because it matches all characters at the end of the path that aren't directory separators (/).
Example
Given the following directory structure...
- copy (dir)
- new (dir)
- notthis.b
- testthis.a
- anythis.a
...Ant outputs...
[copy] Copying 2 files to C:\ant\paste
[copy] Copying C:\ant\copy\anythis.a to C:\ant\paste\that.a
[copy] Copying C:\ant\copy\new\testthis.a to C:\ant\paste\new\that.a
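To see what capture group 1 holds for a given path, the mapper's regex can be exercised on its own. This is a rough Python emulation, not Ant itself: the function name map_path is hypothetical, the $$ is written as a single $ because no Ant escaping applies here, and handledirsep="true" is imitated by converting backslashes to forward slashes first.

```python
import re

# Group 1: zero or more "directory/" segments; then the bare filename.
MAPPER = re.compile(r"((?:[^/]+/)*)[^/]+$")


def map_path(path: str):
    """Return the mapped target path, or None if the regex does not match."""
    normalized = path.replace("\\", "/")  # rough stand-in for handledirsep
    match = MAPPER.match(normalized)
    if match is None:
        return None
    # Equivalent of to="\1that.a": keep the directories, replace the filename.
    return match.group(1) + "that.a"


for source in ["new/testthis.a", "anythis.a", "new\\testthis.a"]:
    print(source, "->", map_path(source))
```

For a path without any directory separator, group 1 is the empty string, so the result is just that.a.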
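Try this regexpmapper:
<regexpmapper from="^(.*)/([^/]*)this(.*)\.a$$" to="\1/that\3.a"/>
This keeps the directory path (\1) and drops the filename prefix (\2), so the directory structure is preserved.
\3 holds whatever follows "this" in the filename; the .a extension is matched outside the groups, so it is written out literally in the replacement string. Note that the pattern requires a / in the path, so files directly under the fileset's root directory are not mapped. On Windows, add handledirsep="true" so that / also matches \.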

Need to selectively remove newline characters from a file using unix (solaris)

I am trying to find a way to selectively remove newline characters from a file. I have no trouble removing all of them, but I need some to remain.
Here is the example of the bad input file. Note that rows with Permit ID COO789 & COO012 have newlines embedded in the description field that I need to remove.
"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians
Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race
weekend",,"05/11/2013","05/11/2013"
Here is an example of how I need the file to look:
"Permit Number/Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
NOTE: I simplified the file by removing a few extra columns. The logic should be able to accommodate any number of columns, though. Technically, I expect the "extra" newlines to be found in the Description and Location columns. The actual full header line, with all columns, is:
"Permit Number/Id","Permit Name","Description","Start Date","End Date","Custom Status","Owner Name","Total Expected Attendance","Location"
I have tried sed, cut, tr, nawk, etc. I am open to any solution that can do this and that can be called from within a Unix script.
Thanks!!!
If you must remove newline characters from only within the 'Description' and 'Location' fields, you will need a proper csv parser (think Text::CSV). You could also do this fairly easily using GNU awk, but you won't have access to gawk on Solaris unfortunately. Therefore, the next best solution would be to join lines that don't start with a double-quote to the previous line. You can do this using sed. I've written this with compatibility in mind:
sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file
Results:
"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
sed ':a;N;$!ba;s/ \n/ /g'
Reads the whole file into the pattern space, then removes all newlines which occur directly after a space - assuming that all the errant newlines fit this pattern. If not, when else should newlines be removed?
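The line-joining idea (append any line that does not start with a double quote to the previous line) can also be written out in a few lines of Python, which may be easier to adapt than sed if Python is available on the machine. This is only a sketch with a made-up function name, and it shares the limitation of the sed approach: a continuation line that happens to begin with a double quote would be treated as a new record.

```python
def join_continuation_lines(lines):
    """Merge lines that do not start with a double quote into the previous line."""
    joined = []
    for raw in lines:
        line = raw.rstrip("\n")
        if joined and not line.startswith('"'):
            # Continuation of the previous record: replace the newline
            # (and any spaces before it) with a single space.
            joined[-1] = joined[-1].rstrip(" ") + " " + line
        else:
            joined.append(line)
    return joined


sample = [
    '"Permit Id","Permit Name","Description","Start Date","End Date"\n',
    '"COO123","Music Festival",,"02/12/2013","02/12/2013"\n',
    '"COO789","Basketball Final 8 Championships - Media vs. Politicians\n',
    'Skills Competition",,"02/22/2013","02/22/2013"\n',
    '"COO012","Dragonboat race\n',
    'weekend",,"05/11/2013","05/11/2013"\n',
]
for record in join_continuation_lines(sample):
    print(record)
```

To run it against a real file, pass the open file object instead of the sample list, since iterating over a file yields its lines.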

ANT: Reg regexp for extracting contents between slashes in property regex

I have the following strings as input for scheduler file
Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-hr454\SRISM.xml
Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-lr454\Swap_MUL.xml
Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-lr456\Swap_MU.xml
I need to extract the complete part from v80-WM
i.e. the regex must be able to select the following strings:
v80-WM\scheduler\FRQ\AUTO\sml-hr454\SRISM.xml
v80-WM\scheduler\FRQ\AUTO\sml-lr454\Swap_MUL.xml
v80-WM\scheduler\FRQ\AUTO\sml-lr456\Swap_MU.xml
Currently I am using the following regex, which finds the last occurrence of "Q" in the string and trims from there, and then I use a workaround to construct the results shown above.
<echo message="runpART ... Scheduler File ${schedulerFile}"/>
<propertyregex property="cfg.arg" input="${schedulerFile}" regexp="([^Q]*).xml" select="\1" casesensitive="false"/>
I need help extracting the string from "v80-WM" through ".xml".
Any input would be helpful.
That's good. The v80-WM gives you a fixed "starting point"
Using this as your regular expression should do it.
^.*(v80-WM.*)
What it means:
^.* = match anything until you get to (the caret isn't really necessary, but I like making the regexp more strict)
v80-WM
.* = then match the rest
The parens include the v80-WM name and everything that comes after so you don't have to reconstruct it.
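The expression can be checked against the sample paths before wiring it into the build. The following Python sketch does that; the function name from_marker is hypothetical. In the propertyregex task from the question, the same expression would go into the regexp attribute, with select="\1" picking out the group.

```python
import re

# Everything from the fixed marker "v80-WM" to the end lands in group 1.
PATTERN = re.compile(r"^.*(v80-WM.*)")


def from_marker(path: str):
    """Return the part of path starting at v80-WM, or None if it is absent."""
    match = PATTERN.match(path)
    return match.group(1) if match else None


paths = [
    r"Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-hr454\SRISM.xml",
    r"Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-lr454\Swap_MUL.xml",
    r"Z:\cnt_development\cnt\test\Test-cases-blr\v80-WM\scheduler\FRQ\AUTO\sml-lr456\Swap_MU.xml",
]
for p in paths:
    print(from_marker(p))
```

Because the leading .* is greedy, a path containing v80-WM more than once would be cut at the last occurrence.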

Resources