Removing multiple specific lines from a text file with Batch - parsing

I am trying to remove multiple lines from a text file that have been parsed out of a PDF.
What the file looks like:
word1
word2
word3
b
word4
word5
b
word6
B
b
word7
word8
word9
b
Now the results I am looking for:
word1
word2
word3
word4
word5
word6
B (this is a user's initial and should remain)
word7
word8
word9
Issues:
I can't get the batch script to be case sensitive, and when I do somewhat get it working, it removes the b's from inside the words as well.
I keep running into issues trying to achieve this in batch. I have no example script because I did not make any progress. Does someone have a way to do this properly?
If possible, I would like it to work 100% in batch with no dependencies, please.

findstr's regular expression support will help you here. To exclude all standalone lowercase b's you can do:
(findstr /V /RC:"\<b\>" filename.txt)>output.txt
Or to find only the uppercase standalone B's and no other text:
(findstr /RC:"\<B\>" filename.txt)>output.txt
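Both commands rely on findstr being case sensitive by default (add /I to turn that off): /V prints the lines that do not match, /R treats the search string as a regular expression, and \< \> anchor the match to word boundaries. That is why a standalone lowercase b is removed while b's inside longer words and the standalone uppercase B survive.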

How do I get a list of strings that are between 2 strings in Lua?

I'm trying to write an HTML parser in Lua. A major roadblock I hit almost immediately was that I have no idea how to get a list of strings between 2 strings. This is important for parsing HTML, where a tag is delimited by 2 characters (or strings of length 1), namely '<' and '>'. I am aware of this answer, but it only gets the first occurrence, not all instances of a string between the 2 given strings.
What I mean by "list of strings between 2 strings" is something like this:
someFunc("<a b c> <c b a> a </c b a> </a b c>", "<", ">")
Returns:
{"a b c", "c b a", "/c b a", "/a b c"}
This does not have to parse newlines nor text in between tags, as both of those can be handled using extra logic. However, I would prefer it if it did parse newlines, so I can run the code once for the whole string returned by the first GET request.
Note: This is an experimental project to see if this is possible in the very limited Lua environment provided by the CC: Tweaked mod for Minecraft. Documentation here: https://tweaked.cc/
You can simply do:
local list = {}
local html = [[bla]] -- the HTML string to scan
for innerTag in html:gmatch("<(.-)>") do
    list[#list + 1] = innerTag -- append each captured tag body
end
But be aware that this is fragile, as it doesn't validate anything: someone can put a < or > inside a string, etc.
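If you want the exact someFunc shape from the question, here's a minimal sketch built on the same gmatch idea (note that < and > are not Lua pattern magic characters; other markers would need escaping):
-- Collect every substring that appears between the open and close markers.
local function someFunc(s, open, close)
    local results = {}
    -- "(.-)" is a lazy capture: each match stops at the first close marker.
    -- "." also matches newlines in Lua patterns, so multi-line input works.
    for inner in s:gmatch(open .. "(.-)" .. close) do
        results[#results + 1] = inner
    end
    return results
end

local tags = someFunc("<a b c> <c b a> a </c b a> </a b c>", "<", ">")
-- tags is {"a b c", "c b a", "/c b a", "/a b c"}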

Extracting the number at the end of a word in a cell/column

Please allow me to ask a question about a formula for extracting the number that follows a word in a cell.
Example:
  |      A      | B
--+-------------+----
1 | thumbnail20 | 20
2 | gallery13   | 13
3 | girl45      | 45
I really appreciate all the answers; sorry for the duplicate question.
Thanks to @ziganotschka and @BHAWANI SINGH, both solutions work. Case closed :)
There are several options depending on your data structure, e.g.
=VALUE(REGEXREPLACE(A1,"[^[:digit:]]", ""))
will extract all digits from the A column to the B column
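For the sample data, =VALUE(REGEXREPLACE("thumbnail20", "[^[:digit:]]", "")) strips every non-digit character, leaving the text "20", which VALUE then converts to the number 20.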
Should you have several numbers within your string,
=SPLIT(lower(A4),"qwertyuiopasdfghjklzxcvbnm`-=[]\;',./!@#$%^&*()")
will extract the first number into column B, the second into column C etc.
If you want to extract only the digits to the right, then
=arrayformula(RIGHT(A1,LEN(A1)+1-min(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))))
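Here SEARCH returns the position of the first occurrence of each digit (appending "0123456789" to A1 guarantees every digit is found somewhere), MIN picks the earliest of those positions, and RIGHT keeps everything from that first digit onward. For "thumbnail20" the first digit sits at position 10 and LEN(A1) is 11, so RIGHT(A1, 11+1-10) returns "20".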

Merging >2 files with AWK or JOIN?

Merging 2 files using AWK is a well covered topic on StackOverflow. However, the technique of reading 3 files into an array gets more complicated. As I'm formatting the output to go into an R script, I'm going to need to add lots of syntax so I don't think I can use JOIN. Here is a simplistic version I have working so far:
awk 'FNR==1{f++}
f==1{a[FNR]=$1;next}
f==2{b[FNR]=$1;next}
{print a[FNR], "<- c(", b[FNR], ",", $1, ")"}' words.txt x.txt y.txt
Where:
$ cat words.txt
word1
word2
word3
$ cat x.txt
1
2
3
$ cat y.txt
11
22
33
The output is then
word1 <- c(1, 11)
word2 <- c(2, 22)
word3 <- c(3, 33)
The best way I can summarize this technique is
Create a variable f to keep track of which file you're processing
For file 1 read the values into array a
For file 2 read the values into array b
Fall through to file three, where you concatenate your final output
As a beginner to AWK, this works, but I find it a bit awkward and I worry that, coming back to the code in 6 months, I'll no longer understand it. Is this the best way to merge these 3 files in AWK? Could JOIN actually handle this level of formatting for the final output?
a variation of @RavinderSingh13's solution
$ paste {words,x,y}.txt | awk '{print $1, "<- c(" $2 ", " $3 ")"}'
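As for the JOIN part of the question: join merges on a common key field, which these files lack, but you can manufacture one from line numbers. A sketch, assuming bash (for process substitution) and the standard nl utility; note join expects its inputs sorted on the key, which plain line numbers only satisfy up to 9 lines (pad the numbers or sort for longer files):
join <(nl -w1 words.txt) <(nl -w1 x.txt) | join - <(nl -w1 y.txt) | awk '{print $2, "<- c(" $3 ", " $4 ")"}'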
EDIT: Could you please try the following.
paste words.txt x.txt y.txt | awk '{$2="<- c("$2", "$3")";$3="";sub(/ +$/,"")} 1'
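Here paste glues the three files together line by line, and awk then rewrites field 2 to carry the formatted vector, blanks out field 3, and trims the trailing space that this leaves behind.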
Output will be as follows.
word1 <- c(1, 11)
word2 <- c(2, 22)
word3 <- c(3, 33)
In case you simply want to combine the 3 files' contents column-wise, then try the following.
paste words.txt x.txt y.txt
word1 1 11
word2 2 22
word3 3 33
If it's for readability, you can change the file checking method, as well as the variable names.
Try these please:
awk 'ARGIND==1{words[FNR]=$1;}
ARGIND==2{xcol[FNR]=$1;}
ARGIND==3{print words[FNR], "<- c(", xcol[FNR], ",", $1, ")"}' words.txt x.txt y.txt
Above file checking method is for GNU awk.
Changing to another method, as well as changing the file reading order, would be:
awk 'FILENAME=="words.txt"{print $1, "<- c(", xcol[FNR], ",", ycol[FNR], ")";}
FILENAME=="x.txt"{xcol[FNR]=$1;}
FILENAME=="y.txt"{ycol[FNR]=$1;}' x.txt y.txt words.txt
As you can also see here, the file reading order and the block order can differ.
Since words.txt provides the first (or main) column, so to speak, it's sensible to read it last.
You can also use FILENAME==ARGV[1], FILENAME==ARGV[2], etc. to check files, and put comments inside (using an awk script file loaded with awk -f scriptfile is better when you have comments):
awk 'FILENAME==ARGV[1]{xcol[FNR]=$1;} #Read column B, x column
FILENAME==ARGV[2]{ycol[FNR]=$1;} # Read column C, y column
FILENAME==ARGV[3]{print $1, "<- c(", xcol[FNR], ",", ycol[FNR], ")";}' x.txt y.txt words.txt
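For example, saved to a file (merge.awk is a hypothetical name), the commented script runs as:
awk -f merge.awk x.txt y.txt words.txt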

Can I duplicate rows with kiba using a transform?

I'm currently using your gem to transform a CSV that was web-scraped from a personnel database that has no API.
From the scraping I ended up with a CSV. I can process it fine using your gem; there's only one bit I'm wondering about.
Consider the following data:
====================================
| name | article_1 | article_2 |
------------------------------------
| Andy | foo | bar |
====================================
I can turn this into this:
======================
| name | article |
----------------------
| Andy | foo |
----------------------
| Andy | bar |
======================
(I used this tutorial to do this: http://thibautbarrere.com/2015/06/25/how-to-explode-multivalued-attributes-with-kiba/)
I'm using the normalize logic in my loader for this. The code looks like:
source RowNormalizer, NormalizeArticles, CsvSource, 'RP00119.csv'
transform AddColumnEntiteit, :entiteit, "ocmw"
What I am wondering is: can I achieve the same using a transform? So that the code would look like this:
source CsvSource, 'RP00119.csv'
transform NormalizeArticles
transform AddColumnEntiteit, :entiteit, "ocmw"
So the question is: can I duplicate a row with a transform class?
EDIT: Kiba 2 supports exactly what you need. Check out the release notes.
In Kiba as currently released, a transform cannot yet yield more than one row - it's either one or zero.
The Kiba Pro offering I'm building includes a multithreaded runner which happens (as a side-effect rather than as its actual goal) to allow transforms to yield an arbitrary number of rows, which is what you are after.
But that said, without Kiba Pro, here are a number of techniques which could help.
The first possibility is to split your ETL script into 2. Essentially you would cut it at the step where you want to normalize the articles, and put a destination there instead. Then in your second ETL script, you would use a source able to explode the row into many. I think this is what I'd recommend in your case.
If you do that, you can use either a simple Rake task to invoke the ETL scripts as a sequence, or alternatively use post_process to invoke the next one if you prefer (I prefer the first approach because it makes it easier to run either one or the other).
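For that split, the row-exploding source is only a few lines. A minimal sketch (class, file, and column names are illustrative, based on the example data):
require 'csv'

# Hypothetical source for the second ETL script: reads the intermediate CSV
# written by the first script and yields one row per article column.
class ExplodingCsvSource
  def initialize(filename)
    @filename = filename
  end

  def each
    CSV.foreach(@filename, headers: true, header_converters: :symbol) do |row|
      data = row.to_h
      [:article_1, :article_2].each do |key|
        next if data[key].nil?
        yield({ name: data[:name], article: data[key] })
      end
    end
  end
end

# In the second ETL script:
# source ExplodingCsvSource, 'intermediate.csv'
# transform AddColumnEntiteit, :entiteit, "ocmw"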
Another approach (but too complicated for your current scenario) would be to declare the same source N times, but only yield a given subset of data, e.g.:
field_count = number_of_exploded_columns # extract from CSV? must be known when the script is evaluated
(0...field_count).each do |shard|
  source MySource, shard: shard, shard_count: field_count
end
then inside MySource you would only conditionally yield, like this:
yield row if row_index % field_count == shard
Those are the 2 patterns I would think of!
I would definitely recommend the first one to get started with, though, as it's easier.

Can you iterate in LaTeX?

I'm new to LaTeX and I must say that I am really struggling with it. I discovered the \newcommand command that is kind of like a function/method in regular programming languages. You can give it arguments and everything.
I was wondering though, can I somehow iterate in LaTeX? Basically, what I would like to do is create a table with N+1 columns where the first row just contains a blank cell and then the numbers 1, 2, ..., N in the other columns. I only want to give N as an argument to this 'function' (newcommand).
Here is an example of something that might look like what I'm looking for (although obviously this won't work):
\newcommand{\mytable}[2]{
\begin{tabular}{l|*{#1}{c|}} % table with first argument+1 columns
for(int i = 1; i <= #1; i++) "& i" % 'output' numbers in different columns
\\\hline
letters & #2 % second argument should contain actual content for row
\\\hline
\end{tabular}
}
Call it with:
\mytable{3}{a & b & c}
Output should be:
| 1 | 2 | 3 |
--------+---+---+---+
letters | a | b | c |
--------+---+---+---+
Does anyone know if something like this is possible?
Thanks!
Just make the following into a new command, and be sure to use the ifthen package.
\begin{tabular}{l|*{9}{c|}} % one label column plus 9 numbered columns
\newcounter{count}
\whiledo{\value{count}<10}{%
\ifthenelse{\value{count}=0}{}{\the\value{count}}% empty first cell, then 1..9
\ifthenelse{\value{count}<9}{&}{\\}% column separator, or end the row after the last cell
\stepcounter{count}%
}
letters&a&b&c&d&e&f&g&h&i\\
\end{tabular}
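If you'd rather have it as the \mytable command from the question, here is a minimal sketch along the same ifthen lines (the counter is declared once, outside the command, since \newcounter must not run twice):
\newcounter{col}
\newcommand{\mytable}[2]{%
\begin{tabular}{l|*{#1}{c|}}% label column plus #1 numbered columns
\setcounter{col}{0}%
\whiledo{\value{col}<#1}{%
\stepcounter{col}%
&\the\value{col}% empty first cell, then the numbers 1..N
}\\\hline
letters & #2\\\hline
\end{tabular}}
Called as \mytable{3}{a & b & c}, this produces the table from the question.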
Auntie Google says yes.
You can use the \loop or \repeat tokens. Or the multido package.
Sure it's possible. You can also use recursion. eplain has iteration macros in it; see, e.g., here.
Another possibility (if you're lazy like me) is perltex.
