Parsing a .txt file using Lua. The most basic of parsing - parsing

I am currently trying to parse a file. It looks like this:
A|00CA|GOLDSTONE GTS|35350525|-116888367|3038
R|04|37|6000|0|0|0|35349333|-116893334|3038|300|50
R|22|217|6000|0|0|0|35360333|-116877500|3038|300|50
A|00WI|NORTHERN LITE|44304283|-89050111|860
R|09|90|1000|0|0|0|44304217|-89052022|860|300|50
R|27|270|1000|0|0|0|44304350|-89048208|860|300|50
A|01ID|LAVA HOT SPRINGS|42608250|-112032461|5268
R|14|143|2894|0|0|0|42611000|-112034867|5268|300|50
R|32|323|2894|0|0|0|42603733|-112030533|5268|300|50
A|01LS|COUNTRY BREEZE|30722639|-91077361|125
R|09|91|1800|0|0|0|30722747|-91080222|125|300|50
R|27|271|1800|0|0|0|30722531|-91074500|125|300|50
A|01MT|CRYSTAL LAKES RESORT|48789131|-114880436|3141
R|13|131|5000|0|0|0|48794975|-114885842|3141|300|50
R|31|311|5000|0|0|0|48783292|-114875003|3141|300|50
The real file is longer, but you get the picture.
Say I want to pull a whole line out of this using only a four-character code.
So when the user types in 00CA, it should pull the following line and break it up into the values between the "|" characters:
A|00CA|GOLDSTONE GTS|35350525|-116888367|3038
I have been given code that looks like this:
file = assert(io.open("Airports.txt", "r"))
for line in file:lines() do
    fields = { line:match "(%w+)|(%w+)|([%w ]+)|([%d-]+)|([%d-]+)|([%d-]+)" }
    print(fields[4], fields[5]) -- the 2 numeric fields you're interested in
end
file:close()
Of this whole line:
A|00CA|GOLDSTONE GTS|35350525|-116888367|3038
I would only be interested in getting these pieces of data: 35350525 and -116888367
However, when I try the following, or anything like it, I just get a nil value.
-- ICAO == "00CA"
fields = { line:match "(%w+)|" .. ICAO .. "|([%w ]+)|([%d-]+)|([%d-]+)|([%d-]+)" }
And obviously I need to put some custom data (the ICAO code) in there, as many of the lines follow that pattern.
What am I doing wrong?

Add parentheses around the call:
line:match("(%w+)|" .. ICAO .. "|([%w ]+)|([%d-]+)|([%d-]+)|([%d-]+)")
Without the parentheses, the string-literal call syntax binds tighter than the .. operator, so the original code is parsed as
(line:match "(%w+)|") .. ICAO .. "|([%w ]+)|([%d-]+)|([%d-]+)|([%d-]+)"
Here is the complete code that I tested:
line="A|00CA|GOLDSTONE GTS|35350525|-116888367|3038"
ICAO = "00CA"
print(line:match("(%w+)|" .. ICAO .. "|([%w ]+)|([%d-]+)|([%d-]+)|([%d-]+)"))
The output is
A GOLDSTONE GTS 35350525 -116888367 3038
For this task, I'd use a simpler pattern: "(.-)|" .. ICAO .. "|(.-)|(.-)|(.-)|(.-)$"
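For comparison, the same lookup can be sketched in Python by splitting each record on "|" instead of matching a pattern. This is only an illustration under assumptions: the function name find_airport and the inline sample list are hypothetical, and the field positions follow the A-record layout shown in the question.

```python
def find_airport(lines, icao):
    """Return (latitude, longitude) strings for the airport record with this code, or None."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Airport records start with "A" and carry the code in the second field.
        if len(fields) >= 6 and fields[0] == "A" and fields[1] == icao:
            return fields[3], fields[4]
    return None


# Hypothetical sample data copied from the question's file excerpt.
sample = [
    "A|00CA|GOLDSTONE GTS|35350525|-116888367|3038",
    "R|04|37|6000|0|0|0|35349333|-116893334|3038|300|50",
    "A|00WI|NORTHERN LITE|44304283|-89050111|860",
]
print(find_airport(sample, "00CA"))
```

Splitting avoids building a pattern from user input, so a code containing pattern-special characters cannot change how the line is matched.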


How to detect if a field contains a character in Lua

I'm trying to modify an existing lua script that cleans up subtitle data in Aegisub.
I want to add the ability to delete lines that contain the symbol "♪"
Here is the code I want to modify:
-- delete commented or empty lines
function noemptycom(subs,sel)
    progress("Deleting commented/empty lines")
    noecom_sel={}
    for s=#sel,1,-1 do
        line=subs[sel[s]]
        if line.comment or line.text=="" then
            for z,i in ipairs(noecom_sel) do noecom_sel[z]=i-1 end
            subs.delete(sel[s])
        else
            table.insert(noecom_sel,sel[s])
        end
    end
    return noecom_sel
end
I really have no idea what I'm doing here, but I know a little SQL, and Lua apparently uses the in keyword as well, so I tried changing the if line to this:
if line.text in (♪) then
Needless to say, it didn't work. Is there a simple way to do this in Lua? I've seen some threads about the string.match() and string.find() functions, but I wouldn't know where to start putting that code together. What's the easiest way for someone with zero knowledge of Lua?
in is only used in the generic for loop, so if line.text in (♪) then is not valid Lua syntax.
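Something like this should work:
if line.comment or line.text == "" or line.text:find("\u{266A}") then
The \u{266A} escape requires Lua 5.3 or later; on an older interpreter, put the "♪" character directly in the string instead.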
In Lua, every string has the string functions attached as methods.
So you can call gsub() on your string variable inside the loop, like this:
('Text with ♪ sign in text'):gsub('(♪)','note')
That replaces the sign, and the output is:
Text with note sign in text
Replacing it with an empty string '' instead of 'note' deletes it.
gsub() returns two values.
First: the string, with or without changes.
Second: a number that tells how many times the pattern matched.
So the second return value can be used as a condition or success check.
(0 stands for "pattern not found".)
So let's check the above with:
local str,rc=('Text with strange ♪ sign in text'):gsub('(♪)','notation')
if rc~=0 then
print('Replaced ',rc,'times, changed to: ',str)
end
-- output
-- Replaced 1 times, changed to: Text with strange notation sign in text
And finally, detection only, with no change made:
local str,rc=('Text with strange ♪ sign in text'):gsub('(♪)','%1')
if rc~=0 then
print('Found ',rc,'times, Text is: ',str)
end
-- output is...
-- Found 1 times, Text is: Text with strange ♪ sign in text
The %1 holds what '(♪)' found.
So ♪ is replaced with ♪.
And only rc is used as a condition for further handling.
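The trick of using the replacement count as a "found" flag is not specific to Lua. As a rough Python analogue (not Aegisub code), re.subn also returns the new string together with the number of substitutions. The sample text below is made up for illustration.

```python
import re

text = "Text with strange ♪ sign in text"

# re.subn returns a (new_string, count) pair, much like Lua's gsub.
cleaned, count = re.subn("♪", "", text)
if count != 0:
    print("Removed", count, "time(s):", cleaned)

# For detection only, a plain substring test is enough.
has_note = "♪" in text
```

When nothing needs to be replaced, the substring test is the cheaper choice, which mirrors using string.find instead of gsub in Lua.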

How do I add a newline to hs.eventtap.keyStrokes in Hammerspoon?

I just started using Hammerspoon. I'm trying to output multiple lines of text by pressing Cmd+Shift+L.
Here is what I have tried so far:
hs.hotkey.bind({"cmd", "shift"}, "l", function()
    hs.eventtap.keyStrokes('from sklearn import metrics')
    hs.eventtap.keyStroke("return")
    hs.eventtap.keyStrokes('from sklearn.cross_validation import train_test_split')
end)
I also tried with inline "\n" and "%\n".
How can I bind a key combination to output multiple lines of text? Or, how can I send a newline character?
I ran into the same problem. I tried what you tried above and although it worked in many applications, it still didn't work in Chrome. I used the pasteboard (clipboard) as a workaround.
jira_text = [[a
long
multi-line
string]]
-- Hotkey JIRA text
hs.hotkey.bind({"cmd", "alt", "ctrl"}, "J", function ()
    hs.alert.show("Remove this message after debugging!")
    -- hs.eventtap.keyStrokes(jira_text) -- don't do this!
    hs.pasteboard.writeObjects(jira_text)
    hs.eventtap.keyStroke({"cmd"}, "v") -- modifiers must be passed as a table
end)
--
You could improve it further by using a custom named pasteboard so it doesn't overwrite your clipboard contents (if you want that).
I also ran into this problem and improved Josh Fox's answer by saving the contents of the system pasteboard into a temporary pasteboard before loading and pasting the multi-line string.
MULTILINE_STRING = [[multi
line
string]]
-- Paste Multi-line String
hs.hotkey.bind({'ctrl', 'cmd'}, 'F1', function()
    -- save clipboard data to temp
    tempClipboard = hs.pasteboard.uniquePasteboard()
    hs.pasteboard.writeAllData(tempClipboard, hs.pasteboard.readAllData(nil))
    -- load string into clipboard and paste
    hs.pasteboard.writeObjects(MULTILINE_STRING)
    hs.eventtap.keyStroke({'cmd'}, 'v')
    -- recall clipboard data
    hs.pasteboard.writeAllData(nil, hs.pasteboard.readAllData(tempClipboard))
    hs.pasteboard.deletePasteboard(tempClipboard)
end)
I wasn't loving all this clipboard manipulation (too many side effects, and probably unnecessarily heavy performance-wise), so I solved this with a helper function and some string splitting. Keep in mind that Lua doesn't have a native string-splitting function. I'm using the one from stringy here, but any custom or library-supplied splitting function will work.
--- prevents hs.eventtap.keyStrokes from chewing up `\n`
--- @param str string
--- @return nil
function pasteMultilineString(str)
    local lines = stringy.split(str, "\n")
    local is_first_line = true
    for _, line in ipairs(lines) do
        if is_first_line then
            is_first_line = false
        else
            hs.eventtap.keyStroke({}, "return")
        end
        hs.eventtap.keyStrokes(line)
    end
end
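The control flow of this helper can be checked without Hammerspoon. The Python sketch below only models the logic: plan_keystrokes is a hypothetical name, and the ("text", ...) and ("key", "return") tuples stand in for the calls to hs.eventtap.keyStrokes and hs.eventtap.keyStroke.

```python
def plan_keystrokes(text):
    """Turn a multi-line string into a list of (kind, value) events."""
    events = []
    for index, line in enumerate(text.split("\n")):
        if index > 0:
            # Every line break becomes an explicit Return key press.
            events.append(("key", "return"))
        events.append(("text", line))
    return events


print(plan_keystrokes("a\nlong\nstring"))
```

A Return press is emitted between lines but not after the last one, so the cursor ends on the final line, the same as in the Lua helper.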

Groovy: searching for and extracting XML code from a log file

My log file contains a lot of text, but sometimes a response arrives as XML, and I have to cut this XML out and move it to other files.
For example:
sThread1....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
important xml code to cut and move to other file: <response><important> 1 </import...></response>
important xml code to other file: <response><important> 2 </important...></response>
sThread2....dsadasdsadsadasdasdasdas.......dasdasdasdadasdasdasdadadsada
Obstacle: the XML starts at a different character position in each line (it does not always start at the same offset).
Please help me find a way to locate the XML in the text.
So far I have tested the substring() method, but the XML does not always start at the same position :(
EDIT:
I found what I wanted: the function I was looking for is indexOf().
I needed the character index where the string "Response is : " ends, that is, where the XML begins, so I used:
int positionOfXmlInLine = lineTxt.indexOf("<response")
After this I can cut the string from that position to the end of the line:
def cuttedText = lineTxt.substring(positionOfXmlInLine);
So now I have only the XML text from the log file.
The next step is parsing the XML values, as BDKosher wrote in the answer below.
Hopefully this will help someone.
You might be able to leverage XmlSlurper for this, assuming your XML is valid enough. The code below will take each line of the log, wrap it in a root element, and parse it. Once parsed, it extracts and prints out the value of the <important> element's value attribute, but instead you could do whatever you need to do with the data:
def input = '''
sThread1..sdadassda..sdadasdsada....sdadasdas...
important code to cut and move to other file: **<response><important value="1"></important></response>**
important code to other file: ****<response><important value="3"></important></response>****
sThread2..dsadasd.s.da.das.d.as.das.d.as.da.sd.a.
'''
def parser = new XmlSlurper()
input.eachLine { line, lineNo ->
    def output = parser.parseText("<wrapper>$line</wrapper>")
    if (!output.response.isEmpty()) {
        println "Line $lineNo is of importance ${output.response.important.@value.text()}"
    }
}
This prints out:
Line 2 is of importance 1
Line 3 is of importance 3
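The indexOf/substring approach from the question's edit and the parsing step can also be combined. The following is a minimal Python sketch of that idea, not the Groovy code above: extract_responses is a hypothetical helper, the log lines are invented, and it assumes each line holds at most one complete <response> element.

```python
import xml.etree.ElementTree as ET


def extract_responses(log_lines, marker="<response"):
    """Yield the parsed <response> element from each line that contains one."""
    closing = "</response>"
    for line in log_lines:
        start = line.find(marker)      # same role as indexOf in the question
        if start == -1:
            continue                   # no XML on this line
        end = line.find(closing, start)
        if end == -1:
            continue                   # truncated XML, skip it
        yield ET.fromstring(line[start:end + len(closing)])


# Hypothetical log lines in the shape shown above.
log = [
    "sThread1..sdadassda..sdadasdsada....sdadasdas...",
    'important code: <response><important value="1"></important></response>',
    'important code: <response><important value="3"></important></response>',
]
values = [r.find("important").get("value") for r in extract_responses(log)]
print(values)
```

Cutting at the closing tag as well as the opening one keeps any trailing log text out of the parser.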

Preprocessing Scala parser Reader input

I have a file containing a text representation of an object. I have written a combinator parser grammar that parses the text and returns the object. In the text, "#" is a comment delimiter: everything from that character to the end of the line is ignored. Blank lines are also ignored. I want to process text one line at a time, so that I can handle very large files.
I don't want to clutter up my parser grammar with generic comment and blank line logic. I'd like to remove these as a preprocessing step. Converting the file to an iterator over lines, I can do something like this:
Source.fromFile("file.txt").getLines.map(_.replaceAll("#.*", "").trim).filter(!_.isEmpty)
How can I pass the output of an expression like that into a combinator parser? I can't figure out how to create a Reader object out of a filtered expression like this. The Java FileReader interface doesn't work that way.
Is there a way to do this, or should I put my comment and blank line logic in the parser grammar? If the latter, is there some util.parsing package that already does this for me?
The simplest way to do this is to use the fromLines method on PagedSeq:
import scala.collection.immutable.PagedSeq
import scala.io.Source
import scala.util.parsing.input.PagedSeqReader
val lines = Source.fromFile("file.txt").getLines.map(
  _.replaceAll("#.*", "").trim
).filterNot(_.isEmpty)
val reader = new PagedSeqReader(PagedSeq.fromLines(lines))
And now you've got a scala.util.parsing.input.Reader that you can plug into your parser. This is essentially what happens when you parse a java.io.Reader, anyway—it immediately gets wrapped in a PagedSeqReader.
Not the prettiest code you'll ever write, but you could go through a new Source as follows:
val SEP = System.getProperty("line.separator")
def lineMap(fileName : String, trans : String=>String) : Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line) + SEP
    ).toIterable
  )
}
Explanation: flatMap will produce an iterator on characters, which you can turn into an Iterable, which you can use to build a new Source. You need the extra SEP because getLines removes it by default (using \n may not work as Source will not properly separate the lines).
If you want to apply filtering too, i.e. remove some of the lines, you could for instance try:
// whenever `trans` returns `None`, the line is dropped.
def lineMapFilter(fileName : String, trans : String=>Option[String]) : Source = {
  Source.fromIterable(
    Source.fromFile(fileName).getLines.flatMap(
      line => trans(line).map(_ + SEP).getOrElse("")
    ).toIterable
  )
}
As an example:
lineMapFilter("in.txt", line => if(line.isEmpty) None else Some(line.reverse))
...will remove empty lines and reverse non-empty ones.
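The preprocessing step itself is independent of Scala. As a language-neutral illustration, the Python sketch below does the same lazy filtering with a generator, so only one line is held in memory at a time. The name strip_comments and the sample text are made up for the example, and nothing here touches the Scala parsing library.

```python
import io


def strip_comments(lines):
    """Lazily yield non-blank lines with '#' comments removed."""
    for line in lines:
        # Keep only the part before the first '#', then trim whitespace.
        content = line.split("#", 1)[0].strip()
        if content:
            yield content


# io.StringIO stands in for an open file; both iterate line by line.
source = io.StringIO("a = 1  # first\n\n# only a comment\nb = 2\n")
cleaned = list(strip_comments(source))
print(cleaned)
```

Because the generator consumes its input one line at a time, it can sit in front of a parser for very large files without loading them whole.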

How to create a parser which tokenizes a list of words taken from a file?

I am trying to write a syntax corrector for text for my compilers class. The idea is: I have some rules, which are inherent to the language (in my case, Portuguese), like "A valid phrase is SUBJECT VERB ADJECTIVE", as in "Ruby is great".
OK, so first I have to tokenize the input "Ruby is great". So I have a text file "verbs" with a lot of verbs, one per line. Then I have another text file "adjectives", another "pronouns", etc.
I am trying to use Ragel to create a parser, but I don't know how I could do something like:
%%{
    machine test;
    subject = <open-the-subjects-file-and-accept-each-one-of-them>;
    verb = <open-the-verbs-file-and-accept-each-one-of-them>;
    adjective = <open-the-adjective-file-and-accept-each-one-of-them>;
    main = subject verb adjective @ { print "Valid phrase!" } ;
}%%
I looked at ANTLR, Lex/Yacc, Ragel, etc. But couldn't find one that seemed to solve this problem. The only way to do this that I could think of was to preprocess Ragel's input file, so that my program reads the file and writes its contents at the right place. But I don't like this solution either.
Does anyone know how I could do this? It's no problem if it isn't with Ragel; I just want to solve this problem. I would like to use Ruby or Python, but that's not strictly necessary either.
Thanks.
If you want to read the files at compile time, put them in this format:
subject = \
ruby|\
python|\
c++
Then use Ragel's 'include' or 'import' statement (I forget which; check the manual) to import it.
If you want to check the list of subjects at run time, maybe just make Ragel read three words, with an action associated with each word. The action can read the file and look up whether the word is valid at run time.
The action reads the text file and compares each entry against the word:
%%{
    machine test;
    action startWord {
        lastWordStart = p;
    }
    action checkSubject {
        word = input[lastWordStart:p+1]
        for possible in open('subjects.txt'):
            # strip the trailing newline before comparing
            if possible.strip() == word:
                fgoto verb;
        # If we get here do whatever ragel does to go to an error or just raise a python exception
        raise Exception("Invalid subject '%s'" % word)
    }
    action checkVerb { .. exercise for reader .. ;) }
    action checkAdjective { .. put adjective checking code here .. }
    subject = ws*.(alnum*)>startWord%checkSubject;
    verb := ws*.(alnum*)>startWord%checkVerb;
    adjective := ws*.(alnum*)>startWord%checkAdjective;
    main := subject;
}%%
With bison, I would write the lexer by hand and have it look up the words in the predefined dictionary.
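A hand-written lexer of that kind could look roughly like the Python sketch below. It is an assumption-laden illustration, not Ragel or bison code: the names load_words, LEXICON, tokenize and is_valid are hypothetical, the word lists are inlined instead of being read from the "subjects", "verbs" and "adjectives" files, and only the single rule SUBJECT VERB ADJECTIVE is checked.

```python
def load_words(lines):
    """Build a lookup set from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}


# Hypothetical word lists; in practice each would come from a file,
# e.g. load_words(open("verbs")).
LEXICON = {
    "SUBJECT": load_words(["Ruby", "Python"]),
    "VERB": load_words(["is", "was"]),
    "ADJECTIVE": load_words(["great", "slow"]),
}


def tokenize(phrase):
    """Tag each word with the first category whose list contains it."""
    tokens = []
    for word in phrase.split():
        kind = next(
            (k for k, words in LEXICON.items() if word.lower() in words),
            "UNKNOWN",
        )
        tokens.append((kind, word))
    return tokens


def is_valid(phrase):
    """Check the single rule SUBJECT VERB ADJECTIVE."""
    return [kind for kind, _ in tokenize(phrase)] == ["SUBJECT", "VERB", "ADJECTIVE"]


print(tokenize("Ruby is great"))
print(is_valid("Ruby is great"))
```

Loading each list into a set once keeps the per-word lookup cheap, unlike reopening the file inside every action.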
