LALR Parser Generator Implementation Problem

I'm currently trying to implement an LALR parser generator as described in "Compilers: Principles, Techniques, and Tools" (the "Dragon Book").
A lot of it already works: the generator can already build the full goto-graph.
Example Grammar:
S' --> S
S --> C C
C --> c C
C --> d
Nonterminals: S', S, C
Terminals: c, d
Start: S'
The goto-graph:
I[0]---------------+ I[1]-------------+
| S' --> . S , $ |--S-->| S' --> S . , $ |
| S --> . C C , $ | +----------------+
| C --> . c C , c |
| C --> . c C , d | I[2]--------------+
| C --> . d , c | | S --> C . C , $ | I[3]--------------+
| C --> . d , d |--C-->| C --> . c C , $ |--C-->| S --> C C . , $ |
+------------------+ | C --> . d , $ | +-----------------+
| | +-----------------+
| | +--c--+ | |
| | | | c |
| | | v v |
| | I[4]--------------+ |
| c | C --> c . C , c | |
| | | C --> c . C , d | |
| | | C --> c . C , $ | d
| | | C --> . c C , c | |
| +---->| C --> . c C , d | |
| | C --> . c C , $ | |
d | C --> . d , c |--+ |
| +-----| C --> . d , d | | |
| | | C --> . d , $ | | |
| | +-----------------+ | |
| C | |
| | I[6]--------------+ | |
| | | C --> c C . , c | d |
| +---->| C --> c C . , d | | |
| | C --> c C . , $ | | |
| +-----------------+ | |
| | |
| I[5]------------+ | |
| | C --> d . , c |<---+ |
+------->| C --> d . , d | |
| C --> d . , $ |<-----+
+---------------+
I'm having trouble implementing the algorithm that generates the action table!
My algorithm computes the following output:
state | action
| c | d | $
------------------------
0 | s4 | s5 |
------------------------
1 | | | acc
------------------------
2 | s4 | s5 |
------------------------
3 | | | r?
------------------------
4 | s4 | s5 |
------------------------
5 | r? | r? | r?
------------------------
6 | r? | r? | r?
sx... shift to state x
rx... reduce to state x
The r? means that I don't know how to determine the target of the reduce (the ?). Does anyone know an algorithm to derive the ? from the goto-graph above?
If anything is not described clearly enough, please ask and I will try to explain it better!
Thanks for your help!

A shift entry is labeled with the next state, whereas a reduce entry refers to a production.
When you shift, you push a state reference onto your stack and proceed to the next state.
When you reduce, this is for a specific production. The production was responsible for shifting n states onto your stack, where n is the number of symbols in that production. E.g. one for S', two for S, and two or one for C (i.e. for the first or second alternative for C).
After n entries are popped off the stack, you return to the state where you started processing that production. For that state and the nonterminal resulting from the production, you lookup the goto table to find the next state, which is then pushed.
So the reduce entries identify a production. In fact it may be sufficient to know the resulting nonterminal, and the number of symbols to pop.
Your table thus should read
state | action | goto
| c | d | $ | C | S
------------------------------------
0 | s4 | s5 | | 2 | 1
------------------------------------
1 | | | acc | |
------------------------------------
2 | s4 | s5 | | 3 |
------------------------------------
3 | | | r1 | |
------------------------------------
4 | s4 | s5 | | | 6
------------------------------------
5 | r3 | r3 | r3 | |
------------------------------------
6 | r2 | r2 | r2 | |
where rx indicates reduce by production x, numbering the productions 0: S' --> S, 1: S --> C C, 2: C --> c C, 3: C --> d.
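To make the mechanics concrete, here is a minimal sketch of the table-driven parse loop in Python (my own illustration, not code from the book; it hard-codes the tables above and the production numbering just given):
ACTION = {
    (0, 'c'): ('shift', 4),  (0, 'd'): ('shift', 5),
    (1, '$'): ('accept', None),
    (2, 'c'): ('shift', 4),  (2, 'd'): ('shift', 5),
    (3, '$'): ('reduce', 1),
    (4, 'c'): ('shift', 4),  (4, 'd'): ('shift', 5),
    (5, 'c'): ('reduce', 3), (5, 'd'): ('reduce', 3), (5, '$'): ('reduce', 3),
    (6, 'c'): ('reduce', 2), (6, 'd'): ('reduce', 2), (6, '$'): ('reduce', 2),
}
GOTO = {(0, 'S'): 1, (0, 'C'): 2, (2, 'C'): 3, (4, 'C'): 6}
PRODUCTIONS = {1: ('S', 2), 2: ('C', 2), 3: ('C', 1)}  # head, body length

def parse(tokens):
    stack = [0]                      # a stack of state numbers
    tokens = list(tokens) + ['$']
    pos = 0
    while True:
        kind, arg = ACTION[(stack[-1], tokens[pos])]   # KeyError = syntax error
        if kind == 'shift':          # push the next state and consume the token
            stack.append(arg)
            pos += 1
        elif kind == 'reduce':       # pop one state per symbol in the body...
            head, length = PRODUCTIONS[arg]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], head)])      # ...then goto on the head
        else:                        # accept
            return

parse('cdd')  # accepted: C --> d, C --> c C, C --> d, then S --> C C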

You need to pop the stack and find the next state from there.

The rx means: reduce using the production with number x!
Then everything becomes clear!
Simply pop the production's body off the stack and then push the goto state for its head back on top!
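In other words, the reduce entries come straight from the items whose dot is at the end. Here is a sketch of filling them in from the item sets (assuming, as my own representation, that each state is a list of (head, body, dot, lookahead) tuples and that the productions are numbered as above):
PROD_NUM = {('S', ('C', 'C')): 1, ('C', ('c', 'C')): 2, ('C', ('d',)): 3}

def add_reduce_entries(states, action):
    for state_id, items in enumerate(states):
        for head, body, dot, lookahead in items:
            if dot != len(body):
                continue                 # dot not at the end: shift territory
            if head == "S'":             # the item S' --> S . , $
                action[(state_id, lookahead)] = ('accept', None)
            else:
                action[(state_id, lookahead)] = ('reduce', PROD_NUM[(head, body)])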

Related

How to avoid missing values when calculating returns with time-series data

I have a time series dataset for daily closing stock prices. The data is in the following format:
+---------------------+
| date close |
|---------------------|
1. | 01sep2008 9210.15 |
2. | 02sep2008 9229.51 |
3. | 03sep2008 9239.15 |
4. | 04sep2008 9239.26 |
5. | 05sep2008 9342.19 |
|---------------------|
6. | 08sep2008 9296.23 |
7. | 09sep2008 9279.62 |
8. | 10sep2008 9315.68 |
9. | 11sep2008 9263.39 |
10. | 12sep2008 9253.92 |
+---------------------+
Trading does not take place on every day of the week because of the weekend, and even within a week a stock may not be traded. Gaps in the time series are therefore inevitable.
I need to use the following formula to generate returns:
gen returns = ln(close/l.close)
However, many missing values are generated because of the gaps in time series.
How can I address this problem?
l.close should be the previous value of the closing price, irrespective of its date.
The output below gives an idea about what I want (I generated a lag variable first):
+-------------------------------+
| date close lag |
|-------------------------------|
1. | 01sep2008 9210.15 . |
2. | 02sep2008 9229.51 9210.15 |
3. | 03sep2008 9239.15 9229.51 |
4. | 04sep2008 9239.26 9239.15 |
5. | 05sep2008 9342.19 9239.26 |
|-------------------------------|
6. | 08sep2008 9296.23 9342.19 |
7. | 09sep2008 9279.62 9296.23 |
8. | 10sep2008 9315.68 9279.62 |
9. | 11sep2008 9263.39 9315.68 |
10. | 12sep2008 9253.92 9263.39 |
+-------------------------------+
Instead, I get the following:
+-------------------------------+
| date close lag |
|-------------------------------|
1. | 01sep2008 9210.15 . |
2. | 02sep2008 9229.51 9210.15 |
3. | 03sep2008 9239.15 9229.51 |
4. | 04sep2008 9239.26 9239.15 |
5. | 05sep2008 9342.19 9239.26 |
|-------------------------------|
6. | 08sep2008 9296.23 . |
7. | 09sep2008 9279.62 9296.23 |
8. | 10sep2008 9315.68 9279.62 |
9. | 11sep2008 9263.39 9315.68 |
10. | 12sep2008 9253.92 9263.39 |
+-------------------------------+
The lag at 08sep2008 is missing, but the close from 05sep2008 should be used there.
Example data:
* Example generated by -dataex-. To install: ssc install dataex
clear
input float(date close)
17776 9210.15
17777 9229.51
17778 9239.15
17779 9239.26
17780 9342.19
17783 9296.23
17784 9279.62
17785 9315.68
17786 9263.39
17787 9253.92
17790 9233.21
17791 9223.77
17792 9216.23
17793 9202.31
17794 9200.6
17797 9200.22
17798 9199.51
17799 9190.75
17800 9184.15
17804 9182.8
17805 9179.68
17811 9178.97
17812 9181.48
17813 9178.73
17814 9181.35
17815 9181.35
17818 9184.24
17819 9184.24
17820 9184.24
17821 9184.24
17822 9184.24
17825 9184.75
17826 9186.9
17827 9183.74
17828 9182.88
17829 9182.88
17832 9182.88
17833 9182.88
17834 9182.88
17835 9182.88
end
format %td date
The following works for me. Subscripting with [_n-1] picks up the previous observation in the sorted data, regardless of any calendar gap, whereas the lag operator l. works on the tsset date variable and returns missing across gaps:
sort date
generate lag = close[_n-1]
generate returns = ln(close / close[_n-1])
list in 1/10
+-------------------------------------------+
| date close lag returns |
|-------------------------------------------|
1. | 01sep2008 9210.15 . . |
2. | 02sep2008 9229.51 9210.15 .0020998 |
3. | 03sep2008 9239.15 9229.51 .001044 |
4. | 04sep2008 9239.26 9239.15 .0000118 |
5. | 05sep2008 9342.19 9239.26 .011079 |
|-------------------------------------------|
6. | 08sep2008 9296.23 9342.19 -.0049318 |
7. | 09sep2008 9279.62 9296.23 -.0017884 |
8. | 10sep2008 9315.68 9279.62 .0038784 |
9. | 11sep2008 9263.39 9315.68 -.0056289 |
10. | 12sep2008 9253.92 9263.39 -.0010228 |
+-------------------------------------------+
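For what it's worth, the same observation-based (rather than calendar-based) lag can be written in Python/pandas. A sketch using a few rows of the example data, where shift(1) takes the previous row regardless of date gaps, just like close[_n-1] in Stata:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2008-09-04', '2008-09-05', '2008-09-08']),
    'close': [9239.26, 9342.19, 9296.23],
})
df = df.sort_values('date')
df['lag'] = df['close'].shift(1)          # previous observation, not previous day
df['returns'] = np.log(df['close'] / df['lag'])
print(df)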

How to get a parse NLP Tree object from bracketed parse string with nltk or spacy?

I have the sentence "You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre." and I can't manage to get an NLP parse tree like the following example:
(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))
I want to replicate the solution to this question (https://stackoverflow.com/a/39320379), but I have a string sentence instead of the NLP tree.
By the way, I am using Python 3.
Use the Tree.fromstring() method:
>>> from nltk import Tree
>>> parse = Tree.fromstring('(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))')
>>> parse
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['You'])]), Tree('VP', [Tree('MD', ['could']), Tree('VP', [Tree('VB', ['say']), Tree('SBAR', [Tree('IN', ['that']), Tree('S', [Tree('NP', [Tree('PRP', ['they'])]), Tree('ADVP', [Tree('RB', ['regularly'])]), Tree('VP', [Tree('VB', ['catch']), Tree('NP', [Tree('NP', [Tree('DT', ['a']), Tree('NN', ['shower'])]), Tree(',', [',']), Tree('SBAR', [Tree('WHNP', [Tree('WDT', ['which'])]), Tree('S', [Tree('VP', [Tree('VBZ', ['adds']), Tree('PP', [Tree('TO', ['to']), Tree('NP', [Tree('NP', [Tree('PRP$', ['their']), Tree('NN', ['exhilaration'])]), Tree('CC', ['and']), Tree('NP', [Tree('FW', ['joie']), Tree('FW', ['de']), Tree('FW', ['vivre'])])])])])])])])])])])])]), Tree('.', ['.'])])])
>>> parse.pretty_print()
ROOT
|
S
______________________________________________________|_____________________________________________________________
| VP |
| ____|___ |
| | VP |
| | ___|____ |
| | | SBAR |
| | | ____|_______ |
| | | | S |
| | | | _______|____________ |
| | | | | | VP |
| | | | | | ____|______________ |
| | | | | | | NP |
| | | | | | | __________|__________ |
| | | | | | | | | SBAR |
| | | | | | | | | ____|____ |
| | | | | | | | | | S |
| | | | | | | | | | | |
| | | | | | | | | | VP |
| | | | | | | | | | ____|____ |
| | | | | | | | | | | PP |
| | | | | | | | | | | ____|_____________________ |
| | | | | | | | | | | | NP |
| | | | | | | | | | | | ________________|________ |
NP | | | NP ADVP | NP | WHNP | | NP | NP |
| | | | | | | ___|____ | | | | ____|_______ | ____|____ |
PRP MD VB IN PRP RB VB DT NN , WDT VBZ TO PRP$ NN CC FW FW FW .
| | | | | | | | | | | | | | | | | | | |
You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre .
I am going to assume there is a good reason why you need the parse tree in that format. spaCy does a great job of dependency parsing with a CNN (convolutional neural network) model, is production-ready, and is super fast; note that it produces dependency parses, not CFG-style constituency trees. You can do something like the following to see for yourself (and then read the docs at the prior link):
import spacy

nlp = spacy.load('en_core_web_sm')  # on older spaCy versions: spacy.load('en')
text = 'You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre.'
for token in nlp(text):
    print(token.dep_, end='\t')
    print(token.idx, end='\t')
    print(token.text, end='\t')
    print(token.tag_, end='\t')
    print(token.head.text, end='\t')
    print(token.head.tag_, end='\t')
    print(token.head.idx, end='\t')
    print(' '.join([w.text for w in token.subtree]), end='\t')
    print(' '.join([w.text for w in token.children]))
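For instance, here is a rough sketch of walking that dependency tree recursively into a bracketed string (my own helper, reusing the nlp and text objects above; it renders the dependency parse, not a true constituency tree):
def to_bracketed(token):
    # left children, then the token itself, then right children, in word order
    parts = [to_bracketed(left) for left in token.lefts]
    parts.append('{}/{}'.format(token.text, token.dep_))
    parts += [to_bracketed(right) for right in token.rights]
    return '(' + ' '.join(parts) + ')'

doc = nlp(text)
for sent in doc.sents:
    print(to_bracketed(sent.root))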
Now, you could make an algorithm to navigate this tree and print accordingly; the sketch above is one rough starting point (you can see the indexes and how to traverse the parse). Another thing you could do is to extract the CFG somehow and then use NLTK to do the parsing and subsequent display in the format you desire. This is from the NLTK book (modified to work with Python 3):
import nltk
from nltk import CFG

grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP | V NP PP
V -> "saw" | "ate"
NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
Det -> "a" | "an" | "the" | "my"
N -> "dog" | "cat" | "cookie" | "park"
PP -> P NP
P -> "in" | "on" | "by" | "with"
""")

text = 'Mary saw Bob'
sent = text.split()
rd_parser = nltk.RecursiveDescentParser(grammar)
for p in rd_parser.parse(sent):
    print(p)
# (S (NP Mary) (VP (V saw) (NP Bob)))
However, you can see that you need to define the CFG yourself (so if you tried your original text in place of the example's, you would find that it does not understand tokens that are not defined in the CFG).
It seems the easiest way to obtain your desired format is to use the Stanford NLP parser. Taken from this SO question (and sorry, I haven't tested it):
from nltk.parse.stanford import StanfordParser  # also requires the Stanford parser jars

parser = StanfordParser(model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
parsed = parser.raw_parse('Jack payed up to 5% more for each unit')
for line in parsed:
    print(line, end=' ')  # This will print it all on one line, as desired
I didn't test this because I don't have the time to install the Stanford Parser, which can be a bit of a cumbersome process (relative to installing Python modules), that is, assuming you are looking for a Python solution.
I hope this helps, and I'm sorry that it's not a direct answer.

How to get MultiMarkdown to view tables in Sublime Text 2 on OS X

I am trying to avoid using inline HTML to get tables working in my MD file. I have Markdown Preview and Table Editor installed via Package Control, and MultiMarkdown installed via Homebrew, but I can't get the following text to display as a table:
| Left align adsf | Right align | Center align |
| :--------------- | ----------: | :----------: |
| This | This | This |
| column | column | column |
| will | will | will |
| be | be | be |
| left | right | center |
| aligned | aligned | aligned |
When I "Markdown Preview" it just displays like this:
| | | Left align adsf | Right align | Center align | | --- | --- | ---------------- | ----------- | ------------ | | | | ---------------s | ----------- | ------------ | | --- | --- | :--------------- | ----------: | :----------: | | | | This | This | This | | | | column | column | column | | | | will | will | will | | | | be | be | be | | | | left | right | center | | | | aligned | aligned | aligned |
I have switched the file type to MultiMarkdown (lower right portion of ST2 screen)
I have searched, and it appears some people use a build system or other approaches that I have been unable to get going. What am I missing? If a build system is needed, how do I set one up? I am mainly interested in viewing this as HTML, but wouldn't be opposed to other ways.
If you switch the parser to github, it'll work just fine.
Go to Preferences > Package Settings > Markdown Preview > Settings - User and paste this code:
{
    "parser": "github"
}
"If a build system is needed, how do I set up one?"
On OS X I would strongly suggest getting the excellent Marked.app and then setting up a new build system in ST containing this trivial code:
{
    "osx": {"cmd": ["open", "-a", "Marked", "$file"]},
    "selector": "text.html.markdown"
}
Then when you 'build' a markdown file (Cmd+B) you will get a preview generated in Marked.
Easy, elegant, and well worth the cup-of-coffee price of Marked to avoid all the hassle of the plugin approach.

rails object to memcached and then out again

I want to store a simple ActiveRecord object using memcached. I know I need to first convert the object to JSON before saving it to memcached. My question is how I can pull it out again, deserialize it, and use it as an ActiveRecord relation. Do I have to write a custom parser for the JSON, or am I overlooking some drop-dead easy solution?
The active record object looks like this:
+------+-----+-----------+---------------------------------+---------------------+-------+
| id | ppl | exclusive | name | price | spots |
+------+-----+-----------+---------------------------------+---------------------+-------+
| 8948 | 12 | false | 12 Bed Mixed Dorm | 9.0000000000000000 | 12 |
| 8947 | 10 | false | 10 Bed Mixed Dorm | 9.5000000000000000 | 10 |
| 8946 | 6 | false | 6 Bed Mixed Dorm | 10.0000000000000000 | 6 |
| 8945 | 4 | false | Basic 4 Bed Mixed Dorm | 10.0000000000000000 | 4 |
| 8944 | 2 | true | Twin Private Shared Bathroom | 12.0000000000000000 | 1 |
| 8943 | 1 | true | Standard Single Private Ensuite | 15.0000000000000000 | 1 |
+------+-----+-----------+---------------------------------+---------------------+-------+
You shouldn't need to worry about the serialization -- in almost all cases it can be handled for you:
# Gemfile
gem 'dalli'

# config/environments/production.rb
config.cache_store = :dalli_store, '127.0.0.1' # use memcached

# Get id 1245 from model_names; the block only runs on a cache miss,
# and Rails serializes/deserializes the object for you
Rails.cache.fetch("ModelName#1245") do
  ModelName.find(1245)
end

Obtain Bacula status in parseable format

Is it possible to obtain the status of the Bacula backup system Director in some parseable format?
It looks like the human-readable representation (the one you see when using bacula-console) is formed on the Director side during the TCP control connection.
In what language? The easiest way would be to invoke bconsole, send commands on stdin, and then parse stdout and stderr.
bconsole is interactive, but if you know the commands in advance, that is not an issue.
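For example, in Python that could look like this (a sketch; it assumes bconsole is on the PATH and already configured, and the exact table layout of "list jobs" may vary between Bacula versions):
import subprocess

result = subprocess.run(
    ['bconsole'],
    input='list jobs\nquit\n',   # commands on stdin, as described above
    capture_output=True,
    text=True,
    timeout=60,
)
for line in result.stdout.splitlines():
    # Bacula prints tables with '|' separators; keep only the data rows
    if line.startswith('|') and 'JobId' not in line:
        fields = [field.strip() for field in line.strip('|').split('|')]
        print(fields)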
You can also pull directly from the database, depending on your needs.
Example:
mysql> select JobId, Name, JobStatus from Job ORDER BY JobId DESC Limit 10;
+--------+-------------------------------------+-----------+
| JobId | Name | JobStatus |
+--------+-------------------------------------+-----------+
| 231215 | dbs16 Daily MysqlC XBM Snapshot | T |
| 231214 | dbs09 Daily MysqlS XBM Snapshot | T |
| 231213 | dbs10 Daily MysqlQ XBM Snapshot | T |
| 231212 | dbs11 Daily MysqlT XBM Snapshot | T |
| 231211 | dbs16 Daily MysqlI XBM Snapshot | T |
| 231210 | dbs19 Daily MysqlE XBM Snapshot | T |
| 231209 | dbs18 Daily MysqlB XBM Snapshot | R |
| 231208 | dbs17 Daily MysqlG XBM Snapshot | R |
| 231207 | Daily Catalog Backup | C |
| 231206 | adm6 svnops SVN Backup | R |
+--------+-------------------------------------+-----------+
