Getting started with Stanford Parser in JRuby - ruby-on-rails

I'm looking to add some text parsing in my rails app, and have been going in circles for the past few days looking for any tutorials or hints as to how to get this working.
I am completely new to Java, but nothing like jumping in with both feet.
I suspect the following code doesn't belong in my controller and should likely be in a model, but at this point I'm just checking whether I've got all the pieces in the right place.
I borrowed this code from the SO question "implementing custom java class in jruby", because I was having trouble finding any sort of example code.
#my requires/imports/includes, included multiple versions to be safe
require 'java'
#include Java
require '/media/sf_Ruby192/java_progs/parser/stanford-parser.jar'
#require '/media/sf_Ruby192/java_progs/parser/'
require 'rubygems'
include_class 'edu.stanford.nlp.parser.lexparser.LexicalizedParser'
class ParseController < ApplicationController
  def index
    lp = LexicalizedParser.new
    #check if regular Java is working
    list = java.util.ArrayList.new
    a = "1"
    b = "2"
    list.add(a)
    list.add(b)
    d = list[0]
    return render :text => list
  end
end
Unfortunately for me, I get the error
java.lang.NullPointerException: null
when I include the line
lp = LexicalizedParser.new
Am I doing EVERYTHING wrong? When I comment out the lp = ... line, I get the list output, so JRuby is working, and I can write Java in my Rails app and get the output.
Can somebody point me in the right direction? Maybe tell me what is wrong with this bit of code, but hopefully actually set me straight on how I'm supposed to be working with JRuby and Rails. Hopefully some input on Stanford Parser too (I know, it's a lot to ask). There seems to be very little in the way of documentation or example code that I've found.

I don't think so. But I do think that you need to read up on how this parser works.
According to http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/parser/lexparser/LexicalizedParser.html, the default constructor works as follows:
Construct a new LexicalizedParser object from a previously serialized grammar read from a property edu.stanford.nlp.SerializedLexicalizedParser, or a default file location.
In other words, you are getting the NPE because the default constructor can't find enough information to create the parser.
If you grab the binary distribution from Stanford, appropriate grammars will be found in the grammar directory. For example:
$ jruby -S irb
irb(main):001:0> require 'java'
=> true
irb(main):002:0> require 'stanford-parser.jar'
=> true
irb(main):003:0> java_import Java::edu.stanford.nlp.parser.lexparser.LexicalizedParser
=> Java::EduStanfordNlpParserLexparser::LexicalizedParser
irb(main):004:0> lp = LexicalizedParser.new("grammar/englishPCFG.ser.gz")
Loading parser from serialized file grammar/englishPCFG.ser.gz ... done [2.5 sec].
=> #<Java::EduStanfordNlpParserLexparser::LexicalizedParser:0x7d627b8b>
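On where this code belongs: loading the grammar took a couple of seconds in the session above, so it is probably better done once per process than on every request. Below is a minimal sketch of a model-side wrapper, assuming JRuby, that require can find stanford-parser.jar, and that the grammar sits at the path used in the session. SentenceParser, grammar_path and parser are hypothetical names, not part of the Stanford API. The single-string constructor mirrors the session above; later Stanford releases may expect a static loadModel factory instead, so check the javadoc for the version you download.

```ruby
# Hypothetical wrapper (not part of the Stanford API). Assumes JRuby and
# that stanford-parser.jar can be found by `require`.
class SentenceParser
  # Grammar path as used in the irb session; adjust for your install.
  DEFAULT_GRAMMAR = "grammar/englishPCFG.ser.gz"

  # Resolve the grammar to an absolute path and fail with a clear Ruby
  # error if it is missing, before any Java code runs.
  def self.grammar_path(path = DEFAULT_GRAMMAR)
    full = File.expand_path(path)
    raise ArgumentError, "serialized grammar not found: #{full}" unless File.file?(full)
    full
  end

  # Load the parser once per process and reuse it afterwards.
  def self.parser
    @parser ||= begin
      raise NotImplementedError, "requires JRuby" unless RUBY_PLATFORM == "java"
      require "java"
      require "stanford-parser.jar"
      Java::EduStanfordNlpParserLexparser::LexicalizedParser.new(grammar_path)
    end
  end
end
```

In the controller, lp = SentenceParser.parser would then take the place of lp = LexicalizedParser.new.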

Related

Source code for rspec "describe" method (and others)?

I'm sauntering through Michael Hartl's Rails Tutorial right now, and am finding that I'm constantly encouraged to use wonderful methods that inexplicably do amazing things. He does a generally competent job of explaining what they do, but there is no real nitty gritty of why and how they work.
Specifically, I have just been plundering the rspec gem on github searching for the source code to the "describe" method. I cannot find it. Having now read a large amount of the source code (at an apprehension rate of about 25%) searching for it, I know that once found, I will need to look at its parent classes and modules to understand a certain amount of inheritance before I can really grasp (and then never let go of) the flesh and bones of "describe".
I don't mind struggling to grasp the concept, I'm a fan of attempting to read code in new languages before I fully understand it so that I can read it again later and use the comparison of my comprehension as a gauge of my fluency. I'd just like a kicker. Either a description or a file location with maybe a little helper hint to get me started.
For example...
I found this:
# RSpec.describe "something" do # << This describe method is defined in
# # << RSpec::Core::DSL, included in the
# # << global namespace (optional)
and rspec/core/dsl states:
# DSL defines methods to group examples, most notably `describe`,
# and exposes them as class methods of {RSpec}. They can also be
# exposed globally (on `main` and instances of `Module`) through
# the {Configuration} option `expose_dsl_globally`.
but then there is no "class Describe" or def "describe" or such in that file.
SO: can anyone tell me where the "describe" method is, how it works, exactly, or (if not) why I am naively searching for the wrong thing in the wrong locations?
As you may know, there is no difference between the describe and context methods, and you can use them interchangeably. The RSpec developers didn't want to repeat the same code under different method names, so they moved the declaration to
module RSpec
  module Core
    class ExampleGroup
      def self.define_example_group_method(name, metadata={})
        # here they actually define a method with the given name using Ruby metaprogramming
        define_singleton_method(name) do |*args, &example_group_block|
          # ... (body elided)
        end
      end
    end
  end
end
They then call that method a bit later for each of the DSL methods that share this behavior:
define_example_group_method :example_group
define_example_group_method :describe
define_example_group_method :context
So if you are looking for the source of the describe method, dive into define_example_group_method, assuming that the name argument equals :describe and that example_group_block is your block body.
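To see the technique in isolation, here is a toy version. It is not RSpec's actual code: ToyExampleGroup and define_group_method are made-up names, and the generated method simply returns a hash instead of building an example group. It shows how one define_singleton_method call per name yields describe, context and example_group with identical behavior.

```ruby
# Toy illustration of the define_example_group_method idea; not RSpec code.
class ToyExampleGroup
  # Create a class-level method called +name+. Every generated method shares
  # this single block, so describe/context/example_group behave identically.
  def self.define_group_method(name)
    define_singleton_method(name) do |description, &example_group_block|
      # A real implementation would build a subclass and evaluate the block
      # in it; here we just record what was passed in.
      { via: name, description: description, block: example_group_block }
    end
  end

  define_group_method :example_group
  define_group_method :describe
  define_group_method :context
end

group = ToyExampleGroup.describe("a stack") { 1 + 1 }
p group[:via]        # => :describe
p group[:block].call # => 2
```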
The RSpec code base is not a trivial thing to get your head round. However, these links should get you started ...
This line defines the describe method:
https://github.com/rspec/rspec-core/blob/master/lib/rspec/core/example_group.rb#L246
The method above that line does the heavy lifting for you. Take your time reading it.
This part then exposes the generated method:
https://github.com/rspec/rspec-core/blob/master/lib/rspec/core/dsl.rb#L54
Good luck!
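The exposure step that the dsl.rb link points to can also be sketched in plain Ruby. ToyDSL and expose_globally! are hypothetical names, and the mechanism is a simplified assumption about what an option like expose_dsl_globally does: define describe on the top-level main object and as an instance method of Module, so it can be called both at the top of a file and inside any class or module body.

```ruby
# Hypothetical, simplified sketch of exposing a DSL method globally.
module ToyDSL
  def self.describe(description, &block)
    "group: #{description}"
  end

  def self.expose_globally!
    # `main`, the object that is self at the top level of a script.
    top_level = TOPLEVEL_BINDING.receiver
    top_level.define_singleton_method(:describe) do |*args, &block|
      ToyDSL.describe(*args, &block)
    end

    # Every class and module is a Module, so this makes `describe`
    # callable inside any class/module body too.
    Module.send(:define_method, :describe) do |*args, &block|
      ToyDSL.describe(*args, &block)
    end
  end
end

ToyDSL.expose_globally!
p describe("at top level") # => "group: at top level"
```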

Finding out where methods are defined in Ruby/Rails (as opposed to Java)

I am just getting started with Ruby on Rails. Coming from the Java world, one thing that I am wondering is how do Ruby/Rails developers find out where methods are actually defined.
I am used to just clicking on the method in Eclipse to find where it is defined, even in third-party libraries (supposing I have the source code).
A concrete example: I am trying to find out how the Authlogic gem apparently changes the constructor of my User class to require an additional parameter (called :password_confirmation) even though the User class doesn't even inherit from anything related to Authlogic.
Probably I am just overlooking something really obvious here (or maybe I still can't wrap my head around the whole "convention over configuration" thing ;-))
It's slightly difficult to quickly find the method location for dynamic languages like Ruby.
You can use object.methods (or SomeClass.instance_methods) to quickly list the available methods.
If you are using Ruby 1.9 or later, you can do something like this:
object.method(:method_name).source_location
For more information, see the documentation for Method#source_location.
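A minimal, self-contained check (Ruby 1.9 or later; Greeter is just an example class, and DEFINED_ON_LINE is only there to make the expected line number explicit):

```ruby
# Example class; DEFINED_ON_LINE records the line number of `def hello`.
class Greeter
  DEFINED_ON_LINE = __LINE__ + 1
  def hello
    "hi"
  end
end

file, line = Greeter.new.method(:hello).source_location
puts "Greeter#hello is defined at #{file}:#{line}"
puts line == Greeter::DEFINED_ON_LINE # => true

# Methods implemented in C (e.g. String#upcase on MRI) usually have no
# Ruby source, so source_location returns nil for them.
p "abc".method(:upcase).source_location
```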
The Pry gem is designed precisely for this kind of explorative use-case.
Pry is an interactive shell that lets you navigate your way around a program's source-code using shell-like commands such as cd and ls.
You can pull the documentation for any method you encounter and even view the source code, including the native C code in some cases (with the pry-doc plugin). You can even jump directly to the file/line where a particular method is defined with the edit-method command. The show-method and show-doc commands also display the precise location of the method they're acting on.
Watch the Railscasts screencast on Pry for more information.
Here are some examples:
pry(main)> show-doc OpenStruct#initialize
From: /Users/john/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/ostruct.rb # line 46:
Number of lines: 11
visibility: private
signature: initialize(hash=?)
Create a new OpenStruct object. The optional hash, if given, will
generate attributes and values. For example.
require 'ostruct'
hash = { "country" => "Australia", :population => 20_000_000 }
data = OpenStruct.new(hash)
p data # -> <OpenStruct country="Australia" population=20000000>
By default, the resulting OpenStruct object will have no attributes.
pry(main)>
You can also look up sourcecode with the show-method command:
pry(main)> show-method OpenStruct#initialize
From: /Users/john/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/ostruct.rb # line 46:
Number of lines: 9
def initialize(hash=nil)
  @table = {}
  if hash
    for k,v in hash
      @table[k.to_sym] = v
      new_ostruct_member(k)
    end
  end
end
pry(main)>
See http://pry.github.com for more information :)
None of the people recommending the Pry gem mentioned the find-method command, which is probably what the author was looking for.
Here's the example:
pry(main)> find-method current_user
Devise::Controllers::Helpers
Devise::Controllers::Helpers#current_user
WebsocketRails::ConnectionAdapters::Base
WebsocketRails::ConnectionAdapters::Base#current_user_responds_to?
Then you can browse the method's code by following @banister's tips.
You could use something like Pry. See its Railscast as well.
There are several ways to change an existing class. E.g. if you want to modify the String class write:
class String
  def my_custom_method
    puts "hello!"
  end
end
But there are other options like mixing in modules or adding/modifying methods by using meta-programming.
Anyhow, given any object you can always do:
puts obj.methods.inspect
Either do it in your code or use the debugger.
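A small sketch combining the two points above: reopen a class, then ask an object about the new method. my_custom_method is the example name from the snippet above (here it returns its string instead of printing it); Method#owner reports which class or module holds the method, and source_location (Ruby 1.9+) reports where it was written.

```ruby
# Reopen String and add a method, as in the snippet above.
class String
  def my_custom_method
    "hello!"
  end
end

s = "any string"
# Methods defined directly in String (not inherited) include the new one.
p String.instance_methods(false).include?(:my_custom_method) # => true

# The Method object knows its owner and, on Ruby 1.9+, its source location.
m = s.method(:my_custom_method)
p m.owner           # => String
p m.source_location # [path of this file, line of `def my_custom_method`]
```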
The other option is to read the code. In particular you should read the gem's unit tests (./spec, ...). There are quite a lot of authors stating that unit tests make documentation obsolete.
In Ruby you can also add both class and instance methods to a given class by using mixins.
Essentially, if you have a module, you can add its methods to a given class using the include and extend methods. Here is a brief example of how they work:
module A
  def foo
    "foo"
  end
end
module B
  def bar
    "bar"
  end
end
class YourClass
  include A
  extend B
end
p YourClass.new.foo # gives "foo" because the foo method is added as an instance method
p YourClass.bar # gives "bar" because the bar method is added as a class method
Because Ruby is a dynamic language, these statements can be used anywhere. So, to come back to your question, there is no need to extend an Authlogic class to get its methods. Many plugins use this instruction when loaded:
ActiveRecord::Base.send :include, ModuleName
In this way they tell every ActiveRecord class to include some plugin-defined module, and all of its methods become available on your AR objects.
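Here is a plain-Ruby sketch of that pattern, with a hypothetical FakeBase class standing in for ActiveRecord::Base and a hypothetical PluginHelpers module standing in for a plugin's module. It also shows how to trace such a method back: ancestors lists the mixed-in module, and Method#owner names it.

```ruby
# Hypothetical stand-ins: FakeBase plays ActiveRecord::Base and
# PluginHelpers plays a module shipped by some plugin.
class FakeBase; end

module PluginHelpers
  def plugin_greeting
    "added by PluginHelpers"
  end
end

# What a plugin might run when it is loaded. `send` is used because
# include was private before Ruby 2.1.
FakeBase.send :include, PluginHelpers

# The application's model never mentions the plugin...
class User < FakeBase; end

# ...but still gets the method, and Ruby can tell you where it came from.
p User.new.plugin_greeting                # => "added by PluginHelpers"
p User.ancestors.include?(PluginHelpers)  # => true
p User.new.method(:plugin_greeting).owner # => PluginHelpers
```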
Another technique used by many acts_as plugins is to include their modules only when the acts_as call is used in the base class.
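A sketch of that acts_as style, again in plain Ruby with made-up names (ActsAsShouty, acts_as_shouty, FakeModel): the plugin only adds a class-level macro to the base class, and the instance methods are mixed in when a model actually calls the macro.

```ruby
# Hypothetical plugin module following the acts_as_* convention.
module ActsAsShouty
  # Called from a model's class body; self is that model class.
  def acts_as_shouty
    include InstanceMethods
  end

  module InstanceMethods
    def shout(text)
      text.upcase + "!"
    end
  end
end

# Stand-in for ActiveRecord::Base: every subclass gets the macro.
class FakeModel
  extend ActsAsShouty
end

class Post < FakeModel
  acts_as_shouty # only now are the instance methods mixed in
end

class Comment < FakeModel # never calls the macro
end

p Post.new.shout("hi")            # => "HI!"
p Comment.new.respond_to?(:shout) # => false
```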
Other useful references
What is the difference between include and extend in Ruby?
A quick tutorial about mixins

Using Ruby alias to extend a Gem

This is more a theoretical question, but I am curious anyway. I am a ruby / ruby on rails newbie (but with a lot of ancient experience in other languages / frameworks) so this is mainly a curious / learning question. Thanks in advance for any help!
I thought I could do a quick extension to a ruby gem using alias as follows:
module InstallMyExtension
  def self.included(base)
    base.class_eval {
      alias :some_method_in_gem_without_my_extension :some_method_in_gem
      alias :some_method_in_gem :some_method_in_gem_with_my_extension
    }
  end
  def some_method_in_gem_with_my_extension
    debugger
    # ... do fun stuff here
    some_method_in_gem_without_my_extension
  end
end
Then in some initialization file I do:
Foo::SomeControllerInFoo.send :include, InstallMyExtension
I learned this technique in the Radiant CMS, where it's used all over the place to extend base behavior. I understand this technique is now disapproved of, but it seemed like a quick way to just try some ideas out before forking a branch on the gem, etc.
First off, is there a better way in Rails 3 to do a quick hack extension like this (which might be useful just to test a theory before forking the gems, etc.)?
Second, it's not working, and there are multiple things I don't understand.
Let me explain the weirdness I am seeing:
Even if I do the include as shown above, when I go into the console I see some really weird behavior that I don't understand:
1) I type Foo::SomeControllerInFoo and I get back Foo::SomeControllerInFoo, as I would expect. HOWEVER, if I run the exact same expression a second time, Foo::SomeControllerInFoo comes back undefined!
2) Just to play around, I did foo = Foo::SomeControllerInFoo, and then I can do foo.send, foo.methods, whatever I like, but only if I save the copy of the class in foo! What's with that?
3) If I now do foo.send :include, MyExtension, the behavior within the debug console is as expected (i.e. the original class contained in the gem now has my behavior added to it). HOWEVER, running the same code during initialization has no effect. Nothing breaks, but the controller is not extended.
Weird that it doesn't work. I just tried it again to be sure, and it does the trick (put this code in a file within config/initializers).
I always use a shortcut:
alias_method_chain :some_method_in_gem, :my_extension
instead of the two alias lines, but it's exactly the same.
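For reference, here is the alias pattern from the question reduced to plain Ruby, with a hypothetical GemController standing in for Foo::SomeControllerInFoo and the debugger call left out. It uses alias_method rather than the alias keyword; both rename the same way here. Including the module rewires some_method_in_gem so the extension runs first and then delegates to the original.

```ruby
# Hypothetical stand-in for the gem's class.
class GemController
  def some_method_in_gem
    "original"
  end
end

module InstallMyExtension
  # Runs when the module is included; swaps the method names around.
  def self.included(base)
    base.class_eval do
      alias_method :some_method_in_gem_without_my_extension, :some_method_in_gem
      alias_method :some_method_in_gem, :some_method_in_gem_with_my_extension
    end
  end

  def some_method_in_gem_with_my_extension
    "extended(" + some_method_in_gem_without_my_extension + ")"
  end
end

GemController.send :include, InstallMyExtension
p GemController.new.some_method_in_gem # => "extended(original)"
```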
You could overwrite some methods much more easily using class_eval directly. Again in an initializer:
Foo::SomeControllerInFoo.class_eval do
  def some_method_in_gem
    #your redefinition
  end
end
Sorry, I can't add anything on your other questions: that behavior just seems really weird and buggy.
Just to be sure, when you want to run the method defined in your controller, do:
c = Foo::SomeControllerInFoo.new
c.method_name

Ruby on Rails: Learning ActionController class - Question on $:.unshift activesupport_path and autoload method

Inside ActionController (rails/actionpack/lib/action_controller.rb) I found some weird code. I don't really have a mentor to learn Ruby on Rails from, so this forum is my only hope:
Question #1: Could anyone help explain these lines of code?
begin
  require 'active_support'
rescue LoadError
  activesupport_path = "#{File.dirname(__FILE__)}/../../activesupport/lib"
  if File.directory?(activesupport_path)
    $:.unshift activesupport_path
    require 'active_support'
  end
end
Especially the line with $:.unshift activesupport_path
As I understand it, it tries to require active_support, and if that doesn't work, it checks whether activesupport_path is a directory; if it is, then... I'm totally lost.
Question #2: What is the autoload method for?
module ActionController
  # TODO: Review explicit to see if they will automatically be handled by
  # the initilizer if they are really needed.
  def self.load_all!
    [Base, CGIHandler, CgiRequest, Request, Response, Http::Headers, UrlRewriter, UrlWriter]
  end
  autoload :Base, 'action_controller/base'
  autoload :Benchmarking, 'action_controller/benchmarking'
  autoload :Caching, 'action_controller/caching'
  autoload :Cookies, 'action_controller/cookies'
  # ...
end
Question #3: If I later find a method whose purpose I don't understand, what is the best way to find out? In the autoload case, I tried to find it across my project (I have my Rails code frozen there) but couldn't find any clue. I searched for "def autoload". Am I doing things wrong? Or does my IDE, TextMate, just not cut it?
Thank you!
In order for a file to be required, you have to ensure that the path to it is in the Ruby $LOAD_PATH variable. This has a shorthand version, $:, for legacy reasons, inherited from Perl.
When you call require, the interpreter looks for a .rb file in each of the paths given there until it finds a match. If it finds one, it is loaded. If not, you get a LoadError exception.
Often you will see lines like this in files:
# script/something
# This appends "script/../lib" to the $LOAD_PATH, but this expands to
# something like "/home/user/project/lib" depending on the details of
# your installation.
$: << File.expand_path(File.join('..', 'lib'), File.dirname(__FILE__))
You can use standard Array modifiers on $LOAD_PATH like unshift, push, and <<.
The first block of code is attempting to load active_support and only if that fails does it go about modifying the $LOAD_PATH to include the likely location of this file based on the path to the file making the require call. They do this because typically all gems from the Rails bundle are installed in the same base directory.
The reason for using unshift is to put that path at the highest priority, inserted at the front of the list. The << or push method adds to the end, lowest priority.
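A small sketch of the effect, using a throwaway directory and a hypothetical library name (my_tiny_lib): require fails with LoadError until the directory is on $LOAD_PATH, and unshift puts it ahead of every other entry.

```ruby
require "tmpdir"

Dir.mktmpdir do |dir|
  # A throwaway library in a directory Ruby does not search yet.
  File.write(File.join(dir, "my_tiny_lib.rb"), "MY_TINY_LIB_LOADED = true\n")

  begin
    require "my_tiny_lib"
  rescue LoadError => e
    puts "before unshift: #{e.class}" # prints "before unshift: LoadError"
  end

  # Same as `$:.unshift dir`: the directory becomes the first place searched.
  $LOAD_PATH.unshift(dir)
  p $LOAD_PATH.first == dir # => true

  require "my_tiny_lib"     # now found and loaded
  p MY_TINY_LIB_LOADED      # => true

  $LOAD_PATH.delete(dir)    # tidy up; the directory is removed after the block
end
```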
When you require a file it is loaded in, parsed, and evaluated, an operation which can take a small but measurable amount of time and will consume more memory to hold any class or method definitions inside the file, as well as any data such as string constants that may be declared. Loading in every single element of a library like ActiveRecord using require will require a considerable amount of memory, and this will import every database driver available, not just the ones that are actually used.
Ruby allows you to declare a class and a path to the file where it is defined, but with the advantage of not actually loading it in at that moment. This means that references to that class don't cause script errors in other parts of your application that make use of them.
You will often see declarations like this:
class Foo
  # Declare the class Foo::Bar to be defined in foo/bar.rb
  autoload(:Bar, 'foo/bar')
end
When using autoload you need to keep in mind that the class name is always defined within the scope of the module or class declaring it. In this example Bar is within Foo, or Foo::Bar using Ruby naming conventions.
When you make use of the Bar class, the foo/bar.rb file will be required. Think of it as creating a stub Bar class that transforms into the real class once it's actually exercised.
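A sketch of autoload in action, using a throwaway file and hypothetical names (ToyLib, ToyLib::Widget). Module#autoload? returns the registered path while the constant is still pending, and nil once the first reference has required the file.

```ruby
require "tmpdir"

# Hypothetical namespace; Widget will be defined in a separate file.
module ToyLib; end

Dir.mktmpdir do |dir|
  path = File.join(dir, "toy_widget.rb")
  File.write(path, <<~RUBY)
    module ToyLib
      class Widget
        def self.hello
          "Widget loaded"
        end
      end
    end
  RUBY

  # Register ToyLib::Widget; the file is not read yet.
  ToyLib.autoload(:Widget, path)
  p ToyLib.autoload?(:Widget) == path # => true (still pending)

  # The first reference triggers the require.
  p ToyLib::Widget.hello              # => "Widget loaded"
  p ToyLib.autoload?(:Widget)         # => nil (already loaded)
end
```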
This is a great way of keeping a lot of options open, with many different modules ready to use, but without having to load everything into memory up front.
As for the third question, searchable documentation like APIDock will help you try and find more information on methods. The distinction between Ruby and Rails is often blurred, so you may have to check through both to be sure. Rails adds a lot of methods to core Ruby classes, so don't take the listing of methods available to be complete on either side. They work in conjunction.
Sometimes it pays to search for def methodname when trying to find out where methodname originates, although this covers only conventional declarations. That method may be an alias created with a mechanism like alias_method, or may have been created dynamically using define_method; you can never really be sure until you dig around. At least 90% of the methods in Rails are declared the conventional way, though, so most of the time a simple search will yield what you want.
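A sketch of why a plain def search can miss a method, and how to find it anyway on a current Ruby. Report, to_html, title and footer are hypothetical names; Method#original_name reveals what an alias points to, and source_location still works for methods built with define_method.

```ruby
class Report
  def render
    "report"
  end

  # Neither of these contains the text "def to_html" or "def title".
  alias_method :to_html, :render

  BLOCK_LINE = __LINE__ + 2 # line of the define_method call below
  [:title, :footer].each do |name|
    define_method(name) { "#{name} text" }
  end
end

r = Report.new
p r.method(:to_html).original_name # => :render
p r.method(:title).source_location # points at the define_method line above
```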

How to find where a method is defined at runtime?

We recently had a problem where, after a series of commits had occurred, a backend process failed to run. Now, we were good little boys and girls and ran rake test after every check-in but, due to some oddities in Rails' library loading, it only occurred when we ran it directly from Mongrel in production mode.
I tracked the bug down and it was due to a new Rails gem overwriting a method in the String class in a way that broke one narrow use in the runtime Rails code.
Anyway, long story short, is there a way, at runtime, to ask Ruby where a method has been defined? Something like whereami( :foo ) that returns /path/to/some/file.rb line #45? In this case, telling me that it was defined in class String would be unhelpful, because it was overloaded by some library.
I cannot guarantee the source lives in my project, so grepping for 'def foo' won't necessarily give me what I need, not to mention if I have many def foo's, sometimes I don't know until runtime which one I may be using.
This is really late, but here's how you can find where a method is defined:
http://gist.github.com/76951
# How to find out where a method comes from.
# Learned this from Dave Thomas while teaching Advanced Ruby Studio
# Makes the case for separating method definitions into
# modules, especially when enhancing built-in classes.
module Perpetrator
  def crime
  end
end
class Fixnum
  include Perpetrator
end
p 2.method(:crime) # The "2" here is an instance of Fixnum.
#<Method: Fixnum(Perpetrator)#crime>
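Fixnum was deprecated in Ruby 2.4 and removed in 3.2, so on a current Ruby the same trick needs to reopen Integer instead. A sketch using the gist's example names (the exact inspect format varies between Ruby versions, but it still names the module in parentheses):

```ruby
module Perpetrator
  def crime
  end
end

class Integer
  include Perpetrator
end

m = 2.method(:crime)
p m       # e.g. #<Method: Integer(Perpetrator)#crime() ...>
p m.owner # => Perpetrator
```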
If you're on Ruby 1.9+, you can use source_location
require 'csv'
p CSV.new('string').method(:flock)
# => #<Method: CSV#flock>
CSV.new('string').method(:flock).source_location
# => ["/path/to/ruby/1.9.2-p290/lib/ruby/1.9.1/forwardable.rb", 180]
Note that this won't work on everything, such as native compiled code. The Method class has some neat functions, too, like Method#owner, which returns the class or module where the method is defined.
EDIT: Also see the notes on __file__ and __line__ for REE in the other answer; they're handy too. -- wg
You can actually go a bit further than the solution above. For Ruby 1.8 Enterprise Edition, there are __file__ and __line__ methods on Method instances:
require 'rubygems'
require 'activesupport'
m = 2.days.method(:ago)
# => #<Method: Fixnum(ActiveSupport::CoreExtensions::Numeric::Time)#ago>
m.__file__
# => "/Users/james/.rvm/gems/ree-1.8.7-2010.01/gems/activesupport-2.3.8/lib/active_support/core_ext/numeric/time.rb"
m.__line__
# => 64
For Ruby 1.9 and beyond, there is source_location (thanks Jonathan!):
require 'active_support/all'
m = 2.days.method(:ago)
# => #<Method: Fixnum(Numeric)#ago> # defined in the Numeric class
m.source_location # show file and line
# => ["/var/lib/gems/1.9.1/gems/activesupport-3.0.6/.../numeric/time.rb", 63]
I'm coming late to this thread, and am surprised that nobody mentioned Method#owner.
class A; def hello; puts "hello"; end end
class B < A; end
b = B.new
b.method(:hello).owner
=> A
Copying my answer from a newer similar question that adds new information to this problem.
Ruby 1.9 has method called source_location:
Returns the Ruby source filename and line number containing this method or nil if this method was not defined in Ruby (i.e. native)
This has been backported to 1.8.7 by this gem:
ruby18_source_location
So you can ask for the method:
m = Foo::Bar.method(:create)
And then ask for the source_location of that method:
m.source_location
This will return an array with filename and line number.
E.g. for ActiveRecord::Base.validates this returns:
ActiveRecord::Base.method(:validates).source_location
# => ["/Users/laas/.rvm/gems/ruby-1.9.2-p0@arveaurik/gems/activemodel-3.2.2/lib/active_model/validations/validates.rb", 81]
For classes and modules, Ruby does not offer built-in support, but there is an excellent Gist out there that builds upon source_location to return the file for a given method, or the first file for a class if no method was specified:
ruby where_is module
In action:
where_is(ActiveRecord::Base, :validates)
# => ["/Users/laas/.rvm/gems/ruby-1.9.2-p0@arveaurik/gems/activemodel-3.2.2/lib/active_model/validations/validates.rb", 81]
On Macs with TextMate installed, this also pops up the editor at the specified location.
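On Ruby 2.7 and later there is also built-in support for constants, Module#const_source_location, which covers the class/module case without a gem. A minimal sketch with a hypothetical Invoice class (INVOICE_LINE only makes the expected line explicit):

```ruby
INVOICE_LINE = __LINE__ + 1
class Invoice
  def total
    0
  end
end

# Returns [file, line] of the first definition of the constant.
file, line = Object.const_source_location(:Invoice)
puts "Invoice first defined at #{file}:#{line}"
p line == INVOICE_LINE # => true
```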
Maybe #source_location can help you find where a method comes from.
For example:
ModelName.method(:has_one).source_location
Returns
["project_path/vendor/ruby/version_number/gems/activerecord-number/lib/active_record/associations.rb", line_number_of_where_method_is]
or
ModelName.new.method(:valid?).source_location
Returns
["project_path/vendor/ruby/version_number/gems/activerecord-number/lib/active_record/validations.rb", line_number_of_where_method_is]
This may help but you would have to code it yourself. Pasted from the blog:
Ruby provides a method_added() callback that is invoked every time a method is added or redefined within a class. It’s part of the Module class, and every Class is a Module. There are also two related callbacks called method_removed() and method_undefined().
http://scie.nti.st/2008/9/17/making-methods-immutable-in-ruby
If you can crash the method, you'll get a backtrace which will tell you exactly where it is.
Unfortunately, if you can't crash it then you can't find out where it has been defined. If you attempt to monkey with the method by overwriting it or overriding it, then any crash will come from your overwritten or overridden method, and it won't be any use.
Useful ways of crashing methods:
Pass nil where it is forbidden: a lot of the time the method will raise an ArgumentError or the ever-present NoMethodError on nil.
If you have inside knowledge of the method, and you know that the method in turn calls some other method, then you can overwrite the other method and raise inside that.
Very late answer :) But earlier answers did not help me
set_trace_func proc { |event, file, line, id, binding, classname|
  printf "%8s %s:%-2d %10s %8s\n", event, file, line, id, classname
}
# call your method
set_trace_func nil
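On Ruby 2.0 and later, TracePoint is the object-oriented replacement for set_trace_func. A sketch that records which class and source line handled each Ruby-level method call made inside the block (Checkout is a hypothetical class; C methods such as new and + raise :c_call events, which this tracer ignores):

```ruby
class Checkout
  def total
    subtotal + 1
  end

  def subtotal
    41
  end
end

calls = []
trace = TracePoint.new(:call) do |tp|
  # defined_class is where the method lives; path/lineno point at its source.
  calls << [tp.defined_class, tp.method_id, tp.path, tp.lineno]
end

trace.enable { Checkout.new.total } # call your method inside the block

calls.each { |klass, name, path, line| puts "#{klass}##{name} #{path}:#{line}" }
```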
You might be able to do something like this:
foo_finder.rb:
class String
  def String.method_added(name)
    if (name==:foo)
      puts "defining #{name} in:\n\t"
      puts caller.join("\n\t")
    end
  end
end
Then ensure foo_finder is loaded first with something like
ruby -r foo_finder.rb railsapp
(I've only messed with rails, so I don't know exactly, but I imagine there's a way to start it sort of like this.)
This will show you all the re-definitions of String#foo. With a little meta-programming, you could generalize it for whatever function you want. But it does need to be loaded BEFORE the file that actually does the re-definition.
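As the answer suggests, the hook can be generalized with a little metaprogramming. A sketch, assuming Ruby 2.1+ for a public Module#prepend; watch_definitions is a hypothetical helper name, and it records caller locations in an array instead of printing them. Like the original, it only sees definitions that happen after it has been set up.

```ruby
# Hypothetical helper: log every (re)definition of +name+ on +klass+.
def watch_definitions(klass, name, log)
  watcher = Module.new do
    define_method(:method_added) do |added|
      # caller_locations(1, 1) is the frame that triggered the definition.
      log << caller_locations(1, 1).first.to_s if added == name
      super(added)
    end
  end
  klass.singleton_class.prepend(watcher)
end

log = []
watch_definitions(String, :foo, log)

class String
  def foo; "first"; end
end

class String
  def foo; "second"; end # a later redefinition, e.g. from a gem
end

puts log # one caller location per definition of String#foo
```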
You can always get a backtrace of where you are by using caller().
