Trackback implementation: rel="trackback" vs RDF - ruby-on-rails

I want my Rails app to parse external websites for a trackback URL, but I'm not sure whether I should just look for a rel="trackback" link or follow the RDF specification described by Six Apart. Or both. WordPress and TechCrunch both offer only a rel="trackback" link, and they should know. On the other hand, maybe some blogs provide only RDF, and I would miss their trackback URL.
What do you think?
And is there a ready-made gem or plugin out there? (It's really hard to google for trackback...)
Thanks.
UPDATE
I'm now first trying to find the RDF information. If I don't find anything, I look for the link tag. I was referring to the Six Apart specification. Thanks for your help.

I'm checking for both now (RDF first, then the link if that is not successful). I was referring to the Six Apart specification. Thanks for your help!
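For anyone wanting to try the same order of checks, here is a minimal Ruby sketch of "RDF first, then the link". It is only an illustration, not the code from the app above: the method name find_trackback_url is made up, and it uses plain regular expressions (the RDF block normally sits inside an HTML comment, so an HTML parser will not see it as markup). It assumes quoted attribute values and only looks at the first rdf:Description element on the page.

```ruby
# Returns the trackback ping URL advertised in an HTML page, or nil.
# Order of checks: embedded RDF first, rel="trackback" second.
# NOTE: find_trackback_url is a hypothetical helper name.
def find_trackback_url(html)
  # 1. RDF auto-discovery: an <rdf:Description> element (usually wrapped
  #    in an HTML comment) carrying a trackback:ping attribute.
  rdf = html[/<rdf:Description\b[^>]*>/i]
  if rdf && (m = rdf.match(/trackback:ping\s*=\s*["']([^"']+)["']/i))
    return m[1]
  end

  # 2. Fallback: any <a> or <link> tag whose rel attribute contains "trackback".
  html.scan(/<(?:a|link)\b[^>]*>/i).each do |tag|
    rel  = tag[/\brel\s*=\s*["']([^"']*)["']/i, 1]
    href = tag[/\bhref\s*=\s*["']([^"']*)["']/i, 1]
    return href if href && rel && rel.split.any? { |r| r.casecmp?("trackback") }
  end

  nil
end
```

Fetching the page itself (for example with Net::HTTP) is left out on purpose, so the method can be tried on saved HTML first.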

Related

Is there any more complete object reference for Rails than the API?

Sometimes it is a big problem to find out how to use all the methods of an object. Is there a complete list of all possible parameters of all methods of all Rails objects?
For instance, when I looked for "belongs_to", I found more complete information in a tutorial than in the API.
Where do tutorial authors get their knowledge? Don't tell me they dig through hundreds of lines of source code for each method.
I am not looking for a book. A complete reference is enough for me (similarly complete as a Ruby language reference - e.g.
Have you checked API Dock? http://apidock.com/rails/ActiveRecord/Associations/ClassMethods/belongs_to
I find their documentation to be very easy to read and it has a lot of examples.
Of course, here it is - https://github.com/rails/rails/. Just read the code.
Did you try following the links? belongs_to is an alias for references, and there is a link to add_reference.
https://api.rubyonrails.org/ contains the most complete docs for RoR.

Pdf Parsing Challenge

I have the following problem: I have a lot of papers in PDF format, and I have to extract information from the first page of each one and save it to a database.
I just need to extract the title, the abstract, the keywords, the author list, the university list and the emails. I want to write a script that gets a string for each of those fields, for each paper.
How can I do that? Has anyone already done this? What languages and tools do you recommend?
Also, is there a paper repository that already feeds such a database?
The PDFs may use different encodings, so I have to deal with that problem too. Any help with this would be great.
An example of a paper is here
Greetings!
http://pdfbox.apache.org/
You have to check the security settings of the PDF, and that it's really text and not an image. Check whether the PDFBox command-line application works for extracting the text; if it does, you can use the jar and http://pdfbox.apache.org/apidocs/org/apache/pdfbox/examples/util/ExtractTextByArea.html
Hope it helps....
By the way it's java...
edit.
I have not used http://www.qoppa.com/pdftext/ as a jar library, but I used the example application and it works. Still, I decided to go with PDFBox...
You need an API to read your PDF.
Seems fine (I have never tried it, though)
You can probably find others with this link :-)
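Once one of these tools has turned the first page into plain text, the fields with a recognizable shape can be picked out with regular expressions. Below is a rough Ruby sketch for emails and keywords only. The method names are made up, and it assumes the keywords sit on a single line starting with "Keywords:" or "Index Terms:", which will not hold for every paper. Title, abstract, authors and universities depend on the layout and need per-publisher heuristics that are not shown here.

```ruby
# Works on plain text already extracted from the first page of a paper.
# NOTE: extract_emails and extract_keywords are hypothetical helper names.
EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/

# Returns every distinct e-mail address found in the text.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).uniq
end

# Returns the keywords from a single "Keywords: a, b; c." style line,
# or an empty array when no such line exists.
def extract_keywords(text)
  line = text[/^\s*(?:Keywords|Index Terms)\s*[:\-]\s*(.+)$/i, 1]
  return [] unless line

  line.sub(/\.\s*\z/, "").split(/[,;]/).map(&:strip).reject(&:empty?)
end
```

Both methods return plain Ruby arrays of strings, so joining them into one string per field for the database is straightforward.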

Integrate Google Search Appliance with Rails Application

I want to integrate the GSA with Rails, but the web proved unhelpful. Does anybody know a step-by-step tutorial?
First, you have to call the GSA with the search query. The reference for generating the GET request is here: http://code.google.com/apis/searchappliance/documentation/610/xml_reference.html#SubmittingaSearchRequest
Then you will receive an XML response from the GSA with all the needed information inside. The reference for the XML nodes is here (it is best to look at an XML response from your own GSA and use my link for additional information): http://code.google.com/apis/searchappliance/documentation/610/xml_reference.html#results_xml_tags
Last but not least, you have to parse the information in the XML and generate your custom front end.
I hope that will help you. It isn't really complicated.
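The three steps can be sketched in Ruby roughly like this. Treat it as an illustration under assumptions: the host name is hypothetical, "default_collection" and "default_frontend" are placeholders for your own collection and front end, and the parameter and tag names (q, site, client, output, and the R, U, T, S nodes) should be checked against the XML reference linked above. The method names are made up as well.

```ruby
require "uri"
require "rexml/document"

# Step 1: build the GET URL for a search request.
# NOTE: gsa_search_url is a hypothetical helper; host is whatever your GSA is called.
def gsa_search_url(host, query, site: "default_collection", client: "default_frontend")
  params = { q: query, site: site, client: client, output: "xml_no_dtd" }
  "http://#{host}/search?#{URI.encode_www_form(params)}"
end

# Steps 2 and 3: take the XML response body and pull url, title and
# snippet out of each <R> result node.
def parse_gsa_results(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//RES/R").map do |r|
    {
      url:     r.elements["U"]&.text,
      title:   r.elements["T"]&.text,
      snippet: r.elements["S"]&.text
    }
  end
end
```

In a Rails controller you would fetch the URL (for example with Net::HTTP.get(URI(url))), pass the body to parse_gsa_results and hand the resulting array of hashes to your view.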
Define "integrate" more precisely. Most things are language-independent and have nothing to do with Rails, Java, .NET or whatever.
Try the rails-gsa gem; it will help you a lot in integrating the Google Search Appliance with your Rails application.
Check out the documentation on github https://github.com/rohit9889/rails-gsa

Incorporating MathML in Google Blogger posts

I'd like to format maths equations using MathML, with LaTeX-like syntax, in my blog posts hosted by Google Blogger; but references, on Google's site and elsewhere, to articles on how to conveniently do this seem non-existent.
The few articles I've found, on MathML generally, presuppose one can control the contents of an entire page, for example putting tokens in the "<html>" tag, which I don't think applies to Google Blogger.
The best site I've found is Ionel Alexandru's code at http://www.fmath.info/ But even there the documentation is very sparse and it isn't obvious how one would use his scripts/packages for this.
Maybe I'm just being thick. But surely people must be using MathML in Google Blogger, and if so I'd be very interested in references to how it can best be done (preferably via an XML solution rather than dozens of little inline images in the text !)
Failing that, are there standard "register and start blogging" facilities/sites other than Google Blogger that make it easy to use MathML or where it is available as standard?
Cheers
John R Ramsden
I've written a javascript module, jqmath, to do what you want. See http://mathscribe.com/author/jqmath.html.
Instructions on how to use it in Google Blogger are at http://mathscribetheblog.blogspot.com/2011/03/jqmath-in-blogger.html.
By the way, those instructions do have you edit your <html> tag, which Google Blogger happily let me do. (I just did it to add a MathML namespace prefix for the MathPlayer plugin for Internet Explorer, so actually things would work ok, but less well, without it.)
I hope you like this. If you have any problems, post the link to your blog, and I'll take a look at it.
I use Peter Jipsen's script, modified a bit, to get MathML into Blogger
http://dpcarlisle.blogspot.com/2007/04/testing-interface.html
It seems the most well-known is MathJax.

How to get rid of stupid "pad" labels produced by RTML functions?

I am unlucky enough to be in charge of maintaining an old Yahoo! Store built on their RTML-based platform.
Recently I've noticed that HTML code generated by some RTML functions is sprinkled all over with "padding images" (or whatever is the conventional name for those 1x1 pixel images used to enforce layout). I have nothing against using such images, but... all those images are supplied with an ALT attribute like this:
<img href="http://.../image1x1.gif" alt="pad">
With all due respect to the original authors of RTML, they must have been smoking something when they came up with this "accessibility enhancement"... :-(
Anyway, here are my questions:
Does anybody know a list of all RTML functions that generate HTML with all these "pad" images?
Is there any way to get rid of all those alt="pad" attributes without rewriting a lot of RTML code?
NB: This may sound a little cynical, but improved accessibility is not the main goal here. The main goal is to stop exposing those moronic alt="pad" attributes to Google and other smart search engines. So client-side scripting is not going to help, as far as I know.
Thank you!
P.S. Most of you are probably lucky enough never to have heard of RTML. If somebody established a prize for software products based on the ratio of commercial success to usability, this RTML-based "platform" would probably win first place.
P.P.S. Apparently someone from Yahoo! finally listened, because I can no longer find those silly "pad" tags in the RTML generated for our store. Nevertheless, one of the ideas offered in response to my original question does provide a very practical solution - not just to the original problem but to any similar problem with RTML platform. See the winning answer - it's really good.
The only way I see is to have your own website front end that filters whatever you want out of the RTML site.
For example, if your RTML site is at http://rtmlusglysite.yahoo.com/store/XYZ01134 , you could host a simple PHP front end at http://www.example.com that acts like a "filtering" HTTP web proxy, so http://rtmlusglysite.yahoo.com/store/XYZ01134/item1234.rtml would be accessed via http://www.example.com/item1234.html
It's not an ideal solution, but it should work, and you could do some more fancy stuff.
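The filtering step itself is small. The answer above talks about PHP; the sketch below shows the same idea in Ruby and covers only the rewrite of the fetched HTML, not the proxying. The method name strip_pad_alts is made up, and it assumes the attribute always appears as a quoted alt="pad" inside an img tag.

```ruby
# Removes alt="pad" attributes (single or double quoted) from <img> tags only;
# every other tag and every other alt value is left untouched.
# NOTE: strip_pad_alts is a hypothetical helper name.
def strip_pad_alts(html)
  html.gsub(/<img\b[^>]*>/i) do |tag|
    tag.gsub(/\s+alt\s*=\s*(["'])pad\1/i, "")
  end
end
```

The proxy would fetch the page from the store, run the body through this method and send the result to the visitor. Since the filtering happens on the server, search engines never see the original attribute.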
Nice try from the other posters, but there is a very simple RTML command that will do it. . .
TEXT PAT-SUBST s GRAB
MULTI
HEAD
BODY
TEXT #var-with-alt-tag-equals-pad-in-it
frompat "alt=\"pad\""
topat ""
The above RTML will find all instances of alt="pad" and replace them with nothing.
Well you're right on RTML being relatively untraveled :)
Do you have a way to add your own attributes to these images tags? If so, would it be possible to override the alt attribute? If you specify alt="", I would think that would override Yahoo's... Otherwise consider putting a useful alt tag in there for the blind and dialup types.
It's the first time I'm hearing about this platform, but here is an idea: if you can add javascript to the pages, you could write a function that will run after the page has loaded and remove all the alt="pad" attributes from the page.
Unfortunately this solution works only in browsers that support scripting, so lynx and some other text-based browsers might not support it.
I have shared a link to the official RTML guide from Yahoo. Hope it will help. Thanks!
List of available RTML books and resources
