I need my Ruby on Rails app to search files such as PDF, XML, DOC, and XLS.
I'm using Apache Solr for this. I'm sending a query string to the Solr URL, retrieving the results in JSON format, and displaying the file names in my app's views.
I need to make the file names clickable so that clicking one opens the file, but I don't know how to do this. Is there an option in Apache Solr to open those files, or a gem that supports this? Any help would be great.
I can't open files from the Apache Solr admin screens either, and I don't know whether Solr has such an option. This may sound quite silly, but I'm new to Apache Solr and don't have much idea of how it works.
I don't know much about document searching, but if you use the Sunspot Solr gem, there is an add-on for document searching. Have a look at sunspot_cell.
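On the Rails side, making the names clickable is just a matter of rendering a link whose href points at wherever the files are actually served from: Solr indexes the content, it does not host or serve the files themselves. A minimal sketch, assuming the files are served under a hypothetical `/files/` path by your app or web server:

```ruby
require "erb"

# Build an anchor tag for a file name returned by Solr. The /files/
# prefix is a placeholder: the files must be served by the Rails app
# or a web server, since Solr only indexes content.
def file_link(name)
  href = "/files/#{ERB::Util.url_encode(name)}"
  %(<a href="#{href}">#{ERB::Util.html_escape(name)}</a>)
end
```

In a Rails view you would more idiomatically use `link_to` with the same path.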
I am trying to create an application that crawls a website that provides free financial data in .xlsx format. Files are uploaded once a month, but not always on the same day.
Is it possible to download any new files from a specific URL and dump them into my S3 bucket before reading them into a database? I have read about creating a worker using Sidekiq, and I expect that this will play a crucial part in the process.
Can anybody perhaps give some advice or point me to a tutorial that can help?
Yes, you can, and you don't even need Sidekiq.
Take a look at the AWS SDK for Ruby, and do the following:
Just write a Ruby script that downloads the .xlsx files and then uploads them to S3. Be sure the script starts with `#!/usr/bin/env ruby`, and give it execute permission.
Add this script to your crontab jobs, and have it run every day.
I'm planning to migrate all my WordPress posts to a Rails app that I'm going to create, so I want to know how to read the WordPress export file in Rails and extract that data.
Not exactly what you are looking for, but have a look at the Jekyll importers. Jekyll is a blogging platform written in Ruby, and it has importers that can decipher the WordPress format.
You could either try it and see what you get as output, or look at the code and modify it the way you need. You can look at the code here.
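If you'd rather not go through Jekyll at all, note that a WordPress export (WXR) file is just RSS 2.0 XML, so Ruby's bundled REXML can pull the posts out directly. A minimal sketch; the element names follow the WXR format, and you may need to adjust them for your export version:

```ruby
require "rexml/document"

# Parse a WordPress WXR export string and return each post's title and
# HTML body. WXR stores the body in a namespaced <content:encoded>
# element, usually wrapped in CDATA.
def wordpress_posts(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//item").map do |item|
    body = item.elements.to_a.find { |e| e.expanded_name == "content:encoded" }
    {
      title:   item.elements["title"]&.text,
      content: body && body.text
    }
  end
end
```

From there it is a short step to looping over the hashes and creating your Rails `Post` records.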
I'm testing it, and Nokogiri does not seem to respect the robots.txt file. Is there some way to make it do so? It seems like a common question, but I could not find any answer online.
Nokogiri parses the HTML or webpage that you give it. It does not know anything about the robots.txt file for the domain where the page you happen to have requested resides.
I presume that you want to ignore in-site links that are in robots.txt?
Since you've tagged this Rails, I'll assume you're using Ruby. In that case you can use the Mechanize library, which can honor the robots.txt file.
There is also the original Perl version and other language ports if you prefer those.
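If Mechanize is heavier than you need, a tiny hand-rolled check against the `Disallow` lines is easy to sketch in plain Ruby before handing a URL to Nokogiri. This only handles the `User-agent: *` section and literal path prefixes (no `Allow` rules, wildcards, or per-agent sections), so treat it as an illustration rather than a compliant parser:

```ruby
# Collect the Disallow path prefixes from the "User-agent: *" section
# of a robots.txt body.
def disallowed_paths(robots_txt)
  paths  = []
  active = false
  robots_txt.each_line do |line|
    line = line.sub(/#.*/, "").strip            # drop comments
    case line
    when /\AUser-agent:\s*(.+)\z/i then active = ($1.strip == "*")
    when /\ADisallow:\s*(\S*)\z/i  then paths << $1 if active && !$1.empty?
    end
  end
  paths
end

# A path is allowed if it matches none of the disallowed prefixes.
def allowed?(path, robots_txt)
  disallowed_paths(robots_txt).none? { |prefix| path.start_with?(prefix) }
end
```

You would fetch `https://example.com/robots.txt` once per host, then call `allowed?` before requesting each page.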
I was wondering what the best option is for generating documentation for Rails, its plugins, and the app in one single file that I can navigate.
I've been using RDoc, but it creates multiple files; YARD is too slow, and Hanna gets stuck at random places.
Any help?
If you truly want just the one file, maybe Rocco would work well for you.
I was just wondering if anyone knew of any good libraries for parsing .doc files (and similar formats, such as .odt) to extract text while keeping formatting information where possible, for display on a website.
Capability of doing similarly for PDFs would be a bonus, but I'm not looking as much for that.
This is for a Rails project, if that helps at all.
Thanks in advance!
Apache POI is a very popular way to access Word and Excel documents. There's a Ruby POI binding that might be worth investigating, but it looks like you'll have to build it yourself. The API doesn't seem very Ruby-like, since it's virtually a direct port of the Java code, and it appears to have been tested only against Ruby 1.8.2.