Rails scraping - submitting a form

I am filling in a form on a page and submitting it.
This should trigger the download of a file.
However, when I try to save the output of the download, I get the source code of the page rather than the file.
My code is:
mechanize = Mechanize.new
mechanize.pluggable_parser.default = Mechanize::Download
page = mechanize.get('http://page.com/')
form = page.forms.first
form.radiobuttons_with(name: 'presence')[0].check
form.source = "btce"
form.label = "BTC/USD"
mechanize.get_file(form.submit).save!('page.csv')
How can I save a file which is downloaded when I submit a form?

Does the file automatically begin downloading once you submit the form?
Submitting a form may return a new page, and loading that page can pull in further scripts and stylesheets. That would explain why your file contains source code: the HTML page is what you are actually downloading. (Mechanize doesn't raise an error if you save a web page instead of a file.)
For example, when I use Mechanize to fill out Google's search form, submit it, and save the result to google_search.csv, the new file contains the page's source code mixed with JavaScript, mySQL, and its stylesheets.
You can dig through the page's source code using Firebug to pinpoint exactly what happens when you submit the form; the download is probably triggered by a link or request that you were unaware of.
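One way to check what came back before saving it is to look at the response headers. The following is only a minimal sketch in plain Ruby: download_response? is a hypothetical helper, and the header hashes are made-up examples. With Mechanize, the headers of the object returned by form.submit would presumably be passed in instead.

```ruby
# Hypothetical helper: decide whether HTTP response headers describe a
# file download rather than an ordinary HTML page.
def download_response?(headers)
  # Normalize header names so lookups are case-insensitive.
  normalized = headers.to_h { |name, value| [name.to_s.downcase, value.to_s] }
  disposition  = normalized.fetch("content-disposition", "")
  content_type = normalized.fetch("content-type", "")

  # An "attachment" disposition is the clearest sign of a download.
  return true if disposition.downcase.start_with?("attachment")

  # Otherwise treat anything that is not HTML as a file.
  !content_type.empty? && !content_type.downcase.start_with?("text/html")
end

# Made-up example headers (assumptions, not taken from the real site).
csv_headers  = { "Content-Type" => "text/csv",
                 "Content-Disposition" => 'attachment; filename="data.csv"' }
html_headers = { "Content-Type" => "text/html; charset=utf-8" }

puts download_response?(csv_headers)
puts download_response?(html_headers)
```

If the check says the response is HTML, the form submission returned a page rather than the file, and the real download request still has to be found.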

Related

How to get form values in browser to watir webdriver

I have installed my application and it is running at the following URL:
http://localhost:3000
The URL above loads a form with some fields. I fill in the required fields and then submit the form. A div element is then displayed at the bottom of the page, and a picture is displayed inside an iframe within that div.
The user visits the URL above and submits the form. After the form is submitted, the picture should be downloaded to their local machine.
Right now I call the following lines after form submission. How can I get the existing page into the browser object and download a screenshot?
browser = Watir::Browser.new
b.div(:id => "phone_shell").screenshot("/home/user/Documents/preview.png")
I found a few problems in your code.
The screenshot method is not available on an element object; it is available on the browser object. You also need to call save to write the file to the destination folder. The following code should work.
Code to get the HTML of the page:
b.html
Code to take the screenshot:
b.screenshot.save("/home/user/Documents/preview.png")
This saves the image in the destination folder.

Select & onChange | Ruby on Rails

In my view, I have this form with the select function:
<form id="form_id">
<%= form.select('id','name', @document.informations.find(:all).collect {|u| [u.name] },
options={},{:onChange => 'submit()'}) -%>
</form>
How can I use the selected name in the rest of my view ?
I saw this in other topics:
$('#id_name').val();
But it didn't work for me, it says:
`$(' is not allowed as a global variable name
You're getting that particular error because the $('#id_name').val() is JavaScript, not Ruby, but you've put it within erb tags.
When you visit a page /documents/3 in your browser, Rails will run the code in your controller, and send back some HTML to your browser. That HTML can load CSS and JavaScript, but that's run after your Ruby program has finished - and may not be run at all, depending on the browser. How you use the name of the selected item in your view depends on what you're doing with it.
If you're storing it in the database somewhere, then you should start by getting this working just in Ruby. For instance, if your @document has a selected_informations attribute, you could use that in the rest of your page, and pass it to your form.select to pre-select it in the page. The Rails Guides documentation has more info on this.
If you're not storing it in the database, then you can get the value of your box out with JavaScript whenever it changes. Here's some sample code that prints out the name of the selected item to the JavaScript console whenever a <select> box gets changed. I've included it in <script> tags so you can drop it straight into your view for testing, but you should put it into a dedicated JavaScript file if you adapt it for your project.
<script>
  $('select').on('change', function(ev) {
    var selected_item = $(ev.currentTarget).val();
    console.log("Your select value is " + selected_item);
  });
</script>
One final thing to note is that your existing form.select tag is set up to call a method submit() whenever its value gets changed. submit is just a name, so that function could do anything... but I'd guess it's submitting the form whenever an item is selected. This sends a request to your Rails server and refreshes the page, so beware - if you're using an event listener on change to update the current page, you won't see those changes on the new, refreshed page (and should use a server-side solution instead).
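To illustrate the server-side approach, here is a minimal sketch that uses only Ruby's standard ERB library rather than Rails. The names list and selected_name are hypothetical stand-ins for @document.informations and a stored selection. The point is that the Ruby runs first and the browser only ever receives the finished HTML.

```ruby
require "erb"

# Hypothetical data: stand-ins for the record names and the stored selection.
names = ["Alpha", "Beta", "Gamma"]
selected_name = "Beta"

# The template is evaluated on the server; the browser never sees the Ruby.
template = ERB.new(<<~'HTML', trim_mode: "-")
  <select name="name">
  <%- names.each do |name| -%>
    <option<%= ' selected' if name == selected_name %>><%= ERB::Util.html_escape(name) %></option>
  <%- end -%>
  </select>
HTML

html = template.result_with_hash(names: names, selected_name: selected_name)
puts html
```

In a Rails view the same effect would normally come from passing the stored value to the select helper, so this is only meant to show where the Ruby runs.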

Form does not get Submitted via Mechanize

URL = 'http://public.dep.state.ma.us/SearchableSites2/Search_UST.aspx'
agent = Mechanize.new()
agent.get(URL)
form = agent.page.form_with(:action=>/Search_UST.aspx/)
form.submit(form.button_with(:value=>'Search'))
puts agent.page.body
The above snippet is supposed to submit the form and receive the search results page. However, the form does not get submitted: instead of the results page, I get the form page back, as if I had not submitted the form.
This is the page I'm trying to submit: http://public.dep.state.ma.us/SearchableSites2/Search_UST.aspx
Any suggestions on how to overcome this problem?
Thank you
I can see that the Search button has a "doPostBack" action in its onclick attribute, so you will need to parse that and reproduce what it does when you submit the form.
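As a rough sketch of the parsing step: ASP.NET WebForms pages define a __doPostBack(target, argument) function that copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT fields before posting the form. The helper below (parse_postback, a hypothetical name) extracts those two values from an onclick string with a regular expression. The onclick value is a made-up example, not copied from the page in question.

```ruby
# Matches __doPostBack('target','argument') with single or double quotes.
POSTBACK_PATTERN = /__doPostBack\(\s*['"]([^'"]*)['"]\s*,\s*['"]([^'"]*)['"]\s*\)/

# Hypothetical helper: returns the hidden-field values the postback would
# set, or nil if the string contains no __doPostBack call.
def parse_postback(onclick)
  match = POSTBACK_PATTERN.match(onclick.to_s)
  return nil unless match

  { "__EVENTTARGET" => match[1], "__EVENTARGUMENT" => match[2] }
end

# Made-up onclick attribute for illustration only.
onclick = "javascript:__doPostBack('ctl00$Main$btnSearch','')"
fields = parse_postback(onclick)
p fields
```

With Mechanize, the extracted values would then be assigned to the form's hidden fields of the same names before calling submit, assuming the page follows the usual WebForms pattern.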

How can we circumvent these remote forms drawback?

In an effort to have everything translatable in our website (including the error messages for the validations), we switched almost all of our forms to remote forms. While this helps with the ability to translate error messages, we have encountered other problems, like:
if the user clicks on the submit button multiple times, the action gets called multiple times. If we have a remote form for creating a new record in the database, and assuming that the user's data is valid, each click will add a new object ( with the exact same contents ). Is there any way of making sure that such things cannot happen?
Is there somewhere I could read about remote forms best practices? How could I handle the multiple clicks problem? Is switching all the forms to remote forms a very big mistake?
There is a Rails 3 option called :disable_with. Put this on input elements to disable and re-label them while a remote form is being submitted. It adds a data-disable-with attribute to those inputs, which rails.js can select on to bind this functionality.
submit_tag "Complete sale", :disable_with => "Please wait..."
More info can be found here
Easy, and you can achieve that in several ways, depending on your preferences:
* Post the form manually with an ajax request and, while you wait for the response, disable or hide the form (or whatever you need) so the user can't keep posting like crazy. Once you get the response from the server, you can allow the user to post again (cleaning the form first), show something else, redirect to another page, or whatever you need.
* Use link_to :remote=>true to submit the form, and add a callback function to handle the response and to disable or hide the form when it's submitted.
* Add a js listener to the form to detect when it's submitted and then disable or hide the form.
As you can see, there are lots of different ways to achieve what you need.
EDIT: If you need info about binding or handling a form submit from js, the jQuery Submit documentation has easy and interesting examples that may help you do what I suggested.
I have used remote forms extensively myself, and in most cases I would avoid them. But sometimes your layout or UX demands on-the-fly drop-down forms, without reloading or refreshing the complete page.
So, let me tackle this in steps.
1. Preventing Normal form double-post
Even with a normal form, a user could double-click your button, or click multiple times, if the user does not get a clear indication that the click has been registered and the action has started.
There are a lot of ways (e.g. javascript) to make this visible, but the easiest in rails is this:
= f.button :submit, :disable_with => "Please wait..."
This will disable the button after the first click, clearly indicating the click has been registered and the action has started.
2. Handling the remote form
For a remote form it is not that much different; the main difference is most likely what happens afterward.
With a remote form you have a few options:
* in case of error: you update the form with the errors.
* you leave the form open, allowing users to keep on entering data (I think this is your case?)
* you redirect the users to some place.
Let me handle those cases. Please understand that those three cases are completely standard for a normal form, but not for a remote call.
2.1 In case of error
For a remote form to update correctly, you have to do a bit more magic. Not a lot, but a bit.
When using haml, you would have a view called edit.js.haml which would look something like
:plain
  $('#your-form-id').replaceWith('#{j render(:partial => 'form') }');
What this does: it replaces the existing form element with a freshly rendered form partial (_form), which now includes the errors. You will have to structure your views accordingly to make this work. That is not hard, just necessary.
2.2 Clearing the form
You have two options:
* re-render the form completely, as with the errors. Just make sure you render the form for a new object, not the one that was just posted!
* just send the following javascript instead:
$('#your-form-id')[0].reset();
This will blank the form, and normally, that would effectively render any following clicking useless (some client validation could block posting until some fields are filled in).
2.3 Redirecting
Since you are using a remote form, you can't just redirect. This has to happen client-side, so that is a tad more complicated.
Using haml again this would be something like
:plain
  document.location.href = '#{@redirect_uri}';
Conclusion
To prevent double (triple, quadruple, more) posts using remote forms you will have to
* disable the button after the first click (use :disable_with)
* clear the form after successful submission (reset the form or render it with a new element)
Hope this helps.
The simplest solution would be to generate a token for each form. Then your create action could make sure it hasn't been used yet and determine whether the record should be created.
Here's how I would go about writing this feature. Note that I haven't actually tested this, but the concept should work.
1. Inside the new action, create a random token to identify the form request.
def new
  @product = Product.new
  @form_token = session[:form_token] = SecureRandom.hex(15)
end
2. Add a hidden field to the form that stores the form token. This will be captured in the create action to make sure the form hasn't been submitted before.
<%= hidden_field_tag :form_token, @form_token %>
3. In the create action you can make sure the form token matches between the session and params variables. This will give you a chance to see if this is the first or second submission.
def create
  # delete the form token if it matches
  if session[:form_token] == params[:form_token]
    session[:form_token] = nil
  else
    # if it doesn't match then check if a record was created recently
    product = Product.where('created_at > ?', 3.minutes.ago).where(title: params[:product][:title]).last
    # if the product exists then show it
    # or just return because it is a remote form
    redirect_to product and return if product.present?
  end
  # normal create action here ...
end
Update: What I have described above has a name: it is called a Synchronizer (or Déjà vu) Token. As described in this article, it is a proper method to prevent a double submit.
This strategy addresses the problem of duplicate form submissions. A synchronizer token is set in a user's session and included with each form returned to the client. When that form is submitted, the synchronizer token in the form is compared to the synchronizer token in the session. The tokens should match the first time the form is submitted. If the tokens do not match, then the form submission may be disallowed and an error returned to the user. Token mismatch may occur when the user submits a form, then clicks the Back button in the browser and attempts to resubmit the same form.
On the other hand, if the two token values match, then we are confident that the flow of control is exactly as expected. At this point, the token value in the session is modified to a new value and the form submission is accepted.
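The flow described in that quote can be sketched in plain Ruby, outside Rails. This is only an illustration under assumptions: FormToken is a hypothetical module, and a plain Hash stands in for the Rails session.

```ruby
require "securerandom"

# Hypothetical synchronizer-token helper; `session` is any Hash-like store.
module FormToken
  KEY = :form_token

  # Store a fresh token in the session and return it for the hidden field.
  def self.issue(session)
    session[KEY] = SecureRandom.hex(15)
  end

  # Returns true only when the submitted token matches the session token.
  # On a match the session token is changed, so a repeat of the same
  # submission no longer matches.
  def self.consume(session, submitted)
    expected = session[KEY]
    return false if expected.nil? || submitted.to_s.empty?
    return false unless expected == submitted

    session[KEY] = SecureRandom.hex(15)
    true
  end
end

session = {}                                  # stand-in for the Rails session
token   = FormToken.issue(session)            # rendered into the form
first   = FormToken.consume(session, token)   # first click
second  = FormToken.consume(session, token)   # accidental second click
puts first
puts second
```

The first submission is accepted and the duplicate is rejected, which is where a create action could decide to skip the insert or show the existing record.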
I hate to say it, but it sounds like you've come up with a cure that's worse than the disease.
Why not use i18n for translations? That certainly would be the 'Rails way'...
If you must continue down this route, you are going to have to start using Javascript. Remote forms are usually for small 'AJAXy things' like votes or comments. Creating whole objects without leaving the page is useful for when people might want to create lots of them in a row (the exact problem you're trying to solve).
As soon as you start using AJAX, you have to deal with the fact that you'll have to get into doing some JS. It's client-side stuff and therefore not Rails' speciality.
If you feel that you've gone so far down this road that you can't turn back, I would suggest that the AJAX response should at least reset the form. This would then stop people creating the same thing more than once by mistake.
From a UI/UX point of view, it should also bring up a flash message letting users know that they successfully created the object.
So in summary: if you can afford the time, git reset and start using i18n; if you can't, make the ajax callback reset the form and set a flash message.
Edit: it just occurred to me that you could even get the AJAX to redirect the page for you (but you'd have to handle the flash messages yourself). However, using a remote form that then redirects via javascript is FUGLY...
I've had similar issues with using a popup on mouseover, and not wanting to queue several requests. To get more control, you might find it easier to use javascript/coffeescript directly instead of UJS (as I did).
The way I resolved it was assigning the Ajax call to a variable and checking if the variable was assigned. In my situation, I'd abort the ajax call, but you would probably want to return from the function and set the variable to null once the ajax call is completed successfully.
This coffeescript example is from my popup which uses a "GET", but in theory it should be the same for a "POST" or "PUT".
e.g.
jQuery ->
  ajaxCall = null
  $("#popupContent").html " "
  $("#popup").live "mouseover", ->
    # a request is already in flight, so do nothing
    if ajaxCall
      return
    ajaxCall = $.ajax(
      type: "GET"
      url: "/whatever_url"
      beforeSend: ->
        $("#popupContent").prepend "<p class=\"loading-text\">Loading..please wait...</p>"
      success: (data) ->
        $("#popupContent").empty().append(data)
      complete: ->
        $(".loading-text").remove()
        # allow the next request
        ajaxCall = null
    )
I've left out my mouseout, and timer handling for brevity.
You can try something like this for ajax requests.
Set a blocking variable to true for ajax requests:
before_filter :xhr_blocker
def xhr_blocker
  if request.xhr?
    if session[:xhr_blocker]
      # a request is already in progress, so reject this one
      respond_to do |format|
        format.json { head :unprocessable_entity }
      end
    else
      session[:xhr_blocker] = true
    end
  end
end
Clear the xhr_blocker variable with an after filter method:
after_filter :clear_xhr_blocker
def clear_xhr_blocker
  session[:xhr_blocker] = nil
end
I would bind to ajax:complete, (or ajax:success and ajax:error) to redirect or update the DOM to remove/change the form as necessary when the request is complete.

Generate doodle report on post

Lately I've been using doodle reporting to generate Excel and PDF reports. I was doing that with a button link because the report is not dynamic.
But now I have a requirement where the parameters are dynamic: there is a form of parameters that users fill in and submit. The form submits to another action, and that action should generate a PDF report.
When I tried this code:
return new ReportResult(report, new PdfReportWriter());
It just generates the report in the page and I'm unable to download it. Any idea how?
I've already included all the required DLLs and I'm able to generate the report when I'm using an actionLink.
To solve the problem specify the content type and filename:
return new ReportResult(report, new ExcelReportWriter(), "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet") { FileName = "Report.xls" };
