Why am I not getting the products on [this][1] page when I search for a single space (" ") in the search form? I only get the menus, not the product search results.
Ruby code:
require 'nokogiri'
require 'mysql2'
require 'logger'
require 'mechanize'
agent = Mechanize.new{|a| a.log = Logger.new(STDERR) }
agent.user_agent_alias = 'Windows Mozilla'
agent.read_timeout = 60
def add_cookie(agent, uri, cookie)
  uri = URI.parse(uri)
  # Parse the raw cookie string and store each resulting cookie in the agent's jar.
  Mechanize::Cookie.parse(uri, cookie) do |parsed_cookie|
    agent.cookie_jar.add(uri, parsed_cookie)
  end
end
login_page = agent.get "http://www.example.com.mx/login.php?location=%2F"
login_form = login_page.form_with(:method => 'POST')
email_field = login_form.field_with(name: "correo_ingresar")
password_field = login_form.field_with(name: "password")
email_field.value = 'user@example.com'
password_field.value = 'password'
home_page = login_form.submit
myarray = home_page.body.scan(/SetCookie\(\"(.+)\", \"(.+)\"\)/)
myarray.each{|line| add_cookie agent, 'http://www.example.com.mx', "#{line[0]}=#{line[1]}"}
add_cookie(agent, 'http://www.example.com.mx', "forzar_existencias=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "articulos_mostrar=50; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "forz_existencias=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "no_actualiza=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "orden_mostrar=8; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "page=1; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "precio_inicio=0; path=/; domain=www.example.com.mx")
add_cookie(agent, 'http://www.example.com.mx', "location=%2Farticulos.php%3Fbuscar%3D%2B; path=/; domain=www.example.com.mx")
search_form = home_page.forms.first
search_field = search_form.field_with(name: "buscar")
search_field.value = ' '
search_results = search_form.submit
resultados = 'http://example.com.mx/articulos.php?buscar=+'
I installed the Live HTTP Headers add-on for Firefox, along with Firebug. When I enter a space and click the search button on the [webpage][1], Live HTTP Headers shows the following:
http://example.com.mx/articulos.php?buscar=+
GET /articulos.php?buscar=+ HTTP/1.1
Host: example.com.mx
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://example.com.mx/articulos.php?buscar=+
Cookie: _ga=GA1.3.162897808.1438611502; _gat=1
Connection: keep-alive
HTTP/1.1 200 OK
Date: Sat, 08 Aug 2015 04:29:40 GMT
Server: Apache
x-powered-by: PHP/5.4.30
Cache-Control: no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: 0
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
----------------------------------------------------------
http://www.google-analytics.com/collect?v=1&_v=j37&a=1988602157&t=pageview&_s=1&dl=http%3A%2F%2Fexample.com.mx%2Farticulos.php%3Fbuscar%3D%2B&ul=en-us&de=UTF-8&dt=Sistemas%20Aplicados&sd=24-bit&sr=1920x1080&vp=1903x969&je=0&_u=AACAAEABI~&jid=&cid=162897808.1438611502&tid=UA-58813310-1&z=90642832
GET /collect?v=1&_v=j37&a=1988602157&t=pageview&_s=1&dl=http%3A%2F%2Fexample.com.mx%2Farticulos.php%3Fbuscar%3D%2B&ul=en-us&de=UTF-8&dt=Sistemas%20Aplicados&sd=24-bit&sr=1920x1080&vp=1903x969&je=0&_u=AACAAEABI~&jid=&cid=162897808.1438611502&tid=UA-58813310-1&z=90642832 HTTP/1.1
Host: www.google-analytics.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://example.com.mx/articulos.php?buscar=+
Connection: keep-alive
HTTP/1.1 200 OK
Pragma: no-cache
Expires: Mon, 07 Aug 1995 23:30:00 GMT
Access-Control-Allow-Origin: *
Last-Modified: Sun, 17 May 1998 03:00:00 GMT
x-content-type-options: nosniff
Content-Type: image/gif
Date: Wed, 29 Jul 2015 12:33:33 GMT
Server: Golfe2
Content-Length: 35
Age: 834969
Alternate-Protocol: 80:quic,p=0
Cache-Control: private, no-cache, no-cache=Set-Cookie, proxy-revalidate
----------------------------------------------------------
http://example.com.mx/resultados.php
POST /resultados.php HTTP/1.1
Host: example.com.mx
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: */*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Referer: http://example.com.mx/articulos.php?buscar=+
Content-Length: 204
Cookie: _ga=GA1.3.162897808.1438611502; _gat=1
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
opcion=&buscar=+&page=1&articulos_mostrar=10&orden_mostrar=1&seccion=&linea=&sublinea=&forz_existencias=1&precio_inicio=0&precio_final=20000&location=%252Farticulos.php%253Fbuscar%253D%252B&no_actualiza=1
HTTP/1.1 200 OK
Date: Sat, 08 Aug 2015 04:29:42 GMT
Server: Apache
x-powered-by: PHP/5.4.30
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
----------------------------------------------------------
My question: how do I get the full product list to load so I can start scraping, given that the request carries a Referer header and the page does not load the products automatically? [This][2] is the resulting HTML.
I'll give two solutions, but only one uses a POST, as you requested in your question:
require 'mechanize'
agent = Mechanize.new
agent.get("http://www.sistemasaplicados.com.mx/")
agent.page.forms.first.field_with(name: "buscar").value = ' '
result_page = agent.page.forms.first.submit
The other option is to encode your search term and use it in a simple GET request (encoded in the URL), then parse the response directly with Nokogiri. In your particular case, a search for "160GB" leads to the following URL, http://www.sistemasaplicados.com.mx/articulos.php?buscar=160GB, which you can just GET.
By the way, you do not necessarily need Mechanize for all that, unless you want to place orders in your account automatically or something like that. I assume you are doing this in the interest of sistemasaplicados; otherwise I would consider it rude, and it would bring you bad karma.
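As a rough sketch of that GET approach, the snippet below builds the search URL with Ruby's standard library only. The helper names `search_url` and `fetch_search_page` are made up for illustration, and the host and path are taken from the URL quoted above. The returned HTML string could then be handed to Nokogiri for parsing.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build the search URL for a given term.
# URI.encode_www_form escapes the term for a query string (a space becomes "+").
def search_url(term)
  URI::HTTP.build(
    host: 'www.sistemasaplicados.com.mx',
    path: '/articulos.php',
    query: URI.encode_www_form(buscar: term)
  )
end

# Hypothetical helper: fetch the page body with a plain GET (needs network access).
def fetch_search_page(term)
  Net::HTTP.get(search_url(term))
end

puts search_url('160GB')
puts search_url(' ')
```

Only `search_url` runs here; `fetch_search_page` is defined but not called, so the snippet itself makes no network request.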
Update
When checking manually what is happening, look at what the page does with JavaScript disabled (in this case, there are no results). Then use your browser's inspector, console, or developer tools (often opened by pressing F12) to find out what happens. In your case, a POST request to resultados.php is made. I found this in the "Network" tab of the Firefox developer tools. There you will also find the relevant parameters to send in your POST request.
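A minimal sketch of replaying that POST from Ruby, using only the standard library. The field names and values are copied from the request body captured in the question; the helper names `search_params` and `post_search` and the default host are assumptions for illustration. Whether the site also requires the login cookies is not verified here.

```ruby
require 'net/http'
require 'uri'

# Form fields copied from the POST body captured in the question.
# "location" is written decoded once; it gets percent-encoded again on the wire.
def search_params(term)
  {
    'opcion' => '',
    'buscar' => term,
    'page' => '1',
    'articulos_mostrar' => '10',
    'orden_mostrar' => '1',
    'seccion' => '',
    'linea' => '',
    'sublinea' => '',
    'forz_existencias' => '1',
    'precio_inicio' => '0',
    'precio_final' => '20000',
    'location' => '%2Farticulos.php%3Fbuscar%3D%2B',
    'no_actualiza' => '1'
  }
end

# Hypothetical helper: send the same XHR-style POST the browser makes (needs network access).
def post_search(term, base = 'http://www.sistemasaplicados.com.mx')
  uri = URI.join(base, '/resultados.php')
  request = Net::HTTP::Post.new(uri)
  request.set_form_data(search_params(term))
  request['X-Requested-With'] = 'XMLHttpRequest'
  request['Referer'] = URI.join(base, '/articulos.php?buscar=+').to_s
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
end

# Print the encoded body so it can be compared with the captured one.
puts URI.encode_www_form(search_params(' '))
```

The response body of `post_search(' ')` should be the HTML fragment with the products, which can then be parsed with Nokogiri. With Mechanize, `agent.post(url, params, headers)` would be the equivalent call and would reuse the agent's cookie jar.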
Related
I have a fairly simple Rails 6.0 API-only application serving JSON. I am a little disturbed to find that the response headers for a POST request include "Transfer-Encoding: chunked"; the response body is only a few tens of bytes, so this seems inappropriate.
cURL output:
> POST /profiles/779397007/activities HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Length: 58
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 58 out of 58 bytes
< HTTP/1.1 201 Created
< X-Frame-Options: SAMEORIGIN
< X-XSS-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< X-Download-Options: noopen
< X-Permitted-Cross-Domain-Policies: none
< Referrer-Policy: strict-origin-when-cross-origin
< Content-Type: application/json; charset=utf-8
< ETag: W/"caf45e454eded06497865cb8e2938360"
< Cache-Control: max-age=0, private, must-revalidate
< X-Request-Id: 4a0953ca-0782-4f72-8240-6edd1fd4cdc3
< X-Runtime: 0.287641
< Transfer-Encoding: chunked
Would anyone have any idea why this is happening (or just how I can disable it)? I've tried adding stream: false to the call to #render in the controller, but that has no effect (the documentation states ":stream only works with templates. Rendering :json or :xml with :stream won't work.").
A similar sounding problem was found with Rails' request handling, but this is in a Rails response.
I am experiencing very strange behaviour with traefik (1.3.5) used as an ingress in Kubernetes (deployed with the stable chart).
I have a php endpoint behind a varnish server that returns a 404 when I curl it directly without any special trick.
$ curl -v ingress.../sport/?page=404
> GET /sport/?page=404 HTTP/1.1
> Host: varnish.ingress.xxx
> User-Agent: curl/7.43.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Age: 0
< Cache-Control: max-age=10, public
< Content-Type: text/html; charset=UTF-8
< Date: Wed, 06 Sep 2017 21:19:48 GMT
< Server: nginx
< Vary: Accept-Encoding
< Vary: Accept-Encoding
< Via: 1.1 varnish-v4
< X-Cache: MISS
< X-Powered-By: PHP/7.1.6
< X-Varnish: 65773
< Transfer-Encoding: chunked
<
This is the expected behavior, but when I curl it through traefik with the gzip header (or with --compressed), I get an HTTP 200:
$ curl -v ...ingress.../sport/?page=404
> GET /sport/?page=404 HTTP/1.1
> Host: varnish.ingress.xxx
> User-Agent: curl/7.43.0
> Accept: */*
> Accept-Encoding: gzip
>
< HTTP/1.1 200 OK
< Age: 0
< Cache-Control: max-age=10, public
< Content-Encoding: gzip
< Content-Type: text/html; charset=UTF-8
< Date: Wed, 06 Sep 2017 21:18:38 GMT
< Server: nginx
< Vary: Accept-Encoding
< Vary: Accept-Encoding
< Via: 1.1 varnish-v4
< X-Cache: MISS
< X-Powered-By: PHP/7.1.6
< X-Varnish: 197657
< Transfer-Encoding: chunked
<
If I run the same test directly against varnish or through an Amazon ELB, I don't have the issue and always get a 404.
I noticed that traefik is re-adding the Vary: Accept-Encoding header.
I also noticed dozens of "server.go:2753: http: multiple response.WriteHeader calls" log messages.
Has anyone already seen this strange behavior?
Any clue how to investigate?
Thanks in advance
Well, this is a known issue. The fix, https://github.com/containous/traefik/pull/1948, will be shipped in 1.4.0.
M.
I'm trying to test my Rails 5 API endpoints using HTTPie. The command I run in Terminal is:
http POST :3000/users email=test@example.com password=anewpassword password_confirmation=anewpassword
The response I get is:
HTTP/1.1 400 Bad Request
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: c3ca4d50-9501-46e9-a7e1-a4576d6cdd7e
X-Runtime: 0.010404
X-XSS-Protection: 1; mode=block
{
"errors": [
"Password can't be blank"
]
}
Which means it's bumping up against my validations. However, from what I can tell it shouldn't be. This is what I get from the Rails server terminal:
Started POST "/users" for 127.0.0.1 at 2017-02-03 12:09:54 -0800
Processing by UsersController#create as JSON
Parameters: {"email"=>"test@example.com", "password"=>"[FILTERED]", "password_confirmation"=>"[FILTERED]", "user"=>{"email"=>"test@example.com"}}
(0.1ms) BEGIN
User Exists (0.4ms) SELECT 1 AS one FROM "users" WHERE LOWER("users"."email") = LOWER($1) LIMIT $2 [["email", "test@example.com"], ["LIMIT", 1]]
(0.1ms) ROLLBACK
Completed 400 Bad Request in 3ms (Views: 0.1ms | ActiveRecord: 0.6ms)
Any ideas as to what I'm doing wrong? I have a feeling I'm not using HTTPie correctly, but I can't find much with my Google-fu.
EDIT
I can confirm the route is working. Using Postman and passing the following JSON in the request body gets the expected response:
{"user": {"email": "test@example.com", "password": "anewpassword", "password_confirmation": "anewpassword"}}
Response:
{"status": "User created successfully"}
With the help of this post (Sending nested JSON object using HTTPie), I modified my HTTPie request to look like this:
http POST :3000/users user:='{"email": "test@example.com", "password": "anewpassword", "password_confirmation": "anewpassword"}'
Which gave me the expected response of:
HTTP/1.1 201 Created
Cache-Control: max-age=0, private, must-revalidate
Content-Type: application/json; charset=utf-8
ETag: W/"8a33506505f34984dfa0c1fe7e8a9ef3"
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: 6e34a81d-e5cb-4109-98ca-b406c962189b
X-Runtime: 0.196269
X-XSS-Protection: 1; mode=block
{
"status": "User created successfully"
}
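To illustrate the difference between the two HTTPie invocations, here is a small Ruby sketch of the JSON bodies they produce. The `user_attributes` helper is a hypothetical stand-in for a lookup such as `params.require(:user)`, since the controller is not shown. It also does not model Rails' parameter wrapping, which is presumably why the server log above shows only "email" copied under "user" while the password fields stayed at the top level.

```ruby
require 'json'

attrs = {
  'email' => 'test@example.com',
  'password' => 'anewpassword',
  'password_confirmation' => 'anewpassword'
}

# `http POST :3000/users email=... password=...` sends each key=value pair
# as a top-level JSON field.
flat_body = JSON.generate(attrs)

# `user:='{...}'` embeds raw JSON under the "user" key, matching the Postman body.
nested_body = JSON.generate('user' => attrs)

# Hypothetical stand-in for reading the nested "user" attributes in a controller.
def user_attributes(body)
  JSON.parse(body).fetch('user', {})
end

puts flat_body
puts nested_body
puts user_attributes(flat_body).inspect      # {} (no "user" key in the flat body)
puts user_attributes(nested_body).keys.inspect
```

With the flat body there is nothing under "user" for the model to read the password from, which is consistent with the "Password can't be blank" error.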
I have the following in one of my controllers:
render :json => @obj.to_json, status: 201
This sometimes gives an empty response body, with the correct 201 status code.
However, if I change the status code to 200, the response is rendered properly every time.
Any idea why the response body behaves differently when the status code changes?
(Note: I am deploying this on Heroku, and the problem occurs only on the server; I can't reproduce it in local development.)
Edit:
cURL Details:
The curl command is ordinary: basic headers plus some of my own for authentication.
Response:
* Hostname was NOT found in DNS cache
* Trying 23.21.41.210...
* Connected to utilities-myappp.herokuapp.com (23.21.41.210) port 80 (#0)
> POST /api/v2/bookings HTTP/1.1
> User-Agent: curl/7.37.1
> Host: utilities-myappp.herokuapp.com
> Content-Type: application/json
> Accept: */*
> Cache-Control: no-cache
> Cookie: request_method=POST
> Connection: keep-alive
> access_token: my-access-token
> Content-Length: 413
>
* upload completely sent off: 413 out of 413 bytes
< HTTP/1.1 201 Created
* Server Cowboy is not blacklisted
< Server: Cowboy
< Connection: close
< Date: Wed, 25 Mar 2015 13:13:27 GMT
< Status: 201 Created
< X-Frame-Options: SAMEORIGIN
< X-Xss-Protection: 1; mode=block
< X-Content-Type-Options: nosniff
< Content-Type: application/json; charset=utf-8
< Etag: "b7f09a43851971654d269bee8beed023"
< Cache-Control: max-age=0, private, must-revalidate
< X-Request-Id: 3ed3d9ee-d534-416a-b689-045e1d60e374
< X-Runtime: 0.351844
< Vary: Accept-Encoding
< Via: 1.1 vegur
<
This is the same output I receive in both cases: 1) when I receive the response body (in which case it is followed by the response text), and 2) when there is no response body.
The only differences between the two cases are in the Date, Etag, X-Request-Id, and X-Runtime response headers.
I have a simple out of the box rails server running and I am trying to figure out how to make it work with the If-Modified-Since header. I am using the following curl request.
curl -I localhost:3000/shows/1 --header 'If-Modified-Since: Thu, 21 Jun 2012 19:16:20 GMT'
I always get 200 OK but I want 304 Not Modified.
When I do this through the browser I get a 304 Not Modified but not with curl.
Am I doing it wrong?
UPDATE
If I run
curl -v -I localhost:3000/shows/1 --header 'If-Modified-Since: Thu, 21 Jun 2012 19:16:20 GMT'
I get:
* About to connect() to localhost port 3000 (#0)
* Trying 127.0.0.1... connected
* Connected to localhost (127.0.0.1) port 3000 (#0)
> HEAD /shows/1 HTTP/1.1
> User-Agent: curl/7.21.4 (universal-apple-darwin11.0) libcurl/7.21.4 OpenSSL/0.9.8r zlib/1.2.5
> Host: localhost:3000
> Accept: */*
> If-Modified-Since: Thu, 21 Jun 2012 19:16:20 GMT
>
< HTTP/1.1 200 OK
< Content-Type: text/html; charset=utf-8
< X-Ua-Compatible: IE=Edge
< Etag: "35ec119e3b6c7ccde9eeee82afcd4ee9"
< Cache-Control: max-age=0, private, must-revalidate
< X-Request-Id: dd365eb0b80f5b9ff62bc16f3dbbe494
< X-Runtime: 0.007800
< Content-Length: 0
< Server: WEBrick/1.3.1 (Ruby/1.9.3/2012-02-16)
< Date: Thu, 21 Jun 2012 19:37:55 GMT
< Connection: Keep-Alive
< Set-Cookie: _spacebarfm-rails_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJWY0NzNjYmFlYzJhNzgzZWRjMjc2MmU4YWFkZDc1OTIwBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMWhjU3NvQ2VuRkR1NmJRREV1SmxOMUlaK3VhQU0ycjY2am94cFFmOTVnTTg9BjsARg%3D%3D--f9c756d08065c423cdecba2d710b1b74bf798c12; path=/; HttpOnly
Set-Cookie: _spacebarfm-rails_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJWY0NzNjYmFlYzJhNzgzZWRjMjc2MmU4YWFkZDc1OTIwBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMWhjU3NvQ2VuRkR1NmJRREV1SmxOMUlaK3VhQU0ycjY2am94cFFmOTVnTTg9BjsARg%3D%3D--f9c756d08065c423cdecba2d710b1b74bf798c12; path=/; HttpOnly
* Connection #0 to host localhost left intact
* Closing connection #0
The rails code for the controller is:
def index
  @shows = Show.all
  respond_to do |format|
    format.html # index.html.erb
    format.json { render json: @shows }
  end
end
The Chrome output is as follows:
Request URL:http://localhost:3000/shows/1.json
Request Method:GET
Status Code:304 Not Modified
Request Headers
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Charset:ISO-8859-1,utf-8;q=0.7,*;q=0.3
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Connection:keep-alive
Cookie:_btest_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJTNkNzI3ODc1NjU0M2EyZTViOWY0ZTgzMjU0M2IzYmY4BjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMUFVZVN0ZHp6L0JRNGVMZDZST2JSNG54Zlg4T3VmSnk4RFNhWXRHbmljK3c9BjsARg%3D%3D--ceb783671372cbcc3ecf971340f2a2e2424c9620; _spacebarfm-rails_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJTNhMTcxNjMyNzJkMWRjYzAxY2NiN2EwN2JjMzE5YWFjBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMTZZREp5ZC92VTlEQk9oKytCK0hKYW5na1M4bzRPRUM3U3lmWURzU09neWs9BjsARg%3D%3D--793b15b38511bd0805df3a210d7e72152905b8b9
Host:localhost:3000
If-None-Match:"bb012b3a16e7e80ec271b0d234a9b8ce"
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.56 Safari/536.5
Response Headers
Cache-Control:max-age=0, private, must-revalidate
Connection:close
Date:Thu, 21 Jun 2012 21:09:28 GMT
Etag:"bb012b3a16e7e80ec271b0d234a9b8ce"
Server:WEBrick/1.3.1 (Ruby/1.9.3/2012-02-16)
Set-Cookie:_spacebarfm-rails_session=BAh7B0kiD3Nlc3Npb25faWQGOgZFRkkiJTNhMTcxNjMyNzJkMWRjYzAxY2NiN2EwN2JjMzE5YWFjBjsAVEkiEF9jc3JmX3Rva2VuBjsARkkiMTZZREp5ZC92VTlEQk9oKytCK0hKYW5na1M4bzRPRUM3U3lmWURzU09neWs9BjsARg%3D%3D--793b15b38511bd0805df3a210d7e72152905b8b9; path=/; HttpOnly
X-Request-Id:2435168fa20699508e9282c37566c015
X-Runtime:0.005378
X-Ua-Compatible:IE=Edge
You need to use the stale? method to tell Rails that you want it to honor conditional request headers such as If-Modified-Since.
Note that your request actually hits the show action, not index. Change it to:
def show
  @show = Show.find(params[:id])
  # Render only when the client's cached copy is stale; otherwise Rails replies 304.
  if stale?(:etag => @show, :last_modified => @show.updated_at.utc)
    respond_to do |format|
      format.html # show.html.erb
      format.json { render json: @show }
    end
  end
end
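To make the idea behind stale? concrete, here is a simplified, framework-free sketch of a conditional GET check. It is loosely modeled on the behavior described above and is not Rails' actual implementation; the helper names `fresh?` and `status_for`, the sample ETag, and the sample timestamp are all made up for illustration.

```ruby
require 'time'

# Simplified freshness check: the cached copy is fresh when every conditional
# header the client sent still matches the current resource.
def fresh?(request_headers, etag:, last_modified:)
  since = request_headers['If-Modified-Since']
  none_match = request_headers['If-None-Match']
  return false unless since || none_match

  fresh = true
  fresh &&= Time.httpdate(since) >= last_modified if since
  fresh &&= none_match == etag if none_match
  fresh
end

# 304 Not Modified when the client's copy is fresh, 200 OK otherwise.
def status_for(request_headers, etag:, last_modified:)
  fresh?(request_headers, etag: etag, last_modified: last_modified) ? 304 : 200
end

updated_at = Time.utc(2012, 6, 21, 19, 0, 0) # assumed record timestamp
headers = { 'If-Modified-Since' => 'Thu, 21 Jun 2012 19:16:20 GMT' }

puts status_for(headers, etag: '"abc"', last_modified: updated_at) # 304
puts status_for({}, etag: '"abc"', last_modified: updated_at)      # 200
```

In this sketch, a request without any conditional header always gets a 200, and a record updated after the If-Modified-Since time also gets a 200.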