Rails: save scraped data to seeds.rb [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 1 year ago.
I'm fetching data from an API in seeds.rb and saving it to the database, but I realized that the same data doesn't overwrite itself. As a result, I end up with duplicate records in the database.
The goal is to update the existing record with the new info instead of creating a new one.
Here's how I fetch the data:
seeds.rb
require 'uri'
require 'net/http'
require 'openssl'
require 'json'
url = URI("https://google-flights-search.p.rapidapi.com/search?departure_airport_code=HND&arrival_airport_code=TPE&departure_date=2022-02-17&flight_class=Economy")
http = Net::HTTP.new(url.host, url.port)
http.read_timeout = 300
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(url)
request["x-rapidapi-host"] = ENV["x-rapidapi-host"]
request["x-rapidapi-key"] = ENV["x-rapidapi-key"]
response = http.request(request)
dep_hash = JSON.parse(response.read_body)
I plan to save only the cheapest flight to the database, and I expected the record to be updated and overwritten every time I run rails db:seed (since I set the ticket_id to be the same).
dep_flight_data = dep_hash["flights"]
tempPrice = 10000
depHash = {}
for flight in dep_flight_data
  if flight["price"] < tempPrice
    tempPrice = flight["price"]
    depHash = flight
  end
end
dep_ticket_id = (depHash["departure_airport_code"] + depHash["arrival_airport_code"] + depHash["departure_date"]).split('/').join
Ticket.create(
  ticket_id: dep_ticket_id,
  departure: depHash["departure_airport_code"],
  arrival: depHash["arrival_airport_code"],
  departure_date: depHash["departure_date"],
  ticket_amount: (depHash["price"] * 28)
)
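One thing to watch in the selection loop above: with tempPrice starting at 10000, depHash stays empty whenever every flight costs more than that, and building the ticket_id then fails on nil. A minimal sketch of the same selection using Enumerable#min_by is shown here. The SAMPLE_FLIGHTS values and the helper names cheapest_flight and ticket_id_for are invented for illustration, and the slash-separated date format is only an assumption based on the split('/') call.

```ruby
# Invented sample data, shaped like dep_hash["flights"] in the question.
SAMPLE_FLIGHTS = [
  { "price" => 320, "departure_airport_code" => "HND",
    "arrival_airport_code" => "TPE", "departure_date" => "2022/02/17" },
  { "price" => 280, "departure_airport_code" => "HND",
    "arrival_airport_code" => "TPE", "departure_date" => "2022/02/17" }
].freeze

# Returns the flight with the lowest price, or nil for an empty list.
def cheapest_flight(flights)
  flights.min_by { |flight| flight["price"] }
end

# Builds the same kind of id as the question: codes plus date, slashes removed.
def ticket_id_for(flight)
  (flight["departure_airport_code"] +
   flight["arrival_airport_code"] +
   flight["departure_date"]).delete("/")
end

flight = cheapest_flight(SAMPLE_FLIGHTS)
puts ticket_id_for(flight) if flight
```

Because min_by returns nil for an empty array, the seed can skip the write when the API returns no flights.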
current seed data as below:
Is there any way I can update the seeds file correctly?
Any guides are much appreciated!

Instead of adding every ticket you find to the database, check whether a ticket with the same ticket_id already exists before creating a new one.
This can be done by changing
Ticket.create(
  ticket_id: dep_ticket_id,
  departure: depHash["departure_airport_code"],
  arrival: depHash["arrival_airport_code"],
  departure_date: depHash["departure_date"],
  ticket_amount: (depHash["price"] * 28)
)
to
Ticket
  .create_with(
    departure: depHash["departure_airport_code"],
    arrival: depHash["arrival_airport_code"],
    departure_date: depHash["departure_date"],
    ticket_amount: (depHash["price"] * 28)
  )
  .find_or_create_by(ticket_id: dep_ticket_id)
See the docs for find_or_create_by.
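Note that find_or_create_by returns the existing row unchanged when one matches; the create_with attributes are only used when a new row is created. If the price should be refreshed on every rails db:seed, one option is find_or_initialize_by followed by update!. The sketch below shows the idea. The small in-memory Ticket class is a hypothetical stand-in that mimics only those two ActiveRecord methods so the snippet runs outside Rails, and upsert_ticket is an invented helper name.

```ruby
# Hypothetical in-memory stand-in for the Rails model so this sketch runs
# without a database. In the app, the real Ticket (ActiveRecord) already
# provides find_or_initialize_by and update!.
class Ticket
  STORE = {}
  attr_reader :attributes

  def self.find_or_initialize_by(ticket_id:)
    STORE[ticket_id] || new(ticket_id: ticket_id)
  end

  def initialize(attrs)
    @attributes = attrs
  end

  def update!(attrs)
    @attributes = @attributes.merge(attrs)
    STORE[@attributes[:ticket_id]] = self
    true
  end
end

# Finds the row for this ticket_id (or builds a new one), then writes the
# latest values either way, so repeated seeding updates instead of duplicating.
def upsert_ticket(dep_ticket_id, depHash)
  ticket = Ticket.find_or_initialize_by(ticket_id: dep_ticket_id)
  ticket.update!(
    departure: depHash["departure_airport_code"],
    arrival: depHash["arrival_airport_code"],
    departure_date: depHash["departure_date"],
    ticket_amount: depHash["price"] * 28
  )
  ticket
end
```

In seeds.rb only the body of upsert_ticket would be needed. A unique index on ticket_id is worth considering as well, so duplicates cannot be created by concurrent runs.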

Related

How do I get the JSON response from Dialogflow with Rails?

I understand the whole process of dialogflow and I have a working deployed bot with 2 different intents. How do I actually get the response from the bot when a user answers questions? (I set the bot on fulfillment to go to my domain). Using rails 5 app and it's deployed with Heroku.
Thanks!

If you have already set the GOOGLE_APPLICATION_CREDENTIALS path to the JSON file, you can now test using a Ruby script.
Create a Ruby file, e.g. chatbot.rb.
Write the code below in the file.
project_id = "Your Google Cloud project ID"
session_id = "mysession"
texts = ["hello"]
language_code = "en-US"
require "google/cloud/dialogflow"
session_client = Google::Cloud::Dialogflow::Sessions.new
session = session_client.class.session_path project_id, session_id
puts "Session path: #{session}"
texts.each do |text|
  query_input = { text: { text: text, language_code: language_code } }
  response = session_client.detect_intent session, query_input
  query_result = response.query_result
  puts "Query text: #{query_result.query_text}"
  puts "Intent detected: #{query_result.intent.display_name}"
  puts "Intent confidence: #{query_result.intent_detection_confidence}"
  puts "Fulfillment text: #{query_result.fulfillment_text}\n"
end
Insert your project_id. You can find it in your agent's settings on Dialogflow: click the gear to the right of the agent's name in the left menu.
Run the Ruby file in the terminal or in whatever you use to run Ruby files. You will then see the bot reply to the "hello" message you sent.
Note: do not forget to install the google-cloud gem.

I'm not entirely familiar with Dialogflow, but if you want to receive a response when an action occurs in another app, this usually means you need to receive webhooks from them.
A WebHook is an HTTP callback: an HTTP POST that occurs when something happens; a simple event-notification via HTTP POST. A web application implementing WebHooks will POST a message to a URL when certain things happen.
I would recommend checking their fulfillment documentation for an example. Hope this helps you out.
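As a rough illustration of what handling such a webhook could look like, the sketch below parses a request body and builds a reply. It assumes the Dialogflow ES (v2) webhook format, where the request carries queryResult.queryText and queryResult.intent.displayName and the reply carries fulfillmentText. The method name fulfillment_response is invented, so check the field names against the fulfillment documentation before relying on them.

```ruby
require "json"

# Builds the JSON reply for one webhook request body.
# Assumes the Dialogflow ES (v2) field names described above.
def fulfillment_response(request_body)
  payload = JSON.parse(request_body)
  result  = payload.fetch("queryResult", {})
  intent  = result.dig("intent", "displayName")
  text    = result["queryText"]

  JSON.generate("fulfillmentText" => "You said '#{text}' (intent: #{intent})")
end
```

In a Rails app this would typically sit behind a POST route, with the controller action passing request.body.read to the method and rendering the result as JSON.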

Optimal way to structure polling external service (RoR)

I have a Rails application that has a Document model with an available flag. The document is uploaded to an external server, where it is not immediately available (it takes time to propagate). What I'd like to do is poll for availability and update the model when the document is available.
I'm looking for the most performant solution for this process (the service does not offer callbacks):
1. Document is uploaded to app
2. app uploads to external server
3. app polls url (http://external.server.com/document.pdf) until available
4. app updates model Document.available = true
I'm stuck on 3. I'm already using Sidekiq in my project. Is that an option, or should I use a completely different approach (e.g. a cron job)?
Documents will be uploaded all the time, so it seems relevant to first poll the database/Redis to check for Documents which are not available.

See this answer: Making HTTP HEAD request with timeout in Ruby
Basically you set up a HEAD request for the known url and then asynchronously loop until you get a 200 back (with a 5 second delay between iterations, or whatever).
Do this from your controller after the document is uploaded:
Document.delay.poll_for_finished(@document.id)
And then in your document model:
def self.poll_for_finished(document_id)
  document = Document.find_by(id: document_id)
  # make sure the document exists and should be polled for
  return unless document&.continue_polling?
  if document.remote_document_exists?
    document.available = true
  else
    document.poll_attempts += 1 # assumes you care how many times you've checked, could be ignored.
    Document.delay_for(5.seconds).poll_for_finished(document.id)
  end
  document.save
end

def continue_polling?
  # this can be more or less sophisticated
  !available && poll_attempts < 5
end

def remote_document_exists?
  # Net::HTTP.start takes a bare host name, not a URL. The timeouts are passed
  # as options because the connection is already open inside the block.
  Net::HTTP.start('external.server.com', 80, open_timeout: 2, read_timeout: 2) do |http|
    http.head(path).code == "200"
  end
end
This is still a blocking operation. Opening the Net::HTTP connection will block if the server you're trying to contact is slow or unresponsive. If you're worried about it use Typhoeus. See this answer for details: What is the preferred way of performing non blocking I/O in Ruby?
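The bounded retry logic can be tried in isolation with a small synchronous sketch like the one below. It is a simplification: it sleeps between attempts instead of re-enqueueing a background job, and the names remote_available? and poll_until_available, along with the injectable check: parameter, are invented for illustration.

```ruby
require "net/http"
require "uri"

# Returns true when a HEAD request to +url+ answers 200 within the timeouts.
# Connection problems and timeouts count as "not available yet".
def remote_available?(url, timeout: 2)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.head(uri.request_uri).code == "200"
  end
rescue Net::OpenTimeout, Net::ReadTimeout, SystemCallError, SocketError
  false
end

# Calls +check+ up to +attempts+ times, waiting +delay+ seconds between tries.
# Returns true as soon as the check succeeds, false once attempts run out.
def poll_until_available(url, attempts: 5, delay: 5, check: method(:remote_available?))
  attempts.times do |i|
    return true if check.call(url)
    sleep(delay) if i < attempts - 1
  end
  false
end
```

Passing the check in as a callable keeps the loop testable without network access. In a Sidekiq setup, each iteration would instead be a separate job scheduled a few seconds after the previous one.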

Forcing Rails's app.get to redo the request

I'm using the following code to perform a request on the server from within a rake task:
app = ActionDispatch::Integration::Session.new(Rails.application)
app.host!('localhost:3000')
app.get(path)
This works well.
However, if I call app.get(path) again with the same path, the request is not repeated and the previous result is returned.
Is there a way I can force app.get to repeat the call?

Try to reset the session:
app.reset!
Here is what reset! does:
def reset!
  @https = false
  @controller = @request = @response = nil
  @_mock_session = nil
  @request_count = 0
  @url_options = nil
  self.host = DEFAULT_HOST
  self.remote_addr = "127.0.0.1"
  self.accept = "text/xml,application/xml,application/xhtml+xml," +
                "text/html;q=0.9,text/plain;q=0.8,image/png," +
                "*/*;q=0.5"
  unless defined? @named_routes_configured
    # the helpers are made protected by default--we make them public for
    # easier access during testing and troubleshooting.
    @named_routes_configured = true
  end
end
Otherwise it will just re-use the last response:
# this is a private method in Session; it is called every time you call `get/post, etc`
def process
  ......
  @request_count += 1
  @request = ActionDispatch::Request.new(session.last_request.env)
  response = _mock_session.last_response
  @response = ActionDispatch::TestResponse.new(response.status, response.headers, response.body)
  @html_document = nil
  ....
end
Good luck!

I've worked out what's going on.
Basically, the observation that "the request is not repeated" is Rails' own caching in action. This makes sense: app.get is treated like any other request. If caching is enabled, the cached response is returned, and if it's not, the request is repeated (as @henrikhodne claimed). This explains why a puts in the cached controller action produces no output the second time.
To verify, add a puts to two controller actions, but only set expires_in in the second. The first one will repeat the output; the second will not.
The way to force the request to repeat is to bust the cache by modifying the URL, e.g.
app.get("/") becomes app.get("/?r=123456"), as you would when using HTTP. It all seems obvious in hindsight: app.get is treated exactly like a client request, and all the same rules apply.
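A minimal sketch of generating such a cache-busting path automatically is shown here. The helper name cache_busted and the default parameter name r are assumptions for illustration; any query parameter the app ignores would do.

```ruby
require "securerandom"
require "uri"

# Appends a random query parameter to +path+ so each call yields a URL
# that no earlier request has used.
def cache_busted(path, param: "r")
  uri = URI(path)
  query = URI.decode_www_form(uri.query.to_s) << [param, SecureRandom.hex(4)]
  uri.query = URI.encode_www_form(query)
  uri.to_s
end

puts cache_busted("/")
puts cache_busted("/posts?page=2")
```

The result can then be passed straight to app.get, for example app.get(cache_busted(path)).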

Cucumber-like testing for my API [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I have been writing my Rails application with Cucumber in TDD mode: Tests first, then the code. Now my application needs an API. What I like about cucumber is, that I can specify my tests in plain English, so even managers understand what's going on.
Is there any way I can do this for my JSON-API?

YES! This is totally possible. Have you checked out the Cucumber Book by the Pragmatic Programmer series?
Here's a quick example:
Feature: Addresses
  In order to complete the information on the place
  I need an address

  Scenario: Addresses
    Given the system knows about the following addresses:
      [INSERT TABLE HERE or GRAB FROM DATABASE]
    When client requests GET /addresses
    Then the response should be JSON:
      """
      [
        {"venue": "foo", "address": "bar"},
        { more stuff }
      ]
      """
STEP DEFINITION:
Given(/^the system knows about the following addresses:$/) do |addresses|
  # table is a Cucumber::Ast::Table
  File.open('addresses.json', 'w') do |io|
    io.write(addresses.hashes.to_json)
  end
end
When(/^client requests GET (.*)$/) do |path|
  @last_response = HTTParty.get('local host url goes here' + path)
end
Then /^the response should be JSON:$/ do |json|
  JSON.parse(@last_response.body).should == JSON.parse(json)
end
ENV File:
require File.join(File.dirname(__FILE__), '..', '..', 'address_app')
require 'rack/test'
require 'json'
require 'sinatra'
require 'cucumber'
require 'httparty'
require 'childprocess'
require 'timeout'
server = ChildProcess.build("rackup", "--port", "9000")
server.start
Timeout.timeout(3) do
  loop do
    begin
      HTTParty.get('local host here')
      break
    rescue Errno::ECONNREFUSED => try_again
      sleep 0.1
    end
  end
end
at_exit do
  server.stop
end
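The "response should be JSON" step above parses both sides before comparing, which makes the check independent of whitespace and of key order inside objects. A small sketch of that comparison on its own follows; same_json? is an invented helper name.

```ruby
require "json"

# Compares two JSON documents by structure rather than by raw text.
def same_json?(actual, expected)
  JSON.parse(actual) == JSON.parse(expected)
end

compact = '[{"venue":"foo","address":"bar"}]'
pretty  = <<~JSON
  [
    {"address": "bar", "venue": "foo"}
  ]
JSON

puts same_json?(compact, pretty)
```

Array order still matters with this approach, so a response that returns the same addresses in a different order would not match.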

You can definitely achieve this. You can write step definitions to assert/verify your JSON responses. Something like this:
Given a username and password
When I try to login via the API
Then I should get logged in
While this works, it only tests whether the API (controllers/actions) works or not, i.e. it is closer to "functional" testing than acceptance testing. As such, you are not going to test the API consumer itself.

[HTTP]how to change "www.example.com" to "localhost:3000"

Hi, I'm writing tests with Cucumber in my Rails app. My scenario steps use HTTP basic authentication, and so far the authentication passes. But when I want to call a controller action and post some params, I run into a problem:
First I used this code in the step, but it failed and never reached the controller action:
post some_admin_url, @params
Second I used this code, which also failed. The error is that the URL passed to URI.parse points to "www.example.com", whereas I want it to go to "localhost:3000/admin" so I can match the data:
Net::HTTP.post_form(URI.parse(some_admin_url), {'from' => '2005-01-01','to' => '2005-03-31'}) { |i| }
FEATURE:
@selenium
Scenario: Admin wants to activate user
  Given one user logged in as admin
  And admin page
STEPS:
Given /^one user logged in as admin$/ do
  create_user_admin
  visit '/user_sessions/new'
  fill_in 'user_session[login]', :with => 'siadmin'
  fill_in 'user_session[password]', :with => '12345'
  click_button 'Anmelden'
end
Given /^admin page$/ do
  require "net/http"
  require "uri"
  uri = URI.parse(user_action_admin_users_url)
  http = Net::HTTP.new(uri.host, uri.port)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth("username", "password")
  response = http.request(request)
end
HELP !!!
THANKS

fl00r's answer probably works, but wouldn't catch redirects.
If you want to catch redirects, you have to change this at a higher level. For some test frameworks there is a way to set the default host.
Using Capybara, we did this:
Capybara.default_host = "subdomain.yourapp.local"
Now we actually use a Rack middleware (in the test env only :) that changes env["HTTP_HOST"] transparently for the application, so we don't have to care about which test framework / browser driver we use.
What framework do you use?
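A Rack middleware of the kind described above could look roughly like the following sketch. The class name ForceHost and its constructor argument are invented for illustration; this is not the middleware the answer's author actually used.

```ruby
# Hypothetical test-only Rack middleware: makes every request look as if
# it had been sent to a fixed host, whatever the test driver used.
class ForceHost
  def initialize(app, host)
    @app = app
    @host = host
  end

  def call(env)
    name, port = @host.split(":", 2)
    env["HTTP_HOST"] = @host
    env["SERVER_NAME"] = name
    env["SERVER_PORT"] = port if port
    @app.call(env)
  end
end

# Tiny Rack-style app that echoes the host it sees, to show the effect.
echo_app = ->(env) { [200, { "content-type" => "text/plain" }, [env["HTTP_HOST"]]] }
_status, _headers, body = ForceHost.new(echo_app, "localhost:3000")
                                   .call("HTTP_HOST" => "www.example.com")
puts body.first
```

In a Rails app it would be registered for the test environment only, for example with config.middleware.use ForceHost, "localhost:3000" in config/environments/test.rb.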

uname = "myapp"
Capybara.default_host = "http://#{uname}.mywebsite.com:3000"
Capybara.server_port = 3000 # could be any of your choice
Capybara.app_host = "http://#{uname}.mywebsite.com:#{Capybara.server_port}"
Or, when you are extending ActionController::TestCase:
request.host = "www.myapp.com"

Try changing the URL helper to the path helper:
post some_admin_path, @params
or
post "http://localhost:3000#{some_admin_path}", @params
