rails too many open files using Feedzirra - ruby-on-rails

I am getting a LoadError - "Too many open files" when using Feedzirra. I am running it on my development server using the default WEBrick server.
I am parsing only 2 feeds. What is the problem?

I had the same issue with Feedzirra. You'll notice that it leaves TCP connections in the CLOSE_WAIT state forever, and each of those holds a file descriptor open, hence the error.
The problem appears to be specific to the curb gem, which Feedzirra uses to fetch feeds. Another project that depends on libcurl had the same issue and fixed it by setting the 'CURLOPT_FORBID_REUSE' option.
I tried to do the same for Feedzirra but didn't succeed. Even with this option set, the number of CLOSE_WAIT sessions kept growing and I eventually got the Too many open files error.
So I did the most straightforward thing and downloaded the feeds using Net::HTTP:
require 'net/http'
require 'uri'

# Download the feed body with Net::HTTP; returns nil on a non-2xx response.
def get_contents(furl)
  url = URI.parse(furl)
  req = Net::HTTP::Get.new(url.to_s)
  res = Net::HTTP.start(url.host, url.port) { |http|
    http.request(req)
  }
  unless res.kind_of? Net::HTTPSuccess
    puts "can't get feed #{url.to_s}: #{res.code}"
    return nil
  end
  res.body
end
Then I parse the XML with Feedzirra:
xml = get_contents(furl)
feedin = Feedzirra::Feed.parse xml
No more stuck connections and no more errors. You may also want to add better error handling to this sample code.
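As a rough sketch of what "better error handling" could look like, the variant below adds timeouts, HTTPS support, and a rescue for the usual network failures. The name fetch_feed and the timeout values are my own assumptions, not part of the original answer:

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper (not from the original answer): fetch a feed body,
# returning nil instead of raising when anything goes wrong.
def fetch_feed(furl, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(furl)
  return nil unless uri.is_a?(URI::HTTP) # URI::HTTPS is a subclass of URI::HTTP

  res = Net::HTTP.start(uri.host, uri.port,
                        use_ssl: uri.scheme == 'https',
                        open_timeout: open_timeout,
                        read_timeout: read_timeout) do |http|
    http.request(Net::HTTP::Get.new(uri.request_uri))
  end
  return res.body if res.is_a?(Net::HTTPSuccess)

  warn "can't get feed #{furl}: #{res.code}"
  nil
rescue URI::InvalidURIError, SocketError, SystemCallError, IOError,
       Net::OpenTimeout, Net::ReadTimeout, OpenSSL::SSL::SSLError => e
  warn "can't get feed #{furl}: #{e.class}: #{e.message}"
  nil
end
```

The result can be passed to Feedzirra::Feed.parse exactly as before, after checking for nil. Redirects are not followed in this sketch.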

Related

How to read POST requests in Lua?

I have this Telegram bot written in Lua that I am doing as a hobby for a language network. And I have been reading new messages via the getUpdates API call all the time. Now I want to rewrite it to use webhooks, but I have no experience with that whatsoever. I have googled but didn't find anything certain. I kinda feel that WSAPI is the library to use, but I am not sure. Moreover, I am not really sure I need any special library just for reading POST requests (which is all that the Telegram bot API uses). I tried using sockets:
local socket = require 'socket'
local server = assert(socket.bind("*", 9000))

-- Return the full data, or whatever partial data arrived before the timeout.
local function read(client, pattern, prefix)
  local data, emsg, partial = client:receive(pattern, prefix)
  if data then
    return data
  end
  if partial and #partial > 0 then
    return partial
  end
  return nil, emsg
end

while true do
  local client = server:accept()
  client:settimeout(3)
  local msg, err = read(client, '*a')
  if not err then
    print(msg)
  end
  client:close() -- always close, even when the read failed
end
The print(msg) here gives me the full POST request including headers, which I am probably able to parse (the body is supposed to always be a JSON). I am not really that familiar with HTTP requests though and I'm not sure I can just throw away everything that goes before the first {.
My setup is Lua 5.2, Ubuntu x64 16.04 and Nginx. What I need to do is to receive and read POST requests, nothing more.
TL;DR: is it okay to parse the POST request I receive from the code above or am I missing something, like a library that'd make my life easier?
Thanks!

How to parse a very huge XML file from a remote server rails

I have a very large XML from a remote server which I have to parse and get the data.
I have tried to open the file using the open() function but it is taking more than 15 minutes and still no response.
Then I tried Nokogiri::XML(open(URL)) where URL is the link which contains the data to parse.
Also, I have tried using Net::HTTP::Get but again with no fruitful results.
Can anyone suggest which gem and function can be used to parse the data?
As mentioned before, Nokogiri::XML::Reader is your friend here. The example in the documentation works fine if you have the file locally.
It is also possible to parse the data as soon as it comes in, fully streaming. This involves getting the data in chunks (e.g. using Net::HTTP) and connecting it to the Nokogiri::XML::Reader by means of an IO.pipe.
Example (adapted from this gist):
require 'nokogiri'
require 'net/http'

# setup request
uri = URI("http://example.com/articles.xml")
req = Net::HTTP::Get.new(uri.request_uri)

# read response in a separate thread using a pipe to communicate
rd, wr = IO.pipe
reader_thread = Thread.new do
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(req) do |response|
      response.read_body { |chunk| wr.write(chunk) }
    end
    wr.close
  end
end

# parse the incoming data chunk by chunk
reader = Nokogiri::XML::Reader(rd)
reader.each do |node|
  next if node.node_type != Nokogiri::XML::Reader::TYPE_ELEMENT
  next if node.name != "article"
  # now that we have the desired fragment, put it to use
  doc = Nokogiri::XML(node.outer_xml)
  puts("Got #{doc.text}")
end
rd.close

# let the reader thread finish cleanly
reader_thread.join
If you are working with large XML files, you can use the Nokogiri::XML::Reader class. I have successfully opened 1 GB files without any problems. For optimal performance, you could download the file first and then parse it locally on your server using the XML::Reader class.
The usage is something like this (replace XML_FILE with your path):
Nokogiri::XML::Reader(File.open(XML_FILE)).each do |node|
  if node.name == 'Node' && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    puts node.outer_xml # you can also do something like Nokogiri::XML(node.outer_xml).at('./Node')
  end
end
Here is the documentation: http://www.rubydoc.info/github/sparklemotion/nokogiri/master/Nokogiri/XML/Reader
Hope it helps

Rails net:http PUT request - curl - Tika server

I have a remote TIKA server set up and I'm trying to use it from within a RoR application. I need to pull a file from a remote location and send it on to the Tika server. The wiki for TikaJAXRS gives an example using curl, but I have not been able to get that to work. What does work is this:
curl https://mydomain.s3.amazonaws.com/uploads/testdocument.docx | curl -v -i -X PUT -T - ec2...154.uswest2.compute.amazonaws.com:9998/tika
How do I render this in my Rails app using net::http? I've successfully written a GET request with net::http to the Tika server from the Rails app and gotten back the expected result, but the documentation on PUT is a bit sparse. (The server does require a PUT rather than POST.)
BTW, if anyone knows how to make that last example in that wiki work and render it in net::http, that would be even better!
Addendum:
Here's what I have in the RoR app that doesn't work:
ENDPOINT = "http://ec2...154.us-west-2.compute.amazonaws.com:9998"
file = "https://mydomain.s3.amazonaws.com/uploads/testdocument.docx"
uri = URI.parse(ENDPOINT)
@http = Net::HTTP.new(uri.host, uri.port)
request = Net::HTTP::Put.new("/tika")
request.body = URI.parse(file).read
@response = @http.request(request)
and I get back a code 415
I need to know how to change this code to do what the curl commands (curl remote_file piped to curl PUT) are doing successfully.
Update
After a couple of days of fruitless attempts on this, I have a workaround:
gem 'curb'
@response = Curl.put("http://ec2...154.us-west-2.compute.amazonaws.com:9998/tika",
                     Curl.get("https://mydomain.s3.amazonaws.com/uploads/testdocument.docx").body_str)
While this does provide a solution to my immediate problem, I still want to know how to implement this same functionality more directly by using Net::HTTP.
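For what it's worth, here is a minimal Net::HTTP sketch of the same "download, then PUT" flow. It rests on an assumption the thread does not confirm: that the 415 comes from the Content-Type header, since Net::HTTP in Ruby versions of that era defaulted a request body to application/x-www-form-urlencoded, whereas curl -T sends none. The helper names and the application/octet-stream default are mine:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a PUT request that carries raw document bytes
# with an explicit Content-Type instead of relying on Net::HTTP's default.
def build_tika_put(path, bytes, content_type: 'application/octet-stream')
  req = Net::HTTP::Put.new(path)
  req['Content-Type'] = content_type
  req.body = bytes
  req
end

# Hypothetical helper: fetch the remote file, then PUT it to the Tika server.
# tika_base is the server root (scheme, host, port); "/tika" is the path
# used in the curl command above. Redirects on the download are not followed.
def extract_text(tika_base, source_url)
  bytes = Net::HTTP.get(URI(source_url))
  tika = URI(tika_base)
  Net::HTTP.start(tika.host, tika.port) do |http|
    http.request(build_tika_put('/tika', bytes))
  end
end
```

Calling extract_text(ENDPOINT, file) returns the Net::HTTPResponse, so the code and body can be inspected. If the server still answers 415, passing the document's real MIME type as content_type would be the next thing to try.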

Trying to connect to a "digest authentication" webservice using HTTParty or Net:HTTP (or etc)

I have been trying to connect to a web service that is using digest authentication.
I am able to connect in Safari using user:password@service.site.com/endpoint
I have tried to connect from Ruby and Rails using HTTParty and Net::HTTP with the "basic_auth" options, but have not had any luck.
Is the HTTParty/Net::HTTP "basic_auth" option simply incompatible with a "digest auth" service?
If so, is there another way that I might connect?
HTTParty basic auth is apparently not compatible with digest_auth. I found this Net::HTTP extension: https://codesnippets.joyent.com/posts/show/1075 and am writing a method to handle this, with the help of the Crack gem http://github.com/jnunemaker/crack:
def self.decode vin
  url = URI.parse(APP_CONFIG[:vinlink_url])
  Net::HTTP.start(url.host) do |http|
    res = http.head(url.request_uri)
    req = Net::HTTP::Get.new("/report?type=basic&vin=#{vin}")
    req.digest_auth(APP_CONFIG[:vinlink_login], APP_CONFIG[:vinlink_password], res)
    @response = http.request(req)
  end
  if @response.code == "200"
    hash = Crack::XML.parse(@response.body).recursive_downcase_keys!.recursive_symbolize_keys!
  end
end
I wasn't able to reach the codesnippets link given above today, but the code is also available here: https://gist.github.com/73102. I've used it successfully for digest authentication, but ran into problems with multiple requests, getting 'Stale client nonce' errors. I resolved that by generating a new nonce within the digest_auth function each time it was called. I didn't find much on that when I looked, so I hope this helps someone.
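To make the nonce fix concrete, here is a standalone sketch of the RFC 2617 digest calculation (MD5, qop=auth) that draws a fresh client nonce on every call. It is not the code from the gist; the method names and keyword arguments are made up for illustration, and only the standard library is used:

```ruby
require 'digest/md5'
require 'securerandom'

# RFC 2617 "response" value for qop=auth with the MD5 algorithm.
def digest_response(username:, realm:, password:, http_method:, uri:,
                    nonce:, nc:, cnonce:)
  ha1 = Digest::MD5.hexdigest("#{username}:#{realm}:#{password}")
  ha2 = Digest::MD5.hexdigest("#{http_method}:#{uri}")
  Digest::MD5.hexdigest("#{ha1}:#{nonce}:#{nc}:#{cnonce}:auth:#{ha2}")
end

# Hypothetical helper: build the Authorization header value. The cnonce
# default is evaluated on each call, so every request gets a new one.
def digest_authorization(username:, realm:, password:, http_method:, uri:,
                         nonce:, nc: '00000001', cnonce: SecureRandom.hex(8))
  response = digest_response(username: username, realm: realm,
                             password: password, http_method: http_method,
                             uri: uri, nonce: nonce, nc: nc, cnonce: cnonce)
  %(Digest username="#{username}", realm="#{realm}", nonce="#{nonce}", ) +
    %(uri="#{uri}", qop=auth, nc=#{nc}, cnonce="#{cnonce}", response="#{response}")
end
```

The realm and nonce come from the server's WWW-Authenticate challenge (the HEAD request in the method above); the returned string would go into the request's Authorization header.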

What's the best way to use SOAP with Ruby?

A client of mine has asked me to integrate a 3rd party API into their Rails app. The only problem is that the API uses SOAP. Ruby has basically dropped SOAP in favor of REST. They provide a Java adapter that apparently works with the Java-Ruby bridge, but we'd like to keep it all in Ruby, if possible. I looked into soap4r, but it seems to have a slightly bad reputation.
So what's the best way to integrate SOAP calls into a Rails app?
I built Savon to make interacting with SOAP webservices via Ruby as easy as possible.
I'd recommend you check it out.
We used the built in soap/wsdlDriver class, which is actually SOAP4R.
It's dog slow, but really simple. The SOAP4R that you get from gems/etc is just an updated version of the same thing.
Example code:
require 'soap/wsdlDriver'
client = SOAP::WSDLDriverFactory.new( 'http://example.com/service.wsdl' ).create_rpc_driver
result = client.doStuff();
That's about it
We switched from Handsoap to Savon.
Here is a series of blog posts comparing the two client libraries.
I also recommend Savon. I spent too many hours trying to deal with Soap4R, without results. Big lack of functionality, no doc.
Savon is the answer for me.
Try SOAP4R
SOAP4R
Getting Started with SOAP4R
And I just heard about this on the Rails Envy Podcast (ep 31):
WS-Deathstar SOAP walkthrough
Just got my stuff working within 3 hours using Savon.
The Getting Started documentation on Savon's homepage was really easy to follow - and actually matched what I was seeing (not always the case)
Kent Sibilev from Datanoise had also ported the Rails ActionWebService library to Rails 2.1 (and above).
This allows you to expose your own Ruby-based SOAP services.
He even has a scaffold/test mode which allows you to test your services using a browser.
I have used a plain HTTP call like the one below to call a SOAP method:
require 'net/http'

class MyHelper
  def initialize(server, port, username, password)
    @server = server
    @port = port
    @username = username
    @password = password
    puts "Initialised My Helper using #{@server}:#{@port} username=#{@username}"
  end

  def post_job(job_name)
    puts "Posting job #{job_name} to update order service"
    job_xml ="<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\" xmlns:ns=\"http://test.com/Test/CreateUpdateOrders/1.0\">
<soapenv:Header/>
<soapenv:Body>
<ns:CreateTestUpdateOrdersReq>
<ContractGroup>ITE2</ContractGroup>
<ProductID>topo</ProductID>
<PublicationReference>#{job_name}</PublicationReference>
</ns:CreateTestUpdateOrdersReq>
</soapenv:Body>
</soapenv:Envelope>"
    @http = Net::HTTP.new(@server, @port)
    puts "server: " + @server + "port : " + @port
    request = Net::HTTP::Post.new(('/XISOAPAdapter/MessageServlet?/Test/CreateUpdateOrders/1.0'), initheader = {'Content-Type' => 'text/xml'})
    request.basic_auth(@username, @password)
    request.body = job_xml
    response = @http.request(request)
    puts "request was made to server " + @server
    validate_response(response, "post_job_to_pega_updateorder job", '200')
  end

  private

  def validate_response(response, operation, required_code)
    if response.code != required_code
      raise "#{operation} operation failed. Response was [#{response.inspect} #{response.to_hash.inspect} #{response.body}]"
    end
  end
end

# Example usage:
# test = MyHelper.new("mysvr.test.test.com","8102","myusername","mypassword")
# test.post_job("test_201601281419")
Hope it helps. Cheers.
I have used SOAP in Ruby when I had to make a fake SOAP server for my acceptance tests. I don't know if this was the best way to approach the problem, but it worked for me.
I used the Sinatra gem for the server (I wrote about creating mocking endpoints with Sinatra here) and Nokogiri for the XML handling (SOAP works with XML).
To begin with, I created two files (e.g. config.rb and responses.rb) in which I put the predefined answers that the SOAP server will return.
In config.rb I put the WSDL file, but as a string.
@@wsdl = '<wsdl:definitions name="StockQuote"
targetNamespace="http://example.com/stockquote.wsdl"
xmlns:tns="http://example.com/stockquote.wsdl"
xmlns:xsd1="http://example.com/stockquote.xsd"
xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
xmlns="http://schemas.xmlsoap.org/wsdl/">
.......
</wsdl:definitions>'
In responses.rb I put sample responses that the SOAP server will return for different scenarios.
@@login_failure = '<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/">
<s:Body>
<LoginResponse xmlns="http://tempuri.org/">
<LoginResult xmlns:a="http://schemas.datacontract.org/2004/07/WEBMethodsObjects" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<a:Error>Invalid username and password</a:Error>
<a:ObjectInformation i:nil="true"/>
<a:Response>false</a:Response>
</LoginResult>
</LoginResponse>
</s:Body>
</s:Envelope>'
So now let me show you how I have actually created the server.
require 'sinatra'
require 'json'
require 'nokogiri'
require_relative 'config/config.rb'
require_relative 'config/responses.rb'

after do
  # cors
  headers({
    "Access-Control-Allow-Origin" => "*",
    "Access-Control-Allow-Methods" => "POST",
    "Access-Control-Allow-Headers" => "content-type",
  })
  # json
  content_type :json
end

# When accessing the /HAWebMethods route, the server returns either the WSDL file or an XSD
# (I don't know exactly how to explain this, but it is a WSDL dependency).
get "/HAWebMethods/" do
  case request.query_string
  when 'xsd=xsd0'
    status 200
    body = @@xsd0
  when 'wsdl'
    status 200
    body = @@wsdl
  end
end

post '/HAWebMethods/soap' do
  request_payload = request.body.read
  request_payload = Nokogiri::XML request_payload
  request_payload.remove_namespaces!
  if request_payload.css('Body').text != ''
    if request_payload.css('Login').text != ''
      # 'some username' and 'some password' are placeholders for the expected credentials
      if request_payload.css('email').text == 'some username' && request_payload.css('password').text == 'some password'
        status 200
        body = @@login_success
      else
        status 200
        body = @@login_failure
      end
    end
  end
end
I hope you'll find this helpful!
I was having the same issue, switched to Savon and then just tested it on an open WSDL (I used http://www.webservicex.net/geoipservice.asmx?WSDL) and so far so good!
https://github.com/savonrb/savon
