Signal to keep connection open - ruby-on-rails

I'm currently wondering how to tell my Rails app not to close a connection, depending on some data.
Let's imagine I play a very long piece of music, say 50 minutes. When I start playing it, I also start to stream (preload) the second one without playing it.
When the first track ends, the second will fail at the end of whatever it was able to pre-download, because no new bytes were being downloaded and the server treats the request as failed (timeout).
Of course I don't want to increase the timeout. Increasing a timeout can do more harm than good.
I was wondering how to send something like a ping so that this stream request is not considered failed.
Here is my Rails code:
send_data file.read,
          :status => status_code,
          :stream => 'true',
          :disposition => 'inline'

You are reinventing the wheel. You need to include ActionController::Live to enable streaming in Rails. This will solve your timeout problem, but remember that you must close all streams manually.
Here is an example of how to use that module:
class StreamingController < ApplicationController
  include ActionController::Live

  def send_something
    response.headers['Content-Type'] = 'text/event-stream'
    10.times do
      response.stream.write "This message will be repeated 10 times with a delay of 1 second.\n"
      sleep 1
    end
  ensure
    # Always close the stream, even if a write fails part-way through.
    response.stream.close
  end
end
See the ActionController::Live documentation page.
SSE (server-sent events) might also be useful for you; check it out too.
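For the file in the question, the same module lets the action copy the file to the response in pieces instead of calling send_data file.read, which loads everything at once. Below is a minimal sketch under stated assumptions: stream_in_chunks and the 16 KB chunk size are hypothetical, and StringIO stands in for response.stream, which offers the same write and close methods.

```ruby
require 'stringio'

# Copy an IO to a stream in fixed-size chunks instead of reading it whole.
# `stream` only needs #write and #close (hypothetical helper, not a Rails API).
def stream_in_chunks(io, stream, chunk_size: 16 * 1024)
  bytes_sent = 0
  # IO#read(n) returns nil at end of file, which ends the loop.
  while (chunk = io.read(chunk_size))
    stream.write(chunk)
    bytes_sent += chunk.bytesize
  end
  bytes_sent
ensure
  # Live streams must be closed by hand, as the answer above points out.
  stream.close
end

# Stand-ins for the audio file and for response.stream.
out = StringIO.new
stream_in_chunks(StringIO.new('x' * 50_000), out)
puts out.string.bytesize
```

Inside a Live action, the call would presumably look like File.open(path, 'rb') { |f| stream_in_chunks(f, response.stream) }. Whether this alone keeps a paused preload alive depends on the timeouts of the server and any proxy in front of it.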

Related

Ruby memory (allocations) spikes when handling base64 strings

I have a Rails instance which on average uses about 250MB of memory. Lately I'm having issues with some really heavy spikes in memory usage, which result in a response time of about 25s. I have an endpoint which takes some relatively simple params and base64 strings, which are sent over to AWS.
See the image below for the correlation between memory/response time.
Now, when I look at some extra logs what's specifically happening during that time, I found something interesting.
First of all, I find the net_http memory allocations extremely high. Secondly, the update operation took about 25 sec in total. When I look closely at the timeline, I notice some "blank gaps" between ~5 and ~15 seconds. The specific operations being done during those HTTP calls are, from my perspective, nothing special. But I'm a bit confused about why those gaps occur; maybe someone could tell me a bit about that?
The code that's handling the requests:
def store_documents
  identity_documents.each do |side, content|
    is_file = content.is_a?(ActionDispatch::Http::UploadedFile)
    file_extension = is_file ? content : content[:name]
    file_name = "#{SecureRandom.uuid}_#{side}#{File.extname(file_extension)}"
    if is_file
      write_to_storage_service(file_name, content.tempfile.path)
    else
      write_file(file_name, content[:uri])
      write_to_storage_service(file_name, file_name)
      delete_file(file_name)
    end
    store_object_key_on_profile(side, file_name)
  end
end
# rubocop:enable Metrics/MethodLength

def write_file(file_name, base_64_string)
  File.open(file_name, 'wb') do |f|
    f.write(
      Base64.decode64(
        get_content(base_64_string)
      )
    )
  end
end

def delete_file(file_name)
  File.delete(file_name)
end

def write_to_storage_service(file_name, path)
  S3_IDENTITY_BUCKET
    .object(file_name)
    .upload_file(path)
rescue Aws::Xml::Parser::ParsingError => e
  log_error(e)
  add_errors(base: e)
end

def get_content(base_64_string)
  base_64_string.sub %r{data:((image|application)/.{3,}),}, ''
end

def store_object_key_on_profile(side, file_name)
  profile.update("#{side}_identity_document_object_key": file_name)
end

def identity_documents
  {
    front: front_identity_document,
    back: back_identity_document
  }
end

def front_identity_document
  @front_identity_document ||= identity_check_params[:front_identity_document]
end

def back_identity_document
  @back_identity_document ||= identity_check_params[:back_identity_document]
end
I suspect some issue with Ruby's GC, or perhaps Ruby doesn't have enough pages available to store the base64 string directly in memory? I know that Ruby 2.6 and Ruby 2.7 brought some large improvements regarding memory fragmentation, but upgrading didn't change much either (currently running Ruby 2.7.1).
I have my Heroku resources configured to use Standard-2x dynos (1GB ram) x3. WEB_CONCURRENCY(workers) is set to 2, and amount of threads is set to 5.
I understand that my questions are rather broad, I'm more interested in some tooling, or ideas that could help to narrow my scope. Thanks!

How can I get around Heroku's HTTP 30 second limit?

I inherited a rails app that is deployed using Heroku (I think). I edit it on AWS's Cloud9 IDE and, for now, just do everything in development mode. The app's purpose is to process large amounts of survey data and spit it out onto a PDF report. This works for small reports with like 10 rows of data, but when I load a report that is querying a data upload of 5000+ rows to create an HTML page which gets converted to a PDF, it takes around 105 seconds, much longer than Heroku's 30 seconds allotted for HTTP requests.
Heroku says this on their website, which gave me some hope:
"Heroku supports HTTP 1.1 features such as long-polling and streaming responses. An application has an initial 30 second window to respond with a single byte back to the client. However, each byte transmitted thereafter (either received from the client or sent by your application) resets a rolling 55 second window. If no data is sent during the 55 second window, the connection will be terminated." (Source: https://devcenter.heroku.com/articles/request-timeout#long-polling-and-streaming-responses)
This sounds excellent to me - I can just send a request to the client every second or so in a loop until we're done creating the large PDF report. However, I don't know how to send or receive a byte or so to "reset the rolling 55 second window" they're talking about.
Here's the part of my controller that is sending the request.
return render pdf: pdf_name + " " + pdf_year.to_s,
  disposition: 'attachment',
  page_height: 1300,
  encoding: 'utf8',
  page_size: 'A4',
  footer: {html: {template: 'recent_grad/footer.html.erb'}, spacing: 0 },
  margin: { top: 10, # default 10 (mm)
            bottom: 20,
            left: 10,
            right: 10 },
  template: "recent_grad/report.html.erb",
  locals: {start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions}
I'm making other requests to get to this point, but I believe the part that is causing the issue is here where the template is being rendered. My template queries the database in a finite loop that stops when it runs out of survey questions to query from.
My question is this: how can I "send or receive a byte to the client" to tell Heroku "I'm still trying to create this massive PDF so please reset the timer and give me my 55 seconds!" Is it in the form of a query? Because, if so, I am querying the MySql database over and over again in my report.html.erb file.
Also, it used to work without issues and does work on small reports, but now I get the error "504 Gateway Timeout" before the request is complete on the actual page, but my puma console continues to query the database like a mad man. I assume it's a Heroku problem because the 504 error happens exactly every 35 seconds (5 seconds to process the other parts and 30 seconds to try to finish the loop in the template so it can render correctly).
If you need more information or code, please ask! Thanks in advance
EDIT:
Both of the comments below suggest possible duplicates, but neither of them have a real answer with real code, they simply refer to the docs that I am quoting here. I'm looking for a code example (or at least a way to get my foot in the door), not just a link to the docs. Thanks!
EDIT 2:
I tried what @Sergio said and installed Sidekiq. I think I'm really close, but I'm still having some issues with the worker. The worker doesn't have access to ActionView::Base, which is required for the render method in Rails, so it's not working. I can reach the worker method, which means my Sidekiq and Redis servers are running correctly, but it gets caught on the ActionView line with this error:
WARN: NameError: uninitialized constant HardWorker::ActionView
Here's the worker code:
require 'sidekiq'

Sidekiq.configure_client do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end

Sidekiq.configure_server do |config|
  # config.redis = { db: 1 }
  config.redis = { url: 'redis://172.31.6.51:6379/0' }
end

class HardWorker
  include Sidekiq::Worker

  def perform(pdf_name, pdf_year)
    av = ActionView::Base.new()
    av.view_paths = ActionController::Base.view_paths
    av.class_eval do
      include Rails.application.routes.url_helpers
      include ApplicationHelper
    end
    puts "inside hardworker"
    puts pdf_name, pdf_year
    av.render pdf: pdf_name + " " + pdf_year.to_s,
      disposition: 'attachment',
      page_height: 1300,
      encoding: 'utf8',
      page_size: 'A4',
      footer: {html: {template: 'recent_grad/footer.html.erb'}, spacing: 0 },
      margin: { top: 10, # default 10 (mm)
                bottom: 20,
                left: 10,
                right: 10 },
      template: "recent_grad/report.html.erb",
      locals: {start: @start, survey: @survey, years: @years, college: @college, department: @department, program: @program, emphasis: @emphasis, questions: @questions}
  end
end
Any suggestions?
EDIT 3:
I did what @Sergio said and attempted to make a PDF from an html.erb file directly and save it to a file. Here's my code:
# /app/controllers/recentgrad_controller.rb
pdf = WickedPdf.new.pdf_from_html_file('home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb')
save_path = Rails.root.join('pdfs', pdf_name + pdf_year.to_s + '.pdf')
File.open(save_path, 'wb') do |file|
  file << pdf
end
And the error output:
RuntimeError (Failed to execute:
["/usr/local/rvm/gems/ruby-2.4.1@gradSurvey/bin/wkhtmltopdf", "file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb", "/tmp/wicked_pdf_generated_file20190523-15416-hvb3zg.pdf"]
Error: PDF could not be generated!
Command Error: Loading pages (1/6)
Error: Failed loading page file:///home/ec2-user/environment/gradSurvey/gradSurvey/app/views/recent_grad/report.html.erb (sometimes it will work just to ignore this error with --load-error-handling ignore)
Exit with code 1 due to network error: ContentNotFoundError
):
I have no idea what it means when it says "sometimes it will work just to ignore this error with --load-error-handling ignore". The file definitely exists and I've tried maybe 5 variations of the file path.

I've had to do something like this several times. In all cases, I ended up writing a background job that does all the heavy lifting. Because it's not a web request, it's not affected by the 30-second timeout. It goes something like this:
1. The client (your JavaScript code) requests a new report.
2. The server generates a job description and enqueues it for your worker to pick up.
3. The worker picks the job from the queue and starts working (querying the database, etc.).
4. In the meantime, the client periodically asks the server "is my report done yet?". The server responds with "not yet, try again later".
5. The worker finishes generating the report. It uploads the file to some storage (S3, for example), sets the job status to "completed" and the job result to the download link for the uploaded report file.
6. The server, seeing that the job is completed, can now respond to the client's status requests with "yes, it's done now. Here's the URL. Have a good day."
Everybody's happy. And nobody had to do any streaming or playing with heroku's rolling response timeouts.
The scenario above uses short-polling. I find it the easiest to implement. But it is, of course, a bit wasteful with regard to resources. You can use long-polling or websockets or other fancy things.
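The short-polling flow described above can be sketched in plain Ruby. This is only an illustration under stated assumptions: ReportJobs is a hypothetical in-memory stand-in for a jobs table, a Thread stands in for a real worker process such as Sidekiq, and the URL is a placeholder for wherever the finished report would be uploaded.

```ruby
require 'securerandom'

# Hypothetical in-memory job registry; a real app would keep this in the
# database or Redis so that web and worker processes can both see it.
class ReportJobs
  def initialize
    @jobs = {}
    @lock = Mutex.new
  end

  # The server records the job, then a worker (here just a thread) runs it.
  def enqueue(&work)
    id = SecureRandom.uuid
    @lock.synchronize { @jobs[id] = { status: 'queued', result: nil } }
    worker = Thread.new do
      begin
        update(id, status: 'working')
        update(id, status: 'completed', result: work.call)
      rescue StandardError => e
        update(id, status: 'failed', result: e.message)
      end
    end
    [id, worker]
  end

  # What a status endpoint would return each time the client polls.
  def status(id)
    @lock.synchronize { @jobs.fetch(id).dup }
  end

  private

  def update(id, attrs)
    @lock.synchronize { @jobs[id].merge!(attrs) }
  end
end

jobs = ReportJobs.new
id, worker = jobs.enqueue { 'https://example.com/reports/placeholder.pdf' }
worker.join # a real client would poll jobs.status(id) instead of joining
p jobs.status(id)
```

The web request that enqueues the job returns immediately with the id, so no single request comes anywhere near the 30-second limit.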

Check my response here just in case it works for you. I didn't want to change the user workflow by adding a background job and then a place/notification to get the result.
I use Rails controller streaming support with the Live module and set the right response headers. I fetch the data from some Enumerable object.
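One way to picture the "send a byte to reset the window" idea with a Live stream is a heartbeat: run the slow work in a thread and write a filler byte at a fixed interval until it finishes. The sketch below is hedged accordingly: with_heartbeat is a hypothetical helper, StringIO stands in for response.stream, and the filler only suits response formats where leading padding is harmless (it would end up in front of a PDF body, for example).

```ruby
require 'stringio'

# Run the block in a background thread and write `beat` to `stream` every
# `interval` seconds until the block finishes; returns the block's result.
# (Hypothetical helper, not part of Rails.)
def with_heartbeat(stream, interval: 10, beat: ' ')
  worker = Thread.new { yield }
  # Thread#join(timeout) returns nil while the thread is still running.
  stream.write(beat) until worker.join(interval)
  worker.value
end

out = StringIO.new # stand-in for response.stream
result = with_heartbeat(out, interval: 0.05) do
  sleep 0.2 # stand-in for the slow report generation
  :report_ready
end
p result
p out.string.length
```

On Heroku the interval would need to stay under the 55-second rolling window quoted in the question, with the first write inside the initial 30 seconds.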

How to send binary file over Web Sockets with Rails

I have a Rails application where users upload Audio files. I want to send them to a third party server, and I need to connect to the external server using Web sockets, so, I need my Rails application to be a websocket client.
I'm trying to figure out how to properly set that up. I'm not committed to any gem just yet, but the 'faye-websocket' gem looks promising. I even found a similar answer in "Sending large file in websocket before timeout", however, using that code doesn't work for me.
Here is an example of my code:
@message = Array.new
EM.run {
  ws = Faye::WebSocket::Client.new("wss://example_url.com")
  ws.on :open do |event|
    File.open('path/to/audio_file.wav','rb') do |f|
      ws.send(f.gets)
    end
  end
  ws.on :message do |event|
    @message << [event.data]
  end
  ws.on :close do |event|
    ws = nil
    EM.stop
  end
}
When I use that, I get an error from the recipient server:
No JSON object could be decoded
This makes sense, because I don't believe it's properly formatted for faye-websocket. Their documentation says:
send(message) accepts either a String or an Array of byte-sized
integers and sends a text or binary message over the connection to the
other peer; binary data must be encoded as an Array.
I'm not sure how to accomplish that. How do I load binary into an array of integers with Ruby?
I tried modifying the send command to use the bytes method:
File.open('path/to/audio_file.wav','rb') do |f|
  ws.send(f.gets.bytes)
end
But now I receive this error:
Stream was 19 bytes but needs to be at least 100 bytes
I know my file is 286KB, so something is wrong here. I get confused as to when to use File.read vs File.open vs. File.new.
Also, maybe this gem isn't the best for sending binary data. Does anyone have success sending binary files in Rails with websockets?
Update: I did find a way to get this working, but it is terrible for memory. For other people who want to load small files, you can simply use File.binread and the unpack method:
ws.on :open do |event|
  f = File.binread 'path/to/audio_file.wav'
  ws.send(f.unpack('C*'))
end
However, if I use that same code on a mere 100MB file, the server runs out of memory. It depletes the entire available 1.5GB on my test server! Does anyone know how to do this in a memory-safe manner?

Here's my take on it:
# do only once when initializing Rails:
require 'iodine/client'
Iodine.force_start!

# this sets the callbacks.
# on_message is always required by Iodine.
options = {}
options[:on_message] = Proc.new do |data|
  # this will never get called
  puts "incoming data ignored? for:\n#{data}"
end
options[:on_open] = Proc.new do
  # believe it or not - this variable belongs to the websocket connection.
  @started_upload = true
  # set a task to send the file,
  # so the on_open initialization doesn't block incoming messages.
  Iodine.run do
    # read the file and write to the websocket.
    File.open('filename','r') do |f|
      buffer = String.new # recycle the String's allocated memory
      write f.read(65_536, buffer) until f.eof?
      @started_upload = :done
    end
    # close the connection
    close
  end
end
options[:on_close] = Proc.new do |data|
  # can we notify the user that the file was uploaded?
  if @started_upload == :done
    # we did it :-)
  else
    # what happened?
  end
end
# will not wait for a connection:
Iodine::Http.ws_connect "wss://example_url.com", options
# OR
# will wait for a connection, raising errors if failed.
Iodine::Http::WebsocketClient.connect "wss://example_url.com", options
It's only fair to mention that I'm Iodine's author, which I wrote for use in Plezi (a RESTful Websocket real time application framework you can use stand alone or within Rails)... I'm super biased ;-)
I would avoid gets because its size could include the whole file or a single byte, depending on the location of the next End Of Line (EOL) marker... read gives you better control over each chunk's size.
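The difference between gets and read is easy to see without any websocket library. The sketch below reads a file in fixed-size binary chunks while reusing one buffer, so memory stays near the chunk size regardless of file size. each_binary_chunk is a hypothetical helper, the 64 KB size mirrors the answer above, and the Tempfile stands in for the audio file.

```ruby
require 'tempfile'

# Yield a file's contents in binary chunks of at most chunk_size bytes.
# The same String is reused for every chunk, so callers must consume or
# copy it before the next iteration. (Hypothetical helper.)
def each_binary_chunk(path, chunk_size: 65_536)
  File.open(path, 'rb') do |f|
    buffer = String.new
    # IO#read(length, buffer) fills the buffer and returns nil at end of file.
    yield buffer while f.read(chunk_size, buffer)
  end
end

Tempfile.create('audio') do |tmp|
  tmp.binmode
  tmp.write('ab' * 100_000) # 200,000 bytes of stand-in data
  tmp.flush
  sizes = []
  each_binary_chunk(tmp.path) { |chunk| sizes << chunk.bytesize }
  p sizes
end
```

With faye-websocket, each chunk could presumably be sent as ws.send(chunk.bytes), since the quoted documentation asks for an Array of byte-sized integers. Whether the receiving server accepts one file split across several messages is an assumption that would need checking against its protocol.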

Ruby timeout does not work in Rails?

I'm having an issue trying to get a timeout when connecting via TCPSocket to a remote resource that isn't available. It just hangs indefinitely without timing out. Ideally I'd want it to try reconnect every 2 minutes or so, but the TCPSocket.new call seems to block. I've tried using timeout() but that doesn't do anything either. Trying the same call in an IRB instance works perfectly fine, but when it's in Rails, it fails. Anyone have a work around for this?
My code looks something as follows:
def self.connect!
  @@connection = TCPSocket.new IP, 4449
end

def self.send(cmd)
  puts "send "
  unless @@connection
    self.connect!
  end
  loop do
    begin
      @@connection.puts(cmd)
      return
    rescue IOError
      sleep(self.get_reconnect_delay)
      self.connect!
    end
  end
end

Unfortunately, there is currently no way to set timeouts on TCPSocket directly.
See http://bugs.ruby-lang.org/issues/5101 for the feature request. You will have to use the basic Socket class and set socket options.
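As a hedged sketch of the Socket-based route: Ruby's standard library has Socket.tcp, which accepts a connect_timeout option (available since Ruby 2.0, as far as I know). connect_with_timeout is a hypothetical wrapper, and the local TCPServer exists only to make the example runnable.

```ruby
require 'socket'

# Open a TCP connection, giving up after `timeout` seconds.
# Returns the socket, or nil if the connection timed out, was refused,
# or the host was unreachable. (Hypothetical wrapper around Socket.tcp.)
def connect_with_timeout(host, port, timeout: 5)
  Socket.tcp(host, port, connect_timeout: timeout)
rescue Errno::ETIMEDOUT, Errno::ECONNREFUSED, Errno::EHOSTUNREACH
  nil
end

# Local stand-in for the remote resource.
server = TCPServer.new('127.0.0.1', 0)
socket = connect_with_timeout('127.0.0.1', server.addr[1], timeout: 2)
puts socket ? 'connected' : 'unavailable'
socket&.close
server.close
```

In the question's connect! method, a nil result would be the cue to sleep for the reconnect delay and try again. Newer Ruby versions (3.0 and later, I believe) also accept connect_timeout: on TCPSocket.new itself.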

Reading from TCPSocket is slow in Ruby / Rails

I have this simple piece of code that writes to a socket, then reads the response from the server. The server is very fast (responds within 5ms every time). However, while writing to the socket is quick -- reading the response from the socket is always MUCH slower. Any clues?
module UriTester
  module UriInfo
    class << self
      def send_receive(socket, xml)
        # socket = TCPSocket.open("service.server.com","2316")
        begin
          start = Time.now
          socket.print(xml) # Send request
          puts "just printed the xml into socket #{Time.now - start}"
        rescue Errno::ECONNRESET
          puts "looks like there is an issue!!!"
          socket = TCPSocket.open("service.server.com","2316")
          socket.print(xml) # Send request
        end
        response=""
        while (line =socket.recv(1024))
          response += line
          break unless line.grep(/<\/bcap>/).empty?
        end
        puts "SEND_RECEIVE COMPLETED. IN #{Time.now - start}"
        # socket.close
        response
      end
    end
  end
end
Thanks!

Writing to the socket will always be way faster than reading in this case, because the write is local to the machine and the read has to wait for a response to come over the network.
In more detail, when the call to write / send returns, all the system is telling you is that N bytes have been successfully copied to the socket's kernel-space buffer. It does not mean that the data has actually been sent across the network yet. In fact, the data can sit in the socket buffer for quite a long time (assuming you're using TCP). This is because of something called the Nagle algorithm, which is intended to make efficient use of network bandwidth. This unseen Nagle delay adds to the round-trip time and to how long it takes until you get a response. The server might also delay its response for the same reason, adding even more to the response time.
So when you time the write and it returns quickly, that doesn't actually mean anything useful.
As mentioned earlier the read from the socket will be way longer since when you time the read you are actually timing the round trip travel time plus server response time, which is always going to be way slower than the time it takes to copy data from a user space program to a kernel buffer.
What is it that you are actually trying to measure and why?
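If the Nagle delay described above is the suspect, one experiment is to switch it off on the client socket with the TCP_NODELAY option and time the round trip again. This is a minimal sketch: disable_nagle is a hypothetical helper, and the local TCPServer only stands in for the real service.

```ruby
require 'socket'

# Disable Nagle's algorithm on a TCP socket so small writes are handed to
# the network immediately instead of being held back for coalescing.
# (Hypothetical helper around the standard setsockopt call.)
def disable_nagle(socket)
  socket.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)
  socket
end

# Local stand-in for service.server.com.
server = TCPServer.new('127.0.0.1', 0)
client = disable_nagle(TCPSocket.new('127.0.0.1', server.addr[1]))
p client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY).bool
client.close
server.close
```

This only affects the client's outgoing writes. Any delay the server applies to its own response is outside the client's control, so the measured time may not change much.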
