I am quite new to Ruby/Rails. I am building a service that makes an API available to users and ends up with some files created in the local filesystem, without any need to connect to a database. Then, once every few hours, I want to run a piece of Ruby code that takes these local files, uploads them to Amazon S3 and registers their location in a Postgres database.
Right now both pieces of code live together in the same project. I am observing that every time a user does something, the system connects to the database. I have seen this answer, which recommends eliminating all traces of ActiveRecord from the code, but given that I want my background bookkeeping process to connect to the database, I am stuck on what to do.
Is it possible to define two different profiles (one with a database and one without) and specify which profile a certain function call should run under? Would this work?
I'm a bit confused by this: the app does not magically connect to the database on every request just for kicks; it does so because a specific request requires it. Generally through ActiveRecord, but not exclusively.
If your system is connecting every time you make a request, that implies you have some sort of user-metric or authorisation-based code in there. Just killing off the database will cause this code to fail, and you'll likely have to find it anyway to get your system working. I'd advise locating it.
Things to look for are before_filters in controllers, or database-backed session management, for example. Or look at what is in the logs - the query should appear - and that will tell you what is being loaded, modified or whatnot.
It might even work to stop your database just before performing a user activity, and see where the error leads you. Rinse and repeat until the user activity works without the database.
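If you'd rather not stop the database, a quick way to surface those queries is to scan the Rails log for SQL. A minimal sketch, assuming the default log location (`log/development.log`); the helper name is mine, not from the original:

```ruby
# Return the log lines that contain SQL statements - each one points at
# code that touched the database during a request.
def sql_lines(logfile)
  File.foreach(logfile).grep(/\b(SELECT|INSERT|UPDATE|DELETE)\b/)
end
```

Run it after exercising the user-facing endpoints; any hits identify the before_filters or session code to hunt down.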
Is it possible to access the information being written to a Rails log file without reading the log file? To be clear, I do not want to ship the log file as a batch process; rather, every event that is written to the log file, I also want to send as a background job to a separate database.
I have multiple apps running in Docker containers and wish to save the log entries of each into a shared telemetry database running on the server. Currently the logs are formatted with lograge, but I have not figured out how to access this information directly and send it to a background job to be processed (as stated before, I would like direct access to the data being written to the log, and to send that via a background job).
I am aware of Rails.logger.instance_variable_get(:@logger); however, what I am looking for is the actual data being saved to the logs, so I can ship it to a database.
The reasoning behind this is that there are multiple Rails APIs running in Docker containers. I have an after_action set up to run a background job that I hoped would send just the individual log entry, but this is where I am stuck. Sizing isn't an issue, as the data stored in this database is to be purged every 2 weeks. This is more so a tool for the in-house devs to track telemetry through a dashboard. I appreciate you taking the time to respond.
You would probably have to go through your app code and manually save the output from the logger into a table/field in your database inline. Theoretically, any data that ends up in your log should be accessible from within your app.
Depending on how much data you're planning on saving, this may not be the best idea, as it has the potential to grow your database extremely quickly (it's not uncommon for apps to create GBs worth of logs in a single day).
You could write a background job that opens the log files, searches for data, and saves it to your database, but the configuration required for that will depend largely on your hosting setup.
So I got a solution working, and in fairness it wasn't as difficult as I had thought. As I was using the lograge gem for formatting the logs, I created a custom formatter following the guide at this link.
As I wanted the JSON format, I just copied that formatter, but was able to put in the call to a background job at this point and also cleanse some data I did not want.
module Lograge
  module Formatters
    class SomeService < Lograge::Formatters::Json
      def call(data)
        # Drop the keys we don't want to ship
        data = data.delete_if do |k|
          [:format, :view, :db].include?(k)
        end
        # Faktory job to ship the entry to the telemetry database
        LogSenderJob.perform_async(data)
        # Let the stock JSON formatter produce the log line as usual
        super
      end
    end
  end
end
This was just one solution to the problem, made easier because I was able to get the data formatted via lograge. Another solution would be to create a custom logger and, in there, tell it to write to a database if necessary.
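For reference, a minimal sketch of that custom-logger alternative, using only the stdlib Logger. ShippingLogger and the shipper callable are hypothetical names; the shipper would wrap whatever enqueue call (e.g. LogSenderJob.perform_async) you use:

```ruby
require "logger"

# A Logger subclass that hands every entry to a "shipper" callable before
# writing it to the normal log device. All Logger methods (info, warn, ...)
# funnel through #add, so hooking it once covers everything.
class ShippingLogger < Logger
  def initialize(logdev, shipper:, **opts)
    super(logdev, **opts)
    @shipper = shipper
  end

  def add(severity, message = nil, progname = nil, &block)
    # Resolve the block form once so we don't evaluate it twice
    if message.nil? && block
      message = block.call
      block = nil
    end
    entry = message || progname
    @shipper.call(severity, entry) if entry
    super(severity, message, progname)
  end
end
```

In a Rails app you would assign an instance of this as Rails.logger (or wrap the existing one), with the shipper enqueueing your background job.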
I'm developing an Azure website where users can upload blobs and metadata. I want uploaded content to be deleted after some time.
The only way I can think of is going for a cloud service instead of a website, with a worker role that checks every hour or so whether an uploaded file has expired and, if so, deletes it. However, I'm going for a simple website here, without worker roles.
I have a function that checks whether an uploaded item should be deleted, and if the user does something on the page I can easily call this function, BUT... if the user isn't doing anything and the time runs out, it won't be deleted, because the user never triggers the function. The storage would never be cleaned up. How would you solve this?
Thanks
This is too broad to give one right answer, as you can solve it in many ways. But since you're using Web Sites, I do suggest you look at WebJobs and see if they might be the right tool for you (they give you the ability to run periodic jobs without the bulk of extra VMs in a web/worker configuration). You'll still need a way to manage your metadata to know what to delete.
Regarding other Azure-specific built-in mechanisms, you can also consider queuing delete messages, with an invisibility time equal to the time the content is to be available. After that time expires, the queue message becomes visible, and any queue consumer would then see the message and be able to act on it. This can be your Web Job (which has SDK support for queues) or really any other mechanism you build.
Again, a very broad question with no single right answer, so I'm just pointing out the Azure-specific mechanisms that could help solve this particular problem.
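To make the delayed-visibility pattern concrete, here is a sketch of the idea in plain Ruby. This is only an illustration of the mechanism, not the Azure SDK; a real Storage queue gives you the same behaviour via the message's initial visibility timeout, and all names here are mine:

```ruby
# Each delete message carries a visible_at time; a consumer only ever sees
# messages whose invisibility window has expired.
class DelayQueue
  def initialize(clock: -> { Time.now })
    @clock = clock
    @messages = []
  end

  # Hide the message for invisible_for seconds - i.e. until the content's
  # availability period is over.
  def enqueue(payload, invisible_for:)
    @messages << { payload: payload, visible_at: @clock.call + invisible_for }
  end

  # Return the first visible message, or nil if nothing is due yet.
  def dequeue
    i = @messages.index { |m| m[:visible_at] <= @clock.call }
    i && @messages.delete_at(i)[:payload]
  end
end
```

The consumer (your WebJob) polls dequeue and deletes whichever blob the payload names.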
Like David said in his answer, there can be many solutions to your problem. One solution could be to rely on the blob itself. In this approach you periodically fetch the list of blobs in the container and decide whether each blob should be removed. The periodic fetching could be done through an Azure WebJob (if the application is deployed as a website) or through an Azure worker role. The worker role approach is independent of how your main application is deployed: it could be a cloud service or a website.
With that, there are two possible approaches you can take:
Rely on Blob's Last Modified Date: Whenever a blob is updated, its Last Modified property gets updated. You can use that to identify if the blob should be deleted or not. This approach would work best if the uploaded blob is never modified.
Rely on Blob's custom metadata: Whenever a blob is uploaded, you could set the upload date/time in blob's metadata. When you fetch the list of blobs, you could compare the upload date/time metadata value with the current date/time and decide if the blob should be deleted or not.
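Either way, the expiry decision itself is a simple timestamp comparison. A sketch, where the timestamp comes from Last Modified (approach 1) or from the uploaded-at metadata value (approach 2); the one-day TTL_SECONDS window is an assumption, not from the original:

```ruby
require "time"

TTL_SECONDS = 24 * 60 * 60  # assumed retention window: one day

# True if the blob's reference timestamp (ISO-8601 string) is older
# than the allowed time-to-live.
def expired?(timestamp, now: Time.now, ttl: TTL_SECONDS)
  now - Time.parse(timestamp) > ttl
end
```

The WebJob or worker role would list the container's blobs and delete each one for which expired? returns true.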
Another approach might be to use the container name as the "expiry date".
This might make deletion easier, as you could then just remove expired containers.
I know there are a few questions like this, but this one is more about my specific situation.
I'm developing a platform for taking tests online. A test is a set of images with accompanying questions. It's being hosted on Azure and uses MVC 4.
I would love it if, when a user has taken half the test and the browser crashes or something else makes them leave the test, they are given the option to resume when they come back.
I have one idea myself, but would like to know if there are other options. I was considering using localStorage: when a user starts a test, the information for the test is saved in localStorage, and every time they move on to a new image, the local state is updated. Then, when the test player loads, it checks whether any ongoing tests are available.
How could I do it? Has anyone seen a similar problem/solution?
Local Storage is not a good choice, because it is specific to each instance. That means if you have two instances of a Web Role (the recommended minimum), each instance would have its own local storage. They are not shared, and there is no way to access local storage on a specific machine.
You really have two options. You could use a database like SQL Azure, or use Azure caching. Azure caching is probably easier, since it's super easy to serialize/deserialize complex objects, but the downside is that caching is only valid for 72 hours. If a cached object isn't accessed/updated in 72 hours, it gets purged.
I would not recommend storing this information in the client browser. The user has access to local storage, cookies, etc., and could modify it. You could store the test start time in your database on the server. Then, every time the user sends a request to the server to answer a question, you would verify whether the test is still active or the maximum allowed time has elapsed.
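The server-side check itself is tiny. A sketch, given a stored start time; the one-hour MAX_DURATION and the helper name are assumptions, not from the original:

```ruby
MAX_DURATION = 60 * 60  # assumed limit: one hour per test

# True while the test's time window is still open; every answer submission
# would be rejected (or the test marked finished) once this returns false.
def test_active?(started_at, now: Time.now)
  now - started_at <= MAX_DURATION
end
```

On resume, the same stored record tells you which question the user had reached.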
I've got a legacy database that has a temperamental (at best) connection. The data on this database gets updated maybe once a week. The users of my app just need read access.
The problem I encounter is, occasionally, queries to the DB return nil, so all sorts of bad things happen in my app.
Is there a way I can query the database until I get a valid response, then store that response somewhere in my Rails app? That way, the stored version will be returned to my users. Then, maybe once a week, I can re-query the database until it returns a valid object?
To add complications: the legacy DB is SQL Server, so I've had to install and use rails-sqlserver, which works pretty well but might be adding to the problem somehow.
The problem you'd encounter doing this in the request cycle is that the request that actually fetched the data would probably run glacially slowly (since it would need to retry until the result isn't nil, which could take a while), so your users would hammer the refresh button and just queue more requests until your SQL server or application is inundated.
If I were to do this, I would probably have a Resque task set up to fetch all the data you require (possibly a full dump of the database) every week or every day. Dump the resulting data to a datastore: either your local database, or something like redis or memcached if you don't particularly care about persistence. Because it's asynchronous, you can take as many times as you need to get the data fetch right. On your app side, don't even try to connect to the temperamental database; consider the "middle" database authoritative for all requests. So if the data isn't present there, assume it didn't exist on the SQL server either.
The downside to this method, of course, is that if the SQL server has a very large database, you can't just copy all of it to a more stable middle location. You'd have to choose either a subset of the data or rely on a per-request caching method, as you suggested yourself... but I don't think that's the best way to do this if you can avoid it.
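A sketch of that asynchronous refresh, with fetch and store callables standing in for the temperamental SQL Server query and your middle datastore; all names here are hypothetical, and the real version would live in a Resque job's perform:

```ruby
# Retry the flaky source until it returns something non-nil, then persist
# the result to the stable middle datastore. Because this runs in a
# background job, it can afford to wait between attempts.
def refresh_cache(fetch:, store:, attempts: 5, wait: 0)
  attempts.times do
    data = fetch.call
    if data
      store.call(data)
      return data
    end
    sleep wait
  end
  nil  # give up; the app keeps serving the last good copy
end
```

The app itself only ever reads from the store, so a failed refresh just means slightly stale data rather than nil-related errors.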
I'm going to attempt to create an open project which compares the most common MP3 download providers.
This will require a user to enter a track/album/artist name, e.g. Deadmau5; this will then pull the relevant prices from the APIs.
I have a few questions that some of you may have encountered before:
Should I have one server-side page that requests all the data, so it is all loaded simultaneously? If so, how would you deal with timeouts or any other problems that may arise? Or should the page load first, and then each price get pulled in one by one (Ajax)? What are your experiences when running a comparison check?
The main feature will be to compare prices, but how can I be sure the products are the same? I was thinking of running time and track numbers, but I would still have to set one source as my primary.
I'm making this a wiki, please add and edit any issues that you can think of.
Thanks for your help. Look out for a future blog!
I would check Amazon first. They will give you a SKU (the barcode on the back of the album; I think Amazon calls it an EAN). If the other providers use this, you can make sure they are looking at the same item.
I would cache all results into a database, and expire them after a reasonable time. This way when you get 100 requests for Britney Spears, you don't have to hammer the other sites and slow down your application.
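A sketch of such a cache in plain Ruby, with a configurable expiry. PriceCache is a hypothetical name, and a real deployment would back this with a database or memcached rather than an in-process hash, as the answer suggests:

```ruby
# Cache provider responses per query and expire them after ttl seconds,
# so 100 searches for the same artist hit the providers only once.
class PriceCache
  def initialize(ttl:, clock: -> { Time.now })
    @ttl = ttl
    @clock = clock
    @entries = {}
  end

  # Return the cached value for key if it is still fresh; otherwise run
  # the block, store its result with a timestamp, and return it.
  def fetch(key)
    cached = @entries[key]
    return cached[:value] if cached && @clock.call - cached[:at] < @ttl
    value = yield
    @entries[key] = { value: value, at: @clock.call }
    value
  end
end
```

Usage would look like cache.fetch("britney spears") { query_all_providers(...) }, with the block only running on a miss or after expiry.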
You should also make sure you are multithreading whatever requests you do server side. curl, for instance, allows you to pull multiple URLs and assigns a user-defined callback. I'd have the callback send back some data so you can update your page as the results come in: the curl callback returns data for each URL while the connection is open, and you parse it on the client side.
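The parallel-fetch idea can be sketched with plain Ruby threads: one per provider, each handing its result to a callback as it arrives, mirroring curl's multi interface. All names here are illustrative:

```ruby
# providers maps a provider name to a callable that performs its API
# request; the callback fires per provider as soon as its result is ready.
def fetch_all(providers, &callback)
  providers.map do |name, fetcher|
    Thread.new { callback.call(name, fetcher.call) }
  end.each(&:join)
end
```

In a web app the callback would push each partial result to the page (e.g. over Ajax polling) rather than waiting for the slowest provider.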