Fetch file from database server - ruby-on-rails

I'm building a project where the front end is React and the back end is Ruby on Rails with a Postgres DB. A required piece of functionality is the ability for users to export large datasets.
I have the following code snippet that creates a CSV and stores it on the database server.
query = <<-SQL
COPY (SELECT * FROM ORDERS WHERE ORDERS.STORE_ID = ? OFFSET ? LIMIT ?) to '/temp/out.txt' WITH CSV HEADER
SQL
query_result = Order.find_by_sql([query, store_id.to_i, offset.to_i, 1000000])
How would I be able to retrieve that file to send to the front end? I've seen examples that use copy_data and get_copy_data, but I couldn't get them to work with a parameterized query. Any help would be great. Thanks!

There are two problems with your approach:
COPY doesn't support parameters, so you will have to construct the complete query string on the client side (beware of SQL injection).
COPY ... TO 'file' requires superuser rights or membership in the pg_write_server_files role.
Don't even think of running an application as a superuser.
Even without that, allowing client code to create files on the database server exposes you to the risk of denial of service by filling up the file system.
I think that the whole idea is ill-conceived. If you have a large query result, the database server will automatically use temporary files if an intermediate result won't fit into memory. Keep it simple.
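That said, if you want PostgreSQL to emit the CSV, stream it with COPY ... TO STDOUT through the pg gem's copy_data/get_copy_data instead of writing a server-side file. A minimal sketch, with explicit integer coercion standing in for parameter binding (COPY accepts none); the table and inputs follow the question:

def orders_csv(store_id, offset, limit = 1_000_000)
  # The underlying PG::Connection behind ActiveRecord.
  conn = ActiveRecord::Base.connection.raw_connection

  # COPY cannot be parameterized, so coerce the inputs to integers
  # explicitly to rule out SQL injection.
  sql = "COPY (SELECT * FROM orders WHERE store_id = #{Integer(store_id)} " \
        "OFFSET #{Integer(offset)} LIMIT #{Integer(limit)}) TO STDOUT WITH CSV HEADER"

  csv = +''
  conn.copy_data(sql) do
    while (row = conn.get_copy_data)
      csv << row   # each row arrives as one CSV line
    end
  end
  csv   # hand this to send_data(..., type: 'text/csv') or similar
end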

Related

Redis delete by pattern is too slow

for i, name in ipairs(redis.call('KEYS', 'cache:user_transaction_logs:*:8866666')) do redis.call('DEL', name); end
How can I optimise this Redis query?
We are using Redis as a cache store in Rails. Whenever a user makes a successful transaction, the receiver's and initiator's transaction histories are expired from Redis.
The query cannot be optimized; it should be replaced in its entirety, because the use of KEYS is discouraged for anything other than debugging in non-production environments.
A preferable approach, instead of trying to fetch the relevant key names ad-hoc, is to manage them in a data structure (e.g. Set or List) and read from it when you perform the deletions.
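For illustration, that bookkeeping might look like this with the redis-rb gem (the index key name and payload are assumptions):

require 'redis'

redis = Redis.new

# Every time a cache entry is written, record its key in a per-user set.
cache_key = 'cache:user_transaction_logs:123:8866666'
index_key = 'cache:user_transaction_logs:index:8866666'
redis.set(cache_key, 'payload')
redis.sadd(index_key, cache_key)

# Expiring the history reads the set back; no KEYS scan involved.
keys = redis.smembers(index_key)
redis.del(*keys, index_key) unless keys.empty?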
You need to change the approach for how you are storing cache entries for your users.
Your keys should look something like cache:user_transaction_logs:{user_id}.
Then you will be able to just delete the entry by its key (user_id).
If you need several cache entries per user_id, use Redis hashes (https://redis.io/commands#hash); then you can still delete all entries for a user_id with a single DEL, or a specific entry with HDEL.
It is also a good idea to use Redis database numbers (0 is the default; 1-15 are available) and to put separate functionalities on separate database numbers. Then, if you need to wipe the cache of a whole functionality, that can be done with a single FLUSHDB.
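For example, with hashes the per-user cleanup collapses to a single command (a sketch with redis-rb; field names and values are assumptions):

require 'redis'

redis = Redis.new
key = 'cache:user_transaction_logs:8866666'   # one hash per user

# Several entries per user live as fields in one hash.
redis.hset(key, 'tx:1', '{"amount":100}')
redis.hset(key, 'tx:2', '{"amount":250}')

redis.hdel(key, 'tx:1')   # drop one entry...
redis.del(key)            # ...or wipe the user's whole history at once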

How to retrieve data from a DSN using Rails

I have a DSN (data source name) of the following format:
<driver>://<username>:<password>@<host>:<port>/<database>
and I am asked to retrieve rows from the corresponding database, which in this specific example has a single table and is hosted on AWS. I would like to do it through an endpoint in a Rails app.
I did some research online looking for an example of working with a DSN, but couldn't find any help.
I am looking for a high-level explanation of how to work with a DSN, and ideally how to use Rails to communicate with the database.
I am not sure if it is going to be useful for anybody, but this is as much info as I could gather.
The DSN format is something like:
<driver>://<username>:<password>@<host>:<port>/<database>
Within Rails it can be used like this; the example below counts the rows in a users table:
require 'pg'

# PG.connect takes a libpq connection URI, so <driver> will be "postgres" (or "postgresql").
conn = PG.connect('<driver>://<username>:<password>@<host>:<port>/<database>')
puts conn.exec("SELECT count(*) FROM users").to_a
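If the rows need to come back through a Rails endpoint, a hypothetical controller could reuse that connection string (DATABASE_DSN is an assumed environment-variable name):

class RecordsController < ApplicationController
  def index
    conn = PG.connect(ENV.fetch('DATABASE_DSN'))   # e.g. postgres://user:pass@host:5432/db
    render json: conn.exec('SELECT * FROM users LIMIT 100').to_a
  ensure
    conn&.close   # don't leak connections between requests
  end
end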

Bulk upsert with Ruby on Rails

I have a Rails 3 application where I need to ingest an XML file provided by an external system into a Postgres database. I would like to use something like ActiveRecord-Import, but it does not appear to support upserts for Postgres, and some of the records I will be ingesting will already exist and will need to be updated.
Most of what I'm reading recommends writing SQL on the fly, but this seems like a problem that may have been solved already. I just can't find it.
Thanks.
You can do upserts on MySQL and PostgreSQL with the upsert gem.
If you're looking for raw speed, you could use nokogiri and upsert.
It might be easier to import the data using data_miner, which uses nokogiri and upsert internally.
If you are on PostgreSQL 9.1 you should use writable common table expressions. Something like:
WITH updates (id) AS (
    UPDATE mytable SET .....
    WHERE ....
    RETURNING id
)
INSERT INTO mytable (....)
SELECT ...
FROM mytemptable
WHERE id NOT IN (SELECT id FROM updates);
In this case you bulk-load things into a temp table first; the statement then updates the existing records from the temp table according to your logic and inserts the rest.
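For illustration, here is a concrete sketch of that pattern run from Rails; the products table, its columns, and the products_staging temp table are assumed names:

# Update existing rows from the staging table, then insert the rest.
ActiveRecord::Base.connection.execute(<<-SQL)
  WITH updates AS (
    UPDATE products p
    SET    name = s.name, price = s.price
    FROM   products_staging s
    WHERE  p.id = s.id
    RETURNING p.id
  )
  INSERT INTO products (id, name, price)
  SELECT s.id, s.name, s.price
  FROM   products_staging s
  WHERE  s.id NOT IN (SELECT id FROM updates)
SQL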
It's a two-step thing. First you need to fetch the XML file. If it's provided by a user via a form, that's lucky for you; otherwise you need to fetch it using Ruby's standard HTTP library or a gem like Mechanize (which is actually really great).
The second thing is really easy. You read the whole XML file into a string, and then you can convert it into a hash with this piece of code:
Hash.from_xml(xml_string)
Then you can parse and work with the data...
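For example (Hash.from_xml ships with ActiveSupport, so in a Rails app it is already loaded; the sample document is made up):

require 'active_support/core_ext/hash/conversions'

xml_string = '<orders><order><id>1</id><total>9.99</total></order></orders>'
data = Hash.from_xml(xml_string)
# => {"orders"=>{"order"=>{"id"=>"1", "total"=>"9.99"}}}
data['orders']['order']['total']  # => "9.99" (values come back as strings)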

Identifying a connection ID in Postgres

I have a Postgres database (9) that I am writing a trigger for. I want the trigger to set the modification time, and user id for a record. In Firebird you have a CONNECTIONID that you can use in a trigger, so you could add a value to a table when you connect to the database (this is a desktop application, so connections are persistent for the lifetime of the app), something like this:
UserId | ConnectionId
---------------------
544 | 3775
and then look up in the trigger that connectionid 3775 belongs to userid 544 and use 544 as the user that modified the record.
Is there anything similar I can use in Postgres?
You could use the process ID. It can be retrieved with:
pg_backend_pid()
With this PID you can also use the pg_stat_activity view to get more information about the current backend, although you should already know everything, since you are using this backend.
Or better: just create a sequence and retrieve one value from it for each connection:
CREATE SEQUENCE connectionids;
And then:
SELECT nextval('connectionids');
in each connection, to retrieve a connection-unique ID.
One way is to use the custom_variable_classes configuration option. It appears to be designed to allow the configuration of add-on modules, but can also be used to store arbitrary values in the current database session.
Something along the lines of the following needs to be added to postgresql.conf:
custom_variable_classes = 'local'
When you first connect to the database you can store whatever information you require in the custom class, like so:
SET local.userid = 'foobar';
And later on you can retrieve this value with the current_setting() function:
SELECT current_setting('local.userid');
Adding an entry to a log table might look something like this:
INSERT INTO audit_log VALUES (now(), current_setting('local.userid'), ...)
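From the application side, the same flow might look like this minimal sketch with the pg gem (the connection details and the audit_log columns are assumptions):

require 'pg'

conn = PG.connect(dbname: 'mydb')   # assumed connection details

# Once, right after connecting:
conn.exec("SET local.userid = '544'")

# Any later statement (including one fired from a trigger) can read it back:
uid = conn.exec("SELECT current_setting('local.userid')").getvalue(0, 0)
conn.exec_params('INSERT INTO audit_log VALUES (now(), $1)', [uid])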
While it may work for your desktop use case, note that process ID numbers do roll over (32768 is a common upper limit), so using them as a unique key to identify a user can run into problems. If you ever end up with leftover data from a previous session in the table that tracks the user-to-process mapping, it can collide with a newer connection that is assigned the same process ID after rollover. It may be sufficient for your app to aggressively clean out old mapping entries, perhaps at startup time, given how you've described its operation.
To avoid this problem in general, you need to make a connection key that includes an additional bit of information, such as when the session started:
SELECT procpid,backend_start FROM pg_stat_activity WHERE procpid=pg_backend_pid();
That has to iterate over all of the connections active at the time to compute, so it does add a bit of overhead. It's possible to execute that a bit more efficiently starting in PostgreSQL 8.4:
SELECT procpid,backend_start FROM pg_stat_get_activity(pg_backend_pid());
But that only really matters if you have a large number of connections active at once.
Use current_user if you need the database user (I'm not sure that's what you want by reading your question).

Rails: Accessing a database not meant for Rails?

I have a standard Rails application that uses a MySQL database through ActiveRecord, with data loaded through a separate parsing process from a rather large XML file.
This was all well and good, but now I need to load data from an Oracle database, rather than the XML file.
I have no control over how the database looks, and I only really need a fraction of the data it contains (maybe one or two columns out of a few tables). As such, what I really want to do is make a call to the database, get data back, and put the data in the appropriate locations in my existing, Rails-friendly MySQL database.
How would I go about doing this? I've heard you can (on a model-by-model basis) specify different databases for Rails models to use, but that sounds like they use them in their entirety (that is, the database is Rails-friendly). Can I make direct Oracle calls? Is there a process that makes this easier? Can ActiveRecord itself handle this?
A toy example:
If I need to know color, price, and location for an Object, then normally I would parse a huge XML file to get this information. Now, with oracle, color, price, and location are all in different tables, indexed by some ID (there isn't actually an "Object" table). I want to pull all this information together into my Rails model.
Edit: Sounds like what I'd heard about was ActiveRecord's "establish_connection" method...and it does indeed seem to assume one model is mapped to one table in the target database, which isn't true in my case.
Edit Edit: Ah, looks like I might be wrong there. "establish_connection" might handle my situation just fine (just gotta get ORACLE working in the first place, and I'll know for sure... If anyone can help, the question is here)
You can create a connection to Oracle directly and then have ActiveRecord execute a raw SQL statement to query your tables (plural). Off the top of my head, something like this:
class OracleModel < ActiveRecord::Base
  establish_connection(:oracle_development)

  def self.get_objects
    self.find_by_sql("SELECT...")
  end
end
With this model you can do OracleModel.get_objects which will return a set of records whereby the columns specified in the SELECT SQL statement are attributes of each OracleModel. Obviously you can probably come up with a more meaningful model name than I have!
Create an entry named :oracle_development in your config/database.yml file with your Oracle database connection details.
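To move the data across, a hypothetical sync might iterate the Oracle rows and write them into the Rails-managed MySQL model (LocalObject and the color/price/location columns are assumptions, matching the toy example in the question):

# Assumes get_objects selects color, price and location; LocalObject is
# a hypothetical Rails model backed by the MySQL database.
OracleModel.get_objects.each do |row|
  LocalObject.create!(
    color:    row['color'],
    price:    row['price'],
    location: row['location']
  )
end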
This may not be exactly what you are looking for, but it seems to cover your situation pretty well: http://pullmonkey.com/2008/4/21/ruby-on-rails-multiple-database-connections/
It looks like you can make an arbitrarily-named database configuration in the database.yml file, and then have certain models connect to it like so:
class SomeModel < ActiveRecord::Base
  establish_connection :arbitrary_database
  # other stuff for your model
end
So, the solution would be to make ActiveRecord models for just the tables you want data from in this other database. Then, if you really want to get into some SQL, use ActiveRecord::Base.connection.execute(sql). If you need the results as actual ActiveRecord objects, do SomeModel.find_by_sql(sql).
Hope this helps!
I don't have points enough to edit your question, but it sounds like what you really need is to have another "connection pool" available to the second DB -- I don't think Oracle itself will be a problem.
Then, you need to use these alternate connections to "simply" execute a custom query within the appropriate controller method.
If you only need to pull data from your Oracle database, and if you have any ability to add objects to a schema that can see the data you require . . . .
I would simplify things by creating a view on the Oracle table that projects the data you require in a nice friendly shape for ActiveRecord.
This would mean maintaining code in two layers of the application, but I think the gain in clarity on the client side would outweigh the cost.
You could also use the CREATE OR REPLACE VIEW object AS SELECT tab1.*, tab2.* FROM tab1, tab2 syntax so the view returns every column in each table.
If you need to Insert or Update changes to your Rails model, then you need to read up on the restrictions for doing Updates through a view.
(Also, you may need to research getting Oracle to work with Rails, as you will potentially need to install the Oracle client software and additional Ruby modules.)
Are you talking about a one-time data conversion or an ongoing data exchange between your application and the Oracle database? I don't think you should involve Rails in it at all. You could just run a SQL query against the Oracle database, extract the data, and then insert it into the MySQL database.
