Ruby on Rails 4 CSV File recursive loading - ruby-on-rails

I am having a very strange problem with loading a CSV file that is driving me absolutely crazy and doesn't make any sense to me! I am loading a CSV file into my database with the following code:
CSV.foreach(Rails.root.join('public', 'uploads', '0', csv_file.file_name), :headers => true, :header_converters => lambda { |h| h.try(:downcase) }) do |row|
  Exclusion.create!(row.to_hash)
end
I have a file with 14,808 entries. If I try to load the whole file at once, it adds all 14,808 entries to the database as expected, but then starts over again from entry 1. It keeps repeating like this until I stop the server or it crashes. If I split the file into two files, each one gets added to the database as expected. I thought it might be a problem with the number of records, but I was able to load a CSV file with about 100,000 records without this problem. I am baffled as to why this is occurring. Oddly, if I comment out the create statement and just put a counter there, it stops at 14,808. Also, if I create a view that prints each row.to_hash, it stops at 14,808. I can't figure out what about saving to the database would cause it to keep repeating. I am using SQLite3, but again, I don't have a problem with a CSV file with 100,000 records.
Update:
Going through the log, it looks like the CSV file is loaded properly and all records are added to the database. Ruby even redirects to the proper url afterwards, but then seems to receive the "load file" request again. Since my screen has already timed out waiting for the server to process the CSV file, could that be causing an error and leading to either a duplicate request to load the file or the server thinking it hasn't processed the request and starting over?

Browsers sometimes repeat requests, for example when the first attempt times out. If you have a catch-all route that goes to one action (the one that loads the CSV file), it could also be triggered by the browser requesting favicon.ico.
However, I'd ask, why are you using a web request for this? Does csv_file come from the user? If not (i.e. if you already have the CSV file you want to load) I'd recommend putting this in a Rake task and just running it manually.
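A minimal sketch of what the import could look like once it is pulled out of the request cycle, assuming the parsing lives in a plain method that a Rake task can call. The names CsvImport and load_rows and the sample data are hypothetical, used only for illustration. In the real app each returned hash would be passed to Exclusion.create!.

```ruby
require "csv"
require "tempfile"

# Hypothetical helper: parses a CSV file into an array of hashes with
# downcased header names, mirroring the options used in the question.
module CsvImport
  def self.load_rows(path)
    rows = []
    CSV.foreach(path, headers: true,
                      header_converters: ->(h) { h&.downcase }) do |row|
      rows << row.to_h
    end
    rows
  end
end

# Demo with a throwaway file. A Rake task would pass the real path instead
# and call Exclusion.create!(attrs) for each returned hash.
sample = Tempfile.new(["exclusions", ".csv"])
sample.write("Name,Reason\nAcme,Fraud\nGlobex,Sanctions\n")
sample.close

rows = CsvImport.load_rows(sample.path)
puts rows.length
puts rows.first.inspect
```

Because such a task is started once from the command line, a browser timeout or retry cannot trigger the import a second time.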

Related

ActiveRecord or Postgresql Setting Wrong ID When Creating New Record

I am using Rails 5.1 with PostgreSQL 9.6.9 on the Heroku free tier.
I recently used a rake task with CSV files to create a couple of batches of records. I then wanted to test the front end and verify that my forms were working like they do in development.
However, when I went to add a new Record with the form I kept getting a 500 error. When I check my Heroku logs, I got this error.
ActiveRecord::RecordNotUnique (PG::UniqueViolation: ERROR: duplicate key value violates unique constraint "games_pkey"
It would list the id it was trying to use, which was a lower id that had already been used when I imported records via the rake task and CSV file.
I confirmed this truly was the issue by continuously attempting to use my form to create the record, which finally went through after about 20 attempts and it got to the correct next id.
I am not sure if this is an error caused by Postgres or Rails. It clearly was started when I started using the rake task and csv file. The csv file does have an attribute for ID which was being used for an easy check to either update or create a new file, but perhaps this was my cause.
Any advice on how to prevent this error would be great, as I have also uploaded other csv files for other models in even bigger batches. I would like to continue to do so, but only if I can avoid this error, so that I can add on one or two by the simple form I have when possible.
Thank you for your help.
Here is the rake task code that may be causing the issue:
CSV.foreach(filename, {col_sep: ";", headers: true}) do |row|
  game_hash = row.to_h
  game = Game.find_by(id: game_hash["id"])
  if(!game)
    game = Game.create(game_hash)
  else
    game.update(game_hash)
  end
end
If the issue is caused by the id in game_hash, I would try removing it like this:
CSV.foreach(filename, {col_sep: ";", headers: true}) do |row|
  game_hash = row.to_h
  game = Game.find_by(id: game_hash["id"])
  # remove the id from the hash (the keys are strings, not symbols)
  game_hash.delete("id")
  if(!game)
    game = Game.create(game_hash)
  else
    game.update(game_hash)
  end
end
This should remove the id before the new Game is created.
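One detail worth checking in plain Ruby: row.to_h produces string keys, so the key to delete is the string "id" rather than the symbol :id. The sketch below uses made-up sample data and runs without Rails. The closing comment about the PostgreSQL sequence is an assumption about the root cause, not something confirmed in the question.

```ruby
require "csv"

# Parse one semicolon-separated row the same way the rake task does.
data = "id;title\n7;Chess\n"
row = CSV.parse(data, col_sep: ";", headers: true).first
game_hash = row.to_h

# Header names are strings, so deleting the symbol key removes nothing.
removed_by_symbol = game_hash.delete(:id)
# Deleting the string key returns the value and drops it from the hash.
removed_by_string = game_hash.delete("id")

puts removed_by_symbol.inspect
puts removed_by_string.inspect
puts game_hash.inspect

# Assumption: on PostgreSQL, rows inserted with explicit ids leave the
# primary key sequence behind. ActiveRecord's PostgreSQL adapter offers
#   ActiveRecord::Base.connection.reset_pk_sequence!("games")
# to realign the sequence after such an import.
```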

Way to examine contents of Rails cache?

I'm trying to debug stale entries in a cached view in a Rails (5.0.0.beta2) app running on Heroku. I'd like to look at the entries in the cache to confirm that they are named the way that I expect and are getting expired when they should.
Is there any way to do this? I found this question, HOW do i see content of rails cache, which suggests Rails.cache.read("your_key"). So, using bin/rails c (on Heroku) I tried:
Rails.cache.read(User.find(19).cache_key) => nil
Where 19 is the :id of one of the users for whom I'm seeing stale data. This has me kind of stumped...
If I try:
User.find(19).cache_key => "users/19-20160316151228266421"
But when a cache entry is supposedly expired the log line looks like:
Expire fragment views/users/19-20160316151228266421 (0.2ms)
So I tried doing a Rails.cache.read on that path, and this also returned nil. I also tried doing the same with a user that had not been expired, and got nil again.
I'm wondering if that difference in path signals a problem, or if there is a way to see the path of the key that is created (I've been assuming that it matches at least the part after the slash).
The cache has the following instance variables:
[:@options, :@data, :@key_access, :@max_size, :@max_prune_time, :@cache_size, :@monitor, :@pruning]
You can examine the data with:
Rails.cache.instance_variable_get(:@data)
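To try the inspection calls without a Rails app, here is a small sketch using a stand-in class. FakeMemoryStore is hypothetical and is not the real ActiveSupport store. It only mimics keeping entries in an @data hash. The "views/" prefix on the key is taken from the log line in the question.

```ruby
# Stand-in for an in-memory cache store; NOT the real ActiveSupport class.
# It only mimics keeping entries in an @data hash so the inspection calls
# can be tried without Rails.
class FakeMemoryStore
  def initialize
    @data = {}
  end

  def write(key, value)
    @data[key] = value
  end

  def read(key)
    @data[key]
  end
end

cache = FakeMemoryStore.new
cache.write("views/users/19-20160316151228266421", "<div>cached</div>")

# List the instance variables, then reach into @data to see the stored keys.
puts cache.instance_variables.inspect
stored_keys = cache.instance_variable_get(:@data).keys
puts stored_keys.inspect

# Reading without the "views/" prefix misses; the full key hits.
miss = cache.read("users/19-20160316151228266421")
hit  = cache.read("views/users/19-20160316151228266421")
```

Note that an in-memory store lives inside a single process, so a console session started separately would not see entries written by the web process. That may explain the nil results in the question, though the cache store in use there is not stated.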

Ruby on Rails - Automatically insert data into DB without seeds.rb

I've some .json files with data that are automatically updated from time to time. I also have a Ruby on Rails app where I want to insert the information that's in those files.
For now, I'm parsing the JSON and inserting the data into the database in the seeds.rb file, but I want to be able to add more data without restarting the app, that is, on the go.
From time to time, I'll check those files and if they have modifications, I want to insert those new items into my database.
What's the best way to do it?
Looks like a job for a cron.
Create the code you need in a rake task (in /lib/tasks):
task :import => :environment do
  Importer.read_json_if_modified # your importer class
end
Then run this with the period you want using your system's cron.
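The importer itself is left open above. One possible shape, as a sketch only: compare the file's modification time with the one seen on the previous run and parse the JSON only when it is newer. The class layout and file names are hypothetical. The last-seen timestamp is kept in memory here for brevity, but a cron-driven task starts a fresh process each time, so in practice it would need to be persisted (for example in a database column or a small file).

```ruby
require "json"
require "tempfile"

# Hypothetical importer: re-reads a JSON file only when its modification
# time is newer than the one seen on the previous call.
class Importer
  def initialize(path)
    @path = path
    @last_mtime = nil # assumption: would be persisted between cron runs
  end

  # Returns the parsed items when the file changed, or nil when it did not.
  def read_json_if_modified
    mtime = File.mtime(@path)
    return nil if @last_mtime && mtime <= @last_mtime

    @last_mtime = mtime
    JSON.parse(File.read(@path))
  end
end

file = Tempfile.new(["items", ".json"])
file.write('[{"name": "first"}]')
file.close

importer = Importer.new(file.path)
first_run  = importer.read_json_if_modified # file is new to the importer
second_run = importer.read_json_if_modified # unchanged since the first call
```

The records returned by the first call would then be inserted into the database, ideally with a uniqueness check so that re-running the task does not create duplicates.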

Where should I put a CSV file that I'm using to perform a data migration?

This may well be a duplicate, but I couldn't find anyone asking quite this question.
My understanding* is that if I want to migrate data from an outside source to my Rails app's database, I should do so using a migration. It seems from my preliminary research that what I could do is use a tool like FasterCSV to parse a CSV file (for example) right in the migration (.rb) file itself.
Is this the correct approach? And if so, where should I actually put that CSV file -- it seems that if migrations are, after all, meant to be reversible/repeatable, that CSV data ought to be kept in a stable location.
*Let me know if I am completely mistaken about how to even go about this as I am still new to RoR.
You can write this to a rake job without FasterCSV, though I use both.
Write rows to 'csvout' file.
outfile = File.open('csvout', 'wb')
CSV::Writer.generate(outfile) do |csv|
csv << ['c1', nil, '', '"', "\r\n", 'c2']
...
end
outfile.close
The file is written to the directory the rake task is run from. In your case, you can put it in a separate folder for CSVs. I would personally keep it out of the rest of the app structure.
You may want to look into seed_fu to manage it. It has the benefit of making it easy to update data that is already in the database. You can convert the CSV into a seed file, which is just Ruby code (example code is provided there).
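For current Ruby versions, where FasterCSV has become the standard library's CSV, a minimal sketch of writing and reading a file looks like this. The temporary directory and the sample rows are assumptions made for the demo. A real task would point at the folder chosen for the CSV files.

```ruby
require "csv"
require "tmpdir"

# Throwaway location for the demo file.
path = File.join(Dir.mktmpdir, "csvout")

# Write rows; CSV.open closes the file when the block returns.
CSV.open(path, "wb") do |csv|
  csv << ["c1", nil, "", "c2"]
  csv << ["second", "row"]
end

# Read the rows back as arrays of strings (nil for empty unquoted fields).
rows = CSV.read(path)
puts rows.length
puts rows.inspect
```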

Rails fixtures seem to be adding extra unexpected data

Hi all. I've got a dynamic fixture CSV file that's generating predictable data for a table in order for my unit tests to do their thing. It's working as expected and filling the table with the data, but when I check the table after the tests run, I'm seeing a number of additional rows of "blank" data (all zeros, etc). Those aren't being created by the fixture, and the unit tests are read-only, just doing selects, so I can't blame the code. There doesn't seem to be any logging done during the fixtures setup, so I can't see when the "blank" data is being inserted. Has anyone run across this before, or have any ideas of how to log or otherwise see what the fixture setup is doing in order to track down the source of the blank data?
You could turn on ActiveRecord logging (put ActiveRecord::Base.logger = Logger.new(STDOUT) in your test/test_helper.rb file).
Or, instead of using fixtures (which have gone the way of the dodo for most Rails developers) you could use something more reliable like Factories (thoughtbot's factory_girl) or seed_fu (if you have specific data that must be loaded).
I discovered what the problem was, although not the precise way to prevent it. I was placing ERB into the fixture CSV file, which was working fine, but due to the way it was being parsed and processed, it was causing blank lines to be placed into the resulting CSV output. Fixtures doesn't seem to handle that very well, and as a result it was inserting blank rows into the table. I couldn't prevent the blank lines from being placed into the output CSV because for whatever reason, the <% rubycode -%> doesn't work -- having the closing dash caused ERB parsing errors. Not sure why.
In any case, the eventual workaround was to switch to YML instead of CSV. It tolerates the white space just fine, and no blank rows are being inserted into the table anymore.
As an aside, factory_girl seems potentially interesting, but the YML is now doing just fine so it may be overkill. There's not a huge benefit to using seed_fu I think. In this current case I'm testing reporting code so the data is very specific and needs to be structured in a certain way in order to verify output data for the reports.
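A possible explanation for the -%> errors, which I can't confirm for the fixture loader in question: in plain ERB the closing dash is only recognized when trim mode "-" is enabled, and without it the dash is treated as part of the Ruby code. The sketch below uses a made-up template and the standard library only, and compares the output with and without trimming.

```ruby
require "erb"

template = <<~TEMPLATE
  id,name
  <% (1..2).each do |i| -%>
  <%= i %>,user<%= i %>
  <% end -%>
TEMPLATE

# Same template with the dashes removed, to show where blank lines come from.
plain_template = template.gsub(" -%>", " %>")

# With trim mode "-", the newline after each -%> tag is dropped.
trimmed = ERB.new(template, trim_mode: "-").result(binding)
# Without trimming, every line holding only a <% %> tag leaves a blank line.
plain = ERB.new(plain_template).result(binding)

puts trimmed
puts plain.inspect
```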
