After fetching a CSV file, my terminal output displays only the last ~200 items

I am using the https://github.com/tilo/smarter_csv gem to read my CSV files. On the last line of my code I wrote puts records to see the output from SmarterCSV.process.
filename = 'db/csv/airports_codes.csv'
options = {
  :col_sep => "\t",
  :headers_in_file => false,
  :user_provided_headers => [
    "id",
    "code",
    "city",
    "country",
    "country_code",
    "continent",
    "coordinate_x",
    "coordinate_y"
  ]
}
records = SmarterCSV.process(filename, options)
puts records
However, there is so much output that my terminal displays only the last ~200 items. How do I see the rest?
These are the first 2 items displayed at the top of the terminal.
{:id=>4564, :code=>"YEB", :city=>"Bar River", :country=>"Canada", :country_code=>"CA", :continent=>"North America", :coordinate_x=>"\\N", :coordinate_y=>"\\N"}
{:id=>4565, :code=>"YED", :city=>"Edmonton", :country=>"Canada", :country_code=>"CA", :continent=>"North America", :coordinate_x=>"\\N", :coordinate_y=>"\\N"}
I also want to note that it doesn't let me scroll up past that point. It acts as if that were the first line in the terminal and there was nothing above it. I am using Ubuntu Linux.

Since records is really just an array of hashes, you can treat it as you would any array and use slice to break it up into viewable pieces.
records.slice(20, 10)
will output 10 records, beginning at the 21st item.
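If you want to page through the whole set rather than pick one window, Enumerable#each_slice works on the same principle. A quick sketch with stand-in data (not tied to SmarterCSV's actual output):

```ruby
# Stand-in for the array of hashes SmarterCSV.process returns
records = (1..25).map { |i| { id: i, code: "A#{i}" } }

# One specific window: 10 records starting at the 21st item (index 20)
window = records.slice(20, 10)

# Or walk the whole array in fixed-size pages
pages = records.each_slice(10).to_a
pages.each_with_index do |page, i|
  puts "page #{i + 1}: #{page.size} records"
end
```

Another option is to skip the terminal entirely and dump everything to a file, e.g. File.write('records.txt', records.map(&:inspect).join("\n")), then open it in a pager or editor.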
Obviously there are other alternatives, such as increasing your terminal's scrollback buffer - but you should ask a specific question on http://unix.stackexchange.com - let them know which terminal you're using, which shell environment, etc., and someone will help you.


Rails - Understanding the Querying

I am new to Rails and have been given some code, but I do not understand what it means (I am trying to understand it because I am tasked to use the same logic inside another program).
Here is the code:
ids = [1, 2, 3]
users = User.where(account_id: ids)
output = Worksheet.where(created_by: users).as_json(only: [:created_at, :id]).group_by_week(week_start: :monday) { |w| w["created_at"] }
I am not sure if I am following along, but from what I understand, it seems like I am querying the users with ids 1, 2 and 3, finding the worksheets created by those users, and grouping them by week. However, I do not really understand what only: [:created_at, :id] does, though I checked and there are columns created_at and id in the worksheet table. Also, I am totally lost about what the code below is about
{|w| w["created_at"]}
And finally, is it possible to let me know what the output of the program would look like? Thanks all!
The as_json(only: [:created_at, :id]) part says "convert this result to JSON, but I only want those two columns" (see the as_json documentation).
The group_by_week(week_start: :monday) call takes a block, which is what the { |w| w["created_at"] } part is. It goes through each result from the previous operations, assigns each in turn to w, and uses w["created_at"] as the value to group on.
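group_by_week likely comes from the groupdate gem, but its block behaves like plain Enumerable#group_by: whatever the block returns becomes the grouping key. Here is a plain-Ruby approximation, using hypothetical rows shaped like the as_json output and ISO week labels instead of groupdate's week buckets:

```ruby
require 'date'

# Hypothetical rows shaped like the as_json(only: [:created_at, :id]) output
rows = [
  { "id" => 1, "created_at" => "2024-01-01" }, # Monday, ISO week 2024-W01
  { "id" => 2, "created_at" => "2024-01-03" }, # same week
  { "id" => 3, "created_at" => "2024-01-08" }  # next week
]

# The block picks the value each row is grouped on, just like
# { |w| w["created_at"] } does for group_by_week
by_week = rows.group_by { |w| Date.parse(w["created_at"]).strftime("%G-W%V") }
```

The result is a hash whose keys are the weeks and whose values are arrays of the rows that fall into each week, which is roughly the shape of output you can expect.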

How can I write a method that turns a sheet of data into records?

I want to gather data and then write a method to generate records based on said data. After running the method, I want to have a series of Movies and MovieRelations (which associates similar movies with each other). Each Movie will have a title, release_date, and several similar Movies through a MovieRelation. Each MovieRelation will have a movie_a_id and a movie_b_id.
The simplest way I've come up with would be to write a text document with the movies and their individual data separated by two different special symbols, to mark where the text should be broken up into separate movies, and where the movies should be broken up into their individual pieces of data, like this:
Title#Release Date#Similar Movie A#Similar Movie B%Title2#Release Date2#Similar Movie 2A#Similar Movie 2B#Similar Movie 2C
Then I could copy and paste the raw text into a method similar to this:
"X Men#11-02-2010#Hulk#Logan%Sing#12-04-2017#Zootopia#Pitch Perfect#Monster U"
  .split('%').each do |movie_data|
  @movie = Movie.create
  movie_data.split('#').each_with_index do |individual_data, index|
    if index == 0
      @movie.name = individual_data
    elsif index == 1
      @movie.release_date = individual_data
    else
      MovieRelation.create(movie_a_id: @movie.id, movie_b_id: Movie.find_by(name: individual_data).id)
    end
  end
  @movie.save
end
So in the end, I should have 2 Movies and 5 MovieRelations.
I think this would work, but it seems pretty hacky. Is there a better way to accomplish this?
Before you start trying to create your own format, I'd suggest looking at YAML or JSON, which are well established, well supported internet standards with defined syntax and parsers/serializers for all major languages, so your data won't be locked into just your application.
Here's a starting point:
require 'yaml'
data = {
  'title' => 'Raiders of the Lost Ark',
  'release_date' => '12 June 1981',
  'similar_movies' => [
    {
      'title' => 'Indiana Jones and the Last Crusade',
      'release_date' => '24 May 1989',
      'similar_movies' => nil
    },
    {
      'title' => 'Indiana Jones and the Temple of Doom',
      'release_date' => '23 May 1984',
      'similar_movies' => nil
    }
  ]
}
puts data.to_yaml
That outputs:
---
title: Raiders of the Lost Ark
release_date: 12 June 1981
similar_movies:
- title: Indiana Jones and the Last Crusade
  release_date: 24 May 1989
  similar_movies:
- title: Indiana Jones and the Temple of Doom
  release_date: 23 May 1984
  similar_movies:
YAML is parsed using the Psych library, so see the Psych documentation's load, load_file and maybe load_stream methods to learn how to read that data back into a Ruby object.
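Reading the YAML back is a single call. A minimal round trip, using an inline string here for self-containment (YAML.load_file works the same way on a file on disk):

```ruby
require 'yaml'

yaml = <<~YAML
  title: Raiders of the Lost Ark
  release_date: 12 June 1981
  similar_movies:
  - title: Indiana Jones and the Last Crusade
    release_date: 24 May 1989
    similar_movies:
YAML

# safe_load restricts the classes that can be deserialized, which is a
# good default when the file comes from outside your application
movie = YAML.safe_load(yaml)
```

The result is a plain Hash of Hashes/Arrays, ready to walk through and turn into records.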
Similarly you could use JSON:
require 'json'
puts data.to_json
Which outputs:
{"title":"Raiders of the Lost Ark","release_date":"12 June 1981","similar_movies":[{"title":"Indiana Jones and the Last Crusade","release_date":"24 May 1989","similar_movies":null},{"title":"Indiana Jones and the Temple of Doom","release_date":"23 May 1984","similar_movies":null}]}
Or, if you need "pretty":
puts JSON.pretty_generate(data)
{
  "title": "Raiders of the Lost Ark",
  "release_date": "12 June 1981",
  "similar_movies": [
    {
      "title": "Indiana Jones and the Last Crusade",
      "release_date": "24 May 1989",
      "similar_movies": null
    },
    {
      "title": "Indiana Jones and the Temple of Doom",
      "release_date": "23 May 1984",
      "similar_movies": null
    }
  ]
}
JSON lets us use JSON['some JSON as a string'] or JSON[a_ruby_hash_or_array] as a shortcut to parse or serialize respectively:
foo = JSON[{'a' => 1}]
foo # => "{\"a\":1}"
JSON[foo] # => {"a"=>1}
In either case, experiment with using Ruby to build your starting hash and let it emit the serialized version, then pipe that output to a file and begin filling it in.
If you want to use an ID for a related movie instead of the name you'll have to order your records in the file so the related movies occur first, remember what those IDs are after inserting them, then plug them into your data. That's really a pain. Instead, I'd walk through the object that results from parsing the data, extract all the related movies, insert them, then insert the main record. How to do that is left for you to figure out, but it's not too hard.
Parsing the string
For your code, you don't need an index, if or case - just split and splat:
input = 'X Men#11-02-2010#Hulk#Logan%Sing#12-04-2017#Zootopia#Pitch Perfect#Monster U'
input.split('%').each do |movie_data|
  title, date, *related_movies = movie_data.split('#')
  puts format('%-10s (%s) Related : %s', title, date, related_movies)
end
It outputs:
X Men      (11-02-2010) Related : ["Hulk", "Logan"]
Sing       (12-04-2017) Related : ["Zootopia", "Pitch Perfect", "Monster U"]
Saving data
You're trying to solve a problem that has already been solved. MovieRelations belong to, well, a relational database!
You could do all the imports, sorts and filters with a database (e.g. with Rails or Sequel). Once you're done and would like to export the information as plain text, you could dump your data into YAML/SQL/JSON.
With your current format, you'll run into problems as soon as you want to update relations, delete a movie, or insert a movie with % or # in the title.

In Rails 3.2 & Rspec 2, how to manage a set of 50,000 pairs of input_string, expected_score pairs?

I am writing specs for a method that 'scores' a string of text according to a fairly complex set of rules having to do with a large set of various combinations of keywords.
My test set is 50,000 strings. my_method_being_tested("some test string") produces a score with 3 elements: [boolean, boolean, integer].
I have a tally of 50,000 input/expected-output pairs, something like this:
test_set = [ {"test string one" => [true, false, 0] } , { "test string 2" => [false, false, 10] } , ... ]
What is the best way to store/manage a 50,000 element test set when using RSpec, so I can loop through the array something like:
test_set.each do |a_set|
  my_method_being_tested(a_set.key).should == a_set.value
end
There is no underlying ActiveRecord model for the method in my app, so I cannot simply store a fixture and load it into an ActiveRecord table (unless perhaps it makes sense to create an ActiveRecord-less model of some kind and load a fixture into that?).
StackOverflow isn't really set up for answering "what is best" questions, but your basic approach is fine and you just need to make sure that your data structure and access mechanisms match up. Given the code snippets you showed, I would suggest reading up on Ruby hashes and structs.
At a meta level you'd have:
test_set = my_test_setup
test_set.each do |pair|
  expect(my_method_being_tested(my_key_accessor(pair))).to match_array(my_value_accessor(pair))
end
If you want to keep test_set as is, then you can change your test loop to be:
test_set.each do |pair|
  expect(my_method_being_tested(pair.keys.first)).to match_array(pair.values.first)
end
If you want to keep your test loop as is, you can change your test setup to be:
TestPair = Struct.new(:key, :value)
test_set = [
  TestPair.new("test string one", [true, false, 0]),
  TestPair.new("test string 2", [false, false, 10]),
  ... ]
Note that I'm using the new "expect" syntax rather than the deprecated "should" syntax, but that's a separate issue.
UPDATE: As for storing key/value pairs in a file, there are myriad options as well. YAML, as you note in your comment is fine, and you can combine it with DBM, letting you do something like:
require 'yaml/dbm'
YAML::DBM.load('your_yaml_file.yml').each do |key, value|
  expect(my_method_being_tested(key)).to match_array(value)
end
That, of course, assumes that you've stored your key/value pairs in the YAML+DBM file in the first place, which gets you back to creating some Ruby to represent the key/value pairs.
A set of 50k items isn't that big to keep in memory, but if you're really concerned about reading and testing each pair incrementally, you can always read a line at a time from a file. That still raises the question of what to store on each line (e.g. JSON) and how to format it in the first place.
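As a concrete sketch of the plain-YAML route (hypothetical strings, no DBM): serialize the hash once, then load it in the spec and loop. The round trip looks like this, shown in-memory here so it is self-contained (in practice you'd write serialized to a .yml file and File.read it back):

```ruby
require 'yaml'

test_set = {
  "test string one" => [true, false, 0],
  "test string 2"   => [false, false, 10]
}

serialized = test_set.to_yaml           # write this to test_set.yml once
loaded     = YAML.safe_load(serialized) # in the spec: YAML.safe_load(File.read('test_set.yml'))

loaded.each do |input, expected|
  # expect(my_method_being_tested(input)).to match_array(expected)
end
```

safe_load handles strings, booleans and integers out of the box, which is exactly the shape of these pairs.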

Suppress delimiters in Ruby's String#split

I'm importing data from old spreadsheets into a database using rails.
I have one column that contains a list on each row, that are sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split up this string into an array, delimiting either with a comma or with the word "and". I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
[
  [0] "first",
  [1] ", ",
  [2] "second"
]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can instruct the regexp not to capture the group by using ?:.
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
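One caveat worth hedging against: that pattern will also match the letters "and" inside a word (e.g. "Sandy"), since nothing anchors it. If your data might contain such items, adding word boundaries around "and" avoids false splits; a small sketch:

```ruby
items = ["first, second", "third and fourth", "Sandy and Randy"]

# \band\b only matches "and" as a standalone word; \s* absorbs the
# surrounding whitespace so it doesn't leak into the results
split_items = items.map { |s| s.split(/\s*(?:,|\band\b)\s*/) }
```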
As an aside note
into a database using rails.
Please note this has nothing to do with Rails, that's Ruby.

Merged Google Fusion Tables - Select Query with WHERE clause on Merge Key Error

I have merged two Fusion Tables together on the key "PID". Now I would like to do a SELECT query WHERE PID = 'value'. The error comes back that no column named PID exists in the table. A query on another column gives this result:
{
  "kind": "fusiontables#sqlresponse",
  "columns": [
    "\ufeffPID",
    "Address",
    "City",
    "Zoning"
  ],
  "rows": [
    [
      "001-374-079",
      "# LOT 15 MYSTERY BEACH RD",
      "No_City_Value",
      "R-1"
    ],
It appears that the column name has been changed from "PID" to "\ufeffPID", and no matter how many ways I try the syntax in a GET URL, I keep getting an error.
Is there any limitation on querying the key of a merged table? Since I cannot get the column name right, a workaround would be to use the column ID, but that does not seem to be an option either. Here is the URL:
https://www.googleapis.com/fusiontables/v1/query?sql=SELECT 'PID','Address','City','Zoning' FROM 1JanYNl3T45kFFxqAmGS0BRgkopj4AS207qnLVQI WHERE '\ufeffPID' = 001-493-078&key=myKey
Cheers
I have no explanation for the \ufeff in there; that's the Unicode character ZERO WIDTH NO-BREAK SPACE, which also serves as the UTF-8 byte-order mark (BOM), so it's conceivable that it's actually part of the column name and simply invisible in the UI. First off, I would recommend changing the name in the base tables and seeing if that works.
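If you control the source file that was uploaded, the likely root cause is a UTF-8 BOM at the start of the file being absorbed into the first header. Stripping it before upload sidesteps the problem entirely; a sketch in Ruby, since the exact upload tooling here is unknown:

```ruby
# A header polluted by a leading BOM, as in the response above
header = "\ufeffPID"

# The BOM is invisible when printed, but it is part of the string
clean = header.delete_prefix("\ufeff")

# When reading a whole file, Ruby can strip a leading BOM for you:
#   File.read('table.csv', encoding: 'bom|utf-8')
```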
Column IDs for merge tables have a different form than for base tables. An easy way to get them is to add the filters of interest to one of your tabs (any type will do) and then do Tools > Publish. The top text ("Send a link in email or IM") has a query URL that has what you need. Run it through a URL decoder such as http://meyerweb.com/eric/tools/dencoder/ and you'll see the column ID for PID is col0>>0.
Rod
