SFTP Download CSV File to memory - ruby-on-rails

I have the following code:
sftp = Net::SFTP.start('ftp.test.com','luigi', :password => 'pass_code')
data = sftp.download!("luigi/List.csv")
This converts data to a string like so:
2015,This is, one value, test, 9820\r\n4003, This is also one value, test, 0393
I want to separate each row in my CSV file into an array value. This is my attempt:
rows = data.split(/\r\n/)
rows.each do |record|
  record.split(/,/)
end
This works for most cases, but it obviously gets screwed up for the first row in my csv file since the additional comma creates another value in my array:
2015,This is, one value, test, 9820
Because there is an additional comma that is not escaped, my array looks like this:
["2015","This is", "one value","test","9820"]
I want the above to look like this:
["2015","This is, one value","test","9820"]
or even
["2015","This is one value","test","9820"]
Is it possible, using sftp, to download my file into memory (not locally, I have nowhere to store it) and loop through each row in the file without the additional commas throwing things off? It seems that the root of the issue is I can't do something like:
CSV.foreach(...) do |csv_row|
But instead I have to use sftp.download! which only returns one string, not an array of "csv rows".
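One possible direction, sketched under stated assumptions: `CSV.parse` accepts a string, so the downloaded data never has to touch disk. Because the extra comma is neither quoted nor escaped, no parser can tell it apart from a delimiter, so the sketch assumes a fixed layout (one leading column, two trailing columns, and a free-text column in between). `repair_row` is a hypothetical helper, and the sample string stands in for the result of `sftp.download!`.

```ruby
require 'csv'

# Sample string in the shape returned by sftp.download! in the question.
data = "2015,This is, one value, test, 9820\r\n4003, This is also one value, test, 0393"

# Hypothetical helper. Assumption: every row has `leading` fixed fields at the
# start and `trailing` fixed fields at the end; whatever lies between them is
# a single free-text field that may contain stray commas.
def repair_row(fields, leading: 1, trailing: 2)
  fields = fields.map { |field| field.to_s.strip }
  middle = fields[leading...(fields.length - trailing)]
  fields.first(leading) + [middle.join(', ')] + fields.last(trailing)
end

# CSV.parse works on the in-memory string and returns one array per row.
rows = CSV.parse(data).map { |fields| repair_row(fields) }

rows.each { |row| p row }
```

If the layout is not fixed, the file itself has to quote the free-text column before any parser can read it reliably.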

Related

Is there any way to parse JSON with trailing commas in Ruby?

I'm currently coding a transition from a system that used hand-crafted JSON files to one that can automatically generate the JSON files. The old system works; the new system works; what I need to do is transfer data from the old system to the new one.
The JSON files are used by an iOS app to provide functionality, and have never been read by our server software in Ruby On Rails before. To convert between the original system and the new system, I've started work on parsing the existing JSON files.
The problem is that one of my first two sample files has trailing commas in the JSON:
{ "sample data": [1, 2, 3,] }
This apparently went through just fine with the iOS app, because that file has been in use for a while. Now I need some way to parse the data provided in the file in my Ruby on Rails server, which (quite rightfully) throws an exception over the illegal trailing comma in the JSON file.
I can't just JSON.parse the code, because the parser, quite rightfully, rejects it as invalid JSON. Is there some way to parse it -- either an option I can pass to JSON.parse, or a gem that adds something, etc etc? Or do I need to report back that we're going to have to hand-fix the broken files before the automated process can process them?
Edit:
Based on comments and requests, it looks like some additional data is called for. The JSON files in question are stored in .zip files on S3, stored via ActiveStorage. The process I'm writing needs to download, unpack, and parse the zip files, using the 'manifest.json' file as a key to convert the archived file into a database structure with multiple, smaller files stored on S3 instead of a single zip that contains everything. A (very) long term goal is for clients to stop downloading a unitary zip file, and instead download the files individually. The first step towards that is to break the zip files up on the server, which means the server needs to read in the zip files. A more detailed sample of the data follows. (Note that the structure contains several design decisions I later came to regret; one of the original ideas was to be able to re-use files rather than pack multiple copies of the same identical file, but YAGNI bit me in the rear there)
The following includes comments that are not legal in JSON format:
{
"defined_key": [
{
"name": "Object_with_subkeys",
"key": "filename",
"subkeys": [
{
"id":"1"
},
{
"id":"2"
},
{
"id":"3" // references to identifier on another defined key
}, // Note trailing comma
]
}
],
"another_defined_key":[
{
"identifier": "should have made parent a hash with id as key instead of an array",
"data":"metadata",
"display_name":"Names: Can be very arbitrary",
"user text":"Wait for the right {moment}", // I actually don't expect { or } in the strings, but they're completely legal and may have been used
"thumbnail":"filename-2.png",
"video-1":"filename-3.mov"
}
]
}
The problem is that you are trying to parse something that looks a lot like JSON but is not actually JSON as defined by the spec:
Arrays: An array structure is a pair of square bracket tokens surrounding zero or more values. The values are separated by commas.
Since you have a trailing comma, another value is expected, and most JSON parsers will raise an error because of this violation.
All that being said, the json-next gem will parse this appropriately, so maybe give that a shot.
It can parse JSON-like representations that completely violate the JSON spec, depending on the flavor you use (HanSON, SON, JSONX as defined in the gem).
Example:
json = "{ \"sample data\": [1, 2, 3,] }"
require 'json/next'
HANSON.parse(json)
#=> {"sample data"=>[1, 2, 3]}
But the following is equivalent and completely violates the spec:
JSONX.parse("{ \"sample data\": [1 2 3] }")
#=> {"sample data"=>[1, 2, 3]}
So if you choose this route, do not expect to use it to validate the JSON data or structure in any fashion; you could end up with unintended results.
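If adding a gem is not an option, a different technique is to clean the text with the standard library only and then hand it to the strict parser. The following is a minimal sketch of that idea, not what json-next does internally: `parse_lenient_json` is a hypothetical helper that drops `//` comments and trailing commas outside string literals. It assumes no other deviations from JSON (no block comments, no unquoted keys).

```ruby
require 'json'

# Hypothetical helper: removes // line comments and trailing commas that sit
# outside string literals, then parses the result with the strict JSON parser.
def parse_lenient_json(text)
  out = +''
  in_string = false
  i = 0
  while i < text.length
    ch = text[i]
    if in_string
      out << ch
      if ch == '\\' && i + 1 < text.length
        # Copy the escaped character as-is so that \" does not end the string.
        i += 1
        out << text[i]
      elsif ch == '"'
        in_string = false
      end
    elsif ch == '/' && text[i + 1] == '/'
      # Skip the comment up to, but not including, the end of the line.
      i += 1 while i < text.length && text[i] != "\n"
      next
    else
      in_string = true if ch == '"'
      if ch == ']' || ch == '}'
        # Whitespace before a closer is insignificant; a comma there is trailing.
        out.rstrip!
        out.chomp!(',')
      end
      out << ch
    end
    i += 1
  end
  JSON.parse(out)
end

result = parse_lenient_json('{ "sample data": [1, 2, 3,] }')
p result
```

Braces, commas and slashes inside string values are left alone because the scanner tracks whether it is inside a string. Anything else that is invalid still raises `JSON::ParserError`, so the strict parser remains the final check.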

Accessing a saved JSON response in a column

Okay, I am using HTTParty to save a JSON response in my show.subsources column. Subsources is a text type column.
For example, one of my subsources columns currently has the data saved like so:
[{"source"=>"hulu_free", "display_name"=>"Hulu", "id"=>6001348,"link"=>"http://www.hulu.com/watch/843378"}]
To me, this looks like a hash within an array. How do I access each hash within the array, so that I can correctly display the "source" name in my views?
What I have tried is:
<% showdata = show.subsources %>
<%= showdata[0]['sources'] %>
I am expecting that to display 'hulu_free', but instead it will not show anything in my views. Am I not using the right syntax to access that hash? Or am I not saving the JSON correctly in order for me to access the hash data? Do I need to parse the data first? I have spent way too long trying to figure this out.
In your case you can easily turn the stored text into a JSON representation and parse that into Ruby objects.
require 'json'
text = %q{[{"source"=>"hulu_free", "display_name"=>"Hulu", "id"=>6001348,"link"=>"http://www.hulu.com/watch/843378"}]}
text.gsub!('=>', ':') # turn Ruby's hash-inspect syntax into JSON (also rewrites any "=>" inside values)
JSON.parse(text) # => returns an array containing one hash
I'm not sure how the internals of your program work, but you should probably consider storing real JSON text in this field.
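To illustrate the suggestion of storing real JSON, here is a small sketch in plain Ruby. The variable `stored` stands in for the contents of the text column (an assumption; the actual model code is not shown in the question), and the data is the sample from the question.

```ruby
require 'json'

# The parsed API response: an array containing one hash.
subsources = [
  { 'source' => 'hulu_free', 'display_name' => 'Hulu', 'id' => 6001348,
    'link' => 'http://www.hulu.com/watch/843378' }
]

# What would be written to the text column: real JSON, not Ruby's inspect output.
stored = JSON.generate(subsources)

# What the view would do after reading the column back.
loaded = JSON.parse(stored)
source_name = loaded.first['source']

puts source_name
```

Note that the key in the data is `source`, not `sources`, so the lookup has to use the singular form.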

How to format a text area sent as an array via JSON?

I'm pretty new to Ruby, and I'm using it to work with an API. Text areas sent over the API are converted to the format below before being sent to me via a JSON POST request:
"Comment": [
"hdfdhgdfgdfg\r",
"This is just a test\r",
"Thanks!\r",
"- Kyle"
]
And I'm getting the value like this:
comments = params["Comment"]
So each line is broken down into what looks like an array. My issue is, it functions just like one big string instead of an array with 4 values. I tried using comments[0] and just printing comments but both return the same result, it just displays everything as a string, ie
["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]
But I need to display it as it appears in the text area, ie
hdfdhgdfgdfg
This is just a test
Thanks!
- Kyle
I know I could just strip out all the extra characters, but I feel like there has to be a better way. Is there a good way to convert this back to the original format of a text area, or at least to convert it to an array so I can loop through each item and re-format it?
First, get rid of those ugly \rs:
comments.map!(&:chomp)
Then, join the lines together:
comment = comments.join("\n") # using a newline
# OR, for HTML output:
comment = comments.join('<br>')
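Put together with the sample data from the question, the two steps might look like the sketch below. It assumes `comments` really is an array of strings, and it adds HTML escaping through `ERB::Util.html_escape`, which the answer above does not mention, so that user-entered text cannot inject markup.

```ruby
require 'erb'

# Sample payload in the shape shown in the question.
comments = ["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]

# chomp removes a trailing "\r", "\n" or "\r\n"; map leaves the original array untouched.
lines = comments.map(&:chomp)

# Plain-text form, one line per entry.
plain_text = lines.join("\n")

# HTML form: escape each line first, then join with <br>.
html = lines.map { |line| ERB::Util.html_escape(line) }.join('<br>')

puts plain_text
```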
You should be able to parse the JSON and populate a hash with all of the values:
require 'json'
hash = JSON.parse(params["Comment"])
puts hash
=> {"Comment"=>['all', 'of', 'your', 'values']}
This should work for all valid JSON. One of the rules of JSON syntax is that
Data is in name/value pairs
The JSON you provided doesn't supply names for the values, so this method might not work. If that is the case, parsing the raw string and extracting the values would do the job as well (although it is messier).
How you might go about doing that:
json = params["Comment"]
newArray = []
json.split(" ").each do |element|
if element.length > 1
newArray << element
end
end
This would at least give you an array with all of your values.

Parse a string like a CSV file with seek, rewind, position

My application accepts an uploaded file from the user and parses it, making use of seek and rewind methods quite heavily to parse blocks from the file (lines can begin with 'start' or 'end' to enclose a section of data, etc).
A new requirement allows the user to upload encrypted files. I've implemented decryption of the content of the file and return the content string to the existing method. I can parse the string as a CSV but lose the file controls.
Storing an unencrypted version of the file is not an option for business reasons.
I'm using FasterCSV but not averse to using something else if I can keep the seek/rewind behaviour.
Current code:
FasterCSV.open(path, 'rb') do |csv| # Can I open a string as if it were a file?
unless csv.eof? # Catch empty files
# Read, store position, seek, rewind all used during parsing
position = csv.pos
row = csv.readline
csv.seek(position)
After some digging and experimentation I've found that it was possible to retain the IO methods by using the StringIO class like so:
require 'stringio'
csv = StringIO.new(decrypted_content)
unless csv.nil?
  unless csv.eof? # Catch empty files
    position = csv.pos
    row = csv.readline.chomp.split(',')
    csv.seek(position) # jump back to the stored position
  end
end
The only change is that I need to split the line manually to use it like a CSV row, which is not much extra work.
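A variation on the same idea, offered as a sketch rather than a drop-in replacement: keep the `StringIO` for `pos`, `seek` and `rewind`, but let the standard `csv` library parse each line, so quoted fields containing commas survive. The sample content and the `start`/`end` markers are made up to mirror the description in the question.

```ruby
require 'stringio'
require 'csv'

# Made-up decrypted content with start/end markers and one quoted field.
decrypted_content = "start,block1\n1,\"a, quoted\",x\nend,block1\n"

io = StringIO.new(decrypted_content)

first = CSV.parse_line(io.readline)   # marker row
position = io.pos                     # remember where the data row begins
second = CSV.parse_line(io.readline)  # data row, quoted comma kept intact
io.seek(position)                     # jump back to the remembered position
again = CSV.parse_line(io.readline)   # same data row read a second time
io.rewind                             # back to the very start of the content

p first, second, again
```

This reads one physical line at a time, so it assumes no quoted field spans several lines.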
You don't need the CSV gem anymore, but if you prefer the seek/rewind behaviour you can roll your own for strings. Something like this might work for your scenario:
array_of_lines = unecrypted_file_string.split("\n") # double quotes, so \n is a real newline
array_of_lines.each_with_index do |line, index|
  position = index # the line number stands in for a file position
  row = line
  seek = line[10] # the character at offset 10 within the line
end

Scanning text to find domain types

I'm trying to piece together part of a create action in a controller that scans the text entered and works out what type of domain name it is.
I have a text box called "domain_names". A user puts domains into the box separated by commas, e.g. "yahoo.com, google.com"
In the controller it hits it like this:
@extracted_domains = (params[:domain_names]).split(",")
@extracted_domains.each do |domain|
domain.strip
domain_scan = domain.scan(/(\w+)[.]/).flatten
com_scan = domain.scan(/[.](\w+)/).flatten
new_domain_type = DomainType.find_or_create_by_domain_type(:domain_type => com_scan)
new_domain = Domain.create(:domain => domain_scan, :domain_type_id => new_domain_type.id)
end
In the console it works great. But when I put it into practice I get really odd things stored in the database. For example, if :domain was meant to have the value "google", it will instead have the value "---\n- google\n" when it's stored in the database.
No idea why
Thanks in advance.
UPDATE:
Problem: It was putting an array into a string.
Solution: Make it a string.
domain_scan = domain.scan(/(\w+)[.]/).flatten.first
com_scan = domain.scan(/[.](\w+)/).flatten.first
It appears to be fed YAML input. Three dashes at the beginning of the string followed by a newline are a strong indicator of YAML: http://en.wikipedia.org/wiki/YAML#Sample_document
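The YAML-looking value can be reproduced in plain Ruby, which also shows why taking `.first` fixes it. This is a sketch of the mechanism only: it assumes the attribute ended up serialized as YAML because an array was assigned to a string column, and it uses the sample input from the question.

```ruby
require 'yaml'

domain = 'google.com'

# scan with a capture group returns an array of arrays; flatten gives ["google"].
as_array = domain.scan(/(\w+)[.]/).flatten

# Serializing that one-element array as YAML yields the odd stored value.
as_yaml = as_array.to_yaml

# Taking the first element gives the plain string that was intended.
as_string = as_array.first

# Splitting the whole text box: strip returns a new string, so keep its result.
pairs = 'yahoo.com, google.com'.split(',').map { |d| d.strip.split('.', 2) }

p as_array, as_yaml, as_string, pairs
```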
As for your issue, can we see the exact params that are sent?
I would take a look at https://github.com/pauldix/domainatrix for domain extraction.
