Extracting JSON objects from JSON string - ruby-on-rails

I want to break down a JSON string into smaller objects. I have two servers, one acting as the web-app interface to the whole application and the other is a repository/database.
I'm able to retrieve information from the repository to the web-app as JSON, but after that I don't know how to return it.
Here's a sample of the JSON being returned:
{"respPages":[{"page":{"page_url":"http://www.google.com/","created_at":"2011-08-10T11:00:19Z","website_id":1,"updated_at":"2011-08-10T11:00:19Z","id":1}},{"page":{"page_url":"http://www.blank.com/services/content_services/","created_at":"2011-08-10T11:02:46Z","website_id":1,"updated_at":"2011-08-10T11:02:46Z","id":2}}],"respSite":{"website":{"created_at":"2011-08-10T11:00:19Z","website_id":null,"updated_at":"2011-08-10T11:00:19Z","website_url":null,"id":1}},"respElementTypes":[{"element_type":{"created_at":"2011-08-10T11:00:19Z","updated_at":"2011-08-10T11:00:19Z","id":1,"tag_name":"head"}},
There are four tags in the JSON:
page
website
elementType
elementData
I would like to create four arrays and populate them with the object that matches these tags.
I would imagine the code is something like this:
# Get the JSON from the repo using net/http
uri = URI.parse("http://127.0.0.1:3007/repository/infoid/1.json")
http = Net::HTTP.new(uri.host, uri.port)
response = http.request(Net::HTTP::Get.new(uri.request_uri))
@x = response.to_hash
@pages = Array.new
@websites = Array.new
@elementDatas = Array.new
@elementTypes = Array.new
# For every bit of the hash, find out what it is and allocate it accordingly
@x.each_with_index do |e, index|
  if e.tagName == pages # Getting real javascripty here. There must be some way to check the tag or title of the element
    @pages[index] = e
  end
end
My goal for the returned value is to have four arrays, each containing different types of objects:
@pagesArray[1]
should contain the first occurrence of a page object in the JSON string. Then do the same for the other ones.
Of course I'd need to break down the object further but once I can break down the top level and categorize them, then I can go deeper.
In the JSON there are already tag titles respPages and respWebsites which group all the objects.
How do I turn JSON back into objects in Ruby and reference them using something like the tag name?

You should be able to decode anything in JSON format using the standard JSON library:
JSON.load(...)
It will throw exceptions on malformed JSON data, so be sure to test it thoroughly and make sure it can handle all the important cases.
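As a minimal sketch of that advice, the snippet below wraps the parse in a rescue so malformed input does not crash the caller. The helper name `parse_repo_json` and the cut-down payloads are assumptions made for illustration; only `JSON.parse` and `JSON::ParserError` come from the standard library.

```ruby
require 'json'

# Hypothetical helper: returns the parsed Hash, or nil when the text is not valid JSON.
def parse_repo_json(text)
  JSON.parse(text)
rescue JSON::ParserError => e
  warn "Could not parse JSON: #{e.message}"
  nil
end

good = '{"respPages":[{"page":{"id":1,"page_url":"http://www.google.com/"}}]}'
bad  = '{"respPages":[{"page":' # truncated on purpose

data = parse_repo_json(good)
puts data['respPages'].first['page']['page_url']
puts parse_repo_json(bad).inspect
```

Returning nil is only one possible design choice; re-raising a more specific error may suit an API better.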
If you're trying to navigate the structure of the JSON itself, you probably need to write a series of recursive methods that handle each case along the way. A good pattern to start with is this:
@data.each do |key, value|
  case (key)
  when 'someKey'
    handle_some_key(value)
  when 'otherKey'
    handle_other_key(value)
  end
end
You can either break out the behavior into methods as in this example, or inline it if the logic is fairly straightforward.
As a note, an alternative to Array.new is simply [ ] as it is in JavaScript. For example:
@pages = [ ]
You'll see this used frequently in most Ruby examples. The alternative to Hash.new is { }.

The following works:
json = {"respPages"=>[{"page"=>{"page_url"=>"http://www.google.com", "created_at"=>"2011-08-10T11:00:19Z", "website_id"=>1, "updated_at"=>"2011-08-10T11:00:19Z", "id"=>1}}, {"page"=>{"page_url"=>"http://www.blank.com/services/content_services/", "created_at"=>"2011-08-10T11:02:46Z", "website_id"=>1, "updated_at"=>"2011-08-10T11:02:46Z", "id"=>2}}],
"respSite"=>{"website"=>{"created_at"=>"2011-08-10T11:00:19Z", "website_id"=>nil, "updated_at"=>"2011-08-10T11:00:19Z", "website_url"=>nil, "id"=>1}},
"respElementTypes"=>[{"element_type"=>{"created_at"=>"2011-08-10T11:00:19Z", "updated_at"=>"2011-08-10T11:00:19Z", "id"=>1, "tag_name"=>"head"}}]}
@respPages, @respSite, @respElementTypes = [], [], []
json.each do |key_category, group_category|
  group_category.each do |hash|
    if group_category.is_a? Array
      eval("@#{key_category}") << hash.values.first
    elsif group_category.is_a? Hash
      eval("@#{key_category}") << hash[1]
    end
  end
end
There wasn't any respData in your sample, but you get the idea.
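The same grouping can probably be done without eval. The sketch below is one possible variant, not the answer's method: it collects objects into a Hash of arrays keyed by the inner tag ("page", "website", "element_type"). The payload is a shortened stand-in for the sample, and the variable name `grouped` is an assumption.

```ruby
require 'json'

json = JSON.parse(<<~PAYLOAD)
  {"respPages":[{"page":{"id":1}},{"page":{"id":2}}],
   "respSite":{"website":{"id":1}},
   "respElementTypes":[{"element_type":{"id":1,"tag_name":"head"}}]}
PAYLOAD

# Every unknown tag starts out with its own empty array.
grouped = Hash.new { |hash, tag| hash[tag] = [] }

json.each_value do |group|
  # A group is either an array of wrappers or a single wrapper hash.
  wrappers = group.is_a?(Array) ? group : [group]
  wrappers.each do |wrapper|
    wrapper.each { |tag, object| grouped[tag] << object }
  end
end

puts grouped['page'].length
puts grouped['website'].first['id']
```

Keying by tag means a new object type in the response needs no new variable.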

Related

Rails - Accessing JSON members

I am new to Rails, and working with some JSON, and not sure how to get to the data as the examples below:
1) If I use JSON.parse(response)['Response']['test']['data']['123456'], I will need to parse the response again for 123457. Is there a better way to loop through all the objects in data?
2) Based on the membershipId, how can I identify the top-level object it belongs to, i.e. the entry under data?
"test": {
"data": {
"123456": {
"membershipId": "321321312",
"membershipType": a,
},
"123457": {
"membershipId": "321321312",
"membershipType": a,
},
}
JSON.parse(response)['Response']['test']['data'].each do |key, object|
  puts key
  puts object['membershipId']
  ...
end
To select the data record associated with a particular membership:
match_membership = '321321312'
member = JSON.parse(response)['Response']['test']['data'].select do |_key, object|
  object['membershipId'] == match_membership
end
puts member.keys.first
=> 123456
For 1:
Assumption:
By "need to parse another response", I assume you were doing something like this:
# bad code: because you are parsing `response` multiple times
JSON.parse(response)['Response']['test']['data']['123456']
JSON.parse(response)['Response']['test']['data']['123457']
then simply:
Solution 1:
If you only access hash values two or more levels deep 2 or 3 times,
response_hash = JSON.parse(response)
response_hash['Response']['test']['data']['123456']
response_hash['Response']['test']['data']['123457']
Solution 2:
If you access hash values two or more levels deep lots of times,
response_hash = JSON.parse(response)
response_hash_response_test_data = response_hash['Response']['test']['data']
response_hash_response_test_data['123456']
response_hash_response_test_data['123457']
response_hash_response_test_data['123458']
response_hash_response_test_data['123459']
response_hash_response_test_data['123460']
# ...
Solution 2 is better than Solution 1 because it avoids repeating the Hash#[] calls (the "getter" method) for ['test'] and then ['data'] every time you look up a record. Instead, you store the nested level of the hash in a variable, which does not duplicate the values in memory. It is also more readable.
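Putting both points together, here is a small sketch that parses once, reaches the nested hash with Hash#dig, loops over every record, and looks one up by membershipId. The payload is an invented, valid-JSON variant of the sample (the second membershipId and the quoted membershipType values are assumptions).

```ruby
require 'json'

response = <<~PAYLOAD
  {"Response":{"test":{"data":{
    "123456":{"membershipId":"321321312","membershipType":"a"},
    "123457":{"membershipId":"999999999","membershipType":"b"}
  }}}}
PAYLOAD

# Parse once, then walk down to the nested hash; fall back to {} if a level is missing.
records = JSON.parse(response).dig('Response', 'test', 'data') || {}

records.each do |key, record|
  puts "#{key}: #{record['membershipType']}"
end

# find returns the first [key, value] pair for which the block is true.
key, record = records.find { |_key, rec| rec['membershipId'] == '999999999' }
puts key
```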

Rails params to array

I am sending a list of checkbox selected from PHP file to our Rails API server. All checked items' ID's will be sent in json format (campaign_ids in json_encode from PHP):
I got a URL being passed to our API like this
Started PUT "/campaigns/function.json?campaign_ids=["6","7"]&user_id=0090000007"
I need to get the campaign_ids ["6","7"] and process it like any other array using array.each do || end
How can I convert this to an array so I can use array.each?
The following sample code can achieve it but I think there could be a better way?
campaign_ids = params[:campaign_ids].to_s # [\"6\",\"7\"]
campaign_ids = campaign_ids.gsub(/[^0-9,]/,'') # 6,7
if campaign_ids.size.to_i > 0 # 3 ??
campaign_ids.split(",").each do |campaign_id|
...
end
end
The correct format of the URL should've been campaign_ids[]=6&campaign_ids[]=7. That would automatically yield an array of ["6", "7"] when you call params[:campaign_ids].
But assuming you can't change the format of the incorrect parameters, you can still get it via JSON.parse(params[:campaign_ids])
Try this
campaign_ids = JSON.parse(params[:campaign_ids])
You get params[:campaign_ids] as a string.
So, you will have to parse that json string to get array elements.
If the client sends the values as campaign_ids[]=6&campaign_ids[]=7, params[:campaign_ids] is already in your desired array format, and you need not convert it to a string using to_s.
You can do something like this
campaign_ids = params[:campaign_ids]
campaign_ids.each do |campaign_id|
# do the computation here
end
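Since the answers above cover both shapes of the parameter, a defensive helper could accept either one. This is only a sketch: `normalize_ids` is a hypothetical name, and plain strings and arrays stand in for what Rails would hand you in params.

```ruby
require 'json'

# Hypothetical helper: accepts a real array (campaign_ids[]=6&campaign_ids[]=7)
# or a JSON-encoded string (campaign_ids=["6","7"]) and always returns an array.
def normalize_ids(raw)
  return raw if raw.is_a?(Array)

  parsed = JSON.parse(raw.to_s)
  parsed.is_a?(Array) ? parsed : []
rescue JSON::ParserError
  # Anything that is not valid JSON is treated as "no ids".
  []
end

normalize_ids('["6","7"]').each { |campaign_id| puts campaign_id }
normalize_ids(%w[6 7]).each { |campaign_id| puts campaign_id }
```

Either way the ids come back as strings, so call to_i on them if you need integers.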

How to parse JSON with the Oj SAX parser, Saj

I want to parse a 10-20MB JSON file, and figure it's probably a good idea to not parse the entire JSON file at once and cause major memory usage. After looking around it seems like Oj's Saj or ScHandler APIs might be a good fit.
The only problem is that I can't really wrap my head around how to use them, and the documentation doesn't make it much clearer. I've looked at the example in Saj source code, and defined a super simple subclass of Oj::Saj like below:
class MySaj < Oj::Saj
def hash_start(key)
p key
end
end
Used like this:
open(URL) do |contents|
Oj.saj_parse(handler, contents)
end
And this leads to a lot of keys from my JSON being printed out. But I still have no idea how to actually access the values belonging to the keys I'm printing.
Can I access the hash itself somehow, or how am I supposed to do this?
SAX-style parsing is complicated. You have to maintain the state of the parsing, and deal with each state change appropriately.
The hash_start and array_start callbacks notify your SAX handler that Saj has found the beginning of a hash or an array, and that the next callbacks that occur will be in the context of that container. Note that hashes may be nested, contain (or be contained within) arrays, or simple values.
Here is a simple Saj handler that parses a very simple JSON object:
require 'oj'
class MySaj < ::Oj::Saj
  def initialize()
    @hash_cnt = 0
    @array_cnt = 0
  end
  def hash_start(key)
    @hash_cnt += 1
    puts "Start-Hash[@hash_cnt]: '#{key}'"
  end
  def hash_end(key)
    @hash_cnt -= 1
    puts "End-Hash[@hash_cnt]: '#{key}'"
  end
  def array_start(key)
    @array_cnt += 1
    puts "Start-Array[@array_cnt]: '#{key}'"
  end
  def array_end(key)
    @array_cnt -= 1
    puts "End-Array[@array_cnt]: '#{key}'"
  end
  def add_value(value, key)
    puts "Value: [#{key}] = '#{value}'"
  end
  def error(message, line, column)
    puts "ERRRORRR: #{line}:#{column}: #{message}"
  end
end
json = '[{ "key1": "abc", "key2": 123}, { "test1": "qwerty", "pi": 3.14159 }]'
cnt = MySaj.new()
Oj.saj_parse(cnt, json)
Parsing this basic JSON with Saj gives this result:
Start-Array[@array_cnt]: ''
Start-Hash[@hash_cnt]: ''
Value: [key1] = 'abc'
Value: [key2] = '123'
End-Hash[@hash_cnt]: ''
Start-Hash[@hash_cnt]: ''
Value: [test1] = 'qwerty'
Value: [pi] = '3.14159'
End-Hash[@hash_cnt]: ''
End-Array[@array_cnt]: ''
You may notice that this output is roughly equivalent to one callback per token (omitting ',' and ':'). You essentially have to build into your callbacks the knowledge of what to do with individual JSON elements. Along those lines, you also need to build the hierarchy described by the callbacks. For example, when hash_start is called, push an empty hash on the stack; when hash_end is called, pop the hash or move back one level in the hierarchy.
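The push/pop idea can be sketched roughly as follows. To keep the example runnable without the gem, `TreeBuilder` is a plain Ruby class with the same callback names, and the events are replayed by hand; in real use it would presumably subclass Oj::Saj and be passed to Oj.saj_parse. The class and method names other than the callbacks are assumptions.

```ruby
# Plain Ruby sketch with Saj-style callbacks (not tied to the oj gem here).
class TreeBuilder
  attr_reader :result

  def initialize
    @stack = []   # containers that are currently open
    @result = nil # the finished top-level container
  end

  def hash_start(key)
    open_container({}, key)
  end

  def array_start(key)
    open_container([], key)
  end

  def hash_end(_key)
    close_container
  end

  def array_end(_key)
    close_container
  end

  def add_value(value, key)
    attach(value, key)
  end

  private

  def open_container(container, key)
    attach(container, key)
    @stack.push(container)
  end

  def close_container
    finished = @stack.pop
    @result = finished if @stack.empty?
  end

  # Add a value to whatever container is currently open.
  def attach(value, key)
    parent = @stack.last
    case parent
    when Hash  then parent[key] = value
    when Array then parent << value
    end
  end
end

builder = TreeBuilder.new
# Events replayed by hand for '[{"key1":"abc","key2":123}]'.
builder.array_start(nil)
builder.hash_start(nil)
builder.add_value('abc', 'key1')
builder.add_value(123, 'key2')
builder.hash_end(nil)
builder.array_end(nil)
p builder.result
```

Rebuilding the whole tree defeats the memory savings, of course; the point is only to show where the state lives.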
For example you could have a handler in hash_end that checks to see if this is ending a top-level hash, and when it is, then do something with that hash. Note that you can often not do this with arrays, as the top-level element in a very large number of JSON documents is an array, so you have to determine when the array is the top+1 level array.
If you like writing compiler backends, this is the JSON parsing solution for you. Personally, I've never enjoyed working in Sax, but for large documents, it can be very resource-friendly and highly performant, depending on how well you write the handler. Be prepared for oodles of debugging and slightly mismatched state management, as that's par for the course with Sax-style parsing.
However, you shouldn't be too concerned with 10-20MB JSON, as that's actually not very large. I've processed 80+MB JSON with "regular" Oj (load and dump) quite a lot, and not had a problem with it. Unless you're running on a severely resource-constrained machine, the standard Oj will work well for you.
Saj is a streaming parser. What that means, in practice, is that it doesn't read a file's contents in their entirety and parse them whole; instead, it notifies you of parse events as it encounters them. Your thinking is solid: the larger the file, the more you benefit from parsing in that manner if you wish to pick and choose from it.
hash_start is one such event, fired when Oj sees the beginning of an Object (which will become a Hash in Ruby land).
Take this JSON for instance:
{
"student-1": {
"name": "John Doe",
"age": 42,
"knownAliases": ["Blabby Joe", "Stack Underflow"],
"trainingGrades": {
"Advanced Zumba Dancing": "A+",
"Introduction to Twitter Arguments": "C-"
}
},
"student-2": {
"name": "Rebecca Melecca",
"age": 26,
"knownAliases": ["Booger Becca", "Tanktop Terror"],
"trainingGrades": {
"Intermediate Groin Kickery": "A+",
"Advanced Quantum Mechanics": "A+"
}
}
}
And the following parser:
class StudentParser < Oj::Saj
def hash_start(key)
puts "hash_start(#{key.inspect})"
end
def hash_end(key)
puts "hash_end(#{key.inspect})"
end
def array_start(key)
puts "array_start(#{key.inspect})"
end
def array_end(key)
puts "array_end(#{key.inspect})"
end
def add_value(value, key)
puts "add_value(#{value.inspect}, #{key.inspect})"
end
end
And you'll get the following sequence of events:
hash_start(nil)
hash_start("student-1")
add_value("John Doe", "name")
add_value(42, "age")
array_start("knownAliases")
add_value("Blabby Joe", nil)
add_value("Stack Underflow", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Advanced Zumba Dancing")
add_value("C-", "Introduction to Twitter Arguments")
hash_end("trainingGrades")
hash_end("student-1")
hash_start("student-2")
add_value("Rebecca Melecca", "name")
add_value(26, "age")
array_start("knownAliases")
add_value("Booger Becca", nil)
add_value("Tanktop Terror", nil)
array_end("knownAliases")
hash_start("trainingGrades")
add_value("A+", "Intermediate Groin Kickery")
add_value("A+", "Advanced Quantum Mechanics")
hash_end("trainingGrades")
hash_end("student-2")
hash_end(nil)
When you see hash_start(nil), it means the parser has found a top-level object (that very first opening brace). Conversely, hash_end(nil) means that top-level object has been closed, and its innards properly parsed (i.e. no parsing errors have been found).
Parsing in this manner means you have to keep track of nesting, if that's meaningful to you, of adding keys and values at the right level, et cetera. That makes it annoying and hard, but worthwhile if you wish to carve out bits of a large file without committing everything to memory.
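To make the "carve out bits" idea concrete, here is a rough sketch that keeps only each student's name and ignores everything else. `NameCollector` is a hypothetical plain Ruby class with Saj-style callbacks, driven by a shortened, hand-replayed version of the event sequence listed above so it runs without the gem; with Oj it would presumably subclass Oj::Saj.

```ruby
# Plain Ruby sketch with Saj-style callbacks (not tied to the oj gem here).
class NameCollector
  attr_reader :names

  def initialize
    @depth = 0     # how many hashes are currently open
    @names = {}    # student key => name
    @current = nil # key of the student object being read
  end

  def hash_start(key)
    @depth += 1
    @current = key if @depth == 2 # a student object directly under the root
  end

  def hash_end(_key)
    @current = nil if @depth == 2
    @depth -= 1
  end

  def array_start(_key); end

  def array_end(_key); end

  def add_value(value, key)
    # Only keep "name" fields that sit directly inside a student object.
    @names[@current] = value if @depth == 2 && key == 'name'
  end
end

collector = NameCollector.new
collector.hash_start(nil)
collector.hash_start('student-1')
collector.add_value('John Doe', 'name')
collector.add_value(42, 'age')
collector.hash_start('trainingGrades')
collector.add_value('A+', 'Advanced Zumba Dancing')
collector.hash_end('trainingGrades')
collector.hash_end('student-1')
collector.hash_end(nil)
p collector.names
```

The only state kept is a depth counter and the current student key, which is why memory use stays flat however large the file is.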

Is there a safe way to Eval In ruby? Or a better way to do this?

When a user uses my application, at one point they will get an array of arrays, that looks like this:
results = [["value",25], ["value2",30]...]
The sub arrays could be larger, and will be in a similar format. I want to allow my users to write their own custom transform function that will take an array of arrays, and return either an array of arrays, a string, or a number. A function should look like this:
def user_transform_function(array_of_arrays)
# eval users code, only let them touch the array of arrays
end
Is there a safe way to sandbox this function and eval so a user could not try and execute malicious code? For example, no web callouts, not database callouts, and so on.
First, if you use eval, it will never be safe. You can at least look into the taint method.
What I would recommend is creating your own DSL for that. There is a great framework for this in Ruby: http://treetop.rubyforge.org/index.html. Of course, it will require some effort on your side, but from the user's perspective I think it could be even better.
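A full grammar may be more than you need. As a much smaller sketch of the same "don't eval, define your own language" idea, users could pick from a whitelist of named operations instead of writing Ruby. The operation names, the `ALLOWED_OPERATIONS` table and the extra `steps` argument are all assumptions made for this illustration, and it does not use Treetop.

```ruby
# Hypothetical whitelist of operations a user may request by name.
ALLOWED_OPERATIONS = {
  'select_greater_than' => ->(rows, n) { rows.select { |_label, value| value > n } },
  'add_to_values'       => ->(rows, n) { rows.map { |label, value| [label, value + n] } },
  'sum_values'          => ->(rows, _) { rows.sum { |_label, value| value } }
}.freeze

# Applies each [operation_name, argument] step in order; unknown names are rejected.
def user_transform_function(array_of_arrays, steps)
  steps.reduce(array_of_arrays) do |rows, (name, argument)|
    operation = ALLOWED_OPERATIONS.fetch(name) do
      raise ArgumentError, "operation not allowed: #{name}"
    end
    operation.call(rows, argument)
  end
end

results = [['value', 25], ['value2', 30]]
p user_transform_function(results, [['add_to_values', 5], ['select_greater_than', 30]])
```

Because user input only ever selects a key in the table, no user-supplied text is executed as code.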
WARNING: I can not guarantee that this is truly safe!
You might be able to run it as a separate process and use Ruby's $SAFE. This does not guarantee that what you get is safe, but it makes it harder to mess things up. Note that this relies on older Rubies: $SAFE lost its special behavior in Ruby 3.0.
What you then would do is something like this:
script = "arr.map{|e| e+2}" #from the user.
require "json"
array = [1, 2, 3, 4]
begin
results = IO.popen("ruby -e 'require \"json\"; $SAFE=3; arr = JSON.parse(ARGV[0]); puts (#{script}).to_json' #{array.to_json}") do |io|
io.read
end
rescue Exception => e
puts "Ohh, good Sir/Mam, your script caused an error."
end
if results.include?("Insecure operation")
puts "Ohh, good Sir/Mam, you cannot do such a thing"
else
begin
a = JSON.parse(results)
results = a
rescue Exception => e
puts "Ohh, good Sir/Mam, something is wrong with the results."
puts results
end
end
conquer_the_world(results) if results.is_a?(Array)
do_not_conquer_the_world(results) unless results.is_a?(Array)
OR
You could do this, it appears:
def evaluate_user_script(script)
Thread.start {
$SAFE = 4
eval(script)
}
end
But again: I do not know how to get the data out of there.

Map object and nested object to model using Ruby on Rails

I have an object like
{"Result":[{
"Links":[{
"UrlTo":"http://www.example.com/",
"Visited":1364927598,
"FirstSeen":1352031217,
"PrevVisited":1362627231,
"Anchor":"example.com",
"Type":"Text",
"Flag":[],
"TextPre":"",
"TextPost":""
}],
"Index":0,
"Rating":0.001416,
"UrlFrom":"http://www.exampletwo.com",
"IpFrom":"112.213.89.105",
"Title":"Example title",
"LinksInternal":91,
"LinksExternal":51,
"Size":5735
}]}
And I have a model with all of the keys.
UrlTo, Visited, FirstSeen, PrevVisited, Anchor, Type, TextPre, TextPost, Index, Rating, UrlFrom, IpFrom, Title, LinksInternal, LinksExternal, Size
I understand how to save this to the database without this bit below...
"Links":[{
"UrlTo":"http://example.com/",
"Visited":1364927598,
"FirstSeen":1352031217,
"PrevVisited":1362627231,
"Anchor":"example.com",
"Type":"Text",
"Flag":[],
"TextPre":"",
"TextPost":""
}],
Not sure how to save it with a nested object as well.
I had a search on Google and SO and couldn't find anything, what is the correct way to do this? Should I move the nested object into the one above? I have no need for it to be nested...
Thanks in advance
It looks like you want to save links, so I would loop over the Result/Links in the JSON provided, and create a new hash based on the links.
I've pretended below that your json is in a file called input.json -- but you'd obviously just parse the text or use an existing JSON object
require 'json'
json = JSON.parse File.read("input.json")
links = json["Result"].flat_map { |result| result["Links"] }
hash = {"Links" => links}
puts hash
This creates the object:
{"Links"=>[{"UrlTo"=>"http://www.example.com/", "Visited"=>1364927598, "FirstSeen"=>1352031217, "PrevVisited"=>1362627231, "Anchor"=>"example.com", "Type"=>"Text", "Flag"=>[], "TextPre"=>"", "TextPost"=>""}]}
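Since the question says the nesting is not needed, another option might be to flatten each link together with its parent's fields into a single hash per row. The sketch below uses a shortened copy of the sample; passing each row to a model is left out because the model and its column names are not shown in the question.

```ruby
require 'json'

json = JSON.parse(<<~PAYLOAD)
  {"Result":[{"Links":[{"UrlTo":"http://www.example.com/","Anchor":"example.com","Flag":[]}],
              "Index":0,"UrlFrom":"http://www.exampletwo.com","Size":5735}]}
PAYLOAD

# One flat hash per link: the parent's fields merged with the link's own fields.
rows = json['Result'].flat_map do |result|
  parent = result.reject { |key, _value| key == 'Links' }
  result['Links'].map { |link| parent.merge(link) }
end

p rows.first
```

If a result has several links, the parent fields are repeated on every row, which is usually what a single flat table expects.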
