First of all, I am very new to Ruby so go easy on me! I have the parsed JSON in my sourceHash variable and I am trying to group data by the "displayName" property. The JSON format is something like this (I've simplified it without changing the structure):
{
  "results": [
    {
      "id": "12345",
      "title": "my blog post",
      "history": {
        "createdOn": "2017-09-18 15:38:26",
        "createdBy": {
          "userName": "myUserName",
          "displayName": "Michael W."
        }
      }
    },
    { ... same stuff for some other blog post ... },
    { ... same stuff for some other blog post ... },
    { ... same stuff for some other blog post ... }
  ]
}
Basically, there are two things I want to do.
Imagine this as a list of blog posts, each including its author data.
Find the person who posted the most entries
Get the top 10 bloggers, ordered by their blog post count, descending
So the first would look something like this:
Michael W. (51 posts)
However, the second one would look like this:
Michael Wayne (51 posts)
Emilia Clarke (36 posts)
Charlize Theron (19 posts)
Scarlett Johansson (7 posts)
I've played around with these queries, trying to merge my LINQ logic into this, but I failed (I'm a Ruby noob, so be easy on me!):
sourceHash = @mainData["results"]
hashSetPrimary = sourceHash.group_by{|h| h['history']['createdBy']['displayName']}
return hashSetPrimary
So, long story short, I am trying to write two separate queries that would group the data by those criteria. Any help is appreciated, as I can't find a proper way to do it.
Firstly, you need to look at your hash syntax. When you define a hash using h = { "foo": "bar" }, the key is not actually a string, but rather a symbol. Therefore accessing h["foo"] is not going to work (it will return nil); you have to access it as h[:foo].
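For example:

h = { "foo": "bar" }  # the "foo": syntax creates a symbol key, not a string key
h["foo"]  # => nil
h[:foo]   # => "bar"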
So addressing that, this does what you need:
sourceHash = @mainData[:results]
hashSetPrimary = sourceHash.group_by{ |h| h.dig(:history, :createdBy, :displayName) }
.map { |k, v| [k, v.count] }
.sort_by(&:last).reverse
return hashSetPrimary
Hash#dig requires Ruby 2.3+. If you are running on a lower version, you can do something like this instead of dig:
h[:history] && h[:history][:createdBy] && h[:history][:createdBy][:displayName]
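Putting it together for the two outputs you listed, here is a minimal sketch (it assumes @mainData holds the parsed JSON with symbol keys, e.g. from JSON.parse(body, symbolize_names: true)):

# Count posts per author once, then reuse the counts for both queries.
counts = @mainData[:results]
  .group_by { |post| post.dig(:history, :createdBy, :displayName) }
  .map { |name, posts| [name, posts.count] }

# 1) The person who posted the most entries
top_name, top_count = counts.max_by(&:last)
puts "#{top_name} (#{top_count} posts)"

# 2) Top 10 bloggers, ordered by post count, descending
counts.sort_by { |_name, count| -count }.first(10).each do |name, count|
  puts "#{name} (#{count} posts)"
end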
I am building a Rails 5 backend API which will receive requests from my Ember app. However, I'm having some trouble getting Ember to format the request in a way my Rails server understands.
By default, Rails creates controllers to expect parameters in this format, assuming the model is, say, Car:
"car": {
"id": "1",
"name": "Foo",
"bar": "Bar",
...
}
However it looks like Ember is sending requests in this format:
"data": [
{
id: "1",
type: "cars",
attributes: {
"name: "Foo",
"bar": "Bar",
...
}
]
What can I do to make Ember send request payloads in a way my Rails server will understand? Thank you.
Your Rails app is expecting the REST adapter format. For that to work properly, your Ember adapter should extend DS.RESTAdapter and your serializer should extend DS.RESTSerializer. By default, Ember Data comes with JSONAPIAdapter and JSONAPISerializer.
If you have control over the backend code, consider returning JSON:API-formatted responses instead; then Ember will work out of the box.
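If you do keep Ember's default JSONAPIAdapter, the Rails side just needs to read the attributes from where JSON:API puts them. A rough sketch (CarsController and the permitted attribute names are assumptions for illustration, not part of your app):

# Hypothetical controller showing where JSON:API payloads carry the attributes.
class CarsController < ApplicationController
  def create
    car = Car.new(car_params)
    if car.save
      render json: car, status: :created
    else
      render json: { errors: car.errors }, status: :unprocessable_entity
    end
  end

  private

  # JSON:API nests the attributes under data.attributes
  def car_params
    params.require(:data).require(:attributes).permit(:name, :bar)
  end
end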
Reference:
https://emberjs.com/api/ember-data/2.14/classes/DS.RESTAdapter
https://emberjs.com/api/ember-data/2.14/classes/DS.RESTSerializer
https://emberjs.com/api/ember-data/2.14/classes/DS.JSONAPIAdapter
https://emberjs.com/api/ember-data/2.14.9/classes/DS.JSONAPISerializer
This is my first time trying to use the Ruby AWS SDK V2, and I am trying to format the data I am getting back; it seems quite hard to get it into a usable format.
All I want to do is get a list of Hosted Zones and display in a table.
I have a helper that has:
def hosted_zones
  r53 = Aws::Route53::Client.new
  # convert to hash first so we can parse and convert to JSON
  h = r53.list_hosted_zones.to_hash
  j = JSON.parse(h.to_json)
end
which then returns me the following JSON:
{
"hosted_zones": [{
"id": "/hostedzone/Z1HSDGASSSME",
"name": "stagephil.com.",
"caller_reference": "2016-07-12T15:33:45.277646707+01:00",
"config": {
"comment": "Private DNS zone for stage",
"private_zone": true
},
"resource_record_set_count": 10
}, {
"id": "/hostedzone/ZJDGASSS0ZN3",
"name": "stagephil.com.",
"caller_reference": "2016-07-12T15:33:41.290143511+01:00",
"config": {
"comment": "Public DNS zone for stage",
"private_zone": false
},
"resource_record_set_count": 7
}],
"is_truncated": false,
"max_items": 100
}
I am then running a really ugly while loop to iterate through all the hosted_zones entries and put them into a table.
Is this the best way to get the response, or can you request the response to be JSON already?
Why are you converting a hash to JSON, only to convert it to a hash again? JSON.parse(some_hash.to_json) will just give you back an equivalent hash (with string keys).
That being said, I don't think it is possible to get JSON directly from AWS, mainly because their API responds with XML. Your solution is fine if that's all you're doing, but if you want, you can make the request with an HTTP client, take the XML you receive, and use something like ActiveSupport's Hash.from_xml to create a hash that you can then convert to JSON.
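As a side note, the SDK response is already a plain Ruby structure, so for a simple table you may not need JSON at all. A rough sketch (assuming your AWS credentials and region are configured in the environment):

# Iterate the SDK response directly instead of round-tripping through JSON.
require 'aws-sdk'

r53 = Aws::Route53::Client.new
resp = r53.list_hosted_zones

resp.hosted_zones.each do |zone|
  puts [zone.name, zone.config.comment, zone.resource_record_set_count].join("\t")
end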
I am getting Log Data from various web applications in the following format:
Session Timestamp Event Parameters
1 1 Started Session
1 2 Logged In Username:"user1"
2 3 Started Session
1 3 Started Challenge title:"Challenge 1", level:"2"
2 4 Logged In Username:"user2"
Now, a person wants to carry out analytics on this log data (and would like to receive it as a JSON blob after appropriate transformations). For example, he may want to receive a JSON blob where the log data is grouped by Session, with TimeFromSessionStart and CountOfEvents added before the data is sent, so that he can carry out meaningful analysis. Here I should return:
[
{
"session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Startd Challenge", "title":"Challenge 1", "level":"2" }]
},
{
"session":2, "CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":2, "Event":"Logged In", "Username":"user2"}]
}
]
Here, TimeFromSessionStart, CountOfEvents, etc. (let's call this synthetic additional data) will not be hard-coded, and I will make a web interface that lets the person decide what kind of synthetic data he requires in the JSON blob. I would like to provide a good amount of flexibility in that choice.
I am expecting the database to store around 1 Million rows and carry out transformations in a reasonable amount of time.
My question is regarding the choice of database. What are the relative advantages and disadvantages of using an SQL database such as PostgreSQL vs. a NoSQL database such as MongoDB? From what I have read so far, I think NoSQL may not provide enough flexibility for adding the additional synthetic data. On the other hand, I may face issues of flexibility in data representation if I use an SQL database.
I think the storage requirement for both MongoDB and PostgreSQL will be comparable since I will have to build similar indices (probably!) in both situations to speed up querying.
If I use PostgreSQL, I can store the data in the following manner:
Session and Event can be strings, Timestamp can be a date, and Parameters can be hstore (key-value pairs, available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in a Rails application (which will interact with the PostgreSQL database and act as the interface for the person who wants the JSON blob), and create the JSON blob from it.
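For what it's worth, a rough sketch of that table as a Rails migration (the table name is an assumption, and the integer timestamp just mirrors the sample data):

# Hypothetical migration for the hstore approach described above.
class CreateLogEntries < ActiveRecord::Migration
  def change
    enable_extension 'hstore'
    create_table :log_entries do |t|
      t.integer :session
      t.integer :timestamp   # the sample data uses integer ticks; a real feed might use datetime
      t.string  :event
      t.hstore  :parameters  # key-value pairs such as Username, title, level
    end
    add_index :log_entries, :session
  end
end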
Another possible approach is to use MongoDB for storing the log data and use Mongoid as an interface with the Rails application, if I can get enough flexibility for adding additional synthetic data for analytics and some performance/storage improvements over PostgreSQL. But in this case, I am not clear on the best way to store log data in MongoDB. Also, I have read that MongoDB will be somewhat slower than PostgreSQL and is mainly meant to run in the background.
Edit:
From whatever I have read in the past few days, Apache Hadoop seems to be a good choice as well because of its greater speed over MongoDB (being multi-threaded).
Edit:
I am not asking for opinions and would like to know the specific advantages or disadvantages of using a particular approach. Therefore, I don't think that the question is opinion based.
You should check out Logstash / Kibana from Elasticsearch. The primary use case for that stack is collecting log data, storing it, and analyzing it.
http://www.elasticsearch.org/overview/logstash/
http://www.elasticsearch.org/videos/kibana-logstash/
Mongo is a good choice too if you are looking at building it all yourself, but I think you may find that the Elasticsearch products solve your needs and allow the integration you're after.
MongoDB is well suited to your task, and its document storage is more flexible than a rigid SQL table structure.
Below is a working test using Mongoid that demonstrates parsing your log data input, easy storage as documents in a MongoDB collection, and analytics using MongoDB's aggregation framework. I've chosen to put the parameters in a sub-document; this matches your sample input table more closely and simplifies the pipeline. The resulting JSON is slightly modified, but all of the specified calculations, data, and grouping are present.
I've also added a test for an index on the parameter Username, which demonstrates an index on a sub-document field. This is adequate for specific fields that you want to index, but a completely general index can't be created on keys; you would have to restructure the data so that those keys become values.
I hope that this helps and that you like it.
test/unit/log_data_test.rb
require 'test_helper'
require 'json'
require 'pp'
class LogDataTest < ActiveSupport::TestCase
def setup
LogData.delete_all
@log_data_analysis_pipeline = [
{'$group' => {
'_id' => '$session',
'session' => {'$first' => '$session'},
'CountOfEvents' => {'$sum' => 1},
'timestamp0' => {'$first' => '$timestamp'},
'Actions' => {
'$push' => {
'timestamp' => '$timestamp',
'event' => '$event',
'parameters' => '$parameters'}}}},
{'$project' => {
'_id' => 0,
'session' => '$session',
'CountOfEvents' => '$CountOfEvents',
'Actions' => {
'$map' => { 'input' => "$Actions", 'as' => 'action',
'in' => {
'TimeFromSessionStart' => {
'$subtract' => ['$$action.timestamp', '$timestamp0']},
'event' => '$$action.event',
'parameters' => '$$action.parameters'
}}}}
}
]
@key_names = %w(session timestamp event parameters)
@log_data = <<-EOT.gsub(/^\s+/, '').split(/\n/)
1 1 Started Session
1 2 Logged In Username:"user1"
2 3 Started Session
1 3 Started Challenge title:"Challenge 1", level:"2"
2 4 Logged In Username:"user2"
EOT
docs = @log_data.collect{|line| line_to_doc(line)}
LogData.create(docs)
assert_equal(docs.size, LogData.count)
puts
end
def line_to_doc(line)
doc = Hash[*@key_names.zip(line.split(/ +/)).flatten]
doc['session'] = doc['session'].to_i
doc['timestamp'] = doc['timestamp'].to_i
doc['parameters'] = eval("{#{doc['parameters']}}") if doc['parameters']
doc
end
test "versions" do
puts "Mongoid version: #{Mongoid::VERSION}\nMoped version: #{Moped::VERSION}"
puts "MongoDB version: #{LogData.collection.database.command({:buildinfo => 1})['version']}"
end
test "log data analytics" do
pp LogData.all.to_a
result = LogData.collection.aggregate(@log_data_analysis_pipeline)
json = <<-EOT
[
{
"session":1,"CountOfEvents":3,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Started Challenge", "title":"Challenge 1", "level":"2" }]
},
{
"session":2, "CountOfEvents":2,"Actions":[{"TimeFromSessionStart":0,"Event":"Session Started"}, {"TimeFromSessionStart":2, "Event":"Logged In", "Username":"user2"}]
}
]
EOT
puts JSON.pretty_generate(result)
end
test "explain" do
LogData.collection.indexes.create('parameters.Username' => 1)
pp LogData.collection.find({'parameters.Username' => 'user2'}).to_a
pp LogData.collection.find({'parameters.Username' => 'user2'}).explain['cursor']
end
end
app/models/log_data.rb
class LogData
include Mongoid::Document
field :session, type: Integer
field :timestamp, type: Integer
field :event, type: String
field :parameters, type: Hash
end
$ rake test
Run options:
# Running tests:
[1/3] LogDataTest#test_explain
[{"_id"=>"537258257f11ba8f03000005",
"session"=>2,
"timestamp"=>4,
"event"=>"Logged In",
"parameters"=>{"Username"=>"user2"}}]
"BtreeCursor parameters.Username_1"
[2/3] LogDataTest#test_log_data_analytics
[#<LogData _id: 537258257f11ba8f03000006, session: 1, timestamp: 1, event: "Started Session", parameters: nil>,
#<LogData _id: 537258257f11ba8f03000007, session: 1, timestamp: 2, event: "Logged In", parameters: {"Username"=>"user1"}>,
#<LogData _id: 537258257f11ba8f03000008, session: 2, timestamp: 3, event: "Started Session", parameters: nil>,
#<LogData _id: 537258257f11ba8f03000009, session: 1, timestamp: 3, event: "Started Challenge", parameters: {"title"=>"Challenge 1", "level"=>"2"}>,
#<LogData _id: 537258257f11ba8f0300000a, session: 2, timestamp: 4, event: "Logged In", parameters: {"Username"=>"user2"}>]
[
{
"session": 2,
"CountOfEvents": 2,
"Actions": [
{
"TimeFromSessionStart": 0,
"event": "Started Session",
"parameters": null
},
{
"TimeFromSessionStart": 1,
"event": "Logged In",
"parameters": {
"Username": "user2"
}
}
]
},
{
"session": 1,
"CountOfEvents": 3,
"Actions": [
{
"TimeFromSessionStart": 0,
"event": "Started Session",
"parameters": null
},
{
"TimeFromSessionStart": 1,
"event": "Logged In",
"parameters": {
"Username": "user1"
}
},
{
"TimeFromSessionStart": 2,
"event": "Started Challenge",
"parameters": {
"title": "Challenge 1",
"level": "2"
}
}
]
}
]
[3/3] LogDataTest#test_versions
Mongoid version: 3.1.6
Moped version: 1.5.2
MongoDB version: 2.6.1
Finished tests in 0.083465s, 35.9432 tests/s, 35.9432 assertions/s.
3 tests, 3 assertions, 0 failures, 0 errors, 0 skips
MongoDB is an ideal database for this.
Create a collection for your raw log data.
Use one of Mongo's powerful aggregation tools and output the aggregated data to another collection (or multiple output collections, if you want different buckets or views of the raw data).
You can either do the aggregation offline, with a set of pre-determined possibilities that users can pull from, or do it on demand / ad hoc, if you can tolerate some latency in your response.
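For example, a rough sketch of that second step using the mongo gem's 2.x API, writing per-session counts to another collection via $out (the database and collection names are placeholders):

# Aggregate raw log events into a summary collection.
require 'mongo'

client = Mongo::Client.new(['127.0.0.1:27017'], database: 'analytics')
client[:raw_logs].aggregate([
  { '$group' => { '_id' => '$session', 'CountOfEvents' => { '$sum' => 1 } } },
  { '$out'   => 'session_summaries' }  # results land in another collection
]).to_a  # the pipeline runs when the cursor is iterated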
http://docs.mongodb.org/manual/aggregation/
I'm working to have Rails 3 respond to a JSON request, which will then let the app output the search results with the jQuery template plugin...
For the plugin to work, it needs this type of structure:
[
{ title: "The Red Violin", url: "/adadad/123/ads", desc: "blah yada" },
{ title: "Eyes Wide Shut", url: "/adadad/123/ads", desc: "blah yada" },
{ title: "The Inheritance", url: "/adadad/123/ads", desc: "blah yada" }
]
In my Rails 3 controller, I'm getting the search results, which come back as @searchresults and contain either 0, 1, or more objects from the class searched.
My question is how to convert that to the above structure (JSON)...
Thank you!
Update
Forgot to mention: the front-end search page will need to work with multiple models which have different DB columns. That's why I'd like to learn how to convert the results to the above structure, to normalize them before sending them back to the user.
I am not really sure what the problem is here, since you can always call .to_json on any instance, collection of instances, hash, etc.
You can use .select to limit the fields you return, e.g.:
Object.select(:title, :url, :desc).to_json
I am guessing that @searchresults is an ActiveRecord::Relation, so you can probably use:
@searchresults.select(:title, :url, :desc).to_json
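Since the update mentions multiple models with different columns, another option is to map each record to the common shape yourself before serializing. A sketch with made-up model names and attributes (adjust the branches to your real models):

# In the controller action: normalize heterogeneous search results into the
# structure the jQuery template expects. Movie/Book are placeholders.
results = @searchresults.map do |record|
  case record
  when Movie
    { title: record.title, url: movie_path(record), desc: record.synopsis }
  when Book
    { title: record.name,  url: book_path(record),  desc: record.summary }
  end
end

render json: results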