How much is too much flattening with Firebase data? - iOS

If I'm looking to have posts that can be replied to, which is a better structure?
1)
posts
  908239409234
    postText: "Whats up peeps?"
    replies:
      09283049830294: true
      a9s0dif09iasd9: true
replies
  09283049830294
    text: "Nm breh"
    imageURL: nil
  a9s0dif09iasd9
    text: "Nm breh"
    imageURL: nil
or 2)
posts
  908239409234
    postText: "Whats up peeps?"
    replies:
      09283049830294
        text: "Nm breh"
        imageURL: nil
      a9s0dif09iasd9
        text: "Nm breh"
        imageURL: nil
I see so many examples of databases that look like #1, where you store references to replies kept somewhere else to support flattening, but I don't see any advantage over just going with option 2 if it can be done.
If a user is joining in on a post, they'll have the post's UID, and they can just add a reply under "replies" with an autoID (a sketch of that write follows below).
TL;DR: is it better to go with the flatter method, or with the method that seems more efficient and requires searching through less information? Is there any reason not to go with option 2?
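The question is about iOS, but to stay consistent with the Ruby used elsewhere on this page, here is a minimal sketch of that option-2 write using the firebase REST gem; the database URL is a placeholder, and push generates the autoID key server-side (the analogue of the iOS SDK's childByAutoId):

require 'firebase'

# Placeholder database URL; push POSTs the reply under the post's
# "replies" node and lets Firebase generate the autoID key.
client = Firebase::Client.new('https://your-app.firebaseio.com/')
client.push('posts/908239409234/replies', text: 'Nm breh')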

I would go with no. 2.
The reason is that with no. 1 you will have to do an extra query to get each reply's data based on its ID.
With Firebase it is generally better to write more and read less.

It depends on whether you will reference replies on their own at some point later. If your replies only ever relate to their post, then option 2 is optimal because you get the replies for free when you fetch the post. However, if you would like a more complex feature that queries for replies related to particular users, for example, it is better to store a reference to each reply, as in option 1. This is from experience with my app, which kept growing until it became too slow to fetch a node because it had too many children, grandchildren, and so on, and I had to flatten the database to make fetches faster. I would suggest thinking carefully about the architecture of your app.

Related

Is there a quicker way to extract data from Sawyer::Resource via Octokit.rb GitHub API call?

I'm using Octokit.rb to search GitHub users and the response returns a Sawyer::Resource object. I'm currently accessing the data this way:
[].tap do |users|
  @results.items.each do |item|
    user = item.rels[:self].get.data
    user = {
      location: user.location,
      username: user.login,
      name: user.name,
      email: user.email
    }
    users << user
  end
end
I'd like to iterate over the users array created and display the results; however, right now the method takes extremely long because of the per-item rels[:self].get.data call, and I'm not sure what to do. Any help would be greatly appreciated!
Hey, so I started messing around with the Octokit.rb library yesterday after I saw your question, and I actually ran into the same issue Jason pointed out. You're on the right track about using concurrent requests. I'm not sure the rate limit will be an issue, and if it is you could always contact GitHub and ask if they could raise your limit. If you're still having issues, I would recommend the rest-more gem, which uses rest-core to make concurrent requests. It's really simple to set up; just read the docs.
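To make the concurrency suggestion concrete, here is a minimal sketch using plain Ruby threads around the same rels[:self] calls from the question. hydrated_users is a made-up name, and a thread per item is only reasonable for a small page of search results:

require 'octokit'

# Fetch each user's full profile in parallel instead of one at a time.
# `results` is the Sawyer::Resource returned by the users search.
def hydrated_users(results)
  threads = results.items.map do |item|
    Thread.new do
      user = item.rels[:self].get.data
      {
        location: user.location,
        username: user.login,
        name: user.name,
        email: user.email
      }
    end
  end
  threads.map(&:value) # value joins each thread and returns its result
end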

Notifications or user activity log implementation

The application records certain actions for each user in a db table called notifications. The table has the following structure:
user_id, notification_type, credit, timestamp
notification_type does not store the entire text of the notification, just a short type description.
Later, when the user wants to view their notifications, I use a helper method from my view to fetch the actual text.
# Maps a stored notification_type to its display text.
def notification_text(type)
  case type
  when 'flagPositive'
    'A question you flagged has been marked as correct.'
  when 'qAccepted'
    'A question you added has been accepted.'
  when 'qModerated'
    'You moderated a question.'
  when 'flagReport'
    'You moderated a flag.'
  end
end
1) Is this an optimal way to do this?
2) Should I replace notification_type with integer values (say 1 -> flagPositive, 2 -> qAccepted) for performance benefits?
3) Are there any best practices around this that I should be following?
1) This highly depends on your application and requirements. What I can say is that I have used this approach at times and have faced no problems so far.
2) If you see a performance problem with the string lookup, you could do so. A general recommendation is to optimize performance only when really needed.
3) Just google for "Ruby", "Rails", "ActiveRecord" and "Enum". You'll find lots of discussions about different solutions for this kind of problem. There are similar questions on this site, e.g., Enums in Ruby or In Rails, how should I implement a Status field for a Tasks app - integer or enum?
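As a concrete sketch of the enum route suggested in 3), assuming Rails 4.1+ where ActiveRecord ships an enum macro (the names and integer mapping below are invented for illustration):

class Notification < ActiveRecord::Base
  # Stored as integers in the DB, referenced as readable symbols in code.
  enum notification_type: {
    flag_positive: 0,
    q_accepted: 1,
    q_moderated: 2,
    flag_report: 3
  }
end

# Usage:
#   n = Notification.new(notification_type: :q_accepted)
#   n.q_accepted?            # => true
#   Notification.q_accepted  # scope over all accepted-question rows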

Magento SalesOrderList... is there a lightweight version of this, or a way to trim down the returned value?

I am attempting to get all the orders from a Magento instance; once a day we grab all the orders (sometimes a few thousand).
Extra background on why I ask:
I'm using Ruby on Rails to grab the orders. This involves sending the SOAP call to the Magento instance, which is easy enough.
Once I have the response, I convert it into a Hash (a tree), pick out the increment IDs of the orders, and proceed to call getOrder with each increment ID.
I have two problems with what's going on now: one operational, and one religious.
Grabbing the XML response to the list request takes really, really long, and when you tack on the work involved in converting the XML to a hash, it's a really slow process.
The religious bit is that I just want the increment_ids, so why should I pay the processing and bandwidth cost of a hugely bloated response?
Ok so the question...
Is there a way to tell Magento to include only specific fields in the response? Only the updated_at and the increment_id, for instance.
If not, is there another call I'm not aware of that can get just the increment_ids and the date?
Edit
Below is an example of what I'm looking for from Magento, but it's for eBay. I send this XML to eBay and get back a really specific bit of info about the product; it works for orders and such too. I can say "only this" and get just that. I want the same from Magento.
<GetItemRequest xmlns="urn:ebay:apis:eBLBaseComponents">
  <SKU>b123-332</SKU>
  <OutputSelector>ItemId</OutputSelector>
</GetItemRequest>
I've created a rubygem that gives you your salesOrderList response in the form of a hash, and you can do what you want with the orders after you've received them back (i.e. select the fields you want including increment_id). Just run
gem install magento_api_wrapper
To do what you want to do, you would do something like this:
api = MagentoApiWrapper::Sales.new(magento_url: "yourmagentostore.com/index.php", magento_username: "soap_api_username", magento_api_key: "userkey123")
orders = api.order_list(simple_filters: [{ key: "status", value: "complete" }])
orders.map {|o| [o.increment_id, o.items.first.sku] }
Rough guess, but you get the idea. You would get the array of hashes back and you can do what you want with them after that. Good luck!

Rails - given an array of Users - how to get an output of just emails?

I have the following:
@users = User.all
User has several fields including email.
What I would like to be able to do is get a list of all the @users' emails.
I tried:
@users.email.all, but that errors with an undefined method.
Ideas? Thanks
(by popular demand, posting as a real answer)
What I don't like about fl00r's solution is that it instantiates a new User object per record in the DB; which just doesn't scale. It's great for a table with just 10 emails in it, but once you start getting into the thousands you're going to run into problems, mostly with the memory consumption of Ruby.
One can get around this little problem by using connection.select_values on a model, and a little bit of ARel goodness:
User.connection.select_values(User.select("email").to_sql)
This will give you the straight strings of the email addresses from the database. No faffing about with User objects, and it will scale better than a straight User.select("email") query, but I wouldn't say it's the best-scaling option; there are probably better ways to do this that I'm not aware of yet.
The point is: a String object uses far less memory than a User object, so you can have more of them. It's also a quicker query that doesn't take the long way around (running the query, then mapping the values), and the map itself would take extra time too.
If you're using Rails 2.3...
Then you'll have to construct the SQL manually, I'm sorry to say.
User.connection.select_values("SELECT email FROM users")
This just provides another example of the helpers Rails 3 gives you.
I still find connection.select_values to be a valid way to go about this, but I recently found that Rails has a built-in ActiveRecord method that will do this for you: pluck.
In your example, all that you would need to do is run:
User.pluck(:email)
The select_values approach can be faster on extremely large datasets, but that's because it doesn't typecast the returned values. E.g., boolean values will be returned as they are stored in the database (as 1s and 0s) and not as true or false.
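A quick illustration of that difference, assuming a hypothetical boolean admin column and a MySQL-style database:

User.pluck(:admin)
# => [true, false, true]   (ActiveRecord typecasts the values)
User.connection.select_values(User.select(:admin).to_sql)
# => [1, 0, 1]              (raw values as stored, per the caveat above)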
The pluck method works with ARel, so you can daisy chain things:
User.order('created_at desc').limit(5).pluck(:email)
User.select(:email).map(&:email)
Just use:
User.select("email")
While I visit SO frequently, I only registered today. Unfortunately, that means I don't have enough reputation to leave comments on other people's answers.
Piggybacking on Ryan's answer above, you can extend ActiveRecord::Base to create a method that will allow you to use this throughout your code in a cleaner way.
Create a file in config/initializers (e.g., config/initializers/active_record.rb):
class ActiveRecord::Base
  def self.selected_to_array
    connection.select_values(self.scoped)
  end
end
You can then chain this method at the end of your ARel declarations:
User.select('email').selected_to_array
User.select('email').where('id > ?', 5).limit(4).selected_to_array
Use this to get an array of all the e-mails:
@users.collect { |user| user.email }
# => ["test@example.com", "test2@example.com", ...]
Or a shorthand version:
@users.collect(&:email)
You should avoid using User.all.map(&:email), as it will create a lot of ActiveRecord objects that consume large amounts of memory, a good chunk of which will not be collected by Ruby's garbage collector. It's also CPU-intensive.
If you simply want to collect only a few attributes from your database without sacrificing performance or burning memory and CPU cycles, consider using Valium.
https://github.com/ernie/valium
Here's an example for getting all the emails from all the users in your database.
User.all[:email]
Or only for users that subscribed or whatever.
User.where(:subscribed => true)[:email].each do |email|
  puts "Do something with #{email}"
end
Using User.all.map(&:email) is considered bad practice for the reasons mentioned above.

Removing duplicates from array before saving

I periodically fetch the latest tweets with a certain hashtag and save them locally. In order to prevent saving duplicates, I use the method below. Unfortunately, it does not seem to be working, so what's wrong with this code?
def remove_duplicates
  before = @tweets.size
  @tweets.delete_if { |tweet| !Tweet.all(:conditions => { :twitter_id => tweet.twitter_id }).empty? }
  duplicates = before - @tweets.size
  puts "#{duplicates} duplicates found"
end
Where @tweets is an array of Tweet objects fetched from Twitter. I'd appreciate any solution that works, especially one that might be more elegant...
You can validates_uniqueness_of :twitter_id in the Tweet model (where this code should live; see the sketch below). This will cause duplicates to fail to save.
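A minimal sketch of that validation, plus the unique index the asker mentions further down (the migration line assumes a tweets table):

class Tweet < ActiveRecord::Base
  validates_uniqueness_of :twitter_id
end

# Back the validation with a real constraint in a migration, since
# the validation alone can race under concurrent writes:
#   add_index :tweets, :twitter_id, :unique => true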
Since it sounds like you're using the Twitter search API, a better solution is to use the since_id parameter. Keep track of the last twitter status id you got from your previous query and use that as the since_id parameter on your next query.
More information is available at Twitter Search API Method: search
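Roughly, the since_id bookkeeping could look like this; treat client.search as a placeholder for whatever your Twitter library exposes, since the exact call varies by gem and version:

# Persist the highest ID seen so far, then only ask for newer tweets.
last_seen_id = Tweet.maximum(:twitter_id) || 0
new_tweets = client.search('#myhashtag', :since_id => last_seen_id)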
array.uniq!
Removes duplicate elements from self. Returns nil if no changes are made (that is, no duplicates are found).
Ok, turns out the problem was of a different nature: when looking closer into it, I found out that multiple tweets were saved with the twitter_id 2147483647... which is the upper limit for signed 32-bit integer fields :)
Changing the field to bigint solved the problem. It took me very long to figure out, because MySQL silently failed and just capped the value at that maximum for as long as it could (until I added the unique index). I quickly tried it out with Postgres, which returned a nice "integer out of range" error, and that pointed me to the real cause of the problem.
Thanks Ben for the validation and indexing tips, as they led to much cleaner code!
