Highly scalable app for photos and videos - ruby-on-rails

Background - My app is related to chefs; people will post pics and videos, and the volume of these files will be huge.
Tech stack - RoR, nginx, PostgreSQL, AWS.
My questions:
Since there will be a lot of files, what would you suggest? Should I use S3 to store the images, with a CDN in front of it?
I will be tagging the files so that they can be searched by those tags. What is the best way to store the tags?
What about load balancing?
I looked at how Facebook stores images and videos, but I'm not sure I need to do the same, because mine is a small app.
P.S. - I have already built the MVP with a basic S3 setup, but there are performance issues.
I was also wondering whether to use another database, since images and videos are unstructured data. (Correct me if I am wrong.)
I am posting this because I have an actual, working problem.

This is a broad question, and answers will vary with opinion and experience. But along the lines you are already thinking, AWS S3 is a good option for managing files, especially media. You can also search through S3 and associate meta information with files, which is what you might want to call tags. To scale further for reads, you can also set up an Elasticsearch or OpenSearch server to index the image objects stored in S3. The best approach is indeed indexing, whatever tools or platforms you use. But indexing can't always be achieved, due to project constraints including cost. In fact, you might not even want a CDN, because S3 itself is geographically distributed across wider regions and you can leverage that.
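If you do stay on S3, a minimal sketch of attaching tags and metadata at upload time with the aws-sdk-s3 gem might look like the following; the bucket name, tag keys, and file path are placeholders, not part of the question:

```ruby
# Gemfile: gem "aws-sdk-s3"
require "aws-sdk-s3"
require "securerandom"

s3 = Aws::S3::Client.new(region: "us-east-1")

# Upload a photo with S3 object tags and user-defined metadata attached.
# Tags can later be read back with get_object_tagging; the metadata
# travels with the object as x-amz-meta-* headers.
s3.put_object(
  bucket: "my-chef-app-media",                 # placeholder bucket
  key: "photos/#{SecureRandom.uuid}.jpg",
  body: File.open("pasta.jpg", "rb"),
  content_type: "image/jpeg",
  tagging: "cuisine=italian&dish=pasta",       # URL-encoded tag string
  metadata: { "uploaded-by" => "user-42" }
)
```

For search at scale you would still mirror those tags into an index (Elasticsearch/OpenSearch or a database), since S3 itself only lets you list and filter objects in fairly limited ways.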
Edit:
Yes, you index the stored images' information. You can set up an Elasticsearch server whose index is always kept up to date. If you are not keen on investing in Elasticsearch, you can set up a simple database of your choice without worrying about future issues; maybe a NoSQL store such as DynamoDB on AWS. On every store to S3, create a trigger that updates DynamoDB with the object's information, storing tags and other meta information. You can then retrieve that information and display the image from the S3 URI stored in that DynamoDB item.
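As a rough illustration of that trigger, here is a sketch of an AWS Lambda handler (Ruby runtime) subscribed to the bucket's ObjectCreated events; the table name, attribute names, and the hard-coded tags are assumptions made for the example:

```ruby
# Lambda function wired to the S3 bucket's "ObjectCreated:*" notifications.
require "aws-sdk-dynamodb"
require "time"

DDB = Aws::DynamoDB::Client.new

def handler(event:, context:)
  event["Records"].each do |record|
    bucket = record["s3"]["bucket"]["name"]
    key    = record["s3"]["object"]["key"]

    # Store the tags and other metadata alongside the S3 URI so the app
    # can look images up without touching S3 at read time.
    DDB.put_item(
      table_name: "media_index",                            # placeholder table
      item: {
        "media_key"   => key,
        "s3_uri"      => "s3://#{bucket}/#{key}",
        "tags"        => ["cuisine:italian", "dish:pasta"], # example tags
        "uploaded_at" => Time.now.utc.iso8601
      }
    )
  end
end
```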

Related

Best practice for storing and updating images in Xcode project

I am building a flash card iOS app with SwiftUI for reviewing the Japanese I am learning.
The problem is how to store and update my images (>500 images). Please help me; any suggestion is appreciated. Thanks for reading my post.
I think you're asking about how to manage 500+ images in an Xcode project. You could just add all the images to your project and load them as you would any image. You could use asset catalogs, which have the advantage that they let you store different versions of a resource for use on different devices, and only the ones needed for the device the app runs on will actually be installed on the device. See How Many Images Can/Should you Store in xcassets in Xcode? and Asset Catalog Format Reference for more information about asset catalogs. But any way you slice it, managing 500+ images is going to be cumbersome. There's probably a better way...
Managing all those images in your app isn't just a problem for you as the developer; building them into the app will also create problems for the user. Even if each image is relatively small, having hundreds of them in the app will probably make the app huge. That means it'll take a long time to install, and the app will use a lot of storage on the device. Every time you release a new version of the app, with more words, or even just to fix a few small bugs, the user will have to download all that data all over again.
Instead, you should consider building an app that can fetch the data it needs from a server. Ideally, you could apply that approach to all your app's data, not just the images. Maybe you'll organize your flash cards into sets of a few dozen, so that you can fetch a set of cards and the associated images pretty quickly, and sets that the user hasn't used for a while can be removed to free up space on the device. You'll be able to update a set of flash cards without having to update the app, and when you do update the app your users won't have to download all the data all over again.
You've said that you're a beginner, so this approach might seem very difficult. That's OK, you can start with a simpler approach and then improve as you go along. For example, you might just put all the images on a server and fetch them one at a time as you need them. Your flash card data file could contain just a dictionary with words and the URLs associated with those words. There are lots of examples of loading an image from a URL here on SO and elsewhere, so I'm not going to provide code for that, but it won't be hard to find. The earlier you start thinking about how to design your app so that it can scale as you add more and more words, the easier it will be to maintain the app later.
500 images can add up to a huge size. Applications published on the App Store have a size limit, and Apple does not recommend making big apps.
Store them on a server and load the needed images on the fly. You will also gain the ability to update your images, remove them, and add new ones.
If you don't have a backend, you can use something easy and free (Firebase Storage, for example) or something on AWS with minimal code.
If you need to keep them on the device, store them as files in Documents or another app folder; do not use Core Data for this (keep only the list of names/URLs in the database).
After loading the image to be displayed to the user, you can prefetch the next batch of images.
Use Alamofire or SDWebImage to load images from the network (I prefer the latter). These frameworks can do many useful things with images.
To load images:
you can have a list of your images (just a list of the names and URLs)
or
you can know only the path and name pattern and generate links dynamically (like https://myserver/imageXXX.png).

Firebase & Swift: How to use another database for storing larger files?

I am currently trying to build a chat application in Swift whilst using Firebase for real-time messaging. My only issue is that I want users to send images and to have profiles with images, but I know Firebase has limited storage (or at least the storage per pay tier is low for the number of connections you get).
So I would like to know how to connect another database and make calls between the two when needed. So when an image is sent in a message, rather than Firebase storing the image, it stores a URL to the image in the other database.
I am under the impression something like AWS S3 is my best bet. Any help is appreciated!
This question has been asked before and there are a number of solutions. It's kind of an 'opinion' type question but here are a few options.
View and store images in Firebase
Firebase has a 10 MB capacity, which is adequate for many images. However, if you need larger files, they can easily be encoded as Base64 and split into chunks.
If you want to go external:
S3 or Filepicker (Filestack), as well as Google, provide some options.
Not sure of the overall requirements, but obviously you can dig into CloudKit / Core Data, and even Dropbox offers an API.
I have zero experience with Box but it may be an option as well.
Each option has its own API.
In general, you would store a link in a Firebase node to the image/object in question. However, the mechanics of that vary wildly, as interfacing with CloudKit/Core Data will be different from, say, Filepicker.
With Core Data you will have to roll your own reference scheme, whereas with Filepicker you can have an almost direct reference to the file.
Many of these services provide free or low-cost trials, and you can whip up some code in a matter of minutes to test the functionality and see if it meets your requirements.
If you need help encoding/decoding, see the answer to this question
Swift2 retrieving images from Firebase
Once you get rolling, if you have issues, post some code in another question.

Need assistance choosing an image management gem

I am interested in building a Rails based system for handling the display and organization of large amounts of photos. This is sort of like Flickr but smaller. Each photo will have metadata associated with it. Photos will be shown in a selectable list and grid view. It would be nice to be able to load images as they are needed as well (as this would probably speed things up).
At the moment I have a test version of my database working, with images loading from the assets/images directory, but it is beginning to run slowly when displaying several hundred images (200-600). This is due to the way I have my view set up. I am using a straight loop to display the images in both list and grid layouts.
I also manually resized the thumbnails and a medium-sized image from the full-sized source image. I am investigating other resizing methods. Any advice is appreciated here as well.
As I am new to handling the images this way, could someone point me in a direction based on experience designing and implementing something like Flickr?
I am investigating the following tools:
Paperclip
http://railscasts.com/episodes/134-paperclip
Requirements: ImageMagick
attachment_fu
http://clarkware.com/blog/2007/02/24/file-upload-fu#FileUploadFu
Requirement: one of the following: ImageScience, RMagick, MiniMagick, ImageMagick?
CarrierWave
http://cloudinary.com/blog/ruby_on_rails_image_uploads_with_carrierwave_and_cloudinary
http://cloudinary.com/blog/advanced_image_transformations_in_the_cloud_with_carrierwave_cloudinary
I'd go with CarrierWave any day. It is very flexible and has a lot of useful strategies. It generates its own Uploader class and has nifty, self-explanatory features such as automatic generation of thumbnails (as you specify), extension blacklisting, image formatting, size constraints, etc., which you can put to use.
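For a concrete feel of that uploader class, here is a minimal CarrierWave sketch; the version names, sizes, and allowed extensions are just examples, and the exact method names vary slightly between CarrierWave versions (older releases use extension_whitelist):

```ruby
# app/uploaders/photo_uploader.rb -- a minimal CarrierWave uploader sketch
class PhotoUploader < CarrierWave::Uploader::Base
  include CarrierWave::MiniMagick   # requires the mini_magick gem

  storage :file                     # or :fog for S3-backed storage

  # Automatically generated versions alongside the original upload.
  version :thumb do
    process resize_to_fill: [100, 100]
  end

  version :medium do
    process resize_to_limit: [600, 600]
  end

  # Only allow common image extensions.
  def extension_allowlist
    %w[jpg jpeg gif png]
  end
end

# In the model:
# class Photo < ActiveRecord::Base
#   mount_uploader :image, PhotoUploader
# end
```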
This Railscast by Ryan Bates - http://railscasts.com/episodes/253-carrierwave-file-uploads is very useful, if you haven't seen it already.
Paperclip and CarrierWave are totally appropriate tools for the job, and the one you choose is going to be a matter of personal preference. They both have tons of users and active, ongoing development. The difference is whether you'd prefer to define your file upload rules in a separate class (CarrierWave), or if you'd rather define them inline in your model (Paperclip).
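To make that difference concrete, the same kind of rules expressed inline with Paperclip would look roughly like this (the attachment and style names are illustrative):

```ruby
# Paperclip keeps the upload rules inline in the model itself.
class Photo < ActiveRecord::Base
  has_attached_file :image,
                    styles: { thumb: "100x100#", medium: "600x600>" }

  # Paperclip requires an explicit content-type validation.
  validates_attachment_content_type :image, content_type: %r{\Aimage/.*\z}
end
```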
I prefer CarrierWave, but based on usage it's clear plenty of people feel otherwise.
Note that neither gem is going to do anything for your slow view with 200-600 images. These gems are just for handling image uploads, and don't help you with anything beyond that.
Note also that Rails is really pretty bad at handling file uploads and downloads, and you should avoid this where possible by letting other services (a CDN, your web server, S3, etc.) handle these for you. The central gotcha is that if you handle a file transfer with Rails, your entire web application process is busy for the duration of the transfer. (For related discussion on this topic, see: Best Ruby on Rails Architecture for Image Heavy App).
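One common way to apply that advice is to redirect the client to a short-lived presigned S3 URL instead of streaming the file through Rails; in this sketch the bucket name and the photo.s3_key column are assumptions:

```ruby
# app/controllers/photos_controller.rb
require "aws-sdk-s3"

class PhotosController < ApplicationController
  def download
    photo = Photo.find(params[:id])

    object = Aws::S3::Object.new(
      bucket_name: "my-image-bucket",   # placeholder bucket
      key: photo.s3_key                 # hypothetical column holding the S3 key
    )

    # The browser downloads straight from S3, so no Rails worker is tied up
    # for the duration of the transfer. (allow_other_host is needed on Rails 7+.)
    redirect_to object.presigned_url(:get, expires_in: 300),
                allow_other_host: true
  end
end
```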

Rails 3 - downloading large numbers of pictures from the web

I am working on the problem of downloading a large number of pictures from the web. I have an XML file with links to images on various websites, and I have to download these images. Each image is roughly 3 MB, and there are tens of thousands of them, so it is not possible to store all of these images on my hosting (100,000 x 3 MB)...
I need to display these images on my site. I haven't worked with such large amounts of data before, so I would like to ask what the best approach would be for displaying these images on "my" page.
My first ideas:
- store only the links in my database table, and then display the images with just image_tag URL_OF_IMAGE
- some way of caching the images/links to the images (I don't know specifically)
Can you help me, please? What do you think would be the fastest way to display images from foreign sources?
Thank you in advance,
M.
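For reference, the first idea above (store only the links and render them with image_tag) would look roughly like this; the model and column names are placeholders:

```ruby
# Model sketch: keep only the remote URL, never the file itself.
class Picture < ActiveRecord::Base
  validates :remote_url, presence: true,
                         format: { with: %r{\Ahttps?://}i }
end

# In the view, the browser fetches the image directly from the foreign host:
# <%= image_tag picture.remote_url %>
```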
Amazon S3, but as ezkl says, make sure you have licensing rights to the images or you'll land yourself in trouble (and in any case it is wrong to use the content without permission).

How should I go about providing image previews of sites while using Google's Web Search API?

I'm using Google's Custom Search API to dynamically provide web search results. I searched the API's docs very thoroughly and could not find anything that states it grants you access to Google's site image previews, which happen to be stored as Base64-encoded data.
I want to be able to provide image previews for sites for each of the urls that the Google web search API returns. Keep in mind that I do not want these images to be thumbnails, but rather large images. My question is what is the best way to go about doing this, in terms of both efficiency and cost, in both the short and long term.
One option would be to crawl the web and generate and store the images myself. However, this is way beyond my technical ability, and storing all of these images would be too expensive.
The other option would be to dynamically fetch the images right after Google's API returns the search results. However, where and how I fetch the images is another question.
Would there be a low cost way of me generating the images myself? Or would the best solution be to use some sort of site thumbnailing service that does this for me? Would this be fast enough? Would it be too expensive? Would the service provide the image in the correct size for me? If not, how could I change the size of the image?
I'd really appreciate comprehensive answers, and for any code examples to be in Ruby using Rails.
So, as you pointed out in your question, there are two approaches that I can see to your issue:
1. Use an external service to render and host the images.
2. Render and host the images yourself.
I'm no expert in this field, but my Googling has so far only returned services that allow you to generate thumbnails and not full-size screenshots (like the few mentioned here). If there are hosted services out there that will do this for you, I wasn't able to find them easily.
So, that leaves #2. For this, my first instinct was to look for a Ruby library that could generate an image from a webpage, which quickly led me to IMGKit (there may be others, but this one looked clean and simple). With this library, you can easily pass in a URL and it will use the WebKit engine to generate a screenshot of the page for you.
From there, I would save it to wherever your assets are stored (like Amazon S3) using a file attachment gem like Paperclip or CarrierWave (railscast). Store your attachment with a field recording the original URL you passed to IMGKit from WSAPI (Web Search API) so that you can compare against it on subsequent searches and use the cached version instead of re-rendering the preview. You can also use the created_at field for your attachment model to throw in some "if older than x days, refresh the image" type logic.
Lastly, I'd put this all in a background job using something like resque (railscast) so that the user isn't blocked while waiting for screenshots to render. Pass the array of URLs returned from WSAPI to background workers in resque, which will generate the images via IMGKit and save them to S3 via Paperclip/CarrierWave. All of these projects are well-documented, and the Railscasts will walk you through the basics of the resque and carrierwave gems.
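A rough sketch of how those pieces could fit together is below; the ScreenshotPreview model, its columns, and the seven-day refresh window are all assumptions, and the image attribute is assumed to be a CarrierWave or Paperclip attachment backed by S3:

```ruby
# app/workers/render_preview_job.rb -- resque worker sketch
require "imgkit"
require "tempfile"

class RenderPreviewJob
  @queue = :screenshots

  def self.perform(url)
    # Reuse a cached preview if it is recent enough.
    existing = ScreenshotPreview.find_by(source_url: url)
    return if existing && existing.created_at > 7.days.ago

    # Render the page to a PNG via IMGKit (wkhtmltoimage under the hood).
    png = IMGKit.new(url, quality: 60, width: 1024).to_img(:png)

    file = Tempfile.new(["preview", ".png"])
    file.binmode
    file.write(png)
    file.rewind

    # `image` is the mounted CarrierWave/Paperclip attachment.
    preview = existing || ScreenshotPreview.new(source_url: url)
    preview.image = file
    preview.save!
  ensure
    file&.close!
  end
end

# Enqueue one job per URL returned by the search API:
# urls.each { |u| Resque.enqueue(RenderPreviewJob, u) }
```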
I haven't crunched the numbers, but you can weigh the cost of hosting the images yourself on S3 against any other external provider of web thumbnail generation. Of course, doing it yourself gives you full control over how the image looks (quality, format, etc.), whereas most of the services I've come across only offer a small thumbnail, so there's something to be said for that. If you don't cache the images from previous searches, then your storage costs drop even further, since you'll always be rendering the images on the fly. However, I suspect that this won't scale very well, as you may end up paying a lot more for server power (for IMGKit and image processing) and bandwidth (for external requests to fetch the source HTML for IMGKit). I'd be sure to include some metrics in your project to attach exact numbers to the kinds of requests you're dealing with, to help determine what the ongoing costs would be.
Anywho, that would be my high-level approach. I hope it helps some.
Screenshotting web pages reliably is extremely hard to pull off. The main problem is that the current solutions (khtml2png, CutyCapt, PhantomJS, etc.) are all based around Qt, which provides access to an embedded WebKit library. However, that WebKit build is quite old, and with HTML5 and CSS3 most of the effects either don't show or render incorrectly.
One of my colleagues has used most, if not all, of the current technologies for generating screenshots of web pages for one of his personal projects. He has written an informative post here about how he now uses a SaaS solution instead of trying to maintain one himself.
The TL;DR version: he now uses URL2PNG to do all his thumbnail and full-size screenshots. It isn't free, but he says that it does the job for him. If you don't want to use them, they have a list of their competitors here.
