How can I store images in my heroku postgres DB - ruby-on-rails

I've found some similar questions, and good link on the topic. Such as this one (Easiest way to store images in database on Heroku?), where the top answer is to re-think using their DB and instead go with S3.
What I need is store 120 100KB images. That's it. There is no dynamic aspect. It's not like I have 10,000 users, and each one needs to store their profile picture. None of that. I just need to store 120 100KB images. The amount of images will never never change, neither growing or shrinking. 3 years from now, the same 120 images will be all that there is.
For these reasons, I want to store these in my DB. S3 is overkill, could cost some, extra time and cost needed implementing that solution, etc. In my DB, it's a measly 12MB which is a fraction of a fraction of the DB's total size.
How can I store the images? What datatype should I use and how can I upload them into my DB?

Just store them as bytea fields in a table. Insert them using a script in whatever your preferred language is and its database adapter. e.g. using Python with psycopg2 you'd open('filename','rb') the file, then pass it as a query parameter to the execute method when doing your INSERT.
For images that small there's no point using pg_largeobject and the lo wrapper, which can be useful for really big files. The only real advantage of that is that you can use lo_import to read the files directly into the database.

Related

How to manage millions of tiny HTML files

I have taken over a project which is a Ruby on Rails app that basically crawls the internet (in cronjobs). It crawls selected sites to build historical statistics about their subject over time.
Currently, all the raw HTML it finds is kept for future use. The HTML is saved into a Postgres database table and is currently at about 1,2 million records taking up 16 gigabytes and growing.
The application currently reads the HTML once to extract the (currently) relevant data and then flags the row as processed. However, in the future, it may be relevant to reprocess ALL files in one job to produce new information.
Since the tasks in my pipeline includes setting up backups, I don't really feel like doing daily backups of this growing amount of data, knowing that 99,9% of the backup is just static records with a practically immutable column with raw HTML.
I want this HTML out of the PostgreSQL database and into some storage. The storage must be easily manageable (inserts, reads, backups) and relatively fast. I (currently) have no requirement to search/query the content, only to resolve it by a key.
The ideal solution should allow me to insert and get data very fast, easy to backup, perhaps incrementally, perhaps support batch reads and writes for performance.
Currently, I am testing a solution where I push all these HTML files (about 15-20 KB each) to Rackspace Cloudfiles, but it takes forever and the access time is somewhat slow and relatively inconsistent (between 100 ms and several seconds). I estimate that accessing all files sequentially from rackspace could take weeks. That's not an ideal foundation for devising new statistics.
I don't believe Rackspace's CDN is the optimal solution and would like to get some ideas on how to tackle this problem in a graceful way.
I know this question has no code and is more a question about architecture and database/storage solutions, so it may be treading on the edges of SO's scope. If there's a more fitting community for the question, I'll gladly move it.

How many records i can store in coredata

First time I make a network hit to (sql)sever to get the table data having 14 fields with image blob mostly. It has above 2 hundred thousand of records in table.
Can we store the 2 hundred thousand records in local database of device using core data.
Best way to place images in local file / DB. (or) we can using remote images loading.
Should work offline
Please suggest the best way of possible to fill this above requirement.
Storing 200k records in core data is not a problem in itself as long as you do the initial importing of those records correctly. Make sure you implement you update-or-insert properly otherwise your users will have to wait proportional to N^2. Apple suggests a nice implementation for this: https://developer.apple.com/library/mac/documentation/cocoa/conceptual/coredata/articles/cdimporting.html
Then once you have the local data, you probably need to fine tune the batch size of your fetch requests, but that's a good idea to do, even if you don't have 200k records.
As for the images, never ever store them in Core Data as binary blobs. Always store them as normal files on disk and store their path in Core Data to access them later on.

Loading images from AppBundle vs. CoreData

I'm making a catalog where the cells in my collection view will be either an image with a label or a pdf. There will be many collections and they themselves will be static. I want the user to be able to save the cells he likes and view them in his own custom view.
1) I could to store the image as data in Core Data.
2) I could just include the image in my App Bundle and load the image from there every time my app starts.
I've got it into to my head that reading data from a Core Data Store would give me more options when building my app as well as offer some boost in performance as opposed to reading it from the app bundle. Is that true? Keeping in mind of course that most of the data is static.
It seems inefficient to have images both serialized images in my app bundle and the pure data as well.
I think I'd rather have it all in the store but they have to be loaded from the bundle at some point in code right?
I'd love to know how other developers do it.
Now in Core Data there is an "allows external storage" option for binary data, which basically means if your file is bigger than 1 MB it will be stored automatically outside of your database, and you have to do nothing differently. In my opinion that's the way to get the best of both worlds, increased performance + automatization + fast queries (although they are slower than usual when you allow external storage, but still faster than doing it yourself)

serve my text from the filesystem instead of a database?

I am working on a content management application in which the data being stored on the database is extremely generic. In this particular instance a container has many resources and those resources map to some kind of digital asset, whether that be a picture, a movie, an uploaded file or even plain text.
I have been arguing with a colleague for a week now because in addition to storing the pictures, etc - they would like to store the text assets on the file system and have the application look up the file location(from the database) and read in the text file(from the file system) before serving to the client application.
Common sense seemed to scream at me that this was ridiculous and if we are bothering to look up something from the database, we might as well store the text in a database column and have it served along up with the row lookup. Database lookup + File IO seemed sounds uncontrollably slower then just Database Lookup. After going back and forth for some time, I decided to run some benchmarks and found the results a little surprising. There seems to be very little consistency when it comes to benchmark times. The only clear winner in the benchmarks was pulling a large dataset from the database and iterating over the results to display the text asset, however pulling objects one at a time from the database and displaying their text content seems to be neck and neck.
Now I know the limitations of running benchmarks, and I am not sure I am even running the correct idea of "tests" (for example, File system writes are ridiculously faster then database writes, didn't know that!). I guess my question is for confirmation. Is File I/O comparable to database text storage/lookup? Am I missing a part of the argument here? Thanks ahead of time for your opinions/advice!
A quick work about what I am using:
This is a Ruby on Rails application,
using Ruby 1.8.6 and Sqlite3. I plan
on moving the same codebase to MySQL
tomorrow and see if the benchmarks are
the same.
The major advantage you'll get from not using the filesystem is that the database will manage concurrent access properly.
Let's say 2 processes need to modify the same text as the same time, synchronisation with the filesystem may lead to race conditions, whereas you will have no problem at all with everyhing in database.
I think your benchmark results will depend on how you store the text data in your database.
If you store it as LOB then behind the scenes it is stored in an ordinary file.
With any kind of LOB you pay the Database lookup + File IO anyway.
VARCHAR is stored in the tablespace
Ordinary text data types (VARCHAR et al) are very limited in size in typical relational database systems. Something like 2000 or 4000 (Oracle) sometimes 8000 or even 65536 characters. Some databases support long text
but these have serious drawbacks and are not recommended.
LOBs are references to file system objects
If your text is larger you have to use a LOB data type (e.g. CLOB in Oracle).
LOBs usually work like this:
The database stores only a reference to a file system object.
The file system object contains the data (e.g. the text data).
This is very similar to what your colleague proposes except the DBMS lifts the heavy work of
managing references and files.
The bottom line is:
If you can store your text in a VARCHAR then go for it.
If you can't you have two options: Use a LOB or store the data in a file referenced from the database. Both are technically similar and slower than using VARCHAR.
I did this before. Its a mess, you need to keep the filesystem and the database synchronized all the time, so that makes the programming more complicated, as you would guess.
My advice is either go for an all filesystem solution, or all database solution, depending on the data. Notably, if you require lots of searches, conditional data retrieval, then go for database, otherwise fs.
Note that database may not be optimized for storage of large binary files. Still, remember, if you use both, youre gonna have to keep them synchronized, and it doesnt make for an elegant nor enjoyble (to program) solution.
Good luck!
At least, if your problems come from the "performance side", you could use a "no SQL" storage solution like Redis (via Ohm, for example), or CouchDB...

What is the best way to upload user portrait

Stored in the database or file system ?
And I need several different sizes. like 128*128, 96*96, 64*64 and so on.
What is the best way to upload user portrait?
Not knowing all of your constraints, I'd upload the 128x128 picture and then create all the other portraits on the fly.
I don't think you need to worry about storing the images in the DB, specially if you're running SQLServer 2008 (and you use the new FILESTREAM type).
Definitively it depends of the amount of images you need to store.
If you store them in the file system you just need to keep the URL or location of the image the database. To change the size you can do it in real time using components to achieve the change depending of the language you're using. For .NET ASPjpeg is a very good one, but you can manipulate the image with the System.Drawing.Imaging class. No need to manipulate database filestream or BLOB fields.
In the other hand, storing images in the database can make it too big in order to backup or download depending of the amount of records, even if you are working with SQL server, but you have everything in the same place. Maintenance is faster.
The problem with storing images in the file system is the cleaning procedure, if you delete a record, you need to delete the image in the file system too in order to prevent of keeping garbage.

Resources