Where to store created PDF files on ASP .NET MVC site [duplicate] - asp.net-mvc

Possible Duplicate:
storing uploaded photos and documents - filesystem vs database blob
I am starting to develop a web app, the primary purpose of which is to display photos. The users will be able to upload photos as well.
The first question that came up was where to store the photos: on the file system or the database.
I will be using a Windows box to host the site. The database is MySQL and the backend code is in C# utilizing ASP.NET MVC.

Filesystem, of course, unless you're aiming for a story on thedailywtf. The easiest way is to have the photos organized by a property you can derive from the file itself, such as its SHA-1 hash. Then just store the hash in the database, attached to the photo's primary key and other attributes (who uploaded it, upload date, etc).
It's also a good idea to divvy up the photos on the filesystem so you don't end up with millions of files in a single directory. You'll end up with something like this:
storage/00/e4/f56c0de1c61fdb926e79e8a0a65bd12930c9.jpg
storage/25/9a/ec1c55bfb660548a6770238668c4b117d92f.jpg
storage/5d/d5/4b01d98f17a9ad9dd1526b49ba39b5aa37a1.jpg
storage/63/49/6f740b6c284ce6685dc17d473a7360ace249.jpg
storage/b1/75/066d178188dde110149a8422ab651b0ee615.jpg
storage/b1/20/a2b7d02b7b0c43530677ab06235382a37e20.jpg
storage/da/39/a3ee5e6b4b0d3255bfef95601890afd80709.jpg
This is also easy to port if you ever move to sharded storage.
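A minimal C# sketch of that scheme (the storage root and method name are just illustrative):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    static string StorePhoto(string storageRoot, byte[] photoBytes, string extension)
    {
        // Name the file after its own SHA-1 hash (40 hex characters).
        string hash;
        using (var sha1 = SHA1.Create())
            hash = BitConverter.ToString(sha1.ComputeHash(photoBytes))
                               .Replace("-", "").ToLowerInvariant();

        // Two levels of two-character directories keep any single folder small.
        string dir = Path.Combine(storageRoot, hash.Substring(0, 2), hash.Substring(2, 2));
        Directory.CreateDirectory(dir);

        File.WriteAllBytes(Path.Combine(dir, hash.Substring(4) + extension), photoBytes);
        return hash; // store this in the database alongside uploader, date, etc.
    }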

If you are using SQL Server 2008, there's the FILESTREAM attribute for varbinary(max) columns, which handles most of the problems mentioned about the DB getting larger. It takes care of all the annoying details of synchronizing between the filesystem and the table.
Look here for a blog post about the topic: Store any data in SQL Server 2008 (Katmai)
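For what it's worth, a rough C# sketch of reading such a column back with SqlFileStream; the Photos table, Data column, and Id parameter here are assumptions for illustration:

    using System.Data.SqlClient;
    using System.Data.SqlTypes;
    using System.IO;

    static byte[] ReadPhoto(string connStr, int photoId)
    {
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            // FILESTREAM access has to happen inside a transaction.
            using (var tx = conn.BeginTransaction())
            {
                var cmd = new SqlCommand(
                    "SELECT Data.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " +
                    "FROM Photos WHERE Id = @id", conn, tx);
                cmd.Parameters.AddWithValue("@id", photoId);

                string path;
                byte[] txContext;
                using (var reader = cmd.ExecuteReader())
                {
                    reader.Read();
                    path = reader.GetString(0);
                    txContext = (byte[])reader[1];
                }

                using (var fs = new SqlFileStream(path, txContext, FileAccess.Read))
                using (var ms = new MemoryStream())
                {
                    fs.CopyTo(ms); // SQL Server hands back an NTFS stream, not a TDS blob
                    tx.Commit();
                    return ms.ToArray();
                }
            }
        }
    }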

If you're building a website around photos, then forget the database. If it becomes popular, your database is going to be hit hard, and the majority of its time will be spent delivering photos. Databases also don't scale very well for this kind of load. There are many more advantages to keeping photos on the file system, and you can scale very well by having static content servers and using services for content delivery.
Amazon S3 and other cloud providers also have their advantages. For instance, S3 + Amazon CloudFront will provide good performance: CloudFront caches your files on servers around the world, so they'll be very fast to access from anywhere. BUT if we're talking pictures and the site becomes popular, your bills might be quite high.
For S3, Amazon charges for storage and for transfer in/out of the cloud.
For CloudFront, per transfer.
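As a rough illustration, uploading a photo to S3 with the AWS SDK for .NET looks something like this (the bucket name and key scheme are made up; the hash reuses the SHA-1 naming idea from the first answer):

    using Amazon.S3;
    using Amazon.S3.Model;

    string hash = "da39a3ee5e6b4b0d3255bfef95601890afd80709"; // from your DB row
    string localPath = @"C:\uploads\incoming.jpg";

    var s3 = new AmazonS3Client();          // credentials come from config/environment
    s3.PutObject(new PutObjectRequest
    {
        BucketName = "my-photo-bucket",     // hypothetical bucket
        Key = "photos/" + hash + ".jpg",
        FilePath = localPath,
        CannedACL = S3CannedACL.PublicRead  // so CloudFront/users can fetch it directly
    });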

Generally, people store binary data such as images on the filesystem, not the database. They reference the filesystem path from the database. Retrieving BLOBs (binary large objects) from the database is slower than allowing the web server to serve static files from the file system.

I would use something like Amazon S3.
But if the choice is between filesystem and database, I would choose the filesystem, because it's faster to serve images from a filesystem than from a database.
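For instance, a bare-bones ASP.NET MVC action that streams a photo straight off disk might look like this (route and naming are hypothetical, and you'd want to validate the hash before trusting it in a path):

    using System.Web.Mvc;

    public class PhotosController : Controller
    {
        public ActionResult Show(string hash)
        {
            // Rebuild the sharded path described above; validate 'hash' first
            // (e.g. 40 lowercase hex chars) so it can't escape the storage folder.
            string path = Server.MapPath(
                string.Format("~/storage/{0}/{1}/{2}.jpg",
                    hash.Substring(0, 2), hash.Substring(2, 2), hash.Substring(4)));
            return File(path, "image/jpeg"); // FilePathResult: streamed from disk
        }
    }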

The only reason I would put photos as BLOBs in a database would be if I had a cluster of servers, and I was using database replication to automatically copy the photos to every machine in the cluster.
Life is much simpler if you just store the photos as files, and store the filenames of the photos in the database. If you need to create unique filenames for the photos, you can use a primary key integer from the database as part of the filename. But you could also just use a hash of the photo itself, as suggested by John Milliken. That's simple, and simple is better.

Some people point out that it's easier to manage if everything's in the database: including making backups, and preserving referential integrity.

If you store it in the db, the db will grow quickly and become much, much larger. It is just a touch more complicated to get an image out of the db for display than it is to get it from the file system. On the other hand, you'd better make sure that the file names and paths don't get out of sync with what is stored in the db. In the past I chose to store on disk instead of in the db. It made it easier for me to move the database to different boxes. Worked out well.

We had a similar decision to make for a project I am on. The compelling thing about jamming stuff (images and other BLOBby things) into the DB is that it is less likely that someone will delete/alter something (either intentionally or unintentionally). But that isn't the choice we made. Instead, we store the path info in the DB and use it to reference the data via a UNC path. Data paths are stored in two parts: a part that points to which machine that group of data is on, and a part that locates the data relative to that machine. When we need to move data around, we just update the appropriate path info.
It is certainly quick to get at the data without pulling it out of the DB. Ultimately that was a major deciding factor.
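A tiny sketch of how that two-part scheme might resolve to a full path (column names invented for illustration):

    using System.IO;

    // MachinePart might be @"\\fileserver01\photos" and RelativePart @"2009\03\1234.jpg",
    // both read from the DB row. Moving data to another box only means updating MachinePart.
    static string ResolveDataPath(string machinePart, string relativePart)
    {
        return Path.Combine(machinePart, relativePart);
    }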

It makes life so easy when you have a blob database. You can forget about the nightmare that is file system management.
EDIT: the table can be as simple as an ID key column plus a VARBINARY column holding the file bytes.
From experience this is an efficient way to manage binary files. You have one database that holds only binary files. How can that be any harder to back up?
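A minimal sketch of that approach, assuming a table like Files(ID INT IDENTITY PRIMARY KEY, Data VARBINARY(MAX)):

    using System.Data;
    using System.Data.SqlClient;

    static int SaveFile(string connStr, byte[] fileBytes)
    {
        using (var conn = new SqlConnection(connStr))
        {
            conn.Open();
            var cmd = new SqlCommand(
                "INSERT INTO Files (Data) OUTPUT INSERTED.ID VALUES (@data)", conn);
            cmd.Parameters.Add("@data", SqlDbType.VarBinary, -1).Value = fileBytes; // -1 = MAX
            return (int)cmd.ExecuteScalar(); // the new row ID is the file's handle
        }
    }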

Related

Best technique for saving and syncing binary data offline in iOS?

I am working on an app that collects user data including photos. It's mandated that this app should work in offline mode - meaning that the user can complete surveys and take photos without an internet connection and that data should sync back to a remote database. How is this generally handled? Do I create a local database with Core Data and write an additional layer to manage saving/reading from a server? Are there any frameworks that help facilitate that syncing?
I have also been looking into backend services such as Firebase that include iOS SDKs that appear to handle a lot of the heavy lifting of offline support, but it does not appear to support offline syncing of image files through the Firebase Storage SDK.
Can anyone recommend the least painful way to handle this?
Couchbase Mobile / Couchbase Lite is probably the best solution I've come across so far.
It allows offline data storage including binary data, and online syncing with a CouchDB compatible server. It works best with their Couchbase Server / Sync Gateway combination, but if you don't need to use filtered replication or 'channels' (e.g. for syncing data specific to a single user with a shared database), you can use Cloudant which saves you having to set up your own server.
It's also available across most platforms.
Generally, for images it is best to use NSFileManager and save your images in either the documents directory or the caches directory, depending on the types of images you are storing. Core Data and Firebase are databases better suited to structured data than to images, although they do support arbitrary data storage.
You can also try SDWebImage which has a lot of features around loading and storing images.

Getting a large amount of data from standalone Neo4j db into an embedded Neo4j db?

We have a significant, though not huge, amount of data loaded into a stand-alone Neo4j instance via the browser, and we need to get that data into the embedded instance in our app. I tried dumping the data to Cypher (a 600 KB file) and uploading it to our app to execute, but that gets stack overflow errors.
I'm hoping to find an efficient way of doing this with the Java API so that we can do it again on other developers' machines. This is test data for development but it was entered, with significant effort, and we'd rather not redo all that.
Here's a silly question. Can we just copy the data files from the stand-alone db to the embedded?
Just like @Dave_Bennett mentions in his comment, you can copy over the graph.db folder. Embedded and standalone use exactly the same binary format.
If you want to copy the data over programmatically, I suggest going with the batch inserter API; see http://neo4j.com/docs/stable/batchinsert.html.
There's also a great tool for copying datastores at https://github.com/jexp/store-utils, which might give some hints as well.

Download SQLite database from Django web app and use in iOS?

I'm developing a site-specific installation for an office lobby which will display content on 6 iPads. The installation has several megabytes of data which will be managed by a Django web app. I'm considering different strategies for fetching the content data from the web app. So far, I have simply been dumping the data into XML format and fetching it via a single HTTP request from the iPad to the content server. I then load all of the content into memory on the iPad.
I'm beginning to have some concern that I may run into memory issues as the amount of content grows, and that storing the entire database in memory won't work. The natural next step is to think about a database on the iPads. I'm using SQLite for the content server. It seems to me that it may be feasible to simply download the entire database file itself and query it directly from the iPad.
Proposed Approach
Download the actual SQLite database file nightly from the Django content server to each of the six iPads used in an office lobby installation.
Things I like about this approach:
It could be really simple. It removes the whole web services layer from the system.
It protects against network problems nicely. If the network is unavailable, the worst problem is that the iPads display stale data, as opposed to there being no content at all if the system were network-dependent.
Things I don't like about this approach
I'm not sure how to safely download the file. How do I ensure that the file I'm downloading is in a valid state, and that I'm not downloading while someone is updating it?
I've never heard of anybody doing this, or even considered doing it. It seems like it's far from tried and true.
My questions
Can anyone think of reasons why this is a bad idea?
How can I safely download an SQLite file with confidence that it's in a valid state?
Why don't you create a syncing system, perhaps with JSON?
I've done something like this before: I had a central repository server on site that was running my Django web application. The different iPads would sync regularly with the web app's database, making sure their local data matched the server data, and updating via JSON where it didn't.
On the iPad itself, I was using PhoneGap's SQLite support, which worked perfectly for storing the client-side data. But the key was syncing this database via JSON with the central repository's database, rather than physically moving the SQLite db over to the iPad.

How Reduce Disk read write overhead?

I have a website composed mainly of JavaScript, hosted on IIS.
The website requests images from a particular folder on the hard disk and displays them to the end user.
The image requests are very frequent and come in rapid succession.
Is there any way to reduce the overhead of these disk read operations?
I've heard about memory mapping, where a portion of the hard disk can be mapped and used as if it were primary memory.
Can somebody tell me if I am wrong or right? If I am right, what are the steps to do this?
If I am wrong, is there any other solution?
While memory mapping is a viable idea, I would suggest using memcached. It runs as a distinct process, permits horizontal scaling, and is tried and tested, in active deployment at some of the most demanding websites. A well-implemented memcached server can reduce disk access significantly.
It also has bindings for many languages, all of which talk to the server over the network. I assume you want a solution for Java (most of your tags relate to that language). Read this article for installation and other admin tasks. It also has code samples (for Java) that you can start off with.
In code terms, what you need to do when you receive a request for, let's say, Moon.jpeg is roughly this (sketched with the spymemcached Java client; the key scheme, folder, and expiry are up to you):

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import net.spy.memcached.MemcachedClient;

    MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));

    byte[] serveImage(String name) throws IOException {
        byte[] data = (byte[]) cache.get(name);                // key could also be an MD5 hash
        if (data != null) return data;                         // cache hit: no disk access
        data = Files.readAllBytes(Paths.get("images", name));  // cache miss: read from disk
        cache.set(name, 3600, data);                           // keep it for an hour
        return data;
    }
This is a very crude algorithm; you can read more about cache algorithms in this Wiki article.
That's the wrong direction: you want to reduce IO to (relatively) slow disks, which means having the files mapped into physical memory. In simple scenarios the OS will handle this automagically with its file cache. You might look at whether Windows provides any tunable parameters, or at least see what perf metrics you can gather.
If I remember correctly (it's been years), IIS handles static files very efficiently thanks to a kernel-mode driver it hooks into, but only if the request doesn't pass through further ISAPI filters etc. You can probably find some info related to this on Channel9 etc.
Long term, you should look at moving static assets to a CDN such as CloudFront.
Like any problem though... are you sure you have a problem?

Default DB Size of Heroku App

I am pretty new to web dev and thought I needed a 20 GB shared db in order to test out apps that store more than 5 MB.
My friend let me know this was not true because I am using a single app. He told me shared dbs were used for sharing data between multiple applications.
If so, what is Heroku's default, unshared db size? I had difficulty finding this information on Heroku's website and in Google searches.
Could anyone chime in?
A shared database in this case means the server itself is shared -- so the server's CPU will be used to serve other databases in addition to your own.
A dedicated database server's CPU's are yours and yours alone.
If you need to exceed the 5 MB threshold, you need to add the 20 GB add-on. More information: http://www.heroku.com/pricing
