I am pretty new to web dev and thought I needed a 20 GB shared DB in order to test out apps that store more than 5 MB of data.
A friend let me know this was not true, since I am running a single app; he told me shared DBs are used for sharing data between multiple applications.
If so, what is the size of Heroku's default, unshared DB? I had difficulty finding this information on Heroku's website and through Google searches.
Could anyone chime in?
A shared database in this case means the server itself is shared -- so the server's CPU will be used to serve other databases in addition to your own.
A dedicated database server's CPUs are yours and yours alone.
If you need to exceed the 5 MB threshold, you need to add the 20 GB add-on. More information: http://www.heroku.com/pricing
I am looking for a mechanism to accomplish two-way storage mirroring.
I have two storages, both used for reads & writes at the same time.
any file written to one of these storages should be available for reading from the second one ASAP (the delay should be no more than a few seconds).
in case one storage is down, the second one is already a full copy, and can serve any file requested.
new files should be synced to the failed storage once it's up again.
For a better understanding, here is my use case:
I am deploying an ASP.NET application into two sites (Site-A | Site-B), with a load balancer in between.
each site will have its own NAS storage (Storage-A | Storage-B).
Now when a user uploads a file to the application it will be saved to one storage which is linked to the site that handled the request, let's assume it was Storage-A.
Then another user needs to download the file, but this time his request is handled by Site-B,
which means the file will be looked up in Storage-B, where it should be available thanks to the two-way mirroring.
Further information:
there is a 5-kilometer distance between the sites, and it's all a private network with no internet access.
network speed is 1 Gbps but can be increased if needed.
the OS used is Windows Server 2019.
I've searched a lot, but all the solutions I found involved cloud services or clustering with one-way mirroring.
Happy to hear any suggestions, and pardon my delivery, as this is my first question here.
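To make the read path concrete, here is a rough C# sketch of the fallback behavior I'm describing (the UNC roots are made-up placeholders, not our real shares):

    using System.IO;

    static class MirroredRead
    {
        // Hypothetical UNC roots for the two NAS boxes.
        static readonly string[] Roots =
            { @"\\storage-a\files", @"\\storage-b\files" };

        // Try each storage in turn; with working two-way mirroring,
        // the first hit should normally be the site-local one.
        public static Stream Open(string relativePath)
        {
            foreach (string root in Roots)
            {
                string path = Path.Combine(root, relativePath);
                if (File.Exists(path))
                    return File.OpenRead(path);
            }
            throw new FileNotFoundException(relativePath);
        }
    }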
Possible Duplicate:
storing uploaded photos and documents - filesystem vs database blob
I am starting to develop a web app, the primary purpose of which is to display photos. The users will be able to upload photos as well.
The first question that came up was where to store the photos: on the file system or the database.
I will be using a Windows box to host the site. The database is MySQL and the backend code is in C# utilizing ASP.NET MVC.
Filesystem, of course, unless you're aiming for a story on thedailywtf. The easiest way is to have the photos organized by a property you can derive from the file itself, such as its SHA-1 hash. Then just store the hash in the database, attached to the photo's primary key and other attributes (who uploaded it, upload date, etc.).
It's also a good idea to divvy up the photos on the filesystem, so you don't end up with millions of files in a single directory. So you'll have something like this:
storage/00/e4/f56c0de1c61fdb926e79e8a0a65bd12930c9.jpg
storage/25/9a/ec1c55bfb660548a6770238668c4b117d92f.jpg
storage/5d/d5/4b01d98f17a9ad9dd1526b49ba39b5aa37a1.jpg
storage/63/49/6f740b6c284ce6685dc17d473a7360ace249.jpg
storage/b1/75/066d178188dde110149a8422ab651b0ee615.jpg
storage/b1/20/a2b7d02b7b0c43530677ab06235382a37e20.jpg
storage/da/39/a3ee5e6b4b0d3255bfef95601890afd80709.jpg
This is also easy to port if you ever move to sharded storage.
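As a rough C# sketch of that scheme (the storage root is a placeholder and error handling is omitted):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    class PhotoStore
    {
        // Hypothetical storage root; adjust to your deployment.
        const string Root = @"C:\storage";

        // Stores the upload and returns its relative path, e.g.
        // "00/e4/f56c0de1c61fdb926e79e8a0a65bd12930c9.jpg".
        static string StorePhoto(string uploadedFile)
        {
            byte[] hash;
            using (var sha1 = SHA1.Create())
            using (var stream = File.OpenRead(uploadedFile))
                hash = sha1.ComputeHash(stream);

            string hex = BitConverter.ToString(hash)
                                     .Replace("-", "").ToLowerInvariant();

            // First two byte-pairs become directories, the rest the file name.
            string relative = Path.Combine(hex.Substring(0, 2),
                                           hex.Substring(2, 2),
                                           hex.Substring(4) + ".jpg");

            string target = Path.Combine(Root, relative);
            Directory.CreateDirectory(Path.GetDirectoryName(target));
            File.Copy(uploadedFile, target, overwrite: true);
            return relative; // store this (or just the hash) in the database
        }
    }

A nice side effect of content hashing is that identical uploads deduplicate themselves: the same bytes always map to the same path.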
If you are using SQL Server 2008, there's a FILESTREAM attribute for varbinary(max) columns that handles most of the problems mentioned about the DB getting larger. It handles all the annoying details of synchronizing between the filesystem and the table.
Look here for a blog post about the topic: Store any data in SQL Server 2008 (Katmai)
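As a hedged sketch of what using it looks like from C# (table and column names are invented; a FILESTREAM column is declared as varbinary(max) FILESTREAM but reads and writes like any varbinary through plain ADO.NET):

    using System.Data.SqlClient;
    using System.IO;

    class FilestreamExample
    {
        // Assumed table:
        //   CREATE TABLE Photos (
        //     Id UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        //     Data VARBINARY(MAX) FILESTREAM NULL)
        static void SavePhoto(string connectionString, string path)
        {
            byte[] bytes = File.ReadAllBytes(path);
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Photos (Data) VALUES (@data)", conn))
            {
                cmd.Parameters.AddWithValue("@data", bytes);
                conn.Open();
                // SQL Server transparently writes the bytes to its NTFS store.
                cmd.ExecuteNonQuery();
            }
        }
    }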
If you're building a website around photos, then forget the database. If it becomes popular, your database is going to be hit hard, and the majority of its time will be spent delivering photos. Databases also don't scale very well. There are many more advantages to keeping photos on the file system, and you can scale very well by having static content servers and using services for content delivery.
Also, Amazon S3 or other cloud providers do have their advantages. For instance, S3 + Amazon CloudFront will provide good performance. CloudFront caches your files on servers around the world, so they'll be easily and quickly accessible from anywhere. BUT if we're talking pictures and the site becomes popular, your bills might be quite high.
For S3, Amazon charges for storage and for transfer in/out of the cloud.
For CloudFront, it charges per transfer.
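For a flavor of the S3 side, a minimal upload with the AWS SDK for .NET might look like the sketch below (bucket name and region are placeholders; newer SDK versions expose PutObjectAsync instead):

    using Amazon;
    using Amazon.S3;
    using Amazon.S3.Model;

    class S3UploadExample
    {
        static void Upload(string localPath, string key)
        {
            // Bucket name and region are placeholders; credentials come
            // from the SDK's default credential chain.
            using (var client = new AmazonS3Client(RegionEndpoint.USEast1))
            {
                client.PutObject(new PutObjectRequest
                {
                    BucketName = "my-photo-bucket",
                    Key = key,           // e.g. the hash-based path from above
                    FilePath = localPath
                });
            }
        }
    }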
Generally, people store binary data such as images on the filesystem, not the database. They reference the filesystem path from the database. Retrieving BLOBs (binary large objects) from the database is slower than allowing the web server to serve static files from the file system.
I would use something like Amazon S3.
But if the choice is between filesystem and database, I would choose the filesystem, because it's faster to serve images from a filesystem than from a database.
The only reason I would put photos as BLOBs in a database would be if I had a cluster of servers, and I was using database replication to automatically copy the photos to every machine in the cluster.
Life is much simpler if you just store the photos as files, and store the filenames of the photos in the database. If you need to create unique filenames for the photos, you can use a primary key integer from the database as part of the filename. But you could also just use a hash of the photo itself, as suggested by John Milliken. That's simple, and simple is better.
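A minimal sketch of the primary-key naming idea (folder and method names are invented):

    using System.IO;
    using System.Web; // HttpPostedFileBase, ASP.NET MVC

    static class PhotoNaming
    {
        // Hypothetical target folder; photoId is the row's primary key,
        // so the resulting name is guaranteed unique.
        public static string Save(HttpPostedFileBase upload, int photoId)
        {
            string fileName = photoId + ".jpg";
            upload.SaveAs(Path.Combine(@"C:\photos", fileName));
            return fileName; // persist this in the photo's row
        }
    }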
Some people point out that it's easier to manage if everything's in the database, including making backups and preserving referential integrity.
If you store images in the DB, the DB will grow quickly and will be much, much larger. It is just a touch more complicated to get an image out of the DB for display than it is to get it from a file system. On the other hand, you'd better make sure that the file names and paths do not get out of sync with what is stored in the DB. In the past I have chosen to store on disk instead of in the DB; it made it easier for me to move the database to different boxes, and it worked out well.
We had a similar decision to make for a project I am on. The compelling thing about jamming stuff (images and other BLOB-y things) into the DB is that it is less likely that someone might delete/alter something (either intentionally or unintentionally). But that isn't the choice we made. Instead, we have the path info stored in the DB and use it to reference the data via a UNC path. Data paths are stored in two parts: a part that references the location of the data relative to the machine it resides on, and a part that points to which machine that group of data is on. When we need to move data around, we can just update the appropriate path info.
It is certainly quick to get at the data without pulling it out of the DB. Ultimately that was a major deciding factor.
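A minimal sketch of how those two parts recombine (all names invented):

    using System.IO;

    static class DataPaths
    {
        // The DB stores two parts per item (column names invented):
        //   machineShare, e.g. @"\\fileserver02\images"
        //   relativePath, e.g. @"2009\07\photo123.jpg"
        public static string Resolve(string machineShare, string relativePath)
        {
            return Path.Combine(machineShare, relativePath);
        }
        // Moving a group of files to another machine then only requires
        // updating the machineShare value on the affected rows.
    }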
It makes life so easy when you have a blob database. You should forget about the nightmare that is file system management.
EDIT: The table can be as simple as two columns: an ID key and a VARBINARY(MAX) column holding the file bytes.
From experience, this is an efficient way to manage binary files. You have one database that holds only binary files. How can this be any harder to back up?
We are a small bootstrapped ISP in a third-world country where bandwidth is usually expensive and slow. We recently got a customer who needs a storage solution for tens of TB of mostly video files (it's a TV station). The thing is, I know my way around Linux, but I have never done anything like this before. We have a Backblaze 3 storage pod casing which we are thinking of using as a storage server. The server will be connected to the customer directly, so it's not going to go through the internet, because 100+ Mbps speeds are unheard of in this part of the world.
I was thinking of using 4 TB HDDs, all formatted with ext4, and using LVM to make them one large volume (50-70 TB at least). The customer then logs in with an FTP-like client and dumps whatever files he/she wants, but the customer only sees a single volume, and we can add space as the requirements increase. Of course, this is just on paper from preliminary research, as I don't have prior experience with this kind of system. I also have to take cost into consideration, so I can't go for any proprietary solution.
My questions are:
Is this the best way to handle this problem, or are there equally good or better solutions out there?
For large storage solutions (at least, large for me), what are my cost-effective options when it comes to dealing with data corruption and HDD failure?
Would love to hear any other solutions and tips you guys might have. Thanks!
ZFS might be a good option, but there is no native bug-free solution for Linux yet. I would recommend other operating systems in that case.
Today I would recommend Linux MD raid5 on enterprise disks or raid6 on consumer/desktop disks. I would not assign more than 6 disks to an array. LVM can then be used to tie the arrays to a logical volume suitable for ext4.
The ext4 filesystem is well tested and stable, while XFS might be better for large-file storage. The downside of XFS is that it is not possible to shrink an XFS filesystem. I would prefer ext4 because of its more flexible nature.
Please also take into consideration that backups are still required even if you are storing your data on raid-arrays. The data can silently corrupt or be accidentally deleted.
In the end, everything depends on what the customer wants. Telling the customer the price of the service usually has an effect on the requirements.
I would like to add to the answer that mingalsuo gave. As he stated, it really comes down to the customer requirements. You don't say what, specifically, the customer will do with this data. Is it for archive only? Will they be actively streaming the data? What is your budget for this project? These types of answers will better determine the proposed solution. Here are some options based on a great many assumptions. Maybe one of them will be a good fit for your project.
CAPACITY:
In this case, you are not that concerned about performance but more interested in capacity, so the number of spindles doesn't really matter much. As Mingalsuo stated, put together a set of RAID-6 SATA arrays and use LVM to produce a large volume.
SMALL BUSINESS PERFORMANCE:
In this case, you need performance. The customer is going to store files but also requires the ability to serve a small number of simultaneous data streams. Here you want as many spindles as possible; for streaming, it does little good to focus on the size of the controller cache. Keep in mind that the time to rebuild a failed drive increases with the size of the drive, and during a rebuild your performance will suffer. For these reasons I'd suggest smaller drives, maybe 1 TB at most. That will give you faster rebuild times and more spindles for streaming.
ENTERPRISE PERFORMANCE:
Here you need high performance, similar to what an enterprise demands. You require many simultaneous data streams, and performance is paramount. In this case I would stay away from SATA drives and use 900 GB or 1.2 TB SAS drives instead. I would also suggest abstracting the storage layer from the server layer: create a Linux server and use iSCSI (or fibre) to connect to the storage device. This will allow you to load-balance if possible, or at the very least make recovery from disaster easier.
NON TRADITIONAL SOLUTIONS:
You stated that the environment has few high-speed connections to the internet. Again, depending on the requirements, you still might consider cloud storage. Hear me out :) Let's assume that the files will be uploaded today, used for the next week or month, and then rarely read. In this case, these files are sitting on (potentially) expensive disks for no reason except archive. Wouldn't it be better to keep those active files on expensive (local) disk until they "retire" and then move them to less expensive disk? There are solutions that do just that. One, for example, is called StorSimple. This is an appliance that contains SAS (and even flash) drives and uses cloud storage to automatically migrate "retired" data from the local storage to cloud storage. Because this data is retired it wouldn't matter if it took longer than normal to move it to the cloud. And, this appliance automatically pulls it back from the cloud to local storage when it is accessed. This solution might be too expensive for your project but there are similar ones that you might find will work for you. The added benefit of this is that your data is automatically backed up by the cloud provider and you have an unlimited supply of storage at your disposal.
I want to provide, in an Azure MVC web site, a download link for files that are stored in Blob storage. I do not want users to see my Blob storage URL, and I want my own download link to also supply the file name.
I think this can be done by passing (forwarding) the stream. I found many similar questions here on SO, e.g. here: Download/Stream file from URL - asp.net.
The problem I see is this: imagine 1000 users start downloading one file simultaneously. This will totally kill my server, as there is a limited number of threads in the pool, right?
I should say that the files I want to forward are about 100 MB big, so one request can take about 10 minutes.
Am I right, or can I do this with no risk? Would an async method in MVC 5 help? Thanks!
Update: My Azure example is here only to give some background. I am actually interested in the theoretical problem of long-running streaming methods in MVC.
In your situation, Lukas, I'd actually recommend you look at using the local, temporary storage area for the blob and serving it up from there. This will result in a delay in delivering the file the first time, but all subsequent requests will be faster (in my experience) and result in fewer Azure storage transaction calls. It also eliminates the risk of running into throttling on the Azure storage account or blob. Your throughput limits would then be based on the outbound bandwidth of the VM instance and the number of connections it can support. I have a sample of this type of approach at: http://brentdacodemonkey.wordpress.com/2012/08/02/local-file-cache-in-windows-azure/
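A rough sketch of that local-cache approach in an MVC 5 async action (container name, cache folder, and config key are placeholders; this assumes the Microsoft.WindowsAzure.Storage client library of that era):

    using System.IO;
    using System.Threading.Tasks;
    using System.Web.Mvc;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public class DownloadController : Controller
    {
        // Placeholder cache folder; on a real role this would be a
        // local-storage resource path.
        const string CacheDir = @"D:\local\cache";

        public async Task<ActionResult> Download(string name)
        {
            // NOTE: validate "name" in production to prevent path traversal.
            string cached = Path.Combine(CacheDir, name);
            if (!System.IO.File.Exists(cached))
            {
                Directory.CreateDirectory(CacheDir);
                var account = CloudStorageAccount.Parse(
                    System.Configuration.ConfigurationManager
                        .AppSettings["StorageConnectionString"]);
                CloudBlockBlob blob = account.CreateCloudBlobClient()
                    .GetContainerReference("files")
                    .GetBlockBlobReference(name);
                // Only the first request pays the blob transfer cost.
                await blob.DownloadToFileAsync(cached, FileMode.Create);
            }
            // Served from local disk; IIS handles the actual streaming,
            // so no worker thread is pinned for the whole download.
            return File(cached, "application/octet-stream", name);
        }
    }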
Do you know if any well-known clouds, e.g. Amazon, Azure, Google App Engine, have a shared-memory feature? E.g. you can access data fast (from memory) and it is automatically synchronized with other nodes (machines... whatever).
Not quite shared memory, but Windows Azure has a Cache you can use. It's configurable from 128MB to 4GB, and exists outside of a specific deployment, letting you share cache content across instances, deployments, even on-premises applications.
More info on Cache is here.
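For illustration, usage through the (since-retired) Windows Azure Caching client library looked roughly like this; the cache name and values are placeholders:

    using Microsoft.ApplicationServer.Caching;

    class CacheExample
    {
        static void Main()
        {
            // "default" is the cache name configured for the deployment;
            // endpoint/security settings live in the role's web/app.config.
            DataCache cache = new DataCacheFactory().GetCache("default");

            cache.Put("greeting", "visible to every instance");
            var value = (string)cache.Get("greeting"); // read from any node
        }
    }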