How much storage and bandwidth do I need for hosting an online store?

How much bandwidth and storage do I need to host a small online shop with, let's say, 200-250 products, and how can I find out how many visitors my site gets monthly? Can you give me a real example to give me an idea, either from your own online store or from existing ones (Stradivarius, Zara, something smaller)?
Thanks in advance:)

Simply put:
Domain name - paid yearly, ~$10-40 (reasonable).
Web hosting - paid monthly, ~$3-10.
I wouldn't worry about disk space and traffic. For example, some hosts offer 50GB of space plus unlimited traffic for ~$6/mo.
Shop engine - mostly free, 50-250 MB of space. Your products mostly live in a MySQL database, and the database space included in the hosting plans from point 2 is usually enough for a store of your size.
WooCommerce, PrestaShop and OpenCart will require less space;
monsters like Magento will require more.
Statistics - to track your visitor numbers (and get a rough idea of traffic) you can use Google Analytics.
SEO - while your web store is not yet popular, there is little point in investing in it.
If you are a startup, you can start out on free marketplaces like eBay and only later trouble yourself with a self-hosted web shop.
P.S. A typical product page weighs roughly 1-2 MB. If we assume about 25% cache hits for images (the user sees some repeated assets from page to page) and an interest level of about 7 product pages per visitor, we can very roughly estimate that one visitor produces ~11 MB of traffic, i.e. about 1.1 GB per 100 visitors.
P.P.S. If you optimize your content it will be significantly less.
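A quick back-of-the-envelope check in Python, using the rough numbers above (these are assumptions, not measurements):

    # Traffic estimate from the figures above: ~2 MB per page,
    # ~7 product pages per visitor, ~25% of page weight served from cache.
    page_mb = 2.0
    pages_per_visitor = 7
    cache_hit_ratio = 0.25

    mb_per_visitor = page_mb * pages_per_visitor * (1 - cache_hit_ratio)
    print(f"{mb_per_visitor:.1f} MB per visitor")                    # ~10.5 MB
    print(f"{mb_per_visitor * 100 / 1024:.2f} GB per 100 visitors")  # ~1.0 GB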

Related

Serving geoIP targeted ads

I am a publisher that is contracted to serve an ad only to US users. We are serving this ad using the GeoIP database from MaxMind to identify users' countries from their IP addresses.
Based on DoubleClick metrics, we are showing that of 3200 impressions served, 7% are still being served outside the US (Germany, France, India, Italy, UK with the most impressions).
Is there a better technology to geo-target ads to the US?
There are several commercial geolocation technology providers. For a free one, you can consider http://lite.ip2location.com
MaxMind's precision service costs $0.0004 per lookup, so $20 gives you 50,000 IP lookups. That is VERY cheap and much more accurate than GeoLite.
The GeoIP2 country service costs $0.0001 per lookup, so $20 gets you 200,000 lookups.
The only thing is that you need to block bots and spiders - or they will eat through your query quota :D
Here: https://www.maxmind.com/en/geoip2-precision-services
I tried GeoLite2 and it was horrible in terms of accuracy, but I needed CITY / ZIP code level - so I'm paying $10/month for the much more accurate precision service.
If you are serving ads, $20 is nothing for you.
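For illustration, here is a minimal sketch in Python using the official geoip2 client for the GeoIP2 Precision web service, with a crude bot filter and an in-memory cache so repeated IPs don't burn paid queries. The account ID, license key and bot markers are placeholders, and the exact client constructor may differ between library versions:

    import functools
    import geoip2.webservice

    BOT_MARKERS = ("bot", "spider", "crawler", "curl")

    # Placeholder credentials for the MaxMind web service.
    client = geoip2.webservice.Client(account_id=12345, license_key="YOUR_KEY")

    @functools.lru_cache(maxsize=100_000)
    def country_for_ip(ip):
        # One paid lookup per distinct IP; results are cached in memory.
        return client.country(ip).country.iso_code

    def should_serve_us_ad(ip, user_agent):
        # Skip obvious bots and spiders so they don't eat through the quota.
        if any(marker in user_agent.lower() for marker in BOT_MARKERS):
            return False
        return country_for_ip(ip) == "US"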

Erlang fault-tolerant application: PA or CA of CAP?

I have already asked a question regarding a simple fault-tolerant soft real-time web application for a pizza delivery shop.
I got really nice comments and answers there, but I don't think it is a true web service. Rather than a web service, it is more of a real-time system to accept orders from customers, control the dispatching of those orders and control the vehicles that deliver them in real time.
Moreover, unlike a 'true' web service, this system is not intended to have many users - just a few dispatchers (telephone operators) and a few delivery drivers will use it (for now I have no requirement to give the actual customers direct access to the service; only the dispatchers and delivery drivers will have direct access).
Hence this question is a bit more general.
I have found that in order to make the right choice of NoSQL data storage for this application, the first thing I have to do is choose between CA, PA and CP according to the CAP theorem.
Now, the Building Web Applications with Erlang book says that "while it [Mnesia] is not a SQL database, it is a CA database like a SQL database. It will not handle network partition". The same book says that the CouchDB database is a PA database.
With that in mind, I think the very first thing I need to do for my application is decide what the term 'fault tolerance' means with regard to CAP.
The simple requirement that I have is that the application is available 24/7 (R1). The other one is that there is no need to scale: the application will have a very modest number of users (it is probably not possible to have thousands of dispatchers) (R2).
Now, does R1 require the application to provide Consistency, Availability and Partition Tolerance and with what priorities?
What type of data storage option will better handle the following issues:
Providing 24/7 availability for a dispatcher (a person who accepts phone calls from customers and who uses a CRM) to look up customer records and put orders into the system;
Looking up current ongoing served orders and their status (placed, baking, dispatched, delivering, delivered) in real time;
Keep track of all working vehicles' locations and their payloads in real time;
Recover any part of the system after system crash or network crash to continue providing 1,2 and 3;
To sum it up: what kind of data storage (CA, PA or CP) will suit the system described above better? What kind of data storage will better satisfy the R1 requirement?
For your 24/7 requirement you are looking for a database with (high) availability, because you want your requests to succeed every time (even if they only return error results).
A netsplit would bring your whole system down if you have no partition tolerance.
Consistency is nice to have, but you can only have 2 of the 3.
Your best bet will be a PA solution. I highly recommend a solution inspired by Amazon Dynamo. The best-known Dynamo implementations are Riak and CouchDB. Riak even allows you to shift PA towards some other trade-off by tuning the read and write replicas.
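As a rough illustration of that tuning, here is a minimal sketch using the Python riak client (the bucket and key names are made up, and the exact keyword arguments vary between client versions):

    from riak import RiakClient

    # Assumes a local Riak node listening on the default protobuf port.
    client = RiakClient(protocol="pbc", host="127.0.0.1", pb_port=8087)
    bucket = client.bucket("orders")

    # Write with a quorum of 2 replicas acknowledging the write...
    order = bucket.new("order-42", data={"status": "baking"})
    order.store(w=2)

    # ...but read from a single replica for lower latency (more "A", less "C").
    fetched = bucket.get("order-42", r=1)
    print(fetched.data)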
First, don't confuse CAP "Availability" with "High Availability". They have nothing to do with each other. The A in CAP simply means "All DB nodes can answer queries". To get High Availability, you must be in multiple data centers, you must have robust documented procedures for maintenance, expansion, etc. None of that depends on your CAP choice.
Second, be realistic about your requirements. A stock-trading application might have a requirement for 100% uptime, because every second of downtime could lose millions of dollars. On the other hand, I'm guessing your pizza joint might lose tens of dollars for every minute it's down. So it doesn't make sense to spend millions trying to keep it up. Try to compute your actual costs.
Third, always evaluate your choice against the mainstream. You could just go CA (MySQL) and quickly fail over to the slaves when problems happen. Be realistic about the costs (and risks) of building on new technology. If you really expect your system to run for 5 years without downtime, ask for proof that someone else has run that database for 5 years without downtime.
If you go "AP" and have remote people (drivers, etc.) then you'll need to write an app that stores their data on their phone and sends it in the background (with retries). Of course, you could do this regardless of whether your database was CA or AP.
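A sketch of that store-and-forward idea in Python (the endpoint URL and update format are hypothetical; on a phone the queue would live in local storage rather than in memory):

    import json
    import time
    import urllib.request
    from collections import deque

    ENDPOINT = "https://dispatch.example.com/driver-updates"  # placeholder
    pending = deque()  # queued updates waiting to be sent

    def post_update(update):
        req = urllib.request.Request(
            ENDPOINT,
            data=json.dumps(update).encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req, timeout=5)

    def queue_update(update):
        pending.append(update)

    def flush_pending():
        # Retry queued updates in order; stop at the first failure and try later.
        while pending:
            try:
                post_update(pending[0])
                pending.popleft()
            except OSError:
                time.sleep(10)  # back off and keep the update queued
                break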
If you want high uptime, you can either:
Increase MTBF (Mean Time Between Failures) - Buy redundant power supplies, buy dual ethernet cards, etc..
Decrease MTTR (Mean Time To Recovery) - Just make sure when failure happens you can recover quickly. (Fail over to slave)
I've seen people spend tens of thousands of dollars on MTBF, only to be down for 8 hours while they restore their backup. It makes more sense to ensure MTTR is low before attacking MTBF.
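The standard availability arithmetic makes this concrete (the numbers below are made up):

    # Uptime fraction = MTBF / (MTBF + MTTR).
    mtbf_hours = 2000   # mean time between failures
    mttr_hours = 8      # mean time to recovery, e.g. restoring a backup

    availability = mtbf_hours / (mtbf_hours + mttr_hours)
    downtime = (1 - availability) * 365 * 24
    print(f"availability: {availability:.2%}, ~{downtime:.0f} h of downtime/year")

    # Cutting MTTR to half an hour (fast failover to a slave) helps far more
    # than doubling MTBF with expensive redundant hardware would.
    print(f"with 0.5 h MTTR: {mtbf_hours / (mtbf_hours + 0.5):.2%}")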

Is it safe to assume uniform geo-ip resolution for same first-16-bit IP addresses?

I have a geo-sensitive webapp for which I send a request's IP to a remote, commercial ip-to-location service, and get back the country, city, ISP, etc. for the IP.
I currently cache the IP lookups in my database in order to make subsequent lookups faster and free (the commercial service charges per lookup).
I wonder if I can further optimize my caching by assuming that the first 16 bits (i.e. the aaa.bbb in an aaa.bbb.ccc.ddd address) always map to the same location. That way I would have at most 2^16 records to cache.
I don't mind so much about uniformity of ISP but that info would be helpful as well.
I'd recommend going down to at least /24 resolution. Oftentimes a /16 will tell you the ISP but not the city, or vice versa.
If you want a good idea of what the maps really look like, you can spend 49 USD on a developer license to Geobytes's GeoNetMap database. A developer license allows you to download the entire map from IP blocks to locations as a bunch of CSV files, but doesn't cover deploying it onto a production server. Geobytes has the added advantage of being entirely local, so lookups are liquid fast.
MaxMind also has a free downloadable map offering, although it is somewhat cut down from the full map, producing approximately double the error rate.
No, it's not safe. For example, if you do a GeoIP lookup on 216.34.181.45 (Slashdot) you get Mountain View, California. If you do a lookup on 216.34.1.1 you get Chesterfield, Missouri.
With respect to your caching, keep in mind that IPs can move around spatially. If an ISP goes bankrupt and its block gets bought by someone else, that block of IPs will move location.
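If you do cache by prefix, a /24 key plus an expiry covers both points above: one paid lookup per /24, and reassigned blocks eventually age out. A minimal sketch (the remote lookup call is a placeholder for your commercial service):

    import time
    import ipaddress

    CACHE_TTL = 30 * 24 * 3600   # expire entries after ~30 days
    _cache = {}                  # "/24 prefix" -> (timestamp, location)

    def lookup_remote(ip):
        raise NotImplementedError("call your commercial ip-to-location service")

    def locate(ip):
        prefix = str(ipaddress.ip_network(f"{ip}/24", strict=False))
        entry = _cache.get(prefix)
        if entry and time.time() - entry[0] < CACHE_TTL:
            return entry[1]
        location = lookup_remote(ip)   # one paid lookup per /24 per TTL window
        _cache[prefix] = (time.time(), location)
        return location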

Number of users to a webpage

I have a webpage that reads data from an Access file (Microsoft Access file) on my website. How many users can visit that page at the same time?
Would the page crash at some point if too many users tried to visit it at the same time? Is it better to use a PHP file that reads data from a text file, or is it just the same?
There are many variables that influence how many people can simultaneously use your website (loosely known as scalability), including your database, hardware, network, caching and more. And yes, at some point your performance will degrade if more and more users access the page.
It would be really hard to say from the information you provided how scalable your website is. PHP could be faster but not necessarily. Always be skeptical about technologies that promise superior performance.
For the moment your best option is to try and estimate how many concurrent users you are expecting, and then use a load testing tool like JMeter, ApacheBench or others to assess whether your website will stand up to the load.
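JMeter and ApacheBench are the right tools for this; if you just want a quick smoke test first, even a few dozen parallel requests from a short script will show whether the page degrades (the URL and numbers below are placeholders):

    import time
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    URL = "http://example.com/page.php"   # placeholder
    CONCURRENCY = 20
    REQUESTS = 200

    def fetch(_):
        start = time.time()
        with urllib.request.urlopen(URL, timeout=10) as resp:
            resp.read()
        return time.time() - start

    with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
        timings = sorted(pool.map(fetch, range(REQUESTS)))

    print(f"median: {timings[len(timings) // 2]:.3f}s, "
          f"p95: {timings[int(len(timings) * 0.95)]:.3f}s")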
It turns out that my website was hosted on Domain.com. Domain.com say that I have unlimited bandwidth, but in reality I don't.
My website was crashing because it was hosted alongside thousands of other websites on the same server, so the bandwidth is effectively limited even though it says unlimited. My only solution was to move my website to a VPS, basically hosting it on a server by itself.

Rails gallery app hosting

I'm building a Rails gallery app for a company that doesn't want to host it themselves. They estimate that it could potentially get 1000 or so unique visits a day. What I'm pondering is... should the images be on a static file server such as S3 or Rackspace Cloud Files, or should they just be left on the same server as the app? There is plenty of room on the app server for them. But will caching be a significant issue?
What are your thoughts?
Also, I haven't decided on a host... I was leaning toward Webbynode... but should I be looking at something else instead?
(They want the hosting to be less than $35/month.)
Thanks.
The main variable you need to consider is latency. Since your traffic is relatively low, you can self-host for $0 extra, or host on S3 and pay for bandwidth. The benefit of S3 is better latency for users across disparate locations.
If it were me, I'd self-host to keep the complexity of the app low. Then only move to S3 if you really need the optimization of the CDN.
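If you do move the images later, the switch can be small: upload each file to S3 and serve its URL. A minimal sketch with boto3 (bucket name and paths are placeholders; credentials are assumed to come from the environment):

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-gallery-images"   # placeholder

    def upload_image(local_path, key):
        # Store the image and return a URL the app (or a CDN) can serve.
        s3.upload_file(local_path, BUCKET, key,
                       ExtraArgs={"ContentType": "image/jpeg"})
        return f"https://{BUCKET}.s3.amazonaws.com/{key}"

    print(upload_image("tmp/photo.jpg", "gallery/photo.jpg"))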
