Magento Catalog URL Rewrites - Long time to index

We are using Magento 1.4.1 for our store, with 30+ categories and 2000+ products. Every time I reindex, the "Catalog URL Rewrites" index takes the longest to complete. Please suggest how we can improve its speed.

Unfortunately, catalog_url_rewrites is the slowest index in Magento when you have a large number of SKUs, and the time is multiplied if you have a large number of store views. If you still have the default French/German store views, be sure to delete them - this will speed things up by a factor of 3x.
There is no way to speed up the reindex other than beefing up hardware (or optimising the server configuration).
Running the reindex via the command line relieves the burden of HTTP, but if the php.ini is the same, it is going to take the same amount of time.
You can compare by running
php -i | grep php.ini
and comparing it to the output of a script accessed via HTTP containing
<?php phpinfo();
Otherwise, server tuning is everything: improving PHP and MySQL performance (which is a bit beyond the scope of this reply).

I don't know of a way to make this process faster. What I would suggest is to set up a cron job that runs something like this:
php (mageroot)/shell/indexer.php reindexall
php (mageroot)/shell/indexer.php --reindex catalog_url
I am sure about the first one, but not sure about the second one.
The cron should run every night, for example.
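A minimal crontab sketch along those lines - the Magento root, PHP binary and log path are assumptions to adapt to your server; catalog_url is the indexer code shell/indexer.php uses for Catalog URL Rewrites:
# m h dom mon dow  command -- reindex URL rewrites nightly at 02:00 (assumed paths)
0 2 * * * php /var/www/magento/shell/indexer.php --reindex catalog_url >> /var/log/magento_reindex.log 2>&1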

Related

Very slow POST request with Ruby on Rails

There are two of us working on a website with Ruby on Rails that receives GPS coordinates sent by a tracking system we developed. This tracking system sends 10 coordinates every 10 seconds.
We have two servers to test our website, and we noticed that one server processes the 10 coordinates very quickly (less than 0.5 s) whereas the other takes at least 5 seconds (and up to 20 seconds). We are supposed to use the "slow" server to put our website into production, which is why we are trying to solve this issue.
Here is an image showing the response time of the slow server (at the bottom we can see 8593 ms):
[Slow Server image]
The second image shows the response time of the "quick" server:
[Fast Server image]
The version of the website is the same on both; we deploy it via GitHub.
We can easily reproduce the problem by sending fake coordinates with Postman, and the difference in time between the two servers remains the same. In my opinion, this means the problem does not come from our tracking system.
I come here to find out what could be the origin of such a difference. I guess it could be a problem with the server itself, or with some settings that are not imported via GitHub.
We use SQLite3 for our database.
However, I do not even know where to look to find the possible differences...
If you need further information (such as lscpu output - I am limited to 2 links...) in order to help me, please do not hesitate. I will reply very quickly as I work on this all day long.
Thank you in advance.
EDIT: here are the outputs of the lscpu command on the two servers.
Fast Server: [lscpu output not shown]
Slow Server: [lscpu output not shown]
Maybe one big difference is the L2 cache...
My guess is that the answer is here, but how can I find out what my value of PRAGMA synchronous is, and how can I change it?
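One way to read and change that pragma from Ruby is the sqlite3 gem (the database path below is an assumed Rails default):
require "sqlite3"

db = SQLite3::Database.new("db/development.sqlite3")  # assumed Rails default path
puts db.execute("PRAGMA synchronous").inspect          # 0 = OFF, 1 = NORMAL, 2 = FULL
db.execute("PRAGMA synchronous = OFF")                 # trades durability for write speed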
The size of the .sqlite3 file I use is under 1 MB for the tests. Both databases should be identical according to my schema.rb file.
The provider of the "slow" server solved the problem, though I do not know the details. Some things were consuming memory and slowing everything down.
By "virtual server", it turns out they mean that several servers run on the same machine, each allocated a share of the machine's resources.
Thanks a lot for your help.

Scaling Puppet - when is too much for WEBrick?

I've found the following at Docs: Scaling Puppet:
Are you using the default webserver?
WEBrick, the default web server used to enable Puppet’s web services connectivity, is essentially a reference implementation, and becomes unreliable beyond about ten managed nodes. In any sort of production environment serving many nodes, you should switch to a more efficient web server implementation such as Passenger or Mongrel.
Where does the number 10 in "ten managed nodes" come from?
I have a little over 20 nodes and might soon have a little over 30. Should I change to Passenger or not?
You should change to Passenger when you start having problems with WEBrick (or a little before). When that happens for you will depend on your workload.
The biggest problem with WEBrick is that it's single-threaded and blocking; once it's started working on a request, it cannot handle any other requests until it's done with the first one. Thus, what will make the difference to you is how much of the time Puppet spends processing requests.
Each time a client asks for its catalog, that's a request. Each separate file retrieved via puppet:/// URLs is also a request. If you're using Puppet lightly, each catalog won't take too long to generate, you won't be distributing many files on any given Puppet run, and each client won't be taking more than four to six seconds of server time every hour. If each client takes four seconds of server time per hour, 10 clients have a 5% chance of collisions[0] - of at least one client having to wait while another's request is processed. For 20 or 30 clients, those chances are 19% and 39%, respectively. As long as each request is short, you might be able to live with some contention, but the odds of collisions increase pretty quickly, so if you've got more than, say, 50 hosts (75% collision chance) you really ought to be using Passenger, unless active performance measurement shows that you're doing okay.
If, however, you're working your Puppet master harder - taking longer to generate catalogs, serving lots of files, serving large files, or whatever - you need to switch to Passenger sooner. I inherited a set of about thirty hosts with a WEBrick Puppet master where things were doing okay, but when I started deploying new systems, all of the Puppet traffic caused by a fresh deployment (including a couple of gigabyte files[1]) was preventing other hosts from getting their updates, so that's when I was forced to switch to Passenger.
In short, you'll probably be okay with 30 nodes if you're not doing anything too intense with Puppet, but at that point you need to be monitoring the performance of at least your Puppet master and preferably your clients' update status, too, so you'll know when you start running beyond the capabilities of WEBrick.
[0] This is a standard birthday-paradox calculation: if n is the number of clients and s is the average number of seconds of server time each client uses per hour, then there are T = 3600/s one-request slots in an hour, and the chance of at least one collision during the hour is 1 - T! / (T^n * (T - n)!).
[1] Puppet isn't really a good avenue for distributing files of this size in any case. I eventually switched to putting them on an NFS share that all of the hosts had access to.
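Footnote [0] is easy to check numerically; here is a small Ruby sketch that evaluates the same formula in product form, using the four-seconds-per-client figure from the answer above:
# Chance that at least one of `clients` hourly requests overlaps another,
# given each client uses `seconds_per_hour` seconds of server time per hour.
def collision_chance(clients, seconds_per_hour)
  slots = 3600.0 / seconds_per_hour
  p_clear = (0...clients).inject(1.0) { |p, k| p * (slots - k) / slots }
  1.0 - p_clear
end

[10, 20, 30, 50].each do |n|
  printf("%2d clients: %.0f%%\n", n, 100 * collision_chance(n, 4))
end
# => roughly 5%, 19%, 39% and 75% -- the figures quoted above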
For 20-30 nodes, there shouldn't be any problem. Note that Passenger provides some additional features. It may be faster at serving the nodes, but I am not sure how much improvement you will get with only 30 nodes.
You should change to Passenger if you are using more than a hundred nodes. I started seeing problems when the number of nodes requesting service from the puppet master reached about 200. In my case, with the default web server, about 5% of the nodes (at random) couldn't receive their catalog during the hourly run.

Which is more efficient - hitting my db or doing an extra web crawl and hitting an array?

I have a web crawler that looks for specific information I want and returns it. This is run daily.
The issue is that my crawler has to do two things:
1. Get the links it has to crawl.
2. Crawl each link and push the results to the db.
The issue with #1 is that there are 700+ links in total. These links don't change very frequently - maybe once a month?
So one option is just to do a separate crawl for the 'list of links' once a month and dump the links into the db.
Then, have the crawler do a db hit for each of those 700 links every day.
Or, I can just nest the crawl within my crawler - so that every time the crawler runs (daily), it refreshes this list of 700 URLs, stores it in an array, and pulls each link from that array to crawl it.
Which is more efficient and less taxing on Heroku - or whichever host?
It depends on how you measure "efficiency" and "taxing", but the local database hit is almost certain to be faster and "better" than an HTTP request + parsing an HTML(?) response for the links.
Further, not that it likely matters, but (assuming your database and adapter support it) you can begin to iterate through the DB request results and process them without waiting for or fetching the entire set into memory.
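For example, with ActiveRecord you can stream the stored links in batches rather than loading all 700 rows at once - a sketch assuming a hypothetical Link model with a url column and a crawl helper of your own:
# Walks the links table in batches of 100, keeping memory usage flat.
Link.find_each(batch_size: 100) do |link|
  crawl(link.url)   # placeholder for your existing crawl-and-save logic
end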
Network latency and resources are going to be much worse than poking at a DB that is already sitting there, running, and designed to be queried efficiently and quickly.
However: once per day? Is there a good reason to spend any energy optimizing this task?

calculate the number of simultaneous Passenger instances

I need to find out if a server I have is capable of handling a certain amount of traffic. I'm running Ruby on Rails with Passenger and Apache.
So let's say on average a page takes 2 seconds to render and there will be 200k visitors in a day. The busiest hour will see 300 page views in a minute. From this, how can I work out how many simultaneous Passenger instances I'll need to handle the expected load, and from that, how much RAM I'll need for the required number of Passenger processes?
Hopefully this will tell me what server(s) I'll need, and perhaps whether I need a load balancer.
The only way to know for sure is to simulate the load with a benchmarking tool. Memory usage is highly application specific, and can even depend on the areas of the application you're exercising, so if you can generate reasonable diversity in your test data you'll have a much better idea of how it scales.
For a rough start, try the ab tool that comes with Apache. For something more complete, there are a number of simulation systems, such as Selenium, that will perform a series of events like logging in, viewing pages, and so on.
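As a rough pencil-and-paper starting point before any benchmarking, here is a sketch using the numbers from the question; the per-process memory figure is an assumption you would replace with a measured value:
# Little's law: requests in flight ~= arrival rate * average service time.
peak_rpm        = 300     # busiest-minute page views (from the question)
avg_render_time = 2.0     # seconds per request (from the question)
per_process_ram = 150     # MB per Passenger process -- assumption, measure yours

concurrent = (peak_rpm / 60.0) * avg_render_time    # ~= 10 requests in flight
processes  = (concurrent * 1.5).ceil                # ~50% headroom
ram_needed = processes * per_process_ram

puts "~#{concurrent.round(1)} concurrent requests"
puts "~#{processes} Passenger processes, ~#{ram_needed} MB of RAM for the app"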

How to slow down file downloads on local ruby webserver?

I'd like to mock large (>100 MB), slow file downloads locally with a Ruby service - Rails, Sinatra, Rack or something else.
After starting the server and requesting something like http://localhost:3000/large_file.rar, I'd like the file to download slooowly (for testing purposes).
My question is: how do I throttle the local web server to a certain maximum speed? If the file is stored locally, it will by default download very fast.
You should use curl for this, which allows you to specify a maximum transfer speed with the --limit-rate option. The following would download a file at about 10KB per second:
curl --limit-rate 10K http://localhost:3000/large_file.rar
From the documentation:
The given speed is measured in bytes/second, unless a suffix is appended. Appending ‘k’ or ‘K’ will count the number as kilobytes, ‘m’ or ‘M’ makes it megabytes, while ‘g’ or ‘G’ makes it gigabytes. Examples: 200K, 3m and 1G.
The given rate is the average speed counted during the entire transfer. It means that curl might use higher transfer speeds in short bursts, but over time it uses no more than the given rate.
More examples here (search for "speed limit"): http://www.cs.sunysb.edu/documentation/curl/index.html
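If you would rather throttle on the server side, as the question asks, here is a minimal Rack sketch - the file path, chunk size and delay are assumptions - that streams the file in small chunks and sleeps between writes. Save it as config.ru and start it with rackup -p 3000:
# Streams a local file slowly by yielding it in small chunks with a pause.
FILE_PATH  = "large_file.rar"   # assumed: the file sits next to config.ru
CHUNK_SIZE = 16 * 1024          # bytes per write
DELAY      = 0.1                # seconds between writes (~160 KB/s)

class SlowBody
  # Rack writes each chunk yielded by #each straight to the client.
  def each
    File.open(FILE_PATH, "rb") do |f|
      while (chunk = f.read(CHUNK_SIZE))
        yield chunk
        sleep DELAY
      end
    end
  end
end

run lambda { |_env|
  [200,
   { "Content-Type"   => "application/octet-stream",
     "Content-Length" => File.size(FILE_PATH).to_s },
   SlowBody.new]
}
In this sketch every request path returns the same file; tune CHUNK_SIZE and DELAY to hit whatever transfer rate you want to simulate.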
