In August 2010 I was contracted to performance-tune a LAMP server to handle approximately 70 full page loads per second, which equated to 4,250 concurrent virtual users. We ended up doubling this target to 140 full page loads per second without hitting any issues. Sustained for 24 hours, that rate would equate to over 12 million page loads per day. This article explains how we achieved it.
The load tests were conducted using HP Performance Center, a technology HP obtained as part of its acquisition of Mercury for approximately US$4.5 billion in 2006.
To find out more about the load testing software visit http://en.wikipedia.org/wiki/HP_LoadRunner
Goal:
Handle 4,250 concurrent users generating approximately 70 full page loads per second.
1 full page load consisted of:
– 1 dynamically generated PHP file using MySQL
– 4 JavaScript files
– 7 CSS files
– 8 image files
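Putting the composition above into raw request terms (simple arithmetic on the figures listed):

```shell
# Each full page load is 20 HTTP requests (1 PHP + 4 JS + 7 CSS + 8 images),
# so the 70 pages-per-second goal implies this many requests per second:
echo $(( (1 + 4 + 7 + 8) * 70 ))
# prints 1400
```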
Original starting environment:
– ServerModel: Dell R300
– RAM: 2GB (2 x 1GB chips)
– Operating System: CentOS release 5.5 (Final)
– Apache: v2.2.3 (running in prefork mode)
– MySQL: v5.0.77
– PHP: v5.1.6 (as an apache module)
– eAccelerator: v0.9.5.3
– 120Mbits of bandwidth
Round 1: Initial Test
Round 1: Configuration
At the start of the process we were using essentially the default configurations for the entire LAMP stack. Linux was running iptables and ip6tables in their default configuration. eAccelerator was operating with 32MB of memory, with optimization and caching enabled.
Apache (/etc/httpd/conf/httpd.conf):
For more info on variables for Apache 2.0.x go to: http://httpd.apache.org/docs/2.0/mod/mpm_common.html
<IfModule prefork.c>
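For reference, the stock CentOS 5 prefork block looks like the following (quoted from memory of the default httpd.conf; treat it as indicative rather than authoritative):

```apache
<IfModule prefork.c>
StartServers         8
MinSpareServers      5
MaxSpareServers     20
ServerLimit        256
MaxClients         256
MaxRequestsPerChild 4000
</IfModule>
```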
MySQL (/etc/my.cnf):
For more info on variables for MySQL 5.0.x go to: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html
[mysqld]
Round 1: Results
With these settings we got up to 30 page loads per second, which was 42% of our target. Interestingly, we were only operating at about 8% CPU and about 50% of our memory capacity when we hit this limit.
Round 1: Review
Looking at the apache error logs we were getting a large number of MySQL errors:
mysql_connect(): Too many connections in xxx.php on line 15
So the MySQL configuration seemed to be our bottleneck.
Round 2
Round 2: Configuration
We did our first major review of the Apache and MySQL performance settings and adjusted them accordingly. We doubled the Apache settings and used the 'huge' sample configuration supplied with MySQL (/usr/share/doc/mysql-server-5.0.77/my-huge.cnf).
Apache (/etc/httpd/conf/httpd.conf):
For more info on variables for Apache 2.0.x go to: http://httpd.apache.org/docs/2.0/mod/mpm_common.html
<IfModule prefork.c>
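Since the change was described as doubling the Apache settings, the prefork block would have looked roughly like this (an illustration of the doubling applied to the CentOS defaults, not the exact file used):

```apache
<IfModule prefork.c>
StartServers        16
MinSpareServers     10
MaxSpareServers     40
ServerLimit        512
MaxClients         512
MaxRequestsPerChild 8000
</IfModule>
```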
MySQL (/etc/my.cnf):
For more info on variables for MySQL 5.0.x go to: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html
[mysqld]
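The headline [mysqld] tuning values in the my-huge.cnf sample shipped with MySQL 5.0 are along these lines (quoted from memory of the sample file; verify against your own copy):

```ini
[mysqld]
key_buffer = 384M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 8
```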
As an extra precaution we locked the network card in the server to use 1Gbit:
#ethtool -s eth0 speed 1000 duplex full
Edit the configuration for the network card:
#vim /etc/sysconfig/network-scripts/ifcfg-eth0
Add the following line:
ETHTOOL_OPTS='autoneg on speed 1000 duplex full'
Restart the network:
#service network restart
Round 2: Results
With these settings we got up to 58 full page loads per second, which was 83% of our target. Interestingly, we were still only operating at about 10% CPU capacity when we hit this limit, but we were using approximately 70-80% of our memory.
Our MySQL errors had disappeared and there were no more errors in the Apache logs.
Round 2: Review
We were concerned that the system was starting to use swap memory, which was grinding the server to a halt.
Round 3
Round 3: Configuration
We added an additional 2GB of RAM to the server so it now contained 4 x 1GB chips.
Round 3: Results
With the new RAM we still only got up to 58 full page loads per second, which was 83% of our target. We were still only operating at about 10% CPU capacity, but now we were using only about 40% of our memory.
Round 3: Review
Still no errors in the Apache logs and the load test farm was not receiving Apache errors. In fact it was reporting that it could not even connect to the server. This led us to believe that it was either a lack of bandwidth or a NIC/network/firewall configuration issue. After checking with our datacenter, we found that there were no inhibiting factors that would cause the problem described.
We increased the Apache and MySQL limits and ran a different style of test.
Round 4
Round 4: Configuration
In this test we only loaded the dynamic components of the page as generated by PHP and MySQL and served by Apache. This meant that we told the load test farm not to download static content such as images, CSS or JavaScript files.
Also we increased the MySQL and Apache limits as follows:
Apache (/etc/httpd/conf/httpd.conf):
For more info on variables for Apache 2.0.x go to: http://httpd.apache.org/docs/2.0/mod/mpm_common.html
<IfModule prefork.c>
MySQL (/etc/my.cnf):
For more info on variables for MySQL 5.0.x go to: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html
[mysqld]
Round 4: Results
The results of this test were very interesting. We got up to 263 page loads per second without any issue. This consumed far more bandwidth than round 3, so we knew that bandwidth was not the issue. However, the number of connections at which both tests started to fail was very similar.
Round 4: Review
So we knew we had a connection limit issue.
We also knew that the eAccelerator opcode cache was not dying at these high volumes, and neither were MySQL, PHP or Apache.
We reviewed the kernel messages and found thousands of entries, logged at the time of testing, showing that the ip_conntrack table was full:
#cat /var/log/messages* | grep 'Aug 15'
Further investigation revealed that iptables and ip6tables were active and limiting the number of connections to the box because the connection-tracking table was full. Ordinarily when I set up a Linux server I turn iptables off, because I place hardware firewalls in front of the servers. However, I didn't have the opportunity to set up this box initially, so they were still active. Since I didn't need them, I deactivated them.
If you still need to keep iptables running you can simply adjust the following settings:
Check the current connections limit (only works if iptables is running):
#sysctl net.ipv4.netfilter.ip_conntrack_max
Change the connections limit:
#vim /etc/sysctl.conf
Add the following lines:
# conntrack limits
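As an illustration (the value below is the one we later used in Round 7; size it to your own workload), the added lines take this form:

```
# conntrack limits
net.ipv4.netfilter.ip_conntrack_max = 196608
```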
Reload the config file:
#sysctl -p
Check the new connections limit:
#sysctl net.ipv4.netfilter.ip_conntrack_max
Check the current buckets limit (only works if iptables is running):
#cat /proc/sys/net/ipv4/netfilter/ip_conntrack_buckets
To change the buckets limit:
#vim /etc/modprobe.conf
Add the following line:
options ip_conntrack hashsize=32768
Reboot the server:
#shutdown -r now
Check the new buckets limit:
#cat /proc/sys/net/ipv4/netfilter/ip_conntrack_buckets
Alternatively, if like me you don't need iptables, you can simply disable it:
#service iptables stop
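Note that stopping the service only lasts until the next reboot; on CentOS 5 (standard SysV init tooling) you would also disable it at boot:

```shell
# Stop iptables now, and keep it from starting on boot (CentOS 5 / SysV init)
service iptables stop
chkconfig iptables off
```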
Round 5
Round 5: Configuration
This test used exactly the same configuration with iptables disabled.
Round 5: Results
Success!!! We got to 4,250 concurrent users, which is about 70 pages per second (loading all additional image, CSS and JavaScript files as well), with zero errors and a 0.7 second average response time. This used about 120Mbits of bandwidth. The datacenter ended up running out of pipe before the server had any issues.
At this rate we were running at about:
– 15% CPU utilisation
– 30% Memory usage (with 4GB RAM installed)
– 400 Apache processes (prefork)
– 100% Bandwidth
Round 5: Review
Key findings:
– Increase your Apache and MySQL limits
– Turn off iptables
– Ensure that you have enough RAM
– Ensure that you are checking logs from MySQL, Apache, and the kernel to pick up any errors and give you clues as to how to best solve them
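The last point can be scripted so a log sweep happens after every test run; a minimal sketch, assuming the CentOS default log locations:

```shell
# Sweep common logs for load-test error signatures (CentOS 5 default paths).
PATTERN='error|table full|too many connections'
for f in /var/log/httpd/error_log /var/log/mysqld.log /var/log/messages; do
    [ -r "$f" ] || continue              # skip logs that are missing/unreadable
    echo "== $f =="
    grep -i -E "$PATTERN" "$f" | tail -n 20
done
```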
Round 6
Round 6: Configuration
This test used exactly the same configuration as round 5, with a 250Mbit pipe instead of a 120Mbit pipe.
Round 6: Results
Success!!! We got to 140 full page loads per second (including additional images, CSS and JavaScript files) with zero errors and a still-stable 0.7 second average response time. This used the full 250Mbits of bandwidth. The datacenter ended up running out of pipe again before the server had any issues.
At this rate we were running at about:
– 30% CPU utilisation
– 40% Memory usage (with 4GB RAM installed)
– 800 Apache processes (prefork)
– 100% Bandwidth
Round 6: Review
Key findings:
– Even with 250Mbits of pipe, bandwidth is still the bottleneck in this configuration.
Round 7
Round 7: Configuration
Even though our server was performing fine, we were given another server to experiment on with much higher specs.
It was a Dell R710 with 48GB of RAM and eight 2.53GHz Xeon cores running with hyper-threading enabled (essentially presenting 16 processors).
We also had this box connected to a dedicated 4Gbit optical internet feed to give it as much bandwidth as it needed.
Everything on the box was configured the same except for Apache and MySQL (for which we took the last settings and multiplied them by four) and sysctl.
Apache (/etc/httpd/conf/httpd.conf):
For more info on variables for Apache 2.0.x go to: http://httpd.apache.org/docs/2.0/mod/mpm_common.html
<IfModule prefork.c>
MySQL (/etc/my.cnf):
For more info on variables for MySQL 5.0.x go to: http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html
[mysqld]
We also added the following lines to sysctl:
net.ipv4.netfilter.ip_conntrack_max = 196608  # more room in the connection-tracking table
net.ipv4.ip_local_port_range = 1025 65535  # widen the ephemeral port range
net.ipv4.tcp_max_tw_buckets = 1000000  # allow many sockets to sit in TIME_WAIT
net.core.somaxconn = 10000  # larger accept() backlog for listening sockets
net.ipv4.tcp_max_syn_backlog = 2000  # larger queue for half-open connections
net.ipv4.tcp_fin_timeout = 30  # reclaim FIN-WAIT-2 sockets sooner
Round 7: Results
We got to 200 full page loads per second (including additional images, CSS and JavaScript files) with zero errors and a still-stable 0.8 second average response time. This test used 330Mbits, or about 8%, of the available bandwidth. We stopped the test simply because we didn't need to go any higher, but the server could potentially have gone much further.
At this rate we were running at about:
– 16% CPU utilisation
– 6% Memory usage (with 48GB RAM installed)
– 1227 Apache processes (prefork)
– 8% Bandwidth
Round 7: Review
Key findings:
– Bandwidth seems to be a much bigger bottleneck than server capability.