SysAdmin's Journey

Combining IPSEC, Dynamic NAT, and Static NAT Behind a Cisco IOS Router

Trying to combine IPSEC, dynamic NAT, & static NAT on a Cisco router? Check out Cisco’s article on how to do it first. If that doesn’t work and you’re ready to drop kick the router out of the datacenter like I was, put away your black belt for a few minutes, and read about how I worked around a couple of bugs. Let’s define our problem first.

  • You have two subnets connected via an IPSEC VPN. For the purposes of this article, we assume the VPN is already up and functional.
  • You want any host with an IP in the 192.168.11.0/24 subnet to have access to the Internet as well as access to the hosts on the other side of the IPSEC tunnel.
  • You have one host that you want to have access to the remote subnet across the VPN, as well as have a static NAT'd IP address when accessing the Internet.

It took me a couple of days, and I still think there's a bug in there somewhere, but I did finally get it to work. The IOS I was using was 12.4(17). Here's my personal take on what I think happens that causes things to break. Again, I worked on this for two days straight, so I'm a little blurry on everything, but I do know that this method works. According to Cisco's article, you can get these results by simply using route-maps on your static NAT commands. Almost, but not really. I found two other requirements had to be in place before the NATs would work as they were supposed to:

  • Only one route-map can be used in all of your NAT statements. I think this is a bug, as no one specifies this as being a rule, but I even went as far as to create two identical route-maps with different names, and set up two static NATs with each NAT rule using one route-map. This would not work until I set both static NAT rules to use the same route-map. The same goes for the dynamic NAT rule, which is why we use an access-list here.

  • Once you use a route-map in your static NATs, the order in which the NAT statements are processed is reversed. Again, I think this is a bug. Normal NAT rule processing dictates that the static NAT rules are evaluated before the dynamic ones. In my situation, this was true until I enabled the route-map option on the static NAT. If I eliminated the route-map option, the static NAT worked, but the host being static NAT'd could not access hosts on the other side of the VPN. Once I enabled the static NAT with the route-map option, I could access the hosts on the other side of the VPN, but was getting the dynamic NAT applied instead of the static NAT.

Step One: Configure Dynamic NAT

We first need to set up an access list that will:

    • NOT NAT packets from our host that needs static NAT applied.
    • NOT NAT packets that traverse the VPN.
    • NAT packets from our subnet to everywhere else.
ip access-list extended NoNat
 ! 192.168.11.5 (the static NAT host) and 192.168.20.0/24 (the remote VPN
 ! subnet) are example values - substitute your own
 deny   ip host 192.168.11.5 any
 deny   ip 192.168.11.0 0.0.0.255 192.168.20.0 0.0.0.255
 permit ip 192.168.11.0 0.0.0.255 any

Then, we use this command to setup dynamic NAT:

ip nat inside source list NoNat interface GigabitEthernet0/0 overload
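One prerequisite the command above takes for granted: NAT only happens between interfaces marked inside and outside. Assuming GigabitEthernet0/0 is the Internet-facing interface (as the overload command implies) and GigabitEthernet0/1 faces the LAN - the inside interface name here is illustrative:

```
interface GigabitEthernet0/0
 ip nat outside
!
interface GigabitEthernet0/1
 ip nat inside
```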

At this point, you should be able to access the Internet from any host with a 192.168.11.x address (except the host we excluded), as well as be able to access hosts on the other side of the VPN.

Step Two: Setup Static NAT

Right now, our excluded host can access hosts across the tunnel, but cannot access anyplace on the Internet. All other hosts on the subnet can access both. Again, according to the Cisco article above, we shouldn't have to do this, but I did. Since we have excluded our host from being NAT'd at all, we need to craft a route-map for it to be static NAT'd, but not NAT'd when accessing the remote VPN hosts. This boils down to creating an ACL identical to the one above minus one line:

ip access-list extended NoNatStatic
 ! 192.168.20.0/24 (the remote VPN subnet) is an example value
 deny   ip 192.168.11.0 0.0.0.255 192.168.20.0 0.0.0.255
 permit ip 192.168.11.0 0.0.0.255 any

Now, create a route-map that points to this ACL:

route-map nonat-static permit 10
 match ip address NoNatStatic

Finally, setup your static NAT rule:

! 192.168.11.5 (inside host) and 203.0.113.5 (public IP) are example values
ip nat inside source static 192.168.11.5 203.0.113.5 route-map nonat-static

All your NAT rules should be working now. In order to add new static NATs, you simply need to add the local IP address to the top of the NoNat ACL, and then create a new static NAT rule that points to the same route-map. Do not use a different route-map, or you will run into the bug above. Let me reiterate that Cisco's approach in the article above is cleaner, and that I tried other, cleaner ways of implementing this setup. If you have the time, try to get your setup working using that article. However, if you can't get it to work, try my way and see if you get the results you're looking for.
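As a sketch of that procedure, adding a second static NAT for a hypothetical host 192.168.11.6 with public IP 203.0.113.6 (both addresses invented for illustration) would look like:

```
ip access-list extended NoNat
 ! the sequence number places the new deny above the existing entries
 1 deny ip host 192.168.11.6 any
!
ip nat inside source static 192.168.11.6 203.0.113.6 route-map nonat-static
```

The key point is that the second static NAT rule reuses the existing nonat-static route-map rather than getting one of its own.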

Apache, Mod_ssl, and the Sun Fire T1000 - Part III

In part one of the series, I went over how to compile Apache 2.0 to take advantage of the T1000 hardware. In part two, I talked about patching Apache 2.0 to support the SSLHonorCipherOrder directive. I didn't realize there might be a part three, but here we are! After finishing the second piece, I sent an email of thanks to Jan Pechanec at Sun for his blog entries mentioned in part one. In the email, I mentioned that I was running Apache 2.0 in worker mode. He cautioned me that Sun's pkcs11 engine still had outstanding issues with worker mode, and pointed me to a bug report on OpenSolaris. I hadn't been able to find a reliable load testing tool for HTTPS, so I was just using the check_http plugin from Nagios. The performance results I was getting were correct, but I wasn't stressing the server at all. Jan pointed me to http_load, a simple multithreaded HTTP load tester that supports HTTPS if you compile it against OpenSSL. He was also kind enough to give me the shell script he was using to load up HTTPS connections. I later found the script posted on a bug report, so I'm assuming it's okay to repost it here:

#!/bin/bash

[ $# -ne 4 -a $# -ne 5 ] && echo "$0 <url-file> <parallel> <count> <fetches> [cipher]" && exit

if [ -n "$5" ]; then
    cipher="-cipher $5"   # for SSL
fi

for i in `yes | head -$3`; do
    printf "."
    ./http_load $cipher -parallel $2 -fetch $4 $1 &
done
echo ""

# wait for all so that we can time the script
wait

You then run the shell script (saved here as loadssl.sh - name it whatever you like) like so:

time ./loadssl.sh sslurl.txt 10 20 500 RC4-MD5 >/dev/null

This will fork 10 processes, each using 20 threads, to fetch the URL contained within sslurl.txt as fast as possible 500 times. By wrapping the command with the 'time' command, you get the amount of time it takes to fetch the HTTPS URL 5,000 times. Take 5,000 divided by the number of real seconds returned by time, and you have a requests-per-second benchmark. To my shock, running this against my Apache 2.0 worker server never even completed. OpenSSL started to complain about 'bad mac' errors, and eventually started to time out. Well, back to the drawing board! I started by recompiling Apache to use the prefork MPM. See part one for the configure options I used. Since I had benchmarks from a T1000 using worker MPM, a v210 using worker MPM, a T1000 using prefork MPM, and the Sun CoolStack package (Apache 2.2 w/prefork MPM) installed on a T1000, I decided to keep track of performance and publish some very pretty graphs.

First up, a comparison of reported requests per second from ApacheBench (ApacheBench was used with keepalives, requesting a very small static file):

[Chart: ApacheBench requests per second]

You can see that the T1000 is much faster than the v210 in all configurations. Interesting to note that prefork 2.0 on the T1000 was actually faster than worker 2.0 on the same box until extreme loads were placed on the server. Okay, what about response times? The graph below represents the 95th percentile of the number of milliseconds all requests were completed in:

[Chart: ApacheBench response time]

Again, it's safe to assume the T1000 is outperforming the v210. Here, prefork consistently outperformed worker, and Apache 2.2 is much better at keeping response times to a minimum under load. Finally, let's look at HTTPS requests per second. The CoolStack Apache 2.2 isn't present because I had configuration issues getting SSL to work. From the get-go, 2.2 was not an option, as we have a proprietary proxy module for our application server that does not yet support 2.2; that's why 2.2 was not tuned, and I didn't spend too much time with it. The T1000 worker for 2.0 is missing because, when using pkcs11, it would not complete the benchmark tests.

[Chart: Apache peak SSL requests per second]

Rather obvious results, eh? The asterisk after the tuned prefork means that I only got it to perform this well after applying the Solaris patches to the SUNWCry package.

Quick Tips and Tricks for Performance

  • Use noatime on your DocumentRoot partition.
  • Apply all SUNWCry patches
  • Use ‘pthread’ for your SSLMutex and AcceptMutex
  • Make sure to use the shmcb for your SSLSessionCache
  • Use /dev/urandom for your SSLRandomSeed entries
  • Compile all the modules you might ever need, but only load them if you need them.

Closing Thoughts

The T1000 makes for a strong Apache box. I have a lot going on in my Apache config that probably drags down my performance - a lot of logging, about 100 mod_rewrite rules, proxies, and whatnot. You might be able to google around and find people getting 20,000 requests per second and more from Apache, but they aren't doing that with my configuration. By replacing our v210's with cheaper T1000's, we've ensured that our webserver layer will not be the bottleneck in our stack for years to come!
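For reference, the SSL-related tips above might look like this in an Apache 2.0 httpd.conf (a sketch only - the cache path and size are placeholders, not values from my config):

```
# Cross-process pthread mutexes instead of file-based locking
SSLMutex    pthread
AcceptMutex pthread

# Shared-memory cyclic buffer for the SSL session cache
SSLSessionCache shmcb:/apps/apache2/logs/ssl_scache(512000)

# Seed from /dev/urandom so startup and connections never block on entropy
SSLRandomSeed startup file:/dev/urandom 512
SSLRandomSeed connect file:/dev/urandom 512
```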

Apache, Mod_ssl, and the Sun Fire T1000 - Part II

After recompiling Apache to take advantage of the T1000's MAU as described in part I, I set out to do some testing. Something was amiss - using some clients, I would see SSL page load times of about .025 seconds; others took close to a second. The v210 consistently tested out at about .080 seconds per page.

Do not use the worker MPM with the pkcs11 engine!!!! There is a bug that will bite you. In part III I'll compare performance of worker vs. prefork on the T1000 and follow up on this issue.

After a lot of Googling, I finally figured out what it was. The T1000's MAU is only fast at doing RSA, and it generally sucks loudly when it tries to do anything with DH signing. Bug #6241300 on OpenSolaris confirmed the issue. If you limit mod_ssl's CipherSuite to just RSA algorithms, performance is great. However, we're ecommerce here, and we don't want to turn away anyone - especially if they're trying to go SSL, which is generally reserved for registering and checkout. So I thought to myself: why not try our best to use RSA with everyone, and if they can't or won't do it, fall back to DH and eat the performance hit?

I read Apache's documentation on the CipherSuite directive until I could recite it word-for-word from memory. No matter what I did, I could not get Firefox to negotiate RC4-MD5 if there were any 256-bit ciphersuites offered. I found a nice online tool that allows you to find out what ciphersuites other sites are offering. Using one such site as my model, I found that they were somehow forcing me to use RC4-MD5 as long as my browser supported it. Just as I was ready to throw in the towel, I found the SSLHonorCipherOrder Apache directive. Yaayyyy!!! Crap! That feature was added in Apache 2.2 - we're on 2.0.

Before I get into the details, let me explain what this option does. The SSL specification says that as part of SSL negotiation, the server can dictate what the ciphersuite will be.
However, until the SSLHonorCipherOrder option was introduced, Apache always went with what the client wanted to use. So, envision the server and the client walking down the street. They bump into each other, and want to talk in a secret language:

  • Server: Hi, I can speak the following secret languages: A,B,C,X,Y,Z. Which would you like to use?
  • Client: I can speak all of those too, but my favorite is Y. Let’s use that.
  • Server: Sounds good to me!

Now, when you set SSLHonorCipherOrder to true, the conversation is like this:

  • Server: Hi, what secret languages can you speak?
  • Client: I can speak A,B,C,X,Y,Z.
  • Server: Well, A is first in my list, we’ll use A.
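In httpd.conf terms, the server-side-preference behavior looks something like this (a sketch - the cipher list here is illustrative, not my production list):

```
# Honor the server's ordering: fast RSA ciphersuites first, slow DH last
SSLHonorCipherOrder On
SSLCipherSuite RC4-MD5:RC4-SHA:AES128-SHA:AES256-SHA:DES-CBC3-SHA
```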

So, by turning on SSLHonorCipherOrder, I can get the desired behavior where Apache does everything it can to use high-performance ciphersuites before falling back to the slow ones! Now, about that Apache 2.2 thing… Using my elite Googling skills once more, I found that SSLHonorCipherOrder was actually added to Apache while it was in 2.0, but it was branched off into 2.1, which eventually became 2.2. This meant that I might actually be able to "backport" the feature to 2.0 by simply copying and pasting some code. I found the original Apache bug and tried to apply its patch against 2.0.59. Using 'patch < myfile.patch' got most of the way there, but there was a chunk at the end that I had to manually paste into the source code. It still fit perfectly, but the line numbers had changed a bit. So, once more I recompiled Apache, used the SSLHonorCipherOrder flag, and with no complaints, Apache was off and serving requests. Now, how in the world do I find out if it's working or not?

Verification

First, make sure that the RSA operations of SSL are getting handed off to the hardware:

root@web1->kstat -n ncp0 -s rsaprivate
module: ncp                             instance: 0
name:   ncp0                            class:    misc
rsaprivate                      840

Hit an SSL page, then check the counter again; it should be incrementing. That tells us the crypto hardware is being used, but I wanted a way to find out what the ciphersuite distribution was. While memorizing mod_ssl's documentation, I remembered that I could log the protocol version and ciphersuite. So, I created a new LogFormat named combinedssl and used it in httpd.conf like so:

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %{SSL_PROTOCOL}x %{SSL_CIPHER}x" combinedssl
CustomLog logs/www_ssl combinedssl

After restarting Apache, I had a logfile named logs/www_ssl with lines like this: - - [08/Aug/2007:17:14:27 -0500] "GET /favicon.ico HTTP/1.1" 200 1406 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20070508 Firefox/" TLSv1 RC4-MD5

Look at the last two fields - there's our SSL info! Next, I whipped up some Perl to do a report on the data (saved here as sslreport.pl - pick any name you like):

#!/usr/bin/perl -w
use strict;

my $input;
if (-t STDIN) { # STDIN is a terminal, so read from a file argument
  my $file = shift || die "I need a filename to parse!\n";
  open(F, '<', $file) || die "Can't open $file: $!\n";
  $input = *F;
} else {
  $input = *STDIN;
}

my %sslcounts;
my %ips;
while (<$input>) {
  # client IP is the first field; SSL protocol and cipher are the last two
  if (/^([0-9.]+) .* ([\w-]+) ([\w-]+)$/) {
    next if ($2 eq '-' || $3 eq '-'); # skip non-SSL requests
    $ips{$1}++;                       # track requests per client IP
    $sslcounts{total}++;
    $sslcounts{$2}{total}++;
    $sslcounts{$2}{$3}{total}++;
  } else { die "Can't parse!"; }
}

my $grandtotal = $sslcounts{total};
delete $sslcounts{total};
printf("%-25s %6d (%5.2f%%)\n", "SSL Connections", $grandtotal, 100);
foreach my $proto (sort { $sslcounts{$b}{total} <=> $sslcounts{$a}{total} } keys(%sslcounts)) {
  my $ptotal = $sslcounts{$proto}{total};
  delete $sslcounts{$proto}{total};
  printf("%-25s %6d (%5.2f%%)\n", "  Protocol: $proto", $ptotal, $ptotal / $grandtotal * 100);
  foreach my $cipher (sort { $sslcounts{$proto}{$b}{total} <=> $sslcounts{$proto}{$a}{total} } keys(%{$sslcounts{$proto}})) {
    my $ctotal = $sslcounts{$proto}{$cipher}{total};
    delete $sslcounts{$proto}{$cipher}{total};
    printf("%-25s %6d (%5.2f%%)\n", "    $cipher", $ctotal, $ctotal / $grandtotal * 100);
  }
}

I don’t claim that the above code is proper, but I do know that it works:

root@web1-> perl /tmp/sslreport.pl logs/www_ssl
    SSL Connections              250 (100.00%)
      Protocol: TLSv1            130 (52.00%)
        RC4-MD5                  117 (46.80%)
        AES256-SHA                 9 ( 3.60%)
        DES-CBC3-SHA               4 ( 1.60%)
      Protocol: SSLv3            120 (48.00%)
        RC4-MD5                  106 (42.40%)
        DES-CBC3-SHA              10 ( 4.00%)
        AES256-SHA                 4 ( 1.60%)

Nice! All SSL requests since we redeployed Apache are using the fast RSA ciphersuites! For what it’s worth, I could not get the nicely formatted +HIGH:+MEDIUM:+LOW type of ciphersuite syntax to work properly. Every time I added the word ALL to the mix, it blew up my sort order. I’d beaten my head against the wall enough already, so I just hardcoded all the ones I wanted in there.


If anyone knows of a cleaner way to represent that list in the same order, please let me know. I’d also like to know what ciphersuites other ecommerce shops use. References used but not linked above: Sun offers a blueprint of the crypto accelerator of the UltraSPARC T1 processor as a PDF.

Apache, Mod_ssl, and the Sun Fire T1000 - Part I

I like Apache. A lot. It's one of the few apps out there that you can count on in a production environment, and it always does what you expect it to. However, Apache is only as good as the person configuring it.

**Do not use the worker MPM with the pkcs11 engine!!!!** There is a bug that **will** bite you. In part III I'll compare performance of worker vs. prefork on the T1000 and follow up on this issue.

Since our Apaches run on Sparc hardware, I like to compile from source using Sun Studio compilers tweaked for performance. It goes against my open source bias, but when you're at work, you do what's best for the bottom line. Anyways, we are in the process of swapping out our Sun Fire v210's with Sun Fire T1000's for use as our frontend webservers. I did some initial performance testing in our lab environment. The general gist was that the v210 and T1000 were almost identical when testing Apache with a single thread, but the T1000 started to severely outrun the v210 once the load jumped up. That was what we hoped to see, so we kept going with our plan to replace the v210's. Here are the actual numbers from siege:

**v210:** siege -c60 -b -r50 -f ~/urls.txt

Transactions:             2999 hits
Availability:            99.30 %
Elapsed time:            48.50 secs
Data transferred:        21.86 MB
Response time:            0.38 secs
Transaction rate:        61.84 trans/sec
Throughput:               0.45 MB/sec
Concurrency:             23.56
Successful transactions:  2999
Failed transactions:        21
Longest transaction:     29.85
Shortest transaction:     0.00

**T1000:** siege -c60 -b -r50 -f ~/urls.txt

Transactions:             3000 hits
Availability:           100.00 %
Elapsed time:             6.45 secs
Data transferred:        22.28 MB
Response time:            0.11 secs
Transaction rate:       465.12 trans/sec
Throughput:               3.45 MB/sec
Concurrency:             51.91
Successful transactions:  3000
Failed transactions:         0
Longest transaction:      2.08
Shortest transaction:     0.00

So, like any good sysadmin, I put the new servers in place in a phased approach: take one v210 out of the load balancer, insert the new T1000, and slowly bring it into service. All went fine, and the load balancer was even favoring the T1000 over the v210. Then I happened to look at SSL stats. For some reason, the load balancer was favoring the v210 by a ratio of 3:1 for SSL connections. I knew this couldn't be right, as the T1000 has circuitry in the CPU itself that acts as a hardware crypto accelerator. After Googling for a bit, I found Chi-Chang Lin's blog, where he details a way to performance test OpenSSL. Read the blog entry for the specifics, but here's what I got from the v210 and the T1000:

**v210:**

                     sign      verify     sign/s  verify/s
rsa 1024 bits    0.003673s  0.000199s     272.3    5017.1
rsa 2048 bits    0.021869s  0.000625s      45.7    1600.9

**T1000:**

                     sign      verify     sign/s  verify/s
rsa 1024 bits    0.004711s  0.000250s     212.3    4003.2
rsa 2048 bits    0.028339s  0.000814s      35.3    1229.2

For some reason, my T1000 was indeed not outperforming my v210 in SSL crypto operations. Also on Chi-Chang's blog, I discovered that in order to use the crypto hardware on the UltraSparc T1, you have to either use Sun's SSL or patch the one you have. Aha! As a habit, I always compile OpenSSL from source and build Apache against that. My Apache on the T1000 was using neither the patch nor Sun's OpenSSL. Just to make sure I was barking up the right tree, I ran the same OpenSSL tests as above, except this time with Sun's OpenSSL. The v210 was virtually the same, but the T1000 - well:

                     sign      verify     sign/s  verify/s
rsa 1024 bits    0.0003s     0.0001s     3175.2    7940.5
rsa 2048 bits    0.0014s     0.0003s      730.1    3284.7

WOW. 1,500% better numbers. I'd say that it's worth recompiling Apache for that kind of benefit. So, I set out and recompiled Apache. For those wondering, here's my configure:

make distclean
INSTALLDIR=/apps/apache2
LDFLAGS="-L/usr/sfw/lib -R/usr/sfw/lib"
./configure --prefix=$INSTALLDIR --enable-mods-shared=all \
  --enable-logio --enable-proxy --enable-proxy-connect \
  --enable-proxy-ftp --enable-proxy-http --enable-cache \
  --enable-mem-cache --enable-ssl --with-mpm=prefork --enable-so \
  --enable-rule=SSL_EXPERIMENTAL --with-ssl=/usr/sfw \
  --enable-deflate --with-z=/usr LDFLAGS="$LDFLAGS" && dmake -j 64 && dmake install

Compiling Nagios Plugins and NRPE on Solaris 10 With Sun Studio

We have some Sun T1000's running Solaris 10 that we are going to deploy as web servers. By compiling Apache from source using the Sun Studio compilers, you get a huge boost in performance because of the compiler's built-in optimizations for the Niagara processor. Before deploying them, I needed to get NRPE set up, which requires that the Nagios plugins be installed. Once set up on the client side, I can point our Nagios server at the webserver and get notified of hardware issues, disk usage, load averages, and whatnot. Installing NRPE and the plugins using gcc is a no-brainer. I thought using Sun Studio wouldn't be too much harder, but after 5 hours of banging my head against the wall, I finally figured out how to make them compile. To compile the two, first set your PATH variable so that it can find Sun Studio and the Sun make binary:

# /opt/SUNWspro/bin (Sun Studio) and /usr/ccs/bin (Sun make) are the default
# install locations - adjust to match your system
PATH=/opt/SUNWspro/bin:/usr/ccs/bin:$PATH
export PATH

Now, the tricky part. Everything I did was failing with SSL issues. Once I fixed that, check_procs wasn’t working properly. Turns out you need to set some CFLAGS and tell configure how to run ps:

CFLAGS='-DSSL_EXPERIMENTAL -DSSL_ENGINE -xO4' ./configure \
  --with-ps-command="/usr/bin/ps -eo 's uid pid ppid vsz rss pcpu etime comm args'" \
  --with-ps-format='%s %d %d %d %d %d %f %s %s %n' \
  --with-ps-cols=10 \
  --with-ps-varlist='procstat,&procuid,&procpid,&procppid,&procvsz,&procrss,&procpcpu,procetime,procprog,&pos'

Then make, su, make install as usual. Wash, rinse, and repeat for NRPE.