SysAdmin's Journey

QuickTip: Make Life Easier With Ssh-copy-id

How many times have you run through this series of events?

$ cat ~/.ssh/id_dsa.pub
...copy output to clipboard...
$ ssh myhost
...enter password...
myhost$ vi ~/.ssh/authorized_keys
...paste public key and save...
myhost$ exit

Thanks to bash’s tab completion, I happened upon ssh-copy-id. Instead of all that above, just do this:

$ ssh-copy-id myhost
...enter password...

You’re done!
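If you land on a box that doesn’t ship ssh-copy-id, the remote half of the job is easy enough to approximate. The sketch below is not the script that comes with OpenSSH; install_key is a made-up helper name, and it assumes a single-line public key and a POSIX grep. It appends the key only if it isn’t already there, and sets the permissions sshd tends to be picky about.

```shell
# install_key KEYFILE SSH_DIR
# Hypothetical helper (not part of OpenSSH): append the public key in
# KEYFILE to SSH_DIR/authorized_keys unless that exact line already exists.
install_key() {
    key=$(cat "$1") || return 1
    ssh_dir=$2
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # -q quiet, -x match the whole line, -F treat the key as a fixed string
    if ! grep -qxF -- "$key" "$ssh_dir/authorized_keys"; then
        printf '%s\n' "$key" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

The same idea over the wire, minus the duplicate check, is the old standby: cat ~/.ssh/id_dsa.pub | ssh myhost 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'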

I'm Not Dead Yet!!!

Contrary to the lack of posts on my blog, I’m still alive. I apologize for not writing much, but things have been so busy lately. At $work, I’m working on rolling out a forklift upgrade in a lab environment that involves:

  • Upgrade our application servers from Solaris 9 to Solaris 10.
  • Upgrade Java from 1.4.2 to 5. Yes, I know that 5 is already EOL, but our API vendor runs on Weblogic 9, which is constrained to Java 5.
  • Speaking of Weblogic, upgrade that from 8.1 to 9.2.
  • Upgrade our ecommerce API from version 6.x to 10.x
  • Upgrade our Apache frontends from 2.0 to 2.2 - Done!
  • Implement a couple of Nginx boxes as static HTTP servers, allowing our Apache servers to focus on proxying and URL rewriting.

All of this of course needs to happen in parallel with the normal firefighting ;-) Oh, and I’m trying to get Puppet up and running in the lab in all my free time.

On the other hand at $home, I’ve been doing a lot of work with Drupal. I’m working on designing a photography site for a friend. Heck, I’ve even had some code committed to the node gallery contrib module! I’ve got two books I’ve yet to crack open, Mac OS X Snow Leopard: The Missing Manual, and Pro Drupal Development. I’m trying to read through Pro Git, and Pulling Strings with Puppet.

All of this productivity came to a screeching halt last Friday, when I got my Motorola Droid. The best part about my HTC running Windows Mobile was that it kept me in the dark about how fricking cool a real smartphone can be! As part of my new addiction to my phone, I caved in and signed up for a Facebook account – don’t worry, I didn’t give them my email password, so no one will get any spam. Feel free to hit me up though! I have to admit, the sheer number of users is impressive; but it’s the percentage of those that are actively using it every day that is incredible.

So, stay tuned. There are a lot of good posts coming - book reviews, a Droid/Android review (with an SA POV of course), and who knows, maybe some Drupal-related stuff.

CheckPoint UTM-1 vs Cisco ASA in an ECommerce Setting

Recently at $WORK, we had to come up with budget proposals for next year. We knew that we were going to outgrow our current CheckPoint UTM appliances by holiday next year, so we had to buy new hardware. We just had to decide which hardware. While I’m capable of building a Linux/*BSD firewall on my own, I frankly don’t have the time to mess around with updates and compliance documentation. We need an appliance, and for our needs, Cisco and CheckPoint are about the only options. We switched to the UTM appliances from a pair of Cisco ASA 5500’s a few years ago. However, after examining the pros and cons of both, I recommended we switch back to the ASA platform next year. Read on for my decision-making process.

Our first experience with the ASA product line from Cisco was a few years ago. The current ASA software at the time was 5.x (IIRC, maybe it was 6.x). We switched to CheckPoint and their UTM-1 appliances because of the ASA’s lack of configurability. First of all, it was very tricky to make the ASA behave like a router AND a firewall, not just a firewall. Eventually, they supported the features necessary to do basic static routing, but I hit an issue where the “ASP” or “Accelerated Security Path” filters on the ASA were throwing away packets that I didn’t want them to. I was unable to write an ACL or tweak a configuration that would let the packets I needed get through. In essence, the firewall was saying it knew what was bad for me, and wouldn’t listen to my argument on the matter at all. After going round-n-round with Cisco TAC, it became my sole purpose in life to get rid of those damned ASA’s. I succeeded in that two months later with a pair of UTM-1’s from CheckPoint.

We’re in the minority of businesses where our firewall’s priority isn’t protecting internal users from the big, bad Internet. Our goal is to let all but the most blatantly abusive potential guests browse our website and buy stuff. This is an important distinction - if we were the typical corporate network that focused on the former, we probably would have stuck with CheckPoint. So, here’s my list of pros and cons for each device:

  • CheckPoint

    • Pros

      • SmartView Tracker. This app has no competition that I’ve found. This app lets you view events in real time, or do some pretty complex searches in real time. Beats the heck out of grep | cut | sort | uniq | wc on a syslog file.

      • SmartDashboard: If you’re into GUI’s, this one is very nice at configuring rulesets, and giving you a graphical view of your networks.

      • SmartDefense: while quite expensive, this L7 deep inspection filter does its job well. You get updates every week or so, and can turn them on, off, or put them in monitor mode, which lets the packets through but logs an event. This allows you to see what would happen if you turned it on, without actually interrupting packet flow.

    • Cons

      • Expense. Yikes. Comparing a Cisco ASA solution to the closest CheckPoint solution in our case, the CheckPoint comes in more than 25% more expensive than the Cisco, and the Cisco will push more pps.

      • Lack of a robust CLI. This is a killer for me. While having a GUI can be nice at times, nothing beats a concise CLI. Where Cisco’s ASDM solution is a GUI built upon a CLI foundation, CheckPoint’s CLI is an afterthought to the GUI. Some might argue there’s nothing you can’t do via the CLI on a CheckPoint, but if you’re editing the policy files in vi, then you’re just asking for trouble.

      • Commodity hardware. CheckPoint is a software solution, and their UTM-1 appliances are nothing more than x86 boxes running SecurePlatform (which is a pared down RHEL). While there’s nothing inherently wrong with that, the result of CheckPoint using off-the-shelf hardware versus Cisco’s custom hardware is that Cisco’s can push a lot more packets than comparable CheckPoints.

      • Hard Drives. Cisco’s run off flash and have no moving parts save for the fans. CheckPoint’s appliance requires a full-on hard drive. While I’ve had DOA Cisco flash, I’ve never had their flash fail me once put into service. I can’t count how many hard drives have failed me over the years.

      • Reliance upon a SmartCenter server. Some may see this as a positive, but I see it as a negative. When you install your CheckPoint policy, it first goes to a separate server called the SmartCenter. The SmartCenter then pushes this config to the individual appliances one-by-one. All logs on the appliances are sent to the SmartCenter. I have a few problems with this. First, it’s another server. Second, it’s another single point of failure – if your SmartCenter dies, you lose the ability to change the configuration on your appliances until you get it back up. To eliminate the single point of failure, you’re encouraged to run an Active/Standby HA setup of SmartCenter. At this point, you have not just two appliances to keep up to date, but two SmartCenter servers as well. Each one of these devices is an x86 box with a hard drive, so MTBF is comparatively low.

  • Cisco

    • Pros

      • CLI. While it’s not quite IOS, it’s damn close, and anyone at home in IOS can pick up the ASA differences very quickly.

      • Easy upgrades and rollbacks. Cisco’s software upgrades might be odd to some, but once you get the hang of it, you won’t find anything better.

      • Optimized hardware. With the ASA’s, you get very few moving parts and ASIC’s that are optimized for pushing packets. Cisco’s been doing this for a long time, and they’re very good at it.

      • More bang for the buck. You pay less for a Cisco solution that has higher specs than a CheckPoint solution that doesn’t do as much.

      • ASDM. If you’re into the GUI thing, you never have to touch the CLI.

    • Cons

      • Bugs. Cisco’s everything-including-the-kitchen-sink mindset means that their software tends to have a lot more bugs in it than what I’ve seen with CheckPoint. In their defense, Cisco seems to have been pretty quick to fix the bugs that I’ve encountered.

      • VPN Policy Management. I can’t speak to this too much, as I never really used the VPN portion of either appliance. However, it was plain to see that CheckPoint had a lot more to offer in their solution when it came to VPN policy management features.

I’m sure there’s a lot that I missed, but in the end, it came down to a few major points. Cisco has incremented their software three major versions since my last production experience with them, and it seems to me that much of the reason why I switched away has been resolved. I feel much more at home using the IOS-like CLI. I didn’t need a lot of the extra features that CheckPoint offered. Last, but certainly not least, there are a lot more fun things I can spend that 25% on, like new servers! However, if I had a bunch of business users to extend VPN functionality to, while making sure that they were secured from the Internet, I wouldn’t hesitate to choose the UTM-1.

I’m really interested to hear what others think. Do you think I made the right choice? No? Why? If you care to share your choices, let me know what your appliance is protecting (users or servers) and what choice you made.

New in Solaris 10 Update 8 - ZFS Support in Flash Archives

The release of Solaris 10 10/09 (Update 8) has come and gone, and without too much fanfare from my point of view. In my opinion, there is one new feature that will really help propel root ZFS installations into the enterprises where there was resistance before. You see, many larger corporations have invested a lot of time and money into using Flash Archive coupled with Jumpstart to be able to deploy golden images to many machines in a small amount of time. However, up until now, ZFS and Flash Archive were incompatible, meaning you were forced to use UFS root file systems if you wanted to use Flash Archive installs. Some have even gone to great lengths to hack solutions together, but I doubt they made many in-roads into the enterprise. Read on for a quick overview of Flash Archive and ZFS support in the latest update of Solaris 10.

From Sun’s ZFS Administration Guide, here’s some potential “gotcha’s”:

  • Only a JumpStart installation of a ZFS Flash archive is supported. You cannot use the interactive installation option of a Flash archive to install a system with a ZFS root file system. Nor can you use a Flash archive to install a ZFS BE with Solaris Live Upgrade.

  • You can only install a system of the same architecture with a ZFS Flash archive. For example, an archive that is created on a sun4u system cannot be installed on a sun4v system.

  • Only a full initial installation of a ZFS Flash archive is supported. You cannot install differential Flash archive of a ZFS root file system nor can you install a hybrid UFS/ZFS archive.

  • Existing UFS Flash archives can still only be used to install a UFS root file system. The ZFS Flash archive can only be used to install a ZFS root file system.

  • Although the entire root pool, minus any explicitly excluded datasets, is archived and installed, only the ZFS BE that is booted when the archive is created is usable after the Flash archive is installed. However, pools that are archived with the flar or flarcreate command’s -R rootdir option can be used to archive a root pool other than the one that is currently booted.

  • A ZFS root pool name that is created with a Flash archive must match the master root pool name. The root pool name that is used to create the Flash archive is the name that is assigned to the new pool created. Changing the pool name is not supported.

  • The flarcreate and flar command options to include and exclude individual files are not supported in a ZFS Flash archive. You can only exclude entire datasets from a ZFS Flash archive.

  • The flar info command is not supported for a ZFS Flash archive.

There’s a few constraints in there that might cause a few people problems, but overall Sun has done a good job opening up yet another in-road to the enterprise folks to let them experience the joy that is administering ZFS.
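Going by the form the same guide uses, the workflow would look roughly like the sketch below. This is untested on my end: the archive name, the paths, the NFS server name (installserver), and the disk are all placeholders, and the profile name zfsflash is made up for illustration. Note the pool is named rpool on the assumption that the master’s root pool is also rpool, per the naming constraint above.

```shell
# On the master system (as root), archive the currently booted ZFS BE.
# Shown as a comment because it only makes sense on the master:
#   flarcreate -n zfsBE /export/flar/zfs10u8.flar

# On the config server: a JumpStart profile that installs from that archive.
# Server name, archive path and target disk are assumptions.
cat <<EOD > zfsflash
install_type     flash_install
archive_location nfs installserver:/export/flar/zfs10u8.flar
partitioning     explicit
pool             rpool auto auto auto c0t0d0s0
EOD
```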

I have yet to set up Flash Archive and ZFS myself, but once I do, you can bet I’ll post about it!

Not So Typical Jumpstart: Part Three - the Scripts

In part two of the series, we left off with a non-working Custom Jumpstart setup. By creating our Jumpstart profile file and a sysidcfg file, we’ll have a basic, but working Custom Jumpstart. The profile contains settings specific to the installation, whereas the sysidcfg file contains settings specific to the machine during and after installation.

Step One: Create the sysidcfg file

If you’ve been following along, we specified the path to web1’s sysidcfg file when we generated our dhcpd.conf file in part one. Let’s create this file, so that the installer doesn’t have to ask us these questions during the installation:

cd /export/home/jumpstart/configs
mkdir -p sysids/web
cat <<EOD > sysids/web/sysidcfg
keyboard=US-English
system_locale=en_US.UTF-8
security_policy=NONE
nfs4_domain=domain.com
name_service=DNS {domain_name=domain.com name_server=192.168.0.5}
network_interface=bge0 { protocol_ipv6=no
  primary
  hostname=bkeweb1
  ip_address=192.168.0.21
  default_route=192.168.0.10
  netmask=255.255.255.0
}
terminal=vt100
timezone=US/Central
timeserver=localhost
root_password=8X123/ZkOFICY
service_profile=limited_net
EOD

You should read the manpage for sysidcfg, but in our example above some of the not-so-obvious settings are:

  • security_policy=NONE relates to the Kerberos security settings during installation.
  • root_password is the encrypted value of root’s password, exactly as it would be written in a real /etc/shadow file.
  • service_profile=limited_net - using limited_net disables unnecessary services and restricts others to localhost access only. I recommend this setting for anyone not in a lab environment.
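One way to get a value for root_password is to lift the hash from a machine whose root password you want to reuse. Here’s a small sketch of that, assuming bash or ksh; shadow_hash is a hypothetical helper name, and reading the real /etc/shadow needs root.

```shell
# shadow_hash USER [FILE]
# Print the password-hash field (the second colon-separated field) for USER.
# FILE defaults to /etc/shadow; pass another path to work on a copy.
shadow_hash() {
    grep "^$1:" "${2:-/etc/shadow}" | cut -d: -f2
}
```

Something like echo "root_password=$(shadow_hash root)" then gives you a line you can paste into the sysidcfg.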

Step Two: Create the profile file

There’s a huge number of things you can configure via the jumpstart profile - I recommend reading Sun’s documentation on creating a Jumpstart profile. I also owe a link to the Keep Da Link blog for a nice list of clusters and packages to install. Using the profile below will result in an installation with pretty much every command line tool you would need, minus any X11/Gnome packages. Here are the contents of my Jumpstart profile:

cat <<EOD > webserver
install_type    initial_install
system_type     standalone
#pool           name  size swap dump devices
#pool           rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
pool            rpool auto auto auto any
bootenv         installbe bename s10_u7 
partitioning    default

# base cluster
cluster SUNWCreq
# additional clusters and packages
cluster SUNWCacc add # System accounting utilities
cluster SUNWCadm add # System And Network Administration (showrev etc)
cluster SUNWCcpc add # CPU Performance Counter driver and utilities
cluster SUNWCcry add # Supplemental cryptographic modules and libraries
cluster SUNWCfwcmp add # Freeware Compression Utilities (bzip zip zlib gzip)
cluster SUNWCfwshl add # Freeware Shells (bash tcsh zsh)
cluster SUNWCfwutil add # Freeware Other Utilities (gpatch less rpm)
cluster SUNWCgcc add # GNU binutils, C compiler, m4 and make
cluster SUNWCged add # Gigabit Ethernet Adapter Software
cluster SUNWCjv add # JavaVM
cluster SUNWCjvx add # JavaVM (64-bit)
cluster SUNWClibusb add # wrapper for libusb; user level usb ugen library
cluster SUNWClu add # Live Upgrade Software
cluster SUNWCntp add # Network Time Protocol
cluster SUNWCopenssl add # the classical super-old Solaris OpenSSL
cluster SUNWCpd add # PCI drivers
cluster SUNWCperl add # perl5
cluster SUNWCpm add # Power Management Software
cluster SUNWCpmgr add # Patch Manager Software
cluster SUNWCpool add # core software for resource pools
cluster SUNWCptoo add # Programming tools and libraries
cluster SUNWCrcapu add # Solaris Resource Capping Daemon
cluster SUNWCsma add # Solaris Management Agent (snmpd)
cluster SUNWCssh add # Secure Shell Client/Server
cluster SUNWCts add # Solaris Trusted Extensions
cluster SUNWCusb add # USB drivers and header files
cluster SUNWCutf8 add # en_US.UTF-8 locale support
cluster SUNWCvld add # Sun Ethernet Vlan Utility
cluster SUNWCvol add # Volume Management
cluster SUNWCwget add # GNU wget
cluster SUNWCzone add # Solaris Zones
package SUNWarc add # Lint Libraries (usr)
package SUNWarcr add # Lint Libraries (root)
package SUNWman add # On-Line Manual Pages
package SUNWdoc add # Documentation Tools
package SUNWsfwhea add # Open Source header files
package SUNWtoo add # Programming Tools
package SUNWhea add # SunOS Header Files
package SUNWxcu4 add # XCU4 Utilities
package SUNWxcu4t add # XCU4 make and sccs utilities
package SUNWxcu6 add # XCU6 Utilities
package SUNWgcmn add # gcmn - Common GNU package
package SUNWggrp add # ggrep - GNU grep utilities
package SUNWgtar add # gtar - GNU tar
package SUNWuium add # ICONV Manual pages for UTF-8 Locale
package SUNWladm add # Locale Administrator (really optional)
package SUNWGlib add # GLIB - Library of useful routines for C programming
package SUNWPython-share add # python
package SUNWPython add # python
package SUNWfss add # Fair Share Scheduler
package SUNWscpr add # /usr/ucb tools
package SUNWscpu add # /usr/ucb tools
package SUNWrsg add # needed by sshd
package SUNWgssdh add # needed by sshd
package SUNWspnego add # needed by sshd
package SUNWbind add # host&dig

# optional clusters and packages
#cluster SUNWCapache add # Apache 1.3.9
#cluster SUNWCapch2 add # Apache 2
#cluster SUNWCpostgr add # PostgreSQL
#cluster SUNWCpostgr-82 add # PostgreSQL 8.2
#cluster SUNWCdhcp add # DHCPv4 Services
#cluster SUNWCtcat add # Tomcat Servlet/JSP Container
#package SUNWmysqlr add # MySQL (Root)
#package SUNWmysqlu add # MySQL (User)

# The following are for webstack dependencies
package         SUNWfontconfig
package         SUNWfreetype2
package         SUNWjpg
package         SUNWpng
package         SUNWxwplt
package         SUNWpostgr-82-libs
EOD

After running an install with this profile, you will end up with a ZFS root that contains just under 2GB worth of data. If that’s not your desired setup, then tweak away!
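With a list that long, it’s easy to lose track of what’s actually enabled once you start tweaking. The sketch below is just a convenience I’d suggest, not part of Jumpstart: profile_entries and profile_dupes are made-up names, and it only understands the simple one-entry-per-line layout used above.

```shell
# profile_entries FILE
# Print "type name" for every active cluster/package line in a JumpStart
# profile, ignoring comments (including commented-out entries) and blanks.
profile_entries() {
    sed -e 's/#.*//' "$1" |
        awk '$1 == "cluster" || $1 == "package" { print $1, $2 }'
}

# profile_dupes FILE
# Print any entry that appears more than once.
profile_dupes() {
    profile_entries "$1" | sort | uniq -d
}
```

Running profile_entries webserver | wc -l gives a quick count of what will be installed, and profile_dupes webserver should print nothing.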

Step Three: Validate the profile file

The last step in this article is to validate the rules file and its associated profiles. Run the following:

# ./check
Validating rules...
Validating profile webserver...
The custom JumpStart configuration is ok.

This process validates your config files, and creates a rules.ok file in the root of your config server directory. It is this file that the client loads and parses when starting a Jumpstart installation.

This concludes part three of the series. You should now be able to boot using PXE for X86 or using ‘boot net:dhcp - install’ from Sparc, and have a completely automated installation. However, you likely also have a rather vanilla installation which we’ll remedy in part four of the series.

Not So Typical Jumpstart: Part Two

In part one of the series, we set up the ISC DHCP server. Now it’s time to set up our install and config servers – both of which will reside on the same box in this case. Solaris Jumpstart uses standard protocols, namely TFTP and NFS, to provide these services. In this post, we’ll just be setting up the framework for the real customizations that will come in parts three and four.

Step 1: Set Up the Install Server

In our example, we’ll be assuming that our install and config directories will be set up under /export/home/jumpstart/. So, pop the Solaris 10 DVD into the drive of your server, and issue the following commands as root:

mkdir -p /export/home/jumpstart/install/s10u7s
cd /cdrom/cdrom0/Solaris_10/Tools
./setup_install_server /export/home/jumpstart/install/s10u7s

Once that’s done (it will take a while), we need to make sure the NFS server is running, and share our install location as a read-only mountpoint:

svcadm enable nfs/server
echo 'share -F nfs -o ro,anon=0 -d "install server directory" /export/home/jumpstart/install/s10u7s' >> /etc/dfs/dfstab
shareall
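One thing to watch with the echo >> approach: run it twice and you get the same share listed twice in dfstab. A guarded append avoids that. This is only a sketch; add_share is a hypothetical helper, and it assumes a POSIX grep (on Solaris 10 that would be /usr/xpg4/bin/grep, since the stock /usr/bin/grep lacks -q, -x and -F).

```shell
# add_share SHARE_LINE [DFSTAB]
# Append SHARE_LINE to DFSTAB (default /etc/dfs/dfstab) only if an
# identical line is not already present.
add_share() {
    dfstab=${2:-/etc/dfs/dfstab}
    touch "$dfstab" || return 1
    grep -qxF -- "$1" "$dfstab" || printf '%s\n' "$1" >> "$dfstab"
}
```

Usage would be the same share line as above, passed as one quoted argument, followed by shareall.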

Now, it’s time to create a boot file and make it available via TFTP download for our client:

cd /export/home/jumpstart/install/s10u7s/Solaris_10/Tools
./add_install_client -d SUNW,Sun-Fire-T1000 sun4v
svcadm enable tftp/udp6

Note that the add_install_client command is likely not a command you can cut and paste! To determine the two arguments to add_install_client, you need to run some uname’s if you’re deploying to Sparc. To get the platform name, run ‘uname -i’, and replace the ‘SUNW,Sun-Fire-T1000’ string above with the response. Then, run ‘uname -m’, and replace sun4v with the response (which will most likely be sun4u). If you’re running x86, you’ll run this command:

./add_install_client -d SUNW.i86pc i86pc
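If you have a pile of different Sparc models to add, it can help to have the client tell you its own command line. A minimal sketch, assuming bash or ksh; install_client_cmd is a made-up name, and it only prints the command rather than running it.

```shell
# install_client_cmd [PLATFORM] [ARCH]
# Print the add_install_client invocation for a client. With no arguments
# the values come from uname on the machine where this runs, so run it on
# the client itself (or an identical box).
install_client_cmd() {
    platform=${1:-$(uname -i)}
    arch=${2:-$(uname -m)}
    printf './add_install_client -d %s %s\n' "$platform" "$arch"
}
```

For x86, pass the two values from the command above explicitly rather than relying on uname.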

At this point, you can do interactive network installations of the Solaris OS to your client. In fact, I recommend that you go ahead and try booting from the network to make sure that everything’s set up correctly to this point. On a Sparc box, from OBP, type ‘boot net:dhcp - install’. If you’re on x86, boot via PXE. If you don’t have a working network installation at this point, stop here and get it working before you move on to setting up the config server.

Step 2: Set Up the Config Server

We could do an interactive network install, which is helpful for systems that may not have a CD/DVD drive, but really – where’s the fun in that? We’re striving for automation here! That’s where the config server comes in. The config server provides a sysidcfg file to the system, which tells it things like which IPs belong on which interfaces, what its hostname is, etc. In other words, the sysidcfg file holds the settings that are unique to that system. The other function of the config server is to provide the jumpstart profile and all the scripts that the profile refers to. The jumpstart profile tells the machine specifics about the installation procedure – what packages to install, what locale to use, etc. Let’s set up our directories first:

for d in scripts sshkeys sysids jumpstart_sample; do
mkdir -p /export/home/jumpstart/configs/$d
done
echo 'share -F nfs -o ro,anon=0 /export/home/jumpstart/configs' >> /etc/dfs/dfstab
shareall

We’ve created our configs directory, which is the root of the config server setup. We created a few subdirectories – scripts is where we’ll store our add-on scripts, sshkeys is where we’ll keep our ssh public keys, sysids is where the system-specific sysidcfg files will be stored, and jumpstart_sample is where we will keep the sample jumpstart files that ship on the Solaris media. These files are handy to have around for reference. Let’s copy those over now:

cp /export/home/jumpstart/install/s10u7s/Solaris_10/Misc/jumpstart_sample/* /export/home/jumpstart/configs/jumpstart_sample

Now, we’re ready to create our jumpstart rules file. Run the following:

cd /export/home/jumpstart/configs/
ln -s  /export/home/jumpstart/configs/jumpstart_sample/check  /export/home/jumpstart/configs/check
echo "hostname web1    delete_raidctl_vols    webserver      webserver_finish.sh" >> rules
touch delete_raidctl_vols webserver_finish.sh
chmod 755 delete_raidctl_vols webserver_finish.sh

We added one rule to the rules file. There is a well-commented rules file in the jumpstart_sample directory that you can pore over to get all the gory details. In our rules file, we’re essentially saying this:

  • hostname web1: Any client that has a hostname equal to ‘web1’ will use this rule. Remember in part one when I told you to jot down the hostname we used in dhcpd.conf? That hostname and this one have to match exactly.
  • delete_raidctl_vols: This is the begin script. It is run before the actual installation occurs. If you don’t need a begin script, you can simply use a ‘-’ here. Right now, delete_raidctl_vols is empty, but in part four we’ll populate it with a script that deletes any “hardware” raid volumes so that we can use ZFS software mirroring in our installation.
  • webserver: This is the actual jumpstart profile file. We’ll go over this in part three.
  • webserver_finish.sh: This is the finish script. This script provides the sysadmin with an interface to do anything he wants via a shell script after installation, but before reboot. All the power of jumpstart is in this one file. We’ll cover that in part four. Just as with the begin script, if you don’t need a finish script, just use a dash here.
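If the column order is hard to keep straight, a few lines of shell can label the fields for you. This is only a sketch, assuming bash or ksh: show_rules is a hypothetical helper, and it handles simple one-line rules like ours, not the multi-keyword (&&) or continued-line forms the sample rules file describes.

```shell
# show_rules [FILE]
# Label the fields of each simple one-line rule:
#   keyword value begin_script profile finish_script
show_rules() {
    sed -e 's/#.*//' "${1:-rules}" |
    while read -r keyword value begin profile finish; do
        # Skip blank lines and lines that were only a comment
        [ -n "$keyword" ] || continue
        printf '%s=%s begin=%s profile=%s finish=%s\n' \
            "$keyword" "$value" "$begin" "$profile" "$finish"
    done
}
```

Run from the configs directory, it reads ./rules by default.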

At this point, we still don’t have a full custom jumpstart setup yet. Before it’s ready to be tested, we need to add the content to our ‘webserver’ jumpstart profile file, and run the ‘check’ command on it to generate the rules.ok file. We’ll do this in part three. Part four will cover some examples of things you can do in the begin and finish scripts - stay tuned!

Not So Typical Jumpstart: Part One

Only in my world do you get a RHCE one week, and then come back and work on nothing but Solaris Jumpstart for the next couple of weeks! Oh well, it’s always good as long as you’re learning. What started out as a simple upgrade from Apache 2.0 to 2.2 quickly turned into re-provisioning our web tier. We could have upgraded, but there ended up being so many dependencies needed that I decided it would be easier to just start fresh with Solaris 10u7 on all our webservers. Since I knew that I’d be doing this to all our webservers, it made sense to spend the time up-front on setting up a completely automated installation. This time spent on the front-end should save a huge amount of time on the back side when it comes to troubleshooting. In my case, I learned a lot of undocumented tips and tricks, and stumbled across a few “gotchas” as well. After going through an exercise like this, I now know what Puppet is for, and ordered my first book. I’ll give a review once I’m done.

First of all, we already have an ISC DHCPd server running to provide Red Hat Kickstart installs, so I decided to leverage that. Sun’s DHCPd works fine, but it’s a completely different beast when it comes to configuration.

Part one of the series covers setting up your dhcpd.conf.

Set Up Your ISC DHCP Server

There’s not too much involved here, but there is a gotcha or two. I used this article on BigAdmin as the primary source for this info.

When all was said and done, I had added this to my already existing dhcpd.conf:

# Jumpstart Support
option space SUNW;
option SUNW.root-mount-options code 1 = text;
option SUNW.root-server-ip-address code 2 = ip-address;
option SUNW.root-server-hostname code 3 = text;
option SUNW.root-path-name code 4 = text;
option SUNW.swap-server-ip-address code 5 = ip-address;
option SUNW.swap-file-path code 6 = text;
option SUNW.boot-file-path code 7 = text;
option SUNW.posix-timezone-string code 8 = text;
option SUNW.boot-read-size code 9 = unsigned integer 16;
option SUNW.install-server-ip-address code 10 = ip-address;
option SUNW.install-server-hostname code 11 = text;
option SUNW.install-path code 12 = text;
option SUNW.sysid-config-file-server code 13 = text;
option SUNW.JumpStart-server code 14 = text;
option SUNW.terminal-name code 15 = text;
option SUNW.SbootURI code 16 = text;
# Solaris 10 SPARC 05/09
group {
   use-host-decl-names on;
   next-server 192.168.0.28;
   vendor-option-space SUNW;
   option SUNW.JumpStart-server "192.168.0.28:/export/home/jumpstart/configs";
   option SUNW.install-server-hostname "192.168.0.28";
   option SUNW.install-server-ip-address 192.168.0.28;
   option SUNW.install-path "/export/home/jumpstart/install/s10u7s";
   option SUNW.root-server-hostname "192.168.0.28";
   option SUNW.root-server-ip-address 192.168.0.28;
   option SUNW.root-path-name "/export/home/jumpstart/install/s10u7s/Solaris_10/Tools/Boot";

  host web1 { 
        hardware ethernet 00:14:4f:cb:66:79;
        option SUNW.sysid-config-file-server = "192.168.0.28:/export/home/jumpstart/configs/sysids/web"; 
        }

}

Things of note:

  • 192.168.0.28 is the IP of our yet-to-be-setup NFS/TFTP server
  • Make note of the hostname you use in the host stanza above (web1 in our case) – we’ll need it later.

GOTCHA #1:

In the previously mentioned BigAdmin article, you may get the impression that the SUNW.sysid-config-file-server option is the path to a file. It isn’t; it’s the path to a directory that contains a file named sysidcfg. It took a packet sniffer to tell me that one.
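
A quick sanity check on the NFS server can catch this before you reach for a packet sniffer. This is a minimal sketch; the function name check_sysid_dir is hypothetical, and it takes only the path portion of the option value (the part after "192.168.0.28:").

```shell
# Hypothetical check: the path in SUNW.sysid-config-file-server must be a
# directory that contains a file named sysidcfg, not the file itself.
check_sysid_dir() {
    dir=$1
    if [ -f "$dir/sysidcfg" ]; then
        echo "ok: $dir/sysidcfg found"
    elif [ -f "$dir" ]; then
        echo "error: $dir is a file; point the option at its parent directory" >&2
        return 1
    else
        echo "error: no sysidcfg in $dir" >&2
        return 1
    fi
}

# Example, run on the NFS server:
#   check_sysid_dir /export/home/jumpstart/configs/sysids/web
```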

GOTCHA #2:

Note how I used IP addresses instead of hostnames in many of the options above? I also use some abbreviations like s10u7s for “Solaris 10 Update 7 SPARC”. This is due to an incompatibility between the ISC DHCP server and the Solaris DHCP client. If the vendor-specific options exceed 255 bytes, then ISC DHCP will send another packet containing the remainder of the options. Solaris’ DHCP client doesn’t know how to deal with this. If you swear you have everything right, but Jumpstart isn’t taking off, you might be getting bitten by this. tcpdump/snoop will tell you for sure.
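
You can get a rough idea of how close you are to the limit before sniffing packets. The sketch below is only an estimate: it assumes each vendor option costs one byte of code, one byte of length, and then its data, with text counted as one byte per character and an IP address as four bytes. The function name vendor_bytes and the "text:"/"ip:" prefixes are made up for this example.

```shell
# Rough estimate of the vendor option payload size.
# Assumption: 1 byte code + 1 byte length + data for each option.
vendor_bytes() {
    total=0
    for opt in "$@"; do
        case $opt in
            ip:*)   total=$((total + 2 + 4)) ;;
            text:*) val=${opt#text:}; total=$((total + 2 + ${#val})) ;;
        esac
    done
    echo "$total"
}

# The eight SUNW options that apply to host web1 in the config above.
vendor_bytes \
    "text:192.168.0.28:/export/home/jumpstart/configs" \
    "text:192.168.0.28" \
    "ip:192.168.0.28" \
    "text:/export/home/jumpstart/install/s10u7s" \
    "text:192.168.0.28" \
    "ip:192.168.0.28" \
    "text:/export/home/jumpstart/install/s10u7s/Solaris_10/Tools/Boot" \
    "text:192.168.0.28:/export/home/jumpstart/configs/sysids/web"
```

Under those assumptions the config above comes to 241 bytes, so there is not much room left for longer paths or real hostnames.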

At this point, you should be able to test your DHCP config, and restart your ISC DHCP server. Next, we’ll set up our Solaris TFTP/NFS server with the necessary media to allow our Jumpstart installation to happen.
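
One way to do the test-then-restart step is sketched below. dhcpd’s -t flag checks the config syntax and -cf names the config file; the default path /etc/dhcpd.conf and the "service dhcpd restart" command are assumptions for a Red Hat style box, so adjust both for your system. The function name reload_dhcpd is hypothetical.

```shell
# Restart dhcpd only if the config passes the syntax check.
# Assumptions: config lives at /etc/dhcpd.conf and the init script is "dhcpd".
reload_dhcpd() {
    conf=${1:-/etc/dhcpd.conf}
    if dhcpd -t -cf "$conf"; then
        service dhcpd restart
    else
        echo "dhcpd.conf failed the syntax check; not restarting" >&2
        return 1
    fi
}

# Example, as root on the DHCP server:
#   reload_dhcpd /etc/dhcpd.conf
```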

Random Password Generation in a Perl One-Liner

Say you need a quick random 8-character alphanumeric password. In sh, there isn’t a $RANDOM variable and tr can give different results on different OSes. More than likely you have perl available - use it!

perl -le 'print map { (a..z,A..Z,0..9)[rand 62] } 1..pop' 8

Thanks to Chris Angell’s Perl One-Liner page for this one. Do you have a better cross-platform way of doing it? Let me know!

Apache 2.2.12 - 2.2.13 and Solaris 10 Bug Nastiness

At work, I’ve been working on an upgrade from a custom-compiled version of Apache 2.0.x to the Sun-provided Glassfish Webstack 1.5. I spent about a week troubleshooting what I thought was a configuration issue, only to finally find it’s a bug way upstream in Apache 2.2.12+. This bug only affects Solaris 10, and is near-impossible to reproduce. If you use Solaris 10 and Apache, read on so you don’t waste a week of your life like I did.

The problem presented itself as Apache intermittently hanging. It didn’t depend on load, or anything else. Sometimes it would happen at 2pm, other times at 4am. While load isn’t required, a lot of simultaneous connections helps trigger the bug. The guy I worked with at Sun had to introduce some sleep times into the Apache source code in order to trigger it, so my guess is that it’s a race condition on the microsecond level.

Basically, Nagios would alert me that Apache had quit responding. netstat showed a huge number of connections stuck in a CLOSE_WAIT state. Either restarting or gracefully restarting Apache would resolve the issue. Luckily, I found the solution before I had to pull out pstack and truss.
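
A quick way to put a number on the CLOSE_WAIT symptom is to count matching lines in the netstat output. This is a minimal sketch; the function name count_close_wait is made up, and what counts as “huge” depends on your traffic.

```shell
# Count sockets in CLOSE_WAIT from netstat-style output on stdin.
# "n + 0" makes awk print 0 instead of an empty line when nothing matches.
count_close_wait() {
    awk '/CLOSE_WAIT/ { n++ } END { print n + 0 }'
}

# Usage:
#   netstat -an | count_close_wait
```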

If you think you might be encountering the same bug, the first prerequisite is that you have multiple Listen statements in your config (most everyone does). If you do, then do the following to your stuck Apache.

  1. pstack `pgrep httpd` > /tmp/httpd_pstack.txt

  2. Find the pid in apr_pollset_poll(). Looking through httpd_pstack.txt, exactly one process should have this backtrace:

    1652: /usr/apache2/2.2/bin/httpd -k start
     feda1167 portfs (6, 13, 835d350, 2, 1, 8047b48)
     feedd302 apr_pollset_poll (835d308, 989680, 0, 8047ba4, 8047ba8, 2) + 126
     08091611 child_main (0, 8090fac, 8047c08, 8091801) + 329
     08091846 make_child (80c8128, 0, 0, 80c4228) + 86
     0809192f startup_children (5, 80c6230, 8047d18, 8091a47) + 43
     08091ab6 ap_mpm_run (80c6230, 80f42e8, 80c8128, 8070831) + 162
     0807083e main (3, 8047ddc, 8047dec, feffb7b4) + 812
     0806f9fd _start (3, 8047e8c, 8047ea7, 8047eaa, 0, 8047eb0) + 7d

In this case, the pid is 1652.

If you don’t find such a pid, you have a different problem.
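
If the pstack file is long, awk can pull the pid out for you. This sketch assumes each process section in the file starts with a line of the form "PID: command" at the beginning of the line, with the stack frames indented below it. The function name find_pollset_pids is made up for this example.

```shell
# Print the pid of every process whose stack contains apr_pollset_poll.
# Assumes each section of the pstack output starts with "PID: command".
find_pollset_pids() {
    awk '
        # A header line starts a new process section; remember its pid.
        /^[0-9]+:/ { pid = $1; sub(/:.*/, "", pid); seen = 0 }
        # Report each pid once, the first time the frame shows up.
        /apr_pollset_poll/ && !seen { print pid; seen = 1 }
    ' "$1"
}

# Usage:
#   find_pollset_pids /tmp/httpd_pstack.txt
```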

  3. Run truss against the pid in apr_pollset_poll()

    truss -p 1652

It should look like this:

    port_getn(19, 0x0835D350, 2, 1, 0x08047B48) (sleeping...)
    port_getn(19, 0x0835D350, 2, 1, 0x08047B48) = 0 [62]
    port_getn(19, 0x0835D350, 2, 1, 0x08047B48) (sleeping...)

… (over and over), with port_getn() returning about every 10 seconds, and the web site inaccessible during this time.

If this is what you have, then you are indeed being bitten by this bug. Initially, I found a post on the Webstack forums that put me in touch with Jeff Trawick. After doing a bit more digging, I found the Apache HTTPD bug report. After emailing Jeff, he was able to send me a .so file that I could load before executing Apache that fixes the problem. I don’t have the okay to redistribute, so email Jeff if you need the fix. Sun more than likely won’t release an official update to Glassfish Webstack to resolve the issue, and going forward Apache 2.2.14 will include Jeff’s fix (technically the bug is in APR and is fixed in APR 1.3.9 which will be included in httpd 2.2.14).

Many thanks to Jeff Trawick for his quick help, as well as for the steps listed above on how to confirm the existence of the bug.