Ask SAJ: What to do with Apache logs > 50GB?

Our site at $work is generating Apache logs that, when combined sequentially into one file, come to more than 50GB for one day's worth of traffic. AWStats' Perl script pretty much chokes when working on this much data. Last I checked, Webalizer wasn't much different, and probably wouldn't scale up to that amount of data either. Does anyone out there have any advice on a commercial solution for Apache log analysis that can scale up like that?

Comments

I'm not sure what kind of log analysis you're looking for, but if it's security log analysis then OSSEC can handle that with no problem. I suspect you want to know about hits, trending and so on, and for that OSSEC isn't really the best tool for the job.

I think Splunk lets you do what you can do with AWStats.

Google Urchin (6.5) will handle it, and far faster than AWStats, since it does its 'reverse lookups' against its proprietary local 'geo database' rather than making 10,000s of DNS lookup requests. I'd recommend the settings below on the Storage/DB tab when processing sites with large logs. The built-in backup/rollback mechanisms add too much overhead; you're better off just backing up the files with your own backup system, then nuking the files manually and re-processing the logs if things get screwed up. As for DB Table Limit, the default is 60,000 and Urchin doesn't recommend going over 100k; you can see performance and disk usage change as you scale up. As for 'DB Memory Usage', don't use 'Keep All In Memory'; it will fail. The settings:

DB Memory Usage: Limit with Cache System
DB Table Limit: Global Default
Keep Raw Tracking Data: on
Log Tracking: on
Auto Rollback DB: off
Create Backups: off
Clean Backups: off
Archive db: off
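
To give a rough idea of why a local lookup table beats per-hit DNS requests, here is a minimal Python sketch. It is not how Urchin works internally (that is proprietary); the range table, the region labels, and the function names `lookup` and `count_by_region` are all made up for illustration. It assumes IPv4 clients and a log format where the client IP is the first space-separated field, as in Apache's common and combined formats.

```python
import bisect
import ipaddress

# Hypothetical sample ranges (documentation-only address blocks).
# A real geo database would hold far more rows; these labels are invented.
RANGES = [
    ("192.0.2.0", "192.0.2.255", "Exampleland"),
    ("198.51.100.0", "198.51.100.255", "Testonia"),
    ("203.0.113.0", "203.0.113.255", "Docville"),
]

# Convert each range to integers and sort by start address so that
# every lookup is a binary search in memory, with no network traffic.
_TABLE = sorted(
    (int(ipaddress.IPv4Address(lo)), int(ipaddress.IPv4Address(hi)), label)
    for lo, hi, label in RANGES
)
_STARTS = [row[0] for row in _TABLE]


def lookup(ip):
    """Return the label of the range containing ip, or None if no range matches."""
    n = int(ipaddress.IPv4Address(ip))
    i = bisect.bisect_right(_STARTS, n) - 1
    if i >= 0 and _TABLE[i][0] <= n <= _TABLE[i][1]:
        return _TABLE[i][2]
    return None


def count_by_region(lines):
    """Stream over log lines and count hits per region, one line at a time."""
    counts = {}
    for line in lines:
        ip = line.split(" ", 1)[0]
        try:
            region = lookup(ip) or "unknown"
        except ValueError:
            # First field was not a valid IPv4 address (e.g. IPv6 or garbage).
            region = "unknown"
        counts[region] = counts.get(region, 0) + 1
    return counts
```

Because `count_by_region` accepts any iterable of lines, you can pass it an open file object and it will walk a very large log without loading it into memory.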
