gov.nasa.pds.report.processing
Class LogReformatProcessor

java.lang.Object
  extended by gov.nasa.pds.report.processing.LogReformatProcessor
All Implemented Interfaces:
Processor

public class LogReformatProcessor
extends Object
implements Processor

This class reformats text-based log files so that they can be parsed with a common Sawmill profile. The reformatting uses regular expression patterns to determine how to break down input and restructure it for output, so the class must be configured before processing can begin.

The patterns use the less-than and greater-than symbols to delimit the substrings that are captured and rearranged. Each such substring is split into sections by one or more semicolons. The first section is the name of the substring. For substrings in the input pattern, the second section is the regular expression used to capture that substring. An optional additional section supplies extra information by setting flags, for example to label the substring as a date-time or to require that a valid value be present.

For example, when switching from IIS7 to the Apache/Combined format, we start with a log that looks like this:

  2014-12-01 06:00:47 10.10.1.46 GET /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm - 443 - 66.249.69.46 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 0 0 10757 314 312

We want to reformat this log into something that looks like this:

  66.249.69.46 - - [01/Dec/2014:06:00:47 -0800] "GET /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm HTTP/1.1" 200 10757 "-" "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"

To make this happen, we first specify the input pattern. The processor then parses the lines in the input log and stores the log details in a map. Log details that are not specified (such as the URI query in the example above) do not have their keys added to the map.

Using the example input line above, the map would look like this:

  date-time: 2014-12-01 06:00:47 (stored as a Date object)
  server-ip: 10.10.1.46
  http-method: GET
  requested-resource: /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm
  server-port: 443
  client-ip: 66.249.69.46
  client-browser: Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
  status-code: 200
  substatus: 0
  win32-status: 0
  bytes-transfered: 10757
  bytes-received: 314
  time-taken: 312

Finally, we specify the output pattern like this:

  [] " " "" ""

This causes the data from the original log line to be output in the desired format shown at the beginning of this example.

The substrings in the input and output patterns can optionally be given flags, separated by commas.

  required: A substring with this flag must have a valid value; otherwise the input line is discarded. This happens for an input substring if the given value is "-", and for an output substring if the name is not present as a key in the map created from input.
  datetime: A substring with this flag designates a date-time, using the format following an equals sign, as shown in the examples above.
  default: A substring with this flag will default to the value following an equals sign, as shown in the example above.
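The parse-then-rewrite flow described above can be illustrated with a minimal, standalone sketch. It does not use this class or its angle-bracket pattern syntax. Instead it assumes plain java.util.regex named groups, and every name in it (ReformatSketch, parse, toCombined, the camel-case field names, and the -0800 offset passed by the caller) is hypothetical and chosen only for illustration.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: NOT the LogReformatProcessor implementation.
class ReformatSketch {

    // Hypothetical field names, in the order they appear on an IIS7 line.
    // Regex group names cannot contain hyphens, hence the camel case.
    private static final String[] FIELDS = {
        "dateTime", "serverIp", "httpMethod", "requestedResource", "uriQuery",
        "serverPort", "username", "clientIp", "clientBrowser", "referrer",
        "statusCode", "substatus", "win32Status", "bytesTransferred",
        "bytesReceived", "timeTaken"};

    private static final Pattern IIS7 = Pattern.compile(buildRegex());

    private static final DateTimeFormatter IN =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
    private static final DateTimeFormatter OUT =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // One whitespace-free named group per field; the date-time spans two tokens.
    private static String buildRegex() {
        StringBuilder sb = new StringBuilder();
        for (String f : FIELDS) {
            if (sb.length() > 0) sb.append(' ');
            String body = f.equals("dateTime") ? "\\S+ \\S+" : "\\S+";
            sb.append("(?<").append(f).append('>').append(body).append(')');
        }
        return sb.toString();
    }

    // Store the log details in a map; values given as "-" get no key.
    // Returns null when the line does not match the pattern.
    static Map<String, String> parse(String line) {
        Matcher m = IIS7.matcher(line.trim());
        if (!m.matches()) return null;
        Map<String, String> details = new LinkedHashMap<>();
        for (String f : FIELDS) {
            String value = m.group(f);
            if (!"-".equals(value)) details.put(f, value);
        }
        return details;
    }

    // Emit an Apache/Combined style line. Fields treated as "required"
    // make the line be discarded (null); the others default to "-".
    static String toCombined(Map<String, String> d, ZoneOffset offset) {
        for (String req : new String[] {"clientIp", "dateTime", "httpMethod",
                                        "requestedResource", "statusCode"}) {
            if (!d.containsKey(req)) return null;
        }
        LocalDateTime t = LocalDateTime.parse(d.get("dateTime"), IN);
        return String.format("%s - - [%s] \"%s %s HTTP/1.1\" %s %s \"%s\" \"%s\"",
            d.get("clientIp"), OUT.format(t.atOffset(offset)),
            d.get("httpMethod"), d.get("requestedResource"),
            d.get("statusCode"), d.getOrDefault("bytesTransferred", "-"),
            d.getOrDefault("referrer", "-"),
            d.getOrDefault("clientBrowser", "-"));
    }

    public static void main(String[] args) {
        String line = "2014-12-01 06:00:47 10.10.1.46 GET "
            + "/merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm"
            + " - 443 - 66.249.69.46 "
            + "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"
            + " - 200 0 0 10757 314 312";
        Map<String, String> details = parse(line);
        System.out.println(toCombined(details, ZoneOffset.ofHours(-8)));
    }
}
```

Run against the IIS7 example line, the sketch leaves uriQuery, username and referrer out of the map because their values are "-", and the referrer then falls back to "-" on output, mirroring the behavior described for the required and default flags.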

Author:
resneck

Field Summary
static String OUTPUT_DIR_NAME
           
 
Constructor Summary
LogReformatProcessor()
           
 
Method Summary
 void configure(Properties props)
          Configure the Processor, providing the details needed to process logs.
 String getDirName()
          Get the name of the directory where the output of the processor is placed.
 void process(File in, File out)
          Process the files in the input directory and place them in the output directory.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

OUTPUT_DIR_NAME

public static final String OUTPUT_DIR_NAME
See Also:
Constant Field Values
Constructor Detail

LogReformatProcessor

public LogReformatProcessor()
Method Detail

getDirName

public String getDirName()
Description copied from interface: Processor
Get the name of the directory where the output of the processor is placed.

Specified by:
getDirName in interface Processor
Returns:
The name of the directory created by the Processor.

process

public void process(File in,
                    File out)
             throws ProcessingException
Description copied from interface: Processor
Process the files in the input directory and place them in the output directory. The process performed will vary based on the implementation and the output will be placed in a sibling directory. The name of that directory will vary based upon the implementation being used.

Specified by:
process in interface Processor
Parameters:
in - The directory containing the input files
out - The directory where output is placed
Throws:
ProcessingException - If an error occurs.

configure

public void configure(Properties props)
               throws ProcessingException
Description copied from interface: Processor
Configure the Processor, providing the details needed to process logs.

Specified by:
configure in interface Processor
Parameters:
props - A Properties object containing the needed configuration values.
Throws:
ProcessingException - If the provided Properties do not contain the needed configuration values.


Copyright © 2010–2015 Planetary Data System. All rights reserved.