gov.nasa.pds.report.processing
Class LogReformatProcessor
java.lang.Object
gov.nasa.pds.report.processing.LogReformatProcessor
- All Implemented Interfaces:
- Processor
public class LogReformatProcessor
- extends Object
- implements Processor
This class is used to reformat text-based log files so that they can be
parsed with a common Sawmill profile. This reformatting uses regular
expression patterns to determine how to break down input and restructure
it for output. Therefore, the class must be configured before
the processing can begin.
The patterns use the less-than and greater-than symbols to label the
substrings that are captured and rearranged. Each such substring is split
into sections by one or more semicolons. The first section is the name of
the substring. In substrings in the input pattern, the second section is
the regular expression used to capture that substring. An optional final
section supplies extra information through flags, such as labeling the
substring as a date-time or requiring a valid value to be present.
For example, when switching from IIS7 to the Apache/Combined format, we
start with a log that looks like this:
2014-12-01 06:00:47 10.10.1.46 GET /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm - 443 - 66.249.69.46 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 0 0 10757 314 312
We want to reformat this log into something that looks like this:
66.249.69.46 - - [01/Dec/2014:06:00:47 -0800] "GET /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm HTTP/1.1" 200 10757 "-" "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)"
To make this happen, we specify the input pattern like this:
This processor then parses the lines in the input log and stores the log
details in a map. Log details that are not specified (such as the URI query
in the example above) do not have their keys added to the map. Using the
example input line above, the map would look like this:
date-time: 2014-12-01 06:00:47 (stored as a Date object)
server-ip: 10.10.1.46
http-method: GET
requested-resource: /merb/merxbrowser/help/Content/About+the+mission/MSL/Instruments/MSL+Navcam.htm
server-port: 443
client-ip: 66.249.69.46
client-browser: Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
status-code: 200
substatus: 0
win32-status: 0
bytes-transfered: 10757
bytes-received: 314
time-taken: 312
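The parsing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual LogReformatProcessor source: the class name Iis7LineSketch, the field names "uri-query", "username" and "referrer", and the capture expressions are all assumptions chosen to fit the sample line. Values of "-" are treated as unspecified and left out of the map, and the date-time is stored as a Date.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is not the LogReformatProcessor source.
class Iis7LineSketch {

    // Field names in the order they appear in the sample IIS7 line.
    // "uri-query", "username" and "referrer" are hypothetical names.
    static final String[] NAMES = {
        "date-time", "server-ip", "http-method", "requested-resource",
        "uri-query", "server-port", "username", "client-ip",
        "client-browser", "referrer", "status-code", "substatus",
        "win32-status", "bytes-transfered", "bytes-received", "time-taken"
    };

    // Assumed capture expressions: the date-time spans two tokens and
    // every other field is a single token without whitespace.
    static final Pattern LINE = buildLinePattern();

    static Pattern buildLinePattern() {
        StringBuilder regex = new StringBuilder("^(\\S+ \\S+)");
        for (int i = 1; i < NAMES.length; i++) {
            regex.append(" (\\S+)");
        }
        return Pattern.compile(regex.append('$').toString());
    }

    // Returns the log details, or null if the line cannot be parsed.
    static Map<String, Object> parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        Map<String, Object> details = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            String value = m.group(i + 1);
            if ("-".equals(value)) {
                continue; // unspecified detail: no key is added
            }
            if ("date-time".equals(NAMES[i])) {
                try {
                    SimpleDateFormat fmt =
                        new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.US);
                    details.put(NAMES[i], fmt.parse(value));
                } catch (ParseException e) {
                    return null; // malformed date-time: skip the line
                }
            } else {
                details.put(NAMES[i], value);
            }
        }
        return details;
    }

    public static void main(String[] args) {
        String line = "2014-12-01 06:00:47 10.10.1.46 GET /index.htm - 443 - "
            + "66.249.69.46 Mozilla/5.0 - 200 0 0 10757 314 312";
        for (Map.Entry<String, Object> e : parse(line).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

The sketch uses positional groups with a separate name array because Java named capture groups do not allow hyphens in group names.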
Finally, we specify the output pattern like this:
[] " " "" ""
This causes the data from the original log line to be output in the desired
format shown at the beginning of this example.
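The output step can be sketched in the same spirit. The code below is a simplified illustration under stated assumptions, not the processor's implementation: the class name OutputTemplateSketch and the TEMPLATE string are hypothetical, only a single datetime= or default= flag per substring is handled, and a missing value without a default is rendered as an empty string.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only; this is not the LogReformatProcessor source.
class OutputTemplateSketch {

    // Hypothetical output template, written for this example only.
    static final String TEMPLATE =
        "<client-ip> - - [<date-time;datetime=dd/MMM/yyyy:HH:mm:ss Z>] "
        + "\"<http-method> <requested-resource> HTTP/1.1\" "
        + "<status-code> <bytes-transfered> "
        + "\"<referrer;default=->\" \"<client-browser>\"";

    // Matches <name> or <name;flag>, capturing the name and the flag.
    static final Pattern SUBSTRING =
        Pattern.compile("<([^;>]+)(?:;([^>]*))?>");

    static String render(String template, Map<String, Object> details,
                         TimeZone zone) {
        Matcher m = SUBSTRING.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String flag = m.group(2) == null ? "" : m.group(2);
            Object value = details.get(m.group(1));
            String text;
            if (value == null) {
                // Assumption: fall back to the default, else empty text.
                text = flag.startsWith("default=")
                    ? flag.substring("default=".length()) : "";
            } else if (value instanceof Date && flag.startsWith("datetime=")) {
                SimpleDateFormat fmt = new SimpleDateFormat(
                    flag.substring("datetime=".length()), Locale.US);
                fmt.setTimeZone(zone);
                text = fmt.format((Date) value);
            } else {
                text = value.toString();
            }
            // quoteReplacement keeps '$' and '\' in log values literal.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        TimeZone zone = TimeZone.getTimeZone("GMT-08:00");
        Calendar cal = Calendar.getInstance(zone, Locale.US);
        cal.clear();
        cal.set(2014, Calendar.DECEMBER, 1, 6, 0, 47);

        Map<String, Object> details = new HashMap<>();
        details.put("client-ip", "66.249.69.46");
        details.put("date-time", cal.getTime());
        details.put("http-method", "GET");
        details.put("requested-resource", "/index.htm");
        details.put("status-code", "200");
        details.put("bytes-transfered", "10757");
        details.put("client-browser", "Mozilla/5.0");
        System.out.println(render(TEMPLATE, details, zone));
    }
}
```

The time zone is passed in explicitly so that the rendered offset does not depend on the machine's default zone.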
The substrings in the input and output patterns can optionally be given
flags, separated by commas.
required: A substring with this flag must have a valid value; otherwise the
input line is discarded. An input substring is invalid if the given value
is "-", and an output substring is invalid if its name is not present as
a key in the map created from input.
datetime: A substring with this flag designates a date-time using the format
following an equals sign, as shown in the examples above.
default: A substring with this flag will default to the value following an
equals sign, as shown in the example above.
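As a rough illustration of the substring syntax described above (a name, a regular expression for input substrings, and comma-separated flags with optional values after an equals sign), the following sketch splits one labeled substring into its parts. The class SubstringSpecSketch and the sample specifications are hypothetical, and the sketch assumes that neither the regular expression nor the flag values contain a semicolon and that flag values contain no comma.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; this is not the LogReformatProcessor source.
class SubstringSpecSketch {
    final String name;
    final String regex;               // null for output substrings
    final Map<String, String> flags;  // flag name -> value ("" if none)

    private SubstringSpecSketch(String name, String regex,
                                Map<String, String> flags) {
        this.name = name;
        this.regex = regex;
        this.flags = flags;
    }

    // Parses one substring such as "<name;regex;flags>" (input pattern)
    // or "<name;flags>" (output pattern).
    static SubstringSpecSketch parse(String spec, boolean inputPattern) {
        if (spec.length() < 3 || !spec.startsWith("<") || !spec.endsWith(">")) {
            throw new IllegalArgumentException("Not a substring: " + spec);
        }
        String[] sections = spec.substring(1, spec.length() - 1).split(";");
        String regex = null;
        int flagIndex = 1;
        if (inputPattern) {
            if (sections.length < 2) {
                throw new IllegalArgumentException("Missing regex: " + spec);
            }
            regex = sections[1];
            flagIndex = 2;
        }
        Map<String, String> flags = new LinkedHashMap<>();
        if (sections.length > flagIndex) {
            for (String flag : sections[flagIndex].split(",")) {
                int eq = flag.indexOf('=');
                if (eq < 0) {
                    flags.put(flag.trim(), "");
                } else {
                    flags.put(flag.substring(0, eq).trim(),
                              flag.substring(eq + 1));
                }
            }
        }
        return new SubstringSpecSketch(sections[0], regex, flags);
    }

    public static void main(String[] args) {
        SubstringSpecSketch s = parse(
            "<date-time;\\S+ \\S+;required,datetime=yyyy-MM-dd HH:mm:ss>", true);
        System.out.println(s.name + " | " + s.regex + " | " + s.flags);
    }
}
```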
- Author:
- resneck
Method Summary
void configure(Properties props)
- Configure the Processor, providing the details needed to process logs.
String getDirName()
- Get the name of the directory where the output of the processor is
placed.
void process(File in, File out)
- Process the files in the input directory and place them in the output
directory.
Methods inherited from class java.lang.Object
- clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
OUTPUT_DIR_NAME
public static final String OUTPUT_DIR_NAME
- See Also:
- Constant Field Values
LogReformatProcessor
public LogReformatProcessor()
getDirName
public String getDirName()
- Description copied from interface: Processor
- Get the name of the directory where the output of the processor is
placed.
- Specified by:
- getDirName in interface Processor
- Returns:
- The name of the directory created by the Processor.
process
public void process(File in,
File out)
throws ProcessingException
- Description copied from interface: Processor
- Process the files in the input directory and place them in the output
directory. The process performed will vary based on the implementation
and the output will be placed in a sibling directory. The name of that
directory will vary based upon the implementation being used.
- Specified by:
- process in interface Processor
- Parameters:
- in - The directory containing the input files
- out - The directory where output is placed
- Throws:
- ProcessingException - If an error occurs.
configure
public void configure(Properties props)
throws ProcessingException
- Description copied from interface: Processor
- Configure the Processor, providing the details needed to process logs.
- Specified by:
- configure in interface Processor
- Parameters:
- props - A Properties containing the needed configuration values.
- Throws:
- ProcessingException - If the provided Properties do not contain
the needed configuration values.
Copyright © 2010–2015 Planetary Data System. All rights reserved.