gov.nasa.pds.harvest.crawler
Class PDSProductCrawler

java.lang.Object
  extended by gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
      extended by gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
          extended by gov.nasa.pds.harvest.crawler.PDSProductCrawler
All Implemented Interfaces:
gov.nasa.jpl.oodt.cas.commons.spring.SpringSetIdInjectionType, gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
Direct Known Subclasses:
CollectionCrawler, PDS3ProductCrawler

public class PDSProductCrawler
extends gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

Extends the CAS crawler to crawl a directory or PDS inventory file and register the discovered products with the PDS Registry Service.

Author:
mcayanan

Field Summary
protected  boolean inPersistanceMode
          Flag for crawler persistence.
protected  int numBadFiles
          The number of bad files that were found during the crawl.
protected  int numDiscoveredProducts
          The number of candidate products that were discovered during the crawl.
protected  int numFilesSkipped
          The number of files that were skipped during the crawl.
protected  Map<File,Long> touchedFiles
          A map of files that were touched during crawler persistence.
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
DIR_FILTER, FILE_FILTER, LOG
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
MIME_TYPES_HIERARCHY
 
Fields inherited from interface gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
FILE_LOCATION, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
 
Constructor Summary
PDSProductCrawler()
          Default constructor.
PDSProductCrawler(Pds4MetExtractorConfig extractorConfig)
          Constructs a crawler with the given metadata extractor configuration.
 
Method Summary
 void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
          Adds a crawler action.
 void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
          Adds a list of crawler actions.
protected  void addKnownMetadata(File product, gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
          Method not implemented at the moment.
 void clearCrawlStats()
          Clears the crawl statistics.
 void crawl(File dir)
          Crawls the given directory.
 List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
          Gets a list of crawler actions defined for the crawler.
protected  gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
          Extracts metadata from the given product.
 Pds4MetExtractorConfig getMetExtractorConfig()
          Get the MetExtractor configuration object.
 int getNumBadFiles()
          Gets the number of bad files that were found during the crawl.
 int getNumDiscoveredProducts()
          Gets the number of candidate products that were discovered during the crawl.
 int getNumFilesSkipped()
          Gets the number of files that were skipped during the crawl.
 RegistryIngester getRegistryIngester()
          Gets the registry ingester.
 String getRegistryUrl()
          Gets the registry location.
protected  boolean passesPreconditions(File product)
          Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
 void setFileFilter(List<String> filters)
          Sets the file filter.
 void setInPersistanceMode(boolean value)
          Sets the flag for crawler persistence.
 void setMetExtractorConfig(Pds4MetExtractorConfig config)
          Sets the MetExtractor configuration object.
 void setProperties(String registryUrl, RegistryIngester ingester, List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
          Sets the registry location, the registry ingester, and the crawler actions.
 void setRegistryUrl(String url)
          Sets the registry location.
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
clearIngestStatus, crawl, getIngestStatus, handleFile, setActionRepo
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
addRequiredMetadata, getActionIds, getApplicationContext, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getIngester, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setIngester, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inPersistanceMode

protected boolean inPersistanceMode
Flag for crawler persistence.


touchedFiles

protected Map<File,Long> touchedFiles
A map of files that were touched during crawler persistence.


numDiscoveredProducts

protected int numDiscoveredProducts
The number of candidate products that were discovered during the crawl.


numBadFiles

protected int numBadFiles
The number of bad files that were found during the crawl.


numFilesSkipped

protected int numFilesSkipped
The number of files that were skipped during the crawl.

Constructor Detail

PDSProductCrawler

public PDSProductCrawler()
Default constructor.


PDSProductCrawler

public PDSProductCrawler(Pds4MetExtractorConfig extractorConfig)
Constructs a crawler with the given metadata extractor configuration.

Parameters:
extractorConfig - A configuration class that tells the crawler what data product types to look for and what metadata to extract.
Method Detail

getMetExtractorConfig

public Pds4MetExtractorConfig getMetExtractorConfig()
Get the MetExtractor configuration object.

Returns:
The Pds4MetExtractorConfig object.

setMetExtractorConfig

public void setMetExtractorConfig(Pds4MetExtractorConfig config)
Sets the MetExtractor configuration object.

Parameters:
config - The Pds4MetExtractorConfig object.

setInPersistanceMode

public void setInPersistanceMode(boolean value)
Sets the flag for crawler persistence.

Parameters:
value - true to run the crawler in persistence mode.

getNumDiscoveredProducts

public int getNumDiscoveredProducts()
Gets the number of candidate products that were discovered during the crawl.

getNumBadFiles

public int getNumBadFiles()
Gets the number of bad files that were found during the crawl.

getNumFilesSkipped

public int getNumFilesSkipped()
Gets the number of files that were skipped during the crawl.

clearCrawlStats

public void clearCrawlStats()
Clears the crawl statistics.

setRegistryUrl

public void setRegistryUrl(String url)
                    throws MalformedURLException
Sets the registry location.

Parameters:
url - The URL of the registry location.
Throws:
MalformedURLException - If the given URL is malformed.
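To illustrate the kind of input that raises MalformedURLException, the sketch below checks a registry location string with the JDK's java.net.URL, which rejects strings that have no protocol or an unknown one. The helper class RegistryUrlCheck and the example URLs are hypothetical; how PDSProductCrawler parses the URL internally is not documented here.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only; not part of the harvest API.
final class RegistryUrlCheck {
    private RegistryUrlCheck() {}

    /** Returns true if the JDK can parse the string as a URL. */
    @SuppressWarnings("deprecation") // URL(String) is deprecated in recent JDKs but still available
    static boolean isWellFormed(String url) {
        try {
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            // Raised when the protocol is missing or unknown.
            return false;
        }
    }
}
```

A caller could run such a check before setRegistryUrl, or simply catch the MalformedURLException that the setter declares.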

getRegistryUrl

public String getRegistryUrl()
Gets the registry location.

Returns:
The URL of the registry location.

getRegistryIngester

public RegistryIngester getRegistryIngester()
Gets the registry ingester.

Returns:
A registry ingester object.

setFileFilter

public void setFileFilter(List<String> filters)
Sets the file filter.

Parameters:
filters - A list of file filters.
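The format of the filter strings is not specified above. As a minimal sketch, assuming each filter is a file-name glob such as "*.xml", a list of filters could be applied as follows. The class GlobNameFilter is hypothetical and is not the filter that PDSProductCrawler installs.

```java
import java.io.File;
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a name-based file filter; not the harvest implementation.
final class GlobNameFilter {
    private final List<PathMatcher> matchers = new ArrayList<>();

    // Assumption: each filter string is a glob pattern such as "*.xml".
    GlobNameFilter(List<String> filters) {
        for (String filter : filters) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + filter));
        }
    }

    /** Accepts a file if its name matches at least one of the patterns. */
    boolean accept(File file) {
        for (PathMatcher matcher : matchers) {
            if (matcher.matches(Paths.get(file.getName()))) {
                return true;
            }
        }
        return false;
    }
}
```

Matching on the file name alone, rather than the full path, keeps the patterns independent of the directory being crawled.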

addKnownMetadata

protected void addKnownMetadata(File product,
                                gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
Method not implemented at the moment.

Overrides:
addKnownMetadata in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - The product file.
productMetadata - The metadata associated with the product.

crawl

public void crawl(File dir)
Crawls the given directory.

Overrides:
crawl in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
dir - The directory to crawl.

addAction

public void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
Adds a crawler action.

Parameters:
action - A crawler action.

addActions

public void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
Adds a list of crawler actions.

Parameters:
actions - A list of crawler actions.

getActions

public List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
Gets a list of crawler actions defined for the crawler.

Returns:
A list of crawler actions that will be performed during crawling.

setProperties

public void setProperties(String registryUrl,
                          RegistryIngester ingester,
                          List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
                   throws MalformedURLException
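Sets the registry location, the registry ingester, and the crawler actions.

Parameters:
registryUrl - The URL of the registry location.
ingester - A registry ingester object.
actions - A list of crawler actions.
Throws:
MalformedURLException - If the given URL is malformed.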

getMetadataForProduct

protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
Extracts metadata from the given product.

Specified by:
getMetadataForProduct in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A PDS file.
Returns:
A Metadata object, which holds metadata from the product.

passesPreconditions

protected boolean passesPreconditions(File product)
Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.

Specified by:
passesPreconditions in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A file.
Returns:
true if the file passes.
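The specific pre-conditions are not listed above. Purely as an illustration, a check of this shape might require that the candidate is an existing, readable, non-empty regular file. The class PreconditionSketch and the conditions it tests are assumptions, not the rules that PDSProductCrawler applies.

```java
import java.io.File;

// Hypothetical sketch; the real pre-conditions are not documented on this page.
final class PreconditionSketch {
    private PreconditionSketch() {}

    /** Returns true if the file exists as a readable, non-empty regular file. */
    static boolean passes(File product) {
        return product != null
                && product.isFile()      // excludes directories and missing paths
                && product.canRead()
                && product.length() > 0; // assumption: empty files are never registered
    }
}
```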


Copyright © 2010-2012 Planetary Data System. All Rights Reserved.