gov.nasa.pds.harvest.crawler
Class PDSProductCrawler

java.lang.Object
  extended by gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
      extended by gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
          extended by gov.nasa.pds.harvest.crawler.PDSProductCrawler
All Implemented Interfaces:
gov.nasa.jpl.oodt.cas.commons.spring.SpringSetIdInjectionType, gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
Direct Known Subclasses:
CollectionCrawler, PDS3ProductCrawler

public class PDSProductCrawler
extends gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

Extends the CAS Crawler to crawl a directory or PDS inventory file and register the products it finds with the PDS Registry Service.

Author:
mcayanan

Field Summary
protected  boolean inPersistanceMode
          Flag for crawler persistence.
protected  Map<File,Long> touchedFiles
          A map of files that were touched during crawler persistence.
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
DIR_FILTER, FILE_FILTER, LOG
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
MIME_TYPES_HIERARCHY
 
Fields inherited from interface gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
FILE_LOCATION, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
 
Constructor Summary
PDSProductCrawler()
          Default constructor.
PDSProductCrawler(Pds4MetExtractorConfig extractorConfig)
          Constructs a crawler with the given MetExtractor configuration.
 
Method Summary
 void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
          Adds a crawler action.
 void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
          Adds a list of crawler actions.
protected  void addKnownMetadata(File product, gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
          Method not implemented at the moment.
 void crawl(File dir)
          Crawls the given directory.
 List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
          Gets a list of crawler actions defined for the crawler.
protected  gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
          Extracts metadata from the given product.
 Pds4MetExtractorConfig getMetExtractorConfig()
          Get the MetExtractor configuration object.
 RegistryIngester getRegistryIngester()
          Gets the registry ingester.
 String getRegistryUrl()
          Gets the registry location.
protected  boolean passesPreconditions(File product)
          Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
 void setDirectoryFilter(DirectoryFilter filter)
          Sets the directory filter for the crawler.
 void setFileFilter(FileFilter filter)
          Sets the file filter for the crawler.
 void setInPersistanceMode(boolean value)
          Sets the flag for crawler persistence.
 void setMetExtractorConfig(Pds4MetExtractorConfig config)
          Sets the MetExtractor configuration object.
 void setProperties(String registryUrl, RegistryIngester ingester, List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
          Sets the registry location, the registry ingester, and the crawler actions.
 void setRegistryUrl(String url)
          Sets the registry location.
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
clearIngestStatus, crawl, getIngestStatus, handleFile, setActionRepo
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
addRequiredMetadata, getActionIds, getApplicationContext, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getIngester, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setIngester, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

inPersistanceMode

protected boolean inPersistanceMode
Flag for crawler persistence.


touchedFiles

protected Map<File,Long> touchedFiles
A map of files that were touched during crawler persistence.
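
The field description does not say what the Long values hold. The following is a minimal sketch, assuming each value is the file's last-modified timestamp in milliseconds and that persistence mode means repeated crawls skip files that have not changed. The names TouchedFilesSketch and shouldProcess are hypothetical and are not part of PDSProductCrawler.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the PDSProductCrawler implementation.
class TouchedFilesSketch {
    // Assumption: the Long is the last-modified time (ms) seen for each file.
    private final Map<File, Long> touchedFiles = new HashMap<>();

    /**
     * Returns true if the file is new or its timestamp differs from the one
     * recorded earlier, and records the new timestamp. Returns false if the
     * file was already seen with the same timestamp.
     */
    boolean shouldProcess(File file, long lastModified) {
        Long previous = touchedFiles.get(file);
        if (previous != null && previous.longValue() == lastModified) {
            return false; // unchanged since the last pass
        }
        touchedFiles.put(file, lastModified);
        return true;
    }
}
```

In a real crawl the timestamp would typically come from File.lastModified(); it is passed in explicitly here so the behaviour is easy to follow.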

Constructor Detail

PDSProductCrawler

public PDSProductCrawler()
Default constructor.


PDSProductCrawler

public PDSProductCrawler(Pds4MetExtractorConfig extractorConfig)
Constructs a crawler with the given MetExtractor configuration.

Parameters:
extractorConfig - A configuration class that tells the crawler what data product types to look for and what metadata to extract.
Method Detail

getMetExtractorConfig

public Pds4MetExtractorConfig getMetExtractorConfig()
Get the MetExtractor configuration object.

Returns:
The Pds4MetExtractorConfig object.

setMetExtractorConfig

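public void setMetExtractorConfig(Pds4MetExtractorConfig config)
Sets the MetExtractor configuration object.

Parameters:
config - The Pds4MetExtractorConfig object.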

setInPersistanceMode

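public void setInPersistanceMode(boolean value)
Sets the flag for crawler persistence.

Parameters:
value - true to run the crawler in persistence mode, false otherwise.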

setRegistryUrl

public void setRegistryUrl(String url)
                    throws MalformedURLException
Sets the registry location.

Parameters:
url - The URL of the registry.
Throws:
MalformedURLException - If the given URL is malformed.
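
To illustrate when a MalformedURLException can arise, here is a minimal sketch, assuming the setter validates its argument the same way java.net.URL parses a string (this page does not confirm how the check is done). RegistryUrlSketch and isWellFormed are hypothetical names.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration only; not the PDSProductCrawler implementation.
class RegistryUrlSketch {
    /** Returns true if java.net.URL can parse the string, false otherwise. */
    static boolean isWellFormed(String url) {
        try {
            // The String constructor is deprecated in recent JDKs but still
            // available; it rejects strings with a missing or unknown protocol.
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

Under this assumption, a string with no protocol such as "localhost/registry" would be rejected, while "http://localhost:8080/registry" would be accepted.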

getRegistryUrl

public String getRegistryUrl()
Gets the registry location.

Returns:
The URL of the registry.

getRegistryIngester

public RegistryIngester getRegistryIngester()
Gets the registry ingester.

Returns:
A registry ingester object.

setFileFilter

public void setFileFilter(FileFilter filter)
Sets the file filter for the crawler.

Parameters:
filter - A File Filter defined in the Harvest policy config.

setDirectoryFilter

public void setDirectoryFilter(DirectoryFilter filter)
Sets the directory filter for the crawler.

Parameters:
filter - A Directory Filter defined in the Harvest policy config.

addKnownMetadata

protected void addKnownMetadata(File product,
                                gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
Method not implemented at the moment.

Overrides:
addKnownMetadata in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - The product file.
productMetadata - The metadata associated with the product.

crawl

public void crawl(File dir)
Crawls the given directory.

Overrides:
crawl in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
dir - The directory to crawl.
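
How a crawl combines the file and directory filters is not spelled out on this page. The following is a minimal sketch of one plausible approach: a recursive walk that descends only into accepted directories and collects only accepted files. It uses java.io.FileFilter as a stand-in for both the Harvest FileFilter and DirectoryFilter policy classes, and CrawlSketch is a hypothetical name.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the PDSProductCrawler implementation.
class CrawlSketch {
    /**
     * Walks dir recursively. Subdirectories are entered only if dirFilter
     * accepts them; regular files are collected only if fileFilter accepts them.
     */
    static List<File> crawl(File dir, FileFilter fileFilter, FileFilter dirFilter) {
        List<File> found = new ArrayList<>();
        File[] children = dir.listFiles();
        if (children == null) {
            return found; // not a directory, or it could not be read
        }
        for (File child : children) {
            if (child.isDirectory()) {
                if (dirFilter.accept(child)) {
                    found.addAll(crawl(child, fileFilter, dirFilter));
                }
            } else if (fileFilter.accept(child)) {
                found.add(child);
            }
        }
        return found;
    }
}
```

In the actual crawler, each accepted file would presumably go on to metadata extraction and registration rather than being collected into a list.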

addAction

public void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
Adds a crawler action.

Parameters:
action - A crawler action.

addActions

public void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
Adds a list of crawler actions.

Parameters:
actions - A list of crawler actions.

getActions

public List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
Gets a list of crawler actions defined for the crawler.

Returns:
A list of crawler actions that will be performed during crawling.

setProperties

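public void setProperties(String registryUrl,
                          RegistryIngester ingester,
                          List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
                   throws MalformedURLException
Sets the registry location, the registry ingester, and the crawler actions.

Parameters:
registryUrl - The URL of the registry.
ingester - A registry ingester object.
actions - A list of crawler actions.
Throws:
MalformedURLException - If the given URL is malformed.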

getMetadataForProduct

protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
Extracts metadata from the given product.

Specified by:
getMetadataForProduct in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A PDS file.
Returns:
A Metadata object, which holds metadata from the product.

passesPreconditions

protected boolean passesPreconditions(File product)
Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.

Specified by:
passesPreconditions in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A file.
Returns:
true if the file passes the pre-conditions, false otherwise.
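
The page does not list the pre-conditions that are checked. As a minimal sketch, assuming the checks are simply that the product is an existing, readable regular file, a check of this kind could look like the following. PreconditionSketch and passes are hypothetical names, and the real method may test different or additional conditions.

```java
import java.io.File;

// Hypothetical illustration only; not the PDSProductCrawler implementation.
class PreconditionSketch {
    /**
     * Assumed pre-conditions: the product is non-null, is an existing
     * regular file (not a directory), and can be read.
     */
    static boolean passes(File product) {
        return product != null && product.isFile() && product.canRead();
    }
}
```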


Copyright © 2010–2014 Planetary Data System. All rights reserved.