gov.nasa.pds.harvest.crawler
Class PDSProductCrawler

java.lang.Object
  extended by gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
      extended by gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
          extended by gov.nasa.pds.harvest.crawler.PDSProductCrawler
All Implemented Interfaces:
gov.nasa.jpl.oodt.cas.commons.spring.SpringSetIdInjectionType, gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
Direct Known Subclasses:
CollectionCrawler

public class PDSProductCrawler
extends gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

Class that extends the CAS Crawler to crawl a directory or a PDS inventory file and register the products it finds with the PDS Registry Service.
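The class summary describes a base crawler that owns the crawl loop and leaves two decisions to its subclass: whether a file qualifies (passesPreconditions) and what metadata it carries (getMetadataForProduct). Both are marked "Specified by" in ProductCrawler below. The sketch models that split with a template method. BaseCrawler, LabelCrawler and register are hypothetical stand-ins, not the OODT or PDS API, and the loop is deliberately non-recursive for brevity.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrawlerHooksSketch {

    /** Hypothetical stand-in for the base crawler: owns the loop, delegates decisions to hooks. */
    abstract static class BaseCrawler {
        /** Visits each regular file directly inside dir (non-recursive, for brevity). */
        public void crawl(File dir) {
            File[] files = dir.listFiles();
            if (files == null) {
                return; // not a directory, or not readable
            }
            for (File product : files) {
                if (product.isFile() && passesPreconditions(product)) {
                    register(product, getMetadataForProduct(product));
                }
            }
        }

        protected abstract boolean passesPreconditions(File product);

        protected abstract Map<String, String> getMetadataForProduct(File product);

        protected abstract void register(File product, Map<String, String> metadata);
    }

    /** Hypothetical PDS-style subclass: accepts XML labels and records what it would register. */
    static class LabelCrawler extends BaseCrawler {
        final List<String> registered = new ArrayList<>();

        @Override
        protected boolean passesPreconditions(File product) {
            return product.getName().endsWith(".xml");
        }

        @Override
        protected Map<String, String> getMetadataForProduct(File product) {
            Map<String, String> met = new HashMap<>();
            met.put("Filename", product.getName());
            return met;
        }

        @Override
        protected void register(File product, Map<String, String> metadata) {
            // A real crawler would send this to a registry service; here it is only recorded.
            registered.add(metadata.get("Filename"));
        }
    }

    public static void main(String[] args) {
        LabelCrawler crawler = new LabelCrawler();
        crawler.crawl(new File(args.length > 0 ? args[0] : "."));
        System.out.println("Would register: " + crawler.registered);
    }
}
```

Keeping the loop in the base class means every subclass, such as CollectionCrawler, gets the same traversal and only supplies the product-specific decisions.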

Author:
mcayanan

Field Summary
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
DIR_FILTER, FILE_FILTER, LOG
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
MIME_TYPES_HIERARCHY
 
Fields inherited from interface gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
FILE_LOCATION, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
 
Constructor Summary
PDSProductCrawler(PDSMetExtractorConfig extractorConfig)
          Creates a crawler that uses the given metadata extractor configuration.
 
Method Summary
 void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
          Adds a crawler action.
 void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
          Adds a list of crawler actions.
protected  void addKnownMetadata(File product, gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
          Not currently implemented.
 void clearCrawlStats()
           
 void crawl(File dir)
          Crawls the given directory.
 List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
          Gets a list of crawler actions defined for the crawler.
protected  gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
          Extracts metadata from the given product.
 PDSMetExtractorConfig getMetExtractorConfig()
          Gets the MetExtractor configuration object.
 int getNumBadFiles()
           
 int getNumDiscoveredProducts()
           
 int getNumFilesSkipped()
           
 RegistryIngester getRegistryIngester()
          Gets the registry ingester.
 String getRegistryUrl()
          Gets the registry location.
protected  boolean passesPreconditions(File product)
          Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
 void setFileFilter(List<String> filters)
          Sets the file filter.
 void setInContinuousMode(boolean value)
           
 void setRegistryUrl(String url)
          Sets the registry location.
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
clearIngestStatus, crawl, getIngestStatus, handleFile, setActionRepo
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
addRequiredMetadata, getActionIds, getApplicationContext, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getIngester, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setIngester, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PDSProductCrawler

public PDSProductCrawler(PDSMetExtractorConfig extractorConfig)
Creates a crawler that uses the given metadata extractor configuration.

Parameters:
extractorConfig - A configuration class that tells the crawler what data product types to look for and what metadata to extract.
Method Detail

getMetExtractorConfig

public PDSMetExtractorConfig getMetExtractorConfig()
Gets the MetExtractor configuration object.

Returns:
The PDSMetExtractorConfig object.

setInContinuousMode

public void setInContinuousMode(boolean value)

getNumDiscoveredProducts

public int getNumDiscoveredProducts()

getNumBadFiles

public int getNumBadFiles()

getNumFilesSkipped

public int getNumFilesSkipped()

clearCrawlStats

public void clearCrawlStats()

setRegistryUrl

public void setRegistryUrl(String url)
                    throws MalformedURLException
Sets the registry location.

Parameters:
url - The URL of the registry location.
Throws:
MalformedURLException - If the given URL is malformed.
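Because setRegistryUrl declares MalformedURLException, the URL is presumably parsed when it is set rather than when the crawler first contacts the registry. A minimal sketch of that validate-on-set pattern, assuming the value is parsed with java.net.URL; the class name and example URL are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class RegistryUrlSketch {

    private URL registryUrl;

    /** Parses the string before storing it, so a bad value fails fast. */
    public void setRegistryUrl(String url) throws MalformedURLException {
        // Throws MalformedURLException for input with no protocol (e.g. "registry")
        // or an unknown one. new URL(String) is deprecated in newer JDKs but still works.
        this.registryUrl = new URL(url);
    }

    public String getRegistryUrl() {
        return registryUrl == null ? null : registryUrl.toString();
    }

    public static void main(String[] args) {
        RegistryUrlSketch s = new RegistryUrlSketch();
        try {
            s.setRegistryUrl("http://localhost:8080/registry-service");
            System.out.println("Registry: " + s.getRegistryUrl());
            s.setRegistryUrl("not a url"); // rejected; the previous value is kept
        } catch (MalformedURLException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```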

getRegistryUrl

public String getRegistryUrl()
Gets the registry location.

Returns:
The URL of the registry location.

getRegistryIngester

public RegistryIngester getRegistryIngester()
Gets the registry ingester.

Returns:
A registry ingester object.

setFileFilter

public void setFileFilter(List<String> filters)
Sets the file filter.

Parameters:
filters - A list of file filters.
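The page does not say what syntax the entries passed to setFileFilter use. The sketch assumes they are shell-style globs such as "*.xml" and matches each one against the file name only, using the JDK's PathMatcher. FileFilterSketch and accepts are hypothetical names.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileFilterSketch {

    private final List<PathMatcher> matchers = new ArrayList<>();

    /** Replaces the current filters; each entry is treated as a glob such as "*.xml". */
    public void setFileFilter(List<String> filters) {
        matchers.clear();
        for (String f : filters) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + f));
        }
    }

    /** Accepts a file if its name (not its full path) matches any filter; no filters means no match. */
    public boolean accepts(Path file) {
        Path name = file.getFileName();
        if (name == null) {
            return false;
        }
        for (PathMatcher m : matchers) {
            if (m.matches(name)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        FileFilterSketch f = new FileFilterSketch();
        f.setFileFilter(Arrays.asList("*.xml", "*.lbl"));
        System.out.println(f.accepts(Paths.get("data", "product.xml"))); // true
        System.out.println(f.accepts(Paths.get("data", "product.img"))); // false
    }
}
```

Matching on the file name alone keeps a pattern like "*.xml" independent of where the crawl starts.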

addKnownMetadata

protected void addKnownMetadata(File product,
                                gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
Not currently implemented.

Overrides:
addKnownMetadata in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - The product file.
productMetadata - The metadata associated with the product.

crawl

public void crawl(File dir)
Crawls the given directory.

Overrides:
crawl in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
dir - The directory to crawl.
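The class exposes getNumDiscoveredProducts, getNumFilesSkipped, getNumBadFiles and clearCrawlStats, but the page does not define them. The sketch below is one plausible reading, not the documented behavior. It walks the directory recursively, counts a file as skipped when it fails the pre-conditions, as discovered when it passes, and as bad when processing it throws. ProductHandler and the constructor arguments are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CrawlStatsSketch {

    /** Hypothetical per-product work that may fail, e.g. metadata extraction. */
    interface ProductHandler {
        void handle(Path product) throws Exception;
    }

    private int numDiscovered;
    private int numSkipped;
    private int numBad;

    private final Predicate<Path> preconditions;
    private final ProductHandler handler;

    CrawlStatsSketch(Predicate<Path> preconditions, ProductHandler handler) {
        this.preconditions = preconditions;
        this.handler = handler;
    }

    /** Recursively visits every regular file under dir and tallies the outcome. */
    public void crawl(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            if (!preconditions.test(file)) {
                numSkipped++;
                continue;
            }
            numDiscovered++;
            try {
                handler.handle(file);
            } catch (Exception e) {
                numBad++; // looked like a product but could not be processed
            }
        }
    }

    public void clearCrawlStats() { numDiscovered = 0; numSkipped = 0; numBad = 0; }
    public int getNumDiscoveredProducts() { return numDiscovered; }
    public int getNumFilesSkipped() { return numSkipped; }
    public int getNumBadFiles() { return numBad; }

    public static void main(String[] args) throws IOException {
        CrawlStatsSketch c = new CrawlStatsSketch(
                p -> p.toString().endsWith(".xml"),
                p -> {
                    if (Files.size(p) == 0) {
                        throw new IOException("empty label: " + p);
                    }
                });
        c.crawl(Paths.get(args.length > 0 ? args[0] : "."));
        System.out.printf("discovered=%d skipped=%d bad=%d%n",
                c.getNumDiscoveredProducts(), c.getNumFilesSkipped(), c.getNumBadFiles());
    }
}
```

Collecting the paths before processing them keeps the checked exceptions out of the stream pipeline and closes the directory stream promptly.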

addAction

public void addAction(gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction action)
Adds a crawler action.

Parameters:
action - A crawler action.

addActions

public void addActions(List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> actions)
Adds a list of crawler actions.

Parameters:
actions - A list of crawler actions.

getActions

public List<gov.nasa.jpl.oodt.cas.crawl.action.CrawlerAction> getActions()
Gets a list of crawler actions defined for the crawler.

Returns:
A list of crawler actions that will be performed during crawling.
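addAction, addActions and getActions together describe an ordered list of per-product actions. The sketch keeps them in insertion order, returns a read-only view from getActions, and runs them until one reports failure. The Action interface and runActions are hypothetical simplifications of the OODT CrawlerAction type, whose real signature is not shown on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CrawlerActionsSketch {

    /** Hypothetical stand-in for a crawler action: runs against one product and its metadata. */
    interface Action {
        boolean perform(String product, Map<String, String> metadata);
    }

    private final List<Action> actions = new ArrayList<>();

    public void addAction(Action action) { actions.add(action); }

    public void addActions(List<Action> newActions) { actions.addAll(newActions); }

    /** Read-only view, so callers cannot modify the crawler's list directly. */
    public List<Action> getActions() { return Collections.unmodifiableList(actions); }

    /** Runs the actions in the order added; returns false as soon as one fails. */
    boolean runActions(String product, Map<String, String> metadata) {
        for (Action a : actions) {
            if (!a.perform(product, metadata)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CrawlerActionsSketch crawler = new CrawlerActionsSketch();
        crawler.addAction((p, m) -> { m.put("seen", p); return true; });
        crawler.addActions(Arrays.<Action>asList(
                (p, m) -> m.containsKey("logical_identifier"),
                (p, m) -> { m.put("registered", "true"); return true; }));
        Map<String, String> met = new HashMap<>();
        System.out.println(crawler.getActions().size());      // 3
        System.out.println(crawler.runActions("a.xml", met)); // false: no logical_identifier
        System.out.println(met.containsKey("registered"));    // false: third action never ran
    }
}
```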

getMetadataForProduct

protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
Extracts metadata from the given product.

Specified by:
getMetadataForProduct in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A PDS file.
Returns:
A Metadata object, which holds metadata from the product.
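getMetadataForProduct returns the metadata extracted from a PDS file, and the constructor says PDSMetExtractorConfig decides "what metadata to extract". Assuming the product is an XML label and the configuration boils down to metadata keys paired with XPath expressions, extraction could look like the sketch below. The key-to-XPath map, the plain Map return type (standing in for the OODT Metadata class) and the example identifier are hypothetical.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class LabelMetadataSketch {

    /** Evaluates each configured XPath against the label and keeps only non-empty results. */
    static Map<String, String> getMetadataForProduct(File product, Map<String, String> xpaths)
            throws Exception {
        // Namespace-unaware parsing (the JAXP default) keeps the example paths simple; real
        // PDS4 labels declare namespaces, which a fuller version would map via a NamespaceContext.
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        Document doc = dbf.newDocumentBuilder().parse(product);
        XPath xpath = XPathFactory.newInstance().newXPath();

        Map<String, String> metadata = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : xpaths.entrySet()) {
            String value = xpath.evaluate(e.getValue(), doc).trim();
            if (!value.isEmpty()) {
                metadata.put(e.getKey(), value);
            }
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        File label = File.createTempFile("label", ".xml");
        label.deleteOnExit();
        String xml = "<Product_Observational><Identification_Area>"
                + "<logical_identifier>urn:nasa:pds:example:data:p1</logical_identifier>"
                + "<title> Example product </title>"
                + "</Identification_Area></Product_Observational>";
        Files.write(label.toPath(), xml.getBytes(StandardCharsets.UTF_8));

        Map<String, String> xpaths = new LinkedHashMap<>();
        xpaths.put("logical_identifier", "//Identification_Area/logical_identifier");
        xpaths.put("title", "//Identification_Area/title");
        xpaths.put("version_id", "//Identification_Area/version_id"); // absent, so omitted
        System.out.println(getMetadataForProduct(label, xpaths));
        // {logical_identifier=urn:nasa:pds:example:data:p1, title=Example product}
    }
}
```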

passesPreconditions

protected boolean passesPreconditions(File product)
Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.

Specified by:
passesPreconditions in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A file.
Returns:
true if the file passes all pre-conditions, false otherwise.
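The page does not list which pre-conditions passesPreconditions enforces. The sketch shows one way to keep such checks composable: an ordered list of predicates that must all hold, stopping at the first failure. The four checks are hypothetical examples, not the crawler's actual rules.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class PreconditionsSketch {

    /** Hypothetical checks; the real crawler's pre-conditions may differ. */
    static final List<Predicate<File>> CHECKS = Arrays.<Predicate<File>>asList(
            File::isFile,                          // exists and is not a directory
            File::canRead,                         // the crawler must be able to open it
            f -> f.length() > 0,                   // an empty file cannot hold a label
            f -> !f.getName().startsWith("."));    // ignore hidden files

    /** Returns true only if every check passes; stops at the first failure. */
    static boolean passesPreconditions(File product) {
        for (Predicate<File> check : CHECKS) {
            if (!check.test(product)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : "product.xml");
        System.out.println(f + " passes: " + passesPreconditions(f));
    }
}
```

Ordering the cheap existence checks first means the later checks never run against a missing file.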


Copyright © 2010-2011 Planetary Data System. All Rights Reserved.