gov.nasa.pds.harvest.crawler
Class HarvestCrawler

java.lang.Object
  extended by gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
      extended by gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
          extended by gov.nasa.pds.harvest.crawler.HarvestCrawler
All Implemented Interfaces:
gov.nasa.jpl.oodt.cas.commons.spring.SpringSetIdInjectionType, gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys, PDSCoreMetKeys

public class HarvestCrawler
extends gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
implements PDSCoreMetKeys

Extends the CAS Crawler to crawl a directory or a PDS inventory file and register the products it finds with the PDS Registry Service.

Author:
mcayanan

Field Summary
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
DIR_FILTER, FILE_FILTER, LOG
 
Fields inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
MIME_TYPES_HIERARCHY
 
Fields inherited from interface gov.nasa.pds.harvest.crawler.metadata.PDSCoreMetKeys
LOGICAL_ID, OBJECT_TYPE, PRODUCT_VERSION, REFERENCES, TITLE
 
Fields inherited from interface gov.nasa.jpl.oodt.cas.filemgr.metadata.CoreMetKeys
FILE_LOCATION, FILENAME, MIME_TYPE, PRODUCT_ID, PRODUCT_NAME, PRODUCT_RECEVIED_TIME, PRODUCT_STRUCTURE, PRODUCT_TYPE
 
Constructor Summary
HarvestCrawler(PDSMetExtractorConfig extractorConfig)
          Creates a crawler that uses the given metadata extractor configuration.
 
Method Summary
protected  void addKnownMetadata(File product, gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
           
 void crawl(File dir, List<String> fileFilters)
          Crawls a directory.
 void crawlBundle(File bundle)
          Crawls a PDS4 bundle file.
 void crawlCollection(File collection)
          Crawls a PDS4 collection file.
protected  gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
          Extracts metadata from the given product.
 RegistryIngester getRegistryIngester()
          Gets the registry ingester.
 String getRegistryUrl()
          Gets the registry location.
protected  boolean passesPreconditions(File product)
          Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.
 void setRegistryUrl(String url)
          Sets the registry location.
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
crawl, crawl, handleFile, setActionRepo
 
Methods inherited from class gov.nasa.jpl.oodt.cas.crawl.config.ProductCrawlerBean
addRequiredMetadata, getActionIds, getApplicationContext, getDaemonPort, getDaemonWait, getFilemgrUrl, getGlobalMetadata, getId, getIngester, getProductPath, getRequiredMetadata, isCrawlForDirs, isNoRecur, isSkipIngest, setActionIds, setApplicationContext, setCrawlForDirs, setDaemonPort, setDaemonWait, setFilemgrUrl, setGlobalMetadata, setId, setIngester, setNoRecur, setProductPath, setRequiredMetadata, setSkipIngest
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HarvestCrawler

public HarvestCrawler(PDSMetExtractorConfig extractorConfig)
Creates a crawler that uses the given metadata extractor configuration.

Parameters:
extractorConfig - A configuration class that tells the crawler what data product types to look for and what metadata to extract.
Method Detail

setRegistryUrl

public void setRegistryUrl(String url)
                    throws MalformedURLException
Sets the registry location.

Parameters:
url - The URL of the registry.
Throws:
MalformedURLException
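
Because the signature above declares MalformedURLException, a caller may want to check a registry URL before handing it to the crawler. The sketch below shows one way to do that with java.net.URL from the JDK. RegistryUrlCheck and isWellFormed are hypothetical names, and the assumption that HarvestCrawler parses the string the same way internally is not confirmed by this page.

```java
import java.net.MalformedURLException;
import java.net.URL;

class RegistryUrlCheck {
    // Hypothetical helper, not part of HarvestCrawler.
    // Returns true when the string can be parsed as a URL by java.net.URL.
    static boolean isWellFormed(String url) {
        try {
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            // Raised, for example, when the string has no protocol.
            return false;
        }
    }
}
```

A string such as "http://localhost:8080/registry" passes this check, while a bare path such as "registry/pds" does not because it has no protocol.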

getRegistryUrl

public String getRegistryUrl()
Gets the registry location.

Returns:
The URL of the registry.

getRegistryIngester

public RegistryIngester getRegistryIngester()
Gets the registry ingester.

Returns:
A registry ingester object.

crawl

public void crawl(File dir,
                  List<String> fileFilters)
Crawls a directory.

Parameters:
dir - The directory to crawl.
fileFilters - A list of filters that restricts the crawler to specific files.
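
This page does not say what syntax the file filters use. As an illustration only, the sketch below assumes glob-style wildcards such as "*.xml" and shows how a file name could be tested against a list of them with the JDK's PathMatcher. FileFilterSketch and matchesAny are hypothetical names and are not part of the Harvest API.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;

class FileFilterSketch {
    // Hypothetical helper: true when the file name matches at least one filter.
    // Assumes each filter is a glob pattern, which the source does not confirm.
    static boolean matchesAny(String fileName, List<String> fileFilters) {
        Path name = Paths.get(fileName);
        for (String filter : fileFilters) {
            PathMatcher matcher =
                    FileSystems.getDefault().getPathMatcher("glob:" + filter);
            if (matcher.matches(name)) {
                return true;
            }
        }
        return false;
    }
}
```

With the filters "*.xml" and "*.lbl", a file named product.xml would be accepted and readme.txt would be skipped.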

crawlCollection

public void crawlCollection(File collection)
                     throws InventoryReaderException
Crawls a PDS4 collection file. The collection is registered first, followed by the product files it points to.

Parameters:
collection - The PDS4 Collection file.
Throws:
InventoryReaderException

crawlBundle

public void crawlBundle(File bundle)
                 throws InventoryReaderException
Crawls a PDS4 bundle file. The bundle is registered first, then the method crawls the collection file it points to.

Parameters:
bundle - The PDS4 bundle file.
Throws:
InventoryReaderException
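
The registration order described for crawlBundle and crawlCollection (bundle, then collection, then products) can be sketched with a small stand-in class. RegistrationOrderSketch is hypothetical: it replaces inventory files with an in-memory map and the registry with a list, so it shows only the ordering, not how HarvestCrawler reads inventories or talks to the registry.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RegistrationOrderSketch {
    // Hypothetical in-memory inventory: a parent name mapped to the names it points to.
    private final Map<String, List<String>> inventory =
            new HashMap<String, List<String>>();

    // Records the order in which items would be registered.
    final List<String> registered = new ArrayList<String>();

    void addInventory(String parent, String... members) {
        inventory.put(parent, Arrays.asList(members));
    }

    private List<String> members(String parent) {
        List<String> found = inventory.get(parent);
        return found == null ? new ArrayList<String>() : found;
    }

    // Registers the bundle first, then crawls each collection it points to.
    void crawlBundle(String bundle) {
        registered.add(bundle);
        for (String collection : members(bundle)) {
            crawlCollection(collection);
        }
    }

    // Registers the collection first, then the products it points to.
    void crawlCollection(String collection) {
        registered.add(collection);
        registered.addAll(members(collection));
    }
}
```

Crawling a bundle that points to one collection with two products leaves the list holding the bundle, the collection, and then the two products, in that order.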

addKnownMetadata

protected void addKnownMetadata(File product,
                                gov.nasa.jpl.oodt.cas.metadata.Metadata productMetadata)
Overrides:
addKnownMetadata in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler

getMetadataForProduct

protected gov.nasa.jpl.oodt.cas.metadata.Metadata getMetadataForProduct(File product)
Extracts metadata from the given product.

Specified by:
getMetadataForProduct in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A PDS file.
Returns:
A Metadata object, which holds metadata from the product.
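
To give a feel for what metadata extraction from a product might involve, the sketch below reads three elements from an XML label and stores them under names borrowed from the inherited PDSCoreMetKeys fields. It is a simplified stand-in: the real method returns an OODT Metadata object and is driven by the PDSMetExtractorConfig, and the element names (logical_identifier, version_id, title), the use of a plain Map, and the key strings are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class LabelMetadataSketch {
    // Returns the trimmed text of the first element with the given tag, or null.
    private static String firstText(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Hypothetical extractor: parses the label and collects a few values.
    // The map keys mirror field names on this page; their real string
    // values are not shown in the source.
    static Map<String, String> extract(String labelXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            labelXml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> met = new LinkedHashMap<String, String>();
            met.put("LOGICAL_ID", firstText(doc, "logical_identifier"));
            met.put("PRODUCT_VERSION", firstText(doc, "version_id"));
            met.put("TITLE", firstText(doc, "title"));
            return met;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse label", e);
        }
    }
}
```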

passesPreconditions

protected boolean passesPreconditions(File product)
Determines whether the supplied file passes the necessary pre-conditions for the file to be registered.

Specified by:
passesPreconditions in class gov.nasa.jpl.oodt.cas.crawl.ProductCrawler
Parameters:
product - A file.
Returns:
true if the file passes the pre-conditions, false otherwise.
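
This page does not list which pre-conditions HarvestCrawler checks. Purely as an example of the shape such a check could take, the sketch below accepts a file only if it exists as a regular, readable file. PreconditionSketch and passes are hypothetical names, and these particular conditions are assumptions rather than the class's documented behavior.

```java
import java.io.File;

class PreconditionSketch {
    // Hypothetical check: the candidate must be a regular file that can be read.
    // The actual pre-conditions applied by HarvestCrawler are not documented here.
    static boolean passes(File product) {
        return product != null && product.isFile() && product.canRead();
    }
}
```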


Copyright © 2005-2010 Planetary Data System. All Rights Reserved.