public class DataLoader extends Object
A utility class to load RDF data into an AbstractTripleStore. This class
supports a number of options, including a durable queues pattern, and can be
more efficient if multiple files are batched into a single commit point. The
main(String[]) routine will open the Journal itself, and therefore this class
cannot be used while the Journal is open in the webapp.
Note: This class is not efficient for scale-out.
See Also:
com.bigdata.rdf.load.MappedRDFDataLoadMaster
Modifier and Type | Class and Description |
---|---|
static class | DataLoader.ClosureEnum: A type-safe enumeration of options affecting whether and when entailments are computed as documents are loaded into the database using the DataLoader. |
static class | DataLoader.CommitEnum: A type-safe enumeration of options affecting whether and when the database will be committed. |
class | DataLoader.MyLoadStats |
static interface | DataLoader.Options: Options for the DataLoader. |
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | log: Logger. |
Constructor and Description |
---|
DataLoader(AbstractTripleStore database): Configure DataLoader using properties used to configure the database. |
DataLoader(Properties properties, AbstractTripleStore database) |
DataLoader(Properties properties, AbstractTripleStore database, PrintStream os): Configure a data loader with overridden properties. |
Modifier and Type | Method and Description |
---|---|
ClosureStats | doClosure(): Compute closure as configured. |
void | endSource(): Flush the StatementBuffer to the backing store. |
protected StatementBuffer<?> | getAssertionBuffer(): Return the assertion buffer. |
DataLoader.ClosureEnum | getClosureEnum(): How the DataLoader will maintain closure on the database. |
DataLoader.CommitEnum | getCommitEnum(): Whether and when the DataLoader will invoke ITripleStore.commit(). |
AbstractTripleStore | getDatabase(): The target database. |
static FilenameFilter | getFilenameFilter() |
boolean | getFlush(): When true (the default) the StatementBuffer is flushed by each loadData(String, String, RDFFormat) or loadData(String[], String[], RDFFormat[]) operation and when doClosure() is requested. |
InferenceEngine | getInferenceEngine(): The object used to compute entailments for the database. |
LoadStats | loadData(InputStream is, String baseURL, org.openrdf.rio.RDFFormat rdfFormat): Load from an input stream. |
LoadStats | loadData(Reader reader, String baseURL, org.openrdf.rio.RDFFormat rdfFormat): Load from a reader and commit. |
LoadStats | loadData(String[] resource, String[] baseURL, org.openrdf.rio.RDFFormat[] rdfFormat): Load a set of RDF resources into the associated triple store and commit. |
LoadStats | loadData(String resource, String baseURL, org.openrdf.rio.RDFFormat rdfFormat): Load a resource into the associated triple store and commit. |
LoadStats | loadData(URL url, String baseURL, org.openrdf.rio.RDFFormat rdfFormat): Load from a URL. |
protected void | loadData2(DataLoader.MyLoadStats totals, String resource, String baseURL, org.openrdf.rio.RDFFormat rdfFormat, boolean endOfBatch): Load an RDF resource into the database. |
protected void | loadData3(LoadStats totals, Object source, String baseURL, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, boolean endOfBatch): Deprecated. |
void | loadFiles(DataLoader.MyLoadStats totals, int depth, File file, String baseURI, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter, boolean endOfBatch): Recursive load of a file or directory. |
LoadStats | loadFiles(File file, String baseURI, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter) |
void | logCounters(AbstractTripleStore database): Report out a variety of interesting information on stdout and the log. |
static void | main(String[] args): Utility method that may be used to create and/or load RDF data into a local database instance. |
DataLoader.MyLoadStats | newLoadStats(): Factory for DataLoader specific LoadStats extension. |
static Properties | processProperties(String propertyFileName, boolean quiet, int verbose, boolean durableQueues) |
public DataLoader(AbstractTripleStore database)
Configure DataLoader using properties used to configure the database.
Parameters:
database - The database.
public DataLoader(Properties properties, AbstractTripleStore database)
public DataLoader(Properties properties, AbstractTripleStore database, PrintStream os)
Configure a data loader with overridden properties.
Parameters:
properties - Configuration properties - see DataLoader.Options.
database - The database.
os - The PrintStream for output messages.
public AbstractTripleStore getDatabase()
The target database.
public InferenceEngine getInferenceEngine()
The object used to compute entailments for the database.
protected StatementBuffer<?> getAssertionBuffer()
Return the assertion buffer.
The assertion buffer is used to buffer statements that are being asserted so as to maximize the opportunity for batch writes. Truth maintenance (if enabled) will be performed no later than the commit of the transaction.
Note: The same buffer is reused by each loader so that we can, on the one hand, minimize heap churn and, on the other hand, disable auto-flush when loading a series of small documents. However, we obtain a new buffer each time we perform incremental truth maintenance.
Note: When non-null and non-empty, the buffer MUST be flushed (a) if a transaction completes (otherwise writes will not be stored on the database); or (b) if there is a read against the database during a transaction (otherwise reads will not see the unflushed statements).
Note: If truthMaintenance is enabled then this buffer is backed by a temporary store which accumulates the SPOs to be asserted. Otherwise it will write directly on the database each time it is flushed, including when it overflows.
public boolean getFlush()
When true (the default) the StatementBuffer is flushed by each loadData(String, String, RDFFormat) or loadData(String[], String[], RDFFormat[]) operation and when doClosure() is requested. When false the caller is responsible for flushing the buffer.
This behavior MAY be disabled if you want to chain load a bunch of small documents without flushing to the backing store after each document and loadData(String[], String[], RDFFormat[]) is not well-suited to your purposes. This can be much more efficient, approximating the throughput for large document loads. However, the caller MUST invoke endSource() once all documents are loaded successfully. If an error occurs during the processing of one or more documents then the entire data load should be discarded.
See Also:
DataLoader.Options.FLUSH
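The chain-load pattern described for getFlush() can be illustrated with a small, JDK-only sketch. It does not use any Blazegraph classes: ChainLoadSketch, its Buffer class and the load method are hypothetical stand-ins, a null statement is used to simulate a parse error, and the all-or-nothing discard is a simplification of the advice that a failed load should be discarded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ChainLoadSketch {

    /** Hypothetical stand-in for a statement buffer sitting in front of a store. */
    static final class Buffer {
        private final List<String> pending = new ArrayList<>();
        private final List<String> store = new ArrayList<>();
        int flushCount = 0;

        void add(String statement) {
            pending.add(statement);
        }

        /** Write pending statements to the store; does nothing when empty. */
        void flush() {
            if (pending.isEmpty()) {
                return;
            }
            store.addAll(pending);
            pending.clear();
            flushCount++;
        }

        /** Drop anything that has not been flushed yet. */
        void discard() {
            pending.clear();
        }

        List<String> stored() {
            return store;
        }
    }

    /**
     * Buffers every document, flushing after each one when flushEach is true,
     * or once at the end otherwise. A null statement simulates a parse error.
     */
    static Buffer load(List<List<String>> documents, boolean flushEach) {
        Buffer buffer = new Buffer();
        try {
            for (List<String> document : documents) {
                for (String statement : document) {
                    if (statement == null) {
                        throw new IllegalArgumentException("unparseable statement");
                    }
                    buffer.add(statement);
                }
                if (flushEach) {
                    buffer.flush();
                }
            }
            buffer.flush(); // final flush, the role endSource() plays in the text above
        } catch (IllegalArgumentException e) {
            buffer.discard(); // on error, nothing still pending reaches the store
        }
        return buffer;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("s1 p1 o1", "s2 p2 o2"),
                Arrays.asList("s3 p3 o3"));
        System.out.println("flush per document: " + load(docs, true).flushCount + " flushes");
        System.out.println("single final flush: " + load(docs, false).flushCount + " flushes");
    }
}
```

With per-document flushing, each small document costs one write to the store; with flushing deferred, the whole series costs a single write, which is the efficiency gain the description refers to.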
public void endSource()
Flush the StatementBuffer to the backing store.
Note: If you disable auto-flush AND you are not using truth maintenance then you MUST explicitly invoke this method once you are done loading data sets in order to flush the last chunk of data to the store. In all other conditions you do NOT need to call this method. However, it is always safe to invoke this method: if the buffer is empty the method will be a NOP.
public DataLoader.ClosureEnum getClosureEnum()
How the DataLoader will maintain closure on the database.
public DataLoader.CommitEnum getCommitEnum()
Whether and when the DataLoader will invoke ITripleStore.commit().
public DataLoader.MyLoadStats newLoadStats()
Factory for DataLoader specific LoadStats extension.
public final LoadStats loadData(String resource, String baseURL, org.openrdf.rio.RDFFormat rdfFormat) throws IOException
Load a resource into the associated triple store and commit.
Parameters:
resource - A resource to be loaded (required).
baseURL - The baseURL to use for that resource (required).
rdfFormat - The RDFFormat to use as a fallback for the resource (required).
Throws:
IOException
public final LoadStats loadData(String[] resource, String[] baseURL, org.openrdf.rio.RDFFormat[] rdfFormat) throws IOException
Load a set of RDF resources into the associated triple store and commit.
Parameters:
resource - An array of resources to be loaded (required).
baseURL - An array of baseURLs to use for those resources (required and must be 1:1 with the array of resources).
rdfFormat - An array of RDFFormat values to use as a fallback for each resource (required and must be 1:1 with the array of resources).
Throws:
IOException
public LoadStats loadData(Reader reader, String baseURL, org.openrdf.rio.RDFFormat rdfFormat) throws IOException
Load from a reader and commit.
Parameters:
reader - The reader (required).
baseURL - The base URL (required).
rdfFormat - The RDFFormat to use as a fallback (required).
Throws:
IOException
public LoadStats loadData(InputStream is, String baseURL, org.openrdf.rio.RDFFormat rdfFormat) throws IOException
Load from an input stream.
Parameters:
is - The input stream (required).
baseURL - The base URL (required).
rdfFormat - The format (required).
Throws:
IOException
public LoadStats loadData(URL url, String baseURL, org.openrdf.rio.RDFFormat rdfFormat) throws IOException
Load from a URL. If in quads mode, the triples in the default graph will be inserted into the named graph associated with the specified url.
Parameters:
url - The URL (required).
baseURL - The base URL (required).
rdfFormat - The RDFFormat (required).
Throws:
IOException
protected void loadData2(DataLoader.MyLoadStats totals, String resource, String baseURL, org.openrdf.rio.RDFFormat rdfFormat, boolean endOfBatch) throws IOException
Load an RDF resource into the database.
Parameters:
resource - Either the name of a resource which can be resolved using the CLASSPATH, or the name of a resource in the local file system, or a URL.
baseURL -
rdfFormat -
endOfBatch -
Throws:
IOException - if the resource cannot be resolved or loaded.
public LoadStats loadFiles(File file, String baseURI, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter) throws IOException
Parameters:
file - The file or directory (required).
baseURI - The baseURI (optional; when not specified, the name of each file loaded is converted to a URL and used as the baseURI for that file).
rdfFormat - The format of the file (optional; when not specified, the format is deduced for each file in turn using the RDFFormat static methods).
defaultGraph - The value that will be used for the graph/context coordinate when loading data represented in a triple format into a quad store.
filter - A filter selecting the file names that will be loaded (optional). When specified, the filter MUST accept directories if directories are to be recursively processed.
Throws:
IOException
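The requirement that a filter accept directories is easy to miss, so here is a minimal JDK-only sketch of a java.io.FilenameFilter that does so. The class name RdfFilenameFilterSketch, the method acceptingDirectories and the suffix list are illustrative assumptions, not part of DataLoader, and this is not the filter returned by getFilenameFilter().

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.Locale;

public class RdfFilenameFilterSketch {

    /**
     * Returns a filter that accepts every directory (so a recursive walk can
     * descend into it) plus any file whose name ends with one of the suffixes.
     */
    static FilenameFilter acceptingDirectories(final String... suffixes) {
        return new FilenameFilter() {
            @Override
            public boolean accept(File dir, String name) {
                if (new File(dir, name).isDirectory()) {
                    return true; // never prune directories
                }
                String lower = name.toLowerCase(Locale.ROOT);
                for (String suffix : suffixes) {
                    if (lower.endsWith(suffix)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    public static void main(String[] args) {
        // The suffixes here are an assumption chosen for illustration.
        FilenameFilter filter = acceptingDirectories(".ttl", ".nt", ".rdf");
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(filter.accept(tmp, "data.ttl"));
        System.out.println(filter.accept(tmp, "notes.txt"));
    }
}
```

A filter built this way could be passed as the filter argument of loadFiles; a filter that only matched file suffixes would reject subdirectories and so stop the recursion at the top level.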
public void loadFiles(DataLoader.MyLoadStats totals, int depth, File file, String baseURI, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, FilenameFilter filter, boolean endOfBatch) throws IOException
Recursive load of a file or directory.
Parameters:
totals -
depth -
file -
baseURI -
rdfFormat -
defaultGraph -
filter -
endOfBatch -
Throws:
IOException
@Deprecated protected void loadData3(LoadStats totals, Object source, String baseURL, org.openrdf.rio.RDFFormat rdfFormat, String defaultGraph, boolean endOfBatch) throws IOException
Deprecated.
Parameters:
totals - Used to report out the total LoadStats.
source - A Reader or InputStream.
baseURL - The baseURI (optional; when not specified, the name of each file loaded is converted to a URL and used as the baseURI for that file).
rdfFormat - The format of the file (optional; when not specified, the format is deduced for each file in turn using the RDFFormat static methods).
defaultGraph - The value that will be used for the graph/context coordinate when loading data represented in a triple format into a quad store.
endOfBatch - Signal that indicates the end of a batch.
Throws:
IOException
public void logCounters(AbstractTripleStore database)
Report out a variety of interesting information on stdout and the log.
Parameters:
database -
See Also:
DataLoader.Options.VERBOSE
public ClosureStats doClosure()
Compute closure as configured. If DataLoader.ClosureEnum.None was selected then this MAY be used to (re-)compute the full closure of the database.
Throws:
IllegalStateException - if assertion buffer is null
See Also:
#removeEntailments()
public static void main(String[] args) throws IOException
Utility method that may be used to create and/or load RDF data into a local database instance.
Parameters:
args - [-quiet][-closure][-verbose][-durableQueues][-namespace namespace] propertyFile (fileOrDir)*
where
-verbose: See DataLoader.Options.VERBOSE.
-durableQueues: Files are renamed with a .good or .fail suffix as they are processed. The files will remain in the same directory. This changes the default for DataLoader.Options.IGNORE_INVALID_FILES to true and the default for RDFParserOptions.Options#STOP_AT_FIRST_ERROR to false. Failures can be detected by looking for ".fail" files. (This is a shorthand for DataLoader.Options.DURABLE_QUEUES.)
Throws:
IOException
See Also:
(durable queues)
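The durable queues behavior described for main can be sketched without Blazegraph using only java.nio.file. This is an illustration of the rename-in-place idea, not the DataLoader implementation: DurableQueueSketch, the FileLoader callback and the process method are hypothetical names, and appending the suffix to the full original file name is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DurableQueueSketch {

    /** Hypothetical callback standing in for whatever loads one file. */
    interface FileLoader {
        void load(Path file) throws Exception;
    }

    /**
     * Runs the loader on the file, then renames the file in place by appending
     * ".good" on success or ".fail" on any exception. Returns the new path.
     */
    static Path process(Path file, FileLoader loader) throws IOException {
        String suffix;
        try {
            loader.load(file);
            suffix = ".good";
        } catch (Exception e) {
            suffix = ".fail"; // keep going; the failure is recorded in the name
        }
        Path target = file.resolveSibling(file.getFileName().toString() + suffix);
        return Files.move(file, target); // same directory, new name
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("durable-queue");
        Path ok = Files.createFile(dir.resolve("a.ttl"));
        Path bad = Files.createFile(dir.resolve("b.ttl"));
        // A loader that succeeds, then one that simulates a parse failure.
        System.out.println(process(ok, f -> { }).getFileName());
        System.out.println(process(bad, f -> { throw new IOException("parse error"); }).getFileName());
    }
}
```

Because a failure only changes the file's suffix instead of aborting the run, the remaining files are still processed, and a later scan of the directory for ".fail" names shows which inputs need attention.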
public static Properties processProperties(String propertyFileName, boolean quiet, int verbose, boolean durableQueues) throws IOException
Throws:
IOException
public static FilenameFilter getFilenameFilter()
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.