Samplers

A sampler is an object used to retrieve and prepare a dataset for the training of a model.

You manage samplers in Administration > Artificial Intelligence > Models > Samplers.


Several types of samplers are available, each with its own settings:

Sampler | Description | Settings
CSV

Reads the contents of a CSV file, extracting each row as an array of strings.
Each resource is expected to have the following format:

5.1,3.5,1.4,.2,"Setosa"
7,3.2,4.7,1.4,"Versicolor"
6.2,3.4,5.4,2.3,"Virginica"

This example produces three rows of five elements each:

5.1, 3.5, 1.4, .2, Setosa
7, 3.2, 4.7, 1.4, Versicolor
6.2, 3.4, 5.4, 2.3, Virginica
  • Delimiter: the character used as the field delimiter
  • Quote: the character used to enclose the value of a field
  • Document: the CSV document that contains the data
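
The effect of the Delimiter and Quote settings can be illustrated with a minimal Python sketch. This is not the product's implementation: the function name `sample_csv` and its parameters are hypothetical, and the parsing is delegated to Python's standard `csv` module.

```python
import csv
import io


def sample_csv(text, delimiter=",", quote='"'):
    """Hypothetical helper: return each non-empty CSV row as a list of strings."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    return [row for row in reader if row]


# The same three lines shown in the example above.
data = (
    '5.1,3.5,1.4,.2,"Setosa"\n'
    '7,3.2,4.7,1.4,"Versicolor"\n'
    '6.2,3.4,5.4,2.3,"Virginica"\n'
)

rows = sample_csv(data)
for row in rows:
    print(row)
```

Every value stays a string, including the numeric ones, and the quote characters are dropped from the last field.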
Paragraph

Extracts the paragraphs of a text document, where a paragraph is a block of text separated from the next by one or more blank lines.
Each resource is expected to have the following format:

A colleague of mine told me that the document 12356897 contains very important information, so I want to get it. Understood, but are you registered as LogicalDOC's user? If you are a user, just access the interface and then execute a search by document id = 12356897.

Where can I locate a specific file? I was not able to find what I was looking for. Ok, just enter LogicalDOC and search for document with ID -96668429, it is very easy. Sure! Easy and quick, many thanks for your hint.

The example above will produce two paragraphs.

  • Document: the text document that contains the data
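
Splitting on blank lines can be sketched in a few lines of Python. The function name `sample_paragraphs` is hypothetical, and the sketch assumes that a line containing only whitespace also counts as blank, which the text above does not state.

```python
import re


def sample_paragraphs(text):
    """Hypothetical helper: split text into blocks separated by blank lines."""
    # Assumption: a line holding only spaces or tabs is treated as blank.
    blocks = re.split(r"\n\s*\n", text.strip())
    return [block.strip() for block in blocks if block.strip()]


text = "First paragraph,\nstill the first.\n\nSecond paragraph.\n"
paragraphs = sample_paragraphs(text)
print(len(paragraphs))
```

A single line break inside a block does not start a new paragraph; only a blank line does.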
Metadata

Extracts samples from a list of documents. By default, the extended attributes of the documents are used as the features, so all the documents in the referenced folder must share the same attribute scheme. With the Automation setting you can also extract arbitrary data from each document.
  • Folder: the folder that contains the documents to process
  • Category: name of the extended attribute that contains the category, optional
  • Features: ordered, comma-separated list of the names of the extended attributes that store the feature values
  • Automation: an automation script used to extract a sample from a source document, which is accessible via the dictionary key $document
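
The way the Features and Category settings shape a sample can be pictured with a rough Python sketch. It is an illustration only: documents are modelled as plain dictionaries of extended attributes, the function name `sample_metadata` is hypothetical, and placing the category as the last element of each row is an assumption made here to match the CSV example.

```python
def sample_metadata(documents, features, category=None):
    """Hypothetical helper: build one row per document from its attributes."""
    samples = []
    for attributes in documents:
        # Feature values are read in the order given by the Features setting.
        row = [str(attributes[name]) for name in features]
        if category is not None:
            # Assumption: the category is appended as the last element.
            row.append(str(attributes[category]))
        samples.append(row)
    return samples


# Illustrative attribute values, one dictionary per document in the folder.
documents = [
    {"sepalLength": 5.1, "sepalWidth": 3.5, "species": "Setosa"},
    {"sepalLength": 7.0, "sepalWidth": 3.2, "species": "Versicolor"},
]

samples = sample_metadata(
    documents, features=["sepalLength", "sepalWidth"], category="species"
)
print(samples)
```

In this sketch a document that lacks one of the listed attributes raises a `KeyError`, which mirrors the requirement that all the documents in the folder share the same attributes.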
Chain

Collects the samples extracted by a collection of other samplers.
  • Chain: ordered list of samplers
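
Conceptually, a chain concatenates the output of its samplers in the configured order. The Python sketch below assumes that each sampler can be modelled as a function with no arguments that returns a list of samples. The name `sample_chain` and the two stand-in samplers are hypothetical.

```python
from itertools import chain


def sample_chain(samplers):
    """Hypothetical helper: collect samples from each sampler, in order."""
    return list(chain.from_iterable(sampler() for sampler in samplers))


def first_sampler():
    # Stand-in for a configured sampler, for example a CSV sampler.
    return [["5.1", "3.5", "Setosa"]]


def second_sampler():
    # Stand-in for a second configured sampler.
    return [["7", "3.2", "Versicolor"], ["6.2", "3.4", "Virginica"]]


combined = sample_chain([first_sampler, second_sampler])
print(len(combined))
```

The samples of the first sampler come first, followed by those of the second, so the order of the Chain setting determines the order of the resulting dataset in this sketch.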