Artificial Intelligence - LogicalDOC Documentation

AI Models

An AI Model (or simply model) represents a program that has been trained on a set of data to recognize certain patterns or make certain decisions without further human intervention.

You manage models in Administration > Artificial Intelligence > Models.

Different types of models are available, each implementing a specific AI algorithm with its own settings.

Settings

You create your own model by clicking Add model, or you can edit an existing one.

The settings panel is different depending on the kind of model and allows you to properly configure all its aspects.

Training

Because each model must be trained before it can be used, the Training tab is where you tell the system how to carry out the training.

The most important setting is the Sampler, which lets you choose one of the samplers you previously created.

Almost all models require several training cycles; in the Epochs field you specify the number of iterations to run when training the model.

The more epochs you set, the more accurate the model generally becomes, but the longer the training takes.

If the data set you use to train the model changes regularly, it is a good idea to flag the Enable scheduling option and provide a Schedule, so that the model is retrained periodically.

At any moment, you can manually force the training by using the Start training item of the contextual menu.

Training output

The completion of the training is reflected in the models grid and also in the log area of the Training tab, where you can see how the operations were performed.

The trained model is then saved in LogicalDOC itself as a regular document in the default folder /Default/ai-models, together with other files that depend on the nature of the model.

You may change the path where LogicalDOC saves the training results using the Settings button of the toolbar.

Querying the model

After the model has been trained, you can query it. To do so, just choose the context menu item Query the model to open the query dialog and input the sample to evaluate.

The possible results are displayed below, ordered by descending score.
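
As a rough illustration of this ordering, the sketch below sorts a set of candidate results by score, highest first. The category names and score values are made up for the example and are not produced by LogicalDOC.

```python
# Hypothetical query results as (category, score) pairs.
# Names and values are illustrative only.
results = [("invoice", 0.12), ("contract", 0.81), ("letter", 0.07)]


def rank_by_score(pairs):
    """Return the pairs ordered by descending score."""
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


ranked = rank_by_score(results)
for category, score in ranked:
    print(f"{category}: {score:.2f}")
```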

History and Statistics

In the History tab you find the list of events related to the current model.

Inside the Stats tab there is a graphical representation of the total queries per month.

Export and Import

Models can be exported and imported. This lets you prepare and train your models in one LogicalDOC installation and then import them, already trained, into the production system.

To export a model, choose the Export option of the contextual menu. It downloads a compressed archive containing both the model definition and its training.

In the target system where you want to import a previously exported model, click the Import button of the toolbar; you will be asked to upload the archive and provide a name for the new model.

If you want to update an existing model, select it and choose the Import item of the contextual menu; you will then be asked to upload the archive, which overwrites the current model.

Neural Network

A neural network is an AI model that teaches computers to process data by modeling it on how the human brain works. It is a type of machine learning (ML) process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain. It creates an adaptive system that computers use to learn from their mistakes and continually improve. Artificial neural networks thus attempt to solve complex problems.

How a Neural Network works

The architecture of a neural network is inspired by the human brain. Human brain cells, called neurons, form a complex, highly interconnected network and send electrical signals to each other to help humans process information. Similarly, an artificial neural network is made of artificial neurons that work together to solve a problem. Artificial neurons are software modules, called nodes, and artificial neural networks are software programs or algorithms that essentially use computing to solve mathematical calculations.

A basic neural network has artificial neurons interconnected at three levels:

Input Layer

Information from the outside world enters the neural network at the input layer. Input nodes process the data, analyze or categorize it, and pass it on to the next layer.

Hidden Layer

Hidden layers take their input from the input layer or from other hidden layers. Artificial neural networks can have a large number of hidden layers. Each hidden layer analyzes the output from the previous layer, processes it further, and passes it to the next layer.

Output Layer

The output layer returns the final result of all the data processing through the artificial neural network. It can have one or more nodes. For example, if we have a binary classification problem (yes/no), the output layer will have a single output node, which will return either 1 or 0. If, however, we have a multi-class classification problem, the output layer could consist of multiple output nodes.
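
The two cases above can be sketched as follows. This is only an illustration of how output node values may be turned into an answer: the 0.5 threshold, the function names, and the sample categories are assumptions for the example, not LogicalDOC settings.

```python
def decode_binary(output, threshold=0.5):
    """Map the value of a single output node to 1 (yes) or 0 (no).

    The threshold of 0.5 is an assumption made for this sketch.
    """
    return 1 if output >= threshold else 0


def decode_multiclass(outputs, categories):
    """Pick the category whose output node has the highest value."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return categories[best_index]


print(decode_binary(0.73))
print(decode_multiclass([0.1, 0.7, 0.2], ["invoice", "contract", "letter"]))
```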

Deep Neural Network Architecture

Deep neural networks, or deep learning networks, have multiple hidden layers with millions of artificial neurons connected to each other. A number, called a weight, represents the connection between one node and another. The weight is positive if a node stimulates the other node and negative if it suppresses it. Nodes with higher weight values exert a greater influence on the others.

Theoretically, deep neural networks can map any type of input to any type of output. However, they require more training than other machine learning methods: millions of examples of training data, compared to the hundreds or thousands that a simpler network needs.

Activation Function

Each layer defines a mathematical function called the Activation Function. This function receives the inputs from the previous layer, each multiplied by its weight, and produces the output for the next layer.
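
A minimal sketch of this computation for a single layer is shown below. It assumes the common formulation in which each node sums its weighted inputs, adds a bias term (the bias is not mentioned above and is included as an assumption), and applies the activation function. All names are illustrative.

```python
import math


def sigmoid(z):
    """SIGMOID activation: squashes any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases, activation):
    """Compute the outputs of one layer.

    weights[j][i] is the weight of the connection from input i to node j.
    biases[j] is an additive term for node j (an assumption of this sketch).
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs


# One node with two inputs: 0.5 * 1.0 + (-0.25) * 2.0 = 0.0
print(layer_forward([1.0, 2.0], [[0.5, -0.25]], [0.0], sigmoid))
```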


Available activation functions:
CUBE
ELU
HARDSIGMOID
HARDTANH
IDENTITY
LEAKYRELU
RATIONALTANH
RELU
RELU6
RRELU
SIGMOID
SOFTMAX
SOFTPLUS
SOFTSIGN
TANH
RECTIFIEDTANH
SELU
SWISH
THRESHOLDEDRELU
GELU
MISH
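
To give an idea of what some of these functions compute, here is a sketch of the textbook definitions of a few of them. The formulas are the standard ones; whether LogicalDOC uses exactly the same parameters (for example the 0.01 slope assumed here for LEAKYRELU) is not stated in this manual.

```python
import math


def identity(x):
    """IDENTITY: returns the input unchanged."""
    return x


def relu(x):
    """RELU: zero for negative inputs, the input itself otherwise."""
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    """LEAKYRELU: like RELU, but negative inputs keep a small slope.

    alpha=0.01 is an assumed default for this sketch.
    """
    return x if x > 0 else alpha * x


def sigmoid(x):
    """SIGMOID: maps any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """TANH: maps any number into the range (-1, 1)."""
    return math.tanh(x)


def softmax(values):
    """SOFTMAX: turns a list of numbers into probabilities that sum to 1."""
    peak = max(values)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]
```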

Training

Training a neural network means assigning the best values to all the weights so as to minimize the difference between the model's predictions and the actual target values.
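
The idea can be sketched with the smallest possible "network": a single weight w that must learn the relation y = 2x. The example uses plain gradient descent, which is one common way to adjust weights; it is an illustration only and not necessarily the optimizer LogicalDOC applies. The learning rate and the sample data are assumptions.

```python
def mean_squared_error(samples, w):
    """Average squared difference between the prediction w * x and y."""
    return sum((w * x - y) ** 2 for x, y in samples) / len(samples)


def train(samples, epochs, learning_rate=0.05):
    """Fit y = w * x by gradient descent, one weight update per epoch."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient
    return w


samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # target relation: y = 2x

for epochs in (1, 5, 50):
    w = train(samples, epochs)
    print(epochs, round(w, 4), mean_squared_error(samples, w))
```

Running more epochs moves w closer to 2 and lowers the error, at the cost of more computation.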

Loss Function

A loss function (also called a cost function or error function) measures the difference between a model's predictions and the actual target values. It quantifies how well the network is performing and guides the learning process by providing a numerical measure of the "error". The goal during training is to minimize the loss function, meaning the model's predictions should get closer to the actual values.
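
Two widely used loss functions are sketched below: mean squared error, and cross-entropy for classification. They are shown as generic examples of how the "error" can be turned into a number; the function names are illustrative and do not correspond to specific entries of the Loss Function selector.

```python
import math


def mse(predicted, target):
    """Mean squared error between two lists of numbers."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def cross_entropy(predicted_probs, target_one_hot, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target.

    eps guards against log(0) when a predicted probability is exactly 0.
    """
    return -sum(
        t * math.log(max(p, eps))
        for p, t in zip(predicted_probs, target_one_hot)
    )


# A confident correct prediction gives a lower loss than a hesitant one.
print(cross_entropy([0.9, 0.1], [1, 0]))
print(cross_entropy([0.6, 0.4], [1, 0]))
```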

Configuring a Neural Network

The input of a Neural Network is a tuple of numbers called features; specify the name of each feature in the Features field as a comma-separated string of names.

The output is one of the possible categories you specify as a comma-separated string of options in the Categories field.
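
The sketch below shows how such comma-separated strings can be interpreted: each feature name labels one input number, and each category corresponds to one possible output. The feature and category names are hypothetical, and the one-hot encoding is a common convention assumed for the example rather than a documented LogicalDOC detail.

```python
def parse_names(text):
    """Split a comma-separated string into a list of trimmed names."""
    return [name.strip() for name in text.split(",") if name.strip()]


def one_hot(category, categories):
    """Encode a category as a list with 1.0 at its position, 0.0 elsewhere."""
    return [1.0 if c == category else 0.0 for c in categories]


# Hypothetical values, as they might be typed in the two fields.
features = parse_names("pages, words, images")
categories = parse_names("invoice, contract, letter")

sample = dict(zip(features, (3, 1250, 2)))  # one input tuple, labeled
print(sample)
print(one_hot("contract", categories))
```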

The Batch field is the number of samples that the sample iterator returns at each step of the training.

Seed is the number used to seed the internal random number generator.
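
To illustrate how these two settings interact, the following sketch splits a sample set into batches after shuffling it with a seeded random generator. Using the same seed reproduces the same batches. That samples are shuffled before batching is an assumption made for the example.

```python
import random


def batches(samples, batch_size, seed):
    """Yield the samples in shuffled order, batch_size items at a time."""
    rng = random.Random(seed)  # same seed, same shuffle order
    shuffled = list(samples)
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]


for batch in batches(range(10), batch_size=4, seed=42):
    print(batch)
```

With 10 samples and a batch size of 4, the iterator returns two full batches and a final batch with the 2 remaining samples.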

Weight Init Scheme is the algorithm used to assign an initial value to each weight.

In the selector Loss Function, you indicate what function to use to measure the error of the predicted values.

The Activation Function selector indicates the default function to use for the layers.

On the right side of the panel, define all the layers, giving each of them a specific activation function.

Evaluation

The evaluation allows you to test the network against a random subset of the training data set. To launch the process, choose the Start Evaluation item of the contextual menu; when it finishes, look at the results in the Evaluation tab.

Confusion matrix

The confusion matrix is a compact summary of the performance of the neural network.

Here's a breakdown of what a confusion matrix shows:

Rows: Represent the actual (true) class labels of the data
Columns: Represent the class labels predicted by the model
Cells: Each cell in the matrix represents a specific combination of actual and predicted labels, with the number in the cell indicating how many instances fall into that category
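
The layout described above can be reproduced with a few lines of code. The labels and the actual/predicted values below are invented for the example.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a matrix where rows are actual labels and columns predicted ones."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


labels = ["yes", "no"]
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```

The diagonal cells count the correct predictions; every other cell counts one specific kind of mistake.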

Natural Language Processing

Natural Language Processing, or simply NLP, is a class of AI model designed to process naturally written texts.

NLP enables computers to understand, interpret, and generate human language. It bridges the gap between human communication and machine understanding, allowing machines to read text, hear speech, interpret it, and even respond in natural ways. NLP combines computational linguistics with statistical, machine learning, and deep learning models to process and analyze large amounts of natural language data.

By using NLP, systems can perform tasks like language translation, sentiment analysis, speech recognition, chatbot conversations, and document summarization.

How Natural Language Processing Works

NLP involves a series of techniques and steps that convert unstructured human language into structured data that machines can understand and act upon.

Text Processing

Tokenization: breaking text into words or phrases.
Stop-word Removal: filtering out common words (like "and", "the") that carry little meaning.
Stemming/Lemmatization: reducing words to their base or root form.
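
The three steps can be sketched in a few lines. This is a deliberately naive illustration: the stop-word list and the suffix-stripping rule are assumptions made for the example, and real NLP engines rely on far more complete dictionaries and algorithms.

```python
import re

# Tiny illustrative lists; real systems use much larger ones.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "are"}
SUFFIXES = ("ing", "ed", "s")


def tokenize(text):
    """Break the text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[:-len(suffix)]
    return token


def preprocess(text):
    """Apply tokenization, stop-word removal and stemming in sequence."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The signed contracts and the pending invoices"))
```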

Syntax and Semantic Analysis

Syntax Analysis (Parsing) involves analyzing the grammatical structure of a sentence, identifying parts of speech and relationships between words.
Semantic Analysis focuses on understanding the meaning behind words, sentences, and context.

Feature Extraction

Relevant features are extracted from the text, such as keywords, named entities (e.g., people, places), and sentiment indicators. These features serve as input for machine learning models.
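
One simple way to turn text into numeric features is to count how often each word of a fixed vocabulary occurs. The sketch below shows this keyword-count approach; the vocabulary and tokens are made up, and the technique is only one of many possible feature extraction methods.

```python
from collections import Counter


def keyword_features(tokens, vocabulary):
    """Return, for each vocabulary word, how many times it occurs in tokens."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


vocabulary = ["invoice", "contract", "total"]  # hypothetical keywords
tokens = ["invoice", "total", "due", "invoice"]
print(keyword_features(tokens, vocabulary))
```

The resulting list of numbers has one position per keyword and can be passed to a model as its input features.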

Modeling and Interpretation

Using techniques such as classification, clustering, or neural networks, the system interprets the text and performs a task, like identifying sentiment, generating responses, or categorizing content.

At the time of this writing, there are different types of NLP models, each designed to solve specific language-related tasks. The models used in LogicalDOC are:

Vector Stores

A vector store indexes and stores vector embeddings (the vector representation of documents) for fast retrieval and semantic search. Embeddings are generated by AI models; in the context of machine learning, their components are features that represent different dimensions of the data, which are essential for understanding patterns, relationships, and underlying structures.

You manage vector stores in Administration > Artificial Intelligence > Embeddings > Vector Stores.

At the time of writing, LogicalDOC supports only MariaDB as a vector store; more vector store providers will be made available in the future.
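
The principle of semantic search over embeddings can be sketched with an in-memory dictionary standing in for the vector store. The three-dimensional vectors, the document identifiers, and the choice of cosine similarity as the measure are assumptions for illustration; real embeddings have hundreds of dimensions and the actual search is performed by the database.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means same direction, 0.0 unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(store, query, top_k=2):
    """Return the top_k (document id, similarity) pairs closest to the query."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Toy store: document id -> embedding (hypothetical values).
store = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 0.0, 1.0],
}

for doc_id, score in search(store, [1.0, 0.0, 0.0]):
    print(doc_id, round(score, 4))
```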

You must therefore enter the connection details of a MariaDB database (version 11.8 or greater); you can use the wizard icon for help in composing the connection URL.
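
As a hedged illustration of what such a connection URL may look like, the sketch below composes a JDBC-style URL in the form used by the MariaDB Java connector (jdbc:mariadb://host:port/database). That LogicalDOC expects exactly this form is an assumption, and the host and database names are placeholders; the wizard remains the reliable way to build the value.

```python
def mariadb_jdbc_url(host, database, port=3306):
    """Compose a JDBC-style MariaDB URL (3306 is MariaDB's default port)."""
    return f"jdbc:mariadb://{host}:{port}/{database}"


# Placeholder host and database names, for illustration only.
print(mariadb_jdbc_url("localhost", "vectors"))
```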

MariaDB

Since LogicalDOC 9.2.2, the Windows installer also includes a recent MariaDB with vector capabilities and uses it automatically, without requiring any action from you. Likewise, if you installed LogicalDOC with an older version but are already connected to MariaDB 11.8 or greater, the 9.2.2 update automatically uses it as the vector store.

In all other cases, you must provide an installation of MariaDB 11.8 or greater and manually configure the connection to it.

Please refer to the product's website for installing MariaDB: https://mariadb.org/download

Robots

A robot is an intelligent agent designed to understand user questions and provide meaningful answers.

Robots act as an interface between the user and the Natural Language Processing (NLP) engine, using trained models to classify queries and extract key information. The platform provides a default robot named Mentor, but you may create your own robots dedicated to specific areas.

How Robots Work

Each robot is configured with two core NLP models:

Classifier: categorizes the user’s question into a specific action or intent (e.g., GETDOC, SEARCHFILE, UNKNOWN).
Tokens Detector: extracts specific values from the text, such as document IDs or filenames.

When a user asks a question, the robot:

Classifies the sentence using the classifier.
Extracts tokens using the tokens detector.
Executes a matching automation script (called an “answer”) associated with the identified category.
If the classification or token extraction fails, a default fallback answer is used.
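
The flow above can be sketched as a small pipeline. Everything in this example is a stand-in: the keyword rules replace the trained classifier, the regular expressions replace the tokens detector, and the Python functions replace the Automation scripts. The token names docid and filename are hypothetical.

```python
import re

FALLBACK = "Sorry, I did not understand the question."


def classify(question):
    """Toy stand-in for the classifier model: returns a category name."""
    text = question.lower()
    if "open document" in text:
        return "GETDOC"
    if "find file" in text:
        return "SEARCHFILE"
    return "UNKNOWN"


def detect_tokens(question):
    """Toy stand-in for the tokens detector: numeric IDs and file names."""
    return {
        "docid": re.findall(r"\b\d+\b", question),
        "filename": re.findall(r"\b[\w-]+\.\w{2,4}\b", question),
    }


# One "answer" per category, playing the role of the automation scripts.
ANSWERS = {
    "GETDOC": lambda tokens: f"Opening document {tokens['docid'][0]}",
    "SEARCHFILE": lambda tokens: f"Searching for file {tokens['filename'][0]}",
}


def ask(question):
    """Classify, extract tokens, run the matching answer or fall back."""
    category = classify(question)
    tokens = detect_tokens(question)
    answer = ANSWERS.get(category)
    if answer is None:
        return FALLBACK  # no answer mapped to this category
    try:
        return answer(tokens)
    except (IndexError, KeyError):
        return FALLBACK  # a required token was not found


print(ask("Please open document 1234"))
print(ask("Can you find file report.pdf"))
print(ask("What time is it?"))
```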

Robot Configuration

Robots are configured through the Robot Management Interface.

The most relevant aspect of a robot's configuration is the Answer section, where each category (e.g., GETDOC, SEARCHDOC, SEARCHFILE) is mapped to an Automation script. These scripts define how the robot responds to a user query once the classifier and tokens detector have done their job.

Using these scripts, you can:

Retrieve and open documents by ID
Perform keyword-based full-text searches
Look up files by name
Handle unknown queries gracefully

These answers are stored as Automation scripts, allowing advanced conditional logic, data access, and dynamic rendering of results.

Dictionary available for the Automation in this context


AUTOMATION CONTEXT: ROBOT
Variable	Java Class	Description
robot	Robot	The current robot instance (e.g., A.I.D.A.).
transaction	RobotHistory	Contains metadata about the current query, user ID, tenant, and session.
category	String	The category assigned by the classifier (e.g., GETDOC, SEARCHDOC, etc.).
tokens	Map<String, List<Result<String>>>	Tokens extracted from the input.
answer	Value<String>	Value holder used to carry the answer; put your answer here.

Read the Automation manual for more information.