Class EmbeddingTextUtils

java.lang.Object
com.logicaldoc.ai.embedding.EmbeddingTextUtils

public abstract class EmbeddingTextUtils extends Object
Utility class for chunking and sanitizing text before it is embedded.
Since:
9.2.2
Author:
Giuseppe Desiato - LogicalDOC
  • Method Details

    • chunk

      public static List<String> chunk(String text, Chunking chunking)
      Chunk the text using the model-specific policy given by chunking.
    • chunk

      public static List<String> chunk(String text)
      Chunk using a reasonable default policy (used when no per-model Chunking is available).
    • sanitize

      public static String sanitize(String text)
      Normalize and sanitize text before tokenization/embedding.
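The two chunk overloads differ only in where the policy comes from: the caller's Chunking, or a built-in default. This page does not describe what a Chunking policy contains, so the following is only a minimal sketch of one common technique, fixed-size windows with overlap. The SimpleChunking record (maximum characters and overlap) is a hypothetical stand-in for Chunking, and the default values 1000/100 are illustrative. LogicalDOC's actual policy may count tokens instead of characters or split on sentence boundaries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of size-bounded chunking with overlap.
 * ChunkingSketch and SimpleChunking are illustrative names, NOT part of the
 * LogicalDOC API; the real Chunking class may carry different settings.
 */
public class ChunkingSketch {

    /** Assumed stand-in for a per-model policy: window size and overlap, in characters. */
    public record SimpleChunking(int maxChars, int overlap) {
        public SimpleChunking {
            if (maxChars <= 0) throw new IllegalArgumentException("maxChars must be positive");
            if (overlap < 0 || overlap >= maxChars)
                throw new IllegalArgumentException("overlap must be in [0, maxChars)");
        }
    }

    /** Assumed default policy, used when no model-specific policy is available. */
    public static final SimpleChunking DEFAULT = new SimpleChunking(1000, 100);

    /** Splits text into windows of at most maxChars; each window starts maxChars - overlap after the previous one. */
    public static List<String> chunk(String text, SimpleChunking policy) {
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isEmpty()) return chunks;
        int step = policy.maxChars() - policy.overlap();
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + policy.maxChars(), text.length());
            chunks.add(text.substring(start, end));
            // Stop once a window reaches the end, so no trailing chunk is pure overlap
            if (end == text.length()) break;
        }
        return chunks;
    }

    /** Mirrors chunk(String): falls back to the default policy. */
    public static List<String> chunk(String text) {
        return chunk(text, DEFAULT);
    }

    public static void main(String[] args) {
        // Window of 4 characters, 1 character shared between neighbors
        System.out.println(chunk("abcdefghij", new SimpleChunking(4, 1)));
    }
}
```

Running main prints [abcd, defg, ghij]: each chunk repeats the last character of its predecessor, which gives the embedding of each chunk a little context from its neighbor at the cost of some duplicated text.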
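This page does not say which normalization sanitize performs. As a hedged illustration only, the sketch below applies NFKC Unicode normalization with java.text.Normalizer, replaces ASCII control characters with spaces, and collapses runs of whitespace. The class name SanitizeSketch and every one of these steps are assumptions, not the documented behavior of EmbeddingTextUtils.sanitize.

```java
import java.text.Normalizer;

/**
 * Hypothetical sketch of pre-embedding sanitization.
 * SanitizeSketch and its rules are illustrative assumptions, NOT the LogicalDOC implementation.
 */
public class SanitizeSketch {

    public static String sanitize(String text) {
        if (text == null) return "";
        // NFKC folds compatibility forms: the "\uFB01" ligature becomes "fi",
        // and a no-break space (U+00A0) becomes a plain space
        String s = Normalizer.normalize(text, Normalizer.Form.NFKC);
        // Replace ASCII control characters (U+0000..U+001F, U+007F), including tabs and newlines, with a space
        s = s.replaceAll("\\p{Cntrl}", " ");
        // Collapse runs of whitespace into a single space and trim both ends
        return s.replaceAll("\\s+", " ").strip();
    }

    public static void main(String[] args) {
        System.out.println(sanitize("  \uFB01le\u0000name\t\n here  "));
    }
}
```

Running main prints file name here. Collapsing newlines is a trade-off: it hands the embedding model clean, compact input but discards paragraph breaks that a chunker could otherwise use as split points, so a real implementation might preserve line breaks or sanitize each chunk after splitting.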