Class HTMLParser

  • All Implemented Interfaces:
    Parser

    public class HTMLParser
    extends AbstractParser
    Text extractor for HyperText Markup Language (HTML).
    Since:
    3.5
    Author:
    Michael Scholz, Alessandro Gasparini - LogicalDOC
    • Constructor Detail

      • HTMLParser

        public HTMLParser()
    • Method Detail

      • parse

        public String parse(File file,
                            String filename,
                            String encoding,
                            Locale locale,
                            String tenant,
                            Document document,
                            String fileVersion)
        Description copied from interface: Parser
        Same as the other method that accepts an input stream; use this one when you have a file rather than a stream.
        Specified by:
        parse in interface Parser
        Overrides:
        parse in class AbstractParser
        Parameters:
        file - the file to extract text from
        filename - name of the file
        encoding - character encoding
        locale - the locale
        tenant - name of the tenant
        document - the document the file belongs to (optional)
        fileVersion - the file version being processed (optional)
        Returns:
        the extracted text
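
The call shape documented above can be illustrated with a minimal, self-contained sketch. It does not use the LogicalDOC classes: FileTextParser is a hypothetical stand-in that only mirrors the documented parameter list (with Object in place of Document), and the regex-based tag stripping is an assumption made for illustration, not how HTMLParser extracts text. The tenant name "default" and the file name "sample.html" are placeholder values. The two parameters documented as optional are passed as null.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;

public class HtmlParseSketch {

    /** Hypothetical stand-in that mirrors the documented parse(...) parameter list. */
    interface FileTextParser {
        String parse(File file, String filename, String encoding, Locale locale,
                     String tenant, Object document, String fileVersion) throws IOException;
    }

    /**
     * Naive illustration only: decodes the file with the given encoding,
     * replaces every tag with a space and collapses whitespace.
     * This is NOT the LogicalDOC implementation.
     */
    static final FileTextParser NAIVE =
        (file, filename, encoding, locale, tenant, document, fileVersion) -> {
            Charset charset = Charset.forName(encoding);
            String html = new String(Files.readAllBytes(file.toPath()), charset);
            return html.replaceAll("<[^>]*>", " ").replaceAll("\\s+", " ").trim();
        };

    /** Writes the markup to a temporary file and parses it through the file-based method. */
    static String extractFromTempFile(String html) throws IOException {
        File tmp = File.createTempFile("sample", ".html");
        try {
            Files.write(tmp.toPath(), html.getBytes(StandardCharsets.UTF_8));
            // document and fileVersion are documented as optional, so null is passed here
            return NAIVE.parse(tmp, "sample.html", "UTF-8", Locale.ENGLISH, "default", null, null);
        } finally {
            tmp.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><h1>Title</h1><p>Hello world</p></body></html>";
        System.out.println(extractFromTempFile(html)); // prints "Title Hello world"
    }
}
```

With the real library, the same seven arguments would presumably be passed to an HTMLParser instance, with a LogicalDOC Document (or null) in the sixth position; check the Parser interface documentation for the exceptions it declares before relying on this shape.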