Properties

$tracksLineNumbers

$tracksLineNumbers

Whether this lexer implements line-number and column-number tracking.

Set to true if it does.

$_special_entity2str

$_special_entity2str

Conversion table that maps the most common special entities to their raw character values.

$factory

$factory

Methods

__construct()

__construct() : mixed

Returns

mixed —

tokenizeHTML()

tokenizeHTML(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : \HTMLPurifier_Token[]

Lexes an HTML string into tokens.

Parameters

string $html
\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

\HTMLPurifier_Token[] —

muteErrorHandler()

muteErrorHandler(int  $errno, string  $errstr) : mixed

An error handler that mutes all errors.

Parameters

int $errno
string $errstr

Returns

mixed —
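A handler of this kind is usually installed just before a call that emits warnings on malformed input and removed right after. The following is a minimal sketch of that pattern using only core PHP and the DOM extension; `muteAll` and `loadQuietly` are hypothetical names chosen for illustration, not part of this class.

```php
<?php
// Hypothetical no-op handler: returning true tells PHP the error was handled.
function muteAll(int $errno, string $errstr): bool
{
    return true;
}

// Hypothetical helper: parse possibly malformed HTML without surfacing warnings.
function loadQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('muteAll');   // silence parser warnings
    try {
        $doc->loadHTML($html);
    } finally {
        restore_error_handler();    // always put the previous handler back
    }
    return $doc;
}

$doc = loadQuietly('<html><body><p>Unclosed <b>markup</p></body></html>');
```

Restoring the previous handler in a `finally` block keeps the muting scoped to the one call, so unrelated errors elsewhere are still reported.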

callbackUndoCommentSubst()

callbackUndoCommentSubst(array  $matches) : string

Callback function that undoes the escaping of stray angle brackets in comments.

Parameters

array $matches

Returns

string —

callbackArmorCommentEntities()

callbackArmorCommentEntities(array  $matches) : string

Callback function that converts ampersands in comments to entities so that callbackUndoCommentSubst() does not clobber them.

Parameters

array $matches

Returns

string —

create()

create(\HTMLPurifier_Config  $config) : \HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters

\HTMLPurifier_Config $config

Throws

\HTMLPurifier_Exception

Returns

\HTMLPurifier_Lexer —

parseText()

parseText(mixed  $string, mixed  $config) : mixed

Parameters

mixed $string
mixed $config

Returns

mixed —

parseAttr()

parseAttr(mixed  $string, mixed  $config) : mixed

Parameters

mixed $string
mixed $config

Returns

mixed —

parseData()

parseData(string  $string, mixed  $is_attr, mixed  $config) : string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into the correct ones.

Parameters

string $string

String character data to be parsed.

mixed $is_attr
mixed $config

Returns

string —

Parsed character data.
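The translation described above can be sketched as a table lookup. This is a minimal illustration, not the method's actual body: the `SPECIAL_ENTITIES` table and the `decodeSpecialEntities` helper are hypothetical stand-ins, and the real table is the `$_special_entity2str` property, whose contents are not listed in this reference.

```php
<?php
// Hypothetical table for illustration only.
const SPECIAL_ENTITIES = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
];

// Hypothetical helper: swap each special entity for its raw character.
function decodeSpecialEntities(string $data): string
{
    // Fast path: no ampersand means there is nothing to decode.
    if (strpos($data, '&') === false) {
        return $data;
    }
    // strtr() scans left to right and never rescans its own output,
    // so "&amp;lt;" becomes "&lt;" rather than "<".
    return strtr($data, SPECIAL_ENTITIES);
}

echo decodeSpecialEntities('1 &lt; 2 &amp;&amp; 3 &gt; 2'), "\n";
```

Using `strtr()` with an array rather than chained `str_replace()` calls avoids double decoding, which matters for input that was escaped twice.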

normalize()

normalize(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : string

Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting the body content, and performing related cleanup.

Parameters

string $html

HTML.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

string —

extractBody()

extractBody(mixed  $html) : mixed

Takes a string of HTML (fragment or document) and returns the body content.

Parameters

mixed $html

Returns

mixed —
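One way to picture this step is a single regular-expression match. The sketch below is an assumption-laden illustration: `bodyContent` is a hypothetical helper, the pattern is not taken from the library, and returning a fragment unchanged when no body element is found is an assumed fallback.

```php
<?php
// Hypothetical helper, not the library's implementation.
function bodyContent(string $html): string
{
    // Capture everything between the opening and closing body tags;
    // "s" lets "." span newlines, "i" ignores tag case.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $html, $m)) {
        return $m[1];
    }
    // Assumed fallback: input without a body element is returned unchanged.
    return $html;
}

echo bodyContent('<html><body class="x"><p>Hi</p></body></html>'), "\n";
```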

tokenizeDOM()

tokenizeDOM(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens, mixed  $config) : \HTMLPurifier_Token

Iterative function that tokenizes a node, putting it into an accumulator.

"To iterate is human, to recurse divine." (L. Peter Deutsch)

Parameters

\DOMNode $node

DOMNode to be tokenized.

\HTMLPurifier_Token[] $tokens

Array-list of already tokenized tokens.

mixed $config

Returns

\HTMLPurifier_Token —

Token(s) for the node, appended to the previously passed tokens.
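The idea of walking a DOM tree iteratively while appending to an accumulator can be sketched with an explicit stack. This is only an illustration under stated assumptions: `tokenizeNode` is a hypothetical function, tokens are plain arrays rather than `HTMLPurifier_Token` objects, and the traversal strategy shown here may differ from the one the class uses.

```php
<?php
// Hypothetical sketch: tokens are plain arrays, not HTMLPurifier_Token objects.
function tokenizeNode(DOMNode $root, array &$tokens): void
{
    // Explicit stack of [node, closing?] pairs replaces recursion.
    $stack = [[$root, false]];
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $tokens[] = ['text', $node->nodeValue];
            continue;
        }
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // comments and other node types are skipped in this sketch
        }
        if (!$node->hasChildNodes()) {
            $tokens[] = ['empty', $node->nodeName];
            continue;
        }
        $tokens[] = ['start', $node->nodeName];
        // Schedule the end token first so it pops after all children.
        $stack[] = [$node, true];
        // Push children in reverse so they pop in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div>Hello <b>world</b><br/></div>');
$tokens = [];
tokenizeNode($doc->documentElement, $tokens);
```

Passing `$tokens` by reference mirrors the accumulator style described above: the caller owns the list and the function only appends to it.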

getTagName()

getTagName(\DOMNode  $node) : mixed

Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6

Parameters

\DOMNode $node

Returns

mixed —

getData()

getData(\DOMNode  $node) : mixed

Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6

Parameters

\DOMNode $node

Returns

mixed —

createStartNode()

createStartNode(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens, bool  $collect, mixed  $config) : bool

Parameters

\DOMNode $node

DOMNode to be tokenized.

\HTMLPurifier_Token[] $tokens

Array-list of already tokenized tokens.

bool $collect

Whether start and end tokens are collected. Set to false on the first recursion, because the node being handled there is the implicit DIV tag.

mixed $config

Returns

bool —

True if the token needs an end token.

createEndNode()

createEndNode(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens) : mixed

Parameters

\DOMNode $node
\HTMLPurifier_Token[] $tokens

Returns

mixed —

transformAttrToAssoc()

transformAttrToAssoc(\DOMNamedNodeMap  $node_map) : array

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters

\DOMNamedNodeMap $node_map

DOMNamedNodeMap of DOMAttr objects.

Returns

array —

Associative array of attributes.
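The conversion amounts to one loop over the node map. A minimal sketch using PHP's DOM extension follows; `attrsToAssoc` is a hypothetical name, and the real method may apply additional handling that is not described here.

```php
<?php
// Hypothetical helper mirroring the described conversion.
function attrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $attrs = [];
    foreach ($nodeMap as $attr) {
        // Each item is a DOMAttr; its name becomes the key, its value the value.
        $attrs[$attr->name] = $attr->value;
    }
    return $attrs;
}

$doc = new DOMDocument();
$doc->loadXML('<a href="/docs" title="Docs">x</a>');
$attrs = attrsToAssoc($doc->documentElement->attributes);
print_r($attrs);
```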

wrapHTML()

wrapHTML(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context, mixed  $use_div = true) : string

Wraps an HTML fragment in the necessary HTML.

Parameters

string $html
\HTMLPurifier_Config $config
\HTMLPurifier_Context $context
mixed $use_div

Returns

string —
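As a rough picture of what wrapping a fragment involves, the sketch below surrounds the input with a document skeleton and, optionally, a DIV. The exact markup the lexer emits is not shown in this reference, so the skeleton, the `wrapFragment` name, and the `$useDiv` behavior are all assumptions made for illustration.

```php
<?php
// Hypothetical helper; the skeleton below is an assumed, simplified document shell.
function wrapFragment(string $fragment, bool $useDiv = true): string
{
    // Optionally enclose the fragment in a single DIV.
    $body = $useDiv ? '<div>' . $fragment . '</div>' : $fragment;
    return '<!DOCTYPE html><html><head>'
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        . '</head><body>' . $body . '</body></html>';
}

echo wrapFragment('<p>Hello</p>'), "\n";
```

Declaring the character set in the skeleton gives the DOM parser an explicit encoding instead of leaving it to guess.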

escapeCDATA()

escapeCDATA(string  $string) : string

Translates CDATA sections into regular sections (through escaping).

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

escapeCommentedCDATA()

escapeCommentedCDATA(string  $string) : string

Special CDATA case that is especially convoluted for <script>.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

removeIEConditional()

removeIEConditional(string  $string) : string

Removes special Internet Explorer conditional comments.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with conditional comments removed.
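Stripping conditional comments can be sketched as one regular-expression replacement. The pattern and the `stripIEConditionals` name below are assumptions for illustration and are not copied from the library.

```php
<?php
// Hypothetical helper; the pattern is an assumption, not the library's own.
function stripIEConditionals(string $html): string
{
    // Matches <!--[if ...]> ... <![endif]--> lazily, across newlines, ignoring case.
    // preg_replace() returns null on failure, in which case the input is kept.
    return preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $html) ?? $html;
}

echo stripIEConditionals('a<!--[if IE 6]><p>old</p><![endif]-->b'), "\n";
```

Ordinary comments such as `<!-- note -->` do not match the pattern and pass through untouched.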

CDATACallback()

CDATACallback(array  $matches) : string

Callback function for escapeCDATA() that does the work.

Parameters

array $matches

PCRE matches array, with index 0 holding the entire match and index 1 the contents of the CDATA section.

Returns

string —

Escaped internals of the CDATA section.
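The relationship between an escaping routine and its callback can be sketched with `preg_replace_callback()`. Both function names and the pattern are hypothetical; the sketch only assumes, as stated above, that index 1 of the matches array holds the inside of the CDATA section.

```php
<?php
// Hypothetical callback: escape the inside of one CDATA section.
function cdataToText(array $matches): string
{
    // $matches[0] is the whole section, $matches[1] its contents.
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Hypothetical driver; the pattern is an assumption for illustration.
function escapeCdataSections(string $html): string
{
    // "s" lets the lazy match span newlines inside a section.
    return preg_replace_callback('/<!\[CDATA\[(.+?)\]\]>/s', 'cdataToText', $html) ?? $html;
}

echo escapeCdataSections('<p><![CDATA[a < b & c]]></p>'), "\n";
```

After this pass the section's contents are ordinary escaped text, so a parser that does not understand CDATA sees the same characters as literal data.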