\HTMLPurifier_Lexer_DOMLex

Parser that uses PHP 5's DOM extension (part of the core).

In PHP 5, the DOM XML extension was revamped into DOM and added to the core. It gives us a forgiving HTML parser, which we use to transform the HTML into a DOM and then into tokens. It is blazingly fast (for large documents, it performs twenty times faster than HTMLPurifier_Lexer_DirectLex) and is the default choice for PHP 5.

Summary

Public methods: create(), __construct(), parseText(), parseAttr(), parseData(), tokenizeHTML(), normalize(), extractBody(), muteErrorHandler(), callbackUndoCommentSubst(), callbackArmorCommentEntities()
Public properties: $tracksLineNumbers
Public constants: none found

Protected methods: escapeCDATA(), escapeCommentedCDATA(), removeIEConditional(), CDATACallback(), tokenizeDOM(), getTagName(), getData(), createStartNode(), createEndNode(), transformAttrToAssoc(), wrapHTML()
Protected properties: $_special_entity2str
Protected constants: N/A

Private methods: none found
Private properties: $factory
Private constants: N/A

Properties

$tracksLineNumbers

Whether or not this lexer implements line-number/column-number tracking.

If it does, set to true.

$_special_entity2str

Conversion table from the most common special entities to their raw character values.

$factory

Methods

create()

create(\HTMLPurifier_Config $config) : \HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default, HTMLPurifier_Lexer_DOMLex is returned. There are a few exceptions involving special features that only HTMLPurifier_Lexer_DirectLex implements.

Parameters

\HTMLPurifier_Config $config

Throws

\HTMLPurifier_Exception

Returns

\HTMLPurifier_Lexer
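The selection rule described above can be pictured with a small standalone sketch. It assumes, for illustration only, that line-number tracking is one of the DirectLex-only features and that DOMLex is usable whenever PHP's DOMDocument class exists. chooseLexerClass() and its parameters are hypothetical; they do not reflect how create() actually reads its configuration.

```php
<?php
// Hypothetical sketch: pick a lexer class name. The parameters are
// illustrative assumptions, not HTML Purifier configuration directives.
function chooseLexerClass(bool $needsLineNumbers, ?bool $domAvailable = null): string
{
    // Detect PHP's DOM extension when availability is not passed explicitly.
    $domAvailable ??= class_exists('DOMDocument');

    // Assumed here: line/column tracking is a DirectLex-only feature.
    if ($needsLineNumbers || !$domAvailable) {
        return 'HTMLPurifier_Lexer_DirectLex';
    }
    // Otherwise prefer the faster DOM-based lexer.
    return 'HTMLPurifier_Lexer_DOMLex';
}

echo chooseLexerClass(false), PHP_EOL;
```

Passing the DOM availability as an optional argument keeps the decision logic testable without depending on which extensions the current PHP build has loaded.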

__construct()

__construct() 

parseText()

parseText($string, $config)

Parameters

$string
$config

parseAttr()

parseAttr($string, $config)

Parameters

$string
$config

parseData()

parseData(string $string, $is_attr, $config) : string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into the characters themselves.

Parameters

string $string

String character data to be parsed.

$is_attr
$config

Returns

string —

Parsed character data.
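A minimal sketch of special-entity decoding, assuming a lookup table similar in spirit to $_special_entity2str (the exact entries below are an assumption). decodeSpecialEntities() is a hypothetical helper, not the library's parseData(), and it ignores the $is_attr and $config arguments.

```php
<?php
// Assumed table of special entities and their raw characters.
const SPECIAL_ENTITY_TO_STR = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
];

// Hypothetical helper: decode only the special entities, leaving all
// other entities (such as &copy;) untouched.
function decodeSpecialEntities(string $string): string
{
    // Fast path: without an ampersand there is nothing to decode.
    if (strpos($string, '&') === false) {
        return $string;
    }
    // strtr() makes a single pass and never rescans replaced text.
    return strtr($string, SPECIAL_ENTITY_TO_STR);
}
```

Because strtr() never rescans text it has already replaced, a double-escaped &amp;lt; decodes only one level, to &lt;.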

normalize()

normalize(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context) : string

Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting the relevant parts (such as the document body), and performing other cleanup.

Parameters

string $html

HTML.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

string
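As an illustration of a single normalization step, the sketch below converts every line ending to "\n". Treating newline normalization as one of normalize()'s steps is an assumption here, and normalizeNewlines() is a hypothetical helper.

```php
<?php
// Hypothetical helper: convert Windows (\r\n) and old Mac (\r) line
// endings to Unix (\n) line endings.
function normalizeNewlines(string $html): string
{
    // str_replace() applies the searches in order, so "\r\n" is handled
    // before the lone "\r"; otherwise "\r\n" would turn into "\n\n".
    return str_replace(["\r\n", "\r"], "\n", $html);
}
```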

extractBody()

extractBody($html)

Takes a string of HTML (fragment or document) and returns the content of its body element, or the input unchanged if there is no body.

Parameters

$html
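A naive sketch of body extraction using a hypothetical extractBodySketch() helper: if the input contains a body element, its contents are returned; otherwise the input is assumed to be a fragment and returned unchanged. A regular expression like this one is fooled by a literal </body> inside comments or attribute values, so treat it as an illustration only.

```php
<?php
// Hypothetical helper: return the contents of <body> for a full document,
// or the input unchanged when it is already a fragment.
function extractBodySketch(string $html): string
{
    // "i": tag names are case-insensitive; "s": "." also matches newlines.
    // The greedy (.*) runs to the last </body> in the string.
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $matches) === 1) {
        return $matches[1];
    }
    return $html;
}
```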

muteErrorHandler()

muteErrorHandler(integer $errno, string $errstr)

An error handler that mutes all errors.

Parameters

integer $errno
string $errstr
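The sketch below shows the general pattern of muting errors while DOMDocument::loadHTML() parses malformed markup. muteErrors() and loadHtmlQuietly() are hypothetical names; the handler's parameters mirror the signature documented above.

```php
<?php
// Hypothetical stand-in for muteErrorHandler(): returning true tells PHP
// the error has been handled, so nothing is reported.
function muteErrors(int $errno, string $errstr): bool
{
    return true;
}

// Parse forgiving HTML without letting libxml's warnings surface.
function loadHtmlQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();
    set_error_handler('muteErrors');
    try {
        // The unknown <foo> tag below would normally raise a warning.
        $doc->loadHTML($html);
    } finally {
        // Always restore the previous handler, even if parsing throws.
        restore_error_handler();
    }
    return $doc;
}

$doc = loadHtmlQuietly('<p>Unclosed <b>bold <foo>x</foo>');
```

PHP's libxml_use_internal_errors(true) is an alternative that collects libxml errors for later inspection instead of raising PHP warnings.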

callbackUndoCommentSubst()

callbackUndoCommentSubst(array $matches) : string

Callback function for undoing the escaping of stray angle brackets in comments.

Parameters

array $matches

Returns

string

callbackArmorCommentEntities()

callbackArmorCommentEntities(array $matches) : string

Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst() doesn't clobber them.

Parameters

array $matches

Returns

string

escapeCDATA()

escapeCDATA(string $string) : string

Translates CDATA sections into regular sections (through escaping).

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

escapeCommentedCDATA()

escapeCommentedCDATA(string $string) : string

Handles an especially convoluted CDATA case found in <script> elements, where the CDATA markers are wrapped in comments.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

removeIEConditional()

removeIEConditional(string $string) : string

Removes Internet Explorer conditional comments.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with conditional comments removed.
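A minimal sketch of conditional-comment removal with a hypothetical removeIEConditionalSketch() helper. The regular expression is an assumption about what such a pattern might look like: it matches from <!--[if ...]> to the nearest <![endif]-->.

```php
<?php
// Hypothetical helper: strip IE conditional comments such as
// <!--[if IE]> ... <![endif]--> from an HTML string.
function removeIEConditionalSketch(string $string): string
{
    // The lazy ".*?" stops at the first <![endif]--> after the opener;
    // "s" lets it span newlines, "i" ignores case.
    return preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $string);
}
```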

CDATACallback()

CDATACallback(array $matches) : string

Callback function for escapeCDATA() that does the work.

Parameters

array $matches

PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.

Returns

string —

Escaped internals of the CDATA section.
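escapeCDATA() and CDATACallback() fit together as a preg_replace_callback() pattern. The sketch below assumes the callback escapes the section's contents with htmlspecialchars(); escapeCdataSketch(), cdataCallbackSketch() and the exact pattern are hypothetical.

```php
<?php
// Hypothetical callback: index 0 holds the whole CDATA section and index 1
// its contents; return the contents with HTML special characters escaped.
function cdataCallbackSketch(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Hypothetical driver: replace every CDATA section with escaped text so
// that the HTML parser sees ordinary character data.
function escapeCdataSketch(string $string): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s', // lazy match up to the first ]]>
        'cdataCallbackSketch',
        $string
    );
}
```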

tokenizeDOM()

tokenizeDOM(\DOMNode $node, array<mixed,\HTMLPurifier_Token> $tokens, $config) : \HTMLPurifier_Token

Iterative function that tokenizes a node, putting it into an accumulator.

To iterate is human, to recurse divine - L. Peter Deutsch

Parameters

\DOMNode $node

DOMNode to be tokenized.

array<mixed,\HTMLPurifier_Token> $tokens

Array-list of already tokenized tokens.

$config

Returns

\HTMLPurifier_Token

Tokens of the node, appended to the previously passed tokens.
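The iterative approach can be sketched with an explicit stack instead of recursion. This is a generic stack-based traversal, not necessarily the exact strategy tokenizeDOM() uses, and it emits plain [type, value] arrays rather than HTMLPurifier_Token objects; tokenizeIteratively() is a hypothetical name.

```php
<?php
// Hypothetical sketch: tokenize a DOM subtree without recursion.
function tokenizeIteratively(DOMNode $root): array
{
    $tokens = [];
    // Each stack entry is [node, isClosing]. An element's closing marker is
    // pushed before its children, so it is popped after them.
    $stack = [[$root, false]];
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            $tokens[] = ['end', $node->nodeName];
            continue;
        }
        if ($node instanceof DOMText) {
            $tokens[] = ['text', $node->data];
            continue;
        }
        if ($node instanceof DOMElement) {
            $tokens[] = ['start', $node->nodeName];
            $stack[] = [$node, true];
        }
        // Other node types (e.g. the document node) emit no token of their
        // own. Children are pushed in reverse so the first child pops first.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
    return $tokens;
}

// Build <p>Hi <b>there</b></p> by hand to avoid parser differences.
$doc = new DOMDocument();
$p = $doc->appendChild($doc->createElement('p'));
$p->appendChild($doc->createTextNode('Hi '));
$b = $p->appendChild($doc->createElement('b'));
$b->appendChild($doc->createTextNode('there'));
$tokens = tokenizeIteratively($p);
// start p, text "Hi ", start b, text "there", end b, end p
```

An explicit stack keeps memory use on the heap, so deeply nested documents cannot exhaust PHP's call stack the way a naive recursive walk might.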

getTagName()

getTagName(\DOMNode $node)

Portably retrieves the tag name of a node; handles older versions of libxml such as 2.7.6.

Parameters

\DOMNode $node

getData()

getData(\DOMNode $node)

Portably retrieves the data of a node; handles older versions of libxml such as 2.7.6.

Parameters

\DOMNode $node

createStartNode()

createStartNode(\DOMNode $node, array<mixed,\HTMLPurifier_Token> $tokens, boolean $collect, $config) : boolean

Parameters

\DOMNode $node

DOMNode to be tokenized.

array<mixed,\HTMLPurifier_Token> $tokens

Array-list of already tokenized tokens.

boolean $collect

Whether start and end tokens are collected. Set to false on the first recursion, because that node is the implicit DIV wrapper.

$config

Returns

boolean —

Whether the token needs an end token.

createEndNode()

createEndNode(\DOMNode $node, array<mixed,\HTMLPurifier_Token> $tokens)

Parameters

\DOMNode $node
array<mixed,\HTMLPurifier_Token> $tokens

transformAttrToAssoc()

transformAttrToAssoc(\DOMNamedNodeMap $node_map) : array

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters

\DOMNamedNodeMap $node_map

DOMNamedNodeMap of DOMAttr objects.

Returns

array —

Associative array of attributes.
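A minimal sketch of the same conversion with a hypothetical attrsToAssoc() helper. It relies on DOMNamedNodeMap being traversable with foreach.

```php
<?php
// Hypothetical helper: convert a DOMNamedNodeMap of DOMAttr objects into
// a name => value array.
function attrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $attrs = [];
    foreach ($nodeMap as $attr) {
        // Each item is a DOMAttr; ->value holds the attribute's text value.
        $attrs[$attr->name] = $attr->value;
    }
    return $attrs;
}

$doc = new DOMDocument();
$el = $doc->createElement('a');
$el->setAttribute('href', 'x.html');
$el->setAttribute('title', 'A & B');
$attrs = attrsToAssoc($el->attributes);
```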

wrapHTML()

wrapHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context, $use_div = true) : string

Wraps an HTML fragment in the markup needed to parse it as a complete document.

Parameters

string $html
\HTMLPurifier_Config $config
\HTMLPurifier_Context $context
$use_div

Returns

string
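A sketch of fragment wrapping using a hypothetical wrapFragment() helper. It assumes the wrapper needs a charset declaration (a common way to stop DOMDocument::loadHTML() from assuming ISO-8859-1) and, when $useDiv is true, a div container, in the spirit of the implicit DIV mentioned under createStartNode(). The exact markup the library emits may differ.

```php
<?php
// Hypothetical helper: wrap a fragment in a minimal UTF-8 document,
// optionally inside a <div> so it can be located again after parsing.
function wrapFragment(string $html, bool $useDiv = true): string
{
    // The http-equiv meta tag declares the input encoding to libxml.
    $head = '<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />';
    $body = $useDiv ? '<div>' . $html . '</div>' : $html;
    return '<html><head>' . $head . '</head><body>' . $body . '</body></html>';
}

$doc = new DOMDocument();
$doc->loadHTML(wrapFragment('<p>café</p>'));
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```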