\HTMLPurifier_Lexer_PH5P

Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.

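The source file for this lexer also declares classes in the HTML5 pseudo-namespace (class names prefixed with HTML5), which may conflict with other code that uses the same names.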

Summary

Public methods: __construct(), tokenizeHTML(), muteErrorHandler(), callbackUndoCommentSubst(), callbackArmorCommentEntities(), create(), parseText(), parseAttr(), parseData(), normalize(), extractBody()
Protected methods: tokenizeDOM(), getTagName(), getData(), createStartNode(), createEndNode(), transformAttrToAssoc(), wrapHTML(), escapeCDATA(), escapeCommentedCDATA(), removeIEConditional(), CDATACallback()
Private methods: none

Public properties: $tracksLineNumbers
Protected properties: $_special_entity2str
Private properties: $factory

Constants: none

File: serve/vendor/ezyang/htmlpurifier/library/HTMLPurifier/Lexer/PH5P.php
Package: Application
Class hierarchy: HTMLPurifier_Lexer > HTMLPurifier_Lexer_DOMLex > \HTMLPurifier_Lexer_PH5P

Tags

note	Recent changes to PHP's DOM extension have resulted in some fatal error conditions with the original version of PH5P. Until those are resolved, this lexer falls back to DirectLex if DOM throws an exception.

Properties

$tracksLineNumbers

Whether or not this lexer implements line-number/column-number tracking.

If it does, set to true.

$_special_entity2str

Most common entity to raw value conversion table for special entities.

Inherited from: \HTMLPurifier_Lexer

Tags

type	array

$factory

Inherited from: \HTMLPurifier_Lexer_DOMLex

Tags

type	HTMLPurifier_TokenFactory

Methods

__construct()

__construct() : mixed

Returns

mixed

tokenizeHTML()

tokenizeHTML(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : \HTMLPurifier_Token[]

Lexes an HTML string into tokens.

Parameters

string	$html
\HTMLPurifier_Config	$config
\HTMLPurifier_Context	$context

Returns

\HTMLPurifier_Token[]

muteErrorHandler()

muteErrorHandler(int  $errno, string  $errstr) : mixed

An error handler that mutes all errors.

Parameters

int	$errno
string	$errstr

Returns

mixed
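A handler of this kind is usually installed around a call that is expected to emit warnings and removed again afterwards. The following is a minimal sketch of that pattern using PHP's set_error_handler() and restore_error_handler(); the names mute_all_errors() and run_quietly() are hypothetical and are not part of HTML Purifier.

```php
<?php
// Hypothetical stand-in for an error handler that swallows every error.
function mute_all_errors(int $errno, string $errstr): bool
{
    return true; // true tells PHP the error has been handled
}

// Hypothetical helper: run a callable with errors muted, then restore
// whichever handler was active before.
function run_quietly(callable $fn)
{
    set_error_handler('mute_all_errors');
    try {
        return $fn();
    } finally {
        restore_error_handler();
    }
}

$result = run_quietly(function () {
    trigger_error('noisy warning', E_USER_WARNING); // swallowed by the handler
    return 'done';
});
```

Restoring the previous handler in a finally block keeps the muting scoped to the one call, even if that call throws.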

callbackUndoCommentSubst()

callbackUndoCommentSubst(array  $matches) : string

Callback function for undoing the escaping of stray angle brackets in comments.

Parameters

array	$matches

Returns

string

callbackArmorCommentEntities()

callbackArmorCommentEntities(array  $matches) : string

Callback function that converts ampersands in comments to entities so that callbackUndoCommentSubst() doesn't clobber them.

Parameters

array	$matches

Returns

string

create()

create(\HTMLPurifier_Config  $config) : \HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters

\HTMLPurifier_Config	$config

Throws

\HTMLPurifier_Exception

Returns

\HTMLPurifier_Lexer

static

Inherited from: \HTMLPurifier_Lexer

Tags

note	The behavior of this class has changed: rather than accepting a prototype object, it now accepts a configuration object. To specify your own prototype, set %Core.LexerImpl to it. This change in behavior de-singletonizes the lexer object.
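As an illustration of the configuration-based selection described in the note, the sketch below asks create() for this lexer through %Core.LexerImpl. It assumes HTML Purifier is installed and autoloadable (for example via Composer) and that the string 'PH5P' is an accepted value for that directive; the wrapper function make_ph5p_lexer() is hypothetical.

```php
<?php
// Hypothetical wrapper; the calls inside it belong to HTML Purifier.
function make_ph5p_lexer()
{
    if (!class_exists('HTMLPurifier_Config')) {
        return null; // library not available in this environment
    }
    $config = HTMLPurifier_Config::createDefault();
    $config->set('Core.LexerImpl', 'PH5P'); // request the experimental lexer
    return HTMLPurifier_Lexer::create($config);
}

$lexer = make_ph5p_lexer();
```

Because the choice lives in the configuration object, two configurations in the same process can use different lexers.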

parseText()

parseText(mixed  $string, mixed  $config) : mixed

Parameters

mixed	$string
mixed	$config

Returns

mixed

parseAttr()

parseAttr(mixed  $string, mixed  $config) : mixed

Parameters

mixed	$string
mixed	$config

Returns

mixed

parseData()

parseData(string  $string, mixed  $is_attr, mixed  $config) : string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into the correct ones.

Parameters

string	$string	String character data to be parsed.
mixed	$is_attr
mixed	$config

Returns

string	Parsed character data.
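The core idea, translating a small set of escaped special characters through a lookup table, can be sketched as follows. This is a simplified illustration only: the table contents and the function name decode_special_entities() are assumptions, and the $is_attr and $config parameters of the real method are not modelled here.

```php
<?php
// Hypothetical conversion table; the real $_special_entity2str may differ.
const SPECIAL_ENTITIES = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
];

function decode_special_entities(string $text): string
{
    if (strpos($text, '&') === false) {
        return $text; // fast path: nothing that could be an entity
    }
    // strtr() replaces each match once and never rescans its own output,
    // so '&amp;amp;' becomes '&amp;' rather than '&'.
    return strtr($text, SPECIAL_ENTITIES);
}

$decoded = decode_special_entities('a &lt; b &amp;amp; c');
```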

normalize()

normalize(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : string

Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting the body, and applying related cleanup such as escaping CDATA sections and removing Internet Explorer conditional comments.

Parameters

string	$html	HTML.
\HTMLPurifier_Config	$config
\HTMLPurifier_Context	$context

Returns

string

Inherited from: \HTMLPurifier_Lexer

Tags

todo	Consider making protected

extractBody()

extractBody(mixed  $html) : mixed

Takes a string of HTML (fragment or document) and returns the content of its body element; a fragment without a body element is returned as is.

Parameters

mixed	$html

Returns

mixed

Inherited from: \HTMLPurifier_Lexer

Tags

todo	Consider making protected
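A rough sketch of body extraction with a regular expression is shown below. The pattern and the function name extract_body_sketch() are assumptions made for illustration; the library's own matching may be stricter.

```php
<?php
// Hypothetical helper: return what lies between <body ...> and </body>,
// or the input unchanged when no body element is found.
function extract_body_sketch(string $html): string
{
    if (preg_match('#<body[^>]*>(.*)</body>#is', $html, $m)) {
        return $m[1];
    }
    return $html;
}

$fromDocument = extract_body_sketch('<html><body class="x"><p>Hi</p></body></html>');
$fromFragment = extract_body_sketch('<p>Hi</p>');
```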

tokenizeDOM()

tokenizeDOM(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens, mixed  $config) : \HTMLPurifier_Token

Iterative function that tokenizes a node, putting it into an accumulator.

To iterate is human, to recurse divine - L. Peter Deutsch

Parameters

\DOMNode	$node	DOMNode to be tokenized.
\HTMLPurifier_Token[]	$tokens	Array-list of already tokenized tokens.
mixed	$config

Returns

\HTMLPurifier_Token	Token of the node, appended to the previously passed tokens.
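To show what an iterative (non-recursive) walk with an accumulator can look like, here is a minimal sketch that flattens a DOM subtree into start, text, and end events using an explicit stack. It requires PHP's DOM extension; flatten_dom() and the string event format are hypothetical and stand in for the token objects the lexer produces.

```php
<?php
// Hypothetical helper: depth-first walk without recursion.
function flatten_dom(DOMNode $root): array
{
    $events = [];
    // Each stack entry is [node, closing]; closing = true means "emit the end event".
    $stack = [[$root, false]];
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($closing) {
            $events[] = '</' . $node->nodeName . '>';
            continue;
        }
        if ($node->nodeType === XML_TEXT_NODE) {
            $events[] = 'text:' . $node->nodeValue;
            continue;
        }
        if ($node->nodeType !== XML_ELEMENT_NODE) {
            continue; // comments and other node types are ignored in this sketch
        }
        $events[] = '<' . $node->nodeName . '>';
        $stack[] = [$node, true]; // revisit this node after its children
        // Push children in reverse so they are popped in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
    return $events;
}

$doc = new DOMDocument();
$doc->loadXML('<div><b>hi</b> there</div>');
$events = flatten_dom($doc->documentElement);
```

An explicit stack avoids deep call stacks on heavily nested input, which is one practical reason to prefer iteration here.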

getTagName()

getTagName(\DOMNode  $node) : mixed

Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6

Parameters

\DOMNode	$node

Returns

mixed

getData()

getData(\DOMNode  $node) : mixed

Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6

Parameters

\DOMNode	$node

Returns

mixed

createStartNode()

createStartNode(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens, bool  $collect, mixed  $config) : bool

Parameters

\DOMNode	$node	DOMNode to be tokenized.
\HTMLPurifier_Token[]	$tokens	Array-list of already tokenized tokens.
bool	$collect	Whether or not the start and close tags are collected. Set to false on the first call because that node is the implicit DIV wrapper.
mixed	$config

Returns

bool	Whether the token needs an end token.

Inherited from: \HTMLPurifier_Lexer_DOMLex

Tags

todo	data and tagName properties don't seem to exist in DOMNode?

createEndNode()

createEndNode(\DOMNode  $node, \HTMLPurifier_Token[]  $tokens) : mixed

Parameters

\DOMNode	$node
\HTMLPurifier_Token[]	$tokens

Returns

mixed

transformAttrToAssoc()

transformAttrToAssoc(\DOMNamedNodeMap  $node_map) : array

Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.

Parameters

\DOMNamedNodeMap	$node_map	DOMNamedNodeMap of DOMAttr objects.

Returns

array	Associative array of attributes.
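The conversion can be sketched in a few lines, since a DOMNamedNodeMap is traversable and each DOMAttr exposes name and value. The function name attrs_to_assoc() is hypothetical, and the example requires PHP's DOM extension.

```php
<?php
// Hypothetical helper mirroring the described conversion.
function attrs_to_assoc(DOMNamedNodeMap $nodeMap): array
{
    $assoc = [];
    foreach ($nodeMap as $attr) {
        $assoc[$attr->name] = $attr->value; // attribute name => attribute value
    }
    return $assoc;
}

$doc = new DOMDocument();
$doc->loadXML('<a href="/x" title="T">link</a>');
$assoc = attrs_to_assoc($doc->documentElement->attributes);
```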

wrapHTML()

wrapHTML(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context, mixed  $use_div = true) : string

Wraps an HTML fragment in the necessary HTML

Parameters

string	$html
\HTMLPurifier_Config	$config
\HTMLPurifier_Context	$context
mixed	$use_div

Returns

string
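The exact markup the library emits is not shown on this page, so the following is only an assumed sketch of the idea: surround the fragment with a document skeleton, optionally inside a DIV, so that a document-oriented parser can handle it. The function name wrap_fragment() and the skeleton itself are hypothetical.

```php
<?php
// Hypothetical wrapper; the real method may also emit a doctype and
// consult the configuration and context it receives.
function wrap_fragment(string $fragment, bool $useDiv = true): string
{
    $out  = '<html><head>';
    $out .= '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />';
    $out .= '</head><body>';
    $out .= $useDiv ? '<div>' . $fragment . '</div>' : $fragment;
    $out .= '</body></html>';
    return $out;
}

$wrapped = wrap_fragment('<p>x</p>');
$bare    = wrap_fragment('<p>x</p>', false);
```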

escapeCDATA()

escapeCDATA(string  $string) : string

Translates CDATA sections into regular sections (through escaping).

Parameters

string	$string	HTML string to process.

Returns

string	HTML with CDATA sections escaped.
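A minimal sketch of this technique pairs preg_replace_callback() with a callback that escapes the captured inside of each CDATA section. The pattern and the function name escape_cdata_sections() are assumptions for illustration rather than the library's verified source.

```php
<?php
// Hypothetical helper: replace each <![CDATA[...]]> with its escaped contents.
function escape_cdata_sections(string $html): string
{
    $result = preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        function (array $matches): string {
            // $matches[0] is the whole section, $matches[1] its inside.
            return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );
    return $result ?? $html; // fall back to the input if PCRE reports an error
}

$escaped = escape_cdata_sections('a<![CDATA[<b> & c]]>z');
```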

escapeCommentedCDATA()

escapeCommentedCDATA(string  $string) : string

Special CDATA case that is especially convoluted for <script>

Parameters

string	$string	HTML string to process.

Returns

string	HTML with CDATA sections escaped.

removeIEConditional()

removeIEConditional(string  $string) : string

Special Internet Explorer conditional comments should be removed.

Parameters

string	$string	HTML string to process.

Returns

string	HTML with conditional comments removed.
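Removal of conditional comments can be sketched with a single preg_replace() call. The pattern below is an assumption chosen for illustration, as is the function name strip_ie_conditionals().

```php
<?php
// Hypothetical helper: drop <!--[if ...]> ... <![endif]--> blocks entirely.
function strip_ie_conditionals(string $html): string
{
    $result = preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $html);
    return $result ?? $html; // fall back to the input if PCRE reports an error
}

$stripped = strip_ie_conditionals('a<!--[if IE]><p>x</p><![endif]-->b');
```

The lazy quantifier keeps two separate conditional blocks from being merged into one match, so ordinary content between them survives.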

CDATACallback()

CDATACallback(array  $matches) : string

Callback function for escapeCDATA() that does the work.

Parameters

array	$matches	PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.

Returns

string	Escaped internals of the CDATA section.

static

Inherited from: \HTMLPurifier_Lexer

Tags

warning	Though this is public in order to let the callback happen, calling it directly is not recommended.