\HTMLPurifier_Lexer_DirectLex

Our in-house implementation of a parser.

A pure PHP parser, DirectLex has absolutely no dependencies, making it a reasonably good default for PHP 4. Written with efficiency in mind, it can be four times faster than HTMLPurifier_Lexer_PEARSax3, although it remains considerably slower than HTMLPurifier_Lexer_DOMLex.

Summary

Public

Methods: create(), __construct(), parseText(), parseAttr(), parseData(), tokenizeHTML(), normalize(), extractBody(), parseAttributeString()
Properties: $tracksLineNumbers
Constants: No constants found

Protected

Methods: escapeCDATA(), escapeCommentedCDATA(), removeIEConditional(), CDATACallback(), scriptCallback(), substrCount()
Properties: $_special_entity2str, $_whitespace
Constants: N/A

Private

Methods: No private methods found
Properties: No private properties found
Constants: N/A

Properties

$tracksLineNumbers

$tracksLineNumbers

Whether or not this lexer implements line-number/column-number tracking.

$_special_entity2str

$_special_entity2str

Most common entity to raw value conversion table for special entities.

$_whitespace

$_whitespace

Whitespace characters for str(c)spn.

Methods

create()

create(\HTMLPurifier_Config  $config) : \HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters

\HTMLPurifier_Config $config

Throws

\HTMLPurifier_Exception

Returns

\HTMLPurifier_Lexer —
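
As a rough picture of the selection rule described above, the following self-contained sketch picks a lexer name from a few inputs. The function sketchPickLexerName() and its parameters are hypothetical and not part of the library. The assumption that line-number tracking is one of the DirectLex-only features is inferred from the $tracksLineNumbers property, and the real create() reads its inputs from the \HTMLPurifier_Config object rather than from plain arguments.

```php
<?php
/**
 * Hypothetical sketch of a lexer selection rule; not the library's code.
 *
 * @param string|null $requested     Explicitly requested lexer name, or null to auto-detect.
 * @param bool        $needsTracking Whether line/column tracking is needed (assumed DirectLex-only).
 * @param bool        $domAvailable  Whether a DOM-based lexer could be used at all.
 * @return string Name of the lexer the sketch would pick.
 */
function sketchPickLexerName(?string $requested, bool $needsTracking, bool $domAvailable): string
{
    if ($requested !== null) {
        // An explicit choice wins over auto-detection.
        return $requested;
    }
    if ($needsTracking) {
        // Assumed exception: only DirectLex offers this feature.
        return 'DirectLex';
    }
    // Default case: prefer DOMLex when it can be used.
    return $domAvailable ? 'DOMLex' : 'DirectLex';
}
```

The sketch leaves out the \HTMLPurifier_Exception path listed under Throws, since the source does not state when it is raised.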

__construct()

__construct() : mixed

Returns

mixed —

parseText()

parseText(mixed  $string, mixed  $config) : mixed

Parameters

mixed $string
mixed $config

Returns

mixed —

parseAttr()

parseAttr(mixed  $string, mixed  $config) : mixed

Parameters

mixed $string
mixed $config

Returns

mixed —

parseData()

parseData(string  $string, mixed  $is_attr, mixed  $config) : string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into the corresponding raw characters.

Parameters

string $string

String character data to be parsed.

mixed $is_attr
mixed $config

Returns

string —

Parsed character data.
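
To make the translation concrete, here is a minimal sketch that replaces a handful of special entities using a lookup table, in the spirit of $_special_entity2str. The function name sketchParseData() and the exact table contents are assumptions for illustration. The sketch ignores the $is_attr and $config parameters and leaves every other entity untouched.

```php
<?php
/**
 * Hypothetical sketch: decode only the special entities via a lookup table.
 */
function sketchParseData(string $string): string
{
    // Assumed table; the real conversion table lives in $_special_entity2str.
    $table = [
        '&quot;' => '"',
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
    ];

    // Fast path: nothing to translate without an ampersand.
    if (strpos($string, '&') === false) {
        return $string;
    }

    // strtr() does not rescan replaced text, so "&amp;lt;" becomes "&lt;"
    // and is not decoded a second time.
    return strtr($string, $table);
}
```

A table lookup through strtr() keeps the common case cheap and avoids double decoding, which a chain of separate str_replace() calls would not guarantee.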

tokenizeHTML()

tokenizeHTML(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : array|\HTMLPurifier_Token[]

Lexes an HTML string into tokens.

Parameters

string $html
\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

array|\HTMLPurifier_Token[] —
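
The idea of lexing a string into tokens can be illustrated with a deliberately simplified sketch. It returns plain arrays tagged 'start', 'end', 'empty', or 'text' instead of \HTMLPurifier_Token objects. Those labels and the function sketchTokenize() are hypothetical, and comments, attributes, and malformed markup are not handled the way a real lexer would handle them.

```php
<?php
/**
 * Hypothetical sketch: split HTML into [kind, value] pairs.
 * Attributes are dropped and comments are not recognized.
 */
function sketchTokenize(string $html): array
{
    // Split on anything that looks like a tag, keeping the tags themselves.
    $parts = preg_split(
        '/(<[^>]+>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $tokens = [];
    foreach ($parts as $part) {
        if ($part[0] !== '<' || substr($part, -1) !== '>') {
            // Anything that is not a complete tag is character data.
            $tokens[] = ['text', $part];
        } elseif (strncmp($part, '</', 2) === 0) {
            // Closing tag: strip "</" and ">".
            $tokens[] = ['end', trim(substr($part, 2, -1))];
        } else {
            // Opening or self-closing tag: strip "<" and ">".
            $inner = rtrim(substr($part, 1, -1));
            $selfClosing = substr($inner, -1) === '/';
            if ($selfClosing) {
                $inner = substr($inner, 0, -1);
            }
            // The tag name is the first run of non-space characters.
            preg_match('/^\s*([^\s\/]+)/', $inner, $m);
            $tokens[] = [$selfClosing ? 'empty' : 'start', $m[1] ?? ''];
        }
    }
    return $tokens;
}
```

For the input '<p>Hi <br/>there</p>' the sketch yields a start token for p, a text token, an empty token for br, another text token, and an end token for p.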

normalize()

normalize(string  $html, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : string

Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting the body content, and performing related cleanup.

Parameters

string $html

HTML.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

string —

extractBody()

extractBody(mixed  $html) : mixed

Takes a string of HTML (fragment or document) and returns the body content.

Parameters

mixed $html

Returns

mixed —
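
One plausible way to implement this behavior is a single regular expression that captures everything between the body tags and falls back to the input when no body element is present. The function sketchExtractBody() below is a hypothetical illustration under that assumption, not the library's code.

```php
<?php
/**
 * Hypothetical sketch: return the inside of <body>...</body>,
 * or the input unchanged when it is already a fragment.
 */
function sketchExtractBody(string $html): string
{
    // "i" makes the match case-insensitive, "s" lets "." span newlines.
    if (preg_match('!<body[^>]*>(.*)</body>!is', $html, $matches)) {
        return $matches[1];
    }
    return $html;
}
```

Returning the input unchanged for fragments means callers can pass either kind of string without checking first.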

parseAttributeString()

parseAttributeString(string  $string, \HTMLPurifier_Config  $config, \HTMLPurifier_Context  $context) : array

Takes the inside of an HTML tag and makes an assoc array of attributes.

Parameters

string $string

Inside of tag excluding name.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

array —

Assoc array of attributes.
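
The mapping from tag internals to an associative array can be sketched with a regular expression for well-formed input. The function sketchParseAttributes() is hypothetical. It assumes that a bare attribute such as disabled maps to its own name and that values are entity-decoded with htmlspecialchars_decode(); it does not attempt the error recovery a real lexer needs for malformed markup, and it omits the $config and $context parameters.

```php
<?php
/**
 * Hypothetical sketch: parse name="value" pairs from the inside of a tag.
 */
function sketchParseAttributes(string $string): array
{
    // Group 1: name. Group 2: optional "=value" part, where the value is
    // double-quoted (3), single-quoted (4), or unquoted (5).
    $pattern = '/([^\s=\/]+)(\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\']+)))?/';
    preg_match_all($pattern, $string, $sets, PREG_SET_ORDER);

    $attributes = [];
    foreach ($sets as $m) {
        $name = $m[1];
        if (($m[2] ?? '') === '') {
            // Assumption of this sketch: a bare attribute maps to its own name.
            $attributes[$name] = $name;
            continue;
        }
        // Only one of the three value groups can have participated.
        $value = ($m[3] ?? '') . ($m[4] ?? '') . ($m[5] ?? '');
        $attributes[$name] = htmlspecialchars_decode($value, ENT_QUOTES);
    }
    return $attributes;
}
```

Calling it with 'href="a.html?x=1&amp;y=2" title=\'Hi\' disabled width=10' produces the keys href, title, disabled, and width, with the ampersand in the href value decoded.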

escapeCDATA()

escapeCDATA(string  $string) : string

Translates CDATA sections into regular sections (through escaping).

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

escapeCommentedCDATA()

escapeCommentedCDATA(string  $string) : string

Handles the special CDATA case that is especially convoluted for <script> elements.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

removeIEConditional()

removeIEConditional(string  $string) : string

Removes special Internet Explorer conditional comments.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with conditional comments removed.
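
A removal like this can be sketched as a single preg_replace() call. The function sketchRemoveIEConditional() and its pattern are an illustration under the assumption that only the downlevel-hidden form (<!--[if ...]> ... <![endif]-->) needs to be stripped.

```php
<?php
/**
 * Hypothetical sketch: strip <!--[if ...]> ... <![endif]--> blocks.
 */
function sketchRemoveIEConditional(string $string): string
{
    // ".*?" is lazy so that two separate conditional blocks are not
    // merged into one match; "s" lets "." span newlines.
    return preg_replace(
        '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si',
        '',
        $string
    );
}
```

The lazy quantifier matters when a document contains several conditional blocks: a greedy match would also delete the ordinary markup between them.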

CDATACallback()

CDATACallback(array  $matches) : string

Callback function for escapeCDATA() that does the work.

Parameters

array $matches

PCRE matches array, where index 0 is the entire match and index 1 is the inside of the CDATA section.

Returns

string —

Escaped internals of the CDATA section.
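
The relationship between escapeCDATA() and its callback can be sketched with preg_replace_callback(). The function names sketchEscapeCdata() and sketchCdataCallback() are hypothetical, and the choice of htmlspecialchars() with ENT_COMPAT and UTF-8 is an assumption about how the escaping is done.

```php
<?php
/**
 * Hypothetical callback: receives the PCRE matches array and returns
 * the escaped inside of the CDATA section.
 */
function sketchCdataCallback(array $matches): string
{
    // $matches[0] is the whole "<![CDATA[...]]>" match,
    // $matches[1] is only the text inside it.
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

/**
 * Hypothetical sketch: turn CDATA sections into ordinary escaped text.
 */
function sketchEscapeCdata(string $string): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        'sketchCdataCallback',
        $string
    );
}
```

After this step the CDATA markers are gone and their contents are ordinary escaped text, so later parsing stages do not need a special case for CDATA.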

scriptCallback()

scriptCallback(mixed  $matches) : string

Callback function for script CDATA fudge

Parameters

mixed $matches

Returns

string —

substrCount()

substrCount(string  $haystack, string  $needle, int  $offset, int  $length) : int

PHP 5.0.x-compatible version of substr_count() that implements the offset and length parameters.

Parameters

string $haystack
string $needle
int $offset
int $length

Returns

int —
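
A compatibility helper of this kind can be sketched by slicing the haystack first and counting afterwards, which avoids relying on the offset and length arguments of substr_count(). The function name sketchSubstrCount() is hypothetical, and counting newlines is only an example use.

```php
<?php
/**
 * Hypothetical sketch: count occurrences of $needle inside the slice
 * of $haystack that starts at $offset and spans $length bytes.
 */
function sketchSubstrCount(string $haystack, string $needle, int $offset, int $length): int
{
    // Slice first, then count, so only the two-argument form is needed.
    return substr_count(substr($haystack, $offset, $length), $needle);
}

// Example use: count the newlines in a window of the input.
$newlines = sketchSubstrCount("a\nb\nc\nd", "\n", 2, 4);
```

On current PHP versions the result should match calling substr_count() directly with the same offset and length.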