\HTMLPurifier_Lexer_DirectLex

Our in-house implementation of a parser.

A pure PHP parser, DirectLex has absolutely no dependencies, making it a reasonably good default for PHP4. Written with efficiency in mind, it can be four times faster than HTMLPurifier_Lexer_PEARSax3, although it is still considerably slower than HTMLPurifier_Lexer_DOMLex.

Summary

Public methods: create(), __construct(), parseText(), parseAttr(), parseData(), tokenizeHTML(), normalize(), extractBody(), parseAttributeString()
Protected methods: escapeCDATA(), escapeCommentedCDATA(), removeIEConditional(), CDATACallback(), scriptCallback(), substrCount()
Private methods: none
Public properties: $tracksLineNumbers
Protected properties: $_special_entity2str, $_whitespace
Private properties: none
Constants: none

Properties

$tracksLineNumbers

$tracksLineNumbers : boolean

Whether or not this lexer implements line-number/column-number tracking.

If it does, set to true.

Type

boolean

$_special_entity2str

$_special_entity2str : array

Most common entity to raw value conversion table for special entities.

Type

array

$_whitespace

$_whitespace : string

Whitespace characters for str(c)spn.

Type

string
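To illustrate how a whitespace character list pairs with strspn() and strcspn(), here is a minimal sketch. The constant SKETCH_WHITESPACE, its value (space, tab, CR, LF), and the helper sketchNextWord() are assumptions made for this example; they are not taken from the class itself.

```php
<?php
// Assumed whitespace set for illustration: space, tab, CR, LF.
const SKETCH_WHITESPACE = "\x20\x09\x0D\x0A";

/**
 * Hypothetical helper: returns the next whitespace-delimited word
 * and the offset just past it.
 */
function sketchNextWord(string $input, int $offset = 0): array
{
    // strspn() measures the run of whitespace starting at $offset.
    $start = $offset + strspn($input, SKETCH_WHITESPACE, $offset);
    // strcspn() measures the run of non-whitespace that follows.
    $length = strcspn($input, SKETCH_WHITESPACE, $start);

    return [substr($input, $start, $length), $start + $length];
}
```

Calling sketchNextWord() repeatedly with the returned offset walks through a string word by word without using regular expressions.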

Methods

create()

create(\HTMLPurifier_Config $config) : \HTMLPurifier_Lexer

Retrieves or sets the default Lexer as a Prototype Factory.

By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.

Parameters

\HTMLPurifier_Config $config

Throws

\HTMLPurifier_Exception

Returns

\HTMLPurifier_Lexer

__construct()

__construct() 

parseText()

parseText($string, $config)

Parameters

$string
$config

parseAttr()

parseAttr($string, $config)

Parameters

$string
$config

parseData()

parseData(string $string, $is_attr, $config) : string

Parses special entities into the proper characters.

This method translates escaped versions of the special characters into their literal equivalents.

Parameters

string $string

String character data to be parsed.

$is_attr
$config

Returns

string —

Parsed character data.
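The translation described above can be sketched with a lookup table and strtr(). The function name sketchParseData() and the table contents are assumptions for illustration only; the sketch also ignores the $is_attr and $config parameters, whose effect is not documented here.

```php
<?php
/**
 * Hypothetical sketch: translate a handful of special entities
 * into their literal characters.
 */
function sketchParseData(string $string): string
{
    // Assumed conversion table; the class's real table is not shown here.
    static $table = [
        '&quot;' => '"',
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
    ];

    // Fast path: without an ampersand there is nothing to translate.
    if (strpos($string, '&') === false) {
        return $string;
    }

    // strtr() with an array never rescans replaced text, so "&amp;lt;"
    // becomes "&lt;" rather than "<".
    return strtr($string, $table);
}
```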

tokenizeHTML()

tokenizeHTML(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context) : array<mixed,\HTMLPurifier_Token>

Lexes an HTML string into tokens.

Parameters

string $html
\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

array<mixed,\HTMLPurifier_Token>
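As a rough picture of what lexing HTML into tokens means, the following sketch splits a string into start, end, empty, and text tokens. It is a deliberately simplified stand-in: sketchTokenize() is a hypothetical function, it returns plain arrays rather than \HTMLPurifier_Token objects, and it does not handle comments, attributes containing ">", or malformed markup.

```php
<?php
/**
 * Hypothetical sketch of a tag/text tokenizer.
 * Each token is an array with 'type' and 'value' keys.
 */
function sketchTokenize(string $html): array
{
    // Split on anything that looks like a tag, keeping the tags themselves.
    $parts = preg_split(
        '/(<[^>]*>)/',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $tokens = [];
    foreach ($parts as $part) {
        if (!preg_match('/\A<[^>]*>\z/', $part)) {
            // Anything that is not a complete tag is treated as text.
            $tokens[] = ['type' => 'text', 'value' => $part];
        } elseif (substr($part, 0, 2) === '</') {
            $tokens[] = ['type' => 'end', 'value' => trim(substr($part, 2, -1))];
        } elseif (substr($part, -2) === '/>') {
            $tokens[] = ['type' => 'empty', 'value' => trim(substr($part, 1, -2))];
        } else {
            $tokens[] = ['type' => 'start', 'value' => trim(substr($part, 1, -1))];
        }
    }

    return $tokens;
}
```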

normalize()

normalize(string $html, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context) : string

Takes a piece of HTML and normalizes it by converting entities, fixing the encoding, extracting the relevant content, and performing related cleanup.

Parameters

string $html

HTML.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

string

extractBody()

extractBody($html)

Takes a string of HTML (fragment or document) and returns the content of its body.

Parameters

$html
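One way to picture this behaviour is a regular expression that captures whatever sits between the body tags and falls back to the input when no body element is present. sketchExtractBody() is a hypothetical function written for this example, not the method's actual implementation.

```php
<?php
/**
 * Hypothetical sketch: return the content of <body> if present,
 * otherwise return the input unchanged (it is assumed to be a fragment).
 */
function sketchExtractBody(string $html): string
{
    // "s" lets "." span newlines; "i" makes the tag name case-insensitive.
    if (preg_match('!<body\b[^>]*>(.*)</body>!is', $html, $matches)) {
        return $matches[1];
    }

    return $html;
}
```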

parseAttributeString()

parseAttributeString(string $string, \HTMLPurifier_Config $config, \HTMLPurifier_Context $context) : array

Takes the inside of an HTML tag and makes an assoc array of attributes.

Parameters

string $string

Inside of tag excluding name.

\HTMLPurifier_Config $config
\HTMLPurifier_Context $context

Returns

array —

Assoc array of attributes.
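The idea of turning the inside of a tag into an associative array can be sketched with a single regular expression. sketchParseAttributes() is a hypothetical function; treating a valueless attribute as having its own name as the value is an assumption of this sketch, and entity decoding and error reporting are left out.

```php
<?php
/**
 * Hypothetical sketch: parse 'name="value" other=value flag' into an array.
 */
function sketchParseAttributes(string $string): array
{
    // Name, then optionally "=" followed by a double-quoted,
    // single-quoted, or unquoted value.
    $pattern = '/([^\s=\/"\'<>]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+)))?/';
    preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

    $attrs = [];
    foreach ($matches as $m) {
        $name = strtolower($m[1]);
        if (strpos($m[0], '=') === false) {
            // Assumption: a minimized attribute takes its name as its value.
            $attrs[$name] = $name;
        } else {
            // At most one of the three value groups is non-empty.
            $attrs[$name] = ($m[2] ?? '') . ($m[3] ?? '') . ($m[4] ?? '');
        }
    }

    return $attrs;
}
```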

escapeCDATA()

escapeCDATA(string $string) : string

Translates CDATA sections into regular sections (through escaping).

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.
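Escaping CDATA sections with a callback might look like the sketch below. The function name sketchEscapeCdata(), the exact pattern, and the choice of htmlspecialchars() flags are assumptions for illustration and are not taken from the class.

```php
<?php
/**
 * Hypothetical sketch: replace each CDATA section with its
 * HTML-escaped contents.
 */
function sketchEscapeCdata(string $html): string
{
    $result = preg_replace_callback(
        // Non-greedy match so adjacent CDATA sections are handled separately.
        '/<!\[CDATA\[(.*?)\]\]>/s',
        static function (array $matches): string {
            // $matches[0] is the whole section, $matches[1] its inside.
            return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
        },
        $html
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $html;
}
```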

escapeCommentedCDATA()

escapeCommentedCDATA(string $string) : string

Special CDATA case that is especially convoluted for <script>.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with CDATA sections escaped.

removeIEConditional()

removeIEConditional(string $string) : string

Removes special Internet Explorer conditional comments.

Parameters

string $string

HTML string to process.

Returns

string —

HTML with conditional comments removed.
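A minimal sketch of stripping conditional comments with preg_replace() follows. sketchRemoveIeConditional() and its pattern are assumptions written for this example; the class's own pattern may differ.

```php
<?php
/**
 * Hypothetical sketch: drop "<!--[if ...]> ... <![endif]-->" blocks.
 */
function sketchRemoveIeConditional(string $html): string
{
    $result = preg_replace(
        // Non-greedy body, spanning newlines, case-insensitive keywords.
        '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si',
        '',
        $html
    );

    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $html;
}
```

Ordinary comments such as "<!-- note -->" do not match the pattern and are left in place.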

CDATACallback()

CDATACallback(array $matches) : string

Callback function for escapeCDATA() that does the work.

Parameters

array $matches

PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.

Returns

string —

Escaped internals of the CDATA section.

scriptCallback()

scriptCallback($matches) : string

Callback function for script CDATA fudge.

Parameters

$matches

Returns

string

substrCount()

substrCount(string $haystack, string $needle, integer $offset, integer $length) : integer

A PHP 5.0.x-compatible substr_count that implements offset and length.

Parameters

string $haystack
string $needle
integer $offset
integer $length

Returns

integer
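The usual way to emulate offset and length on top of a two-argument substr_count() is to slice the haystack first. The sketch below shows that approach; sketchSubstrCount() is a hypothetical name, and making $length optional is an assumption of this example.

```php
<?php
/**
 * Hypothetical sketch: count occurrences of $needle within the window
 * of $haystack that starts at $offset and spans $length bytes.
 */
function sketchSubstrCount(string $haystack, string $needle, int $offset = 0, ?int $length = null): int
{
    // Cut out the window first, then count within it.
    $window = $length === null
        ? substr($haystack, $offset)
        : substr($haystack, $offset, $length);

    return substr_count($window, $needle);
}
```

A helper like this is handy for tasks such as counting newlines in a slice of the input, for example when tracking line numbers.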