wp_kses_split2() WP 1.0.0

Callback for wp_kses_split() fixing malformed HTML tags.

This function does a lot of work. It rejects seriously malformed input such as <:::>. It returns an empty string if the element isn't allowed (look ma, no strip_tags()!). Otherwise, it splits the tag into an element and an attribute list.

After the tag is split into an element and an attribute list, the attribute list is passed through another filter, wp_kses_attr(), which removes disallowed attributes; the result of that filter is returned.
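To make the split concrete, here is a minimal standalone PHP sketch. It reuses the regular expression from the function body; the helper name split_tag_sketch() and its array return shape are hypothetical and not part of WordPress. It returns null where wp_kses_split2() would return an empty string, and it does not perform the attribute filtering that wp_kses_attr() does.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): splits a tag chunk the
 * same way wp_kses_split2() does, using the regex from its body.
 * Returns null for seriously malformed input such as "<:::>".
 */
function split_tag_sketch( string $chunk ): ?array {
	if ( ! preg_match( '%^<\s*(/\s*)?([a-zA-Z0-9-]+)([^>]*)>?$%', $chunk, $matches ) ) {
		return null; // wp_kses_split2() returns '' in this case.
	}

	return array(
		// True for closing tags such as "</p>" or "</ p >".
		'closing'  => '' !== trim( $matches[1] ),
		// Element name exactly as written (case is preserved here).
		'element'  => $matches[2],
		// Raw attribute text, including its leading whitespace.
		'attrlist' => $matches[3],
	);
}
```

For example, split_tag_sketch( '<a href="#">' ) yields the element a and the attribute list  href="#" (with its leading space), while split_tag_sketch( '<:::>' ) returns null because no element name follows the <.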

Internal function: this function is intended for use by WordPress core itself. Using it in your own code is not recommended.

No hooks.

Returns

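String. The fixed HTML element; an empty string if the element is not allowed or is too malformed to repair; or the character reference &gt; if the chunk is a stray > sign.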

Usage

wp_kses_split2( $content, $allowed_html, $allowed_protocols );
$content (string) (required)
Content to filter.
$allowed_html (array[]|string) (required)
An array of allowed HTML elements and attributes, or a context name such as 'post'. See wp_kses_allowed_html() for the list of accepted context names.
$allowed_protocols (string[]) (required)
Array of allowed URL protocols.
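In its array form, $allowed_html is keyed by lowercase element names, each mapping to an array of allowed attributes. The sketch below shows an illustrative value in that shape (the elements and attributes are arbitrary choices for the example) together with a hypothetical helper, is_element_allowed_sketch(), that mirrors the lookup wp_kses_split2() performs once a context name such as 'post' has been resolved to an array.

```php
<?php
/*
 * Illustrative $allowed_html value: element name => allowed attributes
 * (attribute name => true). The entries are made up for this example.
 */
$allowed_html = array(
	'a'      => array( 'href' => true, 'title' => true ),
	'p'      => array(),
	'strong' => array(),
);

/**
 * Hypothetical helper mirroring the element check in wp_kses_split2():
 * the name is lowercased before the isset() lookup, so "<STRONG>" and
 * "<strong>" resolve to the same entry.
 */
function is_element_allowed_sketch( string $elem, array $allowed_html ): bool {
	return isset( $allowed_html[ strtolower( $elem ) ] );
}
```

Because the element name is lowercased before the lookup, <STRONG> is accepted under the same entry as <strong>, while <script> is not found and wp_kses_split2() would return an empty string for it.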

Changelog

Since 1.0.0: Introduced.
Since 6.6.0: Recognizes additional forms of invalid HTML that are converted into comments.
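The 6.6.0 change appears to correspond to the bogus-comment branch of the code below. As a rough standalone illustration, the following PHP sketch applies the same pattern as the WP 6.8.3 code and splits a matching chunk into its opener and inner text. split_bogus_comment_sketch() is a hypothetical name, and the repeated wp_kses() cleanup of the inner text is left out because it needs a WordPress environment.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): applies the bogus-comment
 * pattern from wp_kses_split2() (WP 6.8.3) and splits the chunk into its
 * opener ("/" or "!") and inner text. Returns null for any other chunk.
 */
function split_bogus_comment_sketch( string $chunk ): ?array {
	if ( 1 !== preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $chunk ) ) {
		return null;
	}

	return array(
		'opener' => $chunk[1],               // "/" for "</…>", "!" for "<!…>"
		'inner'  => substr( $chunk, 2, -1 ), // text between the opener and ">"
	);
}
```

For instance, </3 oops> and <!foo bar> both match and keep their / or ! opener, whereas the ordinary closing tag </p> and the normal comment <!-- note --> do not match and fall through to the later branches.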

wp_kses_split2() code (WP 6.8.3)

function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) {
	$content = wp_kses_stripslashes( $content );

	/*
	 * The regex pattern used to split HTML into chunks attempts
	 * to split on HTML token boundaries. This function should
	 * thus receive chunks that _either_ start with meaningful
	 * syntax tokens, like a tag `<div>` or a comment `<!-- ... -->`.
	 *
	 * If the first character of the `$content` chunk _isn't_ one
	 * of these syntax elements, which always start with `<`, then
	 * the match had to be for the final alternation of `>`. In such
	 * a case, it's probably standing on its own and could be encoded
	 * with a character reference to remove ambiguity.
	 *
	 * In other words, if this chunk isn't from a match of a syntax
	 * token, it's just a plaintext greater-than (`>`) sign.
	 */
	if ( ! str_starts_with( $content, '<' ) ) {
		return '&gt;';
	}

	/*
	 * When certain invalid syntax constructs appear, the HTML parser
	 * shifts into what's called the "bogus comment state." This is a
	 * plaintext state that consumes everything until the nearest `>`
	 * and then transforms the entire span into an HTML comment.
	 *
	 * Preserve these comments and do not treat them like tags.
	 *
	 * @see https://html.spec.whatwg.org/#bogus-comment-state
	 */
	if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) {
		/**
		 * Since the pattern matches `</…>` and also `<!…>`, this will
		 * preserve the type of the cleaned-up token in the output.
		 */
		$opener  = $content[1];
		$content = substr( $content, 2, -1 );

		do {
			$prev    = $content;
			$content = wp_kses( $content, $allowed_html, $allowed_protocols );
		} while ( $prev !== $content );

		// Recombine the modified inner content with the original token structure.
		return "<{$opener}{$content}>";
	}

	/*
	 * Normative HTML comments should be handled separately as their
	 * parsing rules differ from those for tags and text nodes.
	 */
	if ( str_starts_with( $content, '<!--' ) ) {
		$content = str_replace( array( '<!--', '-->' ), '', $content );

		while ( ( $newstring = wp_kses( $content, $allowed_html, $allowed_protocols ) ) !== $content ) {
			$content = $newstring;
		}

		if ( '' === $content ) {
			return '';
		}

		// Prevent multiple dashes in comments.
		$content = preg_replace( '/--+/', '-', $content );
		// Prevent three dashes closing a comment.
		$content = preg_replace( '/-$/', '', $content );

		return "<!--{$content}-->";
	}

	// It's seriously malformed.
	if ( ! preg_match( '%^<\s*(/\s*)?([a-zA-Z0-9-]+)([^>]*)>?$%', $content, $matches ) ) {
		return '';
	}

	$slash    = trim( $matches[1] );
	$elem     = $matches[2];
	$attrlist = $matches[3];

	if ( ! is_array( $allowed_html ) ) {
		$allowed_html = wp_kses_allowed_html( $allowed_html );
	}

	// The element is not in the list of allowed HTML elements.
	if ( ! isset( $allowed_html[ strtolower( $elem ) ] ) ) {
		return '';
	}

	// No attributes are allowed for closing elements.
	if ( '' !== $slash ) {
		return "</$elem>";
	}

	return wp_kses_attr( $elem, $attrlist, $allowed_html, $allowed_protocols );
}
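The normative-comment branch can be exercised outside WordPress with a small sketch. Here the hypothetical $filter callable stands in for wp_kses() called with the caller's $allowed_html and $allowed_protocols; it is re-applied until its output stops changing, which is the same fixed-point loop the function uses (the bogus-comment branch expresses the same idea as a do...while loop). sanitize_comment_sketch() is not a WordPress function.

```php
<?php
/**
 * Hypothetical sketch (not part of WordPress) of how wp_kses_split2()
 * cleans a normative HTML comment. $filter stands in for wp_kses(),
 * which needs a WordPress environment.
 */
function sanitize_comment_sketch( string $chunk, callable $filter ): string {
	// Drop the comment delimiters so only the inner text is filtered.
	$content = str_replace( array( '<!--', '-->' ), '', $chunk );

	// Re-run the filter until its output no longer changes.
	while ( ( $next = $filter( $content ) ) !== $content ) {
		$content = $next;
	}

	// An empty comment is dropped entirely.
	if ( '' === $content ) {
		return '';
	}

	// Collapse runs of dashes, then drop a trailing dash so the
	// rebuilt comment cannot end in "--->".
	$content = preg_replace( '/--+/', '-', $content );
	$content = preg_replace( '/-$/', '', $content );

	return "<!--{$content}-->";
}
```

With an identity filter, '<!-- a -- b --->' becomes '<!-- a - b -->': the run of dashes is collapsed and the trailing dash is removed so the comment cannot close with --->. Passing PHP's strip_tags as the stand-in filter turns '<!--<b>hi</b>-->' into '<!--hi-->'.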