The "ActivityPub" plugin uses the wpautop function

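While analyzing how the ActivityPub plugin works, I noticed something that bothered me a little.
What I was looking at was the post_content of rows in the wp_posts table whose post_type is ap_outbox. It is stored as Unicode escape sequences and hard to read, so I had Gemini decode it for me. I was checking the line breaks, and they turned out to be `<br />` rather than `<br>`.
After looking into it, the cause seems to be that the plugin uses the wpautop() function when it builds the content.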

wpautop() – Function | Developer.WordPress.org

Replaces double line breaks with paragraph elements.

A group of regex replaces used to identify text formatted with newlines and replace double line breaks with HTML paragraph tags. The remaining line breaks after conversion become `<br />` tags, unless $br is set to ‘0’ or ‘false’.
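To make the documented behavior concrete, here is a heavily simplified sketch. It is not WordPress's wpautop(); the function name simple_autop() is hypothetical, and it only handles plain text, ignoring block-level tags, `<pre>`, and the other special cases the real function deals with. It shows just the two rules quoted above: a double line break starts a new paragraph, and a remaining single line break becomes a `<br />` tag.

```php
<?php
// Hypothetical, heavily simplified illustration of wpautop()'s two basic rules.
// NOT WordPress's implementation: block-level tags, <pre>, and the many
// special cases handled by the real function are ignored here.
function simple_autop(string $text): string
{
    // Normalize line endings first, as wpautop() also does.
    $text = trim(str_replace(["\r\n", "\r"], "\n", $text));
    if ($text === '') {
        return '';
    }

    // Double line breaks (possibly with whitespace between) separate paragraphs.
    $paragraphs = preg_split('/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $out = '';
    foreach ($paragraphs as $paragraph) {
        // Any single line break left inside a paragraph becomes <br />.
        $out .= '<p>' . str_replace("\n", "<br />\n", trim($paragraph)) . "</p>\n";
    }
    return $out;
}

echo simple_autop("First line\nsecond line\n\nNew paragraph");
```

Running it prints `<p>First line<br />`, `second line</p>`, and `<p>New paragraph</p>` on three lines.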

Below, I go through things in the order I looked into them.

First, I checked the post_content of one of the wp_posts rows with post_type ap_outbox, the one for the post I had been wondering about. Its content part looked like this:

"content":"\u003Cp\u003E\u003Cstrong\u003EWordPress\u306e\u629c\u7c8b\u3067\u6539\u884c\u304c\u524a\u9664\u3055\u308c\u308b\u539f\u56e0\u306f wp_posts \u30c6\u30fc\u30d6\u30eb\u306e post_content\u003C/strong\u003E\u003Cbr /\u003E\u003Ca rel=\u0022tag\u0022 class=\u0022hashtag u-tag u-category\u0022 href=\u0022https://ishii00141.stars.ne.jp/tag/gemini/\u0022\u003E#Gemini\u003C/a\u003E \u003Ca rel=\u0022tag\u0022 class=\u0022hashtag u-tag u-category\u0022 href=\u0022https://ishii00141.stars.ne.jp/tag/php/\u0022\u003E#PHP\u003C/a\u003E \u003Ca rel=\u0022tag\u0022 class=\u0022hashtag u-tag u-category\u0022 href=\u0022https://ishii00141.stars.ne.jp/tag/wordpress/\u0022\u003E#WordPress\u003C/a\u003E \u003Ca rel=\u0022tag\u0022 class=\u0022hashtag u-tag u-category\u0022 href=\u0022https://ishii00141.stars.ne.jp/tag/%e4%bb%95%e6%a7%98/\u0022\u003E#\u4ed5\u69d8\u003C/a\u003E\u003C/p\u003E\u003Cp\u003E\u3000WordPress\u306e\u5165\u529b\u6642\u306bEnter\u30ad\u30fc\u3092\u53e9\u304f\u3068\u6bb5\u843d\u304c\u5909\u308f\u308b\u306e\u3060\u304c\u3001\u5358\u306b\u6539\u884c\u3057\u305f\u3044\u3053\u3068\u3082\u3042\u308a\u3001\u305d\u306e\u5834\u5408\u306fShift\u30ad\u30fc\u3092\u62bc\u3057\u306a\u304c\u3089Enter\u30ad\u30fc\u3092\u53e9\u304f\u3002\u003Cbr /\u003E\u3000\u305d\u3046\u3059\u308b\u3068\u3001\u6bb5\u843d\u304c\u5909\u308f\u3089\u305a\u306b\u6539\u884c\u3059\u308b\u3053\u3068\u304c\u3067\u304d\u308b\u3002\u003Cbr /\u003E\u3000\u305d\u3082\u305d\u3082\u3001\u300c\u6bb5\u843d\u300d\u3068\u3044\u3046\u8868\u73fe\u304c\u65e5\u672c\u8a9e\u3068\u7570\u306a\u3063\u3066\u3044\u308b\u304b\u3089\u56f0\u308b\u3002\u6539\u884c\u3057\u3066\u4e00\u6587\u5b57\u5b57\u4e0b\u3052\u3059\u308c\u3070\u3001\u305d\u308c\u306f\u65b0\u3057\u3044\u6bb5\u843d\u306a\u306e\u306b\u3001html\u306e\u4e16\u754c\u3067\u306f\u305d\u3046\u306f\u307f\u306a\u3055\u306a\u3044\u3002\u003C/p\u003E\u003Cp\u003E\u3000\u305d\u308c\u306f\u3068\u3082\u304b\u304f\u3001WordPress\u306e\u629c\u7c8b\u30d7\u30ed\u30b0\u30e9\u30e0\u3067\u306fhtml\u30bf\u30b0\u3092\u524a\u9664\u3059\u308b\u306e\u3060\u304c\u3001\u6b21\u306e\u3088\u3046\u306a\u30b3\u30fc\u30c9\u306e\u5834\u5408\u3001Web\u8868\u793a\u3067\u306f\u6539\u884c\u304c\u3042\u308b\u306e\u306b\u3001\u30bf\u30b0\u3092\u524a\u9664\u3057\u3061\u3083\u3046\u3068\u6539\u884c\u304c\u7121\u304f\u306a\u308b\u3002\u003C/p\u003E\u003Cp\u003E\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u003Cbr /\u003EAccount h [...\u7d9a\u304d\u306f\u4e0b\u306eURL\u3067]\u003C/p\u003E\u003Cp\u003E\u003Ca href=\u0022https://ishii00141.stars.ne.jp/20251203-1656-4312/\u0022\u003Ehttps://ishii00141.stars.ne.jp/20251203-1656-4312/\u003C/a\u003E\u003C/p\u003E",
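The same decoding can also be done locally. Assuming the post_content of an ap_outbox row is stored as JSON (the `"content":"..."` fragment above suggests so), PHP's json_decode() turns the `\uXXXX` escapes back into readable text. The sample below is a shortened fragment of the value above, wrapped in braces so that it forms a complete JSON object; that wrapping is an assumption of this sketch, not the stored format.

```php
<?php
// Shortened fragment of the "content" value shown above, wrapped in braces
// so that it forms a complete JSON object (an assumption for this sketch).
$fragment = '{"content":"\u003Cp\u003E\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u003Cbr /\u003EAccount h\u003C/p\u003E"}';

// json_decode() converts the \uXXXX escape sequences back into UTF-8 text.
$data = json_decode($fragment, true, 512, JSON_THROW_ON_ERROR);

echo $data['content'], "\n"; // <p>見つかりません<br />Account h</p>
```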

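Mastodon and other servers formatted this for display, but it is hard to follow as-is, so here it is with the Unicode escapes (equivalent to UTF-8) decoded.
Note that I added line breaks and blank lines for readability; the actual data contains no newline characters, and everything runs together on a single line.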

<p><strong>WordPressの抜粋で改行が削除される原因は wp_posts テーブルの post_content</strong><br />
<a rel="tag" class="hashtag u-tag u-category" href="https://ishii00141.stars.ne.jp/tag/gemini/">#Gemini</a> 
<a rel="tag" class="hashtag u-tag u-category" href="https://ishii00141.stars.ne.jp/tag/php/">#PHP</a> 
<a rel="tag" class="hashtag u-tag u-category" href="https://ishii00141.stars.ne.jp/tag/wordpress/">#WordPress</a> 
<a rel="tag" class="hashtag u-tag u-category" href="https://ishii00141.stars.ne.jp/tag/%e4%bb%95%e6%a7%98/">#仕様</a></p>

<p>　WordPressの入力時にEnterキーを叩くと段落が変わるのだが、単に改行したいこともあり、その場合はShiftキーを押しながらEnterキーを叩く。<br />
　そうすると、段落が変わらずに改行することができる。<br />
　そもそも、「段落」という表現が日本語と異なっているから困る。改行して一文字字下げすれば、それは新しい段落なのに、htmlの世界ではそうはみなさない。</p>

<p>　それはともかく、WordPressの抜粋プログラムではhtmlタグを削除するのだが、次のようなコードの場合、Web表示では改行があるのに、タグを削除しちゃうと改行が無くなる。</p>

<p>見つかりません<br />Account h [...続きは下のURLで]</p>

<p><a href="https://ishii00141.stars.ne.jp/20251203-1656-4312/">https://ishii00141.stars.ne.jp/20251203-1656-4312/</a></p>

Looking at this, I noticed the `<br />`.

The code that builds this content is probably the following part of activitypub/includes/transformer/class-post.php:

/**
 * Returns the content for the ActivityPub Item.
 *
 * The content will be generated based on the user settings.
 *
 * @return string The content.
 */
protected function get_content() {
	// Remove Content from drafts.
	if ( ! $this->is_preview() && 'draft' === \get_post_status( $this->item ) ) {
		return \__( '(This post is being modified)', 'activitypub' );
	}

	global $post;

	// phpcs:ignore WordPress.WP.GlobalVariablesOverride.Prohibited
	$post    = $this->item;
	$content = $this->get_post_content_template();

	/**
	 * Provides an action hook so plugins can add their own hooks/filters before AP content is generated.
	 *
	 * Example: if a plugin adds a filter to `the_content` to add a button to the end of posts, it can also remove that filter here.
	 *
	 * @param \WP_Post $post The post object.
	 */
	\do_action( 'activitypub_before_get_content', $post );

	// It seems that shortcodes are only applied to published posts.
	if ( is_preview() ) {
		$post->post_status = 'publish';
	}

	// Register our shortcodes just in time.
	Shortcodes::register();
	// Fill in the shortcodes.
	\setup_postdata( $post );
	$content = \do_shortcode( $content );
	\wp_reset_postdata();

	$content = \wpautop( $content );
	$content = \preg_replace( '/[\n\r\t]/', '', $content );
	$content = \trim( $content );

	// Don't need these anymore, should never appear in a post.
	Shortcodes::unregister();

	/**
	 * Filters the post content after it was transformed for ActivityPub.
	 *
	 * @param string   $content The transformed post content.
	 * @param \WP_Post $post    The post object being transformed.
	 */
	return \apply_filters( 'activitypub_the_content', $content, $post );
}
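The three lines right after the shortcode handling are what give the ap_outbox data its shape. wpautop() leaves a newline after each `<br />` and `</p>`, and the following preg_replace() removes every newline, carriage return, and tab, so the `<br />` tags become the only line-break markers left. A minimal sketch of that last step, using a hand-written string in place of real wpautop() output (WordPress is not loaded here):

```php
<?php
// Hand-written example of typical wpautop() output (WordPress is not loaded
// here): newlines remain after <br /> and after each </p>.
$autop_output = "<p>Line one<br />\nLine two</p>\n<p>Next paragraph</p>\n";

// Same two calls as in get_content(): drop every \n, \r and \t, then trim.
$content = preg_replace('/[\n\r\t]/', '', $autop_output);
$content = trim($content);

echo $content, "\n"; // <p>Line one<br />Line two</p><p>Next paragraph</p>
```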

wpautop() is not a plugin function but one provided by WordPress itself. It lives in wp-includes/formatting.php and appears to be the following code:

/**
 * Replaces double line breaks with paragraph elements.
 *
 * A group of regex replaces used to identify text formatted with newlines and
 * replace double line breaks with HTML paragraph tags. The remaining line breaks
 * after conversion become `<br />` tags, unless `$br` is set to '0' or 'false'.
 *
 * @since 0.71
 *
 * @param string $text The text which has to be formatted.
 * @param bool   $br   Optional. If set, this will convert all remaining line breaks
 *                     after paragraphing. Line breaks within `<script>`, `<style>`,
 *                     and `<svg>` tags are not affected. Default true.
 * @return string Text which has been converted into correct paragraph tags.
 */
function wpautop( $text, $br = true ) {
	$pre_tags = array();

	if ( '' === trim( $text ) ) {
		return '';
	}

	// Just to make things a little easier, pad the end.
	$text = $text . "\n";

	/*
	 * Pre tags shouldn't be touched by autop.
	 * Replace pre tags with placeholders and bring them back after autop.
	 */
	if ( str_contains( $text, '<pre' ) ) {
		$text_parts = explode( '</pre>', $text );
		$last_part  = array_pop( $text_parts );
		$text       = '';
		$i          = 0;

		foreach ( $text_parts as $text_part ) {
			$start = strpos( $text_part, '<pre' );

			// Malformed HTML?
			if ( false === $start ) {
				$text .= $text_part;
				continue;
			}

			$name              = "<pre wp-pre-tag-$i></pre>";
			$pre_tags[ $name ] = substr( $text_part, $start ) . '</pre>';

			$text .= substr( $text_part, 0, $start ) . $name;
			++$i;
		}

		$text .= $last_part;
	}
	// Change multiple <br>'s into two line breaks, which will turn into paragraphs.
	$text = preg_replace( '|<br\s*/?>\s*<br\s*/?>|', "\n\n", $text );

	$allblocks = '(?:table|thead|tfoot|caption|col|colgroup|tbody|tr|td|th|div|dl|dd|dt|ul|ol|li|pre|form|map|area|blockquote|address|style|p|h[1-6]|hr|fieldset|legend|section|article|aside|hgroup|header|footer|nav|figure|figcaption|details|menu|summary)';

	// Add a double line break above block-level opening tags.
	$text = preg_replace( '!(<' . $allblocks . '[\s/>])!', "\n\n$1", $text );

	// Add a double line break below block-level closing tags.
	$text = preg_replace( '!(</' . $allblocks . '>)!', "$1\n\n", $text );

	// Add a double line break after hr tags, which are self closing.
	$text = preg_replace( '!(<hr\s*?/?>)!', "$1\n\n", $text );

	// Standardize newline characters to "\n".
	$text = str_replace( array( "\r\n", "\r" ), "\n", $text );

	// Find newlines in all elements and add placeholders.
	$text = wp_replace_in_html_tags( $text, array( "\n" => ' <!-- wpnl --> ' ) );

	// Collapse line breaks before and after <option> elements so they don't get autop'd.
	if ( str_contains( $text, '<option' ) ) {
		$text = preg_replace( '|\s*<option|', '<option', $text );
		$text = preg_replace( '|</option>\s*|', '</option>', $text );
	}

	/*
	 * Collapse line breaks inside <object> elements, before <param> and <embed> elements
	 * so they don't get autop'd.
	 */
	if ( str_contains( $text, '</object>' ) ) {
		$text = preg_replace( '|(<object[^>]*>)\s*|', '$1', $text );
		$text = preg_replace( '|\s*</object>|', '</object>', $text );
		$text = preg_replace( '%\s*(</?(?:param|embed)[^>]*>)\s*%', '$1', $text );
	}

	/*
	 * Collapse line breaks inside <audio> and <video> elements,
	 * before and after <source> and <track> elements.
	 */
	if ( str_contains( $text, '<source' ) || str_contains( $text, '<track' ) ) {
		$text = preg_replace( '%([<\[](?:audio|video)[^>\]]*[>\]])\s*%', '$1', $text );
		$text = preg_replace( '%\s*([<\[]/(?:audio|video)[>\]])%', '$1', $text );
		$text = preg_replace( '%\s*(<(?:source|track)[^>]*>)\s*%', '$1', $text );
	}

	// Collapse line breaks before and after <figcaption> elements.
	if ( str_contains( $text, '<figcaption' ) ) {
		$text = preg_replace( '|\s*(<figcaption[^>]*>)|', '$1', $text );
		$text = preg_replace( '|</figcaption>\s*|', '</figcaption>', $text );
	}

	// Remove more than two contiguous line breaks.
	$text = preg_replace( "/\n\n+/", "\n\n", $text );

	// Split up the contents into an array of strings, separated by double line breaks.
	$paragraphs = preg_split( '/\n\s*\n/', $text, -1, PREG_SPLIT_NO_EMPTY );

	// Reset $text prior to rebuilding.
	$text = '';

	// Rebuild the content as a string, wrapping every bit with a <p>.
	foreach ( $paragraphs as $paragraph ) {
		$text .= '<p>' . trim( $paragraph, "\n" ) . "</p>\n";
	}

	// Under certain strange conditions it could create a P of entirely whitespace.
	$text = preg_replace( '|<p>\s*</p>|', '', $text );

	// Add a closing <p> inside <div>, <address>, or <form> tag if missing.
	$text = preg_replace( '!<p>([^<]+)</(div|address|form)>!', '<p>$1</p></$2>', $text );

	// If an opening or closing block element tag is wrapped in a <p>, unwrap it.
	$text = preg_replace( '!<p>\s*(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $text );

	// In some cases <li> may get wrapped in <p>, fix them.
	$text = preg_replace( '|<p>(<li.+?)</p>|', '$1', $text );

	// If a <blockquote> is wrapped with a <p>, move it inside the <blockquote>.
	$text = preg_replace( '|<p><blockquote([^>]*)>|i', '<blockquote$1><p>', $text );
	$text = str_replace( '</blockquote></p>', '</p></blockquote>', $text );

	// If an opening or closing block element tag is preceded by an opening <p> tag, remove it.
	$text = preg_replace( '!<p>\s*(</?' . $allblocks . '[^>]*>)!', '$1', $text );

	// If an opening or closing block element tag is followed by a closing <p> tag, remove it.
	$text = preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $text );

	// Optionally insert line breaks.
	if ( $br ) {
		// Replace newlines that shouldn't be touched with a placeholder.
		$text = preg_replace_callback( '/<(script|style|svg|math).*?<\/\\1>/s', '_autop_newline_preservation_helper', $text );

		// Normalize <br>.
		$text = str_replace( array( '<br>', '<br/>' ), '<br />', $text );

		// Replace any new line characters that aren't preceded by a <br /> with a <br />.
		$text = preg_replace( '|(?<!<br />)\s*\n|', "<br />\n", $text );

		// Replace newline placeholders with newlines.
		$text = str_replace( '<WPPreserveNewline />', "\n", $text );
	}

	// If a <br /> tag is after an opening or closing block tag, remove it.
	$text = preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*<br />!', '$1', $text );

	// If a <br /> tag is before a subset of opening or closing block tags, remove it.
	$text = preg_replace( '!<br />(\s*</?(?:p|li|div|dl|dd|dt|th|pre|td|ul|ol)[^>]*>)!', '$1', $text );
	$text = preg_replace( "|\n</p>$|", '</p>', $text );

	// Replace placeholder <pre> tags with their original content.
	if ( ! empty( $pre_tags ) ) {
		$text = str_replace( array_keys( $pre_tags ), array_values( $pre_tags ), $text );
	}

	// Restore newlines in all elements.
	if ( str_contains( $text, '<!-- wpnl -->' ) ) {
		$text = str_replace( array( ' <!-- wpnl --> ', '<!-- wpnl -->' ), "\n", $text );
	}

	return $text;
}

/**
 * Newline preservation help function for wpautop().
 *
 * @since 3.1.0
 * @access private
 *
 * @param array $matches preg_replace_callback matches array
 * @return string
 */
function _autop_newline_preservation_helper( $matches ) {
	return str_replace( "\n", '<WPPreserveNewline />', $matches[0] );
}

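This function unifies line-break tags as `<br />` rather than `<br>`. Because the plugin uses it, the line-break tags in the content sent to the Fediverse apparently ended up as `<br />`.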
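The two calls in the `$br` branch that decide the tag form can be tried on their own. This sketch copies the str_replace() and preg_replace() lines from the listing above and applies them to a made-up string mixing `<br>`, `<br/>`, and a bare newline; it is not the whole wpautop() pipeline.

```php
<?php
// Made-up input mixing the three forms wpautop() may encounter.
$text = "A<br>\nB<br/>\nC\nD";

// Copied from wpautop(): normalize <br> and <br/> to <br />.
$text = str_replace(array('<br>', '<br/>'), '<br />', $text);

// Copied from wpautop(): add <br /> before any newline that lacks one.
$text = preg_replace('|(?<!<br />)\s*\n|', "<br />\n", $text);

echo $text, "\n";
```

Every line but the last now ends in `<br />`, whichever form it started with.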

Finally, I asked Gemini why `<br />` is used instead of `<br>`.

Why does this code standardize on `<br />`, which is hardly used any more, rather than `<br>`, the standard form in current HTML?

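Yes, that's a very good question.

The main reason the standard wpautop() function standardizes on `<br />` (with a slash) is to stay consistent with the older web standard WordPress followed for many years, and to preserve compatibility ever since.

🏛️ Why standardize on `<br />`

There are two main reasons.

1. Maintaining compatibility with XHTML

XHTML's requirements: XHTML (Extensible HyperText Markup Language), once widely adopted as a web standard, required HTML to be written in XML syntax. In XML every element must be closed, so tags with no child elements (empty elements such as `<img>` and `<br>`) had to be written as self-closing tags with a slash, like `<br />`.

WordPress's history: WordPress has been developed since the era when XHTML was mainstream, and wpautop() was designed in that era. It therefore adopted `<br />` as its standard so that the HTML it generates follows XML/XHTML syntax.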

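2. Stable behavior across browsers

wpautop() normalizes any existing `<br>` or `<br/>` to `<br />` partway through processing so that the later regex replacements (for example, the negative lookbehind `(?<!<br />)`) can work on one consistent form without worrying about multiple tag variants. This improves compatibility across browsers and the reliability of the processing.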
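Point 2 can be checked directly. The lookbehind `(?<!<br />)` only recognizes the exact string `<br />`, so if the normalization step were skipped, a newline that already follows `<br>` would get a second tag. A small sketch using the pattern from wpautop(); the sample string is made up:

```php
<?php
// The lookbehind pattern used by wpautop() recognizes only the exact text "<br />".
$pattern = '|(?<!<br />)\s*\n|';
$sample  = "A<br>\nB";

// Skipping normalization: the existing <br> is not recognized,
// so a second tag is added, giving "A<br><br />" + newline + "B".
$without = preg_replace($pattern, "<br />\n", $sample);

// Normalizing first: the newline already follows <br />, so it is left
// alone, giving "A<br />" + newline + "B".
$normalized = str_replace(array('<br>', '<br/>'), '<br />', $sample);
$with       = preg_replace($pattern, "<br />\n", $normalized);

var_dump($without, $with);
```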

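🌐 The current HTML5 standard

As you point out, in the current HTML5 specification the slash on self-closing tags (void elements) is optional.

Valid forms in HTML5: both `<br>` (no slash) and `<br />` (with slash) are correct and functionally identical.

Recommendation: in HTML5, `<br>` without the redundant slash tends to be preferred because it is simpler.

However, because wpautop() is legacy code, it puts backward compatibility and stability first and still outputs `<br />` as its standard today.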

いしい says:

December 5, 2025 3:32 PM

`<br>` and `<br />` are not being displayed.
- ishii says:
 
 December 5, 2025 4:10 PM
 
 This reply was deleted when I used "Delete & re-draft" on Mastodon.
 
いしい says:

December 5, 2025 3:33 PM

`<br>` and `<br />` are not being displayed.

Rather, they have been turned into working line-break tags.

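ishii says:

December 5, 2025 4:16 PM

The content field of this article's ap_outbox post_content looks like this. I decoded the UTF-8 escaped parts so it is readable, and inserted a half-width space into each URL so that it is not recognized as a link.

(Note: in the actual data there is no space after https://)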
"content":"<p><strong>プラグイン「ActivityPub」がwpautop関数を使ってる</strong><br /><a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/activitypub/\">#ActivityPub</a> <a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/fediverse/\">#Fediverse</a> <a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/gemini/\">#Gemini</a> <a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/wordpress/\">#WordPress</a> <a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/%e3%83%97%e3%83%a9%e3%82%b0%e3%82%a4%e3%83%b3/\">#プラグイン</a> <a rel=\"tag\" class=\"hashtag u-tag u-category\" href=\"https:// ishii00141.stars.ne.jp/tag/%e4%bb%95%e6%a7%98/\">#仕様</a></p><p>　プラグイン「ActivityPub」の動作を分析している時に、ちょっと気になることがあった。<br />　見ていたのは、wp_posts テーブルで post_type が ap_outbox になっているデータの post_content であるが、UTF-8にエスケープされていて読みにくいのでGeminiにデコードしてもらって確認した。チェックしたのは改行部分。見ると、<br />ではなく<br />になっている。<br />　調べてみたら、どうやら、content を作成する際に、wpautop()関数を使っていることが原因らしい。</p><p><a href=\"https:// developer.wordpress.org/reference/functions/wpautop/\" target=\"_blank\" rel=\"nofollow noopener noreferrer\" translate=\"no\"><span class=\"invisible\">https:// </span><span class=\"ellipsis\">developer.wordpress.org/refere</span><span class=\"invisible\">nce/functions/wpautop/</span></a></p><p>A group of [...続きは下のURLで]</p><p><a href=\"https:// ishii00141.stars.ne.jp/20251205-1525-4429/\">https:// ishii00141.stars.ne.jp/20251205-1525-4429/</a></p>"

ishii says:

December 5, 2025 4:23 PM

Another bug…

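- ishii says:
  
  December 5, 2025 4:54 PM
  
  This bug apparently comes from the html_entity_decode() function used in the generate_post_summary() function and in the activitypub_get_full_content_for_trimming() function in my own hook; commenting out the decoding done by html_entity_decode() seems to fix it.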
  
  - ishii says:
    
    December 5, 2025 5:14 PM
    
    This workaround causes problems with character counting, so I rejected it.
    
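The trade-off described in this thread can be reproduced with a small sketch. It assumes that a `<br>` typed as plain text in the article is stored with `&lt;` and `&gt;` entities, and it reproduces only the html_entity_decode() call, not the rest of generate_post_summary() or the author's hook. Decoding turns the literal text into real tags (which wpautop() then normalizes to `<br />`), while skipping the decode makes each entity count as several characters.

```php
<?php
// Assumption: "<br>" typed as plain text in the article is stored as entities.
$stored = '見ると、&lt;br&gt;ではなく&lt;br /&gt;になっている。';

// Only the html_entity_decode() step is reproduced here.
$decoded = html_entity_decode($stored, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The decoded string now contains real tags: 見ると、<br>ではなく<br />になっている。
echo $decoded, "\n";

// Without decoding, "&lt;br&gt;" counts as 10 characters instead of 4.
echo mb_strlen($stored, 'UTF-8'), "\n";  // 37
echo mb_strlen($decoded, 'UTF-8'), "\n"; // 25
```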

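ishii says:

December 18, 2025 4:51 PM

Since the ActivityPub plugin moved to 7.8.0 (or possibly earlier), the following code in activitypub/includes/transformer/class-post.php may have disappeared. As a result, newline characters (\n) may not be removed in some places, so be careful.
Currently, newlines inside pre tags are not being removed.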

$content = \wpautop( $content );
$content = \preg_replace( '/[\n\r\t]/', '', $content );
$content = \trim( $content );
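For reference, when the newline-stripping line above runs, it does not skip `<pre>` blocks: the newline inside the preformatted text is deleted too, joining the lines. Without that line the `\n` inside `<pre>` is kept, which appears consistent with the behavior noted in this comment. A sketch with a made-up HTML string:

```php
<?php
// Made-up HTML containing a <pre> block.
$html = "<p>Example:</p>\n<pre>line 1\nline 2</pre>\n";

// The same call as in get_content(): it does not skip <pre>, so the
// newline inside the preformatted text is removed too.
$stripped = preg_replace('/[\n\r\t]/', '', $html);

echo $stripped, "\n"; // <p>Example:</p><pre>line 1line 2</pre>
```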

