PCRE -- examples 1

written by: admin

Date Written: 7/12/07 Last Updated: 11/7/19

display returns

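This will replace every run of two or more consecutive \r\n pairs (carriage return plus newline) with the replacement string given as the second argument: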

$summary=preg_replace('/((\r\n){2,})/'," ",$summary);

Note: str_replace("\r\n","",$summary); can be used to remove every \r\n pair and is faster, because it does a plain string search. Once you need anything more specific than that (such as "two or more in a row"), use preg_replace, as it is designed to find more complex patterns.
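Here is a minimal sketch of the difference between the two calls. The sample text is made up for illustration.

```php
<?php
// Made-up sample: a run of three \r\n pairs, then a single \r\n pair.
$summary = "one\r\n\r\n\r\ntwo\r\nthree";

// Runs of two or more \r\n pairs become a single space; the lone pair survives.
$collapsed = preg_replace('/((\r\n){2,})/', " ", $summary);

// str_replace() removes every \r\n pair, however many there are in a row.
$stripped = str_replace("\r\n", "", $summary);
```

$collapsed still contains the single line break between "two" and "three", while $stripped has none left.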

locate whitespace

This replaces leading whitespace with "blag", trailing whitespace with "bleg", every carriage return with "blig", and every newline with "blog".

$memo = preg_replace("/^\s+/",'blag',$memo);
$memo = preg_replace("/\s+$/",'bleg',$memo);
$memo = preg_replace("/\r/",'blig',$memo);
$memo = preg_replace("/\n/",'blog',$memo);

I used the above code to find out where newlines and carriage returns were being stored in a document, since as far as I know there is no built-in way to display them. For example, htmlentities() can be used to convert symbols into their entity names, but it will not show a code or entity name for whitespace.
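As a quick illustration, here are the four replacements run against a made-up memo. The variable $marked is only there so the original $memo stays untouched.

```php
<?php
// Made-up memo: leading spaces, a \r\n pair in the middle, trailing space and newline.
$memo = "  hello\r\nworld \n";

$marked = preg_replace("/^\s+/", 'blag', $memo);   // leading whitespace
$marked = preg_replace("/\s+$/", 'bleg', $marked); // trailing whitespace
$marked = preg_replace("/\r/", 'blig', $marked);   // every carriage return
$marked = preg_replace("/\n/", 'blog', $marked);   // every newline
```

The result is "blaghellobligblogworldbleg", which shows that the break in the middle is stored as a carriage return followed by a newline.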

display spaces

This will display spaces as is without shortening them.

$summary=preg_replace('/((\040){2,2})/',"&nbsp;&nbsp;",$summary);

&nbsp; is the code for a non-breaking space, as opposed to the typical breaking space. The above replaces every pair of consecutive spaces with "&nbsp;&nbsp;". Any single space left over is displayed as is and joins the chain, so every space ends up being displayed. For example, five spaces become "&nbsp;&nbsp;&nbsp;&nbsp; " (four non-breaking spaces plus the leftover space), six spaces become "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;", etc.

It should be noted that the following is faster and better though:

$summary=str_replace("\040\040","&nbsp;&nbsp;",$summary);
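A small sketch of what happens to an odd number of spaces, assuming the replacement is a pair of &nbsp; entities. The sample string is made up.

```php
<?php
// Made-up sample: five consecutive spaces between two letters.
$summary = "a     b";

// Each pair of spaces becomes two &nbsp; entities; the fifth space is left alone.
$viaStr  = str_replace("\040\040", "&nbsp;&nbsp;", $summary);
$viaPcre = preg_replace('/((\040){2,2})/', "&nbsp;&nbsp;", $summary);
```

Both calls give the same string here, so the str_replace() version is the one to prefer.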

Replace content between two different terms (exclusive)

The following will replace the content located between two different terms while leaving the terms themselves in place. Here everything between "font" and the first "na" is replaced with XXX, giving "fontXXXna, geneva;".

<?php
$css = "font: bold 10pt verdana, geneva;";
$css = preg_replace('/(font)(.*?)(na)/s', '$1XXX$3', $css);
echo "$css";
?>

replace content between two different terms (inclusive)

The following will replace two different terms along with anything between them (ungreedy):

$string = preg_replace('/have.*?not ?/', '', $string);

"the have and the havenots. Yet the haves are not there."

becomes:

"the s. Yet the there."

Here is another one that replaces the content between non-word characters such as '(' and ')'. Notice the escape character '\' that I use before the left round bracket '(' and the right round bracket ')'. Without it they would be read as grouping parentheses.

$srch = preg_replace('/\(.*?\) ?/', 'pop', $srch);
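If the two terms come from variables, one way to avoid escaping them by hand is preg_quote(). The sketch below is only an illustration: the function names replaceBetweenExclusive() and replaceBetweenInclusive() are hypothetical, and a callback is used so that a replacement containing $ or \ is inserted literally.

```php
<?php
// Hypothetical helper: replace what lies between $start and $end, keeping both terms.
function replaceBetweenExclusive(string $text, string $start, string $end, string $new): string
{
    $pattern = '/(' . preg_quote($start, '/') . ').*?(' . preg_quote($end, '/') . ')/s';
    return preg_replace_callback($pattern, function (array $m) use ($new) {
        return $m[1] . $new . $m[2];
    }, $text);
}

// Hypothetical helper: replace the two terms along with what lies between them.
function replaceBetweenInclusive(string $text, string $start, string $end, string $new): string
{
    $pattern = '/' . preg_quote($start, '/') . '.*?' . preg_quote($end, '/') . '/s';
    return preg_replace_callback($pattern, function (array $m) use ($new) {
        return $new;
    }, $text);
}
```

For example, replaceBetweenInclusive('push (this) aside', '(', ')', 'pop') handles the brackets without any manual backslashes.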

One more example and this time we will use a modifier.

$string="hr{background-color:red;
width:64%;
color:blue;
padding-left:12em;
}";
$string =preg_replace('/\{.*?\} ?/s', '', $string);

Notice that s is located just after the closing delimiter. By default the period does not match newlines (\n), and each line of the string here ends with one. So that the .*? can keep matching across all of the lines, we add the dotall modifier, which is the letter s placed just after the closing delimiter of the pattern. The result here is just "hr".
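To see the effect of the modifier, here is a short sketch that runs the same pattern with and without s on a shortened, made-up version of the string.

```php
<?php
// Made-up, shortened version of the multi-line rule.
$css = "hr{background-color:red;\nwidth:64%;\n}";

// Without s the period stops at the first newline, so nothing matches.
$withoutS = preg_replace('/\{.*?\} ?/', '', $css);

// With s the period also matches newlines, so the whole block is removed.
$withS = preg_replace('/\{.*?\} ?/s', '', $css);
```

$withoutS comes back unchanged, while $withS is reduced to "hr".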

replace anything that is not a letter

\s = whitespace like a space or tab or line terminator.
\d = any decimal digit.
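\W = any non-word character (anything other than a letter, digit or underscore).
aeiou = just a bunch of literal letters, here the lowercase vowels.

Here is how they would be used together in one character class. Any character that matches any one of them is removed, so what is left is the consonants and uppercase vowels (plus underscores, which count as word characters).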

$string = preg_replace('/[\d\s\Waeiou]/', '', $string);
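A quick sketch of that character class on a made-up string:

```php
<?php
// Made-up sample with letters, spaces, digits, an underscore and punctuation.
$string = "Hello World 123_!";

// Digits, whitespace, non-word characters and lowercase vowels are all removed.
$result = preg_replace('/[\d\s\Waeiou]/', '', $string);
```

Only "HllWrld_" is left. The underscore stays because it is a word character, so \W does not match it.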

replacing a term with octal code

Let's say you wanted to replace a pattern with a space, tab, back reference, newline or the like. What I did was put the octal code for the character into a double-quoted string and use that string as the replacement. In the following example the letters 'b' and 'd' are replaced with a space.

\040 = space.

$r="\040";
$string = preg_replace('/[bd]/', $r, $string);

One problem is that for the string "abbddbc" the output is "a", then five spaces, then "c", but in a browser it will appear as "a c" with a single space. This is due to how browsers collapse whitespace. An easy workaround is to replace spaces with &nbsp; as described above.

I found this to be handy when replacing two or more newlines with just one.
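A minimal sketch of both uses follows. The variable names and sample strings are made up, and the octal codes assume Windows-style \r\n line endings.

```php
<?php
$space = "\040";     // octal 040 = space
$crlf  = "\015\012"; // octal 015 = carriage return, octal 012 = newline

// Every 'b' and 'd' becomes a space: "a", five spaces, "c".
$spaced = preg_replace('/[bd]/', $space, "abbddbc");

// Two or more \r\n pairs in a row become just one.
$collapsed = preg_replace('/(\r\n){2,}/', $crlf, "one\r\n\r\n\r\ntwo");
```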

Applying multiple patterns to a single variable

This script, which I got from an expert at the dynamic drive forums, will do multiple things to one string. The purpose of the script is to clean up submitted information that is meant to be a verse reference. I want it to be in the form of Genesis 1:1,2 or Genesis 1:1-3. I already have a separate script that makes sure that only the portion after the colon is examined. The book and the chapter have other scripts that are applied to them. Since the last part is the most complicated to clean up, that is the one we will be looking at.

$text=preg_replace(array('/[^\d,-]/','/[^\d-]*(-)[^\d-]*/','/^\D/','/\D$/'),'$1',$text);
$text=preg_replace('/[-]{2,}/','-',$text);
$text=preg_replace('/[,]{2,}/',',',$text);

Notice there are 4 different patterns to match in the first preg replace separated by commas that will be executed from left to right.

The first pattern of the first preg_replace, '/[^\d,-]/', removes anything that is not a digit, dash, or comma.

The second pattern, '/[^\d-]*(-)[^\d-]*/', is more complicated. The first [^\d-]* matches a run, zero or more characters long, of anything that is not a digit or a dash. The (-) captures a single dash so that it can be put back by the $1 in the replacement. This also means that if no dashes are present the whole pattern is moot. The second [^\d-]* matches another run, zero or more characters long, of anything that is not a dash or digit. Since it is all one pattern, it matches each dash together with the non-digit, non-dash characters on either side of it and deletes them, while putting the dash itself back where it was.

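For example, in the string 3-,-,,-,,,-,,,,-,,,,,-4,5 the pattern '/[^\d-]*(-)[^\d-]*/' will match:

The first match is -, (the first dash and the comma after it).
The second match is -,, and so on through -,,,,, with each dash taking the commas that follow it.
The last match is a bit more complicated: it is the bare - in front of the 4, which still matches because the * matches 0 or more times.

Each match is replaced by its captured dash, leaving 3------4,5.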

The third pattern, '/^\D/', removes a character at the start of the variable if it is not a digit, and the fourth pattern, '/\D$/', removes a character at the end of the variable if it is not a digit.

At this point the string may contain two or more dashes together. This is taken care of by the second preg_replace(), which replaces any run of two or more dashes with a single dash. $text is the string to evaluate.

The third line will do the same as the second, but for commas. If no dashes were used between two numbers then you might get multiple commas between two numbers, which is where this third line comes in.
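To check the walk-through, the three calls can be wrapped in a function and run against a few inputs. This is only a sketch: the name cleanVerseNumbers() is hypothetical, and it assumes that a run of commas should collapse to a single comma.

```php
<?php
// Hypothetical wrapper around the three preg_replace() calls described above.
function cleanVerseNumbers(string $text): string
{
    // Four patterns applied left to right; $1 is empty except for the second pattern.
    $text = preg_replace(
        array('/[^\d,-]/', '/[^\d-]*(-)[^\d-]*/', '/^\D/', '/\D$/'),
        '$1',
        $text
    );
    // Collapse any run of dashes into one dash.
    $text = preg_replace('/[-]{2,}/', '-', $text);
    // Assumption: a run of commas should collapse into one comma.
    $text = preg_replace('/[,]{2,}/', ',', $text);
    return $text;
}
```

With the example string from above, cleanVerseNumbers('3-,-,,-,,,-,,,,-,,,,,-4,5') first becomes 3------4,5 and then 3-4,5.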

hyperlink text urls –– example 1

Replace text urls with hyperlinked urls, so that text like http://www.animeviews.com and www.animeviews.com will appear as a clickable www.animeviews.com link.

$text=preg_replace('/(https?:\\/\\/[-_.\\/\w\d!&%#?+\\,\\\\\'=:;@~]+)/i', '<a href="$1">$1</a>', $text);
$text=preg_replace('/[^\/](wwws?[-_.\\/\w\d!&%#?+\\,\\\\\'=:;@~]+)/i', '<a href="$1"> $1</a>', $text);
$text=str_replace('>http://','>',$text);

This script assumes that the "www" url is preceded by a space.

hyperlink your text urls –– example 2

$text = preg_replace("#(<a\s[^>]+>http://\S+</a>)|(<[^>]+http://[^>]+>)|http://\S+#ie",'"$0"=="$1" || "$0"=="$2" ? "$0" : "<a href=\"$0\">$0</a>"',$text);

This example is much better in that it will hyperlink urls and ignore the ones that are used for image src and ones that are already hyperlinked. It does not hyperlink ones that do not have the http:// prefix. Note that it relies on the e (eval) modifier, which was deprecated in PHP 5.5 and removed in PHP 7.0, so on current PHP versions the same logic has to be written with preg_replace_callback(). I found this example at http://www.tote-taste.de/X-Project/regex/eval.html.
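Here is a sketch of how the same idea might look with preg_replace_callback(), which does not need the e modifier. The function name linkBareUrls() is hypothetical. The pattern is the one from example 2 with only the e removed.

```php
<?php
// Hypothetical wrapper; the pattern is the one from example 2 without the e modifier.
function linkBareUrls(string $text): string
{
    $pattern = '#(<a\s[^>]+>http://\S+</a>)|(<[^>]+http://[^>]+>)|http://\S+#i';
    return preg_replace_callback($pattern, function (array $m) {
        // Group 1 = a url that is already hyperlinked, group 2 = a url inside a tag.
        if (!empty($m[1]) || !empty($m[2])) {
            return $m[0]; // leave it exactly as it was
        }
        // Otherwise the match is a bare url, so wrap it in a link.
        return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
    }, $text);
}
```

The callback receives the same matches that $0, $1 and $2 referred to in the original replacement string, so the decision logic stays the same.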

Hyperlink your urls –– example 3

This one seems to cover even more types of web addresses, but with a simpler expression. Not sure where I found it.

$testme = preg_replace("/(http(s?):\/\/)?([a-zA-Z0-9\.]+\.[a-zA-Z]{2,3})/",'<a href="http$2://$3">http$2://$3</a>',$test);

Hyperlink your urls –– example 4

The following I designed myself. It will hyperlink www urls, unless the url is preceded by http:// or is already hyperlinked or part of <img src>.

<?php
$text = preg_replace('/(?<!http:\/\/|\"|=|\'|\'>|\">)(www\..*?)(\s|\Z|\.\Z|\.\s)/i',"<a href=\"http://$1\">$1</a>$2",$text);
echo $text;
?>

Hyperlink your urls –– example 5

The following will hyperlink your urls like example 4, but it will also handle http and https addresses. As before, it won't hyperlink urls that are already hyperlinked or part of an img src.

<?php
$text = preg_replace('/(?<!http:\/\/|https:\/\/|\"|=|\'|\'>|\">)(www\..*?)(\s|\Z|\.\Z|\.\s|\<|\>|,)/i',"<a href=\"http://$1\">$1</a>$2",$text);
$text = preg_replace('/(?<!\"|=|\'|\'>|\">|site:)(https?:\/\/(www){0,1}.*?)(\s|\Z|\.\Z|\.\s|\<|\>|,)/i',"<a href=\"$1\">$1</a>$3",$text);
?>

Here are the terms I have tested against:

test 1: <a href="http://www.animeviews.com/article.php?ID=59">http://www.animeviews.com/article.php?ID=59</a>. word.
test 2: www.animeviews.com/article.php?ID=59. word.
test 3: http://www.animeviews.com/article.php?ID=59. word.
test 4: <a href="http://www.animeviews.com">test 4</a>. word.
test 5: <a href="www.animeviews.com">www.animeviews.com</a> word.
test 6: http://en.wikipedia.org/wiki/Portable_Network_Graphics word.

Test 5 will not hyperlink correctly, but this is by design. The PCRE is not designed to correct existing hyperlinks, just to hyperlink urls that are not hyperlinked yet.

The first pattern in this example matches the www links and hyperlinks each one up to the first space, end of the string, period at the end of the string, period followed by a space, angle bracket, or comma. It will avoid hyperlinking the link if it is already hyperlinked.

The second pattern will do the same for http links.
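The first three test terms can be checked with a small sketch like this one. The function name hyperlinkUrls() is hypothetical. The two patterns are copied from example 5.

```php
<?php
// Hypothetical wrapper around the two preg_replace() calls from example 5.
function hyperlinkUrls(string $text): string
{
    // www links that are not already part of a link or attribute.
    $text = preg_replace('/(?<!http:\/\/|https:\/\/|\"|=|\'|\'>|\">)(www\..*?)(\s|\Z|\.\Z|\.\s|\<|\>|,)/i', "<a href=\"http://$1\">$1</a>$2", $text);
    // http and https links that are not already part of a link or attribute.
    $text = preg_replace('/(?<!\"|=|\'|\'>|\">|site:)(https?:\/\/(www){0,1}.*?)(\s|\Z|\.\Z|\.\s|\<|\>|,)/i', "<a href=\"$1\">$1</a>$3", $text);
    return $text;
}

$test1 = '<a href="http://www.animeviews.com/article.php?ID=59">http://www.animeviews.com/article.php?ID=59</a>. word.';
$test2 = 'www.animeviews.com/article.php?ID=59. word.';
$test3 = 'http://www.animeviews.com/article.php?ID=59. word.';
```

Test 1 should come back unchanged, while tests 2 and 3 should each gain a link that stops before the period.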

retrieve the content between two different terms

The following will use preg_match() to locate the data between two different terms and assign it to an array.

<?php
$css = " .css {
color:#scrollbar;
font: bold 10pt verdana, geneva;
}";
preg_match('/\.css \{
color:(.*?);/s', $css, $match);
print_r($match);
?>

The content we want is captured by the subpattern (.*?) and ends up in $match[1]; $match[0] holds the whole matched text.

If we want all of the matches in a string we would use preg_match_all(), but be aware that it returns a multidimensional array.
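A short sketch of what that multidimensional array looks like, using a made-up one-line stylesheet:

```php
<?php
// Made-up sample with two color declarations.
$css = ".a {color:red;} .b {color:blue;}";

// With the default ordering, $matches[0] holds the full matches
// and $matches[1] holds what the first subpattern captured each time.
preg_match_all('/color:(.*?);/s', $css, $matches);
```

Here $matches[1] is the array of captured values, one entry per match.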

detect emails

I found the following in a regexadvice post.

<?php
$test='abc@domain.info';
if(preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i',
$test)) {echo "YES";}
else {echo "NO";}
?>
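As a rough sketch, the pattern can be wrapped in a function and compared with PHP's built-in filter_var(). The function name looksLikeEmail() and the sample addresses are hypothetical. Note that the {2,4} at the end limits the last part of the domain to four letters.

```php
<?php
// Hypothetical wrapper around the pattern shown above.
function looksLikeEmail(string $test): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
    return preg_match($pattern, $test) === 1;
}

// A top level domain longer than four letters is rejected by the pattern,
// while the built-in validator accepts it.
$byPattern = looksLikeEmail('user@example.museum');
$byFilter  = filter_var('user@example.museum', FILTER_VALIDATE_EMAIL) !== false;
```

If longer domain endings matter for your data, filter_var() with FILTER_VALIDATE_EMAIL may be the simpler choice.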

Phone Numbers

I got this one from php.net and altered it slightly for my purposes.

Detect US phone numbers and reformat them into a consistent format.

<?php
$phoneNumber="337-333-3443";
$regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
.'(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
.'[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';

if (preg_match($regex, $phoneNumber)) {
    // Rebuild the number from the three captured groups.
    $formatted = preg_replace($regex, '$1-$2-$3', $phoneNumber);
    echo "$formatted";
} else {
    echo "NOT A MATCH";
}
?>

This will reformat phone numbers into 123-456-6789 format.
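Here is a sketch of the same check as a reusable function, tried against a few made-up numbers. The name formatUsPhone() is hypothetical, and the pattern assumes the optional area code brackets are written as \( and \).

```php
<?php
// Hypothetical wrapper: returns NNN-NNN-NNNN, or null when the input does not match.
function formatUsPhone(string $phoneNumber): ?string
{
    $regex = '/^(?:1(?:[. -])?)?(?:\((?=\d{3}\)))?([2-9]\d{2})'
        . '(?:(?<=\(\d{3})\))? ?(?:(?<=\d{3})[.-])?([2-9]\d{2})'
        . '[. -]?(\d{4})(?: (?i:ext)\.? ?(\d{1,5}))?$/';
    if (!preg_match($regex, $phoneNumber, $m)) {
        return null;
    }
    // Groups 1 to 3 are the area code, the exchange and the last four digits.
    return $m[1] . '-' . $m[2] . '-' . $m[3];
}
```

A number such as 123-456-7890 is rejected because the area code and the exchange must each start with a digit from 2 to 9.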

Replace content between two different terms UNLESS term X is found between the terms.

<?php
$string ="hi, how goes it this is [quote]Bill[quote]Bill meyer[/quote] meyer[/quote] meyer";
$stringo = preg_replace('/\[quote\](?!.*?\[quote\]).*?\[\/quote\]/is', "XXXXXX", $string);
print $stringo;
echo" $string";
?>

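In the above example everything between and including [quote] and [/quote] will be replaced with XXXXXX, unless another [quote] appears after the opening [quote]. The negative lookahead (?!.*?\[quote\]) is what enforces this, so only the innermost quote is replaced. Keep in mind that the lookahead checks the whole rest of the string, not just the part up to the next [/quote].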
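A sketch of the result of one pass, and of one way to work outward through the nesting by repeating the replacement until nothing changes. The loop is an addition for illustration and is not part of the example above.

```php
<?php
$string = "hi, how goes it this is [quote]Bill[quote]Bill meyer[/quote] meyer[/quote] meyer";
$pattern = '/\[quote\](?!.*?\[quote\]).*?\[\/quote\]/is';

// One pass only replaces the innermost quote.
$once = preg_replace($pattern, "XXXXXX", $string);

// Repeat until a pass makes no replacements; $count is filled in by preg_replace().
$all = $string;
do {
    $all = preg_replace($pattern, "XXXXXX", $all, -1, $count);
} while ($count > 0);
```

After one pass the outer [quote] and [/quote] are still there. After the loop the whole nested block has been reduced to a single XXXXXX.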

Using regular expressions with MySQL

select column from table where column regexp '[scbmrtl].*[scbmrtl]'

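This selects the rows where column contains one of the letters s, c, b, m, r, t or l, followed somewhere later by another one of them. For more on regular expressions in MySQL, see the MySQL manual.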

Replacing content using back references

<?php
$test="<li>this is a test</li>
 this should not be here.
<li>good</li>";
$test=preg_replace('/(<\/)(.*?)(>)(.*?)(<\2>)/s','$1$2$3$5',$test);
echo "$test";
?>

At this point you should already understand how to use subpatterns. In the replacement, subpatterns are represented with $1 or $2 etc. In this example a subpattern is also referred to within the pattern itself, as opposed to just the replacement.

$test=preg_replace('/(<\/)(.*?)(>)(.*?)(<\2>)/s','$1$2$3$5',$test);

The \2 in the pattern is a back reference to the second subpattern, the first (.*?), and in this case is equal to li. The pattern therefore matches from a closing </li> through to the next opening <li>, and the replacement drops $4, the text in between. The result, as displayed in a browser, is

this is a test

good
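The raw string behind that output, along with one more made-up use of a back reference inside a pattern (removing a doubled word), can be sketched like this:

```php
<?php
$test = "<li>this is a test</li>\n this should not be here.\n<li>good</li>";

// \2 must repeat whatever the second subpattern matched, which is "li" here.
$result = preg_replace('/(<\/)(.*?)(>)(.*?)(<\2>)/s', '$1$2$3$5', $test);

// Made-up second example: \1 must repeat the word captured just before it.
$deduped = preg_replace('/\b(\w+)\s+\1\b/', '$1', "this is is a test");
```

$result is the two list items with nothing left between them, and $deduped has the repeated "is" removed.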


TAGS: pcre, php