Using preg_replace in PHP to replace a character that has characters before and after it

The title of this post may not seem like a common use case, but I ran across a fascinating problem today. Admittedly my regex-fu is not the greatest, so I struggled with this one for a bit. I am posting my solution here in the hope that it helps other people.

In short, I had a document that was being sent through a character set conversion. The source encoding was unknown, and despite several attempts at parsing it properly, I was still left with a few documents that had stray question marks in place of their dashes and quotes.

At the end of the day it was easier to run them through a cleaning filter after performing the charset conversion.

First Problem.  Finding a punctuation character, in this case a question mark, that had alphanumeric text before and after it.

Match: didn?t
Don’t Match: hello?

Solution.  Using preg_replace I was able to create a regex pattern that would find the question mark and confirm that it did in fact have alphanumeric characters before and after it.



Here is the preg_replace code:

$str = preg_replace("/(?<![\W_])\?(?![\W_])/", "'", $str);

It's a bit easier to read when you break it down into its three main parts: the opening condition, the character to find, and the closing condition.
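As a rough sketch of that breakdown, the pattern can be assembled from its three pieces and tried against the two sample strings. The variable names and the replace_stray_marks() helper are made up for illustration and are not part of the original code. The $strict pattern is an assumed variant, not the one used above: it swaps the negative lookarounds for positive ones, because the negative form also matches a question mark sitting at the very start or end of the string, where there is no neighbouring character to reject.

```php
<?php
// The three parts of the pattern from the post.
$before = '(?<![\W_])'; // opening condition: previous char is not a non-word char or underscore
$target = '\?';         // the character to find: a literal question mark
$after  = '(?![\W_])';  // closing condition: next char is not a non-word char or underscore
$loose  = '/' . $before . $target . $after . '/';

// Assumed stricter variant: require an actual alphanumeric on both sides.
$strict = '/(?<=[^\W_])\?(?=[^\W_])/';

// Hypothetical helper: swap each matching question mark for an apostrophe.
function replace_stray_marks(string $str, string $pattern): string
{
    return preg_replace($pattern, "'", $str);
}

echo replace_stray_marks("I didn?t know. Did you?\n", $loose); // prints: I didn't know. Did you?
echo replace_stray_marks("hello?", $loose), "\n";              // prints: hello'
echo replace_stray_marks("hello?", $strict), "\n";             // prints: hello?
```

If the text being cleaned can end in a genuine question mark with nothing after it, the stricter variant is probably the safer choice.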




