Wednesday, April 6, 2011

Regex Replace - Multiple Characters

I have 20 or so characters that I need to replace with various other characters in a block of text. Is there a way to do this in a single regex, and what would this regex be? Or is there an easier way to do this in .NET?

For example, an excerpt from my mapping table is

œ => oe
ž => z
Ÿ => Y
À => A
Á => A
Â => A
Ã => A
Ä => AE

From stackoverflow
  • I'm not aware of an easy way to do it using a regex (not sure it is possible), but here is a clean way to do it:

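    using System;
    using System.Collections.Generic;

    var replaceChars = new Dictionary<string, string>
                       {
                           {"œ", "oe"},
                           {"ž", "z"}
                       };
    string s = "ždfasœ";
    
    // Each Replace call returns a new string, so reassign s every time
    foreach (var c in replaceChars)
        s = s.Replace(c.Key, c.Value);
    
    Console.WriteLine(s); // zdfasoe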
    
    Joe : StringBuilder would be a better solution.
  • http://stackoverflow.com/questions/249087/how-do-i-remove-diacritics-accents-from-a-string-in-net

    Nicolas Webb : This likely addresses the actual need of the questioner. I'll have to keep that around.
  • For string replacement, I'd just iterate through these in your mapping table and use string.Replace on them:

    foreach (var r in replacements)
    {
        // string.Replace returns a new string, so assign the result back
        myString = myString.Replace(r.Key, r.Value);
    }
    

    Not the most performant, but if you don't have a lot of strings to go through it should be good enough :).

  • If you really want to do it in a single regex, there is a way to do that.

    Dictionary<string, string> map = new Dictionary<string, string>() {
        {"œ","oe"},
        {"ž", "z"},
        {"Ÿ","Y"},
        {"À","A"},
        {"Á","A"},
        {"Â","A"},
        {"Ã","A"},
        {"Ä","AE"},
    };
    
    string str = "AAAœžŸÀÂÃÄZZZ";
    
    // The character class lists every key in map, so each match has a replacement
    Regex r = new Regex(@"[œžŸÀÁÂÃÄ]");
    
    string output = r.Replace(str, (Match m) => map[m.Value]);
    
    Console.WriteLine(output);
    

    Result

    AAAoezYAAAAEZZZ
    
    serhio : Is this more or less resource- and time-consuming than string.Replace("œ","oe"), etc.?
    S.Mark : Normal string replace should be faster than regex
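Following up on Joe's StringBuilder comment and the performance question, one option is to walk the input once and look each character up in a Dictionary<char, string>, so the text is scanned a single time instead of once per mapping entry. This is a minimal sketch, not code from any of the answers above; CharMapper and ReplaceChars are hypothetical names, and it assumes every key is a single precomposed UTF-16 char (a decomposed "A" followed by a combining accent would not match).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CharMapper
{
    // Hypothetical helper: scans the input once, appending either the
    // mapped replacement or the original character to a StringBuilder.
    // Assumes keys are single precomposed chars.
    public static string ReplaceChars(string input, IReadOnlyDictionary<char, string> map)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (map.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        var map = new Dictionary<char, string>
        {
            ['œ'] = "oe", ['ž'] = "z", ['Ÿ'] = "Y",
            ['À'] = "A", ['Á'] = "A", ['Â'] = "A", ['Ã'] = "A",
            ['Ä'] = "AE",
        };
        Console.WriteLine(ReplaceChars("AAAœžŸÀÂÃÄZZZ", map)); // prints AAAoezYAAAAEZZZ
    }
}
```

Because it works per character, this only fits tables like the one in the question where every key is one character; multi-character keys would need the loop or regex approaches above.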

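The linked question covers the more general "strip the accents" case. A common .NET technique there is to normalize to FormD, drop the characters whose Unicode category is NonSpacingMark, and normalize back. The sketch below assumes that technique (Diacritics.Remove is a hypothetical name) and relies on the platform's Unicode normalization support. It does not reproduce the mapping table exactly: Ä becomes A rather than AE, and œ has no decomposition, so entries like those would still need an explicit map.

```csharp
using System;
using System.Globalization;
using System.Text;

static class Diacritics
{
    // Decompose to FormD so that e.g. 'À' becomes 'A' + U+0300 (combining grave),
    // drop the combining marks, then recompose whatever is left.
    public static string Remove(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    static void Main()
    {
        Console.WriteLine(Remove("ŸÀÁÂÃž")); // prints YAAAAz

        // Ä loses its diaeresis (A, not AE), and the ligature œ has no
        // decomposition, so it passes through unchanged.
        string partial = Remove("Äœ");
        Console.WriteLine(partial == "Aœ"); // prints True
    }
}
```

If the exact table matters, one option would be to apply the explicit map for special cases such as œ and Ä first and strip the remaining accents afterwards.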