Regular expressions are a powerful tool for matching and modifying strings. They are widely used across software development, and in web scraping in particular they are a must-know technique. Below are some frequently used concepts.
Simple Python Match

    pattern1 = 'cat'
    pattern2 = 'bird'
    string = 'dog runs to cat'
    print(pattern1 in string)
    print(pattern2 in string)
True
False
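Match with Regex

    import re  # needed for every regex example that follows

    pattern1 = 'cat'
    pattern2 = 'bird'
    string = 'dog runs to cat'
    print(re.search(pattern1, string))  # a match object when the pattern is found
    print(re.search(pattern2, string))  # None when it is not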
<_sre.SRE_Match object; span=(12, 15), match='cat'>
None
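Because a match object is truthy and `None` is falsy, the result of `re.search` can be tested directly before reading the match. The following is a minimal sketch; the helper name `find_span` is hypothetical and used only for illustration.

```python
import re

def find_span(pattern, text):
    """Return (start, end, matched_text) for the first match, or None."""
    match = re.search(pattern, text)
    if match is None:  # re.search returns None when nothing matches
        return None
    return match.start(), match.end(), match.group()

print(find_span('cat', 'dog runs to cat'))   # (12, 15, 'cat')
print(find_span('bird', 'dog runs to cat'))  # None
```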
Match with Multiple Patterns using []

    ptn = r"r[au]n"  # [au] matches exactly one character, either 'a' or 'u'
    print(re.search(ptn, 'dog runs to cat'))
<_sre.SRE_Match object; span=(4, 7), match='run'>
Useful Pattern Matching

    print(re.search(r"r[A-Z]n", 'dog runs to cat'))     # one uppercase letter
    print(re.search(r"r[a-z]n", 'dog runs to cat'))     # one lowercase letter
    print(re.search(r"r[0-9]n", 'dog r2ns to cat'))     # one digit
    print(re.search(r"r[0-9a-z]n", 'dog runs to cat'))  # one digit or lowercase letter
None
<_sre.SRE_Match object; span=(4, 7), match='run'>
<_sre.SRE_Match object; span=(4, 7), match='r2n'>
<_sre.SRE_Match object; span=(4, 7), match='run'>
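To see the ranges side by side, the sketch below runs each character class over one sample string. It uses `re.findall`, which returns every match as a list; the sample text is made up for illustration.

```python
import re

text = 'run r2n rUn r_n'  # illustrative sample, not from the original post

lower = re.findall(r"r[a-z]n", text)        # lowercase letters only
upper = re.findall(r"r[A-Z]n", text)        # uppercase letters only
digit = re.findall(r"r[0-9]n", text)        # digits only
mixed = re.findall(r"r[0-9a-zA-Z]n", text)  # ranges can be combined in one class

print(lower)  # ['run']
print(upper)  # ['rUn']
print(digit)  # ['r2n']
print(mixed)  # ['run', 'r2n', 'rUn']
```

Note that `r_n` is never matched, because `_` falls outside all of the listed ranges.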
Special Type Matching

Numbers

    print(re.search(r"r\dn", 'run r4n'))  # \d matches any digit
    print(re.search(r"r\Dn", 'run r4n'))  # \D matches any non-digit
<_sre.SRE_Match object; span=(4, 7), match='r4n'>
<_sre.SRE_Match object; span=(0, 3), match='run'>
Whitespace

    print(re.search(r"r\sn", 'r\nn r4n'))  # \s matches any whitespace (space, \t, \n, \r, \f, \v)
    print(re.search(r"r\Sn", 'r\nn r4n'))  # \S matches any non-whitespace character
<_sre.SRE_Match object; span=(0, 3), match='r\nn'>
<_sre.SRE_Match object; span=(4, 7), match='r4n'>
All Numbers, Letters and “_”

    print(re.search(r"r\wn", 'r\nn r4n'))  # \w matches a letter, a digit, or "_"
    print(re.search(r"r\Wn", 'r\nn r4n'))  # \W matches anything that \w does not
<_sre.SRE_Match object; span=(4, 7), match='r4n'>
<_sre.SRE_Match object; span=(0, 3), match='r\nn'>
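The three shorthand classes can be compared on a single string. This is a small sketch with a made-up sample; it also uses the `+` quantifier (one or more repetitions), which is covered further down.

```python
import re

text = 'order 66, item_7\tqty 3'  # illustrative sample, not from the original post

digits = re.findall(r"\d+", text)  # runs of digits
words = re.findall(r"\w+", text)   # runs of letters, digits, and "_"
spaces = re.findall(r"\s", text)   # individual whitespace characters

print(digits)  # ['66', '7', '3']
print(words)   # ['order', '66', 'item_7', 'qty', '3']
print(spaces)  # [' ', ' ', '\t', ' ']
```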
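Empty String

    # \b matches the empty string at a word boundary (between a \w and a non-\w character)
    print(re.search(r"\bruns\b", 'dog runs to cat'))
    # \B matches the empty string where there is NO word boundary, so each space in the
    # pattern must sit next to another space: note the two spaces on either side of 'runs'
    print(re.search(r"\B runs \B", 'dog  runs  to cat'))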
<_sre.SRE_Match object; span=(4, 8), match='runs'>
<_sre.SRE_Match object; span=(4, 10), match=' runs '>
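A common use of `\b` is matching a whole word without also matching it inside longer words. The sketch below uses a made-up sample string to contrast a bare pattern, `\b`, and `\B`.

```python
import re

text = 'cat concat cats cat.'  # illustrative sample, not from the original post

anywhere = re.findall(r"cat", text)       # every occurrence, even inside other words
whole = re.findall(r"\bcat\b", text)      # only 'cat' as a standalone word
inside = re.findall(r"\Bcat", text)       # only 'cat' preceded by another word character

print(len(anywhere))  # 4
print(whole)          # ['cat', 'cat']
print(inside)         # ['cat']  (the one at the end of 'concat')
```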
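Special Characters

    # \\ in a raw pattern matches one literal backslash; the subject string writes it as '\\'
    print(re.search(r"runs\\", 'runs\\ to me'))
    # . matches any single character except a newline
    print(re.search(r"r.n", 'r[ns to me'))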
<_sre.SRE_Match object; span=(0, 5), match='runs\\'>
<_sre.SRE_Match object; span=(0, 3), match='r[n'>
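When the text to search for is itself full of special characters, escaping each one by hand is error-prone. The standard library's `re.escape` does this automatically; the sketch below, with a made-up sample string, contrasts the escaped and unescaped pattern.

```python
import re

literal = 'r.n'
escaped = re.escape(literal)  # backslash-escapes the '.', so it only matches a real dot

text = 'r[n r.n'  # illustrative sample, not from the original post

wildcard_hit = re.search(literal, text)  # '.' acts as a wildcard here
literal_hit = re.search(escaped, text)   # '.' must be a literal dot here

print(wildcard_hit.group(), wildcard_hit.span())  # r[n (0, 3)
print(literal_hit.group(), literal_hit.span())    # r.n (4, 7)
```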
Start and End

    print(re.search(r"^dog", 'dog runs to dog'))  # ^ anchors the match to the start of the string
    print(re.search(r"cat$", 'cat runs to cat'))  # $ anchors the match to the end of the string
<_sre.SRE_Match object; span=(0, 3), match='dog'>
<_sre.SRE_Match object; span=(12, 15), match='cat'>
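Anchors are handy for filtering strings by how they begin or end. A minimal sketch, with a made-up list of sentences:

```python
import re

# Illustrative samples, not from the original post
lines = ['dog runs to dog', 'cat runs to dog', 'dog runs to cat']

starts_with_dog = [s for s in lines if re.search(r"^dog", s)]
ends_with_dog = [s for s in lines if re.search(r"dog$", s)]

print(starts_with_dog)  # ['dog runs to dog', 'dog runs to cat']
print(ends_with_dog)    # ['dog runs to dog', 'cat runs to dog']
```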
Maybe

    # (day)? makes the whole group optional: it may appear once or not at all
    print(re.search(r"Mon(day)?", 'Monday'))
    print(re.search(r"Mon(day)?", 'Mon'))
<_sre.SRE_Match object; span=(0, 6), match='Monday'>
<_sre.SRE_Match object; span=(0, 3), match='Mon'>
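The `?` applies to whatever immediately precedes it: a single character, or a whole group when parentheses are used. The sketch below shows both cases with made-up inputs.

```python
import re

# ? after a single character: the 'u' may appear once or not at all
spellings = re.findall(r"colou?r", 'color colour colouur')

# ? after a group: 'day' is matched only if all three letters are present
days = [re.search(r"Mon(day)?", d).group() for d in ['Mon', 'Monday', 'Mond']]

print(spellings)  # ['color', 'colour']
print(days)       # ['Mon', 'Monday', 'Mon']
```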
Multi-Line Matching

    string = """
    dog runs to cat.
    I run to dog.
    """
    print(re.search(r"^I", string))              # by default ^ matches only at the very start of the string
    print(re.search(r"^I", string, flags=re.M))  # with re.M, ^ also matches at the start of each line
None
<_sre.SRE_Match object; span=(18, 19), match='I'>
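The same flag affects `$` as well, which then matches at the end of every line. A short sketch with a made-up three-line string:

```python
import re

# Illustrative sample, not from the original post
string = "dog runs to cat.\nI run to dog.\ncat naps."

first_default = re.findall(r"^\w+", string)                # only the start of the string
first_per_line = re.findall(r"^\w+", string, flags=re.M)   # the start of every line
last_per_line = re.findall(r"\w+\.$", string, flags=re.M)  # the end of every line

print(first_default)   # ['dog']
print(first_per_line)  # ['dog', 'I', 'cat']
print(last_per_line)   # ['cat.', 'dog.', 'naps.']
```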
Match 0 or Multiple Times

    # b* matches zero or more 'b' characters
    print(re.search(r"ab*", 'a'))
    print(re.search(r"ab*", 'abbbbbb'))
<_sre.SRE_Match object; span=(0, 1), match='a'>
<_sre.SRE_Match object; span=(0, 7), match='abbbbbb'>
Match 1 or Multiple Times

    # b+ matches one or more 'b' characters
    print(re.search(r"ab+", 'a'))
    print(re.search(r"ab+", 'abbbbbb'))
None
<_sre.SRE_Match object; span=(0, 7), match='abbbbbb'>
Defined Match Times

    # b{2,10} matches between 2 and 10 'b' characters, inclusive
    print(re.search(r"ab{2,10}", 'a'))
    print(re.search(r"ab{2,10}", 'abbbbbb'))
None
<_sre.SRE_Match object; span=(0, 7), match='abbbbbb'>
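The three quantifiers are easiest to compare against the same inputs. In this sketch the helper `first_match` is a hypothetical name introduced for illustration; it returns the matched text or `None`.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match, or None if there is none."""
    m = re.search(pattern, text)
    return m.group() if m else None

samples = ['a', 'ab', 'abb', 'abbbbbb']
table = {
    ptn: [first_match(ptn, s) for s in samples]
    for ptn in (r"ab*", r"ab+", r"ab{2,10}")
}

for ptn, row in table.items():
    print(ptn, row)
# ab* ['a', 'ab', 'abb', 'abbbbbb']
# ab+ [None, 'ab', 'abb', 'abbbbbb']
# ab{2,10} [None, None, 'abb', 'abbbbbb']
```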
Group

    match = re.search(r"(\d+), Date: (.+)", 'ID: 021523, Date: Feb/12/2017')
    print(match.group())   # the whole match
    print(match.group(1))  # first parenthesized group
    print(match.group(2))  # second parenthesized group
021523, Date: Feb/12/2017
021523
Feb/12/2017
Groups can also be named with (?P<name>...) and then read back by name:

    match = re.search(r"(?P<id>\d+), Date: (?P<date>.+)", 'ID: 025123, Date: Feb/12/2017')
    print(match.group('id'))
    print(match.group('date'))
025123
Feb/12/2017
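When several groups are needed at once, the match object can return them together. A brief sketch using the standard `groups()` and `groupdict()` methods on the same kind of input:

```python
import re

match = re.search(
    r"(?P<id>\d+), Date: (?P<date>.+)",
    'ID: 021523, Date: Feb/12/2017',
)

all_groups = match.groups()      # every group, in order, as a tuple
named_groups = match.groupdict() # named groups only, as a dict

print(all_groups)    # ('021523', 'Feb/12/2017')
print(named_groups)  # {'id': '021523', 'date': 'Feb/12/2017'}

# Named groups are still numbered, so both lookups give the same value
print(match.group('id') == match.group(1))  # True
```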
Find All Matches

    print(re.findall(r"r[ua]n", 'run ran ren'))
['run', 'ran']
The | operator matches either alternative:

    print(re.findall(r"(run|ran)", 'run ran ren'))
['run', 'ran']
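One detail worth knowing: when a pattern contains capturing groups, `re.findall` returns the group contents rather than the whole match, and with several groups it returns tuples. The sketch below illustrates this, along with `re.finditer`, which yields match objects so that positions are available too.

```python
import re

text = 'run ran ren'

pairs = re.findall(r"(r)([ua])n", text)     # two groups -> list of tuples
whole = re.findall(r"(?:run|ran)", text)    # (?:...) groups without capturing
spans = [m.span() for m in re.finditer(r"r[ua]n", text)]  # positions of each match

print(pairs)  # [('r', 'u'), ('r', 'a')]
print(whole)  # ['run', 'ran']
print(spans)  # [(0, 3), (4, 7)]
```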
Replace

    # re.sub(pattern, replacement, string) replaces every match of the pattern
    print(re.sub(r"r[au]ns", 'catches', 'dog runs to cat'))
dog catches to cat
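`re.sub` also accepts a `count` argument to limit the number of replacements, and the replacement string can refer back to captured groups with `\1`, `\2`, and so on. A small sketch with made-up inputs:

```python
import re

text = 'dog runs to cat, cat runs to dog'  # illustrative sample

all_replaced = re.sub(r"r[au]ns", 'catches', text)          # every match
first_only = re.sub(r"r[au]ns", 'catches', text, count=1)   # only the first match

# \1 and \2 in the replacement insert the text captured by groups 1 and 2
swapped = re.sub(r"(\w+) runs to (\w+)", r"\2 runs to \1", 'dog runs to cat')

print(all_replaced)  # dog catches to cat, cat catches to dog
print(first_only)    # dog catches to cat, cat runs to dog
print(swapped)       # cat runs to dog
```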
Split

    # split on any one of ',', ';' or '.'
    print(re.split(r"[,;\.]", 'a;b,c.d;e'))
['a', 'b', 'c', 'd', 'e']
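Two variations that often come up are limiting the number of splits with `maxsplit` and absorbing whitespace around the separators. The inputs below are made up for illustration.

```python
import re

# Inside [], a dot is already literal, so the backslash is optional
parts = re.split(r"[,;.]", 'a;b,c.d;e')

# maxsplit stops after the given number of splits; the rest is left intact
limited = re.split(r"[,;.]", 'a;b,c.d;e', maxsplit=2)

# \s* on both sides swallows any spaces around each separator
spaced = re.split(r"\s*[,;]\s*", 'a ; b,c ,  d')

print(parts)    # ['a', 'b', 'c', 'd', 'e']
print(limited)  # ['a', 'b', 'c.d;e']
print(spaced)   # ['a', 'b', 'c', 'd']
```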
Compile

    # compile the pattern once, then reuse the compiled object
    compiled_re = re.compile(r"r[au]n")
    print(compiled_re.search('dog runs to cat'))
<_sre.SRE_Match object; span=(4, 7), match='run'>
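A compiled pattern exposes the same operations as the module-level functions (`search`, `findall`, `sub`, and so on), which is convenient when one pattern is applied to many strings. A minimal sketch with made-up inputs:

```python
import re

compiled_re = re.compile(r"r[au]n")

# Illustrative samples, not from the original post
lines = ['dog runs to cat', 'cat ran away', 'bird sings']

hits = [line for line in lines if compiled_re.search(line)]  # filter many strings
found = compiled_re.findall('run ran ren')                   # same as re.findall(pattern, ...)
replaced = compiled_re.sub('walk', 'dog runs to cat')        # same as re.sub(pattern, ...)

print(hits)      # ['dog runs to cat', 'cat ran away']
print(found)     # ['run', 'ran']
print(replaced)  # dog walks to cat
```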
More about Regular Expression Link