Issue
I am having trouble with a regex search. I want to find all valid phone numbers in a text file. The formats I want to match are:
000-000-0000
000 000 0000
000.000.0000
(000)000-0000
(000)000 0000
(000)000.0000
(000) 000-0000
000-0000
000 0000
000.0000
(000)0000000
Text file:
The magazine is published by Microsoft Corp., 1
Microsoft Way, Redmond, WA 98052; Telephone: (425) 882-8080 or
1-888-424-6898 or (03) 681 3870 or 02-534 8999 or 702/256-5111.
All content of this magazine represents the views (1265468) of the respective authors
and does not necessarily reflect the views of Microsoft Corp from 1986-1989.
On this text, the regex returns the following matches:
(425) 882-8080
888-424-6898
681 3870
534 8999
256-5111
1265468
986-1989
It doesn't include the 1-, (03), 02- or 702/ prefixes.
It misinterprets a date range such as (2020-2022) as a telephone number: it matches "020-2022".
It matches long runs of digits such as "1263537".
How do I solve this? Any help would be appreciated.
My Python code is below:
import re
import sys

no = r'((?:\+\d{2}[-\.\s]??|\d{4}[-\.\s]??)?(?:\d{3}[-\.\s]??\d{3}[-\.\s]??\d{4}|\(\d{3}\)\s*\d{3}[-\.\s]??\d{4}|\d{3}[-\.\s]??\d{4}))'
# Write every match found in the input file to out.txt, one per line
with open('out.txt', 'w+') as f2, open(sys.argv[1]) as f1:
    for line in f1:
        out = re.findall(no, line)
        for i in out:
            f2.write(i + '\n')
Solution
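You can try the following regex pattern. Because it is anchored with ^ and $, it validates one candidate number at a time rather than searching inside a longer line: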
^(?:\(\d{3}\)|\d{3})?[. -]?\d{3}[. -]?\d{4}$
Explanation:
^                  from the start of the number
(?:
    \(\d{3}\)      match (111) style area code
    |              OR
    \d{3}          match 111 style area code
)?                 area code is optional
[. -]?             optional dot, space or dash separator
\d{3}              match 3 digit exchange
[. -]?             another optional dot, space or dash separator
\d{4}              match 4 digit number
$                  end of the number
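The pattern explained above can be tried out in Python with the minimal sketch below. The names ANSWER_PATTERN, SEARCH_PATTERN, is_phone and find_phones are hypothetical and chosen only for illustration. SEARCH_PATTERN is an assumption and not part of the answer: it swaps the ^ and $ anchors for digit lookarounds so it can be used with findall on running text, and it requires a separator when there are no parentheses, so that dates and bare digit runs are skipped. It does not try to capture prefixes such as 1- or (03).

```python
import re

# Pattern from the answer: ^ and $ make it a whole-string validator.
ANSWER_PATTERN = re.compile(r'^(?:\(\d{3}\)|\d{3})?[. -]?\d{3}[. -]?\d{4}$')

# Hypothetical search variant (an assumption, not from the answer).
SEARCH_PATTERN = re.compile(
    r'(?<!\d)'                           # not preceded by a digit
    r'(?:\(\d{3}\)\s?\d{3}[. -]?\d{4}'   # (000)0000000, (000) 000-0000, ...
    r'|(?:\d{3}[. -])?\d{3}[. -]\d{4})'  # 000-000-0000, 000 0000, ...
    r'(?!\d)'                            # not followed by a digit
)


def is_phone(candidate):
    """Return True if the whole string matches the answer's pattern."""
    return ANSWER_PATTERN.match(candidate) is not None


def find_phones(text):
    """Return all substrings of text that match the search variant."""
    return SEARCH_PATTERN.findall(text)


# Sample text taken from the question.
sample = (
    "Microsoft Way, Redmond, WA 98052; Telephone: (425) 882-8080 or\n"
    "1-888-424-6898 or (03) 681 3870 or 02-534 8999 or 702/256-5111.\n"
    "All content of this magazine represents the views (1265468) of the respective authors\n"
    "and does not necessarily reflect the views of Microsoft Corp from 1986-1989."
)

print(is_phone("(425) 882-8080"))  # True
print(is_phone("2020-2022"))       # False
print(is_phone("1265468"))         # True: a bare 7-digit run still validates
print(find_phones(sample))
# ['(425) 882-8080', '888-424-6898', '681 3870', '534 8999', '256-5111']
```

Note that the answer's pattern still accepts a bare seven-digit string such as "1265468", because both separators are optional. If that is unwanted, one option is to make the separator mandatory when no parentheses are present, as the search variant does.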
Answered By - Tim Biegeleisen Answer Checked By - David Goodson (WPSolving Volunteer)