Course: CSCI 1900

Regular Expression

  • A pattern that describes a whole set of strings
  • Tool for checking whether text has an expected format

Why?

  • Validation
  • Searching (such as grep)
  • Identify valid credit card numbers
  • Represent many complex strings with a single, simpler* pattern

Set Correspondence

  • The set corresponding to a sequence contains all distinct elements of that sequence (order and repetition are ignored)
  • {a, b} is the set corresponding to the sequence abababababbababab
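In Python, the built-in set performs this correspondence directly: it keeps one copy of each distinct element and discards order. A minimal sketch:

```python
# Build the set corresponding to a sequence: duplicates collapse, order is discarded
sequence = "abababababbababab"
corresponding_set = set(sequence)

print(sorted(corresponding_set))  # ['a', 'b']
```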

Basics

  • Given the set A = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
  • A* is the set of all finite strings that can be built from the characters of A, including the empty string and strings that are not valid English words

Examples

Phone Number

  • (123) 456-7890
    • Opening paren
    • 3 digits
    • Closing paren
    • Space
    • 3 digits
    • Hyphen
    • 4 digits
  • \(\d{3}\) \d{3}-\d{4}
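Because this pattern has no ^ or $ anchors, it also works for searching inside longer text, as in the grep use case. A minimal Python sketch using re.findall (the sample sentence is made up):

```python
import re

# The slide's phone pattern; \( and \) escape the literal parentheses
PHONE = r"\(\d{3}\) \d{3}-\d{4}"

text = "Call (123) 456-7890 or (555) 010-9999 after noon."
print(re.findall(PHONE, text))
# ['(123) 456-7890', '(555) 010-9999']

# Exact formatting matters: without the space after ")" there is no match
print(re.fullmatch(PHONE, "(123)456-7890"))  # None
```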

Email Address

  • foo@bar.com
    • User
    • @
    • Domain
    • .
    • TLD
  • \w+@\w+\.\w+$
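This pattern is a simplification: \w matches only letters, digits, and underscore, so it rejects addresses with a dot in the user name or a subdomain in the domain. A minimal sketch testing a few made-up addresses with re.match:

```python
import re

EMAIL = r"\w+@\w+\.\w+$"

addresses = ["foo@bar.com", "first.last@bar.com", "foo@mail.bar.com", "foo@bar"]

# re.match anchors at the start; the pattern's $ anchors at the end
results = {address: bool(re.match(EMAIL, address)) for address in addresses}

for address, ok in results.items():
    print(f"{address}: {'match' if ok else 'no match'}")
# Only foo@bar.com matches
```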

Word Ending in ‘ing’

  • Crying
    • One or more word characters
    • Literal “ing”
    • Word boundary (end of the word)
  • \w+ing\b
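Combined with re.findall, the pattern pulls every word ending in "ing" out of a sentence. The trailing \b matters: in "things" the "ing" is followed by another word character, so there is no word boundary and the word is skipped. A minimal sketch with a made-up sentence:

```python
import re

ING_WORD = r"\w+ing\b"

sentence = "Crying and singing, but not things."
print(re.findall(ING_WORD, sentence))  # ['Crying', 'singing']
```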

Password

  • SuperStr0ngPassword!
    • At least one letter → (?=.*[a-zA-Z])
    • At least one digit → (?=.*\d)
    • At least one special character → (?=.*[!@#$%^&*?])
    • At least 8 characters → [a-zA-Z\d!@#$%^&*?]{8,}
  • ^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*?])[a-zA-Z\d!@#$%^&*?]{8,}$
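Each (?=...) is a lookahead: it checks a condition from the start of the string without consuming characters, so all three must hold before the final character class checks the length. Checking the pieces separately shows which rule a password breaks. A minimal sketch; RULES and failed_rules are hypothetical names for illustration, and the length rule gets a $ so it covers the whole string:

```python
import re

# Full pattern from the slide
PASSWORD = r"^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*?])[a-zA-Z\d!@#$%^&*?]{8,}$"

# Each piece of the slide's pattern, checked on its own
RULES = {
    "at least one letter": r"(?=.*[a-zA-Z])",
    "at least one digit": r"(?=.*\d)",
    "at least one special character": r"(?=.*[!@#$%^&*?])",
    "at least 8 allowed characters": r"[a-zA-Z\d!@#$%^&*?]{8,}$",
}

def failed_rules(password):
    """Return the names of the rules that `password` breaks."""
    # re.match anchors at the start; a zero-width lookahead match is still truthy
    return [name for name, rule in RULES.items() if not re.match(rule, password)]

for pw in ["SuperStr0ngPassword!", "password", "Sh0rt!"]:
    ok = re.match(PASSWORD, pw) is not None
    print(pw, ok, failed_rules(pw))
```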

Hex Color

  • #ffffff or #fff
    • Hash sign (#)
    • Exactly 6 or 3 hex digits: 0-9 or a-f (either case)
  • ^#([a-fA-F\d]{6}|[a-fA-F\d]{3})$
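A version of this pattern without a closing $ only checks that the string starts with a valid color, so strings with extra characters after it also pass. A minimal comparison of the two versions using re.match (sample colors are made up):

```python
import re

UNANCHORED = r"^#([a-fA-F\d]{6}|[a-fA-F\d]{3})"
ANCHORED = r"^#([a-fA-F\d]{6}|[a-fA-F\d]{3})$"

colors = ["#ffffff", "#fff", "#abcd", "#ffffzz"]

# (unanchored result, anchored result) for each color
results = {c: (bool(re.match(UNANCHORED, c)), bool(re.match(ANCHORED, c))) for c in colors}

for color, (loose, strict) in results.items():
    print(f"{color}: without $ -> {loose}, with $ -> {strict}")
# "#abcd" and "#ffffzz" pass only without $, because "abc" / "fff" matches the 3-digit branch
```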

Python Code

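import re

# Sample strings, one per pattern below
test_cases = [
	"(123) 456-7890",
	"foo@bar.com",
	"Crying",
	"MyPasswordIsVeryStr0ng!",
	"#abc"
]

# Label -> regular expression (raw strings keep the backslashes intact)
patterns = {
	"Phone Number": r"\(\d{3}\) \d{3}-\d{4}",
	"Email Address": r"\w+@\w+\.\w+$",
	"Word ending in 'ing'": r"\w+ing\b",
	"Password": r"^(?=.*[a-zA-Z])(?=.*\d)(?=.*[!@#$%^&*?])[a-zA-Z\d!@#$%^&*?]{8,}$",
	"Hex Code": r"^#([a-fA-F\d]{6}|[a-fA-F\d]{3})$"
}

# Try every pattern against every test case.
# re.match anchors only at the start of the string; a pattern needs $ to reject trailing text.
for label, pattern in patterns.items():
	print(f"\nTesting {label}:")
	for case in test_cases:
		match = re.match(pattern, case)
		# Single quotes inside the f-string keep this valid before Python 3.12
		print(f"  {case}: {'✅' if match else '❌'}")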