10. Regular Expressions in Shell Scripting

🚀 Master shell scripting with regex! Learn to use grep, sed, and awk for powerful text manipulation, understand common metacharacters, and conquer advanced techniques. 🤯

What will we learn in this post?

  • 👉 Introduction to Regular Expressions
  • 👉 Using Grep with Regular Expressions
  • 👉 Using Sed for Text Processing
  • 👉 Using Awk for Pattern Matching
  • 👉 Common Regular Expression Metacharacters
  • 👉 Advanced Regular Expression Techniques
  • 👉 Conclusion!

Regular Expressions: Your Text-Processing Friend 🤝

Regular expressions (regex or regexp) are powerful patterns used to search and manipulate text. Think of them as wildcards on steroids! They let you find specific text within larger blocks, regardless of its exact position.

Why are they important?

Regexes are crucial for text processing because they automate tasks that would be tedious manually. Imagine finding all email addresses in a large document – regexes make this quick and easy. They’re used in countless applications, from code editors to web search engines.

Shell Scripting & Regex

In shell scripting (like bash), tools like grep, sed, and awk use regexes extensively. For example, grep "pattern" file.txt searches file.txt for lines containing “pattern”.

Basic Regex Examples

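  • . matches any single character.
  • * matches zero or more occurrences of the preceding character.
  • + matches one or more occurrences (with grep -E; basic grep treats a bare + as a literal character).
  • ? matches zero or one occurrence (also with grep -E).
  • [abc] matches any of a, b, or c.
  • \d matches any digit in Perl-style regex (grep -P in GNU grep); in basic and extended grep, use [0-9] or [[:digit:]] instead.
  • \w matches any letter, digit, or underscore (a GNU extension).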

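Example: grep -E "\w+@\w+\.\w+" email_list.txt prints the lines of email_list.txt that contain something shaped like an email address. The -E option is needed so that + acts as a quantifier, and the pattern is deliberately simplified.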
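Here is a minimal sketch of that idea on made-up data. The addresses and the temporary file are invented for illustration, and the pattern uses POSIX bracket expressions instead of \w so that dots and hyphens in an address are also accepted. The -o option (available in GNU and BSD grep) prints only the matched text rather than the whole line.

```shell
# Hypothetical sample data written to a temporary file.
email_list=$(mktemp)
printf '%s\n' \
  'Contact: alice@example.com' \
  'no address on this line' \
  'bob.smith@mail.example.org wrote back' > "$email_list"

# -E: extended syntax, so + is a quantifier.
# -o: print only the part of each line that matched.
emails=$(grep -Eo '[[:alnum:]._-]+@[[:alnum:].-]+\.[[:alpha:]]+' "$email_list")
printf '%s\n' "$emails"

rm -f "$email_list"
```

This is still a rough filter, not a full email validator; real-world addresses allow more characters than this pattern does.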

This shows a simplified view. Regexes can be incredibly complex, but understanding the basics is a great starting point!

Grep & Regular Expressions: A Friendly Guide 🔎

Basic Grep with Regular Expressions

grep is a powerful command-line tool that searches for patterns within files. Combined with regular expressions (regex), it becomes incredibly versatile. A basic example: finding lines containing “error” in a log file:

grep "error" mylogfile.txt

This uses a simple string match. For more complex patterns, use regex. For instance, to find lines containing any number followed by “errors”:

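grep -E "[0-9]+ errors" mylogfile.txt

[0-9]+ matches one or more digits. The -E option switches grep to extended syntax; without it, basic grep treats + as a literal plus sign.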
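To see why the mode matters, here is a small sketch on a made-up log (the file contents are invented for illustration). In basic mode a bare + is just a plus sign, while -E turns it into a quantifier.

```shell
# Hypothetical log lines written to a temporary file.
log=$(mktemp)
printf '%s\n' \
  'build finished: 3 errors' \
  'build finished: no errors' \
  '12 errors found' > "$log"

# Extended mode: + means "one or more", so both numbered lines match.
extended=$(grep -E '[0-9]+ errors' "$log")

# Basic mode: + is a literal plus sign, so nothing matches.
# (grep exits non-zero when nothing matches, hence the "|| true".)
basic_count=$(grep -c '[0-9]+ errors' "$log" || true)

printf '%s\n' "$extended"
echo "basic-mode matches: $basic_count"

rm -f "$log"
```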

Basic Regex Characters

  • . matches any single character.
  • * matches zero or more occurrences of the preceding character.
  • + matches one or more occurrences (with grep -E; in basic grep a bare + is a literal character).
  • [] defines a character set (e.g., [abc] matches ‘a’, ‘b’, or ‘c’).

Extended Grep (egrep)

grep -E uses extended regular expressions, offering more concise syntax. The older egrep command does the same thing, but GNU grep now marks it as deprecated, so prefer grep -E in new scripts. In extended mode, the + and ? quantifiers work without backslashes:

grep -E "error+" mylogfile.txt  # Matches "error", "errorr", etc.

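Note: To match characters such as ., *, [, ^, $ and \ literally, escape them with a backslash \. In basic grep, + and ? are already literal, so they only need a backslash when you want them literal under grep -E. (In GNU basic grep, \+ and \? actually turn them into quantifiers.)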
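A quick sketch of what escaping changes, using an invented two-line file. The same search is run with an unescaped dot, an escaped dot, and the -F option, which treats the whole pattern as a fixed string.

```shell
# Hypothetical input written to a temporary file.
log=$(mktemp)
printf '%s\n' 'version 1.5 released' 'version 105 released' > "$log"

# Unescaped dot: matches any character, so both lines match.
any_count=$(grep -c '1.5' "$log")

# Escaped dot: only the literal text "1.5" matches.
literal_count=$(grep -c '1\.5' "$log")

# -F: the pattern is a fixed string, so no escaping is needed.
fixed_count=$(grep -Fc '1.5' "$log")

echo "$any_count $literal_count $fixed_count"

rm -f "$log"
```

When the search text comes from a variable or from user input, grep -F is usually the safer choice because nothing in it is interpreted as a metacharacter.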


For more advanced techniques and a complete regex reference, check out: Regular Expressions 101

Using sed for Text Manipulation ✍️

sed (stream editor) is a powerful command-line tool for searching and replacing text within files or input streams. It’s incredibly useful in shell scripts for automating text processing tasks.

Basic Search and Replace 🔄

The basic syntax is: sed 's/pattern/replacement/g' file.txt

  • s: Specifies substitution.
  • /pattern/: The text to search for.
  • /replacement/: The text to replace it with.
  • g: Replaces all occurrences on each line (omit it to replace only the first occurrence on each line).

Example: Replacing “apple” with “orange” in my_file.txt:

sed 's/apple/orange/g' my_file.txt
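The effect of the g flag is easiest to see side by side. This is a small sketch on an invented file; it also shows that sed writes its result to standard output and leaves the file itself untouched.

```shell
# Hypothetical input written to a temporary file.
my_file=$(mktemp)
printf '%s\n' 'apple pie, apple tart' 'green apple' > "$my_file"

# Without g: only the first "apple" on each line is replaced.
first_only=$(sed 's/apple/orange/' "$my_file")

# With g: every "apple" on each line is replaced.
all=$(sed 's/apple/orange/g' "$my_file")

# The file on disk still contains the original text.
unchanged=$(cat "$my_file")

printf '%s\n' "$first_only" '---' "$all"

rm -f "$my_file"
```

To keep the result, redirect the output to a new file (for example, sed 's/apple/orange/g' my_file.txt > new_file.txt).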

Deletion and Insertion ✂️➕

Deletion

To delete lines matching a pattern:

sed '/pattern/d' file.txt

Example: Deleting lines containing “error”:

sed '/error/d' my_log.txt

Insertion

To insert text before a line matching a pattern (this one-line form works in GNU sed; other sed versions expect the text on its own line after i\):

sed '/pattern/i\text to insert' file.txt

Example: Inserting “Important Note:” before lines containing “warning”:

sed '/warning/i\Important Note:' my_log.txt
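Both commands can be tried on a throwaway log. The sketch below uses invented log lines and writes the inserted text on its own line after i\, which is the form that also works outside GNU sed.

```shell
# Hypothetical log written to a temporary file.
my_log=$(mktemp)
printf '%s\n' \
  'info: started' \
  'warning: disk almost full' \
  'error: write failed' \
  'info: stopped' > "$my_log"

# Delete every line containing "error".
without_errors=$(sed '/error/d' "$my_log")

# Insert a line before each line containing "warning".
annotated=$(sed '/warning/i\
Important Note:' "$my_log")

printf '%s\n' "$without_errors" '---' "$annotated"

rm -f "$my_log"
```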

For more detailed information and advanced sed commands, refer to the GNU sed manual. Happy scripting! 🎉

Awk: Your Friend for Text Wrangling 🤝

Awk is a powerful command-line tool for processing text and data. Think of it as a mini-programming language specifically designed for pattern matching and text manipulation. It’s incredibly useful for working with structured data like CSV files.

Awk and CSV Files 🗄️

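Awk handles simple CSV (Comma Separated Values) data well. Each line in a CSV file represents a record, and each field within a record is separated by a comma. Splitting on commas does not understand quoted fields that themselves contain commas, so this approach suits files without such fields.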

Example: Extracting Specific Columns

Let’s say you have a CSV file named data.csv with columns Name, Age, and City. To extract just the Name and Age columns, you’d use this command:

awk -F, '{print $1, $2}' data.csv

Here, -F, sets the field separator to a comma, and {print $1, $2} prints the first and second fields (Name and Age), separated by a space, which is awk's default output separator.
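A slightly fuller sketch, assuming data.csv starts with a header row (the sample rows are invented). NR > 1 skips that header, and setting OFS makes awk write commas between the output fields instead of spaces.

```shell
# Hypothetical CSV written to a temporary file.
data=$(mktemp)
printf '%s\n' \
  'Name,Age,City' \
  'Alice,30,New York' \
  'Bob,25,Chicago' > "$data"

# -F,        : split input fields on commas
# -v OFS=',' : join output fields with commas
# NR > 1     : skip the header line
name_age=$(awk -F, -v OFS=',' 'NR > 1 {print $1, $2}' "$data")
printf '%s\n' "$name_age"

rm -f "$data"
```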

Beyond Basic Extraction ✨

Awk allows for much more complex operations:

  • Filtering rows: awk -F, '$3 == "New York" {print}' data.csv prints only rows where the city is New York.
  • Calculations: You can perform calculations on numeric fields.
  • Conditional logic: Use if, else statements to control what Awk does.
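The three ideas above can be combined in one short awk program. This is only a sketch on invented rows; the "30+" and "under 30" labels are made up for the example.

```shell
# Hypothetical CSV written to a temporary file.
data=$(mktemp)
printf '%s\n' \
  'Name,Age,City' \
  'Alice,30,New York' \
  'Bob,25,Chicago' \
  'Carol,41,New York' > "$data"

report=$(awk -F, '
  NR == 1 { next }                  # skip the header row
  $3 == "New York" { ny++ }         # filtering: count New York rows
  {
    total += $2; n++                # calculation: running sum of ages
    if ($2 >= 30) { label = "30+" } else { label = "under 30" }
    print $1 ": " label             # conditional logic decides the label
  }
  END { printf "average age: %.1f, New York rows: %d\n", total / n, ny }
' "$data")
printf '%s\n' "$report"

rm -f "$data"
```

The END block runs once after the last line, which makes it a natural place for totals and averages.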

Remember, Awk is a versatile tool. Mastering its basics opens up a world of efficient text processing possibilities.

Regular Expression Metacharacters Explained 🔎

Regular expressions (regex or regexp) are powerful tools for pattern matching in text. Here are some common metacharacters:

Basic Metacharacters

  • . (Dot): Matches any single character except a newline. r".at" matches “cat”, “hat”, “bat”, etc.

  • * (Star): Matches zero or more occurrences of the preceding character. r"ca*t" matches “ct”, “cat”, “caaat”, etc.

  • ? (Question Mark): Matches zero or one occurrence of the preceding character. r"colou?r" matches both “color” and “colour”.

  • + (Plus): Matches one or more occurrences of the preceding character. r"ca+t" matches “cat”, “caat”, “caaat”, but not “ct”.

Anchors and Character Sets

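  • ^ (Caret): Matches the beginning of a string. In grep and sed, which work line by line, that means the beginning of each line. r"^Hello" matches “Hello world” but not “World Hello”.

  • $ (Dollar): Matches the end of a string (the end of each line in grep and sed). r"world$" matches “Hello world” but not “world Hello”.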

  • [] (Square Brackets): Defines a character set. r"[aeiou]" matches any single lowercase vowel. r"[a-z]" matches any lowercase letter.

Grouping and Quantifiers

  • () (Parentheses): Groups characters together. r"(abc)+" matches “abc”, “abcabc”, etc.

  • {} (Curly Braces): Specifies the number of occurrences. r"a{2,4}" matches “aa”, “aaa”, “aaaa”.

Example: The regex r"^[A-Z][a-z]+ \d{5}$" would match strings like “John 12345” (capital letter, lowercase letters, space, 5 digits). With grep -E or awk, write [0-9] in place of \d, which those tools do not support.
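In a shell script, the same check can be sketched with grep -E. The version below swaps the ranges for POSIX character classes ([[:upper:]], [[:lower:]], [[:digit:]]), which behave consistently across locales. The matches helper and the test strings are made up for this example.

```shell
# Same shape as above: capital letter, lowercase letters, space, 5 digits.
pattern='^[[:upper:]][[:lower:]]+ [[:digit:]]{5}$'

# Hypothetical helper: succeeds if its argument matches the pattern.
# -q keeps grep quiet; only the exit status is used.
matches() {
  printf '%s\n' "$1" | grep -Eq "$pattern"
}

result=$(
  for s in 'John 12345' 'john 12345' 'John 1234'; do
    if matches "$s"; then
      echo "match: $s"
    else
      echo "no match: $s"
    fi
  done
)
printf '%s\n' "$result"
```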

Remember that the specific syntax might vary slightly depending on the tool or programming language you’re using. Happy regex-ing! 😄

Advanced Regex in Shell Scripting

Regex, or regular expressions, are powerful tools for text manipulation. Let’s explore some advanced techniques:

Grouping and Backreferences

Grouping uses parentheses () to capture parts of a match. Backreferences, using \1, \2, etc., refer to previously captured groups.

Example

Finding and replacing repeated words:

string="hello hello world"
echo "$string" | sed 's/\(\w\+\) \1/\1/'  # Output: hello world

Here, \(\w\+\) captures a word, and \1 refers to it, so the first pair of adjacent duplicate words is collapsed into one. Note that \w and \+ are GNU sed extensions.
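Backreferences are also handy for rearranging text. The sketch below uses sed -E (extended syntax, so groups are written with plain parentheses) on two invented strings: one swaps "Last, First" into "First Last", the other reorders a date.

```shell
# \1 is the text before the comma, \2 the text after it.
swapped=$(printf '%s\n' 'Lovelace, Ada' | sed -E 's/^([^,]+), (.+)$/\2 \1/')

# Three groups: year, month, day. Using | as the delimiter avoids
# having to escape the slashes in the replacement.
date_dmy=$(printf '%s\n' '2024-03-15' |
  sed -E 's|^([0-9]{4})-([0-9]{2})-([0-9]{2})$|\3/\2/\1|')

echo "$swapped"
echo "$date_dmy"
```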

Lookarounds

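Lookarounds assert conditions without including them in the match. They are a Perl-compatible (PCRE) feature: GNU grep supports them only with the -P option, and sed and awk do not support them at all.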

Positive Lookahead

(?=pattern): Matches only if followed by pattern.

Example

Extracting the stems of words ending in “ing”:

string="singing running jumping"
echo "$string" | grep -oP '\w+(?=ing\b)'  # Output, one per line: sing, runn, jump

The -P option selects Perl-compatible syntax, which lookaheads require. (?=ing\b) ensures only words ending in “ing” are matched, but “ing” itself isn’t included in the output.
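Lookaheads need a PCRE-capable engine (for grep, that means the -P option of GNU grep). Where that is not available, a similar result can be sketched with a capture group in sed -E: match the whole word, then keep only the captured part. The sample words are invented, and "walked" is included to show a non-matching word being dropped.

```shell
string='singing running jumping walked'

# tr puts each word on its own line.
# sed -n ... p prints only lines where the substitution succeeded,
# and \1 keeps the letters before the final "ing".
stems=$(printf '%s\n' "$string" | tr ' ' '\n' |
  sed -n -E 's/^([[:alpha:]]+)ing$/\1/p')
printf '%s\n' "$stems"
```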

For more info: Regular Expressions

This is a brief overview. Mastering regex takes practice, but the power it provides for text processing is invaluable! 👍

This post is licensed under CC BY 4.0 by the author.