11. File Searching and Text Processing Utilities

🚀 Master essential command-line tools! Learn to search, filter, manipulate, and analyze text data with `find`, `grep`, `sed`, `awk`, `cut`, `sort`, `uniq`, `diff`, and `patch`. Become a text processing ninja! 🥷

What will we learn in this post?

  • 👉 File Searching with Find Command
  • 👉 Pattern Matching with Grep
  • 👉 Text Processing with Sed
  • 👉 Data Extraction and Formatting with Awk
  • 👉 Working with Cut, Sort, and Uniq
  • 👉 Merging and Comparing Files with Diff and Patch
  • 👉 Conclusion!

Finding Files with find 🔎

The find command is your best friend for locating files and directories on Linux/macOS. It’s incredibly powerful and flexible!

Searching by Name and Type

To find files named myfile.txt:

find . -name "myfile.txt"

The . means “start searching in the current directory”. To find all .pdf files recursively:

find . -name "*.pdf"

Finding Directories

To find all directories:

find . -type d
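
The name and type tests can be combined, since find joins consecutive tests with an implicit AND. Here is a minimal sketch in a throwaway directory; every file and directory name in it is made up for illustration:

```shell
# Build a scratch tree (all names here are hypothetical).
workdir=$(mktemp -d)
mkdir -p "$workdir/docs/archive"
touch "$workdir/docs/report.pdf" "$workdir/docs/archive/old.pdf" "$workdir/notes.txt"

# Regular files only (-type f) whose name ends in .pdf.
pdf_files=$(find "$workdir" -type f -name "*.pdf" | sort)
echo "$pdf_files"

# Directories only (-type d). The starting directory itself is included.
directories=$(find "$workdir" -type d | sort)
echo "$directories"
```

Quoting the pattern ("*.pdf") matters: without the quotes, the shell may expand the wildcard before find ever sees it.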

Searching by Size and Time

Find files larger than 10 MiB (the M suffix counts in units of 1,048,576 bytes):

find . -size +10M

Find files modified in the last 7 days:

find . -mtime -7
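
To see the size and time tests in action without touching real data, here is a small sketch that builds its own test files. The file names and the 2 MiB size are assumptions chosen for the demo:

```shell
workdir=$(mktemp -d)

# A 2 MiB file (2048 blocks of 1024 bytes) and two tiny files.
dd if=/dev/zero of="$workdir/big.bin" bs=1024 count=2048 2>/dev/null
echo "small" > "$workdir/small.txt"
echo "stale" > "$workdir/old.txt"

# Backdate one file to 1 January 2020 (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 202001010000 "$workdir/old.txt"

# Files larger than 1 MiB.
large=$(find "$workdir" -type f -size +1M)
echo "$large"

# Files modified less than 7 days ago.
recent=$(find "$workdir" -type f -mtime -7 | sort)
echo "$recent"

# Files last modified more than 30 days ago.
stale=$(find "$workdir" -type f -mtime +30)
echo "$stale"
```

The sign does the work here: +N means "more than N", -N means "less than N", and a bare N means "exactly N" in the test's units.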

Deleting Files Safely ⚠️

Never delete files directly with find without double-checking! First, always test with a dry run:

find . -name "*.tmp" -print  # Lists files

Then, if you’re sure, add -delete:

find . -name "*.tmp" -delete # DELETES files

Caution: -delete is powerful and irreversible! Use with extreme care.
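
The two-step routine above can be rehearsed safely in a scratch directory first. This is a sketch with made-up file names, not a recipe to paste into a real project:

```shell
workdir=$(mktemp -d)
touch "$workdir/a.tmp" "$workdir/b.tmp" "$workdir/keep.txt"

# Step 1: dry run. -print only lists what would be removed.
find "$workdir" -type f -name "*.tmp" -print

# Step 2: the same expression with -delete swapped in for -print.
# Keep -delete last: find evaluates its expression left to right, so
# placing it before -name would delete everything under the start point.
find "$workdir" -type f -name "*.tmp" -delete

# Check what survived.
remaining=$(find "$workdir" -type f)
echo "$remaining"
```

Adding -type f keeps directories out of the match. Note that -delete is an extension offered by GNU and BSD find; it is not part of POSIX.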

graph TD
    A["🚀 Start"] --> B{"🔍 Specify Search Criteria"};
    B --> C["🖥️ Find Command Execution"];
    C --> D{"📄 Results Displayed"};
    D --> E["🗑️ Delete (Optional)"];
    E --> F["🏁 End"];

    %% Style Definitions
    classDef startStyle fill:#4CAF50,stroke:#388E3C,color:#fff,stroke-width:2px,rx:12px;
    classDef inputStyle fill:#2196F3,stroke:#1565C0,color:#fff,stroke-width:2px,rx:12px;
    classDef execStyle fill:#FFC107,stroke:#FFA000,color:#000,stroke-width:2px,rx:12px;
    classDef resultStyle fill:#00BCD4,stroke:#0097A7,color:#fff,stroke-width:2px,rx:12px;
    classDef deleteStyle fill:#F44336,stroke:#B71C1C,color:#fff,stroke-width:2px,rx:12px;
    classDef endStyle fill:#9C27B0,stroke:#6A1B9A,color:#fff,stroke-width:2px,rx:12px;

    %% Apply Styles
    class A startStyle;
    class B inputStyle;
    class C execStyle;
    class D resultStyle;
    class E deleteStyle;
    class F endStyle;

Using grep for Text Filtering 🔎

grep is a powerful command-line tool for searching text within files. It’s like a super-powered “find” function!

Basic Usage and Options ✨

To search for the word “example” in a file named myfile.txt, you’d use: grep example myfile.txt

  • Case-insensitive search: Use the -i flag: grep -i example myfile.txt. This also finds “Example”, “EXAMPLE”, and so on.
  • Recursive search: Use the -r flag to search through all files and subdirectories within a directory: grep -r example mydirectory/
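
Here is a quick sketch of those options on sample data. The file contents are invented for the demo, and -c, -n, and -l are extra flags (count, line numbers, file names only) that pair well with the ones above:

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/mydirectory/sub"
printf 'An Example line\nnothing here\nanother example\n' > "$workdir/myfile.txt"
printf 'example in a nested file\n' > "$workdir/mydirectory/sub/notes.txt"

# Case-sensitive count: only the lowercase "example" matches.
exact=$(grep -c example "$workdir/myfile.txt")

# Case-insensitive count: "Example" matches too.
any_case=$(grep -ci example "$workdir/myfile.txt")

# -n prefixes each matching line with its line number.
numbered=$(grep -n example "$workdir/myfile.txt")

# -r walks the directory tree; -l prints only the names of matching files.
found_in=$(grep -rl example "$workdir/mydirectory")

echo "$exact $any_case"
echo "$numbered"
echo "$found_in"
```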

Regular Expressions (Regex) 🤯

grep supports powerful regular expressions for complex pattern matching. For example, to find lines containing numbers: grep '[0-9]' myfile.txt

Example: Finding Specific Emails ✉️

Let’s say you want to find all email addresses in a file. A basic regex could be: grep -E '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' myfile.txt

This uses -E for extended regex and a pattern matching typical email formats. Be aware, this isn’t perfect and might miss some unusual email addresses.
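
By default grep prints the whole matching line. If you only want the addresses themselves, the -o option (a GNU and BSD extension, not part of POSIX) prints just the matched text, one match per line. A sketch with invented sample addresses:

```shell
workdir=$(mktemp -d)
printf 'Contact alice@example.com or bob.smith@mail.example.org today.\nNo address on this line.\n' > "$workdir/myfile.txt"

# -o prints only the part of each line that matched the pattern.
emails=$(grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' "$workdir/myfile.txt")
echo "$emails"
```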

For more advanced regex, check out this resource: Regular Expression Tutorial

graph TD
    A["📄 Your File"] --> B{"🔍 grep Command"};
    B --> C["✅ Matching Lines"];
    B --> D["❌ Non-Matching Lines"];

    %% Style Definitions
    classDef fileStyle fill:#42A5F5,stroke:#1E88E5,color:#fff,stroke-width:2px,rx:10px;
    classDef grepStyle fill:#FFCA28,stroke:#F9A825,color:#000,stroke-width:2px,rx:10px;
    classDef matchStyle fill:#66BB6A,stroke:#388E3C,color:#fff,stroke-width:2px,rx:10px;
    classDef nonMatchStyle fill:#EF5350,stroke:#C62828,color:#fff,stroke-width:2px,rx:10px;

    %% Apply Styles
    class A fileStyle;
    class B grepStyle;
    class C matchStyle;
    class D nonMatchStyle;

This simple flowchart illustrates how grep filters lines based on your search criteria.

Sed: Your Friendly Text Editor ✏️

Sed (stream editor) is a powerful command-line tool for manipulating text. It’s perfect for tasks like modifying config files or bulk-replacing text.

Text Substitution 🔄

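Sed’s core function is substitution. The basic syntax is sed 's/pattern/replacement/' file. By default, sed prints the result to standard output and leaves the file itself unchanged.

  • s: the substitute command; the / characters separate its parts.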
  • pattern: the text to find (e.g., old_value).
  • replacement: the text to replace it with (e.g., new_value).

Example: To change “apple” to “orange” in my_file.txt:

sed 's/apple/orange/g' my_file.txt

The g flag replaces every occurrence on each line; without it, only the first match on a line is replaced.
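
The sketch below contrasts the two forms and then edits the file in place. The sample sentence is made up, and -i with an attached backup suffix is an extension that both GNU and BSD sed accept in this form:

```shell
workdir=$(mktemp -d)
printf 'apple pie and apple juice\n' > "$workdir/my_file.txt"

# Without g, only the first match on each line is replaced.
first_only=$(sed 's/apple/orange/' "$workdir/my_file.txt")

# With g, every match on the line is replaced.
all=$(sed 's/apple/orange/g' "$workdir/my_file.txt")
echo "$first_only"
echo "$all"

# sed wrote to standard output; the file is still untouched at this point.
unchanged=$(cat "$workdir/my_file.txt")

# -i edits the file in place. The attached suffix keeps a backup copy
# named my_file.txt.bak.
sed -i.bak 's/apple/orange/g' "$workdir/my_file.txt"
edited=$(cat "$workdir/my_file.txt")
backup=$(cat "$workdir/my_file.txt.bak")
```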

Deletion 🗑️

To delete lines matching a pattern, use the d command:

sed '/pattern/d' file

Example: Remove lines containing “error” from log.txt:

sed '/error/d' log.txt

Pattern Matching 🔎

Sed uses regular expressions for powerful pattern matching. For example, sed '/^#.*$/d' removes all comment lines (starting with #).
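
Patterns can also be stacked, or used to limit where a substitution applies. This is a sketch on an invented config file (app.conf and its keys are assumptions for the demo):

```shell
workdir=$(mktemp -d)
printf '# settings\nport=8080\n\n# debug flag\ndebug=false\n' > "$workdir/app.conf"

# Two -e scripts: drop lines starting with #, then drop empty lines.
cleaned=$(sed -e '/^#/d' -e '/^$/d' "$workdir/app.conf")
echo "$cleaned"

# A leading /pattern/ address restricts the substitution to matching lines.
toggled=$(sed '/^debug=/s/false/true/' "$workdir/app.conf")
echo "$toggled"
```

Because neither command uses -i, app.conf itself stays exactly as it was written.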

Remember to back up your files before editing them in place with sed -i, especially crucial configuration files!

Awk: Your Friendly Text Processor 🤝

Awk is a powerful command-line tool for manipulating text data, especially useful for structured files like CSV (Comma Separated Values). Think of it as a mini-programming language designed for text processing. Let’s explore its capabilities!

Selecting Columns ✨

Let’s say you have a CSV file named data.csv with columns “Name”, “Age”, and “City”. To extract only the name and age, you’d use:

awk -F, '{print $1, $2}' data.csv

-F, sets the field separator to a comma. $1 represents the first column (Name), and $2 the second (Age).

Example

If data.csv contains:

Alice,30,New York
Bob,25,London

The command above outputs:

Alice 30
Bob 25

Calculations & Filtering 🧮

Awk can perform calculations directly on the data. For example, to add 5 to each age:

awk -F, '{print $1, $2+5}' data.csv

You can also filter rows based on conditions. To show only people over 28:

awk -F, '$2 > 28 {print $0}' data.csv

$0 represents the entire line.
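
Filtering and calculation combine naturally. The sketch below reuses the two-row data.csv from earlier and adds an END block, which runs once after the last line has been read:

```shell
workdir=$(mktemp -d)
printf 'Alice,30,New York\nBob,25,London\n' > "$workdir/data.csv"

# A pattern with no action prints each matching line in full.
over_28=$(awk -F, '$2 > 28' "$workdir/data.csv")

# Pattern plus action: print only the name for matching rows.
names=$(awk -F, '$2 > 28 {print $1}' "$workdir/data.csv")

# Accumulate across lines, then report once in the END block.
# NR holds the number of records (lines) read.
average=$(awk -F, '{sum += $2} END {print sum / NR}' "$workdir/data.csv")

echo "$over_28"
echo "$names"
echo "$average"
```

Keep in mind that -F, splits on every comma, so this approach suits simple CSV files without quoted fields that contain commas.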

Further Exploration 🚀

  • Conditional statements: Use if and else for more complex filtering.
  • Built-in functions: Awk offers many functions like length, substr, etc.
  • Custom variables: Define your own variables for more flexibility.
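
The three features in that list can be tried together in one short program. This is only a sketch: limit and group are variable names chosen for the demo, and the data is the same two-row data.csv used above:

```shell
workdir=$(mktemp -d)
printf 'Alice,30,New York\nBob,25,London\n' > "$workdir/data.csv"

# -v passes a value in from the shell as the awk variable "limit".
report=$(awk -F, -v limit=28 '{
    # if/else picks a label for each row.
    if ($2 > limit) {
        group = "over"
    } else {
        group = "under"
    }
    # substr(s, start, len) and length(s) are built-in string functions.
    print substr($1, 1, 3), length($3), group
}' "$workdir/data.csv")

echo "$report"
```

For each row this prints the first three letters of the name, the number of characters in the city, and the label.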

Remember, practice makes perfect! Experiment with different commands and data to master Awk’s capabilities.

Text Processing with cut, sort, and uniq

These three command-line tools are incredibly useful for efficiently manipulating text data. Let’s explore them!

Extracting Fields with cut ✂️

cut is your go-to for extracting sections from each line of a text file. Imagine a file named input.csv with comma-separated values (CSV):

Name,Age,City
Alice,30,New York
Bob,25,London

To get just the first field (the header Name, followed by each name):

cut -d ',' -f 1 input.csv

-d ',' sets the delimiter to a comma, and -f 1 selects the first field.
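
cut can also pick several fields at once, or work on character positions. The sketch below recreates the same input.csv in a scratch directory and uses tail -n +2 (start output at line 2) to skip the header:

```shell
workdir=$(mktemp -d)
printf 'Name,Age,City\nAlice,30,New York\nBob,25,London\n' > "$workdir/input.csv"

# Fields 1 and 3; cut keeps the delimiter between the selected fields.
name_city=$(cut -d ',' -f 1,3 "$workdir/input.csv")
echo "$name_city"

# Skip the header line first, then take the first field.
names=$(tail -n +2 "$workdir/input.csv" | cut -d ',' -f 1)
echo "$names"

# -c selects character positions instead of delimited fields.
initials=$(tail -n +2 "$workdir/input.csv" | cut -c 1)
echo "$initials"
```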

Sorting Text with sort ⬆️⬇️

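sort arranges lines alphabetically or numerically. For example, to sort the rows of input.csv by age:

sort -t ',' -k 2,2n input.csv

-t ',' sets the field separator, and -k 2,2n sorts on the second field only, numerically. The header line has no numeric age, so it is treated as 0 and sorts first.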

Reverse sorting:

Use the -r flag for reverse order (descending).
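
Here is a sketch of sorting whole rows by the age column in both directions. It assumes comma-separated data like input.csv; the Carol row is invented so the ordering is easier to see, and the header is skipped with tail -n +2:

```shell
workdir=$(mktemp -d)
printf 'Name,Age,City\nAlice,30,New York\nBob,25,London\nCarol,41,Paris\n' > "$workdir/input.csv"

# Data rows ordered by the second comma-separated field, numerically.
youngest_first=$(tail -n +2 "$workdir/input.csv" | sort -t ',' -k 2,2n)
echo "$youngest_first"

# The same key with r appended gives descending order.
oldest_first=$(tail -n +2 "$workdir/input.csv" | sort -t ',' -k 2,2nr)
echo "$oldest_first"
```

Writing the key as 2,2 limits the comparison to the second field; a bare -k 2 would compare from field 2 to the end of the line.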

Removing Duplicates with uniq 🧹

uniq removes consecutive duplicate lines. It’s crucial that your data is pre-sorted for uniq to work correctly:

sort input.csv | uniq

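This prints the contents of input.csv with duplicate lines removed; the file itself is not changed. sort -u input.csv gives the same result in one step.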
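
To see why sorting first matters, and to count how often each line occurs, here is a small sketch on an invented list of cities:

```shell
workdir=$(mktemp -d)
printf 'London\nParis\nLondon\nBerlin\nLondon\nParis\n' > "$workdir/cities.txt"

# Unsorted input: no identical lines are adjacent, so uniq removes nothing.
unsorted_count=$(uniq "$workdir/cities.txt" | wc -l | tr -d ' ')
echo "$unsorted_count"

# Sorted input: duplicates become adjacent and collapse to one line each.
distinct=$(sort "$workdir/cities.txt" | uniq)
echo "$distinct"

# uniq -c prefixes each line with its count; sort -rn puts the largest first.
most_common=$(sort "$workdir/cities.txt" | uniq -c | sort -rn | head -n 1 | awk '{print $2}')
echo "$most_common"
```

The sort | uniq -c | sort -rn chain is a handy way to build a quick frequency table from any column of text.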

Example Workflow:

graph TD
    A["📄 Input File"] --> B{"✂️ cut"};
    B --> C{"🔃 sort"};
    C --> D{"🔁 uniq"};
    D --> E["📤 Output File"];

    %% Style Definitions
    classDef fileInput fill:#42A5F5,stroke:#1E88E5,color:#fff,stroke-width:2px,rx:12px;
    classDef toolCut fill:#FF7043,stroke:#D84315,color:#fff,stroke-width:2px,rx:12px;
    classDef toolSort fill:#AB47BC,stroke:#6A1B9A,color:#fff,stroke-width:2px,rx:12px;
    classDef toolUniq fill:#26C6DA,stroke:#00838F,color:#000,stroke-width:2px,rx:12px;
    classDef fileOutput fill:#66BB6A,stroke:#388E3C,color:#fff,stroke-width:2px,rx:12px;

    %% Apply Styles
    class A fileInput;
    class B toolCut;
    class C toolSort;
    class D toolUniq;
    class E fileOutput;

These tools only read their input, but remember to back up your data before redirecting their output over an existing file!

Diff & Patch: Your File Comparison & Update Buddies 🤝

Understanding diff

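diff compares two files (or directories) and shows you the differences between them. Think of it as a detailed “spot the difference” game for your code! Its output lists the line-by-line changes needed to turn the first file into the second, and redirecting it to a file gives you a patch file. The -u option produces the unified format, which is the form most commonly used for patches.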

diff file1.txt file2.txt > my_patch.patch

Example:

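Let’s say file1.txt contains the single line “Hello” and file2.txt contains “Hello World”. Because diff compares line by line, it reports that the line “Hello” was replaced by “Hello World”, rather than that “ World” was appended.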
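
That example can be reproduced in a scratch directory. The sketch below uses the unified format (-u) and captures the exit status, since diff reports "files differ" through a non-zero status:

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/file1.txt"
printf 'Hello World\n' > "$workdir/file2.txt"

# diff exits with 0 when the files match, 1 when they differ, and 2 on
# errors, so a status of 1 here is expected rather than a failure.
status=0
diff -u "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/my_patch.patch" || status=$?

cat "$workdir/my_patch.patch"
echo "diff exit status: $status"
```

In the unified output, lines starting with - come from the first file and lines starting with + come from the second.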

Applying Patches with patch

patch uses the information in a patch file to update a file. It’s like applying the solution from the “spot the difference” game.

patch file1.txt my_patch.patch

This would change file1.txt to contain “Hello World”.
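
Here is a full round trip as a sketch, assuming the patch utility is installed: create the patch, apply it, then undo it again with -R. The -i option is the POSIX way of naming the patch file:

```shell
workdir=$(mktemp -d)
printf 'Hello\n' > "$workdir/file1.txt"
printf 'Hello World\n' > "$workdir/file2.txt"

# diff exits with status 1 when the files differ; "|| true" ignores that.
diff -u "$workdir/file1.txt" "$workdir/file2.txt" > "$workdir/my_patch.patch" || true

# Apply the patch to file1.txt.
patch -i "$workdir/my_patch.patch" "$workdir/file1.txt"
patched=$(cat "$workdir/file1.txt")
echo "$patched"

# -R applies the same patch in reverse, restoring the original content.
patch -R -i "$workdir/my_patch.patch" "$workdir/file1.txt"
restored=$(cat "$workdir/file1.txt")
echo "$restored"
```

Being able to reverse a patch is what makes this workflow safe to experiment with: as long as you keep the patch file, you can get back to where you started.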

Version Control & Change Tracking ✨

The ideas behind diff and patch are fundamental to version control systems (like Git). These systems record the changes between versions, allowing you to revert to older versions or merge updates from different branches.

  • Create a patch: Show changes between two file versions.
  • Apply a patch: Update a file based on the changes in the patch.

graph LR
    A["📄 File 1"] --> B{"🧮 diff"};
    B --> C["📄 Patch File"];
    C --> D{"🩹 patch"};
    D --> E["✅ File 1 (updated)"];

    %% Style Definitions
    classDef file fill:#42A5F5,stroke:#1E88E5,color:#fff,stroke-width:2px,rx:12px;
    classDef command fill:#FFA726,stroke:#EF6C00,color:#000,stroke-width:2px,rx:12px;
    classDef patch fill:#66BB6A,stroke:#388E3C,color:#fff,stroke-width:2px,rx:12px;

    %% Apply Styles
    class A,E file;
    class B,D command;
    class C patch;

Conclusion

And that’s a wrap! 🎉 I hope you enjoyed this post. What are your thoughts? Let me know in the comments below! 👇 I’d love to hear your feedback and ideas. Let’s chat! 😊

This post is licensed under CC BY 4.0 by the author.