Linux Tools for Text Processing

University of California, San Diego via Coursera

Overview

Many fields, such as data science, bioinformatics, and Linux systems administration, require the manipulation of textual data. Tasks include extracting fields or records meeting certain conditions from structured data (e.g., comma-separated files), combining content from multiple files, applying systematic changes to all lines of a document, sorting or randomizing data, and splitting larger files into smaller files. While these operations could be done by hand, they tend to be time-consuming, tedious and, worst of all, error-prone. In this course we systematically explore the text processing tools found in Linux and Linux-like environments that enable you to simplify and automate these tasks. We’ll begin with the simplest utilities, covering the features of head, tail, paste, nl, sort, shuf, split, tr and cut. We’ll then move on to the tools grep, awk and sed, which provide much more powerful capabilities for searching and manipulation. We conclude with an introduction to regular expressions (regexes) and explain how they can be used to specify richer and more complex patterns. Regex topics will include quantifiers, wildcards, anchors, character classes, grouping and alternation, along with advanced concepts such as word boundaries, lazy and greedy matching, and regex flavors.
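
As a rough illustration of the kind of task described above, the sketch below chains a few of the simpler utilities together on a small comma-separated file. The file name sample.csv and its contents are made up for this example and are not taken from the course materials.

```shell
#!/bin/sh
# Hypothetical sample data (name,dept,salary); not from the course.
workdir=$(mktemp -d)
cat > "$workdir/sample.csv" <<'EOF'
name,dept,salary
alice,eng,120
bob,ops,95
carol,eng,110
dave,ops,90
erin,bio,105
EOF

# Skip the header row (tail), keep the second field (cut),
# then count the records in each department (sort + uniq -c).
dept_counts=$(tail -n +2 "$workdir/sample.csv" | cut -d, -f2 | sort | uniq -c)
echo "$dept_counts"

# Two highest salaries: numeric, descending sort on field 3, keep two lines.
top_two=$(tail -n +2 "$workdir/sample.csv" | sort -t, -k3,3nr | head -n 2)
echo "$top_two"

# Upper-case the names (tr) and join them onto one line (paste).
names=$(tail -n +2 "$workdir/sample.csv" | cut -d, -f1 \
  | tr '[:lower:]' '[:upper:]' | paste -sd, -)
echo "$names"

rm -r "$workdir"
```

Each stage reads from standard input and writes to standard output, which is what lets the tools be linked into a simple workflow. The exact column padding printed by uniq -c can differ between implementations.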

Syllabus

  • Introduction
    • In this module, we start by providing some background on Linux tools, the prerequisites for the course, Linux flavors, the motivation for automating tedious and repetitive tasks, and accessing the GitHub repository. We also provide optional instructions for Mac users who wish to access the standard version of the tools that are distributed with Linux distros.
  • Linux text tools basics
    • In this module, we dive in and cover the simpler tools in roughly their order of increasing complexity: head, tail, wc, expand, tac, paste, nl, tr, sort, shuf, uniq, split and cut. Along the way, we discuss the similarities and differences between the tools, potential pitfalls, limitations and linking tools together to create simple workflows.
  • grep, awk and sed
    • This module explores three of the more powerful Linux text processing tools: grep, awk and sed. We start by using grep to find matches in a file. We then cover awk, which is actually a full programming language, but limit our treatment to one-liners for extracting fields and records from a file. The module concludes with sed, a stream editor that can operate efficiently on arbitrarily large files.
  • Regular expressions part 1
    • This module introduces regular expressions and their key features, such as character classes, quantifiers, groups, anchors, alternation and word boundaries. These features are all covered in the context of grep, but we'll see later that they carry over to awk and sed.
  • Regular expressions part 2
    • This module continues our discussion of regular expressions, going into some of the more advanced features that are only available in Perl-compatible regular expressions, such as lazy matching, lookahead, lookbehind and backreferences. We also show how regexes can be used with awk and sed.
  • Final exam and conclusions
    • This is the last mandatory module of the course and includes the final exam and some closing remarks.
  • Resources (optional)
    • This module contains the lecture notes used during the production of the videos. This is not required reading and is provided for those who wish to supplement the videos or who find that they learn better from written materials. Unlike the transcripts, these PDFs contain the commands that were executed during the videos. Note that while the videos generally follow the lecture notes, there will not be an exact correspondence between the videos and notes.
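
To give a flavor of the grep, awk, sed and regular expression modules listed above, here is a minimal sketch run against a made-up log file. The file events.txt and its contents are assumptions for illustration only, and the commands are not taken from the course's lecture notes. The -o option of grep and the -E option of sed are assumed to be available, as they are in the GNU and BSD versions of these tools.

```shell
#!/bin/sh
# Hypothetical log-like data, invented for this sketch.
workdir=$(mktemp -d)
cat > "$workdir/events.txt" <<'EOF'
2024-01-05 ERROR disk full on node07
2024-01-05 INFO backup finished
2024-01-06 WARN cpu temp high on node12
2024-01-07 ERROR network down on node07
EOF

# grep -E with alternation; -c counts the matching lines.
problem_count=$(grep -E -c 'ERROR|WARN' "$workdir/events.txt")
echo "$problem_count"

# grep -o with a character class and a quantifier prints only the matched text.
nodes=$(grep -E -o 'node[0-9]+' "$workdir/events.txt" | sort -u | paste -sd, -)
echo "$nodes"

# awk one-liner: select records whose second field is ERROR, print field 1.
error_dates=$(awk '$2 == "ERROR" { print $1 }' "$workdir/events.txt" | paste -sd, -)
echo "$error_dates"

# sed with an anchor, capture groups and backreferences:
# rewrite a leading YYYY-MM-DD as DD/MM/YYYY.
reformatted=$(sed -E 's#^([0-9]{4})-([0-9]{2})-([0-9]{2})#\3/\2/\1#' \
  "$workdir/events.txt" | head -n 1)
echo "$reformatted"

rm -r "$workdir"
```

The same extended regular expression syntax is used with grep -E and sed -E here, which is one small instance of the patterns carrying over between tools.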

Taught by

Robert Sinkovits
