PowerShell and Regex: A Comprehensive Guide

Published: 5 January 2021 - 8 min. read

Christopher Bisset

Regular expressions (regex) can be a pain for us humans to read, but regex can be an incredibly powerful way to work with strings. In this article, you're going to learn the basics of working with PowerShell and regex.

You’ll get an introduction to handy cmdlets like Select-String, learn about regex capture groups and get an intro to various regex parsing techniques.

Prerequisites

  • A Windows 7 or later machine running PowerShell 5.1+. This article will be using PowerShell 7.1.0.

Matching Simple Text with Select-String

To demonstrate PowerShell and regex together, it’s always best to walk through an actual example.

Let’s say you are gathering data from legacy machines about their hardware and, using the wmic utility, you build a simple text file like below. We’ll call it computername.txt.

BiosCharacteristics={7,11,12,15,16,19,20,21,22,23,24,25,27,30,32,33,39,40,42,43}
 BIOSVersion={"ACRSYS - 2","V1.15","INSYDE Corp. - 59040115"}
 BuildNumber=
 Caption=V1.15
 CodeSet=
 CurrentLanguage=
 Description=V1.15
 EmbeddedControllerMajorVersion=1
 EmbeddedControllerMinorVersion=15
 IdentificationCode=
 InstallableLanguages=
 InstallDate=
 LanguageEdition=
 ListOfLanguages=
 Manufacturer=Insyde Corp.
 Name=V1.15
 OtherTargetOS=
 PrimaryBIOS=TRUE
 ReleaseDate=20200826000000.000000+000
 SerialNumber=NXHHYSA4241943017724S00
 SMBIOSBIOSVersion=V1.15
 SMBIOSMajorVersion=3
 SMBIOSMinorVersion=2
 SMBIOSPresent=TRUE
 SoftwareElementID=V1.15
 SoftwareElementState=3
 Status=OK
 SystemBiosMajorVersion=1
 SystemBiosMinorVersion=15
 TargetOperatingSystem=0
 Version=ACRSYS - 2

In this instance, let’s say that you need to extract the serial number of this computer. This serial number is located on the SerialNumber= line.

In this situation, Select-String is going to be your new favorite tool.

Select-String is a PowerShell cmdlet that searches text for a regular expression pattern and returns each line that matches that pattern.

Since the text you're searching is in a file, Select-String needs to read that file and then look for a regex match. You don't have to read the file yourself: provide a regex pattern using the Pattern parameter and the path to the text file using the Path parameter.

Select-String -Pattern "SerialNumber" -Path '.\computername.txt'

The Select-String cmdlet reads the .\computername.txt file and attempts to find a set of characters matching SerialNumber.

PowerShell and Regex: an example output of Select-String

Believe it or not, you’re already using Regex. Regex, in its simplest form, is matching specific characters. In this situation, you are matching the literal “SerialNumber” phrase.

However, you most likely do not want that whole line. Let's instead start building a script that retrieves only the data you care about.
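Matching literal text works for SerialNumber because it has no special characters, but some characters, such as the dot, mean something different in a regex. The sketch below uses two made-up lines (the second one is invented purely for illustration) to show that a dot matches any character, and how the SimpleMatch switch or [regex]::Escape() makes Select-String match the text literally.

```powershell
# Two sample lines; the second one is made up to show how "." behaves
$lines = "Caption=V1.15", "Caption=V1x15"

# As a regex, "." matches any single character, so both lines match
($lines | Select-String -Pattern "V1.15").Line

# -SimpleMatch searches for the literal text, so only the first line matches
($lines | Select-String -Pattern "V1.15" -SimpleMatch).Line

# [regex]::Escape() turns "V1.15" into "V1\.15", which has the same effect
($lines | Select-String -Pattern ([regex]::Escape("V1.15"))).Line
```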

Select-String Output is a Rich Object

In the previous example, Select-String returned what looked like a simple string, but that output was actually a whole lot more. Select-String doesn't just output a text match; the cmdlet returns a whole object.

For example, specify a regex pattern of This (is) to search in the string This is a string. You can see below that if you pipe that output to Get-Member, Select-String returns a Microsoft.PowerShell.Commands.MatchInfo object.

select-string "This (is)" -inputobject "This is a String" | get-member
The properties of a Select-String operation
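To see what that MatchInfo object carries beyond the matched text, you can read a few of its properties directly. This is a short sketch that picks out only the type name, Line, Pattern, and Matches; Get-Member lists more.

```powershell
$result = Select-String -Pattern "This (is)" -InputObject "This is a String"

# The full .NET type name of the object Select-String returned
$result.GetType().FullName   # Microsoft.PowerShell.Commands.MatchInfo

# The whole line (here, the whole input string) that contained the match
$result.Line                 # This is a String

# The pattern that produced the match
$result.Pattern              # This (is)

# The text the regex actually matched
$result.Matches[0].Value     # This is
```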

Using Capture Groups

In the previous example, notice the regex pattern used (This (is)). This pattern contains a set of parentheses. In a regular expression, those parentheses create a capture group.

Capture groups “capture” the part of the text matched by the sub-pattern inside the parentheses, so you can retrieve that part separately from the rest of the match.

Notice in the above example that the MatchInfo object has a property called Matches. This property holds the regex match objects found on the line (by default, only the first match on each line), and each match object in turn contains the values of any capture groups.

The values of all capture groups are found under the Matches.Groups property. The Groups property is a collection of group objects, and each object's Value property holds the actual data. The collection starts at index 0, which holds the whole regex match, and each capture group you specify in the regex gets the next index, counting opening parentheses from left to right.

For the above example, you can extract both the whole match (This is) and the is capture group:

$match = select-string "This (is)" -inputobject "This is a String"
#group 0 holds the whole regex match: "This is"
$match.Matches.groups[0].value
#group 1 holds the first capture group: "is"
$match.Matches.groups[1].value
the output of a capture group
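Because Matches keeps only the first match on each line by default, a line with several hits needs the AllMatches switch. The sketch below uses a made-up input string to show the difference and to read group 1 from every match.

```powershell
$text = "id=1 id=2 id=3"

# By default, only the first match on the line is kept
$first = Select-String -Pattern "id=(\d)" -InputObject $text
$first.Matches.Count          # 1

# -AllMatches keeps every match on the line
$all = Select-String -Pattern "id=(\d)" -InputObject $text -AllMatches
$all.Matches.Count            # 3

# Each match has its own Groups collection; group 1 holds the captured digit
$all.Matches | ForEach-Object { $_.Groups[1].Value }   # 1, 2, 3
```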

Using Capture Groups with Pattern Matches

Capturing a literal string, like the is in This is, is fairly pointless. You're not gaining valuable data by capturing a string whose contents you already knew. Instead, you can combine capture groups with pattern matching to extract only the information you care about.

Pattern matching uses specially defined characters to match a range of characters rather than a specific character. You can think of pattern matching as the * wildcard (like in dir *.txt) on steroids.

Let's say you want to match only the serial number in the line SerialNumber=NXHHYSA4241943017724S00 and not the entire line. You'd like to capture every character after the SerialNumber= phrase. You can match those characters by using the special dot . character, followed by the * quantifier.

The dot tells regex to match any single character after SerialNumber=. The * tells regex to repeat the . match zero or more times. Combined with a capture group, the regex will look like SerialNumber=(.*). You can see this below:

$string = "SerialNumber=numberwecareabout1042"
#extract the serial number using a capture group
$match = select-string "SerialNumber=(.*)" -inputobject $string
#output the serial number
$match.matches.groups[1].value
Using Capture Groups to extract important information

The special . character is only one of many different pattern match possibilities. You can match words, character ranges, number ranges, and the like. The Regex Reference category on the regexr website (via the sidebar) is an excellent resource for different regex expressions.
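As a small taste of those other building blocks, the sketch below applies three common ones (\w, a character range, and a fixed-count quantifier) to lines from computername.txt. One caveat worth knowing: Select-String ignores case by default, so [A-Z] would also match lowercase letters unless you add the CaseSensitive switch.

```powershell
$line = "SerialNumber=NXHHYSA4241943017724S00"

# \w+ : one or more "word" characters (letters, digits, or underscore)
$m = Select-String -Pattern "SerialNumber=(\w+)" -InputObject $line
$m.Matches[0].Groups[1].Value   # NXHHYSA4241943017724S00

# [A-Z]+ : one or more letters from A to Z, so the capture stops at the first digit
$m = Select-String -Pattern "SerialNumber=([A-Z]+)" -InputObject $line
$m.Matches[0].Groups[1].Value   # NXHHYSA

# \d{8} : exactly eight digits, here the date portion of the ReleaseDate line
$m = Select-String -Pattern "ReleaseDate=(\d{8})" -InputObject "ReleaseDate=20200826000000.000000+000"
$m.Matches[0].Groups[1].Value   # 20200826
```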

A Practical PowerShell Regex Example

Putting all of the above together, let's create a script that:

  1. Ingests a list of text files (in the example, you will only grab the sample text file)
  2. Loops through the text files and finds the serial number using SerialNumber=(.*)
  3. Generates a hashtable that has a list of computer names, and their associated serial numbers
#Create a hashtable to hold the serial numbers
$serialNumbers = @{}

#Get all of the text files. In this case, you are limiting your scope to a single text file
$files = Get-ChildItem "$pwd\computername.txt"

#populate the hashtable
foreach ($file in $files) {
    #first, find the SerialNumber= line, like in the first example. This time, also capture the text after the label in a capture group
    $serialNumber = select-string "SerialNumber=(.*)" $file.FullName
    #skip any file without a SerialNumber= line; indexing into a missing match would throw an error
    if (-not $serialNumber) { continue }
    #now, use the capture group to extract only the serial number, using the file name (without extension) as the hashtable key
    $serialNumbers[$file.basename] = $serialNumber.matches.groups[1].value
}
# write the output of the hashtable to the screen
$serialNumbers | format-table

You can see the above script in action below using computername.txt:

output of the above code
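To scale the script beyond one file, you could also let a single Select-String call search a whole folder, since each MatchInfo records the file it came from in its Path property. This is a sketch, not the article's script: the temp folder and the PC01/PC02 files and serials are hypothetical sample data created only so the example runs on its own.

```powershell
# Hypothetical sample data: a temp folder with two made-up wmic dumps
$folder = Join-Path ([System.IO.Path]::GetTempPath()) "regex-demo"
New-Item -ItemType Directory -Path $folder -Force | Out-Null
Set-Content -Path (Join-Path $folder "PC01.txt") -Value "Manufacturer=Insyde Corp.", "SerialNumber=ABC123"
Set-Content -Path (Join-Path $folder "PC02.txt") -Value "Status=OK", "SerialNumber=XYZ789"

$serialNumbers = @{}

# One Select-String call searches every .txt file in the folder;
# each MatchInfo stores the full path of the file it matched in
Select-String -Pattern "SerialNumber=(.*)" -Path (Join-Path $folder "*.txt") | ForEach-Object {
    $computerName = [System.IO.Path]::GetFileNameWithoutExtension($_.Path)
    $serialNumbers[$computerName] = $_.Matches[0].Groups[1].Value
}

$serialNumbers | Format-Table
```

Files without a SerialNumber= line simply produce no MatchInfo, so no extra null check is needed in this version.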

The Match Operator

You've learned how to use Select-String to match regex patterns in text, but PowerShell also has a few handy operators that support regex.

Two of the most useful and popular PowerShell regex operators are the -match and -notmatch operators. These operators allow you to test whether or not a string contains a specific regex pattern.

If the string matches the pattern, the -match operator returns True; if not, it returns False. The -notmatch operator does the opposite.

Below you will see a simple example of this behavior in action.

#example of using the -match operator
if("my string" -match "string") {
    "string is in my string!"
}
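The -match operator can also hand back capture groups, much like Select-String does. This sketch shows the automatic $Matches hashtable, -notmatch, case sensitivity, and what happens when the left-hand side is an array; the fruit strings are made-up sample data.

```powershell
$line = "SerialNumber=NXHHYSA4241943017724S00"

# A successful -match fills the automatic $Matches hashtable;
# key 0 is the whole match and key 1 is the first capture group
if ($line -match "SerialNumber=(.*)") {
    $Matches[1]                      # NXHHYSA4241943017724S00
}

# -notmatch returns True when the pattern is NOT found
"my string" -notmatch "number"       # True

# -match is case-insensitive by default; -cmatch is the case-sensitive variant
"My String" -match "my string"       # True
"My String" -cmatch "my string"      # False

# With an array on the left, -match returns the matching elements instead of True/False
"apple", "banana", "cherry" -match "an"   # banana
```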

The Split Operator

If you'd like to split a string on a delimiter such as a space, a comma, or a tab, or on a pattern rather than a fixed character, you can use the -split operator. The -split operator performs a regex match on a string and splits the string into one or more substrings wherever the pattern matches.

The split operator “converts” a string into an array of strings split on a specific regex pattern.

#create an array of strings split on the "\" character. The backslash is a special character in regex, so it is escaped as "\\"
"somebody\once told me\the world\is going\to roll me" -split ("\\")

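Because the delimiter is a regex, one -split call can handle several different separators, and a regex metacharacter used as a delimiter needs escaping. The sketch below uses made-up strings; the SimpleMatch option it shows is the -split operator's switch for treating the delimiter as literal text.

```powershell
# Split on any run of commas, semicolons, or whitespace characters
"a, b;c  d" -split "[,;\s]+"            # a, b, c, d

# "." is a regex metacharacter (any character), so every character
# becomes a delimiter and only empty strings come back
"1.2.3" -split "."

# Escape the dot, or pass the SimpleMatch option, to split on a literal "."
"1.2.3" -split "\."                     # 1, 2, 3
"1.2.3" -split ".", 0, "SimpleMatch"    # 1, 2, 3
```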
ValidatePattern Parameter Validation

PowerShell's regex support doesn't just end at cmdlets and operators; you can also use regex matching to validate parameters.

Using the ValidatePattern parameter validation attribute, you can validate string parameter values based on a regex pattern. This validation routine is useful to limit what input a user can use for a parameter value.

#example validation using regex. The ValidatePattern in this function will
#only accept lowercase or uppercase alphabetical letters, as well as spaces.
#the ^ at the start of the regex represents the start of the string, and $ at the end
#represents the end of the string (to match the *entire* string). The +
#means that the string must have one or more characters to be accepted
function alphaOnly {
    param([ValidatePattern('^[a-zA-Z ]+$')][string]$alphaCharacters)
    write-output $alphaCharacters
}
#this will succeed
alphaOnly "Hi Mom"
#this will fail
alphaOnly "Hi Mom!"
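A failed ValidatePattern check stops the call with an error, which you can handle with try/catch. ValidatePattern also ignores case by default, and setting its Options property to None makes it case-sensitive. The sketch repeats alphaOnly so it runs on its own; Test-LowercaseOnly is a hypothetical function name used only for this illustration.

```powershell
# Same function as above, repeated so this sketch is self-contained
function alphaOnly {
    param([ValidatePattern('^[a-zA-Z ]+$')][string]$alphaCharacters)
    Write-Output $alphaCharacters
}

# A failed validation is a terminating parameter-binding error, so try/catch can handle it
try {
    alphaOnly "Hi Mom!"
}
catch {
    "Rejected input: $($_.Exception.Message)"
}

# Hypothetical function: Options = 'None' removes the default IgnoreCase option,
# so only lowercase letters pass the check
function Test-LowercaseOnly {
    param([ValidatePattern('^[a-z]+$', Options = 'None')][string]$Value)
    Write-Output $Value
}
Test-LowercaseOnly "hello"      # hello
```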

Replacing Text with PowerShell and Regex

In the previous sections, you learned a few different ways to match patterns with PowerShell and regex. You can take that knowledge one step further and also replace text that PowerShell has matched.

One popular method to replace text with regex is to use the -replace operator. The -replace operator takes two arguments, the regex pattern and the replacement text (separated by a comma), and replaces every part of the string that matches the pattern. -replace also supports capture groups: you can refer to a capture group from the pattern in the replacement text as $1, $2, and so on.

For example, using -replace you can append text to a serial number:

$string = "SerialNumber=numberwecareabout1042"
$currentYear = "2020"
#append the year to the end of the serialnumber
$serialNumber = $string -replace "SerialNumber=(.*)","SerialNumber=`$1-$currentYear"
write-output $serialNumber
Appending text using the -replace operator and capture groups

Note that in the above example, the dollar sign in $1 is escaped using a backtick. Otherwise, PowerShell would expand $1 as a PowerShell variable inside the double-quoted string instead of passing it to the regex engine as a capture group substitution.
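Two alternatives to the backtick are worth knowing. The sketch below reuses the article's sample string: single quotes stop PowerShell from expanding $1 (at the cost of hard-coding the year), and PowerShell 6.1 and later also accept a script block as the replacement, which receives each match as $_. The ToUpper() call is just an illustrative transformation.

```powershell
$string = "SerialNumber=numberwecareabout1042"

# In single quotes PowerShell does not expand $1, so no backtick is needed;
# the regex engine substitutes capture group 1
$string -replace 'SerialNumber=(.*)', 'SerialNumber=$1-2020'
# SerialNumber=numberwecareabout1042-2020

# PowerShell 6.1+ only: a script block replacement; $_ is the current Match object
$string -replace 'SerialNumber=(.*)', { 'SerialNumber=' + $_.Groups[1].Value.ToUpper() }
# SerialNumber=NUMBERWECAREABOUT1042
```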

Learning How to Write Better PowerShell Regex

All of the above might sound complicated, and, well, it is. In fact, there are lots of regex features not covered in this article. Luckily, regex is widely used, and there are tons of utilities to help you learn how to use it effectively.

  • RegexOne is a popular resource for learning regex. It introduces the capabilities of regex in a bite-sized, interactive way, letting you learn regex as you write it, which makes it a fantastic place to begin learning how regex works.
  • Regexr is one of the best tools out there to validate and build your regex. Aside from having a great real-time regex testing tool, regexr also includes a cheatsheet and a fantastic documentation engine.
  • Regexstorm specifically uses the .NET engine to drive its tool. The site doesn't have all of the bells and whistles that sites like Regexr have, but it will test your regular expression the same way PowerShell does. Even if you use other tools to build your regex, you should always run the regex through Regexstorm to make sure PowerShell will parse it correctly.
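Since PowerShell itself runs on that same .NET regex engine, you can also test a pattern right in the console through the [regex] type accelerator. A quick sketch using the serial number line; note that, unlike -match and Select-String, the [regex] methods are case-sensitive unless you pass an option.

```powershell
$pattern = 'SerialNumber=(\w+)'
$sample  = 'SerialNumber=NXHHYSA4241943017724S00'

# [regex] is PowerShell's type accelerator for System.Text.RegularExpressions.Regex
[regex]::IsMatch($sample, $pattern)                  # True
[regex]::Match($sample, $pattern).Groups[1].Value    # NXHHYSA4241943017724S00

# The [regex] methods are case-sensitive by default
[regex]::IsMatch('serialnumber=abc', $pattern)                 # False
[regex]::IsMatch('serialnumber=abc', $pattern, 'IgnoreCase')   # True
```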

Don’t Use PowerShell and Regex if You Don’t Have To!

PowerShell is built around structured objects. Objects with properties are a lot easier to manage than the loose text where regex comes into play.

One of the main purposes of PowerShell, and of structured formats like JSON, is to make regex and text parsing obsolete. Regex is fantastic for deciphering human-readable text, but generally, regex is something you try to avoid when it comes to storing or transferring data.

Some people even get worked up over using regex to parse structured formats.

If you can use an object-oriented method or a structured format like JSON or XML instead of regex, do it! Just because you can do just about anything with regex doesn't mean you should!
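For the serial number scenario, that advice might look like the sketch below. On Windows, the same BIOS data that wmic printed comes from the Win32_BIOS CIM class, which Get-CimInstance returns as an object. The JSON record is made-up sample data showing that structured text converts straight to objects with ConvertFrom-Json.

```powershell
# On Windows, query the BIOS as an object instead of parsing wmic text
# (Win32_BIOS is the CIM class behind the wmic bios output shown earlier)
if ($IsWindows -or $PSVersionTable.PSEdition -eq 'Desktop') {
    (Get-CimInstance -ClassName Win32_BIOS).SerialNumber
}

# Structured data such as JSON converts straight to objects, no regex required
$json = '{ "ComputerName": "computername", "SerialNumber": "NXHHYSA4241943017724S00" }'
$record = $json | ConvertFrom-Json
$record.SerialNumber   # NXHHYSA4241943017724S00
```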

Moving on with Regex

With this article, you should now have a basic understanding of how regex helps machines parse and find text, even when looking for highly specific or complicated phrases. You should also have the tools to test, validate, and learn about regex in the context of PowerShell.

If you haven’t done so, the RegexOne tutorials are a fantastic next step. Test your regex knowledge and power up your string powers in PowerShell!
