XML is all over the place. Despite its annoying use of angle brackets, the XML format is still widely used. Configuration files, RSS feeds and Office files (the ‘x’ in .docx) are just a partial list. Using PowerShell to parse XML files is an essential step in your PowerShell journey.
This tutorial will show you how to parse XML files with PowerShell and validate them. It will walk you from zero to hero through getting and evaluating XML data. You will be given tools that help you validate XML data integrity and stop faulty data right at the gate of your scripts!
Announcing a Free LIVE training – Starting your PowerShell Journey – presented by Johan Arwidmark. Understand how PowerShell skills enhance your IT career, learn where to start with PowerShell, build your first scripts, and ask Johan questions directly in a live training environment.
Prerequisites
To follow along with the presented material, you should have:
- PowerShell version 3.0 or above. The examples were created on Windows PowerShell v5.1.
- Notepad++, Visual Studio Code or another text editor that understands XML.
Parsing PowerShell XML Elements with Select-Xml
Let’s first cover one of the most popular and easiest ways to use PowerShell to parse XML: Select-Xml. The Select-Xml cmdlet allows you to provide an XML file or string along with a “filter” known as XPath to pull out specific information.
XPath is a chain of element names. It uses a “path like” syntax to identify and navigate nodes in an XML document.
Let’s say you have an XML file with a bunch of computers and would like to use PowerShell to parse this XML file. Each computer has various elements like name, IP address and an Include element for inclusion in a report.
An element is an XML portion with an opening tag and a closing tag, possibly with some text in-between, such as <Name>SRV-01</Name>.
The file, computers-elm.xml, looks like this:
<Computers>
<Computer>
<Name>SRV-01</Name>
<Ip>127.0.0.1</Ip>
<Include>true</Include>
</Computer>
<Computer>
<Name>SRV-02</Name>
<Ip>192.168.0.102</Ip>
<Include>false</Include>
</Computer>
<Computer>
<Name>SRV-03</Name>
<Ip>192.168.0.103</Ip>
<Include>true</Include>
</Computer>
</Computers>
You’d like to use PowerShell to parse this XML file and get the computer names. To do that, you could use the Select-Xml command.
In the file above, the computer names appear in the inner text (InnerXML) of the Name element.
For an element that holds only text, InnerXML is the text between the element’s opening and closing tags.
To find the computer names, you’d first provide the appropriate XPath (/Computers/Computer/Name). This XPath syntax would return only the Name nodes under the Computer elements. Then, to get only the InnerXML of each Name element, read the Node.InnerXML property of each result with a ForEach-Object loop.
Select-Xml -Path C:\Work\computers-elm.xml -XPath '/Computers/Computer/Name' | ForEach-Object { $_.Node.InnerXML }
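XPath can also filter nodes, not just walk down to them. The sketch below is a minimal, self-contained variation on the command above. It assumes the XML is passed as a string through the -Content parameter rather than read from a file (the $xmlText variable exists only for this illustration), and it adds a predicate so that only computers whose Include element is true are returned.

```powershell
# Illustrative inline copy of the elements-based file
$xmlText = @'
<Computers>
  <Computer><Name>SRV-01</Name><Ip>127.0.0.1</Ip><Include>true</Include></Computer>
  <Computer><Name>SRV-02</Name><Ip>192.168.0.102</Ip><Include>false</Include></Computer>
  <Computer><Name>SRV-03</Name><Ip>192.168.0.103</Ip><Include>true</Include></Computer>
</Computers>
'@

# The [Include="true"] predicate keeps only Computer elements with that child value
$includedNames = Select-Xml -Content $xmlText -XPath '/Computers/Computer[Include="true"]/Name' |
    ForEach-Object { $_.Node.InnerXML }

$includedNames   # SRV-01 and SRV-03
```

Filtering inside the XPath keeps the pipeline short, at the cost of a denser expression.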
Using PowerShell to Parse XML Attributes with Select-Xml
Now let’s address this problem of finding computer names from a different angle. This time, instead of the computer descriptors represented with XML elements, they are represented with XML attributes.
An attribute is a key/value portion such as name="SRV-01". Attributes always appear within the opening tag, right after the tag name.
Below is the XML file with computer descriptors represented with attributes. You can now see each descriptor as an attribute rather than an element.
<Computers>
<Computer name="SRV-01" ip="127.0.0.1" include="true" />
<Computer name="SRV-02" ip="192.168.0.102" include="false" />
<Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>
Since each descriptor is an attribute this time, tweak the XPath a little bit to only find the Computer elements. Then, using the ForEach-Object cmdlet again, find the value of the name attribute.
Select-Xml -Path C:\Work\computers-attr.xml -XPath '/Computers/Computer' | ForEach-Object { $_.Node.name }
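XPath can address attributes directly with the @ prefix. As a possible alternative to reading $_.Node.name, the sketch below selects the name attributes themselves and reads the Value property of each attribute node, while also filtering on include="true" inside the XPath. The XML is supplied inline via -Content, and $attrXmlText is an illustrative variable name, not part of the original files.

```powershell
# Illustrative inline copy of the attributes-based file
$attrXmlText = @'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" include="true" />
  <Computer name="SRV-02" ip="192.168.0.102" include="false" />
  <Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>
'@

# '@include' tests an attribute in the predicate; '/@name' returns the attribute nodes
$includedNames = Select-Xml -Content $attrXmlText -XPath '/Computers/Computer[@include="true"]/@name' |
    ForEach-Object { $_.Node.Value }

$includedNames   # SRV-01 and SRV-03
```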
And indeed, this brings the same results: SRV-01, SRV-02 and SRV-03.
In both cases, whether you are reading elements or attributes, the syntax of Select-Xml is cumbersome: it forces you to use the XPath parameter, then to pipe the result to a loop, and finally to look for the data under the Node property.
Luckily, PowerShell offers a more convenient and intuitive way to read XML files. PowerShell lets you read XML files and convert them to XML objects.
Casting XML Strings to Objects
Another way to use PowerShell to parse XML is to convert that XML to objects. The easiest way to do this is with the [xml] type accelerator.
By prefixing the variable names with [xml], PowerShell converts the original plain text XML into objects you can then work with.
[xml]$xmlElm = Get-Content -Path C:\Work\computers-elm.xml
[xml]$xmlAttr = Get-Content -Path C:\Work\computers-attr.xml
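Two small variations on the cast are worth knowing. Get-Content returns an array of lines by default, so adding -Raw (available from PowerShell 3.0) hands the cast a single string. Alternatively, the Load method of System.Xml.XmlDocument reads the file itself and detects the encoding from the file. The sketch below tries both against a throwaway file in the temp folder. The file name and variable names are assumptions made for the demo.

```powershell
# Create a small demo file (hypothetical name) so the example is self-contained
$path = Join-Path -Path ([System.IO.Path]::GetTempPath()) -ChildPath 'computers-attr-demo.xml'
Set-Content -Path $path -Value '<Computers><Computer name="SRV-01" ip="127.0.0.1" include="true" /></Computers>'

# Option 1: read the whole file as one string, then cast
[xml]$viaCast = Get-Content -Path $path -Raw

# Option 2: let XmlDocument read the file directly (pass a full path)
$viaLoad = New-Object -TypeName System.Xml.XmlDocument
$viaLoad.Load($path)

$viaCast.Computers.Computer.name
$viaLoad.Computers.Computer.name

Remove-Item -Path $path
```

Both variables end up as the same kind of object, so everything that follows works with either.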
Reading XML Object Elements
Now both the $xmlElm and $xmlAttr variables are XML objects, allowing you to reference properties via dot notation. Perhaps you need to find the IP address of each computer element. Since the XML file is now an object, you can do so by simply referencing the Ip element.
$xmlElm.Computers.Computer.ip
The XML object exposes an attribute’s value with the same syntax used for reading an element’s inner text, and starting from PowerShell version 3.0, a property can be read across a whole collection of nodes at once. Therefore, the IP addresses are read from the attributes file with exactly the same syntax as from the elements file.
Reading XML Attributes
Using exactly the same dot notation, you can read XML attributes as well despite differences in the XML structure.
$xmlElm.Computers.Computer.ip
$xmlAttr.Computers.Computer.ip
And the results below show that both got the same data, each one from its corresponding file:
Moreover, once the object is loaded into memory, you even get IntelliSense to tab-complete its properties if you’re using the PowerShell ISE.
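To try the shared dot notation without the files on disk, the sketch below casts two inline strings (shortened, illustrative copies of the two layouts) and reads the same data from each. It also shows that a single node can be picked out of the collection by index.

```powershell
# Elements layout and attributes layout, trimmed to two computers each
[xml]$elm  = '<Computers><Computer><Name>SRV-01</Name><Ip>127.0.0.1</Ip></Computer><Computer><Name>SRV-02</Name><Ip>192.168.0.102</Ip></Computer></Computers>'
[xml]$attr = '<Computers><Computer name="SRV-01" ip="127.0.0.1" /><Computer name="SRV-02" ip="192.168.0.102" /></Computers>'

# Same dot notation, different underlying structure
$ipsFromElements   = $elm.Computers.Computer.ip
$ipsFromAttributes = $attr.Computers.Computer.ip

# Computer is a collection here, so one node can be addressed by index
$secondName = $attr.Computers.Computer[1].name

$ipsFromElements
$ipsFromAttributes
$secondName   # SRV-02
```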
Iterating Through XML Data
Once you get the hang of reading XML files directly into an XML object (by taking advantage of the [xml] type accelerator), you have the full power of PowerShell objects.
Say, for example, you are required to loop through all the computers that appear in the XML file with the include="true" attribute to check their connection status. The code below shows how it can be done.
This script is:
- Reading the file and casting it to an XML object.
- Iterating through the relevant computers to get their connection status.
- Finally, sending the result output object to the pipeline.
## casting the file text to an XML object
[xml]$xmlAttr = Get-Content -Path C:\Work\computers-attr.xml
## looping through computers set with include="true"
$xmlAttr.Computers.Computer | Where-Object include -eq 'true' | ForEach-Object {
    ## see if the current computer is online
    if (Test-Connection -ComputerName $_.ip -Count 1 -Quiet)
    {
        $status = 'Connection OK'
    }
    else
    {
        $status = 'No Connection'
    }
    ## output the result object
    [pscustomobject]@{
        Name = $_.name
        Ip = $_.ip
        Status = $status
    }
}
The script outputs one object per included computer, with its Name, Ip and Status.
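The XML object is still a .NET XmlDocument, so XPath remains available through its SelectNodes method. As a hedged alternative to the Where-Object filter, the sketch below selects the included computers with an XPath predicate and builds a small report from them. It uses an inline copy of the attributes file and skips the connection test so that it runs without a network. The $included and $report names are illustrative.

```powershell
[xml]$xmlAttr = @'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" include="true" />
  <Computer name="SRV-02" ip="192.168.0.102" include="false" />
  <Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>
'@

# Let XPath do the filtering instead of Where-Object
$included = $xmlAttr.SelectNodes('/Computers/Computer[@include="true"]')

# Build one plain object per selected node
$report = foreach ($computer in $included) {
    [pscustomobject]@{
        Name = $computer.name
        Ip   = $computer.ip
    }
}

$report
```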
XML Schemas
In the previous section, you saw two different XML files representing a data set in two different ways. Those ways are called XML schemas. An XML Schema defines the legal building blocks of a specific XML document:
- The names of elements and attributes that can appear in that specific document.
- The number and order of child elements.
- The data types for elements and attributes.
The schema essentially defines the structure of the XML.
Validating XML Data
An XML file may have the correct syntax (editors like Notepad++ will complain if not), yet its data might not match the project requirements. This is where the schema comes into play. When you rely on XML data, you must ensure that all the data is valid according to the defined schema.
The last thing you want is to discover data errors at runtime, 500 lines deep into a script. By that time, the script might have already performed some irreversible operations on the file system and the registry.
So, how can you check in advance that the data is correct? Let’s first look at some possible error types.
Possible Errors in XML Data
Generally speaking, errors found in XML files belong to one of two categories: metadata errors and errors in the data itself.
XML Metadata Errors
The file below, MorePeople.xml, is perfectly valid syntax-wise. You can see that it has a single People element (the root element) with three Person elements inside. This structure is perfectly acceptable. Still, it contains one error. Can you see it?
<People>
<Person Name="Debra" County="Canada" IsAdmin="true" />
<Person Name="Jacob" Country="Israel" IsAdmin="true" />
<Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>
Do not worry if you didn’t, it is just hiding. The problem is found in the first inner element:
What should have been a Country was misspelled, and Canada was degraded to a County.
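A typo like this does not raise an error when the file is read, which is what makes it dangerous. The sketch below, which assumes strict mode is not enabled, loads an inline copy of the file and lists the people whose Country attribute comes back empty. The $missingCountry name is illustrative.

```powershell
[xml]$peopleXml = @'
<People>
  <Person Name="Debra" County="Canada" IsAdmin="true" />
  <Person Name="Jacob" Country="Israel" IsAdmin="true" />
  <Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>
'@

# A misspelled attribute simply reads as $null, with no error (strict mode off)
$missingCountry = $peopleXml.People.Person |
    Where-Object { -not $_.Country } |
    ForEach-Object { $_.Name }

$missingCountry   # Debra
```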
Errors in the XML Data
After fixing the Country issue on MorePeople.xml, another problem sneaked in:
<People>
<Person Name="Debra" Country="Canada" IsAdmin="yes" />
<Person Name="Jacob" Country="Israel" IsAdmin="true" />
<Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>
The metadata, i.e., the elements and attributes, are fine. So what is wrong? This time the problem, again on the first Person line, is in one of the values. Someone decided that yes is a good enough substitute for true, but code like the line below will fail to get the first element because it is looking for true, not for yes:
$peopleXml.People.Person | Where-Object IsAdmin -eq 'true'
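To see the effect, the sketch below runs that filter against an inline copy of the file and then adds a hypothetical sanity check that lists any IsAdmin value other than true or false. The variable names are illustrative.

```powershell
[xml]$peopleXml = @'
<People>
  <Person Name="Debra" Country="Canada" IsAdmin="yes" />
  <Person Name="Jacob" Country="Israel" IsAdmin="true" />
  <Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>
'@

# Debra is silently skipped because 'yes' is not 'true'
$adminNames = $peopleXml.People.Person |
    Where-Object IsAdmin -eq 'true' |
    ForEach-Object { $_.Name }

# A manual check for unexpected values (-notin requires PowerShell 3.0 or above)
$suspectNames = $peopleXml.People.Person |
    Where-Object { $_.IsAdmin -notin 'true', 'false' } |
    ForEach-Object { $_.Name }

$adminNames     # Jacob
$suspectNames   # Debra
```

Hand-written checks like the second one multiply quickly, which is the motivation for moving them into a schema.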
Creating an XML Schema
Now that you know the types of errors that may occur, it is time to show how a schema file helps. The first step is creating a sample data file. The sample can be the smallest example and contain nothing but a single inner element. For the above examples, let’s create a sample file named People.xml like this:
<People>
<Person Name="Jeff" Country="USA" IsAdmin="true" />
</People>
Now build the PowerShell function below and use it with the sample data to create the .xsd schema.
function New-XmlSchema
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xml$')]
        [string]$XmlExample
    )
    ## resolve the file so .NET gets a full path, even if a relative one was passed
    $item = Get-Item $XmlExample
    $dir = $item.DirectoryName
    $baseName = $item.BaseName
    ## build the schema path by replacing '.xml' with '.xsd'
    $SchemaPath = "$dir\$baseName.xsd"
    try
    {
        ## let a DataSet infer the structure from the sample and write it as a schema
        $dataSet = New-Object -TypeName System.Data.DataSet
        $dataSet.ReadXml($item.FullName) | Out-Null
        $dataSet.WriteXmlSchema($SchemaPath)
        ## show the resulting schema file
        Get-Item $SchemaPath
    }
    catch
    {
        $err = $_.Exception.Message
        Write-Host "Failed to create XML schema for $XmlExample`nDetails: $err" -ForegroundColor Red
    }
}
Copy the function to the ISE or your favorite PowerShell editor, load it into memory and use it to create the schema. With the function loaded, the code to create the schema is this one-liner:
New-XmlSchema -XmlExample 'C:\Work\People.xml'
The results will show the path to the newly created schema.
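If you would like to preview what the DataSet infers before writing anything to disk, the sketch below may help. It assumes the sample XML is held in a string and uses the GetXmlSchema method of the DataSet to return the schema as text instead of a file. The variable names are illustrative.

```powershell
# The same single-Person sample, held in memory
$sample = '<People><Person Name="Jeff" Country="USA" IsAdmin="true" /></People>'

# ReadXml accepts a TextReader, so no file is needed
$reader = New-Object -TypeName System.IO.StringReader -ArgumentList $sample
$dataSet = New-Object -TypeName System.Data.DataSet
$null = $dataSet.ReadXml($reader)
$reader.Dispose()

# Return the inferred schema as a string rather than writing an .xsd file
$schemaText = $dataSet.GetXmlSchema()
$schemaText
```

This is only a preview step; the function above is still the one that produces the .xsd file used for validation.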
Using the Schema File to Validate Your Data
Take a look at the location specified by the results above. If the .xsd is there, you are on the right track to see the validation in action. For the validation step, use the function below:
function Test-XmlBySchema
{
    [CmdletBinding()]
    [OutputType([bool])]
    param
    (
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xml$')]
        [string]$XmlFile,
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xsd$')]
        [string]$SchemaPath
    )
    try
    {
        [xml]$xml = Get-Content $XmlFile
        ## pass a full path so .NET finds the schema even if a relative path was given
        $xml.Schemas.Add('', (Get-Item $SchemaPath).FullName) | Out-Null
        ## with no event handler, Validate throws on the first validation error
        $xml.Validate($null)
        Write-Verbose "Successfully validated $XmlFile against schema ($SchemaPath)"
        $result = $true
    }
    catch
    {
        $err = $_.Exception.Message
        Write-Verbose "Failed to validate $XmlFile against schema ($SchemaPath)`nDetails: $err"
        $result = $false
    }
    finally
    {
        $result
    }
}
Load the function to memory, and use it to validate the MorePeople.xml from the two error examples. To trigger the validation, use the command below:
Test-XmlBySchema -XmlFile 'C:\Work\MorePeople.xml' -SchemaPath 'C:\Work\People.xsd' -Verbose
The actual results depend on the content of MorePeople.xml.
Let’s see two examples. Notice that when MorePeople.xml is error-free, the function above will return True.
When the MorePeople.xml file contains wrong data (the Country key misspelled as County), the function will write some failure details to the verbose stream and return False.
As you can see, the error specified on the verbose output is very informative: It directs to the culprit file and points to the exact component in it where the problem occurred.
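Because Test-XmlBySchema passes $null to Validate, it stops at the first problem. If you would rather collect every problem in one pass, one option is to pass a ValidationEventHandler instead. The sketch below is a minimal illustration under a few assumptions: the schema and the data are inline strings rather than files, the second typo (Contry) is invented for the demo, and the $problems list is an illustrative name.

```powershell
# Trimmed inline version of the generated schema
$schemaText = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="People">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Person">
          <xs:complexType>
            <xs:attribute name="Name" type="xs:string" />
            <xs:attribute name="Country" type="xs:string" />
            <xs:attribute name="IsAdmin" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Demo data with two misspelled attributes
[xml]$xml = @'
<People>
  <Person Name="Debra" County="Canada" IsAdmin="true" />
  <Person Name="Jacob" Contry="Israel" IsAdmin="true" />
</People>
'@

# Load the schema from the string instead of a file path
$stringReader = New-Object -TypeName System.IO.StringReader -ArgumentList $schemaText
$schemaReader = [System.Xml.XmlReader]::Create($stringReader)
$null = $xml.Schemas.Add('', $schemaReader)
$schemaReader.Close()

# The handler is called once per problem, so validation keeps going
$problems = New-Object -TypeName 'System.Collections.Generic.List[string]'
$handler = [System.Xml.Schema.ValidationEventHandler] {
    param($source, $e)
    $problems.Add($e.Message)
}
$xml.Validate($handler)

$problems
```

An empty $problems list after the call would mean the document passed validation.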
Fine Tuning the Validation Schema
Let’s take a look at the schema file, then see how we can make it even better.
The schema created by New-XmlSchema by default is below:
<?xml version="1.0" standalone="yes"?>
<xs:schema id="People" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="People" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
<xs:complexType>
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="Person">
<xs:complexType>
<xs:attribute name="Name" type="xs:string" />
<xs:attribute name="Country" type="xs:string" />
<xs:attribute name="IsAdmin" type="xs:string" />
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>
The default schema above is good, but not perfect. Indeed, it has caught the typo with the Country attribute. But if you leave the schema as-is, other expectations you might have that are not met will not be reported as errors by the Test-XmlBySchema validation. Let’s solve this.
The table below presents some cases that are not considered validation errors and will go unnoticed by Test-XmlBySchema. On each row, the right column shows how to manually change the schema to add the necessary protection.
A modified version of the schema file, with all protections added, is shown right after the table.
Adding settings to the default schema: examples
Required Behavior | Relevant setting on the schema file |
At least one Person element | Set minOccurs to 1 on the xs:choice element that wraps the Person element |
Make the Name, Country and IsAdmin attributes mandatory | Add use="required" to each of these attribute declarations |
Allow only true/false values for IsAdmin | Set type="xs:boolean" for the IsAdmin declaration |
Allow only Country names between 3 and 40 characters | Use xs:restriction (see the detailed explanation after the schema text) |
<?xml version="1.0" standalone="yes"?>
<xs:schema id="People" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
<xs:element name="People" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
<xs:complexType>
<xs:choice minOccurs="1" maxOccurs="unbounded">
<xs:element name="Person">
<xs:complexType>
<xs:attribute name="Name" type="xs:string" use="required" />
<xs:attribute name="Country" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:minLength value="3"/>
<xs:maxLength value="40"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
<xs:attribute name="IsAdmin" type="xs:boolean" use="required" />
</xs:complexType>
</xs:element>
</xs:choice>
</xs:complexType>
</xs:element>
</xs:schema>
With the boolean restriction in place for the IsAdmin attribute in the example, its value must be a lowercase true or false (xs:boolean also accepts 1 and 0).
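For a quick feel of which strings pass as a boolean, the sketch below uses the .NET XmlConvert class, whose ToBoolean method should mirror the schema rules: it accepts true, false, 1 and 0, and throws on anything else. The list of test values and the $results name are illustrative.

```powershell
# Try a few candidate values; 'yes' and 'True' are expected to be rejected
$results = foreach ($value in 'true', 'false', '1', '0', 'yes', 'True') {
    try
    {
        $null = [System.Xml.XmlConvert]::ToBoolean($value)
        [pscustomobject]@{ Value = $value; IsValid = $true }
    }
    catch
    {
        [pscustomobject]@{ Value = $value; IsValid = $false }
    }
}

$results
```

Note that code comparing IsAdmin to the string 'true' would still miss a value of 1, so it is worth agreeing on a single spelling in your data.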
String Length Validation with xs:restriction
The string length validation is a bit complex. So, even though it is shown above as part of the modified schema, it deserves a little more focus.
The original schema item for the Country attribute (after having manually added use="required") is as below:
<xs:attribute name="Country" type="xs:string" use="required" />
To add the length protection, you should add the <xs:simpleType> element and, within it, the <xs:restriction base="xs:string"> element. This restriction in turn contains the required limits, declared with xs:minLength and xs:maxLength. The type="xs:string" attribute is dropped from the declaration, since the type is now given by the base of the restriction.
Following all these changes, the final xs:attribute declaration has grown from a single line to a giant eight-line node:
<xs:attribute name="Country" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:minLength value="3"/>
<xs:maxLength value="40"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
If your head does not spin after the explanation above, you have earned the right to see the validation in action. To do that, let’s intentionally shorten the value Canada to a two-character string: Ca
With the short country name in place and MorePeople.xml saved, you are ready to run the validation command below:
Test-XmlBySchema -XmlFile 'C:\Work\MorePeople.xml' -SchemaPath 'C:\Work\People.xsd' -Verbose
The results indeed show that the stricter schema has done its work.
XML schema validation can grow in complexity and validate just about any pattern you can think of, especially when combined with regular expressions.
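As one hedged illustration of the regular expression idea, the sketch below adds an xs:pattern facet to the Name attribute so that only a capital letter followed by lowercase letters is accepted. The pattern, the trimmed schema and the Test-XmlText helper are assumptions made for this demo, not part of the schema built earlier.

```powershell
# Trimmed schema with a hypothetical naming rule on the Name attribute
$schemaText = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="People">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Person" maxOccurs="unbounded">
          <xs:complexType>
            <xs:attribute name="Name" use="required">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:pattern value="[A-Z][a-z]+" />
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Hypothetical helper: validates an XML string against $schemaText
function Test-XmlText
{
    param([string]$XmlText)
    [xml]$doc = $XmlText
    $stringReader = New-Object -TypeName System.IO.StringReader -ArgumentList $schemaText
    $schemaReader = [System.Xml.XmlReader]::Create($stringReader)
    $null = $doc.Schemas.Add('', $schemaReader)
    $schemaReader.Close()
    try { $doc.Validate($null); $true } catch { $false }
}

$goodResult = Test-XmlText -XmlText '<People><Person Name="Debra" /></People>'
$badResult  = Test-XmlText -XmlText '<People><Person Name="debra42" /></People>'

$goodResult   # True
$badResult    # False
```

A pattern in a schema must match the whole value, so there is no need for ^ and $ anchors.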