PowerShell to Parse XML: Read and Validate

Published: 12 January 2021 - 12 min. read

XML is all over the place. Despite its annoying use of angle brackets, XML format is still widely used. Configuration files, RSS feeds, Office files (the ‘x’ in the .docx) are just a partial list. Using PowerShell to parse XML files is an essential step in your PowerShell journey.

This tutorial shows how to use PowerShell to parse XML files and validate them. It walks you from zero to hero through reading and evaluating XML data, and gives you tools to validate XML data integrity and stop faulty data right at the gate of your scripts!

Prerequisites

To follow along with the presented material, you should have:

  • PowerShell version 3.0 or later. The examples were created on Windows PowerShell v5.1.
  • Notepad++, Visual Studio Code or another text editor that understands XML.

Parsing PowerShell XML Elements with Select-Xml

Let’s first cover one of the most popular and easiest ways to use PowerShell to parse XML: Select-Xml. The Select-Xml cmdlet takes an XML file or string along with a “filter” known as XPath and pulls out specific information.

XPath is a query language for XML. It uses a path-like syntax, in its simplest form a chain of element names separated by slashes, to identify and navigate nodes in an XML document.

Let’s say you have an XML file with a bunch of computers and would like to use PowerShell to parse this XML file. Each computer has various elements like name, IP address and an Include element for inclusion in a report.

An element is an XML portion with an opening tag and a closing tag, possibly with some text in-between, such as <Name>SRV-01</Name>

<Computers>
	<Computer>
		<Name>SRV-01</Name>
		<Ip>127.0.0.1</Ip>
		<Include>true</Include>
	</Computer>	
	<Computer>
		<Name>SRV-02</Name>
		<Ip>192.168.0.102</Ip>
		<Include>false</Include>
	</Computer>	
	<Computer>
		<Name>SRV-03</Name>
		<Ip>192.168.0.103</Ip>
		<Include>true</Include>
	</Computer>	
</Computers>

You’d like to use PowerShell to parse this XML file and get the computer names. To do that, you can use the Select-Xml command.

In the file above, the computer names appear in the inner text (InnerXML) of the Name element.

InnerXML is everything between an element’s opening and closing tags. For a Name element that contains only text, that is the computer name.

To find the computer names, first provide the appropriate XPath (/Computers/Computer/Name). This XPath returns only the Name nodes under the Computer elements. Then, to get only the InnerXML of each Name element, read the Node.InnerXML property of each result in a ForEach-Object loop.

Select-Xml -Path C:\Work\computers-elm.xml -XPath '/Computers/Computer/Name' | ForEach-Object { $_.Node.InnerXML }
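
To try the same query without creating a file first, a minimal sketch can pass the XML as a string through the -Content parameter of Select-Xml. The $sampleXml variable and its shortened contents are assumptions made for illustration; they mirror the file above.

```powershell
# Hypothetical in-memory copy of the elements file (shortened to two computers)
$sampleXml = @'
<Computers>
  <Computer>
    <Name>SRV-01</Name>
    <Ip>127.0.0.1</Ip>
    <Include>true</Include>
  </Computer>
  <Computer>
    <Name>SRV-02</Name>
    <Ip>192.168.0.102</Ip>
    <Include>false</Include>
  </Computer>
</Computers>
'@

# Select every Name element and keep only the text between its tags
$names = Select-Xml -Content $sampleXml -XPath '/Computers/Computer/Name' |
    ForEach-Object { $_.Node.InnerXml }

$names
```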

Using PowerShell to Parse XML Attributes with Select-Xml

Now let’s address this problem of finding computer names from a different angle. This time, instead of the computer descriptors represented with XML elements, they are represented with XML attributes.

An attribute is a key/value portion such as name="SRV-01" . Attributes always appear within the opening tag, right after the tag name.

Below is the XML file with computer descriptors represented with attributes. You can now see each descriptor as an attribute rather than an element.

<Computers>
	<Computer name="SRV-01" ip="127.0.0.1" include="true" />
	<Computer name="SRV-02" ip="192.168.0.102" include="false" />
	<Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>

Since each descriptor is an attribute this time, tweak the XPath a little bit to only find the Computer elements. Then, using a ForEach-Object cmdlet again, find the value of the name attribute.

Select-Xml -Path C:\Work\computers-attr.xml -XPath '/Computers/Computer' | ForEach-Object { $_.Node.name }

And indeed, this returns the same results: SRV-01, SRV-02 and SRV-03:

Reading data using Select-Xml
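
XPath can also address an attribute directly with the @ prefix and filter nodes with a predicate in square brackets. The sketch below is an illustration only, with the attributes file inlined as an assumed $sampleXml string; it returns the names of the computers whose include attribute is true.

```powershell
# Hypothetical in-memory copy of the attributes file
$sampleXml = @'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" include="true" />
  <Computer name="SRV-02" ip="192.168.0.102" include="false" />
  <Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>
'@

# @name addresses the attribute itself; the predicate keeps only include="true"
$included = Select-Xml -Content $sampleXml -XPath '/Computers/Computer[@include="true"]/@name' |
    ForEach-Object { $_.Node.Value }

$included
```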

In both cases, whether you are reading elements or attributes, the syntax of Select-Xml is cumbersome: it forces you to use the XPath parameter, then to pipe the result to a loop, and finally to look for the data under the Node property.

Luckily, PowerShell offers a more convenient and intuitive way to read XML files. PowerShell lets you read XML files and convert them to XML objects.

Casting XML Strings to Objects

Another way to use PowerShell to parse XML is to convert that XML to objects. The easiest way to do this is with the [xml] type accelerator.

By prefixing the variable names with [xml], PowerShell converts the original plain text XML into objects you can then work with.

[xml]$xmlElm = Get-Content -Path C:\Work\computers-elm.xml
[xml]$xmlAttr = Get-Content -Path C:\Work\computers-attr.xml
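
The cast works on any string, not only on file content. As a rough sketch (the $xmlText variable and the deliberately broken second string are made up for illustration), the example below casts an inline string and shows that malformed XML fails at cast time, where a try/catch can handle it.

```powershell
# Hypothetical inline XML; in the article this text comes from a file
$xmlText = @'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" include="true" />
  <Computer name="SRV-02" ip="192.168.0.102" include="false" />
</Computers>
'@

# The [xml] accelerator parses the string into a System.Xml.XmlDocument
$doc = [xml]$xmlText
$doc.GetType().FullName

# Malformed XML (the Computer tag is never closed) makes the cast throw
try {
    $null = [xml]'<Computers><Computer></Computers>'
    $castFailed = $false
}
catch {
    $castFailed = $true
}
$castFailed
```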

Reading XML Object Elements

Now both the $xmlElm and $xmlAttr variables are XML objects, so you can reference their properties via dot notation. Perhaps you need to find the IP address of each computer element. Since the XML file is now an object, you can do so by simply referencing the Ip element.

$xmlElm.Computers.Computer.ip

Starting with PowerShell version 3.0, an XML object returns an attribute value with the same syntax used for reading an element’s inner text. Therefore, the IP addresses are read from the attributes file with exactly the same syntax as from the elements file.

Reading XML Attributes

Using exactly the same dot notation, you can read XML attributes as well despite differences in the XML structure.

$xmlElm.Computers.Computer.ip
$xmlAttr.Computers.Computer.ip

And the results below show that both got the same data, each one from its corresponding file:

PowerShell to Parse XML : Reading XML Attributes
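
For a version that runs without the files under C:\Work, the sketch below inlines shortened copies of both documents (an assumption for illustration) and compares the values that dot notation returns. It also shows that a single computer can be picked by index.

```powershell
# Shortened, hypothetical copies of the two files
$xmlElm = [xml]@'
<Computers>
  <Computer><Name>SRV-01</Name><Ip>127.0.0.1</Ip></Computer>
  <Computer><Name>SRV-02</Name><Ip>192.168.0.102</Ip></Computer>
</Computers>
'@

$xmlAttr = [xml]@'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" />
  <Computer name="SRV-02" ip="192.168.0.102" />
</Computers>
'@

# The same dot notation reads child elements and attributes
$fromElements   = $xmlElm.Computers.Computer.Ip
$fromAttributes = $xmlAttr.Computers.Computer.ip

$fromElements
$fromAttributes

# Index into the collection to read one computer
$second = $xmlAttr.Computers.Computer[1].name
$second
```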

Moreover, once the object is loaded into memory, you even get IntelliSense and tab completion if you’re using the PowerShell ISE, as shown below.

IntelliSense and tab completion for XML object

Iterating Through XML Data

Once you are comfortable reading XML files directly into XML objects (by taking advantage of the [xml] type accelerator), you have the full power of PowerShell objects at your disposal.

Say, for example, you are required to loop through all the computers in the XML file that have the include="true" attribute and check their connection status. The code below shows how it can be done.

This script is:

  • Reading the file and casting it to an XML object.
  • Iterating through the relevant computers to get their connection status.
  • Finally, sending the result output object to the pipeline.

 ## casting the file text to an XML object
 [xml]$xmlAttr = Get-Content -Path C:\Work\computers-attr.xml

 ## looping through computers set with include="true"
 $xmlAttr.Computers.Computer | Where-Object include -eq 'true' |  ForEach-Object {
     ## see if the current computer is online
     if(Test-Connection -ComputerName $_.ip -Count 1 -Quiet)
     {
         $status = 'Connection OK'
     }
     else
     {
         $status = 'No Connection'
     }

     ## output the result object
     [pscustomobject]@{
         Name = $_.name
         Ip = $_.ip
         Status = $status
     }
 }

And the results of the script above are shown below:

Connection status results
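
One detail worth keeping in mind when iterating: values read from XML are strings, which is why the script above compares include with the text 'true'. The sketch below is one possible variation, not part of the original script; it uses the .NET System.Xml.XmlConvert class to turn the attribute into a real boolean and skips the network check so it can run anywhere. The inline XML is an assumed copy of the attributes file.

```powershell
$xmlAttr = [xml]@'
<Computers>
  <Computer name="SRV-01" ip="127.0.0.1" include="true" />
  <Computer name="SRV-02" ip="192.168.0.102" include="false" />
  <Computer name="SRV-03" ip="192.168.0.103" include="true" />
</Computers>
'@

# Attribute values arrive as strings; XmlConvert turns 'true'/'false' into a boolean
$report = $xmlAttr.Computers.Computer | ForEach-Object {
    [pscustomobject]@{
        Name    = $_.name
        Ip      = $_.ip
        Include = [System.Xml.XmlConvert]::ToBoolean($_.include)
    }
}

# Filtering on a boolean avoids comparing against the literal text 'true'
$selected = $report | Where-Object { $_.Include }
$selected
```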

XML Schemas

In the previous section, you saw two different XML files representing a data set in two different ways. Those ways are called XML schemas. An XML Schema defines the legal building blocks of a specific XML document:

  • The names of elements and attributes that can appear in that specific document.
  • The number and order of child elements.
  • The data types for elements and attributes.

The schema essentially defines the structure of the XML.

Validating XML Data

An XML file may have the correct syntax (editors like Notepad++ will complain if not), yet its data might not match the project requirements. This is where the schema comes into play. When you rely on XML data, you must ensure that all the data is valid according to the defined schema.

The last thing you want is to discover data errors at runtime, 500 lines deep into a script. By that time, the script might have already performed some irreversible operations on the file system and the registry.

So, how can you check in advance that the data is correct? Let’s first look at some possible error types.

Possible Errors in XML Data

Generally speaking, errors found in XML files belong to one of two categories: metadata errors and errors in the data itself.

XML Metadata Errors

The MorePeople.xml file below is perfectly valid syntax-wise. It has a single People element (the root element) with three Person elements inside. This structure is perfectly acceptable. Still, it contains one mistake. Can you see it?

<People>
	<Person Name="Debra" County="Canada" IsAdmin="true" />
	<Person Name="Jacob" Country="Israel" IsAdmin="true" />
	<Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>

Do not worry if you didn’t; it is well hidden. The problem is in the first inner element:

What should have been Country was misspelled, and Canada was degraded to a County.

Errors in the XML Data

After fixing the Country issue in MorePeople.xml, another problem sneaked in:

<People>
	<Person Name="Debra" Country="Canada" IsAdmin="yes" />
	<Person Name="Jacob" Country="Israel" IsAdmin="true" />
	<Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>

The metadata, i.e., the elements and attributes, is fine. So what is wrong? This time the problem, again on the first Person line, is in one of the values. Someone decided that yes is a good enough substitute for true, but code like the line below will fail to return the first element because it is looking for true, not for yes:

$peopleXml.People.Person | Where-Object IsAdmin -eq 'true'
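
Both kinds of errors fail silently, which is what makes them dangerous. The sketch below combines the two mistakes in one inline document (an assumption for illustration) and shows that PowerShell raises no error: the misspelled attribute simply reads as empty, and the yes value is quietly skipped by the filter.

```powershell
# Hypothetical document containing both mistakes on the first Person
$peopleXml = [xml]@'
<People>
  <Person Name="Debra" County="Canada" IsAdmin="yes" />
  <Person Name="Jacob" Country="Israel" IsAdmin="true" />
  <Person Name="Olivia" Country="Cyprus" IsAdmin="false" />
</People>
'@

# Metadata error: Debra has no Country attribute, so the property is empty
$missingCountry = $peopleXml.People.Person | Where-Object { -not $_.Country }

# Data error: 'yes' is not 'true', so Debra is not counted as an admin
$admins = $peopleXml.People.Person | Where-Object IsAdmin -eq 'true'

$missingCountry.Name
$admins.Name
```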

Creating an XML Schema

Now that you know the types of errors that may occur, it is time to show how a schema file helps. The first step is creating a sample data file. The sample can be minimal and contain nothing but a single inner element. For the examples above, let’s create a sample file, People.xml, like this:

<People>
	<Person Name="Jeff" Country="USA" IsAdmin="true" />
</People>

Now build the PowerShell function below and use it with the sample data to create the .xsd schema.

function New-XmlSchema
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xml$')]
        [string]$XmlExample
    ) 

    ## resolve the sample file so relative paths also work with the .NET calls below
    $item = Get-Item $XmlExample
    $dir = $item.DirectoryName
    $baseName = $item.BaseName

    ## build the schema path by replacing '.xml' with '.xsd'
    $SchemaPath = Join-Path -Path $dir -ChildPath "$baseName.xsd"

    try
    {
        ## let the DataSet infer a schema from the sample data and write it to disk
        $dataSet = New-Object -TypeName System.Data.DataSet
        $dataSet.ReadXml($item.FullName) | Out-Null
        $dataSet.WriteXmlSchema($SchemaPath)
        
        ## show the resulting schema file
        Get-Item $SchemaPath
    }
    catch
    {
        $err = $_.Exception.Message
        Write-Host "Failed to create XML schema for $XmlExample`nDetails: $err" -ForegroundColor Red
    }
}

Copy the function to the ISE or your favorite PowerShell editor, load it into memory and use it to create the schema. With the function loaded, the code to create the schema is this one-liner:

New-XmlSchema -XmlExample 'C:\Work\People.xml'

The results will show the path to the newly created schema:

Creating XML schema from a sample data file
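
The core of the function is the pair of DataSet calls, ReadXml and WriteXmlSchema. As a rough sketch of just that step, the example below writes the sample to a scratch folder under the temp directory (the folder name and the variable names are assumptions) and then lists the attribute declarations found in the generated .xsd file.

```powershell
# Hypothetical scratch location; any writable folder works
$workDir    = Join-Path ([System.IO.Path]::GetTempPath()) 'xml-schema-demo'
$samplePath = Join-Path $workDir 'People.xml'
$schemaPath = Join-Path $workDir 'People.xsd'
$null = New-Item -ItemType Directory -Path $workDir -Force

Set-Content -Path $samplePath -Value '<People><Person Name="Jeff" Country="USA" IsAdmin="true" /></People>'

# DataSet infers a schema from the sample and writes it as an .xsd file
$dataSet = New-Object -TypeName System.Data.DataSet
$null = $dataSet.ReadXml($samplePath)
$dataSet.WriteXmlSchema($schemaPath)

# List the attribute declarations that were inferred
$ns = @{ xs = 'http://www.w3.org/2001/XMLSchema' }
$inferred = Select-Xml -Path $schemaPath -XPath '//xs:attribute/@name' -Namespace $ns |
    ForEach-Object { $_.Node.Value }
$inferred
```

Since the inference is driven purely by the sample, every attribute is declared as an optional xs:string. That is why the generated schema is a starting point rather than a finished contract.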

Using the Schema File to Validate Your Data

Take a look at the location specified by the results above. If the .xsd file is there, you are on the right track to see the validation in action. For the validation step, use the function below:

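function Test-XmlBySchema
{
    [CmdletBinding()]
    [OutputType([bool])]
    param
    (
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xml$')]
        [string]$XmlFile,
        [Parameter(Mandatory)]
        [ValidateScript({ Test-Path -Path $_ })]
        [ValidatePattern('\.xsd$')]
        [string]$SchemaPath
    )

    try
    {
        ## resolve both paths so relative paths also work with the .NET calls below
        $xmlFullPath = (Resolve-Path -Path $XmlFile).ProviderPath
        $schemaFullPath = (Resolve-Path -Path $SchemaPath).ProviderPath

        [xml]$xml = Get-Content -Path $xmlFullPath
        ## the empty string means the schema has no target namespace
        $xml.Schemas.Add('', $schemaFullPath) | Out-Null
        ## with a null event handler, Validate throws on the first schema violation
        $xml.Validate($null)
        Write-Verbose "Successfully validated $XmlFile against schema ($SchemaPath)"
        $result = $true
    }
    catch
    {
        $err = $_.Exception.Message
        Write-Verbose "Failed to validate $XmlFile against schema ($SchemaPath)`nDetails: $err"
        $result = $false
    }

    ## return True or False to the caller
    $result
}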

Load the function into memory, and use it to validate the MorePeople.xml from the two error examples. To trigger the validation, use the command below:

Test-XmlBySchema -XmlFile 'C:\Work\MorePeople.xml' -SchemaPath 'C:\Work\People.xsd' -Verbose

The actual results depend on the content of MorePeople.xml.

Let’s see two examples. When MorePeople.xml is error-free, the function above returns True.

Validation success
Validation success

When the MorePeople.xml file contains wrong data (the Country key misspelled as County), the function writes the failure details to the verbose stream and returns False.

Validation failure – wrong attribute name detected

As you can see, the error in the verbose output is very informative: it names the culprit file and points to the exact component in it where the problem occurred.
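
Because Test-XmlBySchema passes $null as the event handler, validation stops at the first problem. If a full list of problems is more useful, one option (a sketch under assumptions, not part of the original function) is to pass a script block as the handler and collect each message. The inline schema is a cut-down version of the generated one, and the second typo, Contry, is invented for the demonstration.

```powershell
# Cut-down, hypothetical schema with three optional string attributes
$schemaText = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="People">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Person">
          <xs:complexType>
            <xs:attribute name="Name" type="xs:string" />
            <xs:attribute name="Country" type="xs:string" />
            <xs:attribute name="IsAdmin" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Two misspelled attribute names: County and Contry
$xml = [xml]@'
<People>
  <Person Name="Debra" County="Canada" IsAdmin="true" />
  <Person Name="Jacob" Contry="Israel" IsAdmin="true" />
</People>
'@

# Load the schema from the string and attach it to the document
$schemaReader = New-Object System.IO.StringReader $schemaText
$null = $xml.Schemas.Add([System.Xml.Schema.XmlSchema]::Read($schemaReader, $null))

# The handler runs once per problem, so validation continues past the first one
$problems = New-Object 'System.Collections.Generic.List[string]'
$xml.Validate({ param($source, $e) $problems.Add($e.Message) })

$problems
```

With a handler in place, Validate no longer throws on schema violations, so the caller decides what to do with the collected messages.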

Fine Tuning the Validation Schema

Let’s take a look at the schema file, then see how we can make it even better.

The schema created by New-XmlSchema by default is below:

<?xml version="1.0" standalone="yes"?>
<xs:schema id="People" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="People" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <xs:element name="Person">
          <xs:complexType>
            <xs:attribute name="Name" type="xs:string" />
            <xs:attribute name="Country" type="xs:string" />
            <xs:attribute name="IsAdmin" type="xs:string" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

The default schema above is good, but not perfect. It did catch the typo in the Country attribute. But if you leave the schema as-is, other expectations you might have of the data will not be enforced, and violations will not be reported as errors by the Test-XmlBySchema validation. Let’s solve this.

The table below presents some cases that are not considered validation errors and would go unnoticed by Test-XmlBySchema. On each row, the right column shows how to manually change the schema to add the necessary protection.

A modified version of the schema file, with all protections added, is shown right after the table.

Adding settings to the default schema – examples

Required behavior | Relevant setting in the schema file
At least one Person element | Set minOccurs="1" on the xs:choice that wraps the Person element
Make the Name, Country and IsAdmin attributes mandatory | Add use="required" to each of these attribute declarations
Allow only true/false values for IsAdmin | Set type="xs:boolean" on the IsAdmin declaration
Allow only Country names between 3 and 40 characters | Use xs:restriction (see the detailed explanation after the schema text)

<?xml version="1.0" standalone="yes"?>
<xs:schema id="People" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
  <xs:element name="People" msdata:IsDataSet="true" msdata:UseCurrentLocale="true">
    <xs:complexType>
      <xs:choice minOccurs="1" maxOccurs="unbounded">
        <xs:element name="Person">
          <xs:complexType>
            <xs:attribute name="Name" type="xs:string" use="required" />
            <xs:attribute name="Country" use="required">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:minLength value="3"/>
                  <xs:maxLength value="40"/>
                </xs:restriction>
              </xs:simpleType>
            </xs:attribute>
            <xs:attribute name="IsAdmin" type="xs:boolean" use="required" />
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

With the boolean restriction in place for the IsAdmin attribute in the example, its value must be a lowercase true or false (xs:boolean also accepts 1 and 0).
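
To see which values xs:boolean lets through, the sketch below validates a handful of candidate values against a minimal schema with a single boolean attribute. The schema and the Person document are cut-down, hypothetical variants of the article’s files, kept small so the example runs without any files on disk.

```powershell
# Minimal, hypothetical schema with one required boolean attribute
$schemaText = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Person">
    <xs:complexType>
      <xs:attribute name="IsAdmin" type="xs:boolean" use="required" />
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Validate one small document per candidate value and record the outcome
$results = foreach ($value in 'true', 'false', '1', '0', 'yes', 'True') {
    $xml = [xml]"<Person IsAdmin='$value' />"

    # Read a fresh schema object from the schema text for each document
    $reader = New-Object System.IO.StringReader $schemaText
    $null = $xml.Schemas.Add([System.Xml.Schema.XmlSchema]::Read($reader, $null))

    try {
        $xml.Validate($null)   # a null handler makes the first violation throw
        $valid = $true
    }
    catch {
        $valid = $false
    }
    [pscustomobject]@{ Value = $value; Valid = $valid }
}

$results
```

The last two candidates, yes and True, are the ones the schema is expected to reject, since the type is case-sensitive.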

String Length Validation with xs:restriction

The string length validation is a bit complex. So, even though it is shown above as part of the modified schema, it deserves a little more focus.

The original schema item for the Country attribute (after having manually added use="required") is as below:

<xs:attribute name="Country" type="xs:string" use="required" />

To add the length protection, remove the type attribute from the declaration and add the <xs:simpleType> element, and within it, the <xs:restriction base="xs:string">. This restriction in turn contains the required limits, declared with xs:minLength and xs:maxLength.

Following all these changes, the final xs:attribute declaration has grown from a single line to a giant 8-line node:

<xs:attribute name="Country" use="required">
  <xs:simpleType>
    <xs:restriction base="xs:string">
      <xs:minLength value="3"/>
      <xs:maxLength value="40"/>
    </xs:restriction>
  </xs:simpleType>
</xs:attribute>

If your head is not spinning after the explanation above, you have earned the right to see the validation in action. To do that, let’s intentionally shorten the value Canada to just two characters: Ca

With the short country name in place, and MorePeople.xml saved, you are ready to run the validation command below:

Test-XmlBySchema -XmlFile 'C:\Work\MorePeople.xml' -SchemaPath 'C:\Work\People.xsd' -Verbose

And the results indeed show that the more complex schema has done its work:

A string length error detected by the schema validation

XML schema validation can grow in complexity and validate just about any pattern you can think of, especially when combined with regular expressions.
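
As one illustration of the regular expression support, the sketch below uses the xs:pattern facet to require computer names of the form SRV- followed by two digits. The schema, the naming rule, and the Test-XmlText helper are all hypothetical and exist only for this example.

```powershell
# Hypothetical schema: the name attribute must match SRV- plus two digits
$schemaText = @'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Computer">
    <xs:complexType>
      <xs:attribute name="name" use="required">
        <xs:simpleType>
          <xs:restriction base="xs:string">
            <xs:pattern value="SRV-[0-9]{2}" />
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>
</xs:schema>
'@

# Hypothetical helper: returns $true when the XML text satisfies the schema text
function Test-XmlText {
    param([string]$XmlText, [string]$SchemaText)

    $xml = [xml]$XmlText
    $reader = New-Object System.IO.StringReader $SchemaText
    $null = $xml.Schemas.Add([System.Xml.Schema.XmlSchema]::Read($reader, $null))
    try { $xml.Validate($null); $true } catch { $false }
}

$good      = Test-XmlText -XmlText '<Computer name="SRV-01" />'  -SchemaText $schemaText
$tooShort  = Test-XmlText -XmlText '<Computer name="SRV-1" />'   -SchemaText $schemaText
$tooLong   = Test-XmlText -XmlText '<Computer name="SRV-012" />' -SchemaText $schemaText
$lowercase = Test-XmlText -XmlText '<Computer name="srv-01" />'  -SchemaText $schemaText

$good, $tooShort, $tooLong, $lowercase
```

A pattern facet has to match the whole value, so no ^ or $ anchors are needed in the expression.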
