/brain/dump

Random thoughts, bright ideas and interesting experiments. In short, the ramblings of a full-time nerd.

Select HTML elements with more than one CSS class using XPath

Jakob Westhoff

During a discussion on IRC with Thomas Weinert we asked ourselves how to select HTML elements by a given CSS class when the element has multiple classes defined. Think of something like this:

<div class="foo bar baz">42</div>

This div element has the three classes foo, bar and baz associated with it. If you want to select all HTML nodes with the class foo, this div element should be one of them.

The XPath expression to solve this selection problem might not be quite obvious.

After a little bit of thinking about it I came up with the following solution:

//*[count( index-of( tokenize( @class, '\s+' ), '$classname' ) ) > 0]

This selection works quite well. Unfortunately it relies on the functions tokenize and index-of, which are only available in XPath 2.0. PHP's XPath support is limited to XPath 1.0, which renders the expression above virtually useless for the scenario it was meant for.
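To make the idea behind tokenize and index-of a bit more tangible, here is a rough sketch of the same logic in plain Java. It is not XPath and not part of the original experiment; the class TokenMatch and its method positions are made-up names for illustration. It approximates index-of( tokenize( @class, '\s+' ), '$classname' ) by splitting the attribute value on whitespace and collecting the 1-based positions at which the class name occurs.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenMatch {
    // Hypothetical helper approximating
    //   index-of( tokenize( @class, '\s+' ), name )
    // It returns the 1-based positions of name among the whitespace-separated tokens.
    static List<Integer> positions(String classAttr, String name) {
        String[] tokens = classAttr.split("\\s+");
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(name)) {
                hits.add(i + 1); // XPath sequences count from 1
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(positions("foo bar baz", "foo")); // [1]
        System.out.println(positions("foobar baz", "foo"));  // []
        System.out.println(positions("foo bar foo", "foo")); // [1, 3]
    }
}
```

Because whole tokens are compared, foobar never counts as foo. The last call shows that a class name listed twice produces two positions, so the count of the result is zero when the class is absent and one or more when it is present.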

Therefore I tried to think of something different, only using XPath 1.0 functions. The following expression is what I came up with:

//*[
  contains( normalize-space( @class ), ' $classname ' )
  or substring( normalize-space( @class ), 1, string-length( '$classname' ) + 1 ) = '$classname '
  or substring( normalize-space( @class ), string-length( normalize-space( @class ) ) - string-length( '$classname' ) ) = ' $classname'
  or normalize-space( @class ) = '$classname'
]

The normalize-space function strips leading and trailing whitespace and replaces every run of whitespace characters (spaces, tabs, newlines) with a single space. After that only four cases are possible. The first disjunct matches a class name somewhere in the middle of the class list. The second matches only a class name at the beginning of the list, whereas the third matches only one at the end. The fourth matches when the class name is the only one defined. Unfortunately this kind of care is needed to ensure that no partial class names are matched: a plain contains( @class, '$classname' ) would also select an element whose only class is foobar when looking for foo.
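The four cases can be tried out against a small document. The following is a minimal sketch, not the PHP demonstration script: it uses Java's built-in javax.xml.xpath package, which evaluates XPath 1.0 expressions. The class ClassSelector, its helpers and the sample markup are assumptions made up for this example. The sketch applies normalize-space( @class ) in every comparison, including the length calculation of the third disjunct, so that stray whitespace in the attribute cannot shift the substring offset. For comparison it also evaluates a shorter form that pads the class list with spaces and needs only a single contains(); that variant is a commonly used alternative, not the expression developed above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ClassSelector {
    // Whitespace-normalized class attribute, reused in every comparison.
    private static final String CLS = "normalize-space(@class)";

    // Sample markup (an assumption for this sketch): foo at the start, in the
    // middle, at the end, alone, as part of longer names, and with stray spaces.
    static final String SAMPLE = "<root>"
        + "<div id='a' class='foo bar baz'/>"
        + "<div id='b' class='bar foo baz'/>"
        + "<div id='c' class='bar baz foo'/>"
        + "<div id='d' class='foo'/>"
        + "<div id='e' class='foobar barfoo'/>"
        + "<div id='f' class='  bar   foo  '/>"
        + "</root>";

    // Hypothetical helper: the four-way XPath 1.0 test for one class name.
    // The name is pasted into the expression, so it must not contain quotes.
    static String fourWay(String name) {
        return "//*["
            + "contains(" + CLS + ", ' " + name + " ')"
            + " or substring(" + CLS + ", 1, string-length('" + name + "') + 1) = '" + name + " '"
            + " or substring(" + CLS + ", string-length(" + CLS + ") - string-length('" + name + "')) = ' " + name + "'"
            + " or " + CLS + " = '" + name + "'"
            + "]";
    }

    // Shorter variant for comparison: pad the list with spaces, then one contains().
    static String padded(String name) {
        return "//*[contains(concat(' ', " + CLS + ", ' '), ' " + name + " ')]";
    }

    // Evaluates an XPath expression and returns the id attribute of every match.
    static List<String> matchingIds(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(((Element) nodes.item(i)).getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingIds(SAMPLE, fourWay("foo")));
        System.out.println(matchingIds(SAMPLE, padded("foo")));
    }
}
```

Saved as ClassSelector.java, this can be run with java ClassSelector.java on Java 11 or later. Both expressions should select the elements a, b, c, d and f, and skip e, whose classes foobar and barfoo merely contain foo.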

You may download a hackish demonstration script here, which uses the presented expression in combination with PHP DOM to select nodes with a certain class.