How to use MSXML properly in Delphi

Using MSXML in Delphi is quite simple, apart from a couple of caveats.

Consider the following sample code:

procedure Proc1(Node: IXMLDOMNode);
var
  i: Integer;
begin
  for i := 0 to Node.childNodes.length - 1 do
  begin
    if Node.childNodes.item[i].nodeName = 'test' then
    begin
      Proc2(Node.childNodes.item[i]);
    end; 
  end;
end;

It more or less does the job, but add the following assertion (line 5 of the listing):

procedure Proc1(Node: IXMLDOMNode);
var
  i: Integer;
begin
  Assert(Node.childNodes = Node.childNodes);
  for i := 0 to Node.childNodes.length - 1 do
  begin
    if Node.childNodes.item[i].nodeName = 'test' then
    begin
      Proc2(Node.childNodes.item[i]);
    end; 
  end;
end;

The assertion fails. The catch is that MSXML returns a new IXMLDOMNodeList instance on every call to Node.childNodes. Each of these temporary lists is released again shortly after the call that produced it.

So this code is slower than it needs to be. A better version keeps the list in a local variable:

procedure Proc1(Node: IXMLDOMNode);
var
  ChildNodes: IXMLDOMNodeList;
  i: Integer;
begin
  ChildNodes := Node.childNodes;
  for i := 0 to ChildNodes.length - 1 do
  begin
    if ChildNodes.item[i].nodeName = 'test' then
    begin
      Proc2(ChildNodes.item[i]);
    end; 
  end;
end;

Now add one more assertion (line 9 of the listing):

procedure Proc1(Node: IXMLDOMNode);
var
  ChildNodes: IXMLDOMNodeList;
  i: Integer;
begin
  ChildNodes := Node.childNodes;
  for i := 0 to ChildNodes.length - 1 do
  begin
    Assert(ChildNodes.item[i] = ChildNodes.item[i]);
    if ChildNodes.item[i].nodeName = 'test' then
    begin
      Proc2(ChildNodes.item[i]);
    end; 
  end;
end;

This assertion fails as well: MSXML returns a new COM object for every ChildNodes.item[i] call. So a better version would be:

procedure Proc1(Node: IXMLDOMNode);
var
  ChildNodes: IXMLDOMNodeList;
  ChildNode: IXMLDOMNode;
  i: Integer;
begin
  ChildNodes := Node.childNodes;
  for i := 0 to ChildNodes.length - 1 do
  begin
    ChildNode := ChildNodes.item[i];
    if ChildNode.nodeName = 'test' then
    begin
      Proc2(ChildNode);
    end; 
  end;
end;

In general, this rule applies to every object MSXML returns: each call produces a new COM object instance. But why?

Use the source, Luke. If you recall, a couple of years ago hackers stole the Windows 2000 source code. Grab these sources from the torrents, unpack them, and explore the private/inet/xml subfolder -- surprise, here are the MSXML sources! :)

A brief study of the MSXML sources shows that MSXML is implemented internally as a C++ parser, and that COM support is a bridge/adapter layer on top of it: a new COM wrapper object is created for every value returned.

Another important point is that MSXML implements its node list as a linked list, not an array list. So using IXMLDOMNodeList like an array (through item[] and length) hurts performance, because reaching the i-th item may require walking the list from the start.

Thus, the fastest version would be:

procedure Proc1(Node: IXMLDOMNode);
var
  ChildNode: IXMLDOMNode;
begin
  ChildNode := Node.firstChild;
  while ChildNode <> nil do
  begin
    if ChildNode.nodeName = 'test' then
    begin
      Proc2(ChildNode);
    end;
    ChildNode := ChildNode.nextSibling; 
  end;
end;
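
The same idea carries over to node lists that are not a parent's children, such as the result of selectNodes, where firstChild/nextSibling does not apply. Below is a minimal sketch, assuming the msxml import unit that ships with Delphi (Winapi.msxml in newer versions) and its CoDOMDocument helper. ProcessMatches and Run are hypothetical names, and Proc2 is a stand-in that just prints the node. It walks the list with IXMLDOMNodeList.nextNode instead of item[i]; whether that is measurably faster has not been checked here.

```pascal
program NextNodeDemo;
{$APPTYPE CONSOLE}

uses
  ActiveX, ComObj, msxml; // assumption: unit names as in classic Delphi

// Stand-in for the article's Proc2: prints the node's XML.
procedure Proc2(const Node: IXMLDOMNode);
begin
  Writeln(Node.xml);
end;

// Hypothetical helper: visits every node matched by the query.
procedure ProcessMatches(const Node: IXMLDOMNode);
var
  Matches: IXMLDOMNodeList;
  Match: IXMLDOMNode;
begin
  // Fetch the list once and keep it in a local variable.
  Matches := Node.selectNodes('test');
  // nextNode returns nil when the list is exhausted.
  Match := Matches.nextNode;
  while Match <> nil do
  begin
    Proc2(Match);
    Match := Matches.nextNode;
  end;
end;

// All interface variables live here so they are released
// before CoUninitialize is called.
procedure Run;
var
  Doc: IXMLDOMDocument;
begin
  Doc := CoDOMDocument.Create;
  if not Doc.loadXML('<root><test>a</test><other/><test>b</test></root>') then
    Exit;
  ProcessMatches(Doc.documentElement);
end;

begin
  CoInitialize(nil);
  try
    Run;
  finally
    CoUninitialize;
  end;
end.
```

As in the listings above, every MSXML result is stored in a local variable before use, so each wrapper object is created only once per node.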

P. S.
Java's default XML parser also uses linked lists to implement org.w3c.dom.NodeList.
