Monday, July 25, 2011

Switched to libxml2: ~25% less time and a lot less method calling

Since I was still disappointed with the XML parsing speed on the iTouch, I decided to give libxml2 (specifically its xmlTextReader streaming API) a shot as a replacement for NSXMLParser. Processing time dropped by more than 25%, with less memory thrashing and far fewer Objective-C message sends.

Hardware        Parsing method                          Time
iTouch Rev 1    TouchXML, full DOM (original code)      79 s
iTouch Rev 1    NSXMLParser (SAX)                       42 s
iTouch Rev 1    libxml2 xmlTextReader (streaming)       30 s
iTouch Rev 3    libxml2 xmlTextReader (streaming)       7.5 s

These times are for 17,000 XML records, held in memory as NSData and written to an SQLite table. I still wish it ran faster, but going from 80 seconds a table to 30 is pretty darn massive. In the simulator it takes 0.5 seconds. I suspect it could be a bit faster if I wrote the records out directly in this code instead of going through the callback, but the callback version is much more generic and reusable.
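For a sense of scale, here is the arithmetic on the timings above. It is derived only from those numbers and is not a new measurement:

```latex
\begin{aligned}
\text{libxml2 vs. NSXMLParser (Rev 1):}\quad & \frac{42 - 30}{42} \approx 28.6\%\ \text{less time} \\
\text{libxml2 vs. TouchXML DOM (Rev 1):}\quad & \frac{79 - 30}{79} \approx 62\%\ \text{less time} \\
\text{Throughput, Rev 1:}\quad & \frac{17{,}000\ \text{records}}{30\ \text{s}} \approx 567\ \text{records/s} \\
\text{Throughput, Rev 3:}\quad & \frac{17{,}000\ \text{records}}{7.5\ \text{s}} \approx 2{,}267\ \text{records/s}
\end{aligned}
```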

This was much easier to write than I expected, although it took a lot of research to find just the lines I needed. The official libxml2 documentation is not aimed at Objective-C programmers, so I had to collect bits and pieces from here and there. One site would cover parsing from a URL and another parsing from memory. One would show a big structure of static callback functions and another a switch/case statement. None of them showed how to clean up memory at the end.

Here is what the final code looks like. Its job is to take XML like this crappy made-up sample (the real data is diagnosis codes from doctors, but the structure is the same):

<itemnames>
  <itemname>
    <id>10</id>
    <name>soup</name>
    <desc>soup you fool</desc>
  </itemname>
  <itemname>
    <id>15</id>
    <name>cat food</name>
    <desc>food for your cat</desc>
  </itemname>
</itemnames>

If you pass in @"itemname" as elementName, your SAXParserDelegate gets two callbacks, each with a dictionary holding the keys id, name and desc along with their values. As each callback arrives, I convert the dictionary items into an SQLite insert command (that code is not shown). The same mutable dictionary is reused for every record, so copy it if you need to keep it after the callback returns.
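The record-grouping logic in parse can be tried without a device or libxml2. Below is a minimal plain C model of the same inZone/keyName state machine (C is also valid Objective-C). It is a sketch under stated assumptions, not the post's code:

- A hand-written event list stands in for the nodes xmlTextReader would report.
- The names group_records, print_record, struct Record and sample_events are hypothetical.
- The model is simplified: one text node per field, at most eight fields, and strings are stored as pointers rather than copied.

```c
/* Hypothetical C model of SAXParser's inZone/keyName logic (not the original code).
   A hand-written event list stands in for the nodes xmlTextReader reports. */
#include <stdio.h>
#include <string.h>

enum EventType { EV_ELEMENT, EV_TEXT, EV_END_ELEMENT };   /* like XML_READER_TYPE_* */
struct Event { enum EventType type; const char *value; };  /* element name or text */

#define MAX_FIELDS 8
struct Record { int count; const char *keys[MAX_FIELDS]; const char *values[MAX_FIELDS]; };
typedef void (*RecordCallback)(const struct Record *record, void *context);

/* Collects the child fields of each `recordName` element and hands the record to `callback`. */
static void group_records(const struct Event *events, size_t n, const char *recordName,
                          RecordCallback callback, void *context) {
    int inZone = 0;
    const char *keyName = NULL, *valueData = NULL;
    struct Record record = {0};

    for (size_t i = 0; i < n; i++) {
        const struct Event *ev = &events[i];
        switch (ev->type) {
        case EV_ELEMENT:
            if (!inZone && strcmp(ev->value, recordName) == 0) {
                inZone = 1;              /* record starts: reset state */
                record.count = 0;
                keyName = NULL;
            } else if (inZone) {
                keyName = ev->value;     /* field starts */
            }
            break;
        case EV_TEXT:
            if (inZone && keyName != NULL) valueData = ev->value;  /* simplified: one text node per field */
            break;
        case EV_END_ELEMENT:
            if (!inZone) break;
            if (strcmp(ev->value, recordName) == 0) {
                callback(&record, context);   /* record ends: deliver it */
                inZone = 0;
                keyName = NULL;
            } else if (keyName != NULL) {
                if (record.count < MAX_FIELDS) {   /* field ends: store its text or "" */
                    record.keys[record.count] = keyName;
                    record.values[record.count] = valueData != NULL ? valueData : "";
                    record.count++;
                }
                keyName = NULL;
                valueData = NULL;
            }
            break;
        }
    }
}

/* Example "delegate": counts records and prints each one's fields. */
static void print_record(const struct Record *record, void *context) {
    int *seen = context;
    (*seen)++;
    for (int i = 0; i < record->count; i++)
        printf("%s=%s%s", record->keys[i], record->values[i], i + 1 < record->count ? ", " : "\n");
}

/* Roughly the node sequence for the two-record sample above (blank text skipped). */
static const struct Event sample_events[] = {
    {EV_ELEMENT, "itemnames"},
    {EV_ELEMENT, "itemname"},
    {EV_ELEMENT, "id"}, {EV_TEXT, "10"}, {EV_END_ELEMENT, "id"},
    {EV_ELEMENT, "name"}, {EV_TEXT, "soup"}, {EV_END_ELEMENT, "name"},
    {EV_ELEMENT, "desc"}, {EV_TEXT, "soup you fool"}, {EV_END_ELEMENT, "desc"},
    {EV_END_ELEMENT, "itemname"},
    {EV_ELEMENT, "itemname"},
    {EV_ELEMENT, "id"}, {EV_TEXT, "15"}, {EV_END_ELEMENT, "id"},
    {EV_ELEMENT, "name"}, {EV_TEXT, "cat food"}, {EV_END_ELEMENT, "name"},
    {EV_ELEMENT, "desc"}, {EV_TEXT, "food for your cat"}, {EV_END_ELEMENT, "desc"},
    {EV_END_ELEMENT, "itemname"},
    {EV_END_ELEMENT, "itemnames"},
};
```

Consider the call group_records(sample_events, sizeof sample_events / sizeof sample_events[0], "itemname", print_record, &seen). It prints "id=10, name=soup, desc=soup you fool" and "id=15, name=cat food, desc=food for your cat", leaving seen at 2. The closing itemnames tag is ignored because it arrives while inZone is already false.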

This is a very special case where I am simply getting a list of data: the only items in my XML are records, and all the data is in the text nodes. To parse attributes or other parts of the XML, look up the other XML_READER_TYPE_* node types and the xmlTextReader attribute functions. It already feels like I spent more time putting this into a blog post than getting the code working, but I don't want my efforts to go to waste.

This is my first attempt at using a syntax highlighter on the blog, so sorry if any of the code is screwy when you copy and paste it out of here. I picked C++ syntax as the closest match, since the syntax highlighter helper code I am using has nothing specific for Objective-C.

I really hope this helps someone out and shows how easy it is to use nearly straight C code to speed up Objective-C when parsing large XML record sets. I ran it against the memory leak checker and it came out clean. The key was the final call to xmlFreeTextReader(reader).


If it does help, please post a simple thank-you comment. It's always nice to know when the effort makes another programmer's life easier.


You MUST add libxml2 to your iPhone project before you can use it: go to Project -> Targets (your app) -> Build Phases -> Link Binary With Libraries -> (+) and add libxml2.2.7.3.dylib. I am using Xcode 4.03. I don't know the steps in older versions, but I assume it means adding the library to the Frameworks group. Adding it the (+) way in Xcode 4 set up the header path for me automatically. If the compiler cannot find <libxml/xmlreader.h>, add $(SDKROOT)/usr/include/libxml2 to the target's Header Search Paths build setting. If your version is not 2.2.7.3, I don't think that will be an issue, since I am only using fairly basic libxml2 calls.

Callback delegate
//  SAXParserDelegate.h

#import <Foundation/Foundation.h>

// Implemented by whoever wants the records; called once per matching element.
@protocol SAXParserDelegate <NSObject>
- (void) SAXDictionaryElement:(NSDictionary *)dictionary;
@end

Header file
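//  SAXParser.h

#import <Foundation/Foundation.h>
#import "SAXParserDelegate.h"

@interface SAXParser : NSObject {
    id<SAXParserDelegate> saxDelegate;  // not retained (usual delegate convention)
    NSData *data;                       // the XML to parse
    NSString *name;                     // record element name, e.g. @"itemname"
    NSMutableDictionary *dictionary;    // fields of the record being read

    BOOL inZone;                        // YES while inside a record element
    NSString *keyName;                  // name of the current field element
    NSString *valueData;                // text collected for the current field
}

- (id) initWithData:(NSData *)xmlData elementName:(NSString *)elementName delegate:(id<SAXParserDelegate>)delegate;
- (void) parse;

@end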

And finally the parsing code itself

//  SAXParser.m

#import "SAXParser.h"
#import <libxml/parser.h>
#import <libxml/tree.h>
#import <libxml/xmlreader.h>

@implementation SAXParser

// Initialize with the data, the record element name to find and the callback delegate
- (id) initWithData:(NSData *)xmlData elementName:(NSString *)elementName delegate:(id<SAXParserDelegate>)delegate {
    self = [super init];
    if (self != nil) {
        data = [xmlData retain];        // keep the XML alive until parse has run
        name = [elementName copy];
        saxDelegate = delegate;         // delegates are not retained
        inZone = NO;

        dictionary = [[NSMutableDictionary alloc] init];
    }
    return self;
}

// Clean up any memory we used
- (void) dealloc {
    [data release];
    [name release];
    [keyName release];
    [valueData release];
    [dictionary release];

    [super dealloc];
}

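// Start the data parse
- (void) parse {
    // Read straight from the NSData buffer; no copy of the XML is made.
    xmlTextReaderPtr reader = xmlReaderForMemory((const char *)[data bytes], (int)[data length], NULL, NULL,
                                                 (XML_PARSE_NOBLANKS | XML_PARSE_NOCDATA | XML_PARSE_NOERROR | XML_PARSE_NOWARNING));

    if (!reader) {
        NSLog(@"Failed to create xmlTextReader");
        return;
    }

    // xmlTextReaderRead() returns 1 when it has a node, 0 at the end and -1 on an error.
    // -1 is non-zero, so a plain while (xmlTextReaderRead(reader)) would not stop on bad XML.
    while (xmlTextReaderRead(reader) == 1) {
        // Drain the temporary strings after each node so 17,000 records don't pile up.
        NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];

        switch (xmlTextReaderNodeType(reader)) {
            case XML_READER_TYPE_ELEMENT: {
                NSString *elementName = [NSString stringWithUTF8String:(const char *)xmlTextReaderConstName(reader)];
                if (!inZone && [elementName isEqualToString:name]) {
                    // Start of a record: reset the per-record state.
                    inZone = YES;
                    [dictionary removeAllObjects];
                    [keyName release];
                    keyName = nil;
                } else if (inZone) {
                    // Start of a field inside the record.
                    [keyName release];
                    keyName = [elementName copy];
                    if (xmlTextReaderIsEmptyElement(reader) == 1) {
                        // <desc/> produces no END_ELEMENT node, so store the empty field now.
                        [dictionary setObject:@"" forKey:keyName];
                        [keyName release];
                        keyName = nil;
                    }
                }
            }
            break;

            case XML_READER_TYPE_TEXT: {
                if (inZone && keyName != nil) {
                    // Append, in case a field's text arrives as more than one text node
                    // (the old code overwrote and leaked the previous value).
                    NSString *string = [NSString stringWithUTF8String:(const char *)xmlTextReaderConstValue(reader)];
                    NSString *combined = (valueData != nil) ? [valueData stringByAppendingString:string] : string;
                    [valueData release];
                    valueData = [combined retain];
                }
            }
            break;

            case XML_READER_TYPE_END_ELEMENT: {
                if (inZone) {
                    NSString *elementName = [NSString stringWithUTF8String:(const char *)xmlTextReaderConstName(reader)];
                    if ([elementName isEqualToString:name]) {
                        // End of the record: the delegate must copy the dictionary to keep it.
                        [saxDelegate SAXDictionaryElement:dictionary];
                        [keyName release];
                        keyName = nil;
                        inZone = NO;
                    } else if (keyName != nil) {
                        // End of a field: store its text, or @"" if it had none.
                        [dictionary setObject:(valueData != nil ? valueData : @"") forKey:keyName];
                        [valueData release];
                        valueData = nil;
                        [keyName release];
                        keyName = nil;
                    }
                }
            }
            break;

            default:
                break;
        }

        [pool drain];
    }

    // Frees the reader and its parser state; without this the parse leaks.
    xmlFreeTextReader(reader);
}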

@end
