Techniques and Workarounds

Extracting Data from an XML Snippet

There are times when it is necessary to obtain data from the source record where the data consists of a set of correlated values. This could be a set of attribute value pairs, or any other matched set of data.

Under these circumstances, the results from a query can include an XML element from the record that contains the set of correlated values. This XML snippet can be parsed to extract the desired set of data so that it is represented in a more convenient form for further processing. The following Python function is one example of how a correlated set of data can be extracted from the returned XML element. In other words, it creates and returns Python dictionaries that can then be used for accessing values needed in functions.

Example:

#This function will return two dictionaries as a tuple
#The first dictionary is all of the attributes and their values
#The second dictionary is all of the xml tags and their values
def parse_xml_tags(xml_string):
    '''xml_string = '<firstName attr1="test>John</firstName><lastName>Doe</lastName>
    will return ({'attr1': 'test'}, {'lastName': 'Doe', 'firstName': 'John'})'''
    import re
    attribute_value_pairs = dict(re.findall(r'(\w*)="(.*?)"', xml_string))
    xml_values = dict(re.findall(r'<(\w*).*?>(.*?)</.*?>', xml_string))
    return (attribute_value_pairs, xml_values)

The code does not solve the problem, but suggests a technique that can be used in writing a compound query that can process the query output and generate an output file that is compatible with the tool the end user may wish to use (such as Excel).