Let us assume that the file I am working with is the master data file of 100,000 employees, stored as XML. At any given point in time, I want to find out how many employees live in a certain zip code.
Step 1 is to store the web address of the XML file in a character vector.
fileURL <- "http://www.website.com/filename.xml"
Step 2 is to load the XML package and parse the entire content of the XML file into a document object.
library(XML)
documentcontent <- xmlTreeParse(fileURL, useInternalNodes = TRUE)
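The XML package's parser may not be able to fetch every URL on its own (https addresses in particular). A minimal workaround sketch, assuming base R's download.file() can reach the server: copy the file to a temporary location and parse the local copy instead. The helper name parseRemoteXML is hypothetical.

```r
library(XML)

# Hypothetical helper: download the remote file to a temporary local file,
# then parse that local copy into an internal XML document.
parseRemoteXML <- function(url) {
  localFile <- tempfile(fileext = ".xml")
  download.file(url, destfile = localFile, quiet = TRUE)
  xmlTreeParse(localFile, useInternalNodes = TRUE)
}

# Usage (requires network access):
# documentcontent <- parseRemoteXML(fileURL)
```

The rest of the steps work unchanged on the object this helper returns.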
Step 3 is to get the root node of the parsed document and store it in another variable.
rootNode <- xmlRoot(documentcontent)
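Step 4 is to extract the text of every zipcode element into a character vector. The XPath expression "//zipcode" matches zipcode elements at any depth in the document.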
allzipcodes <- xpathSApply(rootNode, "//zipcode", xmlValue)
Step 5 is to count the number of people who have the zip code "90210".
sum(allzipcodes == "90210")
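The five steps can be tried without network access by parsing a small XML string instead of a remote file. A minimal sketch, assuming a hypothetical layout in which each employee element has a zipcode child; the sample data, the helper countEmployeesInZip, and the variable zipCounts are illustrative only. Comparing a character vector with "90210" yields a logical vector, and sum() counts each TRUE as 1.

```r
library(XML)

# Hypothetical sample data; the real master data file's element names may differ.
sampleXML <- "<employees>
  <employee><name>A</name><zipcode>90210</zipcode></employee>
  <employee><name>B</name><zipcode>10001</zipcode></employee>
  <employee><name>C</name><zipcode>90210</zipcode></employee>
</employees>"

# Steps 2-3: parse the string (asText = TRUE) and get the root node.
documentcontent <- xmlTreeParse(sampleXML, asText = TRUE, useInternalNodes = TRUE)
rootNode <- xmlRoot(documentcontent)

# Step 4: text of every <zipcode> element, as a character vector.
allzipcodes <- xpathSApply(rootNode, "//zipcode", xmlValue)

# Step 5: count the matches.
sum(allzipcodes == "90210")
# [1] 2

# Hypothetical helper for answering the question for any zip code.
countEmployeesInZip <- function(zipcodes, zip) sum(zipcodes == zip)

# table() tallies every distinct zip code in one pass.
zipCounts <- table(allzipcodes)
```

Once allzipcodes has been extracted, repeated queries for different zip codes only need countEmployeesInZip() or a lookup in zipCounts; the file does not have to be parsed again.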
In five simple steps you have extracted meaningful information from XML data, a task that is often assumed to require sophisticated and costly tools.
To perform data extraction like this, you need a basic understanding of XML and some logical thinking. If you are a cloud professional services consultant or an SAP ERP HCM functional consultant, I believe that, with a little effort, you can use R to perform basic data extraction like the one described above.