Scala test drive
Last week I got together with some coworkers to learn Scala and write some code. Since it was our first gathering, we kept it very simple and tried to really understand how the Scala code works and how it is different from Java.
Before the session, my day job handed me a very simple programming task: read first and last names out of a directory of XML files and write them to a CSV file, along with the name of the XML file each pair came from. There are many better and easier ways to do this, but I wrote it in Java, intending to rewrite it in Scala with my group.
Here’s the Java code:
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.FileFilter;
    import java.io.IOException;
    import java.io.OutputStreamWriter;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    public class OTHExtractor {

        private static OutputStreamWriter writer = null;

        public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
            writer = new OutputStreamWriter(new FileOutputStream(new File("names.java.txt")));
            try {
                File[] files = new File(".").listFiles(new FileFilter() {
                    public boolean accept(File file) {
                        return file.getName().endsWith(".xml");
                    }
                });
                for (File file : files) {
                    process(file);
                }
            } finally {
                writer.flush();
                writer.close();
            }
            System.out.println("done");
        }

        private static void process(File xmlFile) throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(xmlFile);

            XPathFactory xpFactory = XPathFactory.newInstance();
            XPath xpath = xpFactory.newXPath();

            NodeList nodeList = (NodeList)xpath.evaluate("//PersonGivenName/text()", doc, XPathConstants.NODESET);
            String givenName = "";
            if (nodeList != null && nodeList.getLength() > 0) {
                givenName = nodeList.item(0).getNodeValue();
            }

            nodeList = (NodeList)xpath.evaluate("//PersonSurName/text()", doc, XPathConstants.NODESET);
            String surName = "";
            if (nodeList != null && nodeList.getLength() > 0) {
                surName = nodeList.item(0).getNodeValue();
            }

            writer.write(givenName + ", " + surName + ", " + xmlFile.getName() + "\n");
        }
    }
And here’s the Scala code we wrote to replace it:
    import java.io.PrintWriter
    import scala.xml.XML
    import java.io.File

    object OTHExtractorScala {
      def main(args:Array[String]) = {
        val root = new File(".")
        val files = root.listFiles().filter(file => file.getName.endsWith(".xml"))
        val writer = new PrintWriter(new File("names.txt"))
        try {
          files foreach (file => writer.write(process(file)))
        } finally {
          writer.flush
          writer.close
        }
        println("done")
      }

      def process(file:File) = {
        val names = parseXML(file)
        names._1 + ", " + names._2 + ", " + names._3 + "\n"
      }

      def parseXML(file:File) = {
        val xml = XML.loadFile(file)
        ((xml\\"PersonGivenName").text, (xml\\"PersonSurName").text, file.getName)
      }
    }
That’s 64 lines of verbose code whittled down to 29 very readable lines. Not bad!
Let’s take a closer look at some of this fancy Scala code.
val files = root listFiles() filter(file => file.getName.endsWith(".xml"))
The first interesting thing about this line of code is the way filter is called on the result of root’s listFiles call. We skipped the dots! You’re allowed to skip the dot and the parentheses when you call a method that takes zero or one arguments. Here we had to include the parentheses after listFiles so Scala would know that we’re calling a method called filter and not passing a value called filter to the listFiles method.
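To convince yourself that the dot-free form is only a different spelling of the same calls, here is a small sketch that leaves the file system out of it. The object name InfixDemo and the sample file names are made up for illustration. As far as I know, newer Scala releases nudge you toward dots for ordinary methods like these, so treat this as a reading aid rather than a style recommendation.

```scala
object InfixDemo {
  // Made-up sample data standing in for the names of the files in a directory.
  val names = List("a.xml", "notes.txt", "b.xml")

  // Ordinary method-call syntax.
  val dotted = names.filter(name => name.endsWith(".xml"))

  // Infix syntax: "receiver method argument" means receiver.method(argument).
  val infix = names filter (name => name.endsWith(".xml"))

  // Infix calls chain left to right, so this reads as (names filter ...) map ...
  val chained = names filter (name => name.endsWith(".xml")) map (name => name.toUpperCase)

  def main(args: Array[String]): Unit = {
    println(dotted == infix)
    println(chained)
  }
}
```

Both spellings produce the same list; the only difference is how the line reads.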
Next up is file => file.getName.endsWith(".xml"). What is this? This is an anonymous function. The bit before the => is the argument this function takes, in this case file. The part after the => is the body of the function. The filter method on a Scala Array takes a function that returns Boolean and keeps only the elements for which it returns true. So this is a much slicker way of getting just the XML files than Java’s FileFilter interface.
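Here are a few equivalent ways to write that anonymous function, as a sketch. It filters plain strings rather than File objects so it runs anywhere; AnonymousFunctionDemo, isXml, and the sample names are invented for the example.

```scala
object AnonymousFunctionDemo {
  // Made-up sample data standing in for file names.
  val names = Array("a.xml", "notes.txt", "b.xml")

  // The anonymous function from the post, applied to names instead of File objects.
  val viaLambda = names.filter(name => name.endsWith(".xml"))

  // The same function stored in a value; its type is String => Boolean.
  val isXml: String => Boolean = name => name.endsWith(".xml")
  val viaValue = names.filter(isXml)

  // Placeholder syntax: the underscore stands for the single argument.
  val viaPlaceholder = names.filter(_.endsWith(".xml"))

  def main(args: Array[String]): Unit = {
    println(viaLambda.mkString(", "))
    println(viaValue.mkString(", "))
    println(viaPlaceholder.mkString(", "))
  }
}
```

All three filters keep the same two names. Which spelling to use is mostly a matter of taste; the named value is handy when the same test is needed in more than one place.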
The other thing I want to point out is the tuple. The parseXML method returns a tuple where the first element is the given name, the second element is the surname, and the third element is the file name. In Java we’d have to package them into a collection of some sort, or create an inner class to hold the values. Tuples make this much easier. The process method grabs the elements from the tuple names and turns them into a concatenated String separated by commas.
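If the _1, _2, _3 accessors feel a little cryptic, a tuple can also be unpacked into named values with a pattern. The sketch below shows both styles side by side. fakeParse is a hypothetical stand-in for parseXML that returns hard-coded names so the example needs no XML file, and TupleDemo is likewise just a name for the example.

```scala
object TupleDemo {
  // Hypothetical stand-in for parseXML: same tuple shape, no file access.
  def fakeParse(fileName: String): (String, String, String) =
    ("Ada", "Lovelace", fileName)

  // Same approach as process in the post: positional accessors.
  def processWithAccessors(fileName: String): String = {
    val names = fakeParse(fileName)
    names._1 + ", " + names._2 + ", " + names._3 + "\n"
  }

  // Alternative: unpack the tuple into three named values in one step.
  def processWithPattern(fileName: String): String = {
    val (givenName, surName, source) = fakeParse(fileName)
    givenName + ", " + surName + ", " + source + "\n"
  }

  def main(args: Array[String]): Unit = {
    print(processWithAccessors("person1.xml"))
    print(processWithPattern("person1.xml"))
  }
}
```

Both methods build the same line of output; the second just gives each element a name before using it.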
Oh, one other thing. Scala’s XML module ROCKS. Look how much easier it is to run XPath-style queries on documents. One line of code loads the XML document, and one line of code finds the elements in it. We needed a lot more verbose Java to do that.
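One difference from the Java version is worth knowing about. The Java code takes only the first matching node, while calling text on a \\ result concatenates the text of every match. The sketch below shows that on a made-up two-person document (the People and Person element names are my invention, as is XmlDemo). It assumes the scala-xml library is on the classpath, since in recent Scala versions scala.xml ships separately from the standard library.

```scala
import scala.xml.XML

object XmlDemo {
  // Made-up document with two people, to show what happens with repeated elements.
  val doc = XML.loadString(
    "<People>" +
      "<Person><PersonGivenName>Ada</PersonGivenName><PersonSurName>Lovelace</PersonSurName></Person>" +
      "<Person><PersonGivenName>Grace</PersonGivenName><PersonSurName>Hopper</PersonSurName></Person>" +
    "</People>")

  // \\ finds matching elements at any depth; text joins the text of every match.
  val allGivenNames = (doc \\ "PersonGivenName").text

  // Taking only the first match mirrors nodeList.item(0) in the Java version.
  val firstGivenName = (doc \\ "PersonGivenName").headOption.map(_.text).getOrElse("")

  // No match gives an empty result, whose text is the empty string.
  val missing = (doc \\ "PersonMiddleName").text

  def main(args: Array[String]): Unit = {
    println(allGivenNames)
    println(firstGivenName)
    println(missing.isEmpty)
  }
}
```

For files that hold exactly one person, as ours did, the two approaches give the same answer, so the short version in parseXML is fine. The headOption form is the safer choice if a file might contain more than one.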