Archive for November, 2010

Scala test drive

November 22, 2010

Last week I got together with some coworkers to learn Scala and write some code. Since it was our first gathering, we kept it very simple and tried to really understand how the Scala code works and how it is different from Java.

Before the session, I had been given a very simple programming task at my day job: read first and last names out of a directory of xml files and write them to a CSV file along with the name of the xml file each one came from. There are many better and easier ways to do this, but I wrote it in Java, intending to rewrite it in Scala with my group.

Here’s the Java code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileFilter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class OTHExtractor {
    private static OutputStreamWriter writer = null;
    
    public static void main(String[] args) throws ParserConfigurationException,
            SAXException, IOException, XPathExpressionException {
        writer = new OutputStreamWriter(new FileOutputStream(new File("names.java.txt")));
        try {
            File[] files = new File(".").listFiles(new FileFilter() {
                public boolean accept(File file) { 
                    return file.getName().endsWith(".xml");
                } 
            });
            for (File file : files) {
                process(file);
            }
        } finally {
            writer.flush();
            writer.close();
        }
        System.out.println("done");
    }

    private static void process(File xmlFile)
            throws ParserConfigurationException, SAXException, IOException,
            XPathExpressionException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(xmlFile);

        XPathFactory xpFactory = XPathFactory.newInstance();
        XPath xpath = xpFactory.newXPath();

        NodeList nodeList = (NodeList)xpath.evaluate("//PersonGivenName/text()", 
                doc, XPathConstants.NODESET);
        String givenName = "";
        if (nodeList != null && nodeList.getLength() > 0) {
            givenName = nodeList.item(0).getNodeValue();
        }

        nodeList = (NodeList)xpath.evaluate("//PersonSurName/text()", 
                doc, XPathConstants.NODESET);
        String surName = "";
        if (nodeList != null && nodeList.getLength() > 0) {
            surName = nodeList.item(0).getNodeValue();
        }
        writer.write(givenName + ", " + surName + ", " + xmlFile.getName() + "\n");
    }
}

And here’s the Scala code we wrote to replace it:

import java.io.PrintWriter
import scala.xml.XML
import java.io.File

object OTHExtractorScala {
  def main(args:Array[String]) = {
    val root = new File(".")
    
    val files = root.listFiles().filter(file => file.getName.endsWith(".xml"))
    val writer = new PrintWriter(new File("names.txt"))
    try {
      files foreach (file => writer.write(process(file)))
    } finally {
      writer.flush
      writer.close
    }
    println("done")
  }
  
  def process(file:File) = {
    val names = parseXML(file)
    names._1 + ", " + names._2 + ", " + names._3 + "\n"
  }
  
  def parseXML(file:File) = {
    val xml = XML.loadFile(file)
    ((xml\\"PersonGivenName").text, (xml\\"PersonSurName").text, file.getName)
  }
}

That’s 64 lines of verbose code whittled down to 29 very readable lines. Not bad!

Let’s take a closer look at some of this fancy Scala code.

    val files = root listFiles() filter(file => file.getName.endsWith(".xml"))

The first interesting thing about this line of code is the call to filter on the result of root’s listFiles call. We skipped the dots! (The full listing above writes the same line with dots; both forms make the same calls.) Scala lets you skip the dot, and the parentheses too, when you call a method that takes zero or one arguments. Here we had to include the parentheses after listFiles so Scala would know that we’re calling a method named filter and not passing a value named filter to listFiles.

Next up is file => file.getName.endsWith(".xml"). What is this? This is an anonymous function. The bit before the => is the argument the function takes, in this case file. The part after the => is the body of the function. The filter method on a Scala Array takes a function that returns a Boolean and keeps only the elements for which it returns true. So this is a much slicker way of getting just the xml files than Java’s FileFilter interface.

The other thing I want to point out is the tuple. The parseXML method returns a tuple where the first element is the given name, the second element is the surname, and the third element is the file name. In Java we’d have to package them into a collection of some sort, or create an inner class to hold the values. Tuples make this much easier. The process method grabs the elements from the tuple names and turns them into a concatenated String separated by commas.
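For contrast, here is roughly what the Java alternative mentioned above might look like: a small holder class standing in for the three-element tuple. NameRecord and its members are hypothetical names chosen for this sketch; nothing like it appears in the original Java program.

```java
// Hypothetical Java stand-in for the (givenName, surName, fileName) tuple.
// All names here are illustrative and are not part of the original code.
class NameRecord {
    final String givenName;
    final String surName;
    final String fileName;

    NameRecord(String givenName, String surName, String fileName) {
        this.givenName = givenName;
        this.surName = surName;
        this.fileName = fileName;
    }

    // Builds the same comma-separated line that the Scala process method returns.
    String toCsvLine() {
        return givenName + ", " + surName + ", " + fileName + "\n";
    }

    public static void main(String[] args) {
        NameRecord record = new NameRecord("Ada", "Lovelace", "person1.xml");
        System.out.print(record.toCsvLine());
    }
}
```

That is a constructor and three fields just to carry three values around, which is the boilerplate the tuple removes.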

Oh, one other thing. Scala’s XML module ROCKS. Look how much easier it is to run XPath-style queries on documents. One line of code loads the XML document, one line of code finds the elements in the document. We needed a lot more verbose Java to do that.
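One behavioral difference is worth a quick check. In Scala, \\ returns every matching element and .text concatenates their text, while the Java version above reads only the first match. The sketch below reproduces the first-match behavior on an in-memory document, using the String-returning overload of XPath.evaluate. The class name, method name, and sample XML are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of the original OTHExtractor program.
class FirstMatchXPath {
    // Returns the string value of the first node matching the expression,
    // or an empty string when nothing matches.
    static String firstText(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Sample document with two people, invented for this example.
        String xml = "<People>"
                + "<Person><PersonGivenName>Ada</PersonGivenName></Person>"
                + "<Person><PersonGivenName>Grace</PersonGivenName></Person>"
                + "</People>";
        // Prints "Ada": only the first match is used.
        System.out.println(firstText(xml, "//PersonGivenName"));
    }
}
```

On the same document, (xml \\ "PersonGivenName").text would give "AdaGrace", so the two programs only agree when each file contains a single person.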

Categories: code

eclipse.ini

November 3, 2010

Eclipse is great, but it takes a while to get started on my work laptop. I decided to search around to see what I could do to speed it up.

The first tip I found was to make sure you are running Eclipse under the most recent JDK. By default Eclipse uses the first JVM it finds in your path. The server components on this project require Java 5, not 6, so many developers configure their systems so that Maven uses Java 5. They can still install Java 6 and run Eclipse on it with the following argument:

-vm "C:\Program Files\Java\jdk1.6.0_22\bin\javaw.exe"

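You can either put that in the shortcut you use to start Eclipse, or you can put it in the eclipse.ini file. In eclipse.ini, -vm goes on one line and the path goes on the next line by itself, and both must come before any -vmargs entry.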

The second tip is to disable unnecessary plug-ins on startup. In Eclipse 3.6 (Helios) this is configurable in Preferences -> General -> Startup and Shutdown. I don’t think a lot of people use Mylyn, so that’s a feature to disable if you don’t.

The third tip is to customize the eclipse.ini file. I found this great post on Stack Overflow that explains the best eclipse.ini settings for each recent version of Eclipse.

For an in-depth look at all the Eclipse runtime options, check out the Eclipse documentation.

Categories: code

Local Variables

November 3, 2010

A few years ago a colleague criticized some code I wrote for local variable inefficiency. Here’s an example of what he did not like:

for (int i = 0; i < 10; i++)
{
	String s = String.valueOf(i);
}

He said that Java allocated 10 Strings when that code ran, and that this was a better way:

String s;
for (int i = 0; i < 10; i++)
{
	s = String.valueOf(i);
}

So that’s the way I’ve been writing Java code for years now, thinking I was being super efficient.

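Well, joke’s on me. Today I found this blog post and learned that his assumption is completely incorrect. Both versions create exactly the same String objects; where the variable is declared changes only its scope, not how many objects are allocated. The first way above is better because it makes the code more readable and prevents possible variable scoping problems.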
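Here is a small sketch of both points. The two loop styles produce identical results, and the variable declared outside the loop stays visible afterwards with a leftover value, which is the kind of scoping problem the narrower declaration avoids. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are not from the original post.
class LoopScopeDemo {
    // Variable declared inside the loop: it exists only within one iteration.
    static List<String> declaredInside() {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < 10; i++) {
            String s = String.valueOf(i);
            out.add(s);
        }
        // s is out of scope here, so it cannot be reused by accident.
        return out;
    }

    // Variable declared before the loop: same work, wider scope.
    static List<String> declaredOutside() {
        List<String> out = new ArrayList<String>();
        String s;
        for (int i = 0; i < 10; i++) {
            s = String.valueOf(i);
            out.add(s);
        }
        return out;
    }

    // Shows the leftover value that the wider scope keeps alive after the loop.
    static String leftoverAfterLoop() {
        String s = null;
        for (int i = 0; i < 10; i++) {
            s = String.valueOf(i);
        }
        return s; // still holds the value from the last iteration
    }

    public static void main(String[] args) {
        System.out.println(declaredInside().equals(declaredOutside())); // prints "true"
        System.out.println(leftoverAfterLoop()); // prints "9"
    }
}
```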

Categories: code