General

How did the GeoAPI project get started, and what is its history?

The GeoAPI project emerged from the collaboration of several free software projects and from the work on various specifications at the Open Geospatial Consortium (OGC).

You can follow the early, pre-history of GeoAPI by reading the following three posts to the DigitalEarth.org website; at this point it had no name, only a goal of bringing together multiple Java GIS projects.

As you can see in part III, the OGC had just announced a Geographic Objects initiative which intended to define Java interfaces for geographic software. This followed earlier work on the OGC Implementation Specification 01-009 Coordinate Transformation Services, which included interfaces defined in the org.opengis namespace ultimately adopted by GeoAPI.

The GeoAPI project eventually formed to pursue this work. The interfaces defined in the OGC specification 01-009 became GeoAPI version 0.1. GeoAPI 1.0 was released with the draft of OGC specification 03-064 GO-1 Application Objects. In May 2005, the final draft of the GO-1 specification, which included GeoAPI interfaces, was accepted as an OGC standard and the matching version of GeoAPI was released as version 2.0.

The GeoAPI working group of the OGC was formed more recently to formalize and continue the work of standardizing the most stable interfaces produced by the GeoAPI project.


What is the relationship between GeoAPI and OGC?

GeoAPI is closely tied to the OGC both in its origins and in its ongoing work.

The GeoAPI project is a collaboration of participants from various institutions and software communities. The GeoAPI project is developing a set of interfaces in the Java language to help software projects produce high quality geospatial software. The core interfaces closely follow the specifications produced in the 19100 series of the International Organization for Standardization (ISO) and by the OGC. The interfaces use the org.opengis namespace and copyright to the code is assigned to the OGC. The project started with the code produced by the OGC Implementation Specification 01-009 Coordinate Transformation Services and refactored this code in collaboration with the standardization work surrounding the OGC specification 03-064 GO-1 Application Objects.

The GeoAPI working group of the OGC is a separate effort made up principally of members of the OGC and formed to continue the work of formalizing the interfaces developed by the GeoAPI project as ratified standards of the OGC. The working group decided to start the GeoAPI Implementation Specification as a new standard focused exclusively on the interfaces produced by the GeoAPI project. In acknowledgement of the earlier work and to match the numbering scheme of GeoAPI, the first specification released under this name is expected to carry the 3.0 version number.


Why a standardized set of programming interfaces? Shouldn't OGC standards stick to web services only?

We believe that both approaches are complementary. Web services are efficient ways to publish geographic information using existing software. But some users need to build their own solution, for example as a wrapper on top of their own numerical model. Many existing software packages provide sophisticated developer toolkits, but each toolkit has its own learning curve, and one cannot easily switch from one toolkit to another or mix components from different toolkits. With standardized interfaces, a significant part of the API can stay constant across different toolkits, thus reducing both the learning curve (especially since the interfaces are derived from published abstract UML) and the interoperability pain points.

The situation is quite similar to that of JDBC (Java DataBase Connectivity). The fact that a high-level language already exists for database queries (SQL) doesn't mean that low-level programming interfaces are not needed. The JDBC interfaces were created as a developer tool to complement SQL, and they have proven to be quite useful.


With standardization of interfaces, aren't you forcing a particular implementation?

We carefully try to avoid implementation-specific API. Again, JDBC is a good example of what we try to achieve: a successful interfaces-only specification implemented by many vendors. Four categories of JDBC drivers exist (pure Java, wrappers around native code, etc.). Implementations exist for (in alphabetical order) Access, Derby, HSQL, MySQL, Oracle, PostgreSQL and many others.

It is important to stress that GeoAPI is all about interfaces. Concrete classes must implement all methods declared in their interfaces, but those interfaces don't put any constraint on the class hierarchy. For example, GeoAPI provides a MathTransform2D interface which extends MathTransform. In no way do implementation classes need to follow the same hierarchy. Actually, in the particular case of MathTransforms, they usually don't! A class implementing MathTransform2D doesn't need to extend a class implementing MathTransform. The only constraint is to implement all methods declared in the MathTransform2D interface and its parent interfaces.
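
The point can be illustrated with a minimal sketch. The two interfaces below are simplified, hypothetical stand-ins that only borrow the MathTransform and MathTransform2D names (the actual GeoAPI interfaces declare more methods), and the Scale and Translation2D classes are invented for this illustration:

```java
import java.util.Arrays;

class InterfaceHierarchyDemo {
    // Simplified stand-ins; not the actual GeoAPI declarations.
    interface MathTransform {
        double[] transform(double[] point);
    }

    interface MathTransform2D extends MathTransform {
        double[] transform(double x, double y);
    }

    /** Hypothetical class implementing MathTransform only. */
    static class Scale implements MathTransform {
        private final double factor;

        Scale(double factor) {
            this.factor = factor;
        }

        @Override
        public double[] transform(double[] point) {
            double[] result = new double[point.length];
            for (int i = 0; i < point.length; i++) {
                result[i] = point[i] * factor;
            }
            return result;
        }
    }

    /**
     * Hypothetical class implementing MathTransform2D. It does not extend
     * Scale or any other MathTransform implementation; it only has to
     * provide the methods of MathTransform2D and of its parent interface.
     */
    static class Translation2D implements MathTransform2D {
        private final double dx, dy;

        Translation2D(double dx, double dy) {
            this.dx = dx;
            this.dy = dy;
        }

        @Override
        public double[] transform(double[] point) {
            return transform(point[0], point[1]);
        }

        @Override
        public double[] transform(double x, double y) {
            return new double[] {x + dx, y + dy};
        }
    }

    public static void main(String[] args) {
        MathTransform t = new Translation2D(10, -5);
        System.out.println(Arrays.toString(t.transform(new double[] {1, 2}))); // [11.0, -3.0]
        System.out.println(Translation2D.class.getSuperclass().getSimpleName()); // Object
    }
}
```

The interface hierarchy (MathTransform2D extends MathTransform) is visible to users, while the class hierarchy stays entirely at the implementer's discretion.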


Technical

Why don't you translate all OGC's UML into Java interfaces using some automatic script?

We tried that path at the beginning of the GeoAPI project, and abandoned it. Automatic scripts provide useful starting points, but a lot of human intervention is still essential. The relationship between UML and Java interfaces is not always straightforward.

Example 1:

In the Coordinate Reference System (CRS) framework, a GeocentricCRS interface is defined. The ISO 19111 UML defines two associations for this class: usesCartesianCS and usesSphericalCS. In addition, this class inherits the usesCS association from its parent SingleCRS class. Translating this UML blindly into Java interfaces leads to three getter methods: getCartesianCS(), getSphericalCS() and getCS(). Now, let's look at the intent of those associations. The documentation says that one and only one of usesCartesianCS and usesSphericalCS can be defined for a given GeocentricCRS. In other words, we still have conceptually only one association (usesCS), but its type is constrained to CartesianCS or SphericalCS. In the Java language, we find it preferable to keep only the getCS() method inherited from SingleCRS, and to enforce the constraint at GeocentricCRS creation time (i.e. in CRSFactory). In addition, we follow the Java convention of avoiding abbreviations and renamed getCS() to getCoordinateSystem(). Of course, the constraint must be explained in the GeocentricCRS javadoc, which requires one more manual edit.
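
A rough sketch of the hand-edited result follows. All types are reduced to hypothetical, empty stand-ins that only reuse the names from the text, and the single-argument createGeocentricCRS(...) method is an assumption made for brevity, not the actual factory signature:

```java
class GeocentricDemo {
    // Hypothetical, heavily simplified stand-ins for the real types.
    interface CoordinateSystem {}
    static class CartesianCS implements CoordinateSystem {}
    static class SphericalCS implements CoordinateSystem {}
    static class EllipsoidalCS implements CoordinateSystem {}

    interface SingleCRS {
        /** Single getter, renamed from getCS() to avoid the abbreviation. */
        CoordinateSystem getCoordinateSystem();
    }

    /**
     * No getCartesianCS() or getSphericalCS() method is declared here.
     * Constraint (documented by hand): the coordinate system returned by
     * getCoordinateSystem() shall be a CartesianCS or a SphericalCS.
     */
    interface GeocentricCRS extends SingleCRS {}

    /** Hypothetical factory enforcing the constraint at creation time. */
    static class CRSFactory {
        GeocentricCRS createGeocentricCRS(final CoordinateSystem cs) {
            if (!(cs instanceof CartesianCS || cs instanceof SphericalCS)) {
                throw new IllegalArgumentException(
                        "A geocentric CRS requires a Cartesian or spherical coordinate system.");
            }
            return new GeocentricCRS() {
                @Override
                public CoordinateSystem getCoordinateSystem() {
                    return cs;
                }
            };
        }
    }

    public static void main(String[] args) {
        CRSFactory factory = new CRSFactory();
        GeocentricCRS crs = factory.createGeocentricCRS(new SphericalCS());
        System.out.println(crs.getCoordinateSystem() instanceof SphericalCS); // true
        try {
            factory.createGeocentricCRS(new EllipsoidalCS());
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```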

Example 2:

The XML schema defines two attributes (among others) in Layer: CRS and BoundingBox. Each of those two attributes can have an arbitrary number of elements. From an automatic tool's point of view, they look like independent attributes and can be translated into getCRSs() and getBoundingBoxes() methods, each of them returning a List. However, reading the documentation, one realizes that those two methods together form a Map of CoordinateReferenceSystem keys with Envelope values. Whether we should replace the two above-cited methods with a single one returning a Map is subject to debate. But it is reasonable to expect getCRSs() to return a Set and getBoundingBoxes() to return a Collection, so that implementations backed by a Map can delegate to the Map.keySet() and Map.values() methods respectively.
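
One possible Map-backed implementation is sketched below. The CoordinateReferenceSystem and Envelope classes are hypothetical placeholders, the addBoundingBox(...) method is invented for this illustration, and the choice to wrap the views as unmodifiable is ours:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class LayerDemo {
    // Hypothetical placeholders for the real types.
    static class CoordinateReferenceSystem {
        final String code;

        CoordinateReferenceSystem(String code) {
            this.code = code;
        }

        @Override
        public String toString() {
            return code;
        }
    }

    static class Envelope {
        final double minX, minY, maxX, maxY;

        Envelope(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }
    }

    static class Layer {
        // One bounding box per CRS; insertion order is preserved.
        private final Map<CoordinateReferenceSystem, Envelope> boxes = new LinkedHashMap<>();

        // Hypothetical mutator, present only to populate the example.
        void addBoundingBox(CoordinateReferenceSystem crs, Envelope box) {
            boxes.put(crs, box);
        }

        /** Read-only view over the map keys; nothing is copied. */
        Set<CoordinateReferenceSystem> getCRSs() {
            return Collections.unmodifiableSet(boxes.keySet());
        }

        /** Read-only view over the map values; nothing is copied. */
        Collection<Envelope> getBoundingBoxes() {
            return Collections.unmodifiableCollection(boxes.values());
        }
    }

    public static void main(String[] args) {
        Layer layer = new Layer();
        layer.addBoundingBox(new CoordinateReferenceSystem("EPSG:4326"),
                             new Envelope(-180, -90, 180, 90));
        System.out.println(layer.getCRSs());
        System.out.println(layer.getBoundingBoxes().size());
    }
}
```

Had the getters been declared as returning a List, this implementation would have to copy the keys and values into new lists on each call.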


Why do you favor Collections over arrays as a return type?

For performance, a more orthogonal API and more freedom on the implementer side.

Performance (including memory usage)

Some robust implementations will want to protect their internal state against uncontrolled changes. In such implementations, getter methods need to make defensive copies of their mutable attributes (see Effective Java, chapter 6, item 24). Arrays are mutable objects; nothing prevents a user from writing PointArray.positions()[1000] = null, thus altering the PointArray state if positions() returned a direct reference to its internal array. The two listings below compare two ways to protect an implementation from changes. Note that in both cases the internal data are stored as an array, but the getter return types differ.

Array return type:

public class PointArray {
    private Position[] p = ...;

    public Position[] positions() {
        return p.clone();   // Defensive copy on every call.
    }
}

Collection return type:

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PointArray {
    private Position[] p = ...;

    // Read-only view over the array, created once; no element is copied.
    private List<Position> pl = Collections.unmodifiableList(Arrays.asList(p));

    public List<Position> positions() {
        return pl;
    }
}

Since the collection is read-only in the above example, it doesn't need to be cloned (note: the elements of an array or collection may still be mutable, but this is a separate topic). The collection in this example is a view over the array elements. This view doesn't copy the array, and any change in the array is reflected in the view. This is different from Collection.toArray(), which always copies all elements into an array. The conversion from collection to array using Collection.toArray() is usually more expensive and consumes more memory than the conversion from array to collection using Arrays.asList(Object[]). One may argue that iteration over a collection is slower than iteration over an array. This slight advantage of arrays is lost (given the cost of array.clone()) if the user doesn't want to iterate over the whole array. Furthermore, if an array is really wanted, some Collection.toArray() implementations map directly to array.clone().
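
The difference between a defensive copy and a read-only view can be checked with a small, self-contained sketch. It uses String elements instead of Position so that it runs as-is; the defensiveCopy(...) and readOnlyView(...) helper names are invented for this illustration:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ReadOnlyViewDemo {
    /** Array getter strategy: every call copies all elements. */
    static String[] defensiveCopy(String[] internal) {
        return internal.clone();
    }

    /** Collection getter strategy: a read-only view, no element is copied. */
    static List<String> readOnlyView(String[] internal) {
        return Collections.unmodifiableList(Arrays.asList(internal));
    }

    public static void main(String[] args) {
        String[] internal = {"A", "B", "C"};

        // Changes to the copy do not reach the internal array.
        String[] copy = defensiveCopy(internal);
        copy[0] = "X";
        System.out.println(internal[0]); // A

        // Changes to the internal array are reflected in the view.
        List<String> view = readOnlyView(internal);
        internal[1] = "Z";
        System.out.println(view.get(1)); // Z

        // The caller cannot alter the internal state through the view.
        try {
            view.set(0, "X");
        } catch (UnsupportedOperationException e) {
            System.out.println("The view is read-only.");
        }
    }
}
```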

In addition to the above, collections allow on-the-fly object creation. For example, positions may be stored as a sequence of (x,y) coordinates in a single double[] array for efficiency, with temporary position objects created on the fly:

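import java.util.AbstractList;
import java.util.List;

public class PointArray {
    // Coordinates packed as x0, y0, x1, y1, ...
    private double[] coordinates = ...;

    // Read-only view; a Position object is created only when requested.
    private List<Position> pl = new AbstractList<Position>() {
        @Override
        public int size() {
            return coordinates.length / 2;
        }

        @Override
        public Position get(int i) {
            return new Position2D(coordinates[i*2], coordinates[i*2+1]);
        }
    };

    public List<Position> positions() {
        return pl;
    }
}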

More sophisticated implementations may load their data from, or write them to, a database on a per-element basis. In comparison, arrays require initialization of all of their elements before the array is returned. It is still possible to initialize an array with elements that use deferred execution, but implementers have one less degree of freedom with arrays than with collections.

More orthogonal API

If a geometry is mutable (at the implementer's choice), a user may wish to add, edit or remove elements. With arrays as return types, we would need to add add(...) and remove(...) methods to most interfaces. With collections, this extra API weight is not needed since the user can write the following idiom:

pointArray.positions().add(someNewPosition);

The PointArray behavior in such a case is left to implementers. It may throw an UnsupportedOperationException, keep the point in memory, store its coordinates immediately in a database, etc.
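
As an illustration, the sketch below shows two of those behaviors behind the same getter. The Position and PointArray types are reduced to hypothetical minimal versions, and the ReadOnlyPointArray and InMemoryPointArray class names are invented for this example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MutabilityDemo {
    // Hypothetical minimal types, just enough for the example.
    static class Position {
        final double x, y;

        Position(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    interface PointArray {
        List<Position> positions();
    }

    /** Immutable geometry: add(...) on the returned list throws an exception. */
    static class ReadOnlyPointArray implements PointArray {
        private final List<Position> list;

        ReadOnlyPointArray(List<Position> initial) {
            list = Collections.unmodifiableList(new ArrayList<>(initial));
        }

        @Override
        public List<Position> positions() {
            return list;
        }
    }

    /** Mutable geometry: added points are kept in memory. */
    static class InMemoryPointArray implements PointArray {
        private final List<Position> list = new ArrayList<>();

        @Override
        public List<Position> positions() {
            return list;
        }
    }

    public static void main(String[] args) {
        PointArray mutable = new InMemoryPointArray();
        mutable.positions().add(new Position(1, 2));
        System.out.println(mutable.positions().size()); // 1

        PointArray readOnly = new ReadOnlyPointArray(mutable.positions());
        try {
            readOnly.positions().add(new Position(3, 4));
        } catch (UnsupportedOperationException e) {
            System.out.println("This PointArray is read-only.");
        }
    }
}
```

Neither class needs its own add(...) or remove(...) method; the user code is the same idiom in both cases.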

In addition to keeping the API lighter, collections as return types also give us many additional methods for free, like contains(...), addAll(...), removeAll(...), etc. Adding those kinds of methods directly to the geometry interfaces would basically transform geometries into a new kind of collection and duplicate the work of the collection framework, without its "well accepted standard" characteristic.

More freedom on implementer side

  • In the Java language, collections are more abstract than arrays. A collection can be a view over an array (using Arrays.asList(...) for example). The converse is impossible in the general case (Collection.toArray() doesn't create a view; it usually copies the elements).
  • Collections are more abstract than arrays in .NET too: an array is a collection, but a collection is not always an array (conversions from an arbitrary collection to an array may require a copy, as in Java). The array type is more restrictive than the collection type.
  • A collection can be read-only or not, at the implementer's choice. Java arrays are always mutable and need defensive copies (not to be confused with defensive copies of array or collection elements, which is yet another topic).
  • Collections allow one more degree of freedom for deferred execution or lazy data loading. Objects can be created on a per-element basis in collection getter methods. In an array, the references to all elements must be initialized before the array is returned.
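
The last point can be made concrete by counting object creations. In the sketch below, the Position class is a hypothetical minimal stand-in, and the created counter and the positionsAsArray() method exist only for this demonstration:

```java
import java.util.AbstractList;
import java.util.List;

class LazyCreationDemo {
    // Hypothetical minimal position type.
    static class Position {
        final double x, y;

        Position(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static class PointArray {
        private final double[] coordinates; // x0, y0, x1, y1, ...
        int created = 0;                    // Position objects created so far (demo only)

        PointArray(double... coordinates) {
            this.coordinates = coordinates.clone();
        }

        /** Collection getter: a Position is created only when it is requested. */
        List<Position> positions() {
            return new AbstractList<Position>() {
                @Override
                public int size() {
                    return coordinates.length / 2;
                }

                @Override
                public Position get(int i) {
                    created++;
                    return new Position(coordinates[2 * i], coordinates[2 * i + 1]);
                }
            };
        }

        /** Array getter: every Position must exist before the array is returned. */
        Position[] positionsAsArray() {
            Position[] array = new Position[coordinates.length / 2];
            for (int i = 0; i < array.length; i++) {
                created++;
                array[i] = new Position(coordinates[2 * i], coordinates[2 * i + 1]);
            }
            return array;
        }
    }

    public static void main(String[] args) {
        PointArray points = new PointArray(0, 0, 1, 1, 2, 2, 3, 3);
        points.positions().get(2);          // The user only needs one point.
        System.out.println(points.created); // 1
        points.positionsAsArray();          // All four points are created.
        System.out.println(points.created); // 5
    }
}
```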
