Java Output Folder

Last revised 15:45 Friday October 12, 2001

Work item: "Support for dealing with class files generated by external Java compilers like javac and jikes from an Ant script."

Here's the crux of one problem (from WSAD, via John W.):

In some environments the client has limited flexibility in how they structure their Java projects. Sources must go here; resource files here; mixed resource and class files here; etc.

When clients attempt either, they discover that (a) the Java model and builder ignore any class files in the output folder, and (b) from time to time these files in the output folder get deleted without warning.

The Java builder was designed under the assumption that it "owns" the output folder. The work item, therefore, is to change the Java builder to give clients and users more flexibility as to where they place their source, resource, library class, and generated class files.

Current Functionality

Eclipse 1.0 Java builder has the following characteristics (and inconsistencies):

Proposal

The Java model has 2 primitive kinds of inputs: Java source files, and Java library class files. The Java builder produces one primary output: generated Java class files. Each Java project has a build classpath listing what kinds of inputs it has and where they can be found, and a designated output folder where generated class files are to be placed. The runtime classpath is computed from the build classpath by substituting the output folder in place of the source folders.
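The relationship between the two classpaths can be sketched as follows. This is only an illustration: ClasspathSketch, Entry, and Kind are hypothetical names, not the Java model's API, and the sketch assumes the output folder takes the position of the first source entry.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a build classpath; not the Java model's real API.
class ClasspathSketch {
    enum Kind { SOURCE, LIBRARY }

    static final class Entry {
        final Kind kind;
        final String path;

        Entry(Kind kind, String path) {
            this.kind = kind;
            this.path = path;
        }
    }

    // Derives the runtime classpath from the build classpath: library entries
    // are kept in order, and the output folder stands in for the source folders.
    // Assumption: the output folder appears once, where the first source entry was.
    static List<String> runtimeClasspath(List<Entry> buildClasspath, String outputFolder) {
        List<String> runtime = new ArrayList<>();
        boolean outputAdded = false;
        for (Entry entry : buildClasspath) {
            if (entry.kind == Kind.SOURCE) {
                if (!outputAdded) {
                    runtime.add(outputFolder);
                    outputAdded = true;
                }
            } else {
                runtime.add(entry.path);
            }
        }
        return runtime;
    }
}
```

For a build classpath of source p1/src, library p1/lib, source p1/src2 and output folder p1/bin, this sketch yields the runtime classpath p1/bin, p1/lib.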

Java "resource" files, defined to be files other than Java sources and class files, are of no particular interest to the Java model for compiling purposes. However, these resource files are very important to the user, and to the program when it runs. Resource files are routinely co-located with library class files. But it is also convenient for the user if resource files can be either co-located with source code, or segregated in a separate folder.

Ideally, the Java model should not introduce constraints on where inputs and outputs are located. This would give clients and users maximum flexibility with where they locate their files.

The proposal here has 4 separate parts. Taken in conjunction they remove the current constraints that make it difficult for some clients to place their files where they need to be.

[Revised proposal: Rather than write a completely new proposal, I've added a note like this one to the end of each subsequent section describing a revised proposal.]

Java Builder Attitude Adjustment

To appreciate the difficulties inherent with the Java builder sharing its output folder with other folk, consider the following workspace containing a Java project. Assume that this project has not been built in quite a while, and the user has been manually inserting and deleting class files in the project's output folder.

Java project p1/
    src/com/example/  (source folder on build classpath)
        Bar.java
        Foo.java
        Quux.java
    bin/com/example/ (output folder)
        Bar.class {SourceFile="Bar.java"}
        Foo.class {SourceFile="Foo.java"}
        Foo$1.class {SourceFile="Foo.java"}
        Internal.class {SourceFile="Foo.java"}
        Main.class {SourceFile="Main.java"}

From this arrangement of files (and looking at the SourceFile attribute embedded in class files), we can infer that:

Java Builder - Obsolete Class File Deletion

If the user were to request a full build of this project, how would the Java builder proceed? Before it compiles any source files, it begins by deleting existing class files that correspond to source files it is about to recompile. Why? Because obsolete class files left around (a) waste storage and (b) would be available at runtime where they could cause the program to run incorrectly.

In this situation, the Java builder deletes the class files corresponding to Bar.java (i.e., Bar.class), to Foo.java (i.e., Foo.class, Foo$1.class, and Internal.class), and to Quux.java (none, in this case). The remaining class file (Main.class) must be retained because it is irreplaceable.
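The selection described above can be sketched in a few lines. This is a hedged illustration, not the builder's implementation: ObsoleteClassFiles and its methods are hypothetical names, and the SourceFile values are supplied directly in a map rather than read out of real class files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; SourceFile values are given, not parsed from class files.
class ObsoleteClassFiles {
    // Returns the class files whose SourceFile attribute names a source file
    // that is about to be recompiled. All other class files are left alone.
    static List<String> toDelete(Map<String, String> sourceFileByClassFile,
                                 Set<String> sourcesToCompile) {
        List<String> doomed = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceFileByClassFile.entrySet()) {
            if (sourcesToCompile.contains(entry.getValue())) {
                doomed.add(entry.getKey());
            }
        }
        return doomed;
    }

    // The contents of bin/com/example in the p1 example above.
    static Map<String, String> exampleOutputFolder() {
        Map<String, String> classFiles = new LinkedHashMap<>();
        classFiles.put("Bar.class", "Bar.java");
        classFiles.put("Foo.class", "Foo.java");
        classFiles.put("Foo$1.class", "Foo.java");
        classFiles.put("Internal.class", "Foo.java");
        classFiles.put("Main.class", "Main.java");
        return classFiles;
    }
}
```

Called with the sources Bar.java, Foo.java, and Quux.java, toDelete selects Bar.class, Foo.class, Foo$1.class, and Internal.class, and leaves Main.class untouched.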

The Java builder takes responsibility for deleting obsolete class files in order to support automated incremental recompilation of entire folders of source files. Note that standard Java compilers like javac never ever delete class files; they simply write (or overwrite) class files to the output folder for the source files that they are given to compile. Standard Java compilers do not support incremental recompilation: the user is responsible for deleting any obsolete class files that they bring about.

If the Java builder is free to assume that all class files in the output folder are ones that correspond to source files, then it can simply delete all class files in the output folder at the start of a full build. If it cannot assume this, the builder is forced to look at class files in the output folder to determine whether it has source code for them. This is clearly more expensive than not having to do so. By declaring that it "owns" the output folder, the current builder is able to make this simplifying assumption. Allowing users and clients to place additional class files in the output folder requires throwing out this assumption.

If the user or client is free to manipulate class files in the output folder without the Java builder's involvement, then the builder cannot perform full or incremental builds without looking at and deleting the obsolete class files from the output folder corresponding to source files being compiled.

Under the proposed change, the Java builder would need to look at the class files in the output folder to determine whether it should delete them. The only files in the output folder that the Java builder would be entitled to overwrite or delete are class files which the Java builder would reasonably generate, or did generate, while compiling that project.

This change is not a breaking API change. The old spec said that the Java model/builder owned the output folder, but didn't further specify what all that entailed. The new spec will modify this position to allow clients to store files in the output folder; it will promise that these files are perfectly safe unless they are in the Java builder's direct line of fire.

Java Model - Obsolete Class File Deletion

There is another facet of the obsolete class file problem that the Java builder is not in a position to help with.

If the source file Foo.java were to be deleted, its three class files become obsolete and need to be deleted immediately. Why immediately? Consider what happens if the class files are not deleted immediately. If the user requests a full build, the Java builder is presented with the following workspace:

Java project p1/
    src/com/example/  (source folder on build classpath)
        Bar.java
        Quux.java
    bin/com/example/ (output folder)
        Bar.class {SourceFile="Bar.java"}
        Foo.class {SourceFile="Foo.java"}
        Foo$1.class {SourceFile="Foo.java"}
        Internal.class {SourceFile="Foo.java"}
        Main.class {SourceFile="Main.java"}

Since a full build is requested, the Java builder is not passed a resource delta tree for the project. This means that the Java builder has no way of knowing that Foo.java was just deleted. The Java builder has no choice but to retain the three class files Foo.class, Foo$1.class, and Internal.class, just as it retains Main.class. This too is a consequence of allowing the Java builder to share the output folder with the user's class files.

If the obsolete class files are not deleted in response to the deletion of a source file, these class files will linger around. The Java builder will be unable to get rid of them.

The proposal is to have the Java model monitor source file deletions on an ongoing basis and identify and delete any corresponding obsolete class files in the output folder. This clean up activity must handle the case of source files that disappear while the Java Core plug-in is not activated (this entails registering a Core save participant).

Since deleting (including renaming and moving) a source file is a relatively uncommon thing for a developer to do, the implementation should bet it does not have to do this very often. When a source file is deleted, its package name gives us exactly which subfolder of the output folder might contain corresponding class files that might now be obsolete. In the worst case, the implementation would need to access all class files in that subfolder to determine whether any of them have become obsolete. In cases where there is more than one source folder on the build classpath, and there is therefore the possibility of one source file hiding another by the same name, it is necessary to consult the build classpath to see whether the deleted source file was exposed or buried.
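The exposed-or-buried check might look like the sketch below. The names (SourceVisibility, wasExposed) are hypothetical, and the sketch assumes that a source file is buried exactly when a source folder earlier on the build classpath contains a file at the same package-relative path.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the exposed-versus-buried check; not Java model API.
class SourceVisibility {
    // sourceFolders: source folders in build classpath order.
    // contents: for each folder, the package-relative paths of its source files.
    // Returns true if the file deleted from deletedFrom was exposed, i.e. no
    // earlier source folder holds a file at the same relative path.
    static boolean wasExposed(List<String> sourceFolders,
                              Map<String, Set<String>> contents,
                              String deletedFrom,
                              String relativePath) {
        for (String folder : sourceFolders) {
            if (folder.equals(deletedFrom)) {
                return true; // reached the file's own folder without finding a hider
            }
            Set<String> files = contents.get(folder);
            if (files != null && files.contains(relativePath)) {
                return false; // an earlier folder hides the deleted file
            }
        }
        throw new IllegalArgumentException(deletedFrom + " is not on the build classpath");
    }
}
```

Only when the deleted file was exposed do class files in the corresponding output subfolder need to be examined; a buried file never contributed any class files.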

Implementation Tricks

Some observations and implementation tricks that should help reduce the space and time impact of doing this.

When all else fails

A special concern is that the user must be able to recover from crashes or other problems that result in obsolete class files being left behind in the output folder. It can be very bad when this kind of thing happens (and it does happen, despite our best efforts), and can undercut the user's confidence in the Java compiler and IDE. In a large output folder that contains important user files, the user can't just delete the output folder and do a full build. The user has no easy way to distinguish class files with corresponding source from ones without. A simple way to address this need would be to have a command (somewhere in the UI) that would delete all class files in the output folder for which source code is available ("Delete Generated Class Files"). This would at least give the user some help in recovering from these minor disasters.

[Revised proposal: The Java builder remembers the names of the class files it has generated. On full builds, it cleans out all class files that it has on record as having generated; all other class files are left in place. On incremental builds, it selectively cleans out the class files that it has on record as having generated corresponding to the source files that it is going to recompile. There is no need to monitor source file deletions: corresponding generated class files will be deleted on the next full build (because it nukes them all) or next incremental build (because it sees the source file deletion in the delta). The Java builder never looks at class files for their SourceFile attributes. A full build always deletes generated class files, so there's no need for a special UI action.]
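The bookkeeping in the revised proposal can be illustrated with a small sketch. GeneratedClassFileRecord and its methods are hypothetical names; how the builder actually persists this record is not specified here.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical record of what the builder generated; not the real builder state.
class GeneratedClassFileRecord {
    // Source file -> class files the builder generated from it.
    private final Map<String, Set<String>> generatedBySource = new HashMap<>();

    void recordGenerated(String sourceFile, String classFile) {
        generatedBySource.computeIfAbsent(sourceFile, key -> new TreeSet<>()).add(classFile);
    }

    // Full build: every recorded class file is to be deleted, then the record is reset.
    Set<String> cleanForFullBuild() {
        Set<String> all = new TreeSet<>();
        for (Set<String> classFiles : generatedBySource.values()) {
            all.addAll(classFiles);
        }
        generatedBySource.clear();
        return all;
    }

    // Incremental build: only class files recorded for the changed or deleted
    // source files in the delta are to be deleted.
    Set<String> cleanForIncrementalBuild(Collection<String> affectedSources) {
        Set<String> doomed = new TreeSet<>();
        for (String source : affectedSources) {
            Set<String> classFiles = generatedBySource.remove(source);
            if (classFiles != null) {
                doomed.addAll(classFiles);
            }
        }
        return doomed;
    }
}
```

A class file such as Main.class that the builder never recorded is never returned by either method, so it survives every build without the builder inspecting it.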

Allowing Folders to Play Multiple Roles

The proposed change is to consistently allow the same folder to be used in multiple ways on the same build classpath. This change is not a breaking change; it would simply allow some classpath configurations that are currently disallowed to be considered legitimate. The API would not need to change.

[Revised proposal: Many parts of the Java model assume that library folders are relatively quiet. Allowing a library folder to coincide with the output folder would invalidate this assumption, which would tend to degrade performance. For instance, the indexer indexes libraries and source folders, but completely ignores the output folder. If the output folder were also a library, the indexer would repeatedly extract indexes for class files generated by the builder.

N.B. This means that the original scenario of library class files in the output folder is unsupportable.

A source folder that coincides with a library folder would be allowed.]

Completely eliminate resource file copying behavior

The current Java builder copies "resource" files from source folders to the output folder (provided that source and output do not coincide). Once in the output folder, the resource files are available at runtime because the output folder is always present on the runtime class path.

This copying is problematic:

The proposal is to eliminate this copying behavior. The proper way to handle this is to include an additional library entry on the build classpath for any source folders that contain resources. Since library entries are also included on the runtime classpath, the resource files contained therein will be available at runtime.

We would beef up the API specification to explain how the build classpath and the runtime classpath are related, and suggest that one deal with resource files in source folders using library entries. This would be a breaking change for clients or users that rely on the current resource file copying behavior.

The clients that would be most affected are ones that co-locate their resource files with their source files in a folder separate from their output folder. This is a fairly large base of customers that would need to add an additional library entry for their source folder.

It would be simple to write a plug-in that detected and fixed up the Java projects in the workspace as required. By the same token, the same mechanism could be built in to the Java UI. If the user introduces a resource file into a source folder that had none and there is no library entry for that folder on the build classpath, ask the user whether they intend this resource file to be available at runtime.

(JW believes that WSAD will be able to roll with this punch.)

[Revised proposal: Retain copying from source to output folder where necessary.

This eliminates the screw case where resources get copied from one source folder into another source folder, possibly overwriting client data.]

Minimize the opportunity for obsolete class files to have bad effects

The Java compiler should minimize the opportunity for obsolete class files to have bad effects.

Consider the following workspace:

Java project p1/
    src/com/example/  (source folder on build classpath)
        C1.java {package com.example; public class C1 {}}
        C2.java {package com.example; public class C2 extends Secondary {}}
    lib/com/example/ (library folder on build classpath)
        C1.class {from compiling an old version of C1.java
           that read package com.example; public class C1 {}; class Secondary {}}
        C2.class {from compiling an old but unchanged version of C2.java}
        Secondary.class {from compiling an old version of C1.java}
        Quux.class {from compiling Quux.java}

Assume the source folder precedes the library folder on the build classpath (sources should always precede libraries).

When the compiler is compiling both C1.java and C2.java, it should not satisfy the reference to the class com.example.Secondary using the existing Secondary.class because the SourceFile attribute shows that Secondary.class is clearly an output from compiling C1.java, not an input. In general, the compiler should ignore library class files that correspond to source files which are in the process of being recompiled. (In this case, only Quux.class is available to satisfy references.) The Java builder does not do this.

Arguably, the current behavior should be considered a bug. (javac 1.4 (beta) has this bug too.) Fixing this bug should not be a breaking change.

When the SourceFile attribute is not present in a class file, there is no way to tell which source file it came from, so there is no choice but to use that class file.
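The filtering rule proposed in this section can be sketched as follows. LibraryClassFilter is a hypothetical name, the SourceFile values are supplied directly rather than read from class files, and the sketch assumes all files belong to a single package so that simple file names are enough to match on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposed rule; not what the Java builder does.
class LibraryClassFilter {
    // sourceFileByClassFile maps each library class file to its SourceFile
    // attribute, or to null when the attribute is absent.
    // Returns the library class files that may satisfy references while the
    // given source files are being compiled.
    static List<String> usableLibraryClasses(Map<String, String> sourceFileByClassFile,
                                             Set<String> sourcesBeingCompiled) {
        List<String> usable = new ArrayList<>();
        for (Map.Entry<String, String> entry : sourceFileByClassFile.entrySet()) {
            String sourceFile = entry.getValue();
            // No SourceFile attribute: origin unknown, so the class file is used.
            if (sourceFile == null || !sourcesBeingCompiled.contains(sourceFile)) {
                usable.add(entry.getKey());
            }
        }
        return usable;
    }
}
```

For the lib/com/example folder above, with C1.java and C2.java being compiled, only Quux.class would remain usable under this rule.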

[Revised proposal: Maintain current behavior.]