Wednesday, January 27, 2010

Java & JVM

Transition rules for Java

The machine execJavaC of execJava extends the machine execJavaI by the rules which define the semantics of the new JavaC -expressions and JavaC - statements. Therefore like execJavaI it consists of two submachines, for ex- pression evaluation and statement execution, which are defined below.

execJavaC = execJavaExpC execJavaStmC

For the initial state of JavaC we assume that the environment of the respec- tive lookup functions and predicates is defined by the given program which consists of a list of classes and interfaces. All class fields of all classes are set
to their default or constant values. The run of JavaC starts with meth being the class method

public static void main()

which is supposed to be part of the environment; pos denotes the first position
of the body of main which defines restbody , locals is undefined (because main
is invoked without parameters). The run terminates, if no rule of JavaC can be applied any more.
The machine execJavaExpC is defined in Fig. 4.4. We first look at its effect for expressions belonging to initialized classes. In this case the evalua- tion of class fields is defined similarly to the evaluation of local variables in execJavaExpI , but using the memory function globals instead of locals . The value of a class field access is the value bound under the name of the class and of the field in the global environment [18, §15.10]. To execute a class field assignment is to compute the value of its right-hand side [18, §15.25], followed by binding it as new value to the class and field name in the global environment. When a class method is called [18, §15.11], execJavaExpC trig- gers the evaluation of the actual arguments of the invocation, respecting the left-to-right strategy; their values are then bound by invokeMethod (see be- low) to the parameters of the method, to start the execution of the called method and at the same time preparing the return to up (pos ) in the calling method.
Now we explain the class initialization mechanism in execJavaExpC . The program execution starts in a state in which no class or interface is initialized
(except for Object). The initialization is required to be done at the first active use of any class or interface C , namely by executing its static initializer. The first active use of C can occur in the following situations:

– A class method declared in C is invoked
– A static field declared in C is used or assigned which is not a primitive constant
– A new instance of class C is created (see Chapter 5)

64 4. The procedural extension JavaC of JavaI


Fig. 4.4 Execution of JavaC expressions

execJavaExpC = case context (pos ) of
c.f → if initialized (c) then yield (globals (c/f )) else initialize (c)
c.f = α exp → pos := α
c.f = I val → if initialized (c) then globals (c/f ) := val yieldUp (val )
else initialize (c)

c.m α (exps ) → pos := α
c.m I (vals ) → if initialized (c) then invokeMethod (up (pos ), c/m , vals )
else initialize (c)

( ) → yield ([ ])
(α1 exp 1 , . . . , αn exp n ) → pos := α1
(α1 val 1 , . . . , I val n ) → yieldUp ([val 1 , . . . , val n ])
(α1 val1 , . . . , I vali , αi +1 exp i +1 . . . αn exp n ) → pos := αi +1



A primitive constant is a field that is both static and final, and that is
initialized with a compile-time constant expression. A constant expression is built up from literals and other primitive constants using unary and binary operators. Primitive constants are replaced in the source code, at compile- time, by the corresponding literals with the same value (so called inlining of primitive constants).
Before a class is initialized, its superclasses must be initialized. As a con- sequence, when execJavaExpC accesses fields, or assigns to fields, or calls methods, as first active use of a not yet initialized class C , it invokes the class initializer of C which is described by the following rule initialize (C ).

initialize (c) =
if classState (c) = Linked then classState (c) := InProgress forall f ∈ staticFields (c)
globals (f ) := defaultVal (type (f ))
invokeMethod (pos , c/, [ ])

This rule records that starting from now, the initialization of class C is InProgress . All static fields (staticFields ) of the class are initialized with their default value. Through the macro invokeMethod defined below, the rule pushes the current computation state on the frame stack, with unchanged pos (since after the initialization the computation has to proceed where the initialization was started) and without parameters (since class initialization methods have no arguments), and it transfers the control to the first position
of the body of the static initialization code (where firstPos can be considered as a system constant).

4.2 Transition rules for JavaC 65


The statement execution submachine execJavaStmC of execJavaC , de- fined in Fig. 4.5, starts the execution of static initialization blocks—unless the current class has a superclass which is not yet initialized, in which case execJavaStmC invokes the initialization of that superclass. When a static block of class C is executed, accesses to the fields of C and invocations of methods of C should not trigger a new first use of C . This is the reason why InProgress is included in the definition of the predicate initialized .
When returning from a method, in case an expression is returned, the submachine execJavaStmC first transfers the control to the expression eval- uation submachine execJavaExpC ; in case no expression is returned, or after the value val of the returned expression has been computed, the computation abrupts, with the possible return value encoded into the reason Return (val ) for abruption, and the current method has to be exited. In the extension execJavaE of execJavaC , this method exit will be subject to prior execu- tion of so called finally code, which may be present in the current method. Therefore, instead of transferring the control from a return statement di- rectly to the invoker, we let execJavaStmC propagate the return abruption up to the starting point firstPos of the current method body, from where the method will be exited. However, the return from a class initialization method has to be excluded from this propagation. Therefore we refine the predicate propagatesAbr from execJavaI in execJavaC as follows:
propagatesAbr (phrase ) = phrase = lab : s ∧ phrase = static s

In execJavaE propagatesAbr will be further refined to take also finally code into account.
The rule invokeMethod , which is used in Fig. 4.5, pushes the current frame, with the position given as parameter, on the frame stack and creates a new current frame to start executing the body of the invoked method. The ar- gument values are bound to the formal parameters of the method in the new local environment. We use the submachine invokeNative which describes the invocation of native methods and is defined in Sect. 7.2.2. The function modifiers returns the method modifiers (Native , Public, . . .), as specified in the class definition.
invokeMethod (nextPos , c/m , values )
| Native ∈ modifiers (c/m ) =
invokeNative (c/m , values )
| otherwise =
frames := push (frames , (meth , restbody , nextPos , locals ))
meth := c/m restbody := body (c/m ) pos := firstPos
locals := zip (argNames (c/m ), values )

Upon return from the execution of a method, the rule exitMethod (result )
passes control back to the current position of the invoker, reestablishes the

66 4. The procedural extension JavaC of JavaI


Fig. 4.5 Execution of JavaC statements

execJavaStmC = case context (pos ) of
static α stm → let c = classNm (meth )
if c = Object ∨ initialized (super (c)) then pos := α
else initialize (super (c))
static α Return → yieldUp (Return )

return α exp ; → pos := α
return I val ; → yieldUp (Return (val ))
return; → yield (Return )
lab : I Return → yieldUp (Return )
lab : I Return (val ) → yieldUp (Return (val ))
Return → if pos = firstPos ∧ ¬null (frames ) then
exitMethod (Norm )
Return (val ) → if pos = firstPos ∧ ¬null (frames ) then
exitMethod (val )

I Norm ; → yieldUp (Norm )




method and local variables of the invoker and replaces in its restbody the
method invocation statement by the result of the invoked method (i.e., the value to be returned if there is one or the information on normal completion
of the method body)—except for the normal return from a class initialization whereupon the execution proceeds with the previous restbody and the class state is updated to Initialized.

exitMethod (result ) =
let (oldMeth , oldPgm , oldPos , oldLocals ) = top (frames )
meth := oldMeth pos := oldPos locals := oldLocals
frames := pop (frames )
if methNm (meth ) = "" ∧ result = Norm then
restbody := oldPgm classState (classNm (meth )) := Initialized else
restbody := oldPgm [result /oldPos ]


4.2.1 Exercises

Exercise (❀ CD) 4.2.1. Which field is selected at position α?

interface I {
int i = 11;
}
class A implements I {
private static int i = 7;

4.2 Transition rules for JavaC 67


void m() {
i = α B.i;
}
}
class B extends A {}

Exercise (❀ CD) 4.2.2. Which field is selected at position α?

interface I {
int i = 11;
}
class A {
private static int i = 7;
}
class B extends A implements I {
static int j = α i + 1;
}

Exercise (❀ CD) 4.2.3. What are the applicable methods at positions
α, β, γ? Which method is selected by the compiler?

class A {
static void m(int x) {}
static void m(char x) {}
}
class B extends A {
static void m(long x) {
α m(0); β m(’A’); γ m(0L);
}
}

Exercise (❀ CD) 4.2.4. What is the order of initialization?

class A {
static int x = 7;
static { System.out.println("Initialization of A"); }
}
class B extends A {
static { System.out.println("Initialization of B"); }
}
class Test {
public static void main(String[] argv) { System.out.println(B.x);
}
}

68 4. The procedural extension JavaC of JavaI


Exercise (❀ CD) 4.2.5. What is the order of initialization?

class A {
static int x = 7;
static { System.out.println("Initialization of A"); }
}
class B extends A {
static int x = 3;
static { System.out.println("Initialization of B"); }
}
class Test {
public static void main(String[] argv) { System.out.println(B.x);
}
}

Exercise (❀ CD) 4.2.6. What is the order of initialization?

class A {
static int x = 7;
static { System.out.println("Initialization of A"); }
}
class B extends A {
static final int x = 3;
static { System.out.println("Initialization of B"); }
}
class Test {
public static void main(String[] argv) { System.out.println(B.x);
}
}

Exercise (❀ CD) 4.2.7. What is the output of the following program?

interface I {
int c = 1 + A.x;
int d = 1 + A.y;
}
class A implements I { static int x = 2; static int y = I.d;
}
class Test {
public static void main(String[] args) { System.out.println("I.c = " + I.c); System.out.println("I.d = " + I.d); System.out.println("A.x = " + A.x);

4.2 Transition rules for JavaC 69


System.out.println("A.y = " + A.y);
}
}

Exercise (❀ CD) 4.2.8. What is the output of the following program?

interface I {
int c = 1 + A.x;
int d = 1 + A.y;
}
class A implements I { static int x = 2; static int y = I.d;
}
class Test {
public static void main(String[] args) { System.out.println("A.x = " + A.x); System.out.println("A.y = " + A.y); System.out.println("I.c = " + I.c); System.out.println("I.d = " + I.d);
}
}


Decomposing Java and the JVM



We decompose Java and the JVM into language layers and security modules, thus splitting the overall definition and verification problem into a series of tractable subproblems. This is technically supported by the abstraction and refinement capabilities of ASMs. As a result we succeed

– To reveal the structure of the language and the virtual machine
– To control the size of the models and of the definition of the compilation scheme, which relates them
– To keep the effort of writing and understanding the proofs and the exe- cutable models, manageable

The first layering principle reflects the structure of the Java language and
of the set of JVM instructions. In Part I and Part II we factor the sets of Java and of JVM instructions into five sublanguages, by isolating language features which represent milestones in the evolution of modern programming languages and of the techniques for their compilation, namely imperative (se- quential control), procedural (module), object-oriented, exception handling, and concurrency features. We illustrate this in Fig. 1.2. A related structur- ing principle, which helps us to keep the size of the models small, consists
in grouping similar instructions into one abstract instruction each, coming with appropriate parameters. This goes without leaving out any relevant language feature, given that the specializations can be regained by mere pa- rameter expansion, a refinement step whose correctness is easily controllable instruction-wise. See Appendix C.8 for a correspondence table between our abstract JVM instructions and the real bytecode instructions.
This decomposition can be made in such a way that in the resulting sequence of machines, namely JavaI , JavaC , JavaO , JavaE , JavaT and JVMI , JVMC , JVMO , JVME , JVMN , each ASM is a purely incremental—similar to what logicians call a conservative—extension of its predecessor, because each
of them provides the semantics of the underlying language instruction by instruction. The general compilation scheme compile can then be defined between the corresponding submachines by a simple recursion.

8 1. Introduction


Fig. 1.4 Security oriented decomposition of the JVM


Usr.java C om piler Usr.class Internet



JVM Run−time machine


V erifier L oader Sys.class




P reparator

Interpreter

Inpu t

Output




Functionally we follow a well known pattern and separate the treatment
of parsing, elaboration, and execution of Java programs. We describe how our Java machines, which represent abstract interpreters for arbitrary pro- grams in the corresponding sublanguage, are supposed to receive these input programs in the form of abstract syntax trees resulting from parsing. For each Java submachine we describe separately, in Part I, the static and the dynamic part of the program semantics. We formulate the relevant static constraints of being well-formed and well-typed, which are checked during the program elaboration phase and result in corresponding annotations in the abstract syntax tree. In the main text of the book we restrict the analysis
of the static constraints to what is necessary for a correct understanding of the language and for the proofs in this book. The remaining details appear
in the executable version of the Java model. We formalize the dynamical program behavior by ASM transition rules, describing how the program run- time state changes through evaluating expressions and executing statements. This model allows us to rigorously define what it means for Java to be type safe, and to prove that well-formed and well-typed Java programs are in- deed type safe (Theorem 8.4.1). This includes defining rules which achieve the definite assignment of variables, and to prove the soundness of such as- signments. The resulting one-thread model execJava can be used to build a multiple–thread executable ASM execJavaThread which reflects the intention
of [18, 23], namely to leave the specification of the particular implementation
of the scheduling strategy open, by using a choice that is a not further spec- ified function (Fig. 1.3)3 . For this model we can prove a correctness theorem for thread synchronization (Theorem 7.3.1).

3 The flowchart notation we use in this introduction has the expected precise meaning, see Chapter 2, so that these diagrams provide a rigorous definition, namely of so called control state ASMs.

1.3 Decomposing Java and the JVM 9


Fig. 1.5 Decomposing trustfulVMs into execVMs and switchVMs


execVM



switch=Noswitch

no

switchVM


yes

no

isNative(meth)


yes


execVMN





trustfulVM = execVMI

execVMC

execVMO

execVME

execVMN

execVMD

switchVMD extends switchVME extends switchVMC



For JVM programs, we separate the modeling of the security relevant load-
ing (Chapter 18) and linking (i.e., preparation and verification, see Part III) from each other and from the execution (Part II), as illustrated in Fig. 1.4. In Part II we describe the trustful execution of bytecode which is assumed
to be successfully loaded and linked (i.e., prepared and verified to satisfy the required link-time constraints). The resulting sequence of stepwise refined trustful VMs, namely trustfulVMI , trustfulVMC , trustfulVMO , trustfulVME , and trustfulVMN , yields a succinct definition of the functionality of JVM execution in terms of language layered submachines execVM and switchVM
(Fig. 1.5). The machine execVM describes the effect of each single JVM in- struction on the current frame, whereas switchVM is responsible for frame stack manipulations upon method call and return, class initialization and ex- ception capture. The machines do nothing when no instruction remains to be executed. As stated above, this piecemeal description of single Java/JVM in- structions yields a simple recursive definition of a general compilation scheme for Java programs to JVM code, which allows us to incrementally prove it to be correct (see Chapter 14). This includes a correctness proof for the han- dling of Java exceptions in the JVM, a feature which considerably complicates the bytecode verification, in the presence of embedded subroutines, class and object initialization and concurrently working threads.
In Chapter 17 we insert this trustfully executing machine into a diligent JVM which, after loading the bytecode, which is stored in class files, and before executing it using the trustfully executing component trustfulVM , prepares and verifies the code for all methods in that class file, using a sub- machine verifyVM which checks, one after the other, each method body to satisfy the required type and stack bound constraints (Fig. 1.6).
The machine verifyVM is language layered, like trustfulVM , since it is built from a language layered submachine propagateVM , a language layered

10 1. Introduction


Fig. 1.6 Decomposing diligent JVMs into trustfulVMs and verifyVMs


trustfulVM


no

some meth still



yes



curr meth still



yes

to be verified

to be verified verifyVM


no


set next meth up for verification

report
failure



verifyVM built from submachines propagate, succ, check



predicate check and a language layered function succ. The verifier machine
chooses an instruction among those which are still to be verified, checks whether it satisfies the required constraints and either reports failure or propagates the result of the checked conditions to the successor instructions
(Fig. 1.7).
The submachine propagateVM , together with the function succ in the verifying submachine verifyVM , defines a link-time simulation (type version)
of the trustful VM of Part II, although the checking functionality can be better defined in terms of a run-time checking machine, see Chapter 15. The defensive VM we describe there, which is inspired by the work of Cohen [13], defines what to check for each JVM instruction at run-time, before its trust- ful execution. We formulate the constraints about types, resource bounds, references to heap objects, etc., which are required to be satisfied when the given instruction is executed (Fig. 1.8).
The reason for introducing this machine is to obtain a well motivated and clear definition of the bytecode verification functionality, a task which is best accomplished locally, in terms of run-time checks of the safe executability of single instructions. However, we formulate these run-time checking conditions referring to the types of values, instead of the values themselves, so that we can easily lift them to link-time checkable bytecode type assignments (see Chapter 16). When lifting the run-time constraints, we make sure that if a given bytecode has a type assignment, this implies that the code runs on the defensive VM without violating any run-time checks, as we can indeed prove
in Theorem 16.4.1. The notion of bytecode type assignment also allows us to prove the completeness of the compilation scheme defined in Part II. Com- pleteness here means that bytecode which is compiled from a well-formed and well-typed Java program (in a way which respects our compilation scheme), can be typed successfully, in the sense that it does have type assignments

1.4 Sources and literature 11


Fig. 1.7 Decomposing verifyVMs into propagateVMs, checks, succs


choose pc for verification check(pc)
no



propagateVM(succ,pc)

record pc as verified

yes


repo rt failure



succI  succC  succO  succE and propagateI  propagateE



(Theorem 16.5.2). To support the inductive proof for this theorem we refine
our compiler to a certifying code generator, which issues instructions together with the type information needed for the bytecode verification.
The details of the machines outlined above are explained in this book and are summarized in appendices B and C. Putting together the proper- ties of the language layered submachines and of the security components of Java and of the JVM, one obtains a precise yet graspable statement, and an understandable (and therefore checkable) proof of the following property of Java and the JVM.

Main Theorem. Under explicitly stated conditions, any well-formed and well-typed Java program, when correctly compiled, passes the verifier and is executed on the JVM. It executes without violating any run-time checks, and is correct with respect to the expected behavior as defined by the Java machine.

For the executable versions of our machines, the formats for inputting and compiling Java programs are chosen in such a way that the ASMs for the JVM and the compiler can be combined in various ways with current im- plementations of Java compilers and of the JVM (see Appendix A and in particular Fig. A.1 for the details).



1.4 Sources and literature

This book is largely self-contained and presupposes only basic knowledge
in object-oriented programming and about the implementation of high-level programming languages. It uses ASMs, which have a simple mathematical foundation justifying their intuitive understanding as “pseudo-code over ab- stract data”, so that the reader can understand them correctly and success- fully without having to go through any preliminary reading. We therefore

12 1. Introduction


Fig. 1.8 Decomposing defensiveVMs into trustfulVMs and checks


yes

valid code index no

trustfulVM & check


no

switch=Noswitch



yes


no

isNative(meth)



report failure



trustfulVMN



yes


yes
no
check N


checkD extends checkN extends checkE extends checkO extends checkC extends checkI



invite the reader to consult the formal definition of ASMs in Chapter 2 only
should the necessity be felt.
The Java/JVM models in this book are completely revised—streamlined, extended and in some points corrected—versions of the models which ap- peared in [9, 11]. The original models were based upon the first edition of the Java and JVM specifications [18, 23], and also the models in this book still largely reflect our interpretation of the original scheme. In particular we do not treat nested and inner classes which appear in the second edition of the Java specification, which was published when the work on this book was fin- ished. It should be noted however that the revision of [23], which appeared in
1999 in the appendix of the second edition of the JVM specification, clarifies most of the ambiguities, errors and omissions that were reported in [10].
The proofs of the theorems were developed for this book by Robert Stark and Egon B¨orger, starting from the proof idea formulated for the compiler correctness theorem in [8], from its elaboration in [33] and from the proof for the correctness of exception handling in [12]. The novel subroutine call stack free bytecode verifier was developed by Robert Stark and Joachim Schmid. Robert Stark constructed the proof for Theorem 16.5.2 that this verifier ac- cepts every legal Java program which is compiled respecting our compilation scheme. The AsmGofer executable versions of the models were developed for this book by Joachim Schmid and contributed considerably towards getting the models correct.
We can point the reader to a recent survey [21] of the rich literature on modeling and analyzing safety aspects of Java and the JVM. Therefore we limit ourselves to citing in this book only a few sources which had a direct impact on our own work. As stated above, the complex scheme to implement Java security through the JVM interpreter requires a class loader, a security manager and a bytecode verifier. For a detailed analysis of the class loading mechanism, which is underspecified in [18] and therefore only sketched in this book, we refer the reader to [29, 35] where also further references on this still widely open subject can be found. We hope that somebody will use and

1.4 Sources and literature 13


extend our models for a complete analysis of the critical security features of Java, since the framework allows to precisely state and study the necessary system safety and security properties; the extensive literature devoted to this theme is reviewed in [21].
Draft chapters of the book have been used by Robert Stark in his summer term 2000 course at ETH Zu¨rich, and by Egon B¨orger in his Specification Methods course in Pisa in the fall of 2000.

HR Questions

These general questions can be the toughest ones to get through. They might sound easy, but they require a lot of thought and preparation...