Memory Usage Verification for OO Programs

Abstract. We present a new type system for an object-oriented (OO) language that characterizes the sizes of data structures and the amount of heap memory required to successfully execute methods that operate on these data structures. Key components of this type system include type assertions that use symbolic Presburger arithmetic expressions to capture data structure sizes, the effect of methods on the data structures that they manipulate, and the amount of memory that methods allocate and deallocate. For each method, we conservatively capture the amount of memory required to execute the method as a function of the sizes of the method's inputs. The safety guarantee is that the method will never attempt to use more memory than its type expressions specify. We have implemented a type checker to verify the memory usage of OO programs. Our experience is that the type system can precisely and effectively capture memory bounds for a wide range of programs.


Introduction
Memory management is a key concern for many applications. Over the years researchers have developed a range of memory management approaches; examples include explicit allocation and deallocation, copying garbage collection, and region-based memory allocation. However, an important aspect that has been largely ignored in past work is the safe estimation of the memory space required for program execution. Overallocation of memory may cause inefficiency, while underallocation may cause software failure. In this paper, we attempt to make memory usage more predictable through static verification of the memory usage of each program.
We present a new type system, based on dependent types [21], that characterizes the amount of memory required to execute each program component. The key components of this type system include:

- Data Structure Sizes and Size Constraints: The type of each data structure includes index parameters to characterize its size properties, which are expressed in terms of the sizes of the data structures that it contains. In many cases the sizes of these data structures are correlated; our approach uses size constraints expressed as symbolic Presburger arithmetic terms to precisely capture these correlations.
- Heap Recovery: Our type system captures the distinction between shared and unaliased objects and supports explicit deallocation of unaliased objects.
- Preconditions and Postconditions: Each method comes with a precondition that captures both the expected sizes of the data structures on which it operates and any correlations between these sizes. The method's postcondition expresses the new sizes and correlations of these data structures after the method executes, as a function of the original sizes when the method was invoked.
- Heap Usage Effects: Each method comes with two memory effects. These effects use symbolic values (present in the method precondition) to capture (i) the memory requirement, which specifies the maximum heap space that the method may consume, and (ii) the memory release, which specifies the minimum heap space that the method will recover. Heap effects are expressed at the granularity of classes and can capture the net change in the number of instances of each class.
Our paper makes several new technical contributions. Firstly, we design a formal verification system, in the form of a type system, that can formally and statically capture memory usage for the object-oriented (OO) paradigm. We believe that ours is the first such formal type system for the OO paradigm. Secondly, we advocate explicit heap recovery to provide more timely reclamation of dead objects, in support of tighter bounds on memory usage. We show how such recovery commands may be automatically inserted. Thirdly, we have proven the soundness of our type checking rules. Each well-typed program is guaranteed to meet its memory usage specification, and will never fail due to insufficient memory whenever its memory precondition is met. Lastly, we have implemented a type checker (with an inference mechanism) and have shown that it is fairly precise and can handle a reasonably large class of programs. Runtime stack space to hold method parameters and local variables is another aspect of memory demand; for simplicity, we omit its consideration in this paper.

Overview
Memory usage occurs primarily in the heap to hold dynamically created objects. In our model, heap space is consumed via the new operation for newly created objects, while unused objects may be recovered via an explicit deallocation primitive, called dispose. Memory usage (based on consumption and recovery) should be calculated over the entire computation of each program. This calculation is done in a safe manner to help identify the high watermark of memory space needed. We achieve this through the use of a conservative upper bound on the memory consumed, and a conservative lower bound on the memory recovered, for each expression (and method).
To safely predict the memory usage of each program, we propose a size-polymorphic type system for object-oriented programs with support for interprocedural size analysis. In this type system, size properties of both user-defined types and primitive types are captured. In the case of the primitive integer type int⟨v⟩, the size variable v captures its integer value, while for the boolean type bool⟨b⟩, the size variable b is either 0 or 1, denoting false or true, respectively. (Note that size variables capture some integer-based properties of the data structure. For simple types, the values are directly captured.) For user-defined class types, we use c⟨n1, . . ., np⟩ where φ ; φI with size variables n1, . . ., np to denote size properties that are defined in the size relation φ, together with an invariant constraint φI. As an example, consider a user-defined stack class, implemented with a linked list, and a binary tree class, as shown below.
class List⟨n⟩ where n=m+1 ; n≥0 { Object@S val; List⟨m⟩@U next; . . . }
class Stack⟨n⟩ where n=m ; n≥0 { List⟨m⟩@U head; . . . }

List⟨n⟩ denotes a linked-list data structure of size n, and similarly for Stack⟨n⟩. The size relations n=m+1 and n=m define some size properties of the objects in terms of the sizes of their components, while the constraint n≥0 signifies an invariant associated with the class type. Class BTree⟨s, d⟩ represents a binary tree with size variables s and d denoting the total number of nodes and the depth of the tree, respectively. Due to the need to track the states of mutable objects, our type system requires the support of alias controls of the form A ::= U | S | R | L. We use U and S to mark each reference that is (definitely) unaliased and (possibly) shared, respectively. We use R to mark read-only fields which must never be updated after object initialization. We use L to mark unique references that are temporarily borrowed by a parameter for the duration of its method's execution. Our alias annotation mechanism is adapted from [5,8,1] and reported in [9]. Briefly, it allows us to track unique objects from mutable fields, as well as shareable objects from read-only fields.
To specify memory usage, we decorate each method with the following declaration:

t mn(t1 v1, . . ., tn vn) where φpr; φpo; ǫc; ǫr {e}

where φpr and φpo denote the precondition and postcondition of the method, expressed as constraints/formulae over the size variables of the method's parameters and result. Precondition φpr denotes an applicability condition of the method in terms of the sizes of its parameters. Postcondition φpo can provide a precise size relation for the parameters and result of the declared method. The memory effect is captured by ǫc and ǫr. Note that ǫc denotes the memory requirement, i.e., the maximum memory space that may be consumed, while ǫr denotes the net release, i.e., the minimum memory space that will be recovered by the end of the method invocation. Memory effects (consumption and recovery) are expressed using a bag notation of the form {(ci, αi)}^m_{i=1}, where ci denotes a class type, while αi denotes its symbolic count. The alias annotation prior to each method captures the alias annotation of the current this parameter. Note our use of the primed notation, advocated in [13,17], to capture imperative changes on size properties. For the push method, n′=n+1 captures the fact that the size of the stack object has increased by 1; similarly, the postcondition for the pop method, n′=n−1, denotes that the size of the stack is decreased by 1 after the operation. The memory consumption for the push method, ǫc={(List, 1)}, captures the fact that one List node will be consumed. For the pop method, ǫr={(List, 1)} indicates that one List node will be recovered. For the isEmpty method, n′=n captures the fact that the size of the receiver object (this) is not changed by the method. Furthermore, its output of type bool⟨b⟩@S is related to the object's size through a disjunctive constraint n=0∧b=1 ∨ n>0∧b=0. Primitive types are annotated with alias S because their values are immutable and can be freely shared and yet remain trackable. The emptyStack method releases all List nodes of the Stack object. For the push3pop2 method, the memory consumed (or required) from the heap is {(List, 2)}, while the net release is {(List, 1)}, as illustrated in Fig. 2.
Size variables and their constraints are specified at method boundaries, and need not be specified for local variables. Hence, we may use bool@S instead of bool⟨v⟩@S for the type of a local variable.
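To make the specification mechanism concrete, the following sketch (Python; all names are ours and illustrative, and concrete integer sizes stand in for the symbolic Presburger constraints) models a declared specification for pop — assuming a precondition n≥1, postcondition n′=n−1, ǫc={}, ǫr={(List, 1)} — and checks a call against it:

```python
# Hypothetical model of a MEMJ method specification, not the paper's
# implementation: pre/post are predicates/transformers over concrete sizes,
# and the two memory effects are bags encoded as dicts (class name -> count).

def check_call(spec, sizes):
    """Check a call against a spec; return the post-state sizes and effects."""
    if not spec["pre"](sizes):
        raise ValueError("precondition violated")
    return spec["post"](sizes), spec["consume"], spec["release"]

# pop: requires n >= 1; afterwards n' = n - 1; recovers one List node.
pop_spec = {
    "pre": lambda s: s["n"] >= 1,
    "post": lambda s: {"n": s["n"] - 1},
    "consume": {},            # ǫc: nothing consumed
    "release": {"List": 1},   # ǫr: one List node recovered
}

sizes_after, consumed, released = check_call(pop_spec, {"n": 3})
```

A call that violates the precondition (e.g. sizes {"n": 0}) is rejected, mirroring how the type checker requires the caller's pre-state to entail φpr.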

Language and Annotations
We focus on a core object-oriented language, called MEMJ, with size, alias, and memory annotations, in Fig. 3. MEMJ is designed to be an intermediate language for Java with either supplied or inferred annotations. A suffix notation y* denotes a list of zero or more distinct syntactic terms that are suitably separated. For example, (t v)* denotes (t1 v1, . . ., tn vn) where n≥0. Local variable declarations are supported by a block structure of the form (t v = e1; e2), with e2 denoting the result. We assume a call-by-value semantics for MEMJ, where values (primitives or references) are passed as arguments to parameters of methods. For simplicity, we do not allow the parameters to be updated (or re-assigned) with different values. There is no loss of generality, as we can always copy such parameters to local variables for updating.
The MEMJ language is deliberately kept simple to facilitate the formulation of static and dynamic semantics. Typical language constructs, such as multi-declaration blocks, sequences, calls with complex arguments, etc., can be automatically translated to constructs in MEMJ. Also, loops can be viewed as syntactic abbreviations for tail-recursive methods, and are supported by our analysis. Several other language features, including downcast and a field-binding construct, are also supported in our implementation. For simplicity, we omit them in this paper, as they play supporting roles and are not core to the main ideas proposed here. The interested reader may refer to our companion technical report [10] for more information.

Fig. 3. Syntax for the MEMJ Language
To support sized typing, our programs are augmented with size variables and constraints. For size constraints, we restrict ourselves to Presburger form, as decidable (and practical) constraint solvers exist, e.g. [19]. We are primarily interested in tracking the size properties of objects. We therefore restrict the relation φ in each class declaration of c1⟨n1, .., np⟩ which extends c2⟨n1, .., nq⟩ to the form ⋀_{i=q+1}^{p} ni=αi, whereby V(αi) ∩ {n1, .., np} = ∅. Note that V(αi) returns the set of size variables that appear in αi. This restricts size properties to depend solely on the components of their objects.
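This well-formedness restriction can be illustrated with a small sketch (Python; the toy term parser and the function names are ours, not part of MEMJ):

```python
import re

def size_vars(alpha):
    """V(alpha): size variables occurring in an arithmetic term.
    A toy extraction: any lowercase identifier counts as a size variable."""
    return set(re.findall(r"[a-z]\w*", alpha))

def well_formed(class_vars, relations):
    """Each relation ni = alpha_i may only mention component size variables,
    never the class's own size variables: V(alpha_i) ∩ {n1..np} = ∅."""
    return all(not (size_vars(alpha) & set(class_vars))
               for _, alpha in relations)

# class List⟨n⟩ where n = m + 1: m is the size of the 'next' component.
ok = well_formed(["n"], [("n", "m+1")])
bad = well_formed(["n"], [("n", "n+1")])  # self-referential: rejected
```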
Note that each class declaration has a set of instance methods whose main purpose is to manipulate objects of the declared class. For convenience, we also provide a set of static methods with the same syntax as instance methods, except that they have no access to the this object. One important feature of MEMJ is that memory recovery is done safely (without creating dangling references) through a v.dispose() primitive.

Heap Usage Specification
To allow memory usage to be precisely specified, we propose a bag abstraction of the form {(ci, αi)}^n_{i=1}, where ci denotes its classification, while αi is its cardinality. In this paper, we shall use ci ∈ CN, where CN denotes all class types. For instance, Υ1 = {(c1, 2), (c2, 4), (c3, x+3)} denotes a bag with c1 occurring twice, c2 four times, and c3 x+3 times. We provide two basic operations for the bag abstraction, to capture the domain and the count of its elements: dom(Υ) =df {c | (c, α) ∈ Υ}, and Υ(c) =df α if (c, α) ∈ Υ, and 0 otherwise. Union, difference, and exclusion over bags are defined as: Υ1 ⊎ Υ2 =df {(c, Υ1(c)+Υ2(c)) | c ∈ dom(Υ1) ∪ dom(Υ2)}; Υ1 − Υ2 =df {(c, Υ1(c)−Υ2(c)) | c ∈ dom(Υ1) ∪ dom(Υ2)}; Υ \ X =df {(c, Υ(c)) | c ∈ dom(Υ) − X}. To check for adequacy of memory, we provide a bag comparator operation under a size constraint ∆: ∆ ⊢ Υ1 ⊒ Υ2 =df ∀c ∈ dom(Υ2) · ∆ ⟹ Υ1(c) ≥ Υ2(c). The bag abstraction notation for memory is quite general and can be made more precise by refining its operations. For example, some class types are of the same size and could replace each other to increase memory reuse. To achieve this, we can use a bag abstraction that is grouped by size(ci) instead of class type ci.
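With concrete (non-symbolic) counts, the bag operations can be sketched as follows (Python; encoding a bag as a dictionary from class names to counts is our illustrative choice):

```python
from collections import Counter

# A bag {(c1, 2), (c2, 4)} is modeled as {"c1": 2, "c2": 4}; absent classes
# count as zero, matching Υ(c) = 0 when c is not in dom(Υ).

def bag_union(u1, u2):
    """Υ1 ⊎ Υ2: pointwise sum of counts."""
    return dict(Counter(u1) + Counter(u2))

def bag_diff(u1, u2):
    """Υ1 − Υ2: pointwise difference. Counts may go negative in intermediate
    calculations, so we avoid Counter's truncating subtraction."""
    return {c: u1.get(c, 0) - u2.get(c, 0) for c in set(u1) | set(u2)}

def covers(u1, u2):
    """∆ ⊢ Υ1 ⊒ Υ2 with concrete counts: Υ1 has at least as many of every
    class mentioned in Υ2."""
    return all(u1.get(c, 0) >= n for c, n in u2.items())

u1 = {"c1": 2, "c2": 4}
u2 = {"c1": 1, "c3": 5}
```

With symbolic counts such as x+3, the comparison ⊒ must instead be discharged by the Presburger solver under the size constraint ∆.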

Heap Consumption
Heap space is consumed when objects are created by the new primitive, and also by method calls, except that the latter are aggregated to include recovery prior to consumption. Our aggregation (of recovery prior to consumption) is designed to identify the high watermark of the maximum memory needed for safe program execution. For each expression, we predict a conservative upper bound on the memory that the expression may consume, and also a conservative lower bound on the memory that the expression will release. If the expression releases some memory before consumption, we will use the released memory to obtain a lower memory requirement. Such aggregated calculations over both consumption and recovery help capture both the net change in the level of memory and the high watermark of memory needed for safe execution.
For example, consider a recursive function which does p pops from one stack object, followed by the same number of pushes on another stack.
void@S moverec(Stack⟨a⟩@L s, Stack⟨b⟩@L t, int⟨p⟩@S i)
  where a≥p≥0; a′=a−p∧b′=b+p; {}; {}
{ if i<1 then () else {Object@S o = s.top(); s.pop(); moverec(s, t, i−1); t.push(o)} }

Due to aggregation (involving recovery before consumption), the heap space that may be consumed is zero. For each recursive call, the space for a List node is released by s.pop() before it is reused by t.push(o). Aggregated over the recursive calls, we will have p List nodes released before the same number of nodes are consumed. Hence, no new heap space is needed. Such aggregation is sensitive to the order of the operations.
Consider now a different function which performs p pushes on t, followed by the same number of pops from s:

void@S moverec2(Stack⟨a⟩@L s, Stack⟨b⟩@L t, int⟨p⟩@S i)
  where a≥p≥0; a′=a−p∧b′=b+p; {(List, p)}; {(List, p)}
{ if i<1 then () else {Object@S o = s.top(); t.push(o); moverec2(s, t, i−1); s.pop()} }

Though the net change in memory usage is also zero, the memory effect for this function is different, as we require p List nodes to be consumed on entry, before the same number of List nodes are recovered. This new memory effect has the potential to push up the high watermark of memory needed by p List nodes.
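The order-sensitivity of aggregation can be seen by replaying the two call sequences with concrete counts. The sketch below (Python; the encoding is ours: each step is +1 for a consumed List node, −1 for a released one) computes the high watermark (memory requirement) and net release for both orderings, with the recursion unrolled for p = 4:

```python
def memory_effect(steps):
    """Return (requirement, net_release): the high watermark of net
    consumption over the sequence, and how much memory is freed overall."""
    level, high = 0, 0
    for delta in steps:       # +1 = consume a List node, -1 = release one
        level += delta
        high = max(high, level)
    return high, max(0, -level)

p = 4
# moverec: aggregated over the recursion, all p pops (releases) happen
# before the p pushes (consumptions).
moverec = [-1] * p + [+1] * p
# moverec2: all p pushes happen before the recursion unwinds into the pops.
moverec2 = [+1] * p + [-1] * p
```

moverec yields a requirement of zero because every consumption is preceded by a matching release; moverec2 yields a requirement of p, even though both have a net change of zero.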

Heap Recovery
Explicit heap space recovery via dispose has several advantages. It facilitates the timely recovery of dead objects, which allows memory usage to be predicted more accurately (with tighter bounds). It also permits the use of more efficient custom allocators [4], where desired. Moreover, we shall provide an automatic technique to insert dispose primitives with the help of alias annotations. With such a technique, we only need to ensure that objects being disposed are non-null. This non-nullness property can be captured by a non-nullness analyser, such as [12]. This property is required as we always recover memory space for each dispose primitive.
Memory recovery via dispose should occur when unique references that are still live (not in the dead-set) are being discarded. This can occur at four places: (i) at the end of a local block, (ii) at the end of a method block, (iii) prior to an assignment operation, and (iv) at a conditional expression. We would like to recover the memory space for each non-null reference that is about to become dead. For example, consider the pop method's definition: the object pointed to by head is about to become dead prior to the operation head = t1.next. To recover this dead object, we insert a dispose command to obtain head = (t1.next <; head.dispose()), where e1<;e2 ≡ (t v = e1; e2; v). Consider the definition of the destroy method, which calls emptyStack with an L-mode parameter.
void@S destroy(Stack⟨n⟩@U s) where …

A unique s object is about to become dead at the end of the destroy method. To recover this space, we can insert s.dispose() prior to the method's exit.
Let us formalise an automatic technique for the explicit recovery of dead objects that are known at compile-time. Given an expression e, we utilize the alias annotations to obtain a new expression e1 in which suitable explicit heap dispose operations have been safely inserted. This is achieved by the translation below, with Γ denoting a type environment mapping program variables to their annotated types, and Θ (Θ1) denoting the set of dead references (of the form v or v.f) before (after) the evaluation of expression e:

Γ; Θ ⊢ e →H e1 :: t, Θ1

Most rules are structure-preserving (or identity) rewritings, except for four rules given in Fig. 4. A sequence of disposals can be effected through dispose(D), with D containing a set of variable/field references that are about to become dead at the end of expression e.
For the assignment rule [H:ASSIGN], we add w to the disposal set if it is unique and is not yet in the dead-set, using D = {w | ann(t)=U}−Θ1. The function isParam(w) tests whether w is a parameter variable. For the method declaration rule [H:METH], we add to the disposal set those parameters which are unique but not yet dead, using {w | (w :: t) ∈ Γ1, ann(t) = U} − Θ. For the local declaration rule [H:LOCAL], we add v to the disposal set if it is unique but not yet dead, using {v | ann(t) = U} − Θ2. For the [H:IF] rule, unique references that are consumed in one branch may have their heap space recovered in the other branch. This is captured by Di = Θ3−Θi, i = 1, 2. Notice that msst(t1, t2) returns the minimal supertype of both t1 and t2. Note that τ1 <: τ2 denotes the subtype relation for the underlying types (without annotations). Alias subtyping rules allow unique references to be passed to shared and lent-once locations (in addition to other unique locations), but not vice versa.
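The disposal-set computations of these rules can be sketched with concrete sets (Python; the encodings — annotations as strings, dead-sets as sets of reference names, and the assumption that Θ3 merges the two branch dead-sets as Θ1 ∪ Θ2 for [H:IF] — are ours):

```python
def assign_disposals(w, ann, dead_after):
    """[H:ASSIGN]: dispose the overwritten target if it is unique (U) and
    not already in the dead-set: D = {w | ann(t)=U} − Θ1."""
    return {w} - dead_after if ann == "U" else set()

def meth_disposals(params, dead):
    """[H:METH]: dispose unique parameters still live at method exit:
    {w | (w :: t) ∈ Γ1, ann(t)=U} − Θ."""
    return {w for w, ann in params.items() if ann == "U"} - dead

def if_disposals(theta1, theta2):
    """[H:IF]: references consumed in one branch are disposed in the other:
    Di = Θ3 − Θi (here assuming Θ3 = Θ1 ∪ Θ2)."""
    theta3 = theta1 | theta2
    return theta3 - theta1, theta3 - theta2

# pop example: 'head' is unique and about to be overwritten, so dispose it.
d_assign = assign_disposals("head", "U", set())
# destroy example: unique parameter 's' is still live at exit, lent 't' is not disposed.
d_meth = meth_disposals({"s": "U", "t": "L"}, set())
# if example: 'x' consumed only in the then-branch, 'y' only in the else-branch.
d1, d2 = if_disposals({"x"}, {"y"})
```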
In the rest of this paper, we shall present a new static type system for verifying memory heap usage, followed by a set of safety theorems on the type rules.

Rules for Memory Checking
We present type judgements for expressions, method declarations, class declarations and programs to check for adequacy of memory, using relations of the following forms:

Γ; ∆; Υ ⊢ e :: t, ∆1, Υ1        Γ ⊢meth meth        ⊢class def        ⊢ P

Note that Γ is the type environment, as explained earlier; ∆ (∆1) denotes the size constraint, which holds for the size variables associated with Γ (Γ and t) for expression e before (after) its evaluation; t is an annotated type. Also, Υ (Υ1) denotes the available memory space, in terms of the bag abstraction, before (after) the evaluation.
We present a few key syntax-directed type rules in Fig. 5, with the rest of the rules in the technical report. Before that, let us describe some notations used by the type rules.
We extend this function to annotated types (and type environments), as follows: noX(t) =df noX(V(t)). Also, we use n* = fresh() to generate new size variables n*. We extend it to annotated types, so that t′ = fresh(t) returns a new type t′ with the same underlying type as t but with fresh size variables instead. The function rename(t1, t2) returns an equality substitution, e.g. rename(Int⟨r⟩, Int⟨s⟩) = [r↦s]. The operator ∪ combines two domain-disjoint substitutions into one.
The function fdList is used to retrieve the full list of fields of a given class, together with its size relation. The function inv is used to retrieve the size invariant associated with each type; this function is also extended to type environments and lists of types. The function Vfield classifies the size variables of each field into three groups: (i) immutable, (ii) mutable but unique, and (iii) otherwise (non-trackable).

Assignment
The [ASSIGN] rule captures imperative updates (to object fields and variables) by modifying the current size constraint to a new updated state, with changes to the imperative size variables from the LHS. From the rule, note that Γ ⊢ w :: t, φ, Y serves to identify Y as the set of imperative size variables and also to gather a constraint φ for this set. The subtype relation ⊢ t1 <: t, ρ returns a substitution ρ that maps the size variables of the supertype to those of the subtype. This mapping ignores all non-trackable size variables that may be globally aliased, but immutable and unique mutable size variables are captured.

Memory Operations
The heap space is directly changed by the new and dispose primitives. Their corresponding type rules, [NEW] and [DISPOSE], ensure that sufficient memory is available for consumption by new, and credit back the space relinquished by dispose. The memory effect is accumulated according to the flow of computation. Consider a new List operation followed by a dispose: the new operation consumes a List node, while the dispose operation releases one back. The net effect is that the available memory Υ is unchanged. However, due to the order of the two operations, we require ∆ ⊢ Υ ⊒ {(List, 1)}, which affects the maximum memory required.
Another rule which has a direct effect on memory is the method invocation rule [IMI]. Sufficient memory must be available for consumption prior to each call (as specified by ∆1 ⊢ Υ ⊒ ǫc), with the net memory release added back at the end (as specified by Υ1 = (Υ − ǫc) ⊎ ǫr). Each method precondition must be met by the pre-state of its caller. This is checked by ∆ ≈>V(Γ) ∃V(ǫc)∪V(ǫr) · ρ φpr, which uses a relation ≈>X. Note that Vu returns size variables in unprimed form, e.g. Vu(x′=z+1∧y=2) = {x, y, z}.
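The way Υ is threaded through [NEW], [DISPOSE] and [IMI] can be sketched with concrete counts (Python; the names are ours, and the symbolic entailments ∆ ⊢ Υ ⊒ ǫc are replaced by plain integer comparisons):

```python
# Available memory Υ is modeled as a dict (class name -> count of free slots).

def check_new(avail, cls):
    """[NEW]: a new consumes one slot of its class, which must be available."""
    if avail.get(cls, 0) < 1:
        raise ValueError(f"insufficient memory for new {cls}")
    return {**avail, cls: avail[cls] - 1}

def check_dispose(avail, cls):
    """[DISPOSE]: a dispose credits one slot back."""
    return {**avail, cls: avail.get(cls, 0) + 1}

def check_invoke(avail, consume, release):
    """[IMI]: the full consumption ǫc must be available up front, then
    the post-call memory is Υ1 = (Υ − ǫc) ⊎ ǫr."""
    if any(avail.get(c, 0) < n for c, n in consume.items()):
        raise ValueError("insufficient memory for call")
    out = dict(avail)
    for c, n in consume.items():
        out[c] = out.get(c, 0) - n
    for c, n in release.items():
        out[c] = out.get(c, 0) + n
    return out

# Order matters: a dispose credits one List node, which the next new reuses.
avail = check_dispose({"List": 0}, "List")
avail = check_new(avail, "List")
```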

Conditional
Our type rule for conditionals, [IF], is able to track both size constraints and memory usage in a path-sensitive manner. Path-sensitivity is encoded by adding b′=1 and b′=0 to the pre-states of the two branches, respectively. We achieve path-sensitivity for memory usage specifications by integrating them with the relational size constraints derived. Note that the unify operation merges the post-state constraints and memory usages from the two branches via a disjunction; a formal definition and an example can be found in our report [10]. Path-sensitivity makes our analysis more accurate and is critical for analysing the memory requirements of recursive methods.
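With concrete counts, a safe (if coarse) approximation of the memory component of this merge keeps, per class, the smaller available count from the two branches, so that whichever branch executes is covered. This sketch is ours — the actual unify operation in [10] merges symbolically via disjunction, which is what makes the analysis path-sensitive rather than merely conservative:

```python
def unify_avail(upsilon1, upsilon2):
    """Conservative merge of per-branch available memory: keep, for each
    class, the minimum count, so either branch's outcome is covered."""
    return {c: min(upsilon1.get(c, 0), upsilon2.get(c, 0))
            for c in set(upsilon1) | set(upsilon2)}

# then-branch allocated nothing; else-branch consumed one List node.
then_avail = {"List": 1}
else_avail = {"List": 0}
merged = unify_avail(then_avail, else_avail)
```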

Method Declaration
Each method declaration is checked, by the [METH] rule, to see if its definition is consistent with the memory usage specification given in its declaration header. The initial available memory is ǫc. The final available memory of the method body e is Υ1, which must not be less than the declared net memory release (as specified by φpr∧∆1 ⊢ Υ1 ⊒ ǫr). Function subtyping for the OO paradigm is used to support method overriding. This is captured by the [OVERRIDE] rule in Fig. 5. Each method which overrides another is expected to be contravariant in its precondition (and memory consumption) and covariant in its postcondition (and memory release).

Soundness of Type System
We have proposed a small-step operational semantics (denoted by → transitions) instrumented with alias and size notations [10], and have also formalised two safety theorems for our type rules. The first theorem states that each well-typed expression preserves its type under reduction with a runtime environment Π and a store that are consistent with their compile-time counterparts, Γ (type environment) and Σ (store typing). Also, the final size constraint is consistent with the value obtained on termination.
Proof: By induction over the depth of the type derivation for expression e. Details are given in the technical report [10].
The second safety theorem, on progress, captures the fact that well-typed programs cannot go wrong. Specifically, this theorem guarantees that no memory adequacy errors are ever encountered for well-typed MEMJ programs. Proof: By induction over the depth of the type derivation for expression e. Details are given in the technical report [10].

Implementation
We have constructed a type checker for MEMJ, and have also built a preprocessor to allow a more expressive language to be accepted. The entire prototype was built using a Haskell compiler [18], to which we added a library (based on [19]) for Presburger arithmetic constraint-solving.
The main objective of our initial experiments is to show that our memory usage specification mechanism is expressive and that such an advanced form of type checking is viable. We converted to MEMJ a set of programs from the Java version of the Olden benchmark suite [7] and another set of smaller programs from the RegJava benchmark [11], before subjecting them to memory adequacy checking. Our initial experimental results are encouraging; however, this is a proof-of-concept implementation and there is scope for optimization and more exhaustive experimentation. Figure 6 summarises the statistics obtained for each program that we have verified via our type checker. Column 3 illustrates the size and memory annotation overheads which must be incurred in the header declarations of each class and method. Columns 4 and 5 highlight the CPU times used (in seconds) for alias and memory checking, respectively. Our experiments were done on a Redhat Linux 9.0 platform on a Pentium 2.4 GHz machine with 768MB of main memory. Except for the perimeter program (which has more conditionals, from its use of a quadtree data structure), all programs take under 10 seconds to verify, despite their being medium-sized programs and the high complexity of Presburger solving. We attribute this to the fact that memory declarations are verified in a summary-based fashion for each method definition. The last column highlights the number of methods that have been successfully verified as using memory spaces that are bounded by symbolic Presburger formulae. All methods' heap usage could be statically bounded, except for a method in Voronoi that has an allocation inside a loop with a complex termination condition. We have also conducted a set of experiments to evaluate the effectiveness of memory inference, in conjunction with our explicit memory recovery scheme. We modified IBM's Jikes RVM [2,16] to provide support for an explicit dispose operation and instrumented its memory system to capture the total allocation (c) and the actual high watermark (b). We then compared these against the predicted memory requirement (a) from our memory inference. We also counted the number of objects created and reused. As can be seen in Fig. 7, our memory inference is accurate for the RegJava benchmark. Except for sieve, most of the programs have a high degree of memory reuse, facilitated by our use of the dispose operation for memory recovery.

Related Work
Past research on memory models for the object-oriented paradigm has focused largely on efficiency and safety. We are unaware of any prior type-based work on analysing the heap memory usage of OO programs for the purpose of checking memory adequacy. The closest related work on memory adequacy is based on the first-order functional paradigm, where data structures are mostly immutable and thus easier to handle.
Hughes and Pareto [15] proposed a type and effect system for space usage estimation for a first-order functional language, extended with the region language constructs of Tofte and Talpin [20]. The use of the region model facilitates recovery of heap space. However, as each region is only deleted when all its objects become dead, more memory than necessary may be used, as reported by [4].
Hofmann and Jost [14] proposed a solution to obtain linear bounds on the heap space usage of first-order functional programs. A key feature of their solution is the use of linear typing, which allows the space of each last-use data constructor (or record) to be directly recycled by a matching allocation. With this approach, memory recovery can be supported within each function, but not across functions in general. Moreover, their model does not track the symbolic sizes of data structures. Nevertheless, one significant advance of their work is an inference mechanism based on a linear programming (LP) technique. The main advantage of the LP technique is that no fixpoint analysis is required, but it restricts the memory effects to a linear form without disjunction.
Apart from the above memory analysis work on high-level languages, Aspinall and Compagnoni [3] presented a first-order linearly typed assembly language to allow safe reuse of heap space. Their system is a target for the compilation of a functional programming language with a similar type system (e.g. Hofmann's LFPL). More recently, Cachera et al. [6] proposed a constraint-based memory analysis for Java Bytecode-like languages. For a given program, their loop-detecting algorithm can detect methods and instructions that execute an unbounded number of times, and thus can be used to check whether the memory usage is bounded or not. However, their analysis cannot check whether a given amount of memory is adequate, while our system can. Our approach can also work synergistically with region-based memory management systems: bounded memory regions can result in better performance, while a region-based system can provide timely recovery for shared objects that are dead, providing us with tighter memory bounds.
Here, each dead-set Θ (Θ1) captures the set of references with consumed uniqueness before (after) the evaluation of expression e. Γ is a type environment which maps variables to their annotated types. Other type judgements, for methods, classes and programs, have the following forms.

Fig. 4. Automatic Insertion of dispose Operations

The function isParam(w) returns true if w is a parameter variable; otherwise it returns false (for fields and local variables). The function ann extracts the alias of an annotated type: ann(τ⟨v*⟩@A) = A. A conditional is expressed as ξ1 ◁ b ▷ ξ2 =df ξ1, if b; ξ2, otherwise.