Historically, memory protection and privilege levels have been implemented in hardware: memory protection via base / limit registers, segments, or pages; and privilege levels via user / kernel mode bits or rings . Recent mobile code systems, however, rely on software rather than hardware for protection. The switch to software mechanisms is being driven by two needs: portability and performance.
The first argument for software protection is portability. A user-level software product like a browser must coexist with a variety of operating systems. For a Web browser to use hardware protection, the operating system would have to provide access to the page tables and system calls, but such mechanisms are not available universally across platforms. Software protection allows a browser to have platform-independent security mechanisms.
Second, software protection offers significantly cheaper cross-domain calls. To estimate the magnitude of the performance difference, we did two simple experiments. The results below should not be considered highly accurate, since they mix measurements from different architectures. However, the effect we are measuring is so large that these small inaccuracies do not affect our conclusions.
First, we measured the performance of a null call between Common Object Model (COM) objects on a 180 MHz PentiumPro PC running Windows NT 4.0. When the two objects are in different tasks, the call takes 230 sec; when the called object is in a dynamically linked library in the same task, the call takes only 0.2 sec -- a factor of 1000 difference. While COM is a very different system from Java, the performance disparity exhibited in COM would likely also appear if hardware protection were applied to Java. This ratio appears to be growing even larger in newer processors [36,1].
The time difference would be acceptable if cross-domain calls were very rare. But modern software structures, especially in mobile code systems, are leading to tighter binding between domains.
To illustrate this fact, we then instrumented the Sun Java Virtual Machine (JVM) (JDK 1.0.2 interpreter running on a 167 MHz UltraSparc) to measure the number of calls across trust boundaries, that is, the number of procedure calls in which the caller is either more trusted or less trusted than the callee. We measured two workloads. CaffeineMark is a widely used Java performance benchmark, and SunExamples is the sum over 29 of the 35 programs on Sun's sample Java programs page . (Some of Sun's programs run forever; we cut these off after 30 seconds.) Note that our measurements almost certainly underestimate the rate of boundary crossings that would be seen on a more current JVM implementation that used a just-in-time (JIT) compiler to execute much more quickly. Figure 1 shows the results. Both workloads show roughly 30000 boundary crossings per CPU-second. Using our measured cost of 0.2 sec for a local COM call, we estimate that these programs spend roughly 0.5% of their time crossing trust boundaries in the Java system with software protection (although the actual cost in Java may be significantly cheaper).
Using our measured cost of 230 sec for a cross-task COM call, we can estimate the cost of boundary crossings for both workloads in a hypothetical Java implementation that used hardware protection. The cost is enormous: for CaffeineMark, about 6 seconds per useful CPU-second of work, and for SunExamples, about 8 seconds per useful CPU-second. In other words, a naïve implementation of hardware protection would slow these programs down by a factor of 7 to 9. This is unacceptable.
Why are there so many boundary crossings? Surely the designers did not strive to increase the number of crossings; most likely they simply ignored the issue, knowing that the cost of crossing a Java protection boundary was no more than a normal function call. What our experiments show, then, is that the natural structure of a mobile code system leads to very frequent boundary crossings.
Of course, systems using hardware protection are structured differently; the developers do what is necessary to improve performance (i.e., buffering calls together). In other words, they deviate from the natural structure in order to artificially reduce the number of boundary crossings. At best, this neutralizes the performance penalty of hardware protection at a cost in both development time and the usefulness and maintainability of the resulting system. At worst, the natural structure is sacrificed and a significant performance gap still exists.