Rationale and Methodology

A faculty member said to me:

"I should be able to hit the print button in NetScape and get a printed copy that is just like the journal pages."

So I decided to see just how close one could come to this goal. There seemed to be about four approaches:

Require the reader to install one of the various commercial digital-paper systems (e.g. Acrobat).
Send down the pages as graphics files (e.g. GIF or JPEG).
Download a Java Applet to provide local printing intelligence.
Code the page conservatively using only the very lowest level HTML markup tags.

At the time of this writing these appeared to be the trade-offs:

Digital Paper

It is claimed that Acrobat is installed automatically by the default installation process of both major web browsers (Netscape Navigator and Microsoft Internet Explorer). In my experience this does not seem to be the case, but I regularly deny the installation process its second pass, which downloads who-knows-what from Netscape over the net. It is also not clear whether Acrobat functions on non-PostScript printers. Furthermore, the need for a native-mode binary restricts the browsers that can work with this alternative. However, the big win of really faithful page reproduction dictates that we continue to examine this alternative.

Page Images as Graphics Files

One disadvantage of downloading the page images as graphics files is that these files would, in general, be large; they would take longer to download and would put a larger load on the net itself.

It is also not clear how to deal with varying printer resolutions. If the graphics file is downloaded at 72 DPI, it will look blocky on a 300 DPI printer; if it is downloaded at 600 DPI, it will take four times as much time and net loading as it would need at 300 DPI. It might be possible to just bite the bullet and say we do it at 600 DPI: if the user has a 1200 DPI printer the blockiness is too small for her to see, and we just eat the extra overhead if she has only a 300 DPI printer.
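
The factor of four is simple arithmetic: pixel count grows with the square of the resolution. Here is a rough sketch, assuming a US Letter page (8.5 x 11 inches) stored as an uncompressed 1-bit-per-pixel bitmap. Both assumptions are mine, as is the function name; real GIF or JPEG files would be compressed and therefore smaller, though the relative scaling should be similar.

```python
def raw_page_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Uncompressed size in bytes of one page image at the given resolution.

    The page dimensions and bit depth are illustrative assumptions,
    not figures taken from the text.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    for dpi in (72, 300, 600):
        print(dpi, "DPI:", raw_page_bytes(dpi), "bytes")
```

Doubling the resolution from 300 to 600 DPI doubles both the width and the height in pixels, hence four times the data.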

My colleague Dan Magorian observes that users would be unable to cut and paste text to cite in their own papers. However, if both readable and printable versions of each paper were posted, citations could be copied from the readable version rather than from the printable version we are describing.

Java

One might think a Java approach would have a lot of promise. However, at the current time Java printing is very new and is not implemented in many of the extant browsers. Furthermore, Java printing is considered security sensitive. This is curious: since arbitrary images can be put into a web page and then printed by the user, why should there be control over graphics produced by Java? Alternatively, why aren't all pages printed from the web marked "Danger, Warning Will Robinson, this came from the evil web", the way the screen windows opened by Java are marked?

The current Java-in-browser security schemes are not at all fine-grained. Either you have a security certificate (and you can do everything), or you don't (and you can do nothing, not even printing).

Asking the user to grant wide-open privileges to our printing solution looks like a non-starter now and, given the inertia that is slowing down browsers' adoption of new Java technology, may not be feasible in the near term either.

HTML

To date the work we have done involves carefully hand-tuned HTML and does in fact accomplish quite a bit. It started when we noticed that Navigator attempts to keep all of a table on a single page, and if there is not enough room on the current page it will break to a new page. We then realized this behavior could be exploited to synchronize the contents of a web page with the physical paper. We do a two-column format as two nested tables. The outer table contains the entire printed page and is organized as three boxes stacked vertically. The first one contains the heading, the second one the body, and the third one the centered page number at the bottom. The body box, in turn, contains an inner table of two boxes side by side, one for each column. Here is a simple example display with borders showing:

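+---------------------------+
| Heading text              |
+-------------+-------------+
| Left        | Right       |
| column      | column      |
| text        | text        |
| here        | here        |
+-------------+-------------+
|            55             |
+---------------------------+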

Note that the text in the columns must be manually broken or the boxes will not, in general, be correctly sized.
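
The nested-table layout described above can be generated with a few lines of code. The following is only a sketch: the function name, its parameters, and the particular table attributes (border, width, valign, align) are my assumptions, since the exact markup used in the hand-tuned pages is not given here. Each column is passed as a list of already-broken lines, which are joined with explicit line breaks.

```python
from html import escape


def page_html(heading, left_lines, right_lines, page_number, border=1):
    """Build one printed page: an outer three-row table wrapping a two-column inner table.

    Hypothetical helper; the attribute choices are illustrative, not the
    author's actual markup.
    """
    # Manually broken column text becomes explicit <br> line breaks.
    left = "<br>".join(escape(line) for line in left_lines)
    right = "<br>".join(escape(line) for line in right_lines)

    # Inner table: two boxes side by side, one per column.
    inner = (
        f'<table border="{border}" width="100%"><tr>'
        f'<td valign="top" width="50%">{left}</td>'
        f'<td valign="top" width="50%">{right}</td>'
        "</tr></table>"
    )

    # Outer table: heading, body, and centered page number stacked vertically.
    return (
        f'<table border="{border}" width="100%">'
        f"<tr><td>{escape(heading)}</td></tr>"
        f"<tr><td>{inner}</td></tr>"
        f'<tr><td align="center">{page_number}</td></tr>'
        "</table>"
    )


if __name__ == "__main__":
    print(page_html("Heading text",
                    ["Left", "column", "text", "here"],
                    ["Right", "column", "text", "here"],
                    55))
```

Emitting one such outer table per printed page relies on the Navigator behavior noted earlier, namely that a table is kept together on a single sheet; other browsers may not behave the same way.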

The remaining problems are:

It was not possible to implement aligned right margins, since it would be necessary to know at document formatting time what the font metrics were, and these vary from browser to browser (in fact, from printer to printer!). I do have an idea about putting each WORD in a table box, but it is too ugly to fit in the margins of this document.
Type sizes had to be a bit smaller because of the need for conservative markup to avoid overflowing the browser's idea of a "page".
Formatting markup is currently a very time-consuming manual process, and it is not clear how to ease this using automation.
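
One conceivable starting point for automation is to break the text by character count and deal the resulting lines out into columns and pages. The sketch below is only that: the function name and both limits (characters per line, lines per column) are hypothetical, and character counts are a crude stand-in for the font metrics that, as noted above, cannot be known at formatting time. The limits would have to be chosen conservatively and checked against real printouts.

```python
import textwrap


def paginate(text, chars_per_line=40, lines_per_column=50):
    """Split text into pages, each a (left_lines, right_lines) pair.

    Lines are broken at word boundaries by character count only; this
    approximates, but does not measure, the printed width.
    """
    lines = textwrap.wrap(text, width=chars_per_line)
    per_page = 2 * lines_per_column
    pages = []
    for start in range(0, len(lines), per_page):
        chunk = lines[start:start + per_page]
        # Fill the left column first, then the right column.
        pages.append((chunk[:lines_per_column], chunk[lines_per_column:]))
    return pages


if __name__ == "__main__":
    sample = "word " * 100
    for number, (left, right) in enumerate(paginate(sample, 10, 5), start=1):
        print("page", number, "left:", len(left), "lines, right:", len(right), "lines")
```

This does nothing for headings, diagrams, or equations, and a proportional font will still overflow if the character limit is set too generously.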

Work proceeds in testing this approach on various combinations of platform/browser/print-technology. I have direct access to these combinations:

Macintosh/Netscape/PostScript
Solaris/Netscape/PostScript

and beta testers for:

Macintosh/Netscape/LaserSC
Macintosh/Netscape/InkJet

but it would be nice to have beta testers using the Wintel platform and/or Microsoft Internet Explorer.

The next area to tackle is diagrams and display equations. These could be handled as graphics files (GIF/JPEG), but some of the objections regarding resolution described above still apply. I have some ugly ideas about using tables to do equations, but they are beyond the scope of this document.

It is also an open question whether this technology is good enough that self-respecting academics will embrace it. I have one data point that says "Yes, this is useful," but only the marketplace can answer this question in the long term.
