Programming OPL for speed

by Berthold Daum, 16/8/96

This article discusses some techniques for writing speed-efficient OPL programs.
A good knowledge of OPL is required.


1. Basics

Although OPL modules are translated, OPL is an interpreted language.
The translator does not translate OPL source code into machine code (as a C compiler
would do) but into a pseudo code. This pseudo code is then interpreted
by a runtime system.
Interpretation of pseudo code involves overhead. On the other hand, the code
is kept compact, so loading times for code are reduced.

2. Arithmetic

The pseudo code for the arithmetic instruction 

	c = a + b

looks like this:

	1. push address of c to the stack
	2. push value of a to the stack
	3. push value of b to the stack
	4. add (takes top two values from stack, adds them, and leaves result on stack)
	5. assign (takes address and value from the stack)

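Every assignment costs an address push and an assign operation, and an
intermediate result has to be pushed again before it can be used.
From that it follows that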
	
	d = a + b * c

	will execute faster than

	e = b * c  :  d = a + e

	But don't overdo it: complex formulae may become confusing!

	For arithmetic operations the rule is simple: the shorter the faster.
	Short integer arithmetic is faster than long integer arithmetic, which in turn
	is much faster than floating point.

	Examples:
			if a% < b% : a% = b% : endif

			is faster than a% = max(a%,b%)
			
			(max is a floating point operation, so you end up with three
			 type conversions)

			a% = iabs(b%) is definitely better than  a% = abs(b%)

	We did not find noticeable speed differences between integer addition
	and multiplication. For floating point, however, multiplication takes about twice
	as long as addition, and division is even slower.

	So for floating point it is better to write

			A+A   instead of 2*A

	and		A*0.1 instead of A/10.0

2a. String operations
    
    A frequent construction is to make a string dependent on a
    boolean expression. This can be done in two ways, provided bool% is either
    0 or -1.

	left$("The condition is true",bool% and 255)
	rept$("The condition is true",-bool%)

    The second construction is slightly better, because the second argument
    expression has only one operand instead of two.


3. Control structures

	DO...UNTIL is a fraction faster and shorter than WHILE ... ENDWH.
	DO...UNTIL uses one conditional jump at the end of the loop, while
	WHILE ... ENDWH uses a conditional jump at the beginning of the
	loop and an unconditional jump back at the end of the loop.

	Loop limits in DO or WHILE statements are computed in full on every pass.
	In
		WHILE i% < len(t$)
			i% = i%+1
			.....
		ENDWH

	len(t$) is computed for each pass through the loop. It would be better
	to write
		l% = len(t$)
		WHILE i% < l%
			i% = i%+1
			.....
		ENDWH
	or even
		l% = len(t$)
		WHILE l%
			.....
			l% = l%-1
		ENDWH

	And, of course, anything that can be computed outside of a loop should
	indeed be computed outside of the loop.

	Logic expressions like  A=B and ( B=C or C=D) are always evaluated in
	full, so even if A <> B the runtime system still evaluates the rest of
	the expression. This is reassuring if you use procedures with side effects
	in these expressions, but it can slow things down.

	It might be better to use nested IFs instead.
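
	A minimal sketch of the difference, assuming a hypothetical integer
	procedure slow%: that is expensive to call:

		rem slow%: is called on every pass, even if a% <> b%
		IF a% = b% and slow%:
			print "match"
		ENDIF

		rem slow%: is called only when a% = b%
		IF a% = b%
			IF slow%:
				print "match"
			ENDIF
		ENDIF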

	The fastest way to test conditions is the direct test of short integers:

			IF a% : print "on" : else : print "off" : endif		(1)

	This is 3 bytes shorter and faster than the equivalent

			IF a% <> 0 : print "on" : else : print "off" : endif	(2)

	This is only true for short integers.
	Long integer or float variables used in a type 1 statement are always expanded by
	the compiler to a type 2 statement. This is important if you test the result of a
	routine:

			IF bool: : print "on" : endif

	will be expanded to 

			IF bool: <> 0.0 : print "on" : endif

	This is 10 bytes longer and slower than testing an integer routine, so it's better to
	define the routine as integer:

			IF bool%: : print "on" : endif


	Testing strings should be done with the len function.

			IF len(t$)       	(involves one operation and one operand)

	is better than 
			IF t$ > ""	(involves one operation and two operands)


	Testing the first character of a string should be done with the ASC function.

			IF asc(t$) = %a

	is better than

			IF left$(t$,1) = "a"

	but note that asc can be ambiguous: Both strings CHR$(0) and "" return 0.


	LOC is a very useful function to test for characters:

		IF loc("psion",chr$(a%))

	is better than

		IF a% = %p or a% = %s or a% = %i or a% = %o or a% = %n

	Note that LOC ignores case, so the first form also matches the
	upper-case letters.


	A very efficient control structure is VECTOR. Using VECTOR instead of lengthy
	IF ... ELSEIF  ... ENDIF  structures saves time and space.
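
	As a rough sketch, a command dispatcher might look like this. The
	procedure name docmd: and the labels cOpen, cSave and cQuit are made
	up for the illustration; cmd% is assumed to hold 1, 2 or 3:

		proc docmd:(cmd%)
			rem jump to the label selected by cmd%;
			rem any other value falls through to the line after ENDV
			VECTOR cmd%
				cOpen,cSave,cQuit
			ENDV
			print "unknown command"
			return
		cOpen::
			print "open"
			return
		cSave::
			print "save"
			return
		cQuit::
			print "quit"
		endp

	The equivalent IF ... ELSEIF chain would test cmd% once per branch.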

4. Lookup tables

	If your lookup table is small you can hold it in a string and use the LOC
	function to find an entry.
	
	Example: loc("3193,2001,4578,8900",zip$) searches for the 4-digit ZIP
		   code held in zip$ in the specified string.
				 
	Note: 	Make sure that there are no ambiguities. The LOC function is
		case-insensitive!

	If the lookup table is larger than 255 bytes you can use an operating system
	function to search for an entry:

  	(must be used as OS-function not as CALL-function):

	Fn $A9 Sub $00 (case sensitive)
	Fn $AA Sub $00 (case insensitive)
    		AX: -> offset of sub-string within buffer
    		BX: length of sub-string
   		CX: size of buffer
    		DX:
    		SI: ADDR(buffer to be searched)
    		DI: ADDR(sub-string)
    		CF: clear if sub-string found, set if not


	If lookup tables become larger it is worth trying some of the advanced techniques:

		- binary search
		- binary tree
		- indexed access
		- hashing
		
	You will find a description of these techniques in any book about
	software engineering.
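
	As an illustration of the first technique, here is a minimal sketch of a
	binary search. It assumes that a calling procedure has declared a global
	integer array tbl%() whose first n% elements are sorted in ascending
	order; the names bfind%: and tbl%() are invented for this example.

		proc bfind%:(key%,n%)
			rem returns the index of key% in tbl%(1..n%), or 0 if not found
			rem n% must stay below 16384 so that lo%+hi% cannot overflow
			local lo%,hi%,mid%
			lo% = 1 : hi% = n%
			while lo% <= hi%
				mid% = (lo%+hi%)/2
				if tbl%(mid%) = key%
					return mid%
				elseif tbl%(mid%) < key%
					lo% = mid%+1
				else
					hi% = mid%-1
				endif
			endwh
			return 0
		endp

	Each pass halves the remaining range, so a table of 1000 entries needs
	at most 10 comparisons of tbl%(mid%) with key%.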


5. Procedures

	How procedures are implemented:

	Each procedure definition creates a procedure frame consisting
	of a list of parameter types, a list of externals (globals defined elsewhere),
	a list of globals defined in the procedure, a list of arrays, a list of strings,
	some control words, and a list of procedures invoked in that procedure.

	Also, for each procedure an entry in a global procedure table is created.

	Each procedure needs at least 24 bytes for headers and table entries.

	Each procedure call consists of a pointer to the name of the called procedure
	in the header of the current procedure. The procedure call is preceded by a
	series of operations that push the parameters and their type descriptors on
	the stack. For one parameter two operations are needed: one for the value
	and one for the type descriptor.

	Procedures always leave a result on the stack, so each procedure call is
	followed by an operation that assigns the result to a variable, or simply removes
	it from the stack.

	When a procedure is called, all parameters and type descriptors
	are pushed to the stack. The runtime system will then retrieve the procedure
	name and will start a search for the procedure in the procedure table.

	Having found that name in the table, it initialises the procedure: it clears all local
	variables, initialises strings and arrays, and binds the global variables (see below).

	It will then execute the code. As only small pieces of code are held in memory,
	code is read continually from disk to memory.

	A typical value for a procedure call is about 5 msec on an S3a,
	whereas a typical string operation takes about 0.5 msec.

	Improving the speed:

	If you are using a PC for development, it is certainly a good idea to
	replace small procedures by INCLUDEs and Macros, but the overall
	code size will increase.
       
	On the S3a there is the possibility of caching procedures. Caching means that
	larger blocks of code are read into a cache - so code is read from disk less
	frequently. Also if a procedure is invoked and is found in the cache, the cache
	copy will be used. The OPL runtime system manages the cache and keeps
	the most frequently used procedures in the cache.
	
	A procedure call is about 10 times faster when the procedure is in the cache.
	The observed overall speed improvement with the cache was usually 100%:
	programs run twice as fast.

	By using CACHE ON and CACHE OFF it is possible to restrict
	caching to the most frequently used procedures, but it is also worth
	caching procedures that contain loops that range over more than a
	few lines of code.
	CACHETIDY can be used to remove all unused procedures from the cache.
	CACHEHDR can be used to determine the optimal size of the cache, and
	CACHEREC can provide statistics about procedures. The PSION programming
	manual discusses these commands in detail.
	
	If you are using recursive procedures, caching is a must and actually saves
	memory. Without caching, the code of a recursive procedure is appended
	to the program area in the internal memory each time a new procedure level
	is opened. Soon, there will be no memory left. With caching, the procedure
	is written only once into the cache and space is only consumed
	for the local variables (plus some control areas).

	Note: Caching works only for modules compiled in S3a mode.

	The position of a procedure in a module is also important. Each module
	has a procedure table. This list is scanned whenever a procedure is
	called and is not found in the cache.
	A procedure defined at the beginning of the module will be
	found faster. So frequently used procedures should be defined at the
	beginning of a module.

	If you have a complex application with more than one module you have to
	look a bit inside LOADM and UNLOADM.
	Despite its name, LOADM does not load a module.
	It only loads the module's procedure table into memory.
	When a procedure is called, all procedure tables are scanned for the
	procedure name.
	So it can improve performance to UNLOADM unused modules.
	But too many LOADMs and UNLOADMs can also slow things down!


6. Globals

	Each time a procedure is called, the OPL-runtime system will try to resolve
	the unresolved externals of the procedure. To do so, it will browse through
	the whole call hierarchy of procedures and will scan the list of global names
	for that particular name. If found, the reference is resolved. Otherwise it will
	continue searching through the call hierarchy until the root procedure is reached.
	Then you get an error message "Undefined external" and curse yourself for
	not having used OPLLINT.

	It follows that the fewer global variables you have the better.
	Obviously, shorter names will compare a bit faster than longer ones
	(they also take less space).

	Globals should be defined as close to the actual point of reference as possible
	to avoid searching through the whole call hierarchy. Also define globals that are
	used in frequently called routines at the beginning of the global definitions.

	In very time-critical applications it can be worth replacing globals
	with statics. However, this is only suitable for storing a few bytes.

	Statics are a memory area at the beginning of each application. 
	The area $28 to $34  is reserved for the application and will not be altered by
	the system. You have to use peek and poke to access this memory.

	Example:    	pokew $28,static1%
			static1% = peekw($28)

7. Initialising variables
	
	When a new procedure is invoked, OPL automatically initialises all local and
	global variables defined in that procedure. Numeric variables are initialised to 0
	while string variables are initialised with the empty string "".
	So you only need to initialise variables that need a different initial value.

	Note: If you allocate heap memory with alloc, the allocated area will not be
	initialised - you have to poke it to zero. 
	I didn't find a general system routine to initialise a memory area.

	pokeb a%,0 	clears one byte
	pokew a%,0  	clears two bytes
	pokel a%,0	clears four bytes
	pokef a%,0	clears eight bytes
	poke$ a%,rept$(chr$(0),n%-1) : pokeb a%,0  	clears n% bytes


Here is an assembler routine that can initialise up to 32767 bytes:

		Global	pad&(3)
		pad&(1)=&067CCBFF
		pad&(2)=&E9004888
		pad&(3)=&00CBFFF6

		usr(addr(pad&()),ax%,bx%,cx%,0)

	ax% points to beginning of the area to initialise
	bx% is the length to initialise
	cx% is the padding character (0 for clearing)

8. Using system calls

	There are some system calls that can reduce a lengthy and time consuming
	OPL-call to a single call.

	call($00a1,0,len,0,*from,*to)

	copies (len) bytes from *from to *to. In our experience it handles even
	overlapping fields in both directions flawlessly.

	Example: (saves 2 OPL-loops with 999 iterations)

		local a%(1000)

		call($00a1,0,1998,0,addr(a%(2)),addr(a%(1)))
			rem  copies elements 2-1000 to elements 1-999

		call($00a1,0,1998,0,addr(a%(1)),addr(a%(2)))
			rem  copies elements 1-999 to elements 2-1000

	call($00ac,0,0,0,*from,*to)

	copies a zero terminated string from *from to *to

	call($00b9,0,0,0,0,*strng)

	returns the length of a zero terminated string.

	So to convert a zero terminated string into an OPL string:

	call($00ac,0,0,0,from%,uadd(addr(to$),1))  : rem to$ must be 1 byte longer
	pokeb addr(to$),call($00b9,0,0,0,0,from%) : rem to set length



9. Graphics

- 	Use gUPDATE OFF whenever possible.
	During debugging you might want to disable this instruction

- 	Complex drawings perform twice as fast in an invisible window or a bitmap.
    	I found no substantial speed difference between invisible windows and bitmaps.
	After drawing make the window visible with gVISIBLE ON.

-	Using the gPOLY command can speed up the drawing of complex
	lines considerably. The code will be somewhat longer and more
	memory is required to hold the points but it is definitely worth it.
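
	The first two points can be combined roughly as follows. This is only a
	sketch: the procedure name drawhid:, the window size and the line pattern
	are arbitrary choices for the illustration.

		proc drawhid:
			local id%,i%
			id% = gCREATE(0,0,200,80,0) : rem last argument 0 = invisible
			gUPDATE OFF
			while i% < 80
				gAT 0,i%
				gLINEBY 199,0
				i% = i%+4
			endwh
			gVISIBLE ON : rem show the finished drawing
			gUPDATE ON
			get
			gCLOSE id%
		endp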

10. Files

- 	File accesses are faster on RAM or internal memory than on FLASH SSD.

- 	Searching in a file:
     	OPD files may not be suitable for all purposes, but they have an advantage 
	when you are searching for a record.
     	The FIND function advances to the next matching record, so you do not have to
	read every record through OPL.
     	The problem with FIND is that it searches in any field of the record.

     	If you want to restrict the search to certain fields, you have to analyse the record.
     	The FINDFIELD function is supposed to search only in the fields specified,
	but in my experience it does not. Also, it is only available on the S3a.

- 	Compression:
	 When OPD file records are deleted or updated, the old record is not really
	 deleted but only marked as deleted. This is done because of the physics of
	 flash memory: When a flash SSD is formatted all bits are set to "1". Following
	 operations can only set bits to "0" but not back to "1".
	 So files need to be compressed now and then to remove any garbage.

	 The PSION file system performs a compression for an OPD file whenever
	 the file is closed (not for files on FLASH SSD). For large files
	 this can be quite time consuming, as the whole file is scanned for
	 deleted records and then rewritten.
	 You can switch off compression (globally) with:

	     pCtrl% = peekw($1c)+$1e : rem addr of control byte
	     pokeb pCtrl%,peekb(pCtrl%) or 1 : rem disable compression

	 You can switch compression on again (globally) with:

	     pCtrl% = peekw($1c)+$1e : rem addr of control byte
	     pokeb pCtrl%,peekb(pCtrl%) and $fe : rem enable compression

	 Make sure to compress your files at least once per session - otherwise
	 they will grow and grow and grow ...

- When reading files with IO-functions (IOOPEN, IOREAD, etc.) don't read the file
  byte by byte or word by word when you can avoid it. Reading a larger block into
  a buffer and processing from that buffer is much faster.
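
  A minimal sketch of block-wise reading, assuming a binary file whose name
  (here the made-up "\DAT\TEST.BIN") and 512-byte block size are chosen only
  for the illustration. The procedure simply adds up all bytes of the file:

	proc rdblock:
		local h%,ret%,n%,i%,sum&,buf%(256)
		ret% = IOOPEN(h%,"\DAT\TEST.BIN",$0000) : rem open existing, binary
		if ret% < 0 : raise ret% : endif
		do
			rem fetch up to 512 bytes with a single call
			n% = IOREAD(h%,addr(buf%()),512)
			i% = 0
			while i% < n%
				sum& = sum& + peekb(uadd(addr(buf%()),i%))
				i% = i%+1
			endwh
		until n% < 512 : rem short block or negative error code ends the loop
		ret% = IOCLOSE(h%)
		print sum&
		get
	endp

  The inner loop works on memory only; the file system is called once per
  512 bytes instead of once per byte.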
	


11. Benchmarks

	If in doubt, write a short routine to test the speed of an instruction.

	Put the instruction inside a loop:
	
	proc benchm:
	local t&,i%
	t& = int(hour)*3600+minute*60+second
	while i% < 10000
		i% = i%+1
		instruction to test
	endwh
	print (int(hour)*3600+minute*60+second - t&)/10.0,"msec"
	get
	endp


	If you want to test procedure calls, reduce the number of passes to 1000
	and remove the "/10.0".
	If you want precise absolute time, run the program with an empty loop first.
	Then subtract the result from the result for the instruction.

-------------------------------------------------------------------------------------
Berthold Daum is the author of a range of S3/S3a programs:
	ProPoc, RepPoc, HyperPoc, MusiPoc, PicPoc, pocView, PocDoc, F11.
	He has also written an OPL toolbox containing a decompiler for OPL.
	He can be contacted on CompuServe 100026,3365.

This article is copyright Berthold Daum.
This article cannot be included in any other product or package without the
written permission of the author.

