OPL programming for speed

1. Graphics

- 	Use gUPDATE OFF whenever possible.
	During debugging you might want to disable this instruction, so that
	every drawing step appears on the screen immediately.

- 	Complex drawings perform twice as fast in an invisible window or a bitmap.
	I found no substantial speed difference between invisible windows and bitmaps.
	After drawing, make the window visible with gVISIBLE ON.

-	Using the gPOLY command can speed up the drawing of complex
	line drawings considerably. The code will be somewhat longer and more
	memory is required to hold the points but it is definitely worth it.
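
	A minimal sketch combining the three tips above. The procedure name,
	the 480x160 window size (the S3a screen) and the shape drawn are only
	illustrations, and the gPOLY array layout is given as I understand it
	from the OPL manual, so check it before relying on it:

		PROC drawbg%:
		LOCAL id%,p%(9)
		gUPDATE OFF			rem do not update the screen per command
		id% = gCREATE(0,0,480,160,0)	rem last argument 0 = invisible window
		p%(1) = 10 : p%(2) = 10		rem start position x,y
		p%(3) = 3			rem number of dx,dy pairs that follow
		p%(4) = 100*2 : p%(5) = 0	rem line 100 right (dx is doubled)
		p%(6) = 0 : p%(7) = 50		rem line 50 down
		p%(8) = -100*2 : p%(9) = 0	rem line 100 left
		gPOLY p%()			rem one call instead of three gLINEBYs
		gVISIBLE ON			rem show the finished drawing
		gUPDATE ON
		RETURN id%
		ENDP

	An odd dx value (offset*2+1) moves without drawing, which allows
	several unconnected lines to be drawn in a single gPOLY call.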

2. Files

- 	File accesses are faster on RAM or internal memory than on FLASH SSD.

- 	Searching in a file:
	OPD files may not be suitable for all purposes, but they have an advantage
	when you are searching for a record.
	The FIND function advances to the next matching record, so you do not have to
	read every record.
	The problem with FIND is that it searches in any field of the record.
	If you want to restrict the search to certain fields, you have to analyse the record.
	The FINDFIELD function is supposed to search only in the fields specified,
	but in my experience it does not. Also, it is only available on the S3a.
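
	A minimal sketch of such a search loop. The file name and the field
	names are hypothetical; the "*" wildcards make FIND match the text
	anywhere within a field:

		OPEN "addr",A,name$,tel$
		FIRST
		WHILE FIND("*Brown*")		rem returns 0 when nothing more matches
			PRINT A.name$,A.tel$
			NEXT			rem continue after the record just found
		ENDWH
		CLOSE

	To restrict the match to one field, test that field inside the loop
	(for example with LOC on A.name$) before accepting the record.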

- Compression:
	 The PSION file system compresses an OPD file whenever
	 the file is closed (not for files on FLASH SSD). For large files
	 this can be quite time consuming, as the whole file is scanned for
	 deleted records and then rewritten.
	 You can switch off compression (globally) with:

	     pCtrl% = peekw($1c)+$1e			rem addr of control byte
	     pokeb pCtrl%,peekb(pCtrl%) or 1    	rem disable compression

	 You can switch compression on again (globally) with:

	     pCtrl% = peekw($1c)+$1e			rem addr of control byte
	     pokeb pCtrl%,peekb(pCtrl%) and $fe  	rem enable compression

	 Make sure to compress your files at least once per session - otherwise
	 they will grow and grow and grow ...

- When reading files with IO-functions (IOOPEN, IOREAD, etc.) don't read the file
  byte by byte or word by word when you can avoid it. Reading a larger block into
  a buffer and processing from that buffer is much faster.
  If you want to search for a substring in a large buffer, use an operating system 	
  function (must be used as OS-function not as CALL-function):

	Fn $A9 Sub $00 (case sensitive)
	Fn $AA Sub $00 (case insensitive)
		AX: -> offset of sub-string within buffer
		BX: length of sub-string
		CX: size of buffer
		DX:
		SI: ADDR(buffer to be searched)
		DI: ADDR(sub-string)
		CF: clear if sub-string found, set if not
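
	A minimal sketch of reading a file in blocks, as recommended above.
	The procedure name, the file name and the 512 byte block size are
	assumptions chosen for illustration:

		PROC rdblk:
		LOCAL h%,ret%,n%,buf%(256)	rem 256 words = 512 byte buffer
		ret% = IOOPEN(h%,"\dat\test.dat",$0000)	rem open existing file, binary
		IF ret% < 0 : RAISE ret% : ENDIF
		DO
			n% = IOREAD(h%,ADDR(buf%(1)),512)
			IF n% > 0
				rem process the n% bytes now held in buf%()
			ENDIF
		UNTIL n% <= 0			rem negative = end of file or error
		ret% = IOCLOSE(h%)
		ENDP

	The last block is usually shorter than 512 bytes, so always work with
	the count returned by IOREAD rather than with the buffer size.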
	
3. Procedures

	Intensive use of procedures can slow down a program considerably.
	The reason is that every time a procedure is invoked it is
	copied from the disk into internal memory.
	(In fact, the OPL runtime system copies the procedures in pieces
	to a small internal buffer to execute them. The execution of a large
	procedure can cause many copy operations.)
	A typical value for a procedure call is about 5 msec on an S3a,
	whereas a typical string operation takes about 0.5 msec.

	If you are using a PC for development, it is certainly a good idea to
	replace small procedures by INCLUDEs and macros, but the overall
	code size will increase.
	Note that every procedure needs a minimum of 24 bytes for headers
	and table entries, so defining procedures that contain only a few
	statements can increase the length of the code.


	On the S3a there is the possibility of caching procedures.
	I have had good experience with caching.
	The overall speed improvement was usually 100%.
	By using CACHE ON and CACHE OFF it is possible to restrict
	caching to the most frequently used procedures, but it is also worth
	caching procedures that contain loops that range over more than a
	few lines of code.

	If you are using recursive procedures, caching is a must and actually saves
	memory. Without caching, the code of a recursive procedure is appended
	to the program area in the internal memory each time a new procedure level
	is opened. Soon, there will be no memory left. With caching, the procedure
	is written only once into the cache and space is only consumed
	for the local variables (plus some control areas).

	Note: Caching works only for modules compiled in S3a mode.

	The position of a procedure in a module is also important. Each module
	has a procedure list. This list is scanned whenever a procedure is
	called (I am not sure if this happens for cached procedures, too).
	A procedure defined at the beginning of the module will be
	found faster. So frequently used procedures should be defined at the
	beginning of a module.

	If you have a complex application with more than one module you have to
	look a bit inside LOADM and UNLOADM. 
	Despite its name, LOADM does not load a module. 
	It only loads the procedure list of the module into memory. 
	When a procedure is called all procedure lists are scanned for the 
	procedure name.
	So it can improve performance to UNLOADM unused modules. 
	But too many LOADMs and UNLOADMs can also slow the program down!
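
	A minimal sketch of how caching and module handling might be combined
	on the S3a. The module name "tools", the procedure inituser: and the
	cache sizes are hypothetical; suitable sizes depend on your program:

		PROC main:
		CACHE 2000,8000		rem create cache: initial and maximum size
		LOADM "tools"		rem procedure list of "tools" is now scanned
		CACHE OFF
		inituser:		rem called once only: keep it out of the cache
		CACHE ON
		rem ... main loop, calling the frequently used procedures ...
		UNLOADM "tools"		rem one list less to scan from here on
		ENDP

	After CACHE OFF the procedures already in the cache are still used;
	only the caching of newly called procedures is suspended.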
	
4. Arithmetic

	For arithmetic the rule is simple: the shorter, the faster.
	Short integers are faster than long integers, which in turn are much
	faster than floating point.

	Examples:
			if a% < b% : a% = b% : endif

			is faster than a% = max(a%,b%)
			
			(max is a floating point operation, so you end up with three
			 type conversions)

			a% = iabs(b%) is definitely better than  a% = abs(b%)


	We did not find noticeable speed differences between integer addition
	and multiplication. For floating point, however, multiplication is about twice
	as slow as addition, and division is even slower.

		So for floating point it is better to write

			A+A   instead of 2*A

	and		A*0.1 instead of A/10.0

	
5. Lookup tables

	If your lookup table is small you can hold it in a string and use the LOC
	function to find an entry.
	
	Example: loc("3193,2001,4578,8900",ZIP) would search a 4-digit ZIP
		   in the specified string.
				 
	Note: 	Make sure that there are no ambiguities. The LOC function is
		case-insensitive!

	If lookup tables become larger it is worth trying some of the advanced techniques:

		- binary search
		- binary tree
		- indexed access
		- hashing
		
	You will find a description of these techniques in any book about
	software engineering.
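
	Two minimal sketches of the above. The first turns the position
	returned by LOC into an entry number; it assumes that zip$ always
	holds exactly four digits, otherwise a match could straddle two
	entries. The second is a binary search over a sorted integer table;
	tab%() and n% (the number of entries) are assumed to be GLOBALs
	declared by a calling procedure. All names are hypothetical:

		PROC zipidx%:(zip$)
		LOCAL p%
		p% = LOC("3193,2001,4578,8900",zip$)
		IF p% = 0 : RETURN 0 : ENDIF	rem not in table
		RETURN (p%+4)/5			rem positions 1,6,11,16 -> entries 1..4
		ENDP

		PROC bsearch%:(key%)
		LOCAL lo%,hi%,mid%
		lo% = 1 : hi% = n%
		WHILE lo% <= hi%
			mid% = (lo%+hi%)/2		rem integer division
			IF tab%(mid%) = key%
				RETURN mid%		rem found: index of the entry
			ELSEIF tab%(mid%) < key%
				lo% = mid%+1		rem continue in upper half
			ELSE
				hi% = mid%-1		rem continue in lower half
			ENDIF
		ENDWH
		RETURN 0				rem not found
		ENDP

	For a table of 1000 entries the binary search needs at most 10 passes
	through the loop, compared with up to 1000 for a linear scan.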

6. Globals

	Globals need some consideration, too.
	Each time a procedure is called, the OPL-interpreter will try to resolve
	the unresolved externals of the procedure. To do so, it will browse through
	the whole call hierarchy of procedures and will scan the list of global names
	for that particular name. If found, the reference is resolved. Otherwise it will
	continue searching through the call hierarchy until the root procedure is reached.
	Then you get an error message "Undefined external" and curse yourself that you
	did not use OPLLINT.

	From the above it follows that the fewer global variables you have, the better.
	Obviously, shorter names will compare a bit faster than long ones
	(they also take less space), and it is also a good idea to define globals used 
	in frequently called procedures at the beginning of the global definitions.

7. Control structures

	DO...UNTIL is a fraction faster and shorter than WHILE ... ENDWH.
	DO...UNTIL uses one conditional jump at the end of a loop, while
	WHILE ... ENDWH uses a conditional jump at the beginning of the
	loop and an unconditional jump back at the end of the loop.

	Loop limits in DO or WHILE statements are always computed in full.
	In
		WHILE i% < len(t$)
			i% = i%+1
			.....
		ENDWH

	len(t$) is computed for each pass through the loop. It would be better
	to write
		l% = len(t$)
		WHILE i% < l%
			i% = i%+1
			.....
		ENDWH
	or even
		l% = len(t$)
		WHILE l%
			.....
			l% = l%-1
		ENDWH

	And, of course, anything that can be computed outside of a loop should
	indeed be computed outside of the loop.

	Logic expressions like  A=B and ( B=C or C=D) are always evaluated in
	full, so even if A <> B the runtime system would still evaluate the rest of
	the expression. This is reassuring if you use procedures with side effects
	in these expressions, but it can slow the program down.
	It might be better to use nested IFs instead.
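
	A sketch of the nested form for the expression above. The inner test
	is only evaluated when A = B holds; integer variables are assumed here
	purely for illustration:

		IF a% = b%
			IF b% = c% OR c% = d%
				rem action for the true case
			ENDIF
		ENDIF

	Put the test that fails most often on the outside, so that the inner
	expression is skipped as often as possible.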

	The fastest way to test conditions is the direct test of short integers:

			IF a% : print "on" : else : print "off" : endif		(1)

	This is 3 bytes shorter and faster than the equivalent

			IF a% <> 0 : print "on" : else : print "off" : endif	(2)

	This is only true for short integers.
	Long integer or float variables used in a type 1 statement are always expanded by
	the compiler to a type 2 statement. This is important if you test the result of a
	routine:

			IF bool: : print "on" : endif

	will be expanded to

			IF bool: <> 0.0 : print "on" : endif

	This is 10 bytes longer and slower than testing an integer routine, so it is better to
	define the routine as integer:

			IF bool%: : print "on" : endif

	Testing strings should be done with the len function.

			IF len(t$)

	is better than
			IF t$ > ""

	Testing the first character of a string should be done with the ASC function.

			IF asc(t$) = %a

	is better than

			IF left$(t$,1) = "a"

	but note that asc can be ambiguous: Both strings CHR$(0) and "" return 0.


	A very efficient control structure is VECTOR. Using VECTOR instead of lengthy
	IF ... ELSEIF  ... ENDIF  structures saves time and space.
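
	A minimal sketch of VECTOR, assuming k% is expected to hold 1, 2 or 3.
	The procedure name and the labels are hypothetical. If k% is outside
	the range of the label list, execution simply continues after ENDV:

		PROC choose:(k%)
		VECTOR k%
			opt1,opt2,opt3
		ENDV
		PRINT "k% out of range" : GOTO done
		opt1::
		PRINT "one" : GOTO done
		opt2::
		PRINT "two" : GOTO done
		opt3::
		PRINT "three"
		done::
		GET
		ENDP

	The equivalent IF ... ELSEIF chain would test k% up to three times;
	VECTOR jumps directly to the selected label.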

8. Initialising variables
	
	When a new procedure is invoked, OPL automatically initialises all local and
	global variables defined in that procedure. Numeric variables are initialised to 0
	while string variables are initialised with the empty string "".
	So you only need to initialise variables whose initial value differs from these.

Note: 	If you allocate heap memory with alloc, the allocated area will not be
	initialised - you have to poke it to zero. I didn't find a system routine to initialise
	a memory area, so here is an assembler routine that does the trick.

		Global	pad&(3)
		pad&(1)=&067CCBFF
		pad&(2)=&E9004888
		pad&(3)=&00CBFFF6

		usr(addr(pad&()),ax%,bx%,cx%,0)

	ax% points to beginning of the area to initialise
	bx% is the length to initialise
	cx% is the padding character (0 for clearing)

9. Using system calls

	There are some system calls that can reduce a lengthy and time consuming
	OPL-call to a single call.

	call($00a1,0,len,0,*from,*to)

	copies (len) bytes from *from to *to. In our experience it handles even
	overlapping fields in both directions flawlessly.

	Example: (saves 2 OPL-loops with 999 iterations)

		local a%(1000)
		call($00a1,0,1998,0,addr(a%(2)),addr(a%(1)))
		rem  copies elements 2-1000 to elements 1-999
		call($00a1,0,1998,0,addr(a%(1)),addr(a%(2)))
		rem  copies elements 1-999 to elements 2-1000

	call($00ac,0,0,0,*from,*to)

	copies a zero terminated string from *from to *to

	call($00b9,0,0,0,0,*strng)

	returns the length of a zero terminated string.

	So to convert a zero terminated string into an OPL string:

	call($00ac,0,0,0,from%,uadd(addr(to$),1))  : rem to$ must be 1 byte longer
	pokeb addr(to$),call($00b9,0,0,0,0,from%) : rem to set length

	Note: CALL works with 1 to 6 parameters. When fewer than 6 parameters
	are used, the runtime system will insert zero values for the missing parameters.
	CALL will run faster and require less space when fewer parameters are
	specified.


10. Benchmarks

	If in doubt, write a short routine to test the speed of an instruction.

	Put the instruction inside a loop:
	
	proc benchm:
	local t&,i%
	t& = int(hour)*3600+minute*60+second
	while i% < 10000
		i% = i%+1
		instruction to test
	endwh
	print (int(hour)*3600+minute*60+second - t&)/10.0,"msec"
	get
	endp

	If you want to test procedure calls, reduce the number of passes to 1000
	and remove the "/10.0".
	If you want precise absolute time, run the program with an empty loop first.
	Then subtract the result from the result for the test with the instruction.



Berthold Daum, CIS: 100026,3365