Add files via upload

version 1.09
ForwardCom · Apr 25, 2020 · e6331b8
1 parent d5d9f54
commit e6331b8
Showing 12 changed files with 500 additions and 308 deletions.
diff --git a/forwardcom.pdf b/forwardcom.pdf
diff --git a/forwardcom.tex b/forwardcom.tex
@@ -1,5 +1,7 @@
 \documentclass[11pt,a4paper,oneside,openright]{report}
 
+% compile with XeLaTeX or LuaLaTeX, not pdfLaTeX
+
 \usepackage[bindingoffset=5mm,left=20mm,right=20mm,top=20mm,bottom=20mm,footskip=10mm]{geometry}
 \usepackage[utf8x]{inputenc}
 \usepackage{hyperref}
@@ -12,9 +14,16 @@
 \usepackage{cmap} % avoid fi ligatures in pdf file
 \usepackage{amsthm} % example numbering
 \usepackage{color}
+\usepackage[T1]{fontenc} % fix problem with underscore not searchable
+\usepackage{fontspec}
+\defaultfontfeatures{Mapping=tex-text}
+%\setmainfont{Verdana}
+\setmainfont{Arial}
+\setsansfont{Arial}
+\renewcommand{\familydefault}{\sfdefault}
+
 
 % modify style
-\renewcommand{\familydefault}{\sfdefault}
 \newtheorem{example}{Example}[chapter] % example numbering
 \lstset{language=C} % formatting for code listing
 \lstset{basicstyle=\ttfamily,breaklines=true}

diff --git a/fwc_abi_standard.tex b/fwc_abi_standard.tex
@@ -71,7 +71,7 @@ \section{Binary data representation} \label{binaryDataRepresentation}
 Integer variables are represented with 8, 16, 32, 64, and optionally 128 bits, signed and unsigned. Signed integers use 2's complement representation. Integer overflow wraps around, except in saturated arithmetic instructions. 
 \vspace{2mm}
 
-Floating point numbers are coded with single precision (32 bits) and double precision (64 bits). There is limited support for half precision (16 bits) and optional support for quadruple precision (128 bits). All follow the IEEE Standard 754-2008.
+Floating point numbers are coded with half precision (16 bits), single precision (32 bits), and double precision (64 bits). Support for quadruple precision (128 bits) is optional. All follow the IEEE 754-2019 standard. Subnormal numbers are supported for half precision, and optionally for single and double precision.
 \vspace{2mm}
 
 Floating point variables with NAN values can contain and propagate diagnostic information about the cause of errors as discussed on page \pageref{nanPropagation}. 
@@ -143,7 +143,7 @@ \section{Function calling convention}\label{chap:functionCallingConventions}
 
 
 \subsubsection{Rationale}
-It is much more efficient to transfer parameters in registers than on the stack. The present proposal allows up to 32 parameters, including variable length vectors, to be transferred in registers, leaving 15 general purpose registers and 16 vector registers for the function to use for other purposes while handing the parameters. This will cover almost all practical cases, so that parameters only rarely need to be stored in memory. 
+It is much more efficient to transfer parameters in registers than on the stack. The present proposal allows up to 32 parameters, including variable length vectors, to be transferred in registers, leaving 15 general purpose registers and 16 vector registers for the function to use for other purposes while handling the parameters. This will cover almost all practical cases, so that parameters only rarely need to be stored in memory. 
 \vspace{2mm}
 
 Nevertheless, we must have precise rules for covering an unlimited number of parameters if the programming language has no limit to the number of parameters. We are putting any extra parameters in a list rather than on the stack as most other systems do. The main reason for this is to make the software independent of whether there is a separate call stack or the same stack is used for return addresses and local variables. The addresses of parameters on the stack would depend on whether there is a return address on the same stack. The list method has further advantages. There will be no disagreement over the order of parameters on the stack and whether the stack should be cleaned up by the caller or the callee. The list can be reused by the caller for multiple calls if the parameters are constant, and the called function can reuse a variable argument list by forwarding it to another function. The function is guaranteed to return properly without messing up the stack even if caller and callee disagree on the number of parameters. Tail calls are possible in all cases regardless of the number and types of parameters. 
@@ -183,7 +183,7 @@ \subsubsection{Method 2}
 Function B is preferably compiled first into an object file. This object file must contain information about which registers are modified by function B. The necessary information is simply a 64-bit number with one bit for each register that is modified (bit 0-31 for r0-r31, and bit 32-63 for v0-v31). Any registers used for parameters and return value are also marked if they are modified by the function.
 \vspace{2mm}
 
-When function A is compiled next, the compiler will look in the object file for B to see which registers it modifies. The compiler will choose some registers not modified by B for data that need to be saved across the call to B. Registers that are modified by B can advantageously be used in A for temporary variables that do not need to be saved across the call to B. Likewise, it will be advantageous to use the same register for multiple temporary variables if their live ranges do not overlap, in order to modify as few registers as possible. The object file for A will contain a list of registers modified by A, including all registers modified by B and by any other functions that A may call. The object file for A contains a reference to function B. This reference must contain information about which registers A expects B to modify. If B is later recompiled, and the new version of B modifies more registers, then the linker will detect the discrepancy and prompt for a recompilation of A.
+When function A is compiled next, the compiler can look in the object file for B to see which registers it modifies. The compiler can take advantage of this information and choose some registers not modified by B for data that need to be saved across the call to B. Registers that are modified by B can be used in A for temporary variables that do not need to be saved across the call to B. Likewise, it will be advantageous to use the same register for multiple temporary variables if their live ranges do not overlap, in order to modify as few registers as possible. The object file for A will contain a list of registers modified by A, including all registers modified by B and by any other functions that A may call. The object file for A contains a reference to function B. This reference must contain information about which registers A expects B to modify. If B is later recompiled, and the new version of B modifies more registers, then the linker will detect the discrepancy and prompt for a recompilation of A.
 \vspace{2mm}
 
 If, for some reason, A is compiled before B or no information is available about B when A is compiled, then the compiler will have to make assumptions about the register use of B. The default assumption is as specified in method 1. Function A may later be recompiled if B violates these assumptions, or simply to improve efficiency. 
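
The register-use information described under Method 2 can be illustrated with a short C sketch. The bit layout (bits 0-31 for r0-r31, bits 32-63 for v0-v31) is taken from the text above. The type and function names (`regmask_t`, `gp_reg`, `vec_reg`, `combined_mask`, `call_is_compatible`) are hypothetical and chosen only for this illustration; they are not part of the ForwardCom tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per register that a function modifies:
   bit n (0-31) stands for r<n>, bit 32+n stands for v<n>. */
typedef uint64_t regmask_t;

/* Mask bit for general purpose register r<n> */
static regmask_t gp_reg(int n) {
    assert(n >= 0 && n < 32);
    return (regmask_t)1 << n;
}

/* Mask bit for vector register v<n> */
static regmask_t vec_reg(int n) {
    assert(n >= 0 && n < 32);
    return (regmask_t)1 << (32 + n);
}

/* Mask recorded for a caller: the registers it modifies itself
   plus all registers modified by the functions it calls. */
static regmask_t combined_mask(regmask_t own,
                               const regmask_t *callees, int count) {
    regmask_t mask = own;
    for (int i = 0; i < count; i++) {
        mask |= callees[i];
    }
    return mask;
}

/* Check of the kind a linker could make: the call is safe if the callee
   modifies no register beyond those the caller expects it to modify. */
static bool call_is_compatible(regmask_t expected_by_caller,
                               regmask_t modified_by_callee) {
    return (modified_by_callee & ~expected_by_caller) == 0;
}
```

If B is recompiled and its new mask has a bit set that is not in the mask A expected, `call_is_compatible` returns false. This corresponds to the case where the linker prompts for a recompilation of A.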

diff --git a/fwc_basic_architecture.tex b/fwc_basic_architecture.tex
@@ -10,7 +10,7 @@ \chapter{Basic architecture}
 \section{A fully orthogonal instruction set}
 The ForwardCom instruction set is fully orthogonal in all respects. 
 Where other instruction sets have a large number of different instructions for different register types, operand types, operand sizes, addressing modes, etc., ForwardCom has fewer instructions, but many variants of each instruction. This modular design makes the hardware implementation much simpler.
-The same instruction can use integer operands of all sizes and floating point operands of all precisions. It can use register operands, memory operands or immediate operands. It can use many different addressing modes. Instructions can be coded in short forms with two operands where the same register is used for destination and source operand, or longer forms with three operands. It can work with scalars or vectors of any size. It can have predication or masks for conditional execution at the vector element level, and it can have optional flag inputs for deciding rounding mode, exception control and other details, where appropriate. Data constants of all types can be included in the instructions and compressed in various ways to reduce the instruction size.
+The same instruction can use integer operands of all sizes and floating point operands of all precisions. It can use register operands, memory operands or immediate operands. It can use many different addressing modes. Instructions can be coded in short forms with two operands where the same register is used for destination and source operand, or longer forms with three operands. It can work with scalars or vectors of any size. It can have predication or masks for conditional execution at the vector element level, and it can have optional flag inputs for determining rounding mode, exception control and other details, where appropriate. Data constants of all types can be included in the instructions and compressed in various ways to reduce the instruction size.
 
 \subsubsection{Rationale}
 The orthogonality is implemented by a standardized modular design that makes the hardware implementation simpler. It also makes compilation simpler and more flexible and makes it easier for the compiler to convert linear code to vector code.
@@ -23,7 +23,7 @@ \section{Instruction size}
 An instruction can consist of one, two, or optionally three 32-bit words. The code density can be increased by using tiny instructions of half the size, but the 32-bit unit size is preserved by pairing tiny instructions two-by-two. It is not possible to jump to the second tiny instruction in such a pair of tiny instructions. It is possible to add future extensions with instruction sizes of four or more words.
 
 \subsubsection{Rationale}
-A CISC architecture with many different instruction sizes is inefficient in superscalar processors where we want to execute several instructions per clock cycle. The decoding front end is often a bottleneck. You have to determine the length of the first instruction before you know where the next instruction begins. The ``instruction length decoding'' is a fundamentally serial process which makes it difficult to decode multiple instructions per clock cycle. Some microprocessors have an extra ``micro-operations cache'' after the decoder in order to circumvent this bottleneck.
+A CISC architecture with many different instruction sizes is inefficient in superscalar processors where we want to execute several instructions per clock cycle. The decoding front end is often a bottleneck, especially in the x86 architecture. The decoder has to determine the length of the first instruction before it knows where the next instruction begins. The ``instruction length decoding'' is a fundamentally serial process which makes it difficult to decode multiple instructions per clock cycle. Some microprocessors have an extra ``micro-operations cache'' after the decoder in order to circumvent this bottleneck.
 \vspace{2mm}
 
 Here, it is desired to have as few different instruction lengths as possible and to make it easy to determine the length of each instruction. We want a small instruction size for the most common simple instructions, but we also need a larger instruction size in order to accommodate things like a larger register set, instructions with multiple operands, vector operations with advanced features, 32-bit address offsets, and large immediate constants. This proposal is a compromise between code compactness, easy decoding, and space for advanced features.
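
The serial nature of instruction length decoding can be sketched in C. The encoding used here is an assumption made only for illustration and is not confirmed by the text above: the top two bits of the first 32-bit word are taken to give the instruction length, with the values 0 and 1 meaning one word, 2 meaning two words, and 3 meaning three words. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMED encoding, for illustration only: the top two bits of the
   first word select the length in 32-bit words (0 or 1 -> 1 word,
   2 -> 2 words, 3 -> 3 words). */
static size_t instr_words(uint32_t first_word) {
    unsigned il = first_word >> 30;
    return il < 2 ? 1 : il;
}

/* Record the word index at which each instruction starts.
   The position of instruction k+1 is not known until the length of
   instruction k has been determined, so the loop is inherently serial.
   Returns the number of instruction starts written to starts[]. */
static size_t find_instr_starts(const uint32_t *code, size_t nwords,
                                size_t *starts, size_t max_starts) {
    assert(code != NULL && starts != NULL);
    size_t pos = 0;
    size_t count = 0;
    while (pos < nwords && count < max_starts) {
        starts[count++] = pos;
        pos += instr_words(code[pos]);
    }
    return count;
}
```

With only a few possible lengths, all determined by a small field in the first word, each step of this loop is cheap. An instruction set where the length depends on prefixes and several later bytes makes every step of the same loop more expensive.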

diff --git a/fwc_bintools.tex b/fwc_bintools.tex
@@ -158,7 +158,7 @@ \subsection{Command line} \label{assemblerCommandLine}
 
 \vspace{2mm}
 The following options are supported:\\
-\begin{tabular}{|p{22mm}p{140mm}|}
+\begin{tabular}{|p{25mm}p{135mm}|}
 \hline
 -list=name & Make output list file. This is very useful for checking the generated code.\\
 -O0 & Optimization level 0: The assembler finds the smallest possible instruction that fits the specified operands. Two consecutive tiny instructions are joined together if possible. \\
@@ -1177,6 +1177,10 @@ \section{Emulator and debugger} \label{emulator}
 Interactive single-step debugging is currently not supported.
 \vspace{2mm}
 
+The current version of the emulator supports all general instructions but only a few system instructions. Integers of 8, 16, 32, and 64 bits are supported. Floating point numbers with half, single, and double precision are supported. Quadruple precision is not supported. Only a few instructions with 128-bit integers are supported. 
+Most optional features are supported by the emulator, including exception handling, rounding control, and subnormal numbers.
+\vspace{2mm}
+
 \section{Dump utility} \label{dumpUtililty}
 
 The dump utility can show metadata from object files and executable files.
@@ -1203,15 +1207,10 @@ \section{Dump utility} \label{dumpUtililty}
 
 
 \section{Compiling the forw tools} \label{compilingForw}
-
-These tools can be compiled for Windows, Linux, MacOS, and other platforms.
+These tools can be compiled for Windows, Linux, MacOS, and other platforms. 
+See the file forwardcom\_sourcecode\_documentation for details.
 \vspace{2mm}
 
-Compiling for Windows with MS Visual Studio: Use the project file forw.vcxproj.
-
-Compiling with Gnu C++ compiler: Use the makefile.
-
-Other compilers: Make a project containing all the .cpp files. Compile for console mode, preferably 64 bits. The platform must have little-endian memory organization.
 
 \section{Code examples} \label{codeExamples}
 A collection of code examples is provided in the examples folder. You can try an example by assembling, linking, and emulating it as follows: