Documentation improvements
- multiformat column widths (fixes SheetJS#591 h/t @sheeeeep)
- skip nested BIFF files
SheetJSDev committed Mar 20, 2017
1 parent ea7a951 commit 245dd7f
Showing 40 changed files with 1,637 additions and 141 deletions.
2 changes: 1 addition & 1 deletion .npmignore
@@ -37,5 +37,5 @@ test.js
.flowconfig
*.flow.js
bits/
odsbits/
docbits/
tests/
6 changes: 5 additions & 1 deletion Makefile
@@ -144,8 +144,12 @@ misc/coverage.html: $(TARGET) test.js
coveralls: ## Coverage Test + Send to coveralls.io
mocha --require blanket --reporter mocha-lcov-reporter -t 20000 | node ./node_modules/coveralls/bin/coveralls.js

READEPS=$(sort $(wildcard docbits/*.md))
README.md: $(READEPS)
awk 'FNR==1{p=0}/#/{p=1}p' $^ | tr -d '\15\32' > $@

.PHONY: readme
readme: ## Update README Table of Contents
readme: README.md ## Update README Table of Contents
markdown-toc -i README.md

.PHONY: help
141 changes: 123 additions & 18 deletions README.md
@@ -35,10 +35,12 @@ with a unified JS representation, and ES3/ES5 browser compatibility back to IE6.
- [Workbook / Worksheet / Cell Object Description](#workbook--worksheet--cell-object-description)
* [General Structures](#general-structures)
* [Cell Object](#cell-object)
* [Data Types](#data-types)
* [Formulae](#formulae)
+ [Data Types](#data-types)
* [Worksheet Object](#worksheet-object)
* [Workbook Object](#workbook-object)
* [Document Features](#document-features)
+ [Formulae](#formulae)
+ [Column Properties](#column-properties)
- [Parsing Options](#parsing-options)
* [Input Type](#input-type)
* [Guessing File Type](#guessing-file-type)
@@ -443,7 +445,7 @@ text from the number format (`cell.z`) and the raw value if possible.
The actual array formula is stored in the `f` field of the first cell in the
array range. Other cells in the range will omit the `f` field.

### Data Types
#### Data Types

The raw value is stored in the `v` field, interpreted based on the `t` field.

@@ -482,21 +484,6 @@ Type `z` represents blank stub cells. These do not have any data or type, and
are not processed by any of the core library functions. By default these cells
will not be generated; the parser `sheetStubs` option must be set to `true`.

### Formulae

The A1-style formula string is stored in the `f` field. Even though different
file formats store the formulae in different ways, the formats are translated.

Shared formulae are decompressed and each cell has the correct formula.

Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have a `F` field corresponding to the range. A single-cell
formula can be distinguished from a plain formula by the presence of `F` field.

The `sheet_to_formulae` method generates one line per formula or array formula.
Array formulae are rendered in the form `range=formula` while plain cells are
rendered in the form `cell=formula or value`.

### Worksheet Object

Each key that does not start with `!` maps to a cell (using `A-1` notation)
@@ -548,6 +535,123 @@ standard, XLS parsing stores core properties in both places. .
The workbook's epoch can be determined by examining the workbook's
`wb.WBProps.date1904` property.
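
As a rough illustration of why the epoch matters, the sketch below converts a date serial to a JS `Date` under either date system. The helper name `serialToDate` is hypothetical and not part of the library; it takes the `date1904` flag as a plain boolean and assumes serials on or after 1900-03-01.

```js
// Hypothetical helper: convert a spreadsheet date serial to a JS Date (UTC).
// Assumes serials on or after 1900-03-01, which sidesteps the 1900 leap-year quirk.
function serialToDate(serial, date1904) {
  // The 1904 date system starts 1462 days after the 1900 date system.
  var days = date1904 ? serial + 1462 : serial;
  // 25569 is the 1900-system serial number of 1970-01-01.
  return new Date(Math.round((days - 25569) * 86400 * 1000));
}

// The same calendar day has different serials in the two systems.
var from1900 = serialToDate(25569, false);
var from1904 = serialToDate(24107, true);
```

Both calls describe 1970-01-01, so a reader that ignores the flag would be off by 1462 days.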

### Document Features

Even for basic features like date storage, the official Excel formats store the
same content in different ways. The parsers are expected to convert from the
underlying file format representation to the Common Spreadsheet Format. Writers
are expected to convert from CSF back to the underlying file format.

#### Formulae

The A1-style formula string is stored in the `f` field. Even though different
file formats store the formulae in different ways, the formats are translated.
Even though some formats store formulae with a leading equal sign, CSF formulae
do not start with `=`.

The worksheet representation of A1=1, A2=2, A3=A1+A2:

```js
{
"!ref": "A1:A3",
A1: { t:'n', v:1 },
A2: { t:'n', v:2 },
A3: { t:'n', v:3, f:'A1+A2' }
}
```

Shared formulae are decompressed and each cell has the formula corresponding to
its cell. Writers generally do not attempt to generate shared formulae.

Cells with formula entries but no value will be serialized in a way that Excel
and other spreadsheet tools will recognize. This library will not automatically
compute formula results! For example, to compute `BESSELJ` in a worksheet:

```js
{
"!ref": "A1:A3",
A1: { t:'n', v:3.14159 },
A2: { t:'n', v:2 },
A3: { t:'n', f:'BESSELJ(A1,A2)' }
}
```

**Array Formulae**

Array formulae are stored in the top-left cell of the array block. All cells
of an array formula have an `F` field corresponding to the range. A single-cell
array formula can be distinguished from a plain formula by the presence of the
`F` field.

For example, setting the cell `C1` to the array formula `{=SUM(A1:A3*B1:B3)}`:

```js
worksheet['C1'] = { t:'n', f: "SUM(A1:A3*B1:B3)", F:"C1:C1" };
```

For a multi-cell array formula, every cell has the same array range but only the
first cell has content. Consider `D1:D3=A1:A3*B1:B3`:

```js
worksheet['D1'] = { t:'n', F:"D1:D3", f:"A1:A3*B1:B3" };
worksheet['D2'] = { t:'n', F:"D1:D3" };
worksheet['D3'] = { t:'n', F:"D1:D3" };
```

Utilities and writers are expected to check for the presence of an `F` field and
ignore any possible formula element `f` in cells other than the starting cell.
They are not expected to perform validation of the formulae!

**Formula Output**

The `sheet_to_formulae` method generates one line per formula or array formula.
Array formulae are rendered in the form `range=formula` while plain cells are
rendered in the form `cell=formula or value`. Note that string literals are
prefixed with an apostrophe `'`, consistent with Excel's formula bar display.
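
The line shapes described above can be sketched with a small stand-alone formatter. `formulaLines` is a hypothetical helper written for illustration, not the library's implementation of `sheet_to_formulae`, and the exact output of the real method may differ in details.

```js
// Hypothetical formatter (not the library's implementation) that mirrors
// the line shapes described above for a CSF worksheet object.
function formulaLines(ws) {
  var lines = [];
  Object.keys(ws).forEach(function (addr) {
    if (addr.charAt(0) === '!') return;            // skip "!ref", "!cols", ...
    var cell = ws[addr];
    if (cell.F) {
      // Array formula: emit once, from the first cell of the range,
      // and ignore any `f` found in the other cells of the range.
      if (addr === cell.F.split(':')[0] && cell.f) lines.push(cell.F + '=' + cell.f);
    } else if (cell.f) {
      lines.push(addr + '=' + cell.f);             // plain formula
    } else if (cell.t === 's') {
      lines.push(addr + "='" + cell.v);            // string literal with apostrophe
    } else if (cell.v !== undefined) {
      lines.push(addr + '=' + cell.v);             // other raw values
    }
  });
  return lines;
}

var ws = {
  "!ref": "A1:B3",
  A1: { t:'s', v:'abc' },
  A2: { t:'n', v:2 },
  A3: { t:'n', v:3, f:'A2+1' },
  B1: { t:'n', F:"B1:B2", f:"A2:A3*2" },
  B2: { t:'n', F:"B1:B2" }
};
var lines = formulaLines(ws);
```

Here `lines` holds one entry for each plain cell and a single `range=formula` entry for the array block `B1:B2`.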

**Formulae File Format Details**

| Storage Representation | Formats | Read | Write |
|:-----------------------|:-------------------------|:-----:|:-----:|
| A1-style strings | XLSX | :o: | :o: |
| RC-style strings | XLML and plaintext | :o: | :o: |
| BIFF Parsed formulae | XLSB and all XLS formats | :o: | |
| OpenFormula formulae | ODS/FODS/UOS | :o: | :o: |

Since Excel prohibits named cells from colliding with names of A1 or RC style
cell references, a (not-so-simple) regex conversion is possible. BIFF Parsed
formulae have to be explicitly unwound. OpenFormula formulae can be converted
with regexes for the most part.

#### Column Properties

Excel internally stores column widths in a nebulous "Max Digit Width" form. The
Max Digit Width is the width of the largest digit when rendered. The internal
width must be an integer multiple of the width divided by 256. ECMA-376
describes a formula for converting between pixels and the internal width.

Given the constraints, it is possible to determine the MDW without actually
inspecting the font! The parsers guess the pixel width by converting from width
to pixels and back, repeating for all possible MDW and selecting the MDW that
minimizes the error. XLML actually stores the pixel width, so the guess works
in the opposite direction.
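
A minimal sketch of the round-trip guess follows. The three conversions are modeled on the width formulas in `bits/45_styutils.js` from this commit, but take the MDW as an explicit parameter instead of a shared variable; `guessMDW` is a hypothetical name, and the search range of 1 through 14 is taken from the `MIN_MDW` and `MAX_MDW` constants there.

```js
// Conversions between internal width, pixels and characters for a given MDW.
function width2px(width, mdw) { return Math.floor((width + Math.round(128 / mdw) / 256) * mdw); }
function px2char(px, mdw) { return Math.floor((px - 5) / mdw * 100 + 0.5) / 100; }
function char2width(chr, mdw) { return Math.round((chr * mdw + 5) / mdw * 256) / 256; }

// Hypothetical helper: try each candidate MDW and keep the one whose
// width -> pixels -> characters -> width round trip lands closest to the input.
function guessMDW(width) {
  var best = 1, delta = Infinity;
  for (var mdw = 1; mdw < 15; ++mdw) {
    var cycled = char2width(px2char(width2px(width, mdw), mdw), mdw);
    var err = Math.abs(width - cycled);
    if (err < delta) { delta = err; best = mdw; }
  }
  return best;
}

// 9.14453125 is the stored width of an 8.43-character column when MDW is 7.
var mdw = guessMDW(9.14453125);
```

Ties are resolved in favor of the smallest MDW because the comparison is strict; that is a choice made for this sketch.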

The `!cols` array in each worksheet, if present, is a collection of `ColInfo`
objects which have the following properties:

```typescript
type ColInfo = {
MDW?:number; // Excel's "Max Digit Width" unit, always integral
width:number; // width in Excel's "Max Digit Width", width*256 is integral
wpx?:number; // width in screen pixels
wch?:number; // intermediate character calculation
};
```

Even though all of the information is made available, writers are expected to
follow the priority order:

1) use `width` field if available
2) use `wpx` pixel width if available
3) use `wch` character count if available

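
The priority order above could be sketched as a single lookup. `resolveWidth` is a hypothetical helper rather than the writer's actual code; it falls back to an MDW of 7 when the column does not carry one, matching the `DEF_MDW` default in `bits/45_styutils.js`.

```js
// Conversions for a given MDW, modeled on bits/45_styutils.js.
function px2char(px, mdw) { return Math.floor((px - 5) / mdw * 100 + 0.5) / 100; }
function char2width(chr, mdw) { return Math.round((chr * mdw + 5) / mdw * 256) / 256; }

// Hypothetical helper: return the internal width a writer would emit,
// or undefined when the column carries no width information.
function resolveWidth(col) {
  var mdw = col.MDW || 7;                         // assumed default MDW
  if (col.width != null) return col.width;        // 1) explicit width wins
  if (col.wpx != null) return char2width(px2char(col.wpx, mdw), mdw); // 2) pixels
  if (col.wch != null) return char2width(col.wch, mdw);               // 3) characters
  return undefined;
}

var fromWidth = resolveWidth({ width: 10, wpx: 64, wch: 8.43 });
var fromPixels = resolveWidth({ wpx: 64 });
var fromChars = resolveWidth({ wch: 8.43 });
```

In the first call the `wpx` and `wch` fields are ignored because `width` is present.
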
## Parsing Options

The exported `read` and `readFile` functions accept an options argument:
@@ -1001,6 +1105,7 @@ OSP-covered specifications:
- [MS-XLDM]: Spreadsheet Data Model File Format
- [MS-EXSPXML3]: Excel Calculation Version 2 Web Service XML Schema
- [XLS]: Microsoft Office Excel 97-2007 Binary File Format Specification
- [MS-OI29500]: Office Implementation Information for ISO/IEC 29500 Standards Support

Open Document Format for Office Applications Version 1.2 (29 September 2011)

2 changes: 1 addition & 1 deletion bits/20_jsutils.js
@@ -75,7 +75,7 @@ function str2cc(str) {
}

function dup(o/*:any*/)/*:any*/ {
if(typeof JSON != 'undefined') return JSON.parse(JSON.stringify(o));
if(typeof JSON != 'undefined' && !Array.isArray(o)) return JSON.parse(JSON.stringify(o));
if(typeof o != 'object' || o == null) return o;
var out = {};
for(var k in o) if(o.hasOwnProperty(k)) out[k] = dup(o[k]);
34 changes: 24 additions & 10 deletions bits/39_xlsbiff.js
@@ -239,10 +239,11 @@ function parse_RecalcId(blob, length) {
}

/* 2.4.87 */
function parse_DefaultRowHeight (blob, length) {
var f = length == 4 ? blob.read_shift(2) : 0, miyRw;
miyRw = blob.read_shift(2); // flags & 0x02 -> hidden, else empty
function parse_DefaultRowHeight(blob, length) {
var f = blob.read_shift(2);
var fl = {Unsynced:f&1,DyZero:(f&2)>>1,ExAsc:(f&4)>>2,ExDsc:(f&8)>>3};
/* char is misleading, miyRw and miyRwHidden overlap */
var miyRw = blob.read_shift(2);
return [fl, miyRw];
}

@@ -328,12 +329,13 @@ function parse_MulBlank(blob, length) {
}

/* 2.5.20 2.5.249 TODO: interpret values here */
function parse_CellStyleXF(blob, length, style) {
function parse_CellStyleXF(blob, length, style, opts) {
var o = {};
var a = blob.read_shift(4), b = blob.read_shift(4);
var c = blob.read_shift(4), d = blob.read_shift(2);
o.patternType = XLSFillPattern[c >> 26];

if(!opts.cellStyles) return o;
o.alc = a & 0x07;
o.fWrap = (a >> 3) & 0x01;
o.alcV = (a >> 4) & 0x07;
@@ -367,16 +369,16 @@ function parse_CellStyleXF(blob, length, style) {
o.fsxButton = (d >> 14) & 0x01;
return o;
}
function parse_CellXF(blob, length) {return parse_CellStyleXF(blob,length,0);}
function parse_StyleXF(blob, length) {return parse_CellStyleXF(blob,length,1);}
function parse_CellXF(blob, length, opts) {return parse_CellStyleXF(blob,length,0, opts);}
function parse_StyleXF(blob, length, opts) {return parse_CellStyleXF(blob,length,1, opts);}

/* 2.4.353 TODO: actually do this right */
function parse_XF(blob, length) {
function parse_XF(blob, length, opts) {
var o = {};
o.ifnt = blob.read_shift(2); o.ifmt = blob.read_shift(2); o.flags = blob.read_shift(2);
o.fStyle = (o.flags >> 2) & 0x01;
length -= 6;
o.data = parse_CellStyleXF(blob, length, o.fStyle);
o.data = parse_CellStyleXF(blob, length, o.fStyle, opts);
return o;
}

@@ -626,12 +628,24 @@ function parse_XFCRC(blob, length) {
return o;
}

/* 2.4.53 TODO: parse flags */
/* [MS-XLSB] 2.4.323 TODO: parse flags */
function parse_ColInfo(blob, length, opts) {
if(!opts.cellStyles) return parsenoop(blob, length);
var w = opts && opts.biff >= 12 ? 4 : 2;
var colFirst = blob.read_shift(w);
var colLast = blob.read_shift(w);
var coldx = blob.read_shift(w);
var ixfe = blob.read_shift(w);
var flags = blob.read_shift(2);
if(w == 2) blob.l += 2;
return {s:colFirst, e:colLast, w:coldx, ixfe:ixfe, flags:flags};
}


var parse_Style = parsenoop;
var parse_StyleExt = parsenoop;

var parse_ColInfo = parsenoop;

var parse_Window2 = parsenoop;


42 changes: 34 additions & 8 deletions bits/45_styutils.js
@@ -50,17 +50,43 @@ function rgb_tint(hex, tint) {
}

/* 18.3.1.13 width calculations */
/* [MS-OI29500] 2.1.595 Column Width & Formatting */
var DEF_MDW = 7, MAX_MDW = 15, MIN_MDW = 1, MDW = DEF_MDW;
function width2px(width) { return (( width + ((128/MDW)|0)/256 )* MDW )|0; }
function px2char(px) { return (((px - 5)/MDW * 100 + 0.5)|0)/100; }
function char2width(chr) { return (((chr * MDW + 5)/MDW*256)|0)/256; }
function width2px(width) { return Math.floor(( width + (Math.round(128/MDW))/256 )* MDW ); }
function px2char(px) { return (Math.floor((px - 5)/MDW * 100 + 0.5))/100; }
function char2width(chr) { return (Math.round((chr * MDW + 5)/MDW*256))/256; }
function px2char_(px) { return (((px - 5)/MDW * 100 + 0.5))/100; }
function char2width_(chr) { return (((chr * MDW + 5)/MDW*256))/256; }
function cycle_width(collw) { return char2width(px2char(width2px(collw))); }
function find_mdw(collw, coll) {
if(cycle_width(collw) != collw) {
for(MDW=DEF_MDW; MDW>MIN_MDW; --MDW) if(cycle_width(collw) === collw) break;
if(MDW === MIN_MDW) for(MDW=DEF_MDW+1; MDW<MAX_MDW; ++MDW) if(cycle_width(collw) === collw) break;
if(MDW === MAX_MDW) MDW = DEF_MDW;
/* XLSX/XLSB/XLS specify width in units of MDW */
function find_mdw_colw(collw) {
var delta = Infinity, _MDW = MIN_MDW;
for(MDW=MIN_MDW; MDW<MAX_MDW; ++MDW) if(Math.abs(collw - cycle_width(collw)) < delta) { delta = Math.abs(collw - cycle_width(collw)); _MDW = MDW; }
MDW = _MDW;
}
/* XLML specifies width in terms of pixels */
function find_mdw_wpx(wpx) {
var delta = Infinity, guess = 0, _MDW = MIN_MDW;
for(MDW=MIN_MDW; MDW<MAX_MDW; ++MDW) {
guess = char2width_(px2char_(wpx))*256;
guess = (guess) % 1;
if(guess > 0.5) guess--;
if(Math.abs(guess) < delta) { delta = Math.abs(guess); _MDW = MDW; }
}
MDW = _MDW;
}

function process_col(coll/*:ColInfo*/) {
if(coll.width) {
coll.wpx = width2px(coll.width);
coll.wch = px2char(coll.wpx);
coll.MDW = MDW;
} else if(coll.wpx) {
coll.wch = px2char(coll.wpx);
coll.width = char2width(coll.wch);
coll.MDW = MDW;
}
if(coll.customWidth) delete coll.customWidth;
}

/* [MS-EXSPXML3] 2.4.54 ST_enmPattern */
16 changes: 7 additions & 9 deletions bits/67_wsxml.js
@@ -98,14 +98,10 @@ function parse_ws_xml_cols(columns, cols) {
for(var coli = 0; coli != cols.length; ++coli) {
var coll = parsexmltag(cols[coli], true);
var colm=parseInt(coll.min, 10)-1, colM=parseInt(coll.max,10)-1;
delete coll.min; delete coll.max;
if(!seencol && coll.width) { seencol = true; find_mdw(+coll.width, coll); }
if(coll.width) {
coll.wpx = width2px(+coll.width);
coll.wch = px2char(coll.wpx);
coll.MDW = MDW;
}
while(colm <= colM) columns[colm++] = coll;
delete coll.min; delete coll.max; coll.width = +coll.width;
if(!seencol && coll.width) { seencol = true; find_mdw_colw(coll.width); }
process_col(coll);
while(colm <= colM) columns[colm++] = dup(coll);
}
}

@@ -116,7 +112,9 @@ function write_ws_xml_cols(ws, cols)/*:string*/ {
var p = ({min:i+1,max:i+1}/*:any*/);
/* wch (chars), wpx (pixels) */
width = -1;
if(col.wpx) width = px2char(col.wpx);
if(col.MDW) MDW = col.MDW;
if(col.width);
else if(col.wpx) width = px2char(col.wpx);
else if(col.wch) width = col.wch;
if(width > -1) { p.width = char2width(width); p.customWidth= 1; }
o[o.length] = (writextag('col', null, p));