Commit

Refactored to ignore empty cells
jsonkenl committed Jun 13, 2016
1 parent 394b577 commit c96c1f8
Showing 9 changed files with 28 additions and 18 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -6,6 +6,7 @@
- Created an `Xlsxir.TableId` module which controls an agent process that temporarily holds a table identifier during the extraction process.
- Refactored `Xlsxir` access functions to work with `Xlsxir.multi_extract/3` whereby a table identifier is passed through the various functions to specify which ETS process is to be accessed.
- Refactored `Xlsxir.SaxParser`, `Xlsxir.ParseWorksheet` and `Xlsxir.Worksheet` modules to support new functionality.
+- Refactored `Xlsxir.ParseWorksheet` to ignore empty cells.
- Updated documentation and tests
- Fixed a few minor bugs that were generating warning messages.

2 changes: 1 addition & 1 deletion OVERVIEW.md
@@ -78,7 +78,7 @@ Refer to [API Reference](https://hexdocs.pm/xlsxir/api-reference.html) for more

## Considerations

-Cell references are formatted as a string (i.e. "A1"). Strings will be returned as type `string`, resulting values for functions from within the worksheet are returned as type `string`, `integer` or `float` depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type `integer` or `float`, and ISO 8601 date formatted values will be returned in Erlang `:calendar.date()` type format (i.e. `{year, month, day}`). Xlsxir does not currently support dates prior to 1/1/1900.
+Cell references are formatted as a string (i.e. "A1"). Strings will be returned as type `string`, resulting values for functions from within the worksheet are returned as type `string`, `integer` or `float` depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type `integer` or `float`, and ISO 8601 date formatted values will be returned in Erlang `:calendar.date()` type format (i.e. `{year, month, day}`). Xlsxir does not currently support dates prior to 1/1/1900. Empty cells are ignored, so be careful when accessing row or column data.

## Contributing
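
To illustrate the empty-cell caveat added to the Considerations text in this hunk, here is a minimal sketch in plain Elixir. It assumes a row is a list of `[cell_ref, value]` pairs, which is the shape built in `Xlsxir.ParseWorksheet`. The `SparseRow` module and its `fetch_column/2` helper are hypothetical and are not part of Xlsxir.

```elixir
defmodule SparseRow do
  # Hypothetical helper, not part of Xlsxir: returns the value stored under a
  # column letter in a row of [cell_ref, value] pairs, or nil when that cell
  # was empty and therefore never stored.
  def fetch_column(row, column) do
    Enum.find_value(row, fn [ref, value] ->
      # Strip the row digits from the reference ("C1" -> "C").
      if String.replace(ref, ~r/\d+/, "") == column, do: value
    end)
  end
end

# Worksheet row 1 with B1 left empty: only A1 and C1 are present.
row = [["A1", "name"], ["C1", 42]]

# Positional access is misleading here: index 1 holds C1, not B1.
second = Enum.at(row, 1)

# Looking up by column letter makes the gap explicit (nil for column B).
b_value = SparseRow.fetch_column(row, "B")
c_value = SparseRow.fetch_column(row, "C")
```

Keying on the cell reference rather than on list position is one way to stay correct when a row has gaps.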

2 changes: 1 addition & 1 deletion README.md
@@ -82,7 +82,7 @@ Refer to [Xlsxir documentation](https://hexdocs.pm/xlsxir/index.html) for more d

## Considerations

-Cell references are formatted as a string (i.e. "A1"). Strings will be returned as type `string`, resulting values for functions from within the worksheet are returned as type `string`, `integer` or `float` depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type `integer` or `float`, and ISO 8601 date formatted values will be returned in Erlang `:calendar.date()` type format (i.e. `{year, month, day}`). Xlsxir does not currently support dates prior to 1/1/1900.
+Cell references are formatted as a string (i.e. "A1"). Strings will be returned as type `string`, resulting values for functions from within the worksheet are returned as type `string`, `integer` or `float` depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type `integer` or `float`, and ISO 8601 date formatted values will be returned in Erlang `:calendar.date()` type format (i.e. `{year, month, day}`). Xlsxir does not currently support dates prior to 1/1/1900. Empty cells are ignored, so be careful when accessing row or column data.

## Planned Development

2 changes: 1 addition & 1 deletion doc/Xlsxir.html
@@ -72,7 +72,7 @@ <h1>


<section id="moduledoc" class="docstring">
-<p>Extracts and parses data from a <code class="inline">.xlsx</code> file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data.</p>
+<p>Extracts and parses data from a <code class="inline">.xlsx</code> file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data. <strong>Warning:</strong> empty cells are ignored.</p>

</section>

2 changes: 1 addition & 1 deletion doc/api-reference.html
@@ -78,7 +78,7 @@ <h1 class="section-heading">Modules</h1>
<div class="summary-row">
<div class="summary-signature"><a href="Xlsxir.html">Xlsxir</a></div>

-<div class="summary-synopsis"><p>Extracts and parses data from a <code class="inline">.xlsx</code> file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data</p>
+<div class="summary-synopsis"><p>Extracts and parses data from a <code class="inline">.xlsx</code> file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data. <strong>Warning:</strong> empty cells are ignored</p>
</div>

</div>
2 changes: 2 additions & 0 deletions doc/changelog.html
@@ -70,6 +70,8 @@ <h2>1.3.0</h2>
</li>
<li>Refactored <a href="Xlsxir.SaxParser.html"><code class="inline">Xlsxir.SaxParser</code></a>, <a href="Xlsxir.ParseWorksheet.html"><code class="inline">Xlsxir.ParseWorksheet</code></a> and <a href="Xlsxir.Worksheet.html"><code class="inline">Xlsxir.Worksheet</code></a> modules to support new functionality.
</li>
+<li>Refactored <a href="Xlsxir.ParseWorksheet.html"><code class="inline">Xlsxir.ParseWorksheet</code></a> to ignore empty cells.
+</li>
<li>Updated documentation and tests
</li>
<li>Fixed a few minor bugs that were generating warning messages.
2 changes: 1 addition & 1 deletion doc/overview.html
@@ -125,7 +125,7 @@ <h2>Basic Usage</h2>
<p>When using <a href="Xlsxir.html#extract/3"><code class="inline">Xlsxir.extract/3</code></a>, be sure to <a href="https://hexdocs.pm/xlsxir/Xlsxir.html#close/0">close an open ETS process before trying to parse another worksheet</a> in the same session or process. If you try to open a new <code class="inline">:worksheet</code> ETS process when one already exists, you will get an error. If the parsing of multiple worksheets is desired, use <a href="Xlsxir.html#multi_extract/3"><code class="inline">Xlsxir.multi_extract/3</code></a> instead.</p>
<p>Refer to <a href="https://hexdocs.pm/xlsxir/api-reference.html">API Reference</a> for more detailed examples. </p>
<h2>Considerations</h2>
-<p>Cell references are formatted as a string (i.e. “A1”). Strings will be returned as type <code class="inline">string</code>, resulting values for functions from within the worksheet are returned as type <code class="inline">string</code>, <code class="inline">integer</code> or <code class="inline">float</code> depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type <code class="inline">integer</code> or <code class="inline">float</code>, and ISO 8601 date formatted values will be returned in Erlang <code class="inline">:calendar.date()</code> type format (i.e. <code class="inline">{year, month, day}</code>). Xlsxir does not currently support dates prior to 1/1/1900.</p>
+<p>Cell references are formatted as a string (i.e. “A1”). Strings will be returned as type <code class="inline">string</code>, resulting values for functions from within the worksheet are returned as type <code class="inline">string</code>, <code class="inline">integer</code> or <code class="inline">float</code> depending on the type of the resulting value, data formatted as a number in the worksheet will be returned as type <code class="inline">integer</code> or <code class="inline">float</code>, and ISO 8601 date formatted values will be returned in Erlang <code class="inline">:calendar.date()</code> type format (i.e. <code class="inline">{year, month, day}</code>). Xlsxir does not currently support dates prior to 1/1/1900. Empty cells are ignored, so be careful when accessing row or column data.</p>
<h2>Contributing</h2>
<p>Contributions are encouraged. Feel free to fork the <a href="https://github.com/kennellroxco/xlsxir">repo</a>, add your code along with appropriate tests and documentation (ensuring all existing tests continue to pass) and submit a pull request. </p>
<h2>Bug Reporting</h2>
4 changes: 2 additions & 2 deletions lib/xlsxir.ex
@@ -2,7 +2,7 @@ defmodule Xlsxir do
alias Xlsxir.{Unzip, SaxParser, Worksheet, Timer, Index}

@moduledoc """
-Extracts and parses data from a `.xlsx` file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data.
+Extracts and parses data from a `.xlsx` file to an Erlang Term Storage (ETS) process and provides various functions for accessing the data. **Warning:** empty cells are ignored.
"""

@doc """
@@ -157,12 +157,12 @@ defmodule Xlsxir do
"""
def get_map(table_id \\ :worksheet) do
:ets.match(table_id, {:"$1", :"$2"})
|> Enum.sort
|> Enum.reduce(%{}, fn [_num, row], acc ->
row
|> Enum.reduce(%{}, fn [ref, val], acc2 -> Map.put(acc2, ref, val) end)
|> Enum.into(acc)
end)
|> Enum.sort
end

@doc """
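
As a rough illustration of the nested reduce in `get_map/1` from the hunk above, the sketch below runs the same fold over a hard-coded list that stands in for the result of `:ets.match/2`. The sample data and the `GetMapSketch` module name are assumptions made for illustration; the `Enum.sort` step is left out to keep the focus on the fold.

```elixir
defmodule GetMapSketch do
  # Mirrors the fold in get_map/1: each match is [row_number, row], and each
  # row is a list of [cell_ref, value] pairs.
  def to_map(matches) do
    Enum.reduce(matches, %{}, fn [_num, row], acc ->
      row
      |> Enum.reduce(%{}, fn [ref, val], acc2 -> Map.put(acc2, ref, val) end)
      |> Enum.into(acc)
    end)
  end
end

# Hypothetical stand-in for :ets.match(table_id, {:"$1", :"$2"}).
# B2 was empty in the worksheet, so it has no entry at all.
matches = [
  [1, [["A1", "id"], ["B1", "qty"]]],
  [2, [["A2", 7]]]
]

cells = GetMapSketch.to_map(matches)

# Map.get/2 returns nil for the skipped cell instead of raising.
missing = Map.get(cells, "B2")
```

Because skipped cells have no key, callers that expect a value for every reference may prefer `Map.get/3` with an explicit default.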
29 changes: 18 additions & 11 deletions lib/xlsxir/parse_worksheet.ex
@@ -53,20 +53,27 @@ defmodule Xlsxir.ParseWorksheet do

   def sax_event_handler({:endElement,_,'c',_}, %Xlsxir.ParseWorksheet{row: row} = state) do
     cell_value = format_cell_value([state.data_type, state.num_style, state.value])
-    %{state | row: Enum.into(row, [[to_string(state.cell_ref), cell_value]]), cell_ref: "", data_type: "", num_style: "", value: ""}
+
+    if cell_value do
+      %{state | row: Enum.into(row, [[to_string(state.cell_ref), cell_value]]), cell_ref: "", data_type: "", num_style: "", value: ""}
+    else
+      %{state | row: row, cell_ref: "", data_type: "", num_style: "", value: ""}
+    end
   end

   def sax_event_handler({:endElement,_,'row',_}, state) do
-    [[row]] = ~r/\d+/ |> Regex.scan(state.row |> List.first |> List.first)
+    unless Enum.empty?(state.row) do
+      [[row]] = ~r/\d+/ |> Regex.scan(state.row |> List.first |> List.first)

-    if TableId.alive? do
-      state.row
-      |> Enum.reverse
-      |> Worksheet.add_row(row, TableId.get)
-    else
-      state.row
-      |> Enum.reverse
-      |> Worksheet.add_row(row)
+      if TableId.alive? do
+        state.row
+        |> Enum.reverse
+        |> Worksheet.add_row(row, TableId.get)
+      else
+        state.row
+        |> Enum.reverse
+        |> Worksheet.add_row(row)
+      end
     end
   end

@@ -80,7 +87,7 @@ defmodule Xlsxir.ParseWorksheet do
   defp format_cell_value(list) do
     case list do
       [ _, _, nil] -> nil # Cell with no value attribute
-      [ _, _, ""] -> "" # Empty cell with assigned attribute
+      [ _, _, ""] -> nil # Empty cell with assigned attribute
       [ 'e', nil, e] -> List.to_string(e) # Type error
       [ 's', _, i] -> SharedString.get_at(List.to_integer(i)) # Type string
       [ nil, nil, n] -> convert_char_number(n) # Type number
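
The combined effect of the two `parse_worksheet.ex` hunks can be sketched without the SAX machinery. The following is a simplified stand-in, not the library's code: `EmptyCellSketch`, `add_cell/3` and `finish_row/1` are hypothetical names, and the `:skip` return value is an assumption chosen for illustration. Like the handlers above, it puts the newest cell at the head of the row, reads the row number from that cell's reference, and reverses the list at the end.

```elixir
defmodule EmptyCellSketch do
  # Stand-in for the endElement 'c' handler: a cell is stored only when its
  # formatted value is truthy, so nil (an empty cell) is dropped.
  def add_cell(row, cell_ref, cell_value) do
    if cell_value do
      [[cell_ref, cell_value] | row]
    else
      row
    end
  end

  # Stand-in for the endElement 'row' handler: a row with no stored cells is
  # skipped, which avoids reading a reference from an empty list.
  def finish_row([]), do: :skip

  def finish_row([[last_ref, _value] | _rest] = row) do
    # Extract the row digits from a reference such as "C3".
    [[row_number]] = Regex.scan(~r/\d+/, last_ref)
    {row_number, Enum.reverse(row)}
  end
end

row =
  []
  |> EmptyCellSketch.add_cell("A3", "x")
  |> EmptyCellSketch.add_cell("B3", nil)
  |> EmptyCellSketch.add_cell("C3", 1.5)

# B3 is absent from the finished row; an all-empty row yields :skip.
finished = EmptyCellSketch.finish_row(row)
skipped = EmptyCellSketch.finish_row([])
```

The empty-row guard matters because, once empty cells are dropped, a worksheet row can end up with no cells at all.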