Pharo DataFrame

DataFrame is a tabular data structure for data analysis in Pharo.

Installation

To install DataFrame v2.0, open the Playground (Ctrl+OW) in your Pharo image and execute the following Metacello script (select it and press the Do-it button or Ctrl+D):

Metacello new
  baseline: 'DataFrame';
  repository: 'github://PolyMathOrg/DataFrame:v2.0/src';
  load.

Use this script if you want the latest version of DataFrame:

Metacello new
  baseline: 'DataFrame';
  repository: 'github://PolyMathOrg/DataFrame/src';
  load.

How to depend on it?

If you want to add a dependency on DataFrame to your project, include the following lines in your baseline method:

spec
  baseline: 'DataFrame'
  with: [ spec repository: 'github://PolyMathOrg/DataFrame/src' ].

If you are new to baselines and Metacello, check out the Baselines tutorial on Pharo Wiki.
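
For context, here is a minimal sketch of how those lines might sit inside a complete baseline method. The class name BaselineOfMyProject and the package name MyProject are hypothetical placeholders; substitute the names used in your own project.

```smalltalk
BaselineOfMyProject >> baseline: spec
    <baseline>
    spec for: #common do: [
        "Declare DataFrame as an external dependency"
        spec
            baseline: 'DataFrame'
            with: [ spec repository: 'github://PolyMathOrg/DataFrame/src' ].
        "MyProject is a hypothetical package of your project that requires DataFrame"
        spec
            package: 'MyProject'
            with: [ spec requires: #('DataFrame') ] ]
```

With a baseline like this, loading your project through Metacello should load DataFrame first.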

What are data frames?

Data frames are one of the essential parts of the data science toolkit. They are specialized data structures for tabular data sets that provide a simple and powerful API for summarizing, cleaning, and manipulating a wealth of data sources that are currently cumbersome to use.

A data frame is like a database inside a variable. It is an object which can be created, modified, copied, serialized, debugged, inspected, and garbage collected. It allows you to communicate with your data quickly and effortlessly, using just a few lines of code. The DataFrame project is similar to the pandas library in Python or the built-in data.frame class in R.

Very simple example

In this section I show a very simple example of creating and manipulating a small data frame. For more advanced examples, please check the DataFrame Booklet.

Creating a data frame

weather := DataFrame withRows: #(
  (2.4 true rain)
  (0.5 true rain)
  (-1.2 true snow)
  (-2.3 false -)
  (3.2 true rain)).
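
The rows above carry no column names. As a small sketch, assuming the columnNames: and column: messages documented in the DataFrame Booklet, the columns could be named and then accessed by name. The names temperature, precipitation, and type are chosen here for illustration only.

```smalltalk
"Assign illustrative names to the three columns"
weather columnNames: #(temperature precipitation type).

"Access a single column by its name; the result is a DataSeries"
weather column: #temperature.
```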

Removing the third row of the data frame

weather removeRowAt: 3.

Adding a row to the data frame

weather addRow: #(-1.2 true snow) named: ''.

Replacing the data in the first row and third column with 'snow'

weather at: 1 at: 3 put: #snow.

Transpose of the data frame

weather transposed.

DataFrame Booklet

For more information, please read Data Analysis Made Simple with Pharo DataFrame, a booklet that serves as the main source of documentation for the DataFrame project. It describes the complete API of the DataFrame and DataSeries data structures and provides examples for each method.