todo.txt

TODO

unstack lets you type anything!
Remove 'Clip from' entries?
Picture - show axis=0 and axis=1 and level=0 and level=1

More tricks from http://pandas-docs.github.io/pandas-docs-travis/visualization.html
plus plotting?
carrying join data in a set vs list and performance difference?

Star Wars actors: how many movies did they each go on to be in?

plt.bar(x, y, align='center')
plt.plot()

s.set_index(drop=False)
s.set_index(append=True)
s.reset_index(level=)
s.reset_index(drop=True)

.map()?
df.index.names

s.order() vs s.sort()
s.order(na_position=...)

multiindexes? time series?

df.groupby(['A','B']).sum()

df.swaplevel()
df.reorder_levels()

pd.date_range('1/1/2012', periods=100, freq='S')
pd.date_range('3/6/2012 00:00', periods=5, freq='D')
pd.period_range('1990Q1', '2000Q4', freq='Q-NOV')
ts.resample('5Min', how='sum')

s.reindex()
s.reindex_axis()
ts2.reindex(ts.index, method='ffill') and other values for 'method'

5. nan, filling, interpolating?
5. data conversion - to/from dates and strings and numbers

df.dtypes
df.astype()
df.get_dtype_counts()?
df.select_dtypes()?

df.mean()
df.mean(1)
s.mode()
df.sum()
df.std()

df.drop_duplicates()

df["grade"] = df["raw_grade"].astype("category")
df["grade"].cat.categories = ["very good", "good", "very bad"]
df["grade"] = df["grade"].cat.set_categories(["very bad", "bad",
    "medium", "good", "very good"])

6. data cleanup: merging two book sales reports

Credits

http://pandas.pydata.org/pandas-docs/dev/10min.html#min
http://pandas.pydata.org/pandas-docs/dev/basics.html

An Index is an "ordered multiset"

Series()
s.shift(2)
s.searchsorted()?
s.dtype
s.where(cond) and s.where(cond, other)
s.mask()

date_range()
DataFrame() from dict
df.convert_objects(convert_numeric=True) or convert_dates='coerce'
df.values.dtype?
df.T
df.sort_index() and how every level of a column index must be named
df.where()
df[n:m]
df2[df2['E'].isin(['two','four'])]
df['F'] = s1                       (with alignment by index values)
df2[df2 > 0] = -df2
df.at
df.iat
df.loc[dates[0]]?
df.loc[:,['A','B']]
df.loc[:,'D'] = np.array([5] * len(df))
df.loc['20130102':'20130102',['A','B']]
df.loc['20130102',['A','B']]
df.loc['20130102','A']
df.at['20130102','A']
df.at[dates[0],'A'] = 0
df.iloc[3]
df.iloc[3:5,0:2]
df.iloc[[1,2,4],[0,2]]
df.iloc[1:3,:]
df.iloc[:,1:3]
df.iloc[1,1]
df.iat[1,1]
df.iat[0,1] = 0
df.ix
df.xs()
df.xs(drop_level=False)
df.xs(('one', 'bar'), level=('second', 'first'), axis=1)
  or df.loc[:,('bar','one')]
df.lookup()
df.get(, default=)
df.reindex()
df.sub()
df.apply()
df.duplicated()

pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
pd.DataFrame(np.random.randn(8, 2), index=index, columns=['A', 'B'])
stacked = df2.stack()
stacked.unstack()
stacked.unstack(1)
stacked.unstack(0)
pd.pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
pd.Series(np.random.randn(len(rng)), rng)
ts.tz_localize('UTC')
ts_utc.tz_convert('US/Eastern')
pd.date_range('1/1/2012', periods=5, freq='M')
ts.to_period()
ps.to_timestamp()
ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
df.sort("grade")
df.groupby("grade").size()
ts.cumsum()
ts.plot()

df.to_csv('foo.csv')
pd.read_csv('foo.csv')

HDF5, Excel

"The truth value of an array is ambiguous."

pd.isnull()
pd.concat()
pd.merge(left, right, on='key')

df.equals()
df1.combine_first(df2)
df1.combine(df2, combiner)
df.cumprod()
df.quantile()
s.idxmin(), s.idxmax()
cut()
qcut()
df.apply(np.cumsum) and df.apply(Series.interpolate) to run on each series
df.apply(lambda x: x.idxmax())  for date each max value occurred
.apply()  with extra args
df.apply(Series.interpolate)
df.apply(raw=)
df.applymap()
s.map(Series({'six' : 6., 'seven' : 7.})) and back the other way (!)
panel?
s and df can share an index - but truncates; no check on sensibility
panel.apply(lambda x: x*2, axis='items')
panel.apply(lambda x: x.dtype, axis='items')
panel.apply(lambda x: x.sum(), axis='major_axis')
.align()
df.align(df2, join='inner')
df.align(df2, join='inner', axis=0)
df.drop()
s.rename()
df.iterrows()
df.itertuples()

pd.set_option('display.multi_sparse', False)
and the meaning of the word "hierarchical"

If index is not ordered, do ranges work? efficiently?
Dict with tuples as keys vs index with hierarchy.
There is no integer NA, so flips to float the moment you need NA.
NA, NaN, isnull, notnull.
Ranges are inclusive!
.ix[] vs .reindex()

df.transform()
index rename, set_names, set_levels, and set_labels
index & index, index | index

http://pandas.pydata.org/pandas-docs/dev/advanced.html#data-alignment-and-using-reindex ?

df.shape
df.count()
s.count()