Added license + RPM packaging logic + man page
harelba committed Feb 21, 2014
1 parent c7f6a85 commit 1f4c5e6
Showing 9 changed files with 887 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
build
8 changes: 8 additions & 0 deletions AUTHORS
@@ -0,0 +1,8 @@
Copyright (C) 1992, 1997-2002, 2004-2014 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

Harel Ben-Attia <[email protected]> wrote the main program

674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

8 changes: 6 additions & 2 deletions README.markdown
@@ -39,8 +39,12 @@ smith smith 4.34389972687

## Installation
* Mac users can use homebrew to install q - Just run `brew install q` (Thanks @stuartcarnie)
-* Debian/RPM Packages coming soon. Follow [me](https://twitter.com/harelba) on twitter for updates.
-* No real installation is required - Just put q in the PATH. Current version is `1.1.6` - Download it [here](https://github.com/harelba/q/archive/1.1.6.tar.gz)
+* RPM Package is ready, but still needs to be hosted somewhere. Follow [me](https://twitter.com/harelba) on twitter for updates.
+* Debian Package will come soon.
+* No real installation is required - Just download using the link below and put q in the PATH.
+
+
+**Current version is `1.1.6` - Download it [here](https://github.com/harelba/q/archive/1.1.6.tar.gz)**

**NOTE:** If you're using Python 2.4, then you will have to install the sqlite3 package for q to work.

7 changes: 7 additions & 0 deletions THANKS
@@ -0,0 +1,7 @@
Copyright (C) 1992, 1997-2002, 2004-2014 Free Software Foundation, Inc.

Copying and distribution of this file, with or without modification,
are permitted in any medium without royalty provided the copyright
notice and this notice are preserved.

Thank you Jens Neu for writing the initial RPM package spec
31 changes: 31 additions & 0 deletions create-rpm
@@ -0,0 +1,31 @@
#!/bin/bash

if [ $# -ne 1 ];
then
	echo 'create-rpm <version>'
	exit 1
fi

VERSION=$1
REAL_PACKAGE_NAME=q
RPM_PACKAGE_NAME=q
MAN_PAGE_SRC=${RPM_PACKAGE_NAME}.manpage.1.ronn

rm -rf build/

mkdir -p build/rpm

TAR_NAME=${RPM_PACKAGE_NAME}-${VERSION}.tar.gz

ronn ${REAL_PACKAGE_NAME}.manpage.1.ronn
rm ${REAL_PACKAGE_NAME}.1.html

sed "s/VERSION_PLACEHOLDER/${VERSION}/" ${RPM_PACKAGE_NAME}.spec.template > ${RPM_PACKAGE_NAME}.spec

tar --create --transform s,^,${RPM_PACKAGE_NAME}-$1/, --exclude ${RPM_PACKAGE_NAME}.spec.template -f ${TAR_NAME} *

rpmbuild --define "_topdir `pwd`/build/rpm" -ta ${TAR_NAME}

rm ${RPM_PACKAGE_NAME}.spec
rm ${TAR_NAME}
rm ${REAL_PACKAGE_NAME}.1
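
The version substitution step in `create-rpm` can be sketched in Python as follows. This is only an illustration of what the `sed` call does and is not part of the commit; `render_spec` is a hypothetical helper name.

```python
def render_spec(template_text, version):
    """Replace VERSION_PLACEHOLDER the way sed "s/X/Y/" would (hypothetical helper)."""
    # Without the g flag, sed replaces only the first match on each line,
    # so the replacement is applied line by line with a count of 1.
    lines = template_text.split("\n")
    return "\n".join(
        line.replace("VERSION_PLACEHOLDER", version, 1) for line in lines
    )

spec_text = render_spec("Name: q\nVersion: VERSION_PLACEHOLDER\n", "1.1.7")
```

Since the template contains the placeholder only once, the first-match-per-line detail makes no practical difference here.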
25 changes: 22 additions & 3 deletions q
@@ -1,8 +1,27 @@
#!/usr/bin/env python

# Copyright (C) 1988, 1998, 2000, 2002, 2004-2005, 2007-2014 Free Software
# Foundation, Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3, or (at your option)
# any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA
#
#
# Name : q (With respect to The Q Continuum)
# Author : Harel Ben Attia - [email protected], harelba @ github, @harelba on twitter
-# Requires : python with sqlite3
+# Requires : python with sqlite3 (standard in python>=2.6)
#
#
# q allows performing SQL-like statements on tabular text data.
@@ -13,7 +32,7 @@
#
# Run with --help for command line details
#
-q_version = "1.1.6"
+q_version = "1.1.7"

import os,sys
import random
@@ -524,7 +543,7 @@ try:
    # Execute the query and fetch the data
    m = sql_object.execute_and_fetch(db)
except sqlite3.OperationalError,e:
-    print "database access error: %s" % e
+    print "query error: %s" % e
    sys.exit(1)
except ColumnCountMismatchException,e:
    print e.msg
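
The `except X,e` form above is Python 2 syntax. For readers on Python 3, the same error-reporting pattern might look like the sketch below. `run_query` is a hypothetical helper and not a function from q; it only shows how `sqlite3.OperationalError` can be turned into the new "query error" message.

```python
import sqlite3

def run_query(db, sql):
    """Return (rows, None) on success or (None, message) on a query error."""
    try:
        return db.execute(sql).fetchall(), None
    except sqlite3.OperationalError as e:
        # Same wording as the message introduced in this commit.
        return None, "query error: %s" % e

db = sqlite3.connect(":memory:")
rows, error = run_query(db, "SELECT * FROM missing_table")
db.close()
```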
89 changes: 89 additions & 0 deletions q.manpage.1.ronn
@@ -0,0 +1,89 @@

q(1) -- Treating text as a database
===================================

## SYNOPSIS

`q` [OPTIONS] <query>

## DESCRIPTION
q lets you run SQL-like statements on tabular text data. Its purpose is to bring the expressive power of SQL to text manipulation on the Linux command line.

query should be an SQL-like query that uses filenames in place of table names (or - for stdin).

Columns are named c1..cN, and the delimiter can be set using the -d (or -t) option.

query should be enclosed in quotes so that the shell passes it as a single parameter.

All sqlite3 SQL constructs are supported.

See https://github.com/harelba/q for more details.

## EXAMPLES
Example 1: `ls -ltrd * | q "select c1,count(1) from - group by c1"`
This example would print a count of each unique permission string in the current folder.

Example 2: `seq 1 1000 | q "select avg(c1),sum(c1) from -"`
This example would provide the average and the sum of the numbers in the range 1 to 1000.

Example 3: `sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc"`
This example would output the total size in MB per user+group in the /tmp subtree.
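
As a rough cross-check of Example 2, the same aggregation can be run directly against Python's `sqlite3` module. This is a sketch under assumptions: the table name `stdin_data` and the choice of a `TEXT` column are illustrative and are not taken from q's source.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One TEXT column named c1, holding the numbers 1..1000 as strings.
db.execute("CREATE TABLE stdin_data (c1 TEXT)")
db.executemany(
    "INSERT INTO stdin_data VALUES (?)",
    ((str(n),) for n in range(1, 1001)),
)
# avg() and sum() convert numeric-looking strings to numbers.
average, total = db.execute("SELECT avg(c1), sum(c1) FROM stdin_data").fetchone()
db.close()
```

Here `average` comes out as 500.5 and `total` as 500500, even though every value was stored as a string.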

## OPTIONS
  * `-z` - Means that the file is gzipped. This is detected automatically if the file extension is .gz, but the flag is useful when reading gzipped data from stdin (since there is no content-based detection for gzip)
  * `-H <N>` - Tells q to skip N header lines at the beginning of the file. Typically used for skipping a single header line. This may be detected automatically in the future.
* `-d` - Column/field delimiter. If it exists, then splitting lines will be done using this delimiter. If not provided, **any whitespace** will be used as a delimiter.
* `-D` - Column/field delimiter for output. If it exists, then the output will use this delimiter instead of the one used in input. Defaults to input delimiter if provided by `-d`, or space if not.
  * `-t` - Shorthand flag for a tab-delimited format with one header line (same as `-d $'\t' -H 1`; the `$'...'` notation is required so that the shell turns `\t` into an actual tab character)
* `-f <F>` - Output-formatting option. If you don't like the output formatting of a specific column, you can use python formatting in order to change the output format for that column. See below for details
  * `-e <E>` - Specify the text encoding. Defaults to UTF-8. If you have ASCII-only text and want a 33% speedup, use `-e none`. Unfortunately, proper encoding/decoding has its price.
* `-b` - Beautify the output. If this flag exists, output will be aligned to the largest actual value of each column. **NOTE:** Use this only if needed, since it is slower and more CPU intensive.
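
The input-related options above (`-z`, `-H`, `-d`) can be illustrated with a short Python sketch. This is not q's actual code: `open_input` and `read_rows` are hypothetical helper names, and the sketch only mirrors the documented behavior.

```python
import gzip
import os
import tempfile

def open_input(path, force_gzip=False):
    """Open a data file as UTF-8 text, decompressing when it is gzipped."""
    # A .gz extension is detected automatically; force_gzip plays the
    # role of the -z flag for inputs without that extension.
    if force_gzip or path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")

def read_rows(lines, delimiter=None, header_lines=0):
    """Split lines into fields, skipping the first header_lines lines."""
    rows = []
    for index, line in enumerate(lines):
        if index < header_lines:
            continue
        # With delimiter=None, str.split() splits on any run of whitespace.
        rows.append(line.rstrip("\n").split(delimiter))
    return rows

# Default behavior: any whitespace acts as the delimiter.
whitespace_rows = read_rows(["a   b\tc\n"])

# A gzipped, tab-delimited file with one header line, similar in spirit to -t.
path = os.path.join(tempfile.mkdtemp(), "data.tsv.gz")
with gzip.open(path, "wt", encoding="utf-8") as handle:
    handle.write("name\tsize\nfoo\t10\n")
with open_input(path) as handle:
    gzip_rows = read_rows(handle, delimiter="\t", header_lines=1)
```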


## FORMATTING OPTIONS
F is a comma-separated list of X=f pairs, where X is the number of a SELECTed column and f is a python format string (http://docs.python.org/release/2.4.4/lib/typesseq-strings.html)

* Example: `-f "3=%-10s,5=%4.3f,1=%x"`
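
To make the X=f notation concrete, here is a minimal sketch of how such a spec could be parsed and applied with Python's `%` formatting. `parse_format_spec` and `format_row` are hypothetical names, q's real parsing may differ, and the sketch assumes that no format string contains a comma.

```python
def parse_format_spec(spec):
    """Turn '3=%-10s,5=%4.3f' into {3: '%-10s', 5: '%4.3f'}."""
    formats = {}
    for item in spec.split(","):
        column, fmt = item.split("=", 1)
        formats[int(column)] = fmt
    return formats

def format_row(row, formats):
    """Apply per-column formats; columns are numbered from 1."""
    # Columns without an entry in formats fall back to str().
    return [
        formats[i] % value if i in formats else str(value)
        for i, value in enumerate(row, start=1)
    ]

formats = parse_format_spec("3=%-10s,5=%4.3f,1=%x")
formatted = format_row([255, "b", "name", "d", 3.14159], formats)
```

With this input, column 1 is rendered in hexadecimal, column 3 is left-aligned in a 10-character field, and column 5 is rounded to three decimal places.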

## IMPLEMENTATION
The current implementation is written in Python and uses an in-memory database, which avoids the need for external dependencies. The implementation itself is fairly basic and supports only SELECT statements, including JOINs (subqueries are supported only in the WHERE clause for now). Error handling is also basic. Even so, I believe it can be of service in its current state.
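
The general approach described above can be sketched in a few lines of Python. This is a simplified illustration and not q's actual code: `query_text` is a hypothetical function, and it uses a fixed table name `t` where q uses filenames.

```python
import sqlite3

def query_text(lines, sql, delimiter=None, table="t"):
    """Load delimited text lines into an in-memory SQLite table and run sql."""
    rows = [line.rstrip("\n").split(delimiter) for line in lines if line.strip()]
    if not rows:
        return []
    # The column count comes from the first data line; columns are c1..cN.
    ncols = len(rows[0])
    columns = ", ".join('"c%d" TEXT' % (i + 1) for i in range(ncols))
    placeholders = ", ".join("?" * ncols)
    # Pad or truncate each row so it matches the first line's width.
    fixed = [(row + [None] * ncols)[:ncols] for row in rows]
    db = sqlite3.connect(":memory:")
    try:
        db.execute('CREATE TABLE "%s" (%s)' % (table, columns))
        db.executemany('INSERT INTO "%s" VALUES (%s)' % (table, placeholders), fixed)
        return db.execute(sql).fetchall()
    finally:
        db.close()

result = query_text(
    ["a 1", "b 2", "a 3"],
    "SELECT c1, count(1) FROM t GROUP BY c1 ORDER BY c1",
)
```

Everything is held in memory, which is why the data size caveat matters for large inputs.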

Please note that there are currently no checks or bounds on data size. It's up to the user to make sure things don't get too big.

Please make sure to read the limitations section as well.

## BUGS AND LIMITATIONS
The following limitations exist in the current implementation:

  * Simplistic data typing and column inference - All types are strings, and the columns are determined from the first line of data and named c1, c2, c3, etc. There's a column count hack, which is meant for tolerating a small variation in the column count
  * In some cases, SQL uses its own type inference (such as treating cX as a number when there is a SUM(cX) expression), but in other cases it won't. One such example is using numeric conditions in a WHERE clause, such as c5 > 1000. This will not work properly out-of-the-box until we provide type inference. There is a simple (though not clean) way to get around it: cast the value where needed by adding 0+ before it. Example: `q "SELECT c5,c9 FROM mydatafile WHERE 0+c5 > 1000"`. This is simple enough, but it kind of breaks the idea of treating data as data. This is the reason why the examples avoid using a meaningful WHERE clause. Once this is fixed, the examples will be updated
* Basic error handling only
* No checks and bounds on data size
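
The WHERE clause pitfall in the list above can be reproduced directly with `sqlite3`. The sketch assumes the column is declared as `TEXT`, which is an assumption about how the data is loaded rather than something confirmed by q's source; the table name `mydatafile` is borrowed from the example query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE mydatafile (c5 TEXT)")
db.executemany(
    "INSERT INTO mydatafile VALUES (?)",
    [("5",), ("10",), ("999",), ("2000",)],
)

# Against a TEXT column the literal 1000 is compared as the string '1000'.
as_text = [
    row[0]
    for row in db.execute("SELECT c5 FROM mydatafile WHERE c5 > 1000 ORDER BY rowid")
]

# Adding 0+ forces a numeric value, so the comparison becomes numeric.
as_number = [
    row[0]
    for row in db.execute("SELECT c5 FROM mydatafile WHERE 0+c5 > 1000 ORDER BY rowid")
]
db.close()
```

In this sketch the plain comparison matches "5", "999" and "2000" because it compares strings, while the `0+c5` form matches only "2000".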

## FUTURE PLANS
* Column name inference for files containing a header line
* Column type inference according to actual data
* Smarter batch insertion to the database
* Faster reuse of previous data loading
* Allow working with external DB
* Real parsing of the SQL, allowing smarter execution of queries.
* Full Subquery support (will be possible once real SQL parsing is performed)
* Provide mechanisms beyond SELECT - INSERT and CREATE TABLE SELECT and such.
* Support semi structured data - e.g. log files, where there are several columns and then free text
* Better error handling

## AUTHOR
Harel Ben-Attia ([email protected])

[@harelba](https://twitter.com/harelba) on Twitter

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

## COPYRIGHT
Copyright (C) 1988, 1998, 2000, 2002, 2004-2005, 2007-2014 Free Software Foundation, Inc.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street - Fifth Floor, Boston, MA 02110-1301, USA.


49 changes: 49 additions & 0 deletions q.spec.template
@@ -0,0 +1,49 @@
Name: q
Version: VERSION_PLACEHOLDER
Release: 1%{?dist}
Summary: q - Text as a Database.

Group: Applications/Text
License: GPL
URL: https://github.com/harelba/q
Source0: %{name}-%{version}.tar.gz
BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)
BuildArch: noarch

Requires: python-libs

%description
Have you ever stared at a text file on the screen, wishing it were a database so you could ask anything you want about it?

q solves this problem by letting you run SQL-like statements on tabular text data.

%prep
%setup

%install
rm -rf %{buildroot}
%{__install} -d -m 0755 ${RPM_BUILD_ROOT}%{_bindir}
%{__install} -Dm 755 q ${RPM_BUILD_ROOT}%{_bindir}/
%{__install} -d -m 0755 ${RPM_BUILD_ROOT}%{_mandir}/man1/
%{__install} -m 0644 %{name}.1 ${RPM_BUILD_ROOT}%{_mandir}/man1/
gzip ${RPM_BUILD_ROOT}%{_mandir}/man1/%{name}.1

%clean
rm -rf %{buildroot}


%files
%defattr(-,root,root,-)
%doc README.markdown exampledatafile LICENSE THANKS AUTHORS
%{_bindir}/q
%doc %_mandir/man1/%{name}.1.gz

%changelog
* Thu Feb 20 2014 Harel Ben-Attia <[email protected]> 1.1.7-1
- Better error reporting
- Fixed python invocation for non-standard locations
- Added man page

* Wed Feb 19 2014 Jens Neu <[email protected]> 1.1.5-1
- initial release
