q - Text as Data

q is a command line tool that allows direct execution of SQL-like queries on CSVs/TSVs (and any other tabular text files).

Main features:

Seamless multi-table SQL support, including joins. filenames are just used instead of table names (use - for stdin)
Automatic column name and column type detection (Allows working more naturally with the data)
Full encoding support (input, output and query)

Version 1.4.0 is out and contains the following improvements:

2.5x speed increase for large files - Thanks @Gasol!
Output header support (based on the query's SELECTed columns/aliases)
Additional control over query and output encodings
Bug fixes (see change log for full details)

Recurring visitors

Recurring visitors, I'm really glad you come back here :) 

I would love it if you can drop me a line on gitter or through email with the reasons for your recurring visits. It will help me with improving the tool.

Any requests for features would be greatly appreciated as well.

Please send any feedback you have on the new version and open new issues as needed.

I would love to get any requests and comments you have on the tool.

You can use this gitter chat room for contacting me directly. I'm trying to be available at the chat room as much as possible.

Examples

A beginner's tutorial can be found here.

Example 1:

q -H -t "select count(distinct(uuid)) from ./clicks.csv"

Output 1:

Example 2:

q -H -t "select request_id,score from ./clicks.csv where score > 0.7 order by score desc limit 5"

Output 2:

2cfab5ceca922a1a2179dc4687a3b26e	1.0
f6de737b5aa2c46a3db3208413a54d64	0.986665809568
766025d25479b95a224bd614141feee5	0.977105183282
2c09058a1b82c6dbcf9dc463e73eddd2	0.703255121794

Example 3:

q -t -H "select strftime('%H:%M',date_time) hour_and_minute,count(*) from ./clicks.csv group by hour_and_minute"

Output 3:

07:00	138148
07:01	140026
07:02	121826

Usage Example 4:

q -t -H "select hashed_source_machine,count(*) from ./clicks.csv group by hashed_source_machine"

Output 4:

47d9087db433b9ba.domain.com	400000

Example 5 (total size per user/group in the /tmp subtree):

sudo find /tmp -ls | q "select c5,c6,sum(c7)/1024.0/1024 as total from - group by c5,c6 order by total desc"

Output 5:

mapred hadoop   304.00390625
root   root     8.0431451797485
smith  smith    4.34389972687

Example 6 (top 3 user ids with the largest number of owned processes, sorted in descending order):

Note the usage of the autodetected column name UID in the query.

ps -ef | q -H "select UID,count(*) cnt from - group by UID order by cnt desc limit 3"

Output 6:

root 152
harel 119
avahi 2

Installation

Current stable version is 1.4.0.

Requirements: Just Python 2.5 and up or Python 2.4 with sqlite3 module installed. Python 3.x is not supported yet.

Mac Users

Make sure you run brew update first and then just run brew install q.

Version 1.4.0 has been committed to homebrew and is pending their confirmation (usually in a couple of hours).

Thanks @stuartcarnie for the initial homebrew formula

Manual installation (very simple, since there are no dependencies)

Download the main q executable from here into a folder in the PATH.
Make the file executable.

For Windows machines, also download q.bat here into the same folder and use it to run q.

RPM-Base Linux distributions

Download the version 1.4.0 RPM here here.

Install using rpm -ivh <rpm-name>.

RPM Releases also contain a man page. Just enter man q.

NOTE In Version 1.4.0, the RPM package name has been changed to q-text-as-data. If you already have the old version, just remove it with rpm -e q before installing.

Debian-based Linux distributions

If you're interested in Debian packaing, please drop me a line to [email protected].

Overview

Have you ever stared at a text file on the screen, hoping it would have been a database so you could ask anything you want about it? I had that feeling many times, and I've finally understood that it's not the database that I want. It's the language - SQL.

SQL is a declarative language for data, and as such it allows me to define what I want without caring about how exactly it's done. This is the reason SQL is so powerful, because it treats data as data and not as bits and bytes (and chars).

The goal of this tool is to provide a bridge between the world of text files and of SQL.

Usage

q's basic usage is very simple:q <flags> <query>, but it has lots of features under the hood and in the flags that can be passed to the command.

Simplest execution is q "SELECT * FROM myfile" which prints the entire file.

Complete information can be found here

Implementation

Some implementation details can be found here

Limitations

No checks and bounds on data size
Spaces in file names are not supported yet. I'm working on it.
It is possible that some rare cases of subqueries are not supported yet. Please open an issue if you find such a case. This will be fixed once the tool performs its own full-blown SQL parsing.

Future Ideas

Faster reuse of previous data loading
Allow working with external DB
Real parsing of the SQL, allowing smarter execution of queries.
Smarter batch insertion to the database
Provide mechanisms beyond SELECT - INSERT and CREATE TABLE SELECT and such.

Rationale

Some information regarding the rationale for this tool and related philosophy can be found here

Change log

History of changes can be found here

Contact

Any feedback/suggestions/complaints regarding this tool would be much appreciated. Contributions are most welcome as well, of course.

Harel Ben-Attia, [email protected], @harelba on Twitter

Name		Name	Last commit message	Last commit date
Latest commit History 195 Commits
bin		bin
dist		dist
doc		doc
examples		examples
test		test
.gitignore		.gitignore
README.markdown		README.markdown

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

q - Text as Data

Examples

Installation

Mac Users

Manual installation (very simple, since there are no dependencies)

RPM-Base Linux distributions

Debian-based Linux distributions

Overview

Usage

Implementation

Limitations

Future Ideas

Rationale

Change log

Contact

About

Uh oh!

Releases

Packages

thinkbox/q

Folders and files

Latest commit

History

Repository files navigation

q - Text as Data

Examples

Installation

Mac Users

Manual installation (very simple, since there are no dependencies)

RPM-Base Linux distributions

Debian-based Linux distributions

Overview

Usage

Implementation

Limitations

Future Ideas

Rationale

Change log

Contact

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages