About DataBasic

DataBasic is a free suite of easy-to-use web tools for beginners that introduce concepts of working with data. These simple tools make it easy to work with data in fun ways, so you can learn how to find great stories to tell. Start using these tools today by clicking each of the headers below. At the bottom of the page you will also find classroom activity guides and introductory videos to use in your classroom, workshop, or other learning environment. You can see all of this and more at Databasic.io.

WordCounter analyzes your text and tells you the most common words and phrases. This tool helps you count words, bigrams, and trigrams in plain text. This is often the first step in quantitative text analysis.
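To make the idea of counting words, bigrams, and trigrams concrete, here is a minimal Python sketch. It is not WordCounter's own code: the function names (`tokenize`, `count_ngrams`), the simple tokenizing rule, and the sample sentence are all assumptions made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (letters, digits, apostrophes).
    This tokenizing rule is an assumption, not WordCounter's actual rule."""
    return re.findall(r"[a-z0-9']+", text.lower())

def count_ngrams(text, n=1):
    """Count every run of n consecutive words: n=1 words, n=2 bigrams, n=3 trigrams."""
    words = tokenize(text)
    # zip over n shifted copies of the word list to get sliding windows of length n
    ngrams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)

# Made-up sample text for illustration
sample = "the cat sat on the mat and the cat slept"
print(count_ngrams(sample, 1).most_common(2))
print(count_ngrams(sample, 2).most_common(1))
print(count_ngrams(sample, 3).most_common(1))
```

A real tool would likely also filter out very common words such as "the" before ranking; this sketch leaves them in to keep the counting step visible.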

WTFcsv tells you WTF is going on with your .csv files. Data arrives at your doorstep in the form of a spreadsheet but how do you find a story in rows and columns? WTFcsv provides the first step by characterizing each column's data type and contents so that you can ask more questions.
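As a rough illustration of what "characterizing each column" can mean, the Python sketch below guesses a type for each column and reports missing and most common values. It is only an assumption about the general approach, not WTFcsv's implementation; `guess_type`, `summarize_csv`, the two type labels, and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

def guess_type(values):
    """Guess a column type from its non-empty values: 'number', 'text', or 'empty'."""
    filled = [v for v in values if v.strip()]
    if not filled:
        return "empty"
    try:
        for v in filled:
            float(v)  # raises ValueError if the value is not numeric
        return "number"
    except ValueError:
        return "text"

def summarize_csv(csv_text):
    """Return a per-column summary: guessed type, missing count, most common value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        # Short rows yield None for missing cells, so normalize to ""
        values = [row[name] or "" for row in rows]
        summary[name] = {
            "type": guess_type(values),
            "missing": sum(1 for v in values if not v.strip()),
            "most_common": Counter(v for v in values if v.strip()).most_common(1),
        }
    return summary

# Made-up sample data for illustration
sample = "city,sightings\nBoston,12\nAustin,\nBoston,7\n"
for column, info in summarize_csv(sample).items():
    print(column, info)
```

Even a summary this small suggests follow-up questions, such as why one row has no sightings value.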

SameDiff compares two or more text files and tells you how similar or different they are. It uses a cosine similarity algorithm to rate whether the documents are really similar or totally different.
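For readers curious about cosine similarity, here is a small Python sketch of the idea: treat each document as a vector of word counts and measure the angle between the two vectors. How SameDiff actually tokenizes or weights words is not described here, so the raw word counts, the function names, and the sample phrases below are assumptions.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Turn a text into a bag of lowercase word counts (assumed tokenizing rule)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Return a score from 0.0 (no shared words) to 1.0 (same word proportions)."""
    a, b = word_counts(text_a), word_counts(text_b)
    # Dot product over the words the two texts share
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)

# Made-up sample phrases for illustration
print(cosine_similarity("love me do", "love me tender"))
print(cosine_similarity("love me do", "ufo sighting reported"))
```

Because the score depends only on the proportions of words, a short text and a long text that use the same words at the same rates still score close to 1.0.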

DataBasic was created by Catherine D'Ignazio and Rahul Bhargava through a collaboration with the MIT Center for Civic Media. It was generously supported by a grant from the Knight Foundation’s Prototype Fund.

  • Rahul presenting WTFcsv at the first DataBasic public workshop
  • Catherine and Rahul trying to look casual...not so easy!
  • Catherine presenting an overview of DataBasic at the first public workshop held at MIT




Making Data Fun

Spreadsheets and large Word documents can be boring and intimidating. We think working with data should be more fun! Take a look at our activity guides for hands-on approaches to analyzing your data using these tools. Activity guides combined with fun data sets like music lyrics can help you take a playful approach to finding stories in your data. Don't you want to know "what the fudge" is going on with your CSV file?

Doing One Thing Well

There are tons of tools for working with data, and many are so complex that it is intimidating even to get started. We've built each DataBasic tool to focus on doing one thing well, so you know exactly what each can be used for. This lets us concentrate on making that one thing both simple and powerful.

Focused on Learning

Sometimes faster isn't better. If you really want to learn how to work with data to find stories to tell, you'll need to use tools that do more than just give you a chart as quickly as possible. That's why we've built DataBasic with learners in mind. Don't know what some technical word means? Hover over it to see a quick definition. Not sure how to use a tool? Try some of the fun sample data like music lyrics and UFO sightings we've included to get started.

Fitting in to Your Pipeline

Working with data online always involves using a mishmash of tools. We're not going to solve that problem anytime soon. So we've built DataBasic to take in a variety of types of data, and output results in the formats you're used to. Read some content from a website and download CSV files of your results. These are just two of the ways we try to introduce you to the way most folks work with data online.

Ensuring Your Privacy

With more and more data-centric tools moving online, sometimes it can be hard to tell where your data is going and what will happen with it after you upload it. We store information you upload for only the amount of time it takes us to analyze it, then we delete it. The aggregate results we show you (metadata) are kept for 60 days, and then we delete them. All communications are over HTTPS, so other folks can't eavesdrop on the data as you upload it.

By Educators, for Educators

Work with high-schoolers? Journalists? Community groups? Graduate students? Us too!!! That's why we designed and tested the DataBasic tools and activities in our classrooms and in workshops with those audiences. These tools were born out of frustration with things we were trying to use in our undergraduate classes, so we feel your pain.

Accessible to All

Digital technologies have empowered people with disabilities in a variety of ways, and new web standards have made it easier than ever to build new tools that are accessible to all. DataBasic implements those technologies to support screen readers so people with visual impairments can start working with data in new and exciting ways.

DataBasic News

Learning with the Knight Foundation

A postmortem and reflection from the Knight Foundation close-out event hosted with Maya Design

DataBasic: A Suite of Data Tools For The Beginner

A review of DataBasic by award-winning journalist Matt Carroll

DataBasic MIT Beta Workshop A Success

A recap of the successful 40+ person DataBasic workshop held at MIT