dataproofer / Dataproofer
- Thursday, March 31, 2016, 03:13:24
CSS
A proofreader for your data
Every day, more and more data is created. Journalists, analysts, and data visualizers turn that data into stories and insights.
But before you can make use of any data, you need to know if it’s reliable. Is it weird? Is it clean? Can I use it to write or make a viz?
This used to be a long manual process that took up valuable time and introduced the possibility of human error. People can’t always spot every mistake every time, no matter how hard they try.
Dataproofer is built to automate this process of checking a dataset for errors or potential mistakes.
Download a .zip of the latest release from the Dataproofer releases page.
Drag the app into your applications folder.
Select your dataset, which can be either a CSV on your computer, or a Google Sheet that you’ve published to the web.
Once you select your dataset, you can choose which suites and tests run by turning them on or off.
Proof your data, get your results, and feel confident about your dataset.
This repo contains two pieces of code: the core library that runs tests and the Electron app that houses the UI. You can get them ready like so:
git clone git@github.com:dataproofer/Dataproofer.git
cd Dataproofer
cd src
npm install
cd ../electron
npm install
You can run the development version of the app from the electron folder:
cd Dataproofer/electron
npm run electron
If you update the core library (index.js or src/*) you will need to run npm install inside Dataproofer/electron for it to be updated, as we are relying on the "file:" dependency, which copies the source instead of downloading it.
See our test to-do list and leave a comment
See our features list and leave a comment
See our smaller issues and leave a comment
See our medium-sized issues and leave a comment
See our larger issues and leave a comment
All tests belong to a suite, which is essentially just a node module that packages a group of tests together. In order to modify a test or add a new test to a suite, you will want to clone the project and link it. Let's say we want to modify the core-suite.
git clone https://github.com/dataproofer/core-suite.git
cd core-suite
npm install
npm link
cd ../Dataproofer
cd electron
npm link dataproofer-core-suite
Now when you change anything inside core-suite (like editing a test or making a new one) you can see your changes reflected when you run the app. Follow the instructions below for creating a new test in your suite!
require that test in a suite's index.js
Add it to the exports in index.js
Tests are made up of a few parts. Here's a brief overview. For a more in-depth look, dive into the documentation.
This is the name of your test. It shows up in the test-selection screen as well as on the results page.
This is a text-only description of what the test does, and what it is meant to check. Imagine you are explaining it to a remarkably intelligent 5-year-old.
This is where the code that your test executes lives. Pass it a function that takes in rows and columnHeads.
rows is an array of objects from the data. The object uses column headers as the key, and the row’s value as the value.
So if your data looks like this:
President | Year
------------------------
George Washington | 1789
John Adams | 1797
Thomas Jefferson | 1801
Then the first object in your array of rows will look like this:
{ President: 'George Washington', Year: '1789' } and so on
Generally, to run a test, you are going to want to loop over each row and do some operations on it, such as counting cells and using conditionals to detect unwanted values.
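As a rough illustration of that row-by-row pattern, here is a minimal sketch that counts empty cells per column. The function name countEmptyCells and the sample data are hypothetical and not part of the Dataproofer API; only the rows and columnHeads shapes follow the description above.

```javascript
// Hypothetical helper (not part of Dataproofer): count empty cells per column.
// rows is an array of objects keyed by column header; columnHeads is an array
// of header names, as described above.
function countEmptyCells(rows, columnHeads) {
  const counts = {};
  columnHeads.forEach(function (head) {
    counts[head] = 0;
  });

  rows.forEach(function (row) {
    columnHeads.forEach(function (head) {
      const value = row[head];
      // Treat missing values and whitespace-only strings as empty.
      if (value === undefined || value === null || String(value).trim() === "") {
        counts[head] += 1;
      }
    });
  });

  return counts;
}

// Sample data modeled on the presidents table, with one Year left blank.
const sampleRows = [
  { President: "George Washington", Year: "1789" },
  { President: "John Adams", Year: "" },
  { President: "Thomas Jefferson", Year: "1801" }
];

const emptyCounts = countEmptyCells(sampleRows, ["President", "Year"]);
console.log(emptyCounts);
```

A real test would wrap logic like this in the function you pass to the test, then decide from the counts whether the test passes.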
Helper scripts help you test and display the results of Dataproofer tests. These are a small set of functions we've found ourselves reusing.
For more information, please see the full util documentation.
Tests are run inside a try/catch block in src/processing.js. You may wish to temporarily remove the try/catch while iterating on a test.
Otherwise, for now we recommend heavy doses of console.log and the Chrome debugger.
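To see why a try/catch can get in the way while you iterate, consider the sketch below. It is not the code in src/processing.js; runTest, the rethrow option, and the shape of the test object are all hypothetical. It only shows how a wrapper can reduce a thrown error to a short message unless you let it propagate.

```javascript
// Hypothetical runner, NOT the actual code in src/processing.js.
// With rethrow off, a bug in the test is reduced to a one-line message.
// With rethrow on, the error propagates so the debugger shows the full stack.
function runTest(test, rows, columnHeads, options) {
  const rethrow = Boolean(options && options.rethrow);
  try {
    return { name: test.name, passed: test.run(rows, columnHeads), error: null };
  } catch (err) {
    if (rethrow) {
      throw err;
    }
    console.log("Test could not run:", test.name, "-", err.message);
    return { name: test.name, passed: false, error: err.message };
  }
}

// A deliberately broken test: rows.missingProperty is undefined,
// so reading .length throws a TypeError.
const brokenTest = {
  name: "broken example",
  run: function (rows) {
    return rows.missingProperty.length === 0;
  }
};

const swallowed = runTest(brokenTest, [], []);
console.log(swallowed.passed);
```

Temporarily removing the try/catch has the same effect as the rethrow flag here: the error surfaces at the line that caused it.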
Dataproofer saves a copy of the most recently loaded file in the Application Data directory provided to it by the OS.
You can quickly load the file and run the tests by typing loadLastFile() in the console. This saves you several clicks for loading the file and clicking the run button while you are iterating on a test.
If you want to temporarily avoid any clicks you can add the function call to the ipc.on("last-file-selected", event handler in electron/js/controller.js.
./build-executables.sh
This will create a new folder inside Dataproofer/executables that contains Mac OS X, Windows, and Linux executables.
We can push releases to GitHub manually for now:
git tag -a 'v0.1.1' -m "first release"
git push && git push --tags
The binary (Dataproofer.app) can be uploaded to the releases page for the tag you pushed, and should be zipped up first (right click and choose "Compress Dataproofer").
A huge thank you to Vocativ and the Knight Foundation. This project was funded in part by the Knight Foundation's Prototype Fund.
... and the countless journalists who've encouraged us along the way. Thank you!