Many governments, institutions, and organizations are now moving towards open data, collecting and publishing large quantities of information to increase transparency and to inform policy. Publishing open data is not enough to improve lives, however: the raw data has to be presented in meaningful, accessible ways to both citizens and policymakers.
Data needs to be organized, processed, and presented in human-readable formats so that citizens, analysts, and policymakers can use it effectively. Yet many organizations lack the resources and technical capacity to pay for commercial data visualization services or to build platforms of their own. As a result, the organizations best positioned to collect data, and to work closely with the communities it comes from, often cannot present and share that information effectively.
DKAN, created by NuCivic, is an open source, open data platform for managing and warehousing data, publishing it openly, and creating compelling map- and graph-based visualizations. It offers a full suite of data and content management, cataloging, publishing, and visualization features that help organizations distribute public data in easy-to-consume formats. It makes it easy to upload, parse, store, publish, catalog, and visualize a wide range of data, including spreadsheets, text documents, and maps.
DKAN is built on the Drupal content management system, which typically runs on the LAMP stack (Linux, Apache, MySQL, and PHP). However, it is flexible enough to run on other operating systems and to use other web servers and database backends if desired.
- Platform: Drupal
- Primary language: PHP
- Database: MySQL, MariaDB, PostgreSQL, SQL Server, or Oracle
- Web server: Apache or Nginx
- Operating system: Linux, Windows, OSX, or Unix
Purpose of DKAN
Spreadsheets of raw numbers are hard for most people to understand. With DKAN, organizations can take large amounts of data and quickly organize, display, analyze, and visualize it. This data-driven storytelling can help policymakers understand the data quickly and make better decisions, and each visualization can be created on demand. Choropleth maps show regional trends and variations at a glance, and a large dataset can be organized into multiple charts and graphs comparing changes over time, region, funding, or any number of other variables.
While other programs can easily create individual graphs or sort lists of data, DKAN provides a comprehensive data warehousing, browsing, and visualization solution for large datasets tagged with multiple variables, with highly customizable views built from the same set of data.
DKAN is especially useful for rapidly prototyping multiple visualizations, aggregating data, and displaying changes over time or by geographic region. It has been particularly successful in releasing data from elections, censuses, health monitoring, and economic analysis.
Ready to use out of the box, DKAN offers powerful data warehousing, publishing, and visualization capabilities. With it, users can quickly publish and display open data, building data narratives with charts, graphs, and maps. Its content management system (CMS) can be integrated with blogs, and DKAN is compatible with major open data standards such as the Project Open Data Metadata Schema, which underpins the White House’s Project Open Data and data.gov. Because DKAN is open source, anyone can download its source code for free from GitHub or Drupal.org. It is the same tool used by governments pursuing open data, and NDI has used it in multiple elections to publish and visualize data.
Adding Data to DKAN
DKAN’s data publishing model is based on datasets and resources. A dataset is a collection of one or more resources; a resource is the actual data being published, such as a CSV (comma-separated values) file or a GeoJSON file.
A resource is an uploaded or externally referenced file or API within a dataset. Resources are usually created outside DKAN. Prepare a clean, aggregated version of your data, save it as an “MS-DOS Comma Separated (.csv)” file, and then upload that file to DKAN.
Resources can be either an uploaded file, the URL to a file that resides outside of the DKAN website, or the endpoint of an API that can be used to retrieve data programmatically from an outside web service. By default, DKAN accepts uploads of any file type.
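Resources are usually prepared outside DKAN, so the cleaning and aggregating step can be scripted. The minimal Python sketch below uses only the standard library. It sums a numeric column for each group and writes the result as a CSV. The column names (`county`, `polling_places`) and the sample rows are made-up placeholders. This is one possible way to prepare a file, not a DKAN tool.

```python
import csv
import io
from collections import defaultdict


def aggregate(rows, group_col, value_col):
    """Sum value_col per distinct group_col, dropping blank or non-numeric cells."""
    totals = defaultdict(float)
    for row in rows:
        group = (row.get(group_col) or "").strip()
        raw = (row.get(value_col) or "").strip().replace(",", "")  # "1,200" -> "1200"
        if not group or not raw:
            continue  # incomplete rows are left out of the clean file
        try:
            value = float(raw)
        except ValueError:
            continue  # skip cells that are not numbers
        totals[group] += value
    return dict(sorted(totals.items()))


def write_clean_csv(stream, totals, group_col, value_col):
    """Write a header row and one row per group to any text stream."""
    writer = csv.writer(stream)  # default "excel" dialect: commas and \r\n line endings
    writer.writerow([group_col, value_col])
    for group, total in totals.items():
        # Write whole numbers without a trailing ".0".
        writer.writerow([group, int(total) if total.is_integer() else total])


if __name__ == "__main__":
    # Made-up sample rows and column names; replace with your own raw data.
    raw_rows = [
        {"county": "County A", "polling_places": "12"},
        {"county": "County A", "polling_places": "3"},
        {"county": "County B", "polling_places": ""},
        {"county": "County B", "polling_places": "7"},
    ]
    buffer = io.StringIO()
    write_clean_csv(buffer, aggregate(raw_rows, "county", "polling_places"),
                    "county", "polling_places")
    print(buffer.getvalue())
```

To save the result to disk, open a file with `open("clean.csv", "w", newline="", encoding="utf-8")` and pass it as `stream`. Python's `csv` documentation recommends `newline=""` so that the writer controls line endings itself. UTF-8 is our assumption here; use whatever encoding your DKAN site expects.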
In our example, we’ll be adding a dataset with Wisconsin polling places to a DKAN site. The data may look familiar; it's one of the sample datasets provided with DKAN upon installation.
Step 1: Create the Dataset
A Dataset is simply the container, or folder, for the actual data resource files. It holds higher-level information that applies across all the data, such as title, description, category tags, and license. Datasets are not required, but they are a great way to organize your resource files. You can upload resources into a Dataset right away, or attach a resource to a Dataset later.
By default, only authenticated (“logged-in”) users can add new Datasets and Resources to a DKAN website. Once logged in, use the "Add Dataset" link in the main navigation bar. If your user permissions give you access to the administration menu, you can also navigate to Content >> Add Content >> Dataset to reach the “Create Dataset” form.
From this screen, fill in fields such as the Title and Description. Only the Title is required to save the Dataset.
Step 2: Add a Resource
After saving the Dataset, add its data on the Add Resource page. A resource can be added in one of three ways:
- Upload a file -- this option allows publishers to upload data files to the DKAN site. As with the “Link to a file” option, the data within the file will be imported into your DKAN site’s Datastore for preview and analysis by your users. This is typically the option to use when uploading a CSV file from your computer.
- Link to a file -- this option allows publishers to create a link to a data file published on another website. Although the file itself will remain on the other site, the data within the file can be imported into your DKAN site’s Datastore for preview and analysis by your users. See the DKAN Datastore for more information.
- Link to an API -- some data resources aren’t standalone files but queryable online databases; the interface to these databases is known as an API. Adding links to these types of online database interfaces to your DKAN data catalog can be very useful for developers interested in working with your data.
To continue with our Wisconsin Polling Places example, we’ll add one resource file to the Dataset we created in Step 1. Our resource file is a CSV, or comma-separated values, file, which is a popular format for exchanging tabular data. Let’s explore the example resource and its fields:
- Resource / Choose File - upload a file from your local hard drive.
- Resource / Recline Views - DKAN’s “Data Preview” feature allows visitors to preview published data in three views:
- Map - data with latitude and longitude coordinates can be previewed in a map interface
- Graph - tabular (spreadsheet) data can be graphed by users, letting them create their own meaningful visualizations (note that this option only enables graphing in the data preview; it does not create a published chart)
- Grid - by default, tabular data is presented in a basic spreadsheet view, with filter, sort, and search capabilities
- Title - this is the title of the individual data file, not the parent dataset container.
- Description - a rich-text editor field is provided so publishers can offer detailed and useful descriptions
- Format - entering the file format here lets users search for data by format
- Dataset - this is the parent dataset container; this field should already be populated if you’re adding a Resource subsequent to adding a Dataset
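The Map view described above only works if the CSV has usable latitude and longitude values. Before you choose that view, it can be worth checking the file with a short script. The Python sketch below looks for coordinate columns and reports rows that are missing values or fall out of range. The accepted header names in `LAT_NAMES` and `LON_NAMES` are our assumption. Check which column names your DKAN version's map preview actually recognises.

```python
import csv
import io

# Assumed header names for coordinate columns (matched case-insensitively);
# check which names your DKAN version's Map preview actually recognises.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def find_column(headers, candidates):
    """Return the first header whose trimmed, lower-cased name is a candidate."""
    for header in headers:
        if header.strip().lower() in candidates:
            return header
    return None


def check_coordinates(csv_text):
    """List coordinate problems in a CSV; an empty list means every row looks mappable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    lat_col = find_column(headers, LAT_NAMES)
    lon_col = find_column(headers, LON_NAMES)
    if lat_col is None or lon_col is None:
        return ["no latitude/longitude columns found"]
    problems = []
    # Line numbers assume no blank lines or multi-line fields; line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            lat = float(row[lat_col])
            lon = float(row[lon_col])
        except (TypeError, ValueError):  # missing (None) or non-numeric cell
            problems.append(f"line {line_no}: non-numeric coordinates")
            continue
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            problems.append(f"line {line_no}: coordinates out of range")
    return problems


if __name__ == "__main__":
    sample = "name,latitude,longitude\nSite 1,43.07,-89.40\nSite 2,,-89.40\n"
    for problem in check_coordinates(sample):
        print(problem)
```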
At the bottom of the Add Resource page, we can choose:
- Save - Save progress on this resource and immediately return to it for further editing
- Save and add another - Save this resource and add another resource to the same dataset
- Next: Additional Info - Save this resource and enter optional metadata
In our example, we’re only adding a single resource, so we’ll click “Next: Additional Info” to move on to Step 3. If we had more than one resource to add to this dataset, we would choose “Save and add another.”
Step 3: Adding Metadata to a Dataset
Organizations may want to provide valuable information about their dataset, both to human visitors and to machines discovering the dataset through one of DKAN's public APIs. All of the fields below are optional, but they provide important context on the data's type, kind, and function. Adding metadata clarifies how others can use the data.
Let's take a closer look at some of the metadata fields available on this form:
- Author - The dataset's author, in plain text.
- Spatial / Geographical Coverage Area - Lets us define what region the data applies to; in this case, the US State of Wisconsin. You can use the map widget to draw an outline around the state borders, or click the "Add data manually" button if you already have a GeoJSON string you can paste in.
- Spatial / Geographical Coverage Location - The region the data applies to, written in plain text. This can be used instead of or in addition to the Coverage Area field.
- Frequency - How often is this dataset updated? We might expect our list of polling places to be updated every year, so we could select "annually." However, often we don't expect the data to be updated (even in this case, perhaps we plan to post the next version of the data as a separate dataset), in which case we can leave this blank.
- Temporal Coverage - Like Geographic Coverage, this field lets us give some context to the data, but now for the relevant time period. Here we could enter the year or years for which our polling places data is accurate.
- Granularity - This is a somewhat open-ended metadata field for describing the granularity or accuracy of your data, for instance "Year".
- Data Dictionary - Another open-ended field, this is a space for almost any kind of explanation for understanding the terminology/units/column names/etc. in our dataset. In most cases, this will be a simple URL to a Data Dictionary resource elsewhere on the web.
- Additional Info - Lets us arbitrarily define other metadata fields. See Additional Info field for more information.
- Resources - This field is a reference to the resources you have already added. You should generally leave this field alone and use the workflows outlined here and in Updating Datasets in DKAN to add, edit and remove resources from your Dataset.
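The Spatial / Geographical Coverage Area field accepts a pasted GeoJSON string. The Python sketch below builds the simplest such string, a rectangle around a longitude/latitude bounding box. A rectangle is much cruder than an outline drawn around a state's borders. The coordinates in the example are placeholders, not Wisconsin's real boundary. We also assume that the field accepts a bare Polygon geometry rather than a Feature or FeatureCollection, so check what your DKAN version expects.

```python
import json


def bbox_polygon(west, south, east, north):
    """Build a GeoJSON Polygon geometry covering a longitude/latitude bounding box.

    GeoJSON (RFC 7946) writes positions as [longitude, latitude]; a polygon's
    ring must end on the same position it starts on, and exterior rings are
    recommended to run counterclockwise, which this ring does.
    """
    if not (west < east and south < north):
        raise ValueError("expected west < east and south < north")
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],  # repeat the first position to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}


if __name__ == "__main__":
    # Placeholder box, not a real state boundary.
    print(json.dumps(bbox_polygon(-90.5, 43.0, -89.5, 44.0)))
```

The printed JSON is the kind of string you would paste after clicking "Add data manually".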
After you click "Save", the metadata you entered will appear on the page for this Dataset.
For numeric data that’s best rendered comparatively, you’ll want to make charts with your resources. You can make bar charts, pie charts, scatterplots, or line graphs.
- Navigate to the dataset you want to base your chart on.
- Click the ‘Explore Data’ button
- Right-click (or on Macs, control-click) the download button to copy the URL of the resource file. Saving this link will allow you to directly revisit your resource in the future.
- Now use the administration menu at the top to navigate to Structure » Entity types » Visualization » Chart » Add Chart
- Enter values for the title, description, categories and tags fields.
- At the bottom of the form, paste the resource link you just copied into the ‘Source’ field.
- Now, click the ‘Next’ button.
- If the URL loaded correctly, you will see two fields to fill in under 'Define Variables'. The first, 'Series', sets the Y axis, and the second, 'X-Field', sets the X axis. In these fields, choose the columns you want to display; only the Series field can contain multiple values. If the column names are not displayed properly, check that your source URL is correct. Leave the radio buttons set to 'auto'.
- After making sure that everything is correct, click the ‘Next’ button.
- Now you can select the type of chart you want to create. Click on the image of the chart type you would like to use.
- The charts on this screen are generic images and not based on the data you loaded. To see the actual chart, click the ‘Next’ button.
- If everything went well, you should see your chart. If the data looks slightly misplaced, use the right-hand column to edit the X Format for the labels (number, date, etc.), the label rotation, the colors of the lines or columns, the X and Y axis labels, and the margins, which move both the labels and the chart itself.
- If you would like to see what this data looks like in another type of chart or graph, click “Back” at the bottom of the page and repeat these steps with another chart or graph selection.
- After editing and customizing the chart to your liking, click the ‘Finish’ button.
- Now you have created your chart. On the chart’s page, there will be an “Embed” button. Click it to reveal the HTML embed code, which you can add to any website to embed a live, dynamic chart that updates if you change the chart on your DKAN site. You can also set the height and width of the embedded chart by entering values in the Height and Width boxes above the embed code.
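The 'Define Variables' step pairs one X-Field column with one or more Series columns. The Python sketch below shows that pairing on a downloaded resource file. It is a conceptual model for checking your column choices, not DKAN's internal charting code. The `fetch_csv_text` helper assumes that the resource URL copied from the download button returns the raw CSV. The sample column names are made up.

```python
import csv
import io
from urllib.request import urlopen


def fetch_csv_text(url, encoding="utf-8"):
    """Download a resource file, e.g. from the URL copied off the download button."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


def build_series(csv_text, x_field, series_fields):
    """Return {series column: [(x value, y value), ...]}.

    Mirrors the chart form's rule of one X-Field and one or more Series columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    missing = [c for c in [x_field, *series_fields] if c not in headers]
    if missing:
        # Same symptom as column names not appearing in the chart form:
        # the wrong file or URL was probably loaded.
        raise KeyError(f"columns not found: {missing}")
    series = {name: [] for name in series_fields}
    for row in reader:
        for name in series_fields:
            cell = (row[name] or "").strip()
            if cell:  # leave gaps out rather than plotting them as zero
                series[name].append((row[x_field], float(cell)))
    return series


if __name__ == "__main__":
    # Made-up sample data standing in for a downloaded resource.
    sample = "year,series_a,series_b\n2012,70,100\n2016,,110\n"
    print(build_series(sample, "year", ["series_a", "series_b"]))
```

Skipping empty cells, rather than treating them as zero, avoids misleading dips in a line graph. Whether DKAN's own charts handle gaps the same way is not something this sketch establishes.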
A choropleth map is a thematic map in which areas are shaded or patterned in proportion to the value of the statistical variable being displayed, such as population density or per-capita income. Choropleth maps provide an easy way to visualize how a measurement varies across a geographic area, or to show the level of variability within a region.
Choropleth maps can effectively report area values at virtually any scale, from global to local, and the data can be examined at many levels of analysis, from general overall patterns to fine details. They are especially helpful for finding intriguing hot spots.
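In practice, shading areas "in proportion to" a value usually means sorting each area's value into a small number of classes and giving each class one shade. DKAN lets you set the class boundaries yourself as breakpoints when you build the map. The Python sketch below shows this classification. Three details are our assumptions, not documented DKAN behavior:

- A value that falls exactly on a breakpoint goes into the lower class.
- `equal_interval_breaks` is just one common way to compute breakpoints automatically.
- The hex colors are an arbitrary example palette.

```python
from bisect import bisect_left

# Example light-to-dark palette, one color per class (an arbitrary choice).
PALETTE = ["#edf8e9", "#bae4b3", "#74c476", "#238b45"]


def equal_interval_breaks(values, n_classes):
    """Split [min, max] of the data into n_classes equal steps.

    One simple way to pick breakpoints; DKAN's own automatic calculation may differ.
    """
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_classes
    return [lo + step * i for i in range(1, n_classes + 1)]


def shade_index(value, breakpoints):
    """Return the 0-based class of value, with each breakpoint an inclusive upper bound.

    For [25, 50, 75, 100]: up to 25 -> 0, above 25 to 50 -> 1, above 50 to 75 -> 2,
    above 75 to 100 -> 3. Values above the last breakpoint are clamped to the top class.
    """
    return min(bisect_left(breakpoints, value), len(breakpoints) - 1)


def shade(value, breakpoints, palette=PALETTE):
    """Map a value to a color; the palette needs one color per breakpoint."""
    if len(palette) < len(breakpoints):
        raise ValueError("palette needs at least one color per class")
    return palette[shade_index(value, breakpoints)]


if __name__ == "__main__":
    breakpoints = [25, 50, 75, 100]
    for v in (10, 30, 60, 90):
        print(v, shade(v, breakpoints))
```

`bisect_left` does the binary search for the class boundary, so classifying each area stays cheap even with many breakpoints.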
- Look for Content -> Add Content -> Resource in the admin menu and click on it.
- Upload a CSV file for the resource.
- Fill in the required fields and save the resource.
- Look for Structure -> Entity Types -> Geo File -> geojson -> Add geojson in the admin menu and click on it.
- Set the title.
- Upload a GeoJSON file.
- In the name attribute field, enter the name of the column in your data (the CSV resource) whose values match the name property of the features in the GeoJSON file.
- Click Save.
- Look for Structure -> Entity Types -> Visualization -> Choropleth Visualization -> Add Choropleth Visualization in the admin menu and click on it.
- Enter a title.
- For the GeoJSON field, select the GeoJSON file created above.
- For the resource field, select the resource created above.
- Select the colors you would like to use for the choropleth map.
- In the data column field, enter the column or columns from your CSV that you want to display on the map, separating multiple columns with commas. The columns you choose appear as radio buttons beside the visualization, and you can toggle between them to see the effect of different data. If you leave this field blank, you'll get a radio button for every column in your data sheet. Selecting specific columns is helpful when, for instance, you want to show change over a certain time period: you could choose the April, May, and June columns but leave out July, August, and September.
- Fill in the data breakpoints field with comma-separated numbers. If you leave this field blank, breakpoints will be calculated for you from the data. Breakpoints determine which data values are captured by each color on the visualization. For instance, if you use ‘25, 50, 75, 100’ as your breakpoints, your visualization will display four shades: one for values from 0 to 25, a slightly darker shade for 25 to 50, a darker shade still for 50 to 75, and the darkest shade for 75 to 100. Choose your breakpoints carefully, based on the data you want to display.
- Click Save and enjoy!
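A choropleth only shades an area when a value in the CSV's name column matches the name property of a GeoJSON feature. A quick way to find mismatches before building the map is to compare the two lists of names. The Python sketch below makes these assumptions:

- The GeoJSON file is a FeatureCollection.
- Each feature stores its area name under a property called `name`. The `prop` parameter lets you change this.
- Matching is exact after whitespace is trimmed, which may be stricter or looser than DKAN's own matching.

```python
import csv
import io
import json


def feature_names(geojson_text, prop="name"):
    """Collect one property from every feature of a GeoJSON FeatureCollection."""
    data = json.loads(geojson_text)
    names = set()
    for feature in data.get("features", []):
        value = (feature.get("properties") or {}).get(prop)
        if value is not None:
            names.add(str(value).strip())
    return names


def csv_names(csv_text, column):
    """Collect the non-blank values of one CSV column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if column not in (reader.fieldnames or []):
        raise KeyError(f"column {column!r} is not in the CSV header")
    values = ((row[column] or "").strip() for row in reader)
    return {v for v in values if v}


def compare_names(csv_text, column, geojson_text, prop="name"):
    """Report names that will not join between the CSV resource and the GeoJSON file."""
    rows = csv_names(csv_text, column)
    features = feature_names(geojson_text, prop)
    folded = {f.casefold() for f in features}
    unmatched = rows - features
    return {
        "csv_without_feature": sorted(unmatched),
        "feature_without_csv": sorted(features - rows),
        # Names that differ only by letter case: likely spelling fixes to make by hand.
        "case_only_mismatch": sorted(r for r in unmatched if r.casefold() in folded),
    }
```

For example, if the CSV contains `south` but the GeoJSON feature is named `South`, `compare_names` lists `south` under both `csv_without_feature` and `case_only_mismatch`, so you can correct the spelling before uploading.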
After you finish creating the visualization, click the blue ‘Embed’ button to get an embed code for sharing it on other platforms. You can change the height and width of the embedded visualization by entering the desired values in the corresponding text boxes. Once you’ve copied the code, you can place your visualization anywhere that accepts embedded HTML. Even on other sites, the visualization will update automatically when the source data or settings change in DKAN.
DKAN not only renders data visualizations; it can also serve as a standalone data storytelling platform. The first tool for telling data stories is the “question,” which allows users to combine visualizations with companion text and images. You can add a question by going to Add content > Question.
Fill in the fields as desired, attach files, and categorize the question as fits the content. Fields marked with a red asterisk (*) are required to create the question. Make sure the entity URL matches the one auto-generated for the question. Previously rendered visualizations can be added to the question by pasting the embed code into the corresponding field. Click ‘Save’ at the bottom and your question is ready for viewing.
Telling stories based on data is a primary goal of DKAN. Visualizations can be used to create a clear understanding of a complex situation. Furthermore, elements of storytelling can be used to illustrate what the findings actually mean.
The best method for leveraging the narrative in your data with DKAN is creating a “data story”. Data stories consist of multiple elements and pieces of content, allowing you to build unique and engaging bulletins showcasing your data. You can add a data story by going to Add content > DKAN Data Story.
Title it and add any images, body text, or tags, then select the layout that best fits how you want to represent your data and content. Click ‘Save’ and you’ll be greeted with a screen prompting you to add and define your content. The functional icons do the following:
- Plus icons allow you to add content
- Gear icons permit you to modify formatting options
- Paintbrush icons allow you to change the style of the content pane
- Arrow icons enable you to change the position of the content
- Trash can icons allow you to delete the content
You can add all kinds of content, new or existing, and organize it as you see fit. When you’ve finished building and organizing content, click the save button at the bottom and your data story is ready.
To edit a page, click the “Customize this Page” button. This will allow you to add more content, delete certain content, or reorder. To change or reorder content, click on the “Settings Gear”.
To add content, click the “Customize this Page” button, then click the “Plus” icon within the desired container of the layout. The window that opens lets you add text, an external link, a file, an image, and embeds from Google Maps or YouTube.
Any chart, map, graph, or question already created in DKAN can be added by clicking “Existing Nodes”, which lets you search for the charts and graphs that have been uploaded to DKAN.
Thank you very much for reading the DKAN Content Manager Manual. If you have any questions about DKAN or about administering your specific instance of the DKAN DemTool, please contact NDItech.
We’d like to acknowledge the NuCivic team, led by Andrew Hoppin, which has done amazing work creating open-source tools to make data available to the world; it’s been a pleasure improving DKAN together over the past two years. We also thank Gemima Barlow and the NDI Nigeria team, who initially supported the development of color-shaded maps (teaching us the meaning of the word “choropleth” in the process), and NDI’s Gender, Women and Democracy team, who identified and funded significant usability improvements.
This content is available under a Creative Commons Attribution-ShareAlike 4.0 International Public License. You are free to: Share — copy and redistribute the material in any medium or format; Adapt — remix, transform, and build upon the material for any purpose, even commercially. The licensor cannot revoke these freedoms as long as you follow the license terms. The license terms include: Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use; ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original; No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.