1. datapackage-go
A Go library for working with Data Packages.
1.1. Install
$ go get -u github.com/frictionlessdata/datapackage-go/...
1.2. Main Features
1.2.1. Loading and validating data package descriptors
A data package is a collection of resources. The datapackage.Package type provides capabilities such as loading a local or remote data package and saving a data package descriptor, among others.
Suppose we have a local CSV file and a JSON descriptor in a data directory:
data/population.csv
city,year,population
london,2017,8780000
paris,2017,2240000
rome,2017,2860000
data/datapackage.json
{
  "name": "world",
  "resources": [
    {
      "name": "population",
      "path": "population.csv",
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {"name": "city", "type": "string"},
          {"name": "year", "type": "integer"},
          {"name": "population", "type": "integer"}
        ]
      }
    }
  ]
}
Let's create a data package from this data using the datapackage.Package type:
pkg, err := datapackage.Load("data/datapackage.json")
// Check error.
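For intuition about what the descriptor above contains, here is a minimal standard-library sketch that decodes a descriptor of that shape and lists its resources. The Descriptor, Resource, Schema and Field types and the parseDescriptor function are hypothetical names invented for this illustration; they are not part of datapackage-go, which does its own loading and validation. The sketch also assumes path is a single string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical types for this sketch only; datapackage-go defines its own.
type Field struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type Schema struct {
	Fields []Field `json:"fields"`
}

type Resource struct {
	Name    string `json:"name"`
	Path    string `json:"path"` // assumption: a single path, not a list
	Profile string `json:"profile"`
	Schema  Schema `json:"schema"`
}

type Descriptor struct {
	Name      string     `json:"name"`
	Resources []Resource `json:"resources"`
}

// sample mirrors data/datapackage.json from the text above.
const sample = `{
  "name": "world",
  "resources": [{
    "name": "population",
    "path": "population.csv",
    "profile": "tabular-data-resource",
    "schema": {"fields": [
      {"name": "city", "type": "string"},
      {"name": "year", "type": "integer"},
      {"name": "population", "type": "integer"}
    ]}
  }]
}`

// parseDescriptor decodes raw JSON and rejects descriptors without resources.
func parseDescriptor(raw []byte) (Descriptor, error) {
	var d Descriptor
	if err := json.Unmarshal(raw, &d); err != nil {
		return Descriptor{}, fmt.Errorf("decoding descriptor: %w", err)
	}
	if len(d.Resources) == 0 {
		return Descriptor{}, fmt.Errorf("descriptor %q has no resources", d.Name)
	}
	return d, nil
}

func main() {
	d, err := parseDescriptor([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, r := range d.Resources {
		fmt.Printf("%s -> %s (%d fields)\n", r.Name, r.Path, len(r.Schema.Fields))
	}
}
```

In real code you would rely on datapackage.Load instead, since it also validates the descriptor against the resource profile.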
1.2.2. Accessing data package resources
Once the data package is loaded, we can use the datapackage.Resource type to read a resource's contents:
resource := pkg.GetResource("population")
contents, _ := resource.ReadAll()
fmt.Println(contents)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]
Or you could cast to Go types, making it easier to perform further processing:
type Population struct {
	City       string `tableheader:"city"`
	Year       int    `tableheader:"year"`
	Population int    `tableheader:"population"`
}
var cities []Population
resource.Cast(&cities, csv.LoadHeaders())
fmt.Printf("%+v", cities)
// [{City:london Year:2017 Population:8780000} {City:paris Year:2017 Population:2240000} {City:rome Year:2017 Population:2860000}]
Finally, if the data is too big to be loaded at once, or if you would like to process it line by line, you can iterate through the resource contents:
iter, _ := resource.Iter(csv.LoadHeaders())
sch, _ := resource.GetSchema()
for iter.Next() {
	var p Population
	sch.CastRow(iter.Row(), &p)
	fmt.Printf("%+v\n", p)
}
// {City:london Year:2017 Population:8780000}
// {City:paris Year:2017 Population:2240000}
// {City:rome Year:2017 Population:2860000}
1.2.3. Loading zip bundles
It is very common to store data in zip bundles containing the descriptor and the data files. These are natively supported by the datapackage.Load function. For example, let's say we have the following package.zip bundle:
|- package.zip
    |- datapackage.json
    |- data.csv
We can load this package with a single call:
pkg, err := datapackage.Load("package.zip")
// Check error.
The library unzips the package contents to a temporary directory and wires everything up for us.
A complete example can be found here.
1.2.4. Creating a zip bundle with the data package
You can also easily create a zip file containing the descriptor and all the data resources. Given a datapackage.Package instance, creating a zip file with all its resources takes a single call:
err := pkg.Zip("package.zip")
// Check error.
This call also downloads remote resources. A complete example can be found here.
1.2.5. CSV dialect support
Basic support for configuring the CSV dialect has been added. In particular, the delimiter, skipInitialSpace and header fields are supported. For instance, let's assume the population file uses a different field delimiter:
data/population.csv
city;year;population
london;2017;8780000
paris;2017;2240000
rome;2017;2860000
One could easily parse it by adding the following dialect property to the population resource:
"dialect":{
"delimiter":";"
}
A complete example can be found here.
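To make the three dialect fields concrete, here is a small sketch using only Go's encoding/csv package. The Dialect struct and readWithDialect function are hypothetical; mapping delimiter to Reader.Comma and skipInitialSpace to Reader.TrimLeadingSpace is an assumption made for illustration, not a description of how datapackage-go applies the dialect internally.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Dialect is a hypothetical struct mirroring the three supported fields.
type Dialect struct {
	Delimiter        rune // field separator, e.g. ';'
	SkipInitialSpace bool // ignore whitespace right after a delimiter
	Header           bool // whether the first row holds column names
}

// readWithDialect parses data and returns the header (nil when the dialect
// says there is none) plus the remaining data rows.
func readWithDialect(data string, d Dialect) (header []string, rows [][]string, err error) {
	r := csv.NewReader(strings.NewReader(data))
	r.Comma = d.Delimiter
	r.TrimLeadingSpace = d.SkipInitialSpace
	rows, err = r.ReadAll()
	if err != nil {
		return nil, nil, err
	}
	if d.Header && len(rows) > 0 {
		header, rows = rows[0], rows[1:]
	}
	return header, rows, nil
}

func main() {
	data := "city;year;population\nlondon; 2017; 8780000\nparis; 2017; 2240000\n"
	d := Dialect{Delimiter: ';', SkipInitialSpace: true, Header: true}
	header, rows, err := readWithDialect(data, d)
	if err != nil {
		panic(err)
	}
	fmt.Println(header) // [city year population]
	fmt.Println(rows)   // [[london 2017 8780000] [paris 2017 2240000]]
}
```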
1.2.6. Loading multipart resources
Sometimes you have data scattered across many local or remote files. Datapackage-go offers an easy way to treat all those files as one big file. We call these multipart resources. To use this feature, simply list your files in the path property of the resource. For example, let's say our population data is now split between the northern and southern hemispheres. To deal with this, we only need to change the package descriptor:
data/datapackage.json
{
  "name": "world",
  "resources": [
    {
      "name": "population",
      "path": ["north.csv", "south.csv"],
      "profile": "tabular-data-resource",
      "schema": {
        "fields": [
          {"name": "city", "type": "string"},
          {"name": "year", "type": "integer"},
          {"name": "population", "type": "integer"}
        ]
      }
    }
  ]
}
The rest of the code keeps working unchanged.
A complete example can be found here.
1.2.7. Manipulating data packages programmatically
The datapackage-go library also makes it easy to save packages. Let's say you're creating a program that produces data packages and would like to add or remove a resource:
descriptor := map[string]interface{}{
	"resources": []interface{}{
		map[string]interface{}{
			"name":    "books",
			"path":    "books.csv",
			"format":  "csv",
			"profile": "tabular-data-resource",
			"schema": map[string]interface{}{
				"fields": []interface{}{
					map[string]interface{}{"name": "author", "type": "string"},
					map[string]interface{}{"name": "title", "type": "string"},
					map[string]interface{}{"name": "year", "type": "integer"},
				},
			},
		},
	},
}
pkg, err := datapackage.New(descriptor, ".", validator.InMemoryLoader())
if err != nil {
	panic(err)
}
// Removing resource.
pkg.RemoveResource("books")
// Adding new resource.
pkg.AddResource(map[string]interface{}{
	"name":    "cities",
	"path":    "cities.csv",
	"format":  "csv",
	"profile": "tabular-data-resource",
	"schema": map[string]interface{}{
		"fields": []interface{}{
			map[string]interface{}{"name": "city", "type": "string"},
			map[string]interface{}{"name": "year", "type": "integer"},
			map[string]interface{}{"name": "population", "type": "integer"},
		},
	},
})
// Printing resource contents.
cities, _ := pkg.GetResource("cities").ReadAll()
fmt.Println(cities)
// [[london 2017 8780000] [paris 2017 2240000] [rome 2017 2860000]]
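To see what the descriptor looks like after such edits, here is a minimal standard-library sketch that removes and adds entries in a plain resource list and prints the result as JSON. The removeResource helper is hypothetical and works on bare maps; it only mimics the effect of pkg.RemoveResource and pkg.AddResource and skips the validation the library performs.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// removeResource (hypothetical helper) returns the resources whose "name"
// differs from the given one, preserving order.
func removeResource(resources []map[string]interface{}, name string) []map[string]interface{} {
	kept := make([]map[string]interface{}, 0, len(resources))
	for _, r := range resources {
		if r["name"] != name {
			kept = append(kept, r)
		}
	}
	return kept
}

func main() {
	resources := []map[string]interface{}{
		{"name": "books", "path": "books.csv"},
	}
	// Mirror the two steps above: drop "books", then add "cities".
	resources = removeResource(resources, "books")
	resources = append(resources, map[string]interface{}{"name": "cities", "path": "cities.csv"})

	descriptor := map[string]interface{}{"resources": resources}
	out, err := json.MarshalIndent(descriptor, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```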