1. Working with Data Package
1.1. Local files
Consider we have some local csv files in a data
directory. Let's create a data package based on this data using a Package
class:
data/cities.csv
city,location
london,"51.50,-0.11"
paris,"48.85,2.30"
rome,"41.89,12.51"
data/population.csv
city,year,population
london,2017,8780000
paris,2017,2240000
rome,2017,2860000
1.2. Inferring metadata
First we create a blank data package:
package = Package()
Now we're ready to infer a data package descriptor based on data files we have. Because we have two csv files we use glob pattern **/*.csv
:
package.infer('**/*.csv')
package.descriptor
#{ profile: 'tabular-data-package',
# resources:
# [ { path: 'data/cities.csv',
# profile: 'tabular-data-resource',
# encoding: 'utf-8',
# name: 'cities',
# format: 'csv',
# mediatype: 'text/csv',
# schema: [Object] },
# { path: 'data/population.csv',
# profile: 'tabular-data-resource',
# encoding: 'utf-8',
# name: 'population',
# format: 'csv',
# mediatype: 'text/csv',
# schema: [Object] } ] }
An infer
method has found all our files and inspected it to extract useful metadata like profile, encoding, format, Table Schema etc. Let's tweak it a little bit:
package.descriptor['resources'][1]['schema']['fields'][1]['type'] = 'year'
package.commit()
package.valid # true
Because our resources are tabular we could read it as a tabular data:
package.get_resource('population').read(keyed=True)
#[ { city: 'london', year: 2017, population: 8780000 },
# { city: 'paris', year: 2017, population: 2240000 },
# { city: 'rome', year: 2017, population: 2860000 } ]
1.3. Saving data package
Let's save our descriptor on the disk as a zip-file:
package.save('datapackage.zip')
To continue the work with the data package we just load it again but this time using local datapackage.zip
:
package = Package('datapackage.zip')
# Continue the work