1. tableschema-rb
A utility library for working with Table Schema in Ruby.
1.1. Installation
Add this line to your application's Gemfile:
gem 'tableschema'
And then execute:
$ bundle
Or install it yourself as:
$ gem install tableschema
1.1.1. Update from jsontableschema
The library and its corresponding gem were previously called jsontableschema.
As of version 0.3, the library is named tableschema and is published as a gem of the same name.
The gem jsontableschema is no longer maintained. Here are the steps to transition your code to tableschema:
Replace
gem 'jsontableschema'
with
gem 'tableschema', '0.3.0'
Replace module name JsonTableSchema with module name TableSchema. For example, replace
JsonTableSchema::Table.new(source, schema: schema)
with
TableSchema::Table.new(source, schema: schema)
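In a larger codebase the two replacements could be scripted. The following is a minimal sketch, assuming a hypothetical helper named migrate_source (it is not part of the gem) that rewrites a string of Ruby source in memory:

```ruby
# Hypothetical helper, not part of tableschema: rewrites the old gem
# and module names in a string of Ruby source.
def migrate_source(text)
  text
    .gsub(/gem\s+(['"])jsontableschema\1/) { "gem 'tableschema', '0.3.0'" }
    .gsub(/\bJsonTableSchema\b/, 'TableSchema')
end

old_code = <<~RUBY
  gem 'jsontableschema'
  table = JsonTableSchema::Table.new(source, schema: schema)
RUBY

puts migrate_source(old_code)
# gem 'tableschema', '0.3.0'
# table = TableSchema::Table.new(source, schema: schema)
```

Review the result before committing it, since a plain text substitution cannot tell code apart from comments or strings.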
1.2. Usage
1.2.1. Parse a CSV
Validate and cast data from a CSV as described by a schema.
schema = {
    fields: [
        {
            name: 'id',
            title: 'Identifier',
            type: 'integer'
        },
        {
            name: 'title',
            title: 'Title',
            type: 'string'
        }
    ]
}
source = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv'
table = TableSchema::Table.new(source, schema: schema)
# Iterate through rows
table.iter { |row| p row }
# [1, "foo"]
# [2, "bar"]
# [3, "baz"]
# Read the entire CSV in memory
table.read
#=> [[1,'foo'],[2,'bar'],[3,'baz']]
Both iter and read take the following optional parameters:
keyed: boolean, default: false - return the rows as Hashes with headers as keys
cast: boolean, default: true - cast values for each row
limit: integer, default: nil - stop at this many rows
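To show how the three options combine, here is a stdlib-only sketch. The read_rows function and its arguments are hypothetical stand-ins, not the gem's API, and the casting is hard-coded for an integer column followed by a string column:

```ruby
# Illustration only: a stand-in for how keyed, cast, and limit interact.
# read_rows is hypothetical and does not use the tableschema gem.
def read_rows(headers, rows, keyed: false, cast: true, limit: nil)
  rows = rows.first(limit) unless limit.nil?
  rows.map do |row|
    # Mimic an 'integer' field followed by a 'string' field.
    values = cast ? [Integer(row[0]), row[1]] : row
    keyed ? headers.zip(values).to_h : values
  end
end

headers = %w[id title]
data = [%w[1 foo], %w[2 bar], %w[3 baz]]

p read_rows(headers, data)
# [[1, "foo"], [2, "bar"], [3, "baz"]]
p read_rows(headers, data, cast: false, limit: 1)
# [["1", "foo"]]
p read_rows(headers, data, keyed: true, limit: 2)
```

The last call returns one Hash per row, keyed by header, and stops after two rows.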
1.2.2. Infer a schema
If you don't have a schema for a CSV, and want to generate one, you can infer a schema like so:
source = 'https://github.com/frictionlessdata/tableschema-rb/raw/master/spec/fixtures/simple_data.csv' # Can also be a local file path or an array of arrays
table = TableSchema::Table.new(source)
table.infer()
table.schema
#=> {:fields=>[{:name=>"id", :title=>"", :description=>"", :type=>"integer", :format=>"default", :constraints=>{}}, {:name=>"title", :title=>"", :description=>"", :type=>"string", :format=>"default", :constraints=>{}}]}
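The idea behind inference can be sketched without the gem: for each column, try the strictest type first and fall back to string. The infer_type and infer_types helpers below are hypothetical and only cover three types. They are not the library's actual algorithm:

```ruby
# Illustration only: guess a type per column by trying the strictest
# parse first. These helpers are hypothetical, not the gem's algorithm.
def infer_type(values)
  return 'integer' if values.all? { |v| Integer(v, exception: false) }
  return 'number' if values.all? { |v| Float(v, exception: false) }
  'string'
end

# Assumes at least one data row, each with one value per header.
def infer_types(headers, rows)
  columns = rows.transpose
  headers.each_with_index.map do |name, i|
    { name: name, type: infer_type(columns[i]) }
  end
end

rows = [%w[1 foo], %w[2 bar], %w[3 baz]]
infer_types(%w[id title], rows).each { |f| puts "#{f[:name]}: #{f[:type]}" }
# id: integer
# title: string
```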
1.2.3. Build a Schema
You can also build a schema from scratch or modify an existing one:
schema = TableSchema::Schema.new({
  fields: [],
})
# Add a field
schema.add_field({
  name: 'id',
  type: 'string',
  constraints: {
    required: true,
  }
})
# Remove a field
schema.remove_field('id')
add_field will ignore the update if the updated version of the schema fails validation.
If you wish to prevent an invalid schema from being created or updated by raising validation errors, you can pass the strict: true argument to the Schema initializer:
schema = TableSchema::Schema.new(schema_hash, strict: true)
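The difference between the two modes can be illustrated with a toy class. MiniSchema below is hypothetical, and its single rule (every field needs a name) stands in for the full Table Schema validation. It is a sketch of the behaviour described above, not the gem's code:

```ruby
# Illustration only: MiniSchema is hypothetical, and its one rule
# (every field needs a name) stands in for real schema validation.
class MiniSchema
  class SchemaError < StandardError; end

  attr_reader :fields

  def initialize(fields = [], strict: false)
    @strict = strict
    @fields = []
    fields.each { |field| add_field(field) }
  end

  def add_field(field)
    if field[:name].to_s.empty?
      raise SchemaError, 'field is missing a name' if @strict
      return self # non-strict: silently ignore the invalid update
    end
    @fields << field
    self
  end
end

lenient = MiniSchema.new
lenient.add_field({ type: 'string' })
p lenient.fields.size
# 0

begin
  MiniSchema.new([], strict: true).add_field({ type: 'string' })
rescue MiniSchema::SchemaError => e
  puts e.message
end
# field is missing a name
```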
There are multiple methods to inspect a schema:
schema_hash = {
  fields: [
    {
      name: 'id',
      type: 'string',
      constraints: {
        required: true,
      },
    },
    {
      name: 'height',
      type: 'number',
    },
    {
      name: 'state',
    },
  ],
  primaryKey: 'id',
  foreignKeys: [
    {
      fields: 'state',
      reference: {
          resource: 'the-resource',
          fields: 'state_id',
      },
    },
  ]
}
schema = TableSchema::Schema.new(schema_hash)
schema.field_names
#=> ["id", "height", "state"]
schema.fields
#=> [{:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}, {:name=>"height", :type=>"number", :format=>"default", :constraints=>{}}, {:name=>"state", :type=>"string", :format=>"default", :constraints=>{}}]
schema.primary_key
#=> ["id"]
schema.foreign_keys
#=> [{:fields=>"state", :reference=>{:resource=>"the-resource", :fields=>"state_id"}}]
schema.get_field('id')
#=> {:name=>"id", :type=>"string", :constraints=>{:required=>true}, :format=>"default"}
1.2.3.1. Cast row
To check if a given set of values complies with the schema, you can use cast_row:
schema.cast_row(['string', '10.0', 'State'])
#=> ['string', 10.0, 'State']
By default, cast_row fails on the first error it finds. If you pass fail_fast: false as the second argument, the errors are instead collected and raised together in a single exception, whose errors attribute you can review later. For example:
row = [3, 'nan', 'State']
schema.cast_row(row)
#=> TableSchema::InvalidCast: 3 is not a string
begin
  schema.cast_row(row, fail_fast: false)
rescue TableSchema::MultipleInvalid => exception
  exception.errors
end
#=> #<Set: {#<TableSchema::InvalidCast: 3 is not a string>,
            #<TableSchema::InvalidCast: nan is not a number>}>
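The fail-fast versus collect-all behaviour can be sketched with the standard library alone. The exception classes and the cast_value and cast_row methods below are hypothetical stand-ins that only handle the string and number types. They borrow names from the gem for readability but are not its implementation:

```ruby
require 'set'

# Illustration only: hypothetical stand-ins for the gem's exceptions.
class InvalidCast < StandardError; end

class MultipleInvalid < StandardError
  attr_reader :errors

  def initialize(errors)
    super('There were errors casting the row')
    @errors = errors
  end
end

def cast_value(type, value)
  case type
  when 'string'
    raise InvalidCast, "#{value} is not a string" unless value.is_a?(String)
    value
  when 'number'
    begin
      Float(value)
    rescue ArgumentError, TypeError
      raise InvalidCast, "#{value} is not a number"
    end
  end
end

def cast_row(types, row, fail_fast: true)
  errors = Set.new
  result = types.zip(row).map do |type, value|
    begin
      cast_value(type, value)
    rescue InvalidCast => e
      raise if fail_fast # stop at the first problem
      errors << e        # otherwise remember it and keep going
      nil
    end
  end
  raise MultipleInvalid.new(errors) unless errors.empty?
  result
end

begin
  cast_row(%w[string number string], [3, 'nan', 'State'], fail_fast: false)
rescue MultipleInvalid => e
  e.errors.each { |err| puts err.message }
end
# 3 is not a string
# nan is not a number
```

Collecting errors is useful when reporting every problem in a row at once. Failing fast avoids extra work when any single error is enough to reject the row.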
1.2.4. Validate a schema
To make sure a schema complies with the Table Schema spec, the library validates each custom schema against the official schema for Table Schema:
schema_hash = {
  fields: [
      { name: 'id' },
  ]
}
schema = TableSchema::Schema.new(schema_hash)
schema.validate
#=> true
If the schema is invalid, you can access the errors via the errors attribute:
schema_hash = {
  fields: [
    {
      name: 'id',
      title: 'Identifier',
      type: 'integer'
    },
    {
      name: 'title',
      title: 'Title',
      type: 'string'
    }
  ],
  primaryKey: 'identifier'
}
schema = TableSchema::Schema.new(schema_hash)
schema.validate
#=> false
schema.errors
#=> #<Set: {"The TableSchema primaryKey value `identifier` is not found in any of the schema's field names"}>
# Raise error if validation fails
schema.validate!
#=> TableSchema::SchemaException: The TableSchema primaryKey value `identifier` is not found in any of the schema's field names
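The kind of check behind this error can be sketched as follows. The check_primary_key helper is hypothetical and covers only the primaryKey rule. The gem's own validation is broader and its messages differ:

```ruby
require 'set'

# Illustration only: check_primary_key is hypothetical, not the gem's code.
def check_primary_key(schema_hash)
  errors = Set.new
  names = schema_hash.fetch(:fields, []).map { |f| f[:name] }
  # primaryKey may be a single name or a list of names.
  Array(schema_hash[:primaryKey]).each do |key|
    next if names.include?(key)
    errors << "primaryKey value `#{key}` is not found in the field names"
  end
  errors
end

schema_hash = {
  fields: [{ name: 'id' }, { name: 'title' }],
  primaryKey: 'identifier'
}
puts check_primary_key(schema_hash).to_a
# primaryKey value `identifier` is not found in the field names
```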
1.3. Field
Data values can be cast to native Ruby objects with a Field instance. This allows formats and constraints to be defined for the field in the field descriptor:
# Init field
field = TableSchema::Field.new({
  name: 'over_1700',
  type: 'number',
  constraints: {
    minimum: '1700',
  },
})
# Cast a value
field.cast_value('12345')
#=> 12345.0
Casting a value will check the value is of the expected type, is in the correct format, and complies with any constraints imposed in the descriptor.
A value that can't be cast will raise an InvalidCast exception.
Casting a value that doesn't meet the constraints will raise a ConstraintError exception.
field.cast_value('nan')
#=> TableSchema::InvalidCast: nan is not a number
field.cast_value('1200')
#=> TableSchema::ConstraintError: The field `over_1700` must not be less than 1700
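The two-step order (cast first, then check constraints) can be sketched with plain Ruby. The cast_number method and both exception classes below are hypothetical stand-ins, and only the minimum constraint is handled:

```ruby
# Illustration only: hypothetical stand-ins for the gem's exceptions.
class InvalidCast < StandardError; end
class ConstraintError < StandardError; end

# Cast to a number, then enforce an optional minimum.
def cast_number(name, value, minimum: nil)
  number = begin
    Float(value)
  rescue ArgumentError, TypeError
    raise InvalidCast, "#{value} is not a number"
  end
  if minimum && number < Float(minimum)
    raise ConstraintError, "The field `#{name}` must not be less than #{minimum}"
  end
  number
end

p cast_number('over_1700', '12345', minimum: '1700')
# 12345.0
```

Because the cast runs first, a value such as 'nan' fails with InvalidCast before any constraint is evaluated.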
1.4. Development
After checking out the repo, run bin/setup to install dependencies. Then, run rake spec to run the tests. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
1.5. Contributing
Bug reports and pull requests are welcome on GitHub at https://github.com/frictionlessdata/tableschema-rb. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the Contributor Covenant code of conduct.
1.6. License
The gem is available as open source under the terms of the MIT License.