1. tableschema-java
A Java library for working with Table Schema.
1.1. Usage
1.1.1. Parse a CSV without a Schema
Cast data from a CSV without a schema:
URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/simple_data.csv");
Table table = new Table(url);
// Iterate through rows
TableIterator<Object[]> iter = table.iterator();
while (iter.hasNext()) {
    Object[] row = iter.next();
    System.out.println(Arrays.toString(row));
}
// [1, foo]
// [2, bar]
// [3, baz]
// Read the entire CSV and output it as a List:
List<String[]> allData = table.read();
1.1.2. Write a Table Into a File
You can write a Table into a CSV file:
URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/simple_data.csv");
Table table = new Table(url);
1.1.3. Build a Schema
You can build a Schema instance from scratch or modify an existing one:
Schema schema = new Schema();
Field nameField = new Field("name", Field.FIELD_TYPE_STRING);
schema.addField(nameField);
Field coordinatesField = new Field("coordinates", Field.FIELD_TYPE_GEOPOINT);
schema.addField(coordinatesField);
// {"fields":[{"name":"name","format":"default","description":"","type":"string","title":"","constraints":{}},{"name":"coordinates","format":"default","description":"","type":"geopoint","title":"","constraints":{}}]}
You can also build a Schema instance with JSONObject instances instead of Field instances:
Schema schema = new Schema(); // By default strict=false validation
JSONObject nameFieldJsonObject = new JSONObject();
nameFieldJsonObject.put("name", "name");
nameFieldJsonObject.put("type", Field.FIELD_TYPE_STRING);
schema.addField(nameFieldJsonObject);
// Because strict=false, an invalid Field definition will be included.
// The error will be logged/tracked in the error list schema.getErrors().
JSONObject invalidFieldJsonObject = new JSONObject();
invalidFieldJsonObject.put("name", "id");
invalidFieldJsonObject.put("type", Field.FIELD_TYPE_INTEGER);
invalidFieldJsonObject.put("format", "invalid");
schema.addField(invalidFieldJsonObject);
JSONObject coordinatesFieldJsonObject = new JSONObject();
coordinatesFieldJsonObject.put("name", "coordinates");
coordinatesFieldJsonObject.put("type", Field.FIELD_TYPE_GEOPOINT);
coordinatesFieldJsonObject.put("format", Field.FIELD_FORMAT_ARRAY);
schema.addField(coordinatesFieldJsonObject);
// {"fields":[{"name":"name","format":"default","description":"","type":"string","title":"","constraints":{}},{"name":"id","format":"invalid","description":"","type":"integer","title":"","constraints":{}},{"name":"coordinates","format":"array","description":"","type":"geopoint","title":"","constraints":{}}]}
When using the addField method, the schema undergoes validation after every field addition.
If adding a field causes the schema to fail validation, then the field is automatically removed.
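The add-then-validate behaviour described above can be pictured with a small, self-contained sketch. It does not use the library: `ValidatingFieldListSketch`, its `isValid` rule, and the short list of accepted types are all hypothetical stand-ins chosen for illustration, so treat it as one possible way such a rollback could work rather than the actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a schema; this is NOT the library's Schema class.
class ValidatingFieldListSketch {
    // Assumption: a small subset of type names, enough for the example.
    private static final List<String> KNOWN_TYPES =
            Arrays.asList("string", "integer", "number", "boolean", "date", "geopoint");

    private final List<String[]> fields = new ArrayList<>(); // each entry: {name, type}
    private final List<String> errors = new ArrayList<>();

    // Whole-schema check: every field needs a non-empty name and a known type.
    private boolean isValid() {
        for (String[] field : fields) {
            boolean hasName = field[0] != null && !field[0].isEmpty();
            if (!hasName || !KNOWN_TYPES.contains(field[1])) {
                return false;
            }
        }
        return true;
    }

    // Adds the field, re-validates the schema, and removes the field again on failure.
    boolean addField(String name, String type) {
        fields.add(new String[]{name, type});
        if (!isValid()) {
            fields.remove(fields.size() - 1);
            errors.add("Rejected field '" + name + "' with type '" + type + "'");
            return false;
        }
        return true;
    }

    int size() { return fields.size(); }

    List<String> getErrors() { return errors; }
}
```

With this sketch, adding a field of type "string" succeeds, while adding one of type "invalid" leaves the field list unchanged and records one error message.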
Alternatively, you might want to build your Schema by loading the schema definition from a JSON file:
String schemaFilePath = "/path/to/schema/file/schema.json";
Schema schema = new Schema(schemaFilePath, true); // enforce validation with strict=true.
1.1.4. Infer a Schema
If you don't have a schema for a CSV and don't want to define one manually, you can generate it:
URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/simple_data.csv");
Table table = new Table(url);
Schema schema = table.inferSchema();
// {"fields":[{"name":"id","format":"","description":"","title":"","type":"integer","constraints":{}},{"name":"title","format":"","description":"","title":"","type":"string","constraints":{}}]}
The type inference algorithm tries to cast each value to the available types, and each successful cast increments a popularity score for that type. At the end, the type with the best score is returned. By default, the algorithm traverses all of the table's rows and attempts to cast every value. When dealing with large tables, you might want to limit the number of rows that it processes:
// Only process the first 25 rows for type inference.
Schema schema = table.inferSchema(25);
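To make the popularity-score idea and the row limit more concrete, here is a minimal standalone sketch. It is not the library's TypeInferrer: the class `PopularityInferenceSketch`, the four candidate types, their ordering, and the rule that only the first matching type scores a point are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; NOT the library's TypeInferrer.
class PopularityInferenceSketch {

    // Returns true if the raw value can be parsed as the named type.
    static boolean canCast(String type, String value) {
        switch (type) {
            case "integer":
                try { Long.parseLong(value.trim()); return true; }
                catch (NumberFormatException e) { return false; }
            case "number":
                try { Double.parseDouble(value.trim()); return true; }
                catch (NumberFormatException e) { return false; }
            case "boolean":
                return value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false");
            default: // "string" accepts every value
                return true;
        }
    }

    // Scores the candidate types for one column and returns the most popular one.
    // A rowLimit of 0 or less means "scan every row".
    static String inferColumnType(List<String[]> rows, int column, int rowLimit) {
        // Assumption: candidates are tried from most to least specific.
        String[] candidates = {"integer", "number", "boolean", "string"};
        Map<String, Integer> scores = new LinkedHashMap<>();
        for (String candidate : candidates) {
            scores.put(candidate, 0);
        }

        int limit = rowLimit > 0 ? Math.min(rowLimit, rows.size()) : rows.size();
        for (int i = 0; i < limit; i++) {
            String value = rows.get(i)[column];
            for (String type : candidates) {
                if (canCast(type, value)) {
                    scores.merge(type, 1, Integer::sum);
                    break; // only the first (most specific) matching type scores
                }
            }
        }

        // The highest score wins; ties go to the earlier candidate.
        String best = candidates[0];
        for (String type : candidates) {
            if (scores.get(type) > scores.get(best)) {
                best = type;
            }
        }
        return best;
    }
}
```

Because every scanned value costs at least one cast attempt per candidate type, capping the scan with a row limit bounds the work on large tables, at the price of possibly missing outliers further down.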
If List<Object[]> data and String[] headers are available, the schema can also be inferred from a Schema object:
JSONObject inferredSchema = schema.infer(data, headers);
A row limit can also be set:
JSONObject inferredSchema = schema.infer(data, headers, 25);
Using an instance of Table or Schema to infer a schema invokes the same method on the TypeInferrer singleton:
TypeInferrer.getInstance().infer(data, headers, 25);
1.1.5. Write a Schema Into a File
You can write a Schema into a JSON file:
Schema schema = new Schema();
Field nameField = new Field("name", Field.FIELD_TYPE_STRING);
Field coordinatesField = new Field("coordinates", Field.FIELD_TYPE_GEOPOINT);
1.1.6. Parse a CSV with a Schema
If you have a schema, you can pass it as a parameter when creating the Table instance so that the data from the CSV is cast into the field types defined in the schema:
// Let's start by defining and building the schema of a table that contains data on employees:
Schema schema = new Schema();
Field idField = new Field("id", Field.FIELD_TYPE_INTEGER);
Field nameField = new Field("name", Field.FIELD_TYPE_STRING);
Field dobField = new Field("dateOfBirth", Field.FIELD_TYPE_DATE);
Field isAdminField = new Field("isAdmin", Field.FIELD_TYPE_BOOLEAN);
Field addressCoordinatesField = new Field("addressCoordinates", Field.FIELD_TYPE_GEOPOINT, Field.FIELD_FORMAT_OBJECT);
Field contractLengthField = new Field("contractLength", Field.FIELD_TYPE_DURATION);
Field infoField = new Field("info", Field.FIELD_TYPE_OBJECT);
// Add the fields to the schema in column order.
schema.addField(idField);
schema.addField(nameField);
schema.addField(dobField);
schema.addField(isAdminField);
schema.addField(addressCoordinatesField);
schema.addField(contractLengthField);
schema.addField(infoField);
// Load the data from URL with the schema.
URL url = new URL("https://raw.githubusercontent.com/frictionlessdata/tableschema-java/master/src/test/resources/fixtures/employee_data.csv");
Table table = new Table(url, schema);
TableIterator<Object[]> iter = table.iterator();
// The fetched array will contain row values that have been cast into their
// appropriate types as per field definitions in the schema.
Object[] row = iter.next();
int id = (int)row[0];
String name = (String)row[1];
DateTime dob = (DateTime)row[2];
boolean isAdmin = (boolean)row[3];
int[] addressCoordinates = (int[])row[4];
Duration contractLength = (Duration)row[5];
JSONObject info = (JSONObject)row[6];
1.1.7. Validate a Schema
To make sure a schema complies with the Table Schema specification, you can validate each custom schema against the official Table Schema schema:
JSONObject schemaJsonObj = new JSONObject();
Field nameField = new Field("id", Field.FIELD_TYPE_INTEGER);
schemaJsonObj.put("fields", new JSONArray());
Schema schema = new Schema(schemaJsonObj);
boolean isValid = schema.validate();
// true
Field invalidField = new Field("coordinates", "invalid");
isValid = schema.validate();
// false
1.2. Setting Primary Key
1.2.1. Single Key
Schema schema = new Schema();
Field idField = new Field("id", Field.FIELD_TYPE_INTEGER);
Field nameField = new Field("name", Field.FIELD_TYPE_STRING);
1.2.2. Composite Key
Schema schema = new Schema();
Field idField = new Field("id", Field.FIELD_TYPE_INTEGER);
Field nameField = new Field("name", Field.FIELD_TYPE_STRING);
Field surnameField = new Field("surname", Field.FIELD_TYPE_STRING);
schema.setPrimaryKey(new String[]{"name", "surname"});
String[] compositeKey = schema.getPrimaryKey();
1.3. Casting
1.3.1. Row Casting
To check if a given set of values complies with the schema, you can use castRow:
Schema schema = new Schema();
// A String field.
Field stringField = new Field("stringField", Field.FIELD_TYPE_STRING);
// An Integer field.
Field integerField = new Field("integerField", Field.FIELD_TYPE_INTEGER);
// A Boolean field.
Field booleanField = new Field("booleanField", Field.FIELD_TYPE_BOOLEAN);
// Define a given set of values:
String[] row = new String[]{"John Doe", "25", "T"};
// Cast the row's values into their schema defined types:
Object[] castRow = schema.castRow(row);
If a value in the given set of values cannot be cast to its expected type as defined by the schema, then an InvalidCastException is thrown.
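As a rough picture of what casting a row against a list of field types involves, here is a standalone sketch that does not depend on the library. `RowCastSketch`, its `CastFailure` exception (a stand-in for InvalidCastException), and the accepted boolean tokens are assumptions made for this example; the library's own rules may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; NOT the library's Schema.castRow.
class RowCastSketch {

    // Stand-in for the InvalidCastException mentioned above.
    static class CastFailure extends RuntimeException {
        CastFailure(String message) { super(message); }
    }

    // Assumption: token lists chosen so that the "T" from the example above is accepted.
    private static final List<String> TRUE_TOKENS = Arrays.asList("true", "True", "TRUE", "1", "T");
    private static final List<String> FALSE_TOKENS = Arrays.asList("false", "False", "FALSE", "0", "F");

    // Casts one raw string to the Java object matching the given type name.
    static Object castValue(String type, String raw) {
        switch (type) {
            case "string":
                return raw;
            case "integer":
                try {
                    return Integer.parseInt(raw.trim());
                } catch (NumberFormatException e) {
                    throw new CastFailure("Not an integer: " + raw);
                }
            case "boolean":
                if (TRUE_TOKENS.contains(raw)) return Boolean.TRUE;
                if (FALSE_TOKENS.contains(raw)) return Boolean.FALSE;
                throw new CastFailure("Not a boolean: " + raw);
            default:
                throw new CastFailure("Unsupported type: " + type);
        }
    }

    // Casts every value in the row; the first failing value aborts the whole row.
    static Object[] castRow(String[] types, String[] row) {
        if (types.length != row.length) {
            throw new CastFailure("Expected " + types.length + " values but got " + row.length);
        }
        Object[] cast = new Object[row.length];
        for (int i = 0; i < row.length; i++) {
            cast[i] = castValue(types[i], row[i]);
        }
        return cast;
    }
}
```

Calling `RowCastSketch.castRow` with the types string, integer, boolean and the row "John Doe", "25", "T" yields a String, an Integer, and a Boolean; replacing "25" with "abc" makes the call throw instead of returning a partially cast row.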
1.3.2. Field Casting
Data values can be cast to native Java objects with a Field instance. This allows formats and constraints to be defined for the field in the field descriptor:
Field intField = new Field("id", Field.FIELD_TYPE_INTEGER);
int intVal = intField.castValue("242");
// 242
Field datetimeField = new Field("date", Field.FIELD_TYPE_DATETIME);
DateTime datetimeVal = datetimeField.castValue("2008-08-30T01:45:36.123Z");
// The year of datetimeVal is 2008
Field geopointField = new Field("coordinates", Field.FIELD_TYPE_GEOPOINT, Field.FIELD_FORMAT_ARRAY);
int[] geopointVal = geopointField.castValue("[12,21]");
System.out.print("lon: " + geopointVal[0] + ", lat: " + geopointVal[1]);
// lon: 12, lat: 21
Casting a value checks that the value is of the expected type, is in the correct format, and complies with any constraints imposed in the descriptor.
A value that can't be cast raises an InvalidCastException.
By default, casting a value that does not meet the constraints raises a ConstraintsException.
Constraints can be ignored by setting a boolean flag to false:
// Define constraint limiting String length between 30 and 40 characters:
Map<String, Object> constraints = new HashMap<>();
constraints.put(Field.CONSTRAINT_KEY_MIN_LENGTH, 30);
constraints.put(Field.CONSTRAINT_KEY_MAX_LENGTH, 40);
// Create a field and cast a value that violates the above constraint.
// Disable constraint enforcement by setting the enforceConstraints boolean flag to false.
Field field = new Field("name", Field.FIELD_TYPE_STRING, null, null, null, constraints);
field.castValue("This string length is greater than 45 characters.", false); // Setting false here ignores constraints during cast.
// ConstraintsException will not be thrown despite casting a value that does not meet the constraints.
You can call the checkConstraintViolations method to find out which constraints are being violated.
The method returns a map of violated constraints:
Map<String, Object> constraints = new HashMap<>();
constraints.put(Field.CONSTRAINT_KEY_MINIMUM, 5);
constraints.put(Field.CONSTRAINT_KEY_MAXIMUM, 15);
Field field = new Field("name", Field.FIELD_TYPE_INTEGER, null, null, null, constraints);
int constraintViolatingValue = 16;
Map<String, Object> violatedConstraints = field.checkConstraintViolations(constraintViolatingValue);
// {maximum=15}
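The relationship between collecting violated constraints and optionally enforcing them during a cast can be sketched without the library. `ConstraintCheckSketch` and its two methods are hypothetical names for this example, and an IllegalArgumentException stands in for ConstraintsException; only the "minimum" and "maximum" keys are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; NOT the library's Field class.
class ConstraintCheckSketch {

    // Returns the subset of the given constraints that the value violates.
    static Map<String, Object> violations(int value, Map<String, Object> constraints) {
        Map<String, Object> violated = new LinkedHashMap<>();
        Object min = constraints.get("minimum");
        Object max = constraints.get("maximum");
        if (min instanceof Integer && value < (Integer) min) {
            violated.put("minimum", min);
        }
        if (max instanceof Integer && value > (Integer) max) {
            violated.put("maximum", max);
        }
        return violated;
    }

    // Parses the raw string; constraints are checked only when enforce is true.
    static int cast(String raw, Map<String, Object> constraints, boolean enforce) {
        int value = Integer.parseInt(raw.trim());
        if (enforce) {
            Map<String, Object> violated = violations(value, constraints);
            if (!violated.isEmpty()) {
                // Stand-in for the ConstraintsException described above.
                throw new IllegalArgumentException("Violated constraints: " + violated);
            }
        }
        return value;
    }
}
```

With minimum 5 and maximum 15, `violations(16, constraints)` returns a map containing only the maximum entry, `cast("16", constraints, false)` returns 16, and `cast("16", constraints, true)` throws.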
1.4. Infer Type
The Field class's castValue method uses the TypeInferrer singleton to cast the given value into the desired type.
For instance, you can use the TypeInferrer singleton to cast a String representation of a number into a float like so:
Map<String, Object> options = new HashMap<>();
options.put("bareNumber", false);
options.put("groupChar", " ");
options.put("decimalChar", ",");
float num = (float)TypeInferrer.getInstance().castNumber(Field.FIELD_FORMAT_DEFAULT, "1 564,123 EUR", options);
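To show how the three options could interact when parsing a string such as "1 564,123 EUR", here is a standalone sketch. It is not the library's castNumber: `NumberCastSketch` is a hypothetical class, and the order of operations (drop the group character, strip surrounding non-numeric text when bareNumber is false, then normalise the decimal character) is an assumption for illustration.

```java
import java.util.Map;

// Hypothetical illustration only; NOT the library's TypeInferrer.castNumber.
class NumberCastSketch {

    static double castNumber(String raw, Map<String, Object> options) {
        // Assumed defaults: no grouping, "." as decimal separator, bare numbers only.
        String groupChar = (String) options.getOrDefault("groupChar", "");
        String decimalChar = (String) options.getOrDefault("decimalChar", ".");
        boolean bareNumber = (Boolean) options.getOrDefault("bareNumber", true);

        String text = raw.trim();

        // 1. Drop the thousands separator, e.g. "1 564,123 EUR" -> "1564,123EUR".
        if (!groupChar.isEmpty()) {
            text = text.replace(groupChar, "");
        }

        // 2. When bareNumber is false, strip leading and trailing non-numeric text.
        if (!bareNumber) {
            text = text.replaceAll("^[^0-9+\\-]+", "").replaceAll("[^0-9]+$", "");
        }

        // 3. Normalise the decimal separator so Double.parseDouble can read it.
        text = text.replace(decimalChar, ".");

        // Throws NumberFormatException if the remaining text is not a number.
        return Double.parseDouble(text);
    }
}
```

With bareNumber set to false, groupChar set to a space, and decimalChar set to a comma, the input "1 564,123 EUR" is reduced to "1564.123" before parsing. With bareNumber left at true, the trailing "EUR" would remain and parsing would fail.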
1.5. Contributing
Found a problem and would like to fix it? Have a great idea and would love to see it in the repository?
Please open an issue before you start working.
It could save a lot of time for everyone, and we are happy to answer questions and help you along the way. Furthermore, feel free to join the frictionlessdata Gitter chat room and ask questions.
This project follows the Open Knowledge International coding standards.
Get started:
# install jabba and maven2
$ cd tableschema-java
$ jabba install 1.8
$ jabba use 1.8
$ mvn install -DskipTests=true -Dmaven.javadoc.skip=true -B -V
$ mvn test -B
Make sure all tests pass.