Technology Choices

Why Rust?

ysv is built to process tens of gigabytes of data quickly, preparing it for analysis or for import into databases and cloud storage systems. Performance matters, and Rust, with its speed and safety, seems to be a perfect fit for the job.

UNIX spirit

Being a command line utility, ysv can be trivially integrated with other tools. For example, I am using it in scenarios like this:

curl -s https://remote.host/data.csv.gz \
    | gunzip \
    | ysv transform.ysv \
    | pgloader load.pgloader

Here, we download data from a remote location, decompress it, transform it, and import it into a PostgreSQL database. The dataset may be huge, but we never store it on the local disk: everything happens on the fly.
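To illustrate why such a pipeline does not need local storage, here is a minimal sketch of a streaming filter in Rust. It is not ysv's actual code: the names `stream` and `transform_line` are hypothetical, and the uppercase step merely stands in for a real CSV transformation. The point is only that each line is read, transformed, and written before the next one is touched.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical per-line transformation (an assumption for illustration);
/// a real tool would parse CSV fields instead of uppercasing the whole line.
fn transform_line(line: &str) -> String {
    line.trim_end().to_uppercase()
}

/// Reads `input` line by line and writes each transformed line to `output`.
/// Only the current line is held in memory, so the input size is unbounded.
fn stream<R: BufRead, W: Write>(input: R, mut output: W) -> io::Result<()> {
    for line in input.lines() {
        // `lines()` yields io::Result<String> with the newline stripped.
        writeln!(output, "{}", transform_line(&line?))?;
    }
    output.flush()
}
```

In a command line tool, `main` would call something like `stream(io::stdin().lock(), io::BufWriter::new(io::stdout().lock()))`, so the program can sit in the middle of a shell pipeline.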

Configuration language

The ysv configuration format aims to feel like a very specialized, but purely functional and strongly typed, programming language. Every transformation except I/O is a pure function. You may imagine an invisible .map() placed between every two consecutive transformations.
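The invisible .map() can be pictured with a small Rust sketch. This is not ysv configuration syntax and not its implementation: `trim`, `uppercase`, and `apply_column` are hypothetical names chosen for illustration. Each step is a function of its input only, and the steps are chained with `Option::map`.

```rust
/// Hypothetical transformation: strips surrounding whitespace.
fn trim(value: String) -> String {
    value.trim().to_string()
}

/// Hypothetical transformation: converts the value to upper case.
fn uppercase(value: String) -> String {
    value.to_uppercase()
}

/// Applies the steps in order, with `.map()` between consecutive steps.
/// A missing input (`None`) passes through every step untouched.
fn apply_column(input: Option<String>) -> Option<String> {
    input.map(trim).map(uppercase)
}
```

Because no step has side effects, steps can be reordered, tested, or generated independently, which is what makes a declarative configuration format practical.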

xsv2schema

xsv2schema is a tool written in Python that generates stubs of ysv config files from CSV data. It can save you the tedious work of writing configs by hand, and it shows how easy it is to generate ysv configurations programmatically.

Future

Plans:
