hledger - Joyful Systems

# Laboratory

Transient notes, drafts, dreams and schemes for [hledger](https://hledger.org)

I have many hledger "wish lists", but this one is public and pleasant to edit.
Here I can write, update and share with less friction than the main hledger/PTA sites.
Note these are drafts and mockups, and may describe things that don't exist yet.

### CSV docs

New content for https://hledger.org/hledger.html#csv:

#### Character Separated Values

CSV usually means [Comma Separated Values](https://en.wikipedia.org/wiki/Comma-separated_values). It's a lowest common denominator data format, often provided by financial institutions.

In countries which use comma as decimal mark, SSV (Semicolon Separated Values) may be more common.

A third variant is TSV (Tab Separated Values). Some advantages: the tab character appears less often in data values, requiring less special quoting, and TSV data can be a bit easier for humans to read without special tools (depending on data and tab widths).

hledger can read data separated by any of these delimiters, or indeed any other delimiter character (such as `|`).

In hledger docs and discussions, when we say "CSV", we usually mean all of the above (think "Character Separated Values").

#### Ways to read or import CSV with hledger

Here are a few different ways you can process CSV with hledger. If in doubt, choose the last one ([Import from a rules file](#import+from+a+rules+file)) as it provides the most convenience.

##### Preconvert the CSV

hledger has a built-in system of "CSV rules", which can convert most CSV data. But you can always convert the CSV yourself using some other tool, say a custom awk or python script, to one of hledger's native formats (`journal`, `timeclock` or `timedot`). These formats are simple enough that they are pretty easy to generate.

Should you do that? Well, some things are easier to do in a custom script, while other things are easier to do in hledger's CSV rules.
CSV rules are more aware of hledger's syntax and data model, and are less likely to break. (Making a conversion script is easy, but making a reliable, error-free script is harder.) So it's usually worth using hledger's conversion rules if possible.

##### Preprocess the CSV

Sometimes you just need to enrich the CSV a little bit before passing it to hledger's CSV rules. Eg adding calculated values as a new field.

##### Read a CSV file

To use hledger's built-in CSV reader, specify an input file (`-f FILE`) which has a `.csv`, `.ssv` or `.tsv` extension.

In this mode, hledger converts the CSV file (or files, if you use multiple `-f` options) to native data on the fly, then runs any of the usual report commands on the converted data. The conversion is done in memory and not saved.

To help convert the CSV data, hledger looks for a nearby "CSV rules" file. This should be named like the CSV file with an extra `.rules` extension. Or you can specify it with `--rules=FILE`.

Eg: `hledger -f foo.csv balancesheet`. This will look for a `foo.csv.rules` file in the same directory.

If your data's delimiter is not comma, semicolon, or tab, then use the `.csv` extension, and add an appropriate `separator` rule in the rules file. Eg `separator |`.

##### Read CSV from stdin

hledger can read from standard input instead of a file, as always, by using the special file name `-`. When reading CSV data this way, also:

- add a prefix to select the file format: `-f csv:-` (because there's no filename extension)
- specify a rules file with `--rules FILE` (otherwise `-.rules` would be expected, which is too generic)

Eg: `paypalcsv | hledger -f csv:- --rules paypal.rules print`

##### Read a rules file

If the rules file contains a `source CSVFILE` rule, then instead of specifying the CSV file and letting hledger find the rules file, you can specify the rules file and let hledger find the CSV file. This has some advantages which we'll describe in a moment.

Eg: `hledger -f foo.rules print`

##### Import from a CSV file

The above methods read the CSV file in place. The conversion is redone each time, and the data is not added to your default journal file. And reports will see all of the CSV records each time.

If you use the `import` command instead,

- The converted data is added to your default journal file, as hledger journal entries. After this, keeping the CSV file is unnecessary and optional.
- The latest CSV record date is remembered, and on subsequent imports from this CSV file path, only newer records will be imported. This is helpful when new downloads of a CSV file contain records you've already imported.

Eg: `hledger import foo.csv`

##### Import CSV from stdin

You could import CSV (or any other hledger format) from standard input. But it's not usually done, because the "latest file" used to save the latest date is then poorly named (`.latest.-`).

Eg: `paypalcsv | hledger import csv:- --rules-file paypal.rules`

##### Import from a rules file

You can "import from the rules file", specifying that file on the command line instead of the CSV file, just as when reading. This provides several advantages:

- You can remember and type a readable file name of your choice, rather than obscure downloaded file names.
- If the `source` rule's CSVFILE does not exist, it is treated as an empty file, not an error.
- In the `source` rule you can write just a file name, and your web browser's download directory is assumed (or at least, `~/Downloads` is assumed)
- In the `source` rule you can write a glob pattern, to accommodate file name variations in each download.
- If the glob pattern matches multiple files, the newest will be used, which helps avoid problems with multiple downloaded versions.
- (TODO: The `import` command will archive CSV downloads with clean names and timestamps, for future troubleshooting/reconverting.)

Eg: `hledger import foo.rules`

#### `archive` CSV rule

"Importing from the rules file" provides another benefit: optional automatic archiving of downloaded CSV files. It can be useful to save these, at least for a while, for: troubleshooting data problems or CSV rule problems; or regenerating entries after improving rules; or regenerating entire journals from CSV if you care to.

Here's how it works: if a `data` directory (or symlinked directory) exists, next to the rules file, new behaviour is enabled:

- When a `source` glob pattern matches multiple CSV files, the oldest will be used (contrary to the above). This ensures multiple downloads will be imported in order (by running `import` more than once), so that no records are missed.
- After a CSV file is successfully imported, it is removed from its original directory (eg `~/Downloads`), and saved in the `data` directory.
- The archived file name will be: the rules file's base name, plus the CSV file's date (and time if needed to disambiguate), plus the CSV file's extension.

In other words, every download is preserved in `data/`, with file names similar to the rules file's.

Eg: if `foo.rules` contains `source AcctData*.tsv`, then `hledger import foo.rules` will import the oldest `~/Downloads/AcctData*.tsv` file, if any, then move it to `data/foo.YYYY-MM-DD.tsv`, where YYYY-MM-DD is the modification date of the imported file.
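
To make the archiving steps above concrete, here is a rough Python sketch. It is only an illustration of the described behaviour, not hledger's implementation: the function names (`oldest_match`, `archive_name`, `archive`) are made up for this note, the file's modification time is assumed to be "the CSV file's date", and the "add time if needed to disambiguate" step is left out.

```python
import glob
import os
import shutil
from datetime import date
from pathlib import Path


def oldest_match(pattern, directory):
    """Return the least recently modified file matching pattern, or None."""
    matches = glob.glob(os.path.join(directory, pattern))
    return min(matches, key=os.path.getmtime) if matches else None


def archive_name(rules_file, csv_file):
    """Rules file base name + CSV modification date + CSV extension."""
    base = Path(rules_file).stem                      # foo.rules -> foo
    day = date.fromtimestamp(os.path.getmtime(csv_file)).isoformat()
    return f"{base}.{day}{Path(csv_file).suffix}"     # foo.YYYY-MM-DD.tsv


def archive(rules_file, csv_file):
    """Move csv_file into the data/ directory next to rules_file, if present."""
    data_dir = Path(rules_file).resolve().parent / "data"
    if not data_dir.is_dir():        # archiving is only enabled when data/ exists
        return None
    target = data_dir / archive_name(rules_file, csv_file)
    shutil.move(str(csv_file), str(target))
    return target
```

Calling `oldest_match("AcctData*.tsv", downloads_dir)` and then `archive("foo.rules", that_file)` after a successful import, repeatedly, would work through several downloads in date order, which is the point of picking the oldest file first.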
### CSV rules setup

`hledger setup bank.csv`\
if a rules file already exists, ask for confirmation\
try all embedded csv rules, old rules if any, and new rules from CSV analyser\
list all successful rules, allowing each to be previewed and selected\
once selected, ask whether to save rules as secondary helper or primary input file\
ask whether to archive, how many copies, and where\
write rules file, renaming old one if any

### CSV analyser

divide CSV into runs of equal field count\
identify the first run with a consistent date-like field\
following an optional all text first row of field names\
if field names are found, declare them in a fields list, transformed\
detect the date format (or fail)\
detect the first consistent amount-like field if any\
use the first consistent texty field as description\
if there's another texty field of 1-2 words, use it as account2 leaf name\
use the csv file name as account1 leaf name

### Lot tracking

[[PTA lot tests]]

### Report templates

goals:

- allow english report titles to be localised
- allow a hledger report to be integrated with other HTML content easily
- make report data easily accessible to javascript
- allow charts to be added and customised easily

initially:\
`--template=TMPL`, affecting bs HTML output (then is)\
eg bs.tmpl.html, a mustache template\
a small number of coarse interpolatables are provided; these and the templates can be report specific

- {{title}} the default report title
- {{period}} the period part of the title.. etc
- {{report}} the report's main html output
- {{data}} the report's data as js, suitable for charting libs like Chart.js, d3.js, highcharts, plotly

later:

- interact well with -o, -O, stdout
- more reports
- affect other formats: txt, csv, ...
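
As a mockup of the interpolation idea in the report templates notes above, here is a small Python sketch that fills `{{name}}` placeholders in a template string. It is a plain text substitution written for this note, not a real Mustache engine (no sections, no HTML escaping), and the `render` function, the template contents and the sample values ("Bilan" as a localised title, the `assets` figure) are all made up for illustration.

```python
import json
import re


def render(template, values):
    """Replace each {{name}} with values[name]; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return str(values.get(name, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# A hypothetical bs.tmpl.html using the four coarse interpolatables.
template = """<h1>{{title}}</h1>
<p>{{period}}</p>
{{report}}
<script>const data = {{data}};</script>"""

page = render(template, {
    "title": "Bilan",                        # localised report title
    "period": "2024",
    "report": "<table><tr><td>assets</td><td>100</td></tr></table>",
    "data": json.dumps({"assets": 100}),     # report data as a JS literal
})
print(page)
```

Keeping the interpolatables this coarse means the template only decides where the title, the report's HTML and the data go; a charting library can then read `data` without the template needing to know the report's internal structure.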