ecStat

A statistical and data mining tool for Apache ECharts. You can use it to analyze data and then visualize the results with ECharts, or use it on its own to process data.
It works both in Node.js and in the browser.

Installing

If you use npm, you can install it with:

npm install echarts-stat

Otherwise, download this tool from the dist directory and load it with a script tag:

<script src='./dist/ecStat.js'></script>
<script>

var result = ecStat.clustering.hierarchicalKMeans(data, clusterNumber, false);

</script>

If you use a bundler (such as webpack or rollup), for example:

npm install echarts-stat
npm install echarts

import * as echarts from 'echarts';
import {transform} from 'echarts-stat';

echarts.registerTransform(transform.histogram);

var myChart = echarts.init(document.getElementById('main0'));

var option = {
    dataset: [{
        source: [
            [8.3, 143], [8.6, 214], [8.8, 251], [10.5, 26], [10.7, 86], [10.8, 93], [11.0, 176], [11.0, 39], [11.1, 221], [11.2, 188], [11.3, 57], [11.4, 91], [11.4, 191], [11.7, 8], [12.0, 196], [12.9, 177], [12.9, 153], [13.3, 201], [13.7, 199], [13.8, 47], [14.0, 81], [14.2, 98], [14.5, 121], [16.0, 37], [16.3, 12], [17.3, 105], [17.5, 168], [17.9, 84], [18.0, 197], [18.0, 155], [20.6, 125]
        ]
    }, {
        transform: {
            type: 'ecStat:histogram'
        }
    }],
    tooltip: {
    },
    xAxis: {
        type: 'category',
        scale: true
    },
    yAxis: {},
    series: {
        name: 'histogram',
        type: 'bar',
        barWidth: '99.3%',
        label: {
            show: true,
            position: 'top'
        },
        datasetIndex: 1
    }
};

myChart.setOption(option);

Histogram

A histogram is a graphical representation of the distribution of numerical data and an estimate of the probability distribution of a quantitative variable. It is a kind of bar graph. To construct a histogram, first “bin” the range of values, that is, divide the entire range into a series of intervals, and then count how many of the original sample values fall into each interval. Bins are usually specified as consecutive, non-overlapping intervals of a variable. Here the bins (intervals) must be adjacent and of equal size.
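The binning step described above can be sketched in plain JavaScript. This is a minimal illustration, not the ecStat implementation: `equalWidthBins` is a hypothetical helper, and its `x0`, `x1` and `sample` fields only mirror the BinItem attributes documented in the Return Value section.

```javascript
// Hypothetical helper for illustration; not part of the ecStat API.
// Splits [min, max] into binCount adjacent bins of equal width and
// collects the values that fall into each one.
function equalWidthBins(values, binCount) {
    var min = Math.min.apply(null, values);
    var max = Math.max.apply(null, values);
    var width = (max - min) / binCount;
    var bins = [];
    for (var i = 0; i < binCount; i++) {
        bins.push({ x0: min + i * width, x1: min + (i + 1) * width, sample: [] });
    }
    values.forEach(function (v) {
        // The maximum value would land just past the last bin, so clamp it.
        // When all values are equal (width 0), everything goes into bin 0.
        var index = width === 0
            ? 0
            : Math.min(Math.floor((v - min) / width), binCount - 1);
        bins[index].sample.push(v);
    });
    return bins;
}

var bins = equalWidthBins([1, 2, 2, 3, 5, 8, 9, 10], 3);
var counts = bins.map(function (bin) { return bin.sample.length; });
```

In this sketch the last bin is closed on the right so that the maximum value is not lost; the library may treat the upper edge differently.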

Syntax

Used as echarts transform (since echarts 5.0)

echarts.registerTransform(ecStat.transform.histogram);

chart.setOption({
    dataset: [{
        source: data
    }, {
        transform: {
            type: 'ecStat:histogram',
            config: config
        }
    }],
    ...
});

Standalone

var bins = ecStat.histogram(data, config);
// or
var bins = ecStat.histogram(data, method);

Parameter

- data – number[] | number[][]. Data samples of numbers.

// One-dimensional array
var data = [8.6, 8.8, 10.5, 10.7, 10.8, 11.0, ... ];

or

// Two-dimensional array
var data = [[8.3, 143], [8.6, 214], ...];

- config – object.
  - config.method – 'squareRoot' | 'scott' | 'freedmanDiaconis' | 'sturges'. Optional. The method used to calculate the number of bins. There is no “best” number of bins, and different bin sizes can reveal different features of the data.
    - squareRoot – This is the default method, which is also used by the Excel histogram. Returns the number of bins according to the square-root choice:
```
var bins = ecStat.histogram(data);
```
    - scott – Returns the number of bins according to Scott's normal reference rule (https://en.wikipedia.org/wiki/Histogram#Mathematical_definition):
```
var bins = ecStat.histogram(data, 'scott');
```
    - freedmanDiaconis – Returns the number of bins according to the Freedman-Diaconis rule (https://en.wikipedia.org/wiki/Histogram#Mathematical_definition):
```
var bins = ecStat.histogram(data, 'freedmanDiaconis');
```
    - sturges – Returns the number of bins according to Sturges' formula (https://en.wikipedia.org/wiki/Histogram#Mathematical_definition):
```
var bins = ecStat.histogram(data, 'sturges');
```
  - config.dimensions – (number | string). Optional. Specifies the dimension (column) of the data used in the histogram calculation. In echarts transform usage, either a dimension name (string) or a dimension index (number) can be specified. In standalone usage, only a dimension index can be specified (dimension names cannot be defined).
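For orientation, the four rules named above can be written out in their textbook forms, as given in the linked Wikipedia article. This is only a sketch: all function names here are hypothetical, and the rounding and quantile details that ecStat applies internally are not shown in this document and may differ.

```javascript
// Textbook bin rules; illustrative helpers, not the ecStat implementation.

// Square-root choice: k = ceil(sqrt(n)).
function squareRootBinCount(n) {
    return Math.ceil(Math.sqrt(n));
}

// Sturges' formula: k = ceil(log2(n)) + 1.
function sturgesBinCount(n) {
    return Math.ceil(Math.log2(n)) + 1;
}

// Scott's normal reference rule gives a bin width: h = 3.49 * s / n^(1/3),
// where s is the sample standard deviation.
function scottBinWidth(values) {
    var n = values.length;
    var mean = values.reduce(function (a, b) { return a + b; }, 0) / n;
    var variance = values.reduce(function (a, v) {
        return a + (v - mean) * (v - mean);
    }, 0) / (n - 1);
    return 3.49 * Math.sqrt(variance) / Math.cbrt(n);
}

// Freedman-Diaconis rule also gives a width: h = 2 * IQR / n^(1/3).
// The interquartile range is passed in because quantile definitions vary.
function freedmanDiaconisBinWidth(iqr, n) {
    return 2 * iqr / Math.cbrt(n);
}

// A width-based rule becomes a bin count once the data range is known.
function binCountFromWidth(min, max, width) {
    return Math.ceil((max - min) / width);
}
```

The first two rules depend only on the sample size, while Scott and Freedman-Diaconis also depend on the spread of the data, which is why different methods can produce noticeably different histograms for the same input.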

Return Value (only for standalone usage)

Used as echarts transform (since echarts 5.0)

dataset: [{
    source: [...]
}, {
    transform: {
        type: 'ecStat:histogram'
    }
    // The result data of this dataset is like:
    // [
    //     // MeanOfV0V1, VCount, V0, V1, DisplayableName
    //     [  10,         212,    8,  12, '8 - 12'],
    //     ...
    // ]
    // The other input dimensions, those not specified in
    // config.dimensions, are kept in the output.
}]

Standalone
- result – object. Contains detailed information about each bin and the data ECharts needs to draw the histogram.
  - result.bins – BinItem[]. An array of bins, where each bin is an object (BinItem) with three attributes:
    - x0 – number. The lower bound of the bin (inclusive).
    - x1 – number. The upper bound of the bin (exclusive).
    - sample – number[]. The elements of the input data that fall into the bin.
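As a rough sketch of how the BinItem shape relates to the row layout shown for the transform output (MeanOfV0V1, VCount, V0, V1, DisplayableName), the snippet below builds such rows by hand. The `exampleBins` array is made-up data in the documented shape, and `binsToRows` is a hypothetical helper, not an ecStat function.

```javascript
// Hand-written bins in the documented BinItem shape (x0, x1, sample).
// A real result.bins would come from ecStat.histogram(data).
var exampleBins = [
    { x0: 8, x1: 12, sample: [8.3, 8.6, 10.5] },
    { x0: 12, x1: 16, sample: [12.9, 14.2] }
];

// Hypothetical helper: one row per bin, laid out as
// [mean of x0 and x1, sample count, x0, x1, display label].
function binsToRows(bins) {
    return bins.map(function (bin) {
        return [
            (bin.x0 + bin.x1) / 2,
            bin.sample.length,
            bin.x0,
            bin.x1,
            bin.x0 + ' - ' + bin.x1
        ];
    });
}

var rows = binsToRows(exampleBins);
```

Rows of this kind can be passed to a bar series as data, with the first two columns used as the x and y values.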
Examples

test/transform/histogram_bar.html
test/standalone/histogram_bar.html

Clustering

Clustering divides the original data set into multiple data clusters with different characteristics. With ECharts, you can visualize either the results of clustering or the clustering process.
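To make the idea of dividing data into clusters concrete, here is a minimal sketch of one iteration of plain k-means: assign each point to its nearest centroid, then move each centroid to the mean of its points. This is a generic illustration and not the library's hierarchicalKMeans implementation; the function names and the starting centroids are assumptions chosen for the example.

```javascript
// Illustrative helpers only; not the ecStat hierarchicalKMeans implementation.
function squaredDistance(a, b) {
    return a.reduce(function (sum, value, i) {
        return sum + (value - b[i]) * (value - b[i]);
    }, 0);
}

// Assign each point to the index of its nearest centroid.
function assignClusters(points, centroids) {
    return points.map(function (point) {
        var best = 0;
        for (var k = 1; k < centroids.length; k++) {
            if (squaredDistance(point, centroids[k]) < squaredDistance(point, centroids[best])) {
                best = k;
            }
        }
        return best;
    });
}

// Recompute each centroid as the mean of the points assigned to it.
// A centroid with no assigned points is left where it was.
function updateCentroids(points, assignment, centroids) {
    return centroids.map(function (centroid, k) {
        var members = points.filter(function (_, i) { return assignment[i] === k; });
        if (members.length === 0) {
            return centroid;
        }
        return centroid.map(function (_, d) {
            return members.reduce(function (sum, p) { return sum + p[d]; }, 0) / members.length;
        });
    });
}

var points = [[1, 1], [1, 2], [8, 8], [9, 8]];
var startCentroids = [[0, 0], [10, 10]];
var assignment = assignClusters(points, startCentroids);
var movedCentroids = updateCentroids(points, assignment, startCentroids);
```

Repeating these two steps until the assignment stops changing yields the final clusters. Recording the intermediate assignments is what makes it possible to visualize the process rather than only the result.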
  
Syntax
- Used as echarts transform (since echarts 5.0)
```
echarts.registerTransform(ecStat.transform.clustering);
```
```
chart.setOption({
    dataset: [{
        source: data
    }, {
        transform: {
            type: 'ecStat:clustering',
            config: config
        }
    }],
    ...
});
```
- Standalone
```
var result = ecStat.clustering.hierarchicalKMeans(data, config);
// or
var result = ecStat.clustering.hierarchicalKMeans(data, clusterCount, stepByStep);
```
Parameter
- data – number[][]. A two-dimensional numeric array. Each data point can have more than two numeric attributes in the original data set. In the following example, data[0] is a data point and data[0][1] is one of its numeric attributes.
```
var data = [
    [232, 4.21, 51, 0.323, 19],
    [321, 1.62, 18, 0.139, 10],
    [551, 11.21, 13, 0.641, 15],
    ...
];
```
- config – object.
  - config.clusterCount – number. Mandatory. The number of clusters to generate. It must be greater than 1.
  - config.dimensions – (number | string)[]. Optional. Specifies which dimensions (columns) of the data are used in the clustering calculation. The other columns are also kept in the output data. By default, all columns of the data are used as dimensions. In echarts transform usage, both dimension names (string) and dimension indices (number) can be specified. In standalone usage, only dimension indices can be specified (dimension names cannot be defined).
  - config.stepByStep – boolean. Optional. Controls whether the clustering is performed step by step. Defaults to false.
  - config.outputType – 'single' | 'multiple'. Optional. Specifies the format of the output. In “standalone” usage, it defaults to 'multiple'. In “transform” usage, it cannot be specified and is always 'single'.
  - config.outputClusterIndexDimension – (number | {index: number, name?: string}). Mandatory. It only takes effect when config.outputType is 'single'. In this mode, the cluster index is written to that dimension index of the output data. If it is a number, it is the dimension index. The dimension index is mandatory, while the dimension name is optional, which only enables the…
