Scenarios

Scenarios allow you to shape numeric distributions based on other columns in your schema. For example, let's say you want to generate a file where each row represents the sale of a car, including the model, region, sale price, and date of sale.




Here we use the normal distribution field type to generate reasonable prices. Let's look at some sample data:

date        model     region  price
2014-10-26  Explorer  SE      25341
2014-10-30  Mustang   NE      25051
2014-10-17  Focus     SE      26003
2014-10-18  Focus     MW      24396
2014-10-02  Mustang   MW      25670
2014-10-09  Explorer  NW      25137
2014-10-14  Explorer  SE      24027
2014-10-24  Focus     SW      26206
2014-10-10  Explorer  SW      22668
2014-10-18  Explorer  NE      23611

See the problem? All models cost about the same on average, which isn't realistic. Let's create a scenario to better model the real-world prices of each model.




Here we use the value of the model column to control the price range. We make the Focus model less expensive while boosting the price of the Explorer. We also adjust the standard deviation to simulate the wider price fluctuations seen on more expensive models.
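The idea behind this scenario can be sketched in a few lines of Python. This is only an illustration of the technique (choosing the mean and standard deviation of a normal distribution from the value of another column), not the tool's own implementation. The PRICE_PARAMS table, the generate_sale function, and every number in it are hypothetical, chosen by eye to resemble the sample data on this page; the actual scenario settings are not reproduced here.

```python
import random
from datetime import date, timedelta

# Hypothetical (mean, standard deviation) per model. These values are
# assumptions for illustration, not the settings used in the scenario.
# The standard deviation grows with the price, as described above.
PRICE_PARAMS = {
    "Focus": (16500, 500),
    "Mustang": (26000, 1000),
    "Explorer": (29000, 1500),
}
REGIONS = ["NE", "SE", "MW", "NW", "SW"]


def generate_sale(rng):
    """Return one sale row whose price distribution depends on the model."""
    model = rng.choice(list(PRICE_PARAMS))
    mean, stddev = PRICE_PARAMS[model]
    return {
        # A random day in October 2014 (offsets 0..30 from the 1st).
        "date": date(2014, 10, 1) + timedelta(days=rng.randrange(31)),
        "model": model,
        "region": rng.choice(REGIONS),
        # Draw the price from a normal distribution for this model.
        "price": round(rng.gauss(mean, stddev)),
    }


rng = random.Random(42)  # fixed seed so repeated runs give the same rows
rows = [generate_sale(rng) for _ in range(1000)]

for row in rows[:5]:
    print(row["date"], row["model"], row["region"], row["price"])
```

Keeping the parameters in a lookup keyed by model mirrors what the scenario does: the price column stays a single normal distribution field, and only its mean and spread change from row to row.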

Now let's change our schema to use our new scenario...




Let's have a look at some sample data...

date        model     region  price
2014-10-05  Focus     SW      16206
2014-10-20  Explorer  SW      27987
2014-10-13  Explorer  SE      31191
2014-10-17  Focus     SE      16809
2014-10-25  Focus     NE      16229
2014-10-21  Explorer  NW      29149
2014-10-28  Explorer  NW      30061
2014-10-15  Mustang   MW      26221
2014-10-03  Explorer  NE      28423
2014-10-29  Mustang   MW      26568

Much better! Now our sales figures accurately represent the average price of each model.
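One way to check this is to average the price per model in both samples. The short Python sketch below does that using the (model, price) pairs copied from the two sample tables on this page; the helper names mean_price_by_model and spread are made up for this example. With only ten rows per sample the averages are rough, so treat the result as a sanity check rather than a measurement.

```python
from collections import defaultdict

# (model, price) pairs copied from the first sample (no scenario).
before = [
    ("Explorer", 25341), ("Mustang", 25051), ("Focus", 26003),
    ("Focus", 24396), ("Mustang", 25670), ("Explorer", 25137),
    ("Explorer", 24027), ("Focus", 26206), ("Explorer", 22668),
    ("Explorer", 23611),
]

# (model, price) pairs copied from the second sample (with the scenario).
after = [
    ("Focus", 16206), ("Explorer", 27987), ("Explorer", 31191),
    ("Focus", 16809), ("Focus", 16229), ("Explorer", 29149),
    ("Explorer", 30061), ("Mustang", 26221), ("Explorer", 28423),
    ("Mustang", 26568),
]


def mean_price_by_model(rows):
    """Group prices by model and return the average price for each."""
    prices = defaultdict(list)
    for model, price in rows:
        prices[model].append(price)
    return {model: sum(p) / len(p) for model, p in prices.items()}


def spread(means):
    """Difference between the highest and lowest per-model average."""
    return max(means.values()) - min(means.values())


for label, sample in (("before", before), ("after", after)):
    means = mean_price_by_model(sample)
    print(label, {m: round(v) for m, v in sorted(means.items())},
          "spread:", round(spread(means)))
```

In the first sample the per-model averages sit within about 1,400 of each other; in the second they are about 12,900 apart, with the Focus lowest and the Explorer highest.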