OlaPy - OLAP Engine¶
OlaPy is an OLAP engine based on Python, which gives you a set of tools for the development of reporting and analytical applications, multidimensional analysis, and browsing of aggregated data with MDX and XMLA support. It can be found in GitHub. or PyPI.
- It is fast and uses in-memory technology and various techniques (aggregation and real-time computation) to provide sub-second responses.
- It includes an ETL layer (Extract Transform Load) for better data handling.
- It support most common databases (Postgres, MySql, Oracle, SQL Server) and CSV file format (only CSV right now) to construct cubes.
Contents:
Installation¶
Install from Github¶
The project sources are stored in Github repository.
Download from GitHub:
git clone git://github.com/abilian/olapy.git
Then install:
cd olapy
python setup.py install
Initialization¶
Before running olapy, you have to initialize it with:
olapy init
Testing¶
OlaPy is configured to run units and integrations tests suites. Before running tests, make sure you have installed all development requirements with:
pip install -r dev-requirements.txt
and then run:
tox
Test other databases¶
The default database used with tests is sqlite, if you want to run tests against mysql or postgres, you need to install the appropriate driver and export a connection string like this
export SQLALCHEMY_DATABASE_URI = { dialect+driver://username:password@host:port/database }
take a look to SQLAlchemy documentation for more information.
Olapy-data¶
After Olapy initialization, you can take a look to olapy-data folder located under:
~/olapy-data for Linux/Mac user
C:\User\{USER_NAME}\olapy-data for Windows
This folder contains some required files to configure olapy and some demo cubes under /cubes folder, we will deeply discuss about this in the Cubes and Cube Customization
Quick Start¶
OlaPy as XMLA server¶
After installation, you can run the Olapy server with:
olapy runserver
All you have to do now is to try it with an Excel spreadsheet:
open excel, open new spreadsheet and go to : Data -> From Other Sources -> From Analysis Services
After that, Excel will ask you the server name: put http://127.0.0.1:8000/
and click next, then you can chose one of default olapy demo cubes (sales, foodmart…) and finish.
That’s it! Now you can play with your data.
Olapy as a library¶
If you want to use olapy as a library to execute MDX queries, start by importing the MDX engine:
from olapy.core.mdx.executor import MdxEngine
In our example, we’re going to use sales demo cube:
executor = MdxEngine() # instantiate the MdxEngine
executor.load_cube('sales') # load sales cube
We set an MDX query:
query = """
SELECT
Hierarchize({[Measures].[Amount]}) ON COLUMNS
FROM [sales]
"""
and execute it:
df = executor.execute_mdx(query)['result']
print(df)
Result:
Amount 0 1023
Advanced Olapy options¶
Olapy is easy configurable. When using the olapy runserver
command, you can pass a lot of options to it:
-st Cubes source type (db|csv), DEFAULT : csv only
-sa SQL Alchemy URI to connect to database , **DON'T PUT THE DATABASE NAME !**
-wf Write logs into a file or display them into the console. log file location,
by default under olapy-data folder
-lf If you want to change log file location.
-od Olapy-Data folder location
-h Host ip adresse
-p Host port
-dbc Database configuration file path, Default : ~/olapy-data/olapy-config.yml
-cbf Cube config file path, default : ~/olapy-data/cube/cubes-config.yml
-tf File path or DB table name if you want to construct cube from a single file (or table)
-c To explicitly specify columns (construct cube from a single file), columns order matters
-m To explicitly specify measures (construct cube from a single file)
Here is an example of olapy runserver
with all options:
olapy runserver -sa=postgresql://postgres:root@localhost:5432
-wf=False
-lf=/home/{USER_NAME}/Documents/olapy_logs
-od=/home/{USER_NAME}/Documents
-st=db,csv
-h=0.0.0.0
-p=8000
Here is an example of olapy runserver
with a simple csv file:
olapy runserver -tf=/home/moddoy/olapy-data/cubes/sales/Data.csv
-c=City,Licence
-m=Amount,Count
Cubes creation¶
To add new cube, put your CSV files in a folder (folder name <=> cube name),
make sure that they follow OLAPY CUBES RULES,
and move that folder under olapy-data/cubes
,
thus, the path to your cube will be:
~/olapy-data/cubes/{YOUR_CUBE}/{YOU_CSV_FILES}
for Mac/Linux,C:\\User\\{USER_NAME}\\olapy-data\\{YOUR_CUBE}\\{YOU_CSV_FILES}
for Windows.
OLAPY CUBES RULES¶
NOTE : THE SAME THING IF YOU WANT TO WORK WITH DATABASES
Here are the rules to apply to your tables so that can works perfectly with olapy:
- Make sure that your tables follow the star schema
- The fact table should be named ‘Facts’
- Each table id columns, must be the same in facts table, example ( product_id column from product table must be product_id in Facts table,
- Avoid ‘id’ for id columns name, you should use something_id for example
- The columns name must be in a good order (hierarchy) (example : Continent -> Country -> City…)
take a look to the default cubes structure (sales and foodmart).
Here are two examples of table structures that follows olapy rules:
Cube customization¶
If you don’t want to follow olapy cubes rules and you want to customize your cube construction, you can use a configuration file, you can find the default example in
~/olapy-data/cubes/cubes-config.xml for mac/linux
C:\\User\\{USER_NAME}\\olapy-data\\cubes\\cubes-config.xml for windows
Here is an examples of configuration:
Assuming we have two tables as follows under ‘custom_cube’ folder
table 1: stats (which is the facts table)
departement_id | amount | monthly_salary | total monthly cost |
111 | 1000 | 2000 | 3000 |
… |
table 2: organization (which is a dimension)
id | type | name | acronym | other colums….. |
111 | humanitarian | humania | for better life | |
… | … |
you can use a configuration file like this to construct cube and access to it with excel:
# if you want to set an authentication mechanism to access cube,
# user must set a token with login url like 'http://127.0.0.1/admin
# default password = admin
xmla_authentication : False
# cube name <==> db name
name : custom_cube
#csv | postgres | mysql ...
source : csv
# star building customized star schema
facts :
table_name : stats
keys:
departement_id : organization.id
measures :
# by default, all number type columns in facts table, or you can specify them here
- amount
- monthly_salary
# star building customized dimensions display in excel from the star schema
dimensions:
# IMPORTANT , put here facts table also
- name : stats
displayName : stats
- name : organization
displayName : Organization
columns :
- name : id
- name : type
- name : name
column_new_name : full_name
Running olapy with Database¶
As we said in the previous section, Olapy uses CSV files as source type by default when using the olapy runserver
command, so how can we work with databases ? Well, you need to provide some database information (login, password, etc…) to Olapy so it can connect to your database management system.
The command to run Olapy with databases is
olapy runserver -st=csv,db
Here, Olapy gets cubes from csv and database (of course if you want only database use -st=db
…)
You have three possibilities to configure olapy with database:
Environnement variable¶
At startup, Olapy looks for an environment variable called SQLALCHEMY_DATABASE_URI which is the connection string that holds your database credentials and its something like:
SQLALCHEMY_DATABASE_URI = mysql://root:root@localhost:3306
To use this method, just before starting Olapy with olapy runserver
, use the following command:
export SQLALCHEMY_DATABASE_URI = mysql://root:root@localhost:3306 for mac/linux
set SQLALCHEMY_DATABASE_URI = mysql://root:root@localhost:3306 for windows
and then start Olapy with the option -st=csv,db
of course.
NOTE don’t put the database name in the connection string, you will select the database after from Excel.
SQLALCHEMY_DATABASE_URI = mysql://root:root@localhost:3306/my_database
-> this will not work
Database string connection¶
This is simple as running Olapy with the -sa
option:
olapy runserver -st=csv,db -sa=mysql://root:root@localhost:3306
and the same rule don’t put the database name.
Olapy config file¶
The third way to configure a database connection is using a file configuration named olapy-config.yml
under olapy-data
folder. A default/demo olapy-config
file is created after installing olapy under olapy-data
.
You can modify this file according to your configuration:
connection_string : postgresql+psycopg2://postgres:root@localhost:5432
take a look to SQLAlchemy documentation for more information.
OlaPy ETL¶
NOTE: this part is working only with Python 3.5+.
OlaPy ETL can be used if you have an excel file (one sheet) contains all your data in order to let OlaPy make necessary transformations relative to its rules on your data.
To use OlaPy ETL, after installing OlaPy with python setup.py install use the following command:
etl --input_file_path=<EXCEL FILE PATH> --config_file=<CONFIG FILE PATH> [OPTIONAL] --output_cube_path=<PATH WHERE TO GENERATE THE CUBE>
config_file describe how to create the cube, here an example of the configuration file, consider this excel sheet:
Count | Continent | Country | Year | Month | Day |
---|---|---|---|---|---|
84 | America | Canada | 2010 | January 2010 | January 1,2010 |
841 | America | Canada | 2010 | January 2010 | January 2,2010 |
2 | America | United States | 2010 | January 2010 | January 3,2010 |
and we want to divide it into three table, we use a configuration file like this:
Facts: [Count] # just measures
Geography: [Continent, Country]
Date: [Year, Month, Day]
and you save it as yaml file (.yml).