Firms where the business people do not understand what the data scientists are doing are at a substantial disadvantage, because they waste time and effort, or, worse, because they ultimately make wrong decisions.

from “Data Science for Business”
by F. Provost & T. Fawcett (O’Reilly), p. 13

Price is the quantification of a customer’s perceived value of your product.

from “Monetizing Innovation”
by M. Ramanujam & G. Tacke

“The Elements of Statistical Learning” available to download for free

I just learned that the 10th printing of the 2nd edition of “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” by Hastie, Tibshirani, and Friedman, which deals mostly with supervised methods for learning from data, can be downloaded officially, legally, and free of charge as a PDF from

The hardcover and Kindle versions can be obtained, for example, from Amazon for a nontrivial price:

Happy learning,

Ask a Mathematician

In another step in my quest to acquire more and more mathematical know-how, I discovered the website where mathematicians and physicists answer questions they have received by email or in person. What makes the site unique is the structure of its thoughtful and detailed answers. Most questions are answered in several stages (for example, “the clever student would answer / the cleverer student would answer / the teacher would answer / the mathematician would answer”), with short discussions that lead to the final solution or opinion.

In my opinion it is better than any math book: you are confronted directly with an interesting question and can follow the answers step by step, with time to form your own solution or opinion on the topic. This demands much more involvement than just consuming a book’s content. As a consequence, you have really learned something after working (and I really mean working, not just reading) through the answers.

Advanced R Programming Workshop Available on Bioconductor Website

The folks at Bioconductor, the R-based “open source software for bioinformatics” project, generously publish materials from their conferences and workshops on their website, free to download. Even if you’re not into genetics, you should check out the general-purpose workshop on “Advanced R Programming”. The materials include slides, papers, and self-study exercises:

If you’re interested in bioinformatics, don’t forget to have a look at their other courses.


ROI of Acquiring New Skills – GNU R as an Example

In his YouTube video, Courtney Brown, Ph.D., gives some reasons why learning R is worth the effort. His list is far from comprehensive, but I think he covers some important aspects. In my opinion the return-on-investment argument is his most important one for convincing people to learn R, especially potential business users and academics. The former are often dissatisfied with their current software (or its price); the latter are often disillusioned to find that many of the theoretical and software skills they have acquired so far do not apply to real problems. Learning relevant methods and software to solve relevant problems is very satisfying.


Competing product analysis: NBA 2K12 vs. EA Sports

My latest find is a small but nice example of a competitor product analysis:

GameSpot author Marko Djordjevic took EA Sports’ point of view and analyzed 2K Sports’ current basketball video game “NBA 2K12”: what must EA do to outperform 2K’s successful franchise? Check out his article; it’s instructive even if you’re interested in neither video games nor basketball:

Expectation and variance of a binary random variable

When you start working with generalized linear models (GLMs), you will come across sentences like “Obviously, the variance of the binary dependent variable is $\mu(1-\mu)$.” For everybody who does not find this obvious, this post walks through the derivation, which may help in understanding the mathematical reasoning behind GLMs, especially logit and probit models.
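As a compact sketch of the standard argument: let $Y \in \{0, 1\}$ with $P(Y = 1) = \mu$. Because $Y$ takes only the values 0 and 1, $Y^2 = Y$, so both moments come straight from the two-point distribution:

```latex
\begin{aligned}
E[Y]                  &= 0 \cdot (1-\mu) + 1 \cdot \mu = \mu, \\
E[Y^2]                &= 0^2 \cdot (1-\mu) + 1^2 \cdot \mu = \mu, \\
\operatorname{Var}(Y) &= E[Y^2] - \bigl(E[Y]\bigr)^2 = \mu - \mu^2 = \mu(1-\mu).
\end{aligned}
```

In a logit or probit model, $\mu$ is the fitted probability $P(Y = 1 \mid x)$, so the variance $\mu(1-\mu)$ changes with the covariates rather than being constant.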


Correlation and causality: An everyday life example of causal analysis

While digging a little into Java, I found a post by Dustin Marx on “Correlation Between Typing Speed and Programming Competence” that is interesting, at least for people interested in statistics. From a statistician’s point of view, the article is a nice example of a small, everyday causal analysis.

Mr. Marx informally analyzes the possible causes of the correlation between the attributes “typing speed” and “programming skill”. If you are short of time, just read the conclusion to get the idea (something I cannot recommend for scientific papers!). In my humble opinion, such examples are very useful for beginners to grasp the idea of “correlation vs. causality”, and for professionals to look at their sophisticated mathematical analysis tools from a refreshingly basic, everyday perspective.
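The mechanism behind such an analysis can be illustrated with a small simulation. The sketch below is a hypothetical model in Python, not Mr. Marx’s analysis or conclusion: an assumed common cause (`practice`, years spent at a keyboard) drives both typing speed and programming skill, while neither variable influences the other. The two still end up strongly correlated; once the confounder’s linear effect is removed from both (a partial correlation), the correlation essentially disappears. All names, coefficients, and noise levels are made up.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals(ys, xs):
    """Residuals of the least-squares line ys = a + b * xs (removes the linear effect of xs)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def simulate(n=5000, seed=1):
    """Made-up model: practice drives both variables; neither variable affects the other."""
    rng = random.Random(seed)
    practice = [rng.uniform(0, 10) for _ in range(n)]          # hypothetical confounder (years)
    typing = [30 + 5 * p + rng.gauss(0, 5) for p in practice]  # typing speed (made-up units)
    skill = [2 * p + rng.gauss(0, 3) for p in practice]        # programming skill score (made-up)
    return practice, typing, skill


if __name__ == "__main__":
    practice, typing, skill = simulate()
    print("raw correlation:", round(pearson(typing, skill), 2))
    # Correlation left over after removing the confounder's linear effect from both variables
    print("partial correlation:", round(pearson(residuals(typing, practice), residuals(skill, practice)), 2))
```

Controlling for a suspected common cause like this is one basic way to check whether a correlation might be spurious; on its own it cannot rule out other, unmeasured causes.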


Finger Exercise: Throwing Two Dice in R Using the rpanel Package

After a period of exams I needed to brush up on my R vocabulary (the exact syntax), because I had started mixing it up with the syntax of other programming languages. Here is the result: a dice game simulation. It is neither innovative nor difficult, but I suppose it could be useful for people new to R as an easy example of how programming in R works. It is also an application of the nice rpanel package.

Dice Game with the rpanel package

Because the program is very simple, I have skipped most comments in the code, but I will add some in the near future. The variable names should be fairly self-explanatory; if not, feel free to leave a comment. Of course, more experienced programmers are welcome to improve the code. 🙂

Usage: Run the code in R, use the panel’s sliders to choose the number of dice and the number of throws, and hit the Throw! button.

This small program lets you investigate or illustrate some aspects of convergence, or simply get a feeling for your chances of winning your next dice game. Feel free to use it for teaching if you find it useful (see the license in the footer of this page). If you want reproducible results, set a random number seed of your choice with R’s set.seed() function.
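The R/rpanel program itself is not part of this excerpt. As a rough, language-swapped sketch of the same idea (Python, not the author’s code), the following throws a chosen number of fair six-sided dice a chosen number of times and compares the observed frequency of each sum with its exact probability. The function names are hypothetical, and seeding `random.Random` plays the role that set.seed() plays in R.

```python
import random
from collections import Counter


def throw_dice(n_dice, n_throws, seed=None):
    """Simulate n_throws throws of n_dice fair six-sided dice and return the sum of each throw."""
    rng = random.Random(seed)  # a fixed seed gives reproducible results, like set.seed() in R
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(n_throws)]


def relative_frequencies(sums):
    """Map each observed sum to its share of all throws."""
    counts = Counter(sums)
    return {s: counts[s] / len(sums) for s in sorted(counts)}


def exact_sum_distribution(n_dice):
    """Exact probability of each sum of n_dice fair dice, built by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n_dice):
        nxt = Counter()
        for total, p in dist.items():
            for face in range(1, 7):
                nxt[total + face] += p / 6
        dist = dict(nxt)
    return dict(sorted(dist.items()))


if __name__ == "__main__":
    sums = throw_dice(n_dice=2, n_throws=10_000, seed=42)
    observed = relative_frequencies(sums)
    exact = exact_sum_distribution(2)
    for s in exact:
        print(f"sum {s:2d}: observed {observed.get(s, 0):.3f}  exact {exact[s]:.3f}")
```

With more throws, the observed column should drift closer to the exact column, which is the convergence effect the panel lets you explore.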

Happy R-ing.


How to generate bivariate pdfs given a copula and the margins in R and MATLAB

After finding a few unanswered requests for a solution to this problem on the web (including my own…), I’d like to share the final results of my work.

The problem:

Suppose you have two random variables, Z and T.

Z is N(0,1) distributed.
T is t(3) distributed.

Now you are supposed to produce four contour plots of the joint pdf of the two variables, for the cases where their dependence structure is given by the

  1. Gaussian,
  2. Clayton,
  3. Frank, and
  4. Gumbel copula.

Given the copula and the marginal distributions, the bivariate joint distribution of Z and T can be constructed (by Sklar’s theorem). This post shows how to do exactly that in R and MATLAB, and how to draw the corresponding contour plots.
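The post’s own R and MATLAB code is not part of this excerpt. As a hedged, language-swapped sketch (Python, standard library only), the construction rests on Sklar’s theorem: the joint density is $f(z,t) = c\bigl(F_Z(z), F_T(t)\bigr)\, f_Z(z)\, f_T(t)$, where $c$ is the copula density and $F$, $f$ are the marginal CDFs and pdfs. Only the Gaussian and Clayton copula densities are written out below; Frank and Gumbel densities would plug into `joint_pdf` the same way. The parameter values (`rho = 0.5`, `theta = 2.0`), the grid limits, and all function names are illustrative assumptions, not taken from the original post.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # margin of Z: N(0, 1)


def t3_pdf(t):
    """Density of Student's t distribution with 3 degrees of freedom (margin of T)."""
    return 6 * math.sqrt(3) / (math.pi * (3 + t * t) ** 2)


def t3_cdf(t):
    """Closed-form CDF of Student's t distribution with 3 degrees of freedom."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho (|rho| < 1)."""
    x, y = STD_NORMAL.inv_cdf(u), STD_NORMAL.inv_cdf(v)
    one_minus_r2 = 1 - rho * rho
    exponent = -(rho * rho * (x * x + y * y) - 2 * rho * x * y) / (2 * one_minus_r2)
    return math.exp(exponent) / math.sqrt(one_minus_r2)


def clayton_copula_density(u, v, theta):
    """Density of the Clayton copula with parameter theta > 0."""
    s = u ** -theta + v ** -theta - 1
    return (1 + theta) * (u * v) ** (-1 - theta) * s ** (-2 - 1 / theta)


def joint_pdf(z, t, copula_density, param):
    """Sklar's theorem: f(z, t) = c(F_Z(z), F_T(t)) * f_Z(z) * f_T(t)."""
    u, v = STD_NORMAL.cdf(z), t3_cdf(t)
    return copula_density(u, v, param) * STD_NORMAL.pdf(z) * t3_pdf(t)


def density_grid(copula_density, param, zs, ts):
    """Rows follow ts, columns follow zs: the layout most contour routines expect."""
    return [[joint_pdf(z, t, copula_density, param) for z in zs] for t in ts]


if __name__ == "__main__":
    # Grid limits and copula parameters are illustrative choices only.
    zs = [i / 10 for i in range(-30, 31)]
    ts = [i / 10 for i in range(-50, 51)]
    gauss = density_grid(gaussian_copula_density, 0.5, zs, ts)
    clayton = density_grid(clayton_copula_density, 2.0, zs, ts)
    print(max(map(max, gauss)), max(map(max, clayton)))
```

Each grid can be handed to a contour routine such as matplotlib’s `contour(zs, ts, grid)`. The grid deliberately stays away from extreme values of z, where the normal CDF rounds to 0 or 1 and `inv_cdf` would fail.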
