
Episode 145 - What Is Probability? A Philosophical Question with Practical Implications

It is common in our everyday lives to see probabilities at play: during election season, in business and marketing strategy, and even in games. People often treat probabilities as exact, absolute numbers, but thinking in terms of a range of probabilities can be more helpful. With this insight into Bayesian inference, you can find a new and useful way of thinking about probability!

In today's episode, Max shares his talk at PyMCon 2020. He describes the historical basis and value of Bayesian methods and broadly discusses probability and how it is applied in the real world. Finally, he touches on solutions to Python programming issues.

Tune in to the episode to learn more about the importance of thinking about the practical implications of probability!

Here are three reasons why you should listen to the full episode:

  1. Discover why it is crucial to know the implications of probability.

  2. Learn the differences between frequentist and Bayesian views in probability.

  3. Understand how probability is viewed in the real world.

Episode Highlights

What Is Probability?

  • Max’s summary of Bayes’ rule: the posterior is proportional to the likelihood times the prior.

  • Philosophical questions have implications in the practice of Bayesian inference.

  • Probability is the essence of our work and research.

Marketing Attribution and Election Predictions

  • Clients prefer exact numbers. However, analysts must be able to explain the merits of a credible interval.

  • Max used Bayesian inference to present results on how effective advertisements were for one of his clients.

  • The average person doesn’t interpret probabilities very well.

  • Consider the martingale property: today's prediction must be the weighted average of tomorrow's possible predictions.

  • Listen to the full podcast to hear about Max’s experience at Foursquare and his views on election predictions!

The Subjective View in Bayesian Inference

  • The subjective view treats probability as a matter of people’s beliefs. Everyone has probabilistic beliefs that are continuously updated with new information.

  • You can think of Bayes’ rule as a game of Guess Who: you eliminate some hypotheses and leave others open.

  • Most people want objective answers, but subjectivity works in many different cases.

Society’s Perception of Probability

  • Most schools teach frequentist methods instead of Bayesian ones.

  • Even though Bayes’ rule can be explained easily, it is still widely thought of as advanced mathematics.

  • Bayes' rule is more commonly taught in graduate school or in advanced courses such as statistical inference and machine learning.

Frequentist View and Controlled Experiments

  • Frequentist methods work best in controlled environments.

  • Controlled experiments are experiments that can be repeated over and over again.

  • The frequentist view defines an event's probability as the limit of its relative frequency over many repetitions.

Probability in the Real World

  • The Bernstein-von Mises theorem gives conditions under which frequentist confidence intervals approximately converge with Bayesian credible intervals.

  • Austrian economist Ludwig von Mises warned against treating case and class probability the same way.

  • In the middle of the 20th century, Bayesian methods started to become popular for predicting rare events.

  • Proposed by Karl Popper, the Propensity Theory of objective probability holds that probability is a physical tendency of an experimental setup to produce a given outcome.

  • Listen to the full podcast to learn more about the Bernstein von Mises Theorem, case and class probabilities, Propensity Theory, and objective probabilities.

Credible Intervals vs. Confidence Intervals

  • Bayesian credible intervals express a probability distribution over the unknown value. However, the frequentist confidence interval is more widely recognized by the public.

  • Confidence intervals, by contrast, are based on where results would appear over repeated experiments.

  • Remember that people perceive probability differently.

Max’s Take on Probability

  • Max views probability as subjective and in terms of ratios.

  • The philosophical basis of probability is still an open discussion.

  • If you’re an expert in Bayesian methods, be more than just a modeler and take the time to translate your knowledge for others.

  • Bayesian methods are a powerful toolkit.

Python Problems with Shaun Lowry

  • Python has multiple package management solutions.

  • People use packages differently.

  • A virtual environment isolates Python packages from other packages on your system.

  • pip is available to everyone, but different Python installations come with different versions of pip.

  • ActiveState is Shaun Lowry’s answer to Python’s package issues.
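
The isolation a virtual environment provides can be seen with Python's built-in `venv` module (the environment name `demo-env` below is arbitrary):

```shell
# Create an isolated environment in ./demo-env
python3 -m venv demo-env

# Activate it (on Windows: demo-env\Scripts\activate)
. demo-env/bin/activate

# The interpreter now resolves packages inside demo-env,
# not the system-wide site-packages.
python -c "import sys; print(sys.prefix)"

deactivate
rm -rf demo-env
```

Anything installed with pip while the environment is active lands inside `demo-env` and cannot clash with other projects on the same machine.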

5 Powerful Quotes from This Episode

“There might be a tendency for practitioners to say, ‘well, I don't really care about probability’...But actually, sometimes you really have to dive into some of these philosophical questions because it does have implications for the practice of Bayesian inference.”

“...a lot of university professors and people in academia sort of see Bayes’ rule...as like, the really advanced math. You've got to kind of work your way up to it, which really, in my opinion, is not the case.”

“One of the reasons why Bayesian methods in the middle of the 20th century started to become popular in areas like actuarial science and insurance (was) because people had no idea how to insure things that were very rare and never occurred before.”

“That (probability) is really focused on belief...it's sort of like, in my expert opinion, I believe that the value has this probability distribution. I think that's often a really powerful way of looking at this stuff; and frankly, I think it's better than the other way of looking at that stuff.”

“Don't look down on people who don't have these insights, because the very special thing is you're often going to be the expert in the room on this and don't forget it.”

Enjoy the Podcast?

Are you hungry to learn more about the implications of probability? Do you want to expand your perspective further? Subscribe to this podcast to learn more about A.I., technology, and society.

Leave us a review! If you loved this episode, we want to hear from you! Help us reach more audiences to bring them fresh perspectives on society and technology.

Do you want more people to understand Bayesian inference and probability? You can do it by simply sharing the takeaways you've learned from this episode on social media! 

You can tune in to the show on Apple Podcasts, Soundcloud, and Stitcher. If you want to get in touch, visit the website, or find me on Twitter.

To expanding perspectives,

Max

Transcript

Max Sklar: You're listening to The Local Maximum Episode 145.

Time to expand your perspective. Welcome to the Local Maximum. Now here's your host, Max Sklar.

Max Sklar: You reached another Local Maximum. Welcome to the show. I actually have a couple of things to put out to you today. Today's episode is, I would say, a little bit specific, a little targeted towards people who have an interest in philosophical and practical questions about probability, with an interest in Python toward the end. Maybe some of you developers out there, some new machine learning engineers out there, will get a kick out of this. So I hope you really enjoy it if you're into that. Otherwise, we'll get back to our regularly scheduled programming in no time. But I want you to hear about what we have today.

After our main event today, I want to share with you a brief conversation about the tools that ActiveState has to aid in Python development. They found a kind of pain point in Python development that they're addressing. ActiveState is a sponsor of The Local Maximum. But it's a fun discussion. So especially if you're a Python developer, you should stick around and listen to that.

Now. Today, I wanted to share with you a talk I gave recently at the PyMCon conference. This is one of those recorded-talk episodes. I think I haven't done that since... what was it? Episode 11, when I spoke at Yale a couple years ago. So yes, this is another one of those. This is from a Bayesian conference focused on the Python tool PyMC3, which is a great tool to do probabilistic programming in Python, and the conference is on Bayesian inference in general. And there I gave a talk; my talk was titled, "What Is Probability? A Philosophical Question With Practical Implications for Bayesians." And it was meant for kind of a general-ish audience.

But you'll notice that some of the context is targeted at this crowd, and I spent a lot of time describing things we have gone over on this podcast over the years, so it's a little bit of review for some of you. I hope you enjoy that. I apologize; I think I was on the wrong mic when I recorded it, so the sound quality is going to be slightly more muffled than I'd like as a professional podcaster. But it happens sometimes; I think every podcaster would admit it.

So without further delay, here is my virtual talk at the October 31, 2020, PyMCon Bayesian conference. Enjoy.

All right, here we go. Hello, everyone, and welcome to my talk, which is entitled "What Is Probability?", for PyMCon 2020, and it is great to be here virtually. And I look forward to seeing you all virtually but live on October 31st. So first of all, I want to introduce myself a little bit. My name is Max Sklar. I am currently a Labs Engineer at Foursquare (actually a longtime engineer at Foursquare), which means that I work on experimental consumer products. The one that I work on now is called Marsbot Audio, and it should be out in the App Store by the time this goes out; if not, you can ask me about that. We're not going to talk too much about that today. But I do have a lot of experience as a machine learning engineer at Foursquare, which I did for many, many years, and I was able to develop a toolkit of Bayesian methods when I was there. So I'll talk about that a little bit. And also, I have a weekly podcast that I do called The Local Maximum, and you can get information at localmaxradio.com. I'll be talking a lot about the podcast, but there's a good reason for that. So hang tight.

So, first of all, you might notice something a little bit different about this video: I don't have slides. It's something I'm trying. I could put some funny pictures up for you, and I can spend hours looking for pictures, but with the podcast I've grown accustomed to talking without slides. So let's see if this works. Now, you might be like, "Well, I want to go back to the slides." All of my notes for this, including the links, run about three pages, and they'll all be available to you after or before the talk.

So since we don't have slides, I'm going to assume that you know a few things. If I had slides, first of all, I would put up Bayes' rule. But I'm going to assume that you all know Bayes' rule: the probability of the hypothesis given the data is equal to the probability of the data given the hypothesis, times the prior probability of the hypothesis, over the probability of the data. Or the way that I like to put it: the posterior is proportional to the likelihood times the prior. So that's kind of a basic thing. I'm going to assume you have the basics of how Bayesian inference works, and if not, it might be helpful to pull that up and familiarize yourself with it a little bit. But I think most of the people in this conference are solid on that.
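
[The rule Max states here can be sketched numerically for a discrete hypothesis space. The coin scenario and all numbers below are invented for illustration; this is not from the talk.]

```python
# Discrete Bayes update: posterior ∝ likelihood × prior.
# Hypothetical question: is a coin fair (P(heads)=0.5) or biased (P(heads)=0.8)?
priors = {"fair": 0.5, "biased": 0.5}       # prior P(hypothesis)
likelihood = {"fair": 0.5, "biased": 0.8}   # P(heads | hypothesis)

# Observe a single heads: multiply each prior by its likelihood...
unnormalized = {h: priors[h] * likelihood[h] for h in priors}

# ...then normalize by P(data), the sum over all hypotheses.
total = sum(unnormalized.values())
posterior = {h: v / total for h, v in unnormalized.items()}

print(posterior)  # the biased hypothesis is now the more probable one
```

One heads shifts belief toward the biased coin (0.4/0.65 versus 0.25/0.65); feeding the posterior back in as the next prior repeats the update on each new flip.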

So that's the starting point I'm going to go from. And I'm also going to assume that you have dived into some real problems before involving Bayesian inference. So I'm going to prepare you for how to do that in the future, and I'm hoping that you think about the problems that you've solved in the past and think, "Hey, these questions have come up."

So the question that I want to start with today is: what is probability? And that is a really interesting philosophical question. There might be a tendency for practitioners to say, "Well, I don't really care about probability, I don't care about epistemology, I just want to solve problems for my client," or whatever it is. And sometimes that works. But actually, sometimes you really have to dive into some of these philosophical questions, because they do have implications for the practice of Bayesian inference.

Because, after all, a lot of us are often tasked with justifying Bayesian methods. Not every place and not every client that you're going to deal with is a committed Bayesian like us. I assume most of the people who are listening today are committed Bayesians. But the average person is not well versed in probability. The average person in sales, or the average person in management, they're sometimes very smart people, hopefully, but they're not always well versed in probability itself. And I don't even have time to go into all the different schools of thought on probability. So it's very important that we be sharp on these things, so we know where different people are coming from. And if somebody comes at us from a different angle, we should be able to understand where they're coming from as well. So that's pretty important.

This talk is going to be informative for those of you who are kind of new to Bayesian inference. But even if you're experienced, I hope you find one or two, or maybe a lot of new pieces of information or points of view that you can carry around in your toolkit. Looking at probability is kind of the essence of looking at all of the problems that we solve in our work and in our research.

So, early this year on The Local Maximum, I decided to explore this further with a series of episodes, and I'll be pointing you to episodes to check out. And I know you might think, "Hey, why are you pushing your podcast on us the whole time?" But look, I already did the work of presenting some of these ideas, talking to some experts, and making show notes available. So I do want to tell you about all the work that I've already done: several hours of content, really packed with information, to teach you about different angles here, because I can't do it all today. I've done the work to make it available to you for free. So localmaxradio.com is where the podcast lives, and I'm going to be pointing out specific episodes if you want to dive into some of these issues in more depth.

So, first of all, a couple of practical examples that really made a difference for me in understanding probability. The first is from a problem that I dealt with at Foursquare several years ago, which is the problem of marketing attribution. Very big business for Foursquare, very big business in general. And that is the task of trying to figure out whether ads work or not. There's a whole lot that I can say about causality theory, which is a whole other issue that I had to dive into for that. But it was also just a matter of how we presented our results to the client.

So, for example, the client wanted to know, "Did my ad work or not?" And really, when we dove into that problem and tried to figure out how to answer it, we realized that the right way to do it was a kind of Bayesian inference. The thing that popped out at the end was something called "lift": what is the average likelihood that someone would visit your place, given that they saw the ad, over what would have happened if they hadn't seen the ad? A very difficult thing to compute. But even once you can compute an estimate, you don't want to give an exact number there. What you want to do is give an estimate like, "Hey, I think the lift was 10%. But I have to give a Bayesian credible interval: I think it was maybe between 5% and 15%. Or maybe I think there's a very good chance that it was zero percent, that you got no lift, but maybe it's between these bounds," or something like that.

And so not every client wanted answers in this way. But once we thought about it in those terms on the back end, we were then able to make sure that we could give clients the most accurate information possible.
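
[A minimal sketch of how a lift estimate with a credible interval might be computed. The visit counts are invented, and the Beta-Binomial model here is a standard textbook choice, not Foursquare's actual attribution model, which was surely far more involved.]

```python
import random
random.seed(0)

# Hypothetical data: visits out of impressions, exposed vs. control.
exposed_visits, exposed_n = 130, 1000   # saw the ad
control_visits, control_n = 100, 1000   # did not see the ad

# Sample from a Beta posterior (uniform Beta(1,1) prior) via the
# Gamma trick: Beta(a, b) = Ga / (Ga + Gb).
def beta_sample(a, b):
    x = random.gammavariate(a, 1.0)
    y = random.gammavariate(b, 1.0)
    return x / (x + y)

# Monte Carlo over the posterior of lift = p_exposed / p_control - 1.
lifts = sorted(
    beta_sample(1 + exposed_visits, 1 + exposed_n - exposed_visits)
    / beta_sample(1 + control_visits, 1 + control_n - control_visits) - 1
    for _ in range(10_000)
)
lo, mid, hi = lifts[250], lifts[5000], lifts[9750]
print(f"lift ≈ {mid:.1%}, 95% credible interval [{lo:.1%}, {hi:.1%}]")
```

The output is exactly the kind of statement Max describes: a point estimate plus a range of belief, rather than a single unqualified number.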


Another example in terms of understanding probability, which is very common these days, which actually, by the time this goes out, is going to be driving people crazy, which is all the election predictions. Everybody is talking about every single poll that comes out, and you read FiveThirtyEight blog, or all sorts of things. They always have probabilities coming out about whether one candidate is going to win or another.

And that's another case where I think the average person doesn't really interpret the numbers very well, and we have the tools to maybe decipher some of this a little better. One of the things that I like to point out is that this is an example of something called a "martingale," which is an interesting vocab term that just means, "Hey, today's prediction has to be the weighted average of tomorrow's prediction." So if you think that tomorrow there's a 50/50 chance that one candidate is up 60/40, but there's an equally likely chance that the second candidate is up 60/40, then today's prediction has to be 50/50. And so that's kind of an interesting way to look at it.
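
[The martingale arithmetic Max describes can be checked directly with his own numbers:]

```python
# Martingale property: today's probability must equal the expected
# value of tomorrow's probability.
# Tomorrow, with probability 0.5, candidate A is "up 60/40" (P(A wins)=0.6),
# and with probability 0.5, candidate B is up 60/40 (P(A wins)=0.4).
scenarios = [(0.5, 0.6), (0.5, 0.4)]  # (chance of scenario, P(A wins) in it)

today = sum(chance * win for chance, win in scenarios)
print(today)  # 0.5: today's forecast has to be 50/50
```

A forecaster whose numbers violate this (say, 70% today but expecting 50/50 tomorrow on average) is internally inconsistent and could in principle be arbitraged.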

And you look at these election predictions, and they are an example of subjective probability, which I want to talk about, because it's how Bayesians often look at probability. Most of us take the subjective point of view. And the subjective point of view is that a probability is something that is calculated by an agent with beliefs. So when you talk about somebody's beliefs, that's the point of view. And we don't just have clear-cut beliefs that x is true and y is false.

We also have some probabilistic beliefs, where x might have a 10% chance of being true and y maybe has a 5% chance of being true. And we update those beliefs with new data, and different agents might have different beliefs. Some of them might not be using Bayes' rule at all; they might be using some heuristic. But they're still coming up with beliefs all the same, and we consider all of that probability. And we have to decipher who's doing a good, competent job of coming up with these probabilities. Hint: if you use Bayes' rule, and you use some good common practices with priors and all that, you're doing a good job. But there's still a lot of nuance in there. You still want to know: am I gathering the right data? Am I looking at the data that I want to look at? And all that.

So somebody that I interviewed on The Local Maximum who did a really good job of explaining Bayes' rule, in a way I think was good for the average person, was mathematician Sophie Carr, who I talked to in episode 105. You could catch up on that one. And I really liked her imagery of Bayes' rule being a kind of giant game of Guess Who, where, as you're gathering more information, you're eliminating some hypotheses and leaving others open. It's kind of a probabilistic Guess Who. So that's the first one that you should check out.

I think that means that when we take a subjective point of view, we want to take a personal point of view. Or, in a case like what I was doing at Foursquare with measuring ads, I was saying, "Hey, I'm going to build a machine here. It's going to have a lot of different parts, it's going to take in data, it's going to update these very complicated probabilistic models, and essentially this is going to be our organization's take on where the probabilities lie." So that's a very subjective way of looking at it, but it's often very helpful in solving problems. Some people are maybe a little bit uncomfortable with the idea of probability as a subjective thing. That's one thing to keep in mind when you're explaining Bayesian methods, because you kind of have to walk people through it. A lot of people say, "I want an objective answer," and you have to explain why the subjectivity actually works in a lot of different cases.

So the other view, as many of you know, is the frequentist point of view. And there are other views as well that I'm going to get into in a second, but this is one that you have to be well versed in if you're going to do Bayesian inference in the real world. One thing to point out is that currently, frequentist methods, and sort of the objective view of probability, are the dominant view of probability taught in schools and universities, both at the high school level and at the undergrad level, pretty much throughout the United States, and I'm pretty sure it's very similar throughout the world.

So I spoke to a professor about this, Brian Blais, on the podcast; go to episode 119. He wrote a book on Bayesian inference for a younger audience, for an undergrad audience, and it turns out that it works very well. Bayes' rule is not that complicated. You could teach it in high school; you could probably even teach it before high school. Okay, maybe not the calculus version of it, but you could teach the Bayesian way of looking at things very similarly. And yet a lot of university professors and people in academia sort of see Bayes' rule (and I was kind of surprised by this) as, like, the really advanced math. You've got to work your way up to it, which really, in my opinion, is not the case.

And that's kind of a reason why we as Bayesians sometimes have a little trouble explaining our point of view. Most of us didn't get Bayes' rule until advanced courses in statistical inference, or maybe in machine learning. I personally didn't really have to contend with Bayes' rule until I was in grad school, taking courses on data mining and machine learning and that sort of thing.

If you want some more information on the frequentist point of view: one of the areas in science and life where it really works is controlled experiments, because controlled experiments are something that can be repeated over and over again. By the way, I should state the frequentist point of view rather than assume you all know what it is. The view is that probability is a long-run value that is sort of the output of experiments.

So, for example, if you have a weighted coin, you're going to flip it a lot of times, and you assume that each flip of the coin is independent and identical. So you keep flipping that coin, and eventually the ratio of heads to flips is going to converge to a number. And that number after an infinite number of flips (which is impossible; you can't have an infinite number of flips), that limit, is eventually going to be called the probability. In experimental design, where you have a very repeatable experiment, that tends to work very well. I did a whole episode on experimental design: that would be episode 109, with Adam Kapelner. Very interesting stuff there. One of the people to look up on experimental design is Ronald Fisher; a lot of work has been done there, but it's from kind of the opposite point of view from the one a lot of us would prefer to take.
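
[The long-run convergence Max describes is easy to simulate. The weight 0.7 below is an arbitrary choice for illustration:]

```python
import random
random.seed(42)

# A weighted coin: heads with true probability 0.7. The frequentist
# claim: the relative frequency of heads converges to 0.7 as flips grow.
p_true = 0.7
for n in (100, 10_000, 1_000_000):
    heads = sum(random.random() < p_true for _ in range(n))
    print(n, heads / n)  # relative frequency drifts toward 0.7
```

At 100 flips the frequency can easily be off by several percent; at a million flips it sits within a fraction of a percent of the true weight, which is exactly the limiting behavior the frequentist definition appeals to.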

Now, another interesting thing: a lot of this stuff was debated and worked out around the turn of the century. (Okay, I'm freezing up on my own screen here, even though this is not live.) I found a lot of interesting debates and discussions that were happening at that time, around the turn of the century, maybe about the 1930s, 40s, 50s.

One of the ones that I found very interesting, and that isn't talked about very much, is called the Bernstein-von Mises theorem. This is from Richard von Mises and Sergei Bernstein, statisticians (at Harvard, I think, in the 1930s). Basically, it says that Bayesian methods do tend to converge, even if you and I, say, start with different priors and we're looking at the same data. Or even if, as in a lot of modern machine learning examples, we're looking at different data, but it's pulled randomly from the same larger data set. Then we should converge on a similar answer, if certain conditions are met. And you should be familiar with those conditions. I did an episode on that: that was episode 77, where I talked a lot about Bayesian thinking in general, and you should definitely check that one out.

But one of the interesting things about this theorem, too, is that if you look at it and think about it properly, you can look at frequentist methods as having somewhat of a Bayesian interpretation as well. It says that under normal circumstances (which, by the way, are broken a lot in the real world, so you have to watch out for that), the Bayesian models converge, and they converge with the frequentist model, which is often a good thing. But it does mean that you often have to translate back and forth between somebody who thinks one way and someone who thinks another way. And sometimes you need to translate back and forth between two Bayesians who are taking very different approaches.
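
[The "priors wash out" behavior Max is describing can be illustrated with a toy conjugate Beta-Binomial update. Two analysts with very different priors see the same data; the numbers are invented, and this is only a special case, not the theorem's general statement:]

```python
# Two analysts disagree strongly up front about a coin's heads rate:
prior_a = (1, 1)    # Beta(1, 1): uniform, "no idea"
prior_b = (20, 2)   # Beta(20, 2): confident the coin is heads-heavy

# Shared data: 380 heads in 1000 flips.
heads, tails = 380, 620

# Conjugate update: Beta(a, b) + data -> Beta(a + heads, b + tails).
def posterior_mean(a, b):
    return (a + heads) / (a + heads + b + tails)

mean_a = posterior_mean(*prior_a)
mean_b = posterior_mean(*prior_b)
print(mean_a, mean_b)  # both land close to the empirical rate 0.38
```

After a thousand flips the two posterior means differ by about a percentage point despite wildly different starting beliefs; with more data the gap keeps shrinking, which is the convergence the theorem formalizes.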

So what are the exceptions to this, where things don't converge? This is where you can get into trouble. First of all, if you have a different hypothesis space, as a Bayesian, all bets are off, because I could not be considering some of your hypotheses, and you could maybe not be considering some of my hypotheses. Likewise, if my prior assigns some hypotheses probability zero, then that's the same as not considering those hypotheses. And if one of those ends up being the right answer, we're never going to converge. Okay, so those might be the obvious ones.

Another one happens when you have very large and complex priors and distributions. One case is when you have a distribution with a very fat tail, where the data that we're getting can't tell us what happens: there are some very extreme values that affect the average. Let's say, I don't know, 99.99999% of the time I'm going to get a value between one and ten, but every once in a while I get a value that's in the billions. That could happen; it actually happens a lot. And oftentimes, even with lots of data, we can't pick up on that. So that's something to keep in mind.

Sometimes I've had issues (not that particular issue) where I was running a logistic regression model, and one of the variables was distributed exponentially, and trying to fit a linear weight onto it doesn't work too well. One thing you can do is divide it up into buckets and things like that. Another time where this can fail is with nonparametric methods. Let's say you want to fit a mixture of distributions to the data, but you have no limit on the number of components. That's often a great thing to do; it's a very flexible way to fit your data. But sometimes this theorem doesn't hold, and you could say that's a problem for Bayesian methods, but it's also a problem for frequentist methods, too. Everybody gets in trouble in these situations.
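
[One version of the bucketing trick Max mentions, sketched on synthetic data. The log transform shown alongside it is a common alternative he doesn't mention here; both the feature and the quartile scheme are invented for illustration:]

```python
import math
import random
random.seed(1)

# A heavy-tailed feature, e.g. an exponentially distributed spend amount.
values = [random.expovariate(1 / 50) for _ in range(1000)]

# Option 1: a log transform compresses the tail before a linear fit.
logged = [math.log1p(v) for v in values]

# Option 2: quantile buckets; each bucket gets its own weight in the
# downstream model, so no single linear slope is forced onto the tail.
cuts = sorted(values)
edges = [cuts[i * len(cuts) // 4] for i in range(1, 4)]  # quartile edges

def bucket(v):
    return sum(v > e for e in edges)  # bucket index 0..3

buckets = [bucket(v) for v in values]
print(sorted(set(buckets)), [buckets.count(k) for k in range(4)])
```

Either way, the logistic regression no longer has to explain a value in the billions with the same coefficient that explains a value of five.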

So another issue that I discovered when doing all this research for my podcast is that Richard von Mises, the statistician, had a brother, Ludwig von Mises, the famous Austrian economist. And he actually had some things to say about probabilities as used in the real world. It was kind of hard for me to wrap my head around, because we as Bayesians think one way, and they thought another way back in the early 20th century. But I thought that I had to contend with this because it was so interesting. Because he was talking about something called case and class probability, and he sort of warned against treating case probability and class probability the same way.

So I sort of knew this had something to do with Bayesians and frequentists, something to do with subjective probability and objective probability, but I wasn't really sure. So, fortunately, I was able to get a guest on the program who's an expert on Mises (his name is Bob Murphy, and I got him on episode 107), and we talked about it a little bit. And it turned out that class probability... oh god, I'm going to forget which one is which. But one of them, class probability, looks much closer to a controlled experiment, where you have one situation that happens over and over and over again. Philosophically, I think there's also a problem with saying that we're doing the exact same thing over and over again, because we know that's not exactly true; we know there are always little differences. But be that as it may (and I think Laplace talked a lot about this), we are ignorant enough of the subtle differences that, from our point of view, all of these iterations of the experiment are essentially equal. So that's one thing to wrap your head around. That's pretty interesting.

But essentially, when you have these class probabilities, you can really make a good case for coming up with a number. Case probabilities are one-offs. And in the case of one-offs, I don't think they really had the tools to talk about them 100 years ago, but today, with the rise of Bayesian methods, we have very good ways of talking about one-offs. And actually, I did an episode on one-offs, too. Let me actually pull it up here. What was it called? I thought it was episode 69. No, it wasn't 69, shoot; got to get a better number for this. I think it was 65. Episode 65, localmaxradio.com/65; you can get any episode just by typing in the number. So: the chance of something that has never happened before.

And this is one of the reasons why Bayesian methods in the middle of the 20th century started to become popular in areas like actuarial science and insurance: because people had no idea how to insure things that were very rare and had never occurred before. And Bayesian methods sort of gave them the answer. So that's a good factoid to have up your sleeve.

So, coming back to the episode that I did with Adam Kapelner, episode 109, because he teaches the philosophy of probability in his class: it turns out there are other ways of looking at probability, too, aside from subjective and objective, but they're kind of related. One is propensity theory. That's the idea that probability is an objective, inherent feature of the world. So, for example, if you have a six-sided die, there is a propensity for each side to come up one-sixth of the time. That's just a property of the die. Sometimes that's an interesting way to look at these things. This is something that Karl Popper, one of the founders of the philosophy of science, came up with. And it's sort of a mixture between the two, because it's objective probability, like the frequentists propose, but it also handles one-offs; you could do one-off things with propensity theory, so that's more Bayesian. A very interesting kind of mixture there.

And then you have the logical view of probability, where we're no longer talking about beliefs. You have the purely mathematical take on it, where probability is a measure with certain mathematical properties. I think that's a great way of looking at it in some circumstances. But you can only take the pure mathematical theory so far: once you have to plug it into the real world, which is science, which is marketing data, which is machine learning, you kind of have to figure out where you stand on objective versus subjective.

Okay, this might sound a little bit like it's up in the clouds: people talk about this stuff in universities, so why do I care? Well, here's a good example of why you should care. One is the debate between credible intervals and confidence intervals, and I'm sure you've seen this a lot. Frequentists use confidence intervals, and they're way more common; usually there's a formula. But a confidence interval is almost always interpreted as a Bayesian credible interval by the public. And not just by the public, but by clients, or bosses, or anyone who's looking at this stuff.

A Bayesian credible interval says: "Hey, I'm going to give you a range of numbers, and I believe there's, say, a 95% chance that the value you're looking for is in that range. And here's the median value: I think there's a 50% chance that the number you want is above it, and a 50% chance that it's below it." You can see that's really focused on belief: "In my expert opinion, I believe the value has this probability distribution." I think that's often a really powerful way of looking at this stuff, and frankly, I think it's better than the other way of looking at it. But that's just me, and you do have to be able to work both ways. The confidence interval, on the other hand, is more focused on hypothesis testing: we expect results to appear in this range if we had repeated the experiment many times. So it's sort of hard to wrap your head around.
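As a rough sketch of what that credible-interval summary looks like in code, here the posterior samples are simulated with NumPy, standing in for whatever your sampler actually produced:

```python
import numpy as np

# Hypothetical posterior samples, standing in for MCMC output.
rng = np.random.default_rng(42)
samples = rng.normal(loc=0.3, scale=0.05, size=10_000)

# A 95% credible interval: "I believe there's a 95% chance the
# value you're looking for is between these two numbers."
lo, hi = np.percentile(samples, [2.5, 97.5])

# The median: a 50% chance the value is above it, 50% below.
median = np.median(samples)

print(f"95% credible interval: ({lo:.3f}, {hi:.3f}), median {median:.3f}")
```

Note this is just summarizing samples; the belief statement comes from the posterior those samples were drawn from.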

Another one that's very hard to wrap your head around, and that it's been very difficult to wean people off of, is p-values; they really have the same issue. It's always worth discussing with colleagues and clients when you're doing these models: where do they stand on p-values? What's their opinion of them? Because a p-value says, "Hey, what is the probability that I would get a result at least this extreme, given the null hypothesis?" So you want to know, "Hey, did my data invalidate the hypothesis?" Whereas from the Bayesian point of view, we want to say, "Hey, we have a range of hypotheses; what's our belief, represented as a probability distribution over that range?" That's often a lot more intuitive.
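To make the contrast concrete, here's a small sketch (not from the talk itself) comparing the two quantities for hypothetical coin-flip data, 60 heads out of 100, using only the standard library:

```python
from math import comb

# Hypothetical data: 60 heads in 100 flips; null hypothesis p = 0.5.
n, k, p0 = 100, 60, 0.5

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Frequentist p-value (one-sided): probability of a result at least
# this extreme, *given* the null hypothesis.
p_value = sum(binom_pmf(n, i, p0) for i in range(k, n + 1))

# Bayesian view: with a uniform prior, evaluate the belief that the
# coin favors heads via a crude grid approximation of the posterior.
grid = [i / 1000 for i in range(1, 1000)]
weights = [p**k * (1 - p) ** (n - k) for p in grid]  # unnormalized posterior
post_gt_half = sum(w for p, w in zip(grid, weights) if p > 0.5) / sum(weights)

print(f"p-value (data given hypothesis): {p_value:.4f}")
print(f"P(p > 0.5 | data) (hypothesis given data): {post_gt_half:.4f}")
```

The two numbers answer genuinely different questions, which is exactly why they get misinterpreted for each other.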

So again, when you're dealing with people who have different views on this topic, you shouldn't try to overturn someone else's way of doing something; that's going to be very difficult. Think about how you'd react if somebody tried to overturn your way of doing something. But you do have to be aware of how different people perceive probability, and then you have to translate effectively to and from the Bayesian worldview, or your own worldview on probability, which you should often be thinking about.

So I'm going to leave you with one more example of philosophical thinking. I can't just talk all day about what other people think; I have to talk about an idea that is, well, not quite original, but it's my own take on what probabilities are, and it's related to Bayesian inference, so let's cover it before we close up, because it's sort of important to me. I did an episode on this, episode 108, localmaxradio.com/108, where I tried to answer, once and for all, what is probability. Well, not once and for all, but I gave myself a homework assignment: "Okay, I'm going to try to give you my take on what probability is in this episode." So I attempted to do it. Spoiler alert: I basically take the subjective point of view, which is common among Bayesians, and I assume common among you, too. But if you have a different view, I'd actually like to hear about it.

I also tend to look at probabilities as ratios, which is a little bit different: probability becomes a relative property rather than an absolute property. So instead of saying, "Hey, there's a 50% chance that this coin will land on heads," maybe I'm going to say, "Hey, event x is twice as likely to happen as event y." And you can translate from relative probabilities back to absolute ones by comparing against certainty. So, for example, if I want to know the weighting of the coin, I can ask, "What is the probability of getting heads versus the probability of getting heads or tails?" And then you see the inherent assumption there: that certainty is getting heads or tails. That's a good assumption, but, as we know, probably not ironclad, hundred-percent true.
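Here's a tiny illustration of that relative-to-absolute translation, with made-up events x, y, and z and the stated assumption that they exhaust all possibilities:

```python
# Hypothetical relative probabilities: we only know ratios.
# Event x is twice as likely as y, and y is three times as likely as z.
weights = {"x": 6.0, "y": 3.0, "z": 1.0}

# If we assume x, y, z exhaust all possibilities (the "certainty"
# assumption), the ratios pin down absolute probabilities:
total = sum(weights.values())
probs = {k: w / total for k, w in weights.items()}

print(probs)
```

The ratios alone never change; only the normalization step requires the assumption about what counts as certainty.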

But interestingly enough, this idea of relative probability, where I can say one event is twice as likely as another without actually knowing the absolute probabilities, works really, really well when it comes to doing Bayesian inference. Think about the complex posterior spaces we explore when we apply Markov chain Monte Carlo, when we use PyMC3 and apply the No-U-Turn sampler. This works really well because in those Bayesian posteriors, the denominator of Bayes' rule, the marginal probability, is often dropped. When you do Markov chain Monte Carlo, at each step you want to know: if I'm at this hypothesis and I want to jump to this other hypothesis, should I? And all you care about is the relative probability of the two hypotheses. We don't have the denominators; we don't have the absolute probability of each one. And oftentimes those denominators are not even tractable. So it's great that we have an interpretation of probability that is independent of that, rather than saying, "Well, we have this intractable denominator; let's pretend it's there."
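A minimal Metropolis sampler makes this cancellation concrete. This is just a sketch (plain Metropolis, not the No-U-Turn sampler PyMC3 uses), targeting a standard normal as a stand-in posterior:

```python
import math
import random

def unnormalized_posterior(theta):
    # Hypothetical target: proportional to a standard normal density.
    # Note there is no 1/sqrt(2*pi) factor; we never need it.
    return math.exp(-theta**2 / 2)

def metropolis(n_steps, step_size=1.0, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = theta + rng.gauss(0, step_size)
        # The accept/reject decision uses only the *ratio* of the two
        # hypotheses' unnormalized probabilities, so the intractable
        # denominator of Bayes' rule cancels out.
        ratio = unnormalized_posterior(proposal) / unnormalized_posterior(theta)
        if rng.random() < ratio:
            theta = proposal
        samples.append(theta)
    return samples

samples = metropolis(20_000)
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f}")
```

Relative probability is literally all the algorithm consumes; the absolute probabilities never appear.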

And so now probabilities often live in a ratio space, which is really interesting. It's not just the probability of A; it's often the probability of A over the probability of B, which means it's a positive real number. And the ratio space has a very interesting symmetry around one, which makes it a really nice number system to work with, in my opinion. So that's something more to think about. If you have any ideas on how to take this to the next level, I'd be very interested to hear about them.

So I think we're running out of time, and I hope you learned a couple of things from this. I just want to conclude with a few quick takeaways from my talk. One: you can get a lot more information on The Local Maximum podcast at localmaxradio.com. So check that out, I've done a lot of work on it, or reach out to me directly if you have more questions.

Two: the philosophical basis for probability is still very much an open discussion. People still think very differently about it, and it affects the way they approach real problems. It actually comes up when you're talking to other people in the field and trying to come to a consensus.

Three: you're going to encounter many views on probability, and it's often going to fall to you to explain your point of view and to translate it into other points of view for the people you work with. It's your job not just to be a modeler, but to be a teacher as well. And if you get those skills, that is a rare, very marketable skill in today's world.

And four, I just want to conclude with this, because I didn't say it directly: as Bayesians, we have a very, very powerful toolkit at our disposal. It's taken many years, from the foundations of science and trial and error to the rise of machine learning and probabilistic programming tools like PyMC3, to build up to this point. We have special insights into solving problems that many people just don't have. So don't look down on people who don't have these insights, because it's a very special thing; you're often going to be the expert in the room on this, and don't forget it.

So thank you very much. You can reach out to me, and I look forward to meeting you all in this conference.

All right, now I want to stop and talk a little bit with Shaun Lowry from ActiveState.com. As many of you who listen to the show know, ActiveState has been a sponsor for a while. Shaun, how are you doing today?

Shaun Lowry: Pretty good. How are you?

Max: Good, except for all these Python problems. My god, it's been nagging at me. No, not so much these days, but I have done a lot of Python in the past. And when they sent me this graphic from XKCD of all the crazy things that you have to do to get your Python environment working... I don't know about you, but I've had experiences where I just want to write a script, and that's fine. But then when you want to do real stuff, it's a mess.

So we've got a lot of Python developers listening to the show. Explain to people what problem it is you're trying to solve when it comes to setting up Python environments.

Shaun: So I'm sure Python developers all over the world will empathize with this particular problem. You'll install a Python to do one particular task. Then you'll install another application that installs another Python for you. You'll forget about the one that you put in the first time; the system Python will be there as well. And by the time you've got around to your fifth or sixth application, your Python setup is in such a mess. You have no idea which one you're using, where it's coming from, where the packages are being installed, or really anything about the entire environment; you've just lost track of it. And that's where this graphic is coming from. Clearly this guy's a Mac user, but it could be anybody. It's a mess.

Max: Yes, I find that once I install a package on Python, it's on my computer forever. I don't really have any way of... now, look, obviously, where I work, we have something that resets the state of the world, but it can get quite complicated.

Shaun: Yes, so that's the case for developers here. There are a number of solutions to this. Some of them are a little nuclear, in that you could go down the route of "I'm going to virtualize my entire machine just to run this Python script," or you can go the other way and deal with all of this mess on your own.

Max: Yes, so let's go over the main critical issues that organizations and people face. What are some of the reasons why multiple package management solutions need to exist?

Shaun: They only exist because everybody uses them differently. So there's a range of things that work by installing packages to your system, the most popular one being Pip. They range all the way down to complete virtualization. And then there's a whole range of things in between, and where we sit is in that in-between range, where we're talking about virtual environments.

Max: So can you explain what a virtual environment is, for people who don't know?

Shaun: So a virtual environment installs something like Python on your system along with a range of packages, but isolates it from all the other Python installations you may have. It works by manipulating environment variables and making sure that there are copies of things; sometimes there are big farms of symbolic links all over the place. But generally, the idea is that you have something that can run and interact with your system, but is designed to be isolated from all the other Python environments on the system.
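For listeners who want to see this concretely, here's a minimal sketch using Python's standard-library venv module, which is one of the simpler ways to create that kind of isolated environment (the directory name here is just for illustration):

```python
import os
import tempfile
import venv

# Create an isolated environment programmatically; this is what
# "python -m venv <dir>" does under the hood.
target = os.path.join(tempfile.mkdtemp(), "demo-env")
venv.create(target, with_pip=False)  # with_pip=False keeps it fast

# The new environment gets its own interpreter layout plus a
# pyvenv.cfg file pointing back at the base interpreter.
print(os.listdir(target))
```

Activating the environment then adjusts your PATH so that "python" and its packages resolve inside that directory rather than system-wide.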

Max: Thanks. Yes, that clears things up. Everybody has multiple... well, I wouldn't say everybody, but having multiple versions of Python on the system is not terribly uncommon, I would say. So let's talk about Pip. People say, "Well, I have Pip; that works pretty well." What's good about Pip, and what are the challenges you face when just going that route?

Shaun: Well, Pip's great in that it's pretty much available everywhere. So everyone who's got Python almost certainly has access to Pip. And what it does is allow you to install whichever packages you like, and their dependencies, into the installation that your Pip is running from. Now, part of the problem is that you might be using one Pip, installing packages to one location, while the actual Python you're using is a different one. So, for example, if you've got a Python 3 and a Python 2 environment on your system, the Pip for Python 2 is just called pip, but the Pip for Python 3 is called pip3.

Max: I know. I've literally—I've lost hours on that.

Shaun: Yes. So if you forget which Pip you're using, you can end up installing the packages to the wrong place. And then when you come to run your application, it just doesn't work.
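One quick way to diagnose the which-pip-am-I-using problem is to ask the interpreter itself where it lives and where its packages go; a small sketch:

```python
import sys
import sysconfig

# Which interpreter is actually running this script?
print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])

# Where would packages installed into this interpreter land?
print("site-packages:", sysconfig.get_paths()["purelib"])
```

A common rule of thumb: invoking pip as "python -m pip install ..." guarantees the packages land in the interpreter you're actually running, sidestepping the pip-versus-pip3 mismatch.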

Max: Yes. So tell me about ActiveState's solution. What are you guys doing?

Shaun: So we have a whole platform backing the resolution of dependencies for Python and other languages. And we have a tool that you can deploy on your system called the State Tool, which is designed to interact with our platform and can generate a virtual environment for you. It'll isolate the environment, and then you can enter and exit it as you need. When you want to run your application, you activate the environment, and then your application is guaranteed to have sole access to a particular Python runtime environment, which has everything that you've asked to put into it and is not affected by anything else on the system. There are other solutions to this; Python has a whole bunch of them. There's one called Pipenv, which does the same sort of thing. But one of the problems with Pipenv is that while it will install all your Python packages, before you can run it, you need to have Python installed. So it depends on a Python already being there.

Max: It depends on the Python version.

Shaun: And it does depend on the specific Python version that you've got installed. There are other problems with it as well, one of which is underlying native libraries. So, for example, if you're dealing with XML, you will probably have libxml2 or Expat installed somewhere on your system, and that will be shared across all the environments you set up with Pipenv. Pip itself won't go down as far as the C libraries that you need in order to support your application, but the State Tool will. Our platform is designed to work from your requirements right down to what you'd have installed on a vanilla system. So if you need additional C libraries, we will get the right versions of those C libraries installed inside your virtual environment for you as well.

Max: That's really cool. So where can people go and find out about this and then try it out?

Shaun: So you can look on www.activestate.com. But if you want to get into the real stuff, we have platform.activestate.com, which is where you can start to build your first Python application and your first Python runtime environment. From there, you can download the State Tool, and once you've built something on the platform, you can just tell the State Tool "state activate" and the name of your project, and it will download the environment from the platform. It works across Windows, Linux, and Mac, and you'll get the same environment whichever operating system you're using.

Max: Shaun, thanks for sharing that with us today.

Shaun: You're welcome. Have a good day.

Max: All right. Next week, I have a really great interview with mathematician Tai-Danae Bradley. I am really excited about it. Even if you're not a mathematician, I love how this interview came out, so I really encourage you to tune into episode 146. That's coming up next week. Have a great week, everyone.

Max Sklar: That's the show. Remember to check out the website at localmaxradio.com. If you want to contact me, the host, or ask a question that I can answer on the show, send an email to localmaxradio@gmail.com. The show is available on iTunes, SoundCloud, Stitcher, and more. If you want to keep up, remember to subscribe to The Local Maximum on one of these platforms and follow my Twitter account @maxsklar. Have a great week.

Episode 146 - Math, Language, and Intelligence with Tai-Danae Bradley


Episode 144 - Brian Mac Mahon of Expert DOJO on Startups
