Anonymous asked:

Hahahahahaha, I said nrx people didn't metalwork and one of them responded with "forges are indoors." Dude, metalwork = machine shop. Forges = Tolkien novels and ren faires. These people make my point!

So take this argument to his blog instead of mine. 

Anonymous asked:

Blog summary: The most intimidating intellect on tumblr gazed upon LW and did not like what he saw.

I’m flattered, but I don’t think I’m intimidating.  I’ve just spent a lot of time working on very narrow problems.  Depth but not breadth :)  

Anonymous asked:

Related to previous nrx bitching you've done. Isn't it odd that a bunch of bookish, anime-referencing, indoor programmer kids want to resurrect and rebuild traditional masculinity? Don't they realize they aren't exactly traditionally masculine? Do any of them hunt, fish, do metalwork, cook over fire (insert tool man grunting here)?

presented without comment

Anonymous asked:

I don't have much for a summary. The quantum posts are interesting. Also you have a pretty good sense of humor about yourself.

Oh good.  I honestly worry that most of the time my jokes read like me being a dick.

Anonymous asked:

The phlogiston post is cute, but it doesn't really impact the point at all. The fake causality model is still a useful thing to keep in mind.

I’d be worried that if the individual examples used to construct the model are wrong, maybe the whole thing is?  “Sure, the wood I built this house with turned out to be cheese.  Still a great house though.”


That Optional Stopping Thing

thinkingornot:

su3su2u1:

jadagul:

So I think you're missing the point here, because you're using a different calibration metric. The standard Less Wrong sense of "calibration" is "95% of your given confidence intervals have the true value in them." And it's trivially possible to be perfectly calibrated when estimating point probabilities: 95% of the time you say "it's between 0 and 1, inclusive," and the other 5% of the time you say "it's two." This algorithm for assigning confidence intervals is obviously well-calibrated and also obviously dumb.

Ah, that does work, but it's not allowed: you have to construct your confidence intervals with the same method every time (otherwise it's not one well-calibrated method, it's two badly calibrated methods you've munged together).
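A quick simulation makes this concrete.  This is just a sketch (Python with numpy), and the setup is my own: I'm assuming the thing being estimated is a probability drawn uniformly on [0, 1], and the 95/5 split is baked in by hand.  The munged rule lands at ~95% coverage by construction while telling you nothing:

    import numpy as np

    rng = np.random.default_rng(0)
    n_trials = 10_000

    # Illustrative assumption: the quantity being estimated is a probability,
    # so the true value always lies in [0, 1].
    truths = rng.uniform(0.0, 1.0, size=n_trials)

    # The "munged" rule: 95% of the time report the interval [0, 1],
    # the other 5% of the time report the useless point guess "two".
    use_full_interval = rng.uniform(size=n_trials) < 0.95
    covered = np.where(use_full_interval,
                       (truths >= 0.0) & (truths <= 1.0),  # always covers
                       truths == 2.0)                       # never covers

    print(f"empirical coverage: {covered.mean():.3f}")  # ~0.95, and completely uninformative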

Is there a way to make this distinction rigorous? Would requiring methods to be continuous be enough?

And if you don’t accept hybrid methods, how can you justify using different tools (Bayes/not-Bayes) for different problems?

No, the issue is that when I say a frequentist method is calibrated, what I mean is "do this same thing in 10k cases and about 9,500 of the resulting intervals will contain the true value."  That is a property of the method.

The method “choose every possible value” is always 100%.  The method “choose no values” is always 0% (and isn’t even really a proper guess).  

The problems with the munged method above are:

1. it’s actually two methods munged together

2. you can't actually apply it without knowing in advance that you have a bounded parameter.  Much of the time you won't.  If you use the data itself to set the bound you can no longer guarantee 100% (and in most cases you won't be anywhere near it); see the sketch below.
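Here's a rough sketch of both points (Python with numpy/scipy).  The normal data model, the sample size of 5, and the "just use the observed data range as your bound" rule are all my own illustrative choices, not anything from the thread: a fixed procedure like the 95% t-interval has ~95% coverage as a property of the procedure, while the data-derived bound quietly loses the 100% guarantee.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_trials, n = 10_000, 5
    true_mean, true_sd = 3.0, 2.0   # arbitrary "truth", chosen only for the simulation

    t_covered = range_covered = 0
    for _ in range(n_trials):
        x = rng.normal(true_mean, true_sd, size=n)

        # One fixed procedure: the usual 95% t-interval for the mean.
        half = stats.t.ppf(0.975, df=n - 1) * x.std(ddof=1) / np.sqrt(n)
        t_covered += (x.mean() - half <= true_mean <= x.mean() + half)

        # "Use the data itself to set the bound": report [min(x), max(x)]
        # as if it were "every possible value".  The 100% guarantee is gone.
        range_covered += (x.min() <= true_mean <= x.max())

    print(f"t-interval coverage:    {t_covered / n_trials:.3f}")     # ~0.95, a property of the method
    print(f"data-range 'coverage':  {range_covered / n_trials:.3f}")  # ~0.94 for n = 5, not 100%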

The reason I sometimes use Bayes and sometimes don’t use Bayes is that sometimes I don’t think calibration is very important.  It depends on the problem. 

The email I sent that prompted me to write that post was defending a Bayesian model I had built.  It's a very constrained problem where a nice symmetry argument let us build a very well-behaved prior.  The problem is that about half of the autovalidation tests that constantly run on the model are checking calibration, and the Bayes model compares very badly to the frequentist models.  So a higher-up checked the model dashboard, flipped out when he saw the comparisons, and sent me an email saying, basically, "you promised this model was great, but look how shitty it is!"

In the process of defending myself, I realized that maybe a lot of people don't know that Bayesian credible intervals generally aren't calibrated (the exception being uninformative priors built specifically to mimic frequentist estimation procedures).
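Here's a toy version of that effect (Python with numpy), emphatically not the actual work model: a conjugate normal-normal setup where an informative prior is centred away from the truth.  The numbers (prior N(0, 1), true mean 2, n = 5, known sd 1) are assumptions I picked purely to make the effect visible.  The 95% credible interval covers the truth noticeably less than 95% of the time, while the frequentist interval sits at 95% by construction:

    import numpy as np

    rng = np.random.default_rng(2)
    n_trials, n = 10_000, 5
    sigma = 1.0                      # known data sd (assumed for this toy model)
    true_mu = 2.0                    # the "real" parameter in the simulation
    prior_mu, prior_sd = 0.0, 1.0    # informative prior centred away from the truth

    z = 1.959964                     # 97.5% standard normal quantile

    bayes_cov = freq_cov = 0
    for _ in range(n_trials):
        x = rng.normal(true_mu, sigma, size=n)
        xbar = x.mean()

        # Conjugate normal-normal posterior for mu (sigma known).
        post_prec = 1 / prior_sd**2 + n / sigma**2
        post_mean = (prior_mu / prior_sd**2 + n * xbar / sigma**2) / post_prec
        post_sd = post_prec ** -0.5
        bayes_cov += (post_mean - z * post_sd <= true_mu <= post_mean + z * post_sd)

        # Standard frequentist interval for the same data.
        half = z * sigma / np.sqrt(n)
        freq_cov += (xbar - half <= true_mu <= xbar + half)

    print(f"95% credible interval coverage:   {bayes_cov / n_trials:.3f}")  # ~0.89 in this setup
    print(f"95% confidence interval coverage: {freq_cov / n_trials:.3f}")   # ~0.95

In this toy setup, centring the prior on the truth instead makes the credible interval over-cover; either way, calibration isn't something the Bayesian machinery promises.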

I’m not saying calibration is always the most important consideration.  Sometimes it is, sometimes it isn’t. 


That Optional Stopping Thing

jadagul:

So I think you're missing the point here, because you're using a different calibration metric. The standard Less Wrong sense of "calibration" is "95% of your given confidence intervals have the true value in them." And it's trivially possible to be perfectly calibrated when estimating point probabilities: 95% of the time you say "it's between 0 and 1, inclusive," and the other 5% of the time you say "it's two." This algorithm for assigning confidence intervals is obviously well-calibrated and also obviously dumb.

Ah, that does work, but it's not allowed: you have to construct your confidence intervals with the same method every time (otherwise it's not one well-calibrated method, it's two badly calibrated methods you've munged together).

And this was my point about "as a mathematician." The fact that any of these other decision-making algorithms can be reduced to Bayes with some prior seems mostly useless to the "use Bayes for everything, because it's philosophically correct!" camp, but really really cool to someone who cares about theoretical mathematical properties but doesn't feel a need to ever actually compute anything. :P

Oh sure, the complete class theorem is neat, and honestly one of the more surprising results (which is why I linked it for you).  I was just saying it doesn't give you the result you want ("on average better performance"); it just says "there isn't a method that does better in every conceivable scenario," which is much weaker.
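A concrete way to see the gap, as a sketch (Python with numpy; the shrink-toward-zero Bayes estimator, the prior, and the specific mu values are my own illustration, not the theorem itself): compare the mean squared error of a Bayes posterior mean against the plain sample mean at several true values.

    import numpy as np

    rng = np.random.default_rng(3)
    n_trials, n, sigma = 20_000, 5, 1.0
    prior_sd = 1.0   # prior N(0, prior_sd^2), an illustrative choice
    # Posterior-mean shrinkage factor for the conjugate normal-normal model.
    shrink = (n / sigma**2) / (1 / prior_sd**2 + n / sigma**2)

    for true_mu in [0.0, 1.0, 2.0, 4.0]:
        # Simulate the sampling distribution of the sample mean directly.
        xbar = rng.normal(true_mu, sigma / np.sqrt(n), size=n_trials)
        mse_mle = np.mean((xbar - true_mu) ** 2)               # plain sample mean
        mse_bayes = np.mean((shrink * xbar - true_mu) ** 2)    # Bayes posterior mean (prior at 0)
        print(f"mu = {true_mu:3.1f}:  MSE(MLE) = {mse_mle:.3f}   MSE(Bayes) = {mse_bayes:.3f}")

In this toy example the Bayes rule wins near the prior and loses far from it, so neither estimator dominates the other.  That's the kind of "not dominated in every scenario" statement the theorem buys you, which is a long way from "better on average in the scenarios you actually face."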
