<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
 
 <title>James Bergstra</title>
 <link href="http://people.fas.harvard.edu/~bergstra/atom.xml" rel="self"/>
 <link href="http://people.fas.harvard.edu/~bergstra/"/>
 <updated>2012-03-14T19:19:22-04:00</updated>
 <id>http://people.fas.harvard.edu/~bergstra</id>
 <author>
   <name>James Bergstra</name>
   <email>bergstra@rowland.harvard.edu</email>
 </author>
 
 
 <entry>
   <title>New paper</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/random-search.html"/>
   <updated>2012-03-08T00:00:00-05:00</updated>
   <id>id:/random-search</id>
   <summary>Random Search for Hyper-Parameter Optimization</summary>
   <content type="html">&lt;h1 id='new_paper_random_search_for_hyperparameter_optimization'&gt;New paper: Random Search for Hyper-Parameter Optimization&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;8 March 2012 - Cambridge MA&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;New paper about hyper-parameter optimization by random search in JMLR: &lt;a href='http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf'&gt;pdf&lt;/a&gt;. Basically, this paper connects the idea of &amp;#8220;low effective dimension&amp;#8221;, discovered in quasi-Monte Carlo (QMC) integration, to the remarkable efficiency of random search for hyper-parameter optimization in some problems. I hope this paper convinces you that unless you really know what you&amp;#8217;re doing, you should NOT be using grid search to optimize hyper-parameters. &lt;br /&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Title&lt;/em&gt;: &lt;br /&gt; Random Search for Hyper-Parameter Optimization&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Authors&lt;/em&gt;: &lt;br /&gt; J. Bergstra and Y. Bengio.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: &lt;br /&gt; Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time. Granting random search the same computational budget, random search finds better models by effectively searching a larger, less promising configuration space. Compared with deep belief networks configured by a thoughtful combination of manual search and grid search, purely random search over the same 32-dimensional configuration space found statistically equal performance on four of seven datasets, and superior performance on one of seven. A Gaussian process analysis of the function from hyper-parameters to validation set performance reveals that for most datasets only a few of the hyper-parameters really matter, but that different hyper-parameters are important on different datasets. This phenomenon makes grid search a poor choice for configuring algorithms for new datasets. Our analysis casts some light on why recent &amp;#8220;High Throughput&amp;#8221; methods achieve surprising success &amp;#8211; they appear to search through a large number of hyper-parameters because most hyper-parameters do not matter much. 
We anticipate that growing interest in large hierarchical models will place an increasing burden on techniques for hyper-parameter optimization; this work shows that random search is a natural baseline against which to judge progress in the development of adaptive (sequential) hyper-parameter optimization algorithms.&lt;/p&gt;
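
&lt;p&gt;As a rough illustration of the low effective dimension argument (a toy sketch, not code from the paper): the objective below is a made-up loss that depends on only one of two hyper-parameters, and all function names are hypothetical. With a budget of nine trials, a 3-by-3 grid tries only three distinct values of the hyper-parameter that matters, while nine random trials try nine.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
import random


def objective(x, y):
    # Hypothetical validation loss: only x matters (low effective dimension).
    return (x - 0.37) ** 2


def grid_trials(points_per_axis):
    # Evenly spaced grid of cell centres on the unit square.
    axis = [(i + 0.5) / points_per_axis for i in range(points_per_axis)]
    return [(x, y) for x in axis for y in axis]


def random_trials(n_trials, seed=0):
    # Independent uniform draws on the unit square.
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n_trials)]


def distinct_values_of_x(trials):
    # How many different settings of the important hyper-parameter were tried.
    return len({x for x, _ in trials})


def best_loss(trials):
    return min(objective(x, y) for x, y in trials)


grid = grid_trials(3)    # 9 trials, 3 distinct x values
rand = random_trials(9)  # 9 trials, 9 distinct x values
print("grid:  ", distinct_values_of_x(grid), "distinct x, best loss", best_loss(grid))
print("random:", distinct_values_of_x(rand), "distinct x, best loss", best_loss(rand))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Which search wins on a single run depends on the seed and on where the optimum sits; the point is only that random search covers the important axis more densely for the same budget.&lt;/p&gt;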
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>Big Learning Workshop @ NIPS 2011</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/biglearn.html"/>
   <updated>2011-12-10T00:00:00-05:00</updated>
   <id>id:/biglearn</id>
   <summary>Parallel and large-scale machine learning.</summary>
   <content type="html">&lt;h1 id='big_learning_workshop__nips_2011_parallel_and_largescale_machine_learning'&gt;Big Learning Workshop @ NIPS 2011: Parallel and large-scale machine learning.&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;10 Dec 2011 - Sierra Nevada, Spain&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;Come check out our workshop! I&amp;#8217;m co-organizing the &lt;a href='http://biglearn.org'&gt;Big Learning&lt;/a&gt; workshop at &lt;a href='http://nips.cc'&gt;NIPS 2011&lt;/a&gt; on parallel and large-scale machine learning. The workshop aims to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;Bring together parallel system builders in industry and academia, machine learning algorithms experts, and end users to identify the key challenges, opportunities, and myths of Big Learning. What REALLY changes from the traditional learning setting when faced with terabytes or petabytes of data?&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Solicit practical case studies, demos, benchmarks and lessons-learned presentations, and position papers.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Showcase recent and ongoing progress towards parallel ML algorithms.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Provide a forum for exchange regarding tools, software, and systems that address the Big Learning problem.&lt;/p&gt;
&lt;/li&gt;

&lt;li&gt;
&lt;p&gt;Educate the researchers and practitioners across communities on state-of-the-art solutions and their limitations, particularly focusing on key criteria for selecting task- and domain-appropriate platforms and algorithms.&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;See you there! It&amp;#8217;s a 2-day workshop (Dec 16-17) at the Montebajo Theatre in Sierra Nevada, Spain. Also, we have generous prizes sponsored by NVIDIA (just guess what they&amp;#8217;ll be&amp;#8230;) so come and vote for the best contributions.&lt;/p&gt;
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>New paper</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/hyperopt.html"/>
   <updated>2011-11-07T00:00:00-05:00</updated>
   <id>id:/hyperopt</id>
   <summary>Algorithms for Hyper-parameter Optimization</summary>
   <content type="html">&lt;h1 id='new_paper_algorithms_for_hyperparameter_optimization'&gt;New paper: Algorithms for Hyper-parameter Optimization&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;7 Nov 2011 - Cambridge MA&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;New paper about hyper-parameter optimization in deep models, to appear at NIPS 2011. [&lt;a href='./files/pub/11_nips_hyperopt.pdf'&gt;pdf&lt;/a&gt;] Also, check out the &lt;a href='https://github.com/jaberg/hyperopt'&gt;software&lt;/a&gt; related to the paper. You can use it to optimize hyper-parameters in your own work. &lt;br /&gt; Update: I&amp;#8217;ll also present this and the hyperopt software at the &lt;a href='http://www.cs.ubc.ca/~hutter/nips2011workshop/schedule.html'&gt;Workshop on Bayesian Optimization, Experimental Design and Bandits&lt;/a&gt; at &lt;a href='http://nips.cc'&gt;NIPS 2011&lt;/a&gt; (Dec 16).&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Title&lt;/em&gt;: &lt;br /&gt; Algorithms for Hyper-parameter Optimization&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Authors&lt;/em&gt;: &lt;br /&gt; J. Bergstra, R. Bardenet, Y. Bengio, and B. Kegl.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: &lt;br /&gt; Several recent advances to the state of the art in image classification benchmarks have come from better configurations of existing techniques rather than novel approaches to feature learning. Traditionally, hyper-parameter optimization has been the job of humans because they can be very efficient in regimes where only a few trials are possible. Presently, computer clusters and GPU processors make it possible to run more trials and we show that algorithmic approaches can find better results. We present hyper-parameter optimization results on tasks of training neural networks and deep belief networks (DBNs). We optimize hyper-parameters using random search and two new greedy sequential methods based on the expected improvement criterion. Random search has been shown to be sufficiently efficient for learning neural networks for several datasets, but we show it is unreliable for training DBNs. The sequential algorithms are applied to the most difficult DBN learning problems from [Larochelle et al., 2007] and find significantly better results than the best previously reported. This work contributes novel techniques for making response surface models P(y|x) in which many elements of hyper-parameter assignment (x) are known to be irrelevant given particular values of other elements.&lt;/p&gt;
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>New paper</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/one-gabor.html"/>
   <updated>2011-09-30T00:00:00-04:00</updated>
   <id>id:/one-gabor</id>
   <summary>The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)</summary>
   <content type="html">&lt;h1 id='new_paper_the_statistical_inefficiency_of_sparse_coding_for_images_or_one_gabor_to_rule_them_all'&gt;New paper: The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;30 Sept 2011 - Cambridge MA&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;New paper about factoring a sparse coding dictionary by geometric transformations. [&lt;a href='http://arxiv.org/pdf/1109.6638v2'&gt;pdf&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Title&lt;/em&gt;: &lt;br /&gt; The Statistical Inefficiency of Sparse Coding for Images (or, One Gabor to Rule them All)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Authors&lt;/em&gt;: &lt;br /&gt; J. Bergstra, A. Courville, Y. Bengio&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Abstract&lt;/em&gt;: &lt;br /&gt; Sparse coding is a proven principle for learning compact representations of images. However, sparse coding by itself often leads to very redundant dictionaries. With images, this often takes the form of similar edge detectors which are replicated many times at various positions, scales and orientations. An immediate consequence of this observation is that the estimation of the dictionary components is not statistically efficient. We propose a factored model in which factors of variation (e.g. position, scale and orientation) are untangled from the underlying Gabor-like filters. There is so much redundancy in sparse codes for natural images that our model requires only a single dictionary element (a Gabor-like edge detector) to outperform standard sparse coding. Our model scales naturally to arbitrary-sized images while achieving much greater statistical efficiency during learning. We validate this claim with a number of experiments showing, in part, superior compression of out-of-sample data using a sparse coding dictionary learned with only a single image.&lt;/p&gt;
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>New website</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/new-website.html"/>
   <updated>2011-09-07T00:00:00-04:00</updated>
   <id>id:/new-website</id>
   <summary>new material, new look, new framework.</summary>
   <content type="html">&lt;h1 id='post_new_website'&gt;Post: New website&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;7 Sept 2011 - Cambridge MA&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;I&amp;#8217;m pleased to announce that this new site is now live!&lt;/p&gt;

&lt;p&gt;The design is still minimal, and I&amp;#8217;ll still be working away to tweak the style of things. The biggest new feature is the &lt;a href='atom.xml'&gt;Atom stream&lt;/a&gt;. I&amp;#8217;ll post messages to this news stream with announcements of new papers, workshops, and other interesting developments in research that I have some part of.&lt;/p&gt;
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>Defended my PhD dissertation</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/defended-my-phd.html"/>
   <updated>2011-06-15T00:00:00-04:00</updated>
   <id>id:/defended-my-phd</id>
   <summary>goodbye Montreal, it's been fun.</summary>
   <content type="html">&lt;h1 id='post_defended_my_phd_dissertation'&gt;Post: Defended my PhD dissertation&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;15 June 2011 - Montreal QC&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;Whew, it&amp;#8217;s great to be done! Examiners Bruno Olshausen and Paul Cisek asked good questions. The whole defence took about 2 hours. Immediate family was on hand to witness the ordeal &amp;amp; take me out for food and drinks afterward. There is video of the defence kicking around somewhere, but somehow I don&amp;#8217;t think I&amp;#8217;ll be watching it any time soon. Still makes me nervous just remembering it.&lt;/p&gt;

&lt;p&gt;If you like, you can download the final version of my dissertation as a &lt;a href='http://www-etud.iro.umontreal.ca/~bergstrj/publications/11_These.pdf'&gt;pdf&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;</content>
 </entry>
 
 <entry>
   <title>Theano gets lazy evaluation</title>
   <link href="http://people.fas.harvard.edu/~bergstra/tmp/_site/committed-cvm-to-theano.html"/>
   <updated>2011-04-15T00:00:00-04:00</updated>
   <id>id:/committed-cvm-to-theano</id>
   <summary>along with lots of changes under the hood.</summary>
   <content type="html">&lt;h1 id='post_theano_gets_lazy_evaluation'&gt;Post: Theano gets lazy evaluation&lt;/h1&gt;

&lt;div class='dateline'&gt;
&lt;p&gt;15 April 2011 - Montreal QC&lt;/p&gt;
&lt;/div&gt;

&lt;div class='posttext'&gt;
&lt;p&gt;In the recent sprint I took the opportunity to push things forward on the old &amp;#8220;LazyLinker&amp;#8221; branch. There are several changes on this branch.  There&amp;#8217;s a new interface between Linkers and Ops, a new faster engine for executing programs, and support for conditional evaluation.  It will be a big change, so I&amp;#8217;m using this blog post to outline what&amp;#8217;s happened.&lt;/p&gt;

&lt;h2 id='new_modes'&gt;New Modes&lt;/h2&gt;

&lt;p&gt;The most visible change in normal usage of Theano is a set of new modes to choose from: vm, cvm, vm_nogc, and cvm_nogc. These modes will invoke the new machinery. If you don&amp;#8217;t use these modes, you will be using the old well-tested code. What do the modes do?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&amp;#8220;vm&amp;#8221; behaves like the &amp;#8220;c|py&amp;#8221; mode, but uses different code internally&lt;/li&gt;

&lt;li&gt;&amp;#8220;vm_nogc&amp;#8221; behaves like &amp;#8220;c|py_nogc&amp;#8221;, but uses different code internally&lt;/li&gt;

&lt;li&gt;&amp;#8220;cvm&amp;#8221; uses a new program execution engine that has been written in C, and for graphs with many small computations it can be several times faster than previous versions of Theano.&lt;/li&gt;

&lt;li&gt;&amp;#8220;cvm_nogc&amp;#8221; is like &amp;#8220;cvm&amp;#8221; but garbage collection is disabled so the program runs a little faster and uses a little more memory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id='profiling'&gt;Profiling&lt;/h2&gt;

&lt;p&gt;There is new profiling code.  PROFILE_MODE doesn&amp;#8217;t work on the new program execution engines (the &amp;#8216;virtual machines&amp;#8217; described below).  Any of the new modes (that have &amp;#8220;vm&amp;#8221; somewhere in them) can be profiled via two methods: a configuration flag &amp;#8220;do_profiling=1&amp;#8221; profiles all functions by default, and theano.function() now accepts a profile argument. The profile argument can be None (default), True (do profiling with a default identifier on the output), a string (profile and then print this string next to the results for this function), or a theano.compile.profiling.ProfileStats object.  The profiling output is similar to what was printed out before, but the profiling code itself has been refactored.  See the theano/compile/profiling.py file for details.&lt;/p&gt;

&lt;h2 id='new_oplinker_interface_opmake_thunk'&gt;New Op/Linker interface: Op.make_thunk&lt;/h2&gt;

&lt;p&gt;Until now, Linkers (e.g. PerformLinker, OpWiseCLinker, CLinker) used two methods to turn Apply nodes into thunks that actually perform computation: Op.perform for Python implementations and Op.c_code for C ones.  The desire to use OpenCL and the wish for conditional program evaluation meant that this interface needed to change.  This branch introduces a new Op method called make_thunk().  Instances of Op subclasses are now charged with producing at most one implementation for what they do.  The base Op class implements make_thunk similarly to how the body of OpWiseCLinker used to do it: it uses either the perform or the c_code.  New OpenCL-based Ops or CUDA-runtime-based Ops should override make_thunk directly, as should Ops that do not require all their inputs to be evaluated before invoking make_thunk&amp;#8217;s return value.&lt;/p&gt;

&lt;h2 id='virtual_machines'&gt;Virtual Machines&lt;/h2&gt;

&lt;p&gt;There is a new file (theano/gof/vm.py) containing virtual machines that execute Theano programs.  This functionality has previously been provided by the gof.link.streamline methods, by ProfileMode, and by DebugMode.  Now it is consolidated and factored out of these previous places so that all ways of running a Theano program can be profiled, and can eventually be used in conjunction with DebugMode.  There is a new linker (VM_Linker) that implements the old Linker interface, but provides a virtual machine that is appropriate for a given graph.  The VMs are Loop (like the old FAST_RUN), LoopGC (Loop with support for garbage collection) and Stack (what was initially developed as the LazyLinker).&lt;/p&gt;

&lt;h2 id='clazylinker'&gt;CLazyLinker&lt;/h2&gt;

&lt;p&gt;There is a VM implemented in C (theano/gof/lazylinker_c).  It supports conditional evaluation (as in the LazyLinker). It is approximately 3x faster than the Loop Python VMs for non-conditional programs, and approximately 50x faster than the Stack VM.  In programs whose runtime is dominated by the expensive computations within some individual expression nodes, this C-based implementation will not make any difference.  The C code should soon help to make Scan significantly faster though, because the calculations inside a Scan are often relatively quick, which puts more pressure on the execution engine to be efficient.&lt;/p&gt;
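
&lt;p&gt;To give a feel for what conditional (lazy) evaluation means, here is a minimal pure-Python sketch. It is not Theano code and does not use the Theano API: Node, evaluate, constant, if_else and traced are hypothetical names invented for this illustration. A lazy node receives zero-argument thunks instead of values, so it can decide which of its inputs actually get computed.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;
```python
class Node:
    """Hypothetical expression node: a function applied to input nodes."""

    def __init__(self, fn, inputs=(), lazy=False):
        self.fn = fn
        self.inputs = list(inputs)
        self.lazy = lazy  # lazy nodes receive thunks rather than values


def evaluate(node, cache):
    # Demand-driven, memoized evaluation of a node.
    if id(node) in cache:
        return cache[id(node)]
    if node.lazy:
        # Hand over thunks; an input is computed only if its thunk is called.
        args = [lambda n=n: evaluate(n, cache) for n in node.inputs]
    else:
        # Ordinary (strict) node: compute every input first.
        args = [evaluate(n, cache) for n in node.inputs]
    cache[id(node)] = node.fn(*args)
    return cache[id(node)]


def constant(value):
    return Node(lambda: value)


def if_else(cond, if_true, if_false):
    # Only the selected branch is ever evaluated.
    return Node(lambda c, t, f: t() if c() else f(),
                [cond, if_true, if_false], lazy=True)


calls = []  # records which branches were actually computed


def traced(tag):
    def fn():
        calls.append(tag)
        return tag
    return Node(fn)


graph = if_else(constant(True), traced("cheap branch"), traced("costly branch"))
result = evaluate(graph, {})
print(result, calls)
```
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this toy version the costly branch is never run because its thunk is never called; a strict evaluator would have computed both branches before choosing one.&lt;/p&gt;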
&lt;/div&gt;</content>
 </entry>
 
 
</feed>


