Probabilistic Hashing Methods for Fitting Massive Logistic Regressions and SVM with Billions of Variables

Ping Li
Department of Statistical Science
Cornell University

Johnson Center, 3rd Floor, Meeting Room B
4400 University Drive, Fairfax, VA 22030
Time: 11 am – 12 pm
Date: Friday, February 24, 2012

Abstract

In modern applications, many statistical tasks, such as classification using logistic regression or SVM, often encounter massive datasets of extremely high dimension. In the context of search, certain industry applications have used datasets in 2^64 dimensions, which is larger than the square of a billion. This talk will introduce a recent probabilistic hashing technique called b-bit minwise hashing (Research Highlights in Comm. of ACM 2011), which has been used for efficiently computing set similarities in massive data. Most recently (NIPS 2011), we realized that b-bit minwise hashing can be seamlessly integrated with statistical learning algorithms such as logistic regression or SVM to solve extremely large-scale prediction problems. Interestingly, for binary data, b-bit minwise hashing is substantially more accurate than other popular methods such as random projections. Experimental results on 200GB data (in a billion dimensions) will also be presented.

---------- Forwarded message ----------
From: Anand N. Vidyashankar
Date: Wed, Feb 15, 2012 at 1:02 PM
Subject: Ping's talk
To: Huzefa Rangwala, "Anand N. Vidyashankar"

Huzefa,

Attached contains all details concerning Ping Li's talk. Please share it with CS department.

Thanks,
Anand
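
For readers new to the technique named in the abstract, the following is a minimal sketch of b-bit minwise hashing for binary data, not the speaker's implementation. It rests on several assumptions: a salted BLAKE2b hash stands in for each random permutation, the resemblance estimator uses the sparse-data approximation in which two unrelated b-bit values collide with probability about 1/2^b, and the one-hot expansion into a k * 2^b dimensional vector is one way to feed the hashed values to a linear learner. All function names (`hash64`, `bbit_signature`, `estimate_resemblance`, `expand_features`) are hypothetical.

```python
import hashlib


def hash64(item: int, seed: int) -> int:
    """Hash a feature index to 64 bits; each seed stands in for one random permutation."""
    digest = hashlib.blake2b(
        item.to_bytes(8, "little"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def bbit_signature(items, k: int, b: int):
    """For each of k hash functions, keep only the lowest b bits of the minimum hash."""
    mask = (1 << b) - 1
    return [min(hash64(x, j) for x in items) & mask for j in range(k)]


def estimate_resemblance(sig1, sig2, b: int) -> float:
    """Estimate Jaccard resemblance from two b-bit signatures.

    Assumes sparse sets in a very large universe, so that chance
    collisions of b-bit values occur with probability about 1 / 2**b.
    """
    p_hat = sum(u == v for u, v in zip(sig1, sig2)) / len(sig1)
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)


def expand_features(sig, b: int):
    """One-hot expand a signature: return the indices of the k ones
    in a binary vector of length k * 2**b, usable by a linear model."""
    return [j * (1 << b) + v for j, v in enumerate(sig)]


if __name__ == "__main__":
    s1 = set(range(0, 300))
    s2 = set(range(150, 450))  # true resemblance is 150 / 450 = 1/3
    k, b = 500, 2
    sig1, sig2 = bbit_signature(s1, k, b), bbit_signature(s2, k, b)
    print("estimated resemblance:", estimate_resemblance(sig1, sig2, b))
    print("first expanded indices:", expand_features(sig1, b)[:5])
```

Each data point is stored in k * b bits regardless of the original dimensionality, which is what makes the approach attractive for the very high-dimensional binary data the abstract describes.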