LISTSERV - MS-CS-L Archives - LISTSERV.GMU.EDU

MS-CS-L Archives

February 2012

MS-CS-L@LISTSERV.GMU.EDU

Subject:	Talk in Statistics Dept: Probabilistic Hashing Methods for Fitting Massive Logistic Regressions and SVM with Billions of Variables
From:	Huzefa Rangwala <[log in to unmask]>
Reply To:	Huzefa Rangwala <[log in to unmask]>
Date:	Wed, 15 Feb 2012 14:11:59 -0500
Content-Type:	multipart/mixed
Parts/Attachments:	text/plain (1590 bytes) , text/html (5 kB) , Seminar-Feb-24-2012.docx (86 kB)

Probabilistic Hashing Methods for Fitting Massive Logistic Regressions and
SVM with Billions of Variables

Ping Li

Department of Statistical Science

Cornell University

Johnson Center, 3rd Floor, Meeting Room B

4400 University Drive, Fairfax, VA 22030

Time: 11 am – 12 pm

Date: Friday, February 24, 2012


Abstract

In modern applications, many statistical tasks, such as classification using
logistic regression or SVM, often encounter extremely high-dimensional
massive datasets. In the context of search, certain industry applications
have used datasets in 2^64 dimensions, which is larger than the square of a
billion. This talk will introduce a recent probabilistic hashing technique
called b-bit minwise hashing (Research Highlights in Comm. of ACM 2011),
which has been used for efficiently computing set similarities in massive
data. Most recently (NIPS 2011), we realized that b-bit minwise hashing can
be seamlessly integrated with statistical learning algorithms such as
logistic regression or SVM to solve extremely large-scale prediction
problems. Interestingly, for binary data, b-bit minwise hashing is
substantially more accurate than other popular methods such as random
projections. Experimental results on 200GB data (in billion dimensions)
will also be presented.
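
For readers unfamiliar with the technique named in the abstract, here is a
minimal Python sketch of b-bit minwise hashing. It is an illustration, not code
from the talk or the cited papers. It assumes a small universe so that explicit
random permutations can be stored, and it uses the sparse-data approximation
P_b ~ 1/2^b + (1 - 1/2^b) * R for the probability that the lowest b bits of two
minwise hashes agree, where R is the resemblance (Jaccard similarity). All
function names are hypothetical.

```python
import random


def make_permutations(universe_size, k, seed=0):
    """Return k random permutations of range(universe_size).

    Explicit permutations are only practical for small universes; this is
    an assumption made for illustration.
    """
    rng = random.Random(seed)
    perms = []
    for _ in range(k):
        perm = list(range(universe_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def bbit_signature(item_set, perms, b):
    """For each permutation, keep only the lowest b bits of the minimum
    permuted value over the (non-empty) set."""
    mask = (1 << b) - 1
    return [min(perm[i] for i in item_set) & mask for perm in perms]


def estimate_resemblance(sig1, sig2, b):
    """Estimate R from the fraction of matching b-bit values, using the
    sparse-data approximation P_b ~ 1/2^b + (1 - 1/2^b) * R."""
    k = len(sig1)
    p_hat = sum(x == y for x, y in zip(sig1, sig2)) / k
    c = 1.0 / (1 << b)
    return (p_hat - c) / (1.0 - c)


def expand_features(sig, b):
    """Return the indices of the k ones in the (2^b * k)-dimensional binary
    vector that a linear SVM or logistic regression could take as input."""
    return [j * (1 << b) + v for j, v in enumerate(sig)]
```

Each set is thus reduced to k values of b bits each, and the expanded vector
has exactly k nonzero entries regardless of the original dimensionality. The
estimate is noisy for small k and can fall slightly below zero for dissimilar
sets.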
---------- Forwarded message ----------
From: Anand N. Vidyashankar <[log in to unmask]>
Date: Wed, Feb 15, 2012 at 1:02 PM
Subject: Ping's talk
To: Huzefa Rangwala <[log in to unmask]>, "Anand N. Vidyashankar" <[log in to unmask]>



Huzefa,

The attachment contains all the details concerning Ping Li's talk. Please
share it with the CS department.

Thanks,

Anand
