Open Talk: Get Started with Data Science by Analyzing Traffic Data from California Highways

Have you noticed the black circles on the asphalt of California highways? They are called inductive loops, a type of sensor that detects the passage of cars on the road. They measure the flow of traffic, the percentage of time that a lane is occupied, and the average speed. The data collected every 30 seconds from 16,000 sensors by the California Department of Transportation provides an opportunity to learn the basics of data science. At Aerospike, we have started a project to give developers fast API access to a good portion of the historical traffic data and let them discover patterns or build interesting applications. We are providing developers with all the starting points, including the tools, methods, algorithms, and a sample application, to learn about data science. The sample application detects traffic accidents in near real time (based on the 30-second data) before someone calls in to report them. Emergency personnel can respond faster if an automated system alerts them to a possible accident.

The objective of this talk is to show developers who are interested in data science projects where to start. Data scientists and engineers may also find the dataset and the case study interesting and decide to contribute to the open project. We will provide an introduction to the project and an overview of the sample application. We will also share the following lessons learned:
- The importance of domain knowledge in understanding the traffic data and in deciding the right machine learning algorithms
- Some of the less glamorous steps required in Data Science projects such as re-structuring raw data and combining multiple datasets
- Dealing with bad data from sensor networks in real-world situations
- Importance of fast query capability in understanding the data as well as in preparing the training data
- Discovering interesting uses of the data by free and interactive exploration
- The role of fast databases in realtime sensor-data applications

At the end we will present the latest status of the project and how to get access to the data, the tools, and the sample code.

Cirrus Shakeri

Technology Advisor, Aerospike, Inc.
Cirrus Shakeri has been building enterprise software for more than 15 years with a focus on process automation, information management, and big data analytics. Currently, he is helping Aerospike extend the application of its technology into new domains. Prior to that, Cirrus worked at the office of the CTO at SAP and was responsible for trend analysis and technology scouting, building numerous prototypes and frontrunner applications, putting...

Tuesday February 10, 2015 11:00am - 11:40am
Pier 27, The Embarcadero, San Francisco, CA