PILOT SEMINAR

Organized by Computer Science Associate DGS (started by Madhusudan Parthasarathy and Darko Marinov)

...

  • The presenter, with the advisor's help, must personally invite at least five faculty to attend the talk (consider inviting the seminar founders, DGS, and Associate DGS), and ensure that at least three faculty can come. These faculty should be outside the presenter's primary area. When you ask for available times, plan for 90min slots (60min for talk + 30min for feedback), and if you create a poll, enable the "maybe" option because some people may attend only a part of the talk and meet you later.
  • The time for the seminar should be fixed based on the availability of these faculty. To find a time that doesn't overlap with job talks or departmental seminars, ask your advisor to check the available dates on the departmental calendar or the faculty Wiki that lists all upcoming job talks (e.g., in Spring 2022, departmental talks take place on Mondays and Wednesdays at 3:30 pm, so avoid scheduling your seminar at those dates/times).

More faculty may attend the talk as it will be publicly announced, but we would like to see some effort by the presenter and advisor in ensuring at least some people come to the talk.

...

  1. edit this page to enter the date/time of your talk (sorted by date), your name, the talk title and abstract, and short bio;
  2. email Erin Henkelman (speakerseries@cs.illinois.edu) and Darko Marinov (marinov@illinois.edu) so that the department can schedule a physical room (hopefully we don't go fully online ever again!) and announce your talk; and
  3. fill out the form https://forms.illinois.edu/sec/3480516 no later than Thursday the week before your presentation date so the Speakers Series team can have sufficient time to set up and advertise your talk (materials sent after Thursday may not be included in departmental advertising). A member of the Speakers Series team will try to be at the beginning of your talk to help you get set up.

If you want your talk to be hybrid and use Zoom, create a room on your own (you will get the video faster than if the department created a room for you). Reserve the room for at least 90 minutes (60 minutes to present and at least 30 minutes to get feedback). If you use Zoom, please ask one of your attending faculty members to serve as a question moderator for your talk; they can help you manage the chat and questions during your seminar.

Please put slide numbers (in a visible place) on your slides during practice job talks.

You may find it useful to read these guidelines about academic job interviews:

  1. Getting an academic job by Michael Ernst - https://homes.cs.washington.edu/~mernst/advice/academic-job.html
  2. Computer Science Grad Student Job Application & Interview Guide by Westley Weimer, Claire Le Goues, and Zak Fry - http://web.eecs.umich.edu/~weimerw/grad-job-guide/guide

  3. How to get a faculty job, Part 2: The interview by Matt Welsh - http://matt-welsh.blogspot.com/2012/12/how-to-get-faculty-job-part-2-interview.html

  4. Tips on the Interview Process by Jeannette M. Wing - https://www.cs.cmu.edu/afs/cs/usr/wing/www/talks/tips.pdf

  5. Five Surprises from My Computer Science Academic Job Search by Arvind Narayanan - https://33bits.wordpress.com/2012/10/01/five-surprises-from-the-computer-science-academic-job-search

  6. Welcome to the Job Market by Elizabeth Bondi-Kelly - https://sites.google.com/view/elizabethbondi/blog

  7. Tips for Computer Science Faculty Applications by Yisong Yue - https://yisongyue.medium.com/checklist-of-tips-for-computer-science-faculty-applications-9fd2480649cc

  8. Reflections on the CS academic and industry job markets by Rowan Zellers - http://rowanzellers.com/blog/rowan-job-search

  9. Fantastic Faculty Jobs and How to Get Them by Jia-Bin Huang - https://dropbox.com/s/avkflol8mx99c7e/2022_12_05%20Academic%20Job%20workshop.pptx?dl=0

  10. Faculty Application Advice by Sylvia Herbert - https://sylviaherbert.com/faculty-application-advice

  11. UPenn has a lot of resources, e.g., https://cdn.uconnectlabs.com/wp-content/uploads/sites/74/2019/08/Faculty-job-application-guide.pdf linked from https://careerservices.upenn.edu/resources/guide-to-faculty-job-applications

...

Date and Time | Room | Speaker | Title and Abstract
Jan 4th (Wednesday) 10am-11:30am
2405 SC, Zoom (https://illinois.zoom.us/my/manling2?pwd=SzM5Wk5neWlEK3VVTXBoa2ZMUXduZz09)
Manling Li

Title: From Entity-Centric to Event-Centric Multimodal Knowledge Acquisition

Abstract:

Events (what happened, who, when, where, why) describe fundamental human activities and are the core knowledge communicated through multiple forms of information, such as text, images, videos, or other data modalities. Our minds represent events at various levels of granularity and abstraction, which allows us to quickly access historical scenarios and reason about the future. Traditionally, multimodal information consumption has been entity-centric with a focus on concrete concepts (such as objects, object types, physical relations), or oversimplifying event understanding to be single-modal (text-only or vision-only), local, sequential and flat. Real events are multimodal, structured and probabilistic. Hence, I focus on Multimodal Information Extraction, and propose Event-Centric Multimodal Knowledge Acquisition to transform traditional entity-centric single-modal knowledge into event-centric multi-modal knowledge. Such a transformation poses two significant challenges: (1) understanding multimodal semantic structures that are abstract (such as events and semantic roles of objects): I will present a novel framework, CLIP-Event, to learn visual semantic structures via a zero-shot cross-modal transfer; (2) understanding temporal dynamics: I will introduce Event Graph Schema to capture complex timelines, intertwined participant relations and multiple possible outcomes. Such Event-Centric Multimodal Knowledge opens up the next generation of information access for deep semantic understandings behind the multimodal information. I will also show its positive results on long-standing open problems, such as timeline generation, meeting summarization, and question answering.

Bio:

Manling Li is a Ph.D. candidate in the Computer Science Department of the University of Illinois Urbana-Champaign. Her work on multimodal knowledge extraction won the ACL'20 Best Demo Paper Award, and her work on scientific information extraction from COVID literature won the NAACL'21 Best Demo Paper Award. She received a Microsoft Research PhD Fellowship in 2021 and was selected as a DARPA Riser and an EECS Rising Star in 2022. She was awarded the C.L. "Dave" and Jane W.S. Liu Award and has been selected as a Mavis Future Faculty Fellow. She led 19 students in developing the UIUC information extraction system, which ranked 1st in the DARPA AIDA TA1 evaluation each year. She has more than 30 publications on multimodal knowledge extraction and reasoning and has given tutorials on event-centric multimodal knowledge at ACL'21, AAAI'21, NAACL'22, AAAI'23, etc. Additional information is available at https://limanling.github.io/.

Jan 16th (Monday) 1pm-2:30pm
2405 SC, Zoom (https://illinois.zoom.us/j/85237726901?pwd=SXYxTUloOHAwaGIxa3pvbW93SUdRZz09)
Saikat Dutta

Title: Randomness-Aware Testing of Machine Learning-based Systems

Abstract:

Machine Learning is rapidly revolutionizing the development of many modern-day systems. However, testing Machine Learning-based systems is challenging due to 1) the presence of non-determinism in internal components (e.g., stochastic algorithms) and external factors (e.g., execution environment)  and 2) the lack of accuracy specifications. Most traditional software testing techniques, while widely used to improve software reliability, cannot tackle these challenges since they predominantly rely on an assumption of determinism and lack domain knowledge. The goal of my research is to develop novel testing techniques and tools to make Machine Learning-based systems more reliable. 

In this talk, I will present my work on automatically detecting bugs in Machine Learning-based systems and improving the quality of developer-written tests in such systems. My research exploits the fundamental principle that we can systematically reason about non-determinism and accuracy using rigorous statistical and probabilistic reasoning. I develop novel static and dynamic analyses for testing ML-based systems that build on this principle. My research exposed more than 50 bugs and improved the quality of hundreds of tests in more than 60 popular Machine Learning libraries, some of which are used in large-scale software ecosystems at companies like Microsoft, Google, Meta, Uber, and DeepMind.
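To make the flavor of this concrete, here is a minimal sketch (illustrative only, not the speaker's actual tools): instead of flakily asserting on a single nondeterministic run, a randomness-aware test runs the stochastic computation several times and asserts on a confidence bound. The train_and_eval function and its numbers are hypothetical stand-ins.

```python
import random
import statistics

def train_and_eval(seed):
    """Hypothetical stand-in for a stochastic training routine; returns accuracy."""
    rng = random.Random(seed)
    return 0.92 + rng.gauss(0, 0.01)

def test_accuracy_is_at_least_090():
    runs = [train_and_eval(seed) for seed in range(30)]
    mean = statistics.mean(runs)
    sem = statistics.stdev(runs) / len(runs) ** 0.5  # standard error of the mean
    # Assert on the lower end of an ~95% confidence interval instead of
    # flakily asserting mean >= 0.9 on a single nondeterministic run.
    assert mean - 1.96 * sem >= 0.90, f"accuracy too low: {mean:.3f}"

test_accuracy_is_at_least_090()
```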

Bio:

Saikat Dutta is a PhD Candidate in the Computer Science Department at UIUC, advised by Prof. Sasa Misailovic. His research interests are at the intersection of Software Engineering and Machine Learning, with a focus on improving the reliability of Machine Learning-based systems by developing novel testing techniques and tools. Saikat is the recipient of the Facebook PhD Fellowship, the 3M Foundation Fellowship, and the Mavis Future Faculty Fellowship. More information at https://saikatdutta.web.illinois.edu.

Jan 26th (Thursday) 1:30 pm to 3pm
Online on Zoom - https://illinois.zoom.us/j/89767697233?pwd=TlcxanpMWDlmMEVZSk1xem1UOUp5Zz09
Pubali Datta

Title: Looking Past the Abstractions: Characterizing Information Flow in Real-World Systems

Abstract:

Abstractions have proven essential for us to manage computing systems that are constantly growing in size and complexity. However, as core design primitives are obscured, these abstractions can also engender new security challenges. My research investigates these abstractions and the underlying core functionalities to identify the implicit flow violations in modern computing systems.

In this talk, I will detail my efforts in characterizing flow violations and investigating attacks that leverage them. I will first describe how the “stateless” abstraction of serverless computing platforms masks a reality in which functions are cached in memory for long periods of time, enabling attackers to gain quasi-persistence, and how such attacks can be investigated by building serverless-aware provenance collection mechanisms. Then I will investigate how IoT automation platforms (i.e., Trigger-Action Platforms) abstract the underlying information flows among rules installed within a smart home. I will present my findings on modeling and discovering inter-rule flow violations by building an information flow graph for smart homes. These efforts demonstrate how practical and widely deployable secure systems can be built by understanding the requirements of systems and identifying the root cause of violations of these requirements.
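As a toy illustration of the trigger-action part (hedged: this is not the talk's system, and the rules, sources, and sinks below are invented), inter-rule flows can be modeled as edges in a graph, and a violation is any path from a sensitive source to an unsafe sink:

```python
# Hypothetical trigger-action rules: each event triggers follow-on events.
rules = {
    "motion_detected": ["camera_on"],
    "camera_on": ["clip_uploaded"],
    "door_unlocked": ["lights_on"],
}
SENSITIVE = {"motion_detected"}   # events revealing home occupancy
UNSAFE = {"clip_uploaded"}        # data leaving the smart home

def reachable(src, graph):
    """All events transitively triggered by src (depth-first search)."""
    seen, stack = set(), [src]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen

for source in SENSITIVE:
    leaks = reachable(source, rules) & UNSAFE
    if leaks:
        print(f"flow violation: {source} -> {sorted(leaks)}")
```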

Bio:

Pubali Datta is a PhD candidate at the University of Illinois Urbana-Champaign, where she is advised by Professor Adam Bates in the study of system security and privacy. Pubali has conducted research on a variety of security topics, including IoT security, serverless cloud security, system auditing, and provenance. Her dissertation is in the area of serverless cloud security, particularly in designing information flow control, access control, and auditing mechanisms for serverless platforms, tailored to meet the design and operational requirements of such systems. Pubali has completed graduate internships at Samsung Research America, SRI International, and VMware. She will earn her Ph.D. in Computer Science from the University of Illinois Urbana-Champaign in the Spring of 2023.

Jan 30th (Monday) 2pm-3:30pm
2405 SC, Zoom: https://illinois.zoom.us/j/87476498465?pwd=UGV4b2dGU3ZFZ3dCckFDVEkwbzd3dz09
Riccardo Paccagnella

Title: Software Security Challenges in the Era of Modern Hardware

Abstract: Today’s hardware cannot keep secrets. Indeed, the past two decades have seen the discovery of a slew of attacks where an adversary exploits hardware features to leak software’s sensitive data. These attacks have shaken the foundations of computer security and caused a major disruption in the software industry. Fortunately, there has been a saving grace, namely the widespread adoption of models that have enabled developers to build secure software while comprehensively preventing hardware vulnerabilities.

In this talk, I will present two new classes of vulnerabilities that fundamentally undermine these prevailing models for building secure software. In the first part, I will demonstrate that the current constant-time programming model is insufficient to guarantee constant-time execution. In the second part, I will demonstrate that the current resource partitioning model is insufficient to guarantee software isolation. Finally, I will provide an overview of my future research plans for enabling the design of more secure software and hardware systems.
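For readers unfamiliar with the constant-time programming model the talk examines, here is a small Python illustration of the model itself (not of the new attacks; the function names and byte strings are invented): early-exit comparison leaks timing about where the first mismatch occurs, while the standard library's constant-time comparison examines every byte. The talk's point is that even code following this discipline can leak on modern hardware.

```python
import hmac

def leaky_equal(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:        # early exit: running time depends on the secret
            return False
    return True

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # Standard-library constant-time comparison: touches every byte.
    return hmac.compare_digest(a, b)

secret = b"correct-mac-0123"
print(leaky_equal(secret, b"wrong-mac-000000"))          # False, leaks timing
print(constant_time_equal(secret, b"correct-mac-0123"))  # True, no timing leak
```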

Bio: Riccardo Paccagnella is a PhD candidate in Computer Science at the University of Illinois Urbana-Champaign. His research is in system and hardware security. Riccardo is a recipient of a Distinguished Reviewer Award at the IEEE S&P 2021 Shadow PC, a Siebel Scholars Award, and a Chirag Foundation Graduate Fellowship. His work has been covered by national and international press — including Ars Technica, New Scientist, and Wired — and recognized with prestigious awards, including the Pwnie 2022 Award for Best Cryptographic Attack, the CSAW 2022 Applied Research Competition Best Paper Runner-up Award, a Pwnie 2021 Nomination for Most Innovative Research, and a CSLSC 2022 Best Presentation Award. In light of his research, the cryptographic community and several companies (including Cloudflare, Microsoft, Intel, AMD, Ampere, ARM) have taken action that includes patching cryptographic libraries, issuing security advisories, and creating new guidance for writing secure cryptographic code.

Feb 6th (Monday) 11am-12:30pm
2405 SC, Zoom: https://illinois.zoom.us/j/5494764956?pwd=MDNnaE5CWG0yRVlEZWl5bldoRnErZz09 (passcode if asked: 021795)
Xiaohong Chen

Title: Matching Logic: Foundation of a Trustworthy Programming Language Framework

Abstract: We write programs in programming languages and use various language tools to perform various computing/analyzing tasks. For example, we use a compiler or an interpreter to execute programs, a symbolic executer to execute programs with symbolic input, and a formal verifier to verify programs. However, these language tools work like a "black box" and produce no correctness certificates for the tasks they perform. Therefore, we have to trust them for what they claim about our programs, which creates a very large "trust base" in today's computing space. My research aims at reducing the trust base of language execution and analysis tools using a trustworthy programming language framework. In this framework, programming languages are rigorously and completely defined using logical axioms and mathematical notations. Language tools are automatically generated by the framework, and their correctness is certified by complete, rigorous, transparent, machine-checkable, and human-accessible proof certificates. Most importantly, these proof certificates can be automatically checked using a very small proof checker, serving as the minimal trust base of the framework.

In this talk, I will present matching logic as the unifying logical foundation of such a trustworthy programming language framework. I will present the basics of matching logic and show how various program properties and programming languages can be uniformly specified using matching logic formulas and axioms. I will show how to generate matching logic proofs to certify the correctness of program interpreters and formal verifiers, and how to check those proofs using the matching logic proof checker, which has only 240 lines of code. Finally, I will provide an overview of my future research plans for enabling the design and implementation of more transparent and trustworthy programming language tools.
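For context, one common presentation of matching logic's pattern syntax, following the matching mu-logic literature (a sketch; details vary across variants of the logic):

```latex
% Patterns are built from element variables, set variables, symbol
% application, implication, quantification, and least fixpoints.
\[
\varphi ::= x \mid X \mid \sigma(\varphi_1, \ldots, \varphi_n)
        \mid \bot \mid \varphi_1 \rightarrow \varphi_2
        \mid \exists x.\, \varphi \mid \mu X.\, \varphi
\]
```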

Bio: Xiaohong Chen is a Ph.D. candidate in the computer science department at UIUC, advised by Prof. Grigore Rosu. Xiaohong's research interests are in logic, formal methods, and programming languages, with a focus on using rigorous machine-checkable proof certificates to reduce the trust base of various programming language tools. Xiaohong's research on matching logic (http://matching-logic.org) as a unifying foundation for programming has helped improve the safety and reliability of the K language framework (https://kframework.org). Xiaohong is the recipient of the Yunni and Maxine Pao Memorial Fellowship, the Mavis Future Faculty Fellowship, and the Graduate School Dissertation Completion Fellowship. His research proposal has been funded by the Ethereum Foundation for its potential to make smart contracts more trustworthy and transparent. More information at http://xchen.page/.

Feb 20th (Monday) 11am-12:15pm
Online on Zoom: https://illinois.zoom.us/j/7030162755?pwd=QkY2OHI2K1ZFdjY3S3FwcU5FT05tUT09
Jiaxin Huang

Title: Label-Efficient Textual Knowledge Extraction and Utilization

Abstract: With tremendous amounts of text across the Internet nowadays, various Natural Language Processing (NLP) systems are built to help people seek valuable knowledge from massive corpora by performing knowledge-intensive tasks like text retrieval, concept organization, commonsense reasoning, and question answering. Despite their remarkable success, most existing NLP systems still rely on large amounts of task-specific training data, which are costly to obtain.

My research designs principled approaches for label-efficient, knowledge-based NLP applications that rely on minimal human supervision. In this talk, I will introduce a general framework for textual knowledge extraction and utilization: (1) concept ontology construction by transforming generic linguistic knowledge encoded in pre-trained language models into hierarchical structures connecting entities; (2) entity extraction by replacing manual prompt template designs with automatic soft verbalizer learning; and (3) commonsense reasoning via entity knowledge prompting and iteratively optimizing reasoning paths generated by language models.
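As a toy illustration of the manual-prompting baseline that soft verbalizer learning is meant to automate (a sketch assuming the transformers library and a downloadable masked language model; the template is made up):

```python
from transformers import pipeline

# A hand-written template probing a masked language model for a hypernym;
# soft verbalizers replace this kind of manual template engineering.
fill = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill("A sparrow is a type of [MASK]."):
    print(candidate["token_str"], round(candidate["score"], 3))
```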

Bio: Jiaxin Huang is a final-year Ph.D. candidate in the Department of Computer Science at the University of Illinois Urbana-Champaign, fortunately advised by Prof. Jiawei Han. Jiaxin's research interests lie in text mining and natural language processing with minimal human supervision. Her recent research focuses on (1) using pre-trained language models to automatically extract domain-specific hierarchical concepts and entities for structured knowledge construction; and (2) extracting human actionable knowledge such as commonsense reasoning by prompting and training language models via machine-generated explicit reasoning paths. She is a recipient of the Microsoft Research PhD Fellowship (2021-2023).

...

Date and Time | Room | Speaker | Title and Abstract
February 9th (Wednesday) 4pm-5:30pm
Zoom (set up by presenter)
Xinya Du

Title: Towards More Intelligent Extraction of Information from Documents

Abstract:

Large amounts of text are written and published daily. As a result, applications that automatically read documents and extract useful, structured information from text have become increasingly important for people’s efficient absorption of information. They are essential for applications such as answering user questions, information retrieval, and knowledge base population.

In this talk, I will focus on the challenges of finding and organizing information about events and introduce my research on leveraging knowledge and reasoning for document-level information extraction. In the first part, I’ll introduce methods for better modeling knowledge from context: (1) generative learning of output structures that better models the dependencies between extracted events, enabling more coherent extraction of information (e.g., an event A in the earlier part of a document is usually correlated with an event B in the later part); and (2) how to utilize information retrieval to enable memory-based learning with an even longer context.
In the second part, to better access relevant external knowledge encoded in large models and reduce the cost of human annotation, we propose a new question-answering formulation of the extraction problem. I will conclude by outlining a research agenda for building the next generation of efficient and intelligent machine reading systems with close to human-level reasoning capabilities.
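A hedged sketch of the question-answering formulation: the pipeline call below is the standard transformers API, but the document and questions are invented for illustration, and this is not the speaker's exact setup.

```python
from transformers import pipeline

qa = pipeline("question-answering")  # downloads a default extractive QA model
context = ("Acme Corp. hired Jane Doe as chief scientist on Monday, "
           "shortly after she left Initech.")
# Each extraction slot (who? which company?) becomes a natural question.
for question in ["Who was hired?", "Which company did the hiring?"]:
    answer = qa(question=question, context=context)
    print(question, "->", answer["answer"], round(answer["score"], 2))
```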

Bio:

Xinya Du is a Postdoctoral Research Associate at the University of Illinois at Urbana-Champaign working with Prof. Heng Ji. He earned a Ph.D. degree in Computer Science from Cornell University, advised by Prof. Claire Cardie. Before Cornell, he received a bachelor's degree in Computer Science from Shanghai Jiao Tong University. His research is on natural language processing, especially methods that leverage knowledge & reasoning skills for document-level information extraction. His work has been published in leading NLP conferences such as ACL, EMNLP, NAACL and has been covered by major media like New Scientist. He has received awards including the CDAC Spotlight Rising Star award and SJTU National Scholarship.

February 25th (Friday) 11:30am-1pm
Zoom (set up by presenter)
Suraj Jog

Title: Scalable Next-Generation Wireless Networks

Abstract:

The next generation of wireless technologies will provide unprecedented capabilities -- gigabyte communication speeds at ultra-low latencies, hyper-precise localization, and vision-like perception. This will enable a plethora of new applications like wireless virtual and augmented reality, self-driving cars, space communications, precision agriculture, high-performance computing, and more. However, while these performance leaps have been demonstrated in the context of constrained networks with single users and controlled environments, the question of scaling these next-gen wireless technologies to large networks in the wild consisting of multiple heterogeneous nodes remains unsolved. 

In this talk, I will present three examples of my research addressing these scalability challenges across different applications, each with unique objectives and constraints. First, I will talk about enabling extremely dense spatial packing of users for untethered wireless streaming in multi-user VR and AR applications, where we can scale the wireless network data rate with the number of clients without suffering interference. Second, I will discuss the challenges of scaling the hyper-precise localization enabled by high-bandwidth 5G cellular technologies to ubiquitously deployed low-power IoT nodes in the wild. I will show how we can leverage RF-acoustic microsystems to design new kinds of RF filters that preserve high localization resolution on narrowband IoT devices that sample 16x below Nyquist. Finally, I will also discuss interdisciplinary research avenues where chip-scale millimeter-wave wireless networks promise to revolutionize new application domains like High-Performance Computing. I will demonstrate how we can leverage deep reinforcement learning and AI tools to learn and generate new networking protocols for the wireless interconnects on multicore processors, which in turn will enable multicore processors to scale to hundreds and thousands of cores. I will conclude the talk with future directions in next-gen cellular and wireless research, both in terms of core methods and applications.
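A quick back-of-the-envelope for the sub-Nyquist claim (my arithmetic, not from the talk):

```latex
% Capturing a signal of bandwidth $B$ classically requires the Nyquist
% rate $f_s \ge 2B$. Sampling ``16x below Nyquist'' therefore means
\[
  f_s = \frac{2B}{16} = \frac{B}{8},
\]
% so the proposed RF filters must preserve the localization-relevant
% structure despite heavy aliasing.
```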

Bio:

Suraj Jog is a Ph.D. candidate in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign (UIUC), working with Haitham Hassanieh. His research is focused on next-generation wireless networking and wireless sensing. Through his research, he has designed and built systems that can deliver seamless scalability in multiple application domains for millimeter-wave technology, such as gigabit-speed wireless communications, localization and imaging, and wireless networks-on-chip. His research has been recognized with the Qualcomm Innovation Fellowship, Joan and Lalit Bahl Fellowship, Mavis Future Faculty Fellowship, M.E Van Valkenburg Fellowship, Rambus Computer Engineering Fellowship, and more.

March 2nd (Wednesday) 9am-10:30am
Zoom (set up by presenter)
Xuan Wang

Title: Automated Scientific Knowledge Extraction from Massive Text Data

Abstract:

Text mining is promising for advancing human knowledge in many fields, given the rapidly growing volume of text data (e.g., scientific articles, medical notes, and news reports) we are seeing nowadays. In this talk, I will present my work on automatically extracting knowledge from massive text data to enable and accelerate scientific discovery. First, I will talk about my work on information extraction with minimum human supervision. With the growing volume of text data and the breadth of information, it is inefficient or nearly impossible for humans to manually find, integrate, and digest useful information. To address the above challenge, I have developed methods that automatically extract entity and relation information from massive text data with minimum human supervision. Second, I will talk about my work on literature-based scientific knowledge discovery. This research direction aims to enable and accelerate real-world knowledge discovery with the rich information we automatically extracted from scientific text. I have collaborated with domain experts in various scientific disciplines (e.g., chemistry, biomedicine, and health) to achieve this goal. Last, I will conclude my talk with future directions on using text mining to address open scientific problems, such as to assist chemical and biological molecule design and to support clinical drug discovery.

Bio:

Xuan Wang is a fifth-year Ph.D. student in the Computer Science Department at the University of Illinois at Urbana-Champaign (UIUC). She is working in the Data Mining Group under the supervision of Prof. Jiawei Han. Xuan received an M.S. in Statistics (2017) and an M.S. in Biochemistry (2015) from UIUC, and a B.S. in Biological Science (2013) from Tsinghua University, China. Her research interests are in text mining and natural language processing, with an emphasis on applications to biological and health sciences. Her current research theme is developing effective and scalable algorithms and systems for automatically understanding massive text data to enable and accelerate scientific discovery. Xuan has published about 20 research/demo papers in top NLP conferences (e.g., ACL and EMNLP) and biomedical informatics journals (e.g., Bioinformatics) and conferences (e.g., ACM-BCB and IEEE-BIBM). She is the recipient of the YEE Fellowship Award in 2020-2021 from UIUC.

March 2nd (Wednesday) 2:30pm-4pm
Zoom (set up by presenter)
Jing Liu

Title: Robust Learning & Inference with Applications in Distributed Learning and IoT

Abstract:

Robustness is of paramount importance in modern, scalable, and distributed machine learning (ML) and artificial intelligence (AI), particularly for safety-critical applications. On the one hand, distributed learning (e.g., Federated Learning) has emerged as a communication efficient, privacy-enhancing, and scalable approach for training without explicit centralized data collection. Unfortunately, training models with distributed data and computation further increases vulnerability to adversarial corruptions. This talk will outline modern solutions to fundamental estimation problems such as certifiable Robust Linear Regression, Robust PCA, and High-dimensional Robust Mean Estimation. Using these tools as building blocks, I will present recent work on Robust Distributed Learning & Inference. I will conclude the talk with future directions in efficient and trustworthy Artificial Intelligence of Things (AIoT).
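As one classical one-dimensional building block in this area (illustrative; not necessarily the estimators from the talk), median-of-means resists a small fraction of corrupted samples where the plain mean fails:

```python
import random
import statistics

def median_of_means(xs, num_blocks=20):
    """Split the data into blocks and return the median of the block means."""
    xs = list(xs)
    random.shuffle(xs)  # spread any adversarial ordering across blocks
    k = max(1, len(xs) // num_blocks)
    blocks = [xs[i:i + k] for i in range(0, k * num_blocks, k)]
    return statistics.median(statistics.mean(b) for b in blocks)

rng = random.Random(0)
data = [rng.gauss(5.0, 1.0) for _ in range(995)] + [1e6] * 5  # 0.5% corrupted
print("plain mean:     ", statistics.mean(data))   # ~5000: ruined by outliers
print("median of means:", median_of_means(data))   # stays close to 5
```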

Bio:

Jing Liu is an Illinois Future Faculty fellow in computer science at the University of Illinois at Urbana-Champaign. His research interests include Data Science, the Internet of Things (IoT), and Distributed Learning & Inference. Liu was a postdoc in the Coordinated Science Lab and obtained his Ph.D. from UCSD. Liu is the recipient of several awards, including the Shannon Graduate Fellowship nomination award and the Frontiers of Innovation Fellowship at UCSD, the Guanghua Fellowship at Tsinghua University, National Fellowships of China, a Silver Medal and Young Mentor Award at Beijing Institute of Technology, and a prize of the Beijing Science & Technology Award.

2020-2021 Schedule

Date and Time | Room | Speaker | Title and Abstract
February 1st (Monday) 11AM-12:30PM
Zoom (set up by department)
Wing Lam

Title: Taming Flaky Tests in a Non-Deterministic World

Abstract:

As software evolves, developers typically perform regression testing to ensure that their code changes do not break existing functionality. During regression testing, developers often waste time debugging their code changes because of spurious failures from flaky tests, which are tests that nondeterministically pass or fail on the same code. These spurious failures mislead developers because the failures are due to bugs that existed before the code changes. My work on characterizing flaky tests has helped open the research topic of flaky tests, and many companies (e.g., Facebook, Google, Microsoft) have since highlighted flaky tests as a major challenge in their software development.

In this talk, I will describe my recent work on taming flaky tests. Two prominent kinds of flaky tests are order-dependent flaky tests, which pass when run in one order but fail when run in a different order, and async-wait flaky tests, which pass if an asynchronous call finishes on time but fail if it finishes too late. My results include the first automated techniques to (1) fix order-dependent flaky tests, fixing 92% of such flaky tests in a public dataset; (2) reduce the number of spurious failures from order-dependent flaky tests, reducing such failures by 73%; and (3) speed up async-wait flaky tests while also reducing their spurious failures, speeding up such tests by 38%. Overall, my work has helped detect more than 2000 flaky tests and fix more than 500 flaky tests in over 150 open-source projects.
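For readers new to the topic, here is a minimal order-dependent flaky test (an illustrative toy, not taken from the study): shared mutable state makes the outcome depend on which test runs first.

```python
cache = {}  # shared mutable state across tests

def test_writes():         # the "polluter": mutates shared state
    cache["user"] = "alice"
    assert cache["user"] == "alice"

def test_reads_default():  # the "victim": passes only before test_writes runs
    assert cache.get("user") is None

# Passing order:
test_reads_default()
test_writes()
# Failing order: running test_writes() first would make
# test_reads_default() fail with an AssertionError.
```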

Bio:

Wing Lam is a PhD candidate in the Computer Science department at the University of Illinois at Urbana-Champaign where he is co-advised by Professors Tao Xie and Darko Marinov. He works on several topics in software engineering, with a focus on software testing. Wing's research improves software dependability by characterizing bugs and developing novel techniques to detect and tame bugs. He has published in top-tier conferences such as ESEC/FSE, ICSE, ISSTA, OOPSLA, and TACAS. His techniques have helped detect and fix bugs in open-source projects and have impacted how Microsoft and Tencent developers test their code. Wing has been awarded several fellowships and scholarships, including a Google - CMD-IT Dissertation Fellowship Award. More information is available on his web page:

http://mir.cs.illinois.edu/winglam

February 2nd (Tuesday) 11:30AM-1PM
Zoom (set up by department)
Wajih Ul Hassan

Title: Detecting and Investigating System Intrusions with Provenance Analytics

Abstract: Stories of devastating data breaches continue to dominate headlines around the world. Equifax, Target, and Office of Personnel Management are just a few examples of high-profile data breaches over the past decade. Despite a panoply of security products and increasing investment in data security, attackers are continually finding new ways to outsmart defenses to gain access to valuable data, indicating that current security approaches are ineffective.

Data provenance describes the detailed history of system execution, allowing us to understand how system objects came to exist in their present state and providing means to identify the root cause of system intrusions. My research leverages provenance analytics to empower system defenders to quickly and effectively detect and investigate malicious behaviors. In this talk, I will first present a provenance-based solution for combatting the “Threat Alert Fatigue” problem that currently plagues enterprise security. Next, I will describe an approach for performing accurate and high-fidelity attack forensics using a novel adaptation of program analysis techniques. I will conclude by discussing the promise of provenance analytics to address open security and auditing problems in complex computing systems and emerging technologies.

Speaker Bio:

Wajih Ul Hassan is a doctoral candidate advised by Professor Adam Bates in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research focuses on securing complex networked systems by leveraging data provenance approaches and scalable system design. He has collaborated with NEC Labs and Symantec Research Labs to integrate his defensive techniques into commercial security products. He received a Symantec Research Labs Graduate Fellowship, a Young Researcher invitation to the Heidelberg Laureate Forum, an RSA Security Scholarship, a Mavis Future Faculty Fellowship, a Sohaib and Sara Abbasi Fellowship, and an ACM SIGSOFT Distinguished Paper Award.

February 23 (Tuesday) 11:00AM-12:30PM
Zoom (set up by department)
Yunan Luo

Title: Machine learning for large- and small-data biomedical discovery

Abstract: In modern biomedicine, the role of computation becomes more crucial in light of the ever-increasing growth of biological data, which requires effective computational methods to integrate them in a meaningful way and unveil previously undiscovered biological insights. In this talk, I will discuss my research on machine learning for large- and small-data biomedical discovery. First, I will describe a representation learning algorithm for the integration of large-scale heterogeneous data to disentangle out non-redundant information from noises and to represent them in a way amenable to comprehensive analyses; this algorithm has enabled several successful applications in drug repurposing. Next, I will present a deep learning model that utilizes evolutionary data and unlabeled data to guide protein engineering in a small-data scenario; the model has been integrated into lab workflows and enabled the engineering of new protein variants with enhanced properties. I will conclude my talk with future directions of using data science methods to assist biological design and to support decision making in biomedicine.

Bio: Yunan Luo (http://yunan.cs.illinois.edu/) is a Ph.D. student advised by Prof. Jian Peng in the Department of Computer Science, University of Illinois at Urbana-Champaign. Previously, he received his Bachelor’s degree in Computer Science from Tsinghua University in 2016. His research interests are in computational biology and machine learning. His research has been recognized by a Baidu Ph.D. Fellowship and a CompGen Ph.D. Fellowship.

March 16th (Tuesday) 1PM-2:30PM CT (alternative time: March 18th (Thursday) 7PM-8:30PM CT)
Zoom (set up by department)
Liyuan Liu

Title: Towards Easy-to-Use Deep Learning: Effort-Light Transformer Training as an Example

Abstract:

Deep learning methods stand out for their ability to handle complicated data and tasks. However, successfully applying cutting-edge deep learning methods usually requires lots of extra care (e.g., heuristic tricks, excessive hyper-parameter tuning, and data annotation costs). Given the inherent resource limitations of real-world applications, the demand for these efforts has hindered many applications and research directions. Bearing this in mind, I strive to build productive algorithms that effectively make deep learning effort-light and easy to use.

In this talk, I will address the extra care required to train Transformer networks (the backbone of many recent breakthroughs like BERT). First, my analyses reveal that unbalanced gradients are not the root cause of unstable Transformer training and uncover a long-overlooked issue, i.e., model sensitivity to parameter updates. In light of these analyses, I successfully stabilize Transformer training and achieve a new state of the art without introducing any additional hyper-parameters. Second, I identify a problem with the adaptive learning rate; this analysis not only provides guidance on training configurations and further stabilizes model training, but also sheds light on the mystery of why learning rate warmup is necessary. Together, these two aspects form a comprehensive inspection of the extra care required for Transformers and simplify Transformer training by reducing that care. In closing, I present a broader overview of my research and discuss how it can benefit biostatistics and biomedical informatics research.
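As background, here is a common linear-warmup learning-rate schedule, the kind of "extra care" the talk analyzes (a generic sketch, not the speaker's proposed fix; the constants are arbitrary):

```python
def lr_at(step, base_lr=1e-3, warmup_steps=4000):
    """Linear warmup, then inverse-square-root decay."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps       # ramp up gently at first
    return base_lr * (warmup_steps / step) ** 0.5  # decay afterwards

for step in [1, 1000, 4000, 16000]:
    print(step, f"{lr_at(step):.2e}")
```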

Bio:

Liyuan Liu is a Ph.D. candidate in Computer Science at the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. He received his B.Eng. in Computer Science and Engineering at the University of Science and Technology of China in 2016. In his research, he strives to develop productive algorithms that effectively reduce the resource consumption of deep learning, including expert effort for data annotation and computational resources for tuning and training. Liyuan has published more than 20 papers at top-tier conferences during his Ph.D. studies. He has been awarded several fellowships and scholarships, including the 2020 Yee Fellowship and the 2015 Guo Moruo Scholarship. More information is available on his web page: http://liyuanlucasliu.github.io/

...

Date and Time | Room | Speaker | Title and Abstract

Feb 1 (Fri) 12:30 pm
3401 SC
Qi Li

Title: Pattern-Based Mining of Entity/Relation Structures from Massive Text

Abstract:
The majority of information nowadays is carried by massive and unstructured text, in the form of news, articles, reports, or social media messages. This poses a major research challenge for mining entity/relation structures from unstructured text. Manual curation or labeling cannot scale to match the rapid growth of text. Most existing information extraction approaches rely on heavy human annotation, making them too expensive to tune and hard to adapt to new domains.

In this talk, I will present a pattern-based methodology that conducts information extraction from massive corpora using existing resources with little human effort. The first component, WW-PIE, discovers meaningful textual patterns that contain the entities of interest. The second component, TruePIE, discovers high-quality textual patterns for target relation types. I will demonstrate how semi-supervised methods can empower information extraction for broad applications and provide explainable results.
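A toy version of the underlying idea (far simpler than WW-PIE or TruePIE; the pattern and sentences are invented): a single hand-written textual pattern harvests relation tuples from raw text with no labeled training data.

```python
import re

# Pattern for the "capital-of" relation: "<City>, the capital of <Country>"
PATTERN = re.compile(r"([A-Z][a-z]+), the capital of ([A-Z][a-z]+)")

corpus = [
    "Paris, the capital of France, hosted the summit.",
    "Analysts met in Lima, the capital of Peru, last week.",
]
for sentence in corpus:
    for city, country in PATTERN.findall(sentence):
        print((city, "capital_of", country))
```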

Bio: Qi Li is currently a postdoctoral researcher and adjunct professor at the Department of Computer Science, University of Illinois at Urbana-Champaign, working with Prof. Jiawei Han. Her research interests lie in the area of data mining with a focus on the extraction and aggregation of information from multiple data sources. Qi obtained her PhD in Computer Science and Engineering from the State University of New York at Buffalo in 2017, advised by Prof. Jing Gao, and an MS in Statistics from the University of Illinois at Urbana-Champaign in 2012. She has received several awards, including the Presidential Fellowship of the University at Buffalo, and the Best CSE Graduate Research Award and the CSE Best Dissertation Award from the Department of Computer Science and Engineering, University at Buffalo. More information can be found at https://publish.illinois.edu/qili5/.

Feb 1 (Fri) 3:30pm
4405 SC
Owolabi Legunsen

Title: Evolution-Aware Runtime Verification

Abstract: The risk posed by software bugs has increased significantly as software is now essential to many areas of our daily lives. Runtime verification can help find bugs by monitoring program executions against formally specified properties. Over the last two decades, tremendous research progress has improved the performance of runtime verification. However, there has been very little focus on the benefits and challenges of using runtime verification during software testing. Yet, testing generates many executions on which properties can be monitored.

In this talk, I will describe my work on studying and improving runtime verification during testing. My large-scale study was the first to show that runtime verification during testing is beneficial for finding many important bugs from tests that developers already have. However, my study also showed that runtime verification still incurs high overhead, both in machine time to monitor properties and in developer time to inspect violations of the properties. Moreover, all prior runtime verification techniques consider only one program version and would wastefully re-monitor unaffected properties and code as software evolves. To reduce the overhead across multiple program versions, I proposed the first evolution-aware runtime verification techniques. My techniques exploit the key insight that software evolves in small increments and reduce the accumulated runtime verification overhead by up to 10x, without missing new violations.

Bio: Owolabi Legunsen is a PhD candidate in Computer Science at the University of Illinois at Urbana-Champaign, where he works with Darko Marinov and Grigore Rosu. Owolabi's interests are in Software Engineering and Applied Formal Methods, with a focus on Software Testing and Runtime Verification. His research on runtime verification during software testing received an ACM SIGSOFT Distinguished Paper Award at ASE 2016. More information is available on his web page: http://mir.cs.illinois.edu/legunsen

Feb 5 (Tue) 10:00am
SC 2405
Mengjia Yan

Title: Secure Computer Hardware in the Age of Pervasive Security Attacks

Abstract:
Recent attacks such as Spectre and Meltdown have shown how vulnerable modern computer hardware is. The root cause of the problem is that computer architects have traditionally focused on performance and energy efficiency. Security has never been a first-class requirement. Moving forward, however, this has to radically change: we need to rethink computer architecture from the ground-up for security.
As an example of this vision, in this talk, I will focus on speculative execution in out-of-order processors --- a core computer architecture technology that is the target of the recent attacks. I will describe InvisiSpec, the first robust hardware defense mechanism against speculative (a.k.a. transient) execution attacks. The idea is to make loads invisible in the cache hierarchy, and only reveal their presence at the point when they are safe. Once an instruction is deemed safe, our hardware is able to cheaply modify the cache coherence state in a consistent manner. Further, to reduce the cost of InvisiSpec and increase its protection coverage, I propose Speculative Taint Tracking (STT). This is a novel form of information flow tracking that is specifically designed for speculative execution. It reduces cost by allowing tainted instructions to become safe early, and by effectively leveraging the predictor hardware that is ubiquitous in modern processors. Further improvements of InvisiSpec-STT can be attained with new compiler techniques. Finally, I will conclude my talk by describing ongoing and future directions towards designing secure processors.

Bio:
Mengjia Yan is a Ph.D. student at the University of Illinois at Urbana-Champaign (UIUC), working with Professor Josep Torrellas. Her research interest lies in the areas of computer architecture and hardware security, with a focus on defenses against transient execution attacks and cache-based side channel attacks. Her work has appeared in some of the top venues in computer architecture and security, and has sparked a large research collaboration initiative between UIUC and Intel. Mengjia was named a UIUC College of Engineering Mavis Future Faculty Fellow, received the Computer Science W.J. Poppelbaum Memorial Award and a MICRO Top Picks in Computer Architecture Honorable Mention, and was invited to participate in two Rising Stars workshops.

Feb 8 (Fri) 3:30 pm
SC 4405
Sangeetha Abdu Jyothi

Title: Automated Resource Management in Large-Scale Networked Systems

Abstract:
The multitude of Internet applications relies on large-scale networked environments such as the cloud for their backend support. In these multi-tenanted environments, various stakeholders have diverse goals. The objective of the infrastructure provider is to increase revenue by utilizing the resources efficiently. Applications, on the other hand, want to meet their performance requirements at minimal cost. However, estimating the exact amount of resources required to meet the application needs is a difficult task, even for expert users. Easy workarounds employed for tackling this problem, such as resource over-provisioning, negatively impact the goals of the provider, applications, or both.

In this talk, I will discuss the design of application-aware self-optimizing systems through automated resource management that helps meet the varied goals of the provider and applications in large-scale networked environments. The key steps in closed-loop resource management include learning application resource needs, scheduling resources efficiently, and adapting to variations in real time. I will describe how I apply this high-level approach in two distinct environments using (a) Morpheus in enterprise clusters, and (b) Patronus in cellular provider networks with geo-distributed micro data centers. I will also touch upon my related work in application-specific contexts at the intersection of network scheduling and deep learning. I will conclude with my vision for self-optimizing systems, including fully automated clouds and an elastic geo-distributed platform for thousands of micro data centers.


Bio:

Sangeetha Abdu Jyothi is a Ph.D. candidate at the University of Illinois at Urbana-Champaign advised by Brighten Godfrey. Her research interests lie in the areas of computer networking and systems with a focus on building application-aware self-optimizing systems through automated resource management. She is a winner of the Facebook Graduate Fellowship (2017-2019) and the Mavis Future Faculty Fellowship (2017-2018). She was invited to attend the Rising Stars in EECS workshop at MIT (2018).

Website: http://abdujyo2.web.engr.illinois.edu

Feb 26 (Tue) 3:00 pm
SC 3403
Motahhare Eslami

Title: Communicating Opaque Algorithmic Processes in Socio-Technical Systems

Abstract:

Algorithms play a vital role in curating online information in socio-technical systems; however, they are usually housed in black boxes that limit users’ understanding of how an algorithmic decision is made. While this opacity partly stems from protecting intellectual property and preventing malicious users from gaming the system, it is also designed to provide users with seamless, effortless system interactions. However, this opacity can result in misinformed behavior among users, particularly when there is no clear feedback mechanism for users to understand the effects of their own actions on an algorithmic system. The increasing prevalence and power of these opaque algorithms, coupled with their sometimes biased and discriminatory decisions, raise questions about how knowledgeable users are and should be about the existence, operation, and possible impacts of these algorithms. In this talk, I will address these questions by exploring ways to investigate users’ behavior around opaque algorithmic systems. I will then present new design techniques that communicate opaque algorithmic processes to users and provide them with a more informed, satisfying, and engaging interaction. In doing so, I will add new angles to the old idea of understanding the interaction between users and automation by designing around algorithm sensemaking and algorithm transparency.

Bio:

Motahhare Eslami is a Ph.D. Candidate in Computer Science at the University of Illinois at Urbana-Champaign, where she is advised by Karrie Karahalios. Motahhare’s research develops new communication techniques between users and opaque algorithmic socio-technical systems to provide users with a more informed, satisfying, and engaging interaction. Her work has been recognized with a Google PhD Fellowship and a Best Paper Award at ACM CHI, and has been covered in mainstream media such as Time, The Washington Post, Huffington Post, the BBC, Fortune, and Quartz. Motahhare is also a Facebook and Adobe PhD fellowship finalist and a recipient of the C.W. Gear Outstanding Graduate Student Award, the Saburo Muroga Endowed Fellowship, and the Feng Chen Memorial Award; she was also selected as a Young Researcher at the Heidelberg Laureate Forum and a Rising Star in EECS.

Mar 5 (Tue) 4:00 pm
SC 3403
Jingbo Shang

Title: AutoNet: Automated Network Construction from Massive Text Corpora

Abstract:

Mining structured knowledge from massive unstructured text data is a key challenge in data science. In this talk, I will discuss my proposed framework, AutoNet, which transforms unstructured text data into structured heterogeneous information networks, on which actionable knowledge can be further uncovered flexibly and effectively. AutoNet is a data-driven approach using distant supervision instead of human curation and labeling. It consists of four essential steps: (1) quality phrase mining; (2) entity recognition and typing; (3) relation extraction; and (4) taxonomy construction. Along this line, I have developed a number of state-of-the-art distantly-supervised/unsupervised methods and published them in top conferences and journals. Specifically, I will present my work on phrase mining, entity recognition, and taxonomy construction in detail, while touching on the other work briefly. Finally, I will summarize the AutoNet framework with a demo video and conclude by discussing future work in collaboration with other disciplines.

Bio:

Jingbo Shang is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He received his B.E. from the Computer Science Department, Shanghai Jiao Tong University, China. His research focuses on mining and constructing structured knowledge from massive text corpora with minimum human effort. His research has been recognized by many prestigious awards, including the Computer Science Excellence Scholarship from CS@Illinois, the Grand Prize of the Yelp Dataset Challenge in 2015, the Google Ph.D. Fellowship in Structured Data and Database Management in 2017, and the C.W. Gear Outstanding Graduate Award in 2018.

...

Date and Time | Room | Speaker | Title and Abstract
2/15 Thursday 1:30PM
SC 2405
Wei Yang
Title: Adversarial-Resilience Assurance for Mobile Security Systems 
Abstract: For too long, researchers have tackled security in an attack-driven, ad hoc, and reactionary manner, with large manual efforts devoted by security analysts. To make substantial progress in security, I advocate shifting this manner of working to one that is automated, intelligent, and adversarially resilient. Over the course of my Ph.D. research, I have built security systems incorporating intelligent security techniques based on program analysis, natural language processing, and machine learning, and I have developed corresponding defenses and testing methodologies to guard against emerging attacks specifically adversarial to these newly proposed security techniques. In this talk, I will first highlight two of these systems for mobile security: AppContext and WHYPER. Then I will show how to generate adversarial inputs for testing and further strengthening these systems. I will conclude by discussing how future research efforts can leverage the interplay between AI and security techniques toward a defense-driven security ecosystem.
2/23 Friday 10am
SC 3403
Chao Zhang

Title: Knowledge Cube Construction from Massive Social Sensing Data
Abstract: Social sensing data are massive and ubiquitous. Effective and scalable analytics of social sensing data can be game changing for urban science, business, healthcare, and homeland security. However, such data pose great challenges to computer science research since they are often unstructured, fragmented, noisy, and intermingled with rich contexts. In this talk, I will introduce a systematic framework, KnowCube, that addresses the above challenges by turning unstructured, noisy social sensing data into a structured, multidimensional knowledge cube. In particular, I will discuss in detail how to solve two key problems for knowledge cube construction: (1) how to extract events from noisy social sensing data; and (2) how to organize unstructured events into a multidimensional cube structure without supervision. KnowCube serves as a versatile and easy-to-use knowledge engine that can harness the power of social sensing for many applications. Finally, I will share some future research directions on better knowledge cube construction and building next-generation intelligent systems with the knowledge cube.

3/9 Friday 10am
CSL 301
Izzat El Hajj

Title: Building Programming Systems in a World of Increasing Heterogeneity

Abstract:

The breakdown of Dennard scaling and the slowing of Moore's Law have led to an explosion of new processor and memory technologies, making computing systems increasingly heterogeneous. We are seeing GPUs, FPGAs, and special-purpose accelerators become central parts of systems, as well as growing interest in persistent byte-addressable memories and near-memory acceleration. While these technologies provide massive performance gains and energy savings that are not possible on traditional systems, they tend to be very tedious to program, which places a heavy burden on software developers and presents a significant barrier to adoption. It is therefore critical that these hardware innovations be met with software innovations that facilitate programmability.

In this talk, I will discuss my work on building programming systems (languages, compilers, runtimes, OS support) for emerging processor and memory technologies. My talk will focus on two particular systems: (1) a compiler and runtime for improving performance and programmability of irregular applications on GPUs, and (2) a novel programmable accelerator and compiler that leverage analog computing via memristive crossbars to accelerate deep learning workloads. I will also discuss my future directions in both lines of work.

Bio:

Izzat is a PhD candidate in Electrical and Computer Engineering at the University of Illinois at Urbana-Champaign, and a member of the IMPACT Research Group working with Prof. Wen-mei Hwu. Izzat's research interests are in building programming systems for emerging processor and memory technologies. He has worked on programming systems for GPUs tackling issues of performance portability (CGO'15, MICRO'16), irregular application optimization (MICRO'16), and collaborative execution via unified virtual memory (ISPASS'17). For his work on GPU programming systems, he holds the Dan Vivoli Endowed Fellowship '17-'18. He has also worked on programming systems for emerging resistive memory technologies, tackling the issue of persistent object representation (ASPLOS'16, OOPSLA'17) and the use of memristive crossbars for accelerating deep learning workloads (in submission). For the former, he received the HiPEAC paper award and has submitted multiple patent applications. Izzat received his BE in Electrical and Computer Engineering in 2011 at the American University of Beirut (AUB), where he graduated with high distinction and received the Distinguished Graduate Award.

...

Date and Time | Room | Speaker | Title and Abstract
Feb 4 (Wed) 10am
2405
Milos Gligoric

Regression Testing: Theory and Practice

Developers often build regression test suites that are automatically run for each code revision to check that code changes did not break any functionality.  While regression testing is important, it is also expensive due to both the number of revisions and the number of tests.  For example, Google recently reported that they observed a quadratic increase in daily test-suite run time (a linear increase in the number of revisions per day and a linear increase in the number of tests per revision).

In this talk, I present a technique, called Ekstazi, to substantially reduce test-suite run time.  Ekstazi introduces a novel approach to regression test selection, which runs only a subset of tests whose dependencies may be affected by the latest changes; Ekstazi keeps file dependencies for each test.  Ekstazi also speeds up test-suite runs for software that uses modern distributed version-control systems; by modeling different branch and merge commands directly, Ekstazi computes test sets that can be significantly smaller than the entire test suite.  I developed Ekstazi for JVM languages and evaluated it on several hundred revisions of 32 open-source projects (totaling 5M lines of code).  Ekstazi can reduce test-suite run time an order of magnitude, including runs for merge revisions.  Finally, only a few months after the initial release, Ekstazi was adopted and used daily by many developers from several open-source projects, including Apache Camel, Commons Math, and CXF.
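The core selection idea can be sketched in a few lines (a hedged illustration of the approach, not Ekstazi itself, which collects file dependencies dynamically for JVM code): record which files each test depends on, then on a new revision run only the tests whose recorded dependencies intersect the changed files.

```python
def select_tests(test_deps, changed_files):
    """test_deps: {test_name: set of files it touched on its last run}."""
    changed = set(changed_files)
    return [t for t, deps in test_deps.items() if deps & changed]

# Hypothetical dependency records for three tests:
test_deps = {
    "test_parser":  {"parser.py", "ast.py"},
    "test_codegen": {"codegen.py", "ast.py"},
    "test_cli":     {"cli.py"},
}
print(select_tests(test_deps, ["codegen.py"]))  # ['test_codegen']
print(select_tests(test_deps, ["ast.py"]))      # parser and codegen tests
```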

Bio: Milos Gligoric is a PhD candidate in Computer Science at the University of Illinois at Urbana-Champaign (UIUC).  His research interests are in software engineering and formal methods, especially in designing techniques and tools that improve software quality and developers' productivity.  His PhD work has explored test input generation, test quality assessment, testing concurrent code, and regression testing.  He won an ACM SIGSOFT Distinguished Paper Award (ICSE 2010), and three of his papers were invited for a journal submission.  He was awarded the Saburo Muroga Fellowship (2009), the C.L. and Jane W-S. Liu Award (2012), and the C. W. Gear Outstanding Graduate Award (2014) from the UIUC Department of Computer Science, and the Mavis Future Faculty Fellowship (2014) from the UIUC College of Engineering.  He did internships at NASA Ames, Intel, Max Planck Institute for Software Systems, and Microsoft Research.  Milos holds a BS (2007) and MS (2009) from the University of Belgrade, Serbia.

Feb 10 (Tue) 10am | 2405 | Kai-Wei Chang

Practical Learning Algorithms for Structured Prediction Models

The desired output in many machine learning tasks is a structured object such as a tree, a clustering of nodes, or a sequence. Learning accurate prediction models for such problems requires training on large amounts of data, making use of expressive features, and performing global inference that simultaneously assigns values to all interrelated nodes in the structure. All of these contribute to significant scalability problems. We describe a collection of results that address several aspects of these problems by carefully selecting and caching samples, structures, or latent items.

Our results lead to efficient learning algorithms for structured prediction models and for online clustering models which, in turn, support reductions in problem size, improvements in training and evaluation speed, and improved performance. We have used our algorithms to learn expressive models from large amounts of annotated data and achieve state-of-the-art performance on several natural language processing tasks.
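As a concrete illustration of why global inference dominates the cost of structured learning, here is a minimal sketch of Viterbi inference for sequence tagging (a generic textbook setup, not the specific algorithms from the talk); a structured learner such as the structured perceptron must run an inference call like this for every training example on every pass:

    import numpy as np

    def viterbi(emissions, transitions):
        # emissions: (T, L) per-position label scores.
        # transitions: (L, L) label-to-label scores.
        T, L = emissions.shape
        score = emissions[0].copy()
        back = np.zeros((T, L), dtype=int)
        for t in range(1, T):
            cand = score[:, None] + transitions + emissions[t][None, :]
            back[t] = cand.argmax(axis=0)
            score = cand.max(axis=0)
        path = [int(score.argmax())]
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]  # highest-scoring label sequence

Caching previously found structures, as the talk describes, can avoid many of these repeated inference calls.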

Bio: Kai-Wei Chang is a doctoral candidate advised by Prof. Dan Roth in the Department of Computer Science, University of Illinois at Urbana-Champaign. His research interests lie in designing practical machine learning techniques for large and complex data and applying them to real-world applications. He has worked on various topics in machine learning and natural language processing, including large-scale learning, structured learning, coreference resolution, and relation extraction. Kai-Wei was awarded the KDD Best Paper Award in 2010 and won the Yahoo! Key Scientific Challenges Award in 2011. He is one of the main contributors to LIBLINEAR, a popular linear classification library.

Feb 13 (Fri) 12:30pm | 2405 | Benjamin Raichel

Fast geometric algorithms via netting, pruning, and sketching

The scale of modern geometric data sets necessitates fast algorithms. In this talk I will discuss several optimal linear (or near-linear) time algorithms, which work by quickly throwing out and summarizing data, creating a compact sketch of the input.

In the first part of the talk I will present a general framework called Net and Prune, which provides linear-time approximation algorithms for a large class of well-studied geometric optimization problems, such as k-center clustering and farthest nearest neighbor. The new approach is robust to variations in the input problem, and yet it is simple, elegant, and practical. In particular, many of the well-studied problems that easily fit into our framework either previously had no linear-time approximation algorithms or required rather involved algorithms and analysis.
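For context, the classical greedy algorithm below is the standard 2-approximation for k-center (Gonzalez's algorithm); it is not Net and Prune, but it shows the kind of problem the framework targets and approximates in linear time:

    import math

    def k_center_greedy(points, k):
        # Gonzalez's greedy 2-approximation: repeatedly add the point
        # farthest from the current set of centers. Runs in O(n * k) time.
        centers = [points[0]]
        dist = [math.dist(p, centers[0]) for p in points]
        for _ in range(k - 1):
            i = max(range(len(points)), key=lambda j: dist[j])
            centers.append(points[i])
            dist = [min(d, math.dist(p, points[i]))
                    for d, p in zip(dist, points)]
        return centers

    # Example: 5 points in the plane, 2 centers.
    pts = [(0, 0), (1, 0), (0, 1), (9, 9), (10, 9)]
    print(k_center_greedy(pts, 2))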

In the second part of the talk I will discuss contour trees, which provide a compact description of the level-set behavior of structured geometric data. These trees are used in HPC applications such as combustion, chemical, and fluid-mixing simulations, both to summarize and to explore the significantly larger simulation data. Here I will discuss an instance-optimal algorithm for their computation, which runs in linear time when the tree is balanced.

Bio: Benjamin Raichel is a PhD student in the Computer Science Department at the University of Illinois, Urbana-Champaign. His research interests are in algorithms and their applications. In particular he has developed fast and practical algorithms for a variety of geometric problems. He is currently funded by the UIUC Dissertation Completion Fellowship, and previously was awarded the Andrew and Shana Laursen Fellowship (2011-12) from the Department of Computer Science. Benjamin holds an MS degree in Computer Science (2011), as well as a BS degree with highest distinction in both Math and Physics (2009), from the University of Illinois.

Feb 18 (Wed) 3:00pm | 2405 | Yangqiu Song

Machine Learning with World Knowledge
 
Machine learning algorithms have become pervasive in multiple domains and have started to have an impact in applications. Nonetheless, a key obstacle to making learning protocols realistic in applications is the need to supervise them, a costly process that often requires hiring domain experts. While annotated data is difficult to get, large amounts of data are available from the Web. In this talk, I will introduce learning paradigms that use existing world knowledge to “supervise” machine learning algorithms. By “world knowledge” we refer to general-purpose knowledge collected from the Web that can be used to extract both common-sense knowledge and diverse domain-specific knowledge, and thus help supervise machine learning algorithms. I will discuss two projects, demonstrating that we can perform better machine learning and text data analytics by adapting general-purpose knowledge to domain-specific tasks. In the first project, I will introduce the dataless classification algorithm, which requires no labeled data and performs completely unsupervised text classification. In this case, Wikipedia knowledge is used to embed the text documents and the category labels into the same semantic space. In the second project, I will discuss how to perform hierarchical clustering of domain-specific short texts, e.g., Web queries and tweets, using a probabilistic concept-based knowledge base, Probase. In both cases, we provide realistic and scalable algorithms that address large-scale and fundamental text analytics problems.
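A minimal sketch of the dataless-classification idea follows: documents and short label descriptions are embedded into a shared space, and each document is assigned the nearest label. The toy bag-of-words embedding here stands in for the Wikipedia-based semantic representation used in the actual work, and all names are illustrative:

    import math
    from collections import Counter

    def embed(text):
        # Toy embedding: bag of lowercased words.
        return Counter(text.lower().split())

    def cosine(u, v):
        dot = sum(u[w] * v[w] for w in u if w in v)
        nu = math.sqrt(sum(c * c for c in u.values()))
        nv = math.sqrt(sum(c * c for c in v.values()))
        return dot / (nu * nv) if nu and nv else 0.0

    def dataless_classify(doc, label_descriptions):
        # label_descriptions: label -> short textual description.
        # No labeled training documents are needed.
        d = embed(doc)
        return max(label_descriptions,
                   key=lambda l: cosine(d, embed(label_descriptions[l])))

    print(dataless_classify(
        "the team won the championship game",
        {"sports": "sports game team player win score",
         "politics": "government election policy vote law"}))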
 
Bio: Dr. Yangqiu Song is a post-doctoral researcher in the Cognitive Computation Group at the University of Illinois at Urbana-Champaign. Before that, he was a post-doctoral fellow at Hong Kong University of Science and Technology and a visiting researcher at Huawei Noah's Ark Lab, Hong Kong (2012-2013), an associate researcher at Microsoft Research Asia (2010-2012), and a staff researcher at IBM Research China (2009-2010). He received his B.E. and Ph.D. degrees from Tsinghua University, China, in July 2003 and January 2009, respectively. His current research focuses on using machine learning and data mining to extract and infer insightful knowledge from big data. This knowledge helps users better enjoy their daily lives and social activities, or helps data scientists do better data analytics. He is particularly interested in large-scale learning algorithms, natural language understanding, text mining and visual analytics, and knowledge engineering for domain applications.

Feb 25 (Wed) 2:00pm | 3403 | Parasara Sridhar Duggirala

Dynamic Analysis of Cyber-Physical Systems

Progress in computation and communication technologies has made it easier to integrate software in all walks of life. The social, economic, and environmental benefits of integrating software into avenues such as avionics, automotive systems, the power grid, and medicine have led to the rise of cyber-physical systems (CPS) as an important area of research. However, bugs in software systems deployed in such safety-critical scenarios can lead to loss of property and, in some cases, life. In this talk, I will present a dynamic analysis technique for formally verifying annotated cyber-physical systems and proving the absence of bugs. The annotations, called discrepancy functions, are extensions of proof certificates for analyzing the convergence or divergence of systems. One of the key advantages of dynamic analysis is that it leverages testing procedures, which are the only known scalable way of checking a system against its specification. I have developed a tool, C2E2, that implements this technique and verifies temporal properties of CPS. C2E2 has been applied to verify the alerting mechanism in a parallel aircraft landing protocol developed by NASA and to verify the specification of a powertrain control system presented as a verification challenge problem by Toyota.
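A minimal sketch of the simulation-plus-bloating idea behind such dynamic analysis, assuming an exponential discrepancy bound of the form beta(t) = r * exp(L * t) (one common shape for such bounds; the code and its names are illustrative, not C2E2 itself):

    import numpy as np

    def simulate(f, x0, dt, steps):
        # Fixed-step Euler simulation of x' = f(x) (illustrative only).
        xs = [np.asarray(x0, dtype=float)]
        for _ in range(steps):
            xs.append(xs[-1] + dt * f(xs[-1]))
        return xs

    def reach_tube(f, x0, radius, L, dt, steps):
        # Bloat a single simulated trace by beta(t) = radius * exp(L * t),
        # a discrepancy bound that is valid when L is a Lipschitz constant
        # of f. Returns (center, bloat) balls covering all trajectories
        # starting within `radius` of x0.
        xs = simulate(f, x0, dt, steps)
        return [(x, radius * np.exp(L * i * dt)) for i, x in enumerate(xs)]

    # Safety check: does the tube stay out of the unsafe region x >= 2?
    tube = reach_tube(lambda x: -x, [1.0], radius=0.1, L=1.0, dt=0.01, steps=100)
    safe = all(c[0] + r < 2.0 for c, r in tube)

If the tube intersects the unsafe set, the analysis refines the initial set and repeats, which is the coarse shape of the verify-or-refine loop.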

Bio: Parasara Sridhar Duggirala is a PhD candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign (UIUC). His main research interests are in cyber-physical systems, formal methods, and control theory. His paper on safety verification of linear control systems won the best paper award at the International Conference on Embedded Software (held as part of ESWeek) in 2013. He also received the Feng Chen Memorial Award in Software Engineering from the Department of Computer Science at UIUC and was selected as a Young Researcher to attend the Heidelberg Laureate Forum. He did internships at NEC Labs America and SRI International. He received his B.Tech (2009) from the Indian Institute of Technology Guwahati.

Feb 26 (Thurs) 4:00pm | 2405 | Pranav Garg

Learning Invariants for Software Reliability and Security

The central problem in software verification today is establishing invariants that prove a system reliable or secure. Current technology requires invariants to be specified manually, and this manual effort is the bottleneck for the adoption of verification in mainstream programming. I will describe recent advances in synthesizing invariants using machine learning techniques, embracing an inductive rather than a deductive approach to this problem. In particular, I propose a new learning model called ICE (involving Implication Counter-Examples) and develop new ICE machine learning algorithms that effectively synthesize invariants. I'll also describe ways of specifying infinite enumerated invariants using finite representations of them. These invariant generation techniques are applied to security and reliability domains, including ExpressOS (a secure mobile operating system), GPU kernels, cloud systems, and the verification of the responsiveness of the full USB Windows phone driver.
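A minimal sketch of the ICE learning loop, under simplifying assumptions (a finite list of candidate invariants; a teacher that, given a candidate, either accepts it or returns one counterexample refuting it); the code is a toy stand-in for the actual ICE algorithms:

    def consistent(inv, pos, neg, imp):
        # An invariant must hold on positives, exclude negatives, and be
        # inductive on implication pairs: inv(s) must imply inv(s2).
        return (all(inv(s) for s in pos)
                and not any(inv(s) for s in neg)
                and all(inv(s2) for s, s2 in imp if inv(s)))

    def ice_learn(candidates, teacher):
        # teacher(inv) returns None if inv is adequate, or one of:
        #   ('pos', s)     -- s is reachable, so inv must include s
        #   ('neg', s)     -- s violates safety, so inv must exclude s
        #   ('imp', s, s2) -- s -> s2 is a transition step
        pos, neg, imp = [], [], []
        for inv in candidates:
            if not consistent(inv, pos, neg, imp):
                continue  # refuted by samples seen so far
            cex = teacher(inv)
            if cex is None:
                return inv  # adequate inductive invariant found
            if cex[0] == 'pos':
                pos.append(cex[1])
            elif cex[0] == 'neg':
                neg.append(cex[1])
            else:
                imp.append((cex[1], cex[2]))
        return None

The implication counterexamples are what distinguish ICE from classical learning from positive and negative examples alone: they let the teacher object that a candidate is not inductive without having to decide whether a given state is actually reachable.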

Bio: Pranav Garg is a PhD candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. His research interests span areas at the intersection of programming languages, formal methods, and software engineering. His PhD research, in particular, focuses on automating verification for building reliable and secure software systems. He received his B.Tech (2009) in Computer Science and Engineering from the Indian Institute of Technology Kanpur.