221

Multimodal Indexing of Presentation Videos

Merler, Michele January 2013 (has links)
This thesis presents four novel methods to help users efficiently and effectively retrieve information from unstructured and unsourced multimedia sources, in particular the increasing amount and variety of presentation videos such as those in e-learning, conference recordings, corporate talks, and student presentations. We demonstrate a system to summarize, index and cross-reference such videos, and measure the quality of the produced indexes as perceived by the end users. We introduce four major semantic indexing cues: text, speaker faces, graphics, and mosaics, going beyond standard tag-based searches and simple video playback. This work aims at recognizing visual content "in the wild", where the system cannot rely on any additional information besides the video itself. For text, within a scene text detection and recognition framework, we present a novel locally optimal adaptive binarization algorithm, implemented with integral histograms. It determines an optimal threshold that maximizes the between-class variance within a subwindow, with computational complexity independent of the size of the window itself. We obtain character recognition rates of 74%, as validated against ground truth of 8 presentation videos spanning over 1 hour and 45 minutes, which almost doubles the baseline performance of an open source OCR engine. For speaker faces, we detect, track, match, and finally select a human-preferred face icon per speaker, based on three quality measures: resolution, amount of skin, and pose. We register an 87% agreement (51 out of 58 speakers) between the face indexes automatically generated from three unstructured presentation videos of approximately 45 minutes each, and human preferences recorded through Mechanical Turk experiments. For diagrams, we locate graphics inside frames showing a projected slide, cluster them according to an online algorithm based on a combination of visual and temporal information, and select and color-correct their representatives to match human preferences recorded through Mechanical Turk experiments. We register 71% accuracy (57 out of 81 unique diagrams properly identified, selected and color-corrected) on three hours of videos containing five different presentations. For mosaics, we combine two existing suturing measures to extend video images into a world coordinate system. The set of frames to be registered into a mosaic is sampled according to the PTZ camera movement, which is computed through least-squares estimation starting from the luminance constancy assumption. A local-feature-based stitching algorithm is then applied to estimate the homography among a set of video frames, and median blending is used to render pixels in overlapping regions of the mosaic. For two of these indexes, namely faces and diagrams, we present two novel MTurk-derived user data collections to determine viewer preferences, and show that our methods match them in selection. The net result of this thesis is that users can search, inside a video collection as well as within a single video clip, for a segment of a presentation by professor X on topic Y, containing graph Z.
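To give a concrete sense of the binarization step, here is a minimal Python sketch (not the thesis implementation) of an Otsu-style threshold that maximizes the between-class variance of a grayscale subwindow, computed from its histogram; the thesis obtains each per-window histogram in constant time from integral histograms, which this sketch omits.

    import numpy as np

    def otsu_threshold(window, bins=256):
        """Return the threshold maximizing between-class variance in a grayscale subwindow.

        A plain histogram-based sketch; the thesis variant builds the per-window
        histogram from integral histograms, independent of window size.
        """
        hist, _ = np.histogram(window.ravel(), bins=bins, range=(0, bins))
        p = hist.astype(float) / hist.sum()          # gray-level probabilities
        levels = np.arange(bins)

        omega = np.cumsum(p)                         # class-0 probability at each threshold
        mu = np.cumsum(p * levels)                   # cumulative first moment
        mu_total = mu[-1]

        # Between-class variance: (mu_T*omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
        denom = omega * (1.0 - omega)
        denom[denom == 0] = np.inf                   # avoid division by zero at the extremes
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
        return int(np.argmax(sigma_b2))

    # Example: binarize a synthetic 32x32 subwindow.
    window = np.random.randint(0, 256, size=(32, 32))
    t = otsu_threshold(window)
    binary = window > t

With integral histograms, the histogram of any axis-aligned subwindow can be assembled from four precomputed cumulative histograms, which is what makes the per-window cost independent of the window size.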
222

Heterogeneous Cloud Systems Based on Broadband Embedded Computing

Neill, Richard W. January 2013 (has links)
Computing systems continue to evolve from homogeneous systems of commodity-based servers within a single data-center towards modern Cloud systems that consist of numerous data-center clusters virtualized at the infrastructure and application layers to provide scalable, cost-effective and elastic services to devices connected over the Internet. There is an emerging trend towards heterogeneous Cloud systems, driven by growth in wired as well as wireless devices, that incorporates the potential of millions, and soon billions, of embedded devices and enables new forms of computation and service delivery. Service providers such as broadband cable operators continue to contribute towards this expansion with growing Cloud system infrastructures combined with deployments of increasingly powerful embedded devices across broadband networks. Broadband networks enable access to service provider Cloud data-centers and the Internet from numerous devices. These include home computers, smart-phones, tablets, game-consoles, sensor-networks, and set-top box devices. With these trends in mind, I propose the concept of broadband embedded computing as the utilization of a broadband network of embedded devices for collective computation in conjunction with centralized Cloud infrastructures. I claim that this form of distributed computing results in a new class of heterogeneous Cloud systems, service delivery, and application enablement. To support these claims, I present a collection of research contributions in adapting distributed software platforms that include MPI and MapReduce to support simultaneous application execution across centralized data-center blade servers and resource-constrained embedded devices. Leveraging these contributions, I develop two complete prototype system implementations to demonstrate an architecture for heterogeneous Cloud systems based on broadband embedded computing. Each system is validated by executing experiments with applications taken from bioinformatics and image processing as well as communication and computational benchmarks. This vision, however, is not without challenges. The questions of how to adapt standard distributed computing paradigms such as MPI and MapReduce for implementation on potentially resource-constrained embedded devices, and how to adapt cluster computing runtime environments to enable heterogeneous process execution across millions of devices, remain open-ended. This dissertation presents methods to begin addressing these open-ended questions through the development and testing of both experimental broadband embedded computing systems and in-depth characterization of broadband network behavior. I present experimental results and comparative analysis that offer potential solutions for optimal scalability and performance when constructing broadband embedded computing systems. I also present a number of contributions enabling practical implementation of both heterogeneous Cloud systems and novel application services based on broadband embedded computing.
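As a rough illustration of the kind of capability-aware work partitioning such a platform implies, the following sketch (assuming mpi4py, with purely illustrative capability scores, and not the dissertation's prototype) scatters a workload across ranks in proportion to each node's reported capability and reduces the partial results at the root.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Each node reports a rough capability score; in a real deployment this would
    # distinguish blade servers from resource-constrained embedded devices.
    capability = 8.0 if rank == 0 else 1.0           # hypothetical scores
    capabilities = comm.allgather(capability)

    # Shares of the workload are proportional to capability.
    N = 1_000_000
    weights = np.array(capabilities) / sum(capabilities)
    counts = (weights * N).astype(int)
    counts[-1] += N - counts.sum()                   # make the counts sum exactly to N

    data = np.random.rand(N) if rank == 0 else None
    local = np.empty(counts[rank])
    comm.Scatterv([data, counts, MPI.DOUBLE], local, root=0)

    # Each device computes a partial result; the root reduces them.
    partial = local.sum()
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        print("sum computed across heterogeneous ranks:", total)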
223

Discrete Differential Geometry of Thin Materials for Computational Mechanics

Vouga, Paul Etienne January 2013 (has links)
Instead of applying numerical methods directly to governing equations, another approach to computation is to discretize the geometric structure specific to the problem first, and then compute with the discrete geometry. This structure-respecting discrete-differential-geometric (DDG) approach often leads to new algorithms that more accurately track the physical behavior of the system with less computational effort. Thin objects, such as pieces of cloth, paper, sheet metal, freeform masonry, and steel-glass structures, are particularly rich in geometric structure and so are well-suited for DDG. I show how understanding the geometry of time integration and contact leads to new algorithms, with strong correctness guarantees, for simulating thin elastic objects in contact; how the performance of these algorithms can be dramatically improved without harming the geometric structure, and thus the guarantees, of the original formulation; how the geometry of static equilibrium can be used to efficiently solve design problems related to masonry or glass buildings; and how discrete developable surfaces can be used to model thin sheets undergoing isometric deformation.
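A standard textbook example, not taken from the thesis, illustrates why the geometry of time integration matters: explicit Euler lets the energy of a simple mass-spring oscillator drift upward, while symplectic Euler, a structure-preserving integrator, keeps it bounded.

    import numpy as np

    def explicit_euler(q, v, k, m, h, steps):
        """Explicit Euler: energy drifts upward over long runs."""
        for _ in range(steps):
            q, v = q + h * v, v - h * (k / m) * q
        return q, v

    def symplectic_euler(q, v, k, m, h, steps):
        """Symplectic Euler: update velocity first, then position with the new velocity.
        The map preserves a symplectic form, so the energy stays bounded."""
        for _ in range(steps):
            v = v - h * (k / m) * q
            q = q + h * v
        return q, v

    k, m, h, steps = 1.0, 1.0, 0.1, 1000
    energy = lambda q, v: 0.5 * m * v**2 + 0.5 * k * q**2

    q0, v0 = 1.0, 0.0
    print("explicit:  ", energy(*explicit_euler(q0, v0, k, m, h, steps)))    # grows far above 0.5
    print("symplectic:", energy(*symplectic_euler(q0, v0, k, m, h, steps)))  # stays near 0.5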
224

Traffic Analysis Attacks and Defenses in Low Latency Anonymous Communication

Chakravarty, Sambuddho January 2014 (has links)
The recent public disclosure of mass surveillance of electronic communication, involving powerful government authorities, has drawn the public's attention to issues regarding Internet privacy. For almost a decade now, there have been several research efforts towards designing and deploying open source, trustworthy and reliable systems that ensure users' anonymity and privacy. These systems operate by hiding the true network identity of communicating parties against eavesdropping adversaries. Tor, an acronym for The Onion Router, is an example of such a system. Such systems relay the traffic of their users through an overlay of nodes that are called Onion Routers and are operated by volunteers distributed across the globe. Such systems have served well as anti-censorship and anti-surveillance tools. However, recent publications have disclosed that powerful government organizations are seeking means to de-anonymize such systems and have deployed distributed monitoring infrastructure to aid their efforts. Attacks against anonymous communication systems, like Tor, often involve traffic analysis. In such attacks, an adversary, capable of observing network traffic statistics in several different networks, correlates the traffic patterns in these networks and associates otherwise seemingly unrelated network connections. The process can lead an adversary to the source of an anonymous connection. However, due to their design, consisting of globally distributed relays, the users of anonymity networks like Tor can route their traffic virtually via any network, hiding their tracks and true identities from their communication peers and eavesdropping adversaries. De-anonymization of a random anonymous connection is hard, as the adversary is required to correlate traffic patterns in one network link to those in virtually all other networks. Past research mostly involved reducing the complexity of this process by first reducing the set of relays or network routers to monitor, and then identifying the actual source of anonymous traffic among the network connections that are routed via this reduced set of relays or routers. A study of various research efforts in this field reveals that there have been many more efforts to reduce the set of relays or routers to be searched than to explore methods for actually identifying an anonymous user amidst the network connections using these routers and relays. Few have tried to comprehensively study a complete attack that involves reducing the set of relays and routers to monitor and identifying the source of an anonymous connection. Although it is believed that systems like Tor are trivially vulnerable to traffic analysis, there are various technical challenges and issues that can become obstacles to accurately identifying the source of an anonymous connection. It is hard to adjudge the vulnerability of anonymous communication systems without adequately exploring the issues involved in identifying the source of anonymous traffic. We take steps to fill this gap by exploring two novel active traffic analysis attacks that rely solely on measurements of network statistics. In these attacks, the adversary tries to identify the source of an anonymous connection arriving at a server from an exit node. This generally involves correlating traffic entering and leaving the Tor network, linking otherwise unrelated connections.
To increase the accuracy of identifying the victim connection among several connections, the adversary injects a traffic perturbation pattern into the connection it wants to de-anonymize, which arrives at the server from a Tor node. One way to achieve this is by colluding with the server and injecting a traffic perturbation pattern using common traffic shaping tools. Our first attack involves a novel remote bandwidth estimation technique to confirm the identity of Tor relays and network routers along the path connecting a Tor client and a server, by observing network bandwidth fluctuations deliberately injected by the server. The second attack involves correlating network statistics, for connections entering and leaving the Tor network, available from existing network infrastructure, such as Cisco's NetFlow, to identify the source of an anonymous connection. Additionally, we explored a novel technique to defend against the latter attack. Most proposed defenses against traffic analysis attacks that involve transmitting dummy traffic have not been implemented, due to fears of potential performance degradation. Our novel technique involves transmission of dummy traffic consisting of packets whose IP headers carry small Time-to-Live (TTL) values. Such packets are discarded by the routers before they reach their destination. They distort NetFlow statistics without degrading the client's performance. Finally, we present a strategy that employs transmission of unique plain-text decoy traffic that appears sensitive, such as fake user credentials, through Tor nodes to decoy servers under our control. Periodic tallying of client and server logs to determine unsolicited connection attempts at the server is used to identify the eavesdropping nodes. Such malicious Tor node operators, eavesdropping on users' traffic, could be potential traffic analysis attackers.
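The correlation step at the heart of such attacks can be illustrated with a toy computation (an illustration only, not the thesis tooling): bin the packet timestamps of the perturbed ingress connection and of each candidate egress connection into per-second counts, then rank the candidates by the correlation of the binned series.

    import numpy as np

    def binned_rate(timestamps, bin_width=1.0, duration=60.0):
        """Convert packet timestamps (seconds) into a per-bin packet-count series."""
        bins = np.arange(0.0, duration + bin_width, bin_width)
        counts, _ = np.histogram(timestamps, bins=bins)
        return counts.astype(float)

    def correlate(ingress, egress_candidates):
        """Return candidates ranked by Pearson correlation with the ingress series."""
        scores = {}
        for name, ts in egress_candidates.items():
            a, b = binned_rate(ingress), binned_rate(ts)
            scores[name] = np.corrcoef(a, b)[0, 1]
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Toy data: the "victim" egress flow shares the perturbation pattern injected
    # into the ingress flow (bursts every 10 s); the others are unrelated.
    rng = np.random.default_rng(0)
    bursts = np.concatenate([10 * i + rng.uniform(0, 1, 50) for i in range(6)])
    ingress = bursts
    candidates = {
        "victim": bursts + rng.normal(0, 0.2, bursts.size),   # delayed/jittered copy
        "other1": rng.uniform(0, 60, 300),
        "other2": rng.uniform(0, 60, 300),
    }
    print(correlate(ingress, candidates))   # "victim" should rank first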
225

Overcoming the Intuition Wall: Measurement and Analysis in Computer Architecture

Demme, John David January 2014 (has links)
These are exciting times for computer architecture research. Today there is significant demand to improve the performance and energy-efficiency of emerging, transformative applications which are being hammered out by the hundreds for new computing platforms and usage models. This booming growth of applications and the variety of programming languages used to create them are challenging our ability as architects to rapidly and rigorously characterize these applications. Concurrently, hardware has become more complex with the emergence of accelerators, multicore systems, and heterogeneity caused by further divergence between processor market segments. No single architect can now understand all the complexities of many systems and reason about the full impact of changes or new applications. To that end, this dissertation presents four case studies in quantitative methods. Each case study attacks a different application and proposes a new measurement or analytical technique. In each case study we find at least one surprising or unintuitive result which would likely not have been found without the application of our method.
226

Selected machine learning reductions

Choromanska, Anna Ewa January 2014 (has links)
Machine learning is a field of science aiming to extract knowledge from data. Optimization lies at the core of machine learning, as many learning problems are formulated as optimization problems where the goal is to minimize or maximize an objective function. More complex machine learning problems are then often solved by reducing them to simpler sub-problems solvable by known optimization techniques. This dissertation addresses two elements of the machine learning system 'pipeline': designing efficient basic optimization tools tailored to solve specific learning problems, or in other words to optimize a specific objective function, and creating more elaborate learning tools whose sub-blocks are essentially optimization solvers equipped with such basic optimization tools. In the first part of this thesis we focus on a very specific learning problem where the objective function, either convex or non-convex, involves the minimization of the partition function, the normalizer of a distribution, as is the case in conditional random fields (CRFs) or log-linear models. Our work proposes a tight quadratic bound on the partition function whose parameters are easily recovered by a simple algorithm that we propose. The bound gives rise to a family of new optimization learning algorithms, based on bound majorization (we developed batch, both full-rank and low-rank, and semi-stochastic variants), with linear convergence rates that successfully compete with state-of-the-art techniques (among them gradient descent methods, Newton and quasi-Newton methods like L-BFGS, etc.). The only constraint we introduce is on the number of classes, which is assumed to be finite and enumerable. The bound majorization method we develop is simultaneously the first reduction scheme discussed in this thesis, where throughout this thesis by 'reduction' we mean a learning approach or algorithmic technique that converts a complex machine learning problem into a set of simpler problems (which can be as small as a single problem). Secondly, we focus on developing two more sophisticated machine learning tools for solving harder learning problems. The tools that we develop are built from basic optimization sub-blocks tailored to solve simpler optimization sub-problems. We first focus on the multi-class classification problem where the number of classes is very large. We reduce this problem to a set of simpler sub-problems that we solve using basic optimization methods performing additive updates on the parameter vector. Secondly, we address the problem of learning a data representation when the data are unlabeled for any classification task. We reduce this problem to a set of simpler sub-problems that we solve using basic optimization methods, however this time the parameter vector is updated multiplicatively. In both problems we assume that the data come in a stream that can even be infinite. We now provide a more specific description of each of these problems and describe our approach for solving them. In the multi-class classification problem it is desirable to achieve training and testing running times which are logarithmic in the label complexity. The existing approaches to this problem are either intractable or do not adapt well to the data. We propose a reduction of this problem to a set of binary regression problems organized in a tree structure and introduce a new splitting criterion (objective function) allowing gradient-descent-style optimization (bound optimization methods can also be used).
The decision tree algorithm that we obtain differs from traditional decision trees in the objective optimized, and in how that optimization is done. The different objective has useful properties, such as guaranteeing balanced and small-error splits, while the optimization uses an online learning algorithm that is queried and trained simultaneously as we pass over the data. Furthermore, we prove an upper bound on the number of splits required to reduce the entropy of the tree leaves below a small threshold. We empirically show that the trees we obtain have logarithmic depth, which implies logarithmic training and testing running times, and significantly smaller error than random trees. Finally, we consider the problem of unsupervised (clustering) learning of data representations, where the quality of the obtained clustering is measured using a very simple, intuitive and widely cited clustering objective, the k-means clustering objective. We introduce a family of online clustering algorithms by extending algorithms for online supervised learning, with access to expert predictors (which are basic sub-blocks of our learning system), to the unsupervised learning setting. The parameter vector corresponds to the probability distribution over the experts. Different update rules for the parameter vector depend on an approximation to the current value of the k-means clustering objective obtained by each expert, and model different levels of non-stationarity in the data. We show that when the experts are batch clustering algorithms with approximation guarantees with respect to the k-means clustering objective, applied to a sliding window of the data stream, our algorithms obtain approximation guarantees with respect to the k-means clustering objective. We thus simultaneously address an open problem posed by Dasgupta on approximating the k-means clustering objective on data streams. We experimentally show that our algorithms' empirical performance tracks that of the best clustering algorithm in their expert set and that our algorithms outperform widely used online algorithms.
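A simplified sketch of the experts-style update described above (illustrative only; the thesis analyzes several update rules and models of non-stationarity): each expert proposes centers for the current window of the stream, the k-means cost of each proposal is computed, and the weight vector over experts is updated multiplicatively.

    import numpy as np

    def kmeans_cost(points, centers):
        """Sum of squared distances from each point to its nearest center."""
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2.min(axis=1).sum()

    def update_weights(weights, costs, eta=0.1):
        """Multiplicative-weights update: experts with lower k-means cost gain mass."""
        costs = np.asarray(costs, dtype=float)
        costs = costs / (costs.max() + 1e-12)        # normalize so the learning rate is meaningful
        w = weights * np.exp(-eta * costs)
        return w / w.sum()

    # Toy stream: two experts propose centers for each sliding window of the stream.
    rng = np.random.default_rng(1)
    weights = np.array([0.5, 0.5])
    for _ in range(50):
        window = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
        expert_centers = [
            np.array([[0.0, 0.0], [5.0, 5.0]]),      # expert 0: roughly the true centers
            rng.uniform(-5, 10, (2, 2)),             # expert 1: a poor guess
        ]
        costs = [kmeans_cost(window, c) for c in expert_centers]
        weights = update_weights(weights, costs)
    print(weights)   # most of the mass should end up on expert 0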
227

Exploring Societal Computing based on the Example of Privacy

Sheth, Swapneel January 2014 (has links)
Data privacy when using online systems like Facebook and Amazon has become an increasingly popular topic in the last few years. This thesis consists of the following four projects that aim to address the issues of privacy and software engineering. First, little is known about how users and developers perceive privacy and which concrete measures would mitigate their privacy concerns. To investigate privacy requirements, we conducted an online survey with closed and open questions and collected 408 valid responses. Our results show that users often reduce privacy to security, with data sharing and data breaches being their biggest concerns. Users are more concerned about the content of their documents and their personal data such as location than about their interaction data. Unlike users, developers clearly prefer technical measures like data anonymization and think that privacy laws and policies are less effective. We also observed interesting differences between people from different geographies. For example, people from Europe are more concerned about data breaches than people from North America. People from Asia/Pacific and Europe believe that content and metadata are more critical for privacy than people from North America. Our results contribute to developing a user-driven privacy framework that is based on empirical evidence in addition to the legal, technical, and commercial perspectives. Second, a related challenge is to make privacy more understandable in complex systems that may have a variety of user interface options, which may change often. As social network platforms have evolved, the ability for users to control how and with whom information is being shared introduces challenges concerning the configuration and comprehension of privacy settings. To address these concerns, our crowd-sourced approach simplifies the understanding of privacy settings by using data collected from 512 users over a 17-month period to generate visualizations that allow users to compare their personal settings to an arbitrary subset of individuals of their choosing. To validate our approach, we conducted an online survey with closed and open questions and collected 59 valid responses, after which we conducted follow-up interviews with 10 respondents. Our results showed that 70% of respondents found visualizations using crowd-sourced data useful for understanding privacy settings, and 80% preferred a crowd-sourced tool for configuring their privacy settings over current privacy controls. Third, as software evolves over time, it might introduce bugs that breach users' privacy. Further, there might be system-wide policy changes that could change users' settings to be more or less private than before. We present a novel technique that can be used by end users for detecting changes in privacy, i.e., regression testing for privacy. Using a social approach for detecting privacy bugs, we present two prototype tools. Our evaluation shows the feasibility and utility of our approach for detecting privacy bugs. We highlight two interesting case studies on the bugs that were discovered using our tools. To the best of our knowledge, this is the first technique that leverages regression testing for detecting privacy bugs from an end-user perspective.
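The regression-testing idea can be sketched as a simple settings diff (the field names and visibility levels below are hypothetical, and this is an illustration rather than the prototype tools of the thesis): snapshot a user's effective privacy settings before and after a change, and flag any field whose visibility became more public.

    # Ordered from most private to most public; hypothetical levels for illustration.
    VISIBILITY_ORDER = ["only_me", "friends", "friends_of_friends", "public"]

    def privacy_regressions(before, after):
        """Return fields whose visibility widened between two settings snapshots."""
        rank = {level: i for i, level in enumerate(VISIBILITY_ORDER)}
        regressions = []
        for field, old in before.items():
            new = after.get(field, old)
            if rank.get(new, 0) > rank.get(old, 0):
                regressions.append((field, old, new))
        return regressions

    # Example snapshots taken before and after a (hypothetical) platform update.
    before = {"photos": "friends", "email": "only_me", "posts": "friends"}
    after  = {"photos": "public",  "email": "only_me", "posts": "friends"}
    print(privacy_regressions(before, after))   # [('photos', 'friends', 'public')]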
Fourth, approaches to addressing these privacy concerns typically require substantial extra computational resources, which might be beneficial where privacy is concerned, but may have a significant negative impact with respect to Green Computing and sustainability, another major societal concern. Spending more computation time results in spending more energy and other resources that make the software system less sustainable. Ideally, what we would like are techniques for designing software systems that address these privacy concerns but which are also sustainable - systems where privacy could be achieved "for free", i.e., without having to spend extra computational effort. We describe how privacy can indeed be achieved for free - as an accidental and beneficial side effect of doing some existing computation - in web applications and online systems that have access to user data. We show the feasibility, sustainability, and utility of our approach and what types of privacy threats it can mitigate. Finally, we generalize the problem of privacy and its tradeoffs. As Social Computing has increasingly captivated the general public, it has become a popular research area for computer scientists. Social Computing research focuses on online social behavior and using artifacts derived from it for providing recommendations and other useful community knowledge. Unfortunately, some of that behavior and knowledge incur societal costs, particularly with regard to Privacy, which is viewed quite differently by different populations as well as regulated differently in different locales. But clever technical solutions to those challenges may impose additional societal costs, e.g., by consuming substantial resources at odds with Green Computing, another major area of societal concern. We propose a new crosscutting research area, Societal Computing, that focuses on the technical tradeoffs among computational models and application domains that raise significant societal issues. We highlight some of the relevant research topics and open problems that we foresee in Societal Computing. We feel that these topics, and Societal Computing in general, need to gain prominence as they will provide useful avenues of research leading to increasing benefits for society as a whole.
228

Producing Trustworthy Hardware Using Untrusted Components, Personnel and Resources

Waksman, Adam January 2014 (has links)
Computer security is a full-system property, and attackers will always go after the weakest link in a system. In modern computer systems, the hardware supply chain is an obvious and vulnerable point of attack. The ever-increasing complexity of hardware systems, along with the globalization of the hardware supply chain, has made it unreasonable to trust hardware. Hardware-based attacks, known as backdoors, are easy to implement and can undermine the security of systems built on top of compromised hardware. Operating systems and other software can only be secure if they can trust the underlying hardware systems. The full supply chain for creating hardware includes multiple processes, which are often addressed in disparate threads of research, but which we consider as one unified process. On the front-end side, there is the soft design of hardware, along with validation and synthesis, to ultimately create a netlist, the document that defines the physical layout of hardware. On the back-end side, there is a physical fabrication process, where a chip is produced at a foundry from a supplied netlist, followed in some cases by post-fabrication testing. Producing a trustworthy chip means securing the process from the early design stages through to the post-fabrication tests. We propose, implement and analyze a series of methods for making the hardware supply chain resilient against a wide array of known and possible attacks. These methods allow for the design and fabrication of hardware using untrustworthy personnel, designs, tools and resources, while protecting the final product from large classes of attacks, some known previously and some discovered and taxonomized in this work. The overarching idea in this work is to take a full-process view of the hardware supply chain. We begin by securing the hardware design and synthesis processes using a defense-in-depth approach. We combine this work with foundry-side techniques to prevent malicious modifications and counterfeiting, and finally apply novel attestation techniques to ensure that hardware is trustworthy when it reaches users. For our design-side security approach, we use defense-in-depth because in practice, any security method can potentially be subverted, and defense-in-depth is the best way to handle that assumption. Our approach involves three independent steps. The first is a functional analysis tool (called FANCI), applied statically to designs during the coding and validation stages to remove any malicious circuits. The second step is to include physical security circuits that operate at runtime. These circuits, which we call trigger obfuscation circuits, scramble data at the microarchitectural level so that any hardware backdoors remaining in the design cannot be triggered at runtime. The third and final step is to include a runtime monitoring system that detects any backdoor payloads that might have been achieved despite the previous two steps. We design two different versions of this monitoring system. The first, TrustNet, is extremely lightweight and protects against an important class of attacks called emitter backdoors. The second, DataWatch, is slightly more heavyweight (though still efficient and low overhead) and can catch a wider variety of attacks; it can be adapted to protect against nearly any type of digital payload.
We taxonomize the types of attacks that are possible against each of the three steps of our defense-in-depth system and show that each defense provides strong coverage with low (or negligible) overheads to performance, area and power consumption. For our foundry-side security approach, we develop the first foundry-side defense system that is aware of design-side security. We create a power-based side-channel, called a beacon. This beacon is essentially a benign backdoor. It can be turned on by a special key (not provided to the foundry), allowing for security attestation during post-fabrication testing. Because the beacon is built into the design itself, it requires neither keys nor storage, and as such exists in the final chip purely by virtue of existing in the netlist. We further obfuscate the netlist itself, rendering the task of reverse engineering the beacon (for a foundry-side adversary) intractable. Both the inclusion of the beacon and the obfuscation process add little to area and power costs and have no impact on performance. Altogether, these methods provide a foundation on which hardware security can be developed and enhanced. They are low overhead and practical, making them suitable for inclusion in next-generation hardware. Moving forward, the criticality of having trustworthy hardware can only increase. Ensuring that the hardware supply chain can be trusted in the face of sophisticated adversaries is vital. Both hardware design and hardware fabrication are increasingly international processes, and we believe continuing with this unified approach is the correct path for future research. In order for companies and governments to place trust in mission-critical hardware, it is necessary for hardware to be certified as secure and trustworthy. The methods we propose can be the first steps toward making this certification a reality.
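The intuition behind the FANCI step can be conveyed with a toy version of the published control-value heuristic (not the thesis code): for a combinational function, compute for each input the fraction of input patterns on which flipping that input flips the output; inputs whose control values are near zero barely affect the output, which is the signature of a stealthy backdoor trigger.

    from itertools import product

    def control_values(func, n_inputs):
        """Exhaustively compute the control value of each input on a Boolean function.

        control_value(i) = fraction of input vectors for which flipping bit i
        changes the output. Inputs with near-zero control values are flagged
        as candidate backdoor triggers.
        """
        totals = [0] * n_inputs
        patterns = list(product([0, 1], repeat=n_inputs))
        for bits in patterns:
            out = func(bits)
            for i in range(n_inputs):
                flipped = list(bits)
                flipped[i] ^= 1
                if func(tuple(flipped)) != out:
                    totals[i] += 1
        return [t / len(patterns) for t in totals]

    # A toy "backdoored" circuit: inputs a, b behave normally unless a rare
    # 6-bit trigger pattern appears on the remaining inputs.
    def backdoored(bits):
        a, b, *trigger = bits
        if tuple(trigger) == (1, 0, 1, 1, 0, 1):   # hypothetical trigger pattern
            return 1
        return a & b

    cv = control_values(backdoored, 8)
    print(cv)   # trigger inputs have control values near 0; a and b are much larger

The actual tool operates on the wires of a design's netlist and approximates these truth tables rather than enumerating them exhaustively.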
229

Next Generation Emergency Call System with Enhanced Indoor Positioning

Song, Wonsang January 2014 (has links)
The emergency call systems in the United States and elsewhere are undergoing a transition from the PSTN-based legacy system to a new IP-based system. The new system is referred to as the Next Generation 9-1-1 (NG9-1-1) or NG112 system. We have built a prototype NG9-1-1 system which features media convergence and data integration that are unavailable in the current emergency calling system. The most important piece of information in the NG9-1-1 system is the caller's location. The caller's location is used for routing the call to the appropriate call center. The emergency responders use the caller's location to find the caller. Therefore, it is essential to determine the caller's location as precisely as possible to minimize delays in emergency response. Delays in response may result in loss of lives. When a person makes an emergency call outdoors using a mobile phone, the Global Positioning System (GPS) can provide the caller's location accurately. Indoor positioning, however, presents a challenge. GPS does not generally work indoors because satellite signals do not penetrate most buildings. Moreover, there is an important difference between determining location outdoors and indoors. Unlike outdoors, vertical accuracy is very important in indoor positioning because an error of a few meters will send emergency responders to a different floor in a building, which may cause a significant delay in reaching the caller. This thesis presents a way to augment our NG9-1-1 prototype system with a new indoor positioning system. The indoor positioning system focuses on improving the accuracy of vertical location. Our goal is to provide floor-level accuracy with minimum infrastructure support. Our approach is to use a user's smartphone to trace her vertical movement inside buildings. We utilize multiple sensors available in today's smartphones to enhance positioning accuracy. This thesis makes three contributions. First, we present a hybrid architecture for floor localization with emergency calls in mind. The architecture combines beacon-based infrastructure and sensor-based dead reckoning, striking a balance between accurately determining a user's location and minimizing the required infrastructure. Second, we present the elevator module for tracking a user's movement in an elevator. The elevator module addresses three core challenges that make it difficult to accurately derive displacement from acceleration. Third, we present the stairway module which determines the number of floors a user has traveled on foot. Unlike previous systems that track users' footsteps, our stairway module uses a novel landing counting technique. Additionally, this thesis presents our work on designing and implementing an NG9-1-1 prototype system. We first demonstrate how emergency calls from various call origination devices are identified, routed to the proper Public Safety Answering Point (PSAP) based on the caller's location, and terminated by the call taker software at the PSAP. We then show how text communications such as Instant Messaging and Short Message Service can be integrated into the NG9-1-1 architecture. We also present GeoPS-PD, a polygon simplification algorithm designed to improve the performance of location-based routing. GeoPS-PD reduces the size of a polygon, which represents the service boundary of a PSAP in the NG9-1-1 system.
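A naive version of the elevator idea can be sketched as follows (illustrative only; the thesis module exists precisely because sensor drift and noise make this naive approach unreliable on real hardware): double-integrate the vertical acceleration recorded during a ride and map the resulting displacement to a floor count using an assumed floor height.

    import numpy as np

    def vertical_displacement(accel_z, dt):
        """Naively double-integrate vertical acceleration (gravity already removed).

        Real accelerometers drift, so this is combined with other cues in practice;
        the sketch just shows the basic kinematics.
        """
        velocity = np.cumsum(accel_z) * dt
        displacement = np.cumsum(velocity) * dt
        return displacement[-1]

    def floors_traveled(displacement_m, floor_height_m=3.5):
        """Map a vertical displacement to a floor count (assumed 3.5 m per floor)."""
        return int(round(displacement_m / floor_height_m))

    # Synthetic ride: accelerate up for 2 s, cruise 4 s, decelerate 2 s (100 Hz samples).
    dt = 0.01
    accel_z = np.concatenate([np.full(200, 0.8), np.zeros(400), np.full(200, -0.8)])
    d = vertical_displacement(accel_z, dt)
    print(d, floors_traveled(d))   # roughly 9.6 m, about 3 floors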
230

Analytic Methods in Concrete Complexity

Tan, Li-Yang January 2014 (has links)
This thesis studies computational complexity in concrete models of computation. We draw on a range of mathematical tools to understand the structure of Boolean functions, with analytic methods — Fourier analysis, probability theory, and approximation theory — playing a central role. These structural theorems are leveraged to obtain new computational results, both algorithmic upper bounds and complexity-theoretic lower bounds, in property testing, learning theory, and circuit complexity. We establish the best-known upper and lower bounds on the classical problem of testing whether an unknown Boolean function is monotone. We prove an Ω̃(n^1/5) lower bound on the query complexity of non-adaptive testers, an exponential improvement over the previous lower bound of Ω(log n) from 2002. We complement this with an Õ(n^5/6)-query non-adaptive algorithm for the problem. We characterize the statistical query complexity of agnostically learning Boolean functions with respect to product distributions. We show that l_1-approximability by low-degree polynomials, known to be sufficient for efficient learning in this setting, is in fact necessary. As an application we establish an optimal lower bound showing that no statistical query algorithm can efficiently agnostically learn monotone k-juntas for any k = ω(1) and any constant error less than 1/2. We initiate a systematic study of the tradeoffs between accuracy and efficiency in Boolean circuit complexity, focusing on disjunctive normal form formulas, among the most basic types of circuits. A conceptual message that emerges is that the landscape of circuit complexity changes dramatically, both qualitatively and quantitatively, when the formula is only required to approximate a function rather than compute it exactly. Finally we consider the Fourier Entropy-Influence Conjecture, a long-standing open problem in the analysis of Boolean functions with significant applications in learning theory, the theory of pseudorandomness, and random graph theory. We prove a composition theorem for the conjecture, broadly expanding the class of functions for which the conjecture is known to be true.
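The flavor of the monotonicity-testing problem can be conveyed by the classical non-adaptive edge tester (a textbook algorithm, not the new bounds proved in the thesis): sample random hypercube edges, i.e., pairs of inputs differing in one coordinate, and reject if the function value ever decreases along a sampled edge.

    import random

    def edge_tester(f, n, queries=1000, seed=0):
        """Non-adaptive edge tester for monotonicity of f: {0,1}^n -> {0,1}.

        Samples random hypercube edges and rejects on any violated edge; it always
        accepts monotone functions and rejects functions far from monotone with
        probability depending on the number of queries.
        """
        rng = random.Random(seed)
        for _ in range(queries):
            x = [rng.randint(0, 1) for _ in range(n)]
            i = rng.randrange(n)
            lo, hi = list(x), list(x)
            lo[i], hi[i] = 0, 1
            if f(tuple(lo)) > f(tuple(hi)):    # monotonicity violated along this edge
                return False
        return True

    # Example: a monotone function (majority) versus a clearly non-monotone one.
    n = 9
    majority = lambda x: int(sum(x) > n // 2)
    anti_dictator = lambda x: 1 - x[0]
    print(edge_tester(majority, n))        # True
    print(edge_tester(anti_dictator, n))   # False (with high probability)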
