Publications

2019-12-16

Predicting intraday stock jumps is a significant but challenging problem in finance. Because intraday stock jumps are instantaneous and hard to perceive, studies on their predictability remain limited. This paper proposes a data-driven approach to predict intraday stock jumps using the information embedded in liquidity measures and technical indicators. Specifically, a trading day is divided into a series of 5-minute intervals, and at the end of each interval, the candidate attributes defined by liquidity measures and technical indicators are input into machine learning algorithms to predict the arrival of a stock jump, as well as its direction, in the following 5-minute interval. An empirical study is conducted on the level-2 high-frequency data of 1271 stocks on the Shenzhen Stock Exchange of China to validate our approach. The results provide initial evidence of the predictability of jump arrivals and jump directions using level-2 stock data, as well as the effectiveness of combining liquidity measures and technical indicators in this prediction. We also show that random forest outperforms the other machine learning algorithms considered in building prediction models. Importantly, our study provides a portable data-driven approach that exploits liquidity and technical information from level-2 stock data to predict intraday price jumps of individual stocks.
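The interval-based setup described in the abstract can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the abstract does not state how a jump is detected, so a fixed return threshold stands in for the paper's jump test, and the names `label_jumps` and `make_training_pairs` and the toy feature values are hypothetical. The sketch only shows how attributes observed at the end of one 5-minute interval are paired with the jump label of the following interval.

```python
from typing import List, Sequence, Tuple


def label_jumps(returns: Sequence[float], threshold: float) -> List[int]:
    """Label each interval: +1 upward jump, -1 downward jump, 0 no jump.

    ASSUMPTION: a fixed return threshold is a placeholder; the abstract
    does not say which jump-detection test the paper uses.
    """
    labels = []
    for r in returns:
        if r > threshold:
            labels.append(1)
        elif r < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


def make_training_pairs(
    features: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[Tuple[Sequence[float], int]]:
    """Pair the attributes seen at the end of interval t with the label of t+1."""
    if len(features) != len(labels):
        raise ValueError("features and labels must cover the same intervals")
    return [(features[t], labels[t + 1]) for t in range(len(labels) - 1)]


# Toy data: five consecutive 5-minute intervals of one trading day.
returns = [0.001, -0.002, 0.015, 0.000, -0.020]
# Hypothetical attributes per interval (e.g. one liquidity measure, one indicator).
features = [[0.1, 1.0], [0.2, 0.9], [0.3, 0.8], [0.4, 0.7], [0.5, 0.6]]

labels = label_jumps(returns, threshold=0.01)
pairs = make_training_pairs(features, labels)
```

The resulting pairs could then be passed to any classifier, for example a random forest, which the abstract reports as the best-performing algorithm in the study.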

Quantitative Finance (2019).

2019-11-09

Hien Nguyen

ML / AI

Artificial intelligence and machine learning in nephropathology

Artificial intelligence (AI) for the purpose of this review is an umbrella term for technologies emulating a nephropathologist’s ability to extract information on diagnosis, prognosis, and therapy responsiveness from native or transplant kidney biopsies. Although AI can be used to analyze a wide variety of biopsy-related data, this review focuses on whole slide images traditionally used in nephropathology. AI applications in nephropathology have recently become available through several advancing technologies, including (i) widespread introduction of glass slide scanners, (ii) data servers in pathology departments worldwide, and (iii) through greatly improved computer hardware to enable AI training. In this review, we explain how AI can enhance the reproducibility of nephropathology results for certain parameters in the context of precision medicine using advanced architectures, such as convolutional neural networks, that are currently the state of the art in machine learning software for this task. Because AI applications in nephropathology are still in their infancy, we show the power and potential of AI applications mostly in the example of oncopathology. Moreover, we discuss the technological obstacles as well as the current stakeholder and regulatory concerns about developing AI applications in nephropathology from the perspective of nephropathologists and the wider nephrology community. We expect the gradual introduction of these technologies into routine diagnostics and research for selective tasks, suggesting that this technology will enhance the performance of nephropathologists rather than making them redundant.

Kidney International 98, no. 1 (2020): 65-75.

2019-11-03

Arjun Mukherjee

Natural Language Processing

On the dynamics of user engagement in news comment media

Many news outlets allow users to contribute comments on topics about daily world events. News articles are the seeds that spark users' interest in contributing content, that is, comments. A news outlet may allow users to comment on all of its articles or on a selected number of them. The topic of an article may lead to apathetic commenting activity (several tens of comments) or to spontaneous, fervent activity (several thousands of comments). This environment creates a social dynamic that is little studied. The social dynamics around articles have the potential to reveal interesting facets of the user population at a news outlet. In this paper, we report the salient findings about these social media from 15 months' worth of data collected from 17 news outlets, comprising over 38,000 news articles and about 21 million user comments. Analysis of the data reveals interesting insights, such as an uneven relationship between news outlets and their user populations. To our knowledge, these and other observations have not been reported before. We believe our analysis in this paper can contribute to news predictive analytics (e.g., predicting user reaction to a news article or the volume of comments posted to an article).

WIREs, Volume 10, Issue 1 (January/February 2020)

2019-10-03

Gopal Pandurangan

Scientific Computing

Efficient Distributed Community Detection in the Stochastic Block Model

Designing effective algorithms for community detection is an important and challenging problem in large-scale graphs, studied extensively in the literature. Various solutions have been proposed, but many of them are centralized with expensive procedures (requiring full knowledge of the input graph) and have a large running time. In this paper, we present a distributed algorithm for community detection in the stochastic block model (also called the planted partition model), a widely studied and canonical random graph model for community detection and clustering. Our algorithm, called CDRW (Community Detection by Random Walks), is based on random walks and is localized, lightweight, and easy to implement. A novel feature of the algorithm is that it uses the concept of local mixing time to identify the community around a given node. We present a rigorous theoretical analysis showing that the algorithm can accurately identify the communities in the stochastic block model, and we characterize the model parameters for which the algorithm works. We also present experimental results that validate our theoretical analysis. Finally, we analyze the performance of our distributed algorithm under the CONGEST distributed model as well as the k-machine model, a model for large-scale distributed computations, and show that it can be efficiently implemented.

2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

2019-10-02

Kevin Bassler

ML / AI

Reduced network extremal ensemble learning (RenEEL) scheme for community detection in complex networks

We introduce an ensemble learning scheme for community detection in complex networks. The scheme uses a Machine Learning algorithmic paradigm we call Extremal Ensemble Learning. It uses iterative extremal updating of an ensemble of network partitions, which can be found by a conventional base algorithm, to find a node partition that maximizes modularity. At each iteration, core groups of nodes that are in the same community in every ensemble partition are identified and used to form a reduced network. Partitions of the reduced network are then found and used to update the ensemble. The smaller size of the reduced network makes the scheme efficient. We use the scheme to analyze the community structure in a set of commonly studied benchmark networks and find that it outperforms all other known methods for finding the partition with maximum modularity.
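The reduction step described in the abstract (finding core groups and collapsing them into a reduced network) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names `core_groups` and `reduced_network`, the dictionary representation of a partition, and the toy graph are all hypothetical, and the base algorithm and the extremal ensemble update are omitted.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Set, Tuple

# ASSUMPTION: a partition is represented as a mapping node -> community label.
Partition = Dict[Hashable, int]


def core_groups(ensemble: Sequence[Partition]) -> List[Set[Hashable]]:
    """Group nodes that share a community in every partition of the ensemble."""
    groups: Dict[Tuple[int, ...], Set[Hashable]] = defaultdict(set)
    for node in ensemble[0]:
        # Two nodes are always together exactly when their label tuples match.
        signature = tuple(partition[node] for partition in ensemble)
        groups[signature].add(node)
    return list(groups.values())


def reduced_network(
    edges: Sequence[Tuple[Hashable, Hashable, float]],
    groups: Sequence[Set[Hashable]],
) -> Dict[Tuple[int, int], float]:
    """Collapse each core group into one super-node, summing edge weights.

    Edges inside a group become self-loops on its super-node.
    """
    group_of = {node: i for i, group in enumerate(groups) for node in group}
    weights: Dict[Tuple[int, int], float] = defaultdict(float)
    for u, v, w in edges:
        a, b = sorted((group_of[u], group_of[v]))
        weights[(a, b)] += w
    return dict(weights)


# Toy ensemble of two partitions over five nodes.
ensemble = [
    {1: 0, 2: 0, 3: 0, 4: 1, 5: 1},
    {1: 0, 2: 0, 3: 1, 4: 1, 5: 1},
]
edges = [(1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0), (4, 5, 1.0), (1, 3, 1.0)]

groups = core_groups(ensemble)
reduced = reduced_network(edges, groups)
```

In this toy case node 3 is assigned differently by the two partitions, so it stays a singleton, and the five-node graph shrinks to three super-nodes. Keeping within-group edges as self-loops is a design choice of this sketch, so that total edge weight is preserved in the smaller network.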

Scientific Reports 9, 14234 (2019).

2019-09-12

Guoning Chen

Visualization

Integral Curve Clustering and Simplification for Flow Visualization: A Comparative Evaluation

Unsupervised clustering techniques have been widely applied to flow simulation data to alleviate clutter and occlusion in the resulting visualization. However, there is an absence of systematic guidelines for users to evaluate (both quantitatively and visually) the appropriate clustering technique and similarity measures for streamline and pathline curves. In this work, we provide an overview of a number of prevailing curve clustering techniques. We then perform a comprehensive experimental study to qualitatively and quantitatively compare these clustering techniques coupled with popular similarity measures used in the flow visualization literature. Based on our experimental results, we derive empirical guidelines for selecting the appropriate clustering technique and similarity measure given the requirements of the visualization task. We believe our work will inform the task of generating meaningful reduced representations for large-scale flow data and inspire the continuous investigation of a more refined guidance on clustering technique selection.

IEEE Transactions on Visualization and Computer Graphics (2019).

2019-08-30

Ioannis Kakadiaris

Image Analysis

3D Face Reconstruction From Volumes of Videos Using a Mapreduce Framework

As video blogs become popular with the general public, egocentric videos generate enormous volumes of video data that capture a large number of interpersonal social events. There are significant challenges in retrieving rich social information, such as human identities, emotions, and other interaction information, from these massive video data. Few methods have been proposed so far to address the issue of unlabeled data. In this paper, we present a fully automatic system that retrieves both a sparse 3D facial shape and a dense 3D face, from which more face-related information can be predicted during social communication. First, we localize facial landmarks from 2D videos and retrieve the sparse 3D shape from motion. Second, we apply the retrieved sparse 3D shape as a prior estimate of the dense 3D face mesh. To deal with big social videos in a scalable manner, we design the proposed system on a Map/Reduce framework. Tested on the FEI and BU-4DFE face datasets, we improve time efficiency by 92% and 73%, respectively, without accuracy loss.

IEEE Access (Volume 7).

2019-08-27

Scientific Computing

Assemblies of calcium/calmodulin-dependent kinase II with actin and their dynamic regulation by calmodulin in dendritic spines

Calcium/calmodulin-dependent kinase II (CaMKII) plays a key role in the plasticity of dendritic spines. Calcium signals cause calcium−calmodulin to activate CaMKII, which leads to remodeling of the actin filament (F-actin) network in the spine. We elucidate the mechanism of the remodeling by combining computer simulations with protein array experiments and electron microscopic imaging, to arrive at a structural model for the dodecameric complex of CaMKII with F-actin. The binding interface involves multiple domains of CaMKII. This structure explains the architecture of the micrometer-scale CaMKII/F-actin bundles arising from the multivalence of CaMKII. We also show that the regulatory domain of CaMKII may bind either calmodulin or F-actin, but not both. This frustration, along with the multipartite nature of the binding interface, allows calmodulin transiently to strip CaMKII from actin assemblies so that they can reorganize. This observation therefore provides a simple mechanism by which the structural dynamics of CaMKII establishes the link between calcium signaling and the morphological plasticity of dendritic spines.

Proceedings of the National Academy of Sciences September 2019, 116 (38).

2019-06-12

Pietro Milillo

Image Analysis, Scientific Computing

Pre-Collapse Space Geodetic Observations of Critical Infrastructure: The Morandi Bridge, Genoa, Italy

We present a methodology for the assessment of possible pre-failure bridge deformations, based on Synthetic Aperture Radar (SAR) observations. We apply this methodology to obtain a detailed 15-year survey of the Morandi bridge (Polcevera Viaduct) in the form of relative displacements across the structure prior to its collapse on August 14th 2018. We generated a displacement map for the structure from space-based SAR measurements acquired by the Italian constellation COSMO-SkyMed and the European constellation Sentinel-1A/B over the period 2009–2018. Historical satellite datasets include Envisat data spanning 2003–2011. The map reveals that the bridge was undergoing an increased magnitude of deformations over time prior to its collapse. This technique shows that the deck next to the collapsed pier was characterized since 2015 by increasing relative displacements. The COSMO-SkyMed dataset reveals the increased deformation magnitude over time of several points located near the strands of this deck between 12th March 2017 and August 2018.

2019-06-05

Arjun Mukherjee

Natural Language Processing

Pro/Con: Neural Detection of Stance in Argumentative Opinions

Accurate information from both sides of contemporary issues is known to be an ‘antidote in confirmation bias’. While this type of information helps educators improve vital skills, including critical thinking and open-mindedness, it is relatively rare and hard to find online. With the well-researched argumentative opinions (arguments) on controversial issues shared by Procon.org in a non-partisan format, detecting the stance of arguments is a crucial step toward automating the organization of such resources. We use a universal pretrained language model with a weight-dropped LSTM neural network to leverage the context of an argument for stance detection on the proposed dataset. Experimental results show that the dataset is challenging; however, using the pretrained language model fine-tuned on context information yields a general model that beats the competitive baselines. We also provide analysis to find the segments of an argument that are informative for our stance detection model, and we investigate the relationship between the sentiment of an argument and its stance.

Social, Cultural, and Behavioral Modeling, SBP-BRiMS (2019).

Hewlett Packard Enterprise Data Science Institute

Publications

Predicting intraday jumps in stock prices using liquidity measures and technical indicators