Publications

The recently released and updated Pre-K–12 Guidelines for Assessment and Instruction in Statistics Education (GAISE II; Bargagliotti et al., 2020) provides guidance on how teachers can support the development of data literacy for all students in the pre-K–12 curriculum. However, to truly meet the vision of the GAISE II report and to support all students in developing data literacy for today’s societies, significant transformations must be made to the educational system as a whole to build capacity for such development. In this article we discuss the current state of the K–12 curriculum, focusing on the mathematics curriculum, where statistics and data concepts are most frequently situated, and present both challenges and promising examples. We then discuss areas where capacity must be built at all levels, including the K–12 school curriculum, K–12 teacher professional development, K–12 teacher preparation, statistics and data science education research, and policy. We also provide a set of recommendations for building capacity to develop the data literacy of all students through the teaching of data science and statistics concepts and practices in the K–12 mathematics curriculum, in support of democratic equity through engaged citizenship.

Statistical literacy is key in this heavily polarized information age: an informed and critical citizenry must be able to make sense of arguments in the media and society. The responsibility for developing statistical literacy is often left to the K-12 mathematics curriculum. In this article, we discuss our investigation of the opportunities to learn statistics that state mathematics standards currently create for K-8 students. We analyze the standards for alignment with the Guidelines for Assessment and Instruction in Statistics Education (GAISE II) PreK-12 report and summarize the conceptual themes that emerged. We found that while states provide K-8 students opportunities to analyze and interpret data, they offer few opportunities for students to engage in formulating questions and collecting/considering data. We discuss the implications of these findings and provide recommendations for policy makers, standards writers, and researchers.

In the Anthropocene, statistics, data science, and mathematical models have become a perversion of reality that society has largely chosen to ignore, even as they are embraced as saviors because people often view numbers as objective purveyors of truth. However, numbers do not interpret themselves; they do not tell their own story. People do that, in all their subjective glory. In this chapter, I start by making connections between the Anthropocene and the disciplines of statistics and data science, specifically through the context of spatial data. From this discussion I focus on two main points, which I connect to education. The first is that there is a dialectic tension in spatial data inquiry between creating new realities using spatial data and using spatial data to make sense of our reality. The second is that people can choose how to investigate and use spatial data based on their ethics. I believe students should have opportunities to investigate and use spatial statistics through a spatial justice lens, both to learn about the world around them and to shape it.

Computer Aided Diagnosis (CAD) systems for renal histopathology aim to understand and replicate nephropathologists’ assessments of individual morphological compartments (e.g., glomeruli) in order to render case-level histological diagnoses. Deep neural networks (DNNs) hold great promise for addressing the poor intra- and interobserver agreement between pathologists. That said, the generalization ability of DNNs depends heavily on the quality and quantity of training labels. Current “consensus” labeling strategies require multiple pathologists to evaluate every compartment unit over thousands of crops, resulting in enormous annotation costs. Additionally, these techniques fail to address the underlying reproducibility issues observed across various diagnostic feature assessment tasks. To address both limitations, we introduce MorphSet, an end-to-end architecture inspired by Set Transformers which maps the combined encoded representations of Monte Carlo (MC) sampled glomerular compartment crops to case-level Whole Slide Image (WSI) predictions, without the need for expensive fine-grained morphological feature labels. To evaluate performance, we use a kidney transplant Antibody Mediated Rejection (AMR) dataset and show that we achieve 98.9% case-level accuracy, outperforming the consensus-label baseline. Finally, we generate a visualization of prediction confidence derived from our MC evaluation experiments, which provides physicians with valuable feedback.
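The central set-based idea, pooling an unordered collection of Monte Carlo sampled crop embeddings into a single case-level prediction, can be sketched as follows. This is a deliberately simplified illustration, not the MorphSet architecture itself: the toy encoder, the single-vector attention pooling, and the linear classifier are all stand-ins for the paper's learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_crops(crops):
    # Stand-in for a learned CNN encoder: map each crop to an embedding.
    return crops.mean(axis=(1, 2))  # (n_crops, channels)

def attention_pool(embeddings, w):
    # Permutation-invariant pooling: a softmax-weighted sum of the set
    # elements, loosely in the spirit of Set Transformer pooling.
    scores = embeddings @ w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ embeddings

# Toy "case": 16 MC-sampled glomerular crops, 32x32 pixels, 8 channels.
crops = rng.normal(size=(16, 32, 32, 8))
emb = encode_crops(crops)            # (16, 8) set of crop embeddings
w = rng.normal(size=8)               # hypothetical pooling parameters
case_vec = attention_pool(emb, w)    # (8,) case-level representation
logit = case_vec @ rng.normal(size=8)
pred = 1.0 / (1.0 + np.exp(-logit))  # case-level probability (e.g., AMR)
```

Because the pooling is a weighted sum over set elements, the prediction is unchanged under any reordering of the crops, which is what lets a model of this kind consume an arbitrary sampled set per case.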

The novel coronavirus has forced the world to interact with data visualizations in order to make individual-level decisions that sometimes have grave consequences. As a result, the lack of statistical literacy among the general public, as well as among organizations responsible for sharing accurate, clear, and timely information with the public, has resulted in widespread (mis)representations and (mis)interpretations. In this article, we showcase examples of how data related to the COVID-19 pandemic have been (mis)represented in the media and by governmental agencies and discuss plausible reasons why. We then build on these examples to draw connections to how they could be used to enhance statistics teaching and learning, especially in secondary and introductory tertiary statistics and quantitative reasoning coursework.

Popular computational catalyst design strategies rely on the identification of reactivity descriptors, which can be used along with Brønsted−Evans−Polanyi (BEP) and scaling relations as input to a microkinetic model (MKM) to predict activity or selectivity trends. The main benefit of this approach is the inherent dimensionality reduction of the large material space to just a few catalyst descriptors. However, it is well documented that a small set of descriptors is insufficient to capture the intricacies and complexities of a real catalytic system. The inclusion of coverage effects through lateral adsorbate−adsorbate interactions can narrow the gap between simplified descriptor predictions and real systems, but mean-field MKMs cannot properly account for local coverage effects. This shortcoming of the mean-field approximation can be rectified by switching to a lattice-based kinetic Monte Carlo (kMC) method that uses a cluster-expansion representation of adsorbate−adsorbate lateral interactions.

Using the prototypical CO oxidation reaction as an example, we critically evaluate the benefits of kMC over MKM in terms of trend predictions and computational cost when using only a small set of input parameters. After confirming that in the absence of lateral interactions the kMC and MKM approaches yield identical trends and mechanistic information, we observed substantial differences between the two kinetic models when lateral interactions were introduced. The mean-field implementation applies coverage corrections directly to the descriptors, causing an artificial overprediction of the activity of strongly binding metals. In contrast, the cluster expansion in the kMC implementation can differentiate among the highly active metals, but it is very sensitive to the set of included interaction parameters. Considering that computational screening relies on a minimal set of descriptors, for which MKM makes reasonable trend predictions at a computational cost roughly three orders of magnitude lower than kMC, the MKM approach provides a better entry point for computational catalyst design.
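To make the mean-field side of this comparison concrete, a minimal MKM for CO oxidation (CO adsorption/desorption, dissociative O2 adsorption, and surface reaction) can be integrated to steady state from coverage ODEs. The rate constants and pressures below are arbitrary illustrative values, not parameters from the study, and no lateral interactions or BEP/scaling input are included.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rate constants and pressures (arbitrary units):
# CO(g) + *  <-> CO*        (k_ads, k_des)
# O2(g) + 2* -> 2 O*        (k_o2)
# CO* + O*   -> CO2(g) + 2* (k_rxn)
k_ads, k_des, k_o2, k_rxn = 1.0, 0.5, 0.8, 2.0
p_co, p_o2 = 1.0, 0.5

def rhs(t, theta):
    # Mean-field coverage ODEs; th_free is the fraction of empty sites.
    th_co, th_o = theta
    th_free = 1.0 - th_co - th_o
    r_ads = k_ads * p_co * th_free
    r_des = k_des * th_co
    r_o2 = k_o2 * p_o2 * th_free ** 2
    r_rxn = k_rxn * th_co * th_o
    return [r_ads - r_des - r_rxn, 2.0 * r_o2 - r_rxn]

# Integrate from a clean surface until the coverages stop changing.
sol = solve_ivp(rhs, (0.0, 500.0), [0.0, 0.0], method="LSODA", rtol=1e-8)
th_co, th_o = sol.y[:, -1]
tof = k_rxn * th_co * th_o  # steady-state CO2 turnover frequency
```

A kMC treatment of the same chemistry would instead track which lattice sites are occupied, so the reaction rate would depend on actual CO*/O* adjacency rather than on the product of average coverages used here.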

The concept of a connected world using the Internet of Things (IoT) has already gathered pace during this decade. Efficient hardware and high-throughput networks have made it possible to connect billions of devices that collect and transmit usable information. The benefit of IoT devices is that they enable automation; however, a significant amount of energy is required for billions of connected devices to communicate with each other. This energy requirement, unless managed, can be one of the barriers to the complete implementation of IoT systems. This paper presents an energy management system for IoT devices that considers both hardware and software aspects. Energy transparency is achieved by modelling the energy consumed during sensing, processing, and communication. A multi-agent system is introduced to model the IoT devices and their energy consumption, and a genetic algorithm is used to optimize the parameters of the multi-agent system. Finally, simulation tools such as MATLAB Simulink and OpenModelica are used to test the system. The optimization results reveal a substantial reduction in energy consumption with the implementation of the decentralized intelligence of the multi-agent system.
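As an illustration of the optimization step, the sketch below uses a simple genetic algorithm (truncation selection, blend crossover, Gaussian mutation) to tune three hypothetical agent parameters against a toy per-device energy model. The cost function and parameter names are invented for illustration; the multi-agent energy model in the paper is far richer.

```python
import numpy as np

rng = np.random.default_rng(42)

def energy_cost(params):
    # Hypothetical per-device energy model with sensing, processing, and
    # communication terms; purely illustrative, minimum cost is 0.1.
    sense, proc, comm = params
    return sense**2 + 0.5 * proc**2 + 2.0 * (comm - 0.3)**2 + 0.1

def genetic_algorithm(cost, dim=3, pop=40, gens=60, mut=0.05):
    population = rng.uniform(0.0, 1.0, size=(pop, dim))
    for _ in range(gens):
        fitness = np.array([cost(p) for p in population])
        parents = population[np.argsort(fitness)[: pop // 2]]  # selection
        n_child = pop - len(parents)
        a = parents[rng.integers(0, len(parents), n_child)]
        b = parents[rng.integers(0, len(parents), n_child)]
        alpha = rng.uniform(size=(n_child, 1))
        children = alpha * a + (1.0 - alpha) * b               # crossover
        children += rng.normal(0.0, mut, children.shape)       # mutation
        population = np.vstack([parents, np.clip(children, 0.0, 1.0)])
    fitness = np.array([cost(p) for p in population])
    best = np.argmin(fitness)
    return population[best], fitness[best]

best_params, best_cost = genetic_algorithm(energy_cost)
```

In a multi-agent setting, `energy_cost` would be replaced by a simulation run of the agent population, with each candidate parameter vector scored by the total energy the simulated devices consume.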

To manage the COVID-19 pandemic, the development of rapid, selective, and sensitive diagnostic systems for early-stage detection of β-coronavirus severe acute respiratory syndrome (SARS-CoV-2) virus proteins is emerging as a necessary response to generate the bioinformatics needed for efficient smart diagnostics, therapy optimization, and investigation of higher-efficacy therapies. Experts recommend such diagnostic systems in order to achieve the mass and targeted SARS-CoV-2 detection required to manage the pandemic through an understanding of infection progression and timely therapy decisions. To achieve these tasks, there is scope for developing smart sensors that rapidly and selectively detect SARS-CoV-2 proteins at the picomolar level. Because COVID-19 infection spreads through human-to-human transmission, diagnostics are needed at the point of care (POC), without the need for experienced personnel or sophisticated laboratories. With these considerations in mind, we propose to explore a compartmentalization approach by designing and developing nanoenabled miniaturized electrochemical biosensors that detect the SARS-CoV-2 virus at the site of an outbreak as the best way to manage the pandemic. Such a POC sensing approach to COVID-19 diagnostics can be interfaced with the Internet of Things and artificial intelligence (AI) techniques (such as machine learning and deep learning for diagnostics) to extract useful informatics via data storage, sharing, and analytics. With COVID-19 management challenges in mind, this review presents a collective approach involving electrochemical SARS-CoV-2 biosensing supported by AI to generate the bioinformatics needed for early-stage COVID-19 diagnosis, correlation of viral load with pathogenesis, understanding of pandemic progression, therapy optimization, POC diagnostics, and disease management in a personalized manner.

Previous efforts to understand structure-function relationships in high ionic conductivity materials for solid-state batteries have predominantly relied on density functional theory (DFT)-based ab initio molecular dynamics (MD). Such simulations, however, are computationally demanding and cannot reasonably be applied to large systems containing more than a hundred atoms. Here, an artificial neural network (ANN) is trained to accelerate the calculation of the high-accuracy atomic forces and energies used during such MD simulations. After carefully training a robust ANN for four- and five-element systems, nearly identical lithium ion diffusivities are obtained for Li10GeP2S12 (LGPS) when benchmarking the ANN-MD results against DFT-MD. Applying the ANN-MD approach, the effect of chlorine doping on lithium diffusivity is calculated in an LGPS-like structure, and a dopant concentration of 1.3% is found to maximize ionic conductivity. The optimal concentration balances the competing effects of effective atomic radii and dielectric constants on lithium diffusion and agrees with the experimental composition. Performing simulations at the resolution necessary to model experimentally relevant and optimal concentrations would be infeasible with traditional DFT-MD. With the proposed ANN-MD framework, systems that require a large number of simulated atoms can be studied more efficiently while maintaining high accuracy.
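The core surrogate idea, training a neural network to reproduce reference energies and then differentiating it to obtain forces, can be sketched with a toy one-dimensional potential. Here a Lennard-Jones pair energy stands in for the DFT training data and scikit-learn's MLPRegressor stands in for the paper's ANN; a production ANN-MD force field would use descriptors of full atomic environments and analytic gradients rather than a raw distance and finite differences.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def ref_energy(r):
    # Toy "reference" potential: a Lennard-Jones pair energy with its
    # minimum at r = 1, standing in for DFT energies.
    return (1.0 / r) ** 12 - 2.0 * (1.0 / r) ** 6

# Sample pair distances and their reference energies as training data.
r_train = rng.uniform(0.9, 2.0, size=(512, 1))
e_train = ref_energy(r_train).ravel()

# Small MLP surrogate for the potential energy surface.
model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=5000,
                     random_state=0, tol=1e-7)
model.fit(r_train, e_train)

def ann_force(r, eps=1e-4):
    # Force as the negative numerical derivative of the surrogate energy.
    r = np.atleast_2d(r)
    return -(model.predict(r + eps) - model.predict(r - eps)) / (2.0 * eps)
```

In an actual ANN-MD loop, `ann_force`-style predictions would drive the integrator at each timestep, at a small fraction of the cost of a DFT force evaluation.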

Information on the 5d level centroid shift (ɛc) of rare-earth ions is critical for determining the chemical shift and the Coulomb repulsion parameter as well as predicting the luminescence and thermal response of rare-earth substituted inorganic phosphors. The magnitude of ɛc depends on the binding strength between the rare-earth ion and its coordinating ligands, which is difficult to quantify a priori and makes phosphor design particularly challenging. In this work, a tree-based ensemble learning algorithm employing extreme gradient boosting is trained to predict ɛc by analyzing the optical properties of 160 Ce3+ substituted inorganic phosphors. The experimentally measured ɛc of these compounds was featurized using the materials' relative permittivity (ɛr), average electronegativity, average polarizability, and local geometry. Because the number of reported ɛr values is limited, it was necessary to utilize a predicted relative permittivity (ɛr,SVR) obtained from a support vector regressor trained on data from ∼2800 density functional theory calculations. The remaining features were compiled from open-source databases and by analyzing the rare-earth coordination environment from each Crystallographic Information File. The resulting ensemble model could reliably estimate ɛc and provide insight into the optical properties of Ce3+-activated inorganic phosphors.
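The modeling pipeline described above can be sketched end to end with a tree-based boosted ensemble. The data below are synthetic stand-ins for the paper's 160-compound dataset and its features (predicted relative permittivity, average electronegativity, average polarizability, and a local-geometry descriptor), and scikit-learn's GradientBoostingRegressor stands in for the extreme gradient boosting implementation used in the work.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
# Synthetic feature matrix; ranges and the target formula are illustrative
# only and do not reflect real phosphor chemistry.
X = np.column_stack([
    rng.uniform(2.0, 12.0, n),  # predicted relative permittivity (eps_r,SVR)
    rng.uniform(1.5, 3.5, n),   # average electronegativity
    rng.uniform(1.0, 8.0, n),   # average polarizability
    rng.uniform(4.0, 12.0, n),  # local-geometry descriptor (coordination)
])
# Toy centroid-shift-like target with a little noise.
y = 0.2 * X[:, 0] + 0.5 * X[:, 2] - 0.1 * X[:, 3] + rng.normal(0, 0.1, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingRegressor(n_estimators=300, max_depth=3,
                                  learning_rate=0.05, random_state=0)
model.fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)  # held-out R^2 of the ensemble
```

After fitting, `model.feature_importances_` indicates which descriptors the ensemble leans on, which is one way such models provide insight beyond point predictions of ɛc.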