Data Assessment and Readiness for AI

1st International Workshop on Data Assessment and Readiness for AI

Held in conjunction with the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

11-14 May, 2021, New Delhi, India

Important Notice: The safety and well-being of all workshop participants is our priority. Depending on the COVID-19 situation, the workshop will be held with PAKDD either in person in New Delhi, India, as planned, or as an online event.

In the last several years, AI/ML technologies have become pervasive in academia and industry, finding use in new and challenging applications. While much effort has gone into building better, smarter, and more automated AI pipelines, little work has systematically examined how to determine whether data is ready to be fed into these pipelines. Given a business problem, several questions remain open: How does one select the right data from a data source? Is the collected data of appropriate quality? If not, which cleaning techniques should be applied, and how can one determine whether the goals of data cleaning have been achieved? Researchers and practitioners alike increasingly recognize that the real-world utility of an ML model is only as good as the data it was trained on. Developing techniques and frameworks that determine the readiness of data for training and deploying machine learning models is therefore of utmost importance.

Important Dates

Paper Submission: Feb 7, 2021 (extended deadline)
Author Notification: Feb 22, 2021
Camera-Ready Submission: Mar 8, 2021
All deadlines are at 23:59 Pacific Standard Time (PST).

Call for Papers

Workshop Scope

The goal of this workshop is to bring together researchers working on data acquisition, data labeling, data quality, data preparation, and AutoML to understand how detecting and remediating data issues helps build better models. Because methods of data assessment can vary with the modality of the data, the workshop invites researchers from academia and industry to submit novel approaches for systematically identifying and mitigating data issues to make data AI-ready, across modalities such as structured (tabular) data, unstructured data (such as text), graph-structured (relational, network) data, and time series data. We are also interested in how state-of-the-art deep learning and AI techniques, such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks, and adversarial learning, can address the problems of data assessment and readiness.

Topics of Interest

  • Algorithms for explainable data quality detection and remediation for ML
  • Automated data cleaning workflows with explanations
  • Smarter data visualizations for high-dimensional data
  • Automatic labeling of datasets from a small set of labeled data
  • Label noise detection, explanation, and feedback incorporation
  • Incorporating domain knowledge for data cleaning and data transformations
  • Data privacy and encryption techniques, and their impact on ML pipelines
  • Automatic ordering of datasets by difficulty level, with explanations
  • Outlier (or anomaly) detection and mitigation in data
  • Detection of bias in data
  • Handling corrupted, missing and uncertain data
  • Noisy data evaluation and cleaning recommendations
  • Syntactic data validation

Submission Instructions

Authors are invited to submit original, previously unpublished research papers of up to 10 pages describing novel research work, including results and evaluations.

Papers must be written in English and follow the Springer LNCS style for all text, references, appendices, and figures. Since the review process is single-blind, please include author names and affiliations. For formatting instructions and templates, see the Springer web page (LNCS Template Overleaf). Each submitted paper will be evaluated by at least three members of the international program committee. At least one author of each accepted paper must register for and attend the workshop to present the paper. Workshop papers will be included in the LNCS/LNAI post-proceedings of the PAKDD Workshops, published by Springer.


Submissions should be made through the EasyChair system via the submission page available here:

Submitted papers must not have been previously published and must not be under consideration at any other conference or journal during the workshop's review process.


To be announced

Organizing Committee

Program Committee

To be announced.

Contact Information

For any queries, reach out to us at