Data Assessment and Readiness for AI

1st International Workshop on Data Assessment and Readiness for AI

@ Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD)

11-14 May, 2021, New Delhi, India

Important Notice: The safety and well-being of all workshop participants are our priority. Depending on the COVID-19 situation, the workshop will be held with PAKDD either in New Delhi, India, as planned, or as an online event.

In the last several years, AI/ML technologies have become pervasive in academia and industry, finding utility in newer and more challenging applications. While there has been a focus on building better, smarter and more automated AI pipelines, little work has been done to systematically understand the challenges in determining the readiness of the data fed to these pipelines. Given a business problem, several questions still lack clear answers: How does one select the right data from a data source? Is the collected data of appropriate quality? If not, what cleaning techniques should be applied, and how does one determine whether the goals of data cleaning have been achieved? Researchers and practitioners alike have increasingly come to realize that the real-world utility of an ML model is only as good as the data it has been trained on. Therefore, developing techniques and frameworks that help determine the readiness of data for training and deploying machine learning models is of utmost importance.

Important Dates

Paper Submission: Feb 10, 2021 (Extended Deadline)
Author Notification: Feb 22, 2021
Camera-Ready Submission: Mar 8, 2021
All deadlines are at 23:59 Pacific Standard Time (PST).

Call for Papers

Workshop Scope

The goal of this workshop is to bring together researchers working on data acquisition, data labeling, data quality, data preparation and AutoML to understand how detecting and remediating data issues helps build better models. Methods of data assessment can change depending on the modality of the data, so the workshop invites researchers from academia and industry to submit novel propositions for systematically identifying and mitigating data issues, and thereby making data AI-ready, across different modalities: structured (or tabular) data, unstructured data (such as text), graph-structured (relational, network) data, time series data, etc. We would also like to explore how state-of-the-art deep learning and AI concepts such as deep reinforcement learning, graph neural networks, self-supervised learning, capsule networks and adversarial learning can address the problems of data assessment and readiness.

Topics of Interest

  • Algorithms for explainable data quality detection and remediation for ML
  • Automated data cleaning workflows with explanations
  • Smarter data visualizations for high dimensional data
  • Auto-labeling datasets from a small amount of labeled data
  • Label noise detection and explanation, and incorporation of feedback
  • Incorporating domain knowledge for data cleaning and data transformations
  • Data privacy and encryption techniques, and their impact on the ML pipeline
  • Automatic ordering of datasets by difficulty level, with explanations
  • Outlier (or anomaly) detection and mitigation in data
  • Detection of bias in data
  • Handling corrupted, missing and uncertain data
  • Noisy data evaluation and cleaning recommendation
  • Syntactic data validations

Submission Instructions

Authors are invited to submit original, previously unpublished research papers of up to 12 pages describing novel research work, including research results and evaluations. Papers must not have been published, or be under submission, elsewhere.

Papers should be written in English and follow the Springer LNCS style, including all text, references, appendices, and figures. Since the review process is single-blind, please include author names and affiliations. For formatting instructions and templates, see the Springer web page (LNCS Template Overleaf). Submitted papers will be evaluated by at least three members of the international program committee. At least one author of each accepted paper must register and participate in the workshop to present the paper. The workshop papers will be included in the LNCS/LNAI post-proceedings of the PAKDD Workshops, published by Springer.


Submissions should be made via the EasyChair system through the submission page available here:

Authors should consult Springer’s authors’ guidelines and use their proceedings templates, either for LaTeX or for Word, for the preparation of their papers. Springer encourages authors to include their ORCIDs in their papers. In addition, the corresponding author of each paper, acting on behalf of all of the authors of that paper, must complete and sign a Consent-to-Publish form. The corresponding author signing the copyright form should match the corresponding author marked on the paper. Once the files have been sent to Springer, changes relating to the authorship of the papers cannot be made.

The submitted papers must not be previously published anywhere and must not be under consideration by any other conference or journal during the data-datareadiness2021 review process.


Laure Berti-Equille
Research Director in Computer Science at IRD, the French Institute of Research in Sustainability Science
Invited talk on Data curation for ML: Toward a Principled Approach

Laure Berti-Equille has been a Research Director in Computer Science at IRD, the French Institute of Research in Sustainability Science, since 2011. Previously, she was a full Professor at Aix-Marseille University (AMU) in France (2017-2018). From 2014 to 2017, she was a Senior Scientist at Qatar Computing Research Institute (Hamad Bin Khalifa University), a computer science research institute of Qatar Foundation. From 2000 to 2010, she was a tenured Associate Professor at University of Rennes 1 in France, and she spent two years as a visiting researcher at AT&T Labs Research in New Jersey, USA, as a recipient of the prestigious European Marie Curie Outgoing Fellowship (2007-2009). Her research is at the intersection of large-scale data analytics and machine learning, with a focus on data quality and applied research; it includes many collaborations with industry, more than 80 publications and three monographs. She has organized several scientific workshops in conjunction with top-tier conferences such as SIGMOD and VLDB and has given many tutorials and keynote talks (KDD, CIKM, ICDE, ICDM). Laure serves as an associate editor of various scientific journals (VLDB Journal, ACM Journal on Data and Information Quality, and Frontiers in Big Data Science) and has served on many conference program committees (VLDB, SIGMOD, ICDE). She has received various grants from the French Agency for National Research (ANR), the French National Research Council (CNRS), and the European Union.

Organizing Committee

Program Committee

  • Shanmukha C Guttula, IBM Research
  • Aniya Aggarwal, IBM Research
  • Pranay Lohia, IBM Research
  • Vitobha Munigala, IBM Research
  • Ruhi Sharma Mittal, IBM Research
  • Lokesh N, IIT-B
  • Naveen Panwar, IBM Research
  • Kishalay Das, Indian Institute of Science
  • Vishal Saley, Indian Institute of Science
  • Arushi Prakash, Amazon
  • Paarth Gupta, SMVDU

Contact Information

For any queries reach out to us at