Research: Main Fields
General framework
Our main field of research is
the statistical processing of qualitative and
textual data. A typical case is a very
large sample survey data set comprising both
closed-ended and open-ended questions. This
statistical processing, which comes upstream of most
statistical modelling techniques, mainly concerns
large batteries of qualitative data and large
corpora of textual data. The work consists of
devising new techniques of analysis together with the
corresponding validation or assessment tools,
scrutinizing their uses and (possible, probable)
misuses, and exploring new fields of
investigation.
1 - Textual Data Analysis
Statistical processing of text
corpora and of complex data sets comprising both
numerical and textual data. Applications concern
primarily the processing of responses to open-ended
questions in socio-economic sample
surveys.
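As an illustration of the kind of data structure involved, the following minimal sketch builds a lexical table that crosses the words of open-ended responses with the categories of a closed question. The records, the `tokenize` and `lexical_table` helpers, and the frequency threshold are hypothetical choices made for this example, not a description of any particular software.

```python
import re
from collections import Counter, defaultdict

# Hypothetical survey records: (closed-question category, open-ended response).
records = [
    ("under_30", "Good job, good salary"),
    ("under_30", "A job and friends"),
    ("over_60", "Good health and family"),
    ("over_60", "Health, family, peace"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_table(records, min_freq=2):
    """Return (words, table): per-category counts of the frequent words."""
    counts = defaultdict(Counter)
    total = Counter()
    for category, response in records:
        tokens = tokenize(response)
        counts[category].update(tokens)
        total.update(tokens)
    # Keep only words whose overall frequency reaches the threshold.
    words = sorted(w for w, n in total.items() if n >= min_freq)
    table = {cat: [c[w] for w in words] for cat, c in counts.items()}
    return words, table


words, table = lexical_table(records)
```

Each row of `table` is the lexical profile of one category of respondents. A table of this kind is a contingency table and can be passed to a method such as correspondence analysis.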
2 - Methodology of sample surveys in social sciences and economics
Survey techniques in social
sciences. Controlling data quality. Nonresponse
and response rates in random and quota sample
surveys. Techniques of statistical matching, survey
grafting, ascription, and missing value imputation.
Strategy of survey data processing.
3 - A priori Structures in Data Analysis
Dealing with a priori
structures in exploratory data analysis (spatial
data, longitudinal data, meta-data, exogenous
information). Such an a priori structure could
be an a posteriori structure, obtained from
a previous phase of analysis performed either on
the same data set, or on a related data set.
Contiguity analysis and related methods.
Classification (clustering) involving contiguity
constraints.
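The idea of clustering under contiguity constraints can be sketched as follows: an agglomerative procedure in which two clusters may merge only if at least one edge of a contiguity graph links them. This is a minimal illustration under stated assumptions (a single numeric variable, a Ward-like merging criterion, and a hypothetical function name `constrained_clustering`), not the specific algorithms studied in this research.

```python
def constrained_clustering(values, edges, n_clusters):
    """Agglomerative clustering in which only contiguous clusters may merge.

    values: dict mapping each item to a numeric value
    edges: pairs (item, item) describing the contiguity graph
    """
    edges = list(edges)
    clusters = {i: [item] for i, item in enumerate(sorted(values))}
    label = {members[0]: i for i, members in clusters.items()}

    def mean(members):
        return sum(values[m] for m in members) / len(members)

    def cost(pair):
        # Ward-like criterion: increase in within-cluster sum of squares.
        p, q = clusters[pair[0]], clusters[pair[1]]
        weight = len(p) * len(q) / (len(p) + len(q))
        return weight * (mean(p) - mean(q)) ** 2

    while len(clusters) > n_clusters:
        # Candidate merges: distinct clusters linked by at least one edge.
        candidates = {
            tuple(sorted((label[a], label[b])))
            for a, b in edges
            if label[a] != label[b]
        }
        if not candidates:
            break  # disconnected contiguity graph: no merge is allowed
        i, j = min(sorted(candidates), key=cost)
        clusters[i].extend(clusters.pop(j))
        for item in clusters[i]:
            label[item] = i
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical data: five sites along a line, contiguous to their neighbours.
values = {"a": 1.0, "b": 1.2, "c": 5.0, "d": 5.1, "e": 1.1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
partition = constrained_clustering(values, edges, n_clusters=2)
```

In this example site `e` has a value close to those of `a` and `b`, but it is contiguous only to `d`, so with two clusters it joins `c` and `d`. An unconstrained procedure would group it with `a` and `b` instead.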
4 - Inference in multidimensional contexts
Validity of results (case of
principal axes methods), assessments of
visualization techniques: classical inference,
resampling techniques (bootstrap, partial
bootstrap, total bootstrap, bootstrapping
variables, cross-validation).
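To make the resampling idea concrete, here is a minimal sketch of a plain nonparametric bootstrap applied to the first eigenvalue of a two-variable principal components analysis. It resamples observations with replacement and recomputes the analysis on each replicate. It does not attempt to reproduce the partial or total bootstrap variants mentioned above. The data, the function names, and the percentile interval are assumptions made for the illustration.

```python
import math
import random


def first_eigenvalue(rows):
    """Largest eigenvalue of the 2x2 covariance matrix of (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows) / n
    syy = sum((y - my) ** 2 for _, y in rows) / n
    sxy = sum((x - mx) * (y - my) for x, y in rows) / n
    # Closed form for the largest eigenvalue of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2
    return half_trace + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)


def bootstrap_interval(rows, n_replicates=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the first eigenvalue."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        # Each replicate draws n observations with replacement.
        first_eigenvalue([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_replicates)
    )
    lo = estimates[int((1 - level) / 2 * n_replicates)]
    hi = estimates[int((1 + level) / 2 * n_replicates) - 1]
    return lo, hi


# Hypothetical observations on two correlated variables.
rows = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1), (6.0, 5.7)]
point = first_eigenvalue(rows)
lo, hi = bootstrap_interval(rows)
```

The width of the interval `(lo, hi)` gives a rough indication of the stability of the first axis. Assessing the positions of points on a factorial plane requires additional care, because axes can change sign or order from one replicate to the next.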
5 - Software for analysing multidimensional categorical data and textual data
Applying the methods of
multivariate descriptive analysis to sample survey
data requires specific implementation and dedicated
software. The software SPAD, conceived by L. Lebart
and A. Morineau, was initially developed as freeware,
up to 1987 (non-profit organization CESIA), in the
spirit of most academic software at that time (free
access to the source code). Microcomputer interfaces
for that software were then developed by a private
company (CISIA, followed by DECISIA), and the
acronym SPAD now designates a commercial
product. Our research is currently implemented in
an academic software package named DtmVic (Data and text
Mining: Visualization, Inference, Classification),
which can be used freely by students and research
scientists.