The PHONOLEX project aims to systematically collect the pronunciation forms ('citation forms') of all word tokens contained in the BAS speech corpora, together with a basic coverage of the German language. Systematic differences in pronunciation coding that stem from different projects are harmonized, and the transcriptions are verified against a defined set of transcription rules (see Rules of German transcription). The results of this ongoing project are stored in a lexicon database (ASCII or XML) that contains, for each token, the orthographic form, the standardized citation form, additional linguistic features (optional, as available), the origin, the transcription method (automatic or manual) and, optionally, empirically detected pronunciation variants.
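The record layout described above can be pictured with a minimal sketch. It is not the actual PHONOLEX schema: the class name `LexiconEntry`, all field names, the origin label and the sample transcription are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """One lexicon record. Field names are hypothetical, not the PHONOLEX schema."""

    orthography: str        # orthographic form of the token
    citation_form: str      # standardized citation form
    origin: str             # where the entry comes from
    method: str             # transcription method: "automatic" or "manual"
    features: Dict[str, str] = field(default_factory=dict)  # optional linguistic features
    variants: List[str] = field(default_factory=list)       # optional observed variants

    def __post_init__(self) -> None:
        # The text names exactly two transcription methods.
        if self.method not in ("automatic", "manual"):
            raise ValueError(f"unknown transcription method: {self.method!r}")


# Illustrative values only; the transcription and origin label are assumed.
entry = LexiconEntry(
    orthography="Haus",
    citation_form="haUs",
    origin="example_corpus",
    method="manual",
)
```

The two optional fields default to empty containers, which mirrors the description: linguistic features and pronunciation variants are present only when available.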
The PHONOLEX database can be obtained for commercial or scientific studies from the BAS (see link).