Language and Computers
Markus Dickinson, Chris Brew and Detmar Meurers

Brief Contents  7
Contents  9
What This Book Is About  13
Overview for Instructors  15
Acknowledgments  19

1 Prologue: Encoding Language on Computers  21
  1.1 Where do we start?  21
    1.1.1 Encoding language  22
  1.2 Writing systems used for human languages  22
    1.2.1 Alphabetic systems  23
    1.2.2 Syllabic systems  26
    1.2.3 Logographic writing systems  28
    1.2.4 Systems with unusual realization  31
    1.2.5 Relation to language  31
  1.3 Encoding written language  32
    1.3.1 Storing information on a computer  32
    1.3.2 Using bytes to store characters  34
  1.4 Encoding spoken language  37
    1.4.1 The nature of speech  37
    1.4.2 Articulatory properties  38
    1.4.3 Acoustic properties  38
    1.4.4 Measuring speech  40
    Under the Hood 1: Reading a spectrogram  44
    1.4.5 Relating written and spoken language  44
    Under the Hood 2: Language modeling for automatic speech recognition  46

2 Writers’ Aids  53
  2.1 Introduction  53
  2.2 Kinds of spelling errors  54
    2.2.1 Nonword errors  55
    2.2.2 Real-word errors  57
  2.3 Spell checkers  58
    2.3.1 Nonword error detection  59
    2.3.2 Isolated-word spelling correction  61
    Under the Hood 3: Dynamic programming  64
  2.4 Word correction in context  69
    2.4.1 What is grammar?  70
    Under the Hood 4: Complexity of languages  76
    2.4.2 Techniques for correcting words in context  78
    Under the Hood 5: Spell checking for web queries  82
  2.5 Style checkers  84

3 Language Tutoring Systems  89
  3.1 Learning a language  89
  3.2 Computer-assisted language learning  91
  3.3 Why make CALL tools aware of language?  93
  3.4 What is involved in adding linguistic analysis?  96
    3.4.1 Tokenization  96
    3.4.2 Part-of-speech tagging  98
    3.4.3 Beyond words  100
  3.5 An example ICALL system: TAGARELA  101
  3.6 Modeling the learner  103

4 Searching  111
  4.1 Introduction  111
  4.2 Searching through structured data  113
  4.3 Searching through unstructured data  115
    4.3.1 Information need  115
    4.3.2 Evaluating search results  116
    4.3.3 Example: Searching the web  117
    4.3.4 How search engines work  120
    Under the Hood 6: A brief tour of HTML  123
  4.4 Searching semi-structured data with regular expressions  127
    4.4.1 Syntax of regular expressions  128
    4.4.2 Grep: An example of using regular expressions  130
    Under the Hood 7: Finite-state automata  132
  4.5 Searching text corpora  135
    4.5.1 Why corpora?  136
    4.5.2 Annotated language corpora  137
    Under the Hood 8: Searching for linguistic patterns on the web  138

5 Classifying Documents: From Junk Mail Detection to Sentiment Classification  147
  5.1 Automatic document classification  147
  5.2 How computers “learn”  149
    5.2.1 Supervised learning  150
    5.2.2 Unsupervised learning  151
  5.3 Features and evidence  151
  5.4 Application: Spam filtering  153
    5.4.1 Base rates  155
    5.4.2 Payoffs  159
    5.4.3 Back to documents  159
  5.5 Some types of document classifiers  160
    5.5.1 The Naive Bayes classifier  160
    Under the Hood 9: Naive Bayes  162
    5.5.2 The perceptron  165
    5.5.3 Which classifier to use  168
  5.6 From classification algorithms to context of use  169

6 Dialog Systems  173
  6.1 Computers that “converse”?  173
  6.2 Why dialogs happen  175
  6.3 Automating dialog  176
    6.3.1 Getting started  176
    6.3.2 Establishing a goal  177
    6.3.3 Accepting the user’s goal  177
    6.3.4 The caller plays her role  178
    6.3.5 Giving the answer  178
    6.3.6 Negotiating the end of the conversation  179
  6.4 Conventions and framing expectations  179
    6.4.1 Some framing expectations for games and sports  180
    6.4.2 The framing expectations for dialogs  180
  6.5 Properties of dialog  181
    6.5.1 Dialog moves  181
    6.5.2 Speech acts  182
    6.5.3 Conversational maxims  184
  6.6 Dialog systems and their tasks  186
  6.7 Eliza  187
    Under the Hood 10: How Eliza works  192
  6.8 Spoken dialogs  194
  6.9 How to evaluate a dialog system  195
  6.10 Why is dialog important?  196

7 Machine Translation Systems  201
  7.1 Computers that “translate”?  201
  7.2 Applications of translation  203
    7.2.1 Translation needs  203
    7.2.2 What is machine translation really for?  204
  7.3 Translating Shakespeare  205
  7.4 The translation triangle  208
  7.5 Translation and meaning  211
  7.6 Words and meanings  213
    7.6.1 Words and other languages  213
    7.6.2 Synonyms and translation equivalents  214
  7.7 Word alignment  214
  7.8 IBM Model 1  218
    Under the Hood 11: The noisy channel model  220
    Under the Hood 12: Phrase-based statistical translation  224
  7.9 Commercial automatic translation  225
    7.9.1 Translating weather reports  225
    7.9.2 Translation in the European Union  227
    7.9.3 Prospects for translators  228

8 Epilogue: Impact of Language Technology  235

References  241

Concept Index  247