Crowd Time Story

-- converting your favorite picture books to audio books --


Motivation

Parents are often asked by their children to read them bedtime stories. As much as this shows children how much their parents love them, parents are not always available or energetic enough to satisfy their children. To help ease the parents' burden and guilt, and to create fun and interesting audio books for children (and adults!), we came up with the idea of a system that converts picture books into audio books. However, with current technology it is difficult to correctly extract text from images, to automatically assign lines to the right character, and to give each character diverse and fitting voice acting. Because these challenges are comparatively simple for humans, we decided to connect our system to a human-powered workforce. We also want this to be a warm, special, and personalized experience, so we wanted users to be able to tweak the audio book by changing character names, recording the parts of specific characters themselves, or replacing certain text to better fit their culture or preferences. We truly believe this system can make picture books and story time more unique and entertaining.

Related Works

Crowd Time Story is related to crowdsourcing systems and to artificial intelligence, especially visual and speech recognition. Tremendous work has been done in the field of crowdsourcing. In the book "Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business", Jeff Howe describes how the crowd can solve problems that are extremely hard for computer programs to solve. Computer algorithms are powerful and fast, but there are many limits to what they can do. Amazon Mechanical Turk (AMT) is a well-suited platform for crowdsourcing small units of work called Human Intelligence Tasks (HITs). We chose Amazon Mechanical Turk to have crowd workers extract scripts from the pictures in a book and record audio for each character.

Results from Amazon Mechanical Turk usually come back quickly. To make the audio book more interactive and friendly, we could adopt visual and speech recognition to capture the script lines and display them interactively. However, this is difficult to implement, since speech recognition is a very deep subject. In the paper "Printed book to audio book converter for visually impaired", the authors describe some techniques for converting printed books to audio books using visual recognition algorithms. Unfortunately, due to time limitations we were not able to implement this part of our project.

Design

To convert a picture book into an audio book, we wanted the experience to be easy, flexible, and personal, so we broke the user interaction into three stages that allow for more options and customization:

Users are given a selection of already converted books and an option to upload a new book.
Users are given a "translated script" of their picture book, which they can either customize (name replacement, text replacement, recording audio parts for specific characters, etc.) or approve for voice acting.
Users are presented with their audio book.

To support these three stages, we also broke the backend tasks into two crowd-work stages that connect the three stages users interact with:

Convert each image into a script that provides information on characters and lines, as well as hidden data such as the story name, page number, and line number.
Have each character's lines recorded by a fitting voice actor.
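The hand-off between the two crowd-work stages can be sketched in Python as a regrouping of the same records. This is a minimal sketch under stated assumptions, not the project's code: `ScriptLine` and `recording_tasks` are hypothetical names, and each stage-1 record is assumed to carry the hidden story name, page number, and line number alongside the character and text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScriptLine:
    """One line produced by the first crowd-work stage (hypothetical).

    story, page, and line_no are the hidden data the system tracks;
    character and text are what the crowd worker transcribes.
    """
    story: str
    page: int
    line_no: int
    character: str
    text: str


def recording_tasks(lines):
    """Regroup stage-1 lines by character for the second stage.

    Lines are sorted by (page, line_no) first, so each character's lines
    stay in story order, and one voice actor gets all of them.
    """
    tasks = defaultdict(list)
    for line in sorted(lines, key=lambda item: (item.page, item.line_no)):
        tasks[line.character].append(line)
    return dict(tasks)


if __name__ == "__main__":
    lines = [
        ScriptLine("calvin", 2, 1, "Calvin", "I am awesome!"),
        ScriptLine("calvin", 1, 1, "Narrator", "Calvin woke up."),
        ScriptLine("calvin", 1, 2, "Calvin", "Good morning!"),
    ]
    for character, todo in recording_tasks(lines).items():
        print(character, [(item.page, item.line_no) for item in todo])
```

Sorting before grouping keeps each character's lines in story order, so a voice actor reads them in the same sequence a listener will hear them, and characters come out in the order they first speak.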

Below is a sketch of the system's basic design.
Figure: Crowd Time Story Design

Implementation

We implemented the system in Python, HTML, JavaScript, and CSS. We built the web interface with the jQuery library; more specifically, we built the homepage with the Bootstrap library and the audio recording HITs with the recorder.js library. For the backend, we implemented the server in Python using the Flask framework, organized its structure with Flask Blueprints, communicated with Amazon Mechanical Turk through the Crowdlib API, and stored data in an SQLite3 database.

To convert the uploaded images into a script for later audio recording, we send each image out in a HIT that asks crowd workers to convert the text in the picture into a form made up of lines in the format character:line. The system keeps track of the story name, the page number, and the line number of each line that crowd workers transcribe and translate. As workers submit the forms, the data are stored in the database. Once all HITs that were sent out are completed, users are redirected to review the resulting script.
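A minimal sketch of how one submitted form could be parsed and stored, using Python's built-in `sqlite3` module. The `script` table schema and the `parse_submission`/`store_page` helpers are hypothetical; the form is assumed to contain one `character:line` entry per line, with line numbers assigned in the order the worker typed them.

```python
import sqlite3

# Hypothetical schema: one row per transcribed line, keyed by its position.
SCHEMA = """
CREATE TABLE IF NOT EXISTS script (
    story TEXT NOT NULL,
    page INTEGER NOT NULL,
    line_no INTEGER NOT NULL,
    character TEXT NOT NULL,
    line_text TEXT NOT NULL,
    PRIMARY KEY (story, page, line_no)
)
"""


def parse_submission(form_text):
    """Split a worker's "character:line" form into (character, text) pairs.

    Blank lines are skipped. Only the first colon separates the speaker
    from the line, so the dialogue itself may contain colons.
    """
    pairs = []
    for raw in form_text.splitlines():
        if not raw.strip():
            continue
        character, sep, text = raw.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {raw!r}")
        pairs.append((character.strip(), text.strip()))
    return pairs


def store_page(conn, story, page, form_text):
    """Store one page's lines, numbered in the order the worker typed them."""
    rows = [
        (story, page, line_no, character, text)
        for line_no, (character, text) in enumerate(
            parse_submission(form_text), start=1
        )
    ]
    with conn:  # commits the transaction on success, rolls back on error
        conn.executemany(
            "INSERT INTO script (story, page, line_no, character, line_text) "
            "VALUES (?, ?, ?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    store_page(conn, "calvin", 1, "Narrator: Calvin woke up.\nCalvin: Today is great!\n")
    print(conn.execute("SELECT * FROM script").fetchall())
```

Splitting on the first colon only lets dialogue contain colons, and a line with no colon is rejected rather than silently attributed to an unknown character.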

The script is simply every character:line pair, pulled from the database with a query and listed in ascending page order, then ascending line order. As originally intended, once users are presented with the script, they would have the option to customize it to make the story more personal. For example, they could replace character names with their own name or their children's names, replace specific words (such as "hamburger" with "salad"), or personally record the audio parts for a specific character. Unfortunately, due to limited time, this feature has been postponed to future work.
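The ordering described above maps directly onto an SQL `ORDER BY` clause. A minimal sketch, assuming a hypothetical `script` table with `story`, `page`, `line_no`, `character`, and `line_text` columns and a hypothetical `fetch_script` helper:

```python
import sqlite3

# Hypothetical table layout: one row per transcribed line.
SCHEMA = """
CREATE TABLE IF NOT EXISTS script (
    story TEXT, page INTEGER, line_no INTEGER,
    character TEXT, line_text TEXT
)
"""


def fetch_script(conn, story):
    """Return a story's script as "character: line" strings.

    Rows are ordered by ascending page number, then ascending line
    number, regardless of the order in which HITs were completed.
    """
    rows = conn.execute(
        "SELECT character, line_text FROM script WHERE story = ? "
        "ORDER BY page ASC, line_no ASC",
        (story,),
    )
    return [f"{character}: {text}" for character, text in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Pages arrive out of order because HITs finish at different times.
    conn.executemany(
        "INSERT INTO script VALUES (?, ?, ?, ?, ?)",
        [
            ("calvin", 2, 1, "Calvin", "I am awesome!"),
            ("calvin", 1, 2, "Calvin", "Good morning!"),
            ("calvin", 1, 1, "Narrator", "Calvin woke up."),
        ],
    )
    print("\n".join(fetch_script(conn, "calvin")))
```

Sorting in the query means the script comes out in reading order even though pages are transcribed by different workers and may be completed in any order.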

To convert a script into an actual audio book, we first query the database for all the lines of each character and send each character's lines out as a separate HIT for audio recording. This way, each character has consistent voice acting throughout the story, even across different pages. When a HIT is submitted, each audio file is named in a format that retains the story name, page number, and line number, and is sent to the server through Ajax calls and stored there. Once all audio files are stored, users are redirected to the final product: the audio book, which presents the story one page at a time and plays all the audio files recorded for that page in order. Users are free to switch between pages, play the whole page, or play just portions of a page. During both crowd-work stages, the "confirm/approve" page that users interact with repeatedly polls the server to update users on how many of the HITs sent out have been submitted.
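A minimal sketch of the file-naming scheme and of per-page playback order. It assumes audio files are named `<story>_<page>_<line>.wav`, following the `[story_pg_line.wav]` entry in the directory tree below; the exact separators, the regular expression, and the `audio_filename`/`page_playlist` helpers are assumptions.

```python
import os
import re

# Assumed pattern "<story>_<page>_<line>.wav". The greedy story group lets
# story names contain underscores; page and line must be the last two numbers.
AUDIO_NAME = re.compile(r"^(?P<story>.+)_(?P<page>\d+)_(?P<line>\d+)\.wav$")


def audio_filename(story, page, line_no):
    """Build a file name that encodes story, page, and line number."""
    return f"{story}_{page}_{line_no}.wav"


def page_playlist(filenames, story, page):
    """Return one page's audio files in playback (line-number) order.

    Files for other stories or pages, and names that do not match the
    pattern, are ignored.
    """
    matches = []
    for name in filenames:
        m = AUDIO_NAME.match(os.path.basename(name))
        if m and m["story"] == story and int(m["page"]) == page:
            matches.append((int(m["line"]), name))
    return [name for _, name in sorted(matches)]


if __name__ == "__main__":
    files = [audio_filename("calvin", 1, n) for n in (10, 2, 1)]
    files.append(audio_filename("calvin", 2, 1))
    print(page_playlist(files, "calvin", 1))
```

Line numbers are compared as integers, so `calvin_1_10.wav` plays after `calvin_1_2.wav`; a plain string sort would put it first.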

The system is currently a working prototype in its simplest form. It assumes that the picture books users upload do not have text split across pages, and that users name each uploaded image as instructed. We have deferred features such as customization, file checking, bookshelf sorting and searching, and verification of the scripts and audio recordings to future work. We have tested the system only in the Amazon Mechanical Turk sandbox, where it "pays" for all submitted work regardless of its content.

The system's final directory structure is as follows:

	- crowdtimestory
		- crowdtimestory
			- db
				- story.db
			- home
				- __init__.py
				- views.py
			- record
				- __init__.py
				- views.py
				- crowdlib_settings.py
			- script
				- __init__.py
				- views.py
				- crowdlib_settings.py
			- static
				- audio
					- [story_pg_line.wav]
				- css
					- bootstrap.min.css
					- hit1_style.css
					- style.css
				- images
					- [story folders]
						- [pageX.jpg]
					- 1.jpg
					- book.png
					- logo.png
				- js
					- index.js
					- jquery-1.11.0.min.js
					- recorder.js
					- recorderWorker.js
			- templates
				- hit1.html
				- hit2.html
				- home.html
				- index.html
				- portfolio.html
				- record.html
				- result.html
				- script.html
				- upload.html
			- upload
				- __init__.py
				- views.py
			- __init__.py
			- config.py
		- runserver.py

Evaluation

Our current system successfully allows users to upload images in JPG, PNG, and JPEG formats through our website. We adopted SQLite3 as our database because it is lightweight and easy to use. Because our website does not currently receive heavy traffic, SQLite easily handles reads and writes, even though it supports less concurrency than some other database engines. However, if the number of users grows, we may need to switch to another database such as MySQL. Rather than storing the images in the database, we store the uploaded images in the file system of the server and store the path of each image in the database for simple access.
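A minimal sketch of the upload path described above: checking the extension against the accepted formats, writing the file to disk, and storing only its path in SQLite. The `images` table, the `page<N>.<ext>` naming, and the helper names are hypothetical, loosely mirroring the `static/images/[story folders]/[pageX.jpg]` entries in the directory tree; the story-name check is a simple guard against path traversal, not a complete security review.

```python
import os
import sqlite3
import tempfile

# Hypothetical table: the database holds only paths, never image bytes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS images (
    story TEXT NOT NULL,
    page INTEGER NOT NULL,
    path TEXT NOT NULL
)
"""

# Formats the system accepts for uploaded pages.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}


def allowed_image(filename):
    """Accept only .jpg, .jpeg, and .png files (case-insensitive)."""
    ext = os.path.splitext(filename)[1]
    return ext.lower().lstrip(".") in ALLOWED_EXTENSIONS


def save_page_image(conn, images_root, story, page, data, filename):
    """Write one page image to disk and record its path in SQLite."""
    if not allowed_image(filename):
        raise ValueError(f"unsupported image type: {filename!r}")
    # Reject story names that could escape images_root (e.g. "../x").
    if story in ("", ".", "..") or os.path.basename(story) != story:
        raise ValueError(f"invalid story name: {story!r}")
    ext = os.path.splitext(filename)[1].lower()
    story_dir = os.path.join(images_root, story)
    os.makedirs(story_dir, exist_ok=True)
    path = os.path.join(story_dir, f"page{page}{ext}")
    with open(path, "wb") as f:
        f.write(data)
    with conn:  # commits the INSERT on success
        conn.execute(
            "INSERT INTO images (story, page, path) VALUES (?, ?, ?)",
            (story, page, path),
        )
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        conn = sqlite3.connect(":memory:")
        conn.execute(SCHEMA)
        saved = save_page_image(conn, root, "calvin", 1, b"...", "scan.PNG")
        print(saved, conn.execute("SELECT path FROM images").fetchall())
```

Keeping image bytes out of the database keeps the SQLite file small, and the stored path is all the web pages need to serve each page image.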

Using JavaScript (in particular the recorder.js library), our system allows users to record audio with their microphones and to edit or customize their recordings. Users can repeat the recording process until they are satisfied with the audio. Through Amazon Mechanical Turk, workers can accept HITs and record the lines for their characters.

The system is in beta, which means it is not ready to be used as production code. HITs are posted to the Amazon Mechanical Turk sandbox, and workers who try to do the HITs receive a "CONNECTION UNTRUSTED" warning. However, workers are still able to accept and finish the HITs; the story scripts are saved in our database, while the audio recordings are stored in the file system for easy access. Our users would benefit from a user management system: currently, someone creating a story must keep the web browser open while waiting for the HITs to be done before moving on to the next step, which would be a problem for a reliable web application. For now, we have kept the system in its simplest form, and it runs successfully without issues, although we have not tested it under heavy user load.

Thanks to Bootstrap, our home page interface looks professional and polished. HIT1, which extracts scripts from pictures, and HIT2, which collects voice acting for each character in the book, are not fully customized. We used minimal CSS to style the HIT pages, as well as the upload, result, and some other pages. This does not generally affect the functionality of our system, but it would be nice to give the user interface a more customized look.

Some functions in our system are also incomplete. Originally, we planned one more step that lets users customize the scripts returned by AMT crowd workers. Due to limited implementation time, we had to focus on the core functionality of our system, and we leave this step as future work. We fixed as many security holes as we could in the current system, but we cannot guarantee that it is perfectly secure.

Results

We have produced three audio books using our system:

Where the Wild Things Are (pages 1-3)
Calvin is Awesome
Calvin is Awesome 2

Due to copyright issues, we created the last two books ourselves for testing purposes. Users visiting our website can access all three books from the bookshelf. The books play smoothly and the audio sounds very clear.

Conclusions

Crowd Time Story is a web application that allows users to create audio books from their favorite picture books through crowd work. It is a solid example of using crowdsourcing to produce creative solutions to problems that are hard for computer algorithms to solve. Through the development process, we improved our web development skills in Python (with the Flask framework) and JavaScript. Beyond the missing features mentioned in the Evaluation section, there are many more features that could be implemented for our application. We hope you enjoy the crowd time stories you create through our web application.

Acknowledgements

We would like to thank Dr. Alex Quinn, our instructor, who inspired us to build this awesome web application, was very patient with us, and helped us throughout the project. We would also like to acknowledge the efforts of the Department of ECE at Purdue for introducing this interesting course and dedicating the HCI server to hosting our projects.