Crowd Time Story
-- converting your favorite picture books to audio books --
Children often ask their parents to read them bedtime stories. Reading together shows children how much their parents love them, but parents are not always available or energetic enough to oblige. To ease parents' burden and guilt, and to create fun and interesting audio books for children (and adults!), we came up with the idea of a system that converts picture books to audio books. With current technology, however, it is difficult to extract text from images correctly, to assign lines to the right character automatically, and to give each character a diverse and fitting voice. These tasks are comparatively simple for humans, so we decided to connect our system to a human-powered workforce. We also want the experience to be personal, warm, and special, so we wanted to let users tweak the audio book by changing character names, recording audio parts themselves for specific characters, or replacing certain text to better fit their culture or preferences. We truly believe this system can make picture books and story time a more unique and entertaining experience.
Crowd Time Story is related to crowdsourcing systems and to artificial intelligence, especially visual and speech recognition. A great deal of work has been done in the field of crowdsourcing. In the book "Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business", Jeff Howe describes how the crowd can solve some problems that are extremely hard for computer programs to solve. Computer algorithms are powerful and fast, but there are many limits to what they can do. Amazon Mechanical Turk is a well-suited platform for handing such problems to the crowd as Human Intelligence Tasks (HITs). We chose Amazon Mechanical Turk to extract scripts from the pictures of the book and to ask crowd workers to record audio for each character.
Results usually come back from Amazon Mechanical Turk quickly. To make the audio book more interactive and friendly, we could adopt visual or speech recognition to capture the script lines and display them in an interactive way. This part is difficult to implement, however, since speech recognition is a deep subject in its own right. In the paper "Printed book to audio book converter for visually impaired", the authors describe some techniques for converting text books to audio books using visual recognition algorithms. Unfortunately, due to time limitations, we were not able to implement this part in our project.
To convert a picture book into an audio book, we wanted the experience to be easy, flexible, and personal. So we broke the user interaction down into three stages allowing more options and customization:
- Users are given a selection of already converted books and an option to upload a new book.
- Users are given a "translated script" from their picture book, which they can customize (name replacement, text replacement, recording audio parts for a specific character, etc.) or approve for voice acting.
- Users are presented with their audio book.
Behind these stages, the system has two tasks:
- Convert each image into a script that provides information on characters and lines, as well as hidden data such as story name, page number, and line number.
- Have the lines of each character recorded by fitting voice actors.
To convert the uploaded images into a script for later audio recording, we send each image out in a HIT that asks crowd workers to convert the text in the picture into a form made up of multiple character:line entries. The system keeps track of the story name, the page number, and the line number of each line the crowd workers transcribe and translate. As workers submit the forms, the data are stored in the database. Once all the HITs that were sent out have been completed, users are redirected to examine the resulting script.
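As an illustration of the character:line form described above, the following is a minimal sketch of how one worker's submission could be turned into records carrying the story name, page number, and line number. The names `ScriptLine` and `parse_transcription` are hypothetical, and the fallback to a "Narrator" character for lines without a colon is an assumption, not something the system is stated to do.

```python
from dataclasses import dataclass


@dataclass
class ScriptLine:
    """One transcribed line plus the hidden bookkeeping data."""
    story: str
    page: int
    line: int
    character: str
    text: str


def parse_transcription(story, page, raw_text):
    """Turn a worker's 'character:line' transcription into ScriptLine records."""
    records = []
    line_number = 0
    for raw in raw_text.splitlines():
        raw = raw.strip()
        if not raw:
            continue  # ignore blank lines in the submitted form
        # Split on the first colon only, so the spoken text may contain colons.
        character, sep, text = raw.partition(":")
        if not sep:
            # Assumption: a line without a colon is attributed to a narrator.
            character, text = "Narrator", raw
        line_number += 1
        records.append(
            ScriptLine(story, page, line_number, character.strip(), text.strip())
        )
    return records
```

Line numbers are assigned in the order the worker typed the entries, which is what later allows the script to be listed and played back in reading order.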
The script is simply a list of character:line entries in ascending page order and then ascending line order, pulled from the database with a query. Our original intent was that, once users are presented with the script, they could customize it to make the story more personal. For example, they could replace character names with their own name or their children's names, replace specific words (such as "hamburger" with "salad"), or personally record audio parts for a specific character. Unfortunately, given the limited time, this feature has been postponed to future work.
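The ordering described above could be expressed as a single SQLite query. The sketch below is only an illustration: the table name `lines` and its columns are assumptions, since the actual schema of story.db is not shown here.

```python
import sqlite3

# Hypothetical schema; the real story.db layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS lines (
    story     TEXT    NOT NULL,
    page      INTEGER NOT NULL,
    line      INTEGER NOT NULL,
    character TEXT    NOT NULL,
    text      TEXT    NOT NULL,
    PRIMARY KEY (story, page, line)
)
"""


def create_schema(conn):
    """Create the assumed 'lines' table if it does not exist yet."""
    conn.execute(SCHEMA)


def load_script(conn, story):
    """Return 'character: text' entries by ascending page, then ascending line."""
    rows = conn.execute(
        "SELECT character, text FROM lines "
        "WHERE story = ? ORDER BY page ASC, line ASC",
        (story,),
    ).fetchall()
    return [f"{character}: {text}" for character, text in rows]
```

Storing page and line numbers as integers matters here: with text columns, line 10 would sort before line 2.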
To convert a script into an actual audio book, we first gather all the lines for each character by querying the database and send them out as separate HITs for audio recording. This way each character has consistent voice acting throughout the story, even across pages. When a HIT is submitted, each audio file is named in a format that retains the story name, the page number, and the line number, and is sent to the server through Ajax calls and stored there. Once all audio files are stored, users are redirected to the final product, the audio book, which presents the story one page at a time and plays all the audio files recorded for that page in order. Users are free to switch between pages and to play a whole page or just portions of it. During both stages, the "confirm/approve" page that users interact with continually polls the server to check and update users on how many of the HITs that were sent out have been submitted.
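The grouping and file-naming steps above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the exact file name pattern is assumed to be `story_page_line.wav` with underscores as separators.

```python
from collections import defaultdict


def group_by_character(lines):
    """Group (page, line, character, text) tuples into one batch per character.

    Each batch would become one recording HIT, so that a single worker
    voices a character throughout the story.
    """
    batches = defaultdict(list)
    for page, line, character, text in sorted(lines):
        batches[character].append((page, line, text))
    return dict(batches)


def audio_filename(story, page, line):
    """Build a file name that keeps the story name, page, and line number."""
    return f"{story}_{page}_{line}.wav"


def parse_audio_filename(filename):
    """Recover (story, page, line); the story name may itself contain '_'."""
    stem, _, _extension = filename.rpartition(".")
    story, page, line = stem.rsplit("_", 2)
    return story, int(page), int(line)


def playback_order(filenames, page):
    """Return the recordings for one page, sorted by numeric line number."""
    on_page = []
    for name in filenames:
        _, file_page, file_line = parse_audio_filename(name)
        if file_page == page:
            on_page.append((file_line, name))
    return [name for _, name in sorted(on_page)]
```

Because the page and line numbers are recoverable from the file name alone, the player can order the recordings for a page without a further database lookup.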
The system is currently a working prototype in its simplest form. It assumes that the picture books users upload do not have text cut off between pages, and that users name each uploaded image as instructed. We have deferred features such as customization, file checking, bookshelf sorting and searching, and verification of the script and the audio recordings to future work. So far the system has only been tested in the Amazon Mechanical Turk sandbox, and it "pays" for the work regardless of what is submitted.
The system's final structure is as follows:
- crowdtimestory
  - crowdtimestory
    - db
      - story.db
    - home
      - __init__.py
      - views.py
    - record
      - __init__.py
      - views.py
      - crowdlib_settings.py
    - script
      - __init__.py
      - views.py
      - crowdlib_settings.py
    - static
      - audio
        - [story_pg_line.wav]
      - css
        - bootstrap.min.css
        - hit1_style.css
        - style.css
      - images
        - [story folders]
          - [pageX.jpg]
        - 1.jpg
        - book.png
        - logo.png
      - js
        - index.js
        - jquery-1.11.0.min.js
        - recorder.js
        - recorderWorker.js
    - templates
      - hit1.html
      - hit2.html
      - home.html
      - index.html
      - portfolio.html
      - record.html
      - result.html
      - script.html
      - upload.html
    - upload
      - __init__.py
      - views.py
    - __init__.py
  - config.py
  - runserver.py
Our current working system successfully allows users to upload images in jpg, jpeg, and png formats through our website. We adopted SQLite3 as our database because it is lightweight and easy to use. Because our website does not currently have heavy traffic, SQLite can easily handle the reads and writes, even though it supports less concurrency than some other database engines. However, if the system grows along with the number of users, we might need to switch to another database such as MySQL. Instead of storing the images in the database, we store the uploaded images in the file system of the machine where the server runs, and we store the path of each image in the database for simple access.
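The upload handling described above (accepting only jpg, jpeg, and png files, and recording a path rather than the image itself) might look roughly like the sketch below. The `pages` table, the `static/images` root, and the function names are assumptions made for illustration.

```python
import posixpath
import sqlite3

ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png"}

# Hypothetical table; the real story.db layout may differ.
PAGES_SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    story      TEXT    NOT NULL,
    page       INTEGER NOT NULL,
    image_path TEXT    NOT NULL,
    PRIMARY KEY (story, page)
)
"""


def is_allowed_image(filename):
    """Accept only the image extensions the site supports (case-insensitive)."""
    _, extension = posixpath.splitext(filename)
    return extension.lower().lstrip(".") in ALLOWED_EXTENSIONS


def register_page_image(conn, story, page, filename, image_root="static/images"):
    """Record where an uploaded page image lives; the file itself stays on disk."""
    if not is_allowed_image(filename):
        raise ValueError(f"unsupported image type: {filename}")
    # Keep only the base name so a crafted name cannot escape the story folder.
    safe_name = posixpath.basename(filename)
    # Forward slashes keep the stored path usable in a URL as well.
    path = posixpath.join(image_root, story, safe_name)
    conn.execute(
        "INSERT INTO pages (story, page, image_path) VALUES (?, ?, ?)",
        (story, page, path),
    )
    conn.commit()
    return path
```

Keeping only the path in SQLite keeps the database file small, at the cost of having to keep the database and the image folder in sync.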
The system is in beta, which means it should not be treated as production code. The HITs are posted to the Amazon Mechanical Turk sandbox, and workers who try to do the HITs will receive a warning saying "CONNECTION UNTRUSTED". However, workers are still able to accept and finish the HITs. The scripts for the stories are saved in our database, while the audio recordings are stored in the file system for easy access. Our users would benefit from a user management system: currently, anyone creating a story needs to keep the web browser open while waiting for the HITs to be done before moving on to the next step, which is a problem for a reliable web application. For now, we keep the system in its simplest form, and it runs successfully without issues, but we have not tested it under heavy traffic.
Thanks to Bootstrap, our home page interface looks professional and polished. HIT1, which extracts scripts from pictures, and HIT2, which collects voice acting for each character in the book, are not fully customized. We kept CSS styling to a minimum on the HIT pages as well as on the upload, result, and some other pages. This does not generally affect the functionality of our system, but it would be nice to make the user interface more customized.
There are also incomplete functions in our system. Originally, we planned one more step in which users customize the scripts returned by AMT crowd workers. Due to limited implementation time, we had to focus on the core functionality of the system, so we leave this step as future work. We fixed as many security holes as we could in the current working system, but we cannot guarantee that it is perfectly secure.
We have produced three audio books with our system:
- Where the Wild Things Are [pages 1-3]
- Calvin is Awesome
- Calvin is Awesome 2
We would like to thank Dr. Alex Quinn, our instructor, who inspired us to build this awesome web application, was very patient with us, and helped us throughout the project. We would also like to acknowledge the Department of ECE at Purdue for offering this interesting course and dedicating the HCI server to hosting our projects.