Practical Business Python

Taking care of business, one python script at a time

Mon 23 March 2015

Generating fake data with barnum

Posted by Chris Moffitt in articles   

Introduction

Several years ago, I developed a very simple program called barnum to generate fake data that could be used to test applications. Over the years, I had forgotten about it. With the recent closing of Google Code, I decided to take the opportunity to move the code to GitHub and see if it might be useful to people.

Motivation

I am moving the code and re-announcing it for a couple of reasons:

  1. I will admit that I hated the idea of it totally dying.
  2. I was curious to see if it is useful to anyone else.
  3. This would give me the opportunity to get more familiar with git (I am not afraid to admit that Mercurial is my favorite, but I think I need to expand my knowledge).
  4. I also chose this as an opportunity to update the code and make it run with Python 3.

Basic Usage

Barnum lets you create the following types of data, which can be used to test your own applications:

  • First name and/or last name in either gender
  • Job title
  • Phone number
  • Street number and name
  • Zip code plus city & state
  • Company name
  • Credit card number and type (with valid checksum)
  • Dates
  • Email addresses
  • Sample password
  • Words (Latin)
  • Sentences and/or paragraphs of random Latin words

Here is the basic usage and some of the unique aspects of barnum.

from barnum import gen_data

Create some fake names

gen_data.create_name()
('Arnoldo', 'Ulmer')
gen_data.create_name()
('Louisa', 'Foy')
gen_data.create_name(full_name=False)
'Gayla'

You can also specify the gender

gen_data.create_name(gender='female')
('Mandy', 'Pena')

We can also create job titles based on US Census data

gen_data.create_job_title()
'Security Coordinator Computer'

One of barnum’s unique capabilities is that there is some intelligence to the data creation. If we pass in a US ZIP code, barnum will use a valid area code for the fake phone number.

gen_data.create_phone(zip_code="55082")
'(612)242-2894'
gen_data.create_phone()
'(863)265-6706'

We can use zip codes for address data.

gen_data.create_city_state_zip()
('12136', 'Old Chatham', 'NY')
gen_data.create_city_state_zip(zip_code="55112")
('55112', 'Saint Paul', 'MN')
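The zip-aware calls above can be combined so that a record's phone number and address agree with each other. The following is only a sketch: `build_contacts` and the stand-in `stub` object are names made up for illustration, and the function assumes the generator it receives behaves like the `gen_data` calls shown in this post (in particular, that `create_city_state_zip()` returns a `(zip, city, state)` tuple).

```python
from types import SimpleNamespace


def build_contacts(gen, count):
    """Build `count` fake contact dicts whose phone and address share a zip.

    `gen` is any object exposing create_name, create_city_state_zip and
    create_phone with the call shapes shown in this post.
    """
    contacts = []
    for _ in range(count):
        first, last = gen.create_name()
        # Pick the location first, then reuse its zip code for the phone
        # number so the area code can match the address.
        zip_code, city, state = gen.create_city_state_zip()
        contacts.append({
            "first": first,
            "last": last,
            "city": city,
            "state": state,
            "zip": zip_code,
            "phone": gen.create_phone(zip_code=zip_code),
        })
    return contacts


# Stand-in generator with canned values so the sketch runs without barnum
# installed. It also records which zip codes were passed to create_phone.
seen_zips = []


def fake_phone(zip_code=None):
    seen_zips.append(zip_code)
    return "(612)242-2894"


stub = SimpleNamespace(
    create_name=lambda: ("Mandy", "Pena"),
    create_city_state_zip=lambda: ("55112", "Saint Paul", "MN"),
    create_phone=fake_phone,
)
contacts = build_contacts(stub, 3)
```

With barnum installed, the same function should work by passing the real module instead of the stub: `build_contacts(gen_data, 3)`.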

Barnum can create fake sentences and paragraphs as well as nouns for all your data population needs.

gen_data.create_sentence()
'Aliquip vulputate consequat suscipit amet adipiscing molestie dignissim nulla molestie hendrerit.'
gen_data.create_paragraphs()
'Illum eros et eu ad ipsum vulputate. Delenit commodoconsequat delenitaugue molestie iustoodio nonummy ut erat duis feugait. Doloremagna Utwisi aliquip molestie erat suscipit. Nonummy exerci eufeugiat illum vel nislut nisl at dolor at. \n\n'
gen_data.create_nouns(max=4)
'rifle giraffe nerve kettle'

There is some date creation capability as well.

gen_data.create_date()
datetime.datetime(2025, 1, 12, 19, 36, 25, 639016)
gen_data.create_date(past=True)
datetime.datetime(2014, 2, 23, 19, 37, 29, 323165)
gen_data.create_date(max_years_future=2)
datetime.datetime(2016, 11, 17, 19, 37, 52, 674284)
gen_data.create_birthday(min_age=2, max_age=75)
datetime.date(2007, 3, 25)
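For readers curious how an age-bounded birthday like the one above could be produced, here is a minimal sketch using only the standard library. It is not barnum's implementation: `random_birthday` is a hypothetical helper, and it approximates a year as 365 days.

```python
import datetime
import random


def random_birthday(min_age=18, max_age=80, today=None, rng=random):
    """Return a random date between min_age and max_age years before today.

    Years are approximated as 365 days, so results near the boundaries can
    be off by a few days because of leap years.
    """
    if min_age > max_age:
        raise ValueError("min_age must not exceed max_age")
    today = today or datetime.date.today()
    # randint is inclusive on both ends.
    days_ago = rng.randint(min_age * 365, max_age * 365)
    return today - datetime.timedelta(days=days_ago)


rng = random.Random(0)  # seeded only to make the example repeatable
today = datetime.date(2015, 3, 23)
birthday = random_birthday(min_age=2, max_age=75, today=today, rng=rng)
```

Passing `today` and `rng` explicitly keeps the function easy to test; in everyday use both can be left at their defaults.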

Create email and company name info.

gen_data.create_email()
'Valorie.Wise@nullafeugait.com'
gen_data.create_email(tld="net")
'Trina@uttation.net'
gen_data.create_email(name=("Fred","Jones"))
'F.Jones@wisiillum.edu'
gen_data.create_company_name()
'Application Telecom Inc'
gen_data.create_company_name(biz_type="LawFirm")
'Marion, Navarro & Quintero LLP'
gen_data.create_company_name(biz_type="Generic")
'Application Data Direct Limited'

Finally, you can also create credit card numbers and simple passwords.

gen_data.cc_number()
('visa', ['4716823823471406'])
gen_data.cc_number(card_type="mastercard")
('mastercard', ['5531134290292667'])
gen_data.create_pw()
'W7jWw4kn'
gen_data.create_pw(length=10)
'4KvqFS8Znu'
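The "valid checksum" on the generated card numbers refers to the Luhn check that payment card numbers use. The sketch below is a standalone validator, not part of barnum; `luhn_valid` is a name chosen here for illustration. It can be used to confirm that the sample numbers above are well formed.

```python
def luhn_valid(number):
    """Return True if the digit string `number` passes the Luhn checksum."""
    digits = [int(ch) for ch in number]
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


# Both sample numbers from the output above pass the check.
visa_ok = luhn_valid("4716823823471406")
mastercard_ok = luhn_valid("5531134290292667")
# Changing the last digit breaks the checksum.
tampered_ok = luhn_valid("4716823823471407")
```

Note that, judging by the output shown above, `cc_number()` returns the numbers as a list in the second tuple element, so a check against real output would look like `luhn_valid(gen_data.cc_number()[1][0])`.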

Customization

You can also customize the data used to create your “random” results. The source-data directory contains several text files. If you choose to edit them, you can use the convert_data.py file to regenerate the random data file.

Next Steps

I know there are a lot of areas where this code can be improved. I’m hopeful that placing it on GitHub will breathe some new life into it. However, even if there’s no real interest in it, I’ll be able to rest easier knowing that it is hosted somewhere others can use it.

Enjoy!


 